* [RFC PATCH] riscv: Optimize gcd() performance by selecting CPU_NO_EFFICIENT_FFS
@ 2025-02-17 1:37 Kuan-Wei Chiu
2025-03-28 14:07 ` Alexandre Ghiti
0 siblings, 1 reply; 4+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-17 1:37 UTC (permalink / raw)
To: paul.walmsley, palmer, aou
Cc: jserv, eleanor15x, linux-riscv, linux-kernel, Kuan-Wei Chiu
When the Zbb extension is not supported, ffs() falls back to a software
implementation instead of leveraging the hardware ctz instruction for
fast computation. In such cases, selecting CPU_NO_EFFICIENT_FFS
optimizes the efficiency of gcd().
The implementation of gcd() depends on the CPU_NO_EFFICIENT_FFS option.
With hardware support for ffs, the binary GCD algorithm is used.
Without it, the odd-even GCD algorithm is employed for better
performance.
Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
Although selecting NO_EFFICIENT_FFS seems reasonable without ctz
instructions, this patch hasn't been tested on real hardware. We'd
greatly appreciate it if someone could help test and provide
performance numbers!
arch/riscv/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 7612c52e9b1e..2dd3699ad09b 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -91,6 +91,7 @@ config RISCV
select CLINT_TIMER if RISCV_M_MODE
select CLONE_BACKWARDS
select COMMON_CLK
+ select CPU_NO_EFFICIENT_FFS if !RISCV_ISA_ZBB
select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND
select EDAC_SUPPORT
select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE)
--
2.34.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RFC PATCH] riscv: Optimize gcd() performance by selecting CPU_NO_EFFICIENT_FFS
2025-02-17 1:37 [RFC PATCH] riscv: Optimize gcd() performance by selecting CPU_NO_EFFICIENT_FFS Kuan-Wei Chiu
@ 2025-03-28 14:07 ` Alexandre Ghiti
2025-04-04 13:54 ` Kuan-Wei Chiu
0 siblings, 1 reply; 4+ messages in thread
From: Alexandre Ghiti @ 2025-03-28 14:07 UTC (permalink / raw)
To: Kuan-Wei Chiu, paul.walmsley, palmer, aou
Cc: jserv, eleanor15x, linux-riscv, linux-kernel
Hi Kuan-Wei,
First sorry for the late review.
On 17/02/2025 02:37, Kuan-Wei Chiu wrote:
> When the Zbb extension is not supported, ffs() falls back to a software
> implementation instead of leveraging the hardware ctz instruction for
> fast computation. In such cases, selecting CPU_NO_EFFICIENT_FFS
> optimizes the efficiency of gcd().
>
> The implementation of gcd() depends on the CPU_NO_EFFICIENT_FFS option.
> With hardware support for ffs, the binary GCD algorithm is used.
> Without it, the odd-even GCD algorithm is employed for better
> performance.
>
> Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
> Although selecting NO_EFFICIENT_FFS seems reasonable without ctz
> instructions, this patch hasn't been tested on real hardware. We'd
> greatly appreciate it if someone could help test and provide
> performance numbers!
>
> arch/riscv/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 7612c52e9b1e..2dd3699ad09b 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -91,6 +91,7 @@ config RISCV
> select CLINT_TIMER if RISCV_M_MODE
> select CLONE_BACKWARDS
> select COMMON_CLK
> + select CPU_NO_EFFICIENT_FFS if !RISCV_ISA_ZBB
> select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND
> select EDAC_SUPPORT
> select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE)
So your patch is correct. But a kernel built with RISCV_ISA_ZBB does not
mean the platform supports zbb and in that case, we'd still use the slow
version of gcd().
Then I would use static keys instead, can you try to come up with a
patch that does that?
Thanks,
Alex
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC PATCH] riscv: Optimize gcd() performance by selecting CPU_NO_EFFICIENT_FFS
2025-03-28 14:07 ` Alexandre Ghiti
@ 2025-04-04 13:54 ` Kuan-Wei Chiu
2025-04-23 6:57 ` Alexandre Ghiti
0 siblings, 1 reply; 4+ messages in thread
From: Kuan-Wei Chiu @ 2025-04-04 13:54 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: paul.walmsley, palmer, aou, jserv, eleanor15x, linux-riscv,
linux-kernel, Andrew Morton
+Cc Andrew, since this might touch lib/math/gcd.c
On Fri, Mar 28, 2025 at 03:07:36PM +0100, Alexandre Ghiti wrote:
> Hi Kuan-Wei,
>
> First sorry for the late review.
>
> On 17/02/2025 02:37, Kuan-Wei Chiu wrote:
> > When the Zbb extension is not supported, ffs() falls back to a software
> > implementation instead of leveraging the hardware ctz instruction for
> > fast computation. In such cases, selecting CPU_NO_EFFICIENT_FFS
> > optimizes the efficiency of gcd().
> >
> > The implementation of gcd() depends on the CPU_NO_EFFICIENT_FFS option.
> > With hardware support for ffs, the binary GCD algorithm is used.
> > Without it, the odd-even GCD algorithm is employed for better
> > performance.
> >
> > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > ---
> > Although selecting NO_EFFICIENT_FFS seems reasonable without ctz
> > instructions, this patch hasn't been tested on real hardware. We'd
> > greatly appreciate it if someone could help test and provide
> > performance numbers!
> >
> > arch/riscv/Kconfig | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 7612c52e9b1e..2dd3699ad09b 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -91,6 +91,7 @@ config RISCV
> > select CLINT_TIMER if RISCV_M_MODE
> > select CLONE_BACKWARDS
> > select COMMON_CLK
> > + select CPU_NO_EFFICIENT_FFS if !RISCV_ISA_ZBB
> > select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND
> > select EDAC_SUPPORT
> > select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE)
>
>
> So your patch is correct. But a kernel built with RISCV_ISA_ZBB does not
> mean the platform supports zbb and in that case, we'd still use the slow
> version of gcd().
>
> Then I would use static keys instead, can you try to come up with a patch
> that does that?
>
Here's my current thought: I'd like to add a static key named
efficient_ffs_key in gcd.c, and possibly call
static_branch_disable(&efficient_ffs_key) somewhere under arch/riscv/
when RISCV_ISA_ZBB is enabled but the Zbb extension is not detected at
runtime.
However, I'm new to the RISC-V kernel code and not sure where would be
the most appropriate place to insert that static_branch_disable() call.
Suggestions are very welcome!
Below is the diff for context.
Regards,
Kuan-Wei
diff --git a/lib/math/gcd.c b/lib/math/gcd.c
index e3b042214d1b..514b8a86b461 100644
--- a/lib/math/gcd.c
+++ b/lib/math/gcd.c
@@ -2,6 +2,7 @@
#include <linux/kernel.h>
#include <linux/gcd.h>
#include <linux/export.h>
+#include <linux/jump_label.h>
/*
* This implements the binary GCD algorithm. (Often attributed to Stein,
@@ -11,6 +12,8 @@
* has decent hardware division.
*/
+DEFINE_STATIC_KEY_TRUE(efficient_ffs_key);
+
#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
/* If __ffs is available, the even/odd algorithm benchmarks slower. */
@@ -20,7 +23,7 @@
* @a: first value
* @b: second value
*/
-unsigned long gcd(unsigned long a, unsigned long b)
+static unsigned long gcd_binary(unsigned long a, unsigned long b)
{
unsigned long r = a | b;
@@ -44,7 +47,7 @@ unsigned long gcd(unsigned long a, unsigned long b)
}
}
-#else
+#endif
/* If normalization is done by loops, the even/odd algorithm is a win. */
unsigned long gcd(unsigned long a, unsigned long b)
@@ -54,6 +57,11 @@ unsigned long gcd(unsigned long a, unsigned long b)
if (!a || !b)
return r;
+#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
+ if (static_branch_likely(&efficient_ffs_key))
+ return binary_gcd(a, b);
+#endif
+
/* Isolate lsbit of r */
r &= -r;
@@ -80,6 +88,4 @@ unsigned long gcd(unsigned long a, unsigned long b)
}
}
-#endif
-
EXPORT_SYMBOL_GPL(gcd);
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RFC PATCH] riscv: Optimize gcd() performance by selecting CPU_NO_EFFICIENT_FFS
2025-04-04 13:54 ` Kuan-Wei Chiu
@ 2025-04-23 6:57 ` Alexandre Ghiti
0 siblings, 0 replies; 4+ messages in thread
From: Alexandre Ghiti @ 2025-04-23 6:57 UTC (permalink / raw)
To: Kuan-Wei Chiu
Cc: paul.walmsley, palmer, aou, jserv, eleanor15x, linux-riscv,
linux-kernel, Andrew Morton
Hi Kuan-Wei,
On 04/04/2025 15:54, Kuan-Wei Chiu wrote:
> +Cc Andrew, since this might touch lib/math/gcd.c
>
> On Fri, Mar 28, 2025 at 03:07:36PM +0100, Alexandre Ghiti wrote:
>> Hi Kuan-Wei,
>>
>> First sorry for the late review.
>>
>> On 17/02/2025 02:37, Kuan-Wei Chiu wrote:
>>> When the Zbb extension is not supported, ffs() falls back to a software
>>> implementation instead of leveraging the hardware ctz instruction for
>>> fast computation. In such cases, selecting CPU_NO_EFFICIENT_FFS
>>> optimizes the efficiency of gcd().
>>>
>>> The implementation of gcd() depends on the CPU_NO_EFFICIENT_FFS option.
>>> With hardware support for ffs, the binary GCD algorithm is used.
>>> Without it, the odd-even GCD algorithm is employed for better
>>> performance.
>>>
>>> Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
>>> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
>>> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
>>> ---
>>> Although selecting NO_EFFICIENT_FFS seems reasonable without ctz
>>> instructions, this patch hasn't been tested on real hardware. We'd
>>> greatly appreciate it if someone could help test and provide
>>> performance numbers!
>>>
>>> arch/riscv/Kconfig | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index 7612c52e9b1e..2dd3699ad09b 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -91,6 +91,7 @@ config RISCV
>>> select CLINT_TIMER if RISCV_M_MODE
>>> select CLONE_BACKWARDS
>>> select COMMON_CLK
>>> + select CPU_NO_EFFICIENT_FFS if !RISCV_ISA_ZBB
>>> select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND
>>> select EDAC_SUPPORT
>>> select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE)
>>
>> So your patch is correct. But a kernel built with RISCV_ISA_ZBB does not
>> mean the platform supports zbb and in that case, we'd still use the slow
>> version of gcd().
>>
>> Then I would use static keys instead, can you try to come up with a patch
>> that does that?
>>
> Here's my current thought: I'd like to add a static key named
> efficient_ffs_key in gcd.c, and possibly call
> static_branch_disable(&efficient_ffs_key) somewhere under arch/riscv/
> when RISCV_ISA_ZBB is enabled but the Zbb extension is not detected at
> runtime.
>
> However, I'm new to the RISC-V kernel code and not sure where would be
> the most appropriate place to insert that static_branch_disable() call.
> Suggestions are very welcome!
Sorry for the late answer, I missed your message.
So we put all of those initializations that depend on the discovery of
extensions at the end of setup_arch()
(https://elixir.bootlin.com/linux/v6.14.3/source/arch/riscv/kernel/setup.c#L334).
Thanks,
Alex
>
> Below is the diff for context.
>
> Regards,
> Kuan-Wei
>
> diff --git a/lib/math/gcd.c b/lib/math/gcd.c
> index e3b042214d1b..514b8a86b461 100644
> --- a/lib/math/gcd.c
> +++ b/lib/math/gcd.c
> @@ -2,6 +2,7 @@
> #include <linux/kernel.h>
> #include <linux/gcd.h>
> #include <linux/export.h>
> +#include <linux/jump_label.h>
>
> /*
> * This implements the binary GCD algorithm. (Often attributed to Stein,
> @@ -11,6 +12,8 @@
> * has decent hardware division.
> */
>
> +DEFINE_STATIC_KEY_TRUE(efficient_ffs_key);
> +
> #if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
>
> /* If __ffs is available, the even/odd algorithm benchmarks slower. */
> @@ -20,7 +23,7 @@
> * @a: first value
> * @b: second value
> */
> -unsigned long gcd(unsigned long a, unsigned long b)
> +static unsigned long gcd_binary(unsigned long a, unsigned long b)
> {
> unsigned long r = a | b;
>
> @@ -44,7 +47,7 @@ unsigned long gcd(unsigned long a, unsigned long b)
> }
> }
>
> -#else
> +#endif
>
> /* If normalization is done by loops, the even/odd algorithm is a win. */
> unsigned long gcd(unsigned long a, unsigned long b)
> @@ -54,6 +57,11 @@ unsigned long gcd(unsigned long a, unsigned long b)
> if (!a || !b)
> return r;
>
> +#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
> + if (static_branch_likely(&efficient_ffs_key))
> + return binary_gcd(a, b);
> +#endif
> +
> /* Isolate lsbit of r */
> r &= -r;
>
> @@ -80,6 +88,4 @@ unsigned long gcd(unsigned long a, unsigned long b)
> }
> }
>
> -#endif
> -
> EXPORT_SYMBOL_GPL(gcd);
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-04-23 7:07 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-17 1:37 [RFC PATCH] riscv: Optimize gcd() performance by selecting CPU_NO_EFFICIENT_FFS Kuan-Wei Chiu
2025-03-28 14:07 ` Alexandre Ghiti
2025-04-04 13:54 ` Kuan-Wei Chiu
2025-04-23 6:57 ` Alexandre Ghiti
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox