* [RFC PATCH] riscv: Optimize gcd() performance by selecting CPU_NO_EFFICIENT_FFS @ 2025-02-17 1:37 Kuan-Wei Chiu 2025-03-28 14:07 ` Alexandre Ghiti 0 siblings, 1 reply; 4+ messages in thread From: Kuan-Wei Chiu @ 2025-02-17 1:37 UTC (permalink / raw) To: paul.walmsley, palmer, aou Cc: jserv, eleanor15x, linux-riscv, linux-kernel, Kuan-Wei Chiu When the Zbb extension is not supported, ffs() falls back to a software implementation instead of leveraging the hardware ctz instruction for fast computation. In such cases, selecting CPU_NO_EFFICIENT_FFS optimizes the efficiency of gcd(). The implementation of gcd() depends on the CPU_NO_EFFICIENT_FFS option. With hardware support for ffs, the binary GCD algorithm is used. Without it, the odd-even GCD algorithm is employed for better performance. Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> --- Although selecting NO_EFFICIENT_FFS seems reasonable without ctz instructions, this patch hasn't been tested on real hardware. We'd greatly appreciate it if someone could help test and provide performance numbers! arch/riscv/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 7612c52e9b1e..2dd3699ad09b 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -91,6 +91,7 @@ config RISCV select CLINT_TIMER if RISCV_M_MODE select CLONE_BACKWARDS select COMMON_CLK + select CPU_NO_EFFICIENT_FFS if !RISCV_ISA_ZBB select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND select EDAC_SUPPORT select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE) -- 2.34.1 _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RFC PATCH] riscv: Optimize gcd() performance by selecting CPU_NO_EFFICIENT_FFS 2025-02-17 1:37 [RFC PATCH] riscv: Optimize gcd() performance by selecting CPU_NO_EFFICIENT_FFS Kuan-Wei Chiu @ 2025-03-28 14:07 ` Alexandre Ghiti 2025-04-04 13:54 ` Kuan-Wei Chiu 0 siblings, 1 reply; 4+ messages in thread From: Alexandre Ghiti @ 2025-03-28 14:07 UTC (permalink / raw) To: Kuan-Wei Chiu, paul.walmsley, palmer, aou Cc: jserv, eleanor15x, linux-riscv, linux-kernel Hi Kuan-Wei, First sorry for the late review. On 17/02/2025 02:37, Kuan-Wei Chiu wrote: > When the Zbb extension is not supported, ffs() falls back to a software > implementation instead of leveraging the hardware ctz instruction for > fast computation. In such cases, selecting CPU_NO_EFFICIENT_FFS > optimizes the efficiency of gcd(). > > The implementation of gcd() depends on the CPU_NO_EFFICIENT_FFS option. > With hardware support for ffs, the binary GCD algorithm is used. > Without it, the odd-even GCD algorithm is employed for better > performance. > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > --- > Although selecting NO_EFFICIENT_FFS seems reasonable without ctz > instructions, this patch hasn't been tested on real hardware. We'd > greatly appreciate it if someone could help test and provide > performance numbers! > > arch/riscv/Kconfig | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > index 7612c52e9b1e..2dd3699ad09b 100644 > --- a/arch/riscv/Kconfig > +++ b/arch/riscv/Kconfig > @@ -91,6 +91,7 @@ config RISCV > select CLINT_TIMER if RISCV_M_MODE > select CLONE_BACKWARDS > select COMMON_CLK > + select CPU_NO_EFFICIENT_FFS if !RISCV_ISA_ZBB > select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND > select EDAC_SUPPORT > select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE) So your patch is correct. But a kernel built with RISCV_ISA_ZBB does not mean the platform supports zbb and in that case, we'd still use the slow version of gcd(). Then I would use static keys instead, can you try to come up with a patch that does that? Thanks, Alex _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC PATCH] riscv: Optimize gcd() performance by selecting CPU_NO_EFFICIENT_FFS 2025-03-28 14:07 ` Alexandre Ghiti @ 2025-04-04 13:54 ` Kuan-Wei Chiu 2025-04-23 6:57 ` Alexandre Ghiti 0 siblings, 1 reply; 4+ messages in thread From: Kuan-Wei Chiu @ 2025-04-04 13:54 UTC (permalink / raw) To: Alexandre Ghiti Cc: paul.walmsley, palmer, aou, jserv, eleanor15x, linux-riscv, linux-kernel, Andrew Morton +Cc Andrew, since this might touch lib/math/gcd.c On Fri, Mar 28, 2025 at 03:07:36PM +0100, Alexandre Ghiti wrote: > Hi Kuan-Wei, > > First sorry for the late review. > > On 17/02/2025 02:37, Kuan-Wei Chiu wrote: > > When the Zbb extension is not supported, ffs() falls back to a software > > implementation instead of leveraging the hardware ctz instruction for > > fast computation. In such cases, selecting CPU_NO_EFFICIENT_FFS > > optimizes the efficiency of gcd(). > > > > The implementation of gcd() depends on the CPU_NO_EFFICIENT_FFS option. > > With hardware support for ffs, the binary GCD algorithm is used. > > Without it, the odd-even GCD algorithm is employed for better > > performance. > > > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > --- > > Although selecting NO_EFFICIENT_FFS seems reasonable without ctz > > instructions, this patch hasn't been tested on real hardware. We'd > > greatly appreciate it if someone could help test and provide > > performance numbers! > > > > arch/riscv/Kconfig | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > > index 7612c52e9b1e..2dd3699ad09b 100644 > > --- a/arch/riscv/Kconfig > > +++ b/arch/riscv/Kconfig > > @@ -91,6 +91,7 @@ config RISCV > > select CLINT_TIMER if RISCV_M_MODE > > select CLONE_BACKWARDS > > select COMMON_CLK > > + select CPU_NO_EFFICIENT_FFS if !RISCV_ISA_ZBB > > select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND > > select EDAC_SUPPORT > > select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE) > > > So your patch is correct. But a kernel built with RISCV_ISA_ZBB does not > mean the platform supports zbb and in that case, we'd still use the slow > version of gcd(). > > Then I would use static keys instead, can you try to come up with a patch > that does that? > Here's my current thought: I'd like to add a static key named efficient_ffs_key in gcd.c, and possibly call static_branch_disable(&efficient_ffs_key) somewhere under arch/riscv/ when RISCV_ISA_ZBB is enabled but the Zbb extension is not detected at runtime. However, I'm new to the RISC-V kernel code and not sure where would be the most appropriate place to insert that static_branch_disable() call. Suggestions are very welcome! Below is the diff for context. Regards, Kuan-Wei diff --git a/lib/math/gcd.c b/lib/math/gcd.c index e3b042214d1b..514b8a86b461 100644 --- a/lib/math/gcd.c +++ b/lib/math/gcd.c @@ -2,6 +2,7 @@ #include <linux/kernel.h> #include <linux/gcd.h> #include <linux/export.h> +#include <linux/jump_label.h> /* * This implements the binary GCD algorithm. (Often attributed to Stein, @@ -11,6 +12,8 @@ * has decent hardware division. */ +DEFINE_STATIC_KEY_TRUE(efficient_ffs_key); + #if !defined(CONFIG_CPU_NO_EFFICIENT_FFS) /* If __ffs is available, the even/odd algorithm benchmarks slower. */ @@ -20,7 +23,7 @@ * @a: first value * @b: second value */ -unsigned long gcd(unsigned long a, unsigned long b) +static unsigned long gcd_binary(unsigned long a, unsigned long b) { unsigned long r = a | b; @@ -44,7 +47,7 @@ unsigned long gcd(unsigned long a, unsigned long b) } } -#else +#endif /* If normalization is done by loops, the even/odd algorithm is a win. */ unsigned long gcd(unsigned long a, unsigned long b) @@ -54,6 +57,11 @@ unsigned long gcd(unsigned long a, unsigned long b) if (!a || !b) return r; +#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS) + if (static_branch_likely(&efficient_ffs_key)) + return binary_gcd(a, b); +#endif + /* Isolate lsbit of r */ r &= -r; @@ -80,6 +88,4 @@ unsigned long gcd(unsigned long a, unsigned long b) } } -#endif - EXPORT_SYMBOL_GPL(gcd); _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RFC PATCH] riscv: Optimize gcd() performance by selecting CPU_NO_EFFICIENT_FFS 2025-04-04 13:54 ` Kuan-Wei Chiu @ 2025-04-23 6:57 ` Alexandre Ghiti 0 siblings, 0 replies; 4+ messages in thread From: Alexandre Ghiti @ 2025-04-23 6:57 UTC (permalink / raw) To: Kuan-Wei Chiu Cc: paul.walmsley, palmer, aou, jserv, eleanor15x, linux-riscv, linux-kernel, Andrew Morton Hi Kuan-Wei, On 04/04/2025 15:54, Kuan-Wei Chiu wrote: > +Cc Andrew, since this might touch lib/math/gcd.c > > On Fri, Mar 28, 2025 at 03:07:36PM +0100, Alexandre Ghiti wrote: >> Hi Kuan-Wei, >> >> First sorry for the late review. >> >> On 17/02/2025 02:37, Kuan-Wei Chiu wrote: >>> When the Zbb extension is not supported, ffs() falls back to a software >>> implementation instead of leveraging the hardware ctz instruction for >>> fast computation. In such cases, selecting CPU_NO_EFFICIENT_FFS >>> optimizes the efficiency of gcd(). >>> >>> The implementation of gcd() depends on the CPU_NO_EFFICIENT_FFS option. >>> With hardware support for ffs, the binary GCD algorithm is used. >>> Without it, the odd-even GCD algorithm is employed for better >>> performance. >>> >>> Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> >>> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> >>> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> >>> --- >>> Although selecting NO_EFFICIENT_FFS seems reasonable without ctz >>> instructions, this patch hasn't been tested on real hardware. We'd >>> greatly appreciate it if someone could help test and provide >>> performance numbers! >>> >>> arch/riscv/Kconfig | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig >>> index 7612c52e9b1e..2dd3699ad09b 100644 >>> --- a/arch/riscv/Kconfig >>> +++ b/arch/riscv/Kconfig >>> @@ -91,6 +91,7 @@ config RISCV >>> select CLINT_TIMER if RISCV_M_MODE >>> select CLONE_BACKWARDS >>> select COMMON_CLK >>> + select CPU_NO_EFFICIENT_FFS if !RISCV_ISA_ZBB >>> select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND >>> select EDAC_SUPPORT >>> select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE) >> >> So your patch is correct. But a kernel built with RISCV_ISA_ZBB does not >> mean the platform supports zbb and in that case, we'd still use the slow >> version of gcd(). >> >> Then I would use static keys instead, can you try to come up with a patch >> that does that? >> > Here's my current thought: I'd like to add a static key named > efficient_ffs_key in gcd.c, and possibly call > static_branch_disable(&efficient_ffs_key) somewhere under arch/riscv/ > when RISCV_ISA_ZBB is enabled but the Zbb extension is not detected at > runtime. > > However, I'm new to the RISC-V kernel code and not sure where would be > the most appropriate place to insert that static_branch_disable() call. > Suggestions are very welcome! Sorry for the late answer, I missed your message. So we put all of those initializations that depend on the discovery of extensions at the end of setup_arch() (https://elixir.bootlin.com/linux/v6.14.3/source/arch/riscv/kernel/setup.c#L334). Thanks, Alex > > Below is the diff for context. > > Regards, > Kuan-Wei > > diff --git a/lib/math/gcd.c b/lib/math/gcd.c > index e3b042214d1b..514b8a86b461 100644 > --- a/lib/math/gcd.c > +++ b/lib/math/gcd.c > @@ -2,6 +2,7 @@ > #include <linux/kernel.h> > #include <linux/gcd.h> > #include <linux/export.h> > +#include <linux/jump_label.h> > > /* > * This implements the binary GCD algorithm. (Often attributed to Stein, > @@ -11,6 +12,8 @@ > * has decent hardware division. > */ > > +DEFINE_STATIC_KEY_TRUE(efficient_ffs_key); > + > #if !defined(CONFIG_CPU_NO_EFFICIENT_FFS) > > /* If __ffs is available, the even/odd algorithm benchmarks slower. */ > @@ -20,7 +23,7 @@ > * @a: first value > * @b: second value > */ > -unsigned long gcd(unsigned long a, unsigned long b) > +static unsigned long gcd_binary(unsigned long a, unsigned long b) > { > unsigned long r = a | b; > > @@ -44,7 +47,7 @@ unsigned long gcd(unsigned long a, unsigned long b) > } > } > > -#else > +#endif > > /* If normalization is done by loops, the even/odd algorithm is a win. */ > unsigned long gcd(unsigned long a, unsigned long b) > @@ -54,6 +57,11 @@ unsigned long gcd(unsigned long a, unsigned long b) > if (!a || !b) > return r; > > +#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS) > + if (static_branch_likely(&efficient_ffs_key)) > + return binary_gcd(a, b); > +#endif > + > /* Isolate lsbit of r */ > r &= -r; > > @@ -80,6 +88,4 @@ unsigned long gcd(unsigned long a, unsigned long b) > } > } > > -#endif > - > EXPORT_SYMBOL_GPL(gcd); _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-04-23 7:07 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-02-17 1:37 [RFC PATCH] riscv: Optimize gcd() performance by selecting CPU_NO_EFFICIENT_FFS Kuan-Wei Chiu 2025-03-28 14:07 ` Alexandre Ghiti 2025-04-04 13:54 ` Kuan-Wei Chiu 2025-04-23 6:57 ` Alexandre Ghiti
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox