* [PATCH 1/1] GCD: add binary GCD algorithm
@ 2014-08-15 12:49 Zhaoxiu Zeng
2014-08-15 13:58 ` Peter Zijlstra
2014-08-22 21:31 ` Andrew Morton
0 siblings, 2 replies; 8+ messages in thread
From: Zhaoxiu Zeng @ 2014-08-15 12:49 UTC (permalink / raw)
To: Andrew Morton, Ingo Molnar, George Spelvin, David Howells,
Peter Zijlstra, AKASHI Takahiro, Josh Boyer
Cc: linux-kernel, Zhaoxiu Zeng
Because some architectures (alpha, armv6, etc.) don't provide hardware division,
the mod operation is slow! Binary GCD algorithm uses simple arithmetic operations,
it replaces division with arithmetic shifts, comparisons, and subtraction.
Signed-off-by: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
---
lib/Kconfig | 15 +++++++++++++++
lib/gcd.c | 31 ++++++++++++++++++++++++++++++-
2 files changed, 45 insertions(+), 1 deletion(-)
diff --git a/lib/Kconfig b/lib/Kconfig
index a5ce0c7..80e8e54 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -177,6 +177,21 @@ config CRC8
when they need to do cyclic redundancy check according CRC8
algorithm. Module will be called crc8.
+#
+# GCD
+#
+choice
+ prompt "GCD implementation"
+ default GCD_ALGO_EUCLIDEAN
+
+config GCD_ALGO_EUCLIDEAN
+ bool "Euclidean algorithm"
+
+config GCD_ALGO_BINARY
+ bool "Binary GCD algorithm (Stein's algorithm)"
+
+endchoice
+
config AUDIT_GENERIC
bool
depends on AUDIT && !AUDIT_ARCH
diff --git a/lib/gcd.c b/lib/gcd.c
index 3657f12..911ec7f 100644
--- a/lib/gcd.c
+++ b/lib/gcd.c
@@ -6,7 +6,7 @@
unsigned long gcd(unsigned long a, unsigned long b)
{
unsigned long r;
-
+#ifdef CONFIG_GCD_ALGO_EUCLIDEAN
if (a < b)
swap(a, b);
@@ -16,6 +16,35 @@ unsigned long gcd(unsigned long a, unsigned long b)
a = b;
b = r;
}
+#else
+ r = a | b;
+
+ if (!a || !b)
+ return r;
+
+ r = r ^ (r - 1);
+ if (!(a & r))
+ goto even_odd; /* a/c even, b/c odd */
+ if (!(b & r))
+ goto odd_even; /* a/c odd, b/c even */
+
+ /* a/c and b/c both odd */
+ while (a != b) {
+ if (a > b) {
+ a -= b;
+even_odd:
+ do
+ a >>= 1;
+ while (!(a & r));
+ } else {
+ b -= a;
+odd_even:
+ do
+ b >>= 1;
+ while (!(b & r));
+ }
+ }
+#endif
return b;
}
EXPORT_SYMBOL_GPL(gcd);
--
1.9.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] GCD: add binary GCD algorithm
2014-08-15 12:49 [PATCH 1/1] GCD: add binary GCD algorithm Zhaoxiu Zeng
@ 2014-08-15 13:58 ` Peter Zijlstra
2014-08-15 16:41 ` zhaoxiu.zeng
2014-08-15 20:16 ` George Spelvin
2014-08-22 21:31 ` Andrew Morton
1 sibling, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2014-08-15 13:58 UTC (permalink / raw)
To: Zhaoxiu Zeng
Cc: Andrew Morton, Ingo Molnar, George Spelvin, David Howells,
AKASHI Takahiro, Josh Boyer, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1900 bytes --]
On Fri, Aug 15, 2014 at 08:49:16PM +0800, Zhaoxiu Zeng wrote:
> Because some architectures (alpha, armv6, etc.) don't provide hardware division,
> the mod operation is slow! Binary GCD algorithm uses simple arithmetic operations,
> it replaces division with arithmetic shifts, comparisons, and subtraction.
>
> Signed-off-by: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
> ---
> lib/Kconfig | 15 +++++++++++++++
> lib/gcd.c | 31 ++++++++++++++++++++++++++++++-
> 2 files changed, 45 insertions(+), 1 deletion(-)
>
> diff --git a/lib/Kconfig b/lib/Kconfig
> index a5ce0c7..80e8e54 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -177,6 +177,21 @@ config CRC8
> when they need to do cyclic redundancy check according CRC8
> algorithm. Module will be called crc8.
>
> +#
> +# GCD
> +#
> +choice
> + prompt "GCD implementation"
> + default GCD_ALGO_EUCLIDEAN
> +
> +config GCD_ALGO_EUCLIDEAN
> + bool "Euclidean algorithm"
> +
> +config GCD_ALGO_BINARY
> + bool "Binary GCD algorithm (Stein's algorithm)"
> +
> +endchoice
Does this result in the user being asked which GCD algo he wants? If so,
I think that's bad.
> +#else
> + r = a | b;
> +
> + if (!a || !b)
> + return r;
> +
> + r = r ^ (r - 1);
> + if (!(a & r))
> + goto even_odd; /* a/c even, b/c odd */
> + if (!(b & r))
> + goto odd_even; /* a/c odd, b/c even */
> +
> + /* a/c and b/c both odd */
> + while (a != b) {
> + if (a > b) {
> + a -= b;
> +even_odd:
> + do
> + a >>= 1;
> + while (!(a & r));
> + } else {
> + b -= a;
> +odd_even:
> + do
> + b >>= 1;
> + while (!(b & r));
> + }
> + }
> +#endif
> return b;
> }
So I looked at wikipedia (I wasn't aware of this algorithm, clever
though), and am still somewhat puzzled by your 'r'.
What's 'wrong' with their iterative version, or what's better about your
'r' stuff?
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] GCD: add binary GCD algorithm
2014-08-15 13:58 ` Peter Zijlstra
@ 2014-08-15 16:41 ` zhaoxiu.zeng
2014-08-15 21:53 ` George Spelvin
2014-08-15 20:16 ` George Spelvin
1 sibling, 1 reply; 8+ messages in thread
From: zhaoxiu.zeng @ 2014-08-15 16:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrew Morton, Ingo Molnar, George Spelvin, David Howells,
AKASHI Takahiro, Josh Boyer, linux-kernel
在 2014/8/15 21:58, Peter Zijlstra 写道:
> On Fri, Aug 15, 2014 at 08:49:16PM +0800, Zhaoxiu Zeng wrote:
>> Because some architectures (alpha, armv6, etc.) don't provide hardware division,
>> the mod operation is slow! Binary GCD algorithm uses simple arithmetic operations,
>> it replaces division with arithmetic shifts, comparisons, and subtraction.
>>
>> Signed-off-by: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
>> ---
>> lib/Kconfig | 15 +++++++++++++++
>> lib/gcd.c | 31 ++++++++++++++++++++++++++++++-
>> 2 files changed, 45 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/Kconfig b/lib/Kconfig
>> index a5ce0c7..80e8e54 100644
>> --- a/lib/Kconfig
>> +++ b/lib/Kconfig
>> @@ -177,6 +177,21 @@ config CRC8
>> when they need to do cyclic redundancy check according CRC8
>> algorithm. Module will be called crc8.
>>
>> +#
>> +# GCD
>> +#
>> +choice
>> + prompt "GCD implementation"
>> + default GCD_ALGO_EUCLIDEAN
>> +
>> +config GCD_ALGO_EUCLIDEAN
>> + bool "Euclidean algorithm"
>> +
>> +config GCD_ALGO_BINARY
>> + bool "Binary GCD algorithm (Stein's algorithm)"
>> +
>> +endchoice
>
> Does this result in the user being asked which GCD algo he wants? If so,
> I think that's bad.
>
For the architecture which doesn't provide hardware division, "GCD_ALGO_BINARY"
can be selected in arch/???/Kconfig.
>
>> +#else
>> + r = a | b;
>> +
>> + if (!a || !b)
>> + return r;
>> +
>> + r = r ^ (r - 1);
>> + if (!(a & r))
>> + goto even_odd; /* a/c even, b/c odd */
>> + if (!(b & r))
>> + goto odd_even; /* a/c odd, b/c even */
>> +
>> + /* a/c and b/c both odd */
>> + while (a != b) {
>> + if (a > b) {
>> + a -= b;
>> +even_odd:
>> + do
>> + a >>= 1;
>> + while (!(a & r));
>> + } else {
>> + b -= a;
>> +odd_even:
>> + do
>> + b >>= 1;
>> + while (!(b & r));
>> + }
>> + }
>> +#endif
>> return b;
>> }
>
> So I looked at wikipedia (I wasn't aware of this algorithm, clever
> though), and am still somewhat puzzled by your 'r'.
>
> What's 'wrong' with their iterative version, or what's better about your
> 'r' stuff?
>
If use "r = (r & -r)" to replace "r = r ^ (r - 1)", the function works fine too.
"r = (r & -r)" get the lsb of (a | b), it's the largest power of 2 that divides
a and b. Using r not 1, can avoid pre-shift right and post-shift left.
The result of "r = r ^ (r - 1)" equal to "lsb | (lsb - 1)".
Using "r = r ^ (r - 1)" because it requires fewer instructions than "r = (r & -r)"
on some architectures.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] GCD: add binary GCD algorithm
2014-08-15 13:58 ` Peter Zijlstra
2014-08-15 16:41 ` zhaoxiu.zeng
@ 2014-08-15 20:16 ` George Spelvin
2014-08-15 21:32 ` Peter Zijlstra
1 sibling, 1 reply; 8+ messages in thread
From: George Spelvin @ 2014-08-15 20:16 UTC (permalink / raw)
To: peterz, zhaoxiu.zeng
Cc: akpm, dhowells, jwboyer, linux-kernel, linux, mingo,
takahiro.akashi
> So I looked at wikipedia (I wasn't aware of this algorithm, clever
> though), and am still somewhat puzzled by your 'r'.
>
> What's 'wrong' with their iterative version, or what's better about your
> 'r' stuff?
I need to look more carefully, but it looks nifty.
Basically, it's avoiding the usual issue of counting the lsbits to factor
out a power of 2.
Rather than taking out the common factors of 2 a bit at a time, to
reduce to the case where one is even and one is odd (i.e. ((a ^ b) & 1)
== 1), just find the low-order differing bit and use that instead of
"1" in the algorithm. I.e. ((1 ^ b) & r) == r.
I have to see if there's a nicer way to arrange things, but it looks good.
What I'd like is a better way to automatically configure "divide is
slow" from an architecture.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] GCD: add binary GCD algorithm
2014-08-15 20:16 ` George Spelvin
@ 2014-08-15 21:32 ` Peter Zijlstra
0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2014-08-15 21:32 UTC (permalink / raw)
To: George Spelvin
Cc: zhaoxiu.zeng, akpm, dhowells, jwboyer, linux-kernel, mingo,
takahiro.akashi
[-- Attachment #1: Type: text/plain, Size: 363 bytes --]
On Fri, Aug 15, 2014 at 04:16:01PM -0400, George Spelvin wrote:
> What I'd like is a better way to automatically configure "divide is
> slow" from an architecture.
So the usual thing and have arch/*/Kconfig select the fancy one if it
doesn't have a hardware divide instruction.
For instance:
arch/arm/Kconfig:
config ARM
...
select GCD_FANCY_ALGO if CPU_v6
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] GCD: add binary GCD algorithm
2014-08-15 16:41 ` zhaoxiu.zeng
@ 2014-08-15 21:53 ` George Spelvin
0 siblings, 0 replies; 8+ messages in thread
From: George Spelvin @ 2014-08-15 21:53 UTC (permalink / raw)
To: peterz, zhaoxiu.zeng
Cc: akpm, dhowells, jwboyer, linux-kernel, linux, mingo,
takahiro.akashi
Here's a variant using the even-odd speedup:
(Feel free to add my S-o-b if you use it.)
/*
* This implements Brent & Kung's "even-odd" variant of the binary GCD
* algorithm. (Often attributed to Stein, but as Knith has noted, appears
* a first-century Chinese math text.)
*
* First, find the common powers of 2 from a and b. Here, they are not
* divided by that common power, but rather we form a bitmap r which
* covers the least-significant bit set in either.
* Let s = (r+1)/2 be the common factor of 2.
*
* Then, shift down both a and b until both are odd multiples of s.
* Assuming a > b, then gcd(a, b) = gcd(a-b, b). But a-b is an even
* multiple of s and we can divide it by two at least once.
*
* Further, either a-b or a+b is a multiple of 4s, so by choosing between
* the two we can divde by 4. This is done by (in the case that a > b):
* 1. Compute (a - b) / 2
* 2. If this is an odd multiple of s (AND with r is non-zero), then
* add b to make an even multiiple of s. (This cannot overflow,
* as it equals (a + b) / 2, which must be less than max(a, b).)
* 3. Divide by 2 one or more more times until we are left with an odd
* multiple of r.
* 4. The result replaces a, and we go back to finding the larger.
*
* While the algorothm is equally correct without step2 and the first
* divide by 2, the benefit of adding it it can be evaluated with a CMOV
* rather than a full conditional branch.
*/
unsigned long bgcd2(unsigned long a, unsigned long b)
{
unsigned long r = a | b; /* To find common powers of 2 */
if (!a || !b)
return r;
r ^= r-1; /* Only the msbit of r (r/2 + 1) really matters */
if (!(a & r))
goto a_even;
if (!(b & r))
goto b_even;
while (a != b) {
/* (a & r) == (b & r) == (r >> 1) != 0 */
if (a > b) {
a = (a - b) >> 1;
if (a & r)
a += b;
a_even: do
a >>= 1;
while (!(a & r));
} else {
b = (b - a) >> 1;
if (b & r)
b += a;
b_even: do
b >>= 1;
while (!(b & r));
}
}
return b;
}
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] GCD: add binary GCD algorithm
2014-08-15 12:49 [PATCH 1/1] GCD: add binary GCD algorithm Zhaoxiu Zeng
2014-08-15 13:58 ` Peter Zijlstra
@ 2014-08-22 21:31 ` Andrew Morton
2014-08-22 22:36 ` George Spelvin
1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2014-08-22 21:31 UTC (permalink / raw)
To: Zhaoxiu Zeng
Cc: Ingo Molnar, George Spelvin, David Howells, Peter Zijlstra,
AKASHI Takahiro, Josh Boyer, linux-kernel
On Fri, 15 Aug 2014 20:49:16 +0800 Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com> wrote:
> Because some architectures (alpha, armv6, etc.) don't provide hardware division,
> the mod operation is slow! Binary GCD algorithm uses simple arithmetic operations,
> it replaces division with arithmetic shifts, comparisons, and subtraction.
I had a look around and it seems that most (all?) gcd() and lcd()
callers are on initialization-time slowpaths.
Do you know of a workload which will significantly benefit from this
change? If so, by how much?
Otherwise I don't think we can justify the additional maintenance
cost/risk, sorry.
And if we *do* decide to proceed with this patch, we should include a
patch which enables it on as many architectures as possible, so it gets
runtime tested.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] GCD: add binary GCD algorithm
2014-08-22 21:31 ` Andrew Morton
@ 2014-08-22 22:36 ` George Spelvin
0 siblings, 0 replies; 8+ messages in thread
From: George Spelvin @ 2014-08-22 22:36 UTC (permalink / raw)
To: akpm, zhaoxiu.zeng
Cc: dhowells, jwboyer, linux-kernel, linux, mingo, peterz,
takahiro.akashi
> Otherwise I don't think we can justify the additional maintenance
> cost/risk, sorry.
This is am extremely self-contained and easy to test piece of code.
Look at the history; the total edits to the function since the
beginning of git in 2005 are:
- A theoretical bug fix to handle zero arguments better
- A global header file cleanup that hit lib/gcd.c
- Relocation to lib/ from sound/core/pcm_timer.c
And... nothing else.
This is the sort of leaf function that, once it works, gets left alone
forever.
Your point about "it's not a bottleneck, so why optimize it?" is valid,
but the maintenance issue is something of a red herring.
The maintenance hassle I *do* worry about the Kconfig magic to decide
whether to enable it. But that's something I'd defer to the ARM
maintainers on.
> And if we *do* decide to proceed with this patch, we should include a
> patch which enables it on as many architectures as possible, so it gets
> runtime tested.
It's pretty harmless even on machines which *do* have fast division,
so we could just enable it everywhere.
GMP, whose benchmarking I'm inclined to trust, uses binary GCD for all
small (3-word or less) values.
https://gmplib.org/manual/Binary-GCD.html
The expected nymber of iterations to perform the Euclidean algorithm
for two random numbers 1 <= x,y <= n is 1.216 log2(n) + 0.06.
Binary GCD is quite similar. Its main advantage is the large gain
in the first step when the inputs are of very different size.
Binary GCD's perfoemance problems come from unpredictable branches.
Euclid's algorithm has less of those.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-08-22 22:36 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-15 12:49 [PATCH 1/1] GCD: add binary GCD algorithm Zhaoxiu Zeng
2014-08-15 13:58 ` Peter Zijlstra
2014-08-15 16:41 ` zhaoxiu.zeng
2014-08-15 21:53 ` George Spelvin
2014-08-15 20:16 ` George Spelvin
2014-08-15 21:32 ` Peter Zijlstra
2014-08-22 21:31 ` Andrew Morton
2014-08-22 22:36 ` George Spelvin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox