From: "George Spelvin" <linux@horizon.com>
To: dalias@libc.org, geert@linux-m68k.org
Cc: akpm@linux-foundation.org, davem@davemloft.net, deller@gmx.de,
ink@jurassic.park.msu.ru, james.hogan@imgtec.com,
jejb@parisc-linux.org, jonas@southpole.se, lennox.wu@gmail.com,
lftan@altera.com, linux-alpha@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-m68k@lists.linux-m68k.org,
linux-metag@vger.kernel.org, linux-mips@linux-mips.org,
linux-parisc@vger.kernel.org, linux-sh@vger.kernel.org,
linux@arm.linux.org.uk, linux@horizon.com,
linux@lists.openrisc.net, liqin.linux@gmail.com,
mattst88@gmail.com, monstr@monstr.eu,
nios2-dev@lists.rocketboards.org, peterz@infradead.org,
ralf@linux-mips.org, rth@twiddle.net, sparclinux@vger.kernel.org,
uclinux-h8-devel@lists.sourceforge.jp,
ysato@users.sourceforge.jp, zengzhaoxiu@163.com,
zhaoxiu.zeng@gmail.com
Subject: Re: [patch V3] lib: GCD: add binary GCD algorithm
Date: 28 Apr 2016 15:15:51 -0400 [thread overview]
Message-ID: <20160428191551.17438.qmail@ns.horizon.com> (raw)
In-Reply-To: <20160428175843.GZ21636@brightrain.aerifal.cx>
> How does a CPU lack an efficient ffs/ctz anyway? There are all sorts
> of ways to implement it without a native insn, some of which are
> almost or just as fast as the native insn on cpus that have the
> latter. On anything with a fast multiply, the de Bruijn sequence
> approach is near-optimal, and otherwise one of the binary-search type
> approaches (possibly branchless) can be used. If the compiler doesn't
> generate an appropriate one for __builtin_ctz, that's arguably a
> compiler bug.
What's wanted here is something faster than any of those.
Yes, there's a simple constant-time branch-free implementation:
unsigned inline __attribute__((const))
hweight32(uint32_t x)
{
x -= (x >> 1) & 0x55555555;
x = ((x >> 2) & 0x33333333) + (x & 0x33333333);
x += x >> 4;
x &= 0x0f0f0f0f;
x += x >> 8;
x += x >> 16;
return x & 63;
}
unsigned inline __attribute__((const))
__ffs32(uint32_t x)
{
return hweight(~x & (x-1));
}
but if you work it through, that's about 19 instructions; a few more on
platforms without 32-bit immediates. The shift itself makes an even 20,
and there are a lot of sequential dependencies (I count a 17-op chain
including the shift) limiting execution time.
The de Bruijn hack reduces the length but adds a memory access for
the table lookup. (http://supertech.csail.mit.edu/papers/debruijn.pdf)
In the GCD code, the number to normalize is basically random, so the
normalization loop shifts an average of 1 bit. One bit half the time,
a second bit 1/4 of the time, etc.
(The posted code in the FAST_FFS case omits one guaranteed shift at the
end of the loop because the normalization code is constant-time.)
So "fast __ffs" basically means faster than *one* iteration of
"while (!(x & 1)) x >>= 1;".
In this case "fast" means cheaper than *one* unpredictable branch, which
is a very small handful of instructions.
next prev parent reply other threads:[~2016-04-28 19:15 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-28 11:43 [patch V3] lib: GCD: add binary GCD algorithm zengzhaoxiu
2016-04-28 12:18 ` kbuild test robot
2016-04-28 16:48 ` George Spelvin
2016-04-28 17:51 ` Geert Uytterhoeven
2016-04-28 17:58 ` Rich Felker
2016-04-28 18:11 ` Geert Uytterhoeven
2016-04-28 19:15 ` George Spelvin [this message]
[not found] ` <20160428164856.10120.qmail-HzZAx2gCgqrSUeElwK9/Pw@public.gmane.org>
2016-04-28 19:54 ` Sam Ravnborg
[not found] ` <1461843824-19853-1-git-send-email-zengzhaoxiu-9Onoh4P/yGk@public.gmane.org>
2016-04-28 17:10 ` Josh Juran
2016-05-20 10:27 ` kbuild test robot
2016-04-28 17:22 ` James Bottomley
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160428191551.17438.qmail@ns.horizon.com \
--to=linux@horizon.com \
--cc=akpm@linux-foundation.org \
--cc=dalias@libc.org \
--cc=davem@davemloft.net \
--cc=deller@gmx.de \
--cc=geert@linux-m68k.org \
--cc=ink@jurassic.park.msu.ru \
--cc=james.hogan@imgtec.com \
--cc=jejb@parisc-linux.org \
--cc=jonas@southpole.se \
--cc=lennox.wu@gmail.com \
--cc=lftan@altera.com \
--cc=linux-alpha@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-m68k@lists.linux-m68k.org \
--cc=linux-metag@vger.kernel.org \
--cc=linux-mips@linux-mips.org \
--cc=linux-parisc@vger.kernel.org \
--cc=linux-sh@vger.kernel.org \
--cc=linux@arm.linux.org.uk \
--cc=linux@lists.openrisc.net \
--cc=liqin.linux@gmail.com \
--cc=mattst88@gmail.com \
--cc=monstr@monstr.eu \
--cc=nios2-dev@lists.rocketboards.org \
--cc=peterz@infradead.org \
--cc=ralf@linux-mips.org \
--cc=rth@twiddle.net \
--cc=sparclinux@vger.kernel.org \
--cc=uclinux-h8-devel@lists.sourceforge.jp \
--cc=ysato@users.sourceforge.jp \
--cc=zengzhaoxiu@163.com \
--cc=zhaoxiu.zeng@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox