* find_new_bit bloat from x86 tree...
@ 2008-04-27 5:07 David Miller
2008-04-27 20:41 ` Ingo Molnar
2008-04-28 11:41 ` Alexander van Heukelum
0 siblings, 2 replies; 3+ messages in thread
From: David Miller @ 2008-04-27 5:07 UTC (permalink / raw)
To: linux-kernel; +Cc: mingo, torvalds, akpm, viro
Ingo, what the heck is this?

commit 64970b68d2b3ed32b964b0b30b1b98518fde388e
Author: Alexander van Heukelum <heukelum@mailshack.com>
Date:   Tue Mar 11 16:17:19 2008 +0100

    x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps

Thanks for bloating up the inline expansion of this thing on every
architecture that doesn't do __ffs() in a simple sequence of a few
instructions like x86 does.
Now every call that matches your tests gets this turd inline:

static inline unsigned long __ffs(unsigned long word)
{
	int num = 0;

#if BITS_PER_LONG == 64
	if ((word & 0xffffffff) == 0) {
		num += 32;
		word >>= 32;
	}
#endif
	if ((word & 0xffff) == 0) {
		num += 16;
		word >>= 16;
	}
	if ((word & 0xff) == 0) {
		num += 8;
		word >>= 8;
	}
	if ((word & 0xf) == 0) {
		num += 4;
		word >>= 4;
	}
	if ((word & 0x3) == 0) {
		num += 2;
		word >>= 2;
	}
	if ((word & 0x1) == 0)
		num += 1;
	return num;
}

as well as all of that address formation, bit shifting, and masking.
Please revert or make this conditional on something architectures can
opt-in for.
The version actually applied was posted only on linux-kernel, instead
of also CC:'ing linux-arch as previous versions had been. Nobody
commented on this version other than you Ingo.
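
For readers following the thread, here is a minimal user-space sketch of the kind of inline fast path being objected to: a find-next-bit over a bitmap that fits in one word, built on the open-coded binary search quoted above. It is an illustration under stated assumptions, not the code from the commit. The names `sketch_ffs` and `sketch_find_next_bit_small`, the sentinel trick, and the precondition `offset <= size < bits-per-long` are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <limits.h>

/*
 * Open-coded find-first-set using the same binary search as the
 * __ffs() quoted in the message. Result is undefined for word == 0,
 * so the sketch asserts on it. (Hypothetical name.)
 */
static inline unsigned long sketch_ffs(unsigned long word)
{
	unsigned long num = 0;

	assert(word != 0);
#if ULONG_MAX > 0xffffffffUL
	if ((word & 0xffffffffUL) == 0) {
		num += 32;
		word >>= 32;
	}
#endif
	if ((word & 0xffff) == 0) {
		num += 16;
		word >>= 16;
	}
	if ((word & 0xff) == 0) {
		num += 8;
		word >>= 8;
	}
	if ((word & 0xf) == 0) {
		num += 4;
		word >>= 4;
	}
	if ((word & 0x3) == 0) {
		num += 2;
		word >>= 2;
	}
	if ((word & 0x1) == 0)
		num += 1;
	return num;
}

/*
 * Hypothetical one-word fast path: index of the first set bit at or
 * above 'offset', or 'size' if there is none.
 * Assumes offset <= size and size < number of bits in unsigned long.
 */
static inline unsigned long
sketch_find_next_bit_small(unsigned long bitmap, unsigned long size,
			   unsigned long offset)
{
	unsigned long value = bitmap & (~0UL << offset); /* drop bits below offset */

	value |= 1UL << size;	/* sentinel: guarantees a set bit at 'size' */
	return sketch_ffs(value);
}
```

On a machine with a single find-first-set instruction the whole helper collapses to a handful of instructions; with the open-coded search, every inlined call site carries the masking, the shift, and the full chain of tests, which is the size cost the message describes.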
* Re: find_new_bit bloat from x86 tree...
From: Ingo Molnar @ 2008-04-27 20:41 UTC (permalink / raw)
To: David Miller; +Cc: linux-kernel, torvalds, akpm, viro, Alexander van Heukelum
* David Miller <davem@davemloft.net> wrote:
> Ingo, what the heck is this?
>
> commit 64970b68d2b3ed32b964b0b30b1b98518fde388e
> Author: Alexander van Heukelum <heukelum@mailshack.com>
(Alexander Cc:-ed.)
> Date: Tue Mar 11 16:17:19 2008 +0100
>
> x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
>
> Thanks for bloating up the inline expansion of this thing on every
> architecture that doesn't do __ffs() in a simple sequence of a few
> instructions like x86 does.
OK, that's bad. What are the before/after vmlinux sizes, btw? I guess you
measured it. We could either make it selectable per arch, or we could
just uninline it all. It shouldn't be a big deal on x86 and it should
help on RISC platforms.
Ingo
* Re: find_new_bit bloat from x86 tree...
From: Alexander van Heukelum @ 2008-04-28 11:41 UTC (permalink / raw)
To: David Miller; +Cc: linux-kernel, mingo, torvalds, akpm, viro, heukelum
On Sat, Apr 26, 2008 at 10:07:26PM -0700, David Miller wrote:
> Ingo, what the heck is this?
>
> commit 64970b68d2b3ed32b964b0b30b1b98518fde388e
> Author: Alexander van Heukelum <heukelum@mailshack.com>
> Date: Tue Mar 11 16:17:19 2008 +0100
>
> x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
>
> Thanks for bloating up the inline expansion of this thing on every
> architecture that doesn't do __ffs() in a simple sequence of a few
> instructions like x86 does.
>
> Now every call that matches your tests gets this turd inline:
>
> static inline unsigned long __ffs(unsigned long word)
> {
> 	int num = 0;
>
> #if BITS_PER_LONG == 64
> 	if ((word & 0xffffffff) == 0) {
> 		num += 32;
> 		word >>= 32;
> 	}
> #endif
> 	if ((word & 0xffff) == 0) {
> 		num += 16;
> 		word >>= 16;
> 	}
> 	if ((word & 0xff) == 0) {
> 		num += 8;
> 		word >>= 8;
> 	}
> 	if ((word & 0xf) == 0) {
> 		num += 4;
> 		word >>= 4;
> 	}
> 	if ((word & 0x3) == 0) {
> 		num += 2;
> 		word >>= 2;
> 	}
> 	if ((word & 0x1) == 0)
> 		num += 1;
> 	return num;
> }
>
> as well as all of that address formation, bit shifting, and masking.
>
> Please revert or make this conditional on something architectures can
> opt-in for.
Alternatively, implement __ffs out of line? Like:

static inline unsigned long __ffs(unsigned long word)
{
	return generic___ffs(word);
}

And a generic___ffs implemented in lib/ffs.c?
If __ffs is too big to be inlined it should not be inlined. That
is a generic problem and has nothing to do with this patch, IMHO.
Greetings,
Alexander
> The version actually applied was posted only on linux-kernel, instead
> of also CC:'ing linux-arch as previous versions had been. Nobody
> commented on this version other than you Ingo.
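
A minimal sketch of the out-of-line arrangement proposed in this message, written as a single user-space file so it can be compiled on its own. Only the name `generic___ffs` and the idea of placing it in lib/ffs.c come from the thread. The wrapper name `wrapper_ffs`, the loop form of the search, and the assert on a zero argument are assumptions made for this illustration.

```c
#include <assert.h>
#include <limits.h>

/* Declaration as it might appear in a header (assumed layout). */
unsigned long generic___ffs(unsigned long word);

/*
 * Out-of-line body, as it might live in lib/ffs.c. This is the same
 * binary search as the quoted __ffs(), expressed as a loop that halves
 * the window each step (32, 16, 8, 4, 2, 1 on a 64-bit long).
 * Result is undefined for word == 0, so the sketch asserts on it.
 */
unsigned long generic___ffs(unsigned long word)
{
	unsigned long num = 0;
	unsigned int shift;

	assert(word != 0);
	for (shift = sizeof(word) * CHAR_BIT / 2; shift; shift >>= 1) {
		unsigned long mask = (1UL << shift) - 1;

		if ((word & mask) == 0) {
			num += shift;
			word >>= shift;
		}
	}
	return num;
}

/*
 * Thin inline wrapper (hypothetical name; the thread uses __ffs).
 * Each call site now costs one function call instead of a full
 * inline copy of the search.
 */
static inline unsigned long wrapper_ffs(unsigned long word)
{
	return generic___ffs(word);
}
```

The trade-off is the usual one: call sites shrink to a single call, at the price of call overhead on architectures where the inline version would have been only a few instructions.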