public inbox for linux-kernel@vger.kernel.org
* find_new_bit bloat from x86 tree...
@ 2008-04-27  5:07 David Miller
  2008-04-27 20:41 ` Ingo Molnar
  2008-04-28 11:41 ` Alexander van Heukelum
  0 siblings, 2 replies; 3+ messages in thread
From: David Miller @ 2008-04-27  5:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, torvalds, akpm, viro


Ingo, what the heck is this?

commit 64970b68d2b3ed32b964b0b30b1b98518fde388e
Author: Alexander van Heukelum <heukelum@mailshack.com>
Date:   Tue Mar 11 16:17:19 2008 +0100

    x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps

Thanks for bloating up the inline expansion of this thing on every
architecture that doesn't do __ffs() in a simple sequence of a few
instructions like x86 does.

Now every call that matches your tests gets this turd inline:

static inline unsigned long __ffs(unsigned long word)
{
	int num = 0;

#if BITS_PER_LONG == 64
	if ((word & 0xffffffff) == 0) {
		num += 32;
		word >>= 32;
	}
#endif
	if ((word & 0xffff) == 0) {
		num += 16;
		word >>= 16;
	}
	if ((word & 0xff) == 0) {
		num += 8;
		word >>= 8;
	}
	if ((word & 0xf) == 0) {
		num += 4;
		word >>= 4;
	}
	if ((word & 0x3) == 0) {
		num += 2;
		word >>= 2;
	}
	if ((word & 0x1) == 0)
		num += 1;
	return num;
}

as well as all of that address formation, bit shifting, and masking.
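
For readers without the commit at hand, here is a rough sketch of the kind
of inline fast path being objected to. It is not the text of the commit:
the names find_next_bit_sketch, find_next_bit_slow, ffs_sketch and
BITS_PER_LONG_SKETCH are made up for illustration, and the sentinel trick
is an assumption about how such a fast path would typically be written.

```c
#include <limits.h>

/* Hypothetical user-space stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG_SKETCH ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Stand-in for __ffs(): index of the lowest set bit, undefined for 0.
 * Written as a loop for brevity; the generic kernel version is the
 * branchy binary search quoted above. */
static inline unsigned long ffs_sketch(unsigned long word)
{
	unsigned long num = 0;

	while ((word & 1UL) == 0) {
		word >>= 1;
		num++;
	}
	return num;
}

/* Stand-in for the out-of-line slow path: scan bit by bit. */
unsigned long find_next_bit_slow(const unsigned long *addr,
				 unsigned long size, unsigned long offset)
{
	while (offset < size) {
		if (addr[offset / BITS_PER_LONG_SKETCH] &
		    (1UL << (offset % BITS_PER_LONG_SKETCH)))
			return offset;
		offset++;
	}
	return size;
}

/* Inline wrapper: for a small compile-time-constant size, do the work
 * inline (masking, sentinel bit, ffs); otherwise call out of line.
 * Assumes offset <= size. */
static inline unsigned long find_next_bit_sketch(const unsigned long *addr,
						 unsigned long size,
						 unsigned long offset)
{
	if (__builtin_constant_p(size) && size < BITS_PER_LONG_SKETCH) {
		unsigned long value = *addr & (~0UL << offset); /* drop bits below offset */

		value |= 1UL << size;	/* sentinel: result is size if nothing is set */
		return ffs_sketch(value);
	}
	return find_next_bit_slow(addr, size, offset);
}
```

Every call site that takes the constant-size branch carries the mask, the
sentinel OR, and the whole body of the ffs helper, which is where the size
cost comes from when that helper is not a single instruction.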

Please revert or make this conditional on something architectures can
opt-in for.

The version actually applied was posted only on linux-kernel, instead
of also CC:'ing linux-arch as previous versions had been.  Nobody
commented on this version other than you Ingo.


* Re: find_new_bit bloat from x86 tree...
  2008-04-27  5:07 find_new_bit bloat from x86 tree David Miller
@ 2008-04-27 20:41 ` Ingo Molnar
  2008-04-28 11:41 ` Alexander van Heukelum
  1 sibling, 0 replies; 3+ messages in thread
From: Ingo Molnar @ 2008-04-27 20:41 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, torvalds, akpm, viro, Alexander van Heukelum


* David Miller <davem@davemloft.net> wrote:

> Ingo, what the heck is this?
> 
> commit 64970b68d2b3ed32b964b0b30b1b98518fde388e
> Author: Alexander van Heukelum <heukelum@mailshack.com>

(Alexander Cc:-ed.)

> Date:   Tue Mar 11 16:17:19 2008 +0100
> 
>     x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
> 
> Thanks for bloating up the inline expansion of this thing on every 
> architecture that doesn't do __ffs() in a simple sequence of a few 
> instructions like x86 does.

OK, that's bad. How big is the before/after vmlinux difference, btw? I
guess you measured it. We could either make it selectable per
architecture, or we could just uninline it all. It shouldn't be a big
deal on x86, and it should help on RISC platforms.
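
A minimal sketch of the per-arch selectable variant, assuming a made-up
switch named ARCH_INLINE_FIND_NEXT_BIT (no such symbol is proposed in this
thread) and hypothetical function names. It uses the GCC/Clang builtin
__builtin_ctzl in place of __ffs so that it builds in user space:

```c
#include <limits.h>

/* Hypothetical opt-in: an architecture with a cheap __ffs() would
 * define this; everyone else gets the plain out-of-line call. */
/* #define ARCH_INLINE_FIND_NEXT_BIT 1 */

#define BITS_PER_LONG_SKETCH ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Stand-in for the out-of-line implementation: scan bit by bit. */
unsigned long find_next_bit_outofline(const unsigned long *addr,
				      unsigned long size,
				      unsigned long offset)
{
	while (offset < size) {
		if (addr[offset / BITS_PER_LONG_SKETCH] &
		    (1UL << (offset % BITS_PER_LONG_SKETCH)))
			return offset;
		offset++;
	}
	return size;
}

#ifdef ARCH_INLINE_FIND_NEXT_BIT
/* Opted in: inline fast path for small constant sizes.
 * Assumes offset <= size. */
static inline unsigned long find_next_bit_optin(const unsigned long *addr,
						unsigned long size,
						unsigned long offset)
{
	if (__builtin_constant_p(size) && size < BITS_PER_LONG_SKETCH) {
		unsigned long value = (*addr & (~0UL << offset)) | (1UL << size);

		return (unsigned long)__builtin_ctzl(value);
	}
	return find_next_bit_outofline(addr, size, offset);
}
#else
/* Not opted in: always a plain function call, nothing expanded inline. */
static inline unsigned long find_next_bit_optin(const unsigned long *addr,
						unsigned long size,
						unsigned long offset)
{
	return find_next_bit_outofline(addr, size, offset);
}
#endif
```

Either branch returns the same result; the switch only decides whether the
call site pays for the inline expansion.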

	Ingo


* Re: find_new_bit bloat from x86 tree...
  2008-04-27  5:07 find_new_bit bloat from x86 tree David Miller
  2008-04-27 20:41 ` Ingo Molnar
@ 2008-04-28 11:41 ` Alexander van Heukelum
  1 sibling, 0 replies; 3+ messages in thread
From: Alexander van Heukelum @ 2008-04-28 11:41 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, mingo, torvalds, akpm, viro, heukelum

On Sat, Apr 26, 2008 at 10:07:26PM -0700, David Miller wrote:
> Ingo, what the heck is this?
> 
> commit 64970b68d2b3ed32b964b0b30b1b98518fde388e
> Author: Alexander van Heukelum <heukelum@mailshack.com>
> Date:   Tue Mar 11 16:17:19 2008 +0100
> 
>     x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
> 
> Thanks for bloating up the inline expansion of this thing on every
> architecture that doesn't do __ffs() in a simple sequence of a few
> instructions like x86 does.
> 
> Now every call that matches your tests gets this turd inline:
> 
> static inline unsigned long __ffs(unsigned long word)
> {
> 	int num = 0;
> 
> #if BITS_PER_LONG == 64
> 	if ((word & 0xffffffff) == 0) {
> 		num += 32;
> 		word >>= 32;
> 	}
> #endif
> 	if ((word & 0xffff) == 0) {
> 		num += 16;
> 		word >>= 16;
> 	}
> 	if ((word & 0xff) == 0) {
> 		num += 8;
> 		word >>= 8;
> 	}
> 	if ((word & 0xf) == 0) {
> 		num += 4;
> 		word >>= 4;
> 	}
> 	if ((word & 0x3) == 0) {
> 		num += 2;
> 		word >>= 2;
> 	}
> 	if ((word & 0x1) == 0)
> 		num += 1;
> 	return num;
> }
> 
> as well as all of that address formation, bit shifting, and masking.
> 
> Please revert or make this conditional on something architectures can
> opt-in for.

Alternatively, implement __ffs out of line? Like:

static inline unsigned long __ffs(unsigned long word)
{
	return generic___ffs(word);
}

And a generic___ffs implemented in lib/ffs.c?

If __ffs is too big to be inlined it should not be inlined. That
is a generic problem and has nothing to do with this patch, IMHO.
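
One way to read this suggestion, as a sketch only: the body quoted above
moves into a single out-of-line function, and the header keeps a one-line
wrapper. The file name lib/ffs.c is taken from the text above; the wrapper
is called ffs_via_generic here (a made-up name) to avoid defining a
reserved identifier in user space, and the ULONG_MAX test stands in for
BITS_PER_LONG == 64.

```c
#include <limits.h>

/* What might live in lib/ffs.c: one shared copy of the binary search.
 * Returns the index of the lowest set bit; undefined for word == 0. */
unsigned long generic___ffs(unsigned long word)
{
	unsigned long num = 0;

#if ULONG_MAX > 0xffffffffUL	/* stand-in for BITS_PER_LONG == 64 */
	if ((word & 0xffffffffUL) == 0) {
		num += 32;
		word >>= 32;
	}
#endif
	if ((word & 0xffff) == 0) {
		num += 16;
		word >>= 16;
	}
	if ((word & 0xff) == 0) {
		num += 8;
		word >>= 8;
	}
	if ((word & 0xf) == 0) {
		num += 4;
		word >>= 4;
	}
	if ((word & 0x3) == 0) {
		num += 2;
		word >>= 2;
	}
	if ((word & 0x1) == 0)
		num += 1;
	return num;
}

/* What the header would keep: a thin inline wrapper, so each call site
 * expands to a function call rather than six compare-and-shift steps. */
static inline unsigned long ffs_via_generic(unsigned long word)
{
	return generic___ffs(word);
}
```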

Greetings,
	Alexander

> The version actually applied was posted only on linux-kernel, instead
> of also CC:'ing linux-arch as previous versions had been.  Nobody
> commented on this version other than you Ingo.

