From: Willy TARREAU <willy@w.ods.org>
To: hugang <hugang@soulinfo.com>
Cc: akpm@digeo.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] Faster generic_fls
Date: Thu, 1 May 2003 15:52:04 +0200 [thread overview]
Message-ID: <20030501135204.GC308@pcw.home.local> (raw)
In-Reply-To: <20030501150557.6dc913f7.hugang@soulinfo.com>
On Thu, May 01, 2003 at 03:05:57PM +0800, hugang wrote:
> Hello
>
> Here is the table version of fls. Before version of it is wrong.
Ok, I recoded the tree myself with if/else, and it's now faster than all others,
whatever the compiler. I have cut the tree to only 16 bits evaluations because
it was still faster than a full 32 bits tree on x86, probably because of the
code length. However, on Alpha EV6 (gcc-2.95.3), the 32 bits tree is faster.
The 32 bits code is also the same speed as the 16 bits on gcc-3.2.3 if I
specify -mbranch-cost=2 (because gcc assumes that today's CPUs need one cycle
per jump).
Here is the function, followed by the logs executed on an Athlon.
Disclaimer: Daniel's PentiumIII seems to give very different results than my
Athlon, so every test should be considered for one arch only !!!
To quickly resume the results, on gcc-3.2.3 it's 45% faster than the table algo,
I don't understand why, and just a little bit faster than Daniel's function.
On gcc-2.95.3, it's only 20% faster than the table, but 46% than Daniel's.
However, I have good news for you, the table code is 15% faster than my tree
on an Alpha EV6 ;-)
It may be because gcc can use different tricks on this arch.
Cheers
Willy
static inline int fls_tree_fls(unsigned n) {
#ifndef REALLY_WANT_A_32BITS_TREE
/* it seems as if working on half a tree is faster than the
complete tree.
*/
register t = 0;
if (n >= (1<<16)) {
n >>= 16;
t = 16;
}
if (n < (1<<8))
if (n < (1<<4))
if (n < 4)
return t + n - ((n + 1) >> 2);
else
return t + 3 + (n >= 8);
else
if (n < (1<<6))
return t + 5 + (n >= (1<<5));
else
return t + 7 + (n >= (1<<7));
else
if (n < (1<<12))
if (n < (1<<10))
return t + 9 + (n >= (1<<9));
else
return t + 11 + (n >= (1<<11));
else
if (n < (1<<14))
return t + 13 + (n >= (1<<13));
else
return t + 15 + (n >= (1<<15));
#else
/* perhaps this is faster on systems with BIG caches, but on
athlon, at least, it's slower than the previous one
*/
if (n < (1<<16))
if (n < (1<<8))
if (n < (1<<4))
if (n < 4)
return n - ((n + 1) >> 2);
else
return 3 + (n >= 8);
else
if (n < (1<<6))
return 5 + (n >= (1<<5));
else
return 7 + (n >= (1<<7));
else
if (n < (1<<12))
if (n < (1<<10))
return 9 + (n >= (1<<9));
else
return 11 + (n >= (1<<11));
else
if (n < (1<<14))
return 13 + (n >= (1<<13));
else
return 15 + (n >= (1<<15));
else
if (n < (1<<24))
if (n < (1<<20))
if (n < (1<<18))
return 17 + (n >= (1<<17));
else
return 19 + (n >= (1<<19));
else
if (n < (1<<22))
return 21 + (n >= (1<<21));
else
return 23 + (n >= (1<<23));
else
if (n < (1<<28))
if (n < (1<<26))
return 25 + (n >= (1<<25));
else
return 27 + (n >= (1<<27));
else
if (n < (1<<30))
return 29 + (n >= (1<<29));
else
return 31 + (n >= (1<<31));
#endif
}
== results on Athlon 1800+ (1.53 GHz), gcc-2.95.3 -O3 -march=i686 :
4294967295 iterations of nil... checksum = 4294967295
real 0m5.600s
user 0m5.590s
sys 0m0.007s
4294967295 iterations of old... checksum = 4294967265
real 0m39.640s
user 0m39.499s
sys 0m0.141s
4294967295 iterations of new... checksum = 4294967265
real 0m45.088s
user 0m44.936s
sys 0m0.158s
4294967295 iterations of fls_table... checksum = 4294967264
real 0m35.988s
user 0m35.833s
sys 0m0.159s
4294967295 iterations of fls_tree... checksum = 4294967265
real 0m28.699s
user 0m28.605s
sys 0m0.096s
== results on Athlon 1800+ (1.53 GHz), gcc-3.2.3 -O3 -march=athlon :
4294967295 iterations of nil... checksum = 4294967295
real 0m16.624s
user 0m16.624s
sys 0m0.001s
4294967295 iterations of old... checksum = 4294967265
real 0m57.685s
user 0m57.568s
sys 0m0.125s
4294967295 iterations of new... checksum = 4294967265
real 0m37.513s
user 0m37.415s
sys 0m0.103s
4294967295 iterations of fls_table... checksum = 4294967264
real 0m56.068s
user 0m55.991s
sys 0m0.085s
4294967295 iterations of fls_tree...
checksum = 4294967265
real 0m36.636s
user 0m36.516s
sys 0m0.125s
=== alpha EV6 / 466 Mhz, gcc-2.95.3 -O3 ====
4294967295 iterations of new... checksum = 4294967265
real 2m19.951s
user 2m19.543s
sys 0m0.018s
4294967295 iterations of fls_table... checksum = 4294967265
real 1m12.825s
user 1m12.667s
sys 0m0.005s
4294967295 iterations of fls_tree... checksum = 4294967265
real 1m33.469s
user 1m33.242s
sys 0m0.013s
next prev parent reply other threads:[~2003-05-01 13:42 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-04-30 2:46 [RFC][PATCH] Faster generic_fls Daniel Phillips
2003-04-30 7:19 ` Willy Tarreau
2003-04-30 8:36 ` Aaron Lehmann
2003-04-30 8:43 ` Aaron Lehmann
2003-04-30 8:52 ` Aaron Lehmann
2003-04-30 8:51 ` P
2003-04-30 13:51 ` Daniel Phillips
2003-04-30 11:14 ` Falk Hueffner
2003-04-30 13:03 ` Daniel Phillips
2003-04-30 13:17 ` Falk Hueffner
2003-04-30 14:07 ` Daniel Phillips
2003-04-30 14:11 ` Linus Torvalds
2003-04-30 14:53 ` Falk Hueffner
2003-04-30 15:28 ` Linus Torvalds
2003-04-30 16:03 ` Falk Hueffner
2003-04-30 16:16 ` Linus Torvalds
2003-04-30 16:43 ` Falk Hueffner
2003-04-30 20:25 ` Alan Cox
2003-04-30 21:59 ` Falk Hueffner
2003-04-30 22:22 ` Alan Cox
2003-04-30 23:41 ` Linus Torvalds
2003-04-30 22:47 ` Alan Cox
2003-05-01 0:12 ` Falk Hueffner
2003-05-01 1:07 ` Linus Torvalds
2003-04-30 19:15 ` Daniel Phillips
2003-04-30 20:59 ` Willy Tarreau
2003-05-01 1:02 ` Daniel Phillips
2003-05-01 9:19 ` Willy Tarreau
2003-05-01 16:02 ` Linus Torvalds
2003-04-30 20:55 ` Andrew Morton
2003-05-01 5:03 ` hugang
2003-05-01 5:11 ` Andrew Morton
2003-05-01 5:33 ` hugang
2003-05-01 7:05 ` hugang
2003-05-01 13:52 ` Willy TARREAU [this message]
2003-05-01 14:14 ` Falk Hueffner
2003-05-01 14:26 ` Willy TARREAU
2003-05-01 14:53 ` hugang
2003-05-01 15:54 ` Thomas Schlichter
2003-05-02 0:33 ` hugang
2003-05-01 17:16 ` Willy TARREAU
2003-05-01 23:27 ` Thomas Schlichter
2003-05-02 0:10 ` Willy Tarreau
2003-05-02 0:43 ` Daniel Phillips
2003-05-02 0:54 ` Andrew Morton
2003-05-02 12:21 ` Daniel Phillips
2003-05-02 1:47 ` Thomas Schlichter
2003-05-02 13:24 ` Willy Tarreau
2003-05-02 0:13 ` Daniel Phillips
2003-05-02 0:13 ` Lincoln Dale
2003-05-01 5:16 ` hugang
2003-05-01 10:22 ` Willy TARREAU
2003-05-01 11:17 ` hugang
2003-05-01 11:45 ` Willy TARREAU
[not found] <87d6j34jad.fsf@student.uni-tuebingen.de.suse.lists.linux.kernel>
[not found] ` <Pine.LNX.4.44.0304301801210.20283-100000@home.transmeta.com.suse.lists.linux.kernel>
2003-05-01 1:46 ` Andi Kleen
2003-05-01 4:40 ` Linus Torvalds
2003-05-01 12:00 ` David S. Miller
2003-05-02 5:14 ` Linus Torvalds
-- strict thread matches above, loose matches on Subject: below --
2003-05-01 13:31 linux
2003-05-01 17:15 Chuck Ebbert
2003-05-02 8:50 Chuck Ebbert
2003-05-02 9:37 ` Peter Osterlund
2003-05-02 10:04 Chuck Ebbert
2003-05-02 22:55 ` David S. Miller
2003-05-03 2:21 Chuck Ebbert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20030501135204.GC308@pcw.home.local \
--to=willy@w.ods.org \
--cc=akpm@digeo.com \
--cc=hugang@soulinfo.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.