public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Willy TARREAU <willy@w.ods.org>
To: hugang <hugang@soulinfo.com>
Cc: akpm@digeo.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] Faster generic_fls
Date: Thu, 1 May 2003 15:52:04 +0200	[thread overview]
Message-ID: <20030501135204.GC308@pcw.home.local> (raw)
In-Reply-To: <20030501150557.6dc913f7.hugang@soulinfo.com>

On Thu, May 01, 2003 at 03:05:57PM +0800, hugang wrote:
> Hello
> 
> Here is the table version of fls. Before version of it is wrong.

Ok, I recoded the tree myself with if/else, and it's now faster than all others,
whatever the compiler. I have cut the tree to only 16 bits evaluations because
it was still faster than a full 32 bits tree on x86, probably because of the
code length. However, on Alpha EV6 (gcc-2.95.3), the 32 bits tree is faster.
The 32 bits code is also the same speed as the 16 bits on gcc-3.2.3 if I
specify -mbranch-cost=2 (because gcc assumes that today's CPUs need one cycle
per jump).

Here is the function, followed by the logs executed on an Athlon.

Disclaimer: Daniel's PentiumIII seems to give very different results than my
Athlon, so every test should be considered for one arch only !!!

To quickly resume the results, on gcc-3.2.3 it's 45% faster than the table algo,
I don't understand why, and just a little bit faster than Daniel's function.

On gcc-2.95.3, it's only 20% faster than the table, but 46% than Daniel's.

However, I have good news for you, the table code is 15% faster than my tree
on an Alpha EV6 ;-)
It may be because gcc can use different tricks on this arch.

Cheers
Willy

static inline int fls_tree_fls(unsigned n) {
#ifndef REALLY_WANT_A_32BITS_TREE
    /* it seems as if working on half a tree is faster than the
       complete tree.
    */
    register t = 0;
    if (n >= (1<<16)) {
	n >>= 16;
	t = 16;
    }
    if (n < (1<<8))
	if (n < (1<<4))
	    if (n < 4)
		return t + n - ((n + 1) >> 2);
	    else
		return t + 3 + (n >= 8);
	else
	    if (n < (1<<6))
		return t + 5 + (n >= (1<<5));
	    else
		return t + 7 + (n >= (1<<7));
    else
	if (n < (1<<12))
	    if (n < (1<<10))
		return t + 9 + (n >= (1<<9));
	    else
		return t + 11 + (n >= (1<<11));
	else
	    if (n < (1<<14))
		return t + 13 + (n >= (1<<13));
	    else
		return t + 15 + (n >= (1<<15));
#else
    /* perhaps this is faster on systems with BIG caches, but on
       athlon, at least, it's slower than the previous one
    */
    if (n < (1<<16))
	if (n < (1<<8))
	    if (n < (1<<4))
		if (n < 4)
		    return n - ((n + 1) >> 2);
		else
		    return 3 + (n >= 8);
	    else
		if (n < (1<<6))
		    return 5 + (n >= (1<<5));
		else
		    return 7 + (n >= (1<<7));
	else
	    if (n < (1<<12))
		if (n < (1<<10))
		    return 9 + (n >= (1<<9));
		else
		    return 11 + (n >= (1<<11));
	    else
		if (n < (1<<14))
		    return 13 + (n >= (1<<13));
		else
		    return 15 + (n >= (1<<15));
    else
	if (n < (1<<24))
	    if (n < (1<<20))
		if (n < (1<<18))
		    return 17 + (n >= (1<<17));
		else
		    return 19 + (n >= (1<<19));
	    else
		if (n < (1<<22))
		    return 21 + (n >= (1<<21));
		else
		    return 23 + (n >= (1<<23));
	else
	    if (n < (1<<28))
		if (n < (1<<26))
		    return 25 + (n >= (1<<25));
		else
		    return 27 + (n >= (1<<27));
	    else
		if (n < (1<<30))
		    return 29 + (n >= (1<<29));
		else
		    return 31 + (n >= (1<<31));
#endif
}

== results on Athlon 1800+ (1.53 GHz), gcc-2.95.3 -O3 -march=i686 :

4294967295 iterations of nil... checksum = 4294967295

real    0m5.600s
user    0m5.590s
sys     0m0.007s
4294967295 iterations of old... checksum = 4294967265

real    0m39.640s
user    0m39.499s
sys     0m0.141s
4294967295 iterations of new... checksum = 4294967265

real    0m45.088s
user    0m44.936s
sys     0m0.158s
4294967295 iterations of fls_table... checksum = 4294967264

real    0m35.988s
user    0m35.833s
sys     0m0.159s
4294967295 iterations of fls_tree... checksum = 4294967265

real    0m28.699s
user    0m28.605s
sys     0m0.096s

== results on Athlon 1800+ (1.53 GHz), gcc-3.2.3 -O3 -march=athlon :

4294967295 iterations of nil... checksum = 4294967295

real    0m16.624s
user    0m16.624s
sys     0m0.001s
4294967295 iterations of old... checksum = 4294967265

real    0m57.685s
user    0m57.568s
sys     0m0.125s
4294967295 iterations of new... checksum = 4294967265

real    0m37.513s
user    0m37.415s
sys     0m0.103s
4294967295 iterations of fls_table... checksum = 4294967264

real    0m56.068s
user    0m55.991s
sys     0m0.085s
4294967295 iterations of fls_tree... 
checksum = 4294967265

real    0m36.636s
user    0m36.516s
sys     0m0.125s

=== alpha EV6 / 466 Mhz, gcc-2.95.3 -O3 ====
4294967295 iterations of new... checksum = 4294967265

real    2m19.951s
user    2m19.543s
sys     0m0.018s
4294967295 iterations of fls_table... checksum = 4294967265

real    1m12.825s
user    1m12.667s
sys     0m0.005s
4294967295 iterations of fls_tree... checksum = 4294967265

real    1m33.469s
user    1m33.242s
sys     0m0.013s


  reply	other threads:[~2003-05-01 13:42 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-04-30  2:46 [RFC][PATCH] Faster generic_fls Daniel Phillips
2003-04-30  7:19 ` Willy Tarreau
2003-04-30  8:36   ` Aaron Lehmann
2003-04-30  8:43     ` Aaron Lehmann
2003-04-30  8:52     ` Aaron Lehmann
2003-04-30  8:51       ` P
2003-04-30 13:51   ` Daniel Phillips
2003-04-30 11:14 ` Falk Hueffner
2003-04-30 13:03   ` Daniel Phillips
2003-04-30 13:17     ` Falk Hueffner
2003-04-30 14:07       ` Daniel Phillips
2003-04-30 14:11   ` Linus Torvalds
2003-04-30 14:53     ` Falk Hueffner
2003-04-30 15:28       ` Linus Torvalds
2003-04-30 16:03         ` Falk Hueffner
2003-04-30 16:16           ` Linus Torvalds
2003-04-30 16:43             ` Falk Hueffner
2003-04-30 20:25             ` Alan Cox
2003-04-30 21:59               ` Falk Hueffner
2003-04-30 22:22                 ` Alan Cox
2003-04-30 23:41               ` Linus Torvalds
2003-04-30 22:47                 ` Alan Cox
2003-05-01  0:12                 ` Falk Hueffner
2003-05-01  1:07                   ` Linus Torvalds
2003-04-30 19:15     ` Daniel Phillips
2003-04-30 20:59       ` Willy Tarreau
2003-05-01  1:02         ` Daniel Phillips
2003-05-01  9:19           ` Willy Tarreau
2003-05-01 16:02             ` Linus Torvalds
2003-04-30 20:55 ` Andrew Morton
2003-05-01  5:03   ` hugang
2003-05-01  5:11     ` Andrew Morton
2003-05-01  5:33       ` hugang
2003-05-01  7:05         ` hugang
2003-05-01 13:52           ` Willy TARREAU [this message]
2003-05-01 14:14             ` Falk Hueffner
2003-05-01 14:26               ` Willy TARREAU
2003-05-01 14:53             ` hugang
2003-05-01 15:54               ` Thomas Schlichter
2003-05-02  0:33                 ` hugang
2003-05-01 17:16               ` Willy TARREAU
2003-05-01 23:27                 ` Thomas Schlichter
2003-05-02  0:10                   ` Willy Tarreau
2003-05-02  0:43                     ` Daniel Phillips
2003-05-02  0:54                       ` Andrew Morton
2003-05-02 12:21                         ` Daniel Phillips
2003-05-02  1:47                       ` Thomas Schlichter
2003-05-02 13:24                         ` Willy Tarreau
2003-05-02  0:13                   ` Daniel Phillips
2003-05-02  0:13             ` Lincoln Dale
2003-05-01  5:16   ` hugang
2003-05-01 10:22     ` Willy TARREAU
2003-05-01 11:17       ` hugang
2003-05-01 11:45         ` Willy TARREAU
     [not found] <87d6j34jad.fsf@student.uni-tuebingen.de.suse.lists.linux.kernel>
     [not found] ` <Pine.LNX.4.44.0304301801210.20283-100000@home.transmeta.com.suse.lists.linux.kernel>
2003-05-01  1:46   ` Andi Kleen
2003-05-01  4:40     ` Linus Torvalds
2003-05-01 12:00       ` David S. Miller
2003-05-02  5:14         ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2003-05-01 13:31 linux
2003-05-01 17:15 Chuck Ebbert
2003-05-02  8:50 Chuck Ebbert
2003-05-02  9:37 ` Peter Osterlund
2003-05-02 10:04 Chuck Ebbert
2003-05-02 22:55 ` David S. Miller
2003-05-03  2:21 Chuck Ebbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030501135204.GC308@pcw.home.local \
    --to=willy@w.ods.org \
    --cc=akpm@digeo.com \
    --cc=hugang@soulinfo.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox