All of lore.kernel.org
 help / color / mirror / Atom feed
From: Willy TARREAU <willy@w.ods.org>
To: hugang <hugang@soulinfo.com>
Cc: akpm@digeo.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] Faster generic_fls
Date: Thu, 1 May 2003 15:52:04 +0200	[thread overview]
Message-ID: <20030501135204.GC308@pcw.home.local> (raw)
In-Reply-To: <20030501150557.6dc913f7.hugang@soulinfo.com>

On Thu, May 01, 2003 at 03:05:57PM +0800, hugang wrote:
> Hello
> 
> Here is the table version of fls. Before version of it is wrong.

Ok, I recoded the tree myself with if/else, and it's now faster than all others,
whatever the compiler. I have cut the tree to only 16 bits evaluations because
it was still faster than a full 32 bits tree on x86, probably because of the
code length. However, on Alpha EV6 (gcc-2.95.3), the 32 bits tree is faster.
The 32 bits code is also the same speed as the 16 bits on gcc-3.2.3 if I
specify -mbranch-cost=2 (because gcc assumes that today's CPUs need one cycle
per jump).

Here is the function, followed by the logs executed on an Athlon.

Disclaimer: Daniel's PentiumIII seems to give very different results than my
Athlon, so every test should be considered for one arch only !!!

To quickly resume the results, on gcc-3.2.3 it's 45% faster than the table algo,
I don't understand why, and just a little bit faster than Daniel's function.

On gcc-2.95.3, it's only 20% faster than the table, but 46% than Daniel's.

However, I have good news for you, the table code is 15% faster than my tree
on an Alpha EV6 ;-)
It may be because gcc can use different tricks on this arch.

Cheers
Willy

static inline int fls_tree_fls(unsigned n) {
#ifndef REALLY_WANT_A_32BITS_TREE
    /* it seems as if working on half a tree is faster than the
       complete tree.
    */
    register t = 0;
    if (n >= (1<<16)) {
	n >>= 16;
	t = 16;
    }
    if (n < (1<<8))
	if (n < (1<<4))
	    if (n < 4)
		return t + n - ((n + 1) >> 2);
	    else
		return t + 3 + (n >= 8);
	else
	    if (n < (1<<6))
		return t + 5 + (n >= (1<<5));
	    else
		return t + 7 + (n >= (1<<7));
    else
	if (n < (1<<12))
	    if (n < (1<<10))
		return t + 9 + (n >= (1<<9));
	    else
		return t + 11 + (n >= (1<<11));
	else
	    if (n < (1<<14))
		return t + 13 + (n >= (1<<13));
	    else
		return t + 15 + (n >= (1<<15));
#else
    /* perhaps this is faster on systems with BIG caches, but on
       athlon, at least, it's slower than the previous one
    */
    if (n < (1<<16))
	if (n < (1<<8))
	    if (n < (1<<4))
		if (n < 4)
		    return n - ((n + 1) >> 2);
		else
		    return 3 + (n >= 8);
	    else
		if (n < (1<<6))
		    return 5 + (n >= (1<<5));
		else
		    return 7 + (n >= (1<<7));
	else
	    if (n < (1<<12))
		if (n < (1<<10))
		    return 9 + (n >= (1<<9));
		else
		    return 11 + (n >= (1<<11));
	    else
		if (n < (1<<14))
		    return 13 + (n >= (1<<13));
		else
		    return 15 + (n >= (1<<15));
    else
	if (n < (1<<24))
	    if (n < (1<<20))
		if (n < (1<<18))
		    return 17 + (n >= (1<<17));
		else
		    return 19 + (n >= (1<<19));
	    else
		if (n < (1<<22))
		    return 21 + (n >= (1<<21));
		else
		    return 23 + (n >= (1<<23));
	else
	    if (n < (1<<28))
		if (n < (1<<26))
		    return 25 + (n >= (1<<25));
		else
		    return 27 + (n >= (1<<27));
	    else
		if (n < (1<<30))
		    return 29 + (n >= (1<<29));
		else
		    return 31 + (n >= (1<<31));
#endif
}

== results on Athlon 1800+ (1.53 GHz), gcc-2.95.3 -O3 -march=i686 :

4294967295 iterations of nil... checksum = 4294967295

real    0m5.600s
user    0m5.590s
sys     0m0.007s
4294967295 iterations of old... checksum = 4294967265

real    0m39.640s
user    0m39.499s
sys     0m0.141s
4294967295 iterations of new... checksum = 4294967265

real    0m45.088s
user    0m44.936s
sys     0m0.158s
4294967295 iterations of fls_table... checksum = 4294967264

real    0m35.988s
user    0m35.833s
sys     0m0.159s
4294967295 iterations of fls_tree... checksum = 4294967265

real    0m28.699s
user    0m28.605s
sys     0m0.096s

== results on Athlon 1800+ (1.53 GHz), gcc-3.2.3 -O3 -march=athlon :

4294967295 iterations of nil... checksum = 4294967295

real    0m16.624s
user    0m16.624s
sys     0m0.001s
4294967295 iterations of old... checksum = 4294967265

real    0m57.685s
user    0m57.568s
sys     0m0.125s
4294967295 iterations of new... checksum = 4294967265

real    0m37.513s
user    0m37.415s
sys     0m0.103s
4294967295 iterations of fls_table... checksum = 4294967264

real    0m56.068s
user    0m55.991s
sys     0m0.085s
4294967295 iterations of fls_tree... 
checksum = 4294967265

real    0m36.636s
user    0m36.516s
sys     0m0.125s

=== alpha EV6 / 466 Mhz, gcc-2.95.3 -O3 ====
4294967295 iterations of new... checksum = 4294967265

real    2m19.951s
user    2m19.543s
sys     0m0.018s
4294967295 iterations of fls_table... checksum = 4294967265

real    1m12.825s
user    1m12.667s
sys     0m0.005s
4294967295 iterations of fls_tree... checksum = 4294967265

real    1m33.469s
user    1m33.242s
sys     0m0.013s


  reply	other threads:[~2003-05-01 13:42 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-04-30  2:46 [RFC][PATCH] Faster generic_fls Daniel Phillips
2003-04-30  7:19 ` Willy Tarreau
2003-04-30  8:36   ` Aaron Lehmann
2003-04-30  8:43     ` Aaron Lehmann
2003-04-30  8:52     ` Aaron Lehmann
2003-04-30  8:51       ` P
2003-04-30 13:51   ` Daniel Phillips
2003-04-30 11:14 ` Falk Hueffner
2003-04-30 13:03   ` Daniel Phillips
2003-04-30 13:17     ` Falk Hueffner
2003-04-30 14:07       ` Daniel Phillips
2003-04-30 14:11   ` Linus Torvalds
2003-04-30 14:53     ` Falk Hueffner
2003-04-30 15:28       ` Linus Torvalds
2003-04-30 16:03         ` Falk Hueffner
2003-04-30 16:16           ` Linus Torvalds
2003-04-30 16:43             ` Falk Hueffner
2003-04-30 20:25             ` Alan Cox
2003-04-30 21:59               ` Falk Hueffner
2003-04-30 22:22                 ` Alan Cox
2003-04-30 23:41               ` Linus Torvalds
2003-04-30 22:47                 ` Alan Cox
2003-05-01  0:12                 ` Falk Hueffner
2003-05-01  1:07                   ` Linus Torvalds
2003-04-30 19:15     ` Daniel Phillips
2003-04-30 20:59       ` Willy Tarreau
2003-05-01  1:02         ` Daniel Phillips
2003-05-01  9:19           ` Willy Tarreau
2003-05-01 16:02             ` Linus Torvalds
2003-04-30 20:55 ` Andrew Morton
2003-05-01  5:03   ` hugang
2003-05-01  5:11     ` Andrew Morton
2003-05-01  5:33       ` hugang
2003-05-01  7:05         ` hugang
2003-05-01 13:52           ` Willy TARREAU [this message]
2003-05-01 14:14             ` Falk Hueffner
2003-05-01 14:26               ` Willy TARREAU
2003-05-01 14:53             ` hugang
2003-05-01 15:54               ` Thomas Schlichter
2003-05-02  0:33                 ` hugang
2003-05-01 17:16               ` Willy TARREAU
2003-05-01 23:27                 ` Thomas Schlichter
2003-05-02  0:10                   ` Willy Tarreau
2003-05-02  0:43                     ` Daniel Phillips
2003-05-02  0:54                       ` Andrew Morton
2003-05-02 12:21                         ` Daniel Phillips
2003-05-02  1:47                       ` Thomas Schlichter
2003-05-02 13:24                         ` Willy Tarreau
2003-05-02  0:13                   ` Daniel Phillips
2003-05-02  0:13             ` Lincoln Dale
2003-05-01  5:16   ` hugang
2003-05-01 10:22     ` Willy TARREAU
2003-05-01 11:17       ` hugang
2003-05-01 11:45         ` Willy TARREAU
     [not found] <87d6j34jad.fsf@student.uni-tuebingen.de.suse.lists.linux.kernel>
     [not found] ` <Pine.LNX.4.44.0304301801210.20283-100000@home.transmeta.com.suse.lists.linux.kernel>
2003-05-01  1:46   ` Andi Kleen
2003-05-01  4:40     ` Linus Torvalds
2003-05-01 12:00       ` David S. Miller
2003-05-02  5:14         ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2003-05-01 13:31 linux
2003-05-01 17:15 Chuck Ebbert
2003-05-02  8:50 Chuck Ebbert
2003-05-02  9:37 ` Peter Osterlund
2003-05-02 10:04 Chuck Ebbert
2003-05-02 22:55 ` David S. Miller
2003-05-03  2:21 Chuck Ebbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030501135204.GC308@pcw.home.local \
    --to=willy@w.ods.org \
    --cc=akpm@digeo.com \
    --cc=hugang@soulinfo.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.