public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Willy Tarreau <willy@w.ods.org>
To: Daniel Phillips <dphillips@sistina.com>
Cc: Linus Torvalds <torvalds@transmeta.com>, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] Faster generic_fls
Date: Wed, 30 Apr 2003 09:19:07 +0200	[thread overview]
Message-ID: <20030430071907.GA30999@alpha.home.local> (raw)
In-Reply-To: <200304300446.24330.dphillips@sistina.com>

Hi Daniel,

On Wed, Apr 30, 2003 at 04:46:23AM +0200, Daniel Phillips wrote:
> Here's a faster implementation of generic_fls, that I discovered accidently,
> by not noticing 2.5 already had a generic_fls, and so I rolled my own.  Like
> the incumbent, it's O log2(bits), but there's not a lot of resemblance beyond
> that.  I think the new algorithm is inherently more parallelizable than the
> traditional approach.  A processor that can speculatively evaluate both sides 
> of a conditional would benefit even more than the PIII I tested on.

I did some tests on an Athlon MP2200 (1.8 GHz) and a Pentium 4/2.0 GHz, with
both gcc 2.95.3 and gcc 3.2.3.

The results are interesting, because they show that the compiler used is far
more important than the code ! To resume quickly, your code is faster than the
old one on an athlon, gcc-2.95.2 -O3, and in any combination of gcc-3.2.3.
On a P4, new code compiled with 3.2.3 is still slower than old code with 2.95.3.

But gcc's optimizer seems to be even worse with each new version. The NIL
checksum takes 3 times more to complete on gcc3/athlon than gcc2/athlon !!!
And on some test, the P4 goes faster when the code is optimized for athlon than
for P4 !

Oh, and the new2 function is another variant I have been using by the past, but
it is slower here. It performed well on older machines with gcc 2.7.2. Its main
problem may be the code size, due to the 32 bits masks. Your code has the
advantage of being short and working on small data sets.

#define new2_fls32(___a) ({ \
    register int ___x, ___bits = 0; \
    if (___a) { \
        ___x = (___a); \
        ++___bits; \
        if (___x & 0xffff0000) { ___x &= 0xffff0000; ___bits += 16;} \
        if (___x & 0xff00ff00) { ___x &= 0xff00ff00; ___bits +=  8;} \
        if (___x & 0xf0f0f0f0) { ___x &= 0xf0f0f0f0; ___bits +=  4;} \
        if (___x & 0xcccccccc) { ___x &= 0xcccccccc; ___bits +=  2;} \
        if (___x & 0xaaaaaaaa) { ___x &= 0xaaaaaaaa; ___bits +=  1;} \
    } \
    ___bits; \
    })



Here are the results.

Cheers,
Willy

===== gcc-2.95.3 -march=i686 -O2 / athlon MP/2200 (1.8 GHz) =====

4294967295 iterations of nil... checksum = 4294967295

real    0m4.778s
user    0m4.770s
sys     0m0.010s
4294967295 iterations of old... checksum = 4294967265

real    0m37.923s
user    0m37.920s
sys     0m0.000s
4294967295 iterations of new... checksum = 4294967265

real    0m47.956s
user    0m47.920s
sys     0m0.030s
4294967295 iterations of new2... checksum = 4294967295

real    0m43.884s
user    0m43.860s
sys     0m0.020s


===== gcc-2.95.3 -march=i686 -O3 / athlon MP/2200 =====

4294967295 iterations of nil... checksum = 4294967295

real    0m4.779s
user    0m4.770s
sys     0m0.010s
4294967295 iterations of old... checksum = 4294967265

real    0m39.235s
user    0m39.180s
sys     0m0.050s
4294967295 iterations of new... checksum = 4294967265

real    0m32.175s
user    0m32.170s
sys     0m0.000s
4294967295 iterations of new2... checksum = 4294967295

real    0m37.560s
user    0m37.520s
sys     0m0.020s

===== gcc-2.95.3 -march=i686 -O3 / Pentium4 2.0 GHz =====

4294967295 iterations of nil... checksum = 4294967295

real    0m4.370s
user    0m4.370s
sys     0m0.000s
4294967295 iterations of old... checksum = 4294967265

real    0m22.148s
user    0m22.150s
sys     0m0.000s
4294967295 iterations of new... checksum = 4294967265

real    0m25.854s
user    0m25.850s
sys     0m0.000s
4294967295 iterations of new2... checksum = 4294967295

real    0m30.743s
user    0m28.930s
sys     0m0.160s


==== gcc 3.2.3 -march=i686 -O3 / Pentium 4 / 2.0 GHz ====

4294967295 iterations of nil... checksum = 4294967295

real    0m6.719s
user    0m6.710s
sys     0m0.000s
4294967295 iterations of old... checksum = 4294967265

real    0m45.571s
user    0m43.510s
sys     0m0.180s
4294967295 iterations of new... checksum = 4294967265

real    0m28.251s
user    0m26.420s
sys     0m0.010s
4294967295 iterations of new2... checksum = 4294967265

real    0m48.881s
user    0m47.080s
sys     0m0.000s
-> awful, 60% more time than with gcc-2.95.3 !

==== gcc 3.2.3 -march=pentium4 -O3 / Pentium 4 / 2.0 GHz ====
(yes, gcc 3.2 may be producing better code... for the eyes, not
for the CPU)

4294967295 iterations of nil... checksum = 4294967295

real    0m4.422s
user    0m4.420s
sys     0m0.000s
4294967295 iterations of old... checksum = 4294967265

real    0m34.950s
user    0m34.950s
sys     0m0.000s
4294967295 iterations of new... checksum = 4294967265

real    0m27.667s
user    0m25.810s
sys     0m0.000s
4294967295 iterations of new2... checksum = 4294967295

real    0m42.945s
user    0m42.760s
sys     0m0.160s

==== gcc 3.2.3 -march=athlon-mp -O3 / Pentium 4 / 2.0 GHz ====

4294967295 iterations of nil... checksum = 4294967295

real    0m6.486s
user    0m6.490s
sys     0m0.000s
4294967295 iterations of old... checksum = 4294967265

real    0m41.240s
user    0m39.220s
sys     0m0.160s
4294967295 iterations of new... checksum = 4294967265

real    0m25.780s
user    0m25.780s
sys     0m0.000s
4294967295 iterations of new2... checksum = 4294967295

real    0m51.789s
user    0m49.750s
sys     0m0.010s

==== gcc 3.2.3 -march=athlon-mp -O3 / Athlon MP2200 / 1.8 GHz ====

4294967295 iterations of nil... checksum = 4294967295

real    0m14.544s
user    0m14.540s
sys     0m0.000s
4294967295 iterations of old... checksum = 4294967265

real    0m44.609s
user    0m44.600s
sys     0m0.000s
4294967295 iterations of new... checksum = 4294967265

real    0m26.765s
user    0m26.760s
sys     0m0.000s
4294967295 iterations of new2... checksum = 4294967295

real    1m6.066s
user    1m6.030s
sys     0m0.030s

==== gcc 3.2.3 -march=pentium4 -O3 / Athlon MP2200 / 1.8 GHz ====

4294967295 iterations of nil... checksum = 4294967295

real    0m10.775s
user    0m10.750s
sys     0m0.030s
4294967295 iterations of old... checksum = 4294967265

real    0m45.837s
user    0m45.830s
sys     0m0.000s
4294967295 iterations of new... checksum = 4294967265

real    0m24.414s
user    0m24.410s
sys     0m0.010s
4294967295 iterations of new2... checksum = 4294967295

real    1m7.299s
user    1m7.280s
sys     0m0.010s


==== gcc 3.2.3 -march=i686 -O3 / Athlon MP2200 / 1.8 GHz ====

4294967295 iterations of nil... checksum = 4294967295

real    0m13.010s
user    0m13.000s
sys     0m0.010s



  reply	other threads:[~2003-04-30  7:07 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-04-30  2:46 [RFC][PATCH] Faster generic_fls Daniel Phillips
2003-04-30  7:19 ` Willy Tarreau [this message]
2003-04-30  8:36   ` Aaron Lehmann
2003-04-30  8:43     ` Aaron Lehmann
2003-04-30  8:52     ` Aaron Lehmann
2003-04-30  8:51       ` P
2003-04-30 13:51   ` Daniel Phillips
2003-04-30 11:14 ` Falk Hueffner
2003-04-30 13:03   ` Daniel Phillips
2003-04-30 13:17     ` Falk Hueffner
2003-04-30 14:07       ` Daniel Phillips
2003-04-30 14:11   ` Linus Torvalds
2003-04-30 14:53     ` Falk Hueffner
2003-04-30 15:28       ` Linus Torvalds
2003-04-30 16:03         ` Falk Hueffner
2003-04-30 16:16           ` Linus Torvalds
2003-04-30 16:43             ` Falk Hueffner
2003-04-30 20:25             ` Alan Cox
2003-04-30 21:59               ` Falk Hueffner
2003-04-30 22:22                 ` Alan Cox
2003-04-30 23:41               ` Linus Torvalds
2003-04-30 22:47                 ` Alan Cox
2003-05-01  0:12                 ` Falk Hueffner
2003-05-01  1:07                   ` Linus Torvalds
2003-04-30 19:15     ` Daniel Phillips
2003-04-30 20:59       ` Willy Tarreau
2003-05-01  1:02         ` Daniel Phillips
2003-05-01  9:19           ` Willy Tarreau
2003-05-01 16:02             ` Linus Torvalds
2003-04-30 20:55 ` Andrew Morton
2003-05-01  5:03   ` hugang
2003-05-01  5:11     ` Andrew Morton
2003-05-01  5:33       ` hugang
2003-05-01  7:05         ` hugang
2003-05-01 13:52           ` Willy TARREAU
2003-05-01 14:14             ` Falk Hueffner
2003-05-01 14:26               ` Willy TARREAU
2003-05-01 14:53             ` hugang
2003-05-01 15:54               ` Thomas Schlichter
2003-05-02  0:33                 ` hugang
2003-05-01 17:16               ` Willy TARREAU
2003-05-01 23:27                 ` Thomas Schlichter
2003-05-02  0:10                   ` Willy Tarreau
2003-05-02  0:43                     ` Daniel Phillips
2003-05-02  0:54                       ` Andrew Morton
2003-05-02 12:21                         ` Daniel Phillips
2003-05-02  1:47                       ` Thomas Schlichter
2003-05-02 13:24                         ` Willy Tarreau
2003-05-02  0:13                   ` Daniel Phillips
2003-05-02  0:13             ` Lincoln Dale
2003-05-01  5:16   ` hugang
2003-05-01 10:22     ` Willy TARREAU
2003-05-01 11:17       ` hugang
2003-05-01 11:45         ` Willy TARREAU
     [not found] <87d6j34jad.fsf@student.uni-tuebingen.de.suse.lists.linux.kernel>
     [not found] ` <Pine.LNX.4.44.0304301801210.20283-100000@home.transmeta.com.suse.lists.linux.kernel>
2003-05-01  1:46   ` Andi Kleen
2003-05-01  4:40     ` Linus Torvalds
2003-05-01 12:00       ` David S. Miller
2003-05-02  5:14         ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2003-05-01 13:31 linux
2003-05-01 17:15 Chuck Ebbert
2003-05-02  8:50 Chuck Ebbert
2003-05-02  9:37 ` Peter Osterlund
2003-05-02 10:04 Chuck Ebbert
2003-05-02 22:55 ` David S. Miller
2003-05-03  2:21 Chuck Ebbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030430071907.GA30999@alpha.home.local \
    --to=willy@w.ods.org \
    --cc=dphillips@sistina.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox