linux-alpha.vger.kernel.org archive mirror
* [PATCH] alpha: simplify and optimize sched_find_first_bit
@ 2010-04-08 18:34 mattst88
  2010-04-08 18:52 ` Richard Henderson
       [not found] ` <4BBE2B5C.6020802@twiddle.net>
  0 siblings, 2 replies; 5+ messages in thread
From: mattst88 @ 2010-04-08 18:34 UTC
  To: linux-alpha
  Cc: Matt Turner, Peter Zijlstra, Ingo Molnar, Richard Henderson,
	Ivan Kokshaysky

From: Matt Turner <mattst88@gmail.com>

Search only the first 100 bits instead of 140, saving a couple
instructions. Also use inline assembly to use cmov instructions instead
of letting gcc emit multiple branches. The resulting code is about 6x
faster.

Thanks to Zack Weinberg and Uros Bizjak (GCC Bug 43691) for helping me
identify problems with the inline assembly.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: linux-alpha@vger.kernel.org
Signed-off-by: Matt Turner <mattst88@gmail.com>
---
 arch/alpha/include/asm/bitops.h |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index 15f3ae2..cfa7526 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -436,22 +436,22 @@ static inline unsigned int hweight8(unsigned int w)
 
 /*
  * Every architecture must define this function. It's the fastest
- * way of searching a 140-bit bitmap where the first 100 bits are
- * unlikely to be set. It's guaranteed that at least one of the 140
- * bits is set.
+ * way of searching a 100-bit bitmap.  It's guaranteed that at least
+ * one of the 100 bits is cleared.
  */
 static inline unsigned long
-sched_find_first_bit(unsigned long b[3])
+sched_find_first_bit(const unsigned long b[2])
 {
-	unsigned long b0 = b[0], b1 = b[1], b2 = b[2];
 	unsigned long ofs;
-
-	ofs = (b1 ? 64 : 128);
-	b1 = (b1 ? b1 : b2);
-	ofs = (b0 ? 0 : ofs);
-	b0 = (b0 ? b0 : b1);
-
-	return __ffs(b0) + ofs;
+	unsigned long output;
+	asm(
+		"cmoveq %0,64,%1        # ofs = (b[0] ? ofs : 64);\n"
+		"cmoveq %0,%2,%0        # temp = (b[0] ? b[0] : b[1]);\n"
+		"cttz   %0,%0           # output = cttz(temp);\n "
+		: "=r" (output), "=r" (ofs)
+		: "r" (b[1]), "0" (b[0]), "1" (0)
+	);
+	return output + ofs;
 }
 
 #include <asm-generic/bitops/ext2-non-atomic.h>
-- 
1.6.4.4



* Re: [PATCH] alpha: simplify and optimize sched_find_first_bit
  2010-04-08 18:34 mattst88
@ 2010-04-08 18:52 ` Richard Henderson
       [not found] ` <4BBE2B5C.6020802@twiddle.net>
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Henderson @ 2010-04-08 18:52 UTC
  To: mattst88; +Cc: linux-alpha, Peter Zijlstra, Ingo Molnar, Ivan Kokshaysky

On 04/08/2010 11:34 AM, mattst88@gmail.com wrote:
> -	return __ffs(b0) + ofs;
> +	unsigned long output;
> +	asm(
> +		"cmoveq %0,64,%1        # ofs = (b[0] ? ofs : 64);\n"
> +		"cmoveq %0,%2,%0        # temp = (b[0] ? b[0] : b[1]);\n"
> +		"cttz   %0,%0           # output = cttz(temp);\n "
> +		: "=r" (output), "=r" (ofs)
> +		: "r" (b[1]), "0" (b[0]), "1" (0)
> +	);
> +	return output + ofs;

NACK.

You need to move that cttz out of the asm as well, to continue
to support pre-ev67.  Ack if you adjust to use __ffs.


r~
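
The objection above is that cttz is not available before ev67, so the
bit search has to go through __ffs rather than being hard-coded in the
asm. As a rough illustration of what a software fallback can look like,
here is a minimal sketch that finds the lowest set bit with shifts and
masks only. The name ffs_fallback is hypothetical, and this is not the
kernel's actual pre-ev67 implementation.

```c
/*
 * Hypothetical portable fallback: index of the lowest set bit of a
 * nonzero 64-bit word, found by binary search over halves of the word.
 * The result is undefined for word == 0, as with a hardware cttz-based
 * __ffs that is only called on nonzero input.
 */
static unsigned long ffs_fallback(unsigned long long word)
{
	unsigned long n = 0;

	if ((word & 0xffffffffULL) == 0) { n += 32; word >>= 32; }
	if ((word & 0xffffULL) == 0)     { n += 16; word >>= 16; }
	if ((word & 0xffULL) == 0)       { n += 8;  word >>= 8; }
	if ((word & 0xfULL) == 0)        { n += 4;  word >>= 4; }
	if ((word & 0x3ULL) == 0)        { n += 2;  word >>= 2; }
	if ((word & 0x1ULL) == 0)        { n += 1; }
	return n;
}
```

Keeping the search behind a helper like this lets one function body
serve both CPUs with a count-trailing-zeros instruction and CPUs
without one.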


* Re: [PATCH] alpha: simplify and optimize sched_find_first_bit
       [not found] ` <4BBE2B5C.6020802@twiddle.net>
@ 2010-04-29  2:08   ` Matt Turner
  0 siblings, 0 replies; 5+ messages in thread
From: Matt Turner @ 2010-04-29  2:08 UTC
  To: Richard Henderson; +Cc: linux-alpha

[-- Attachment #1: Type: text/plain, Size: 2621 bytes --]

On Thu, Apr 8, 2010 at 3:15 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 04/08/2010 11:34 AM, mattst88@gmail.com wrote:
>> +     asm(
>> +             "cmoveq %0,64,%1        # ofs = (b[0] ? ofs : 64);\n"
>> +             "cmoveq %0,%2,%0        # temp = (b[0] ? b[0] : b[1]);\n"
>> +             "cttz   %0,%0           # output = cttz(temp);\n "
>> +             : "=r" (output), "=r" (ofs)
>> +             : "r" (b[1]), "0" (b[0]), "1" (0)
>
> I must say I'd also prefer a comment like
>
>        /* This is equivalent to
>                ofs = (b[0] ? 0 : 64);
>                tmp = (b[0] ? b[0] : b[1]);
>           but is a bit faster than what GCC would produce on its own.  */
>        asm("cmoveq %0,64,%1\n\tcmoveq %0,%2,%0"
>            : "=r"(output), "=r"(ofs)
>            : "r"(b[1]), "0"(b[0]), "1"(0));
>
> ... except that I can't see that it is, at least for mainline gcc.
>
> [anchor:~] cat z.c
> long foo(const unsigned long *b)
> {
>  unsigned long b0, b1, ofs, tmp;
>
>  b0 = b[0];
>  b1 = b[1];
>  ofs = (b0 ? 0 : 64);
>  tmp = (b0 ? b0 : b1);
>
>  /* tmp = __ffs(tmp); -- elided for clarity wrt ev5 vs ev67 */
>  return tmp + ofs;
> }
>
> -mcpu=ev5 -Os (to avoid nop padding):
>        ldq $2,0($16)
>        ldq $1,8($16)
>        lda $0,64($31)
>        cmovne $2,0,$0
>        cmovne $2,$2,$1
>        addq $0,$1,$0
>
> -mcpu=ev6 -Os:
>        ldq $0,0($16)
>        ldq $1,8($16)
>        cmovne $0,$0,$1
>        cmpeq $0,0,$0
>        sll $0,6,$0
>        addq $0,$1,$0
>
> I seem to recall that cmov is slightly more expensive on ev6,
> so gcc doesn't prefer it and came up with an equivalent using
> cmpeq+sll.
>
> If some previous version of gcc isn't so smart, I'm ok with
> continuing to use the asm.
>
>
> r~
>

So with your test program, the code generation results are:

4.3.4 -Os: good
4.3.4 -O1: bad
4.3.4 -O2: good
4.3.4 -O3: good

4.4.3 -Os: bad
4.4.3 -O1: bad
4.4.3 -O2: bad
4.4.3 -O3: good

4.5.0 -Os: good
4.5.0 -O1: bad
4.5.0 -O2: good
4.5.0 -O3: good

 o -O3 produces good code in all versions.
 o -O1 is bad in all versions.
 o -Os and -O2 regressed from 4.3.4 to 4.4.3,
   but are back to 4.3.4 quality as of 4.5.0.

All produced cmov instructions just as you said.

My patch doesn't help any of the bad cases and even causes some that
were good to produce worse code, so it's not useful. Does any of this
look like it should warrant a gcc bug report, Richard?

I'll send a patch that only updates sched_find_first_bit to search
the first 100 bits.

Thanks!
Matt
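
For reference, the -mcpu=ev6 listing quoted above derives the offset
with cmpeq followed by sll instead of a second conditional move. A
minimal C sketch of why the two forms agree; the function names are
illustrative only and do not appear in the thread.

```c
/* Conditional form, as written in the test program. */
static unsigned long ofs_select(unsigned long b0)
{
	return b0 ? 0 : 64;
}

/*
 * Compare-and-shift form, mirroring cmpeq + sll: the comparison
 * yields 0 or 1, and shifting left by 6 turns that into 0 or 64.
 */
static unsigned long ofs_shift(unsigned long b0)
{
	return (unsigned long)(b0 == 0) << 6;
}
```

Both functions return 64 when b0 is zero and 0 otherwise, so the
compiler is free to pick whichever sequence is cheaper on the target
CPU.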

[-- Attachment #2: test --]
[-- Type: application/octet-stream, Size: 9196 bytes --]

Test Program
#  define __kernel_cttz(x)      __builtin_ctzl(x)
unsigned long __ffs(unsigned long word)
{
 /* Whee.  EV67 can calculate it directly.  */
 return __kernel_cttz(word);
}

long foo(const unsigned long *b)
{
 unsigned long b0, b1, ofs, tmp;

 b0 = b[0];
 b1 = b[1];
 ofs = (b0 ? 0 : 64);
 tmp = (b0 ? b0 : b1);

 tmp = __ffs(tmp);
 return tmp + ofs;
}

# gcc-4.3.4 -Os -mcpu=ev67 -c z.c && objdump -d z.o

z.o:     file format elf64-alpha

Disassembly of section .text:

0000000000000000 <__ffs>:
   0:   60 06 f0 73     cttz    a0,v0
   4:   01 80 fa 6b     ret

0000000000000008 <foo>:
   8:   00 00 10 a4     ldq     v0,0(a0)
   c:   08 00 30 a4     ldq     t0,8(a0)
  10:   c1 04 00 44     cmovne  v0,v0,t0
  14:   a0 15 00 40     cmpeq   v0,0,v0
  18:   20 d7 00 48     sll     v0,0x6,v0
  1c:   61 06 e1 73     cttz    t0,t0
  20:   00 04 01 40     addq    v0,t0,v0
  24:   01 80 fa 6b     ret

# gcc-4.3.4 -O1 -mcpu=ev67 -c z.c && objdump -d z.o

z.o:     file format elf64-alpha

Disassembly of section .text:

0000000000000000 <__ffs>:
   0:   60 06 f0 73     cttz    a0,v0
   4:   01 80 fa 6b     ret

0000000000000008 <foo>:
   8:   00 00 bb 27     ldah    gp,0(t12)
   c:   00 00 bd 23     lda     gp,0(gp)
  10:   f0 ff de 23     lda     sp,-16(sp)
  14:   00 00 5e b7     stq     ra,0(sp)
  18:   08 00 3e b5     stq     s0,8(sp)
  1c:   01 04 f0 47     mov     a0,t0
  20:   00 00 10 a6     ldq     a0,0(a0)
  24:   08 00 21 a4     ldq     t0,8(t0)
  28:   a9 15 00 42     cmpeq   a0,0,s0
  2c:   29 d7 20 49     sll     s0,0x6,s0
  30:   90 04 01 46     cmoveq  a0,t0,a0
  34:   00 00 40 d3     bsr     ra,38 <foo+0x30>
  38:   00 04 20 41     addq    s0,v0,v0
  3c:   00 00 5e a7     ldq     ra,0(sp)
  40:   08 00 3e a5     ldq     s0,8(sp)
  44:   10 00 de 23     lda     sp,16(sp)
  48:   01 80 fa 6b     ret

# gcc-4.3.4 -O2 -mcpu=ev67 -c z.c && objdump -d z.o

z.o:     file format elf64-alpha

Disassembly of section .text:

0000000000000000 <__ffs>:
   0:   60 06 f0 73     cttz    a0,v0
   4:   01 80 fa 6b     ret
   8:   1f 04 ff 47     nop
   c:   00 00 fe 2f     unop

0000000000000010 <foo>:
  10:   00 00 10 a4     ldq     v0,0(a0)
  14:   08 00 30 a4     ldq     t0,8(a0)
  18:   c1 04 00 44     cmovne  v0,v0,t0
  1c:   a0 15 00 40     cmpeq   v0,0,v0
  20:   20 d7 00 48     sll     v0,0x6,v0
  24:   61 06 e1 73     cttz    t0,t0
  28:   00 04 01 40     addq    v0,t0,v0
  2c:   01 80 fa 6b     ret

# gcc-4.3.4 -O3 -mcpu=ev67 -c z.c && objdump -d z.o

z.o:     file format elf64-alpha

Disassembly of section .text:

0000000000000000 <__ffs>:
   0:   60 06 f0 73     cttz    a0,v0
   4:   01 80 fa 6b     ret
   8:   1f 04 ff 47     nop
   c:   00 00 fe 2f     unop

0000000000000010 <foo>:
  10:   00 00 10 a4     ldq     v0,0(a0)
  14:   08 00 30 a4     ldq     t0,8(a0)
  18:   c1 04 00 44     cmovne  v0,v0,t0
  1c:   a0 15 00 40     cmpeq   v0,0,v0
  20:   20 d7 00 48     sll     v0,0x6,v0
  24:   61 06 e1 73     cttz    t0,t0
  28:   00 04 01 40     addq    v0,t0,v0
  2c:   01 80 fa 6b     ret

# gcc-4.4.3 -Os -mcpu=ev67 -c z.c && objdump -d z.o

z.o:     file format elf64-alpha

Disassembly of section .text:

0000000000000000 <__ffs>:
   0:   60 06 f0 73     cttz    a0,v0
   4:   01 80 fa 6b     ret

0000000000000008 <foo>:
   8:   00 00 bb 27     ldah    gp,0(t12)
   c:   00 00 bd 23     lda     gp,0(gp)
  10:   f0 ff de 23     lda     sp,-16(sp)
  14:   08 00 30 a4     ldq     t0,8(a0)
  18:   08 00 3e b5     stq     s0,8(sp)
  1c:   00 00 30 a5     ldq     s0,0(a0)
  20:   00 00 5e b7     stq     ra,0(sp)
  24:   10 04 e1 47     mov     t0,a0
  28:   d0 04 29 45     cmovne  s0,s0,a0
  2c:   a9 15 20 41     cmpeq   s0,0,s0
  30:   29 d7 20 49     sll     s0,0x6,s0
  34:   00 00 40 d3     bsr     ra,38 <foo+0x30>
  38:   00 04 20 41     addq    s0,v0,v0
  3c:   00 00 5e a7     ldq     ra,0(sp)
  40:   08 00 3e a5     ldq     s0,8(sp)
  44:   10 00 de 23     lda     sp,16(sp)
  48:   01 80 fa 6b     ret

# gcc-4.4.3 -O1 -mcpu=ev67 -c z.c && objdump -d z.o

z.o:     file format elf64-alpha

Disassembly of section .text:

0000000000000000 <__ffs>:
   0:   60 06 f0 73     cttz    a0,v0
   4:   01 80 fa 6b     ret

0000000000000008 <foo>:
   8:   00 00 bb 27     ldah    gp,0(t12)
   c:   00 00 bd 23     lda     gp,0(gp)
  10:   f0 ff de 23     lda     sp,-16(sp)
  14:   00 00 5e b7     stq     ra,0(sp)
  18:   08 00 3e b5     stq     s0,8(sp)
  1c:   00 00 30 a4     ldq     t0,0(a0)
  20:   08 00 10 a6     ldq     a0,8(a0)
  24:   a9 15 20 40     cmpeq   t0,0,s0
  28:   29 d7 20 49     sll     s0,0x6,s0
  2c:   d0 04 21 44     cmovne  t0,t0,a0
  30:   00 00 40 d3     bsr     ra,34 <foo+0x2c>
  34:   00 04 20 41     addq    s0,v0,v0
  38:   00 00 5e a7     ldq     ra,0(sp)
  3c:   08 00 3e a5     ldq     s0,8(sp)
  40:   10 00 de 23     lda     sp,16(sp)
  44:   01 80 fa 6b     ret

# gcc-4.4.3 -O2 -mcpu=ev67 -c z.c && objdump -d z.o

z.o:     file format elf64-alpha

Disassembly of section .text:

0000000000000000 <__ffs>:
   0:   60 06 f0 73     cttz    a0,v0
   4:   01 80 fa 6b     ret
   8:   1f 04 ff 47     nop
   c:   00 00 fe 2f     unop

0000000000000010 <foo>:
  10:   00 00 bb 27     ldah    gp,0(t12)
  14:   00 00 bd 23     lda     gp,0(gp)
  18:   f0 ff de 23     lda     sp,-16(sp)
  1c:   08 00 30 a4     ldq     t0,8(a0)
  20:   08 00 3e b5     stq     s0,8(sp)
  24:   00 00 30 a5     ldq     s0,0(a0)
  28:   00 00 5e b7     stq     ra,0(sp)
  2c:   10 04 e1 47     mov     t0,a0
  30:   d0 04 29 45     cmovne  s0,s0,a0
  34:   a9 15 20 41     cmpeq   s0,0,s0
  38:   29 d7 20 49     sll     s0,0x6,s0
  3c:   00 00 40 d3     bsr     ra,40 <foo+0x30>
  40:   00 04 20 41     addq    s0,v0,v0
  44:   00 00 5e a7     ldq     ra,0(sp)
  48:   08 00 3e a5     ldq     s0,8(sp)
  4c:   10 00 de 23     lda     sp,16(sp)
  50:   01 80 fa 6b     ret
  54:   00 00 fe 2f     unop
  58:   1f 04 ff 47     nop
  5c:   00 00 fe 2f     unop

# gcc-4.4.3 -O3 -mcpu=ev67 -c z.c && objdump -d z.o

z.o:     file format elf64-alpha

Disassembly of section .text:

0000000000000000 <__ffs>:
   0:   60 06 f0 73     cttz    a0,v0
   4:   01 80 fa 6b     ret
   8:   1f 04 ff 47     nop
   c:   00 00 fe 2f     unop

0000000000000010 <foo>:
  10:   00 00 30 a4     ldq     t0,0(a0)
  14:   08 00 10 a4     ldq     v0,8(a0)
  18:   c0 04 21 44     cmovne  t0,t0,v0
  1c:   a1 15 20 40     cmpeq   t0,0,t0
  20:   21 d7 20 48     sll     t0,0x6,t0
  24:   60 06 e0 73     cttz    v0,v0
  28:   00 04 01 40     addq    v0,t0,v0
  2c:   01 80 fa 6b     ret

# gcc-4.5.0 -Os -mcpu=ev67 -c z.c && objdump -d z.o

z.o:     file format elf64-alpha

Disassembly of section .text:

0000000000000000 <__ffs>:
   0:   60 06 f0 73     cttz    a0,v0
   4:   01 80 fa 6b     ret

0000000000000008 <foo>:
   8:   00 00 30 a4     ldq     t0,0(a0)
   c:   08 00 10 a4     ldq     v0,8(a0)
  10:   c0 04 21 44     cmovne  t0,t0,v0
  14:   a1 15 20 40     cmpeq   t0,0,t0
  18:   21 d7 20 48     sll     t0,0x6,t0
  1c:   60 06 e0 73     cttz    v0,v0
  20:   00 04 01 40     addq    v0,t0,v0
  24:   01 80 fa 6b     ret

# gcc-4.5.0 -O1 -mcpu=ev67 -c z.c && objdump -d z.o

z.o:     file format elf64-alpha

Disassembly of section .text:

0000000000000000 <__ffs>:
   0:   60 06 f0 73     cttz    a0,v0
   4:   01 80 fa 6b     ret

0000000000000008 <foo>:
   8:   00 00 bb 27     ldah    gp,0(t12)
   c:   00 00 bd 23     lda     gp,0(gp)
  10:   f0 ff de 23     lda     sp,-16(sp)
  14:   00 00 5e b7     stq     ra,0(sp)
  18:   08 00 3e b5     stq     s0,8(sp)
  1c:   00 00 30 a4     ldq     t0,0(a0)
  20:   08 00 10 a6     ldq     a0,8(a0)
  24:   a9 15 20 40     cmpeq   t0,0,s0
  28:   29 d7 20 49     sll     s0,0x6,s0
  2c:   d0 04 21 44     cmovne  t0,t0,a0
  30:   00 00 40 d3     bsr     ra,34 <foo+0x2c>
  34:   00 04 20 41     addq    s0,v0,v0
  38:   00 00 5e a7     ldq     ra,0(sp)
  3c:   08 00 3e a5     ldq     s0,8(sp)
  40:   10 00 de 23     lda     sp,16(sp)
  44:   01 80 fa 6b     ret

# gcc-4.5.0 -O2 -mcpu=ev67 -c z.c && objdump -d z.o

z.o:     file format elf64-alpha

Disassembly of section .text:

0000000000000000 <__ffs>:
   0:   60 06 f0 73     cttz    a0,v0
   4:   01 80 fa 6b     ret
   8:   1f 04 ff 47     nop
   c:   00 00 fe 2f     unop

0000000000000010 <foo>:
  10:   00 00 30 a4     ldq     t0,0(a0)
  14:   08 00 10 a4     ldq     v0,8(a0)
  18:   c0 04 21 44     cmovne  t0,t0,v0
  1c:   a1 15 20 40     cmpeq   t0,0,t0
  20:   21 d7 20 48     sll     t0,0x6,t0
  24:   60 06 e0 73     cttz    v0,v0
  28:   00 04 01 40     addq    v0,t0,v0
  2c:   01 80 fa 6b     ret

# gcc-4.5.0 -O3 -mcpu=ev67 -c z.c && objdump -d z.o

z.o:     file format elf64-alpha

Disassembly of section .text:

0000000000000000 <__ffs>:
   0:   60 06 f0 73     cttz    a0,v0
   4:   01 80 fa 6b     ret
   8:   1f 04 ff 47     nop
   c:   00 00 fe 2f     unop

0000000000000010 <foo>:
  10:   00 00 30 a4     ldq     t0,0(a0)
  14:   08 00 10 a4     ldq     v0,8(a0)
  18:   c0 04 21 44     cmovne  t0,t0,v0
  1c:   a1 15 20 40     cmpeq   t0,0,t0
  20:   21 d7 20 48     sll     t0,0x6,t0
  24:   60 06 e0 73     cttz    v0,v0
  28:   00 04 01 40     addq    v0,t0,v0
  2c:   01 80 fa 6b     ret


* [PATCH] alpha: simplify and optimize sched_find_first_bit
@ 2010-04-29  3:37 Matt Turner
  2010-04-29  3:56 ` Richard Henderson
  0 siblings, 1 reply; 5+ messages in thread
From: Matt Turner @ 2010-04-29  3:37 UTC
  To: linux-alpha
  Cc: Matt Turner, Peter Zijlstra, Ingo Molnar, Richard Henderson,
	Ivan Kokshaysky

Search only the first 100 bits instead of 140, saving a couple
instructions. The resulting code is about 1/3 faster (40K ticks/1000
iterations down to 30K ticks/1000 iterations).

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: linux-alpha@vger.kernel.org
Signed-off-by: Matt Turner <mattst88@gmail.com>
---
 arch/alpha/include/asm/bitops.h |   20 +++++++++-----------
 1 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index 15f3ae2..2e49c1f 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -436,22 +436,20 @@ static inline unsigned int hweight8(unsigned int w)
 
 /*
  * Every architecture must define this function. It's the fastest
- * way of searching a 140-bit bitmap where the first 100 bits are
- * unlikely to be set. It's guaranteed that at least one of the 140
- * bits is set.
+ * way of searching a 100-bit bitmap.  It's guaranteed that at least
+ * one of the 100 bits is cleared.
  */
 static inline unsigned long
-sched_find_first_bit(unsigned long b[3])
+sched_find_first_bit(const unsigned long b[2])
 {
-	unsigned long b0 = b[0], b1 = b[1], b2 = b[2];
-	unsigned long ofs;
+	unsigned long b0, b1, ofs, tmp;
 
-	ofs = (b1 ? 64 : 128);
-	b1 = (b1 ? b1 : b2);
-	ofs = (b0 ? 0 : ofs);
-	b0 = (b0 ? b0 : b1);
+	b0 = b[0];
+	b1 = b[1];
+	ofs = (b0 ? 0 : 64);
+	tmp = (b0 ? b0 : b1);
 
-	return __ffs(b0) + ofs;
+	return __ffs(tmp) + ofs;
 }
 
 #include <asm-generic/bitops/ext2-non-atomic.h>
-- 
1.6.5.3



* Re: [PATCH] alpha: simplify and optimize sched_find_first_bit
  2010-04-29  3:37 [PATCH] alpha: simplify and optimize sched_find_first_bit Matt Turner
@ 2010-04-29  3:56 ` Richard Henderson
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Henderson @ 2010-04-29  3:56 UTC
  To: Matt Turner; +Cc: linux-alpha, Peter Zijlstra, Ingo Molnar, Ivan Kokshaysky

On 04/28/2010 08:37 PM, Matt Turner wrote:
> Search only the first 100 bits instead of 140, saving a couple
> instructions. The resulting code is about 1/3 faster (40K ticks/1000
> iterations down to 30K ticks/1000 iterations).
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
> Cc: linux-alpha@vger.kernel.org
> Signed-off-by: Matt Turner <mattst88@gmail.com>

Acked-by: Richard Henderson <rth@twiddle.net>


r~
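
As a sanity check on the acked change, the sketch below places the old
three-word search next to the new two-word search in standalone form.
It assumes a GCC-compatible compiler for __builtin_ctzll and uses
uint64_t in place of Alpha's 64-bit unsigned long; the names
first_bit_140 and first_bit_100 are hypothetical and not from the
patch.

```c
#include <stdint.h>

/* Old logic: search three 64-bit words (bits 0..139 meaningful). */
static unsigned long first_bit_140(const uint64_t b[3])
{
	uint64_t b0 = b[0], b1 = b[1], b2 = b[2];
	unsigned long ofs;

	ofs = (b1 ? 64 : 128);
	b1 = (b1 ? b1 : b2);
	ofs = (b0 ? 0 : ofs);
	b0 = (b0 ? b0 : b1);

	return (unsigned long)__builtin_ctzll(b0) + ofs;
}

/* New logic: search two 64-bit words (bits 0..99 meaningful). */
static unsigned long first_bit_100(const uint64_t b[2])
{
	uint64_t b0 = b[0], b1 = b[1];
	unsigned long ofs = (b0 ? 0 : 64);
	uint64_t tmp = (b0 ? b0 : b1);

	return (unsigned long)__builtin_ctzll(tmp) + ofs;
}
```

Whenever some bit in the first two words is set, both functions return
the same index; the shorter version simply drops the selects that
handled the third word.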

