linux-alpha.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] alpha: simplify and optimize sched_find_first_bit
@ 2010-04-08 18:34 mattst88
  2010-04-08 18:52 ` Richard Henderson
       [not found] ` <4BBE2B5C.6020802@twiddle.net>
  0 siblings, 2 replies; 5+ messages in thread
From: mattst88 @ 2010-04-08 18:34 UTC (permalink / raw)
  To: linux-alpha
  Cc: Matt Turner, Peter Zijlstra, Ingo Molnar, Richard Henderson,
	Ivan Kokshaysky

From: Matt Turner <mattst88@gmail.com>

Search only the first 100 bits instead of 140, saving a couple
instructions. Also use inline assembly to use cmov instructions instead
of letting gcc emit multiple branches. The resulting code is about 6x
faster.

Thanks to Zack Weinberg and Uros Bizjak (GCC Bug 43691) for helping me
identify problems with the inline assembly.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: linux-alpha@vger.kernel.org
Signed-off-by: Matt Turner <mattst88@gmail.com>
---
 arch/alpha/include/asm/bitops.h |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index 15f3ae2..cfa7526 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -436,22 +436,22 @@ static inline unsigned int hweight8(unsigned int w)
 
 /*
  * Every architecture must define this function. It's the fastest
- * way of searching a 140-bit bitmap where the first 100 bits are
- * unlikely to be set. It's guaranteed that at least one of the 140
- * bits is set.
+ * way of searching a 100-bit bitmap.  It's guaranteed that at least
+ * one of the 100 bits is cleared.
  */
 static inline unsigned long
-sched_find_first_bit(unsigned long b[3])
+sched_find_first_bit(const unsigned long b[2])
 {
-	unsigned long b0 = b[0], b1 = b[1], b2 = b[2];
 	unsigned long ofs;
-
-	ofs = (b1 ? 64 : 128);
-	b1 = (b1 ? b1 : b2);
-	ofs = (b0 ? 0 : ofs);
-	b0 = (b0 ? b0 : b1);
-
-	return __ffs(b0) + ofs;
+	unsigned long output;
+	asm(
+		"cmoveq %0,64,%1        # ofs = (b[0] ? ofs : 64);\n"
+		"cmoveq %0,%2,%0        # temp = (b[0] ? b[0] : b[1]);\n"
+		"cttz   %0,%0           # output = cttz(temp);\n "
+		: "=r" (output), "=r" (ofs)
+		: "r" (b[1]), "0" (b[0]), "1" (0)
+	);
+	return output + ofs;
 }
 
 #include <asm-generic/bitops/ext2-non-atomic.h>
-- 
1.6.4.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread
* [PATCH] alpha: simplify and optimize sched_find_first_bit
@ 2010-04-29  3:37 Matt Turner
  2010-04-29  3:56 ` Richard Henderson
  0 siblings, 1 reply; 5+ messages in thread
From: Matt Turner @ 2010-04-29  3:37 UTC (permalink / raw)
  To: linux-alpha
  Cc: Matt Turner, Peter Zijlstra, Ingo Molnar, Richard Henderson,
	Ivan Kokshaysky

Search only the first 100 bits instead of 140, saving a couple
instructions. The resulting code is about 1/3 faster (40K ticks/1000
iterations down to 30K ticks/1000 iterations).

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: linux-alpha@vger.kernel.org
Signed-off-by: Matt Turner <mattst88@gmail.com>
---
 arch/alpha/include/asm/bitops.h |   20 +++++++++-----------
 1 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index 15f3ae2..2e49c1f 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -436,22 +436,20 @@ static inline unsigned int hweight8(unsigned int w)
 
 /*
  * Every architecture must define this function. It's the fastest
- * way of searching a 140-bit bitmap where the first 100 bits are
- * unlikely to be set. It's guaranteed that at least one of the 140
- * bits is set.
+ * way of searching a 100-bit bitmap.  It's guaranteed that at least
+ * one of the 100 bits is cleared.
  */
 static inline unsigned long
-sched_find_first_bit(unsigned long b[3])
+sched_find_first_bit(const unsigned long b[2])
 {
-	unsigned long b0 = b[0], b1 = b[1], b2 = b[2];
-	unsigned long ofs;
+	unsigned long b0, b1, ofs, tmp;
 
-	ofs = (b1 ? 64 : 128);
-	b1 = (b1 ? b1 : b2);
-	ofs = (b0 ? 0 : ofs);
-	b0 = (b0 ? b0 : b1);
+	b0 = b[0];
+	b1 = b[1];
+	ofs = (b0 ? 0 : 64);
+	tmp = (b0 ? b0 : b1);
 
-	return __ffs(b0) + ofs;
+	return __ffs(tmp) + ofs;
 }
 
 #include <asm-generic/bitops/ext2-non-atomic.h>
-- 
1.6.5.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-04-29  3:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-08 18:34 [PATCH] alpha: simplify and optimize sched_find_first_bit mattst88
2010-04-08 18:52 ` Richard Henderson
     [not found] ` <4BBE2B5C.6020802@twiddle.net>
2010-04-29  2:08   ` Matt Turner
  -- strict thread matches above, loose matches on Subject: below --
2010-04-29  3:37 Matt Turner
2010-04-29  3:56 ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).