[PATCH 1/3] X86: Optimise fls(), ffs() and fls64()

linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: David Howells <dhowells@redhat.com>
To: torvalds@osdl.org, mingo@elte.hu, tglx@linutronix.de
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Howells <dhowells@redhat.com>
Subject: [PATCH 1/3] X86: Optimise fls(), ffs() and fls64()
Date: Fri, 26 Mar 2010 14:42:41 +0000	[thread overview]
Message-ID: <20100326144241.8583.95617.stgit@warthog.procyon.org.uk> (raw)

fls(N), ffs(N) and fls64(N) can be optimised on x86/x86_64.  Currently they
perform checks against N being 0 before invoking the BSR/BSF instruction, or
use a CMOV instruction afterwards.  Either the check involves a conditional
jump which we'd like to avoid, or a CMOV, which we'd also quite like to avoid.

Instead, we can make use of the fact that BSR/BSF doesn't modify its output
register if its input is 0.  By preloading the output with -1 and incrementing
the result, we achieve the desired result without the need for a conditional
check.

The 32-bit version of fls64() can also have all its conditional checks removed
by chaining BSRs with a subtraction in the middle.  However, this may be
suboptimal on CPUs for which BSR/BSF is really slow (i486 and below for
example).

Signed-off-by: David Howells <dhowells@redhat.com>
---

 arch/x86/include/asm/bitops.h |   79 ++++++++++++++++++++++++++---------------
 1 files changed, 50 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 02b47a6..a8e85ab 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -394,18 +394,12 @@ static inline unsigned long __fls(unsigned long word)
  */
 static inline int ffs(int x)
 {
-	int r;
-#ifdef CONFIG_X86_CMOV
-	asm("bsfl %1,%0\n\t"
-	    "cmovzl %2,%0"
-	    : "=r" (r) : "rm" (x), "r" (-1));
-#else
-	asm("bsfl %1,%0\n\t"
-	    "jnz 1f\n\t"
-	    "movl $-1,%0\n"
-	    "1:" : "=r" (r) : "rm" (x));
-#endif
-	return r + 1;
+	int bitpos = -1;
+
+	asm("bsfl %1,%0"
+	    : "+r" (bitpos)
+	    : "rm" (x));
+	return bitpos + 1;
 }
 
 /**
@@ -421,19 +415,52 @@ static inline int ffs(int x)
  */
 static inline int fls(int x)
 {
-	int r;
-#ifdef CONFIG_X86_CMOV
-	asm("bsrl %1,%0\n\t"
-	    "cmovzl %2,%0"
-	    : "=&r" (r) : "rm" (x), "rm" (-1));
+	int bitpos = -1;
+
+	asm("bsrl %1,%0"
+	    : "+r" (bitpos)
+	    : "rm" (x));
+	return bitpos + 1;
+}
+
+/**
+ * fls64 - find last set bit in a 64-bit word
+ * @x: the word to search
+ *
+ * This is defined in a similar way as the libc and compiler builtin
+ * ffsll, but returns the position of the most significant set bit.
+ *
+ * fls64(value) returns 0 if value is 0 or the position of the last
+ * set bit if value is nonzero. The last (most significant) bit is
+ * at position 64.
+ */
+#if BITS_PER_LONG == 32
+static __always_inline int fls64(__u64 x)
+{
+	__u32 h = x >> 32;
+	int bitpos = -1;
+
+	asm("bsrl	%1,%0	\n"
+	    "subl	%2,%0	\n"
+	    "bsrl	%3,%0	\n"
+	    : "+r" (bitpos)
+	    : "rm" (x), "i"(32), "rm" (h));
+
+	return bitpos + 33;
+}
+#elif BITS_PER_LONG == 64
+static __always_inline int fls64(__u64 x)
+{
+	long bitpos = -1;
+
+	asm("bsrq %1,%0"
+	    : "+r" (bitpos)
+	    : "rm" (x));
+	return bitpos + 1;
+}
 #else
-	asm("bsrl %1,%0\n\t"
-	    "jnz 1f\n\t"
-	    "movl $-1,%0\n"
-	    "1:" : "=r" (r) : "rm" (x));
+#error BITS_PER_LONG not 32 or 64
 #endif
-	return r + 1;
-}
 #endif /* __KERNEL__ */
 
 #undef ADDR
@@ -446,12 +473,6 @@ static inline int fls(int x)
 
 #include <asm-generic/bitops/hweight.h>
 
-#endif /* __KERNEL__ */
-
-#include <asm-generic/bitops/fls64.h>
-
-#ifdef __KERNEL__
-
 #include <asm-generic/bitops/ext2-non-atomic.h>
 
 #define ext2_set_bit_atomic(lock, nr, addr)			\

next             reply	other threads:[~2010-03-26 14:43 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <4BACCB4E.7010108@draigBrady.com>
2010-03-26 14:42 ` David Howells [this message]
2010-03-26 14:42   ` [PATCH 2/3] Adjust the comment on get_order() to describe the size==0 case David Howells
2010-03-26 14:42   ` [PATCH 3/3] Optimise get_order() David Howells
2010-03-26 14:42     ` David Howells
2010-03-26 17:23   ` [PATCH 1/3] X86: Optimise fls(), ffs() and fls64() Linus Torvalds
2010-03-26 17:37     ` Scott Lurndal
2010-03-26 17:42       ` Linus Torvalds
2010-04-06 13:57         ` Jamie Lokier
2010-04-06 14:40           ` Linus Torvalds
2010-03-26 17:42   ` David Howells
2010-03-26 17:45     ` Linus Torvalds
2010-03-26 17:58       ` Ralf Baechle
2010-03-26 18:03         ` Linus Torvalds
2010-03-26 18:16           ` Matthew Wilcox
2010-04-06 13:30           ` Matthew Wilcox
2010-04-14 11:49           ` David Howells
2010-04-14 14:30             ` Avi Kivity
2010-04-15  8:48             ` David Howells
2010-04-15  8:49               ` Avi Kivity
2010-04-15 11:41                 ` Jamie Lokier
2010-03-26 17:52     ` Matthew Wilcox
2010-04-14 13:13   ` David Howells
2010-01-13 19:39 David Howells
2010-01-13 20:15 ` Geert Uytterhoeven
2010-01-13 21:59 ` David Howells

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:02b47a6 dfblob:a8e85ab )
 OR (
bs:"[PATCH 1/3] X86: Optimise fls(), ffs() and fls64()" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100326144241.8583.95617.stgit@warthog.procyon.org.uk \
    --to=dhowells@redhat.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).