linux-fsdevel.vger.kernel.org archive mirror
From: Borislav Petkov <bp@amd64.org>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Wu Fengguang <fengguang.wu@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Jamie Lokier <jamie@shareable.org>,
	Roland Dreier <rdreier@cisco.com>,
	Al Viro <viro@ZenIV.linux.org.uk>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH 2/5] bitops: compile time optimization for hweight_long(CONSTANT)
Date: Thu, 4 Feb 2010 16:10:50 +0100
Message-ID: <20100204151050.GC32711@aftab>
In-Reply-To: <4B69D362.10608@zytor.com>

On Wed, Feb 03, 2010 at 11:49:54AM -0800, H. Peter Anvin wrote:
> On 02/03/2010 10:47 AM, Peter Zijlstra wrote:
> > On Wed, 2010-02-03 at 19:14 +0100, Borislav Petkov wrote:
> > 
> >> alternative("call hweightXX", "popcnt", X86_FEATURE_POPCNT)
> > 
> > Make sure to apply a 0xff bitmask to the popcnt r16 call for hweight8(),
> > and hweight64() needs a bit of magic for 32bit, but yes, something like
> > that ought to work nicely.
> > 
> 
> Arguably the "best" option is to have the alternative being a jump to an
> out-of-line stub which does the necessary parameter marshalling before
> calling a stub.  This technique is already used in a few other places.
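
The two quoted caveats (masking for hweight8() and the 32-bit handling of
hweight64()) can be sketched in portable user-space C. This is only an
illustration under assumptions: sketch_hweight32(), sketch_hweight8() and
sketch_hweight64() are hypothetical names, not kernel helpers, and the
software loop merely stands in for whatever a 32-bit popcnt would return.

```c
#include <stdint.h>

/* Stand-in for a 32-bit population count (hypothetical name). */
static unsigned int sketch_hweight32(uint32_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Mask to the low 8 bits so stale upper register bits are not counted. */
static unsigned int sketch_hweight8(uint32_t w)
{
	return sketch_hweight32(w & 0xff);
}

/* With only 32-bit counts available, sum the weights of both halves. */
static unsigned int sketch_hweight64(uint64_t w)
{
	return sketch_hweight32((uint32_t)w) +
	       sketch_hweight32((uint32_t)(w >> 32));
}
```

Without the mask, a caller passing a byte in a wider register would get
the upper bits counted as well; the split into halves is one possible
reading of the "bit of magic" needed on 32-bit.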

Ok, here's a first alpha prototype, completely untested. The asm
output looks ok, though. I've added separate 32-bit and 64-bit helpers in
order to dispense with the if-else tests. The hw-popcnt versions are the
opcodes for "popcnt %eax, %eax" and "popcnt %rax, %rax", respectively,
so %rAX has to be preloaded with the bitmask and the computed value
has to be retrieved from there afterwards. And yes, it doesn't look that
elegant, so I'm open to suggestions.

The good thing is, this should work on any toolchain since we don't rely
on the compiler to know about popcnt, and we're protected by the CPUID
flag so that the hw-popcnt version is used only on processors which
support it.
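
For readers who want to try the byte-encoded instruction outside the
kernel, here is a rough user-space sketch, not the patch itself. It
assumes GCC or Clang; all sketch_* names are hypothetical. The CPUID
check is done at run time through <cpuid.h> instead of the kernel's
alternatives patching, and on non-x86 builds the code simply falls back
to the software loop.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

/* Software fallback (hypothetical name): one loop step per set bit. */
static unsigned int sketch_sw_weight32(uint32_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* CPUID leaf 1, ECX bit 23 reports POPCNT support. */
static int sketch_cpu_has_popcnt(void)
{
#if SKETCH_X86
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ecx >> 23) & 1;
#else
	return 0;
#endif
}

static unsigned int sketch_weight32(uint32_t w)
{
#if SKETCH_X86
	if (sketch_cpu_has_popcnt()) {
		/*
		 * f3 0f b8 c0 encodes "popcnt %eax, %eax". The "+a"
		 * constraint preloads w into %eax and reads the result
		 * back from it; popcnt also writes the flags.
		 */
		__asm__ volatile(".byte 0xf3, 0x0f, 0xb8, 0xc0"
				 : "+a" (w) : : "cc");
		return w;
	}
#endif
	return sketch_sw_weight32(w);
}
```

The "+a" constraint makes the preload of %eax visible to the compiler,
which is one way to express the "has to be preloaded" requirement
without a separate mov statement.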

Please take a good look and let me know what you guys think.

Thanks.

--
 arch/x86/include/asm/bitops.h |    4 ++
 arch/x86/lib/Makefile         |    2 +-
 arch/x86/lib/popcnt.c         |   62 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/lib/popcnt.c

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 02b47a6..deb5013 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -434,6 +434,10 @@ static inline int fls(int x)
 #endif
 	return r + 1;
 }
+
+
+extern int arch_hweight_long(unsigned long);
+
 #endif /* __KERNEL__ */
 
 #undef ADDR
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index cffd754..c03fe2d 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -22,7 +22,7 @@ lib-y += usercopy_$(BITS).o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_KPROBES) += insn.o inat.o
 
-obj-y += msr.o msr-reg.o msr-reg-export.o
+obj-y += msr.o msr-reg.o msr-reg-export.o popcnt.o
 
 ifeq ($(CONFIG_X86_32),y)
         obj-y += atomic64_32.o
diff --git a/arch/x86/lib/popcnt.c b/arch/x86/lib/popcnt.c
new file mode 100644
index 0000000..179a6e8
--- /dev/null
+++ b/arch/x86/lib/popcnt.c
@@ -0,0 +1,62 @@
+#include <linux/kernel.h>
+#include <linux/bitops.h>
+
+int _hweight32(void)
+{
+	unsigned long w;
+
+	asm volatile("" : "=a" (w));
+
+	return hweight32(w);
+}
+
+int _hweight64(void)
+{
+	unsigned long w;
+
+	asm volatile("" : "=a" (w));
+
+	return hweight64(w);
+}
+
+int _popcnt32(void)
+{
+
+	unsigned long w;
+
+	asm volatile(".byte 0xf3\n\t.byte 0x0f\n\t.byte 0xb8\n\t.byte 0xc0\n\t"
+			: "=a" (w));
+
+	return w;
+}
+
+int _popcnt64(void)
+{
+
+	unsigned long w;
+
+	asm volatile(".byte 0xf3\n\t.byte 0x48\n\t.byte 0x0f\n\t"
+		     ".byte 0xb8\n\t.byte 0xc0\n\t"
+		     : "=a" (w));
+
+	return w;
+}
+
+int arch_hweight_long(unsigned long w)
+{
+	if (sizeof(w) == 4) {
+		asm volatile("movl %[w], %%eax" :: [w] "r" (w));
+		alternative("call _hweight32",
+			    "call _popcnt32",
+			    X86_FEATURE_POPCNT);
+		asm volatile("" : "=a" (w));
+
+	} else {
+		asm volatile("movq %[w], %%rax" :: [w] "r" (w));
+		alternative("call _hweight64",
+			    "call _popcnt64",
+			    X86_FEATURE_POPCNT);
+		asm volatile("" : "=a" (w));
+	}
+	return w;
+}
-- 
1.6.6

-- 
Regards/Gruss,
Boris.

--
Advanced Micro Devices, Inc.
Operating Systems Research Center
