From: Borislav Petkov <bp@amd64.org>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Wu Fengguang <fengguang.wu@intel.com>,
LKML <linux-kernel@vger.kernel.org>,
Jamie Lokier <jamie@shareable.org>,
Roland Dreier <rdreier@cisco.com>,
Al Viro <viro@ZenIV.linux.org.uk>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Ingo Molnar <mingo@elte.hu>, Brian Gerst <brgerst@gmail.com>
Subject: Re: [PATCH 2/5] bitops: compile time optimization for hweight_long(CONSTANT)
Date: Sat, 6 Feb 2010 10:36:59 +0100
Message-ID: <20100206093659.GA28326@aftab>
In-Reply-To: <4B6C93A2.1090302@zytor.com>
On Fri, Feb 05, 2010 at 01:54:42PM -0800, H. Peter Anvin wrote:
> On 02/05/2010 04:11 AM, Borislav Petkov wrote:
> > +
> > +unsigned int __arch_hweight16(unsigned int w)
> > +{
> > + unsigned int res = 0;
> > +
> > + asm volatile("xor %%dh, %%dh\n\t"
> > + __arch_hweight_alt(32)
> > + : "=di" (res)
> > + : "di" (w)
> > + : "ecx", "memory");
> > +
>
> This is wrong in more ways than I can shake a stick at.
Thanks for reviewing it though - how else would I learn :).
> a) "di" doesn't mean the DI register - it means the DX register (d) or
> an immediate (i). Since you don't have any reference to either %0 or %1
> in your code, you have no way of knowing which one it is. The
> constraint for the di register is "D".
Right.
> b) On 32 bits, the first argument register is in %eax (with %edx used
> for the upper half of a 32-bit argument), but on 64 bits, the first
> argument is in %rdi, with the return still in %rax.
Sure, it is right there in arch/x86/include/asm/calling.h. Shame on me.
> c) You call a C function, but you don't clobber the set of registers
> that a C function would clobber. You either need to put the function in
> an assembly wrapper (which is better in the long run), or clobber the
> full set of registers that is clobbered by a C function (which is better
> in the short term) -- which is eax, edx, ecx on 32 bits, but rax, rdi,
> esi, rdx, rcx, r8, r9, r10, r11 on 64 bits.
I think you mean rsi instead of esi here.
Well, the example Brian pointed me to - __mutex_fastpath_lock - lists
the full set of clobbered registers. Could you please elaborate on the
assembly wrapper for the function? Wouldn't I need to list all the
clobbered registers there too, or am I missing something?
> d) On the other hand, you do *not* need a "memory" clobber.
Right, all the inlines in this case are non-barrier-like, so no memory
clobber is needed, according to the comment above the alternative() macro.
Thanks.
--
Regards/Gruss,
Boris.
-
Advanced Micro Devices, Inc.
Operating Systems Research Center