From: Ingo Molnar <mingo@kernel.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>,
x86@kernel.org, Rusty Russell <rusty@rustcorp.com.au>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 1/2] x86/bitops: implement __test_bit
Date: Mon, 31 Aug 2015 10:15:26 +0200
Message-ID: <20150831081526.GB9974@gmail.com>
In-Reply-To: <20150831075947.GA9974@gmail.com>
* Ingo Molnar <mingo@kernel.org> wrote:
> > Disassembly of section .text:
> >
> > 00000000 <__variable_test_bit>:
> > __variable_test_bit():
> > 0: 8b 54 24 08 mov 0x8(%esp),%edx
> > 4: 8b 44 24 04 mov 0x4(%esp),%eax
> > 8: 0f a3 02 bt %eax,(%edx)
> > b: 19 c0 sbb %eax,%eax
> > d: c3 ret
> > e: 66 90 xchg %ax,%ax
> >
> > 00000010 <__constant_test_bit>:
> > __constant_test_bit():
> > 10: 8b 4c 24 04 mov 0x4(%esp),%ecx
> > 14: 8b 44 24 08 mov 0x8(%esp),%eax
> > 18: 89 ca mov %ecx,%edx
> > 1a: c1 fa 04 sar $0x4,%edx
> > 1d: 8b 04 90 mov (%eax,%edx,4),%eax
> > 20: d3 e8 shr %cl,%eax
> > 22: 83 e0 01 and $0x1,%eax
> > 25: c3 ret
>
> But that's due to the forced interface of generating a return code. Please
> compare it at an inlined usage site, where GCC is free to do the comparison
> directly and use the result in flags.
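To make the return-code point concrete, here is a small user-space model of what the two quoted functions compute. It is not kernel code, and the model_* names are hypothetical. BT copies the selected bit into CF, and "sbb %eax,%eax" turns CF into 0 or -1. The shift-and-mask version yields 0 or 1 instead. Both values work as truth values, so an inlined caller that only branches on the result should not need the sbb at all.

```c
#include <limits.h>

/* Bits per unsigned long: 32 on i386 (as in the quoted listing), 64 on x86-64. */
#define MODEL_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/*
 * Models __constant_test_bit(): load the word holding bit nr, shift the
 * bit down to position 0 and mask it. Returns 0 or 1. Assumes nr >= 0.
 */
static inline int model_shift_mask_test_bit(long nr, const unsigned long *addr)
{
	unsigned long word = addr[nr / MODEL_BITS_PER_LONG];

	return (int)((word >> (nr % MODEL_BITS_PER_LONG)) & 1UL);
}

/*
 * Models __variable_test_bit(): "bt" puts the selected bit into CF, then
 * "sbb %eax,%eax" computes eax - eax - CF, i.e. 0 if the bit is clear and
 * -1 (all ones) if it is set. Assumes nr >= 0.
 */
static inline int model_bt_sbb_test_bit(long nr, const unsigned long *addr)
{
	int cf = model_shift_mask_test_bit(nr, addr);	/* value BT leaves in CF */

	return 0 - cf;	/* what sbb %reg,%reg produces from CF */
}
```

Because a set bit yields 1 from one variant and -1 from the other, callers can only rely on zero versus nonzero. They should never compare the result with 1.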
So I was thinking about the patch below on top of yours.
But it turns out that GCC indeed generates worse code, even under the best of
circumstances. For example, the nested_vmx_disable_intercept_for_msr() change:
@@ -4275,24 +4275,24 @@ static void nested_vmx_disable_intercept
*/
if (msr <= 0x1fff) {
if (type & MSR_TYPE_R &&
- !test_bit(msr, msr_bitmap_l1 + 0x000 / f))
+ !__test_bit(msr, msr_bitmap_l1 + 0x000 / f))
/* read-low */
__clear_bit(msr, msr_bitmap_nested + 0x000 / f);
before (i.e. your series):
ffffffff818b1082: 89 d0 mov %edx,%eax
ffffffff818b1084: 48 0f a3 07 bt %rax,(%rdi)
ffffffff818b1088: 45 19 c0 sbb %r8d,%r8d
ffffffff818b108b: 45 85 c0 test %r8d,%r8d
ffffffff818b108e: 75 04 jne ffffffff818b1094 <nested_vmx_disable_intercept_for_msr+0x43>
after (with my 'optimization' patch applied):
ffffffff818b1091: 89 d0 mov %edx,%eax
ffffffff818b1093: 49 89 c0 mov %rax,%r8
ffffffff818b1096: 49 c1 f8 06 sar $0x6,%r8
ffffffff818b109a: 4e 8b 04 c7 mov (%rdi,%r8,8),%r8
ffffffff818b109e: 49 0f a3 d0 bt %rdx,%r8
ffffffff818b10a2: 72 04 jb ffffffff818b10a8 <nested_vmx_disable_intercept_for_msr+0x48>
So GCC, when left to its own devices, generates one more instruction and 5 more
bytes (6 instructions in 19 bytes versus 5 instructions in 14 bytes). Why does GCC
do that? Why doesn't it use BT directly and branch on the carry flag it sets, i.e.
something like [pseudocode]:
ffffffff818b1082: 89 d0 mov %edx,%eax
ffffffff818b1084: 48 0f a3 07 bt %rax,(%rdi)
ffffffff818b1088: 72 0a jb ffffffff818b1094 <nested_vmx_disable_intercept_for_msr+0x43>
?
In any case, I take back my objection:
Acked-by: Ingo Molnar <mingo@kernel.org>
Thanks,
Ingo
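The read-low branch of the nested_vmx_disable_intercept_for_msr() hunk quoted above follows a simple pattern. Interception of an MSR is dropped for the nested guest only if L1 does not intercept it either. Below is a simplified user-space sketch of that pattern. The vmx_sketch_* names are hypothetical, and only the read-low bitmap (byte offset 0x000, MSRs 0x0000-0x1fff) is modelled. The bit helpers are plain C rather than the kernel's test_bit()/__clear_bit().

```c
#include <limits.h>
#include <stdbool.h>

#define VMX_SKETCH_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Hypothetical stand-in for test_bit(): true if bit nr is set. */
static inline bool vmx_sketch_test_bit(unsigned long nr, const unsigned long *addr)
{
	return (addr[nr / VMX_SKETCH_BITS_PER_LONG] >> (nr % VMX_SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Hypothetical stand-in for __clear_bit(): non-atomic clear of bit nr. */
static inline void vmx_sketch_clear_bit(unsigned long nr, unsigned long *addr)
{
	addr[nr / VMX_SKETCH_BITS_PER_LONG] &= ~(1UL << (nr % VMX_SKETCH_BITS_PER_LONG));
}

/*
 * Simplified read-low case: a set bit means "intercept reads of this MSR".
 * The nested bitmap bit is cleared only when L1's bit is clear too. Since
 * the helper is inlined, the compiler can branch on the tested bit
 * directly instead of materializing a return value.
 */
static inline void vmx_sketch_disable_read_intercept(unsigned long msr,
						     const unsigned long *msr_bitmap_l1,
						     unsigned long *msr_bitmap_nested)
{
	if (msr <= 0x1fff && !vmx_sketch_test_bit(msr, msr_bitmap_l1))
		vmx_sketch_clear_bit(msr, msr_bitmap_nested);
}
```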
---
arch/x86/include/asm/bitops.h | 19 +------------------
1 file changed, 1 insertion(+), 18 deletions(-)
Index: tip/arch/x86/include/asm/bitops.h
===================================================================
--- tip.orig/arch/x86/include/asm/bitops.h
+++ tip/arch/x86/include/asm/bitops.h
@@ -323,24 +323,12 @@ static inline int variable_test_bit(long
return oldbit;
}
-static __always_inline int __constant_test_bit(long nr, const unsigned long *addr)
+static __always_inline int __test_bit(long nr, const unsigned long *addr)
{
return ((1UL << (nr & (BITS_PER_LONG-1))) &
(addr[nr >> _BITOPS_LONG_SHIFT])) != 0;
}
-static inline int __variable_test_bit(long nr, const unsigned long *addr)
-{
- int oldbit;
-
- asm volatile("bt %2,%1\n\t"
- "sbb %0,%0"
- : "=r" (oldbit)
- : "m" (*addr), "Ir" (nr));
-
- return oldbit;
-}
-
#if 0 /* Fool kernel-doc since it doesn't do macros yet */
/**
* test_bit - Determine whether a bit is set
@@ -362,11 +350,6 @@ static int __test_bit(int nr, const vola
? constant_test_bit((nr), (addr)) \
: variable_test_bit((nr), (addr)))
-#define __test_bit(nr, addr) \
- (__builtin_constant_p((nr)) \
- ? __constant_test_bit((nr), (addr)) \
- : __variable_test_bit((nr), (addr)))
-
/**
* __ffs - find first set bit in word
* @word: The word to search
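With this patch applied, __test_bit() is a single plain C helper, and the compiler picks the instructions at each inlined call site. The same expression can be exercised in user space as sketched below. BITS_PER_LONG and _BITOPS_LONG_SHIFT normally come from kernel headers, so the C_* stand-ins here are assumptions derived from sizeof(unsigned long), and c_test_bit() is a hypothetical name.

```c
#include <limits.h>

/* Stand-ins (assumptions) for the kernel's BITS_PER_LONG and _BITOPS_LONG_SHIFT. */
#define C_BITS_PER_LONG		((long)(CHAR_BIT * sizeof(unsigned long)))
#define C_BITOPS_LONG_SHIFT	(C_BITS_PER_LONG == 64 ? 6 : 5)

/*
 * Same expression as the patched __test_bit(): pick the word with
 * nr >> shift, build a one-bit mask from the low bits of nr, and
 * return 0 or 1. Assumes nr >= 0.
 */
static inline int c_test_bit(long nr, const unsigned long *addr)
{
	return ((1UL << (nr & (C_BITS_PER_LONG - 1))) &
		(addr[nr >> C_BITOPS_LONG_SHIFT])) != 0;
}
```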