From: Ingo Molnar <mingo@kernel.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	x86@kernel.org, Rusty Russell <rusty@rustcorp.com.au>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 1/2] x86/bitops: implement __test_bit
Date: Mon, 31 Aug 2015 10:15:26 +0200
Message-ID: <20150831081526.GB9974@gmail.com>
In-Reply-To: <20150831075947.GA9974@gmail.com>


* Ingo Molnar <mingo@kernel.org> wrote:

> > Disassembly of section .text:
> > 
> > 00000000 <__variable_test_bit>:
> > __variable_test_bit():
> >    0:   8b 54 24 08             mov    0x8(%esp),%edx
> >    4:   8b 44 24 04             mov    0x4(%esp),%eax
> >    8:   0f a3 02                bt     %eax,(%edx)
> >    b:   19 c0                   sbb    %eax,%eax
> >    d:   c3                      ret    
> >    e:   66 90                   xchg   %ax,%ax
> > 
> > 00000010 <__constant_test_bit>:
> > __constant_test_bit():
> >   10:   8b 4c 24 04             mov    0x4(%esp),%ecx
> >   14:   8b 44 24 08             mov    0x8(%esp),%eax
> >   18:   89 ca                   mov    %ecx,%edx
> >   1a:   c1 fa 04                sar    $0x4,%edx
> >   1d:   8b 04 90                mov    (%eax,%edx,4),%eax
> >   20:   d3 e8                   shr    %cl,%eax
> >   22:   83 e0 01                and    $0x1,%eax
> >   25:   c3                      ret    
> 
> But that's due to the forced interface of generating a return code. Please 
> compare it at an inlined usage site, where GCC is free to do the comparison 
> directly and use the result in flags.

So I was thinking about the patch below on top of yours.

But it turns out that GCC indeed generates worse code even under the best of 
circumstances. For example, the nested_vmx_disable_intercept_for_msr() change:

@@ -4275,24 +4275,24 @@ static void nested_vmx_disable_intercept
         */
        if (msr <= 0x1fff) {
                if (type & MSR_TYPE_R &&
-                  !test_bit(msr, msr_bitmap_l1 + 0x000 / f))
+                  !__test_bit(msr, msr_bitmap_l1 + 0x000 / f))
                        /* read-low */
                        __clear_bit(msr, msr_bitmap_nested + 0x000 / f);

before (i.e. your series):

ffffffff818b1082:       89 d0                   mov    %edx,%eax
ffffffff818b1084:       48 0f a3 07             bt     %rax,(%rdi)
ffffffff818b1088:       45 19 c0                sbb    %r8d,%r8d
ffffffff818b108b:       45 85 c0                test   %r8d,%r8d
ffffffff818b108e:       75 04                   jne    ffffffff818b1094 <nested_vmx_disable_intercept_for_msr+0x43>

after (with my 'optimization' patch applied):

ffffffff818b1091:       89 d0                   mov    %edx,%eax
ffffffff818b1093:       49 89 c0                mov    %rax,%r8
ffffffff818b1096:       49 c1 f8 06             sar    $0x6,%r8
ffffffff818b109a:       4e 8b 04 c7             mov    (%rdi,%r8,8),%r8
ffffffff818b109e:       49 0f a3 d0             bt     %rdx,%r8
ffffffff818b10a2:       72 04                   jb     ffffffff818b10a8 <nested_vmx_disable_intercept_for_msr+0x48>

So GCC, when left to its own devices, generates one more instruction and 5 more 
bytes (19 bytes versus 14). Why does GCC do that? Why doesn't it use BT directly 
and branch on the flag, i.e. something like [pseudocode]:

ffffffff818b1082:       89 d0                   mov    %edx,%eax
ffffffff818b1084:       48 0f a3 07             bt     %rax,(%rdi)
ffffffff818b1088:       72 0a                   jb     ffffffff818b1094 <nested_vmx_disable_intercept_for_msr+0x43>

?
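
For reference, here is a minimal user-space sketch of the two-step computation
that the 'after' listing above performs: select the word (the sar plus indexed
mov), then test the bit within that word. The names sketch_test_bit and
SKETCH_BITS_PER_LONG are hypothetical stand-ins for this illustration, not
kernel identifiers, and the sketch takes an unsigned bit number where the
kernel helper takes a long.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's BITS_PER_LONG (assumption). */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Plain C bit test, same shape as the C helper in the patch below:
 * pick the word that holds the bit, then mask the bit inside it.
 * Returns 1 if the bit is set, 0 otherwise.
 */
static int sketch_test_bit(unsigned long nr, const unsigned long *addr)
{
	/* Word index: the "sar $0x6" + "mov (%rdi,%r8,8)" pair on 64-bit. */
	unsigned long word = addr[nr / SKETCH_BITS_PER_LONG];
	/* Bit position inside that word. */
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);

	return (word & mask) != 0;
}
```

Note that this form returns 0 or 1, whereas the bt/sbb asm variant yields 0 or
-1. Callers that only check for non-zero see no difference.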

In any case I take back my objection:

  Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo


---
 arch/x86/include/asm/bitops.h |   19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

Index: tip/arch/x86/include/asm/bitops.h
===================================================================
--- tip.orig/arch/x86/include/asm/bitops.h
+++ tip/arch/x86/include/asm/bitops.h
@@ -323,24 +323,12 @@ static inline int variable_test_bit(long
 	return oldbit;
 }
 
-static __always_inline int __constant_test_bit(long nr, const unsigned long *addr)
+static __always_inline int __test_bit(long nr, const unsigned long *addr)
 {
 	return ((1UL << (nr & (BITS_PER_LONG-1))) &
 		(addr[nr >> _BITOPS_LONG_SHIFT])) != 0;
 }
 
-static inline int __variable_test_bit(long nr, const unsigned long *addr)
-{
-	int oldbit;
-
-	asm volatile("bt %2,%1\n\t"
-		     "sbb %0,%0"
-		     : "=r" (oldbit)
-		     : "m" (*addr), "Ir" (nr));
-
-	return oldbit;
-}
-
 #if 0 /* Fool kernel-doc since it doesn't do macros yet */
 /**
  * test_bit - Determine whether a bit is set
@@ -362,11 +350,6 @@ static int __test_bit(int nr, const vola
 	 ? constant_test_bit((nr), (addr))	\
 	 : variable_test_bit((nr), (addr)))
 
-#define __test_bit(nr, addr)			\
-	(__builtin_constant_p((nr))		\
-	 ? __constant_test_bit((nr), (addr))	\
-	 : __variable_test_bit((nr), (addr)))
-
 /**
  * __ffs - find first set bit in word
  * @word: The word to search


Thread overview: 15+ messages
2015-08-30  8:38 [PATCH 1/2] x86/bitops: implement __test_bit Michael S. Tsirkin
2015-08-30  8:38 ` [PATCH 2/2] kvm/x86: use __test_bit Michael S. Tsirkin
2015-08-31  6:05 ` [PATCH 1/2] x86/bitops: implement __test_bit Ingo Molnar
2015-08-31  6:13   ` H. Peter Anvin
2015-08-31  7:56     ` Michael S. Tsirkin
2015-08-31  7:59       ` Ingo Molnar
2015-08-31  8:15         ` yalin wang
2015-08-31  8:19           ` Ingo Molnar
2015-08-31  8:15         ` Ingo Molnar [this message]
2015-08-31 11:19         ` Michael S. Tsirkin
2015-09-01  9:24           ` Ingo Molnar
2015-09-01  9:40             ` Michael S. Tsirkin
2015-09-01 11:39               ` Ingo Molnar
2015-09-01 15:03                 ` Michael S. Tsirkin
2015-09-01 23:48                   ` H. Peter Anvin
