xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/alternatives: Force inline stac() and clac()
@ 2014-08-18 16:08 Andrew Cooper
  2014-08-18 20:40 ` Jan Beulich
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Cooper @ 2014-08-18 16:08 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

In this case, we know better than the compiler.

gcc 4.7 (Debian Wheezy) chooses to create translation-unit-local functions
(even for non-debug builds) named stac() and clac(), and calls them.

$ objdump -d xen-syms | grep -c "<stac>:"
6

$ objdump -d xen-syms | grep -o "callq  [0-9a-f]\+ <stac>" | uniq -c
      5 callq  ffff82d0801166c9 <stac>
     20 callq  ffff82d08015ef99 <stac>
      4 callq  ffff82d080165169 <stac>
      8 callq  ffff82d080188cb9 <stac>
      3 callq  ffff82d080228779 <stac>
      4 callq  ffff82d08022c5c9 <stac>

Forcing always_inline removes these functions, and replaces each of the callqs
with the expected 3byte nops.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/include/asm-x86/asm_defns.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/include/asm-x86/asm_defns.h b/xen/include/asm-x86/asm_defns.h
index f8ef1f8..1674c7c 100644
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -179,13 +179,13 @@ void ret_from_intr(void);
 #define ASM_STAC ASM_AC(STAC)
 #define ASM_CLAC ASM_AC(CLAC)
 #else
-static inline void clac(void)
+static always_inline void clac(void)
 {
     /* Note: a barrier is implicit in alternative() */
     alternative(ASM_NOP3, __stringify(__ASM_CLAC), X86_FEATURE_SMAP);
 }
 
-static inline void stac(void)
+static always_inline void stac(void)
 {
     /* Note: a barrier is implicit in alternative() */
     alternative(ASM_NOP3, __stringify(__ASM_STAC), X86_FEATURE_SMAP);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] x86/alternatives: Force inline stac() and clac()
  2014-08-18 16:08 [PATCH] x86/alternatives: Force inline stac() and clac() Andrew Cooper
@ 2014-08-18 20:40 ` Jan Beulich
  2014-08-18 23:21   ` Andrew Cooper
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Beulich @ 2014-08-18 20:40 UTC (permalink / raw)
  To: andrew.cooper3, xen-devel

>>> Andrew Cooper <andrew.cooper3@citrix.com> 08/18/14 6:16 PM >>>
>In this case, we know better than the compiler.
>
>gcc 4.7 (Debian Wheezy) chooses to create translation-unit-local functions
>(even for non-debug builds) named stac() and clac(), and calls them.
>
>$ objdump -d xen-syms | grep -c "<stac>:"
>6
>
>$ objdump -d xen-syms | grep -o "callq  [0-9a-f]\+ <stac>" | uniq -c
      >5 callq  ffff82d0801166c9 <stac>
     >20 callq  ffff82d08015ef99 <stac>
      >4 callq  ffff82d080165169 <stac>
      >8 callq  ffff82d080188cb9 <stac>
      >3 callq  ffff82d080228779 <stac>
      >4 callq  ffff82d08022c5c9 <stac>
>
>Forcing always_inline removes these functions, and replaces each of the callqs
>with the expected 3byte nops.

I'm fine putting the patch in, but isn't this a compiler bug? Creating a 5-byte
call instruction instead of a 3-byte inline expansion should even preclude out
of line placement under -Os (which otherwise is the most likely reason for
functions not getting inlined).

Jan

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] x86/alternatives: Force inline stac() and clac()
  2014-08-18 20:40 ` Jan Beulich
@ 2014-08-18 23:21   ` Andrew Cooper
  0 siblings, 0 replies; 3+ messages in thread
From: Andrew Cooper @ 2014-08-18 23:21 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 18/08/2014 21:40, Jan Beulich wrote:
>>>> Andrew Cooper <andrew.cooper3@citrix.com> 08/18/14 6:16 PM >>>
>> In this case, we know better than the compiler.
>>
>> gcc 4.7 (Debian Wheezy) chooses to create translation-unit-local functions
>> (even for non-debug builds) named stac() and clac(), and calls them.
>>
>> $ objdump -d xen-syms | grep -c "<stac>:"
>> 6
>>
>> $ objdump -d xen-syms | grep -o "callq  [0-9a-f]\+ <stac>" | uniq -c
>       >5 callq  ffff82d0801166c9 <stac>
>      >20 callq  ffff82d08015ef99 <stac>
>       >4 callq  ffff82d080165169 <stac>
>       >8 callq  ffff82d080188cb9 <stac>
>       >3 callq  ffff82d080228779 <stac>
>       >4 callq  ffff82d08022c5c9 <stac>
>> Forcing always_inline removes these functions, and replaces each of the callqs
>> with the expected 3byte nops.
> I'm fine putting the patch in, but isn't this a compiler bug? Creating a 5-byte
> call instruction instead of a 3-byte inline expansion should even preclude outdoe
> of line placement under -Os (which otherwise is the most likely reason for
> functions not getting inlined).

I am not sure.  A static inline function in a header file is never a
guarantee for forcing the inlining the function, which is why
always_inline exists.

The ALTERNATIVE() macro does contain three push/pop sections, and the
alternative() macro contains a memory clobber.  It is entirely possible
that gcc has decided early on that abstracting this as a local function
is the easiest automated way to deal with the potential side effects.

It might even be rather more clever, and deciding to optimise fewer
entries being placed in the .altinstructions section, which is an
optimisation we specifically wish to avoid.

Either way, this is a grey area, and I wouldn't say for certain that
this is a compiler bug, but certainly an outcome which we would wish to
avoid.

~Andrew

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2014-08-18 23:21 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-18 16:08 [PATCH] x86/alternatives: Force inline stac() and clac() Andrew Cooper
2014-08-18 20:40 ` Jan Beulich
2014-08-18 23:21   ` Andrew Cooper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).