From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH] x86/alternatives: Force inline stac() and clac()
Date: Tue, 19 Aug 2014 00:21:12 +0100
Message-ID: <53F28A68.9060704@citrix.com>
References: <1408378137-16138-1-git-send-email-andrew.cooper3@citrix.com> <53F272B502000078000BABA7@mail.emea.novell.com>
In-Reply-To: <53F272B502000078000BABA7@mail.emea.novell.com>
To: Jan Beulich, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 18/08/2014 21:40, Jan Beulich wrote:
>>>> Andrew Cooper 08/18/14 6:16 PM >>>
>> In this case, we know better than the compiler.
>>
>> gcc 4.7 (Debian Wheezy) chooses to create translation-unit-local functions
>> (even for non-debug builds) named stac() and clac(), and calls them.
>>
>> $ objdump -d xen-syms | grep -c ":"
>> 6
>>
>> $ objdump -d xen-syms | grep -o "callq [0-9a-f]\+ " | uniq -c
>>       5 callq ffff82d0801166c9
>>      20 callq ffff82d08015ef99
>>       4 callq ffff82d080165169
>>       8 callq ffff82d080188cb9
>>       3 callq ffff82d080228779
>>       4 callq ffff82d08022c5c9
>>
>> Forcing always_inline removes these functions, and replaces each of the callqs
>> with the expected 3byte nops.
> I'm fine putting the patch in, but isn't this a compiler bug? Creating a 5-byte
> call instruction instead of a 3-byte inline expansion should even preclude out
> of line placement under -Os (which otherwise is the most likely reason for
> functions not getting inlined).

I am not sure. A static inline function in a header file is never a
guarantee that the function will actually be inlined, which is why
always_inline exists.

The ALTERNATIVE() macro does contain three push/pop sections, and the
alternative() macro contains a memory clobber.
It is entirely possible that gcc has decided early on that abstracting
this as a local function is the easiest automated way to deal with the
potential side effects. It might even be rather more clever, and be
deciding to reduce the number of entries placed in the .altinstructions
section, which is an optimisation we specifically wish to avoid.

Either way, this is a grey area. I wouldn't say for certain that this is
a compiler bug, but it is certainly an outcome which we would wish to
avoid.

~Andrew
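
For illustration, the difference between the inline hint and the attribute can be sketched as below. This is a minimal sketch assuming GCC or Clang. The names hinted_barrier(), forced_barrier(), nr_calls and demo() are hypothetical and are not Xen's stac()/clac(), and the always_inline macro is written out here as an assumption rather than copied from Xen's headers.

```c
#include <assert.h>

/* Assumed definition for this sketch: the inline keyword plus the attribute. */
#define always_inline __inline__ __attribute__((__always_inline__))

/* Hypothetical counter, present only so that the helpers have a visible effect. */
static unsigned int nr_calls;

/*
 * Plain "static inline" is only a hint. The compiler may still emit a
 * translation-unit-local copy of this function and call it.
 */
static __inline__ void hinted_barrier(void)
{
    __asm__ __volatile__ ("" ::: "memory"); /* empty asm with a memory clobber */
    nr_calls++;
}

/*
 * With always_inline, GCC and Clang expand the body at every call site,
 * or report an error if they cannot.
 */
static always_inline void forced_barrier(void)
{
    __asm__ __volatile__ ("" ::: "memory"); /* same body as above */
    nr_calls++;
}

/* Calls each helper once and returns the number of calls made. */
unsigned int demo(void)
{
    nr_calls = 0;
    hinted_barrier();
    forced_barrier();
    assert(nr_calls == 2);
    return nr_calls;
}
```

Disassembling an object built from this (for example with objdump -d) is one way to check whether hinted_barrier() survives as a separate local function at a given optimisation level. forced_barrier() should not appear as a separate function.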