Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel@lists.xen.org, wei.liu2@citrix.com,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available
Date: Fri, 2 Mar 2018 19:34:54 +0000	[thread overview]
Message-ID: <c82e6bd5-c755-34be-92bd-593c46ebf4fd@citrix.com> (raw)
In-Reply-To: <5A99070D02000078001ADA01@prv-mh.provo.novell.com>

On 02/03/18 07:10, Jan Beulich wrote:
>>>> On 01.03.18 at 17:58, <andrew.cooper3@citrix.com> wrote:
>> On 01/03/18 10:54, Jan Beulich wrote:
>>>>>> On 01.03.18 at 11:36, <roger.pau@citrix.com> wrote:
>>>> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
>>>>>>>> Andrew Cooper <andrew.cooper3@citrix.com> 02/28/18 7:20 PM >>>
>>>>>> On 28/02/18 16:22, Jan Beulich wrote:
>>>>>>>>>> On 26.02.18 at 12:35, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> --- a/xen/include/asm-x86/alternative-asm.h
>>>>>>>> +++ b/xen/include/asm-x86/alternative-asm.h
>>>>>>>> @@ -1,6 +1,8 @@
>>>>>>>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>>>  
>>>>>>>> +#include <asm/nops.h>
>>>>>>>> +
>>>>>>>>  #ifdef __ASSEMBLY__
>>>>>>>>  
>>>>>>>>  /*
>>>>>>>> @@ -18,6 +20,14 @@
>>>>>>>>      .byte \pad_len
>>>>>>>>  .endm
>>>>>>>>  
>>>>>>>> +.macro mknops nr_bytes
>>>>>>>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>>>>>>>> +    .nop \nr_bytes, ASM_NOP_MAX
>>>>>>> And do you really need to specify ASM_NOP_MAX here? What's
>>>>>>> wrong with letting the assembler pick what it wants as the longest
>>>>>>> NOP?
>>>>>> I don't want a toolchain change to cause us to go beyond 11 byte nops,
>>>>>> because of the associated decode stall on almost all hardware.  Using
>>>>>> ASM_NOP_MAX seemed like the easiest way to keep the end result
>>>>>> consistent, irrespective of toolchain support.
>>>>> I don't understand - an earlier patch takes care of runtime replacing them
>>>>> anyway. What stalls can then result?
>>>> The runtime replacement won't happen when using the .nops directive
>>>> AFAICT, because the original padding section will likely be filled
>>>> with opcodes different than 0x90, and thus the runtime nop
>>>> optimization won't be performed.
>>> Oh, indeed. That puts under question the whole idea of using
>>> .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer
>>> to withdraw my ack.
>>>
>>>> I also agree that using the default (not proving a second argument)
>>>> seems like a better solution. Why would the toolstack switch to
>>>> something that leads to worse performance? That would certainly be
>>>> considered a bug.
>>> Why? They may change it based on data available for newer /
>>> older / whatever hardware. Any build-time choice is going to be
>>> suboptimal somewhere, so I think we absolutely should not
>>> bypass runtime replacing these NOPs, the more that now we
>>> may have quite large sequences of them.
>> The pont of having the toolchain put out optimised nops is to avoid the
>> need for us to patch the site at all.  I.e. calling optimise_nops() on a
>> set of toolchain nops defeats the purpose in the overwhelming common
>> case of running on a system which prefers P6 nops.
>>
>> The problem of working out when to optimise is that, when we come to
>> apply an individual alternative, we don't know if we've already patched
>> this site before.  Even the unoptimised algorithm has a corner case
>> which explodes, if there is a stream of 0x90's on the end of a
>> replacement e.g. in a imm or disp field.
>>
>> Put simply, we cannot determine, by peeking at the patchsite, whether it
>> has been patched or not (other than keeping a full copy of the origin
>> site as a reference).  As soon as we chose to optimise the nops of the
>> origin site, we cannot determine anything at all.
>>
>> Thinking out loud, we could perhaps have a section containing one byte
>> per origin site, which we use to track whether we've already optimised
>> the padding bytes, and whether the contents have been replaced.  This
>> would also add an extra long into struct altentry, but its all cold data
>> after boot.
> What about alternatively simply updating the struct alt_instr
> instances to describe the code _after_ a patch that was applied?
> That'll allow to always know how much padding there is.

There are multiple alt_instr pointing to the same origin sites when
using ALTERNATIVE_2, meaning you keep all the safety problems with the
current setup.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel