From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH 1/4] VMX: streamline entry.S code
Date: Mon, 26 Aug 2013 14:22:39 +0100
Message-ID: <521B569F.1040408@citrix.com>
References: <521786A002000078000EE064@nat28.tlf.novell.com>
	<5217876402000078000EE077@nat28.tlf.novell.com>
	<521B3197.8060408@citrix.com>
	<521B51B702000078000EE53E@nat28.tlf.novell.com>
	<521B40A4.2090704@citrix.com>
	<521B704C02000078000EE661@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta14.messagelabs.com ([193.109.254.103])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <Andrew.Cooper3@citrix.com>) id 1VDwkk-0004tv-MZ
	for xen-devel@lists.xenproject.org; Mon, 26 Aug 2013 13:22:42 +0000
In-Reply-To: <521B704C02000078000EE661@nat28.tlf.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>, Keir Fraser <keir@xen.org>, Eddie Dong <eddie.dong@intel.com>, Jun Nakajima <jun.nakajima@intel.com>
List-Id: xen-devel@lists.xenproject.org

On 26/08/2013 14:12, Jan Beulich wrote:
>>>> On 26.08.13 at 13:48, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 26/08/2013 12:01, Jan Beulich wrote:
>>>>> -.globl vmx_asm_do_vmentry
>>>>> -vmx_asm_do_vmentry:
>>>> If you move the ENTRY(vmx_asm_do_vmentry) up from below, you should be
>>>> able to completely drop the jmp in it.
>>> That would be possible, at the expense of added padding. I prefer
>>> it the way it is now, as vmx_asm_do_vmentry is not performance
>>> critical (as being used exactly once per HVM vCPU).
>> There are a number of places where we have ENTRY()-like constructs but
>> don't want the padding with it.
>>
>> Would an __ENTRY() macro go down well?  I can spin a patch for it.
> x86 Linux has GLOBAL() for that purpose - I'd like this better than
> __ENTRY() both from a name space perspective and from
> describing its purpose.

Ok - I will spin a patch.

>
>> My point about re-executing it does still apply.  Looking at the code, I
>> do not believe it is correct to be executing vmx_intr_assist or
>> nvmx_switch_guest multiple times on a context switch to an HVM VCPU. 
>> vmx_intr_assist at the very least has a huge amount of work to do before
>> it considers exiting.
>>
>> It does appear that there is possible interaction between do_softirq()
>> and vmx_intr_assist(), at which point vmx_intr_assist() should be run
>> after do_softirq(), which removes the apparently redundant run with
>> interrupts enabled.
> None of this seems related to the patch anymore - if you think
> there's more stuff that needs changing, let's discuss this in a
> separate thread.

Certainly.

>
>>> The %cr2 write's move is indeed debatable - I tried to get it farther
>>> away from the producer of the data in %rax, but it's not clear
>>> whether that's very useful. The second purpose was to get
>>> something interleaved with the many "pop"s, so that the CPU can
>>> get busy other than just its memory load ports. If controversial
>>> I'm fine with undoing that change.
>> From my understanding of a serialising instruction, it forces the
>> completion of all previous instructions before starting, and prevents
>> the issue of any subsequent instructions until it itself has completed.
>>
>> Therefore, I doubt it has the intended effect.
> Wait - this is again also a separation from the producer of the
> data. Whether modern CPUs can deal with that I'm not sure,
> but it surely doesn't hurt to hide eventual latency.
>
> Jan
>

For non-serialising instructions, it is a good idea (and likely some a
compiler would anyway).  Moving the GET_CURRENT() will probably be quite
effective as most subsequent instructions depend on it.

Serialising instructions on the other hand will not be affected by these
issues (given their nature), but I would prefer to defer judgement to
someone who has a better idea of the microarchitectural implications.

Either way, as the concerns are now just down to playing with the
optimal static instruction scheduling,

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>