From: Christoffer Dall <christoffer.dall@linaro.org>
To: Shanker Donthineni <shankerd@codeaurora.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvmarm <kvmarm@lists.cs.columbia.edu>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH] arm64: KVM: Save two instructions in __guest_enter()
Date: Mon, 29 Aug 2016 20:13:55 +0200
Message-ID: <20160829181355.GB10162@cbox>
In-Reply-To: <1470791736-13949-1-git-send-email-shankerd@codeaurora.org>

On Tue, Aug 09, 2016 at 08:15:36PM -0500, Shanker Donthineni wrote:
> We are doing an unnecessary stack push/pop operation when restoring
> the guest registers x0-x18 in __guest_enter(). This patch saves the
> two instructions by using x18 as a base register. No need to store
> the vcpu context pointer in stack because it is redundant and not
> being used anywhere, the same information is available in tpidr_el2.
> 
> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
> ---
>  arch/arm64/kvm/hyp/entry.S | 66 ++++++++++++++++++++++------------------------
>  1 file changed, 32 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index ce9e5e5..d2e09a1 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -55,37 +55,32 @@
>   */
>  ENTRY(__guest_enter)
>  	// x0: vcpu
> -	// x1: host/guest context
> -	// x2-x18: clobbered by macros
> +	// x1: host context
> +	// x2-x17: clobbered by macros
> +	// x18: guest context
>  
>  	// Store the host regs
>  	save_callee_saved_regs x1
>  
> -	// Preserve vcpu & host_ctxt for use at exit time
> -	stp	x0, x1, [sp, #-16]!
> +	// Preserve the host_ctxt for use at exit time
> +	str	x1, [sp, #-16]!
>  
> -	add	x1, x0, #VCPU_CONTEXT
> +	add	x18, x0, #VCPU_CONTEXT
>  
> -	// Prepare x0-x1 for later restore by pushing them onto the stack
> -	ldp	x2, x3, [x1, #CPU_XREG_OFFSET(0)]
> -	stp	x2, x3, [sp, #-16]!
> +	// Restore guest regs x19-x29, lr
> +	restore_callee_saved_regs x18

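Couldn't moving this load here be bad for prefetching? The x19-x29, lr
slots are now read before the x0-x18 slots, so the addresses no longer
increase through the sequence.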

>  
> -	// x2-x18
> -	ldp	x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
> -	ldp	x4, x5,   [x1, #CPU_XREG_OFFSET(4)]
> -	ldp	x6, x7,   [x1, #CPU_XREG_OFFSET(6)]
> -	ldp	x8, x9,   [x1, #CPU_XREG_OFFSET(8)]
> -	ldp	x10, x11, [x1, #CPU_XREG_OFFSET(10)]
> -	ldp	x12, x13, [x1, #CPU_XREG_OFFSET(12)]
> -	ldp	x14, x15, [x1, #CPU_XREG_OFFSET(14)]
> -	ldp	x16, x17, [x1, #CPU_XREG_OFFSET(16)]
> -	ldr	x18,      [x1, #CPU_XREG_OFFSET(18)]
> -
> -	// x19-x29, lr
> -	restore_callee_saved_regs x1
> -
> -	// Last bits of the 64bit state
> -	ldp	x0, x1, [sp], #16
> +	// Restore guest regs x0-x18
> +	ldp	x0, x1,   [x18, #CPU_XREG_OFFSET(0)]
> +	ldp	x2, x3,   [x18, #CPU_XREG_OFFSET(2)]
> +	ldp	x4, x5,   [x18, #CPU_XREG_OFFSET(4)]
> +	ldp	x6, x7,   [x18, #CPU_XREG_OFFSET(6)]
> +	ldp	x8, x9,   [x18, #CPU_XREG_OFFSET(8)]
> +	ldp	x10, x11, [x18, #CPU_XREG_OFFSET(10)]
> +	ldp	x12, x13, [x18, #CPU_XREG_OFFSET(12)]
> +	ldp	x14, x15, [x18, #CPU_XREG_OFFSET(14)]
> +	ldp	x16, x17, [x18, #CPU_XREG_OFFSET(16)]
> +	ldr	x18,      [x18, #CPU_XREG_OFFSET(18)]
>  
>  	// Do not touch any register after this!
>  	eret
> @@ -100,6 +95,16 @@ ENTRY(__guest_exit)
>  
>  	add	x2, x0, #VCPU_CONTEXT
>  
> +	// Store the guest regs x19-x29, lr
> +	save_callee_saved_regs x2

Same question here (although with a different weight, as we were already
'jumping back' with the memory address in our store sequence).

If this is a real concern, a better approach would be to overwrite x0
with the vcpu context pointer, do two pairs of load/stores using x2,x3
for the vcpu x0-x3, and then proceed with the rest of the registers.
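
To illustrate that ordering, here is a rough, untested fragment for the
start of __guest_exit. It assumes x0 holds the vcpu pointer on entry, x1
holds the return code, x2 and x3 are free, and the vcpu x0-x3 were pushed
at trap time with x2,x3 on top of the stack, as the quoted hunk implies.
It only reuses the VCPU_CONTEXT, CPU_XREG_OFFSET and
save_callee_saved_regs names from the patch and is not a standalone
program.

```asm
	// Sketch only: x0 becomes the guest context pointer
	add	x0, x0, #VCPU_CONTEXT

	// Move the stacked vcpu x0-x3 into the context via x2,x3
	ldp	x2, x3, [sp], #16		// vcpu x2, x3
	stp	x2, x3,   [x0, #CPU_XREG_OFFSET(2)]
	ldp	x2, x3, [sp], #16		// vcpu x0, x1
	stp	x2, x3,   [x0, #CPU_XREG_OFFSET(0)]

	// Store the remaining guest regs in increasing address order
	stp	x4, x5,   [x0, #CPU_XREG_OFFSET(4)]
	stp	x6, x7,   [x0, #CPU_XREG_OFFSET(6)]
	stp	x8, x9,   [x0, #CPU_XREG_OFFSET(8)]
	stp	x10, x11, [x0, #CPU_XREG_OFFSET(10)]
	stp	x12, x13, [x0, #CPU_XREG_OFFSET(12)]
	stp	x14, x15, [x0, #CPU_XREG_OFFSET(14)]
	stp	x16, x17, [x0, #CPU_XREG_OFFSET(16)]
	str	x18,      [x0, #CPU_XREG_OFFSET(18)]

	// Store the guest regs x19-x29, lr last
	save_callee_saved_regs x0
```

The x2,x3 pair is still stored before x0,x1 because of the stack order,
but everything from x4 onwards walks the context upwards. Popping
host_ctxt and restoring the host registers would presumably follow as in
the patch, since x0 is not needed as the vcpu pointer after this point.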

> +
> +	// Retrieve the guest regs x0-x3 from the stack
> +	ldp	x21, x22, [sp], #16	// x2, x3
> +	ldp	x19, x20, [sp], #16	// x0, x1
> +
> +	// Store the guest regs x0-x18
> +	stp	x19, x20, [x2, #CPU_XREG_OFFSET(0)]
> +	stp	x21, x22, [x2, #CPU_XREG_OFFSET(2)]
>  	stp	x4, x5,   [x2, #CPU_XREG_OFFSET(4)]
>  	stp	x6, x7,   [x2, #CPU_XREG_OFFSET(6)]
>  	stp	x8, x9,   [x2, #CPU_XREG_OFFSET(8)]
> @@ -109,20 +114,13 @@ ENTRY(__guest_exit)
>  	stp	x16, x17, [x2, #CPU_XREG_OFFSET(16)]
>  	str	x18,      [x2, #CPU_XREG_OFFSET(18)]
>  
> -	ldp	x6, x7, [sp], #16	// x2, x3
> -	ldp	x4, x5, [sp], #16	// x0, x1
> +	// Restore the host_ctxt from the stack
> +	ldr	x2, [sp], #16
>  
> -	stp	x4, x5, [x2, #CPU_XREG_OFFSET(0)]
> -	stp	x6, x7, [x2, #CPU_XREG_OFFSET(2)]
> -
> -	save_callee_saved_regs x2
> -
> -	// Restore vcpu & host_ctxt from the stack
> -	// (preserving return code in x1)
> -	ldp	x0, x2, [sp], #16
>  	// Now restore the host regs
>  	restore_callee_saved_regs x2
>  
> +	// Preserving return code (x1)

nit: "preserve" is a strange word to choose to describe what you do here.

If you want to do what I suggested above, you could change the two
callers to pass the return code in x0 and the vcpu pointer in x1, and
then you can drop this instruction as well.

>  	mov	x0, x1
>  	ret
>  ENDPROC(__guest_exit)
> -- 

Thanks,
-Christoffer
