From: Christoffer Dall <cdall@kernel.org>
To: Dave Martin <Dave.Martin@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: Re: [RFC PATCH v2 2/3] KVM: arm64: Convert lazy FPSIMD context switch trap to C
Date: Mon, 9 Apr 2018 11:44:33 +0200
Message-ID: <20180409094433.GA10904@cbox>
In-Reply-To: <20180406155151.GS16308@e103592.cambridge.arm.com>

On Fri, Apr 06, 2018 at 04:51:53PM +0100, Dave Martin wrote:
> On Fri, Apr 06, 2018 at 04:25:57PM +0100, Marc Zyngier wrote:
> > Hi Dave,
> > 
> > On 06/04/18 16:01, Dave Martin wrote:
> > > To make the lazy FPSIMD context switch trap code easier to hack on,
> > > this patch converts it to C.
> > > 
> > > This is not amazingly efficient, but the trap should typically only
> > > be taken once per host context switch.
> > > 
> > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > > 
> > > ---
> > > 
> > > Since RFCv1:
> > > 
> > >  * Fix indentation to be consistent with the rest of the file.
> > >  * Add missing ! to write back to sp with attempting to push regs.
> > > ---
> > >  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
> > >  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
> > >  2 files changed, 46 insertions(+), 35 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> > > index fdd1068..47c6a78 100644
> > > --- a/arch/arm64/kvm/hyp/entry.S
> > > +++ b/arch/arm64/kvm/hyp/entry.S
> > > @@ -176,41 +176,28 @@ ENTRY(__fpsimd_guest_restore)
> > >  	// x1: vcpu
> > >  	// x2-x29,lr: vcpu regs
> > >  	// vcpu x0-x1 on the stack
> > > -	stp	x2, x3, [sp, #-16]!
> > > -	stp	x4, lr, [sp, #-16]!
> > > -
> > > -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> > > -	mrs	x2, cptr_el2
> > > -	bic	x2, x2, #CPTR_EL2_TFP
> > > -	msr	cptr_el2, x2
> > > -alternative_else
> > > -	mrs	x2, cpacr_el1
> > > -	orr	x2, x2, #CPACR_EL1_FPEN
> > > -	msr	cpacr_el1, x2
> > > -alternative_endif
> > > -	isb
> > > -
> > > -	mov	x3, x1
> > > -
> > > -	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
> > > -	kern_hyp_va x0
> > > -	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > > -	bl	__fpsimd_save_state
> > > -
> > > -	add	x2, x3, #VCPU_CONTEXT
> > > -	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > > -	bl	__fpsimd_restore_state
> > > -
> > > -	// Skip restoring fpexc32 for AArch64 guests
> > > -	mrs	x1, hcr_el2
> > > -	tbnz	x1, #HCR_RW_SHIFT, 1f
> > > -	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
> > > -	msr	fpexc32_el2, x4
> > > -1:
> > > -	ldp	x4, lr, [sp], #16
> > > -	ldp	x2, x3, [sp], #16
> > > -	ldp	x0, x1, [sp], #16
> > > -
> > > +	stp	x2, x3, [sp, #-144]!
> > > +	stp	x4, x5, [sp, #16]
> > > +	stp	x6, x7, [sp, #32]
> > > +	stp	x8, x9, [sp, #48]
> > > +	stp	x10, x11, [sp, #64]
> > > +	stp	x12, x13, [sp, #80]
> > > +	stp	x14, x15, [sp, #96]
> > > +	stp	x16, x17, [sp, #112]
> > > +	stp	x18, lr, [sp, #128]
> > > +
> > > +	bl	__hyp_switch_fpsimd
> > > +
> > > +	ldp	x4, x5, [sp, #16]
> > > +	ldp	x6, x7, [sp, #32]
> > > +	ldp	x8, x9, [sp, #48]
> > > +	ldp	x10, x11, [sp, #64]
> > > +	ldp	x12, x13, [sp, #80]
> > > +	ldp	x14, x15, [sp, #96]
> > > +	ldp	x16, x17, [sp, #112]
> > > +	ldp	x18, lr, [sp, #128]
> > > +	ldp	x0, x1, [sp, #144]
> > > +	ldp	x2, x3, [sp], #160
> > 
> > I can't say I'm overly thrilled with adding another save/restore 
> > sequence. How about treating it like a real guest exit instead? Granted, 
> > there is a bit more overhead to it, but as you pointed out above, this 
> > should be pretty rare...
> 
> I have no objection to handling this after exiting back to
> __kvm_vcpu_run(), provided the performance is deemed acceptable.
> 

My guess is that it's going to be visible on non-VHE systems, and given
that we're doing all of this for performance in the first place, I'm not
excited about that approach either.

I thought it was acceptable to do another save/restore, because it was
only the GPRs (and equivalent to what the compiler would generate for a
function call?) and thus not susceptible to the complexities of sysreg
save/restores.
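
As a rough illustration of the "only the GPRs" point, the frame arithmetic
in the quoted sequence can be checked with a small standalone sketch. This
is not part of the patch: the helper names below are hypothetical, and only
the offsets (#-144, #144, #160) come from the diff. It assumes the usual
AAPCS64 split, where x0-x18 and lr may be clobbered by a C callee while
x19-x29 are preserved by it, which is why only the former need saving
around the call to __hyp_switch_fpsimd.

```c
#include <assert.h>
#include <stddef.h>

/* One stp/ldp moves two 64-bit registers, i.e. 16 bytes. */
enum { PAIR_BYTES = 16 };

/*
 * Bytes reserved by "stp x2, x3, [sp, #-144]!": nine pairs in total,
 * x2-x17 as eight pairs plus the x18/lr pair.
 * (Hypothetical helper, for illustration only.)
 */
static size_t fpsimd_trap_frame_bytes(void)
{
	const size_t pairs_saved = 9;

	return pairs_saved * PAIR_BYTES;
}

/*
 * Offset from the new sp of the vcpu x0/x1 pair, which was already on
 * the stack before this sequence ran, so it sits just above the frame.
 */
static size_t vcpu_x0_x1_offset(void)
{
	return fpsimd_trap_frame_bytes();
}

/*
 * Total sp adjustment needed on the final "ldp x2, x3, [sp], #160":
 * the nine saved pairs plus the pre-existing x0/x1 pair.
 */
static size_t total_pop_bytes(void)
{
	return fpsimd_trap_frame_bytes() + PAIR_BYTES;
}

/* Cross-check the three constants used in the quoted assembly. */
static void check_fpsimd_trap_frame(void)
{
	assert(fpsimd_trap_frame_bytes() == 144);
	assert(vcpu_x0_x1_offset() == 144);
	assert(total_pop_bytes() == 160);
}
```

Calling check_fpsimd_trap_frame() from a test harness confirms that the
save and restore halves agree on the layout; no system registers are
involved, only stack offsets.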

Another alternative would be to go back to Dave's original approach of
implementing the fpsimd state update to the host's structure in assembly
directly, but I was having a hard time understanding that.  Perhaps I
just need to try harder.

Thoughts?

Thanks,
-Christoffer
