From: Christoffer Dall <cdall@kernel.org>
To: Marc Zyngier <marc.zyngier@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Dave Martin <Dave.Martin@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu
Subject: Re: [RFC PATCH v2 2/3] KVM: arm64: Convert lazy FPSIMD context switch trap to C
Date: Mon, 9 Apr 2018 12:26:27 +0200
Message-ID: <20180409102627.GC10904@cbox>
In-Reply-To: <c1bbee2f-6087-c871-d68a-3d5ea84e0b8f@arm.com>

On Mon, Apr 09, 2018 at 11:00:40AM +0100, Marc Zyngier wrote:
> On 09/04/18 10:44, Christoffer Dall wrote:
> > On Fri, Apr 06, 2018 at 04:51:53PM +0100, Dave Martin wrote:
> >> On Fri, Apr 06, 2018 at 04:25:57PM +0100, Marc Zyngier wrote:
> >>> Hi Dave,
> >>>
> >>> On 06/04/18 16:01, Dave Martin wrote:
> >>>> To make the lazy FPSIMD context switch trap code easier to hack on,
> >>>> this patch converts it to C.
> >>>>
> >>>> This is not amazingly efficient, but the trap should typically only
> >>>> be taken once per host context switch.
> >>>>
> >>>> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> >>>>
> >>>> ---
> >>>>
> >>>> Since RFCv1:
> >>>>
> >>>>  * Fix indentation to be consistent with the rest of the file.
> >>>>  * Add missing ! to write back to sp when attempting to push regs.
> >>>> ---
> >>>>  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
> >>>>  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
> >>>>  2 files changed, 46 insertions(+), 35 deletions(-)
> >>>>
> >>>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> >>>> index fdd1068..47c6a78 100644
> >>>> --- a/arch/arm64/kvm/hyp/entry.S
> >>>> +++ b/arch/arm64/kvm/hyp/entry.S
> >>>> @@ -176,41 +176,28 @@ ENTRY(__fpsimd_guest_restore)
> >>>>  	// x1: vcpu
> >>>>  	// x2-x29,lr: vcpu regs
> >>>>  	// vcpu x0-x1 on the stack
> >>>> -	stp	x2, x3, [sp, #-16]!
> >>>> -	stp	x4, lr, [sp, #-16]!
> >>>> -
> >>>> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> >>>> -	mrs	x2, cptr_el2
> >>>> -	bic	x2, x2, #CPTR_EL2_TFP
> >>>> -	msr	cptr_el2, x2
> >>>> -alternative_else
> >>>> -	mrs	x2, cpacr_el1
> >>>> -	orr	x2, x2, #CPACR_EL1_FPEN
> >>>> -	msr	cpacr_el1, x2
> >>>> -alternative_endif
> >>>> -	isb
> >>>> -
> >>>> -	mov	x3, x1
> >>>> -
> >>>> -	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
> >>>> -	kern_hyp_va x0
> >>>> -	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >>>> -	bl	__fpsimd_save_state
> >>>> -
> >>>> -	add	x2, x3, #VCPU_CONTEXT
> >>>> -	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >>>> -	bl	__fpsimd_restore_state
> >>>> -
> >>>> -	// Skip restoring fpexc32 for AArch64 guests
> >>>> -	mrs	x1, hcr_el2
> >>>> -	tbnz	x1, #HCR_RW_SHIFT, 1f
> >>>> -	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
> >>>> -	msr	fpexc32_el2, x4
> >>>> -1:
> >>>> -	ldp	x4, lr, [sp], #16
> >>>> -	ldp	x2, x3, [sp], #16
> >>>> -	ldp	x0, x1, [sp], #16
> >>>> -
> >>>> +	stp	x2, x3, [sp, #-144]!
> >>>> +	stp	x4, x5, [sp, #16]
> >>>> +	stp	x6, x7, [sp, #32]
> >>>> +	stp	x8, x9, [sp, #48]
> >>>> +	stp	x10, x11, [sp, #64]
> >>>> +	stp	x12, x13, [sp, #80]
> >>>> +	stp	x14, x15, [sp, #96]
> >>>> +	stp	x16, x17, [sp, #112]
> >>>> +	stp	x18, lr, [sp, #128]
> >>>> +
> >>>> +	bl	__hyp_switch_fpsimd
> >>>> +
> >>>> +	ldp	x4, x5, [sp, #16]
> >>>> +	ldp	x6, x7, [sp, #32]
> >>>> +	ldp	x8, x9, [sp, #48]
> >>>> +	ldp	x10, x11, [sp, #64]
> >>>> +	ldp	x12, x13, [sp, #80]
> >>>> +	ldp	x14, x15, [sp, #96]
> >>>> +	ldp	x16, x17, [sp, #112]
> >>>> +	ldp	x18, lr, [sp, #128]
> >>>> +	ldp	x0, x1, [sp, #144]
> >>>> +	ldp	x2, x3, [sp], #160
> >>>
> >>> I can't say I'm overly thrilled with adding another save/restore 
> >>> sequence. How about treating it like a real guest exit instead? Granted, 
> >>> there is a bit more overhead to it, but as you pointed out above, this 
> >>> should be pretty rare...
> >>
> >> I have no objection to handling this after exiting back to
> >> __kvm_vcpu_run(), provided the performance is deemed acceptable.
> >>
> > 
> > My guess is that it's going to be visible on non-VHE systems, and given
> > that we're doing all of this for performance in the first place, I'm not
> > excited about that approach either.
> 
> My rationale is that, as we don't disable FP access across most
> exit/entry sequences, we still significantly benefit from the optimization.
> 

Yes, but we will take that cost every time we've blocked (and someone
else used fpsimd) or every time we've returned to user space.  True,
that's slow anyhow, but still...

> > I thought it was acceptable to do another save/restore, because it was
> > only the GPRs (and equivalent to what the compiler would generate for a
> > function call?) and thus not susceptible to the complexities of sysreg
> > save/restores.
> 
> Sysreg? 

What I meant was that this is not saving/restoring any of the system
registers, which is where we've had the most changes and maintenance,
but is restricted to GPRs, but anyway...
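
To put a number on the GPR-only save/restore in the quoted hunk, the
frame can be tallied in plain userspace C. This is only a rough sketch:
the offsets are copied from the stp/ldp lines in the diff, while the
struct and function names are made up for illustration and appear
nowhere in the patch.

```c
#include <stddef.h>

#define PAIR_BYTES 16	/* one stp/ldp moves two 8-byte registers */

/* Hypothetical description of one 16-byte slot in the frame. */
struct frame_slot {
	const char *regs;	/* register pair held in the slot */
	int offset;		/* offset from sp after the first stp */
	int saved_here;		/* 1: stored by the new stp sequence,
				 * 0: already on the stack at entry */
};

/* Offsets as they appear in the quoted __fpsimd_guest_restore hunk. */
static const struct frame_slot frame[] = {
	{ "x2, x3",     0, 1 },
	{ "x4, x5",    16, 1 },
	{ "x6, x7",    32, 1 },
	{ "x8, x9",    48, 1 },
	{ "x10, x11",  64, 1 },
	{ "x12, x13",  80, 1 },
	{ "x14, x15",  96, 1 },
	{ "x16, x17", 112, 1 },
	{ "x18, lr",  128, 1 },
	{ "x0, x1",   144, 0 },	/* "vcpu x0-x1 on the stack" */
};

#define FRAME_SLOTS (sizeof(frame) / sizeof(frame[0]))

/* Bytes reserved by the pre-indexed "stp x2, x3, [sp, #-144]!". */
int frame_pushed_bytes(void)
{
	int bytes = 0;

	for (size_t i = 0; i < FRAME_SLOTS; i++)
		if (frame[i].saved_here)
			bytes += PAIR_BYTES;
	return bytes;
}

/* Bytes released by the final post-indexed "ldp x2, x3, [sp], #160". */
int frame_total_bytes(void)
{
	return (int)(FRAME_SLOTS * PAIR_BYTES);
}

/* Returns 1 if the slots are packed back to back with no gaps. */
int frame_is_contiguous(void)
{
	for (size_t i = 0; i < FRAME_SLOTS; i++)
		if (frame[i].offset != (int)(i * PAIR_BYTES))
			return 0;
	return 1;
}
```

So the new sequence stores nine register pairs (x2-x18 plus lr), and
the final ldp also pops the x0-x1 slot that was already on the stack.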

> That's not what I'm proposing. What I'm proposing here is that
> we treat FP exception as a shallow exit that immediately returns to the
> guest without touching them. The overhead is an extra save/restore of
> the host's x19-x30, if I got my maths right. I agree that this is
> significant, but I'd like to measure this overhead before we go one way
> or the other.

...sorry, I didn't realize it was a shallow exit you suggested.  That's
a different story, and that would probably be in the noise if we
measured it.

> 
> > Another alternative would be to go back to Dave's original approach of
> > implementing the fpsimd state update to the host's structure in assembly
> > directly, but I was having a hard time understanding that.  Perhaps I
> > just need to try harder.
> I'd rather stick to the current C approach, no matter how we perform the
> save/restore. It feels a lot more readable and maintainable in the long run.
> 

Agreed.
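
For readers following along, the steps of the assembly removed by the
quoted hunk can be modelled in userspace C roughly as below. This is
not the __hyp_switch_fpsimd() from the patch (the switch.c hunk is not
quoted here); it is a sketch under stated assumptions. The model_*
names and the single-word "FP state" are hypothetical, and the three
bit definitions are assumed to match the kernel's arm64 headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the arm64 kernel definitions. */
#define CPTR_EL2_TFP	(UINT64_C(1) << 10)
#define CPACR_EL1_FPEN	(UINT64_C(3) << 20)
#define HCR_RW		(UINT64_C(1) << 31)

/* Hypothetical stand-in for the system registers the old asm touched. */
struct model_cpu {
	bool has_vhe;		/* ARM64_HAS_VIRT_HOST_EXTN */
	uint64_t cptr_el2;
	uint64_t cpacr_el1;
	uint64_t hcr_el2;
	uint64_t fpexc32_el2;
	uint64_t fpregs;	/* one word standing in for the FPSIMD regs */
};

/* Hypothetical stand-in for the vcpu's host and guest FP contexts. */
struct model_vcpu {
	uint64_t host_fp;
	uint64_t guest_fp;
	uint64_t fpexc32_el2;
};

void model_switch_fpsimd(struct model_cpu *cpu, struct model_vcpu *vcpu)
{
	/* Stop trapping FP accesses (the two arms of the alternative). */
	if (cpu->has_vhe)
		cpu->cpacr_el1 |= CPACR_EL1_FPEN;
	else
		cpu->cptr_el2 &= ~CPTR_EL2_TFP;
	/* Real hardware needs an isb here; the model has nothing to sync. */

	/* Save the host state, then load the guest state. */
	vcpu->host_fp = cpu->fpregs;
	cpu->fpregs = vcpu->guest_fp;

	/* Skip restoring fpexc32 for AArch64 guests (HCR_EL2.RW set). */
	if (!(cpu->hcr_el2 & HCR_RW))
		cpu->fpexc32_el2 = vcpu->fpexc32_el2;
}
```

The control flow is linear with a single conditional at the end, which
is the sort of thing that reads more easily in C than in assembly.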

Thanks,
-Christoffer
