linux-arm-kernel.lists.infradead.org archive mirror
From: cdall@kernel.org (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH v2 2/3] KVM: arm64: Convert lazy FPSIMD context switch trap to C
Date: Mon, 9 Apr 2018 12:26:27 +0200
Message-ID: <20180409102627.GC10904@cbox>
In-Reply-To: <c1bbee2f-6087-c871-d68a-3d5ea84e0b8f@arm.com>

On Mon, Apr 09, 2018 at 11:00:40AM +0100, Marc Zyngier wrote:
> On 09/04/18 10:44, Christoffer Dall wrote:
> > On Fri, Apr 06, 2018 at 04:51:53PM +0100, Dave Martin wrote:
> >> On Fri, Apr 06, 2018 at 04:25:57PM +0100, Marc Zyngier wrote:
> >>> Hi Dave,
> >>>
> >>> On 06/04/18 16:01, Dave Martin wrote:
> >>>> To make the lazy FPSIMD context switch trap code easier to hack on,
> >>>> this patch converts it to C.
> >>>>
> >>>> This is not amazingly efficient, but the trap should typically only
> >>>> be taken once per host context switch.
> >>>>
> >>>> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> >>>>
> >>>> ---
> >>>>
> >>>> Since RFCv1:
> >>>>
> >>>>  * Fix indentation to be consistent with the rest of the file.
> >>>>  * Add missing ! to write back to sp when attempting to push regs.
> >>>> ---
> >>>>  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
> >>>>  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
> >>>>  2 files changed, 46 insertions(+), 35 deletions(-)
> >>>>
> >>>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> >>>> index fdd1068..47c6a78 100644
> >>>> --- a/arch/arm64/kvm/hyp/entry.S
> >>>> +++ b/arch/arm64/kvm/hyp/entry.S
> >>>> @@ -176,41 +176,28 @@ ENTRY(__fpsimd_guest_restore)
> >>>>  	// x1: vcpu
> >>>>  	// x2-x29,lr: vcpu regs
> >>>>  	// vcpu x0-x1 on the stack
> >>>> -	stp	x2, x3, [sp, #-16]!
> >>>> -	stp	x4, lr, [sp, #-16]!
> >>>> -
> >>>> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> >>>> -	mrs	x2, cptr_el2
> >>>> -	bic	x2, x2, #CPTR_EL2_TFP
> >>>> -	msr	cptr_el2, x2
> >>>> -alternative_else
> >>>> -	mrs	x2, cpacr_el1
> >>>> -	orr	x2, x2, #CPACR_EL1_FPEN
> >>>> -	msr	cpacr_el1, x2
> >>>> -alternative_endif
> >>>> -	isb
> >>>> -
> >>>> -	mov	x3, x1
> >>>> -
> >>>> -	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
> >>>> -	kern_hyp_va x0
> >>>> -	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >>>> -	bl	__fpsimd_save_state
> >>>> -
> >>>> -	add	x2, x3, #VCPU_CONTEXT
> >>>> -	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >>>> -	bl	__fpsimd_restore_state
> >>>> -
> >>>> -	// Skip restoring fpexc32 for AArch64 guests
> >>>> -	mrs	x1, hcr_el2
> >>>> -	tbnz	x1, #HCR_RW_SHIFT, 1f
> >>>> -	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
> >>>> -	msr	fpexc32_el2, x4
> >>>> -1:
> >>>> -	ldp	x4, lr, [sp], #16
> >>>> -	ldp	x2, x3, [sp], #16
> >>>> -	ldp	x0, x1, [sp], #16
> >>>> -
> >>>> +	stp	x2, x3, [sp, #-144]!
> >>>> +	stp	x4, x5, [sp, #16]
> >>>> +	stp	x6, x7, [sp, #32]
> >>>> +	stp	x8, x9, [sp, #48]
> >>>> +	stp	x10, x11, [sp, #64]
> >>>> +	stp	x12, x13, [sp, #80]
> >>>> +	stp	x14, x15, [sp, #96]
> >>>> +	stp	x16, x17, [sp, #112]
> >>>> +	stp	x18, lr, [sp, #128]
> >>>> +
> >>>> +	bl	__hyp_switch_fpsimd
> >>>> +
> >>>> +	ldp	x4, x5, [sp, #16]
> >>>> +	ldp	x6, x7, [sp, #32]
> >>>> +	ldp	x8, x9, [sp, #48]
> >>>> +	ldp	x10, x11, [sp, #64]
> >>>> +	ldp	x12, x13, [sp, #80]
> >>>> +	ldp	x14, x15, [sp, #96]
> >>>> +	ldp	x16, x17, [sp, #112]
> >>>> +	ldp	x18, lr, [sp, #128]
> >>>> +	ldp	x0, x1, [sp, #144]
> >>>> +	ldp	x2, x3, [sp], #160
> >>>
> >>> I can't say I'm overly thrilled with adding another save/restore 
> >>> sequence. How about treating it like a real guest exit instead? Granted, 
> >>> there is a bit more overhead to it, but as you pointed out above, this 
> >>> should be pretty rare...
> >>
> >> I have no objection to handling this after exiting back to
> >> __kvm_vcpu_run(), provided the performance is deemed acceptable.
> >>
> > 
> > My guess is that it's going to be visible on non-VHE systems, and given
> > that we're doing all of this for performance in the first place, I'm not
> > excited about that approach either.
> 
> My rationale is that, as we don't disable FP access across most
> exit/entry sequences, we still significantly benefit from the optimization.
> 

Yes, but we will take that cost every time we've blocked (and someone
else used fpsimd) or every time we've returned to user space.  True,
that's slow anyhow, but still...

> > I thought it was acceptable to do another save/restore, because it was
> > only the GPRs (and equivalent to what the compiler would generate for a
> > function call?) and thus not susceptible to the complexities of sysreg
> > save/restores.
> 
> Sysreg? 

What I meant was that this does not save/restore any of the system
registers, which is where we've had the most changes and maintenance;
it is restricted to GPRs.  But anyway...
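
To make the "only the GPRs" point concrete, here is a small sketch of
the frame that the stp sequence in the quoted patch builds.  The struct
and function names are made up for illustration and do not exist in the
kernel.  As far as I understand the AArch64 procedure call standard,
x0-x18 and lr are exactly the registers a C callee is allowed to
clobber, which is why nothing beyond them needs saving around the bl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the stack frame built by the stp sequence in the
 * quoted patch. The struct and its field names are illustrative only; the
 * kernel defines no such type.
 */
struct fpsimd_trap_frame {
	uint64_t x2_to_x17[16];	/* stp x2, x3 ... stp x16, x17: offsets 0-127 */
	uint64_t x18;		/* first half of "stp x18, lr, [sp, #128]" */
	uint64_t lr;		/* second half, offset 136 */
	uint64_t vcpu_x0;	/* already on the stack before the trap code runs */
	uint64_t vcpu_x1;	/* offset 152 */
};

/* Bytes pushed by "stp x2, x3, [sp, #-144]!": everything below vcpu x0. */
static size_t trap_frame_pushed_bytes(void)
{
	return offsetof(struct fpsimd_trap_frame, vcpu_x0);
}

/* Bytes popped by the final "ldp x2, x3, [sp], #160". */
static size_t trap_frame_total_bytes(void)
{
	return sizeof(struct fpsimd_trap_frame);
}

/* Offset used by "stp x18, lr, [sp, #128]". */
static size_t trap_frame_x18_offset(void)
{
	return offsetof(struct fpsimd_trap_frame, x18);
}

/* Sanity check tying the model back to the immediates in the patch. */
static void trap_frame_check(void)
{
	assert(trap_frame_pushed_bytes() == 144);
	assert(trap_frame_total_bytes() == 160);
	assert(trap_frame_x18_offset() == 128);
}
```

Calling trap_frame_check() from a test program should pass silently:
18 registers of 8 bytes give the 144-byte push, and the vcpu x0-x1 pair
that was already on the stack brings the final pop to 160.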

> That's not what I'm proposing. What I'm proposing here is that
> we treat the FP exception as a shallow exit that immediately returns to the
> guest without touching the sysregs. The overhead is an extra save/restore of
> the host's x19-x30, if I got my maths right. I agree that this is
> significant, but I'd like to measure this overhead before we go one way
> or the other.

...sorry, I didn't realize it was a shallow exit you suggested.  That's
a different story, and that would probably be in the noise if we
measured it.
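
For what it's worth, this is how I picture the shallow exit, written as
a toy user-space model rather than real hyp code.  Every name here
(toy_run_state, toy_enter_guest, toy_vcpu_run) is hypothetical, and the
counters only stand in for the work being discussed: the FP trap comes
back to C, gets handled, and the guest is re-entered without the sysreg
save/restore being repeated.

```c
#include <assert.h>

/* Hypothetical exit reasons; the real code decodes the syndrome instead. */
enum toy_exit_reason { TOY_EXIT_FP_TRAP, TOY_EXIT_OTHER };

/* Toy bookkeeping, not a kernel structure. */
struct toy_run_state {
	int fp_enabled;			/* guest may use FPSIMD without trapping */
	unsigned int shallow_exits;	/* exits handled and re-entered at once */
	unsigned int full_exits;	/* exits that really return to the host */
	unsigned int sysreg_switches;	/* full sysreg save/restore passes */
};

/* Stand-in for entering the guest: trap on FP use until FP is enabled. */
static enum toy_exit_reason toy_enter_guest(const struct toy_run_state *s)
{
	return s->fp_enabled ? TOY_EXIT_OTHER : TOY_EXIT_FP_TRAP;
}

/* Model of a run loop that treats the FP trap as a shallow exit. */
static void toy_vcpu_run(struct toy_run_state *s)
{
	assert(s);

	s->sysreg_switches++;		/* host -> guest, done once on the way in */

	for (;;) {
		if (toy_enter_guest(s) == TOY_EXIT_FP_TRAP) {
			s->fp_enabled = 1;	/* the FPSIMD switch would happen here, in C */
			s->shallow_exits++;
			continue;		/* straight back in, sysregs untouched */
		}
		s->full_exits++;
		break;
	}

	s->sysreg_switches++;		/* guest -> host, done once on the way out */
}
```

Starting from a zeroed toy_run_state, one call to toy_vcpu_run() takes
one shallow exit and one full exit while the sysreg counter only goes
up by two.  The model says nothing about the cost of the extra host
x19-x30 save/restore mentioned above; that would still need measuring.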

> 
> > Another alternative would be to go back to Dave's original approach of
> > implementing the fpsimd state update to the host's structure in assembly
> > directly, but I was having a hard time understanding that.  Perhaps I
> > just need to try harder.
> 
> I'd rather stick to the current C approach, no matter how we perform the
> save/restore. It feels a lot more readable and maintainable in the long run.
> 

Agreed.
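
As a reading aid, the steps of the assembly being removed can be
written out as a user-space C model.  This is not the switch.c hunk
from the patch (which isn't quoted here); the toy_* types, the
function name and the register bit values are assumptions made up for
the sketch, and the "registers" are plain struct fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit values picked for the model; check the arch headers before reuse. */
#define TOY_CPTR_TFP	(1u << 10)
#define TOY_CPACR_FPEN	(3u << 20)

/* Toy FPSIMD register file: 32 128-bit vregs as 64 u64 halves. */
struct toy_fp_regs {
	uint64_t vregs[64];
	uint32_t fpsr;
	uint32_t fpcr;
};

/* Hypothetical stand-in for the CPU state the assembly touches. */
struct toy_cpu {
	bool has_vhe;			/* ARM64_HAS_VIRT_HOST_EXTN */
	bool guest_is_aarch64;		/* HCR_EL2.RW set */
	uint32_t cptr_el2;
	uint32_t cpacr_el1;
	uint32_t fpexc32_el2;
	struct toy_fp_regs hw_fp;	/* the live FPSIMD registers */
};

/* Hypothetical stand-in for the vcpu fields used by the trap code. */
struct toy_vcpu {
	struct toy_fp_regs host_fp;
	struct toy_fp_regs guest_fp;
	uint32_t guest_fpexc32;
};

static void toy_hyp_switch_fpsimd(struct toy_cpu *cpu, struct toy_vcpu *vcpu)
{
	assert(cpu && vcpu);

	/* Stop trapping FPSIMD: set FPEN on VHE, clear TFP otherwise. */
	if (cpu->has_vhe)
		cpu->cpacr_el1 |= TOY_CPACR_FPEN;
	else
		cpu->cptr_el2 &= ~TOY_CPTR_TFP;

	/* Save the host's FPSIMD state, then load the guest's. */
	vcpu->host_fp = cpu->hw_fp;
	cpu->hw_fp = vcpu->guest_fp;

	/* Skip restoring fpexc32 for AArch64 guests, as the assembly does. */
	if (!cpu->guest_is_aarch64)
		cpu->fpexc32_el2 = vcpu->guest_fpexc32;
}
```

In the model a non-VHE CPU with TFP set and an AArch64 guest ends up
with TFP clear, the host state stashed, the guest state live and
fpexc32_el2 untouched.  The isb and the kern_hyp_va address
translation in the original have no equivalent in user space and are
left out.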

Thanks,
-Christoffer

