From: Christoffer Dall <cdall@kernel.org>
To: Marc Zyngier <marc.zyngier@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Dave Martin <Dave.Martin@arm.com>,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu
Subject: Re: [RFC PATCH v2 2/3] KVM: arm64: Convert lazy FPSIMD context switch trap to C
Date: Mon, 9 Apr 2018 12:26:27 +0200
Message-ID: <20180409102627.GC10904@cbox>
In-Reply-To: <c1bbee2f-6087-c871-d68a-3d5ea84e0b8f@arm.com>
On Mon, Apr 09, 2018 at 11:00:40AM +0100, Marc Zyngier wrote:
> On 09/04/18 10:44, Christoffer Dall wrote:
> > On Fri, Apr 06, 2018 at 04:51:53PM +0100, Dave Martin wrote:
> >> On Fri, Apr 06, 2018 at 04:25:57PM +0100, Marc Zyngier wrote:
> >>> Hi Dave,
> >>>
> >>> On 06/04/18 16:01, Dave Martin wrote:
> >>>> To make the lazy FPSIMD context switch trap code easier to hack on,
> >>>> this patch converts it to C.
> >>>>
> >>>> This is not amazingly efficient, but the trap should typically only
> >>>> be taken once per host context switch.
> >>>>
> >>>> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> >>>>
> >>>> ---
> >>>>
> >>>> Since RFCv1:
> >>>>
> >>>> * Fix indentation to be consistent with the rest of the file.
> >>>> * Add missing ! to write back to sp when attempting to push regs.
> >>>> ---
> >>>> arch/arm64/kvm/hyp/entry.S | 57 +++++++++++++++++----------------------------
> >>>> arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
> >>>> 2 files changed, 46 insertions(+), 35 deletions(-)
> >>>>
> >>>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> >>>> index fdd1068..47c6a78 100644
> >>>> --- a/arch/arm64/kvm/hyp/entry.S
> >>>> +++ b/arch/arm64/kvm/hyp/entry.S
> >>>> @@ -176,41 +176,28 @@ ENTRY(__fpsimd_guest_restore)
> >>>> // x1: vcpu
> >>>> // x2-x29,lr: vcpu regs
> >>>> // vcpu x0-x1 on the stack
> >>>> - stp x2, x3, [sp, #-16]!
> >>>> - stp x4, lr, [sp, #-16]!
> >>>> -
> >>>> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> >>>> - mrs x2, cptr_el2
> >>>> - bic x2, x2, #CPTR_EL2_TFP
> >>>> - msr cptr_el2, x2
> >>>> -alternative_else
> >>>> - mrs x2, cpacr_el1
> >>>> - orr x2, x2, #CPACR_EL1_FPEN
> >>>> - msr cpacr_el1, x2
> >>>> -alternative_endif
> >>>> - isb
> >>>> -
> >>>> - mov x3, x1
> >>>> -
> >>>> - ldr x0, [x3, #VCPU_HOST_CONTEXT]
> >>>> - kern_hyp_va x0
> >>>> - add x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >>>> - bl __fpsimd_save_state
> >>>> -
> >>>> - add x2, x3, #VCPU_CONTEXT
> >>>> - add x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >>>> - bl __fpsimd_restore_state
> >>>> -
> >>>> - // Skip restoring fpexc32 for AArch64 guests
> >>>> - mrs x1, hcr_el2
> >>>> - tbnz x1, #HCR_RW_SHIFT, 1f
> >>>> - ldr x4, [x3, #VCPU_FPEXC32_EL2]
> >>>> - msr fpexc32_el2, x4
> >>>> -1:
> >>>> - ldp x4, lr, [sp], #16
> >>>> - ldp x2, x3, [sp], #16
> >>>> - ldp x0, x1, [sp], #16
> >>>> -
> >>>> + stp x2, x3, [sp, #-144]!
> >>>> + stp x4, x5, [sp, #16]
> >>>> + stp x6, x7, [sp, #32]
> >>>> + stp x8, x9, [sp, #48]
> >>>> + stp x10, x11, [sp, #64]
> >>>> + stp x12, x13, [sp, #80]
> >>>> + stp x14, x15, [sp, #96]
> >>>> + stp x16, x17, [sp, #112]
> >>>> + stp x18, lr, [sp, #128]
> >>>> +
> >>>> + bl __hyp_switch_fpsimd
> >>>> +
> >>>> + ldp x4, x5, [sp, #16]
> >>>> + ldp x6, x7, [sp, #32]
> >>>> + ldp x8, x9, [sp, #48]
> >>>> + ldp x10, x11, [sp, #64]
> >>>> + ldp x12, x13, [sp, #80]
> >>>> + ldp x14, x15, [sp, #96]
> >>>> + ldp x16, x17, [sp, #112]
> >>>> + ldp x18, lr, [sp, #128]
> >>>> + ldp x0, x1, [sp, #144]
> >>>> + ldp x2, x3, [sp], #160
> >>>
> >>> I can't say I'm overly thrilled with adding another save/restore
> >>> sequence. How about treating it like a real guest exit instead? Granted,
> >>> there is a bit more overhead to it, but as you pointed out above, this
> >>> should be pretty rare...
> >>
> >> I have no objection to handling this after exiting back to
> >> __kvm_vcpu_run(), provided the performance is deemed acceptable.
> >>
> >
> > My guess is that it's going to be visible on non-VHE systems, and given
> > that we're doing all of this for performance in the first place, I'm not
> > excited about that approach either.
>
> My rationale is that, as we don't disable FP access across most
> exit/entry sequences, we still significantly benefit from the optimization.
>
Yes, but we will take that cost every time we've blocked (and someone
else used fpsimd) or every time we've returned to user space. True,
that's slow anyhow, but still...
> > I thought it was acceptable to do another save/restore, because it was
> > only the GPRs (and equivalent to what the compiler would generate for a
> > function call?) and thus not susceptible to the complexities of sysreg
> > save/restores.
>
> Sysreg?
What I meant was that this doesn't save or restore any of the system
registers, which is where we've had the most changes and maintenance;
it is restricted to GPRs. But anyway...
> That's not what I'm proposing. What I'm proposing here is that
> we treat the FP exception as a shallow exit that immediately returns to the
> guest without touching them. The overhead is an extra save/restore of
> the host's x19-x30, if I got my maths right. I agree that this is
> significant, but I'd like to measure this overhead before we go one way
> or the other.
...sorry, I didn't realize it was a shallow exit you suggested. That's
a different story, and that would probably be in the noise if we
measured it.
>
> > Another alternative would be to go back to Dave's original approach of
> > implementing the fpsimd state update to the host's structure in assembly
> > directly, but I was having a hard time understanding that. Perhaps I
> > just need to try harder.
> I'd rather stick to the current C approach, no matter how we perform the
> save/restore. It feels a lot more readable and maintainable in the long run.
>
Agreed.
Thanks,
-Christoffer