Linux-ARM-Kernel Archive on lore.kernel.org
From: Marc Zyngier <maz@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	kvm@vger.kernel.org, Steffen Eiden <seiden@linux.ibm.com>,
	Joey Gouly <joey.gouly@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Oliver Upton <oupton@kernel.org>,
	Zenghui Yu <yuzenghui@huawei.com>, Will Deacon <will@kernel.org>,
	Fuad Tabba <tabba@google.com>
Subject: Re: [PATCH 2/2] KVM: arm64: nv: Don't save/restore FP register during a nested ERET or exception
Date: Wed, 13 May 2026 13:49:49 +0100	[thread overview]
Message-ID: <86cxyzxymq.wl-maz@kernel.org> (raw)
In-Reply-To: <agRuiKHWWn_88YzT@J2N7QTR9R3>

Hi Mark,

Thanks for looking into this.

On Wed, 13 May 2026 13:28:56 +0100,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> On Tue, May 12, 2026 at 03:07:55PM +0100, Marc Zyngier wrote:
> > When switching between L1 and L2, we diligently use a non-preemptible
> > put/load sequence in order to make sure that the old state is saved,
> > while the new state is brought in. Crucially, this includes the FP
> > registers.
> > 
> > However, this is a bit silly. The FP registers are completely shared
> > between the various ELs (just like the GPRs, really), and eagerly
> > save/restoring those in a non-preemptible section is just overhead.
> > Not to mention that the next access will end-up trapping, something
> > that becomes exponentially expensive as we nest deeper.
> > 
> > The temptation is therefore to completely drop this save/restore thing.
> > Why is it valid to do so? By analogy, the hypervisor doesn't try to
> > poloce things between EL1 and EL0, or between EL2 and EL0. Why should
> > it do so between EL2 and EL1 (or EL2 and L2 EL0)?
> >
> > Once you admit that the FP (and by extension SVE) registers are EL-agnostic,
> > the things that matter are:
> 
> s/poloce/police/ ?

That.

> 
> The above is a bit flowery; it would be nice to remove the rhetorical
> questions and just state that (aside from some control registers) the
> FPSIMD/SVE/SME state is shared between exception levels and doesn't need
> to be saved/restored.
> 
> How about:
> 
>   When switching between L1 and L2, we save the old state using
>   kvm_arch_vcpu_put(), mutate the state in memory, then load the new
>   state using kvm_arch_vcpu_load(). Any live FPSIMD/SVE state is saved
>   and unbound, such that it can be lazily restored on a subsequent trap.
> 
>   The FPSIMD/SVE state is shared by exception levels, and only a handful
>   of related control registers need to be changed when transitioning
>   between L1 and L2. The save/restore of the common state is needless
>   overhead, especially as trapping becomes exponentially more expensive
>   with nesting.
> 
>   Avoid this overhead by leaving the common FPSIMD/SVE state live on the
>   CPU, and only switching the state that is distinct for L1 and L2:
>

Sold. Do you offer a CMAAS (Commit Message As A Service)? Asking for a
friend... ;-)

> > - the trap controls: the effective values are recomputed on each entry
> >   into the guest to take the EL into account and merge the L0 and L1
> >   configuration if in a nested context, or directly use the L0 configuration
> >   in non-nested context (see __activate_traps()).
> > 
> > - the VL settings: the effective values are also recomputed on each
> >   entry into the guest (see fpsimd_lazy_switch_to_guest()).
> 
> This is true for FPSIMD+SVE today. For SME, SMCR_ELx also contains other
> controls, and will need to be dealt with similarly. It might be worth
> noting that (and that ZCR_ELx could gain new controls in future).
>

Yeah. I tried not to worry too much about SME, but given that it is on
people's radar, I'll drop a comment here.

> > Since we appear to cover all bases, use the vcpu flags indicating the
> > handling of a nested ERET or exception delivery to avoid the whole FP
> > save/restore shenanigans.
> > 
> > For an EL1 L3 guest where L1 and L2 have this optimisation, this
> > results in at least a 10% wall clock reduction when running an I/O
> > heavy workload, generating a high rate of nested exceptions.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/kvm/fpsimd.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> > index 15e17aca1dec0..73eda0f46b127 100644
> > --- a/arch/arm64/kvm/fpsimd.c
> > +++ b/arch/arm64/kvm/fpsimd.c
> > @@ -28,6 +28,10 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
> >  	if (!system_supports_fpsimd())
> >  		return;
> >  
> > +	if (vcpu_get_flag(vcpu, IN_NESTED_ERET) ||
> > +	    vcpu_get_flag(vcpu, IN_NESTED_EXCEPTION))
> > +		return;
> > +
> 
> I think we need a comment as to why this is safe, with some other detail
> from the commit message. It would also be good to have asserts here to
> catch if something goes wrong.
> 
> How about:
> 
> 	/*
> 	 * Avoid needless save/restore of the guest's common
> 	 * FPSIMD/SVE/SME regs during transitions between L1/L2.
> 	 *
> 	 * These transitions only happen in a non-preemptible context
> 	 * where the host regs have already been saved and unbound. The
> 	 * live registers are either free or owned by the guest.
> 	 */
> 	if (vcpu_get_flag(vcpu, IN_NESTED_ERET) ||
> 	    vcpu_get_flag(vcpu, IN_NESTED_EXCEPTION)) {
> 		WARN_ON_ONCE(host_owns_fp_regs());
> 		return;
> 	}
> 
> ... ?
> 
> Note: I didn't add WARN_ON_ONCE(preemptible()), since
> kvm_arch_vcpu_load_fp() should *never* be called in a preemptible
> context.
> 
> >  	/*
> >  	 * Ensure that any host FPSIMD/SVE/SME state is saved and unbound such
> >  	 * that the host kernel is responsible for restoring this state upon
> > @@ -102,6 +106,10 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
> >  {
> >  	unsigned long flags;
> >  
> > +	if (vcpu_get_flag(vcpu, IN_NESTED_ERET) ||
> > +	    vcpu_get_flag(vcpu, IN_NESTED_EXCEPTION))
> > +		return;
> 
> Likewise here, but we can reduce the comment, e.g.
> 
> 	/*
> 	 * See comment in kvm_arch_vcpu_load_fp().
> 	 */
> 	if (vcpu_get_flag(vcpu, IN_NESTED_ERET) ||
> 	    vcpu_get_flag(vcpu, IN_NESTED_EXCEPTION)) {
> 		WARN_ON_ONCE(host_owns_fp_regs());
> 		return;
> 	}

Yup, that all looks good to me. I'll repost that next week with these
changes.

Thanks again,

	M.

-- 
Without deviation from the norm, progress is not possible.



Thread overview: 5+ messages
2026-05-12 14:07 [PATCH 0/2] KVM: arm64: nv: Reduce FP/SVE overhead on exception/exception return Marc Zyngier
2026-05-12 14:07 ` [PATCH 1/2] KVM: arm64: nv: Track L2 to L1 exception emulation Marc Zyngier
2026-05-12 14:07 ` [PATCH 2/2] KVM: arm64: nv: Don't save/restore FP register during a nested ERET or exception Marc Zyngier
2026-05-13 12:28   ` Mark Rutland
2026-05-13 12:49     ` Marc Zyngier [this message]
