Kernel KVM virtualization development
 help / color / mirror / Atom feed
* [PATCH 0/2] KVM: arm64: nv: Reduce FP/SVE overhead on exception/exception return
@ 2026-05-12 14:07 Marc Zyngier
  2026-05-12 14:07 ` [PATCH 1/2] KVM: arm64: nv: Track L2 to L1 exception emulation Marc Zyngier
  2026-05-12 14:07 ` [PATCH 2/2] KVM: arm64: nv: Don't save/restore FP register during a nested ERET or exception Marc Zyngier
  0 siblings, 2 replies; 3+ messages in thread
From: Marc Zyngier @ 2026-05-12 14:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm
  Cc: Steffen Eiden, Joey Gouly, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu, Mark Rutland, Will Deacon, Fuad Tabba

Staring at NV traces has shown that there is a substantial amount of
overhead being triggered when a guest switches between EL1 and EL2 (or
the reverse). This is caused by the naive put/load mechanism we use to
multiplex EL1 and EL2 onto EL1 only, and the FP handling appears as a
prime candidate for optimisation. More precisely, there are two
distinct sources of overhead here:

- the FP/SVE registers are saved, and potentially the host userspace
  state restored when doing put()

- the FP traps are reinstated as part of load(), as the state is now
  the host's

These two things mean that we end-up with a lot of work during this
switch, and that we are 100% guaranteed to get a FP/SVE trap very
quickly, as the guest keeps using the FP registers. These traps
themselves result in some horrible trap amplification in even moderate
levels of nesting, which we could trivially avoid. A bit of thinking
indicates that it should be entirely valid to elide this stuff in the
context of a nested exception/exception return.

The first patch in this small series just add a new vcpu state flag
indicating that put() and load() are done in the context of a nested
exception from L2 to L1. This is the exact pendent of IN_NESTED_ERET,
which tracks an ERET from L1 to L2.

The second patch uses these two flags to abruptly elide FP/SVE
save/restore when any of them is set, sidestepping the overhead
entirely.

Performance-wise, this is rather impressive. I get a 10%-20%
improvement on running the Debian installed as an L3 on my QC
platform. Combined with the use of the EL2 virtual timer, it almost
makes L3 usable.

But of course, nothing is simple with this stuff, which is why I'm
cc'ing Mark here, as he's done a lot of work tracking funny bugs in
our FP handling. Hopefully I haven't subtly broken anything, but let's
see!

Marc Zyngier (2):
  KVM: arm64: nv: Track L2 to L1 exception emulation
  KVM: arm64: nv: Don't save/restore FP register during a nested ERET or
    exception

 arch/arm64/include/asm/kvm_host.h | 3 ++-
 arch/arm64/kvm/emulate-nested.c   | 4 ++++
 arch/arm64/kvm/fpsimd.c           | 8 ++++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-05-12 14:08 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-12 14:07 [PATCH 0/2] KVM: arm64: nv: Reduce FP/SVE overhead on exception/exception return Marc Zyngier
2026-05-12 14:07 ` [PATCH 1/2] KVM: arm64: nv: Track L2 to L1 exception emulation Marc Zyngier
2026-05-12 14:07 ` [PATCH 2/2] KVM: arm64: nv: Don't save/restore FP register during a nested ERET or exception Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox