From: Mark Rutland <mark.rutland@arm.com>
To: Mark Brown <broonie@kernel.org>, Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>,
Joey Gouly <joey.gouly@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Will Deacon <will@kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Jonathan Corbet <corbet@lwn.net>, Shuah Khan <shuah@kernel.org>,
Dave Martin <Dave.Martin@arm.com>, Fuad Tabba <tabba@google.com>,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH RFC v3 09/27] KVM: arm64: Factor SVE guest exit handling out into a function
Date: Fri, 17 Jan 2025 11:34:09 +0000 [thread overview]
Message-ID: <Z4pAMaEYvdLpmbg2@J2N7QTR9R3> (raw)
In-Reply-To: <20241220-kvm-arm64-sme-v3-9-05b018c1ffeb@kernel.org>
On Fri, Dec 20, 2024 at 04:46:34PM +0000, Mark Brown wrote:
> The SVE portion of kvm_vcpu_put() is quite large, especially given the
> comments required. When we add similar handling for SME the function
> will get even larger, in order to keep things managable factor the SVE
> portion out of the main kvm_vcpu_put().
While investigating some problems with SVE I spotted a latent bug in
this area where I suspect the fix will conflict with / supersede this
rework. Details below; IIUC the bug was introduced in commit:
8c8010d69c132273 ("KVM: arm64: Save/restore SVE state for nVHE")
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
> arch/arm64/kvm/fpsimd.c | 67 +++++++++++++++++++++++++++----------------------
> 1 file changed, 37 insertions(+), 30 deletions(-)
>
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> index 09b65abaf9db60cc57dbc554ad2108a80c2dc46b..3c2e0b96877ac5b4f3b9d8dfa38975f11b74b60d 100644
> --- a/arch/arm64/kvm/fpsimd.c
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -151,6 +151,41 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
> }
> }
>
> +static void kvm_vcpu_put_sve(struct kvm_vcpu *vcpu)
> +{
> + u64 zcr;
> +
> + if (!vcpu_has_sve(vcpu))
> + return;
> +
> + zcr = read_sysreg_el1(SYS_ZCR);
> +
> + /*
> + * If the vCPU is in the hyp context then ZCR_EL1 is loaded
> + * with its vEL2 counterpart.
> + */
> + __vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu)) = zcr;
> +
> + /*
> + * Restore the VL that was saved when bound to the CPU, which
> + * is the maximum VL for the guest. Because the layout of the
> + * data when saving the sve state depends on the VL, we need
> + * to use a consistent (i.e., the maximum) VL. Note that this
> + * means that at guest exit ZCR_EL1 is not necessarily the
> + * same as on guest entry.
> + *
> + * ZCR_EL2 holds the guest hypervisor's VL when running a
> + * nested guest, which could be smaller than the max for the
> + * vCPU. Similar to above, we first need to switch to a VL
> + * consistent with the layout of the vCPU's SVE state. KVM
> + * support for NV implies VHE, so using the ZCR_EL1 alias is
> + * safe.
> + */
> + if (!has_vhe() || (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu)))
> + sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1,
> + SYS_ZCR_EL1);
> +}
> +
> /*
> * Write back the vcpu FPSIMD regs if they are dirty, and invalidate the
> * cpu FPSIMD regs so that they can't be spuriously reused if this vcpu
> @@ -179,38 +214,10 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
> }
A little before this context, kvm_arch_vcpu_put_fp() calls
local_irq_save(), which indicates that IRQs can be taken before this
point, which is deeply suspicious.
If IRQs are enabled, then it's possible to take an IRQ and potentially
run a softirq handler which uses kernel-mode NEON. That means
kernel_neon_begin() will try to save the live FPSIMD/SVE/SME state via
fpsimd_save_user_state(), using the live value of ZCR_ELx.LEN, which would not
be correct per the comment.
Looking at kvm_arch_vcpu_ioctl_run(), the relevant logic is:
vcpu_load(vcpu); // calls kvm_arch_vcpu_load_fp()
while (ret > 0) {
preempt_disable();
local_irq_disable();
kvm_arch_vcpu_ctxflush_fp();
ret = kvm_arm_vcpu_enter_exit(vcpu);
kvm_arch_vcpu_ctxsync_fp(vcpu);
local_irq_enable();
preempt_enable();
}
vcpu_put(vcpu); // calls kvm_arch_vcpu_put_fp()
... and the problem can occur at any point after the vCPU has run where IRQs
are enabled, i.e, between local_irq_enable() and either local_irq_disable() or
vcpu_put()'s call to kvm_arch_vcpu_put_fp().
Note that kernel_neon_begin() calls:
fpsimd_save_user_state();
...
fpsimd_flush_cpu_state();
... and fpsimd_save_user_state() will see that the SVE VL is wrong:
if (WARN_ON(sve_get_vl() != vl)) {
force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
return;
}
... pending a SIGKILL for the VMM thread without saving the vCPU's state
before unbinding the live state via fpsimd_flush_cpu_state(), which'll
set TIF_FOREIGN_FPSTATE.
AFAICT it's possible to re-enter the vCPU after that happens, whereupon
stale vCPU FPSIMD/SVE state will be restored. Upon return to userspace
the SIGKILL will be delivered, killing the VMM.
As above, it looks like that's been broken since the nVHE SVE
save/restore was introduced in commit:
8c8010d69c132273 ("KVM: arm64: Save/restore SVE state for nVHE")
The TL;DR summary is that it's not sufficient for kvm_arch_vcpu_put_fp()
to fix up ZCR_ELx. Either:
* That needs to be fixed up while IRQs are masked, e.g. by
saving/restoring the host and guest ZCR_EL1 and/or ZCR_ELx values in
kvm_arch_vcpu_ctxflush_fp() and kvm_arch_vcpu_ctxsync_fp()
* The lazy save logic in fpsimd_save_user_state() needs to handle KVM
explicitly, saving the guest's ZCR_EL1 and restoring the host's
ZCR_ELx.
I think we need to fix that before we extend this logic for SME.
Mark.
>
> if (guest_owns_fp_regs()) {
> - if (vcpu_has_sve(vcpu)) {
> - u64 zcr = read_sysreg_el1(SYS_ZCR);
> -
> - /*
> - * If the vCPU is in the hyp context then ZCR_EL1 is
> - * loaded with its vEL2 counterpart.
> - */
> - __vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu)) = zcr;
> -
> - /*
> - * Restore the VL that was saved when bound to the CPU,
> - * which is the maximum VL for the guest. Because the
> - * layout of the data when saving the sve state depends
> - * on the VL, we need to use a consistent (i.e., the
> - * maximum) VL.
> - * Note that this means that at guest exit ZCR_EL1 is
> - * not necessarily the same as on guest entry.
> - *
> - * ZCR_EL2 holds the guest hypervisor's VL when running
> - * a nested guest, which could be smaller than the
> - * max for the vCPU. Similar to above, we first need to
> - * switch to a VL consistent with the layout of the
> - * vCPU's SVE state. KVM support for NV implies VHE, so
> - * using the ZCR_EL1 alias is safe.
> - */
> - if (!has_vhe() || (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu)))
> - sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1,
> - SYS_ZCR_EL1);
> - }
> + kvm_vcpu_put_sve(vcpu);
>
> /*
> - * Flush (save and invalidate) the fpsimd/sve state so that if
> + * Flush (save and invalidate) the FP state so that if
> * the host tries to use fpsimd/sve, it's not using stale data
> * from the guest.
> *
>
> --
> 2.39.5
>
>
next prev parent reply other threads:[~2025-01-17 11:34 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-20 16:46 [PATCH RFC v3 00/27] KVM: arm64: Implement support for SME in non-protected guests Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 01/27] arm64/fpsimd: Update FA64 and ZT0 enables when loading SME state Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 02/27] arm64/fpsimd: Decide to save ZT0 and streaming mode FFR at bind time Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 03/27] arm64/fpsimd: Check enable bit for FA64 when saving EFI state Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 04/27] arm64/fpsimd: Determine maximum virtualisable SME vector length Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 05/27] KVM: arm64: Introduce non-UNDEF FGT control Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 06/27] KVM: arm64: Pull ctxt_has_ helpers to start of sysreg-sr.h Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 07/27] KVM: arm64: Convert cpacr_clear_set() to a static inline Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 08/27] KVM: arm64: Move SVE state access macros after feature test macros Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 09/27] KVM: arm64: Factor SVE guest exit handling out into a function Mark Brown
2025-01-17 11:34 ` Mark Rutland [this message]
2025-01-17 12:37 ` Mark Brown
2025-01-22 11:51 ` Marc Zyngier
2025-01-22 11:56 ` Mark Rutland
2024-12-20 16:46 ` [PATCH RFC v3 10/27] KVM: arm64: Rename SVE finalization constants to be more general Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 11/27] KVM: arm64: Document the KVM ABI for SME Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 12/27] KVM: arm64: Define internal features " Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 13/27] KVM: arm64: Rename sve_state_reg_region Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 14/27] KVM: arm64: Store vector lengths in an array Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 15/27] KVM: arm64: Implement SME vector length configuration Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 16/27] KVM: arm64: Add definitions for SME control register Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 17/27] KVM: arm64: Support TPIDR2_EL0 Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 18/27] KVM: arm64: Support SMIDR_EL1 for guests Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 19/27] KVM: arm64: Support SME priority registers Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 20/27] KVM: arm64: Provide assembly for SME state restore Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 21/27] KVM: arm64: Support Z and P registers in streaming mode Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 22/27] KVM: arm64: Expose SME specific state to userspace Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 23/27] KVM: arm64: Context switch SME state for normal guests Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 24/27] KVM: arm64: Handle SME exceptions Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 25/27] KVM: arm64: Provide interface for configuring and enabling SME for guests Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 26/27] KVM: arm64: selftests: Add SME system registers to get-reg-list Mark Brown
2024-12-20 16:46 ` [PATCH RFC v3 27/27] KVM: arm64: selftests: Add SME to set_id_regs test Mark Brown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z4pAMaEYvdLpmbg2@J2N7QTR9R3 \
--to=mark.rutland@arm.com \
--cc=Dave.Martin@arm.com \
--cc=broonie@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=corbet@lwn.net \
--cc=joey.gouly@arm.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=maz@kernel.org \
--cc=oliver.upton@linux.dev \
--cc=pbonzini@redhat.com \
--cc=shuah@kernel.org \
--cc=suzuki.poulose@arm.com \
--cc=tabba@google.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.