Re: [PATCH v5 3/4] KVM: x86: Leave user-return notifier registered on reboot/shutdown

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Chao Gao <chao.gao@intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	"Kirill A. Shutemov" <kas@kernel.org>, <kvm@vger.kernel.org>,
	<x86@kernel.org>, <linux-coco@lists.linux.dev>,
	<linux-kernel@vger.kernel.org>, Yan Zhao <yan.y.zhao@intel.com>,
	Xiaoyao Li <xiaoyao.li@intel.com>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Hou Wenlong <houwenlong.hwl@antgroup.com>
Subject: Re: [PATCH v5 3/4] KVM: x86: Leave user-return notifier registered on reboot/shutdown
Date: Fri, 7 Nov 2025 16:18:22 +0800	[thread overview]
Message-ID: <aQ2rTgWwqWvoqnIL@intel.com> (raw)
In-Reply-To: <20251030191528.3380553-4-seanjc@google.com>

On Thu, Oct 30, 2025 at 12:15:27PM -0700, Sean Christopherson wrote:
>Leave KVM's user-return notifier registered in the unlikely case that the
>notifier is registered when disabling virtualization via IPI callback in
>response to reboot/shutdown.  On reboot/shutdown, keeping the notifier
>registered is ok as far as MSR state is concerned (arguably better then
>restoring MSRs at an unknown point in time), as the callback will run
>cleanly and restore host MSRs if the CPU manages to return to userspace
>before the system goes down.
>
>The only wrinkle is that if kvm.ko module unload manages to race with
>reboot/shutdown, then leaving the notifier registered could lead to
>use-after-free due to calling into unloaded kvm.ko module code.  But such
>a race is only possible on --forced reboot/shutdown, because otherwise
>userspace tasks would be frozen before kvm_shutdown() is called, i.e. on a
>"normal" reboot/shutdown, it should be impossible for the CPU to return to
>userspace after kvm_shutdown().
>
>Furthermore, on a --forced reboot/shutdown, unregistering the user-return
>hook from IRQ context doesn't fully guard against use-after-free, because
>KVM could immediately re-register the hook, e.g. if the IRQ arrives before
>kvm_user_return_register_notifier() is called.
>
>Rather than trying to guard against the IPI in the "normal" user-return
>code, which is difficult and noisy, simply leave the user-return notifier
>registered on a reboot, and bump the kvm.ko module refcount to defend
>against a use-after-free due to kvm.ko unload racing against reboot.
>
>Alternatively, KVM could allow kvm.ko and try to drop the notifiers during
>kvm_x86_exit(), but that's also a can of worms as registration is per-CPU,
>and so KVM would need to blast an IPI, and doing so while a reboot/shutdown
>is in-progress is far risky than preventing userspace from unloading KVM.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
>---
> arch/x86/kvm/x86.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index bb7a7515f280..c927326344b1 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -13086,7 +13086,21 @@ int kvm_arch_enable_virtualization_cpu(void)
> void kvm_arch_disable_virtualization_cpu(void)
> {
> 	kvm_x86_call(disable_virtualization_cpu)();
>-	drop_user_return_notifiers();
>+
>+	/*
>+	 * Leave the user-return notifiers as-is when disabling virtualization
>+	 * for reboot, i.e. when disabling via IPI function call, and instead
>+	 * pin kvm.ko (if it's a module) to defend against use-after-free (in
>+	 * the *very* unlikely scenario module unload is racing with reboot).
>+	 * On a forced reboot, tasks aren't frozen before shutdown, and so KVM
>+	 * could be actively modifying user-return MSR state when the IPI to
>+	 * disable virtualization arrives.  Handle the extreme edge case here
>+	 * instead of trying to account for it in the normal flows.
>+	 */
>+	if (in_task() || WARN_ON_ONCE(!kvm_rebooting))
>+		drop_user_return_notifiers();
>+	else
>+		__module_get(THIS_MODULE);

This doesn't pin kvm-{intel,amd}.ko, right? if so, there is still a potential
user-after-free if the CPU returns to userspace after the per-CPU
user_return_msrs is freed on kvm-{intel,amd}.ko unloading.

I think we need to either move __module_get() into
kvm_x86_call(disable_virtualization_cpu)() or allocate/free the per-CPU
user_return_msrs when loading/unloading kvm.ko. e.g.,

From 0269f0ee839528e8a9616738d615a096901d6185 Mon Sep 17 00:00:00 2001
From: Chao Gao <chao.gao@intel.com>
Date: Fri, 7 Nov 2025 00:10:28 -0800
Subject: [PATCH] KVM: x86: Allocate/free user_return_msrs at kvm.ko
 (un)loading time

Move user_return_msrs allocation/free from vendor modules (kvm-intel.ko and
kvm-amd.ko) (un)loading time to kvm.ko's to make it less risky to access
user_return_msrs in kvm.ko. Tying the lifetime of user_return_msrs to
vendor modules makes every access to user_return_msrs prone to
use-after-free issues as vendor modules may be unloaded at any time.

kvm_nr_uret_msrs is still reset to 0 when vendor modules are loaded to
clear out the user return MSR list configured by the previous vendor
module.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/x86.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bb7a7515f280..ab411bd09567 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -575,18 +575,17 @@ static inline void kvm_async_pf_hash_reset(struct
kvm_vcpu *vcpu)
		vcpu->arch.apf.gfns[i] = ~0;
 }
 
-static int kvm_init_user_return_msrs(void)
+static int __init kvm_init_user_return_msrs(void)
 {
	user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
	if (!user_return_msrs) {
		pr_err("failed to allocate percpu user_return_msrs\n");
		return -ENOMEM;
	}
-	kvm_nr_uret_msrs = 0;
	return 0;
 }
 
-static void kvm_free_user_return_msrs(void)
+static void __exit kvm_free_user_return_msrs(void)
 {
	int cpu;
 
@@ -10044,13 +10043,11 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops
*ops)
		return -ENOMEM;
	}
 
-	r = kvm_init_user_return_msrs();
-	if (r)
-		goto out_free_x86_emulator_cache;
+	kvm_nr_uret_msrs = 0;
 
	r = kvm_mmu_vendor_module_init();
	if (r)
-		goto out_free_percpu;
+		goto out_free_x86_emulator_cache;
 
	kvm_caps.supported_vm_types = BIT(KVM_X86_DEFAULT_VM);
	kvm_caps.supported_mce_cap = MCG_CTL_P | MCG_SER_P;
@@ -10148,8 +10145,6 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops
*ops)
	kvm_x86_call(hardware_unsetup)();
 out_mmu_exit:
	kvm_mmu_vendor_module_exit();
-out_free_percpu:
-	kvm_free_user_return_msrs();
 out_free_x86_emulator_cache:
	kmem_cache_destroy(x86_emulator_cache);
	return r;
@@ -10178,7 +10173,6 @@ void kvm_x86_vendor_exit(void)
 #endif
	kvm_x86_call(hardware_unsetup)();
	kvm_mmu_vendor_module_exit();
-	kvm_free_user_return_msrs();
	kmem_cache_destroy(x86_emulator_cache);
 #ifdef CONFIG_KVM_XEN
	static_key_deferred_flush(&kvm_xen_enabled);
@@ -14361,8 +14355,14 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_rmp_fault);
 
 static int __init kvm_x86_init(void)
 {
+	int r;
+
	kvm_init_xstate_sizes();
 
+	r = kvm_init_user_return_msrs();
+	if (r)
+		return r;
+
	kvm_mmu_x86_module_init();
	mitigate_smt_rsb &= boot_cpu_has_bug(X86_BUG_SMT_RSB) &&
cpu_smt_possible();
	return 0;
@@ -14371,6 +14371,7 @@ module_init(kvm_x86_init);
 
 static void __exit kvm_x86_exit(void)
 {
+	kvm_free_user_return_msrs();
	WARN_ON_ONCE(static_branch_unlikely(&kvm_has_noapic_vcpu));
 }
 module_exit(kvm_x86_exit);
-- 
2.47.3

next prev parent reply	other threads:[~2025-11-07  8:18 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-30 19:15 [PATCH v5 0/4] KVM: x86: User-return MSR fix+cleanups Sean Christopherson
2025-10-30 19:15 ` [PATCH v5 1/4] KVM: TDX: Explicitly set user-return MSRs that *may* be clobbered by the TDX-Module Sean Christopherson
2025-11-03  6:20   ` Yan Zhao
2025-11-04  7:06     ` Yan Zhao
2025-11-04  8:40       ` Xiaoyao Li
2025-11-04  9:31         ` Yan Zhao
2025-11-04 17:55           ` Sean Christopherson
2025-11-05  1:52             ` Yan Zhao
2025-11-05  9:16               ` Xiaoyao Li
2025-11-06  2:22                 ` Yan Zhao
2025-11-03  7:42   ` Xiaoyao Li
2025-10-30 19:15 ` [PATCH v5 2/4] KVM: x86: WARN if user-return MSR notifier is registered on exit Sean Christopherson
2025-10-30 19:15 ` [PATCH v5 3/4] KVM: x86: Leave user-return notifier registered on reboot/shutdown Sean Christopherson
2025-11-07  8:18   ` Chao Gao [this message]
2025-11-08  1:37     ` Sean Christopherson
2025-10-30 19:15 ` [PATCH v5 4/4] KVM: x86: Don't disable IRQs when unregistering user-return notifier Sean Christopherson
2025-11-04 10:34   ` Huang, Kai
2025-11-10 15:37 ` [PATCH v5 0/4] KVM: x86: User-return MSR fix+cleanups Sean Christopherson

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:bb7a7515f28 dfblob:ab411bd0956 )
 OR (
bs:"KVM: x86: Allocate/free user_return_msrs at kvm.ko" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQ2rTgWwqWvoqnIL@intel.com \
    --to=chao.gao@intel.com \
    --cc=houwenlong.hwl@antgroup.com \
    --cc=kas@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=seanjc@google.com \
    --cc=x86@kernel.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox