Re: [PATCH v5 3/4] KVM: x86: Leave user-return notifier registered on reboot/shutdown

linux-coco.lists.linux.dev archive mirror
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Chao Gao <chao.gao@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	"Kirill A. Shutemov" <kas@kernel.org>,
	kvm@vger.kernel.org,  x86@kernel.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org,  Yan Zhao <yan.y.zhao@intel.com>,
	Xiaoyao Li <xiaoyao.li@intel.com>,
	 Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Hou Wenlong <houwenlong.hwl@antgroup.com>
Subject: Re: [PATCH v5 3/4] KVM: x86: Leave user-return notifier registered on reboot/shutdown
Date: Fri, 7 Nov 2025 17:37:11 -0800	[thread overview]
Message-ID: <aQ6ex5rKZU-bEDiX@google.com> (raw)
In-Reply-To: <aQ2rTgWwqWvoqnIL@intel.com>

On Fri, Nov 07, 2025, Chao Gao wrote:
> On Thu, Oct 30, 2025 at 12:15:27PM -0700, Sean Christopherson wrote:
> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >index bb7a7515f280..c927326344b1 100644
> >--- a/arch/x86/kvm/x86.c
> >+++ b/arch/x86/kvm/x86.c
> >@@ -13086,7 +13086,21 @@ int kvm_arch_enable_virtualization_cpu(void)
> > void kvm_arch_disable_virtualization_cpu(void)
> > {
> > 	kvm_x86_call(disable_virtualization_cpu)();
> >-	drop_user_return_notifiers();
> >+
> >+	/*
> >+	 * Leave the user-return notifiers as-is when disabling virtualization
> >+	 * for reboot, i.e. when disabling via IPI function call, and instead
> >+	 * pin kvm.ko (if it's a module) to defend against use-after-free (in
> >+	 * the *very* unlikely scenario module unload is racing with reboot).
> >+	 * On a forced reboot, tasks aren't frozen before shutdown, and so KVM
> >+	 * could be actively modifying user-return MSR state when the IPI to
> >+	 * disable virtualization arrives.  Handle the extreme edge case here
> >+	 * instead of trying to account for it in the normal flows.
> >+	 */
> >+	if (in_task() || WARN_ON_ONCE(!kvm_rebooting))
> >+		drop_user_return_notifiers();
> >+	else
> >+		__module_get(THIS_MODULE);
> 
> This doesn't pin kvm-{intel,amd}.ko, right? if so, there is still a potential
> user-after-free if the CPU returns to userspace after the per-CPU
> user_return_msrs is freed on kvm-{intel,amd}.ko unloading.
> 
> I think we need to either move __module_get() into
> kvm_x86_call(disable_virtualization_cpu)() or allocate/free the per-CPU
> user_return_msrs when loading/unloading kvm.ko. e.g.,

Gah, you're right.  I considered the complications with vendor modules, but missed
the kvm_x86_vendor_exit() angle.

> >From 0269f0ee839528e8a9616738d615a096901d6185 Mon Sep 17 00:00:00 2001
> From: Chao Gao <chao.gao@intel.com>
> Date: Fri, 7 Nov 2025 00:10:28 -0800
> Subject: [PATCH] KVM: x86: Allocate/free user_return_msrs at kvm.ko
>  (un)loading time
> 
> Move user_return_msrs allocation/free from vendor modules (kvm-intel.ko and
> kvm-amd.ko) (un)loading time to kvm.ko's to make it less risky to access
> user_return_msrs in kvm.ko. Tying the lifetime of user_return_msrs to
> vendor modules makes every access to user_return_msrs prone to
> use-after-free issues as vendor modules may be unloaded at any time.
> 
> kvm_nr_uret_msrs is still reset to 0 when vendor modules are loaded to
> clear out the user return MSR list configured by the previous vendor
> module.

Hmm, the other idea would to stash the owner in kvm_x86_ops, and then do:

		__module_get(kvm_x86_ops.owner);

LOL, but that's even more flawed from a certain perspective, because
kvm_x86_ops.owner could be completely stale, especially if this races with
kvm_x86_vendor_exit().

> +static void __exit kvm_free_user_return_msrs(void)
>  {
> 	int cpu;
>  
> @@ -10044,13 +10043,11 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops
> *ops)
> 		return -ENOMEM;
> 	}
>  
> -	r = kvm_init_user_return_msrs();
> -	if (r)
> -		goto out_free_x86_emulator_cache;
> +	kvm_nr_uret_msrs = 0;

For maximum paranoia, we should zero at exit() and WARN at init().

> 	r = kvm_mmu_vendor_module_init();
> 	if (r)
> -		goto out_free_percpu;
> +		goto out_free_x86_emulator_cache;
>  
> 	kvm_caps.supported_vm_types = BIT(KVM_X86_DEFAULT_VM);
> 	kvm_caps.supported_mce_cap = MCG_CTL_P | MCG_SER_P;
> @@ -10148,8 +10145,6 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops
> *ops)
> 	kvm_x86_call(hardware_unsetup)();
>  out_mmu_exit:
> 	kvm_mmu_vendor_module_exit();
> -out_free_percpu:
> -	kvm_free_user_return_msrs();
>  out_free_x86_emulator_cache:
> 	kmem_cache_destroy(x86_emulator_cache);
> 	return r;
> @@ -10178,7 +10173,6 @@ void kvm_x86_vendor_exit(void)
>  #endif
> 	kvm_x86_call(hardware_unsetup)();
> 	kvm_mmu_vendor_module_exit();
> -	kvm_free_user_return_msrs();
> 	kmem_cache_destroy(x86_emulator_cache);
>  #ifdef CONFIG_KVM_XEN
> 	static_key_deferred_flush(&kvm_xen_enabled);
> @@ -14361,8 +14355,14 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_rmp_fault);
>  
>  static int __init kvm_x86_init(void)
>  {
> +	int r;
> +
> 	kvm_init_xstate_sizes();
>  
> +	r = kvm_init_user_return_msrs();
> +	if (r)

Rather than dynamically allocate the array of structures, we can "statically"
allocate it when the module is loaded.

I'll post this as a proper patch (with my massages) once I've tested.

Thanks much!

(and I forgot to hit "send", so this is going to show up after the patch, sorry)

next prev parent reply	other threads:[~2025-11-08  1:37 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-30 19:15 [PATCH v5 0/4] KVM: x86: User-return MSR fix+cleanups Sean Christopherson
2025-10-30 19:15 ` [PATCH v5 1/4] KVM: TDX: Explicitly set user-return MSRs that *may* be clobbered by the TDX-Module Sean Christopherson
2025-11-03  6:20   ` Yan Zhao
2025-11-04  7:06     ` Yan Zhao
2025-11-04  8:40       ` Xiaoyao Li
2025-11-04  9:31         ` Yan Zhao
2025-11-04 17:55           ` Sean Christopherson
2025-11-05  1:52             ` Yan Zhao
2025-11-05  9:16               ` Xiaoyao Li
2025-11-06  2:22                 ` Yan Zhao
2025-11-03  7:42   ` Xiaoyao Li
2025-10-30 19:15 ` [PATCH v5 2/4] KVM: x86: WARN if user-return MSR notifier is registered on exit Sean Christopherson
2025-10-30 19:15 ` [PATCH v5 3/4] KVM: x86: Leave user-return notifier registered on reboot/shutdown Sean Christopherson
2025-11-07  8:18   ` Chao Gao
2025-11-08  1:37     ` Sean Christopherson [this message]
2025-10-30 19:15 ` [PATCH v5 4/4] KVM: x86: Don't disable IRQs when unregistering user-return notifier Sean Christopherson
2025-11-04 10:34   ` Huang, Kai
2025-11-10 15:37 ` [PATCH v5 0/4] KVM: x86: User-return MSR fix+cleanups Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQ6ex5rKZU-bEDiX@google.com \
    --to=seanjc@google.com \
    --cc=chao.gao@intel.com \
    --cc=houwenlong.hwl@antgroup.com \
    --cc=kas@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=x86@kernel.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).