public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH RESEND 0/2] KVM: Prevent module exit until all VMs are freed
@ 2022-03-03 18:33 David Matlack
  2022-03-03 18:33 ` [PATCH RESEND 1/2] " David Matlack
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: David Matlack @ 2022-03-03 18:33 UTC (permalink / raw)
  To: pbonzini
  Cc: David Matlack, kvm, Marcelo Tosatti, Gleb Natapov, Rik van Riel,
	seanjc, bgardon

[Resending with --cc-cover to fix CCs.]

This series fixes a long-standing theoretical bug where the KVM module
can be unloaded while there are still references to a struct kvm. This
can be reproduced with a simple patch [1] to run some delayed work 10
seconds after a VM file descriptor is released.

This bug was originally fixed by Ben Gardon <bgardon@google.com> in
Google's kernel due to a race with an internal kernel daemon.

KVM's async_pf code looks susceptible to this race since its inception,
but clearly no one has noticed. Upcoming changes to the TDP MMU will
move zapping to asynchronous workers which is much more likely to hit
this issue. Fix it now to close the gap in async_pf and prepare for the
TDP MMU zapping changes.

While here I noticed some further cleanups that could be done in the
async_pf code. It seems unnecessary for the async_pf code to grab a
reference via kvm_get_kvm() because there's code to synchronously drain
its queue of work in kvm_destroy_vm() -> ... ->
kvm_clear_async_pf_completion_queue() (at least on x86). This is
actually dead code because kvm_destroy_vm() will never be called while
there are references to kvm.users_count (which the async_pf callbacks
themselves hold). It seems we could either drop the reference grabbing
in async_pf.c or drop the call to kvm_clear_async_pf_completion_queue().

These patches apply on the latest kvm/queue commit b13a3befc815 ("KVM:
selftests: Add test to populate a VM with the max possible guest mem")
after reverting commit c9bdd0aa6800 ("KVM: allow struct kvm to outlive
the file descriptors").

Cc: kvm@vger.kernel.org (open list:KERNEL VIRTUAL MACHINE (KVM))
Cc: Marcelo Tosatti <mtosatti@redhat.com> (blamed_fixes:1/1=100%)
Cc: Gleb Natapov <gleb@redhat.com> (blamed_fixes:1/1=100%)
Cc: Rik van Riel <riel@redhat.com> (blamed_fixes:1/1=100%)
Cc: seanjc@google.com
Cc: bgardon@google.com

[1] To repro: Apply the following patch, run a KVM selftest, and then
unload the KVM module within 10 seconds of the selftest finishing. The
kernel will panic. With the fix applied, module unloading will fail
until the final struct kvm reference is put.

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9536ffa0473b..db827cf6a7a7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -771,6 +771,8 @@ struct kvm {
        struct notifier_block pm_notifier;
 #endif
        char stats_id[KVM_STATS_NAME_SIZE];
+
+       struct delayed_work run_after_vm_release_work;
 };

 #define kvm_err(fmt, ...) \
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 64eb99444688..35ae6d32dae5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1258,12 +1258,25 @@ void kvm_put_kvm_no_destroy(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(kvm_put_kvm_no_destroy);

+static void run_after_vm_release(struct work_struct *work)
+{
+       struct delayed_work *dwork = to_delayed_work(work);
+       struct kvm *kvm = container_of(dwork, struct kvm, run_after_vm_release_work);
+
+       pr_info("I'm still alive!\n");
+       kvm_put_kvm(kvm);
+}
+
 static int kvm_vm_release(struct inode *inode, struct file *filp)
 {
        struct kvm *kvm = filp->private_data;

        kvm_irqfd_release(kvm);

+       kvm_get_kvm(kvm);
+       INIT_DELAYED_WORK(&kvm->run_after_vm_release_work, run_after_vm_release);
+       schedule_delayed_work(&kvm->run_after_vm_release_work, 10 * HZ);
+
        kvm_put_kvm(kvm);
        return 0;
 }


David Matlack (2):
  KVM: Prevent module exit until all VMs are freed
  Revert "KVM: set owner of cpu and vm file operations"

 virt/kvm/kvm_main.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)


base-commit: b13a3befc815eae574d87e6249f973dfbb6ad6cd
prerequisite-patch-id: 38f66d60319bf0bc9bf49f91f0f9119e5441629b
prerequisite-patch-id: 51aa921d68ea649d436ea68e1b8f4aabc3805156
-- 
2.35.1.574.g5d30c73bfb-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH RESEND 1/2] KVM: Prevent module exit until all VMs are freed
  2022-03-03 18:33 [PATCH RESEND 0/2] KVM: Prevent module exit until all VMs are freed David Matlack
@ 2022-03-03 18:33 ` David Matlack
  2022-03-08 21:40   ` Sean Christopherson
  2022-03-15 15:43   ` Murilo Opsfelder Araújo
  2022-03-03 18:33 ` [PATCH RESEND 2/2] Revert "KVM: set owner of cpu and vm file operations" David Matlack
  2022-03-15 20:49 ` [PATCH RESEND 0/2] KVM: Prevent module exit until all VMs are freed Paolo Bonzini
  2 siblings, 2 replies; 12+ messages in thread
From: David Matlack @ 2022-03-03 18:33 UTC (permalink / raw)
  To: pbonzini
  Cc: David Matlack, kvm, Marcelo Tosatti, Gleb Natapov, Rik van Riel,
	seanjc, bgardon, stable

Tie the lifetime the KVM module to the lifetime of each VM via
kvm.users_count. This way anything that grabs a reference to the VM via
kvm_get_kvm() cannot accidentally outlive the KVM module.

Prior to this commit, the lifetime of the KVM module was tied to the
lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
file descriptors by their respective file_operations "owner" field.
This approach is insufficient because references grabbed via
kvm_get_kvm() do not prevent closing any of the aforementioned file
descriptors.

This fixes a long standing theoretical bug in KVM that at least affects
async page faults. kvm_setup_async_pf() grabs a reference via
kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
prevents the VM file descriptor from being closed and the KVM module
from being unloaded before this callback runs.

Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Suggested-by: Ben Gardon <bgardon@google.com>
[ Based on a patch from Ben implemented for Google's kernel. ]
Signed-off-by: David Matlack <dmatlack@google.com>
---
 virt/kvm/kvm_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 35ae6d32dae5..b59f0a29dbd5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
 
 static const struct file_operations stat_fops_per_vm;
 
+static struct file_operations kvm_chardev_ops;
+
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
 			   unsigned long arg);
 #ifdef CONFIG_KVM_COMPAT
@@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	preempt_notifier_inc();
 	kvm_init_pm_notifier(kvm);
 
+	if (!try_module_get(kvm_chardev_ops.owner)) {
+		r = -ENODEV;
+		goto out_err;
+	}
+
 	return kvm;
 
 out_err:
@@ -1220,6 +1227,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	preempt_notifier_dec();
 	hardware_disable_all();
 	mmdrop(mm);
+	module_put(kvm_chardev_ops.owner);
 }
 
 void kvm_get_kvm(struct kvm *kvm)

base-commit: b13a3befc815eae574d87e6249f973dfbb6ad6cd
prerequisite-patch-id: 38f66d60319bf0bc9bf49f91f0f9119e5441629b
prerequisite-patch-id: 51aa921d68ea649d436ea68e1b8f4aabc3805156
-- 
2.35.1.616.g0bdcbb4464-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH RESEND 2/2] Revert "KVM: set owner of cpu and vm file operations"
  2022-03-03 18:33 [PATCH RESEND 0/2] KVM: Prevent module exit until all VMs are freed David Matlack
  2022-03-03 18:33 ` [PATCH RESEND 1/2] " David Matlack
@ 2022-03-03 18:33 ` David Matlack
  2022-03-08 21:40   ` Sean Christopherson
  2022-03-15 20:49 ` [PATCH RESEND 0/2] KVM: Prevent module exit until all VMs are freed Paolo Bonzini
  2 siblings, 1 reply; 12+ messages in thread
From: David Matlack @ 2022-03-03 18:33 UTC (permalink / raw)
  To: pbonzini
  Cc: David Matlack, kvm, Marcelo Tosatti, Gleb Natapov, Rik van Riel,
	seanjc, bgardon

This reverts commit 3d3aab1b973b01bd2a1aa46307e94a1380b1d802.

Now that the KVM module's lifetime is tied to kvm.users_count, there is
no need to also tie it's lifetime to the lifetime of the VM and vCPU
file descriptors.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 virt/kvm/kvm_main.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b59f0a29dbd5..73b8f70e16cc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3684,7 +3684,7 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-static struct file_operations kvm_vcpu_fops = {
+static const struct file_operations kvm_vcpu_fops = {
 	.release        = kvm_vcpu_release,
 	.unlocked_ioctl = kvm_vcpu_ioctl,
 	.mmap           = kvm_vcpu_mmap,
@@ -4735,7 +4735,7 @@ static long kvm_vm_compat_ioctl(struct file *filp,
 }
 #endif
 
-static struct file_operations kvm_vm_fops = {
+static const struct file_operations kvm_vm_fops = {
 	.release        = kvm_vm_release,
 	.unlocked_ioctl = kvm_vm_ioctl,
 	.llseek		= noop_llseek,
@@ -5744,8 +5744,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 		goto out_free_5;
 
 	kvm_chardev_ops.owner = module;
-	kvm_vm_fops.owner = module;
-	kvm_vcpu_fops.owner = module;
 
 	r = misc_register(&kvm_dev);
 	if (r) {
-- 
2.35.1.616.g0bdcbb4464-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH RESEND 1/2] KVM: Prevent module exit until all VMs are freed
  2022-03-03 18:33 ` [PATCH RESEND 1/2] " David Matlack
@ 2022-03-08 21:40   ` Sean Christopherson
  2022-03-08 22:28     ` David Matlack
  2022-03-08 23:43     ` David Matlack
  2022-03-15 15:43   ` Murilo Opsfelder Araújo
  1 sibling, 2 replies; 12+ messages in thread
From: Sean Christopherson @ 2022-03-08 21:40 UTC (permalink / raw)
  To: David Matlack
  Cc: pbonzini, kvm, Marcelo Tosatti, Gleb Natapov, Rik van Riel,
	bgardon, stable

On Thu, Mar 03, 2022, David Matlack wrote:
> Tie the lifetime the KVM module to the lifetime of each VM via
> kvm.users_count. This way anything that grabs a reference to the VM via
> kvm_get_kvm() cannot accidentally outlive the KVM module.
> 
> Prior to this commit, the lifetime of the KVM module was tied to the
> lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
> file descriptors by their respective file_operations "owner" field.
> This approach is insufficient because references grabbed via
> kvm_get_kvm() do not prevent closing any of the aforementioned file
> descriptors.
> 
> This fixes a long standing theoretical bug in KVM that at least affects
> async page faults. kvm_setup_async_pf() grabs a reference via
> kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
> prevents the VM file descriptor from being closed and the KVM module
> from being unloaded before this callback runs.
> 
> Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")

And (or)

  Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")

because the above is x86-centric, at a glance PPC and maybe s390 have issues
beyond async #PF.

> Cc: stable@vger.kernel.org
> Suggested-by: Ben Gardon <bgardon@google.com>
> [ Based on a patch from Ben implemented for Google's kernel. ]
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  virt/kvm/kvm_main.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 35ae6d32dae5..b59f0a29dbd5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
>  
>  static const struct file_operations stat_fops_per_vm;
>  
> +static struct file_operations kvm_chardev_ops;
> +
>  static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
>  			   unsigned long arg);
>  #ifdef CONFIG_KVM_COMPAT
> @@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  	preempt_notifier_inc();
>  	kvm_init_pm_notifier(kvm);
>  
> +	if (!try_module_get(kvm_chardev_ops.owner)) {

The "try" aspect is unnecessary.  Stealing from Paolo's version, 

	/* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
	__module_get(kvm_chardev_ops.owner);

> +		r = -ENODEV;
> +		goto out_err;
> +	}
> +
>  	return kvm;
>  
>  out_err:
> @@ -1220,6 +1227,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
>  	preempt_notifier_dec();
>  	hardware_disable_all();
>  	mmdrop(mm);
> +	module_put(kvm_chardev_ops.owner);
>  }
>  
>  void kvm_get_kvm(struct kvm *kvm)
> 
> base-commit: b13a3befc815eae574d87e6249f973dfbb6ad6cd
> prerequisite-patch-id: 38f66d60319bf0bc9bf49f91f0f9119e5441629b
> prerequisite-patch-id: 51aa921d68ea649d436ea68e1b8f4aabc3805156
> -- 
> 2.35.1.616.g0bdcbb4464-goog
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RESEND 2/2] Revert "KVM: set owner of cpu and vm file operations"
  2022-03-03 18:33 ` [PATCH RESEND 2/2] Revert "KVM: set owner of cpu and vm file operations" David Matlack
@ 2022-03-08 21:40   ` Sean Christopherson
  0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2022-03-08 21:40 UTC (permalink / raw)
  To: David Matlack
  Cc: pbonzini, kvm, Marcelo Tosatti, Gleb Natapov, Rik van Riel,
	bgardon

On Thu, Mar 03, 2022, David Matlack wrote:
> This reverts commit 3d3aab1b973b01bd2a1aa46307e94a1380b1d802.
> 
> Now that the KVM module's lifetime is tied to kvm.users_count, there is
> no need to also tie it's lifetime to the lifetime of the VM and vCPU
> file descriptors.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RESEND 1/2] KVM: Prevent module exit until all VMs are freed
  2022-03-08 21:40   ` Sean Christopherson
@ 2022-03-08 22:28     ` David Matlack
  2022-03-08 23:08       ` Sean Christopherson
  2022-03-08 23:43     ` David Matlack
  1 sibling, 1 reply; 12+ messages in thread
From: David Matlack @ 2022-03-08 22:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm list, Marcelo Tosatti, Gleb Natapov,
	Rik van Riel, Ben Gardon, stable

On Tue, Mar 8, 2022 at 1:40 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Mar 03, 2022, David Matlack wrote:
> > Tie the lifetime the KVM module to the lifetime of each VM via
> > kvm.users_count. This way anything that grabs a reference to the VM via
> > kvm_get_kvm() cannot accidentally outlive the KVM module.
> >
> > Prior to this commit, the lifetime of the KVM module was tied to the
> > lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
> > file descriptors by their respective file_operations "owner" field.
> > This approach is insufficient because references grabbed via
> > kvm_get_kvm() do not prevent closing any of the aforementioned file
> > descriptors.
> >
> > This fixes a long standing theoretical bug in KVM that at least affects
> > async page faults. kvm_setup_async_pf() grabs a reference via
> > kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
> > prevents the VM file descriptor from being closed and the KVM module
> > from being unloaded before this callback runs.
> >
> > Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
>
> And (or)
>
>   Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")
>
> because the above is x86-centric, at a glance PPC and maybe s390 have issues
> beyond async #PF.
>
> > Cc: stable@vger.kernel.org
> > Suggested-by: Ben Gardon <bgardon@google.com>
> > [ Based on a patch from Ben implemented for Google's kernel. ]
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
> >  virt/kvm/kvm_main.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 35ae6d32dae5..b59f0a29dbd5 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
> >
> >  static const struct file_operations stat_fops_per_vm;
> >
> > +static struct file_operations kvm_chardev_ops;
> > +
> >  static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
> >                          unsigned long arg);
> >  #ifdef CONFIG_KVM_COMPAT
> > @@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
> >       preempt_notifier_inc();
> >       kvm_init_pm_notifier(kvm);
> >
> > +     if (!try_module_get(kvm_chardev_ops.owner)) {
>
> The "try" aspect is unnecessary.  Stealing from Paolo's version,
>
>         /* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
>         __module_get(kvm_chardev_ops.owner);

Right, I did see that and agree we're guaranteed the KVM module has a
reference at this point. But the KVM module might be in state
MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait"), which
try_module_get() checks.

>
> > +             r = -ENODEV;
> > +             goto out_err;
> > +     }
> > +
> >       return kvm;
> >
> >  out_err:
> > @@ -1220,6 +1227,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
> >       preempt_notifier_dec();
> >       hardware_disable_all();
> >       mmdrop(mm);
> > +     module_put(kvm_chardev_ops.owner);
> >  }
> >
> >  void kvm_get_kvm(struct kvm *kvm)
> >
> > base-commit: b13a3befc815eae574d87e6249f973dfbb6ad6cd
> > prerequisite-patch-id: 38f66d60319bf0bc9bf49f91f0f9119e5441629b
> > prerequisite-patch-id: 51aa921d68ea649d436ea68e1b8f4aabc3805156
> > --
> > 2.35.1.616.g0bdcbb4464-goog
> >

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RESEND 1/2] KVM: Prevent module exit until all VMs are freed
  2022-03-08 22:28     ` David Matlack
@ 2022-03-08 23:08       ` Sean Christopherson
  2022-03-08 23:44         ` David Matlack
  0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2022-03-08 23:08 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, kvm list, Marcelo Tosatti, Gleb Natapov,
	Rik van Riel, Ben Gardon, stable

On Tue, Mar 08, 2022, David Matlack wrote:
> On Tue, Mar 8, 2022 at 1:40 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Mar 03, 2022, David Matlack wrote:
> > > Tie the lifetime the KVM module to the lifetime of each VM via
> > > kvm.users_count. This way anything that grabs a reference to the VM via
> > > kvm_get_kvm() cannot accidentally outlive the KVM module.
> > >
> > > Prior to this commit, the lifetime of the KVM module was tied to the
> > > lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
> > > file descriptors by their respective file_operations "owner" field.
> > > This approach is insufficient because references grabbed via
> > > kvm_get_kvm() do not prevent closing any of the aforementioned file
> > > descriptors.
> > >
> > > This fixes a long standing theoretical bug in KVM that at least affects
> > > async page faults. kvm_setup_async_pf() grabs a reference via
> > > kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
> > > prevents the VM file descriptor from being closed and the KVM module
> > > from being unloaded before this callback runs.
> > >
> > > Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
> >
> > And (or)
> >
> >   Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")
> >
> > because the above is x86-centric, at a glance PPC and maybe s390 have issues
> > beyond async #PF.
> >
> > > Cc: stable@vger.kernel.org
> > > Suggested-by: Ben Gardon <bgardon@google.com>
> > > [ Based on a patch from Ben implemented for Google's kernel. ]
> > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > ---
> > >  virt/kvm/kvm_main.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 35ae6d32dae5..b59f0a29dbd5 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
> > >
> > >  static const struct file_operations stat_fops_per_vm;
> > >
> > > +static struct file_operations kvm_chardev_ops;
> > > +
> > >  static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
> > >                          unsigned long arg);
> > >  #ifdef CONFIG_KVM_COMPAT
> > > @@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
> > >       preempt_notifier_inc();
> > >       kvm_init_pm_notifier(kvm);
> > >
> > > +     if (!try_module_get(kvm_chardev_ops.owner)) {
> >
> > The "try" aspect is unnecessary.  Stealing from Paolo's version,
> >
> >         /* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
> >         __module_get(kvm_chardev_ops.owner);
> 
> Right, I did see that and agree we're guaranteed the KVM module has a
> reference at this point. But the KVM module might be in state
> MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait"), which
> try_module_get() checks.

Ah, can you throw that in as a comment?  Doesn't have to be much, just enough of
a breadcrumb to connect the dots and to prevent us from "optimizing" this to
__module_get() in the future.

	/* Use the "try" variant to play nice with e.g. "rmmod --wait". */

With a comment,

Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RESEND 1/2] KVM: Prevent module exit until all VMs are freed
  2022-03-08 21:40   ` Sean Christopherson
  2022-03-08 22:28     ` David Matlack
@ 2022-03-08 23:43     ` David Matlack
  1 sibling, 0 replies; 12+ messages in thread
From: David Matlack @ 2022-03-08 23:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm list, Marcelo Tosatti, Gleb Natapov,
	Rik van Riel, Ben Gardon, stable

On Tue, Mar 8, 2022 at 1:40 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Mar 03, 2022, David Matlack wrote:
> > Tie the lifetime the KVM module to the lifetime of each VM via
> > kvm.users_count. This way anything that grabs a reference to the VM via
> > kvm_get_kvm() cannot accidentally outlive the KVM module.
> >
> > Prior to this commit, the lifetime of the KVM module was tied to the
> > lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
> > file descriptors by their respective file_operations "owner" field.
> > This approach is insufficient because references grabbed via
> > kvm_get_kvm() do not prevent closing any of the aforementioned file
> > descriptors.
> >
> > This fixes a long standing theoretical bug in KVM that at least affects
> > async page faults. kvm_setup_async_pf() grabs a reference via
> > kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
> > prevents the VM file descriptor from being closed and the KVM module
> > from being unloaded before this callback runs.
> >
> > Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
>
> And (or)
>
>   Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")
>
> because the above is x86-centric, at a glance PPC and maybe s390 have issues
> beyond async #PF.

SGTM. It's a moot point in terms of stable inclusion since
af585b921e5d was first added in v2.6.38. But for anyone doing their
own backporting, 3d3aab1b973b makes it a bit more obvious this is a
generic problem even though it's not the commit that introduces the
bug.

>
> > Cc: stable@vger.kernel.org
> > Suggested-by: Ben Gardon <bgardon@google.com>
> > [ Based on a patch from Ben implemented for Google's kernel. ]
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
> >  virt/kvm/kvm_main.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 35ae6d32dae5..b59f0a29dbd5 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
> >
> >  static const struct file_operations stat_fops_per_vm;
> >
> > +static struct file_operations kvm_chardev_ops;
> > +
> >  static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
> >                          unsigned long arg);
> >  #ifdef CONFIG_KVM_COMPAT
> > @@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
> >       preempt_notifier_inc();
> >       kvm_init_pm_notifier(kvm);
> >
> > +     if (!try_module_get(kvm_chardev_ops.owner)) {
>
> The "try" aspect is unnecessary.  Stealing from Paolo's version,
>
>         /* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
>         __module_get(kvm_chardev_ops.owner);
>
> > +             r = -ENODEV;
> > +             goto out_err;
> > +     }
> > +
> >       return kvm;
> >
> >  out_err:
> > @@ -1220,6 +1227,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
> >       preempt_notifier_dec();
> >       hardware_disable_all();
> >       mmdrop(mm);
> > +     module_put(kvm_chardev_ops.owner);
> >  }
> >
> >  void kvm_get_kvm(struct kvm *kvm)
> >
> > base-commit: b13a3befc815eae574d87e6249f973dfbb6ad6cd
> > prerequisite-patch-id: 38f66d60319bf0bc9bf49f91f0f9119e5441629b
> > prerequisite-patch-id: 51aa921d68ea649d436ea68e1b8f4aabc3805156
> > --
> > 2.35.1.616.g0bdcbb4464-goog
> >

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RESEND 1/2] KVM: Prevent module exit until all VMs are freed
  2022-03-08 23:08       ` Sean Christopherson
@ 2022-03-08 23:44         ` David Matlack
  0 siblings, 0 replies; 12+ messages in thread
From: David Matlack @ 2022-03-08 23:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm list, Marcelo Tosatti, Gleb Natapov,
	Rik van Riel, Ben Gardon, stable

On Tue, Mar 8, 2022 at 3:09 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Mar 08, 2022, David Matlack wrote:
> > On Tue, Mar 8, 2022 at 1:40 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Thu, Mar 03, 2022, David Matlack wrote:
> > > > Tie the lifetime the KVM module to the lifetime of each VM via
> > > > kvm.users_count. This way anything that grabs a reference to the VM via
> > > > kvm_get_kvm() cannot accidentally outlive the KVM module.
> > > >
> > > > Prior to this commit, the lifetime of the KVM module was tied to the
> > > > lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
> > > > file descriptors by their respective file_operations "owner" field.
> > > > This approach is insufficient because references grabbed via
> > > > kvm_get_kvm() do not prevent closing any of the aforementioned file
> > > > descriptors.
> > > >
> > > > This fixes a long standing theoretical bug in KVM that at least affects
> > > > async page faults. kvm_setup_async_pf() grabs a reference via
> > > > kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
> > > > prevents the VM file descriptor from being closed and the KVM module
> > > > from being unloaded before this callback runs.
> > > >
> > > > Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
> > >
> > > And (or)
> > >
> > >   Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")
> > >
> > > because the above is x86-centric, at a glance PPC and maybe s390 have issues
> > > beyond async #PF.
> > >
> > > > Cc: stable@vger.kernel.org
> > > > Suggested-by: Ben Gardon <bgardon@google.com>
> > > > [ Based on a patch from Ben implemented for Google's kernel. ]
> > > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > > ---
> > > >  virt/kvm/kvm_main.c | 8 ++++++++
> > > >  1 file changed, 8 insertions(+)
> > > >
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index 35ae6d32dae5..b59f0a29dbd5 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
> > > >
> > > >  static const struct file_operations stat_fops_per_vm;
> > > >
> > > > +static struct file_operations kvm_chardev_ops;
> > > > +
> > > >  static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
> > > >                          unsigned long arg);
> > > >  #ifdef CONFIG_KVM_COMPAT
> > > > @@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
> > > >       preempt_notifier_inc();
> > > >       kvm_init_pm_notifier(kvm);
> > > >
> > > > +     if (!try_module_get(kvm_chardev_ops.owner)) {
> > >
> > > The "try" aspect is unnecessary.  Stealing from Paolo's version,
> > >
> > >         /* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
> > >         __module_get(kvm_chardev_ops.owner);
> >
> > Right, I did see that and agree we're guaranteed the KVM module has a
> > reference at this point. But the KVM module might be in state
> > MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait"), which
> > try_module_get() checks.
>
> Ah, can you throw that in as a comment?  Doesn't have to be much, just enough of
> a breadcrumb to connect the dots and to prevent us from "optimizing" this to
> __module_get() in the future.
>
>         /* Use the "try" variant to play nice with e.g. "rmmod --wait". */

Yeah. I should have included this in the first place (or at least a
blurb in the commit message).

>
> With a comment,
>
> Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RESEND 1/2] KVM: Prevent module exit until all VMs are freed
  2022-03-03 18:33 ` [PATCH RESEND 1/2] " David Matlack
  2022-03-08 21:40   ` Sean Christopherson
@ 2022-03-15 15:43   ` Murilo Opsfelder Araújo
  2022-03-15 20:45     ` Paolo Bonzini
  1 sibling, 1 reply; 12+ messages in thread
From: Murilo Opsfelder Araújo @ 2022-03-15 15:43 UTC (permalink / raw)
  To: David Matlack, pbonzini
  Cc: kvm, Marcelo Tosatti, Gleb Natapov, Rik van Riel, seanjc, bgardon,
	stable, farosas

Hi, David.

Some comments below.

On 3/3/22 15:33, David Matlack wrote:
> Tie the lifetime the KVM module to the lifetime of each VM via
> kvm.users_count. This way anything that grabs a reference to the VM via
> kvm_get_kvm() cannot accidentally outlive the KVM module.
> 
> Prior to this commit, the lifetime of the KVM module was tied to the
> lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
> file descriptors by their respective file_operations "owner" field.
> This approach is insufficient because references grabbed via
> kvm_get_kvm() do not prevent closing any of the aforementioned file
> descriptors.
> 
> This fixes a long standing theoretical bug in KVM that at least affects
> async page faults. kvm_setup_async_pf() grabs a reference via
> kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
> prevents the VM file descriptor from being closed and the KVM module
> from being unloaded before this callback runs.
> 
> Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
> Cc: stable@vger.kernel.org
> Suggested-by: Ben Gardon <bgardon@google.com>
> [ Based on a patch from Ben implemented for Google's kernel. ]
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>   virt/kvm/kvm_main.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 35ae6d32dae5..b59f0a29dbd5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
>   
>   static const struct file_operations stat_fops_per_vm;
>   
> +static struct file_operations kvm_chardev_ops;
> +
>   static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
>   			   unsigned long arg);
>   #ifdef CONFIG_KVM_COMPAT
> @@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
>   	preempt_notifier_inc();
>   	kvm_init_pm_notifier(kvm);
>   
> +	if (!try_module_get(kvm_chardev_ops.owner)) {
> +		r = -ENODEV;
> +		goto out_err;
> +	}
> +

Doesn't this problem also affects the other functions called from
kvm_dev_ioctl()?

Is it possible that the module is removed while other ioctl's are still running,
e.g. KVM_GET_API_VERSION and KVM_CHECK_EXTENSION, even though they don't use
struct kvm?

I wonder if this try_module_get() (along with module_put() in the out path of
the function) shouldn't be placed in the upper function kvm_dev_ioctl() so it
would cover all the other ioctl's.

>   	return kvm;
>   
>   out_err:
> @@ -1220,6 +1227,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
>   	preempt_notifier_dec();
>   	hardware_disable_all();
>   	mmdrop(mm);
> +	module_put(kvm_chardev_ops.owner);
>   }
>   
>   void kvm_get_kvm(struct kvm *kvm)
> 
> base-commit: b13a3befc815eae574d87e6249f973dfbb6ad6cd
> prerequisite-patch-id: 38f66d60319bf0bc9bf49f91f0f9119e5441629b
> prerequisite-patch-id: 51aa921d68ea649d436ea68e1b8f4aabc3805156

-- 
Murilo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RESEND 1/2] KVM: Prevent module exit until all VMs are freed
  2022-03-15 15:43   ` Murilo Opsfelder Araújo
@ 2022-03-15 20:45     ` Paolo Bonzini
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2022-03-15 20:45 UTC (permalink / raw)
  To: muriloo, David Matlack
  Cc: kvm, Marcelo Tosatti, Gleb Natapov, Rik van Riel, seanjc, bgardon,
	stable, farosas


On 3/15/22 16:43, Murilo Opsfelder Araújo wrote:
>>
>> +    if (!try_module_get(kvm_chardev_ops.owner)) {
>> +        r = -ENODEV;
>> +        goto out_err;
>> +    }
>> +
> 
> Doesn't this problem also affects the other functions called from
> kvm_dev_ioctl()?
> 
> Is it possible that the module is removed while other ioctl's are
> still running, e.g. KVM_GET_API_VERSION and KVM_CHECK_EXTENSION, even
> though they don't use struct kvm?

No, because opening /dev/kvm also adds a reference to the module.  The 
problem is that create_vm creates another source of references to the 
module that can survive after /dev/kvm is closed.

Paolo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RESEND 0/2] KVM: Prevent module exit until all VMs are freed
  2022-03-03 18:33 [PATCH RESEND 0/2] KVM: Prevent module exit until all VMs are freed David Matlack
  2022-03-03 18:33 ` [PATCH RESEND 1/2] " David Matlack
  2022-03-03 18:33 ` [PATCH RESEND 2/2] Revert "KVM: set owner of cpu and vm file operations" David Matlack
@ 2022-03-15 20:49 ` Paolo Bonzini
  2 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2022-03-15 20:49 UTC (permalink / raw)
  To: David Matlack
  Cc: kvm, Marcelo Tosatti, Gleb Natapov, Rik van Riel, seanjc, bgardon

On 3/3/22 19:33, David Matlack wrote:
> [Resending with --cc-cover to fix CCs.]
> 
> This series fixes a long-standing theoretical bug where the KVM module
> can be unloaded while there are still references to a struct kvm. This
> can be reproduced with a simple patch [1] to run some delayed work 10
> seconds after a VM file descriptor is released.
> 
> This bug was originally fixed by Ben Gardon <bgardon@google.com> in
> Google's kernel due to a race with an internal kernel daemon.
> 
> KVM's async_pf code looks susceptible to this race since its inception,
> but clearly no one has noticed. Upcoming changes to the TDP MMU will
> move zapping to asynchronous workers which is much more likely to hit
> this issue. Fix it now to close the gap in async_pf and prepare for the
> TDP MMU zapping changes.
> 
> While here I noticed some further cleanups that could be done in the
> async_pf code. It seems unnecessary for the async_pf code to grab a
> reference via kvm_get_kvm() because there's code to synchronously drain
> its queue of work in kvm_destroy_vm() -> ... ->
> kvm_clear_async_pf_completion_queue() (at least on x86). This is
> actually dead code because kvm_destroy_vm() will never be called while
> there are references to kvm.users_count (which the async_pf callbacks
> themselves hold). It seems we could either drop the reference grabbing
> in async_pf.c or drop the call to kvm_clear_async_pf_completion_queue().

Adding a WARN makes sense, but then you'd probably keep the call anyway
just to avoid a use-after-free in case the WARN triggers.

Queued with a note on patch 1:

+	/*
+	 * When the fd passed to this ioctl() is opened it pins the module,
+	 * but try_module_get() also prevents getting a reference if the module
+	 * is in MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait").
+	 */

Thanks,

Paolo
> These patches apply on the latest kvm/queue commit b13a3befc815 ("KVM:
> selftests: Add test to populate a VM with the max possible guest mem")
> after reverting commit c9bdd0aa6800 ("KVM: allow struct kvm to outlive
> the file descriptors").


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-03-15 20:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-03-03 18:33 [PATCH RESEND 0/2] KVM: Prevent module exit until all VMs are freed David Matlack
2022-03-03 18:33 ` [PATCH RESEND 1/2] " David Matlack
2022-03-08 21:40   ` Sean Christopherson
2022-03-08 22:28     ` David Matlack
2022-03-08 23:08       ` Sean Christopherson
2022-03-08 23:44         ` David Matlack
2022-03-08 23:43     ` David Matlack
2022-03-15 15:43   ` Murilo Opsfelder Araújo
2022-03-15 20:45     ` Paolo Bonzini
2022-03-03 18:33 ` [PATCH RESEND 2/2] Revert "KVM: set owner of cpu and vm file operations" David Matlack
2022-03-08 21:40   ` Sean Christopherson
2022-03-15 20:49 ` [PATCH RESEND 0/2] KVM: Prevent module exit until all VMs are freed Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox