Re: [PATCH 1/2] KVM: async kvm_destroy_vm for vfio devices

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Matthew Rosato <mjrosato@linux.ibm.com>
Cc: alex.williamson@redhat.com, pbonzini@redhat.com, jgg@nvidia.com,
	cohuck@redhat.com, farman@linux.ibm.com, pmorel@linux.ibm.com,
	borntraeger@linux.ibm.com, frankja@linux.ibm.com,
	imbrenda@linux.ibm.com, david@redhat.com, akrowiak@linux.ibm.com,
	jjherne@linux.ibm.com, pasic@linux.ibm.com,
	zhenyuw@linux.intel.com, zhi.a.wang@intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] KVM: async kvm_destroy_vm for vfio devices
Date: Wed, 11 Jan 2023 19:54:51 +0000	[thread overview]
Message-ID: <Y78UCz5oeuntSQtK@google.com> (raw)
In-Reply-To: <20230109201037.33051-2-mjrosato@linux.ibm.com>

On Mon, Jan 09, 2023, Matthew Rosato wrote:
> Currently it is possible that the final put of a KVM reference comes from
> vfio during its device close operation.  This occurs while the vfio group
> lock is held; however, if the vfio device is still in the kvm device list,
> then the following call chain could result in a deadlock:
> 
> kvm_put_kvm
>  -> kvm_destroy_vm
>   -> kvm_destroy_devices
>    -> kvm_vfio_destroy
>     -> kvm_vfio_file_set_kvm
>      -> vfio_file_set_kvm
>       -> group->group_lock/group_rwsem
> 
> Avoid this scenario by adding kvm_put_kvm_async which will perform the
> kvm_destroy_vm asynchronously if the refcount reaches 0.

Something feels off.  If KVM's refcount is 0, then accessing device->group->kvm
in vfio_device_open() can't happen unless there's a refcounting bug somewhere.

E.g. if this snippet holds group_lock

		mutex_lock(&device->group->group_lock);
		device->kvm = device->group->kvm;

		if (device->ops->open_device) {
			ret = device->ops->open_device(device);
			if (ret)
				goto err_undo_count;
		}
		vfio_device_container_register(device);
		mutex_unlock(&device->group->group_lock);

and kvm_vfio_destroy() has already been invoked and is waiting on group_lock in
vfio_file_set_kvm(), then device->ops->open_device() is being called with a
non-NULL device->kvm that has kvm->users_count==0.  And intel_vgpu_open_device()
at least uses kvm_get_kvm(), i.e. assumes kvm->users_count > 0.

If there's no refcounting bug then kvm_vfio_destroy() doesn't need to call
kvm_vfio_file_set_kvm() since the only way there isn't a refcounting bug is if
group->kvm is unreachable.  This seems unlikely.

Assuming there is indeed a refcounting issue, one solution would be to harden all
->open_device() implementations to use kvm_get_kvm_safe().  I'd prefer not to have
to do that since it will complicate those implementations and also requires KVM
to support an async put.

Rather than force devices to get KVM references, why not handle that in common
VFIO code and drop KVM refcountin from devices?  Worst case scenario KVM is pinned
by a device that doesn't need KVM but is in a group associated with KVM.  If that's
a concern, it seems easy enough to add a flag to vfio_device_ops to enumerate
whether or not the device depends on KVM.

---
 drivers/vfio/vfio_main.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6e8804fe0095..fb43212d77a0 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -772,6 +772,13 @@ static struct file *vfio_device_open(struct vfio_device *device)
 		 * reference and release it during close_device.
 		 */
 		mutex_lock(&device->group->group_lock);
+
+		if (device->group->kvm &&
+		    !kvm_get_kvm_safe(device->group->kvm->kvm)) {
+			ret = -ENODEV;
+			goto err_undo_count;
+		}
+
 		device->kvm = device->group->kvm;
 
 		if (device->ops->open_device) {
@@ -823,8 +830,10 @@ static struct file *vfio_device_open(struct vfio_device *device)
 err_undo_count:
 	mutex_unlock(&device->group->group_lock);
 	device->open_count--;
-	if (device->open_count == 0 && device->kvm)
+	if (device->open_count == 0 && device->kvm) {
+		kvm_put_kvm(device->kvm);
 		device->kvm = NULL;
+	}
 	mutex_unlock(&device->dev_set->lock);
 	module_put(device->dev->driver->owner);
 err_unassign_container:
@@ -1039,8 +1048,10 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	}
 	mutex_unlock(&device->group->group_lock);
 	device->open_count--;
-	if (device->open_count == 0)
+	if (device->open_count == 0 && device->kvm) {
+		kvm_put_kvm(device->kvm);
 		device->kvm = NULL;
+	}
 	mutex_unlock(&device->dev_set->lock);
 
 	module_put(device->dev->driver->owner);

base-commit: d52444c7a90fc551b4c3b0bda7d3f0b2ca9fc84d
--

next prev parent reply	other threads:[~2023-01-11 19:55 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-09 20:10 [PATCH 0/2] kvm/vfio: fix potential deadlock on vfio group lock Matthew Rosato
2023-01-09 20:10 ` [PATCH 1/2] KVM: async kvm_destroy_vm for vfio devices Matthew Rosato
2023-01-09 20:13   ` Jason Gunthorpe
2023-01-09 20:24     ` Matthew Rosato
2023-01-09 21:07   ` Anthony Krowiak
2023-01-11 19:54   ` Sean Christopherson [this message]
2023-01-11 20:05     ` Jason Gunthorpe
2023-01-11 20:53       ` Sean Christopherson
2023-01-12 12:45         ` Jason Gunthorpe
2023-01-12 17:21           ` Matthew Rosato
2023-01-12 17:27             ` Jason Gunthorpe
2023-01-09 20:10 ` [PATCH 2/2] KVM: s390: pci: use asyncronous kvm put Matthew Rosato

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:6e8804fe009 dfblob:fb43212d77a )
 OR (
bs:"Re: [PATCH 1/2] KVM: async kvm_destroy_vm for vfio devices" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y78UCz5oeuntSQtK@google.com \
    --to=seanjc@google.com \
    --cc=akrowiak@linux.ibm.com \
    --cc=alex.williamson@redhat.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=frankja@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=intel-gvt-dev@lists.freedesktop.org \
    --cc=jgg@nvidia.com \
    --cc=jjherne@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mjrosato@linux.ibm.com \
    --cc=pasic@linux.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=pmorel@linux.ibm.com \
    --cc=zhenyuw@linux.intel.com \
    --cc=zhi.a.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox