From: Jason Gunthorpe <jgg@nvidia.com>
To: Sean Christopherson <seanjc@google.com>
Cc: akrowiak@linux.ibm.com, jjherne@linux.ibm.com,
farman@linux.ibm.com, imbrenda@linux.ibm.com,
Matthew Rosato <mjrosato@linux.ibm.com>,
pmorel@linux.ibm.com, david@redhat.com,
linux-s390@vger.kernel.org, intel-gfx@lists.freedesktop.org,
cohuck@redhat.com, linux-kernel@vger.kernel.org,
pasic@linux.ibm.com, kvm@vger.kernel.org, pbonzini@redhat.com,
borntraeger@linux.ibm.com, intel-gvt-dev@lists.freedesktop.org,
frankja@linux.ibm.com
Subject: Re: [Intel-gfx] [PATCH 1/2] KVM: async kvm_destroy_vm for vfio devices
Date: Thu, 12 Jan 2023 08:45:38 -0400 [thread overview]
Message-ID: <Y8AA8r5MzKQIF8I7@nvidia.com> (raw)
In-Reply-To: <Y78hzsHiwaFpL60+@google.com>
On Wed, Jan 11, 2023 at 08:53:34PM +0000, Sean Christopherson wrote:
> On Wed, Jan 11, 2023, Jason Gunthorpe wrote:
> > On Wed, Jan 11, 2023 at 07:54:51PM +0000, Sean Christopherson wrote:
> >
> > > Something feels off. If KVM's refcount is 0, then accessing device->group->kvm
> > > in vfio_device_open() can't happen unless there's a refcounting bug somewhere.
> >
> > The problem is in close, not open.
>
> The deadlock problem is, yes. My point is that if group_lock needs to be taken
> when nullifying group->kvm during kvm_vfio_destroy(), then there is also a refcounting
> prolem with respect to open(). If there is no refcounting problem, then nullifying
> group->kvm during kvm_vfio_destroy() is unnecessary (but again, I doubt this is
> the case).
IIRC the drivers are supposed to use one of the refcount not zero
incrs to counteract this, but I never checked they do..
Yi is working on a patch to change things so vfio drops the kvm
pointer when the kvm file closes, not when the reference goes to 0
to avoid a refcount cycle problem which should also solve that.
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 6e8804fe0095..b3a84d65baa6 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -772,7 +772,12 @@ static struct file *vfio_device_open(struct vfio_device *device)
> * reference and release it during close_device.
> */
> mutex_lock(&device->group->group_lock);
> - device->kvm = device->group->kvm;
> +
> + if (device->kvm_ops && device->group->kvm) {
> + ret = device->kvm_ops->get_kvm(device->group->kvm);
At this point I'd rather just use the symbol get stuff like kvm does
and call the proper functions.
Jason
WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@nvidia.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Matthew Rosato <mjrosato@linux.ibm.com>,
alex.williamson@redhat.com, pbonzini@redhat.com,
cohuck@redhat.com, farman@linux.ibm.com, pmorel@linux.ibm.com,
borntraeger@linux.ibm.com, frankja@linux.ibm.com,
imbrenda@linux.ibm.com, david@redhat.com, akrowiak@linux.ibm.com,
jjherne@linux.ibm.com, pasic@linux.ibm.com,
zhenyuw@linux.intel.com, zhi.a.wang@intel.com,
linux-s390@vger.kernel.org, kvm@vger.kernel.org,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] KVM: async kvm_destroy_vm for vfio devices
Date: Thu, 12 Jan 2023 08:45:38 -0400 [thread overview]
Message-ID: <Y8AA8r5MzKQIF8I7@nvidia.com> (raw)
In-Reply-To: <Y78hzsHiwaFpL60+@google.com>
On Wed, Jan 11, 2023 at 08:53:34PM +0000, Sean Christopherson wrote:
> On Wed, Jan 11, 2023, Jason Gunthorpe wrote:
> > On Wed, Jan 11, 2023 at 07:54:51PM +0000, Sean Christopherson wrote:
> >
> > > Something feels off. If KVM's refcount is 0, then accessing device->group->kvm
> > > in vfio_device_open() can't happen unless there's a refcounting bug somewhere.
> >
> > The problem is in close, not open.
>
> The deadlock problem is, yes. My point is that if group_lock needs to be taken
> when nullifying group->kvm during kvm_vfio_destroy(), then there is also a refcounting
> prolem with respect to open(). If there is no refcounting problem, then nullifying
> group->kvm during kvm_vfio_destroy() is unnecessary (but again, I doubt this is
> the case).
IIRC the drivers are supposed to use one of the refcount not zero
incrs to counteract this, but I never checked they do..
Yi is working on a patch to change things so vfio drops the kvm
pointer when the kvm file closes, not when the reference goes to 0
to avoid a refcount cycle problem which should also solve that.
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 6e8804fe0095..b3a84d65baa6 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -772,7 +772,12 @@ static struct file *vfio_device_open(struct vfio_device *device)
> * reference and release it during close_device.
> */
> mutex_lock(&device->group->group_lock);
> - device->kvm = device->group->kvm;
> +
> + if (device->kvm_ops && device->group->kvm) {
> + ret = device->kvm_ops->get_kvm(device->group->kvm);
At this point I'd rather just use the symbol get stuff like kvm does
and call the proper functions.
Jason
next prev parent reply other threads:[~2023-01-12 12:45 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-09 20:10 [Intel-gfx] [PATCH 0/2] kvm/vfio: fix potential deadlock on vfio group lock Matthew Rosato
2023-01-09 20:10 ` Matthew Rosato
2023-01-09 20:10 ` [Intel-gfx] [PATCH 1/2] KVM: async kvm_destroy_vm for vfio devices Matthew Rosato
2023-01-09 20:10 ` Matthew Rosato
2023-01-09 20:13 ` [Intel-gfx] " Jason Gunthorpe
2023-01-09 20:13 ` Jason Gunthorpe
2023-01-09 20:24 ` [Intel-gfx] " Matthew Rosato
2023-01-09 20:24 ` Matthew Rosato
2023-01-09 21:07 ` [Intel-gfx] " Anthony Krowiak
2023-01-09 21:07 ` Anthony Krowiak
2023-01-11 19:54 ` [Intel-gfx] " Sean Christopherson
2023-01-11 19:54 ` Sean Christopherson
2023-01-11 20:05 ` [Intel-gfx] " Jason Gunthorpe
2023-01-11 20:05 ` Jason Gunthorpe
2023-01-11 20:53 ` [Intel-gfx] " Sean Christopherson
2023-01-11 20:53 ` Sean Christopherson
2023-01-12 12:45 ` Jason Gunthorpe [this message]
2023-01-12 12:45 ` Jason Gunthorpe
2023-01-12 17:21 ` [Intel-gfx] " Matthew Rosato
2023-01-12 17:21 ` Matthew Rosato
2023-01-12 17:27 ` [Intel-gfx] " Jason Gunthorpe
2023-01-12 17:27 ` Jason Gunthorpe
2023-01-09 20:10 ` [Intel-gfx] [PATCH 2/2] KVM: s390: pci: use asyncronous kvm put Matthew Rosato
2023-01-09 20:10 ` Matthew Rosato
2023-01-09 21:10 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for kvm/vfio: fix potential deadlock on vfio group lock Patchwork
2023-01-09 21:38 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-01-10 7:08 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2023-01-11 20:39 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for kvm/vfio: fix potential deadlock on vfio group lock (rev2) Patchwork
2023-01-11 21:09 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for kvm/vfio: fix potential deadlock on vfio group lock (rev3) Patchwork
2023-01-12 19:32 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for kvm/vfio: fix potential deadlock on vfio group lock (rev4) Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y8AA8r5MzKQIF8I7@nvidia.com \
--to=jgg@nvidia.com \
--cc=akrowiak@linux.ibm.com \
--cc=borntraeger@linux.ibm.com \
--cc=cohuck@redhat.com \
--cc=david@redhat.com \
--cc=farman@linux.ibm.com \
--cc=frankja@linux.ibm.com \
--cc=imbrenda@linux.ibm.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=intel-gvt-dev@lists.freedesktop.org \
--cc=jjherne@linux.ibm.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=mjrosato@linux.ibm.com \
--cc=pasic@linux.ibm.com \
--cc=pbonzini@redhat.com \
--cc=pmorel@linux.ibm.com \
--cc=seanjc@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.