From: Jason Gunthorpe <jgg@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Jim Harris <jim.harris@samsung.com>,
"bhelgaas@google.com" <bhelgaas@google.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"ben@nvidia.com" <ben@nvidia.com>
Subject: Re: Locking between vfio hot-remove and pci sysfs sriov_numvfs
Date: Thu, 7 Dec 2023 19:48:10 -0400
Message-ID: <20231207234810.GN2692119@nvidia.com>
In-Reply-To: <20231207162148.2631fa58.alex.williamson@redhat.com>

On Thu, Dec 07, 2023 at 04:21:48PM -0700, Alex Williamson wrote:
> On Thu, 7 Dec 2023 22:38:23 +0000
> Jim Harris <jim.harris@samsung.com> wrote:
>
> > I am seeing a deadlock using SPDK with hotplug detection using vfio-pci
> > and an SR-IOV enabled NVMe SSD. It is not clear if this deadlock is intended
> > or if it's a kernel bug.
> >
> > Note: SPDK uses DPDK's PCI device enumeration framework, so I'll reference
> > both SPDK and DPDK in this description.
> >
> > DPDK registers an eventfd with vfio for hotplug notifications. If the associated
> > device is removed (i.e. write 1 to its pci sysfs remove entry), vfio
> > writes to the eventfd, requesting DPDK to release the device. It does this
> > while holding the device_lock(), and then waits for completion.
> >
> > DPDK gets the notification, and passes it up to SPDK. SPDK does not release
> > the device immediately. It has some asynchronous operations that need to be
> > performed first, so it will release the device a bit later.
> >
> > But before the device is released, SPDK also triggers DPDK to do a sysfs scan
> > looking for newly inserted devices. Note that the removed device is not
> > completely removed yet from kernel PCI perspective - all of its sysfs entries
> > are still available, including sriov_numvfs.
> >
> > DPDK explicitly reads sriov_numvfs to see if the device is SR-IOV capable.
> > SPDK itself doesn't actually use this value, but it is part of the scan
> > triggered by SPDK and directly leads to the deadlock. sriov_numvfs_show()
> > deadlocks because it tries to hold device_lock() while reading the pci
> > device's pdev->sriov->num_VFs.
> >
> > We're able to work around this in SPDK by deferring the sysfs scan if
> > a device removal is in process. And maybe that is what we are supposed to
> > be doing, to avoid this deadlock?
> >
> > Reference to SPDK issue, for some more details (plus simple repro steps for
> > anyone already familiar with SPDK): https://github.com/spdk/spdk/issues/3205
>
> device_lock() has been a recurring problem. We don't have a lot of
> leeway in how we support the driver remove callback, the device needs
> to be released. We can't return -EBUSY and I don't think we can drop
> the mutex while we're waiting on userspace.

The mechanism of waiting in remove for userspace is inherently flawed,
it can never work fully correctly. :( I've hit this many times.

Upon remove VFIO should immediately remove itself and leave behind a
non-functional file descriptor. Userspace should catch up eventually
and see it is toast.

The kernel locking model just cannot support userspace delaying this
process.

Jason
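
[Illustration, not part of the original message.] The "remove immediately
and leave behind a non-functional file descriptor" idea can be sketched as
a tiny userspace model. This is not VFIO code: struct toy_dev and the
toy_dev_*() helpers are hypothetical names invented for this sketch, and
-ENODEV is only an assumed choice of error for a dead handle. The point it
tries to show is that the remove path never waits on the holder of the
handle; the holder finds out on its next access.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device handle; none of these names exist in VFIO. */
struct toy_dev {
	atomic_bool revoked;	/* set once the device has been removed */
	int reg;		/* stands in for some device state */
};

static void toy_dev_init(struct toy_dev *d)
{
	assert(d != NULL);
	atomic_init(&d->revoked, false);
	d->reg = 0;
}

/*
 * Remove path: mark the handle dead and return right away.
 * It does not notify the handle's owner and wait for a release.
 */
static void toy_dev_remove(struct toy_dev *d)
{
	atomic_store(&d->revoked, true);
}

/*
 * Access path used by the handle's owner. After removal it fails
 * cleanly instead of touching the device. A real implementation would
 * also have to fence off accesses already in flight; this sketch
 * ignores that.
 */
static int toy_dev_write(struct toy_dev *d, int val)
{
	if (atomic_load(&d->revoked))
		return -ENODEV;
	d->reg = val;
	return 0;
}
```

In this model the owner of the handle can take as long as it likes to
clean up: toy_dev_remove() has already returned, so nothing on the remove
side is left holding a lock while waiting for it.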