From: Alex Williamson <alex.williamson@redhat.com>
To: Jim Harris <jim.harris@samsung.com>
Cc: "bhelgaas@google.com" <bhelgaas@google.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"ben@nvidia.com" <ben@nvidia.com>,
"jgg@nvidia.com" <jgg@nvidia.com>
Subject: Re: Locking between vfio hot-remove and pci sysfs sriov_numvfs
Date: Thu, 7 Dec 2023 16:21:48 -0700
Message-ID: <20231207162148.2631fa58.alex.williamson@redhat.com>
In-Reply-To: <ZXJI5+f8bUelVXqu@ubuntu>
On Thu, 7 Dec 2023 22:38:23 +0000
Jim Harris <jim.harris@samsung.com> wrote:
> I am seeing a deadlock using SPDK with hotplug detection using vfio-pci
> and an SR-IOV enabled NVMe SSD. It is not clear if this deadlock is intended
> or if it's a kernel bug.
>
> Note: SPDK uses DPDK's PCI device enumeration framework, so I'll reference
> both SPDK and DPDK in this description.
>
> DPDK registers an eventfd with vfio for hotplug notifications. If the associated
> device is removed (i.e. write 1 to its pci sysfs remove entry), vfio
> writes to the eventfd, requesting DPDK to release the device. It does this
> while holding the device_lock(), and then waits for completion.
>
> DPDK gets the notification, and passes it up to SPDK. SPDK does not release
> the device immediately. It has some asynchronous operations that need to be
> performed first, so it will release the device a bit later.
>
> But before the device is released, SPDK also triggers DPDK to do a sysfs scan
> looking for newly inserted devices. Note that the removed device is not
> completely removed yet from kernel PCI perspective - all of its sysfs entries
> are still available, including sriov_numvfs.
>
> DPDK explicitly reads sriov_numvfs to see if the device is SR-IOV capable.
> SPDK itself doesn't actually use this value, but it is part of the scan
> triggered by SPDK and directly leads to the deadlock. sriov_numvfs_show()
> deadlocks because it tries to hold device_lock() while reading the pci
> device's pdev->sriov->num_VFs.
>
> We're able to work around this in SPDK by deferring the sysfs scan if
> a device removal is in progress. And maybe that is what we are supposed to
> be doing, to avoid this deadlock?
>
> Reference to SPDK issue, for some more details (plus simple repro steps for
> anyone already familiar with SPDK): https://github.com/spdk/spdk/issues/3205
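
For illustration only, the deferral workaround described above might be
modeled roughly as below. This is a minimal sketch, not SPDK or DPDK code:
struct hotplug_state and every function name here are hypothetical, and
run_scan() is a stand-in for the real sysfs walk. The assumption is that
a scan requested while a removal is pending is remembered and only run
once the last pending device has been released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping; these are not SPDK or DPDK types. */
struct hotplug_state {
    int removals_pending;   /* removal requests seen but not yet released */
    bool scan_deferred;     /* a scan was requested during a removal */
    int scans_run;          /* how many scans were actually performed */
};

/* Stand-in for the sysfs walk (which would read sriov_numvfs). */
static void run_scan(struct hotplug_state *s)
{
    s->scans_run++;
}

/* Called when the removal notification (eventfd) arrives. */
static void on_remove_request(struct hotplug_state *s)
{
    s->removals_pending++;
}

/* Returns true if the scan ran now, false if it was deferred. */
static bool request_scan(struct hotplug_state *s)
{
    if (s->removals_pending > 0) {
        s->scan_deferred = true;    /* do not touch sysfs yet */
        return false;
    }
    run_scan(s);
    return true;
}

/* Called once the application has actually released the device. */
static void on_device_released(struct hotplug_state *s)
{
    assert(s->removals_pending > 0);
    s->removals_pending--;
    if (s->removals_pending == 0 && s->scan_deferred) {
        s->scan_deferred = false;
        run_scan(s);                /* safe: no removal is waiting on us */
    }
}
```

In this model a caller invokes on_remove_request() from the notification
path, request_scan() from the periodic hotplug poll, and
on_device_released() after the asynchronous teardown completes.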
device_lock() has been a recurring problem.  We don't have a lot of
leeway in how we support the driver remove callback, the device needs
to be released.  We can't return -EBUSY and I don't think we can drop
the mutex while we're waiting on userspace.

I've done some fix-ups in the past to use device_trylock() to avoid
deadlocks, which might be an option here, e.g. reading sriov_numvfs
could return -EBUSY in this scenario.  We keep running into these
scenarios though and we might just need to pick a point at which we
kill the user process holding the device.
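
As a rough userspace illustration of the trylock idea (not kernel code
and not a proposed patch): in the sketch below a pthread mutex stands in
for device_lock(), and struct model_dev and model_numvfs_show() are
made-up names. The point is only that the reader fails fast with -EBUSY
instead of blocking behind a remove that is waiting on userspace.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Userspace stand-in for a device; not the kernel's struct device. */
struct model_dev {
    pthread_mutex_t lock;   /* plays the role of device_lock() */
    unsigned int num_vfs;   /* plays the role of pdev->sriov->num_VFs */
};

/*
 * Model of a show handler that refuses to block on the device lock.
 * Returns the number of characters written, or -EBUSY if the lock is
 * currently held (for example by a remove path waiting on userspace).
 */
static int model_numvfs_show(struct model_dev *dev, char *buf, size_t len)
{
    int n;

    if (pthread_mutex_trylock(&dev->lock) != 0)
        return -EBUSY;

    n = snprintf(buf, len, "%u\n", dev->num_vfs);
    pthread_mutex_unlock(&dev->lock);
    return n;
}
```

With this shape, a scanner that gets -EBUSY back can skip the device or
retry later rather than hanging in the read.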
I'm open to suggestions. Thanks,
Alex