* [PATCH v11 0/2] PCI/IOV: Fix SR-IOV locking races and AB-BA deadlock
@ 2026-03-26 8:35 Ionut Nechita (Wind River)
2026-03-26 8:35 ` [PATCH v11 1/2] PCI/IOV: Make pci_lock_rescan_remove() reentrant and protect sriov_add_vfs/sriov_del_vfs Ionut Nechita (Wind River)
2026-03-26 8:35 ` [PATCH v11 2/2] PCI: Fix AB-BA deadlock between device_lock and pci_rescan_remove_lock in remove_store Ionut Nechita (Wind River)
0 siblings, 2 replies; 4+ messages in thread
From: Ionut Nechita (Wind River) @ 2026-03-26 8:35 UTC (permalink / raw)
To: linux-pci, bhelgaas
Cc: helgaas, sebott, schnelle, bblock, linux, lukas, stable,
linux-kernel, intel-xe, matthew.brost, michal.wajdeczko,
piotr.piorkowski, dtatulea, mani, kbusch, lkml, alifm, julianr,
ionut_n2001, sunlightlinux, Ionut Nechita (Wind River)
Hi Bjorn,
This is v11 of the fix for the SR-IOV race between driver .remove()
and concurrent hotplug events.
Changes since v10 (Mar 18):
- Patch 2/2: added kill_device() before device_release_driver() to
prevent a new driver from binding between unbind and removal,
closing the TOCTOU race window identified by Benjamin Block
- Patch 1/2 unchanged from v10
Changes since v9 (Mar 10):
- NEW patch 2/2: fix AB-BA deadlock in remove_store() by calling
device_release_driver() before pci_stop_and_remove_bus_device_locked(),
as suggested by Benjamin Block (addresses Guenter Roeck's report)
- Patch 1/2 unchanged from v9
Changes since v8 (Mar 9):
- Added Reviewed-by from Niklas Schnelle (IBM) and Tested-by (s390)
- Added Fixes tags for the three related commits
- Removed rescan/remove locking from sriov_numvfs_store() since
locking is now handled in sriov_add_vfs() and sriov_del_vfs()
- Rebased on linux-next (20260309)
The AB-BA deadlock:
CPU0 (remove_store) CPU1 (unbind_store)
-------------------- --------------------
pci_lock_rescan_remove()
device_lock()
driver .remove()
sriov_del_vfs()
pci_lock_rescan_remove() <-- WAITS
pci_stop_bus_device()
device_release_driver()
device_lock() <-- WAITS
Patch 2/2 fixes this by:
1. Marking the device as dead via kill_device() so no new driver
can bind (prevents TOCTOU race between unbind and removal)
2. Calling device_release_driver() before
pci_stop_and_remove_bus_device_locked(), so both paths take
locks in the same order: device_lock first, then
pci_rescan_remove_lock
Note: the concurrent unbind_store + hotplug-event case (where the
hotplug handler takes pci_rescan_remove_lock before device_lock)
remains a known limitation. This is a pre-existing issue that
Benjamin Block is addressing separately in:
https://lore.kernel.org/linux-pci/354b9e4a54ced67f3c89df198041df19434fe4c8.1773235561.git.bblock@linux.ibm.com/
This race has been independently observed by multiple organizations:
- IBM (s390 platform-generated hot-unplug events racing with
sriov_del_vfs during PF driver unload)
- NVIDIA (tested by Dragos Tatulea in earlier versions)
- Intel (xe driver hitting lockdep warnings and deadlocks when
calling pci_disable_sriov from .remove)
- Wind River (original reporter and patch author)
Test environment:
- Tested on s390 by Benjamin Block and Niklas Schnelle (IBM)
- Tested on x86_64 with Intel and NVIDIA SR-IOV devices (earlier
versions)
Based on linux-next (next-20260325).
Link: https://lore.kernel.org/linux-pci/20260214193235.262219-3-ionut.nechita@windriver.com/ [v1]
Link: https://lore.kernel.org/linux-pci/20260219212648.82606-1-ionut.nechita@windriver.com/ [v2]
Link: https://lore.kernel.org/lkml/20260225202434.18737-1-ionut.nechita@windriver.com/ [v3]
Link: https://lore.kernel.org/linux-pci/20260228120138.51197-2-ionut.nechita@windriver.com/ [v4]
Link: https://lore.kernel.org/linux-pci/20260303080903.28693-1-ionut.nechita@windriver.com/ [v5]
Link: https://lore.kernel.org/linux-pci/20260306082108.17322-1-ionut.nechita@windriver.com/ [v6]
Link: https://lore.kernel.org/linux-pci/20260308135352.80346-1-ionut.nechita@windriver.com/ [v7]
Link: https://lore.kernel.org/linux-pci/20260309194920.16459-1-ionut.nechita@windriver.com/ [v8]
Link: https://lore.kernel.org/linux-pci/20260310074303.17480-1-ionut.nechita@windriver.com/ [v9]
Link: https://lore.kernel.org/linux-pci/20260318210316.61975-1-ionut.nechita@windriver.com/ [v10]
Ionut Nechita (Wind River) (2):
PCI/IOV: Make pci_lock_rescan_remove() reentrant and protect
sriov_add_vfs/sriov_del_vfs
PCI: Fix AB-BA deadlock between device_lock and pci_rescan_remove_lock
in remove_store
drivers/pci/iov.c | 9 +++++----
drivers/pci/pci-sysfs.c | 30 +++++++++++++++++++++++++++++-
drivers/pci/probe.c | 11 +++++++++--
3 files changed, 43 insertions(+), 7 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 4+ messages in thread* [PATCH v11 1/2] PCI/IOV: Make pci_lock_rescan_remove() reentrant and protect sriov_add_vfs/sriov_del_vfs 2026-03-26 8:35 [PATCH v11 0/2] PCI/IOV: Fix SR-IOV locking races and AB-BA deadlock Ionut Nechita (Wind River) @ 2026-03-26 8:35 ` Ionut Nechita (Wind River) 2026-03-26 8:35 ` [PATCH v11 2/2] PCI: Fix AB-BA deadlock between device_lock and pci_rescan_remove_lock in remove_store Ionut Nechita (Wind River) 1 sibling, 0 replies; 4+ messages in thread From: Ionut Nechita (Wind River) @ 2026-03-26 8:35 UTC (permalink / raw) To: linux-pci, bhelgaas Cc: helgaas, sebott, schnelle, bblock, linux, lukas, stable, linux-kernel, intel-xe, matthew.brost, michal.wajdeczko, piotr.piorkowski, dtatulea, mani, kbusch, lkml, alifm, julianr, ionut_n2001, sunlightlinux, Ionut Nechita (Wind River) After reverting commit 05703271c3cd ("PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV") and moving the lock to sriov_numvfs_store(), the path through driver .remove() (e.g. rmmod, or manual unbind) that calls pci_disable_sriov() directly remains unprotected against concurrent hotplug events. This affects any SR-IOV capable driver that calls pci_disable_sriov() from its .remove() callback (i40e, ice, mlx5, bnxt, etc.). On s390, platform-generated hot-unplug events for VFs can race with sriov_del_vfs() when a PF driver is being unloaded. The platform event handler takes pci_rescan_remove_lock, but sriov_del_vfs() does not, leading to double removal and list corruption. We cannot use a plain mutex_lock() here because sriov_del_vfs() may also be called from paths that already hold pci_rescan_remove_lock (e.g. remove_store -> pci_stop_and_remove_bus_device_locked, or sriov_numvfs_store with the lock taken by the previous patch). Using mutex_lock() in those cases would deadlock. Make pci_lock_rescan_remove() itself reentrant using mutex_get_owner() and a reentrant depth counter, as suggested by Lukas Wunner and Benjamin Block, since these recursive locking scenarios exist elsewhere in the PCI subsystem: - If the lock is already held by the current task (checked via mutex_get_owner()): increments the reentrant counter and returns without re-acquiring, avoiding deadlock. - If the lock is held by another task: blocks until the lock is released, providing complete serialization. - If the lock is not held: acquires the mutex normally. pci_unlock_rescan_remove() decrements the reentrant counter if it is non-zero, otherwise releases the mutex. This approach keeps the API unchanged: callers simply pair lock/unlock calls without needing to track any return value or use separate reentrant variants. Add pci_lock_rescan_remove()/pci_unlock_rescan_remove() calls to sriov_add_vfs() and sriov_del_vfs() to protect VF addition and removal against concurrent hotplug events. Remove the rescan/remove locking from sriov_numvfs_store() that was introduced by commit a5338e365c45 ("PCI/IOV: Fix race between SR-IOV enable/disable and hotplug"), since the locking is now handled directly in sriov_add_vfs() and sriov_del_vfs() where it is actually needed, reducing the lock scope. Fixes: 18f9e9d150fc ("PCI/IOV: Factor out sriov_add_vfs()") Fixes: 05703271c3cd ("PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV") Fixes: a5338e365c45 ("PCI/IOV: Fix race between SR-IOV enable/disable and hotplug") Cc: stable@vger.kernel.org Suggested-by: Lukas Wunner <lukas@wunner.de> Suggested-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Tested-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> # s390 Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com> --- drivers/pci/iov.c | 9 +++++---- drivers/pci/probe.c | 11 +++++++++-- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 91ac4e37ecb9c..7ed902539051e 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -495,9 +495,7 @@ static ssize_t sriov_numvfs_store(struct device *dev, if (num_vfs == 0) { /* disable VFs */ - pci_lock_rescan_remove(); ret = pdev->driver->sriov_configure(pdev, 0); - pci_unlock_rescan_remove(); goto exit; } @@ -509,9 +507,7 @@ static ssize_t sriov_numvfs_store(struct device *dev, goto exit; } - pci_lock_rescan_remove(); ret = pdev->driver->sriov_configure(pdev, num_vfs); - pci_unlock_rescan_remove(); if (ret < 0) goto exit; @@ -633,15 +629,18 @@ static int sriov_add_vfs(struct pci_dev *dev, u16 num_vfs) if (dev->no_vf_scan) return 0; + pci_lock_rescan_remove(); for (i = 0; i < num_vfs; i++) { rc = pci_iov_add_virtfn(dev, i); if (rc) goto failed; } + pci_unlock_rescan_remove(); return 0; failed: while (i--) pci_iov_remove_virtfn(dev, i); + pci_unlock_rescan_remove(); return rc; } @@ -766,8 +765,10 @@ static void sriov_del_vfs(struct pci_dev *dev) struct pci_sriov *iov = dev->sriov; int i; + pci_lock_rescan_remove(); for (i = 0; i < iov->num_VFs; i++) pci_iov_remove_virtfn(dev, i); + pci_unlock_rescan_remove(); } static void sriov_disable(struct pci_dev *dev) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index bccc7a4bdd794..ce4d351b5aa21 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -3509,16 +3509,23 @@ EXPORT_SYMBOL_GPL(pci_rescan_bus); * routines should always be executed under this mutex. */ DEFINE_MUTEX(pci_rescan_remove_lock); +static size_t pci_rescan_remove_reentrant_count; void pci_lock_rescan_remove(void) { - mutex_lock(&pci_rescan_remove_lock); + if (mutex_get_owner(&pci_rescan_remove_lock) == (unsigned long)current) + pci_rescan_remove_reentrant_count++; + else + mutex_lock(&pci_rescan_remove_lock); } EXPORT_SYMBOL_GPL(pci_lock_rescan_remove); void pci_unlock_rescan_remove(void) { - mutex_unlock(&pci_rescan_remove_lock); + if (pci_rescan_remove_reentrant_count > 0) + pci_rescan_remove_reentrant_count--; + else + mutex_unlock(&pci_rescan_remove_lock); } EXPORT_SYMBOL_GPL(pci_unlock_rescan_remove); -- 2.53.0 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v11 2/2] PCI: Fix AB-BA deadlock between device_lock and pci_rescan_remove_lock in remove_store 2026-03-26 8:35 [PATCH v11 0/2] PCI/IOV: Fix SR-IOV locking races and AB-BA deadlock Ionut Nechita (Wind River) 2026-03-26 8:35 ` [PATCH v11 1/2] PCI/IOV: Make pci_lock_rescan_remove() reentrant and protect sriov_add_vfs/sriov_del_vfs Ionut Nechita (Wind River) @ 2026-03-26 8:35 ` Ionut Nechita (Wind River) 2026-04-15 19:48 ` Niklas Schnelle 1 sibling, 1 reply; 4+ messages in thread From: Ionut Nechita (Wind River) @ 2026-03-26 8:35 UTC (permalink / raw) To: linux-pci, bhelgaas Cc: helgaas, sebott, schnelle, bblock, linux, lukas, stable, linux-kernel, intel-xe, matthew.brost, michal.wajdeczko, piotr.piorkowski, dtatulea, mani, kbusch, lkml, alifm, julianr, ionut_n2001, sunlightlinux, Ionut Nechita (Wind River) remove_store() calls pci_stop_and_remove_bus_device_locked() which takes pci_rescan_remove_lock first, then device_lock during driver release. Meanwhile, unbind_store() takes device_lock first (via device_driver_detach), and the driver's .remove() callback may call pci_disable_sriov() -> sriov_del_vfs() -> pci_lock_rescan_remove(). This creates an AB-BA deadlock: CPU0 (remove_store) CPU1 (unbind_store) -------------------- -------------------- pci_lock_rescan_remove() device_lock() driver .remove() sriov_del_vfs() pci_lock_rescan_remove() <-- WAITS pci_stop_bus_device() device_release_driver() device_lock() <-- WAITS Fix this by first marking the device as dead using kill_device() to prevent any new driver from binding, then calling device_release_driver() before pci_stop_and_remove_bus_device_locked(). Marking the device dead closes the race window between unbinding and removal where a new driver could theoretically bind: once the dead flag is set, the device core will refuse any new driver probe. After device_release_driver() returns, the driver is already unbound, so the subsequent device_release_driver() call inside pci_stop_and_remove_bus_device_locked() becomes a no-op. Fixes: a5338e365c45 ("PCI/IOV: Fix race between SR-IOV enable/disable and hotplug") Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/linux-pci/0ca9e675-478c-411d-be32-e2d81439288f@roeck-us.net/ Reported-by: Benjamin Block <bblock@linux.ibm.com> Closes: https://lore.kernel.org/linux-pci/20260317090149.GA3835708@chlorum.ategam.org/ Suggested-by: Benjamin Block <bblock@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com> --- drivers/pci/pci-sysfs.c | 30 +++++++++++++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index a2f8a5d6190fd..e87aa96c02bde 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -518,8 +518,36 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr, if (kstrtoul(buf, 0, &val) < 0) return -EINVAL; - if (val && device_remove_file_self(dev, attr)) + if (val && device_remove_file_self(dev, attr)) { + /* + * Mark the device as dead so that no new driver can bind + * between the unbind and the removal below. Once the + * dead flag is set, the device core will refuse any new + * driver probe. + */ + device_lock(dev); + kill_device(dev); + device_unlock(dev); + + /* + * Unbind the driver before removing the device to avoid + * an AB-BA deadlock between device_lock and + * pci_rescan_remove_lock. Without this, remove_store + * takes pci_rescan_remove_lock first (via + * pci_stop_and_remove_bus_device_locked) and then + * device_lock during driver release, while a concurrent + * unbind_store (or sriov_numvfs_store) takes device_lock + * first and then pci_rescan_remove_lock (via + * sriov_del_vfs), creating a circular dependency. + * + * By unbinding first, the driver's .remove() callback + * (including any SR-IOV VF cleanup) completes before + * pci_rescan_remove_lock is acquired, ensuring both + * paths take locks in the same order. + */ + device_release_driver(dev); pci_stop_and_remove_bus_device_locked(to_pci_dev(dev)); + } return count; } static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0220, NULL, -- 2.53.0 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v11 2/2] PCI: Fix AB-BA deadlock between device_lock and pci_rescan_remove_lock in remove_store 2026-03-26 8:35 ` [PATCH v11 2/2] PCI: Fix AB-BA deadlock between device_lock and pci_rescan_remove_lock in remove_store Ionut Nechita (Wind River) @ 2026-04-15 19:48 ` Niklas Schnelle 0 siblings, 0 replies; 4+ messages in thread From: Niklas Schnelle @ 2026-04-15 19:48 UTC (permalink / raw) To: Ionut Nechita (Wind River), linux-pci, bhelgaas Cc: helgaas, sebott, bblock, linux, lukas, stable, linux-kernel, intel-xe, matthew.brost, michal.wajdeczko, piotr.piorkowski, dtatulea, mani, kbusch, lkml, alifm, julianr, ionut_n2001, sunlightlinux On Thu, 2026-03-26 at 10:35 +0200, Ionut Nechita (Wind River) wrote: > remove_store() calls pci_stop_and_remove_bus_device_locked() which > takes pci_rescan_remove_lock first, then device_lock during driver > release. Meanwhile, unbind_store() takes device_lock first (via > device_driver_detach), and the driver's .remove() callback may call > pci_disable_sriov() -> sriov_del_vfs() -> pci_lock_rescan_remove(). > > This creates an AB-BA deadlock: > > CPU0 (remove_store) CPU1 (unbind_store) > -------------------- -------------------- > pci_lock_rescan_remove() > device_lock() > driver .remove() > sriov_del_vfs() > pci_lock_rescan_remove() <-- WAITS > pci_stop_bus_device() > device_release_driver() > device_lock() <-- WAITS > > Fix this by first marking the device as dead using kill_device() to > prevent any new driver from binding, then calling device_release_driver() > before pci_stop_and_remove_bus_device_locked(). > > Marking the device dead closes the race window between unbinding and > removal where a new driver could theoretically bind: once the dead flag > is set, the device core will refuse any new driver probe. > > After device_release_driver() returns, the driver is already unbound, > so the subsequent device_release_driver() call inside > pci_stop_and_remove_bus_device_locked() becomes a no-op. > > Fixes: a5338e365c45 ("PCI/IOV: Fix race between SR-IOV enable/disable and hotplug") > Reported-by: Guenter Roeck <linux@roeck-us.net> > Closes: https://lore.kernel.org/linux-pci/0ca9e675-478c-411d-be32-e2d81439288f@roeck-us.net/ > Reported-by: Benjamin Block <bblock@linux.ibm.com> > Closes: https://lore.kernel.org/linux-pci/20260317090149.GA3835708@chlorum.ategam.org/ > Suggested-by: Benjamin Block <bblock@linux.ibm.com> > Cc: stable@vger.kernel.org > Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com> > --- > drivers/pci/pci-sysfs.c | 30 +++++++++++++++++++++++++++++- > 1 file changed, 29 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index a2f8a5d6190fd..e87aa96c02bde 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -518,8 +518,36 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr, > if (kstrtoul(buf, 0, &val) < 0) > return -EINVAL; > > - if (val && device_remove_file_self(dev, attr)) > + if (val && device_remove_file_self(dev, attr)) { > + /* > + * Mark the device as dead so that no new driver can bind > + * between the unbind and the removal below. Once the > + * dead flag is set, the device core will refuse any new > + * driver probe. > + */ > + device_lock(dev); > + kill_device(dev); > + device_unlock(dev); > + > + /* > + * Unbind the driver before removing the device to avoid > + * an AB-BA deadlock between device_lock and > + * pci_rescan_remove_lock. Without this, remove_store > + * takes pci_rescan_remove_lock first (via > + * pci_stop_and_remove_bus_device_locked) and then > + * device_lock during driver release, while a concurrent > + * unbind_store (or sriov_numvfs_store) takes device_lock > + * first and then pci_rescan_remove_lock (via > + * sriov_del_vfs), creating a circular dependency. > + * > + * By unbinding first, the driver's .remove() callback > + * (including any SR-IOV VF cleanup) completes before > + * pci_rescan_remove_lock is acquired, ensuring both > + * paths take locks in the same order. > + */ > + device_release_driver(dev); > pci_stop_and_remove_bus_device_locked(to_pci_dev(dev)); > + } > return count; > } > static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0220, NULL, This looks good to me and this use of kill_device() seems to match pretty well with the comment in kill_device() as setting it on tearing things down. Thank you for working on this rats nest of PCI rescan/remove lock issues! Feel free to add my Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-04-15 19:50 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-26 8:35 [PATCH v11 0/2] PCI/IOV: Fix SR-IOV locking races and AB-BA deadlock Ionut Nechita (Wind River) 2026-03-26 8:35 ` [PATCH v11 1/2] PCI/IOV: Make pci_lock_rescan_remove() reentrant and protect sriov_add_vfs/sriov_del_vfs Ionut Nechita (Wind River) 2026-03-26 8:35 ` [PATCH v11 2/2] PCI: Fix AB-BA deadlock between device_lock and pci_rescan_remove_lock in remove_store Ionut Nechita (Wind River) 2026-04-15 19:48 ` Niklas Schnelle
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox