* [PATCH v5 0/1] PCI/IOV: Add reentrant locking in sriov_add_vfs/sriov_del_vfs
@ 2026-03-03 8:09 Ionut Nechita (Wind River)
2026-03-03 8:09 ` [PATCH v5 1/1] PCI/IOV: Add reentrant locking in sriov_add_vfs/sriov_del_vfs for complete serialization Ionut Nechita (Wind River)
0 siblings, 1 reply; 6+ messages in thread
From: Ionut Nechita (Wind River) @ 2026-03-03 8:09 UTC (permalink / raw)
To: linux-pci, bhelgaas
Cc: helgaas, sebott, schnelle, bblock, alifm, julianr, dtatulea, mani,
ionut_n2001, sunlightlinux, linux-kernel, stable,
Ionut Nechita (Wind River)
Hi Bjorn,
This is v5 of the fix for the SR-IOV race between driver .remove()
and concurrent hotplug events (particularly on s390).
Changes since v4 (Feb 28):
- Replaced local pci_rescan_remove_owner variable with
mutex_get_owner() to check lock ownership, as suggested by
Manivannan Sadhasivam and agreed by Benjamin Block
- Removed owner tracking from pci_lock_rescan_remove() and
pci_unlock_rescan_remove() - they are now unchanged from upstream
- Rebased on linux-next (20260302)
Changes since v3 (Feb 25):
- Rebased on linux-next (next-20260227)
- Declared pci_rescan_remove_owner as const pointer
(const struct task_struct *) to make clear it is not meant to
modify the task (Benjamin Block)
- Added Reviewed-by and Tested-by from Benjamin Block (IBM)
Changes since v2 (Feb 19):
- Rebased on linux-next (next-20260225)
- Added Tested-by from Dragos Tatulea (NVIDIA)
- No code changes from v2
Changes since v1 (Feb 14):
- Renamed from pci_lock_rescan_remove_nested() to
pci_lock_rescan_remove_reentrant() to avoid confusion with
mutex_lock_nested() lockdep annotations (Benjamin Block)
- Added pci_unlock_rescan_remove_reentrant(const bool locked) helper
to avoid open-coding conditional unlock at each call site
(Benjamin Block)
- Moved declarations from drivers/pci/pci.h to include/linux/pci.h
alongside existing lock/unlock declarations (Benjamin Block)
- Simplified callers: removed negation of return value and manual
conditional unlock in favor of the paired lock/unlock helpers
The problem: on s390, platform-generated hot-unplug events for VFs
can race with sriov_del_vfs() when a PF driver is being unloaded.
The platform event handler takes pci_rescan_remove_lock, but
sriov_del_vfs() does not, leading to double removal and list
corruption. We cannot use a plain mutex_lock() because
sriov_del_vfs() may be called from paths that already hold the
lock (deadlock), and mutex_trylock() cannot distinguish self from
other holders.
The fix introduces pci_lock_rescan_remove_reentrant() which uses
mutex_get_owner() to check if the current task already holds the
lock, avoiding both deadlock and the trylock problem.
Link: https://lore.kernel.org/linux-pci/20260214193235.262219-3-ionut.nechita@windriver.com/ [v1]
Link: https://lore.kernel.org/linux-pci/20260219212648.82606-1-ionut.nechita@windriver.com/ [v2]
Link: https://lore.kernel.org/linux-pci/20260225202434.18737-1-ionut.nechita@windriver.com/ [v3]
Link: https://lore.kernel.org/linux-pci/20260228120138.51197-2-ionut.nechita@windriver.com/ [v4]
Ionut Nechita (1):
PCI/IOV: Add reentrant locking in sriov_add_vfs/sriov_del_vfs for
complete serialization
drivers/pci/iov.c | 7 +++++++
drivers/pci/probe.c | 16 ++++++++++++++++
include/linux/pci.h | 2 ++
3 files changed, 25 insertions(+)
--
2.53.0
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH v5 1/1] PCI/IOV: Add reentrant locking in sriov_add_vfs/sriov_del_vfs for complete serialization 2026-03-03 8:09 [PATCH v5 0/1] PCI/IOV: Add reentrant locking in sriov_add_vfs/sriov_del_vfs Ionut Nechita (Wind River) @ 2026-03-03 8:09 ` Ionut Nechita (Wind River) 2026-03-03 18:19 ` Lukas Wunner 0 siblings, 1 reply; 6+ messages in thread From: Ionut Nechita (Wind River) @ 2026-03-03 8:09 UTC (permalink / raw) To: linux-pci, bhelgaas Cc: helgaas, sebott, schnelle, bblock, alifm, julianr, dtatulea, mani, ionut_n2001, sunlightlinux, linux-kernel, stable, Ionut Nechita (Wind River) After reverting commit 05703271c3cd ("PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV") and moving the lock to sriov_numvfs_store(), the path through driver .remove() (e.g. rmmod, or manual unbind) that calls pci_disable_sriov() directly remains unprotected against concurrent hotplug events. This affects any SR-IOV capable driver that calls pci_disable_sriov() from its .remove() callback (i40e, ice, mlx5, bnxt, etc.). On s390, platform-generated hot-unplug events for VFs can race with sriov_del_vfs() when a PF driver is being unloaded. The platform event handler takes pci_rescan_remove_lock, but sriov_del_vfs() does not, leading to double removal and list corruption. We cannot use a plain mutex_lock() here because sriov_del_vfs() may also be called from paths that already hold pci_rescan_remove_lock (e.g. remove_store -> pci_stop_and_remove_bus_device_locked, or sriov_numvfs_store with the lock taken by the previous patch). Using mutex_lock() in those cases would deadlock. Instead, introduce a pci_lock_rescan_remove_reentrant() helper that uses mutex_get_owner() to check if the current task already holds the lock: - If the lock is not held: acquires it and returns true, providing full serialization against concurrent hotplug events (including platform-generated events on s390). - If the lock is already held by the current task (reentrant call from remove_store or sriov_numvfs_store paths): returns false without re-acquiring, avoiding deadlock while the caller already provides the necessary serialization. - If the lock is held by another task (concurrent hotplug): blocks until the lock is released, then acquires it, providing complete serialization. This is the key improvement over a trylock approach. A matching pci_unlock_rescan_remove_reentrant() helper takes the return value of the lock function as argument, so callers don't need to open-code the conditional unlock. The "reentrant" naming is chosen to avoid confusion with existing mutex_lock_nested() which is a lockdep annotation concept, not actual reentrant locking. The declarations are placed in include/linux/pci.h alongside the existing pci_lock_rescan_remove()/pci_unlock_rescan_remove() declarations to maintain API consistency and allow use by external drivers if needed. Fixes: 18f9e9d150fc ("PCI/IOV: Factor out sriov_add_vfs()") Cc: stable@vger.kernel.org Tested-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Tested-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Ionut Nechita <ionut_n2001@yahoo.com> Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com> --- Changes in v5: - Replaced local pci_rescan_remove_owner variable with mutex_get_owner() to check lock ownership, avoiding the need to modify pci_lock_rescan_remove()/pci_unlock_rescan_remove() (Manivannan Sadhasivam, Benjamin Block) - Rebased on linux-next (20260302) Changes in v4: - Rebased on linux-next (next-20260227) - Declared pci_rescan_remove_owner as const pointer (const struct task_struct *) to make clear it is not meant to modify the task (Benjamin Block) - Added Reviewed-by and Tested-by from Benjamin Block (IBM) Changes in v3: - Rebased on linux-next (next-20260225) - Added Tested-by from Dragos Tatulea (NVIDIA) - No code changes from v2 Changes in v2: - Renamed from pci_lock_rescan_remove_nested() to pci_lock_rescan_remove_reentrant() to avoid confusion with mutex_lock_nested() lockdep annotations (Benjamin Block) - Added pci_unlock_rescan_remove_reentrant(const bool locked) helper to avoid open-coding conditional unlock at each call site (Benjamin Block) - Moved declarations from drivers/pci/pci.h to include/linux/pci.h alongside existing lock/unlock declarations (Benjamin Block) - Simplified callers: removed negation of return value and manual conditional unlock in favor of the paired lock/unlock helpers drivers/pci/iov.c | 7 +++++++ drivers/pci/probe.c | 16 ++++++++++++++++ include/linux/pci.h | 2 ++ 3 files changed, 25 insertions(+) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 91ac4e37ecb9..adbe4ecc587c 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -629,19 +629,23 @@ static int sriov_add_vfs(struct pci_dev *dev, u16 num_vfs) { unsigned int i; int rc; + bool locked; if (dev->no_vf_scan) return 0; + locked = pci_lock_rescan_remove_reentrant(); for (i = 0; i < num_vfs; i++) { rc = pci_iov_add_virtfn(dev, i); if (rc) goto failed; } + pci_unlock_rescan_remove_reentrant(locked); return 0; failed: while (i--) pci_iov_remove_virtfn(dev, i); + pci_unlock_rescan_remove_reentrant(locked); return rc; } @@ -764,10 +768,13 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) static void sriov_del_vfs(struct pci_dev *dev) { struct pci_sriov *iov = dev->sriov; + bool locked; int i; + locked = pci_lock_rescan_remove_reentrant(); for (i = 0; i < iov->num_VFs; i++) pci_iov_remove_virtfn(dev, i); + pci_unlock_rescan_remove_reentrant(locked); } static void sriov_disable(struct pci_dev *dev) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index bccc7a4bdd79..86c8f593e0cd 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -3522,6 +3522,22 @@ void pci_unlock_rescan_remove(void) } EXPORT_SYMBOL_GPL(pci_unlock_rescan_remove); +bool pci_lock_rescan_remove_reentrant(void) +{ + if (mutex_get_owner(&pci_rescan_remove_lock) == (unsigned long)current) + return false; + pci_lock_rescan_remove(); + return true; +} +EXPORT_SYMBOL_GPL(pci_lock_rescan_remove_reentrant); + +void pci_unlock_rescan_remove_reentrant(const bool locked) +{ + if (locked) + pci_unlock_rescan_remove(); +} +EXPORT_SYMBOL_GPL(pci_unlock_rescan_remove_reentrant); + static int __init pci_sort_bf_cmp(const struct device *d_a, const struct device *d_b) { diff --git a/include/linux/pci.h b/include/linux/pci.h index 1c270f1d5123..080950f0bab3 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1535,6 +1535,8 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev); unsigned int pci_rescan_bus(struct pci_bus *bus); void pci_lock_rescan_remove(void); void pci_unlock_rescan_remove(void); +bool pci_lock_rescan_remove_reentrant(void); +void pci_unlock_rescan_remove_reentrant(const bool locked); /* Vital Product Data routines */ ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf); -- 2.53.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v5 1/1] PCI/IOV: Add reentrant locking in sriov_add_vfs/sriov_del_vfs for complete serialization 2026-03-03 8:09 ` [PATCH v5 1/1] PCI/IOV: Add reentrant locking in sriov_add_vfs/sriov_del_vfs for complete serialization Ionut Nechita (Wind River) @ 2026-03-03 18:19 ` Lukas Wunner 2026-03-03 19:28 ` Lukas Wunner 0 siblings, 1 reply; 6+ messages in thread From: Lukas Wunner @ 2026-03-03 18:19 UTC (permalink / raw) To: Ionut Nechita (Wind River) Cc: linux-pci, helgaas, sebott, schnelle, bblock, alifm, julianr, dtatulea, mani, ionut_n2001, sunlightlinux, linux-kernel On Tue, Mar 03, 2026 at 10:09:02AM +0200, Ionut Nechita (Wind River) wrote: > +++ b/drivers/pci/iov.c > @@ -629,19 +629,23 @@ static int sriov_add_vfs(struct pci_dev *dev, u16 num_vfs) > { > unsigned int i; > int rc; > + bool locked; > > if (dev->no_vf_scan) > return 0; > > + locked = pci_lock_rescan_remove_reentrant(); > for (i = 0; i < num_vfs; i++) { > rc = pci_iov_add_virtfn(dev, i); > if (rc) > goto failed; > } > + pci_unlock_rescan_remove_reentrant(locked); I think a nicer API would be to have a counter which is incremented by pci_lock_rescan_remove_reentrant() if owner is current, is decremented by pci_unlock_rescan_remove_reentrant() if owner is current and pci_unlock_rescan_remove_reentrant() unlocks only if it hits 0. No need for an atomic counter as this happens under the lock. Then you don't need this bool which leaks out of the API into the callers. Also, I would put this in the existing pci_lock_rescan_remove(), i.e. without introducing a new _reentrant variant, because these deadlocks exist elsewhere. They're known to happen on unplug in pciehp as well. Thanks, Lukas ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v5 1/1] PCI/IOV: Add reentrant locking in sriov_add_vfs/sriov_del_vfs for complete serialization 2026-03-03 18:19 ` Lukas Wunner @ 2026-03-03 19:28 ` Lukas Wunner 2026-03-03 19:56 ` Niklas Schnelle 0 siblings, 1 reply; 6+ messages in thread From: Lukas Wunner @ 2026-03-03 19:28 UTC (permalink / raw) To: Ionut Nechita (Wind River) Cc: linux-pci, helgaas, sebott, schnelle, bblock, alifm, julianr, dtatulea, mani, ionut_n2001, sunlightlinux, linux-kernel On Tue, Mar 03, 2026 at 07:19:31PM +0100, Lukas Wunner wrote: > Also, I would put this in the existing pci_lock_rescan_remove(), > i.e. without introducing a new _reentrant variant, because these > deadlocks exist elsewhere. They're known to happen on unplug in > pciehp as well. Actually, scratch this particular comment. The deadlock in pciehp is actually an AB-BA deadlock, not recursive acquisition. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v5 1/1] PCI/IOV: Add reentrant locking in sriov_add_vfs/sriov_del_vfs for complete serialization 2026-03-03 19:28 ` Lukas Wunner @ 2026-03-03 19:56 ` Niklas Schnelle 2026-03-03 20:02 ` Keith Busch 0 siblings, 1 reply; 6+ messages in thread From: Niklas Schnelle @ 2026-03-03 19:56 UTC (permalink / raw) To: Lukas Wunner, Ionut Nechita (Wind River) Cc: linux-pci, helgaas, sebott, bblock, alifm, julianr, dtatulea, mani, ionut_n2001, sunlightlinux, linux-kernel On Tue, 2026-03-03 at 20:28 +0100, Lukas Wunner wrote: > On Tue, Mar 03, 2026 at 07:19:31PM +0100, Lukas Wunner wrote: > > Also, I would put this in the existing pci_lock_rescan_remove(), > > i.e. without introducing a new _reentrant variant, because these > > deadlocks exist elsewhere. They're known to happen on unplug in > > pciehp as well. > > Actually, scratch this particular comment. The deadlock in pciehp > is actually an AB-BA deadlock, not recursive acquisition. Hi Lukas, Do you have a link to a report or a lockdep splat for this issue? We're currently working on another series restructuring the use of the rescan/remove lock in s390 PCI code to handle another deadlock. And while most of that is s390 specific code this may give helpful insights. Also definitely underlines how tricky this lock is. Thanks, Niklas ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v5 1/1] PCI/IOV: Add reentrant locking in sriov_add_vfs/sriov_del_vfs for complete serialization 2026-03-03 19:56 ` Niklas Schnelle @ 2026-03-03 20:02 ` Keith Busch 0 siblings, 0 replies; 6+ messages in thread From: Keith Busch @ 2026-03-03 20:02 UTC (permalink / raw) To: Niklas Schnelle Cc: Lukas Wunner, Ionut Nechita (Wind River), linux-pci, helgaas, sebott, bblock, alifm, julianr, dtatulea, mani, ionut_n2001, sunlightlinux, linux-kernel On Tue, Mar 03, 2026 at 08:56:24PM +0100, Niklas Schnelle wrote: > On Tue, 2026-03-03 at 20:28 +0100, Lukas Wunner wrote: > > Actually, scratch this particular comment. The deadlock in pciehp > > is actually an AB-BA deadlock, not recursive acquisition. > > Hi Lukas, > > Do you have a link to a report or a lockdep splat for this issue? https://lore.kernel.org/all/4c882e25194ba8282b78fe963fec8faae7cf23eb.1529173804.git.lukas@wunner.de/ ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-03-03 20:02 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-03 8:09 [PATCH v5 0/1] PCI/IOV: Add reentrant locking in sriov_add_vfs/sriov_del_vfs Ionut Nechita (Wind River) 2026-03-03 8:09 ` [PATCH v5 1/1] PCI/IOV: Add reentrant locking in sriov_add_vfs/sriov_del_vfs for complete serialization Ionut Nechita (Wind River) 2026-03-03 18:19 ` Lukas Wunner 2026-03-03 19:28 ` Lukas Wunner 2026-03-03 19:56 ` Niklas Schnelle 2026-03-03 20:02 ` Keith Busch
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox