* [PATCH 0/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV
@ 2025-08-26 8:52 Niklas Schnelle
2025-08-26 8:52 ` [PATCH 1/2] " Niklas Schnelle
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Niklas Schnelle @ 2025-08-26 8:52 UTC (permalink / raw)
To: Bjorn Helgaas, Lukas Wunner
Cc: Keith Busch, Gerd Bayer, Matthew Rosato, Benjamin Block,
Halil Pasic, Farhan Ali, Julian Ruess, Heiko Carstens,
Vasily Gorbik, Alexander Gordeev, linux-pci, linux-kernel,
Niklas Schnelle
Hi Bjorn, Hi Lukas,
This series fixes missing PCI rescan-remove locking in sriov_disable()
and sriov_enable() the former of which was observed to cause a double
remove / list corruption on s390. The first patch is the fix itself and
gives more details while the second patch is an optional proposal to add
a lockdep assertion to pci_stop_and_remove_bus_device() to catch missing
rescan-remove locking more easily in the future. If applied without the
first patch disabling SR-IOV via "echo 0 > /sys/bus/pci/devices/<dev>
/sriov_numvfs" triggers the lockdep assertion. I haven't found an easy
way to trigger the assertion in the sriov_enable() case but I checked
callers.
Also since the sriov_add_vfs() path is not excercised on s390 due to
pdev->no_vf_scan I did some basic testing on an x86 test system with an
SR-IOV capable ConnectX-6 DX NIC.
Thanks,
Niklas
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
Niklas Schnelle (2):
PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV
PCI: Add lockdep assertion in pci_stop_and_remove_bus_device()
drivers/pci/iov.c | 5 +++++
drivers/pci/pci.h | 2 ++
drivers/pci/probe.c | 2 +-
drivers/pci/remove.c | 1 +
4 files changed, 9 insertions(+), 1 deletion(-)
---
base-commit: 1b237f190eb3d36f52dffe07a40b5eb210280e00
change-id: 20250821-pci_fix_sriov_disable-552f29c822dd
Best regards,
--
Niklas Schnelle
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH 1/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV 2025-08-26 8:52 [PATCH 0/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV Niklas Schnelle @ 2025-08-26 8:52 ` Niklas Schnelle 2025-09-24 17:57 ` Farhan Ali 2025-09-26 9:24 ` Julian Ruess 2025-08-26 8:52 ` [PATCH 2/2] PCI: Add lockdep assertion in pci_stop_and_remove_bus_device() Niklas Schnelle 2025-09-26 21:02 ` [PATCH 0/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV Bjorn Helgaas 2 siblings, 2 replies; 12+ messages in thread From: Niklas Schnelle @ 2025-08-26 8:52 UTC (permalink / raw) To: Bjorn Helgaas, Lukas Wunner Cc: Keith Busch, Gerd Bayer, Matthew Rosato, Benjamin Block, Halil Pasic, Farhan Ali, Julian Ruess, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-pci, linux-kernel, Niklas Schnelle Before disabling SR-IOV via config space accesses to the parent PF, sriov_disable() first removes the PCI devices representing the VFs. Since commit 9d16947b7583 ("PCI: Add global pci_lock_rescan_remove()") such removal operations are serialized against concurrent remove and rescan using the pci_rescan_remove_lock. No such locking was ever added in sriov_disable() however. In particular when commit 18f9e9d150fc ("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device removal into sriov_del_vfs() there was still no locking around the pci_iov_remove_virtfn() calls. On s390 the lack of serialization in sriov_disable() may cause double remove and list corruption with the below (amended) trace being observed: PSW: 0704c00180000000 0000000c914e4b38 (klist_put+56) GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001 00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480 0000000000000001 0000000000000000 0000000000000000 0000000180692828 00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8 #0 [3800313fb20] device_del at c9158ad5c #1 [3800313fb88] pci_remove_bus_device at c915105ba #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198 #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0 #4 [3800313fc60] zpci_bus_remove_device at c90fb6104 #5 [3800313fca0] __zpci_event_availability at c90fb3dca #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2 #7 [3800313fd60] crw_collect_info at c91905822 #8 [3800313fe10] kthread at c90feb390 #9 [3800313fe68] __ret_from_fork at c90f6aa64 #10 [3800313fe98] ret_from_fork at c9194f3f2. This is because in addition to sriov_disable() removing the VFs, the platform also generates hot-unplug events for the VFs. This being the reverse operation to the hotplug events generated by sriov_enable() and handled via pdev->no_vf_scan. And while the event processing takes pci_rescan_remove_lock and checks whether the struct pci_dev still exists, the lack of synchronization makes this checking racy. Other races may also be possible of course though given that this lack of locking persisted so long obversable races seem very rare. Even on s390 the list corruption was only observed with certain devices since the platform events are only triggered by the config accesses that come after the removal, so as long as the removal finnished synchronously they would not race. Either way the locking is missing so fix this by adding it to the sriov_del_vfs() helper. Just lik PCI rescan-remove locking is also missing in sriov_add_vfs() including for the error case where pci_stop_ad_remove_bus_device() is called without the PCI rescan-remove lock being held. Even in the non error case adding new PCI devices and busses should be serialized via the PCI rescan-remove lock. Add the necessary locking. Cc: stable@vger.kernel.org Fixes: 18f9e9d150fc ("PCI/IOV: Factor out sriov_add_vfs()") Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> --- drivers/pci/iov.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index ac4375954c9479b5f4a0e666b5215094fdaeefc2..77dee43b785838d215b58db2d22088e9346e0583 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -629,15 +629,18 @@ static int sriov_add_vfs(struct pci_dev *dev, u16 num_vfs) if (dev->no_vf_scan) return 0; + pci_lock_rescan_remove(); for (i = 0; i < num_vfs; i++) { rc = pci_iov_add_virtfn(dev, i); if (rc) goto failed; } + pci_unlock_rescan_remove(); return 0; failed: while (i--) pci_iov_remove_virtfn(dev, i); + pci_unlock_rescan_remove(); return rc; } @@ -762,8 +765,10 @@ static void sriov_del_vfs(struct pci_dev *dev) struct pci_sriov *iov = dev->sriov; int i; + pci_lock_rescan_remove(); for (i = 0; i < iov->num_VFs; i++) pci_iov_remove_virtfn(dev, i); + pci_unlock_rescan_remove(); } static void sriov_disable(struct pci_dev *dev) -- 2.48.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV 2025-08-26 8:52 ` [PATCH 1/2] " Niklas Schnelle @ 2025-09-24 17:57 ` Farhan Ali 2025-09-25 7:48 ` Niklas Schnelle 2025-09-26 9:24 ` Julian Ruess 1 sibling, 1 reply; 12+ messages in thread From: Farhan Ali @ 2025-09-24 17:57 UTC (permalink / raw) To: Niklas Schnelle, Bjorn Helgaas, Lukas Wunner Cc: Keith Busch, Gerd Bayer, Matthew Rosato, Benjamin Block, Halil Pasic, Julian Ruess, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-pci, linux-kernel On 8/26/2025 1:52 AM, Niklas Schnelle wrote: > Before disabling SR-IOV via config space accesses to the parent PF, > sriov_disable() first removes the PCI devices representing the VFs. > > Since commit 9d16947b7583 ("PCI: Add global pci_lock_rescan_remove()") > such removal operations are serialized against concurrent remove and > rescan using the pci_rescan_remove_lock. No such locking was ever added > in sriov_disable() however. In particular when commit 18f9e9d150fc > ("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device > removal into sriov_del_vfs() there was still no locking around the > pci_iov_remove_virtfn() calls. > > On s390 the lack of serialization in sriov_disable() may cause double > remove and list corruption with the below (amended) trace being observed: > > PSW: 0704c00180000000 0000000c914e4b38 (klist_put+56) > GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001 > 00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480 > 0000000000000001 0000000000000000 0000000000000000 0000000180692828 > 00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8 > #0 [3800313fb20] device_del at c9158ad5c > #1 [3800313fb88] pci_remove_bus_device at c915105ba > #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198 > #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0 > #4 [3800313fc60] zpci_bus_remove_device at c90fb6104 > #5 [3800313fca0] __zpci_event_availability at c90fb3dca > #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2 > #7 [3800313fd60] crw_collect_info at c91905822 > #8 [3800313fe10] kthread at c90feb390 > #9 [3800313fe68] __ret_from_fork at c90f6aa64 > #10 [3800313fe98] ret_from_fork at c9194f3f2. > > This is because in addition to sriov_disable() removing the VFs, the > platform also generates hot-unplug events for the VFs. This being > the reverse operation to the hotplug events generated by sriov_enable() > and handled via pdev->no_vf_scan. And while the event processing takes > pci_rescan_remove_lock and checks whether the struct pci_dev still > exists, the lack of synchronization makes this checking racy. > > Other races may also be possible of course though given that this lack > of locking persisted so long obversable races seem very rare. Even on > s390 the list corruption was only observed with certain devices since > the platform events are only triggered by the config accesses that come > after the removal, so as long as the removal finnished synchronously > they would not race. Either way the locking is missing so fix this by > adding it to the sriov_del_vfs() helper. > > Just lik PCI rescan-remove locking is also missing in sriov_add_vfs() > including for the error case where pci_stop_ad_remove_bus_device() is > called without the PCI rescan-remove lock being held. Even in the non > error case adding new PCI devices and busses should be serialized via > the PCI rescan-remove lock. Add the necessary locking. > > Cc: stable@vger.kernel.org > Fixes: 18f9e9d150fc ("PCI/IOV: Factor out sriov_add_vfs()") > Reviewed-by: Benjamin Block <bblock@linux.ibm.com> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > --- > drivers/pci/iov.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c > index ac4375954c9479b5f4a0e666b5215094fdaeefc2..77dee43b785838d215b58db2d22088e9346e0583 100644 > --- a/drivers/pci/iov.c > +++ b/drivers/pci/iov.c > @@ -629,15 +629,18 @@ static int sriov_add_vfs(struct pci_dev *dev, u16 num_vfs) > if (dev->no_vf_scan) > return 0; > > + pci_lock_rescan_remove(); > for (i = 0; i < num_vfs; i++) { > rc = pci_iov_add_virtfn(dev, i); Should we move the lock/unlock to pci_iov_add_virtfn? As that's where the device is added to the bus? Similarly move the locking/unlocking to pci_iov_remove_virtfn? Thanks Farhan > if (rc) > goto failed; > } > + pci_unlock_rescan_remove(); > return 0; > failed: > while (i--) > pci_iov_remove_virtfn(dev, i); > + pci_unlock_rescan_remove(); > > return rc; > } > @@ -762,8 +765,10 @@ static void sriov_del_vfs(struct pci_dev *dev) > struct pci_sriov *iov = dev->sriov; > int i; > > + pci_lock_rescan_remove(); > for (i = 0; i < iov->num_VFs; i++) > pci_iov_remove_virtfn(dev, i); > + pci_unlock_rescan_remove(); > } > > static void sriov_disable(struct pci_dev *dev) > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV 2025-09-24 17:57 ` Farhan Ali @ 2025-09-25 7:48 ` Niklas Schnelle 2025-09-25 22:05 ` Farhan Ali 0 siblings, 1 reply; 12+ messages in thread From: Niklas Schnelle @ 2025-09-25 7:48 UTC (permalink / raw) To: Farhan Ali, Bjorn Helgaas, Lukas Wunner Cc: Keith Busch, Gerd Bayer, Matthew Rosato, Benjamin Block, Halil Pasic, Julian Ruess, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-pci, linux-kernel On Wed, 2025-09-24 at 10:57 -0700, Farhan Ali wrote: > On 8/26/2025 1:52 AM, Niklas Schnelle wrote: > > Before disabling SR-IOV via config space accesses to the parent PF, > > sriov_disable() first removes the PCI devices representing the VFs. > > > > Since commit 9d16947b7583 ("PCI: Add global pci_lock_rescan_remove()") > > such removal operations are serialized against concurrent remove and > > rescan using the pci_rescan_remove_lock. No such locking was ever added > > in sriov_disable() however. In particular when commit 18f9e9d150fc > > ("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device > > removal into sriov_del_vfs() there was still no locking around the > > pci_iov_remove_virtfn() calls. > > > > On s390 the lack of serialization in sriov_disable() may cause double > > remove and list corruption with the below (amended) trace being observed: > > > > PSW: 0704c00180000000 0000000c914e4b38 (klist_put+56) > > GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001 > > 00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480 > > 0000000000000001 0000000000000000 0000000000000000 0000000180692828 > > 00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8 > > #0 [3800313fb20] device_del at c9158ad5c > > #1 [3800313fb88] pci_remove_bus_device at c915105ba > > #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198 > > #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0 > > #4 [3800313fc60] zpci_bus_remove_device at c90fb6104 > > #5 [3800313fca0] __zpci_event_availability at c90fb3dca > > #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2 > > #7 [3800313fd60] crw_collect_info at c91905822 > > #8 [3800313fe10] kthread at c90feb390 > > #9 [3800313fe68] __ret_from_fork at c90f6aa64 > > #10 [3800313fe98] ret_from_fork at c9194f3f2. > > > > This is because in addition to sriov_disable() removing the VFs, the > > platform also generates hot-unplug events for the VFs. This being > > the reverse operation to the hotplug events generated by sriov_enable() > > and handled via pdev->no_vf_scan. And while the event processing takes > > pci_rescan_remove_lock and checks whether the struct pci_dev still > > exists, the lack of synchronization makes this checking racy. > > > > Other races may also be possible of course though given that this lack > > of locking persisted so long obversable races seem very rare. Even on > > s390 the list corruption was only observed with certain devices since > > the platform events are only triggered by the config accesses that come > > after the removal, so as long as the removal finnished synchronously > > they would not race. Either way the locking is missing so fix this by > > adding it to the sriov_del_vfs() helper. > > > > Just lik PCI rescan-remove locking is also missing in sriov_add_vfs() > > including for the error case where pci_stop_ad_remove_bus_device() is > > called without the PCI rescan-remove lock being held. Even in the non > > error case adding new PCI devices and busses should be serialized via > > the PCI rescan-remove lock. Add the necessary locking. > > > > Cc: stable@vger.kernel.org > > Fixes: 18f9e9d150fc ("PCI/IOV: Factor out sriov_add_vfs()") > > Reviewed-by: Benjamin Block <bblock@linux.ibm.com> > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > > --- > > drivers/pci/iov.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c > > index ac4375954c9479b5f4a0e666b5215094fdaeefc2..77dee43b785838d215b58db2d22088e9346e0583 100644 > > --- a/drivers/pci/iov.c > > +++ b/drivers/pci/iov.c > > @@ -629,15 +629,18 @@ static int sriov_add_vfs(struct pci_dev *dev, u16 num_vfs) > > if (dev->no_vf_scan) > > return 0; > > > > + pci_lock_rescan_remove(); > > for (i = 0; i < num_vfs; i++) { > > rc = pci_iov_add_virtfn(dev, i); > > Should we move the lock/unlock to pci_iov_add_virtfn? As that's where > the device is added to the bus? Similarly move the locking/unlocking to > pci_iov_remove_virtfn? > > Thanks > Farhan > > I contemplated this as well. Most of the existing uses of pci_lock/unlock_rescan_remove() are relatively coarse grained covering e.g. the scanning of a whole bus. So I tried to keep this in line with that such that all the VFs are added in a single critical section. Thanks, Niklas ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV 2025-09-25 7:48 ` Niklas Schnelle @ 2025-09-25 22:05 ` Farhan Ali 0 siblings, 0 replies; 12+ messages in thread From: Farhan Ali @ 2025-09-25 22:05 UTC (permalink / raw) To: Niklas Schnelle, Bjorn Helgaas, Lukas Wunner Cc: Keith Busch, Gerd Bayer, Matthew Rosato, Benjamin Block, Halil Pasic, Julian Ruess, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-pci, linux-kernel On 9/25/2025 12:48 AM, Niklas Schnelle wrote: > On Wed, 2025-09-24 at 10:57 -0700, Farhan Ali wrote: >> On 8/26/2025 1:52 AM, Niklas Schnelle wrote: >>> Before disabling SR-IOV via config space accesses to the parent PF, >>> sriov_disable() first removes the PCI devices representing the VFs. >>> >>> Since commit 9d16947b7583 ("PCI: Add global pci_lock_rescan_remove()") >>> such removal operations are serialized against concurrent remove and >>> rescan using the pci_rescan_remove_lock. No such locking was ever added >>> in sriov_disable() however. In particular when commit 18f9e9d150fc >>> ("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device >>> removal into sriov_del_vfs() there was still no locking around the >>> pci_iov_remove_virtfn() calls. >>> >>> On s390 the lack of serialization in sriov_disable() may cause double >>> remove and list corruption with the below (amended) trace being observed: >>> >>> PSW: 0704c00180000000 0000000c914e4b38 (klist_put+56) >>> GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001 >>> 00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480 >>> 0000000000000001 0000000000000000 0000000000000000 0000000180692828 >>> 00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8 >>> #0 [3800313fb20] device_del at c9158ad5c >>> #1 [3800313fb88] pci_remove_bus_device at c915105ba >>> #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198 >>> #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0 >>> #4 [3800313fc60] zpci_bus_remove_device at c90fb6104 >>> #5 [3800313fca0] __zpci_event_availability at c90fb3dca >>> #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2 >>> #7 [3800313fd60] crw_collect_info at c91905822 >>> #8 [3800313fe10] kthread at c90feb390 >>> #9 [3800313fe68] __ret_from_fork at c90f6aa64 >>> #10 [3800313fe98] ret_from_fork at c9194f3f2. >>> >>> This is because in addition to sriov_disable() removing the VFs, the >>> platform also generates hot-unplug events for the VFs. This being >>> the reverse operation to the hotplug events generated by sriov_enable() >>> and handled via pdev->no_vf_scan. And while the event processing takes >>> pci_rescan_remove_lock and checks whether the struct pci_dev still >>> exists, the lack of synchronization makes this checking racy. >>> >>> Other races may also be possible of course though given that this lack >>> of locking persisted so long obversable races seem very rare. Even on >>> s390 the list corruption was only observed with certain devices since >>> the platform events are only triggered by the config accesses that come >>> after the removal, so as long as the removal finnished synchronously >>> they would not race. Either way the locking is missing so fix this by >>> adding it to the sriov_del_vfs() helper. >>> >>> Just lik PCI rescan-remove locking is also missing in sriov_add_vfs() >>> including for the error case where pci_stop_ad_remove_bus_device() is >>> called without the PCI rescan-remove lock being held. Even in the non >>> error case adding new PCI devices and busses should be serialized via >>> the PCI rescan-remove lock. Add the necessary locking. >>> >>> Cc: stable@vger.kernel.org >>> Fixes: 18f9e9d150fc ("PCI/IOV: Factor out sriov_add_vfs()") >>> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> >>> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> >>> --- >>> drivers/pci/iov.c | 5 +++++ >>> 1 file changed, 5 insertions(+) >>> >>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c >>> index ac4375954c9479b5f4a0e666b5215094fdaeefc2..77dee43b785838d215b58db2d22088e9346e0583 100644 >>> --- a/drivers/pci/iov.c >>> +++ b/drivers/pci/iov.c >>> @@ -629,15 +629,18 @@ static int sriov_add_vfs(struct pci_dev *dev, u16 num_vfs) >>> if (dev->no_vf_scan) >>> return 0; >>> >>> + pci_lock_rescan_remove(); >>> for (i = 0; i < num_vfs; i++) { >>> rc = pci_iov_add_virtfn(dev, i); >> Should we move the lock/unlock to pci_iov_add_virtfn? As that's where >> the device is added to the bus? Similarly move the locking/unlocking to >> pci_iov_remove_virtfn? >> >> Thanks >> Farhan >> >> > I contemplated this as well. Most of the existing uses of > pci_lock/unlock_rescan_remove() are relatively coarse grained covering > e.g. the scanning of a whole bus. So I tried to keep this in line with > that such that all the VFs are added in a single critical section. > > Thanks, > Niklas Makes sense, the patch LGTM. Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Thanks Farhan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV 2025-08-26 8:52 ` [PATCH 1/2] " Niklas Schnelle 2025-09-24 17:57 ` Farhan Ali @ 2025-09-26 9:24 ` Julian Ruess 1 sibling, 0 replies; 12+ messages in thread From: Julian Ruess @ 2025-09-26 9:24 UTC (permalink / raw) To: Niklas Schnelle, Bjorn Helgaas, Lukas Wunner Cc: Keith Busch, Gerd Bayer, Matthew Rosato, Benjamin Block, Halil Pasic, Farhan Ali, Julian Ruess, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-pci, linux-kernel On Tue Aug 26, 2025 at 10:52 AM CEST, Niklas Schnelle wrote: > Before disabling SR-IOV via config space accesses to the parent PF, > sriov_disable() first removes the PCI devices representing the VFs. > > Since commit 9d16947b7583 ("PCI: Add global pci_lock_rescan_remove()") > such removal operations are serialized against concurrent remove and > rescan using the pci_rescan_remove_lock. No such locking was ever added > in sriov_disable() however. In particular when commit 18f9e9d150fc > ("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device > removal into sriov_del_vfs() there was still no locking around the > pci_iov_remove_virtfn() calls. > > On s390 the lack of serialization in sriov_disable() may cause double > remove and list corruption with the below (amended) trace being observed: > > PSW: 0704c00180000000 0000000c914e4b38 (klist_put+56) > GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001 > 00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480 > 0000000000000001 0000000000000000 0000000000000000 0000000180692828 > 00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8 > #0 [3800313fb20] device_del at c9158ad5c > #1 [3800313fb88] pci_remove_bus_device at c915105ba > #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198 > #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0 > #4 [3800313fc60] zpci_bus_remove_device at c90fb6104 > #5 [3800313fca0] __zpci_event_availability at c90fb3dca > #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2 > #7 [3800313fd60] crw_collect_info at c91905822 > #8 [3800313fe10] kthread at c90feb390 > #9 [3800313fe68] __ret_from_fork at c90f6aa64 > #10 [3800313fe98] ret_from_fork at c9194f3f2. > > This is because in addition to sriov_disable() removing the VFs, the > platform also generates hot-unplug events for the VFs. This being > the reverse operation to the hotplug events generated by sriov_enable() > and handled via pdev->no_vf_scan. And while the event processing takes > pci_rescan_remove_lock and checks whether the struct pci_dev still > exists, the lack of synchronization makes this checking racy. > > Other races may also be possible of course though given that this lack > of locking persisted so long obversable races seem very rare. Even on > s390 the list corruption was only observed with certain devices since > the platform events are only triggered by the config accesses that come > after the removal, so as long as the removal finnished synchronously > they would not race. Either way the locking is missing so fix this by > adding it to the sriov_del_vfs() helper. > > Just lik PCI rescan-remove locking is also missing in sriov_add_vfs() > including for the error case where pci_stop_ad_remove_bus_device() is > called without the PCI rescan-remove lock being held. Even in the non > error case adding new PCI devices and busses should be serialized via > the PCI rescan-remove lock. Add the necessary locking. > > Cc: stable@vger.kernel.org > Fixes: 18f9e9d150fc ("PCI/IOV: Factor out sriov_add_vfs()") > Reviewed-by: Benjamin Block <bblock@linux.ibm.com> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > --- > drivers/pci/iov.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c > index ac4375954c9479b5f4a0e666b5215094fdaeefc2..77dee43b785838d215b58db2d22088e9346e0583 100644 > --- a/drivers/pci/iov.c > +++ b/drivers/pci/iov.c > @@ -629,15 +629,18 @@ static int sriov_add_vfs(struct pci_dev *dev, u16 num_vfs) > if (dev->no_vf_scan) > return 0; > > + pci_lock_rescan_remove(); > for (i = 0; i < num_vfs; i++) { > rc = pci_iov_add_virtfn(dev, i); > if (rc) > goto failed; > } > + pci_unlock_rescan_remove(); > return 0; > failed: > while (i--) > pci_iov_remove_virtfn(dev, i); > + pci_unlock_rescan_remove(); > > return rc; > } > @@ -762,8 +765,10 @@ static void sriov_del_vfs(struct pci_dev *dev) > struct pci_sriov *iov = dev->sriov; > int i; > > + pci_lock_rescan_remove(); > for (i = 0; i < iov->num_VFs; i++) > pci_iov_remove_virtfn(dev, i); > + pci_unlock_rescan_remove(); > } > > static void sriov_disable(struct pci_dev *dev) Feel free to add my Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Thanks, Julian ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 2/2] PCI: Add lockdep assertion in pci_stop_and_remove_bus_device() 2025-08-26 8:52 [PATCH 0/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV Niklas Schnelle 2025-08-26 8:52 ` [PATCH 1/2] " Niklas Schnelle @ 2025-08-26 8:52 ` Niklas Schnelle 2025-09-12 14:48 ` Gerd Bayer ` (2 more replies) 2025-09-26 21:02 ` [PATCH 0/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV Bjorn Helgaas 2 siblings, 3 replies; 12+ messages in thread From: Niklas Schnelle @ 2025-08-26 8:52 UTC (permalink / raw) To: Bjorn Helgaas, Lukas Wunner Cc: Keith Busch, Gerd Bayer, Matthew Rosato, Benjamin Block, Halil Pasic, Farhan Ali, Julian Ruess, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-pci, linux-kernel, Niklas Schnelle Removing a PCI devices requires holding pci_rescan_remove_lock. Prompted by this being missed in sriov_disable() and going unnoticed since its inception add a lockdep assert so this doesn't get missed again in the future. Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> --- drivers/pci/pci.h | 2 ++ drivers/pci/probe.c | 2 +- drivers/pci/remove.c | 1 + 3 files changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 34f65d69662e9f61f0c489ec58de2ce17d21c0c6..1ad2e3ab147f3b2c42b3257e4f366fc5e424ede3 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -84,6 +84,8 @@ struct pcie_tlp_log; extern const unsigned char pcie_link_speed[]; extern bool pci_early_dump; +extern struct mutex pci_rescan_remove_lock; + bool pcie_cap_has_lnkctl(const struct pci_dev *dev); bool pcie_cap_has_lnkctl2(const struct pci_dev *dev); bool pcie_cap_has_rtctl(const struct pci_dev *dev); diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index f41128f91ca76ab014ad669ae84a53032c7c6b6b..2b35bb39ab0366bbf86b43e721811575b9fbcefb 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -3469,7 +3469,7 @@ EXPORT_SYMBOL_GPL(pci_rescan_bus); * pci_rescan_bus(), pci_rescan_bus_bridge_resize() and PCI device removal * routines should always be executed under this mutex. */ -static DEFINE_MUTEX(pci_rescan_remove_lock); +DEFINE_MUTEX(pci_rescan_remove_lock); void pci_lock_rescan_remove(void) { diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c index 445afdfa6498edc88f1ef89df279af1419025495..0b9a609392cecba36a818bc496a0af64061c259a 100644 --- a/drivers/pci/remove.c +++ b/drivers/pci/remove.c @@ -138,6 +138,7 @@ static void pci_remove_bus_device(struct pci_dev *dev) */ void pci_stop_and_remove_bus_device(struct pci_dev *dev) { + lockdep_assert_held(&pci_rescan_remove_lock); pci_stop_bus_device(dev); pci_remove_bus_device(dev); } -- 2.48.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] PCI: Add lockdep assertion in pci_stop_and_remove_bus_device() 2025-08-26 8:52 ` [PATCH 2/2] PCI: Add lockdep assertion in pci_stop_and_remove_bus_device() Niklas Schnelle @ 2025-09-12 14:48 ` Gerd Bayer 2025-09-24 18:06 ` Farhan Ali 2025-09-26 9:25 ` Julian Ruess 2 siblings, 0 replies; 12+ messages in thread From: Gerd Bayer @ 2025-09-12 14:48 UTC (permalink / raw) To: Niklas Schnelle, Bjorn Helgaas, Lukas Wunner Cc: Keith Busch, Matthew Rosato, Benjamin Block, Halil Pasic, Farhan Ali, Julian Ruess, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-pci, linux-kernel On Tue, 2025-08-26 at 10:52 +0200, Niklas Schnelle wrote: > Removing a PCI devices requires holding pci_rescan_remove_lock. Prompted > by this being missed in sriov_disable() and going unnoticed since its > inception add a lockdep assert so this doesn't get missed again in the > future. > > Reviewed-by: Benjamin Block <bblock@linux.ibm.com> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > --- > drivers/pci/pci.h | 2 ++ > drivers/pci/probe.c | 2 +- > drivers/pci/remove.c | 1 + > 3 files changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 34f65d69662e9f61f0c489ec58de2ce17d21c0c6..1ad2e3ab147f3b2c42b3257e4f366fc5e424ede3 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -84,6 +84,8 @@ struct pcie_tlp_log; > extern const unsigned char pcie_link_speed[]; > extern bool pci_early_dump; > > +extern struct mutex pci_rescan_remove_lock; > + > bool pcie_cap_has_lnkctl(const struct pci_dev *dev); > bool pcie_cap_has_lnkctl2(const struct pci_dev *dev); > bool pcie_cap_has_rtctl(const struct pci_dev *dev); > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index f41128f91ca76ab014ad669ae84a53032c7c6b6b..2b35bb39ab0366bbf86b43e721811575b9fbcefb 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -3469,7 +3469,7 @@ EXPORT_SYMBOL_GPL(pci_rescan_bus); > * pci_rescan_bus(), pci_rescan_bus_bridge_resize() and PCI device removal > * routines should always be executed under this mutex. > */ > -static DEFINE_MUTEX(pci_rescan_remove_lock); > +DEFINE_MUTEX(pci_rescan_remove_lock); > > void pci_lock_rescan_remove(void) > { > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c > index 445afdfa6498edc88f1ef89df279af1419025495..0b9a609392cecba36a818bc496a0af64061c259a 100644 > --- a/drivers/pci/remove.c > +++ b/drivers/pci/remove.c > @@ -138,6 +138,7 @@ static void pci_remove_bus_device(struct pci_dev *dev) > */ > void pci_stop_and_remove_bus_device(struct pci_dev *dev) > { > + lockdep_assert_held(&pci_rescan_remove_lock); > pci_stop_bus_device(dev); > pci_remove_bus_device(dev); > } I'm totally in favor of adding this lockdep assertion, even if this means that the mutex pci_rescan_remove_lock needs to be externalized from drivers/pci/probe.c. However, I was surprised that you didn't add the assertion to the _locked() variant until I realized that here the naming of _locked vs. not _locked variants of pci_stop_and_remove_bus_device() is just the opposite to the naming in driver/pci/pci.c: There _locked implies that the necessary lock is already held on routine entry. But this change in semantics was already introduced with commit 9d16947b7583 ("PCI: Add global pci_lock_rescan_remove()"). Looks like aligning the naming to the convention in driver/pci/pci.c would touch quite a bit of code - but so does the introduction of this lockdep assertion... Sigh, Gerd ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] PCI: Add lockdep assertion in pci_stop_and_remove_bus_device() 2025-08-26 8:52 ` [PATCH 2/2] PCI: Add lockdep assertion in pci_stop_and_remove_bus_device() Niklas Schnelle 2025-09-12 14:48 ` Gerd Bayer @ 2025-09-24 18:06 ` Farhan Ali 2025-09-25 7:25 ` Niklas Schnelle 2025-09-26 9:25 ` Julian Ruess 2 siblings, 1 reply; 12+ messages in thread From: Farhan Ali @ 2025-09-24 18:06 UTC (permalink / raw) To: Niklas Schnelle, Bjorn Helgaas, Lukas Wunner Cc: Keith Busch, Gerd Bayer, Matthew Rosato, Benjamin Block, Halil Pasic, Julian Ruess, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-pci, linux-kernel On 8/26/2025 1:52 AM, Niklas Schnelle wrote: > Removing a PCI devices requires holding pci_rescan_remove_lock. Prompted > by this being missed in sriov_disable() and going unnoticed since its > inception add a lockdep assert so this doesn't get missed again in the > future. > > Reviewed-by: Benjamin Block <bblock@linux.ibm.com> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > --- > drivers/pci/pci.h | 2 ++ > drivers/pci/probe.c | 2 +- > drivers/pci/remove.c | 1 + > 3 files changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 34f65d69662e9f61f0c489ec58de2ce17d21c0c6..1ad2e3ab147f3b2c42b3257e4f366fc5e424ede3 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -84,6 +84,8 @@ struct pcie_tlp_log; > extern const unsigned char pcie_link_speed[]; > extern bool pci_early_dump; > > +extern struct mutex pci_rescan_remove_lock; > + > bool pcie_cap_has_lnkctl(const struct pci_dev *dev); > bool pcie_cap_has_lnkctl2(const struct pci_dev *dev); > bool pcie_cap_has_rtctl(const struct pci_dev *dev); > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index f41128f91ca76ab014ad669ae84a53032c7c6b6b..2b35bb39ab0366bbf86b43e721811575b9fbcefb 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -3469,7 +3469,7 @@ EXPORT_SYMBOL_GPL(pci_rescan_bus); > * pci_rescan_bus(), pci_rescan_bus_bridge_resize() and PCI device removal > * routines should always be executed under this mutex. > */ > -static DEFINE_MUTEX(pci_rescan_remove_lock); > +DEFINE_MUTEX(pci_rescan_remove_lock); > > void pci_lock_rescan_remove(void) > { > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c > index 445afdfa6498edc88f1ef89df279af1419025495..0b9a609392cecba36a818bc496a0af64061c259a 100644 > --- a/drivers/pci/remove.c > +++ b/drivers/pci/remove.c > @@ -138,6 +138,7 @@ static void pci_remove_bus_device(struct pci_dev *dev) > */ > void pci_stop_and_remove_bus_device(struct pci_dev *dev) > { > + lockdep_assert_held(&pci_rescan_remove_lock); > pci_stop_bus_device(dev); > pci_remove_bus_device(dev); > } We also have the function pci_stop_and_remove_bus_device_locked() as Gerd mentioned, so is pci_stop_and_remove_bus_device meant to be called without the rescan_remove_lock held? This is a little confusing as we shouldn't be adding/removing from the bus without the lock AFAIU, but maybe I am missing something? Thanks Farhan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] PCI: Add lockdep assertion in pci_stop_and_remove_bus_device() 2025-09-24 18:06 ` Farhan Ali @ 2025-09-25 7:25 ` Niklas Schnelle 0 siblings, 0 replies; 12+ messages in thread From: Niklas Schnelle @ 2025-09-25 7:25 UTC (permalink / raw) To: Farhan Ali, Bjorn Helgaas, Lukas Wunner Cc: Keith Busch, Gerd Bayer, Matthew Rosato, Benjamin Block, Halil Pasic, Julian Ruess, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-pci, linux-kernel On Wed, 2025-09-24 at 11:06 -0700, Farhan Ali wrote: > On 8/26/2025 1:52 AM, Niklas Schnelle wrote: > > Removing a PCI devices requires holding pci_rescan_remove_lock. Prompted > > by this being missed in sriov_disable() and going unnoticed since its > > inception add a lockdep assert so this doesn't get missed again in the > > future. > > > > Reviewed-by: Benjamin Block <bblock@linux.ibm.com> > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > > --- > > drivers/pci/pci.h | 2 ++ > > drivers/pci/probe.c | 2 +- > > drivers/pci/remove.c | 1 + > > 3 files changed, 4 insertions(+), 1 deletion(-) > > --- snip --- > > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c > > index 445afdfa6498edc88f1ef89df279af1419025495..0b9a609392cecba36a818bc496a0af64061c259a 100644 > > --- a/drivers/pci/remove.c > > +++ b/drivers/pci/remove.c > > @@ -138,6 +138,7 @@ static void pci_remove_bus_device(struct pci_dev *dev) > > */ > > void pci_stop_and_remove_bus_device(struct pci_dev *dev) > > { > > + lockdep_assert_held(&pci_rescan_remove_lock); > > pci_stop_bus_device(dev); > > pci_remove_bus_device(dev); > > } > > We also have the function pci_stop_and_remove_bus_device_locked() as > Gerd mentioned, so is pci_stop_and_remove_bus_device meant to be called > without the rescan_remove_lock held? This is a little confusing as we > shouldn't be adding/removing from the bus without the lock AFAIU, but > maybe I am missing something? > > Thanks > Farhan As far as I understand one would use pci_stop_and_remove_bus_device() if one is already holding the pci_rescan_remove_lock and pci_stop_and_remove_bus_device_locked() if one only needs to take it for that call. I think this is kind of easy to get confused about so this lockdep assertion is even more useful. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] PCI: Add lockdep assertion in pci_stop_and_remove_bus_device() 2025-08-26 8:52 ` [PATCH 2/2] PCI: Add lockdep assertion in pci_stop_and_remove_bus_device() Niklas Schnelle 2025-09-12 14:48 ` Gerd Bayer 2025-09-24 18:06 ` Farhan Ali @ 2025-09-26 9:25 ` Julian Ruess 2 siblings, 0 replies; 12+ messages in thread From: Julian Ruess @ 2025-09-26 9:25 UTC (permalink / raw) To: Niklas Schnelle, Bjorn Helgaas, Lukas Wunner Cc: Keith Busch, Gerd Bayer, Matthew Rosato, Benjamin Block, Halil Pasic, Farhan Ali, Julian Ruess, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-pci, linux-kernel On Tue Aug 26, 2025 at 10:52 AM CEST, Niklas Schnelle wrote: > Removing a PCI devices requires holding pci_rescan_remove_lock. Prompted > by this being missed in sriov_disable() and going unnoticed since its > inception add a lockdep assert so this doesn't get missed again in the > future. > > Reviewed-by: Benjamin Block <bblock@linux.ibm.com> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > --- > drivers/pci/pci.h | 2 ++ > drivers/pci/probe.c | 2 +- > drivers/pci/remove.c | 1 + > 3 files changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 34f65d69662e9f61f0c489ec58de2ce17d21c0c6..1ad2e3ab147f3b2c42b3257e4f366fc5e424ede3 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -84,6 +84,8 @@ struct pcie_tlp_log; > extern const unsigned char pcie_link_speed[]; > extern bool pci_early_dump; > > +extern struct mutex pci_rescan_remove_lock; > + > bool pcie_cap_has_lnkctl(const struct pci_dev *dev); > bool pcie_cap_has_lnkctl2(const struct pci_dev *dev); > bool pcie_cap_has_rtctl(const struct pci_dev *dev); > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index f41128f91ca76ab014ad669ae84a53032c7c6b6b..2b35bb39ab0366bbf86b43e721811575b9fbcefb 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -3469,7 +3469,7 @@ EXPORT_SYMBOL_GPL(pci_rescan_bus); > * pci_rescan_bus(), pci_rescan_bus_bridge_resize() and PCI device removal > * routines should always be executed under this mutex. > */ > -static DEFINE_MUTEX(pci_rescan_remove_lock); > +DEFINE_MUTEX(pci_rescan_remove_lock); > > void pci_lock_rescan_remove(void) > { > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c > index 445afdfa6498edc88f1ef89df279af1419025495..0b9a609392cecba36a818bc496a0af64061c259a 100644 > --- a/drivers/pci/remove.c > +++ b/drivers/pci/remove.c > @@ -138,6 +138,7 @@ static void pci_remove_bus_device(struct pci_dev *dev) > */ > void pci_stop_and_remove_bus_device(struct pci_dev *dev) > { > + lockdep_assert_held(&pci_rescan_remove_lock); > pci_stop_bus_device(dev); > pci_remove_bus_device(dev); > } Feel free to add my Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Thanks, Julian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV 2025-08-26 8:52 [PATCH 0/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV Niklas Schnelle 2025-08-26 8:52 ` [PATCH 1/2] " Niklas Schnelle 2025-08-26 8:52 ` [PATCH 2/2] PCI: Add lockdep assertion in pci_stop_and_remove_bus_device() Niklas Schnelle @ 2025-09-26 21:02 ` Bjorn Helgaas 2 siblings, 0 replies; 12+ messages in thread From: Bjorn Helgaas @ 2025-09-26 21:02 UTC (permalink / raw) To: Niklas Schnelle Cc: Lukas Wunner, Keith Busch, Gerd Bayer, Matthew Rosato, Benjamin Block, Halil Pasic, Farhan Ali, Julian Ruess, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-pci, linux-kernel On Tue, Aug 26, 2025 at 10:52:07AM +0200, Niklas Schnelle wrote: > Hi Bjorn, Hi Lukas, > > This series fixes missing PCI rescan-remove locking in sriov_disable() > and sriov_enable() the former of which was observed to cause a double > remove / list corruption on s390. The first patch is the fix itself and > gives more details while the second patch is an optional proposal to add > a lockdep assertion to pci_stop_and_remove_bus_device() to catch missing > rescan-remove locking more easily in the future. If applied without the > first patch disabling SR-IOV via "echo 0 > /sys/bus/pci/devices/<dev> > /sriov_numvfs" triggers the lockdep assertion. I haven't found an easy > way to trigger the assertion in the sriov_enable() case but I checked > callers. > > Also since the sriov_add_vfs() path is not excercised on s390 due to > pdev->no_vf_scan I did some basic testing on an x86 test system with an > SR-IOV capable ConnectX-6 DX NIC. > > Thanks, > Niklas > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > --- > Niklas Schnelle (2): > PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV > PCI: Add lockdep assertion in pci_stop_and_remove_bus_device() > > drivers/pci/iov.c | 5 +++++ > drivers/pci/pci.h | 2 ++ > drivers/pci/probe.c | 2 +- > drivers/pci/remove.c | 1 + > 4 files changed, 9 insertions(+), 1 deletion(-) Applied to pci/virtualization, hoping to squeeze into v6.18. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-09-26 21:02 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-08-26 8:52 [PATCH 0/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV Niklas Schnelle 2025-08-26 8:52 ` [PATCH 1/2] " Niklas Schnelle 2025-09-24 17:57 ` Farhan Ali 2025-09-25 7:48 ` Niklas Schnelle 2025-09-25 22:05 ` Farhan Ali 2025-09-26 9:24 ` Julian Ruess 2025-08-26 8:52 ` [PATCH 2/2] PCI: Add lockdep assertion in pci_stop_and_remove_bus_device() Niklas Schnelle 2025-09-12 14:48 ` Gerd Bayer 2025-09-24 18:06 ` Farhan Ali 2025-09-25 7:25 ` Niklas Schnelle 2025-09-26 9:25 ` Julian Ruess 2025-09-26 21:02 ` [PATCH 0/2] PCI/IOV: Add missing PCI rescan-remove locking when enabling/disabling SR-IOV Bjorn Helgaas
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox