* [PATCH RFC 0/8] pci: rescan/remove locking rework
@ 2024-07-22 15:19 Keith Busch
2024-07-22 15:19 ` [PATCH RFC 1/8] pci: make pci_stop_dev concurrent safe Keith Busch
` (8 more replies)
0 siblings, 9 replies; 21+ messages in thread
From: Keith Busch @ 2024-07-22 15:19 UTC (permalink / raw)
To: linux-pci, bhelgaas, lukas; +Cc: mika.westerberg, Keith Busch
From: Keith Busch <kbusch@kernel.org>
This patch set targets a subset of pci bus scanning and removals that
were shown to be problematic with deep pci topologies that support
native hotplug. I've tried to capture the common pci components, but
there are definitely many subsystems accessing the topology in their own
way, many of which I can't possibly test, and I have not tried to
convert every user to this new locking scheme. However, if I did this
correctly, they should be no worse off than today!
The earlier patches are just cleanups and/or making it a little easier
to change the locking schemes. The real stuff happens from patches 7 and
8.
I've run this with lockdep enabled, tested concurrent hotplug events on
various x86 platforms with layers of pci switches. That said, as
mentioned earlier, there are many paths to here that I haven't been able
to test, so the final patch might be considered experimental.
Manual concurrent concurrent device removals is still broken. I've sent
a different patch for that here, but it does not sound like it is
acceptable:
https://lore.kernel.org/linux-pci/20240719185513.3376321-1-kbusch@meta.com/T/#u
Keith Busch (8):
pci: make pci_stop_dev concurrent safe
pci: make pci_destroy_dev concurrent safe
pci: move the walk bus lock to where its needed
pci: walk bus recursively
pci: unexport pci_walk_bus_locked
pci: add helpers for stop and remove bus
pci: reference count subordinate
pci: use finer grain locking for bus protection
drivers/pci/bus.c | 70 ++++++++++++-------------
drivers/pci/hotplug/pciehp_pci.c | 21 +++++---
drivers/pci/pci-sysfs.c | 2 +
drivers/pci/pci.c | 31 ++++++++---
drivers/pci/pci.h | 16 +++++-
drivers/pci/pcie/aspm.c | 7 ++-
drivers/pci/probe.c | 16 +++++-
drivers/pci/remove.c | 90 +++++++++++++++++++-------------
include/linux/pci.h | 11 ++++
9 files changed, 175 insertions(+), 89 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH RFC 1/8] pci: make pci_stop_dev concurrent safe
2024-07-22 15:19 [PATCH RFC 0/8] pci: rescan/remove locking rework Keith Busch
@ 2024-07-22 15:19 ` Keith Busch
2024-08-15 14:17 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 2/8] pci: make pci_destroy_dev " Keith Busch
` (7 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Keith Busch @ 2024-07-22 15:19 UTC (permalink / raw)
To: linux-pci, bhelgaas, lukas; +Cc: mika.westerberg, Keith Busch
From: Keith Busch <kbusch@kernel.org>
Use the atomic ADDED flag to safely ensure concurrent callers can't
attempt to stop the device multiple times.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/bus.c | 2 +-
drivers/pci/pci.h | 9 +++++++--
drivers/pci/remove.c | 18 ++++++++----------
3 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 55c8536860518..e41dfece0d969 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -348,7 +348,7 @@ void pci_bus_add_device(struct pci_dev *dev)
if (retval < 0 && retval != -EPROBE_DEFER)
pci_warn(dev, "device attach failed (%d)\n", retval);
- pci_dev_assign_added(dev, true);
+ pci_dev_assign_added(dev);
if (dev_of_node(&dev->dev) && pci_is_bridge(dev)) {
retval = of_platform_populate(dev_of_node(&dev->dev), NULL, NULL,
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 79c8398f39384..171dfd6f14e6e 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -444,9 +444,14 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
#define PCI_DPC_RECOVERED 1
#define PCI_DPC_RECOVERING 2
-static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
+static inline void pci_dev_assign_added(struct pci_dev *dev)
{
- assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added);
+ set_bit(PCI_DEV_ADDED, &dev->priv_flags);
+}
+
+static inline bool pci_dev_test_and_clear_added(struct pci_dev *dev)
+{
+ return test_and_clear_bit(PCI_DEV_ADDED, &dev->priv_flags);
}
static inline bool pci_dev_is_added(const struct pci_dev *dev)
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 910387e5bdbf9..ec3064a115bf8 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -16,17 +16,15 @@ static void pci_free_resources(struct pci_dev *dev)
static void pci_stop_dev(struct pci_dev *dev)
{
- pci_pme_active(dev, false);
-
- if (pci_dev_is_added(dev)) {
- of_platform_depopulate(&dev->dev);
- device_release_driver(&dev->dev);
- pci_proc_detach_device(dev);
- pci_remove_sysfs_dev_files(dev);
- of_pci_remove_node(dev);
+ if (!pci_dev_test_and_clear_added(dev))
+ return;
- pci_dev_assign_added(dev, false);
- }
+ pci_pme_active(dev, false);
+ of_platform_depopulate(&dev->dev);
+ device_release_driver(&dev->dev);
+ pci_proc_detach_device(dev);
+ pci_remove_sysfs_dev_files(dev);
+ of_pci_remove_node(dev);
}
static void pci_destroy_dev(struct pci_dev *dev)
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH RFC 2/8] pci: make pci_destroy_dev concurrent safe
2024-07-22 15:19 [PATCH RFC 0/8] pci: rescan/remove locking rework Keith Busch
2024-07-22 15:19 ` [PATCH RFC 1/8] pci: make pci_stop_dev concurrent safe Keith Busch
@ 2024-07-22 15:19 ` Keith Busch
2024-08-15 14:18 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 3/8] pci: move the walk bus lock to where its needed Keith Busch
` (6 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Keith Busch @ 2024-07-22 15:19 UTC (permalink / raw)
To: linux-pci, bhelgaas, lukas; +Cc: mika.westerberg, Keith Busch
From: Keith Busch <kbusch@kernel.org>
Use an atomic flag instead of the racey check against the device's kobj
parent. We shouldn't be poking into device implementation details at
this level anyway.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/pci.h | 6 ++++++
drivers/pci/remove.c | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 171dfd6f14e6e..19cbf18743a96 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -443,6 +443,7 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
#define PCI_DEV_ADDED 0
#define PCI_DPC_RECOVERED 1
#define PCI_DPC_RECOVERING 2
+#define PCI_DEV_REMOVED 3
static inline void pci_dev_assign_added(struct pci_dev *dev)
{
@@ -459,6 +460,11 @@ static inline bool pci_dev_is_added(const struct pci_dev *dev)
return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
}
+static inline bool pci_dev_test_and_set_removed(struct pci_dev *dev)
+{
+ return test_and_set_bit(PCI_DEV_REMOVED, &dev->priv_flags);
+}
+
#ifdef CONFIG_PCIEAER
#include <linux/aer.h>
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index ec3064a115bf8..8284ab20949c9 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -29,7 +29,7 @@ static void pci_stop_dev(struct pci_dev *dev)
static void pci_destroy_dev(struct pci_dev *dev)
{
- if (!dev->dev.kobj.parent)
+ if (pci_dev_test_and_set_removed(dev))
return;
device_del(&dev->dev);
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH RFC 3/8] pci: move the walk bus lock to where its needed
2024-07-22 15:19 [PATCH RFC 0/8] pci: rescan/remove locking rework Keith Busch
2024-07-22 15:19 ` [PATCH RFC 1/8] pci: make pci_stop_dev concurrent safe Keith Busch
2024-07-22 15:19 ` [PATCH RFC 2/8] pci: make pci_destroy_dev " Keith Busch
@ 2024-07-22 15:19 ` Keith Busch
2024-08-15 14:20 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 4/8] pci: walk bus recursively Keith Busch
` (5 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Keith Busch @ 2024-07-22 15:19 UTC (permalink / raw)
To: linux-pci, bhelgaas, lukas; +Cc: mika.westerberg, Keith Busch
From: Keith Busch <kbusch@kernel.org>
Simplify the common function by removing an unnecessary parameter that
can be more easily handled in the only caller that wants it.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/bus.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index e41dfece0d969..7c07a141e8772 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -390,7 +390,7 @@ void pci_bus_add_devices(const struct pci_bus *bus)
EXPORT_SYMBOL(pci_bus_add_devices);
static void __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
- void *userdata, bool locked)
+ void *userdata)
{
struct pci_dev *dev;
struct pci_bus *bus;
@@ -398,8 +398,6 @@ static void __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void
int retval;
bus = top;
- if (!locked)
- down_read(&pci_bus_sem);
next = top->devices.next;
for (;;) {
if (next == &bus->devices) {
@@ -422,8 +420,6 @@ static void __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void
if (retval)
break;
}
- if (!locked)
- up_read(&pci_bus_sem);
}
/**
@@ -441,7 +437,9 @@ static void __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void
*/
void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *), void *userdata)
{
- __pci_walk_bus(top, cb, userdata, false);
+ down_read(&pci_bus_sem);
+ __pci_walk_bus(top, cb, userdata);
+ up_read(&pci_bus_sem);
}
EXPORT_SYMBOL_GPL(pci_walk_bus);
@@ -449,7 +447,7 @@ void pci_walk_bus_locked(struct pci_bus *top, int (*cb)(struct pci_dev *, void *
{
lockdep_assert_held(&pci_bus_sem);
- __pci_walk_bus(top, cb, userdata, true);
+ __pci_walk_bus(top, cb, userdata);
}
EXPORT_SYMBOL_GPL(pci_walk_bus_locked);
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH RFC 4/8] pci: walk bus recursively
2024-07-22 15:19 [PATCH RFC 0/8] pci: rescan/remove locking rework Keith Busch
` (2 preceding siblings ...)
2024-07-22 15:19 ` [PATCH RFC 3/8] pci: move the walk bus lock to where its needed Keith Busch
@ 2024-07-22 15:19 ` Keith Busch
2024-08-15 14:33 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 5/8] pci: unexport pci_walk_bus_locked Keith Busch
` (4 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Keith Busch @ 2024-07-22 15:19 UTC (permalink / raw)
To: linux-pci, bhelgaas, lukas; +Cc: mika.westerberg, Keith Busch
From: Keith Busch <kbusch@kernel.org>
The original implementation purposefully chose a non-recursive walk,
presumably as a precaution on stack use. We do recursive bus walking in
other places though. For example:
pci_bus_resettable
pci_stop_bus_device
pci_remove_bus_device
pci_bus_allocate_dev_resources
So, recursive pci bus walking is well tested and safe, and the
implementation is easier to follow. The motivation for changing it now
is to make it easier to introduce finer grain locking in the future.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/bus.c | 34 ++++++++++------------------------
1 file changed, 10 insertions(+), 24 deletions(-)
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 7c07a141e8772..b7208e644c79f 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -389,37 +389,23 @@ void pci_bus_add_devices(const struct pci_bus *bus)
}
EXPORT_SYMBOL(pci_bus_add_devices);
-static void __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
+static int __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
void *userdata)
{
struct pci_dev *dev;
- struct pci_bus *bus;
- struct list_head *next;
- int retval;
+ int ret = 0;
- bus = top;
- next = top->devices.next;
- for (;;) {
- if (next == &bus->devices) {
- /* end of this bus, go up or finish */
- if (bus == top)
+ list_for_each_entry(dev, &top->devices, bus_list) {
+ ret = cb(dev, userdata);
+ if (ret)
+ break;
+ if (dev->subordinate) {
+ ret = __pci_walk_bus(dev->subordinate, cb, userdata);
+ if (ret)
break;
- next = bus->self->bus_list.next;
- bus = bus->self->bus;
- continue;
}
- dev = list_entry(next, struct pci_dev, bus_list);
- if (dev->subordinate) {
- /* this is a pci-pci bridge, do its devices next */
- next = dev->subordinate->devices.next;
- bus = dev->subordinate;
- } else
- next = dev->bus_list.next;
-
- retval = cb(dev, userdata);
- if (retval)
- break;
}
+ return ret;
}
/**
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH RFC 5/8] pci: unexport pci_walk_bus_locked
2024-07-22 15:19 [PATCH RFC 0/8] pci: rescan/remove locking rework Keith Busch
` (3 preceding siblings ...)
2024-07-22 15:19 ` [PATCH RFC 4/8] pci: walk bus recursively Keith Busch
@ 2024-07-22 15:19 ` Keith Busch
2024-08-15 14:36 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 6/8] pci: add helpers for stop and remove bus Keith Busch
` (3 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Keith Busch @ 2024-07-22 15:19 UTC (permalink / raw)
To: linux-pci, bhelgaas, lukas; +Cc: mika.westerberg, Keith Busch
From: Keith Busch <kbusch@kernel.org>
There's only one user of this, and it's internal, so no need to export
it.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/bus.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index b7208e644c79f..638e79d10bfab 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -435,7 +435,6 @@ void pci_walk_bus_locked(struct pci_bus *top, int (*cb)(struct pci_dev *, void *
__pci_walk_bus(top, cb, userdata);
}
-EXPORT_SYMBOL_GPL(pci_walk_bus_locked);
struct pci_bus *pci_bus_get(struct pci_bus *bus)
{
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH RFC 6/8] pci: add helpers for stop and remove bus
2024-07-22 15:19 [PATCH RFC 0/8] pci: rescan/remove locking rework Keith Busch
` (4 preceding siblings ...)
2024-07-22 15:19 ` [PATCH RFC 5/8] pci: unexport pci_walk_bus_locked Keith Busch
@ 2024-07-22 15:19 ` Keith Busch
2024-08-15 14:49 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 7/8] pci: reference count subordinate Keith Busch
` (2 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Keith Busch @ 2024-07-22 15:19 UTC (permalink / raw)
To: linux-pci, bhelgaas, lukas; +Cc: mika.westerberg, Keith Busch
From: Keith Busch <kbusch@kernel.org>
There are repeated patterns of tearing down pci buses, so combine to
helper functions and use these.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/remove.c | 46 +++++++++++++++++++++++---------------------
1 file changed, 24 insertions(+), 22 deletions(-)
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 8284ab20949c9..288162a11ab19 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -4,6 +4,9 @@
#include <linux/of_platform.h>
#include "pci.h"
+static void pci_stop_bus(struct pci_bus *bus);
+static void pci_remove_bus_device(struct pci_dev *dev);
+
static void pci_free_resources(struct pci_dev *dev)
{
struct resource *res;
@@ -45,8 +48,17 @@ static void pci_destroy_dev(struct pci_dev *dev)
put_device(&dev->dev);
}
+static void pci_clear_bus(struct pci_bus *bus)
+{
+ struct pci_dev *dev, *next;
+
+ list_for_each_entry_safe(dev, next, &bus->devices, bus_list)
+ pci_remove_bus_device(dev);
+}
+
void pci_remove_bus(struct pci_bus *bus)
{
+ pci_clear_bus(bus);
pci_proc_detach_bus(bus);
down_write(&pci_bus_sem);
@@ -66,7 +78,15 @@ EXPORT_SYMBOL(pci_remove_bus);
static void pci_stop_bus_device(struct pci_dev *dev)
{
struct pci_bus *bus = dev->subordinate;
- struct pci_dev *child, *tmp;
+
+ if (bus)
+ pci_stop_bus(bus);
+ pci_stop_dev(dev);
+}
+
+static void pci_stop_bus(struct pci_bus *bus)
+{
+ struct pci_dev *dev, *next;
/*
* Stopping an SR-IOV PF device removes all the associated VFs,
@@ -74,29 +94,18 @@ static void pci_stop_bus_device(struct pci_dev *dev)
* iterator. Therefore, iterate in reverse so we remove the VFs
* first, then the PF.
*/
- if (bus) {
- list_for_each_entry_safe_reverse(child, tmp,
- &bus->devices, bus_list)
- pci_stop_bus_device(child);
- }
-
- pci_stop_dev(dev);
+ list_for_each_entry_safe_reverse(dev, next, &bus->devices, bus_list)
+ pci_stop_bus_device(dev);
}
static void pci_remove_bus_device(struct pci_dev *dev)
{
struct pci_bus *bus = dev->subordinate;
- struct pci_dev *child, *tmp;
if (bus) {
- list_for_each_entry_safe(child, tmp,
- &bus->devices, bus_list)
- pci_remove_bus_device(child);
-
pci_remove_bus(bus);
dev->subordinate = NULL;
}
-
pci_destroy_dev(dev);
}
@@ -129,16 +138,13 @@ EXPORT_SYMBOL_GPL(pci_stop_and_remove_bus_device_locked);
void pci_stop_root_bus(struct pci_bus *bus)
{
- struct pci_dev *child, *tmp;
struct pci_host_bridge *host_bridge;
if (!pci_is_root_bus(bus))
return;
host_bridge = to_pci_host_bridge(bus->bridge);
- list_for_each_entry_safe_reverse(child, tmp,
- &bus->devices, bus_list)
- pci_stop_bus_device(child);
+ pci_stop_bus(bus);
/* stop the host bridge */
device_release_driver(&host_bridge->dev);
@@ -147,16 +153,12 @@ EXPORT_SYMBOL_GPL(pci_stop_root_bus);
void pci_remove_root_bus(struct pci_bus *bus)
{
- struct pci_dev *child, *tmp;
struct pci_host_bridge *host_bridge;
if (!pci_is_root_bus(bus))
return;
host_bridge = to_pci_host_bridge(bus->bridge);
- list_for_each_entry_safe(child, tmp,
- &bus->devices, bus_list)
- pci_remove_bus_device(child);
#ifdef CONFIG_PCI_DOMAINS_GENERIC
/* Release domain_nr if it was dynamically allocated */
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH RFC 7/8] pci: reference count subordinate
2024-07-22 15:19 [PATCH RFC 0/8] pci: rescan/remove locking rework Keith Busch
` (5 preceding siblings ...)
2024-07-22 15:19 ` [PATCH RFC 6/8] pci: add helpers for stop and remove bus Keith Busch
@ 2024-07-22 15:19 ` Keith Busch
2024-08-15 15:10 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 8/8] pci: use finer grain locking for bus protection Keith Busch
2024-08-07 15:40 ` [PATCH RFC 0/8] pci: rescan/remove locking rework Keith Busch
8 siblings, 1 reply; 21+ messages in thread
From: Keith Busch @ 2024-07-22 15:19 UTC (permalink / raw)
To: linux-pci, bhelgaas, lukas; +Cc: mika.westerberg, Keith Busch
From: Keith Busch <kbusch@kernel.org>
The subordinate is accessed under the pci_bus_sem (or often times no
lock at at all), but unset under the rescan_remove_lock. Access the
subordinate buses with reference counts to guard against a concurrent
removal and use-after-free.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/bus.c | 18 +++++++++++++++---
drivers/pci/hotplug/pciehp_pci.c | 12 ++++++++++--
drivers/pci/pci.c | 28 ++++++++++++++++++++++------
drivers/pci/pci.h | 1 +
drivers/pci/pcie/aspm.c | 7 +++++--
drivers/pci/probe.c | 7 +++++--
drivers/pci/remove.c | 18 +++++++++++++-----
7 files changed, 71 insertions(+), 20 deletions(-)
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 638e79d10bfab..3085ecaa2ba40 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -382,9 +382,11 @@ void pci_bus_add_devices(const struct pci_bus *bus)
/* Skip if device attach failed */
if (!pci_dev_is_added(dev))
continue;
- child = dev->subordinate;
+
+ child = pci_dev_get_subordinate(dev);
if (child)
pci_bus_add_devices(child);
+ pci_bus_put(child);
}
}
EXPORT_SYMBOL(pci_bus_add_devices);
@@ -396,11 +398,16 @@ static int __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void
int ret = 0;
list_for_each_entry(dev, &top->devices, bus_list) {
+ struct pci_bus *bus;
+
ret = cb(dev, userdata);
if (ret)
break;
- if (dev->subordinate) {
- ret = __pci_walk_bus(dev->subordinate, cb, userdata);
+
+ bus = pci_dev_get_subordinate(dev);
+ if (bus) {
+ ret = __pci_walk_bus(bus, cb, userdata);
+ pci_bus_put(bus);
if (ret)
break;
}
@@ -448,3 +455,8 @@ void pci_bus_put(struct pci_bus *bus)
if (bus)
put_device(&bus->dev);
}
+
+struct pci_bus *pci_dev_get_subordinate(struct pci_dev *dev)
+{
+ return pci_bus_get(READ_ONCE(dev->subordinate));
+}
diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index 65e50bee1a8c0..b15dcfd86c60a 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -33,9 +33,12 @@ int pciehp_configure_device(struct controller *ctrl)
{
struct pci_dev *dev;
struct pci_dev *bridge = ctrl->pcie->port;
- struct pci_bus *parent = bridge->subordinate;
+ struct pci_bus *parent = pci_dev_get_subordinate(bridge);
int num, ret = 0;
+ if (!parent)
+ return 0;
+
pci_lock_rescan_remove();
dev = pci_get_slot(parent, PCI_DEVFN(0, 0));
@@ -78,6 +81,7 @@ int pciehp_configure_device(struct controller *ctrl)
out:
pci_unlock_rescan_remove();
+ pci_bus_put(parent);
return ret;
}
@@ -95,9 +99,12 @@ int pciehp_configure_device(struct controller *ctrl)
void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
{
struct pci_dev *dev, *temp;
- struct pci_bus *parent = ctrl->pcie->port->subordinate;
+ struct pci_bus *parent = pci_dev_get_subordinate(ctrl->pcie->port);
u16 command;
+ if (!parent)
+ return;
+
ctrl_dbg(ctrl, "%s: domain:bus:dev = %04x:%02x:00\n",
__func__, pci_domain_nr(parent), parent->number);
@@ -138,4 +145,5 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
}
pci_unlock_rescan_remove();
+ pci_bus_put(parent);
}
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e3a49f66982d5..0cd36b7772c8b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3113,9 +3113,14 @@ void pci_bridge_d3_update(struct pci_dev *dev)
* so we need to go through all children to find out if one of them
* continues to block D3.
*/
- if (d3cold_ok && !bridge->bridge_d3)
- pci_walk_bus(bridge->subordinate, pci_dev_check_d3cold,
- &d3cold_ok);
+ if (d3cold_ok && !bridge->bridge_d3) {
+ struct pci_bus *bus = pci_dev_get_subordinate(bridge);
+
+ if (bus) {
+ pci_walk_bus(bus, pci_dev_check_d3cold, &d3cold_ok);
+ pci_bus_put(bus);
+ }
+ }
if (bridge->bridge_d3 != d3cold_ok) {
bridge->bridge_d3 = d3cold_ok;
@@ -4824,6 +4829,7 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus)
int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
{
struct pci_dev *child __free(pci_dev_put) = NULL;
+ struct pci_bus *bus;
int delay;
if (pci_dev_is_disconnected(dev))
@@ -4840,7 +4846,17 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
* board_added(). In case of ACPI hotplug the firmware is expected
* to configure the devices before OS is notified.
*/
- if (!dev->subordinate || list_empty(&dev->subordinate->devices)) {
+ bus = pci_dev_get_subordinate(dev);
+ if (!bus) {
+ up_read(&pci_bus_sem);
+ return 0;
+ }
+
+ child = pci_dev_get(list_first_entry_or_null(&bus->devices,
+ struct pci_dev,
+ bus_list));
+ if (!child) {
+ pci_bus_put(bus);
up_read(&pci_bus_sem);
return 0;
}
@@ -4848,12 +4864,12 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
/* Take d3cold_delay requirements into account */
delay = pci_bus_max_d3cold_delay(dev->subordinate);
if (!delay) {
+ pci_bus_put(bus);
up_read(&pci_bus_sem);
return 0;
}
- child = pci_dev_get(list_first_entry(&dev->subordinate->devices,
- struct pci_dev, bus_list));
+ pci_bus_put(bus);
up_read(&pci_bus_sem);
/*
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 19cbf18743a96..83e449253066e 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -312,6 +312,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev);
void pci_disable_bridge_window(struct pci_dev *dev);
struct pci_bus *pci_bus_get(struct pci_bus *bus);
void pci_bus_put(struct pci_bus *bus);
+struct pci_bus *pci_dev_get_subordinate(struct pci_dev *dev);
/* PCIe link information from Link Capabilities 2 */
#define PCIE_LNKCAP2_SLS2SPEED(lnkcap2) \
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index cee2365e54b8b..3c0c83d35ab86 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1212,9 +1212,11 @@ static void pcie_update_aspm_capable(struct pcie_link_state *root)
link->aspm_capable = link->aspm_support;
}
list_for_each_entry(link, &link_list, sibling) {
+ struct pci_bus *linkbus;
struct pci_dev *child;
- struct pci_bus *linkbus = link->pdev->subordinate;
- if (link->root != root)
+
+ linkbus = pci_dev_get_subordinate(link->pdev);
+ if (!linkbus || link->root != root)
continue;
list_for_each_entry(child, &linkbus->devices, bus_list) {
if ((pci_pcie_type(child) != PCI_EXP_TYPE_ENDPOINT) &&
@@ -1222,6 +1224,7 @@ static void pcie_update_aspm_capable(struct pcie_link_state *root)
continue;
pcie_aspm_check_latency(child);
}
+ pci_bus_put(linkbus);
}
}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index b14b9876c0303..53522685193da 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1167,7 +1167,10 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
child->resource[i] = &bridge->resource[PCI_BRIDGE_RESOURCES+i];
child->resource[i]->name = child->name;
}
- bridge->subordinate = child;
+
+ down_write(&pci_bus_sem);
+ WRITE_ONCE(bridge->subordinate, child);
+ up_write(&pci_bus_sem);
add_dev:
pci_set_bus_msi_domain(child);
@@ -3380,7 +3383,7 @@ int pci_hp_add_bridge(struct pci_dev *dev)
/* Scan bridges that need to be reconfigured */
pci_scan_bridge_extend(parent, dev, busnr, available_buses, 1);
- if (!dev->subordinate)
+ if (!READ_ONCE(dev->subordinate))
return -1;
return 0;
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 288162a11ab19..646c97e41a323 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -77,10 +77,12 @@ EXPORT_SYMBOL(pci_remove_bus);
static void pci_stop_bus_device(struct pci_dev *dev)
{
- struct pci_bus *bus = dev->subordinate;
+ struct pci_bus *bus = pci_dev_get_subordinate(dev);
- if (bus)
+ if (bus) {
pci_stop_bus(bus);
+ pci_bus_put(bus);
+ }
pci_stop_dev(dev);
}
@@ -100,12 +102,18 @@ static void pci_stop_bus(struct pci_bus *bus)
static void pci_remove_bus_device(struct pci_dev *dev)
{
- struct pci_bus *bus = dev->subordinate;
+ struct pci_bus *bus;
+ down_write(&pci_bus_sem);
+ bus = pci_dev_get_subordinate(dev);
if (bus) {
+ WRITE_ONCE(dev->subordinate, NULL);
+ up_write(&pci_bus_sem);
+
pci_remove_bus(bus);
- dev->subordinate = NULL;
- }
+ pci_bus_put(bus);
+ } else
+ up_write(&pci_bus_sem);
pci_destroy_dev(dev);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH RFC 8/8] pci: use finer grain locking for bus protection
2024-07-22 15:19 [PATCH RFC 0/8] pci: rescan/remove locking rework Keith Busch
` (6 preceding siblings ...)
2024-07-22 15:19 ` [PATCH RFC 7/8] pci: reference count subordinate Keith Busch
@ 2024-07-22 15:19 ` Keith Busch
2024-08-15 15:21 ` Jonathan Cameron
2024-08-07 15:40 ` [PATCH RFC 0/8] pci: rescan/remove locking rework Keith Busch
8 siblings, 1 reply; 21+ messages in thread
From: Keith Busch @ 2024-07-22 15:19 UTC (permalink / raw)
To: linux-pci, bhelgaas, lukas; +Cc: mika.westerberg, Keith Busch
From: Keith Busch <kbusch@kernel.org>
The global rescan_remove lock has deadlocks during concurrent removals
because it is used within interrupt handlers. Use a bus specific lock
instead.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/bus.c | 11 ++++++++---
drivers/pci/hotplug/pciehp_pci.c | 11 ++++++-----
drivers/pci/pci-sysfs.c | 2 ++
drivers/pci/pci.c | 5 ++++-
drivers/pci/probe.c | 9 +++++++++
drivers/pci/remove.c | 10 ++++++++++
include/linux/pci.h | 11 +++++++++++
7 files changed, 50 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 3085ecaa2ba40..d80a9e4f39d38 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -384,8 +384,11 @@ void pci_bus_add_devices(const struct pci_bus *bus)
continue;
child = pci_dev_get_subordinate(dev);
- if (child)
+ if (child) {
+ pci_lock_bus(child);
pci_bus_add_devices(child);
+ pci_unlock_bus(child);
+ }
pci_bus_put(child);
}
}
@@ -406,7 +409,9 @@ static int __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void
bus = pci_dev_get_subordinate(dev);
if (bus) {
+ pci_lock_bus(bus);
ret = __pci_walk_bus(bus, cb, userdata);
+ pci_unlock_bus(bus);
pci_bus_put(bus);
if (ret)
break;
@@ -430,9 +435,9 @@ static int __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void
*/
void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *), void *userdata)
{
- down_read(&pci_bus_sem);
+ pci_lock_bus(top);
__pci_walk_bus(top, cb, userdata);
- up_read(&pci_bus_sem);
+ pci_unlock_bus(top);
}
EXPORT_SYMBOL_GPL(pci_walk_bus);
diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index b15dcfd86c60a..f6260f1085d81 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -39,8 +39,7 @@ int pciehp_configure_device(struct controller *ctrl)
if (!parent)
return 0;
- pci_lock_rescan_remove();
-
+ pci_lock_bus(parent);
dev = pci_get_slot(parent, PCI_DEVFN(0, 0));
if (dev) {
/*
@@ -63,6 +62,7 @@ int pciehp_configure_device(struct controller *ctrl)
for_each_pci_bridge(dev, parent)
pci_hp_add_bridge(dev);
+ pci_unlock_bus(parent);
pci_assign_unassigned_bridge_resources(bridge);
pcie_bus_configure_settings(parent);
@@ -72,6 +72,7 @@ int pciehp_configure_device(struct controller *ctrl)
* to avoid AB-BA deadlock with device_lock.
*/
up_read(&ctrl->reset_lock);
+ pci_lock_bus(parent);
pci_bus_add_devices(parent);
down_read_nested(&ctrl->reset_lock, ctrl->depth);
@@ -80,7 +81,7 @@ int pciehp_configure_device(struct controller *ctrl)
pci_dev_put(dev);
out:
- pci_unlock_rescan_remove();
+ pci_unlock_bus(parent);
pci_bus_put(parent);
return ret;
}
@@ -111,7 +112,7 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
if (!presence)
pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
- pci_lock_rescan_remove();
+ pci_lock_bus(parent);
/*
* Stopping an SR-IOV PF device removes all the associated VFs,
@@ -144,6 +145,6 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
pci_dev_put(dev);
}
- pci_unlock_rescan_remove();
+ pci_unlock_bus(parent);
pci_bus_put(parent);
}
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 40cfa716392fb..0853c931b3c7d 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -487,8 +487,10 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
if (kstrtoul(buf, 0, &val) < 0)
return -EINVAL;
+ get_device(dev);
if (val && device_remove_file_self(dev, attr))
pci_stop_and_remove_bus_device_locked(to_pci_dev(dev));
+ put_device(dev);
return count;
}
static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0220, NULL,
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0cd36b7772c8b..7e5f05b155658 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3117,7 +3117,10 @@ void pci_bridge_d3_update(struct pci_dev *dev)
struct pci_bus *bus = pci_dev_get_subordinate(bridge);
if (bus) {
- pci_walk_bus(bus, pci_dev_check_d3cold, &d3cold_ok);
+ down_read(&pci_bus_sem);
+ pci_walk_bus_locked(bus, pci_dev_check_d3cold,
+ &d3cold_ok);
+ up_read(&pci_bus_sem);
pci_bus_put(bus);
}
}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 53522685193da..9c1589be9c390 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -563,6 +563,7 @@ static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
if (!b)
return NULL;
+ mutex_init(&b->lock);
INIT_LIST_HEAD(&b->node);
INIT_LIST_HEAD(&b->children);
INIT_LIST_HEAD(&b->devices);
@@ -1359,7 +1360,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
}
buses = subordinate - secondary;
+ pci_lock_bus(child);
cmax = pci_scan_child_bus_extend(child, buses);
+ pci_unlock_bus(child);
if (cmax > subordinate)
pci_warn(dev, "bridge has subordinate %02x but max busn %02x\n",
subordinate, cmax);
@@ -3109,7 +3112,9 @@ int pci_host_probe(struct pci_host_bridge *bridge)
list_for_each_entry(child, &bus->children, node)
pcie_bus_configure_settings(child);
+ pci_lock_bus(bus);
pci_bus_add_devices(bus);
+ pci_unlock_bus(bus);
return 0;
}
EXPORT_SYMBOL_GPL(pci_host_probe);
@@ -3284,11 +3289,13 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
unsigned int max;
struct pci_bus *bus = bridge->subordinate;
+ pci_lock_bus(bus);
max = pci_scan_child_bus(bus);
pci_assign_unassigned_bridge_resources(bridge);
pci_bus_add_devices(bus);
+ pci_unlock_bus(bus);
return max;
}
@@ -3306,9 +3313,11 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
{
unsigned int max;
+ pci_lock_bus(bus);
max = pci_scan_child_bus(bus);
pci_assign_unassigned_bus_resources(bus);
pci_bus_add_devices(bus);
+ pci_unlock_bus(bus);
return max;
}
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 646c97e41a323..cf641a80a7f21 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -52,8 +52,10 @@ static void pci_clear_bus(struct pci_bus *bus)
{
struct pci_dev *dev, *next;
+ pci_lock_bus(bus);
list_for_each_entry_safe(dev, next, &bus->devices, bus_list)
pci_remove_bus_device(dev);
+ pci_unlock_bus(bus);
}
void pci_remove_bus(struct pci_bus *bus)
@@ -96,8 +98,10 @@ static void pci_stop_bus(struct pci_bus *bus)
* iterator. Therefore, iterate in reverse so we remove the VFs
* first, then the PF.
*/
+ pci_lock_bus(bus);
list_for_each_entry_safe_reverse(dev, next, &bus->devices, bus_list)
pci_stop_bus_device(dev);
+ pci_unlock_bus(bus);
}
static void pci_remove_bus_device(struct pci_dev *dev)
@@ -138,9 +142,15 @@ EXPORT_SYMBOL(pci_stop_and_remove_bus_device);
void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev)
{
+ struct pci_bus *bus = pci_bus_get(dev->bus);
+
pci_lock_rescan_remove();
+ pci_lock_bus(bus);
pci_stop_and_remove_bus_device(dev);
+ pci_unlock_bus(bus);
pci_unlock_rescan_remove();
+
+ pci_bus_put(bus);
}
EXPORT_SYMBOL_GPL(pci_stop_and_remove_bus_device_locked);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 9e36b6c1810ea..6b37373b831ec 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -649,6 +649,7 @@ struct pci_bus {
struct list_head node; /* Node in list of buses */
struct pci_bus *parent; /* Parent bus this bridge is on */
struct list_head children; /* List of child buses */
+ struct mutex lock; /* Lock for devices */
struct list_head devices; /* List of devices on this bus */
struct pci_dev *self; /* Bridge device as seen by parent */
struct list_head slots; /* List of slots on this bus;
@@ -681,6 +682,16 @@ struct pci_bus {
unsigned int unsafe_warn:1; /* warned about RW1C config write */
};
+static inline void pci_lock_bus(struct pci_bus *bus)
+{
+ mutex_lock_nested(&bus->lock, bus->number);
+}
+
+static inline void pci_unlock_bus(struct pci_bus *bus)
+{
+ mutex_unlock(&bus->lock);
+}
+
#define to_pci_bus(n) container_of(n, struct pci_bus, dev)
static inline u16 pci_dev_id(struct pci_dev *dev)
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 0/8] pci: rescan/remove locking rework
2024-07-22 15:19 [PATCH RFC 0/8] pci: rescan/remove locking rework Keith Busch
` (7 preceding siblings ...)
2024-07-22 15:19 ` [PATCH RFC 8/8] pci: use finer grain locking for bus protection Keith Busch
@ 2024-08-07 15:40 ` Keith Busch
8 siblings, 0 replies; 21+ messages in thread
From: Keith Busch @ 2024-08-07 15:40 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-pci, bhelgaas, lukas, mika.westerberg
On Mon, Jul 22, 2024 at 08:19:28AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> This patch set targets a subset of pci bus scanning and removals that
> were shown to be problematic with deep pci topologies that support
> native hotplug. I've tried to capture the common pci components, but
> there are definitely many subsystems accessing the topology in their own
> way, many of which I can't possibly test, and I have not tried to
> convert every user to this new locking scheme. However, if I did this
> correctly, they should be no worse off than today!
>
> The earlier patches are just cleanups and/or making it a little easier
> to change the locking schemes. The real stuff happens from patches 7 and
> 8.
>
> I've run this with lockdep enabled, tested concurrent hotplug events on
> various x86 platforms with layers of pci switches. That said, as
> mentioned earlier, there are many paths to here that I haven't been able
> to test, so the final patch might be considered experimental.
Any thoughts on this new scheme? I know I sent this during the merge
window, so maybe bad timing on my part. Changing the locks like this is
kind of scary too, so if that's the case, could we reconsider the
band-aid patch from before?
https://lore.kernel.org/linux-pci/20240612181625.3604512-2-kbusch@meta.com/
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 1/8] pci: make pci_stop_dev concurrent safe
2024-07-22 15:19 ` [PATCH RFC 1/8] pci: make pci_stop_dev concurrent safe Keith Busch
@ 2024-08-15 14:17 ` Jonathan Cameron
2024-08-20 15:02 ` Keith Busch
0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Cameron @ 2024-08-15 14:17 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-pci, bhelgaas, lukas, mika.westerberg, Keith Busch
On Mon, 22 Jul 2024 08:19:29 -0700
Keith Busch <kbusch@meta.com> wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> Use the atomic ADDED flag to safely ensure concurrent callers can't
> attempt to stop the device multiple times.
Maybe mention what concurrent paths exist where this might happen.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
Change looks sensible anyway. FWIW as I'm not an expert in these paths.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 2/8] pci: make pci_destroy_dev concurrent safe
2024-07-22 15:19 ` [PATCH RFC 2/8] pci: make pci_destroy_dev " Keith Busch
@ 2024-08-15 14:18 ` Jonathan Cameron
0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2024-08-15 14:18 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-pci, bhelgaas, lukas, mika.westerberg, Keith Busch
On Mon, 22 Jul 2024 08:19:30 -0700
Keith Busch <kbusch@meta.com> wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> Use an atomic flag instead of the racey check against the device's kobj
> parent. We shouldn't be poking into device implementation details at
> this level anyway.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Absolutely agree we shouldn't be poking the kobj parent here.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 3/8] pci: move the walk bus lock to where its needed
2024-07-22 15:19 ` [PATCH RFC 3/8] pci: move the walk bus lock to where its needed Keith Busch
@ 2024-08-15 14:20 ` Jonathan Cameron
0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2024-08-15 14:20 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-pci, bhelgaas, lukas, mika.westerberg, Keith Busch
On Mon, 22 Jul 2024 08:19:31 -0700
Keith Busch <kbusch@meta.com> wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> Simplify the common function by removing an unnecessary parameter that
> can be more easily handled in the only caller that wants it.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
Good cleanup irrespective of anything else in this series.
I guess there is some history behind this one.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 4/8] pci: walk bus recursively
2024-07-22 15:19 ` [PATCH RFC 4/8] pci: walk bus recursively Keith Busch
@ 2024-08-15 14:33 ` Jonathan Cameron
0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2024-08-15 14:33 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-pci, bhelgaas, lukas, mika.westerberg, Keith Busch
On Mon, 22 Jul 2024 08:19:32 -0700
Keith Busch <kbusch@meta.com> wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> The original implementation purposefully chose a non-recursive walk,
> presumably as a precaution on stack use. We do recursive bus walking in
> other places though. For example:
>
> pci_bus_resettable
> pci_stop_bus_device
> pci_remove_bus_device
> pci_bus_allocate_dev_resources
>
> So, recursive pci bus walking is well tested and safe, and the
> implementation is easier to follow. The motivation for changing it now
> is to make it easier to introduce finer grain locking in the future.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
Trivial naming question inline. Otherwise LGTM
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/pci/bus.c | 34 ++++++++++------------------------
> 1 file changed, 10 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 7c07a141e8772..b7208e644c79f 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -389,37 +389,23 @@ void pci_bus_add_devices(const struct pci_bus *bus)
> }
> EXPORT_SYMBOL(pci_bus_add_devices);
>
> -static void __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
> +static int __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
Keeping the parameter name of 'top' seems less than intuitive
now this is recursive as on the recursions it is no longer the top.
Maybe just call it bus? That will make this diff really confusing however.
> void *userdata)
> {
> struct pci_dev *dev;
> - struct pci_bus *bus;
> - struct list_head *next;
> - int retval;
> + int ret = 0;
>
> - bus = top;
> - next = top->devices.next;
> - for (;;) {
> - if (next == &bus->devices) {
> - /* end of this bus, go up or finish */
> - if (bus == top)
> + list_for_each_entry(dev, &top->devices, bus_list) {
> + ret = cb(dev, userdata);
> + if (ret)
> + break;
> + if (dev->subordinate) {
> + ret = __pci_walk_bus(dev->subordinate, cb, userdata);
> + if (ret)
> break;
> - next = bus->self->bus_list.next;
> - bus = bus->self->bus;
> - continue;
> }
> - dev = list_entry(next, struct pci_dev, bus_list);
> - if (dev->subordinate) {
> - /* this is a pci-pci bridge, do its devices next */
> - next = dev->subordinate->devices.next;
> - bus = dev->subordinate;
> - } else
> - next = dev->bus_list.next;
> -
> - retval = cb(dev, userdata);
> - if (retval)
> - break;
> }
> + return ret;
> }
>
> /**
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 5/8] pci: unexport pci_walk_bus_locked
2024-07-22 15:19 ` [PATCH RFC 5/8] pci: unexport pci_walk_bus_locked Keith Busch
@ 2024-08-15 14:36 ` Jonathan Cameron
0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2024-08-15 14:36 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-pci, bhelgaas, lukas, mika.westerberg, Keith Busch
On Mon, 22 Jul 2024 08:19:33 -0700
Keith Busch <kbusch@meta.com> wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> There's only one user of this, and it's internal, so no need to export
> it.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Maybe Bjorn can queue a few of these cleanups to reduce the scope
of future versions of this to just the more complex patches?
Jonathan
> ---
> drivers/pci/bus.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index b7208e644c79f..638e79d10bfab 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -435,7 +435,6 @@ void pci_walk_bus_locked(struct pci_bus *top, int (*cb)(struct pci_dev *, void *
>
> __pci_walk_bus(top, cb, userdata);
> }
> -EXPORT_SYMBOL_GPL(pci_walk_bus_locked);
>
> struct pci_bus *pci_bus_get(struct pci_bus *bus)
> {
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 6/8] pci: add helpers for stop and remove bus
2024-07-22 15:19 ` [PATCH RFC 6/8] pci: add helpers for stop and remove bus Keith Busch
@ 2024-08-15 14:49 ` Jonathan Cameron
0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2024-08-15 14:49 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-pci, bhelgaas, lukas, mika.westerberg, Keith Busch
On Mon, 22 Jul 2024 08:19:34 -0700
Keith Busch <kbusch@meta.com> wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> There are repeated patterns of tearing down pci buses, so combine to
> helper functions and use these.
There are some subtle changes in ordering in here. I'm not
immediately convinced by all of them.
Perhaps this should be broken down further so we get the direct
code replacements that are easy to review, the movement of calls
to different functions (e.g. addition of pci_clear_bus() in
pci_remove_bus() and dropping that call, or the code that it matches
in two other places).
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> drivers/pci/remove.c | 46 +++++++++++++++++++++++---------------------
> 1 file changed, 24 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index 8284ab20949c9..288162a11ab19 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -4,6 +4,9 @@
> #include <linux/of_platform.h>
> #include "pci.h"
>
> +static void pci_stop_bus(struct pci_bus *bus);
> +static void pci_remove_bus_device(struct pci_dev *dev);
> +
> static void pci_free_resources(struct pci_dev *dev)
> {
> struct resource *res;
> @@ -45,8 +48,17 @@ static void pci_destroy_dev(struct pci_dev *dev)
> put_device(&dev->dev);
> }
>
> +static void pci_clear_bus(struct pci_bus *bus)
> +{
> + struct pci_dev *dev, *next;
> +
> + list_for_each_entry_safe(dev, next, &bus->devices, bus_list)
> + pci_remove_bus_device(dev);
> +}
> +
> void pci_remove_bus(struct pci_bus *bus)
> {
> + pci_clear_bus(bus);
So this is replacing the list_for_each_entry_safe block that
was previously in pci_remove_root_bus / pci_remove_bus_device but there
are other callers of this function such as in xen-pcifront.c which
are going to see this change.
> pci_proc_detach_bus(bus);
>
> down_write(&pci_bus_sem);
> @@ -66,7 +78,15 @@ EXPORT_SYMBOL(pci_remove_bus);
>
> static void pci_remove_bus_device(struct pci_dev *dev)
> {
> struct pci_bus *bus = dev->subordinate;
> - struct pci_dev *child, *tmp;
>
> if (bus) {
> - list_for_each_entry_safe(child, tmp,
> - &bus->devices, bus_list)
> - pci_remove_bus_device(child);
> -
> pci_remove_bus(bus);
> dev->subordinate = NULL;
> }
> -
Grumpy reviewer hat. Unrelated change.
> pci_destroy_dev(dev);
> }
>
> @@ -129,16 +138,13 @@ EXPORT_SYMBOL_GPL(pci_stop_and_remove_bus_device_locked);
>
> void pci_stop_root_bus(struct pci_bus *bus)
> {
> - struct pci_dev *child, *tmp;
> struct pci_host_bridge *host_bridge;
>
> if (!pci_is_root_bus(bus))
> return;
>
> host_bridge = to_pci_host_bridge(bus->bridge);
> - list_for_each_entry_safe_reverse(child, tmp,
> - &bus->devices, bus_list)
> - pci_stop_bus_device(child);
> + pci_stop_bus(bus);
>
> /* stop the host bridge */
> device_release_driver(&host_bridge->dev);
> @@ -147,16 +153,12 @@ EXPORT_SYMBOL_GPL(pci_stop_root_bus);
>
> void pci_remove_root_bus(struct pci_bus *bus)
> {
> - struct pci_dev *child, *tmp;
> struct pci_host_bridge *host_bridge;
>
> if (!pci_is_root_bus(bus))
> return;
>
> host_bridge = to_pci_host_bridge(bus->bridge);
> - list_for_each_entry_safe(child, tmp,
> - &bus->devices, bus_list)
> - pci_remove_bus_device(child);
>
> #ifdef CONFIG_PCI_DOMAINS_GENERIC
> /* Release domain_nr if it was dynamically allocated */
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 7/8] pci: reference count subordinate
2024-07-22 15:19 ` [PATCH RFC 7/8] pci: reference count subordinate Keith Busch
@ 2024-08-15 15:10 ` Jonathan Cameron
0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2024-08-15 15:10 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-pci, bhelgaas, lukas, mika.westerberg, Keith Busch
On Mon, 22 Jul 2024 08:19:35 -0700
Keith Busch <kbusch@meta.com> wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> The subordinate is accessed under the pci_bus_sem (or often times no
> lock at at all), but unset under the rescan_remove_lock. Access the
> subordinate buses with reference counts to guard against a concurrent
> removal and use-after-free.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
Hi Keith,
A few comments inline.
Jonathan
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e3a49f66982d5..0cd36b7772c8b 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3113,9 +3113,14 @@ void pci_bridge_d3_update(struct pci_dev *dev)
> * so we need to go through all children to find out if one of them
> * continues to block D3.
> */
> - if (d3cold_ok && !bridge->bridge_d3)
> - pci_walk_bus(bridge->subordinate, pci_dev_check_d3cold,
> - &d3cold_ok);
> + if (d3cold_ok && !bridge->bridge_d3) {
> + struct pci_bus *bus = pci_dev_get_subordinate(bridge);
> +
> + if (bus) {
> + pci_walk_bus(bus, pci_dev_check_d3cold, &d3cold_ok);
> + pci_bus_put(bus);
I'd be tempted to call pci_bus_put(bus) unconditionally but doesn't matter
a lot.
> + }
> + }
>
> if (bridge->bridge_d3 != d3cold_ok) {
> bridge->bridge_d3 = d3cold_ok;
> @@ -4824,6 +4829,7 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus)
> int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
> {
> struct pci_dev *child __free(pci_dev_put) = NULL;
I would moan about constructors and desctructors together, but unrelated
to this patch and would actually break the change I suggest below given
the lifetime of child is longer than the loop where it's gotten.
So I won't moan about it :)
> + struct pci_bus *bus;
> int delay;
>
> if (pci_dev_is_disconnected(dev))
> @@ -4840,7 +4846,17 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
> * board_added(). In case of ACPI hotplug the firmware is expected
> * to configure the devices before OS is notified.
> */
> - if (!dev->subordinate || list_empty(&dev->subordinate->devices)) {
> + bus = pci_dev_get_subordinate(dev);
> + if (!bus) {
> + up_read(&pci_bus_sem);
> + return 0;
> + }
> +
> + child = pci_dev_get(list_first_entry_or_null(&bus->devices,
> + struct pci_dev,
> + bus_list));
> + if (!child) {
> + pci_bus_put(bus);
> up_read(&pci_bus_sem);
> return 0;
> }
> @@ -4848,12 +4864,12 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
> /* Take d3cold_delay requirements into account */
> delay = pci_bus_max_d3cold_delay(dev->subordinate);
> if (!delay) {
> + pci_bus_put(bus);
> up_read(&pci_bus_sem);
> return 0;
> }
>
> - child = pci_dev_get(list_first_entry(&dev->subordinate->devices,
> - struct pci_dev, bus_list));
> + pci_bus_put(bus);
> up_read(&pci_bus_sem);
I'd like scoped_guard() {
struct pci_bus *bus __free(pci_bus_put) = pci_dev_get_sub...
here so that the manual cleanup can be avoided in the early return paths.
}
}
>
> /*
...
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index cee2365e54b8b..3c0c83d35ab86 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -1212,9 +1212,11 @@ static void pcie_update_aspm_capable(struct pcie_link_state *root)
> link->aspm_capable = link->aspm_support;
> }
> list_for_each_entry(link, &link_list, sibling) {
> + struct pci_bus *linkbus;
> struct pci_dev *child;
> - struct pci_bus *linkbus = link->pdev->subordinate;
> - if (link->root != root)
> +
> + linkbus = pci_dev_get_subordinate(link->pdev);
Maybe worth a
DEFINE_FREE() for pci_bus_put to match the one for pci_dev_put?
> + if (!linkbus || link->root != root)
> continue;
> list_for_each_entry(child, &linkbus->devices, bus_list) {
> if ((pci_pcie_type(child) != PCI_EXP_TYPE_ENDPOINT) &&
> @@ -1222,6 +1224,7 @@ static void pcie_update_aspm_capable(struct pcie_link_state *root)
> continue;
> pcie_aspm_check_latency(child);
> }
> + pci_bus_put(linkbus);
> }
> }
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index b14b9876c0303..53522685193da 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
...
> @@ -3380,7 +3383,7 @@ int pci_hp_add_bridge(struct pci_dev *dev)
As far as I can tell the return value of this function is never used.
So could just drop the code below. Maybe clean up this function
to return void or start handling the return value.
> /* Scan bridges that need to be reconfigured */
> pci_scan_bridge_extend(parent, dev, busnr, available_buses, 1);
>
> - if (!dev->subordinate)
> + if (!READ_ONCE(dev->subordinate))
> return -1;
>
> return 0;
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 8/8] pci: use finer grain locking for bus protection
2024-07-22 15:19 ` [PATCH RFC 8/8] pci: use finer grain locking for bus protection Keith Busch
@ 2024-08-15 15:21 ` Jonathan Cameron
2024-08-15 17:05 ` Keith Busch
0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Cameron @ 2024-08-15 15:21 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-pci, bhelgaas, lukas, mika.westerberg, Keith Busch
On Mon, 22 Jul 2024 08:19:36 -0700
Keith Busch <kbusch@meta.com> wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> The global rescan_remove lock has deadlocks during concurrent removals
> because it is used within interrupt handlers. Use a bus specific lock
> instead.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
Looks reasonable to me. I'm not particularly confident on this one
so more eyes (plus testing) would be good.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Jonathan
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 8/8] pci: use finer grain locking for bus protection
2024-08-15 15:21 ` Jonathan Cameron
@ 2024-08-15 17:05 ` Keith Busch
0 siblings, 0 replies; 21+ messages in thread
From: Keith Busch @ 2024-08-15 17:05 UTC (permalink / raw)
To: Jonathan Cameron; +Cc: Keith Busch, linux-pci, bhelgaas, lukas, mika.westerberg
On Thu, Aug 15, 2024 at 04:21:26PM +0100, Jonathan Cameron wrote:
> > The global rescan_remove lock has deadlocks during concurrent removals
> > because it is used within interrupt handlers. Use a bus specific lock
> > instead.
> >
> > Signed-off-by: Keith Busch <kbusch@kernel.org>
>
> Looks reasonable to me. I'm not particularly confident on this one
> so more eyes (plus testing) would be good.
Thanks for the reviews!
For the hardware I have, it is limited to x86 and I tested concurrent
pcie native hotplugs and errors with PCIe switches. I could get that to
reliably deadlock or crash before, and those are fixed with this patch
set.
But indeed, there are many more paths using this common pci code than
what I could test. I am confident about the earlier patches from this
series, but I also feel this last patch ought to be tried on a more
diversified set of platforms.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 1/8] pci: make pci_stop_dev concurrent safe
2024-08-15 14:17 ` Jonathan Cameron
@ 2024-08-20 15:02 ` Keith Busch
2024-08-21 11:01 ` Jonathan Cameron
0 siblings, 1 reply; 21+ messages in thread
From: Keith Busch @ 2024-08-20 15:02 UTC (permalink / raw)
To: Jonathan Cameron; +Cc: Keith Busch, linux-pci, bhelgaas, lukas, mika.westerberg
On Thu, Aug 15, 2024 at 03:17:17PM +0100, Jonathan Cameron wrote:
> On Mon, 22 Jul 2024 08:19:29 -0700
> Keith Busch <kbusch@meta.com> wrote:
>
> > From: Keith Busch <kbusch@kernel.org>
> >
> > Use the atomic ADDED flag to safely ensure concurrent callers can't
> > attempt to stop the device multiple times.
>
> Maybe mention what concurrent paths exist where this might happen.
I think everyone calling this is holding the pci_rescan_remove_lock, so
it shouldn't be possible today. This series aims to remove that lock
though, so this is more of a prep patch for that. But also, the flag is
already an atomic type, so using those properties makes sense on its own
too.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 1/8] pci: make pci_stop_dev concurrent safe
2024-08-20 15:02 ` Keith Busch
@ 2024-08-21 11:01 ` Jonathan Cameron
0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2024-08-21 11:01 UTC (permalink / raw)
To: Keith Busch; +Cc: Keith Busch, linux-pci, bhelgaas, lukas, mika.westerberg
On Tue, 20 Aug 2024 09:02:20 -0600
Keith Busch <kbusch@kernel.org> wrote:
> On Thu, Aug 15, 2024 at 03:17:17PM +0100, Jonathan Cameron wrote:
> > On Mon, 22 Jul 2024 08:19:29 -0700
> > Keith Busch <kbusch@meta.com> wrote:
> >
> > > From: Keith Busch <kbusch@kernel.org>
> > >
> > > Use the atomic ADDED flag to safely ensure concurrent callers can't
> > > attempt to stop the device multiple times.
> >
> > Maybe mention what concurrent paths exist where this might happen.
>
> I think everyone calling this is holding the pci_rescan_remove_lock, so
> it shouldn't be possible today. This series aims to remove that lock
> though, so this is more of a prep patch for that. But also, the flag is
> already an atomic type, so using those properties makes sense on its own
> too.
Ok. Perhaps mention that it's cleanup / a prep patch rather than a fix.
The current text scared me a bit ;)
Jonathan
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-08-21 11:01 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-22 15:19 [PATCH RFC 0/8] pci: rescan/remove locking rework Keith Busch
2024-07-22 15:19 ` [PATCH RFC 1/8] pci: make pci_stop_dev concurrent safe Keith Busch
2024-08-15 14:17 ` Jonathan Cameron
2024-08-20 15:02 ` Keith Busch
2024-08-21 11:01 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 2/8] pci: make pci_destroy_dev " Keith Busch
2024-08-15 14:18 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 3/8] pci: move the walk bus lock to where its needed Keith Busch
2024-08-15 14:20 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 4/8] pci: walk bus recursively Keith Busch
2024-08-15 14:33 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 5/8] pci: unexport pci_walk_bus_locked Keith Busch
2024-08-15 14:36 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 6/8] pci: add helpers for stop and remove bus Keith Busch
2024-08-15 14:49 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 7/8] pci: reference count subordinate Keith Busch
2024-08-15 15:10 ` Jonathan Cameron
2024-07-22 15:19 ` [PATCH RFC 8/8] pci: use finer grain locking for bus protection Keith Busch
2024-08-15 15:21 ` Jonathan Cameron
2024-08-15 17:05 ` Keith Busch
2024-08-07 15:40 ` [PATCH RFC 0/8] pci: rescan/remove locking rework Keith Busch
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).