* [PATCH 2/2] PCI: err: ensure stable topology during handling
2024-06-10 22:03 Keith Busch
@ 2024-06-10 22:03 ` Keith Busch
0 siblings, 0 replies; 5+ messages in thread
From: Keith Busch @ 2024-06-10 22:03 UTC (permalink / raw)
To: linux-pci, lukas, bhelgaas; +Cc: Keith Busch
From: Keith Busch <kbusch@kernel.org>
DPC and AER handling access their subordinate bus devices. If pciehp should
happen to also trigger during this handling, it will remove all the subordinate
buses, then dereferecing any children may be a use-after-free. That may lead to
kernel panics like the below.
BUG: unable to handle page fault for address: 00000000091400c0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc Kdump: loaded Tainted: G E 6.9.0-0_fbk0_rc10_871_g4e98bf884071 #1
RIP: 0010:pci_bus_read_config_dword+0x17/0x50
Code: e9 0e 00 00 00 c7 01 ff ff ff ff b8 86 00 00 00 c3 cc cc 0f 1f 44 00 00 53 50 c7 44 24 04 00 00 00 00 f6 c2 03 75 27 48 89 cb <48> 8b 87 c0 00 00 00 4c 8d 44 24 04 b9 04 00 00 00 ff 50 18 85 c0
RSP: 0018:ffffc90039113d60 EFLAGS: 00010246
RAX: 0000000009140000 RBX: ffffc90039113d7c RCX: ffffc90039113d7c
RDX: 0000000000000004 RSI: 0000000000000000 RDI: 0000000009140000
RBP: 0000000000000100 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000001f975c6971 R12: 000000000000e9fc
R13: ffff88811b5b4000 R14: ffffc90039113d7c R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff899f7d3c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000091400c0 CR3: 00000243fb00f002 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
? __die+0x78/0xc0
? page_fault_oops+0x2a8/0x3a0
? sched_clock+0x5/0x10
? psi_task_switch+0x39/0xc90
? __switch_to+0x131/0x530
? exc_page_fault+0x63/0x130
? asm_exc_page_fault+0x22/0x30
? pci_bus_read_config_dword+0x17/0x50
pci_dev_wait+0x107/0x190
? dpc_completed+0x50/0x50
dpc_reset_link+0x4e/0xd0
pcie_do_recovery+0xb2/0x2d0
? irq_forced_thread_fn+0x60/0x60
dpc_handler+0x107/0x130
irq_thread_fn+0x19/0x40
irq_thread+0x120/0x1e0
? irq_thread_fn+0x40/0x40
? irq_forced_secondary_handler+0x20/0x20
kthread+0xae/0xe0
? file_tty_write+0x360/0x360
ret_from_fork+0x2f/0x40
? file_tty_write+0x360/0x360
ret_from_fork_asm+0x11/0x20
</TASK>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/pcie/err.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 31090770fffcc..5355fc0fbf910 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -192,7 +192,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_channel_state_t state,
pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev))
{
- int type = pci_pcie_type(dev);
+ int type = pci_pcie_type(dev), ret;
struct pci_dev *bridge;
pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
@@ -214,6 +214,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
else
bridge = pci_upstream_bridge(dev);
+
+ ret = pci_trylock_rescan_remove(bridge);
+ if (!ret)
+ return PCI_ERS_RESULT_DISCONNECT;
pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);
pci_dbg(bridge, "broadcast error_detected message\n");
@@ -262,12 +266,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
}
pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
+ pci_unlock_rescan_remove();
pci_info(bridge, "device recovery successful\n");
return status;
failed:
pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
+ pci_unlock_rescan_remove();
pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 0/2] pcie hotplug and error fixes
@ 2024-06-12 18:10 Keith Busch
2024-06-12 18:10 ` [PATCH 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock Keith Busch
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Keith Busch @ 2024-06-12 18:10 UTC (permalink / raw)
To: linux-pci, lukas, bhelgaas; +Cc: Keith Busch
From: Keith Busch <kbusch@kernel.org>
I am working with larger pcie topologies again, and we're seeing some
failures when dealing with certain overlapping pcie events.
The topology is essentially this:
[Root Port] <-> [UpStream Port] <-> [DownStream Port] <-> [End Device]
An error between the DSP and ED triggers DPC. There's only inband
presence detection so it also triggers hotplug. Before the error
handling is completed, though, another error seen by the RP triggers its
own DPC handling.
The concurrent event handling reveals some interesting races, and this
small patchset tries to address these in the low invasive way.
Keith Busch (2):
PCI: pciehp: fix concurrent sub-tree removal deadlock
PCI: err: ensure stable topology during handling
drivers/pci/hotplug/pciehp_pci.c | 12 +++++++++---
drivers/pci/pci.h | 1 +
drivers/pci/pcie/err.c | 8 +++++++-
drivers/pci/probe.c | 24 ++++++++++++++++++++++++
include/linux/pci.h | 2 ++
5 files changed, 43 insertions(+), 4 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock
2024-06-12 18:10 [PATCH 0/2] pcie hotplug and error fixes Keith Busch
@ 2024-06-12 18:10 ` Keith Busch
2024-06-12 18:10 ` [PATCH 2/2] PCI: err: ensure stable topology during handling Keith Busch
2024-06-12 18:11 ` [PATCH 0/2] pcie hotplug and error fixes Keith Busch
2 siblings, 0 replies; 5+ messages in thread
From: Keith Busch @ 2024-06-12 18:10 UTC (permalink / raw)
To: linux-pci, lukas, bhelgaas; +Cc: Keith Busch
From: Keith Busch <kbusch@kernel.org>
PCIe hotplug events modify the topology in their IRQ thread once it can
acquire the global pci_rescan_remove_lock.
If a different removal event happens to acquire that lock first, and
that removal event is for the parent device of the bridge processing the
other hotplug event, then we are deadlocked: the parent removal will
wait indefinitely on the child's IRQ thread because the parent is
holding the global lock the child thread needs to make forward progress.
Introduce a new locking function that aborts if the device is being
removed. The following are stack traces of the deadlock:
Task A:
pciehp_unconfigure_device+0x41/0x120
pciehp_disable_slot+0x3c/0xc0
pciehp_handle_presence_or_link_change+0x28f/0x3e0
pciehp_ist+0xc3/0x210
irq_thread_fn+0x19/0x40
Task B:
__synchronize_irq+0x5b/0x90
free_irq+0x192/0x2e0
pcie_shutdown_notification+0x3b/0x40
pciehp_remove+0x23/0x50
pcie_port_remove_service+0x2c/0x40
device_release_driver_internal+0x11f/0x180
bus_remove_device+0xc5/0x110
device_del+0x126/0x340
device_unregister+0x13/0x50
remove_iter+0x17/0x20
device_for_each_child+0x4a/0x70
pcie_portdrv_remove+0x23/0x40
pci_device_remove+0x24/0x60
device_release_driver_internal+0x11f/0x180
pci_stop_bus_device+0x57/0x80
pci_stop_bus_device+0x2c/0x80
pci_stop_and_remove_bus_device+0xe/0x20
pciehp_unconfigure_device+0x76/0x120
pciehp_disable_slot+0x3c/0xc0
pciehp_handle_presence_or_link_change+0x28f/0x3e0
pciehp_ist+0xc3/0x210
irq_thread_fn+0x19/0x40
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/hotplug/pciehp_pci.c | 12 +++++++++---
drivers/pci/pci.h | 1 +
drivers/pci/probe.c | 24 ++++++++++++++++++++++++
include/linux/pci.h | 2 ++
4 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index ad12515a4a121..ca6237b0732c8 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -34,9 +34,12 @@ int pciehp_configure_device(struct controller *ctrl)
struct pci_dev *dev;
struct pci_dev *bridge = ctrl->pcie->port;
struct pci_bus *parent = bridge->subordinate;
- int num, ret = 0;
+ int num, ret;
- pci_lock_rescan_remove();
+ ret = pci_trylock_rescan_remove(bridge);
+ if (!ret)
+ return -ENODEV;
+ ret = 0;
dev = pci_get_slot(parent, PCI_DEVFN(0, 0));
if (dev) {
@@ -93,6 +96,7 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
struct pci_dev *dev, *temp;
struct pci_bus *parent = ctrl->pcie->port->subordinate;
u16 command;
+ int ret;
ctrl_dbg(ctrl, "%s: domain:bus:dev = %04x:%02x:00\n",
__func__, pci_domain_nr(parent), parent->number);
@@ -100,7 +104,9 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
if (!presence)
pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
- pci_lock_rescan_remove();
+ ret = pci_trylock_rescan_remove(parent->self);
+ if (!ret)
+ return;
/*
* Stopping an SR-IOV PF device removes all the associated VFs,
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fd44565c47562..f525490a02122 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -370,6 +370,7 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
{
pci_dev_set_io_state(dev, pci_channel_io_perm_failure);
pci_doe_disconnected(dev);
+ pci_notify_disconnected();
return 0;
}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5fbabb4e3425f..d2e19a1d1a45b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3302,6 +3302,7 @@ EXPORT_SYMBOL_GPL(pci_rescan_bus);
* routines should always be executed under this mutex.
*/
static DEFINE_MUTEX(pci_rescan_remove_lock);
+static DECLARE_WAIT_QUEUE_HEAD(pci_lock_wq);
void pci_lock_rescan_remove(void)
{
@@ -3309,12 +3310,35 @@ void pci_lock_rescan_remove(void)
}
EXPORT_SYMBOL_GPL(pci_lock_rescan_remove);
+/*
+ * pci_trylock_rescan_remove() - keep trying to take the lock until successful
+ * or notified the device is disconnected
+ *
+ * Returns 1 if the lock was successfully taken, 0 otherwise.
+ */
+bool pci_trylock_rescan_remove(struct pci_dev *dev)
+{
+ int ret;
+
+ wait_event(pci_lock_wq,
+ (ret = mutex_trylock(&pci_rescan_remove_lock)) == 1 ||
+ pci_dev_is_disconnected(dev));
+
+ return ret;
+}
+
void pci_unlock_rescan_remove(void)
{
mutex_unlock(&pci_rescan_remove_lock);
+ wake_up_all(&pci_lock_wq);
}
EXPORT_SYMBOL_GPL(pci_unlock_rescan_remove);
+void pci_notify_disconnected(void)
+{
+ wake_up_all(&pci_lock_wq);
+}
+
static int __init pci_sort_bf_cmp(const struct device *d_a,
const struct device *d_b)
{
diff --git a/include/linux/pci.h b/include/linux/pci.h
index cafc5ab1cbcb4..b05aaf9aac6c8 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1442,7 +1442,9 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev);
unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge);
unsigned int pci_rescan_bus(struct pci_bus *bus);
void pci_lock_rescan_remove(void);
+bool pci_trylock_rescan_remove(struct pci_dev *dev);
void pci_unlock_rescan_remove(void);
+void pci_notify_disconnected(void);
/* Vital Product Data routines */
ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf);
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/2] PCI: err: ensure stable topology during handling
2024-06-12 18:10 [PATCH 0/2] pcie hotplug and error fixes Keith Busch
2024-06-12 18:10 ` [PATCH 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock Keith Busch
@ 2024-06-12 18:10 ` Keith Busch
2024-06-12 18:11 ` [PATCH 0/2] pcie hotplug and error fixes Keith Busch
2 siblings, 0 replies; 5+ messages in thread
From: Keith Busch @ 2024-06-12 18:10 UTC (permalink / raw)
To: linux-pci, lukas, bhelgaas; +Cc: Keith Busch
From: Keith Busch <kbusch@kernel.org>
DPC and AER handling access their subordinate bus devices. If pciehp should
happen to also trigger during this handling, it will remove all the subordinate
buses, then dereferecing any children may be a use-after-free. That may lead to
kernel panics like the below.
BUG: unable to handle page fault for address: 00000000091400c0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc Kdump: loaded Tainted: G E 6.9.0-0_fbk0_rc10_871_g4e98bf884071 #1
RIP: 0010:pci_bus_read_config_dword+0x17/0x50
Code: e9 0e 00 00 00 c7 01 ff ff ff ff b8 86 00 00 00 c3 cc cc 0f 1f 44 00 00 53 50 c7 44 24 04 00 00 00 00 f6 c2 03 75 27 48 89 cb <48> 8b 87 c0 00 00 00 4c 8d 44 24 04 b9 04 00 00 00 ff 50 18 85 c0
RSP: 0018:ffffc90039113d60 EFLAGS: 00010246
RAX: 0000000009140000 RBX: ffffc90039113d7c RCX: ffffc90039113d7c
RDX: 0000000000000004 RSI: 0000000000000000 RDI: 0000000009140000
RBP: 0000000000000100 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000001f975c6971 R12: 000000000000e9fc
R13: ffff88811b5b4000 R14: ffffc90039113d7c R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff899f7d3c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000091400c0 CR3: 00000243fb00f002 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
? __die+0x78/0xc0
? page_fault_oops+0x2a8/0x3a0
? sched_clock+0x5/0x10
? psi_task_switch+0x39/0xc90
? __switch_to+0x131/0x530
? exc_page_fault+0x63/0x130
? asm_exc_page_fault+0x22/0x30
? pci_bus_read_config_dword+0x17/0x50
pci_dev_wait+0x107/0x190
? dpc_completed+0x50/0x50
dpc_reset_link+0x4e/0xd0
pcie_do_recovery+0xb2/0x2d0
? irq_forced_thread_fn+0x60/0x60
dpc_handler+0x107/0x130
irq_thread_fn+0x19/0x40
irq_thread+0x120/0x1e0
? irq_thread_fn+0x40/0x40
? irq_forced_secondary_handler+0x20/0x20
kthread+0xae/0xe0
? file_tty_write+0x360/0x360
ret_from_fork+0x2f/0x40
? file_tty_write+0x360/0x360
ret_from_fork_asm+0x11/0x20
</TASK>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/pcie/err.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 31090770fffcc..5355fc0fbf910 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -192,7 +192,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_channel_state_t state,
pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev))
{
- int type = pci_pcie_type(dev);
+ int type = pci_pcie_type(dev), ret;
struct pci_dev *bridge;
pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
@@ -214,6 +214,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
else
bridge = pci_upstream_bridge(dev);
+
+ ret = pci_trylock_rescan_remove(bridge);
+ if (!ret)
+ return PCI_ERS_RESULT_DISCONNECT;
pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);
pci_dbg(bridge, "broadcast error_detected message\n");
@@ -262,12 +266,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
}
pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
+ pci_unlock_rescan_remove();
pci_info(bridge, "device recovery successful\n");
return status;
failed:
pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
+ pci_unlock_rescan_remove();
pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 0/2] pcie hotplug and error fixes
2024-06-12 18:10 [PATCH 0/2] pcie hotplug and error fixes Keith Busch
2024-06-12 18:10 ` [PATCH 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock Keith Busch
2024-06-12 18:10 ` [PATCH 2/2] PCI: err: ensure stable topology during handling Keith Busch
@ 2024-06-12 18:11 ` Keith Busch
2 siblings, 0 replies; 5+ messages in thread
From: Keith Busch @ 2024-06-12 18:11 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-pci, lukas, bhelgaas
On Wed, Jun 12, 2024 at 11:10:22AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> I am working with larger pcie topologies again, and we're seeing some
> failures when dealing with certain overlapping pcie events.
>
> The topology is essentially this:
>
> [Root Port] <-> [UpStream Port] <-> [DownStream Port] <-> [End Device]
>
> An error between the DSP and ED triggers DPC. There's only inband
> presence detection so it also triggers hotplug. Before the error
> handling is completed, though, another error seen by the RP triggers its
> own DPC handling.
>
> The concurrent event handling reveals some interesting races, and this
> small patchset tries to address these in the low invasive way.
>
> Keith Busch (2):
> PCI: pciehp: fix concurrent sub-tree removal deadlock
> PCI: err: ensure stable topology during handling
Oops, sorry for the noise here. I accidently pointed git send-email to
the wrong directory and resent v1 that was already flagged as breaking.
I'll resend the correct version shortly.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-06-12 18:20 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-12 18:10 [PATCH 0/2] pcie hotplug and error fixes Keith Busch
2024-06-12 18:10 ` [PATCH 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock Keith Busch
2024-06-12 18:10 ` [PATCH 2/2] PCI: err: ensure stable topology during handling Keith Busch
2024-06-12 18:11 ` [PATCH 0/2] pcie hotplug and error fixes Keith Busch
-- strict thread matches above, loose matches on Subject: below --
2024-06-10 22:03 Keith Busch
2024-06-10 22:03 ` [PATCH 2/2] PCI: err: ensure stable topology during handling Keith Busch
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox