From: Keith Busch <kbusch@meta.com>
To: <linux-pci@vger.kernel.org>, <lukas@wunner.de>, <bhelgaas@google.com>
Cc: Keith Busch <kbusch@kernel.org>
Subject: [PATCH 2/2] PCI: err: ensure stable topology during handling
Date: Wed, 12 Jun 2024 11:10:24 -0700
Message-ID: <20240612181024.3577119-3-kbusch@meta.com>
In-Reply-To: <20240612181024.3577119-1-kbusch@meta.com>
From: Keith Busch <kbusch@kernel.org>
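DPC and AER handling access the devices on their subordinate buses. If
pciehp triggers while this handling is in progress, it removes all the
subordinate buses, and any later dereference of a child device is a
use-after-free. Hold the rescan/remove lock across pcie_do_recovery() so
the topology stays stable, and return PCI_ERS_RESULT_DISCONNECT if the
lock cannot be taken. Without this, the race can lead to kernel panics
like the one below.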
BUG: unable to handle page fault for address: 00000000091400c0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc Kdump: loaded Tainted: G E 6.9.0-0_fbk0_rc10_871_g4e98bf884071 #1
RIP: 0010:pci_bus_read_config_dword+0x17/0x50
Code: e9 0e 00 00 00 c7 01 ff ff ff ff b8 86 00 00 00 c3 cc cc 0f 1f 44 00 00 53 50 c7 44 24 04 00 00 00 00 f6 c2 03 75 27 48 89 cb <48> 8b 87 c0 00 00 00 4c 8d 44 24 04 b9 04 00 00 00 ff 50 18 85 c0
RSP: 0018:ffffc90039113d60 EFLAGS: 00010246
RAX: 0000000009140000 RBX: ffffc90039113d7c RCX: ffffc90039113d7c
RDX: 0000000000000004 RSI: 0000000000000000 RDI: 0000000009140000
RBP: 0000000000000100 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000001f975c6971 R12: 000000000000e9fc
R13: ffff88811b5b4000 R14: ffffc90039113d7c R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff899f7d3c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000091400c0 CR3: 00000243fb00f002 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
? __die+0x78/0xc0
? page_fault_oops+0x2a8/0x3a0
? sched_clock+0x5/0x10
? psi_task_switch+0x39/0xc90
? __switch_to+0x131/0x530
? exc_page_fault+0x63/0x130
? asm_exc_page_fault+0x22/0x30
? pci_bus_read_config_dword+0x17/0x50
pci_dev_wait+0x107/0x190
? dpc_completed+0x50/0x50
dpc_reset_link+0x4e/0xd0
pcie_do_recovery+0xb2/0x2d0
? irq_forced_thread_fn+0x60/0x60
dpc_handler+0x107/0x130
irq_thread_fn+0x19/0x40
irq_thread+0x120/0x1e0
? irq_thread_fn+0x40/0x40
? irq_forced_secondary_handler+0x20/0x20
kthread+0xae/0xe0
? file_tty_write+0x360/0x360
ret_from_fork+0x2f/0x40
? file_tty_write+0x360/0x360
ret_from_fork_asm+0x11/0x20
</TASK>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
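Not for the commit log: a minimal userspace sketch of the locking pattern
this patch applies, in case it helps review. It is an illustration under
stated assumptions, not kernel code. The names topology_trylock(),
topology_unlock(), do_recovery() and enum recovery_result are hypothetical
stand-ins, and a C11 atomic_flag stands in for the rescan/remove lock. The
point is only the shape: try the lock without blocking, give up with a
"disconnect" result if removal already owns it, and otherwise touch the
children only while the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the rescan/remove lock. */
static atomic_flag topology_busy = ATOMIC_FLAG_INIT;

/* Hypothetical result codes, loosely mirroring pci_ers_result_t. */
enum recovery_result {
	RECOVERY_RECOVERED,
	RECOVERY_DISCONNECT,
};

/* Returns true if the lock was taken; never blocks. */
static bool topology_trylock(void)
{
	return !atomic_flag_test_and_set(&topology_busy);
}

static void topology_unlock(void)
{
	atomic_flag_clear(&topology_busy);
}

/*
 * Visit every child only while the topology lock is held. If the lock
 * is already owned (for example by a concurrent removal), do not touch
 * the children at all and report a disconnect.
 */
static enum recovery_result do_recovery(const int *children, size_t nr,
					size_t *visited)
{
	size_t i;

	assert(visited != NULL);

	if (!topology_trylock())
		return RECOVERY_DISCONNECT;

	for (i = 0; i < nr; i++) {
		(void)children[i];	/* safe: nothing can be removed now */
		(*visited)++;
	}

	topology_unlock();
	return RECOVERY_RECOVERED;
}
```

Calling do_recovery() with the lock free walks all children and releases
the lock again; calling it while another path holds the lock returns
RECOVERY_DISCONNECT without visiting anything.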
drivers/pci/pcie/err.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 31090770fffcc..5355fc0fbf910 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -192,7 +192,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_channel_state_t state,
pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev))
{
- int type = pci_pcie_type(dev);
+ int type = pci_pcie_type(dev), ret;
struct pci_dev *bridge;
pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
@@ -214,6 +214,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
else
bridge = pci_upstream_bridge(dev);
+
+ ret = pci_trylock_rescan_remove(bridge);
+ if (!ret)
+ return PCI_ERS_RESULT_DISCONNECT;
pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);
pci_dbg(bridge, "broadcast error_detected message\n");
@@ -262,12 +266,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
}
pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
+ pci_unlock_rescan_remove();
pci_info(bridge, "device recovery successful\n");
return status;
failed:
pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
+ pci_unlock_rescan_remove();
pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);
--
2.43.0