* [PATCH v2] PCI/ASPM: Fix UAF by disabling ASPM for link when child function is removed
@ 2023-05-07 3:40 Ding Hui
2023-05-18 21:54 ` Bjorn Helgaas
2024-12-23 3:39 ` [PATCH 0/1] " Daniel Stodden
0 siblings, 2 replies; 10+ messages in thread
From: Ding Hui @ 2023-05-07 3:40 UTC (permalink / raw)
To: bhelgaas
Cc: sathyanarayanan.kuppuswamy, vidyas, david.e.box, kai.heng.feng,
michael.a.bottini, rajatja, refactormyself, qinzongquan, dinghui,
linux-pci, linux-kernel
If the Function 0 of a Multi-Function device is software removed,
a freed downstream pointer will be left in struct pcie_link_state,
and then when pcie_config_aspm_link() be invoked from any path,
we will trigger use-after-free, e.g.:
Reproducer:
[root@host ~]# cat repro.sh
#!/bin/bash
DEV_F0="0000:03:00.0"
echo 1 > /sys/bus/pci/devices/$DEV_F0/remove
echo powersave > /sys/module/pcie_aspm/parameters/policy
Result:
==================================================================
BUG: KASAN: slab-use-after-free in pcie_config_aspm_link+0x42d/0x500
Read of size 4 at addr ffff8881070c80a0 by task repro.sh/2056
CPU: 3 PID: 2056 Comm: repro.sh Not tainted 6.3.0+ #15
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Call Trace:
<TASK>
dump_stack_lvl+0x33/0x50
print_address_description.constprop.0+0x27/0x310
print_report+0x3e/0x70
kasan_report+0xae/0xe0
pcie_config_aspm_link+0x42d/0x500
pcie_aspm_set_policy+0x8e/0x1a0
param_attr_store+0x162/0x2c0
module_attr_store+0x3e/0x80
kernfs_fop_write_iter+0x2d5/0x460
vfs_write+0x72e/0xae0
ksys_write+0xed/0x1c0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
As per PCIe spec r6.0, sec 7.5.3.7, it is recommended that software
program the same value in all Functions for Multi-Function Devices
(including ARI Devices). For ARI Devices, ASPM Control is determined
solely by the setting in Function 0.
So we can just disable ASPM of the whole component if any child
function is removed, the downstream pointer will be avoided from
use-after-free, that will also avoid other potential corner cases.
Fixes: b5a0a9b59c81 ("PCI/ASPM: Read and set up L1 substate capabilities")
Debugged-by: Zongquan Qin <qinzongquan@sangfor.com.cn>
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
---
v2:
- better commit title and message
- update comment
- add reproduction steps
v1: https://lore.kernel.org/lkml/20230504123418.4438-1-dinghui@sangfor.com.cn/
Link: https://lore.kernel.org/lkml/20230429132604.31853-1-dinghui@sangfor.com.cn/
---
drivers/pci/pcie/aspm.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 66d7514ca111..06152cc39fea 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1010,18 +1010,15 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
down_read(&pci_bus_sem);
mutex_lock(&aspm_lock);
- /*
- * All PCIe functions are in one slot, remove one function will remove
- * the whole slot, so just wait until we are the last function left.
- */
- if (!list_empty(&parent->subordinate->devices))
- goto out;
link = parent->link_state;
root = link->root;
parent_link = link->parent;
- /* All functions are removed, so just disable ASPM for the link */
+ /*
+ * For any function removed, disable ASPM for the link. See PCIe r6.0,
+ * sec 7.7.3.7 for details.
+ */
pcie_config_aspm_link(link, 0);
list_del(&link->sibling);
/* Clock PM is for endpoint device */
@@ -1032,7 +1029,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
pcie_update_aspm_capable(root);
pcie_config_aspm_path(parent_link);
}
-out:
+
mutex_unlock(&aspm_lock);
up_read(&pci_bus_sem);
}
--
2.17.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2] PCI/ASPM: Fix UAF by disabling ASPM for link when child function is removed
2023-05-07 3:40 [PATCH v2] PCI/ASPM: Fix UAF by disabling ASPM for link when child function is removed Ding Hui
@ 2023-05-18 21:54 ` Bjorn Helgaas
2024-12-23 3:39 ` [PATCH 0/1] " Daniel Stodden
1 sibling, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2023-05-18 21:54 UTC (permalink / raw)
To: Ding Hui
Cc: bhelgaas, sathyanarayanan.kuppuswamy, vidyas, david.e.box,
kai.heng.feng, michael.a.bottini, rajatja, refactormyself,
qinzongquan, linux-pci, linux-kernel
On Sun, May 07, 2023 at 11:40:57AM +0800, Ding Hui wrote:
> If the Function 0 of a Multi-Function device is software removed,
> a freed downstream pointer will be left in struct pcie_link_state,
> and then when pcie_config_aspm_link() be invoked from any path,
> we will trigger use-after-free, e.g.:
>
> Reproducer:
>
> [root@host ~]# cat repro.sh
> #!/bin/bash
> DEV_F0="0000:03:00.0"
> echo 1 > /sys/bus/pci/devices/$DEV_F0/remove
> echo powersave > /sys/module/pcie_aspm/parameters/policy
>
> Result:
>
> ==================================================================
> BUG: KASAN: slab-use-after-free in pcie_config_aspm_link+0x42d/0x500
> Read of size 4 at addr ffff8881070c80a0 by task repro.sh/2056
> CPU: 3 PID: 2056 Comm: repro.sh Not tainted 6.3.0+ #15
> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> Call Trace:
> <TASK>
> dump_stack_lvl+0x33/0x50
> print_address_description.constprop.0+0x27/0x310
> print_report+0x3e/0x70
> kasan_report+0xae/0xe0
> pcie_config_aspm_link+0x42d/0x500
> pcie_aspm_set_policy+0x8e/0x1a0
> param_attr_store+0x162/0x2c0
> module_attr_store+0x3e/0x80
> kernfs_fop_write_iter+0x2d5/0x460
> vfs_write+0x72e/0xae0
> ksys_write+0xed/0x1c0
> do_syscall_64+0x38/0x90
> entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> As per PCIe spec r6.0, sec 7.5.3.7, it is recommended that software
> program the same value in all Functions for Multi-Function Devices
> (including ARI Devices). For ARI Devices, ASPM Control is determined
> solely by the setting in Function 0.
>
> So we can just disable ASPM of the whole component if any child
> function is removed, the downstream pointer will be avoided from
> use-after-free, that will also avoid other potential corner cases.
>
> Fixes: b5a0a9b59c81 ("PCI/ASPM: Read and set up L1 substate capabilities")
> Debugged-by: Zongquan Qin <qinzongquan@sangfor.com.cn>
> Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Applied to pci/aspm for v6.5, thank you!
> ---
> v2:
> - better commit title and message
> - update comment
> - add reproduction steps
>
> v1: https://lore.kernel.org/lkml/20230504123418.4438-1-dinghui@sangfor.com.cn/
>
> Link: https://lore.kernel.org/lkml/20230429132604.31853-1-dinghui@sangfor.com.cn/
>
> ---
> drivers/pci/pcie/aspm.c | 13 +++++--------
> 1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 66d7514ca111..06152cc39fea 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -1010,18 +1010,15 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>
> down_read(&pci_bus_sem);
> mutex_lock(&aspm_lock);
> - /*
> - * All PCIe functions are in one slot, remove one function will remove
> - * the whole slot, so just wait until we are the last function left.
> - */
> - if (!list_empty(&parent->subordinate->devices))
> - goto out;
>
> link = parent->link_state;
> root = link->root;
> parent_link = link->parent;
>
> - /* All functions are removed, so just disable ASPM for the link */
> + /*
> + * For any function removed, disable ASPM for the link. See PCIe r6.0,
> + * sec 7.7.3.7 for details.
> + */
> pcie_config_aspm_link(link, 0);
> list_del(&link->sibling);
> /* Clock PM is for endpoint device */
> @@ -1032,7 +1029,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
> pcie_update_aspm_capable(root);
> pcie_config_aspm_path(parent_link);
> }
> -out:
> +
> mutex_unlock(&aspm_lock);
> up_read(&pci_bus_sem);
> }
> --
> 2.17.1
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/1] Re: PCI/ASPM: Fix UAF by disabling ASPM for link when child function is removed
2023-05-07 3:40 [PATCH v2] PCI/ASPM: Fix UAF by disabling ASPM for link when child function is removed Ding Hui
2023-05-18 21:54 ` Bjorn Helgaas
@ 2024-12-23 3:39 ` Daniel Stodden
2024-12-23 3:39 ` [PATCH 1/1] PCI/ASPM: fix link state exit during switch upstream function removal Daniel Stodden
2024-12-28 5:00 ` [PATCH 0/1] Re: PCI/ASPM: Fix UAF by disabling ASPM for link when child function is removed Ding Hui
1 sibling, 2 replies; 10+ messages in thread
From: Daniel Stodden @ 2024-12-23 3:39 UTC (permalink / raw)
To: dinghui
Cc: Daniel Stodden, bhelgaas, david.e.box, kai.heng.feng, linux-pci,
michael.a.bottini, qinzongquan, rajatja, refactormyself,
sathyanarayanan.kuppuswamy, vidyas
About change 456d8aa37d0f "PCI/ASPM: Disable ASPM on MFD function
removal to avoid use-after-free")
Let's say root port 00:03.0 was connected to a PCIe switch.
[00]-+..
|
+--03.0-[01-03]--+-00.0-[02-03]--+-02.0-[02]--x
| | \-0d.0-[03]----00.0 Endpoint
. +-00.1 Upstream Port Sibling
And that switch had not only an upstream port at 01:00.0, but also a
sibling function at 01:00.1.
Let's break the link under 00:03.0, which makes pciehp remove the [01]
bus. Surprise effect: traversal during bus [01] device removal happens
in reverse order (for SR-IOV-ish reasons, see pciehp_pci.c
commentary). Fair enough, ASPM should probably not rely on any
specific order anyway.
Recursing through pci_remove_bus_device() underneath, the order in
which we pci_destroy_dev() will be:
[ 01:00.1 [ 02:02.0 [ 03:00.0 ] 02:0d.0 ] 01:00.0 ]
Trivially, the above is also the order in which
pcie_aspm_exit_link_state() will be called.
Then note how, since above change removed the list_empty() exit
condition, we are now going to remove the pcie_link_state for bus [01]
(parent=<00:03.0>) during the first invocation, i.e. right at
pcie_aspm_exit_link_state(<01:00.1>).
Iow: with bus [03] removal only to come, we removed the
pcie_link_state upstream first, and only then will remove the
downstream pcie_link_state at parent=<02:0d.0>.
Eventually reaching that second link, it carries a ref "parent_link =
link->parent" which now points to free'd memory again. One can observe
a rather high probability of finishing with a random GPF or nullptr
dereference condition.
Above switches, with MFD upstream portions, exist. Case at hand is a
PEX8717 with 4 DMA engines:
+-08.0-[51-5b]--+-00.0-[52-5b]--+-02.0-[53]--
| \-0d.0-[54-5b]----00.0 Broadcom Inc. …
+-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
+-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
+-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
\-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface
Backtrace:
[ 790.817077] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[ 790.900514] #PF: supervisor read access in kernel mode
[ 790.962081] #PF: error_code(0x0000) - not-present page
[ 791.023641] PGD 8000000104648067 P4D 8000000104648067 PUD 1404bc067 PMD 0
[ 791.106041] Oops: 0000 [#1] PREEMPT SMP PTI
[ 791.156151] CPU: 8 PID: 145 Comm: irq/43-pciehp Tainted: G OE 6.1.0-22-2-amd64 #5 Debian 6.1.94-1
[ 791.279173] Hardware name: Intel Camelback Mountain CRB/Camelback Mountain CRB, BIOS Aboot-norcal7-7.1.6-generic-22971530 06/30/2021
[ 791.421982] RIP: 0010:pcie_config_aspm_link+0x48/0x330
[ 791.483548] Code: 48 8b 04 25 28 00 00 00 48 89 44 24 30 31 c0 8b 47 30 4c 8b 47 08 83 e3 7f c1 e8 0e f7 d3 89 c2 83 e0 7f 21 c3 83 e2 7f 21 f3 <41> 8b b6 a0 00 00 00 89 d8 83 e0 87 f6 c3 04 0f 44 d8 0f b7 47 30
[ 791.708646] RSP: 0018:ffffa8cf8062bcb8 EFLAGS: 00010246
[ 791.771254] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 791.856772] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94bac3d56a80
[ 791.942291] RBP: ffff94bac3d56a80 R08: 0000000000000000 R09: ffffa8cf8062bc6c
[ 792.027808] R10: 0000000000000000 R11: 0000000000000004 R12: ffff94b9c0f61fc0
[ 792.113326] R13: ffff94bbc0ae9828 R14: 0000000000000000 R15: ffff94b9c0ea6f20
[ 792.198845] FS: 0000000000000000(0000) GS:ffff94c8ffc00000(0000) …
[ 792.295827] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 792.364680] CR2: 00000000000000a0 CR3: 00000001062fe003 CR4: 00000000003706e0
[ 792.450198] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 792.535717] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 792.621235] Call Trace:
[ 792.650506] <TASK>
[ 792.675616] ? __die_body.cold+0x1a/0x1f
[ 792.722600] ? page_fault_oops+0xd2/0x2b0
[ 792.770625] ? sysvec_apic_timer_interrupt+0xa/0x90
[ 792.829068] ? exc_page_fault+0x70/0x170
[ 792.876051] ? asm_exc_page_fault+0x22/0x30
[ 792.926161] ? pcie_config_aspm_link+0x48/0x330
[ 792.980437] pcie_aspm_exit_link_state+0xb9/0x120
[ 793.036796] pci_remove_bus_device+0xc8/0x110
[ 793.088988] pci_remove_bus_device+0x2e/0x110
[ 793.141180] pci_remove_bus_device+0x3e/0x110
[ 793.193373] pciehp_unconfigure_device+0x94/0x160
[ 793.249733] pciehp_disable_slot+0x69/0x100
[ 793.299840] pciehp_handle_presence_or_link_change+0x241/0x350
[ 793.369742] pciehp_ist+0x164/0x170
[ 793.411524] ? disable_irq_nosync+0x10/0x10
[ 793.461632] irq_thread_fn+0x1f/0x60
[ 793.504449] irq_thread+0xfa/0x1c0
[ 793.545185] ? irq_thread_fn+0x60/0x60
[ 793.590085] ? irq_thread_check_affinity+0xf0/0xf0
[ 793.647485] kthread+0xda/0x100
[ 793.685096] ? kthread_complete_and_exit+0x20/0x20
[ 793.742495] ret_from_fork+0x22/0x30
[ 793.785314] </TASK>
Daniel Stodden (1):
PCI/ASPM: fix link state exit during switch upstream function removal.
drivers/pci/pcie/aspm.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/1] PCI/ASPM: fix link state exit during switch upstream function removal.
2024-12-23 3:39 ` [PATCH 0/1] " Daniel Stodden
@ 2024-12-23 3:39 ` Daniel Stodden
2025-01-03 1:53 ` Daniel Stodden
2025-01-29 21:29 ` Bjorn Helgaas
2024-12-28 5:00 ` [PATCH 0/1] Re: PCI/ASPM: Fix UAF by disabling ASPM for link when child function is removed Ding Hui
1 sibling, 2 replies; 10+ messages in thread
From: Daniel Stodden @ 2024-12-23 3:39 UTC (permalink / raw)
To: dinghui
Cc: Daniel Stodden, bhelgaas, david.e.box, kai.heng.feng, linux-pci,
michael.a.bottini, qinzongquan, rajatja, refactormyself,
sathyanarayanan.kuppuswamy, vidyas, Daniel Stodden
From: Daniel Stodden <daniel.stodden@gmail.com>
Before change 456d8aa37d0f "Disable ASPM on MFD function removal to
avoid use-after-free", we would free the ASPM link only after the last
function on the bus pertaining to the given link was removed.
That was too late. If function 0 is removed before sibling function,
link->downstream would point to free'd memory after.
After above change, we freed the ASPM parent link state upon any
function removal on the bus pertaining to a given link.
That is too early. If the link is to a PCIe switch with MFD on the
upstream port, then removing functions other than 0 first would free a
link which still remains parent_link to the remaining downstream
ports.
The resulting GPFs are especially frequent during hot-unplug, because
pciehp removes devices on the link bus in reverse order.
On that switch, function 0 is the virtual P2P bridge to the internal
bus. Free exactly when function 0 is removed -- before the parent link
is obsolete, but after all subordinate links are gone.
Fixes: 456d8aa37d0f ("PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free")
Signed-off-by: Daniel Stodden <dns@arista.com>
---
drivers/pci/pcie/aspm.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index e0bc90597dca..8ae7c75b408c 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1273,16 +1273,16 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
parent_link = link->parent;
/*
- * link->downstream is a pointer to the pci_dev of function 0. If
- * we remove that function, the pci_dev is about to be deallocated,
- * so we can't use link->downstream again. Free the link state to
- * avoid this.
+ * Free the parent link state, no later than function 0 (i.e.
+ * link->downstream) being removed.
*
- * If we're removing a non-0 function, it's possible we could
- * retain the link state, but PCIe r6.0, sec 7.5.3.7, recommends
- * programming the same ASPM Control value for all functions of
- * multi-function devices, so disable ASPM for all of them.
+ * Do not free free the link state any earlier. If function 0
+ * is a switch upstream port, this link state is parent_link
+ * to all subordinate ones.
*/
+ if (pdev != link->downstream)
+ goto out;
+
pcie_config_aspm_link(link, 0);
list_del(&link->sibling);
free_link_state(link);
@@ -1293,6 +1293,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
pcie_config_aspm_path(parent_link);
}
+ out:
mutex_unlock(&aspm_lock);
up_read(&pci_bus_sem);
}
--
2.47.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/1] Re: PCI/ASPM: Fix UAF by disabling ASPM for link when child function is removed
2024-12-23 3:39 ` [PATCH 0/1] " Daniel Stodden
2024-12-23 3:39 ` [PATCH 1/1] PCI/ASPM: fix link state exit during switch upstream function removal Daniel Stodden
@ 2024-12-28 5:00 ` Ding Hui
2024-12-30 15:17 ` [PATCH 0/1] " Daniel Stodden
1 sibling, 1 reply; 10+ messages in thread
From: Ding Hui @ 2024-12-28 5:00 UTC (permalink / raw)
To: Daniel Stodden, bhelgaas
Cc: david.e.box, kai.heng.feng, linux-pci, michael.a.bottini,
qinzongquan, rajatja, refactormyself, sathyanarayanan.kuppuswamy,
vidyas
Sorry for late reply.
Thanks for pointing out the release order issue that I indeed overlooked.
I'm not an expert in PCI, so I'm not sure if there's a similar PCI topology like this:
[00]-+..
|
+--03.0-[01-03]--+-00.0 Endpoint
+-00.1-[02-03]--+-02.0-[02]--x
\-0d.0-[03]----00.0 Endpoint
If this is possible, I think that even after applying your patch, removing 01:00.0
individually by
echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
maybe still lead to a Use-After-Free when pcie_aspm_exit_link_state(<02:02.0>).
Additionally, Bjorn Helgaas also expects compliance with the PCIe specification r6.0,
section 7.5.3.7, which recommends that software programs the same ASPM Control value
across all functions of multi-function devices.
Therefore, should we recursively release the link_state of child devices before
the current device in pcie_aspm_exit_link_state()?
--
Thanks,
-dinghui
On 2024/12/23 11:39, Daniel Stodden wrote:
> About change 456d8aa37d0f "PCI/ASPM: Disable ASPM on MFD function
> removal to avoid use-after-free")
>
> Let's say root port 00:03.0 was connected to a PCIe switch.
>
> [00]-+..
> |
> +--03.0-[01-03]--+-00.0-[02-03]--+-02.0-[02]--x
> | | \-0d.0-[03]----00.0 Endpoint
> . +-00.1 Upstream Port Sibling
>
> And that switch had not only an upstream port at 01:00.0, but also a
> sibling function at 01:00.1.
>
> Let's break the link under 00:03.0, which makes pciehp remove the [01]
> bus. Surprise effect: traversal during bus [01] device removal happens
> in reverse order (for SR-IOV-ish reasons, see pciehp_pci.c
> commentary). Fair enough, ASPM should probably not rely on any
> specific order anyway.
>
> Recursing through pci_remove_bus_device() underneath, the order in
> which we pci_destroy_dev() will be:
>
> [ 01:00.1 [ 02:02.0 [ 03:00.0 ] 02:0d.0 ] 01:00.0 ]
>
> Trivially, the above is also the order in which
> pcie_aspm_exit_link_state() will be called.
>
> Then note how, since above change removed the list_empty() exit
> condition, we are now going to remove the pcie_link_state for bus [01]
> (parent=<00:03.0>) during the first invocation, i.e. right at
> pcie_aspm_exit_link_state(<01:00.1>).
>
> Iow: with bus [03] removal only to come, we removed the
> pcie_link_state upstream first, and only then will remove the
> downstream pcie_link_state at parent=<02:0d.0>.
>
> Eventually reaching that second link, it carries a ref "parent_link =
> link->parent" which now points to free'd memory again. One can observe
> a rather high probability of finishing with a random GPF or nullptr
> dereference condition.
> > Above switches, with MFD upstream portions, exist. Case at hand is a
> PEX8717 with 4 DMA engines:
>
> +-08.0-[51-5b]--+-00.0-[52-5b]--+-02.0-[53]--
> | \-0d.0-[54-5b]----00.0 Broadcom Inc. …
> +-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
> +-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
> +-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
> \-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface
>
> Backtrace:
>
> [ 790.817077] BUG: kernel NULL pointer dereference, address: 00000000000000a0
> [ 790.900514] #PF: supervisor read access in kernel mode
> [ 790.962081] #PF: error_code(0x0000) - not-present page
> [ 791.023641] PGD 8000000104648067 P4D 8000000104648067 PUD 1404bc067 PMD 0
> [ 791.106041] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 791.156151] CPU: 8 PID: 145 Comm: irq/43-pciehp Tainted: G OE 6.1.0-22-2-amd64 #5 Debian 6.1.94-1
> [ 791.279173] Hardware name: Intel Camelback Mountain CRB/Camelback Mountain CRB, BIOS Aboot-norcal7-7.1.6-generic-22971530 06/30/2021
> [ 791.421982] RIP: 0010:pcie_config_aspm_link+0x48/0x330
> [ 791.483548] Code: 48 8b 04 25 28 00 00 00 48 89 44 24 30 31 c0 8b 47 30 4c 8b 47 08 83 e3 7f c1 e8 0e f7 d3 89 c2 83 e0 7f 21 c3 83 e2 7f 21 f3 <41> 8b b6 a0 00 00 00 89 d8 83 e0 87 f6 c3 04 0f 44 d8 0f b7 47 30
> [ 791.708646] RSP: 0018:ffffa8cf8062bcb8 EFLAGS: 00010246
> [ 791.771254] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 791.856772] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94bac3d56a80
> [ 791.942291] RBP: ffff94bac3d56a80 R08: 0000000000000000 R09: ffffa8cf8062bc6c
> [ 792.027808] R10: 0000000000000000 R11: 0000000000000004 R12: ffff94b9c0f61fc0
> [ 792.113326] R13: ffff94bbc0ae9828 R14: 0000000000000000 R15: ffff94b9c0ea6f20
> [ 792.198845] FS: 0000000000000000(0000) GS:ffff94c8ffc00000(0000) …
> [ 792.295827] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 792.364680] CR2: 00000000000000a0 CR3: 00000001062fe003 CR4: 00000000003706e0
> [ 792.450198] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 792.535717] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 792.621235] Call Trace:
> [ 792.650506] <TASK>
> [ 792.675616] ? __die_body.cold+0x1a/0x1f
> [ 792.722600] ? page_fault_oops+0xd2/0x2b0
> [ 792.770625] ? sysvec_apic_timer_interrupt+0xa/0x90
> [ 792.829068] ? exc_page_fault+0x70/0x170
> [ 792.876051] ? asm_exc_page_fault+0x22/0x30
> [ 792.926161] ? pcie_config_aspm_link+0x48/0x330
> [ 792.980437] pcie_aspm_exit_link_state+0xb9/0x120
> [ 793.036796] pci_remove_bus_device+0xc8/0x110
> [ 793.088988] pci_remove_bus_device+0x2e/0x110
> [ 793.141180] pci_remove_bus_device+0x3e/0x110
> [ 793.193373] pciehp_unconfigure_device+0x94/0x160
> [ 793.249733] pciehp_disable_slot+0x69/0x100
> [ 793.299840] pciehp_handle_presence_or_link_change+0x241/0x350
> [ 793.369742] pciehp_ist+0x164/0x170
> [ 793.411524] ? disable_irq_nosync+0x10/0x10
> [ 793.461632] irq_thread_fn+0x1f/0x60
> [ 793.504449] irq_thread+0xfa/0x1c0
> [ 793.545185] ? irq_thread_fn+0x60/0x60
> [ 793.590085] ? irq_thread_check_affinity+0xf0/0xf0
> [ 793.647485] kthread+0xda/0x100
> [ 793.685096] ? kthread_complete_and_exit+0x20/0x20
> [ 793.742495] ret_from_fork+0x22/0x30
> [ 793.785314] </TASK>
>
> Daniel Stodden (1):
> PCI/ASPM: fix link state exit during switch upstream function removal.
>
> drivers/pci/pcie/aspm.c | 17 +++++++++--------
> 1 file changed, 9 insertions(+), 8 deletions(-)
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/1] PCI/ASPM: Fix UAF by disabling ASPM for link when child function is removed
2024-12-28 5:00 ` [PATCH 0/1] Re: PCI/ASPM: Fix UAF by disabling ASPM for link when child function is removed Ding Hui
@ 2024-12-30 15:17 ` Daniel Stodden
2025-01-01 12:25 ` Ding Hui
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Stodden @ 2024-12-30 15:17 UTC (permalink / raw)
To: Ding Hui
Cc: Bjorn Helgaas, david.e.box, kai.heng.feng, linux-pci,
michael.a.bottini, qinzongquan, rajatja, refactormyself,
sathyanarayanan.kuppuswamy, vidyas
Hi!
> On Dec 27, 2024, at 9:00 PM, Ding Hui <dinghui@sangfor.com.cn> wrote:
>
> Sorry for late reply.
No problem at all.
Wasn’t really expecting to make much of a wave when when sending this out a day before the holidays.
Thanks for picking it up!
> Thanks for pointing out the release order issue that I indeed overlooked.
>
> I'm not an expert in PCI, so I'm not sure if there's a similar PCI topology like this:
>
> [00]-+..
> |
> +--03.0-[01-03]--+-00.0 Endpoint
> +-00.1-[02-03]--+-02.0-[02]--x
> \-0d.0-[03]----00.0 Endpoint
>
> If this is possible, I think that even after applying your patch, removing 01:00.0
> individually by
> echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
> maybe still lead to a Use-After-Free when pcie_aspm_exit_link_state(<02:02.0>).
Now, my main problem is that I’m not super familiar with ASPM details.
But let’s start from what PCIe has. And I’m just assuming that ASPM does not apply outside PCI Express.
(It’s called drivers/pci/pcie/aspm.c after all. And pcie_aspm_sanity_check appears to agree with me (?))
Outside of legacy support, the main PCIe devices are root complex, switches, and (PCIe) endpoints (spec: "PCI Express Fabric Topology”)
These in turn yield 4 essential PCIe types of _functions_: root ports, switch upstream ports, switch downstream ports, and endpoints.
All except endpoints are represented as virtual PCI-to-PCI bridges in the conventional PCI.
In your diagram, 00:03.0 is a bridge representing a root port of the root complex.
Bus 01 is a link.
To have more bridges, 01:* must be on the upstream port of a switch. That switch has a bridge on the upstream port, with the secondary bus being a virtual one representing the fabric connecting downstream ports, internal to it.
Now here the rub: the spec says that upstream port is represented by a bridge.
I wish I could just quote the spec also saying that that bridge is function 0 the bus representing the link upstream.
(Or, given how the aspm.c code works, at least some function 0).
But I think it doesn’t.
Now here is the practical side: I know about 3-4 switch families at the moment.
Broadcom/PEX, Diodes/Pericom, Microchip/Switchtec) and ASMedia.
1. All model the bridge as :00.0. "Function 0”.
2. At least two of them (PEX, Switchtec) have additional functions as endpoints. For DMA, management, and/or NTB support.
They’re at 00:{1..}. Your change broke PCIe hotplug for both of them. (But this is not a rant, just saying what it is.)
3. Those sibling functions on the upstream port are very common.
This is also because the PCIe spec is very specific about _not_ having anything but downstream ports, i.e. bridges,
on the internal bus. Well, they have to go somewhere.
This is why the old code worked for so long, and why I think my patch at least works in practice.
So, to answer your questions: it’s not in the spec at a spot I can point at. But no PCIe device I know about makes 01:00.0 and endpoint *and* 01:00.1 a bridge at the same time. Plenty of them the other way around.
Contrast this with what I’m gathering from aspm.c
- The driver is following *link* topology. Not so much function topology.
- Function 0 appears to be assumed to be the main downstream representative of the device for power management purposes.
I did not find this in the ASPM spec (maybe to a fault), but plenty other sections in PCIe go similar ways. So this isn’t surprising.
- I does, however, not explictly state that function 0 is also the bridge.
Now, the overall PCI layer is working on bridges and buses, not links. And so, removal, whether it’s coming from link state or sysfs, is following bridges and buses.
- As long as function 0 is the bridge, the current code works.
- If there is some PCIe switch where link->downstream is not the bridge, it breaks again.
One could go further in sanity checking. E.g. enforce that if function 0 is an endpoint, there are no subordinate links.
I’d be willing to add more sanity checks. But the only real idea I have to offer would warrant a separate patch, and it’s at the end of this mail.
> Additionally, Bjorn Helgaas also expects compliance with the PCIe specification r6.0,
> section 7.5.3.7, which recommends that software programs the same ASPM Control value
> across all functions of multi-function devices.
Yep. And I don’t disagree (although I’m not familar enough to be as decided about it.)
However, I deleted your comment regarding that. Didn’t mean to offend. My argument went roughly as follows:
- Whether we program all functions is not up to pcie_aspm_exit_link_state().
- It’s up to pcie_config_aspm_link(), called by pcie_aspm_exit_link_state().
- That’s already happening, and apparently commented on accordingly,.
Above pcie_config_aspm_dev().
The spot you edited just didn’t have much to add or remove there, afaict.
> Therefore, should we recursively release the link_state of child devices before
> the current device in pcie_aspm_exit_link_state()?
No.
.. if I may offer an opinion here. :)
Beware, I’m not a maintiner. Just having opinions.
The PCI layer is already taking care of it. The link state removal now follows the bus removal.
(Well, as long as that function-0-is-the-upstream-bridge assumption isn’t broken.)
The bus removal (pci/remove.c) is happening recursively already. So the link state removal now happens
in the depth first order you’re envisioning. From that POV, it’s okay.
So why are we worried?
I think I’ve seen people mention that the link_state contents could go into the pci_dev, earlier on this thread?
I don’t think not having a separate allocation is the problem either. Having the separate struct per se isn’t wrong.
Still, we’re worried for a reason.
The problem with the remaining aspm.c code, if there is any, is not that it’s not doing enough already.
It’s doing way, way too much. That’s how you ended up on this list, and that’s what brought me here.
That pcie_link_state is currently not only carrying ASPM attributes. It’s replicating the conventional PCI topology.
With those {downstream, root, parent) pointers. And once you have that, you’re either inconsistent, and crash, or “only" redundant.
I think if one wanted to improvie things further, it might be as basic as getting rid of those pointers between the links.
Then go by PCIe function types instead (root port, upstrea, downstream, endpoint). But follow the bus topology, not custom pointers.
(But it’s not like I tried yet.)
Daniel
> --
> Thanks,
> -dinghui
>
> On 2024/12/23 11:39, Daniel Stodden wrote:
>> About change 456d8aa37d0f "PCI/ASPM: Disable ASPM on MFD function
>> removal to avoid use-after-free")
>> Let's say root port 00:03.0 was connected to a PCIe switch.
>> [00]-+..
>> |
>> +--03.0-[01-03]--+-00.0-[02-03]--+-02.0-[02]--x
>> | | \-0d.0-[03]----00.0 Endpoint
>> . +-00.1 Upstream Port Sibling
>> And that switch had not only an upstream port at 01:00.0, but also a
>> sibling function at 01:00.1.
>> Let's break the link under 00:03.0, which makes pciehp remove the [01]
>> bus. Surprise effect: traversal during bus [01] device removal happens
>> in reverse order (for SR-IOV-ish reasons, see pciehp_pci.c
>> commentary). Fair enough, ASPM should probably not rely on any
>> specific order anyway.
>> Recursing through pci_remove_bus_device() underneath, the order in
>> which we pci_destroy_dev() will be:
>> [ 01:00.1 [ 02:02.0 [ 03:00.0 ] 02:0d.0 ] 01:00.0 ]
>> Trivially, the above is also the order in which
>> pcie_aspm_exit_link_state() will be called.
>> Then note how, since above change removed the list_empty() exit
>> condition, we are now going to remove the pcie_link_state for bus [01]
>> (parent=<00:03.0>) during the first invocation, i.e. right at
>> pcie_aspm_exit_link_state(<01:00.1>).
>> Iow: with bus [03] removal only to come, we removed the
>> pcie_link_state upstream first, and only then will remove the
>> downstream pcie_link_state at parent=<02:0d.0>.
>> Eventually reaching that second link, it carries a ref "parent_link =
>> link->parent" which now points to free'd memory again. One can observe
>> a rather high probability of finishing with a random GPF or nullptr
>> dereference condition.
>> > Above switches, with MFD upstream portions, exist. Case at hand is a
>> PEX8717 with 4 DMA engines:
>> +-08.0-[51-5b]--+-00.0-[52-5b]--+-02.0-[53]--
>> | \-0d.0-[54-5b]----00.0 Broadcom Inc. …
>> +-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
>> +-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
>> +-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
>> \-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface
>> Backtrace:
>> [ 790.817077] BUG: kernel NULL pointer dereference, address: 00000000000000a0
>> [ 790.900514] #PF: supervisor read access in kernel mode
>> [ 790.962081] #PF: error_code(0x0000) - not-present page
>> [ 791.023641] PGD 8000000104648067 P4D 8000000104648067 PUD 1404bc067 PMD 0
>> [ 791.106041] Oops: 0000 [#1] PREEMPT SMP PTI
>> [ 791.156151] CPU: 8 PID: 145 Comm: irq/43-pciehp Tainted: G OE 6.1.0-22-2-amd64 #5 Debian 6.1.94-1
>> [ 791.279173] Hardware name: Intel Camelback Mountain CRB/Camelback Mountain CRB, BIOS Aboot-norcal7-7.1.6-generic-22971530 06/30/2021
>> [ 791.421982] RIP: 0010:pcie_config_aspm_link+0x48/0x330
>> [ 791.483548] Code: 48 8b 04 25 28 00 00 00 48 89 44 24 30 31 c0 8b 47 30 4c 8b 47 08 83 e3 7f c1 e8 0e f7 d3 89 c2 83 e0 7f 21 c3 83 e2 7f 21 f3 <41> 8b b6 a0 00 00 00 89 d8 83 e0 87 f6 c3 04 0f 44 d8 0f b7 47 30
>> [ 791.708646] RSP: 0018:ffffa8cf8062bcb8 EFLAGS: 00010246
>> [ 791.771254] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>> [ 791.856772] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94bac3d56a80
>> [ 791.942291] RBP: ffff94bac3d56a80 R08: 0000000000000000 R09: ffffa8cf8062bc6c
>> [ 792.027808] R10: 0000000000000000 R11: 0000000000000004 R12: ffff94b9c0f61fc0
>> [ 792.113326] R13: ffff94bbc0ae9828 R14: 0000000000000000 R15: ffff94b9c0ea6f20
>> [ 792.198845] FS: 0000000000000000(0000) GS:ffff94c8ffc00000(0000) …
>> [ 792.295827] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 792.364680] CR2: 00000000000000a0 CR3: 00000001062fe003 CR4: 00000000003706e0
>> [ 792.450198] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 792.535717] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 792.621235] Call Trace:
>> [ 792.650506] <TASK>
>> [ 792.675616] ? __die_body.cold+0x1a/0x1f
>> [ 792.722600] ? page_fault_oops+0xd2/0x2b0
>> [ 792.770625] ? sysvec_apic_timer_interrupt+0xa/0x90
>> [ 792.829068] ? exc_page_fault+0x70/0x170
>> [ 792.876051] ? asm_exc_page_fault+0x22/0x30
>> [ 792.926161] ? pcie_config_aspm_link+0x48/0x330
>> [ 792.980437] pcie_aspm_exit_link_state+0xb9/0x120
>> [ 793.036796] pci_remove_bus_device+0xc8/0x110
>> [ 793.088988] pci_remove_bus_device+0x2e/0x110
>> [ 793.141180] pci_remove_bus_device+0x3e/0x110
>> [ 793.193373] pciehp_unconfigure_device+0x94/0x160
>> [ 793.249733] pciehp_disable_slot+0x69/0x100
>> [ 793.299840] pciehp_handle_presence_or_link_change+0x241/0x350
>> [ 793.369742] pciehp_ist+0x164/0x170
>> [ 793.411524] ? disable_irq_nosync+0x10/0x10
>> [ 793.461632] irq_thread_fn+0x1f/0x60
>> [ 793.504449] irq_thread+0xfa/0x1c0
>> [ 793.545185] ? irq_thread_fn+0x60/0x60
>> [ 793.590085] ? irq_thread_check_affinity+0xf0/0xf0
>> [ 793.647485] kthread+0xda/0x100
>> [ 793.685096] ? kthread_complete_and_exit+0x20/0x20
>> [ 793.742495] ret_from_fork+0x22/0x30
>> [ 793.785314] </TASK>
>> Daniel Stodden (1):
>> PCI/ASPM: fix link state exit during switch upstream function removal.
>> drivers/pci/pcie/aspm.c | 17 +++++++++--------
>> 1 file changed, 9 insertions(+), 8 deletions(-)
>
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/1] PCI/ASPM: Fix UAF by disabling ASPM for link when child function is removed
2024-12-30 15:17 ` [PATCH 0/1] " Daniel Stodden
@ 2025-01-01 12:25 ` Ding Hui
0 siblings, 0 replies; 10+ messages in thread
From: Ding Hui @ 2025-01-01 12:25 UTC (permalink / raw)
To: Daniel Stodden
Cc: Bjorn Helgaas, david.e.box, kai.heng.feng, linux-pci,
michael.a.bottini, qinzongquan, rajatja, refactormyself,
sathyanarayanan.kuppuswamy, vidyas
On 2024/12/30 23:17, Daniel Stodden wrote:
>
> Hi!
>
>
>
>> On Dec 27, 2024, at 9:00 PM, Ding Hui <dinghui@sangfor.com.cn> wrote:
>>
>> Sorry for late reply.
>
>
> No problem at all.
> Wasn’t really expecting to make much of a wave when when sending this out a day before the holidays.
>
> Thanks for picking it up!
>
>> Thanks for pointing out the release order issue that I indeed overlooked.
>>
>> I'm not an expert in PCI, so I'm not sure if there's a similar PCI topology like this:
>>
>> [00]-+..
>> |
>> +--03.0-[01-03]--+-00.0 Endpoint
>> +-00.1-[02-03]--+-02.0-[02]--x
>> \-0d.0-[03]----00.0 Endpoint
>>
>> If this is possible, I think that even after applying your patch, removing 01:00.0
>> individually by
>> echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
>> maybe still lead to a Use-After-Free when pcie_aspm_exit_link_state(<02:02.0>).
>
> Now, my main problem is that I’m not super familiar with ASPM details.
> But let’s start from what PCIe has. And I’m just assuming that ASPM does not apply outside PCI Express.
>
> (It’s called drivers/pci/pcie/aspm.c after all. And pcie_aspm_sanity_check appears to agree with me (?))
>
> Outside of legacy support, the main PCIe devices are root complex, switches, and (PCIe) endpoints (spec: "PCI Express Fabric Topology”)
>
> These in turn yield 4 essential PCIe types of _functions_: root ports, switch upstream ports, switch downstream ports, and endpoints.
>
> All except endpoints are represented as virtual PCI-to-PCI bridges in the conventional PCI.
>
> In your diagram, 00:03.0 is a bridge representing a root port of the root complex.
> Bus 01 is a link.
>
> To have more bridges, 01:* must be on the upstream port of a switch. That switch has a bridge on the upstream port, with the secondary bus being a virtual one representing the fabric connecting downstream ports, internal to it.
>
> Now here the rub: the spec says that upstream port is represented by a bridge.
> I wish I could just quote the spec also saying that that bridge is function 0 the bus representing the link upstream.
>
> (Or, given how the aspm.c code works, at least some function 0).
>
> But I think it doesn’t.
>
> Now here is the practical side: I know about 3-4 switch families at the moment.
> Broadcom/PEX, Diodes/Pericom, Microchip/Switchtec) and ASMedia.
>
> 1. All model the bridge as :00.0. "Function 0”.
>
> 2. At least two of them (PEX, Switchtec) have additional functions as endpoints. For DMA, management, and/or NTB support.
> They’re at 00:{1..}. Your change broke PCIe hotplug for both of them. (But this is not a rant, just saying what it is.)
>
> 3. Those sibling functions on the upstream port are very common.
> This is also because the PCIe spec is very specific about _not_ having anything but downstream ports, i.e. bridges,
> on the internal bus. Well, they have to go somewhere.
>
> This is why the old code worked for so long, and why I think my patch at least works in practice.
>
> So, to answer your questions: it’s not in the spec at a spot I can point at. But no PCIe device I know about makes 01:00.0 and endpoint *and* 01:00.1 a bridge at the same time. Plenty of them the other way around.
>
Thanks for the detailed information that has greatly enriched my insights, and it does make sense.
> Contrast this with what I’m gathering from aspm.c
>
> - The driver is following *link* topology. Not so much function topology.
> - Function 0 appears to be assumed to be the main downstream representative of the device for power management purposes.
> I did not find this in the ASPM spec (maybe to a fault), but plenty other sections in PCIe go similar ways. So this isn’t surprising.
> - I does, however, not explictly state that function 0 is also the bridge.
>
> Now, the overall PCI layer is working on bridges and buses, not links. And so, removal, whether it’s coming from link state or sysfs, is following bridges and buses.
>
> - As long as function 0 is the bridge, the current code works.
> - If there is some PCIe switch where link->downstream is not the bridge, it breaks again.
>
> One could go further in sanity checking. E.g. enforce that if function 0 is an endpoint, there are no subordinate links.
>
> I’d be willing to add more sanity checks. But the only real idea I have to offer would warrant a separate patch, and it’s at the end of this mail.
>
>> Additionally, Bjorn Helgaas also expects compliance with the PCIe specification r6.0,
>> section 7.5.3.7, which recommends that software programs the same ASPM Control value
>> across all functions of multi-function devices.
>
> Yep. And I don’t disagree (although I’m not familar enough to be as decided about it.)
>
> However, I deleted your comment regarding that. Didn’t mean to offend. My argument went roughly as follows:
>
> - Whether we program all functions is not up to pcie_aspm_exit_link_state().
> - It’s up to pcie_config_aspm_link(), called by pcie_aspm_exit_link_state().
> - That’s already happening, and apparently commented on accordingly,.
> Above pcie_config_aspm_dev().
>
> The spot you edited just didn’t have much to add or remove there, afaict.
>
>> Therefore, should we recursively release the link_state of child devices before
>> the current device in pcie_aspm_exit_link_state()?
>
> No.
>
> .. if I may offer an opinion here. :)
>
> Beware, I’m not a maintiner. Just having opinions.
>
> The PCI layer is already taking care of it. The link state removal now follows the bus removal.
>
> (Well, as long as that function-0-is-the-upstream-bridge assumption isn’t broken.)
>
I believe your patch won't bring back the UAF issue I had before, so I'll give you an ACK.
Thanks again.
> The bus removal (pci/remove.c) is happening recursively already. So the link state removal now happens
> in the depth first order you’re envisioning. From that POV, it’s okay.
>
> So why are we worried?
>
> I think I’ve seen people mention that the link_state contents could go into the pci_dev, earlier on this thread?
> I don’t think not having a separate allocation is the problem either. Having the separate struct per se isn’t wrong.
>
> Still, we’re worried for a reason.
>
> The problem with the remaining aspm.c code, if there is any, is not that it’s not doing enough already.
> It’s doing way, way too much. That’s how you ended up on this list, and that’s what brought me here.
>
> That pcie_link_state is currently not only carrying ASPM attributes. It’s replicating the conventional PCI topology.
> With those {downstream, root, parent) pointers. And once you have that, you’re either inconsistent, and crash, or “only" redundant.
>
> I think if one wanted to improvie things further, it might be as basic as getting rid of those pointers between the links.
> Then go by PCIe function types instead (root port, upstrea, downstream, endpoint). But follow the bus topology, not custom pointers.
>
> (But it’s not like I tried yet.)
>
> Daniel
>
>> --
>> Thanks,
>> -dinghui
>>
>> On 2024/12/23 11:39, Daniel Stodden wrote:
>>> About change 456d8aa37d0f "PCI/ASPM: Disable ASPM on MFD function
>>> removal to avoid use-after-free")
>>> Let's say root port 00:03.0 was connected to a PCIe switch.
>>> [00]-+..
>>> |
>>> +--03.0-[01-03]--+-00.0-[02-03]--+-02.0-[02]--x
>>> | | \-0d.0-[03]----00.0 Endpoint
>>> . +-00.1 Upstream Port Sibling
>>> And that switch had not only an upstream port at 01:00.0, but also a
>>> sibling function at 01:00.1.
>>> Let's break the link under 00:03.0, which makes pciehp remove the [01]
>>> bus. Surprise effect: traversal during bus [01] device removal happens
>>> in reverse order (for SR-IOV-ish reasons, see pciehp_pci.c
>>> commentary). Fair enough, ASPM should probably not rely on any
>>> specific order anyway.
>>> Recursing through pci_remove_bus_device() underneath, the order in
>>> which we pci_destroy_dev() will be:
>>> [ 01:00.1 [ 02:02.0 [ 03:00.0 ] 02:0d.0 ] 01:00.0 ]
>>> Trivially, the above is also the order in which
>>> pcie_aspm_exit_link_state() will be called.
>>> Then note how, since above change removed the list_empty() exit
>>> condition, we are now going to remove the pcie_link_state for bus [01]
>>> (parent=<00:03.0>) during the first invocation, i.e. right at
>>> pcie_aspm_exit_link_state(<01:00.1>).
>>> Iow: with bus [03] removal only to come, we removed the
>>> pcie_link_state upstream first, and only then will remove the
>>> downstream pcie_link_state at parent=<02:0d.0>.
>>> Eventually reaching that second link, it carries a ref "parent_link =
>>> link->parent" which now points to free'd memory again. One can observe
>>> a rather high probability of finishing with a random GPF or nullptr
>>> dereference condition.
>>>> Above switches, with MFD upstream portions, exist. Case at hand is a
>>> PEX8717 with 4 DMA engines:
>>> +-08.0-[51-5b]--+-00.0-[52-5b]--+-02.0-[53]--
>>> | \-0d.0-[54-5b]----00.0 Broadcom Inc. …
>>> +-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
>>> +-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
>>> +-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
>>> \-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface
>>> Backtrace:
>>> [ 790.817077] BUG: kernel NULL pointer dereference, address: 00000000000000a0
>>> [ 790.900514] #PF: supervisor read access in kernel mode
>>> [ 790.962081] #PF: error_code(0x0000) - not-present page
>>> [ 791.023641] PGD 8000000104648067 P4D 8000000104648067 PUD 1404bc067 PMD 0
>>> [ 791.106041] Oops: 0000 [#1] PREEMPT SMP PTI
>>> [ 791.156151] CPU: 8 PID: 145 Comm: irq/43-pciehp Tainted: G OE 6.1.0-22-2-amd64 #5 Debian 6.1.94-1
>>> [ 791.279173] Hardware name: Intel Camelback Mountain CRB/Camelback Mountain CRB, BIOS Aboot-norcal7-7.1.6-generic-22971530 06/30/2021
>>> [ 791.421982] RIP: 0010:pcie_config_aspm_link+0x48/0x330
>>> [ 791.483548] Code: 48 8b 04 25 28 00 00 00 48 89 44 24 30 31 c0 8b 47 30 4c 8b 47 08 83 e3 7f c1 e8 0e f7 d3 89 c2 83 e0 7f 21 c3 83 e2 7f 21 f3 <41> 8b b6 a0 00 00 00 89 d8 83 e0 87 f6 c3 04 0f 44 d8 0f b7 47 30
>>> [ 791.708646] RSP: 0018:ffffa8cf8062bcb8 EFLAGS: 00010246
>>> [ 791.771254] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>> [ 791.856772] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94bac3d56a80
>>> [ 791.942291] RBP: ffff94bac3d56a80 R08: 0000000000000000 R09: ffffa8cf8062bc6c
>>> [ 792.027808] R10: 0000000000000000 R11: 0000000000000004 R12: ffff94b9c0f61fc0
>>> [ 792.113326] R13: ffff94bbc0ae9828 R14: 0000000000000000 R15: ffff94b9c0ea6f20
>>> [ 792.198845] FS: 0000000000000000(0000) GS:ffff94c8ffc00000(0000) …
>>> [ 792.295827] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 792.364680] CR2: 00000000000000a0 CR3: 00000001062fe003 CR4: 00000000003706e0
>>> [ 792.450198] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ 792.535717] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [ 792.621235] Call Trace:
>>> [ 792.650506] <TASK>
>>> [ 792.675616] ? __die_body.cold+0x1a/0x1f
>>> [ 792.722600] ? page_fault_oops+0xd2/0x2b0
>>> [ 792.770625] ? sysvec_apic_timer_interrupt+0xa/0x90
>>> [ 792.829068] ? exc_page_fault+0x70/0x170
>>> [ 792.876051] ? asm_exc_page_fault+0x22/0x30
>>> [ 792.926161] ? pcie_config_aspm_link+0x48/0x330
>>> [ 792.980437] pcie_aspm_exit_link_state+0xb9/0x120
>>> [ 793.036796] pci_remove_bus_device+0xc8/0x110
>>> [ 793.088988] pci_remove_bus_device+0x2e/0x110
>>> [ 793.141180] pci_remove_bus_device+0x3e/0x110
>>> [ 793.193373] pciehp_unconfigure_device+0x94/0x160
>>> [ 793.249733] pciehp_disable_slot+0x69/0x100
>>> [ 793.299840] pciehp_handle_presence_or_link_change+0x241/0x350
>>> [ 793.369742] pciehp_ist+0x164/0x170
>>> [ 793.411524] ? disable_irq_nosync+0x10/0x10
>>> [ 793.461632] irq_thread_fn+0x1f/0x60
>>> [ 793.504449] irq_thread+0xfa/0x1c0
>>> [ 793.545185] ? irq_thread_fn+0x60/0x60
>>> [ 793.590085] ? irq_thread_check_affinity+0xf0/0xf0
>>> [ 793.647485] kthread+0xda/0x100
>>> [ 793.685096] ? kthread_complete_and_exit+0x20/0x20
>>> [ 793.742495] ret_from_fork+0x22/0x30
>>> [ 793.785314] </TASK>
>>> Daniel Stodden (1):
>>> PCI/ASPM: fix link state exit during switch upstream function removal.
>>> drivers/pci/pcie/aspm.c | 17 +++++++++--------
>>> 1 file changed, 9 insertions(+), 8 deletions(-)
>>
>>
>>
>
>
>
--
Thanks,
-dinghui
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] PCI/ASPM: fix link state exit during switch upstream function removal.
2024-12-23 3:39 ` [PATCH 1/1] PCI/ASPM: fix link state exit during switch upstream function removal Daniel Stodden
@ 2025-01-03 1:53 ` Daniel Stodden
2025-01-29 21:29 ` Bjorn Helgaas
1 sibling, 0 replies; 10+ messages in thread
From: Daniel Stodden @ 2025-01-03 1:53 UTC (permalink / raw)
To: Daniel Stodden
Cc: dinghui, Bjorn Helgaas, david.e.box, kai.heng.feng, linux-pci,
michael.a.bottini, qinzongquan, rajatja, refactormyself,
sathyanarayanan.kuppuswamy, vidyas
> On Dec 22, 2024, at 7:39 PM, Daniel Stodden <dns@arista.com> wrote:
>
> From: Daniel Stodden <daniel.stodden@gmail.com>
If this gets accepted — remove that line? Not important, but I don’t know how it got there. Might be because I had to export/import patches between private and corporate machines during test a few times.
Thanks,
Daniel
> Before change 456d8aa37d0f "Disable ASPM on MFD function removal to
> avoid use-after-free", we would free the ASPM link only after the last
> function on the bus pertaining to the given link was removed.
>
> That was too late. If function 0 is removed before sibling function,
> link->downstream would point to free'd memory after.
>
> After above change, we freed the ASPM parent link state upon any
> function removal on the bus pertaining to a given link.
>
> That is too early. If the link is to a PCIe switch with MFD on the
> upstream port, then removing functions other than 0 first would free a
> link which still remains parent_link to the remaining downstream
> ports.
>
> The resulting GPFs are especially frequent during hot-unplug, because
> pciehp removes devices on the link bus in reverse order.
>
> On that switch, function 0 is the virtual P2P bridge to the internal
> bus. Free exactly when function 0 is removed -- before the parent link
> is obsolete, but after all subordinate links are gone.
>
> Fixes: 456d8aa37d0f ("PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free")
> Signed-off-by: Daniel Stodden <dns@arista.com>
> ---
> drivers/pci/pcie/aspm.c | 17 +++++++++--------
> 1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index e0bc90597dca..8ae7c75b408c 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -1273,16 +1273,16 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
> parent_link = link->parent;
>
> /*
> - * link->downstream is a pointer to the pci_dev of function 0. If
> - * we remove that function, the pci_dev is about to be deallocated,
> - * so we can't use link->downstream again. Free the link state to
> - * avoid this.
> + * Free the parent link state, no later than function 0 (i.e.
> + * link->downstream) being removed.
> *
> - * If we're removing a non-0 function, it's possible we could
> - * retain the link state, but PCIe r6.0, sec 7.5.3.7, recommends
> - * programming the same ASPM Control value for all functions of
> - * multi-function devices, so disable ASPM for all of them.
> + * Do not free free the link state any earlier. If function 0
> + * is a switch upstream port, this link state is parent_link
> + * to all subordinate ones.
> */
> + if (pdev != link->downstream)
> + goto out;
> +
> pcie_config_aspm_link(link, 0);
> list_del(&link->sibling);
> free_link_state(link);
> @@ -1293,6 +1293,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
> pcie_config_aspm_path(parent_link);
> }
>
> + out:
> mutex_unlock(&aspm_lock);
> up_read(&pci_bus_sem);
> }
> --
> 2.47.0
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] PCI/ASPM: fix link state exit during switch upstream function removal.
2024-12-23 3:39 ` [PATCH 1/1] PCI/ASPM: fix link state exit during switch upstream function removal Daniel Stodden
2025-01-03 1:53 ` Daniel Stodden
@ 2025-01-29 21:29 ` Bjorn Helgaas
2025-01-30 6:54 ` Sathyanarayanan Kuppuswamy
1 sibling, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2025-01-29 21:29 UTC (permalink / raw)
To: Daniel Stodden
Cc: dinghui, Daniel Stodden, bhelgaas, david.e.box, kai.heng.feng,
linux-pci, michael.a.bottini, qinzongquan, rajatja,
refactormyself, sathyanarayanan.kuppuswamy, vidyas
On Sun, Dec 22, 2024 at 07:39:08PM -0800, Daniel Stodden wrote:
> From: Daniel Stodden <daniel.stodden@gmail.com>
>
> Before change 456d8aa37d0f "Disable ASPM on MFD function removal to
> avoid use-after-free", we would free the ASPM link only after the last
> function on the bus pertaining to the given link was removed.
>
> That was too late. If function 0 is removed before sibling function,
> link->downstream would point to free'd memory after.
>
> After above change, we freed the ASPM parent link state upon any
> function removal on the bus pertaining to a given link.
>
> That is too early. If the link is to a PCIe switch with MFD on the
> upstream port, then removing functions other than 0 first would free a
> link which still remains parent_link to the remaining downstream
> ports.
Is this specific to a Switch? It seems like removal of any
multi-function device might trip over this.
> The resulting GPFs are especially frequent during hot-unplug, because
> pciehp removes devices on the link bus in reverse order.
Do you have a sample GPF? If we can include a few pertinent lines
here it may help people connect a problem with this fix.
> On that switch, function 0 is the virtual P2P bridge to the internal
> bus. Free exactly when function 0 is removed -- before the parent link
> is obsolete, but after all subordinate links are gone.
I agree this is a problem.
IIUC we allocate pcie_link_state when enumerating a device on the
upstream end of a link, i.e., a Root Port or Switch Downstream Port,
but we deallocate it when removing a device on the downstream end of
the link, i.e., an Endpoint or Switch Upstream Port. This asymmetry
seems kind of prone to error.
Also, struct pcie_link_state contains redundant information, e.g., the
pdev, downstream, parent, and sibling members basically duplicate the
hierarchy already described by the struct pci_bus parent, self, and
devices members. Redundancy like this is also error prone.
This patch is attractive because it's a very small fix, and maybe we
should use it for that reason. But I do think we're basically
papering over a pretty serious design defect in ASPM.
I think we'd ultimately be better off if we allocated pcie_link_state
either as a member of struct pci_dev (instead of using a pointer), or
perhaps in pci_setup_device() when we set up the rest of the
bridge-related things and made it live as long as the bridge device.
Actually, if we removed all the redundant pointers in struct
pcie_link_state, it would be smaller than a single pointer, so there'd
be no reason to allocate it dynamically.
Of course this would be a much bigger change to aspm.c.
> Fixes: 456d8aa37d0f ("PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free")
Do we have any public problem reports we could mention here? I'm
actually a little surprised that this hasn't been a bigger problem,
given that 456d8aa37d0f appeared in v6.5 in Aug 2023. But maybe there
is some unusual topology or hot-unplug involved?
> Signed-off-by: Daniel Stodden <dns@arista.com>
> ---
> drivers/pci/pcie/aspm.c | 17 +++++++++--------
> 1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index e0bc90597dca..8ae7c75b408c 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -1273,16 +1273,16 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
> parent_link = link->parent;
>
> /*
> - * link->downstream is a pointer to the pci_dev of function 0. If
> - * we remove that function, the pci_dev is about to be deallocated,
> - * so we can't use link->downstream again. Free the link state to
> - * avoid this.
> + * Free the parent link state, no later than function 0 (i.e.
> + * link->downstream) being removed.
> *
> - * If we're removing a non-0 function, it's possible we could
> - * retain the link state, but PCIe r6.0, sec 7.5.3.7, recommends
> - * programming the same ASPM Control value for all functions of
> - * multi-function devices, so disable ASPM for all of them.
> + * Do not free free the link state any earlier. If function 0
> + * is a switch upstream port, this link state is parent_link
> + * to all subordinate ones.
> */
> + if (pdev != link->downstream)
> + goto out;
> +
> pcie_config_aspm_link(link, 0);
> list_del(&link->sibling);
> free_link_state(link);
> @@ -1293,6 +1293,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
> pcie_config_aspm_path(parent_link);
> }
>
> + out:
> mutex_unlock(&aspm_lock);
> up_read(&pci_bus_sem);
> }
> --
> 2.47.0
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] PCI/ASPM: fix link state exit during switch upstream function removal.
2025-01-29 21:29 ` Bjorn Helgaas
@ 2025-01-30 6:54 ` Sathyanarayanan Kuppuswamy
0 siblings, 0 replies; 10+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-01-30 6:54 UTC (permalink / raw)
To: Bjorn Helgaas, Daniel Stodden
Cc: dinghui, Daniel Stodden, bhelgaas, david.e.box, kai.heng.feng,
linux-pci, michael.a.bottini, qinzongquan, rajatja,
refactormyself, vidyas
Hi,
On 1/29/25 1:29 PM, Bjorn Helgaas wrote:
> On Sun, Dec 22, 2024 at 07:39:08PM -0800, Daniel Stodden wrote:
>> From: Daniel Stodden <daniel.stodden@gmail.com>
If possible include where you noticed this issue and any dmesg
logs related to it.
>> Before change 456d8aa37d0f "Disable ASPM on MFD function removal to
>> avoid use-after-free", we would free the ASPM link only after the last
>> function on the bus pertaining to the given link was removed.
>>
>> That was too late. If function 0 is removed before sibling function,
>> link->downstream would point to free'd memory after.
>>
>> After above change, we freed the ASPM parent link state upon any
>> function removal on the bus pertaining to a given link.
>>
>> That is too early. If the link is to a PCIe switch with MFD on the
>> upstream port, then removing functions other than 0 first would free a
>> link which still remains parent_link to the remaining downstream
>> ports.
> Is this specific to a Switch? It seems like removal of any
> multi-function device might trip over this.
>
>> The resulting GPFs are especially frequent during hot-unplug, because
>> pciehp removes devices on the link bus in reverse order.
> Do you have a sample GPF? If we can include a few pertinent lines
> here it may help people connect a problem with this fix.
>
>> On that switch, function 0 is the virtual P2P bridge to the internal
>> bus. Free exactly when function 0 is removed -- before the parent link
>> is obsolete, but after all subordinate links are gone.
> I agree this is a problem.
>
> IIUC we allocate pcie_link_state when enumerating a device on the
> upstream end of a link, i.e., a Root Port or Switch Downstream Port,
> but we deallocate it when removing a device on the downstream end of
> the link, i.e., an Endpoint or Switch Upstream Port. This asymmetry
> seems kind of prone to error.
>
> Also, struct pcie_link_state contains redundant information, e.g., the
> pdev, downstream, parent, and sibling members basically duplicate the
> hierarchy already described by the struct pci_bus parent, self, and
> devices members. Redundancy like this is also error prone.
>
> This patch is attractive because it's a very small fix, and maybe we
> should use it for that reason. But I do think we're basically
> papering over a pretty serious design defect in ASPM.
>
> I think we'd ultimately be better off if we allocated pcie_link_state
> either as a member of struct pci_dev (instead of using a pointer), or
> perhaps in pci_setup_device() when we set up the rest of the
> bridge-related things and made it live as long as the bridge device.
>
> Actually, if we removed all the redundant pointers in struct
> pcie_link_state, it would be smaller than a single pointer, so there'd
> be no reason to allocate it dynamically.
>
> Of course this would be a much bigger change to aspm.c.
Agree. I also think creating the link at the setup would be a better
approach.
But If this issue is seen in any real hardware now and need a urgent
fix, may be we can merge this change for now.
>
>> Fixes: 456d8aa37d0f ("PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free")
> Do we have any public problem reports we could mention here? I'm
> actually a little surprised that this hasn't been a bigger problem,
> given that 456d8aa37d0f appeared in v6.5 in Aug 2023. But maybe there
> is some unusual topology or hot-unplug involved?
>
>> Signed-off-by: Daniel Stodden <dns@arista.com>
>> ---
>> drivers/pci/pcie/aspm.c | 17 +++++++++--------
>> 1 file changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> index e0bc90597dca..8ae7c75b408c 100644
>> --- a/drivers/pci/pcie/aspm.c
>> +++ b/drivers/pci/pcie/aspm.c
>> @@ -1273,16 +1273,16 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>> parent_link = link->parent;
>>
>> /*
>> - * link->downstream is a pointer to the pci_dev of function 0. If
>> - * we remove that function, the pci_dev is about to be deallocated,
>> - * so we can't use link->downstream again. Free the link state to
>> - * avoid this.
>> + * Free the parent link state, no later than function 0 (i.e.
>> + * link->downstream) being removed.
>> *
>> - * If we're removing a non-0 function, it's possible we could
>> - * retain the link state, but PCIe r6.0, sec 7.5.3.7, recommends
>> - * programming the same ASPM Control value for all functions of
>> - * multi-function devices, so disable ASPM for all of them.
>> + * Do not free free the link state any earlier. If function 0
Above line is not very clear. May be you can remove " Do not free free
the link state any earlier"
>> + * is a switch upstream port, this link state is parent_link
>> + * to all subordinate ones.
>> */
>> + if (pdev != link->downstream)
>> + goto out;
>> +
>> pcie_config_aspm_link(link, 0);
>> list_del(&link->sibling);
>> free_link_state(link);
>> @@ -1293,6 +1293,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>> pcie_config_aspm_path(parent_link);
>> }
>>
>> + out:
>> mutex_unlock(&aspm_lock);
>> up_read(&pci_bus_sem);
>> }
>> --
>> 2.47.0
>>
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-01-30 6:54 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-07 3:40 [PATCH v2] PCI/ASPM: Fix UAF by disabling ASPM for link when child function is removed Ding Hui
2023-05-18 21:54 ` Bjorn Helgaas
2024-12-23 3:39 ` [PATCH 0/1] " Daniel Stodden
2024-12-23 3:39 ` [PATCH 1/1] PCI/ASPM: fix link state exit during switch upstream function removal Daniel Stodden
2025-01-03 1:53 ` Daniel Stodden
2025-01-29 21:29 ` Bjorn Helgaas
2025-01-30 6:54 ` Sathyanarayanan Kuppuswamy
2024-12-28 5:00 ` [PATCH 0/1] Re: PCI/ASPM: Fix UAF by disabling ASPM for link when child function is removed Ding Hui
2024-12-30 15:17 ` [PATCH 0/1] " Daniel Stodden
2025-01-01 12:25 ` Ding Hui
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).