* [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check
@ 2025-08-18 4:06 Guanghui Feng
2025-08-18 6:32 ` Lukas Wunner
0 siblings, 1 reply; 5+ messages in thread
From: Guanghui Feng @ 2025-08-18 4:06 UTC (permalink / raw)
To: bhelgaas; +Cc: alikernel-developer, linux-pci
When a secondary bus reset is issued on a bridge downstream port, all
downstream devices and switches are reset. Before
pci_bridge_secondary_bus_reset() returns, ensure that all available
devices have completed reset and initialization. Otherwise, using a
device before its initialization has completed results in errors or can
even take the device offline.
Note: If this modification is reasonable, I will rework the patch to
address issues such as pci_walk_bus() holding its lock for a long time.
Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com>
---
drivers/pci/pci.c | 31 ++++++++++++++++++++++---------
1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b0f4d98036cd..c1544f650719 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4839,6 +4839,18 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus)
 	return max(min_delay, max_delay);
 }
 
+struct pci_bridge_rst {
+	int ret;
+	int timeout;
+	char *reset_type;
+};
+
+static int pci_bridge_rst_wait_dev(struct pci_dev *dev, void *data)
+{
+	struct pci_bridge_rst *d = data;
+	return d->ret = pci_dev_wait(dev, d->reset_type, d->timeout);
+}
+
 /**
  * pci_bridge_wait_for_secondary_bus - Wait for secondary bus to be accessible
  * @dev: PCI bridge
@@ -4857,8 +4869,8 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus)
  */
 int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 {
-	struct pci_dev *child __free(pci_dev_put) = NULL;
 	int delay;
+	struct pci_bridge_rst data = {.reset_type = reset_type};
 
 	if (pci_dev_is_disconnected(dev))
 		return 0;
@@ -4885,9 +4897,6 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 		up_read(&pci_bus_sem);
 		return 0;
 	}
-
-	child = pci_dev_get(list_first_entry(&dev->subordinate->devices,
-					     struct pci_dev, bus_list));
 	up_read(&pci_bus_sem);
 
 	/*
@@ -4924,7 +4933,9 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
 		msleep(delay);
 
-		if (!pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay))
+		data.timeout = PCI_RESET_WAIT - delay;
+		pci_walk_bus(dev->subordinate, pci_bridge_rst_wait_dev, &data);
+		if (!data.ret)
 			return 0;
 
 		/*
@@ -4939,8 +4950,9 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 		if (!(status & PCI_EXP_LNKSTA_DLLLA))
 			return -ENOTTY;
 
-		return pci_dev_wait(child, reset_type,
-				    PCIE_RESET_READY_POLL_MS - PCI_RESET_WAIT);
+		data.timeout = PCIE_RESET_READY_POLL_MS - PCI_RESET_WAIT;
+		pci_walk_bus(dev->subordinate, pci_bridge_rst_wait_dev, &data);
+		return data.ret;
 	}
 
 	pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
@@ -4951,8 +4963,9 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 		return -ENOTTY;
 	}
 
-	return pci_dev_wait(child, reset_type,
-			    PCIE_RESET_READY_POLL_MS - delay);
+	data.timeout = PCIE_RESET_READY_POLL_MS - delay;
+	pci_walk_bus(dev->subordinate, pci_bridge_rst_wait_dev, &data);
+	return data.ret;
 }
void pci_reset_secondary_bus(struct pci_dev *dev)
--
2.32.0.3.gf3a3e56d6
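
The difference between waiting on the first child only and waiting on every device below the bridge can be sketched in a small userspace model. This is only an illustration under stated assumptions: every name here (model_dev, model_dev_wait, model_walk_bus, wait_first_only, wait_all) is hypothetical and not a kernel API, a device counts as "ready" if its ready_after_ms fits within the timeout, and the walk is assumed to be depth-first and to stop at the first nonzero callback return, which is how the patch relies on pci_walk_bus() behaving.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_dev {
	int ready_after_ms;        /* time the device needs after reset */
	struct model_dev *child;   /* first device below it, if it is a bridge */
	struct model_dev *sibling; /* next device on the same bus */
};

struct wait_ctx {
	int timeout_ms;
	int ret;
};

/* Stand-in for pci_dev_wait(): 0 if ready within the timeout, -1 if not. */
static int model_dev_wait(const struct model_dev *dev, int timeout_ms)
{
	assert(dev != NULL);
	return dev->ready_after_ms <= timeout_ms ? 0 : -1;
}

/* Depth-first walk that stops as soon as the callback returns nonzero. */
static int model_walk_bus(struct model_dev *first,
			  int (*cb)(struct model_dev *, void *), void *data)
{
	for (struct model_dev *d = first; d; d = d->sibling) {
		int ret = cb(d, data);

		if (ret)
			return ret;
		ret = model_walk_bus(d->child, cb, data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Callback: record the result so the caller can read it after the walk. */
static int wait_one(struct model_dev *dev, void *data)
{
	struct wait_ctx *ctx = data;

	ctx->ret = model_dev_wait(dev, ctx->timeout_ms);
	return ctx->ret;
}

/* Behavior before the patch: only the first device on the bus is checked. */
static int wait_first_only(struct model_dev *first, int timeout_ms)
{
	return first ? model_dev_wait(first, timeout_ms) : 0;
}

/* Behavior proposed by the patch: every device below the bridge is checked. */
static int wait_all(struct model_dev *first, int timeout_ms)
{
	struct wait_ctx ctx = { .timeout_ms = timeout_ms, .ret = 0 };

	model_walk_bus(first, wait_one, &ctx);
	return ctx.ret;
}
```

With a fast first device followed by a slow sibling, wait_first_only() reports success while wait_all() reports the failure. Note that in this model, as in the patch, the timeout applies to each device separately rather than to the walk as a whole.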
* Re: [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check
2025-08-18 4:06 [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check Guanghui Feng
@ 2025-08-18 6:32 ` Lukas Wunner
2025-08-19 7:01 ` guanghui.fgh
2025-08-19 7:09 ` guanghui.fgh
0 siblings, 2 replies; 5+ messages in thread
From: Lukas Wunner @ 2025-08-18 6:32 UTC (permalink / raw)
To: Guanghui Feng; +Cc: bhelgaas, alikernel-developer, linux-pci
On Mon, Aug 18, 2025 at 12:06:40PM +0800, Guanghui Feng wrote:
> When a secondary bus reset is issued on a bridge downstream port, all
> downstream devices and switches are reset. Before
> pci_bridge_secondary_bus_reset() returns, ensure that all available
> devices have completed reset and initialization. Otherwise, using a
> device before its initialization has completed results in errors or can
> even take the device offline.
I recently received a report off-list for what looks like the same issue
and came up with the patch below.
Would it fix the issue for you?
It's not yet a properly fleshed-out patch, just a proof of concept.
But it's smaller and simpler than the approach you've taken.
This patch is for a Secondary Bus Reset issued by AER. Is the bus reset
likewise happening through AER in your case or what's the code path
leading to the bus reset?
-- >8 --
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index fa83ebd..8b427a9 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -761,6 +761,10 @@ static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
 	pci_restore_state(dev);
 	pci_save_state(dev);
+
+	if (pci_bridge_wait_for_secondary_bus(dev, "hot reset"))
+		return PCI_ERS_RESULT_DISCONNECT;
+
 	return PCI_ERS_RESULT_RECOVERED;
}
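
The control flow of the proof of concept above can be modeled in userspace as a rough sketch. All names here (model_bridge, model_wait_for_secondary_bus, model_slot_reset, the MODEL_ERS_* values) are hypothetical stand-ins and not the kernel's types or enum values; the sketch only assumes that the wait helper returns nonzero when the devices below the bridge do not become ready in time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's pci_ers_result_t values. */
enum model_ers_result {
	MODEL_ERS_DISCONNECT,
	MODEL_ERS_RECOVERED,
};

/* Hypothetical model of a bridge port after a hot reset. */
struct model_bridge {
	int children_ready_after_ms; /* slowest device below the bridge */
	int state_restored;          /* set once config state is restored */
};

/* Stand-in for pci_bridge_wait_for_secondary_bus(): 0 on success. */
static int model_wait_for_secondary_bus(const struct model_bridge *b,
					int timeout_ms)
{
	assert(b != NULL);
	return b->children_ready_after_ms <= timeout_ms ? 0 : -1;
}

/* Models the slot_reset callback with the added readiness check. */
static enum model_ers_result model_slot_reset(struct model_bridge *b,
					      int timeout_ms)
{
	b->state_restored = 1; /* models pci_restore_state()/pci_save_state() */

	/* Report the port as lost if the secondary bus never becomes ready. */
	if (model_wait_for_secondary_bus(b, timeout_ms))
		return MODEL_ERS_DISCONNECT;

	return MODEL_ERS_RECOVERED;
}
```

In this model the bridge's own state is restored first, and recovery is reported only after the devices below it have had time to become accessible.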
* Re: [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check
2025-08-18 6:32 ` Lukas Wunner
@ 2025-08-19 7:01 ` guanghui.fgh
2025-08-19 7:09 ` guanghui.fgh
1 sibling, 0 replies; 5+ messages in thread
From: guanghui.fgh @ 2025-08-19 7:01 UTC (permalink / raw)
To: Lukas Wunner; +Cc: bhelgaas, alikernel-developer, linux-pci
When multiple devices attached to a PCIe switch downstream port are passed
through via the vfio module, a secondary bus reset can be initiated
(__pci_reset_bus -> pci_bridge_secondary_bus_reset) with the vfio
VFIO_DEVICE_PCI_HOT_RESET ioctl. It is crucial that all devices have
completed reset and initialization before pci_bridge_secondary_bus_reset
returns. Otherwise, accessing a device that has not finished resetting can
trigger a device error or even cause it to go offline. Therefore,
pci_bridge_secondary_bus_reset needs to wait for all devices to complete reset.
(The [RFC] patch above also needs adjustments to handle situations such as
long-held locks and devices going offline unexpectedly.)
Thanks
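
One possible way to avoid holding a lock across the whole wait, sketched here purely as an assumption and not as what the RFC patch does, is to take references on the devices while the lock is held and then wait on that snapshot with the lock dropped. The userspace model below uses hypothetical names throughout (model_dev, model_snapshot, model_wait_snapshot, model_lock_held); the lock is modeled by a plain flag, and the comments name the kernel calls each step is meant to stand in for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; these are not kernel structures. */
struct model_dev {
	int ready_after_ms;     /* time the device needs after reset */
	int refcount;           /* stands in for the pci_dev reference count */
	struct model_dev *next; /* next device on the same bus */
};

static int model_lock_held;     /* 1 while the modeled bus lock is held */

/* Copy up to max device pointers under the lock, taking a reference each. */
static size_t model_snapshot(struct model_dev *head,
			     struct model_dev **out, size_t max)
{
	size_t n = 0;

	model_lock_held = 1;            /* models down_read(&pci_bus_sem) */
	for (struct model_dev *d = head; d && n < max; d = d->next) {
		d->refcount++;          /* models pci_dev_get() */
		out[n++] = d;
	}
	model_lock_held = 0;            /* models up_read(&pci_bus_sem) */
	return n;
}

/*
 * Wait for every snapshotted device with the lock dropped, then release
 * the references. Returns 0 if all devices were ready, -1 otherwise.
 */
static int model_wait_snapshot(struct model_dev **devs, size_t n,
			       int timeout_ms)
{
	int ret = 0;

	assert(!model_lock_held);       /* the slow part runs unlocked */
	for (size_t i = 0; i < n; i++) {
		if (devs[i]->ready_after_ms > timeout_ms)
			ret = -1;
		devs[i]->refcount--;    /* models pci_dev_put() */
	}
	return ret;
}
```

The references keep the snapshotted devices valid if one of them goes away during the wait. The sketch covers a single bus only; devices below nested switches and devices beyond the snapshot capacity would need additional handling.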
* Re: [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check
2025-08-18 6:32 ` Lukas Wunner
2025-08-19 7:01 ` guanghui.fgh
@ 2025-08-19 7:09 ` guanghui.fgh
2025-08-29 6:53 ` guanghui.fgh
1 sibling, 1 reply; 5+ messages in thread
From: guanghui.fgh @ 2025-08-19 7:09 UTC (permalink / raw)
To: Lukas Wunner; +Cc: bhelgaas, alikernel-developer, linux-pci
When multiple devices attached to a PCIe switch downstream port are passed
through via the vfio module, a secondary bus reset can be initiated
(__pci_reset_bus -> pci_bridge_secondary_bus_reset) with the vfio
VFIO_DEVICE_PCI_HOT_RESET ioctl. It is crucial that all devices have
completed reset and initialization before pci_bridge_secondary_bus_reset
returns. Otherwise, accessing a device that has not finished resetting can
trigger a device error or even cause it to go offline. Therefore,
pci_bridge_secondary_bus_reset needs to wait for all devices to complete reset.
(The [RFC] patch above also needs adjustments to handle situations such as
long-held locks and devices going offline unexpectedly.)
Thanks
* Re: [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check
2025-08-19 7:09 ` guanghui.fgh
@ 2025-08-29 6:53 ` guanghui.fgh
0 siblings, 0 replies; 5+ messages in thread
From: guanghui.fgh @ 2025-08-29 6:53 UTC (permalink / raw)
To: Lukas Wunner; +Cc: bhelgaas, alikernel-developer, linux-pci
Do you have any suggestions for this modification?
Thanks