Linux PCI subsystem development
* [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check
@ 2025-08-18  4:06 Guanghui Feng
  2025-08-18  6:32 ` Lukas Wunner
  0 siblings, 1 reply; 5+ messages in thread
From: Guanghui Feng @ 2025-08-18  4:06 UTC
  To: bhelgaas; +Cc: alikernel-developer, linux-pci

When a secondary bus reset is executed on a bridge downstream port, all
downstream devices and switches are reset. Before
pci_bridge_secondary_bus_reset returns, ensure that all available
devices have completed reset and initialization. Otherwise, using a
device before its initialization has completed results in errors or can
even take the device offline.

Note: If this approach is reasonable, I will revise the patch to
address remaining issues, such as pci_walk_bus holding its lock
for a long time.

Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com>
---
 drivers/pci/pci.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b0f4d98036cd..c1544f650719 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4839,6 +4839,18 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus)
 	return max(min_delay, max_delay);
 }
 
+struct pci_bridge_rst {
+	int ret;
+	int timeout;
+	char *reset_type;
+};
+
+static int pci_bridge_rst_wait_dev(struct pci_dev *dev, void *data)
+{
+	struct pci_bridge_rst *d = data;
+	return d->ret = pci_dev_wait(dev, d->reset_type, d->timeout);
+}
+
 /**
  * pci_bridge_wait_for_secondary_bus - Wait for secondary bus to be accessible
  * @dev: PCI bridge
@@ -4857,8 +4869,8 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus)
  */
 int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 {
-	struct pci_dev *child __free(pci_dev_put) = NULL;
 	int delay;
+	struct pci_bridge_rst data = {.reset_type = reset_type};
 
 	if (pci_dev_is_disconnected(dev))
 		return 0;
@@ -4885,9 +4897,6 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 		up_read(&pci_bus_sem);
 		return 0;
 	}
-
-	child = pci_dev_get(list_first_entry(&dev->subordinate->devices,
-					     struct pci_dev, bus_list));
 	up_read(&pci_bus_sem);
 
 	/*
@@ -4924,7 +4933,9 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
 		msleep(delay);
 
-		if (!pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay))
+		data.timeout = PCI_RESET_WAIT - delay;
+		pci_walk_bus(dev->subordinate, pci_bridge_rst_wait_dev, &data);
+		if (!data.ret)
 			return 0;
 
 		/*
@@ -4939,8 +4950,9 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 		if (!(status & PCI_EXP_LNKSTA_DLLLA))
 			return -ENOTTY;
 
-		return pci_dev_wait(child, reset_type,
-				    PCIE_RESET_READY_POLL_MS - PCI_RESET_WAIT);
+		data.timeout = PCIE_RESET_READY_POLL_MS - PCI_RESET_WAIT;
+		pci_walk_bus(dev->subordinate, pci_bridge_rst_wait_dev, &data);
+		return data.ret;
 	}
 
 	pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
@@ -4951,8 +4963,9 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 		return -ENOTTY;
 	}
 
-	return pci_dev_wait(child, reset_type,
-			    PCIE_RESET_READY_POLL_MS - delay);
+	data.timeout = PCIE_RESET_READY_POLL_MS - delay;
+	pci_walk_bus(dev->subordinate, pci_bridge_rst_wait_dev, &data);
+	return data.ret;
 }
 
 void pci_reset_secondary_bus(struct pci_dev *dev)
-- 
2.32.0.3.gf3a3e56d6
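
The pattern the patch relies on (a walk callback that records its result and returns non-zero to stop the walk) can be modelled in userspace. The following is only a sketch under stated assumptions: every `model_*` name is hypothetical, the bus is flattened to an array (the real pci_walk_bus also descends into subordinate buses), and `model_dev_wait` merely stands in for pci_dev_wait, returning -1 where the kernel returns -ENOTTY.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct pci_dev; not the kernel type. */
struct model_dev {
	int ready_after_ms;	/* time the device needs after the reset */
	int waited;		/* set once the walk has visited the device */
};

/* Mirrors the ret/timeout fields of struct pci_bridge_rst in the RFC. */
struct model_rst {
	int ret;
	int timeout_ms;
};

/* Stand-in for pci_dev_wait(): 0 if ready within the timeout, else -1. */
static int model_dev_wait(struct model_dev *dev, int timeout_ms)
{
	dev->waited = 1;
	return dev->ready_after_ms <= timeout_ms ? 0 : -1;
}

/* Walk callback: record the result; a non-zero return stops the walk. */
static int model_rst_wait_dev(struct model_dev *dev, void *data)
{
	struct model_rst *d = data;

	return d->ret = model_dev_wait(dev, d->timeout_ms);
}

/* Stand-in for pci_walk_bus() over a flat array of devices. */
static void model_walk_bus(struct model_dev *devs, size_t n,
			   int (*cb)(struct model_dev *, void *), void *data)
{
	assert(cb != NULL);
	for (size_t i = 0; i < n; i++)
		if (cb(&devs[i], data))
			break;
}

/* Wait for every device on the bus instead of only the first child. */
static int model_wait_all(struct model_dev *devs, size_t n, int timeout_ms)
{
	struct model_rst data = { .ret = 0, .timeout_ms = timeout_ms };

	model_walk_bus(devs, n, model_rst_wait_dev, &data);
	return data.ret;
}
```

In this model the walk ends at the first device that misses its timeout, so devices after it are never visited and the caller sees that device's error.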



* Re: [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check
  2025-08-18  4:06 [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check Guanghui Feng
@ 2025-08-18  6:32 ` Lukas Wunner
  2025-08-19  7:01   ` guanghui.fgh
  2025-08-19  7:09   ` guanghui.fgh
  0 siblings, 2 replies; 5+ messages in thread
From: Lukas Wunner @ 2025-08-18  6:32 UTC
  To: Guanghui Feng; +Cc: bhelgaas, alikernel-developer, linux-pci

On Mon, Aug 18, 2025 at 12:06:40PM +0800, Guanghui Feng wrote:
> When a secondary bus reset is executed on a bridge downstream port, all
> downstream devices and switches are reset. Before
> pci_bridge_secondary_bus_reset returns, ensure that all available
> devices have completed reset and initialization. Otherwise, using a
> device before its initialization has completed results in errors or can
> even take the device offline.

I recently received a report off-list for what looks like the same issue
and came up with the patch below.

Would it fix the issue for you?

It's not yet a properly fleshed-out patch, just a proof of concept.
But it's smaller and simpler than the approach you've taken.

This patch is for a Secondary Bus Reset issued by AER.  Is the bus reset
likewise happening through AER in your case or what's the code path
leading to the bus reset?

-- >8 --

diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index fa83ebd..8b427a9 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -761,6 +761,10 @@ static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
 
 	pci_restore_state(dev);
 	pci_save_state(dev);
+
+	if (pci_bridge_wait_for_secondary_bus(dev, "hot reset"))
+		return PCI_ERS_RESULT_DISCONNECT;
+
 	return PCI_ERS_RESULT_RECOVERED;
 }
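
The control flow of the proof of concept above can be modelled in userspace as follows. This is a sketch only: the `model_*` names and the `wait_result` field are hypothetical, and the enum mirrors just two pci_ers_result_t values with arbitrary numeric values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of two pci_ers_result_t values. */
enum model_ers_result {
	MODEL_ERS_RESULT_DISCONNECT,
	MODEL_ERS_RESULT_RECOVERED,
};

/* Hypothetical stand-in for the bridge being recovered. */
struct model_port {
	bool state_restored;
	int wait_result;	/* 0 = secondary bus ready, non-zero = not ready */
};

/* Stand-in for pci_bridge_wait_for_secondary_bus(). */
static int model_wait_for_secondary_bus(const struct model_port *port)
{
	return port->wait_result;
}

/* Report recovery only once the secondary bus is accessible again. */
static enum model_ers_result model_slot_reset(struct model_port *port)
{
	assert(port != NULL);
	/* Stands in for pci_restore_state() followed by pci_save_state(). */
	port->state_restored = true;

	if (model_wait_for_secondary_bus(port))
		return MODEL_ERS_RESULT_DISCONNECT;

	return MODEL_ERS_RESULT_RECOVERED;
}
```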
 


* Re: [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check
  2025-08-18  6:32 ` Lukas Wunner
@ 2025-08-19  7:01   ` guanghui.fgh
  2025-08-19  7:09   ` guanghui.fgh
  1 sibling, 0 replies; 5+ messages in thread
From: guanghui.fgh @ 2025-08-19  7:01 UTC
  To: Lukas Wunner; +Cc: bhelgaas, alikernel-developer, linux-pci

When multiple devices attached to a PCIe switch downstream port are passed through via the vfio module,
a secondary bus reset (__pci_reset_bus -> pci_bridge_secondary_bus_reset) can be initiated with the
vfio VFIO_DEVICE_PCI_HOT_RESET call. However, all devices must have completed reset and initialization
before pci_bridge_secondary_bus_reset returns. Otherwise, accessing a device that has not finished
resetting can trigger a device error or even cause the device to go offline.

Therefore, pci_bridge_secondary_bus_reset needs to wait for all devices to complete reset.
(The [RFC] patch above still needs adjustments to handle cases such as long-held locks and devices going offline unexpectedly.)

Thanks
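
One possible adjustment, sketched here purely as an assumption and not as part of the RFC, is to measure every device against a single shared deadline. In the RFC each pci_dev_wait call receives the full timeout, so the budget restarts for every device visited. The `sketch_*` name and the `ready_at_ms` model (the time at which each device becomes ready, measured from the end of the reset) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: all devices reset concurrently and device i becomes
 * ready ready_at_ms[i] milliseconds after the reset ends.
 *
 * Returns 0 if every device is ready within budget_ms, -1 otherwise.
 * *waited_ms receives the total time spent waiting.
 */
static int sketch_wait_all_shared(const int *ready_at_ms, size_t n,
				  int budget_ms, int *waited_ms)
{
	int now_ms = 0;

	assert(waited_ms != NULL);
	for (size_t i = 0; i < n; i++) {
		if (ready_at_ms[i] > budget_ms) {
			/* Give up at the shared deadline. */
			*waited_ms = budget_ms;
			return -1;
		}
		/* Block until this device is ready, if it is not already. */
		if (ready_at_ms[i] > now_ms)
			now_ms = ready_at_ms[i];
	}

	*waited_ms = now_ms;
	return 0;
}
```

With a shared budget the total wait never exceeds budget_ms regardless of how many devices sit below the bridge, which also bounds how long any lock taken around the walk would be held.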


* Re: [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check
  2025-08-18  6:32 ` Lukas Wunner
  2025-08-19  7:01   ` guanghui.fgh
@ 2025-08-19  7:09   ` guanghui.fgh
  2025-08-29  6:53     ` guanghui.fgh
  1 sibling, 1 reply; 5+ messages in thread
From: guanghui.fgh @ 2025-08-19  7:09 UTC
  To: Lukas Wunner; +Cc: bhelgaas, alikernel-developer, linux-pci

When multiple devices attached to a PCIe switch downstream port are passed through via the vfio module,
a secondary bus reset (__pci_reset_bus -> pci_bridge_secondary_bus_reset) can be initiated with the
vfio VFIO_DEVICE_PCI_HOT_RESET call. However, all devices must have completed reset and initialization
before pci_bridge_secondary_bus_reset returns. Otherwise, accessing a device that has not finished
resetting can trigger a device error or even cause the device to go offline.

Therefore, pci_bridge_secondary_bus_reset needs to wait for all devices to complete reset.
(The [RFC] patch above still needs adjustments to handle cases such as long-held locks and devices going offline unexpectedly.)

Thanks



* Re: [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check
  2025-08-19  7:09   ` guanghui.fgh
@ 2025-08-29  6:53     ` guanghui.fgh
  0 siblings, 0 replies; 5+ messages in thread
From: guanghui.fgh @ 2025-08-29  6:53 UTC
  To: Lukas Wunner; +Cc: bhelgaas, alikernel-developer, linux-pci

Do you have any suggestions for this modification?

Thanks
