* [PATCH v2] PCI: dpc: Increase pciehp waiting time for DPC recovery
@ 2026-02-04 3:55 LeoLiu-oc
2026-02-05 8:54 ` Lukas Wunner
0 siblings, 1 reply; 3+ messages in thread
From: LeoLiu-oc @ 2026-02-04 3:55 UTC (permalink / raw)
To: Bjorn Helgaas, Mahesh J Salgaonkar, Lukas Wunner, Przemek Kitszel,
leoliu-oc
Cc: Oliver O'Halloran, linuxppc-dev, linux-pci, linux-kernel,
CobeChen, ErosZhang, TonyWWang
From: LeoLiu-oc <leoliu-oc@zhaoxin.com>
Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
amended PCIe hotplug to not bring down the slot upon Data Link Layer State
Changed events caused by Downstream Port Containment.
Commit c3be50f7547c ("PCI: pciehp: Ignore Presence Detect Changed caused by
DPC") sought to ignore Presence Detect Changed events occurring as a side
effect of Downstream Port Containment. These commits await recovery from
DPC and then clear events which occurred in the meantime.
However, pciehp_ist() waits up to 4 seconds before assuming that DPC
recovery has failed and disabling the slot. This timeout period is
insufficient for some PCIe devices.
For example, ice_pci_err_detected() in the ice network card driver can run
longer than the maximum waiting time for DPC recovery, so
pciehp_disable_slot() is executed even though it is not needed. From the
user's point of view, the ice network card becomes unusable, and more
serious errors such as a kernel panic may follow. The kernel panic is
caused by a race between pciehp_disable_slot() and pcie_do_recovery().
In practice, we observed both the unavailable ice network card and a
kernel panic.
Therefore, increase the time that pciehp_ist() waits for DPC recovery from
4 to 16 seconds. For some PCIe devices, error_detected() may take even
longer than 16 seconds, so dpc_reset_link() has not yet been executed when
the timeout expires. In this situation, the Link Down/Up events and
Presence Detect Changed events that occur during DPC recovery should also
be ignored.
Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>
---
v2:
- Modify and add code comments
- Add handling for error_detected() execution exceeding 16s
v1: https://lore.kernel.org/all/20260123104034.429060-1-LeoLiu-oc@zhaoxin.com/
---
drivers/pci/pcie/dpc.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index fc18349614d7..331d0299af6a 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -103,6 +103,7 @@ static bool dpc_completed(struct pci_dev *pdev)
 bool pci_dpc_recovered(struct pci_dev *pdev)
 {
 	struct pci_host_bridge *host;
+	u16 status;
 
 	if (!pdev->dpc_cap)
 		return false;
@@ -118,10 +119,22 @@ bool pci_dpc_recovered(struct pci_dev *pdev)
 	/*
 	 * Need a timeout in case DPC never completes due to failure of
 	 * dpc_wait_rp_inactive(). The spec doesn't mandate a time limit,
-	 * but reports indicate that DPC completes within 4 seconds.
+	 * but reports indicate that DPC completes within 16 seconds.
 	 */
 	wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
-			   msecs_to_jiffies(4000));
+			   msecs_to_jiffies(16000));
+
+	/*
+	 * In some cases, the execution time of report_error_detected()
+	 * exceeded 16 seconds, and dpc_reset_link() was still waiting to
+	 * be executed. This situation should be treated as successful dpc
+	 * recovery.
+	 */
+	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, &status);
+	if ((!PCI_POSSIBLE_ERROR(status)) && (status & PCI_EXP_DPC_STATUS_TRIGGER)) {
+		pci_warn(pdev, "DPC: error_detected() callback timed out\n");
+		return true;
+	}
 
 	return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
 }
--
2.43.0
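For readers following the thread, the decision the patched pci_dpc_recovered() makes once the wait has ended can be sketched in user space as below. This is only an illustrative model, not kernel code: struct dpc_snapshot and dpc_recovered_model() are hypothetical names, DPC_STATUS_TRIGGER is assumed to match the kernel's PCI_EXP_DPC_STATUS_TRIGGER bit, and POSSIBLE_ERROR() is a simplified stand-in for PCI_POSSIBLE_ERROR() that treats an all-ones read as an inaccessible device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: same bit as PCI_EXP_DPC_STATUS_TRIGGER in the kernel headers. */
#define DPC_STATUS_TRIGGER 0x0001
/* Simplified stand-in for PCI_POSSIBLE_ERROR(): all-ones means the read failed. */
#define POSSIBLE_ERROR(v) ((v) == 0xFFFF)

/* Hypothetical snapshot of what pci_dpc_recovered() inspects after waiting. */
struct dpc_snapshot {
	uint16_t status;     /* value read from the DPC Status register */
	bool recovered_flag; /* models the PCI_DPC_RECOVERED bit in priv_flags */
};

/* Returns true if pciehp should ignore the events seen during DPC. */
bool dpc_recovered_model(struct dpc_snapshot *s)
{
	bool was_set;

	/*
	 * Trigger bit still set on a readable device: error_detected() is
	 * presumed to be still running, so dpc_reset_link() has not yet
	 * cleared the trigger. The patch treats this as recovered.
	 */
	if (!POSSIBLE_ERROR(s->status) && (s->status & DPC_STATUS_TRIGGER))
		return true;

	/* Otherwise behave like test_and_clear_bit(PCI_DPC_RECOVERED, ...). */
	was_set = s->recovered_flag;
	s->recovered_flag = false;
	return was_set;
}
```

In this model, a readable status with the trigger bit set yields true regardless of the recovered flag, while an all-ones read falls through to the flag, which is how the unpatched function behaves.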
* Re: [PATCH v2] PCI: dpc: Increase pciehp waiting time for DPC recovery
2026-02-04 3:55 [PATCH v2] PCI: dpc: Increase pciehp waiting time for DPC recovery LeoLiu-oc
@ 2026-02-05 8:54 ` Lukas Wunner
2026-02-06 11:07 ` LeoLiu-oc
0 siblings, 1 reply; 3+ messages in thread
From: Lukas Wunner @ 2026-02-05 8:54 UTC (permalink / raw)
To: LeoLiu-oc
Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Przemek Kitszel,
Oliver O'Halloran, linuxppc-dev, linux-pci, linux-kernel,
CobeChen, ErosZhang, TonyWWang
On Wed, Feb 04, 2026 at 11:55:42AM +0800, LeoLiu-oc wrote:
> For example, The execution of the ice_pci_err_detected() in the ice network
> card driver exceeded the maximum waiting time for DPC recovery, causing the
> pciehp_disable_slot() to be executed which is not needed. From the user's
> point of view, you will see that the ice network card may not be usable and
> could even cause more serious errors, such as a kernel panic. kernel panic
> is caused by a race between pciehp_disable_slot() and pcie_do_recovery().
> In practice, we would observe that the ice network card is in an
> unavailable state and a kernel panic.
Unfortunately v2 was submitted without answering all of the questions
and testing all of the things asked for during review:
https://lore.kernel.org/all/aYBoP-B2E9fp_4YZ@wunner.de/
* Re: [PATCH v2] PCI: dpc: Increase pciehp waiting time for DPC recovery
2026-02-05 8:54 ` Lukas Wunner
@ 2026-02-06 11:07 ` LeoLiu-oc
0 siblings, 0 replies; 3+ messages in thread
From: LeoLiu-oc @ 2026-02-06 11:07 UTC (permalink / raw)
To: Lukas Wunner
Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Przemek Kitszel,
Oliver O'Halloran, linuxppc-dev, linux-pci, linux-kernel,
CobeChen, ErosZhang, TonyWWang
On 2026/2/5 16:54, Lukas Wunner wrote:
> On Wed, Feb 04, 2026 at 11:55:42AM +0800, LeoLiu-oc wrote:
>> For example, The execution of the ice_pci_err_detected() in the ice network
>> card driver exceeded the maximum waiting time for DPC recovery, causing the
>> pciehp_disable_slot() to be executed which is not needed. From the user's
>> point of view, you will see that the ice network card may not be usable and
>> could even cause more serious errors, such as a kernel panic. kernel panic
>> is caused by a race between pciehp_disable_slot() and pcie_do_recovery().
>> In practice, we would observe that the ice network card is in an
>> unavailable state and a kernel panic.
>
> Unfortunately v2 was submitted without answering all of the questions
> and testing all of the things asked for during review:
>
> https://lore.kernel.org/all/aYBoP-B2E9fp_4YZ@wunner.de/
I have already replied to your concerns in the following email; please
take a look:
https://lore.kernel.org/all/018007dd-68d9-4e16-b605-15d9c77ea13f@zhaoxin.com/
Yours sincerely,
LeoLiu-oc