public inbox for linuxppc-dev@ozlabs.org
From: LeoLiu-oc <LeoLiu-oc@zhaoxin.com>
To: Bjorn Helgaas <bhelgaas@google.com>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	Lukas Wunner <lukas@wunner.de>,
	Przemek Kitszel <przemyslaw.kitszel@intel.com>,
	<leoliu-oc@zhaoxin.com>
Cc: Oliver O'Halloran <oohall@gmail.com>,
	<linuxppc-dev@lists.ozlabs.org>, <linux-pci@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <CobeChen@zhaoxin.com>,
	<ErosZhang@zhaoxin.com>, <TonyWWang@zhaoxin.com>
Subject: [PATCH v2] PCI: dpc: Increase pciehp waiting time for DPC recovery
Date: Wed, 4 Feb 2026 11:55:42 +0800
Message-ID: <20260204035542.53232-1-LeoLiu-oc@zhaoxin.com>

From: LeoLiu-oc <leoliu-oc@zhaoxin.com>

Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
amended PCIe hotplug to not bring down the slot upon Data Link Layer State
Changed events caused by Downstream Port Containment.

Commit c3be50f7547c ("PCI: pciehp: Ignore Presence Detect Changed caused by
DPC") sought to ignore Presence Detect Changed events occurring as a side
effect of Downstream Port Containment. These commits wait for recovery from
DPC and then clear any events which occurred in the meantime.
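The event filtering described above can be pictured with a small user-space model. This is a simplified sketch under stated assumptions, not the pciehp code: the helper name filter_dpc_events() is hypothetical, the DPC outcome is passed in as a boolean instead of calling pci_dpc_recovered(), and the bit values are copied from PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in include/uapi/linux/pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLTSTA_PDC   0x0008u /* same value as PCI_EXP_SLTSTA_PDC */
#define SLTSTA_DLLSC 0x0100u /* same value as PCI_EXP_SLTSTA_DLLSC */

/*
 * Hypothetical helper: if DPC recovered, drop the Presence Detect
 * Changed and Data Link Layer State Changed bits that DPC itself
 * caused, and leave every other Slot Status event untouched.
 */
static uint16_t filter_dpc_events(uint16_t events, bool dpc_recovered)
{
	if (dpc_recovered)
		events &= (uint16_t)~(SLTSTA_DLLSC | SLTSTA_PDC);
	return events;
}
```

Other slot events, such as Command Completed (0x0010), pass through unchanged; only the link and presence bits attributable to DPC are dropped, and only when recovery is reported as successful.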

However, pciehp_ist() waits up to 4 seconds before assuming that DPC
recovery has failed and disabling the slot. This timeout is insufficient
for some PCIe devices whose drivers need longer to handle the error.
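The bounded wait can be illustrated with a user-space analogue of wait_event_timeout() built on POSIX condition variables. This is only an approximation under stated assumptions: struct completion_flag, signal_done() and wait_done_timeout() are hypothetical names, and the kernel's waitqueue and jiffies-based timeout are replaced by pthread_cond_timedwait() with an absolute CLOCK_REALTIME deadline.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for dpc_completed_waitqueue plus its condition. */
struct completion_flag {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Recovery side: mark completion and wake any waiter (cf. wake_up_all()). */
static void signal_done(struct completion_flag *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Waiter side: block until c->done is true or timeout_ms elapses.
 * Returns the final value of c->done, so false means "timed out".
 */
static bool wait_done_timeout(struct completion_flag *c, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	bool done;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop to absorb spurious wakeups; stop on a timeout or any error. */
	while (!c->done && rc == 0)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	done = c->done;
	pthread_mutex_unlock(&c->lock);

	return done;
}
```

A caller that gets false back is in the same position as pciehp_ist() after its 4-second wait: it has to assume recovery failed, even if the recovery side is merely slow and would have called signal_done() shortly afterwards.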

For example, ice_pci_err_detected() in the ice network driver can run
longer than the maximum DPC recovery wait time, so pciehp_disable_slot()
is executed even though it is not needed. As a result, the ice network card
may become unusable, and a race between pciehp_disable_slot() and
pcie_do_recovery() can even cause a kernel panic. In practice, we have
observed both the ice network card left in an unavailable state and a
kernel panic.

Therefore, increase the time that pciehp_ist() waits for DPC recovery from
4 to 16 seconds. For some PCIe devices, the error_detected() callback may
take even longer than 16 seconds, so dpc_reset_link() has not yet been
executed when the wait expires. In this situation, the Link Down/Up and
Presence Detect Changed events that occur during DPC recovery should also
be ignored.
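The decision the patched pci_dpc_recovered() makes after the wait can be modeled as a pure function. A minimal sketch, assuming the config-space read of PCI_EXP_DPC_STATUS has already happened and its result is passed in: dpc_recovered_model() is a hypothetical name, 0x0001 mirrors PCI_EXP_DPC_STATUS_TRIGGER, 0xFFFF is the all-ones value PCI_POSSIBLE_ERROR() detects on a u16, and the bool flag stands in for the PCI_DPC_RECOVERED bit in priv_flags.

```c
#include <stdbool.h>
#include <stdint.h>

#define DPC_STATUS_TRIGGER 0x0001u /* mirrors PCI_EXP_DPC_STATUS_TRIGGER */
#define CFG_ALL_ONES_U16   0xFFFFu /* what PCI_POSSIBLE_ERROR() matches for a u16 */

/*
 * Hypothetical model of the tail of the patched pci_dpc_recovered():
 * 'status' is the DPC Status register read after the 16 s wait, and
 * '*recovered' stands in for the PCI_DPC_RECOVERED bit.
 */
static bool dpc_recovered_model(uint16_t status, bool *recovered)
{
	bool was_set;

	/*
	 * Trigger still set and the read did not fail: error_detected()
	 * is still running and dpc_reset_link() has not happened yet.
	 */
	if (status != CFG_ALL_ONES_U16 && (status & DPC_STATUS_TRIGGER))
		return true;

	/* Otherwise report and consume the flag, like test_and_clear_bit(). */
	was_set = *recovered;
	*recovered = false;
	return was_set;
}
```

Note the asymmetry: when the trigger bit is still set, the model reports success without consuming the recovered flag, whereas the normal path clears it the way test_and_clear_bit() does. A failed config read (all ones) never counts as a pending recovery.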

Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>

---
v2:
 - Modify and add code comments
 - Add handling for error_detected() execution exceeding 16s

v1: https://lore.kernel.org/all/20260123104034.429060-1-LeoLiu-oc@zhaoxin.com/
---
 drivers/pci/pcie/dpc.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index fc18349614d7..331d0299af6a 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -103,6 +103,7 @@ static bool dpc_completed(struct pci_dev *pdev)
 bool pci_dpc_recovered(struct pci_dev *pdev)
 {
 	struct pci_host_bridge *host;
+	u16 status;
 
 	if (!pdev->dpc_cap)
 		return false;
@@ -118,10 +119,22 @@ bool pci_dpc_recovered(struct pci_dev *pdev)
 	/*
 	 * Need a timeout in case DPC never completes due to failure of
 	 * dpc_wait_rp_inactive().  The spec doesn't mandate a time limit,
-	 * but reports indicate that DPC completes within 4 seconds.
+	 * but reports indicate that DPC completes within 16 seconds.
 	 */
 	wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
-			   msecs_to_jiffies(4000));
+			   msecs_to_jiffies(16000));
+
+	/*
+	 * In some cases, the execution time of report_error_detected()
+	 * exceeded 16 seconds, and dpc_reset_link() was still waiting to
+	 * be executed. This situation should be treated as successful dpc
+	 * recovery.
+	 */
+	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, &status);
+	if ((!PCI_POSSIBLE_ERROR(status)) && (status & PCI_EXP_DPC_STATUS_TRIGGER)) {
+		pci_warn(pdev, "DPC: error_detected() callback timed out\n");
+		return true;
+	}
 
 	return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
 }
-- 
2.43.0



Thread overview: 3+ messages
2026-02-04  3:55 LeoLiu-oc [this message]
2026-02-05  8:54 ` [PATCH v2] PCI: dpc: Increase pciehp waiting time for DPC recovery Lukas Wunner
2026-02-06 11:07   ` LeoLiu-oc
