From: Lukas Wunner <lukas@wunner.de>
To: Hongbo Yao <andy.xu@hj-micro.com>
Cc: Sathyanarayanan Kuppuswamy
<sathyanarayanan.kuppuswamy@linux.intel.com>,
bhelgaas@google.com, mahesh@linux.ibm.com, oohall@gmail.com,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
jemma.zhang@hj-micro.com, peter.du@hj-micro.com
Subject: Re: [PATCH] PCI/DPC: Extend DPC recovery timeout
Date: Fri, 11 Jul 2025 06:13:01 +0200
Message-ID: <aHCPTU03s-SkAsPs@wunner.de>
In-Reply-To: <b9b64b4f-dcec-4ab1-b796-54d66ec91fc5@hj-micro.com>
On Fri, Jul 11, 2025 at 11:20:15AM +0800, Hongbo Yao wrote:
> 2025/7/8 1:04, Sathyanarayanan Kuppuswamy:
> > On 7/7/25 3:30 AM, Andy Xu wrote:
> > > Setting timeout to 7s covers both devices with safety margin.
> >
> > Instead of updating the recovery time, can you check why your device
> > recovery takes such a long time and how to fix it from the device end?
>
> I fully agree that ideally the root cause should be addressed on the
> device side to reduce the DPC recovery latency, and that waiting longer
> in the kernel is not a perfect solution.
>
> However, the current 4 seconds timeout in pci_dpc_recovered() is indeed
> an empirical value rather than a hard requirement from the PCIe
> specification. In real-world scenarios, like with Mellanox ConnectX-5/7
> adapters, we've observed that full DPC recovery can take more than 5-6
> seconds, which leads to premature hotplug processing and device removal.
I think Sathya's point was: Have you made an effort to talk to the
vendor and ask them to root-cause and fix the issue, e.g. with a firmware
update?
> To improve robustness and maintain flexibility, I'm considering
> introducing a module parameter to allow tuning the DPC recovery timeout
> dynamically. Would you like me to prepare and submit such a patch for
> review?
We try to avoid adding new module parameters. Things should just work
out of the box without the user having to adjust the kernel command
line for their system.
So the solution is indeed to either adjust the delay for everyone
(as you've done) or introduce an unsigned int to struct pci_dev
which can be assigned the delay after reset for the device to be
responsive.
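
A minimal userspace sketch of the second option, to make the idea concrete. This is not kernel code: `struct pci_dev_sketch`, the field `dpc_recovery_ms`, the helper `dpc_recovery_timeout_ms()` and the macro `DPC_RECOVERY_DEFAULT_MS` are all hypothetical names chosen for illustration. Only the 4 sec default and the 7 sec value come from this thread.

```c
/* Fixed wait used today in pci_dpc_recovered(), per this thread (4 sec). */
#define DPC_RECOVERY_DEFAULT_MS 4000u

/*
 * Hypothetical stand-in for struct pci_dev. Only the illustrative
 * per-device field is modeled here.
 */
struct pci_dev_sketch {
	const char *name;
	unsigned int dpc_recovery_ms;	/* assumed field; 0 means "use default" */
};

/*
 * Return how long to wait for DPC recovery of this device:
 * the per-device value if one was assigned, else the default.
 */
static unsigned int dpc_recovery_timeout_ms(const struct pci_dev_sketch *dev)
{
	return dev->dpc_recovery_ms ? dev->dpc_recovery_ms
				    : DPC_RECOVERY_DEFAULT_MS;
}
```

With this shape, a device that leaves the field at 0 keeps the existing 4000 ms behavior, while a slow device could be assigned e.g. 7000 ms (for instance from a quirk) without changing the delay for everyone else.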
For comparison, we're allowing up to 60 sec for devices to become
available after a Fundamental Reset or Conventional Reset
(PCIE_RESET_READY_POLL_MS). That's how long we're waiting in
dpc_reset_link() -> pci_bridge_wait_for_secondary_bus() and
we're not consistent with that when we wait only 4 sec in
pci_dpc_recovered().
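
To put numbers on that inconsistency, here is a small userspace sketch, not kernel code. `PCIE_RESET_READY_POLL_MS` and the 4 sec wait are taken from the text above. The macro name `DPC_RECOVERY_WAIT_MS` and the helper `ready_within()` are hypothetical, and the 5500 ms figure is just an assumed point inside the reported 5-6 sec range.

```c
#include <stdbool.h>

#define DPC_RECOVERY_WAIT_MS      4000u	/* wait in pci_dpc_recovered(), per this thread */
#define PCIE_RESET_READY_POLL_MS 60000u	/* up to 60 sec after a reset, per this thread */

/*
 * Hypothetical helper: would a device that needs ready_after_ms to
 * become responsive be seen as ready within timeout_ms?
 */
static bool ready_within(unsigned int ready_after_ms, unsigned int timeout_ms)
{
	return ready_after_ms <= timeout_ms;
}
```

For a device needing 5500 ms, `ready_within(5500, PCIE_RESET_READY_POLL_MS)` is true while `ready_within(5500, DPC_RECOVERY_WAIT_MS)` is false, so the same device passes the wait after reset but is given up on by the hotplug synchronization.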
I think the reason is that we weren't really sure whether this approach
to synchronize hotplug with DPC works well and how to choose delays.
But we've had this for a few years now and it seems to have worked nicely
for people. I think this is the first report where it's not been
working out of the box.
Thanks,
Lukas
Thread overview: 8+ messages
2025-07-07 10:30 [PATCH] PCI/DPC: Extend DPC recovery timeout Andy Xu
2025-07-07 17:04 ` Sathyanarayanan Kuppuswamy
2025-07-11 3:20 ` Hongbo Yao
2025-07-11 4:13 ` Lukas Wunner [this message]
2025-08-06 21:34 ` Bjorn Helgaas
2025-08-06 21:52 ` Keith Busch
2025-08-07 1:54 ` Ethan Zhao
2025-08-07 2:00 ` Ethan Zhao