From: Bjorn Helgaas <helgaas@kernel.org>
To: Lukas Wunner <lukas@wunner.de>
Cc: Hongbo Yao <andy.xu@hj-micro.com>,
Sathyanarayanan Kuppuswamy
<sathyanarayanan.kuppuswamy@linux.intel.com>,
bhelgaas@google.com, mahesh@linux.ibm.com, oohall@gmail.com,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
jemma.zhang@hj-micro.com, peter.du@hj-micro.com
Subject: Re: [PATCH] PCI/DPC: Extend DPC recovery timeout
Date: Wed, 6 Aug 2025 16:34:09 -0500
Message-ID: <20250806213409.GA19037@bhelgaas>
In-Reply-To: <aHCPTU03s-SkAsPs@wunner.de>

On Fri, Jul 11, 2025 at 06:13:01AM +0200, Lukas Wunner wrote:
> On Fri, Jul 11, 2025 at 11:20:15AM +0800, Hongbo Yao wrote:
> > 2025/7/8 1:04, Sathyanarayanan Kuppuswamy:
> > > On 7/7/25 3:30 AM, Andy Xu wrote:
> > > > Setting timeout to 7s covers both devices with safety margin.
> > >
> > > Instead of updating the recovery time, can you check why your device
> > > recovery takes such a long time and how to fix it from the device end?
> >
> > I fully agree that ideally the root cause should be addressed on the
> > device side to reduce the DPC recovery latency, and that waiting longer
> > in the kernel is not a perfect solution.
> >
> > However, the current 4 seconds timeout in pci_dpc_recovered() is indeed
> > an empirical value rather than a hard requirement from the PCIe
> > specification. In real-world scenarios, like with Mellanox ConnectX-5/7
> > adapters, we've observed that full DPC recovery can take more than 5-6
> > seconds, which leads to premature hotplug processing and device removal.
>
> I think Sathya's point was: Have you made an effort to talk to the
> vendor and ask them to root-cause and fix the issue e.g. with a firmware
> update.

Would definitely be great, but unless we have a number in the spec to
point to, they might just shrug and ask what the requirement is.

> > To improve robustness and maintain flexibility, I'm considering
> > introducing a module parameter to allow tuning the DPC recovery timeout
> > dynamically. Would you like me to prepare and submit such a patch for
> > review?
>
> We try to avoid adding new module parameters. Things should just work
> out of the box without the user having to adjust the kernel command
> line for their system.
>
> So the solution is indeed to either adjust the delay for everyone
> (as you've done) or introduce an unsigned int to struct pci_dev
> which can be assigned the delay after reset for the device to be
> responsive.
>
> For comparison, we're allowing up to 60 sec for devices to become
> available after a Fundamental Reset or Conventional Reset
> (PCIE_RESET_READY_POLL_MS). That's how long we're waiting in
> dpc_reset_link() -> pci_bridge_wait_for_secondary_bus() and
> we're not consistent with that when we wait only 4 sec in
> pci_dpc_recovered().
>
> I think the reason is that we weren't really sure whether this approach
> to synchronize hotplug with DPC works well and how to choose delays.
> But we've had this for a few years now and it seems to have worked nicely
> for people. I think this is the first report where it's not been
> working out of the box.

Why would we wait less than PCIE_RESET_READY_POLL_MS?  DPC disables
the link, so that's basically a reset for the device. Seems like we
should allow as much time as we do for any other kind of reset.
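
To make the 4 s vs. 60 s comparison concrete, here is a sketch with a simulated clock (no real sleeping) of a device that needs 6 s to recover, the upper end of what was reported for the ConnectX adapters. wait_for_recovery(), struct sim_dev, and OLD_DPC_RECOVERY_MS are hypothetical; this polling loop only illustrates the timing argument and does not reproduce how pci_dpc_recovered() is implemented.

```c
#include <assert.h>
#include <stdbool.h>

#define PCIE_RESET_READY_POLL_MS 60000u	/* 60 s, as for Fundamental Reset */
#define OLD_DPC_RECOVERY_MS       4000u	/* hypothetical name for the 4 s wait */

/* Simulated device: ready once ready_after_ms have elapsed. */
struct sim_dev {
	unsigned int ready_after_ms;
};

static bool sim_dev_ready(const struct sim_dev *dev, unsigned int now_ms)
{
	return now_ms >= dev->ready_after_ms;
}

/* Poll every poll_ms until the device is ready or timeout_ms expires.
 * Returns the simulated elapsed time in ms, or -1 on timeout. */
static long wait_for_recovery(const struct sim_dev *dev,
			      unsigned int timeout_ms, unsigned int poll_ms)
{
	unsigned int now = 0;

	assert(poll_ms > 0);	/* a zero interval would never advance */
	for (;;) {
		if (sim_dev_ready(dev, now))
			return (long)now;
		if (now >= timeout_ms)
			return -1;
		now += poll_ms;
		if (now > timeout_ms)
			now = timeout_ms;	/* check once more at the deadline */
	}
}
```

With the 4 s bound the call gives up while the device is still recovering, which is the point where hotplug would go on to remove it; with the PCIE_RESET_READY_POLL_MS bound it returns 6000. A device that recovers quickly returns just as fast under either bound, so the longer limit only costs time when the device is genuinely slow or dead.
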
Bjorn
Thread overview: 8+ messages
2025-07-07 10:30 [PATCH] PCI/DPC: Extend DPC recovery timeout Andy Xu
2025-07-07 17:04 ` Sathyanarayanan Kuppuswamy
2025-07-11 3:20 ` Hongbo Yao
2025-07-11 4:13 ` Lukas Wunner
2025-08-06 21:34 ` Bjorn Helgaas [this message]
2025-08-06 21:52 ` Keith Busch
2025-08-07 1:54 ` Ethan Zhao
2025-08-07 2:00 ` Ethan Zhao