Linux PCI subsystem development
From: Bjorn Helgaas <helgaas@kernel.org>
To: Lukas Wunner <lukas@wunner.de>
Cc: Hongbo Yao <andy.xu@hj-micro.com>,
	Sathyanarayanan Kuppuswamy
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	bhelgaas@google.com, mahesh@linux.ibm.com, oohall@gmail.com,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	jemma.zhang@hj-micro.com, peter.du@hj-micro.com
Subject: Re: [PATCH] PCI/DPC: Extend DPC recovery timeout
Date: Wed, 6 Aug 2025 16:34:09 -0500
Message-ID: <20250806213409.GA19037@bhelgaas>
In-Reply-To: <aHCPTU03s-SkAsPs@wunner.de>

On Fri, Jul 11, 2025 at 06:13:01AM +0200, Lukas Wunner wrote:
> On Fri, Jul 11, 2025 at 11:20:15AM +0800, Hongbo Yao wrote:
> > 2025/7/8 1:04, Sathyanarayanan Kuppuswamy:
> > > On 7/7/25 3:30 AM, Andy Xu wrote:
> > > > Setting timeout to 7s covers both devices with safety margin.
> > > 
> > > Instead of updating the recovery time, can you check why your device
> > > recovery takes such a long time and how to fix it from the device end?
> > 
> > I fully agree that ideally the root cause should be addressed on the
> > device side to reduce the DPC recovery latency, and that waiting longer
> > in the kernel is not a perfect solution.
> > 
> > However, the current 4 seconds timeout in pci_dpc_recovered() is indeed
> > an empirical value rather than a hard requirement from the PCIe
> > specification. In real-world scenarios, like with Mellanox ConnectX-5/7
> > adapters, we've observed that full DPC recovery can take more than 5-6
> > seconds, which leads to premature hotplug processing and device removal.
> 
> I think Sathya's point was:  Have you made an effort to talk to the
> vendor and ask them to root-cause and fix the issue, e.g. with a
> firmware update?

Would definitely be great, but unless we have a number in the spec to
point to, they might just shrug and ask what the requirement is.

> > To improve robustness and maintain flexibility, I'm considering
> > introducing a module parameter to allow tuning the DPC recovery timeout
> > dynamically. Would you like me to prepare and submit such a patch for
> > review?
> 
> We try to avoid adding new module parameters.  Things should just work
> out of the box without the user having to adjust the kernel command
> line for their system.
> 
> So the solution is indeed to either adjust the delay for everyone
> (as you've done) or introduce an unsigned int to struct pci_dev
> which can be assigned the delay after reset for the device to be
> responsive.
> 
> For comparison, we're allowing up to 60 sec for devices to become
> available after a Fundamental Reset or Conventional Reset
> (PCIE_RESET_READY_POLL_MS).  That's how long we're waiting in
> dpc_reset_link() -> pci_bridge_wait_for_secondary_bus() and
> we're not consistent with that when we wait only 4 sec in
> pci_dpc_recovered().
> 
> I think the reason is that we weren't really sure whether this approach
> to synchronize hotplug with DPC works well and how to choose delays.
> But we've had this for a few years now and it seems to have worked nicely
> for people.  I think this is the first report where it's not been
> working out of the box.

Why would we wait less than PCIE_RESET_READY_POLL_MS?  DPC disables
the link, so that's basically a reset for the device.  Seems like we
should allow as much time as we do for any other kind of reset.
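
To make the mismatch concrete, here is a rough userspace model of the
wait, not kernel code.  The two constants are the values mentioned in
this thread (4 sec in pci_dpc_recovered(), 60 sec for
PCIE_RESET_READY_POLL_MS).  The name DPC_RECOVERY_TIMEOUT_MS, the
function model_dpc_recovered(), and the polling loop are hypothetical
and only illustrate the arithmetic; the real code waits on an event
rather than polling.

```c
#include <assert.h>
#include <stdbool.h>

/* Values quoted in this thread; DPC_RECOVERY_TIMEOUT_MS is a made-up name. */
#define DPC_RECOVERY_TIMEOUT_MS    4000U  /* current wait in pci_dpc_recovered() */
#define PCIE_RESET_READY_POLL_MS  60000U  /* limit allowed after other resets */

/*
 * Hypothetical model: a device needs recovery_ms to come back after DPC.
 * We check every poll_step_ms and give up once timeout_ms has elapsed.
 * Returns true if recovery was observed before the timeout.
 */
static bool model_dpc_recovered(unsigned int recovery_ms,
				unsigned int timeout_ms,
				unsigned int poll_step_ms)
{
	unsigned int waited_ms;

	assert(poll_step_ms > 0);

	for (waited_ms = 0; waited_ms <= timeout_ms; waited_ms += poll_step_ms) {
		if (waited_ms >= recovery_ms)
			return true;	/* device came back in time */
	}

	return false;	/* timed out; hotplug would go on to remove the device */
}
```

With a device that takes 6 sec to recover, as reported for the
ConnectX adapters, model_dpc_recovered(6000, DPC_RECOVERY_TIMEOUT_MS, 100)
returns false, while the same call with PCIE_RESET_READY_POLL_MS
returns true.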

Bjorn

Thread overview: 8+ messages
2025-07-07 10:30 [PATCH] PCI/DPC: Extend DPC recovery timeout Andy Xu
2025-07-07 17:04 ` Sathyanarayanan Kuppuswamy
2025-07-11  3:20   ` Hongbo Yao
2025-07-11  4:13     ` Lukas Wunner
2025-08-06 21:34       ` Bjorn Helgaas [this message]
2025-08-06 21:52         ` Keith Busch
2025-08-07  1:54           ` Ethan Zhao
2025-08-07  2:00     ` Ethan Zhao
