* [PATCHv2] pci: allow user to specify a reset poll timeout
From: Keith Busch @ 2025-02-18 16:54 UTC
To: bhelgaas, linux-pci; +Cc: ilpo.jarvinen, lukas, Keith Busch
From: Keith Busch <kbusch@kernel.org>
The spec does not provide any upper limit to how long a device may
return Request Retry Status. It just says "Some devices require a
lengthy self-initialization sequence to complete". The kernel
arbitrarily chose 60 seconds since that really ought to be enough. But
there are devices where this turns out not to be enough.
Since any timeout choice would be arbitrary, and 60 seconds is generally
more than enough for the majority of hardware, let's make this a
parameter so an admin can adjust it specifically to their needs if the
default timeout isn't appropriate.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v1->v2:
The user interface uses seconds granularity to be more user friendly.
I don't think anyone needs millisecond granularity here.
This also required clamping the value to prevent any possible overflow
from bad user values.
Replaced the macro that aliased the kernel parameter variable with
direct use of the parameter variable. The variable is also renamed to
match the define that it replaces.
.../admin-guide/kernel-parameters.txt | 3 +++
drivers/pci/pci.c | 18 ++++++++++++------
2 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb8752b42ec85..148d0f37b6594 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4843,6 +4843,9 @@
Note: this may remove isolation between devices
and may put more devices in an IOMMU group.
+ reset_wait=nn The number of seconds to wait after a reset
+ while seeing Request Retry Status. Default is
+ 60 (1 minute).
force_floating [S390] Force usage of floating interrupts.
nomio [S390] Do not use MIO instructions.
norid [S390] ignore the RID field and force use of
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 869d204a70a37..b0ee84b90a22e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -75,7 +75,7 @@ struct pci_pme_device {
* limit, but 60 sec ought to be enough for any device to become
* responsive.
*/
-#define PCIE_RESET_READY_POLL_MS 60000 /* msec */
+static int pci_reset_ready_poll_ms = 60000;
static void pci_dev_d3_sleep(struct pci_dev *dev)
{
@@ -4549,7 +4549,7 @@ int pcie_flr(struct pci_dev *dev)
*/
msleep(100);
- return pci_dev_wait(dev, "FLR", PCIE_RESET_READY_POLL_MS);
+ return pci_dev_wait(dev, "FLR", pci_reset_ready_poll_ms);
}
EXPORT_SYMBOL_GPL(pcie_flr);
@@ -4616,7 +4616,7 @@ static int pci_af_flr(struct pci_dev *dev, bool probe)
*/
msleep(100);
- return pci_dev_wait(dev, "AF_FLR", PCIE_RESET_READY_POLL_MS);
+ return pci_dev_wait(dev, "AF_FLR", pci_reset_ready_poll_ms);
}
/**
@@ -4661,7 +4661,7 @@ static int pci_pm_reset(struct pci_dev *dev, bool probe)
pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, csr);
pci_dev_d3_sleep(dev);
- return pci_dev_wait(dev, "PM D3hot->D0", PCIE_RESET_READY_POLL_MS);
+ return pci_dev_wait(dev, "PM D3hot->D0", pci_reset_ready_poll_ms);
}
/**
@@ -4928,7 +4928,7 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
return -ENOTTY;
return pci_dev_wait(child, reset_type,
- PCIE_RESET_READY_POLL_MS - PCI_RESET_WAIT);
+ pci_reset_ready_poll_ms - PCI_RESET_WAIT);
}
pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
@@ -4940,7 +4940,7 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
}
return pci_dev_wait(child, reset_type,
- PCIE_RESET_READY_POLL_MS - delay);
+ pci_reset_ready_poll_ms - delay);
}
void pci_reset_secondary_bus(struct pci_dev *dev)
@@ -6841,6 +6841,12 @@ static int __init pci_setup(char *str)
disable_acs_redir_param = str + 18;
} else if (!strncmp(str, "config_acs=", 11)) {
config_acs_param = str + 11;
+ } else if (!strncmp(str, "reset_wait=", 11)) {
+ unsigned long val;
+
+ val = clamp(simple_strtoul(str + 11, &str, 0),
+ 1, INT_MAX / MSEC_PER_SEC);
+ pci_reset_ready_poll_ms = val * MSEC_PER_SEC;
} else {
pr_err("PCI: Unknown option `%s'\n", str);
}
--
2.43.5
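The v2 changelog says the new option takes seconds and is clamped so that the millisecond value cannot overflow. The sketch below is a userspace model of that conversion, not the kernel code. It assumes strtoul() behaves like the kernel's simple_strtoul() for plain numeric input, and the function name reset_wait_to_ms is hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/* Mirrors the kernel's MSEC_PER_SEC; unsigned here so the comparison
 * below does not mix signed and unsigned types. */
#define MSEC_PER_SEC 1000UL

/*
 * Hypothetical model of the "reset_wait=" handling in pci_setup():
 * parse seconds, clamp to [1, INT_MAX / MSEC_PER_SEC], return ms.
 */
static int reset_wait_to_ms(const char *arg)
{
	const unsigned long max_sec = INT_MAX / MSEC_PER_SEC;
	unsigned long val = strtoul(arg, NULL, 0);

	if (val < 1)
		val = 1;        /* 0 or unparsable input: wait at least 1 s */
	else if (val > max_sec)
		val = max_sec;  /* larger values would overflow an int in ms */

	/* val * 1000 <= INT_MAX by construction, so the cast is safe. */
	return (int)(val * MSEC_PER_SEC);
}
```

With this model, reset_wait=120 yields 120000 ms, 0 is raised to 1000 ms, and anything above 2147483 seconds is capped at 2147483000 ms, which still fits in the int that pci_dev_wait() takes as its timeout.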
* Re: [PATCHv2] pci: allow user to specify a reset poll timeout
From: Keith Busch @ 2025-04-15 0:11 UTC
To: Keith Busch; +Cc: bhelgaas, linux-pci, ilpo.jarvinen, lukas
On Tue, Feb 18, 2025 at 08:54:44AM -0800, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> The spec does not provide any upper limit to how long a device may
> return Request Retry Status. It just says "Some devices require a
> lengthy self-initialization sequence to complete". The kernel
> arbitrarily chose 60 seconds since that really ought to be enough. But
> there are devices where this turns out not to be enough.
>
> Since any timeout choice would be arbitrary, and 60 seconds is generally
> more than enough for the majority of hardware, let's make this a
> parameter so an admin can adjust it specifically to their needs if the
> default timeout isn't appropriate.
This patch is trying to address timings that have no spec-defined
behavior, so making the timeout user tunable sounds more reasonable than a
hard-coded kernel define. If we're not considering upstream options to make
this tunable, I think we have no choice but to continue with bespoke
out-of-tree solutions.
* Re: [PATCHv2] pci: allow user to specify a reset poll timeout
From: Manivannan Sadhasivam @ 2025-06-11 16:28 UTC
To: Keith Busch; +Cc: Keith Busch, bhelgaas, linux-pci, ilpo.jarvinen, lukas
On Mon, Apr 14, 2025 at 06:11:44PM -0600, Keith Busch wrote:
> This patch is trying to address timings that have no spec-defined
> behavior, so making the timeout user tunable sounds more reasonable than a
> hard-coded kernel define. If we're not considering upstream options to make
> this tunable, I think we have no choice but to continue with bespoke
> out-of-tree solutions.
Do we know the list of devices exhibiting this pattern? And is the time limit
deterministic? I'm just trying to see if it is possible to add quirks for
those devices.
- Mani
--
மணிவண்ணன் சதாசிவம்
* Re: [PATCHv2] pci: allow user to specify a reset poll timeout
From: Keith Busch @ 2025-06-11 16:40 UTC
To: Manivannan Sadhasivam
Cc: Keith Busch, bhelgaas, linux-pci, ilpo.jarvinen, lukas
On Wed, Jun 11, 2025 at 09:58:59PM +0530, Manivannan Sadhasivam wrote:
> Do we know the list of devices exhibiting this pattern? And is the time limit
> deterministic? I'm just trying to see if it is possible to add quirks for
> those devices.
No. I'm dealing with new devices being actively developed, with new ones
coming out every year, so a quirk list would just be a never-ending
maintenance pain point. The fact that I can't point them at off-the-shelf
kernels to test with has been frustrating for everyone. If we just had a
user-defined option instead of forcing the kernel's arbitrary choice,
then the problem would be solved once and for all.
* Re: [PATCHv2] pci: allow user to specify a reset poll timeout
From: Manivannan Sadhasivam @ 2025-06-11 17:11 UTC
To: Keith Busch; +Cc: Keith Busch, bhelgaas, linux-pci, ilpo.jarvinen, lukas
On Wed, Jun 11, 2025 at 10:40:10AM -0600, Keith Busch wrote:
> No. I'm dealing with new devices being actively developed, with new ones
> coming out every year, so a quirk list would just be a never-ending
> maintenance pain point.
Sounds like you have a lot of devices behaving this way. So can't you quirk them
based on VID and CLASS?
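The quirk approach suggested here can be sketched as a static table matched on Vendor ID and class code. This is a userspace illustration only, not the kernel's quirk API (in-kernel quirks are normally declared with the DECLARE_PCI_FIXUP_* macros). The table, vendor ID 0xabcd, the 180 s value, and reset_wait_ms_for are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_RESET_WAIT_MS 60000

/* Hypothetical quirk entry: Vendor ID plus base class and subclass
 * (the upper 16 bits of the 24-bit class code). */
struct reset_wait_quirk {
	uint16_t vendor;
	uint16_t class_sub;  /* e.g. 0x0108 = mass storage, NVM controller */
	int wait_ms;
};

/* Made-up vendor and timeout, for illustration only. */
static const struct reset_wait_quirk reset_wait_quirks[] = {
	{ 0xabcd, 0x0108, 180000 },
};

static int reset_wait_ms_for(uint16_t vendor, uint32_t class_code)
{
	for (size_t i = 0;
	     i < sizeof(reset_wait_quirks) / sizeof(reset_wait_quirks[0]); i++) {
		const struct reset_wait_quirk *q = &reset_wait_quirks[i];

		if (q->vendor == vendor && q->class_sub == (class_code >> 8))
			return q->wait_ms;
	}
	return DEFAULT_RESET_WAIT_MS;  /* no quirk matched: use the default */
}
```

Each wait_ms is fixed when the kernel is built, so adjusting it for a device still under development means rebuilding, which is the trade-off discussed in the next reply.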
> The fact that I can't point them at off-the-shelf
> kernels to test with has been frustrating for everyone. If we just had a
> user-defined option instead of forcing the kernel's arbitrary choice,
> then the problem would be solved once and for all.
I think the use of module parameters is discouraged nowadays, though in this
case the option parsing is already present and you are just adding one more
option. But adding a new option for devices from a single vendor might not
fly (though only Bjorn can make that call).
- Mani
--
மணிவண்ணன் சதாசிவம்
* Re: [PATCHv2] pci: allow user to specify a reset poll timeout
From: Keith Busch @ 2025-06-11 17:17 UTC
To: Manivannan Sadhasivam
Cc: Keith Busch, bhelgaas, linux-pci, ilpo.jarvinen, lukas
On Wed, Jun 11, 2025 at 10:41:33PM +0530, Manivannan Sadhasivam wrote:
> On Wed, Jun 11, 2025 at 10:40:10AM -0600, Keith Busch wrote:
> >
> > No. I'm dealing with new devices being actively developed, with new ones
> > coming out every year, so a quirk list would just be a never-ending
> > maintenance pain point.
>
> Sounds like you have a lot of devices behaving this way. So can't you quirk them
> based on VID and CLASS?
What I mean by active development is that the timeout continues to be a
moving target. A quirk only gives me a fixed value, but I need a
modifiable one without having to recompile the kernel.
* Re: [PATCHv2] pci: allow user to specify a reset poll timeout
From: Ilpo Järvinen @ 2025-06-12 8:06 UTC
To: Keith Busch
Cc: Manivannan Sadhasivam, Keith Busch, bhelgaas, linux-pci,
Lukas Wunner
On Wed, 11 Jun 2025, Keith Busch wrote:
> What I mean by active development is that the timeout continues to be a
> moving target. A quirk only gives me a fixed value, but I need a
> modifiable one without having to recompile the kernel.
Hi,
Doesn't DRS/FRS (Device/Function Readiness Status) address this by letting
the device signal when it's ready? So perhaps check whether DRS/FRS is
supported and only then make the timeout really large?
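The suggestion amounts to a timeout-selection policy. A minimal sketch follows, assuming the kernel had some way to know whether DRS or FRS is supported. The struct, the macro names, and the 10-minute bound are hypothetical and do not correspond to existing kernel definitions.

```c
#include <stdbool.h>

#define DEFAULT_RESET_WAIT_MS 60000
/* Hypothetical "really large" bound for devices that announce readiness. */
#define READINESS_SIGNALED_WAIT_MS (10 * 60 * 1000)

/* Hypothetical summary of what the device advertises; not a kernel struct. */
struct readiness_caps {
	bool drs;  /* Device Readiness Status */
	bool frs;  /* Function Readiness Status */
};

/*
 * Use a much longer bound when the device can announce readiness itself,
 * and the usual default otherwise. Either way the result is finite.
 */
static int pick_reset_wait_ms(const struct readiness_caps *caps)
{
	if (caps->drs || caps->frs)
		return READINESS_SIGNALED_WAIT_MS;
	return DEFAULT_RESET_WAIT_MS;
}
```

Even in the readiness-capable branch the policy still has to return some finite number, so a choice of bound remains.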
--
i.
* Re: [PATCHv2] pci: allow user to specify a reset poll timeout
From: Keith Busch @ 2025-06-23 17:51 UTC
To: Ilpo Järvinen
Cc: Manivannan Sadhasivam, Keith Busch, bhelgaas, linux-pci,
Lukas Wunner
On Thu, Jun 12, 2025 at 11:06:41AM +0300, Ilpo Järvinen wrote:
> Hi,
>
> Doesn't DRS/FRS (Device/Function Readiness Status) address this by letting
> the device signal when it's ready? So perhaps check whether DRS/FRS is
> supported and only then make the timeout really large?
Even if the kernel supported that, you'd still need an arbitrary timeout
in order to make forward progress in case the device never becomes
ready.
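The need for a bound regardless of readiness signalling can be seen in a small simulation. This is a minimal sketch, assuming a simulated clock in place of msleep() and a hypothetical sim_dev that stays not-ready until a chosen time. The doubling delay is modeled loosely on the back-off in pci_dev_wait(), not copied from it.

```c
#include <limits.h>
#include <stdbool.h>

/* Simulated device that keeps answering "not ready" (as with Request
 * Retry Status) until the simulated clock reaches ready_at_ms. */
struct sim_dev {
	long long ready_at_ms;
	long long now_ms;
};

static bool sim_dev_ready(const struct sim_dev *d)
{
	return d->now_ms >= d->ready_at_ms;
}

/*
 * Poll with exponential back-off: sleep 1, 2, 4, ... ms and give up once
 * the next delay would exceed timeout_ms. Returns the simulated time at
 * which the device was seen ready, or -1 on timeout.
 */
static long long wait_ready(struct sim_dev *d, int timeout_ms)
{
	long long delay = 1;  /* 64-bit so doubling cannot overflow */

	while (!sim_dev_ready(d)) {
		if (delay > timeout_ms)
			return -1;      /* bounded: the caller can move on */
		d->now_ms += delay; /* stands in for msleep(delay) */
		delay *= 2;
	}
	return d->now_ms;
}
```

With a 60 s budget, a device that becomes ready at 90 s is abandoned, while a 120 s budget lets the same poll succeed at about 131 s. A device that never becomes ready still returns -1, which is the forward-progress guarantee a finite timeout provides.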
Thread overview: 8+ messages
2025-02-18 16:54 [PATCHv2] pci: allow user to specify a reset poll timeout Keith Busch
2025-04-15 0:11 ` Keith Busch
2025-06-11 16:28 ` Manivannan Sadhasivam
2025-06-11 16:40 ` Keith Busch
2025-06-11 17:11 ` Manivannan Sadhasivam
2025-06-11 17:17 ` Keith Busch
2025-06-12 8:06 ` Ilpo Järvinen
2025-06-23 17:51 ` Keith Busch