* [PATCH] PCI/PM: Avoid redundant delays on D3hot->D3cold
@ 2025-10-03 22:40 Brian Norris
2025-10-06 13:52 ` Mika Westerberg
2025-11-25 16:20 ` Brian Norris
0 siblings, 2 replies; 8+ messages in thread
From: Brian Norris @ 2025-10-03 22:40 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Mika Westerberg, Rafael J . Wysocki, linux-kernel, linux-pci,
Brian Norris, stable, Brian Norris
From: Brian Norris <briannorris@google.com>
When transitioning to D3cold, __pci_set_power_state() will first
transition a device to D3hot. If the device was already in D3hot, this
will add excess work:
(a) read/modify/write PMCSR; and
(b) excess delay (pci_dev_d3_sleep()).
For (b), we already performed the necessary delay on the previous D3hot
entry; this was extra noticeable when evaluating runtime PM transition
latency.
Check whether we're already in the target state before continuing.
Note that __pci_set_power_state() already does this same check for other
state transitions, but D3cold is special because __pci_set_power_state()
converts it to D3hot for the purposes of PMCSR.
This seems to be an oversight in commit 0aacdc957401 ("PCI/PM: Clean up
pci_set_low_power_state()").
Fixes: 0aacdc957401 ("PCI/PM: Clean up pci_set_low_power_state()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Brian Norris <briannorris@google.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
drivers/pci/pci.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b0f4d98036cd..7517f1380201 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1539,6 +1539,9 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool
 	   || (state == PCI_D2 && !dev->d2_support))
 		return -EIO;
 
+	if (state == dev->current_state)
+		return 0;
+
 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
 	if (PCI_POSSIBLE_ERROR(pmcsr)) {
 		pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
--
2.51.0.618.g983fd99d29-goog
* Re: [PATCH] PCI/PM: Avoid redundant delays on D3hot->D3cold
From: Mika Westerberg @ 2025-10-06 13:52 UTC (permalink / raw)
To: Brian Norris
Cc: Bjorn Helgaas, Rafael J . Wysocki, linux-kernel, linux-pci,
Brian Norris, stable
Hi,
On Fri, Oct 03, 2025 at 03:40:09PM -0700, Brian Norris wrote:
> From: Brian Norris <briannorris@google.com>
>
> When transitioning to D3cold, __pci_set_power_state() will first
> transition a device to D3hot. If the device was already in D3hot, this
> will add excess work:
> (a) read/modify/write PMCSR; and
> (b) excess delay (pci_dev_d3_sleep()).
How come the device is already in D3hot when __pci_set_power_state() is
called? IIRC the PCI core transitions the device to a low-power state by
passing in the deepest possible state, and at that point the device is
still in D0. Then __pci_set_power_state() puts it into D3hot and then turns
off the power resource -> D3cold.
What am I missing here?
> For (b), we already performed the necessary delay on the previous D3hot
> entry; this was extra noticeable when evaluating runtime PM transition
> latency.
>
> Check whether we're already in the target state before continuing.
>
> Note that __pci_set_power_state() already does this same check for other
> state transitions, but D3cold is special because __pci_set_power_state()
> converts it to D3hot for the purposes of PMCSR.
>
> This seems to be an oversight in commit 0aacdc957401 ("PCI/PM: Clean up
> pci_set_low_power_state()").
>
> Fixes: 0aacdc957401 ("PCI/PM: Clean up pci_set_low_power_state()")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Brian Norris <briannorris@google.com>
> Signed-off-by: Brian Norris <briannorris@chromium.org>
BTW, I think only one SoB from you is enough ;-)
* Re: [PATCH] PCI/PM: Avoid redundant delays on D3hot->D3cold
From: Brian Norris @ 2025-10-06 18:32 UTC (permalink / raw)
To: Mika Westerberg
Cc: Bjorn Helgaas, Rafael J . Wysocki, linux-kernel, linux-pci,
stable
Hi Mika,
On Mon, Oct 06, 2025 at 03:52:22PM +0200, Mika Westerberg wrote:
> On Fri, Oct 03, 2025 at 03:40:09PM -0700, Brian Norris wrote:
> > From: Brian Norris <briannorris@google.com>
> >
> > When transitioning to D3cold, __pci_set_power_state() will first
> > transition a device to D3hot. If the device was already in D3hot, this
> > will add excess work:
> > (a) read/modify/write PMCSR; and
> > (b) excess delay (pci_dev_d3_sleep()).
>
> How come the device is already in D3hot when __pci_set_power_state() is
> called? IIRC the PCI core transitions the device to a low-power state by
> passing in the deepest possible state, and at that point the device is
> still in D0. Then __pci_set_power_state() puts it into D3hot and then turns
> off the power resource -> D3cold.
>
> What am I missing here?
Some PCI drivers call pci_set_power_state(..., PCI_D3hot) on their own
when preparing for runtime or system suspend, so by the time they hit
pci_finish_runtime_suspend(), they're in D3hot. Then, pci_target_state()
may still pick a lower state (D3cold).
HTH,
Brian
* Re: [PATCH] PCI/PM: Avoid redundant delays on D3hot->D3cold
From: Bjorn Helgaas @ 2025-10-06 19:33 UTC (permalink / raw)
To: Brian Norris
Cc: Mika Westerberg, Bjorn Helgaas, Rafael J . Wysocki, linux-kernel,
linux-pci, stable
On Mon, Oct 06, 2025 at 11:32:38AM -0700, Brian Norris wrote:
> On Mon, Oct 06, 2025 at 03:52:22PM +0200, Mika Westerberg wrote:
> > On Fri, Oct 03, 2025 at 03:40:09PM -0700, Brian Norris wrote:
> > > From: Brian Norris <briannorris@google.com>
> > >
> > > When transitioning to D3cold, __pci_set_power_state() will first
> > > transition a device to D3hot. If the device was already in D3hot, this
> > > will add excess work:
> > > (a) read/modify/write PMCSR; and
> > > (b) excess delay (pci_dev_d3_sleep()).
> >
> > How come the device is already in D3hot when __pci_set_power_state() is
> > called? IIRC the PCI core transitions the device to a low-power state by
> > passing in the deepest possible state, and at that point the device is
> > still in D0. Then __pci_set_power_state() puts it into D3hot and then turns
> > off the power resource -> D3cold.
> >
> > What am I missing here?
>
> Some PCI drivers call pci_set_power_state(..., PCI_D3hot) on their own
> when preparing for runtime or system suspend, so by the time they hit
> pci_finish_runtime_suspend(), they're in D3hot. Then, pci_target_state()
> may still pick a lower state (D3cold).
We might need this change, but maybe this is also an opportunity to
remove some of those pci_set_power_state(..., PCI_D3hot) calls from
drivers.
I didn't look into any of them in detail, but I would jump at any
chance to remove PCI details from driver suspend paths. There are
only ~20 calls from suspend functions, ~25 from shutdown, and a few
from poweroff. The fact that there are so few makes me think they
might be leftovers that could be more fully converted to generic PM.
* Re: [PATCH] PCI/PM: Avoid redundant delays on D3hot->D3cold
From: Manivannan Sadhasivam @ 2025-10-06 23:13 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Brian Norris, Mika Westerberg, Bjorn Helgaas, Rafael J . Wysocki,
linux-kernel, linux-pci, stable
On Mon, Oct 06, 2025 at 02:33:33PM -0500, Bjorn Helgaas wrote:
> On Mon, Oct 06, 2025 at 11:32:38AM -0700, Brian Norris wrote:
> > On Mon, Oct 06, 2025 at 03:52:22PM +0200, Mika Westerberg wrote:
> > > On Fri, Oct 03, 2025 at 03:40:09PM -0700, Brian Norris wrote:
> > > > From: Brian Norris <briannorris@google.com>
> > > >
> > > > When transitioning to D3cold, __pci_set_power_state() will first
> > > > transition a device to D3hot. If the device was already in D3hot, this
> > > > will add excess work:
> > > > (a) read/modify/write PMCSR; and
> > > > (b) excess delay (pci_dev_d3_sleep()).
> > >
> > > How come the device is already in D3hot when __pci_set_power_state() is
> > > called? IIRC the PCI core transitions the device to a low-power state by
> > > passing in the deepest possible state, and at that point the device is
> > > still in D0. Then __pci_set_power_state() puts it into D3hot and then turns
> > > off the power resource -> D3cold.
> > >
> > > What am I missing here?
> >
> > Some PCI drivers call pci_set_power_state(..., PCI_D3hot) on their own
> > when preparing for runtime or system suspend, so by the time they hit
> > pci_finish_runtime_suspend(), they're in D3hot. Then, pci_target_state()
> > may still pick a lower state (D3cold).
>
> We might need this change, but maybe this is also an opportunity to
> remove some of those pci_set_power_state(..., PCI_D3hot) calls from
> drivers.
>
Agreed. PCI client drivers should have no business opting for D3hot in the
suspend path. It should be the other way around: they should opt out, if they
want, by calling pci_save_state(), but that is also subject to discussion.
- Mani
--
மணிவண்ணன் சதாசிவம்
* Re: [PATCH] PCI/PM: Avoid redundant delays on D3hot->D3cold
From: Mika Westerberg @ 2025-10-07 5:00 UTC (permalink / raw)
To: Brian Norris
Cc: Bjorn Helgaas, Rafael J . Wysocki, linux-kernel, linux-pci,
stable
Hi,
On Mon, Oct 06, 2025 at 11:32:38AM -0700, Brian Norris wrote:
> Hi Mika,
>
> On Mon, Oct 06, 2025 at 03:52:22PM +0200, Mika Westerberg wrote:
> > On Fri, Oct 03, 2025 at 03:40:09PM -0700, Brian Norris wrote:
> > > From: Brian Norris <briannorris@google.com>
> > >
> > > When transitioning to D3cold, __pci_set_power_state() will first
> > > transition a device to D3hot. If the device was already in D3hot, this
> > > will add excess work:
> > > (a) read/modify/write PMCSR; and
> > > (b) excess delay (pci_dev_d3_sleep()).
> >
> > How come the device is already in D3hot when __pci_set_power_state() is
> > called? IIRC the PCI core transitions the device to a low-power state by
> > passing in the deepest possible state, and at that point the device is
> > still in D0. Then __pci_set_power_state() puts it into D3hot and then turns
> > off the power resource -> D3cold.
> >
> > What am I missing here?
>
> Some PCI drivers call pci_set_power_state(..., PCI_D3hot) on their own
> when preparing for runtime or system suspend, so by the time they hit
> pci_finish_runtime_suspend(), they're in D3hot. Then, pci_target_state()
> may still pick a lower state (D3cold).
Ah, right. Thanks for clarification.
Yeah, I agree with Bjorn and Mani that those calls should go away (the PCI
core does that already). That makes driver writers' lives simpler wrt. PCI PM.
* Re: [PATCH] PCI/PM: Avoid redundant delays on D3hot->D3cold
From: Brian Norris @ 2025-10-13 22:13 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Bjorn Helgaas, Mika Westerberg, Bjorn Helgaas, Rafael J . Wysocki,
linux-kernel, linux-pci, stable
On Mon, Oct 06, 2025 at 04:13:26PM -0700, Manivannan Sadhasivam wrote:
> On Mon, Oct 06, 2025 at 02:33:33PM -0500, Bjorn Helgaas wrote:
> > On Mon, Oct 06, 2025 at 11:32:38AM -0700, Brian Norris wrote:
> > > Some PCI drivers call pci_set_power_state(..., PCI_D3hot) on their own
> > > when preparing for runtime or system suspend, so by the time they hit
> > > pci_finish_runtime_suspend(), they're in D3hot. Then, pci_target_state()
> > > may still pick a lower state (D3cold).
> >
> > We might need this change, but maybe this is also an opportunity to
> > remove some of those pci_set_power_state(..., PCI_D3hot) calls from
> > drivers.
> >
>
> Agreed. PCI client drivers should have no business opting for D3hot in the
> suspend path.
I dunno. There are various reasons a device might want to go to D3hot
some time before fully suspending the system, and possibly even before
runtime suspend (or it may not support runtime PM at all). For
example, on the first step of my alphabetical trawl through
git grep -l '\<pci_set_power_state\>' drivers/
I found a driver that supports some power toggling via debugfs, in
drivers/accel/habanalabs/common/debugfs.c. It would take nontrivial
effort to evaluate every case like that for removal.
BTW, we even have documentation for this:
https://docs.kernel.org/power/pci.html#suspend
"However, in some rare case it is convenient to carry out these operations in
a PCI driver. Then, pci_save_state(), pci_prepare_to_sleep(), and
pci_set_power_state() should be used to save the device's standard configuration
registers, to prepare it for system wakeup (if necessary), and to put it into a
low-power state, respectively."
So sure, it should be rare (like the docs say), and it's probably
redundant in many cases, but I'm not that interested in shaving various
drivers' yaks right now. I'm just fixing a (small) performance
regression in documented behavior.
> It should be the other way around: they should opt out, if they
> want, by calling pci_save_state(), but that is also subject to discussion.
FWIW, that's also documented in the above link.
Brian
* Re: [PATCH] PCI/PM: Avoid redundant delays on D3hot->D3cold
From: Brian Norris @ 2025-11-25 16:20 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Mika Westerberg, Rafael J . Wysocki, linux-kernel, linux-pci,
stable
On Fri, Oct 03, 2025 at 03:40:09PM -0700, Brian Norris wrote:
> From: Brian Norris <briannorris@google.com>
>
> When transitioning to D3cold, __pci_set_power_state() will first
> transition a device to D3hot. If the device was already in D3hot, this
> will add excess work:
> (a) read/modify/write PMCSR; and
> (b) excess delay (pci_dev_d3_sleep()).
>
> For (b), we already performed the necessary delay on the previous D3hot
> entry; this was extra noticeable when evaluating runtime PM transition
> latency.
>
> Check whether we're already in the target state before continuing.
>
> Note that __pci_set_power_state() already does this same check for other
> state transitions, but D3cold is special because __pci_set_power_state()
> converts it to D3hot for the purposes of PMCSR.
>
> This seems to be an oversight in commit 0aacdc957401 ("PCI/PM: Clean up
> pci_set_low_power_state()").
>
> Fixes: 0aacdc957401 ("PCI/PM: Clean up pci_set_low_power_state()")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Brian Norris <briannorris@google.com>
> Signed-off-by: Brian Norris <briannorris@chromium.org>
I'd like to know the status of this patch, with the merge window
approaching. It sounds like people agreed it fixes a confirmed
regression. I also don't think the request to remove all power state
management from all drivers was a reasonable one.
Brian
Thread overview: 8+ messages
2025-10-03 22:40 [PATCH] PCI/PM: Avoid redundant delays on D3hot->D3cold Brian Norris
2025-10-06 13:52 ` Mika Westerberg
2025-10-06 18:32 ` Brian Norris
2025-10-06 19:33 ` Bjorn Helgaas
2025-10-06 23:13 ` Manivannan Sadhasivam
2025-10-13 22:13 ` Brian Norris
2025-10-07 5:00 ` Mika Westerberg
2025-11-25 16:20 ` Brian Norris