From: Bjorn Helgaas <helgaas@kernel.org>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Lukas Wunner <lukas@wunner.de>,
sashiko-bot@kernel.org, linux-pci@vger.kernel.org,
sashiko@lists.linux.dev, Marco Nenciarini <mnencia@kcore.it>,
Michal Winiarski <michal.winiarski@intel.com>,
Ilpo Jarvinen <ilpo.jarvinen@linux.intel.com>,
Eric Chanudet <echanude@redhat.com>,
Jean Guyader <jean.guyader@gmail.com>,
Alex Williamson <alex@shazbot.org>, Sinan Kaya <okaya@kernel.org>,
Mario Limonciello <superm1@kernel.org>,
Mika Westerberg <mika.westerberg@linux.intel.com>
Subject: Re: [PATCH] PCI: Drop unnecessary retries when restoring BARs
Date: Fri, 8 May 2026 16:43:09 -0500 [thread overview]
Message-ID: <20260508214309.GA14699@bhelgaas> (raw)
In-Reply-To: <CAJZ5v0iZN5NtUztqe=MxCRcXdBaaqzZ749OqSUkadwwBy0ugUQ@mail.gmail.com>
On Fri, May 08, 2026 at 02:51:37PM +0200, Rafael J. Wysocki wrote:
> On Fri, May 8, 2026 at 2:17 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, May 05, 2026 at 12:43:17PM +0200, Rafael J. Wysocki wrote:
> > > On Mon, May 4, 2026 at 11:17 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > ...
> >
> > > > If pci_power_up() is doing D3cold -> D0, main power is initially
> > > > off, so platform_pci_set_power_state(PCI_D0) would turn on main
> > > > power, which is a Fundamental Reset leaving the device in
> > > > D0uninitialized. In general the device needs at least 100 ms
> > > > after that reset before any access, e.g., before the PMCSR read.
> > >
> > > Yes, it does in general, but that should be covered by
> > > platform_pci_set_power_state() because on some platforms the delay
> > > can be shorter.
> >
> > It's surprising to me that platform_pci_set_power_state() takes
> > care of that delay because there's so much PCIe infrastructure
> > (dependencies on link speed, RRS polling, Immediate Readiness,
> > Readiness Notifications, etc), much of which sounds more like
> > OS-level support than firmware support.
>
> It is kind of a mixed bag because it involves platform-specific
> operations (the PCIe spec doesn't define a standard method of
> transitioning devices in D3cold into D0).
>
> The consensus in the industry appears to be that if AML is used for
> carrying out D3cold->D0 transitions, it is expected to take the
> requisite delays into account.
>
> > But if platform_pci_set_power_state() does guarantee that the device
> > is Configuration-Ready, we should document that somewhere.
>
> It is more along the lines of what I said before: After
> platform_pci_set_power_state() the device is either
> Configuration-Ready, or inaccessible (in which case it can be assumed
> to have dropped off the bus, but this is all platform-specific, so on
> some platforms there may be ways to revive such devices).
If the device is Configuration-Ready or inaccessible after
platform_pci_set_power_state(PCI_D0), then I agree we shouldn't add
that delay in pci_power_up().
> That said, I'm all for documenting it.
Strawman below. "Inaccessible upon return" is bound to lead to
questions, but if the PCI core can't do anything I don't know what
else to say.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ed1b1d7a7b46..2b0513c35387 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1068,6 +1068,12 @@ static inline bool platform_pci_power_manageable(struct pci_dev *dev)
return acpi_pci_power_manageable(dev);
}
+/*
+ * When setting power state to D0, platform_pci_set_power_state() ensures
+ * main power is on and that any required delays after a transition to D0
+ * have been completed. The device is either Configuration-Ready or
+ * inaccessible upon return.
+ */
static inline int platform_pci_set_power_state(struct pci_dev *dev,
pci_power_t t)
{
> > > > If pci_power_up() is doing D3hot -> D0, the device already has
> > > > main power, so I suppose platform_pci_set_power_state(PCI_D0)
> > > > doesn't do anything.
> > >
> > > It may or may not. Some platforms supply AML to run during D3hot ->
> > > D0 transitions (and the ACPI spec allows that).
> >
> > If the device started in D3hot, it never lost main power, and I
> > assumed platform_pci_set_power_state(PCI_D0) might just leave it in
> > D3hot. It sounds like it takes it all the way to D0, although the
> > subsequent code that checks the PMCSR state does suggest that the
> > device might *not* be in D0:
>
> On platforms using ACPI, platform_pci_set_power_state() carries out
> the AML part of the transition which may do nothing for devices in
> D3hot (for instance, there may be an ACPI power resource that needs to
> be turned "on" in order to restore power to the device, but if the
> power resource is "on" already, say because it is shared with another
> device that is in D0 ATM, its state may not change).
>
> platform_pci_set_power_state() may also be a no-op for the given
> device (for example, if this is an add-on device without any
> associated AML).
>
> The rule of thumb is that if there is a programmatic way to remove
> power from a PCIe device (so it can go into D3cold), it is generally
> platform-specific and there should be a complementary programmatic way
> to do the reverse (also generally platform-specific). Otherwise, the
> device can go only as deep as D3hot except when the system as a whole
> is powered down or it is on a Thunderbolt link that can be
> disconnected at any time etc.
>
> > platform_pci_set_power_state(dev, PCI_D0);
> >
> > pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> > state = pmcsr & PCI_PM_CTRL_STATE_MASK;
> > if (state == PCI_D0)
> > goto end;
> >
> > pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
> > if (state == PCI_D3hot)
> > pci_dev_d3_sleep(dev);
> > ...
> >
> > end:
> >
> > If platform_pci_set_power_state(PCI_D0) puts the device in D0 and
> > takes care of all the delays, "state" should always be PCI_D0, and we
> > shouldn't need the D3hot and D2 delays here.
>
> Well, not quite, as per the above.
OK, so after platform_pci_set_power_state(PCI_D0), the device may be
in D3hot in some cases. Then there's a D3hot -> D0 transition, and we
do pci_dev_d3_sleep(), but not pci_dev_wait(). Sec 2.3.1 says RRS is
permitted after a D3hot to D0uninitialized, and pci_dev_wait() is what
waits and retries for the RRS case.
It looks to me like we need pci_dev_wait() after that transition when
No_Soft_Reset == 0, i.e., something like the below. Do you think this
is unnecessary?
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8f7cfcc00090..ed1b1d7a7b46 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1341,10 +1341,14 @@ int pci_power_up(struct pci_dev *dev)
pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
/* Mandatory transition delays; see PCI PM 1.2. */
- if (state == PCI_D3hot)
+ if (state == PCI_D3hot) {
pci_dev_d3_sleep(dev);
- else if (state == PCI_D2)
+ if (!(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
+ pci_dev_wait(dev, "power up D3hot->D0",
+ PCIE_RESET_READY_POLL_MS);
+ } else if (state == PCI_D2) {
udelay(PCI_PM_D2_DELAY);
+ }
end:
dev->current_state = PCI_D0;
prev parent reply other threads:[~2026-05-08 21:43 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-03 13:34 [PATCH] PCI: Drop unnecessary retries when restoring BARs Lukas Wunner
2026-05-03 13:51 ` sashiko-bot
2026-05-04 7:49 ` Lukas Wunner
2026-05-04 17:09 ` Bjorn Helgaas
2026-05-04 19:31 ` Rafael J. Wysocki
2026-05-04 21:17 ` Bjorn Helgaas
2026-05-05 10:43 ` Rafael J. Wysocki
2026-05-08 0:17 ` Bjorn Helgaas
2026-05-08 12:51 ` Rafael J. Wysocki
2026-05-08 21:43 ` Bjorn Helgaas [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260508214309.GA14699@bhelgaas \
--to=helgaas@kernel.org \
--cc=alex@shazbot.org \
--cc=echanude@redhat.com \
--cc=ilpo.jarvinen@linux.intel.com \
--cc=jean.guyader@gmail.com \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=michal.winiarski@intel.com \
--cc=mika.westerberg@linux.intel.com \
--cc=mnencia@kcore.it \
--cc=okaya@kernel.org \
--cc=rafael@kernel.org \
--cc=sashiko-bot@kernel.org \
--cc=sashiko@lists.linux.dev \
--cc=superm1@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox