Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Lukas Wunner <lukas@wunner.de>,
	sashiko-bot@kernel.org, linux-pci@vger.kernel.org,
	sashiko@lists.linux.dev, Marco Nenciarini <mnencia@kcore.it>,
	Michal Winiarski <michal.winiarski@intel.com>,
	Ilpo Jarvinen <ilpo.jarvinen@linux.intel.com>,
	Eric Chanudet <echanude@redhat.com>,
	Jean Guyader <jean.guyader@gmail.com>,
	Alex Williamson <alex@shazbot.org>, Sinan Kaya <okaya@kernel.org>,
	Mario Limonciello <superm1@kernel.org>,
	Mika Westerberg <mika.westerberg@linux.intel.com>
Subject: Re: [PATCH] PCI: Drop unnecessary retries when restoring BARs
Date: Fri, 8 May 2026 16:43:09 -0500	[thread overview]
Message-ID: <20260508214309.GA14699@bhelgaas> (raw)
In-Reply-To: <CAJZ5v0iZN5NtUztqe=MxCRcXdBaaqzZ749OqSUkadwwBy0ugUQ@mail.gmail.com>

On Fri, May 08, 2026 at 02:51:37PM +0200, Rafael J. Wysocki wrote:
> On Fri, May 8, 2026 at 2:17 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, May 05, 2026 at 12:43:17PM +0200, Rafael J. Wysocki wrote:
> > > On Mon, May 4, 2026 at 11:17 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > ...
> >
> > > > If pci_power_up() is doing D3cold -> D0, main power is initially
> > > > off, so platform_pci_set_power_state(PCI_D0) would turn on main
> > > > power, which is a Fundamental Reset leaving the device in
> > > > D0uninitialized.  In general the device needs at least 100 ms
> > > > after that reset before any access, e.g., before the PMCSR read.
> > >
> > > Yes, it does in general, but that should be covered by
> > > platform_pci_set_power_state() because on some platforms the delay
> > > can be shorter.
> >
> > It's surprising to me that platform_pci_set_power_state() takes
> > care of that delay because there's so much PCIe infrastructure
> > (dependencies on link speed, RRS polling, Immediate Readiness,
> > Readiness Notifications, etc), much of which sounds more like
> > OS-level support than firmware support.
> 
> It is kind of a mixed bag because it involves platform-specific
> operations (the PCIe spec doesn't define a standard method of
> transitioning devices in D3cold into D0).
> 
> The consensus in the industry appears to be that if AML is used for
> carrying out D3cold->D0 transitions, it is expected to take the
> requisite delays into account.
> 
> > But if platform_pci_set_power_state() does guarantee that the device
> > is Configuration-Ready, we should document that somewhere.
> 
> It is more along the lines of what I said before: After
> platform_pci_set_power_state() the device is either
> Configuration-Ready, or inaccessible (in which case it can be assumed
> to have dropped off the bus, but this is all platform-specific, so on
> some platforms there may be ways to revive such devices).

If the device is Configuration-Ready or inaccessible after
platform_pci_set_power_state(PCI_D0), then I agree we shouldn't add
that delay in pci_power_up().

> That said, I'm all for documenting it.

Strawman below.  "Inaccessible upon return" is bound to lead to
questions, but if the PCI core can't do anything I don't know what
else to say.

  diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
  index ed1b1d7a7b46..2b0513c35387 100644
  --- a/drivers/pci/pci.c
  +++ b/drivers/pci/pci.c
  @@ -1068,6 +1068,12 @@ static inline bool platform_pci_power_manageable(struct pci_dev *dev)
	  return acpi_pci_power_manageable(dev);
   }
   
  +/*
  + * When setting power state to D0, platform_pci_set_power_state() ensures
  + * main power is on and that any required delays after a transition to D0
  + * have been completed.  The device is either Configuration-Ready or
  + * inaccessible upon return.
  + */
   static inline int platform_pci_set_power_state(struct pci_dev *dev,
						 pci_power_t t)
   {

> > > > If pci_power_up() is doing D3hot -> D0, the device already has
> > > > main power, so I suppose platform_pci_set_power_state(PCI_D0)
> > > > doesn't do anything.
> > >
> > > It may or may not.  Some platforms supply AML to run during D3hot ->
> > > D0 transitions (and the ACPI spec allows that).
> >
> > If the device started in D3hot, it never lost main power, and I
> > assumed platform_pci_set_power_state(PCI_D0) might just leave it in
> > D3hot.  It sounds like it takes it all the way to D0, although the
> > subsequent code that checks the PMCSR state does suggest that the
> > device might *not* be in D0:
> 
> On platforms using ACPI, platform_pci_set_power_state() carries out
> the AML part of the transition which may do nothing for devices in
> D3hot (for instance, there may be an ACPI power resource that needs to
> be turned "on" in order to restore power to the device, but if the
> power resource is "on" already, say because it is shared with another
> device that is in D0 ATM, its state may not change).
> 
> platform_pci_set_power_state() may also be a no-op for the given
> device (for example, if this is an add-on device without any
> associated AML).
> 
> The rule of thumb is that if there is a programmatic way to remove
> power from a PCIe device (so it can go into D3cold), it is generally
> platform-specific and there should be a complementary programmatic way
> to do the reverse (also generally platform-specific).  Otherwise, the
> device can go only as deep as D3hot except when the system as a whole
> is powered down or it is on a Thunderbolt link that can be
> disconnected at any time etc.
> 
> >     platform_pci_set_power_state(dev, PCI_D0);
> >
> >     pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> >     state = pmcsr & PCI_PM_CTRL_STATE_MASK;
> >     if (state == PCI_D0)
> >       goto end;
> >
> >     pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
> >     if (state == PCI_D3hot)
> >       pci_dev_d3_sleep(dev);
> >     ...
> >
> >   end:
> >
> > If platform_pci_set_power_state(PCI_D0) puts the device in D0 and
> > takes care of all the delays, "state" should always be PCI_D0, and we
> > shouldn't need the D3hot and D2 delays here.
> 
> Well, not quite, as per the above.

OK, so after platform_pci_set_power_state(PCI_D0), the device may be
in D3hot in some cases.  Then there's a D3hot -> D0 transition, and we
do pci_dev_d3_sleep(), but not pci_dev_wait().  Sec 2.3.1 says RRS is
permitted after a D3hot to D0uninitialized, and pci_dev_wait() is what
waits and retries for the RRS case.

It looks to me like we need pci_dev_wait() after that transition when
No_Soft_Reset == 0, i.e., something like the below.  Do you think this
is unnecessary?

  diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
  index 8f7cfcc00090..ed1b1d7a7b46 100644
  --- a/drivers/pci/pci.c
  +++ b/drivers/pci/pci.c
  @@ -1341,10 +1341,14 @@ int pci_power_up(struct pci_dev *dev)
	  pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
   
	  /* Mandatory transition delays; see PCI PM 1.2. */
  -	if (state == PCI_D3hot)
  +	if (state == PCI_D3hot) {
		  pci_dev_d3_sleep(dev);
  -	else if (state == PCI_D2)
  +		if (!(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
  +			pci_dev_wait(dev, "power up D3hot->D0",
  +				     PCIE_RESET_READY_POLL_MS);
  +	} else if (state == PCI_D2) {
		  udelay(PCI_PM_D2_DELAY);
  +	}
   
   end:
	  dev->current_state = PCI_D0;

      reply	other threads:[~2026-05-08 21:43 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-03 13:34 [PATCH] PCI: Drop unnecessary retries when restoring BARs Lukas Wunner
2026-05-03 13:51 ` sashiko-bot
2026-05-04  7:49   ` Lukas Wunner
2026-05-04 17:09     ` Bjorn Helgaas
2026-05-04 19:31       ` Rafael J. Wysocki
2026-05-04 21:17         ` Bjorn Helgaas
2026-05-05 10:43           ` Rafael J. Wysocki
2026-05-08  0:17             ` Bjorn Helgaas
2026-05-08 12:51               ` Rafael J. Wysocki
2026-05-08 21:43                 ` Bjorn Helgaas [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260508214309.GA14699@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=alex@shazbot.org \
    --cc=echanude@redhat.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=jean.guyader@gmail.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=michal.winiarski@intel.com \
    --cc=mika.westerberg@linux.intel.com \
    --cc=mnencia@kcore.it \
    --cc=okaya@kernel.org \
    --cc=rafael@kernel.org \
    --cc=sashiko-bot@kernel.org \
    --cc=sashiko@lists.linux.dev \
    --cc=superm1@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox