All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Lukas Wunner <lukas@wunner.de>,
	sashiko-bot@kernel.org, linux-pci@vger.kernel.org,
	sashiko@lists.linux.dev, Marco Nenciarini <mnencia@kcore.it>,
	Michal Winiarski <michal.winiarski@intel.com>,
	Ilpo Jarvinen <ilpo.jarvinen@linux.intel.com>,
	Eric Chanudet <echanude@redhat.com>,
	Jean Guyader <jean.guyader@gmail.com>,
	Alex Williamson <alex@shazbot.org>, Sinan Kaya <okaya@kernel.org>,
	Mario Limonciello <superm1@kernel.org>,
	Mika Westerberg <mika.westerberg@linux.intel.com>
Subject: Re: [PATCH] PCI: Drop unnecessary retries when restoring BARs
Date: Fri, 8 May 2026 16:43:09 -0500	[thread overview]
Message-ID: <20260508214309.GA14699@bhelgaas> (raw)
In-Reply-To: <CAJZ5v0iZN5NtUztqe=MxCRcXdBaaqzZ749OqSUkadwwBy0ugUQ@mail.gmail.com>

On Fri, May 08, 2026 at 02:51:37PM +0200, Rafael J. Wysocki wrote:
> On Fri, May 8, 2026 at 2:17 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, May 05, 2026 at 12:43:17PM +0200, Rafael J. Wysocki wrote:
> > > On Mon, May 4, 2026 at 11:17 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > ...
> >
> > > > If pci_power_up() is doing D3cold -> D0, main power is initially
> > > > off, so platform_pci_set_power_state(PCI_D0) would turn on main
> > > > power, which is a Fundamental Reset leaving the device in
> > > > D0uninitialized.  In general the device needs at least 100 ms
> > > > after that reset before any access, e.g., before the PMCSR read.
> > >
> > > Yes, it does in general, but that should be covered by
> > > platform_pci_set_power_state() because on some platforms the delay
> > > can be shorter.
> >
> > It's surprising to me that platform_pci_set_power_state() takes
> > care of that delay because there's so much PCIe infrastructure
> > (dependencies on link speed, RRS polling, Immediate Readiness,
> > Readiness Notifications, etc), much of which sounds more like
> > OS-level support than firmware support.
> 
> It is kind of a mixed bag because it involves platform-specific
> operations (the PCIe spec doesn't define a standard method of
> transitioning devices in D3cold into D0).
> 
> The consensus in the industry appears to be that if AML is used for
> carrying out D3cold->D0 transitions, it is expected to take the
> requisite delays into account.
> 
> > But if platform_pci_set_power_state() does guarantee that the device
> > is Configuration-Ready, we should document that somewhere.
> 
> It is more along the lines of what I said before: After
> platform_pci_set_power_state() the device is either
> Configuration-Ready, or inaccessible (in which case it can be assumed
> to have dropped off the bus, but this is all platform-specific, so on
> some platforms there may be ways to revive such devices).

If the device is Configuration-Ready or inaccessible after
platform_pci_set_power_state(PCI_D0), then I agree we shouldn't add
that delay in pci_power_up().

> That said, I'm all for documenting it.

Strawman below.  "Inaccessible upon return" is bound to lead to
questions, but if the PCI core can't do anything I don't know what
else to say.

  diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
  index ed1b1d7a7b46..2b0513c35387 100644
  --- a/drivers/pci/pci.c
  +++ b/drivers/pci/pci.c
  @@ -1068,6 +1068,12 @@ static inline bool platform_pci_power_manageable(struct pci_dev *dev)
	  return acpi_pci_power_manageable(dev);
   }
   
  +/*
  + * When setting power state to D0, platform_pci_set_power_state() ensures
  + * main power is on and that any required delays after a transition to D0
  + * have been completed.  The device is either Configuration-Ready or
  + * inaccessible upon return.
  + */
   static inline int platform_pci_set_power_state(struct pci_dev *dev,
						 pci_power_t t)
   {

> > > > If pci_power_up() is doing D3hot -> D0, the device already has
> > > > main power, so I suppose platform_pci_set_power_state(PCI_D0)
> > > > doesn't do anything.
> > >
> > > It may or may not.  Some platforms supply AML to run during D3hot ->
> > > D0 transitions (and the ACPI spec allows that).
> >
> > If the device started in D3hot, it never lost main power, and I
> > assumed platform_pci_set_power_state(PCI_D0) might just leave it in
> > D3hot.  It sounds like it takes it all the way to D0, although the
> > subsequent code that checks the PMCSR state does suggest that the
> > device might *not* be in D0:
> 
> On platforms using ACPI, platform_pci_set_power_state() carries out
> the AML part of the transition which may do nothing for devices in
> D3hot (for instance, there may be an ACPI power resource that needs to
> be turned "on" in order to restore power to the device, but if the
> power resource is "on" already, say because it is shared with another
> device that is in D0 ATM, its state may not change).
> 
> platform_pci_set_power_state() may also be a no-op for the given
> device (for example, if this is an add-on device without any
> associated AML).
> 
> The rule of thumb is that if there is a programmatic way to remove
> power from a PCIe device (so it can go into D3cold), it is generally
> platform-specific and there should be a complementary programmatic way
> to do the reverse (also generally platform-specific).  Otherwise, the
> device can go only as deep as D3hot except when the system as a whole
> is powered down or it is on a Thunderbolt link that can be
> disconnected at any time etc.
> 
> >     platform_pci_set_power_state(dev, PCI_D0);
> >
> >     pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> >     state = pmcsr & PCI_PM_CTRL_STATE_MASK;
> >     if (state == PCI_D0)
> >       goto end;
> >
> >     pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
> >     if (state == PCI_D3hot)
> >       pci_dev_d3_sleep(dev);
> >     ...
> >
> >   end:
> >
> > If platform_pci_set_power_state(PCI_D0) puts the device in D0 and
> > takes care of all the delays, "state" should always be PCI_D0, and we
> > shouldn't need the D3hot and D2 delays here.
> 
> Well, not quite, as per the above.

OK, so after platform_pci_set_power_state(PCI_D0), the device may be
in D3hot in some cases.  Then there's a D3hot -> D0 transition, and we
do pci_dev_d3_sleep(), but not pci_dev_wait().  Sec 2.3.1 says RRS is
permitted after a D3hot to D0uninitialized, and pci_dev_wait() is what
waits and retries for the RRS case.

It looks to me like we need pci_dev_wait() after that transition when
No_Soft_Reset == 0, i.e., something like the below.  Do you think this
is unnecessary?

  diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
  index 8f7cfcc00090..ed1b1d7a7b46 100644
  --- a/drivers/pci/pci.c
  +++ b/drivers/pci/pci.c
  @@ -1341,10 +1341,14 @@ int pci_power_up(struct pci_dev *dev)
	  pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
   
	  /* Mandatory transition delays; see PCI PM 1.2. */
  -	if (state == PCI_D3hot)
  +	if (state == PCI_D3hot) {
		  pci_dev_d3_sleep(dev);
  -	else if (state == PCI_D2)
  +		if (!(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
  +			pci_dev_wait(dev, "power up D3hot->D0",
  +				     PCIE_RESET_READY_POLL_MS);
  +	} else if (state == PCI_D2) {
		  udelay(PCI_PM_D2_DELAY);
  +	}
   
   end:
	  dev->current_state = PCI_D0;

  reply	other threads:[~2026-05-08 21:43 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-03 13:34 [PATCH] PCI: Drop unnecessary retries when restoring BARs Lukas Wunner
2026-05-03 13:51 ` sashiko-bot
2026-05-04  7:49   ` Lukas Wunner
2026-05-04 17:09     ` Bjorn Helgaas
2026-05-04 19:31       ` Rafael J. Wysocki
2026-05-04 21:17         ` Bjorn Helgaas
2026-05-05 10:43           ` Rafael J. Wysocki
2026-05-08  0:17             ` Bjorn Helgaas
2026-05-08 12:51               ` Rafael J. Wysocki
2026-05-08 21:43                 ` Bjorn Helgaas [this message]
2026-05-11 15:24                   ` Rafael J. Wysocki
2026-05-12  0:01                     ` Bjorn Helgaas
2026-05-12 10:49                       ` Rafael J. Wysocki
2026-05-22 22:31 ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260508214309.GA14699@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=alex@shazbot.org \
    --cc=echanude@redhat.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=jean.guyader@gmail.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=michal.winiarski@intel.com \
    --cc=mika.westerberg@linux.intel.com \
    --cc=mnencia@kcore.it \
    --cc=okaya@kernel.org \
    --cc=rafael@kernel.org \
    --cc=sashiko-bot@kernel.org \
    --cc=sashiko@lists.linux.dev \
    --cc=superm1@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.