Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Nathan Rossi <nathan@nathanrossi.com>
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	Nathan Rossi <nathan.rossi@digi.com>,
	Bjorn Helgaas <bhelgaas@google.com>
Subject: Re: [PATCH] PCI/ASPM: Wait for data link active after retraining
Date: Tue, 8 Nov 2022 16:29:44 -0600	[thread overview]
Message-ID: <20221108222944.GA504625@bhelgaas> (raw)
In-Reply-To: <20220602065544.2552771-1-nathan@nathanrossi.com>

On Thu, Jun 02, 2022 at 06:55:44AM +0000, Nathan Rossi wrote:
> From: Nathan Rossi <nathan.rossi@digi.com>
> 
> When retraining the link either the child or the parent device may have
> the data link layer state machine of the respective devices move out of
> the active state despite the physical link training being completed.
> Depending on how long is takes for the devices to return to the active
> state, the device may not be ready and any further reads/writes to the
> device can fail.
> 
> This issue is present with the pci-mvebu controller paired with a device
> supporting ASPM but without advertising the Slot Clock, where during
> boot the pcie_aspm_cap_init call would cause common clocks to be made
> consistent and then retrain the link. However the data link layer would
> not be active before any device initialization (e.g. ASPM capability
> queries, BAR configuration) causing improper configuration of the device
> without error.
> 
> To ensure the child device is accessible, after the link retraining use
> pcie_wait_for_link to perform the associated state checks and any needed
> delays.
> 
> Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
> ---
>  drivers/pci/pcie/aspm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index a96b7424c9..4b8a1810be 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -288,7 +288,8 @@ static void pcie_aspm_configure_common_clock(struct pcie_link_state *link)
>  		reg16 &= ~PCI_EXP_LNKCTL_CCC;
>  	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
>  
> -	if (pcie_retrain_link(link))
> +	/* Retrain link and then wait for the link to become active */
> +	if (pcie_retrain_link(link) && pcie_wait_for_link(parent, true))

pcie_retrain_link() waits for PCI_EXP_LNKSTA_LT (Link Training) to be
cleared, which means the LTSSM has exited the Configuration/Recovery
state.  pcie_wait_for_link() waits for PCI_EXP_LNKSTA_DLLLA (Data Link
Layer Link Active) to be set, which means the link is in DL_Active.

I don't see an explicit procedure in the spec for determining when
a link retrain is complete, but from PCIe r6.0, sec 6.2.11 (DPC):

  After software releases the Downstream Port from DPC, the Port’s
  LTSSM must transition to the Detect state, where the Link will
  attempt to retrain. Software can use Data Link Layer State Changed
  interrupts, DL_ACTIVE ERR_COR signaling, or both, to signal when the
  Link reaches the DL_Active state again.

and sec 6.6:

  On the completion of Link Training (entering the DL_Active state,
  see Section 3.2), a component must be able to receive and process
  TLPs and DLLPs.

The only use mentioned in the spec for the Link Training bit is the
implementation note in sec 7.5.3.7 about avoiding race conditions when
using the Retrain Link bit, where software should poll Link Training
until it returns to zero before setting the Retrain Link bit to change
link parameters.

And I think you're absolutely right that what we *want* here is the
data link layer DL_Active state, not just the link layer L0 state.

This all makes me think that checking the Link Training bit might be
the wrong thing to begin with.

Of course, the Data Link Layer Link Active bit wasn't added until PCIe
r1.1, and even now it's optional.  Without it, I don't know if there's
a way to make sure the link is in DL_Active.

Maybe pcie_retrain_link() should wait for Data Link Layer Link Active
if it is supported, and use the existing behavior of waiting for Link
Training to be cleared otherwise?

>  		return;
>  
>  	/* Training failed. Restore common clock configurations */
> ---
> 2.36.1

  parent reply	other threads:[~2022-11-08 22:30 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-02  6:55 [PATCH] PCI/ASPM: Wait for data link active after retraining Nathan Rossi
2022-09-26  0:59 ` Nathan Rossi
2022-10-02 17:56 ` Pali Rohár
2022-11-02  8:31   ` Nathan Rossi
2022-11-02  8:37     ` Pali Rohár
2022-11-08 22:29 ` Bjorn Helgaas [this message]
2022-11-09 17:34   ` Bjorn Helgaas
2022-11-14  3:59     ` Nathan Rossi
2022-12-12  9:11       ` Siddharth Vadapalli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221108222944.GA504625@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=nathan.rossi@digi.com \
    --cc=nathan@nathanrossi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox