From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Matthew W Carlis <mattc@purestorage.com>
Cc: linux-pci@vger.kernel.org, bhelgaas@google.com,
ashishk@purestorage.com, macro@orcam.me.uk,
msaggi@purestorage.com, sconnor@purestorage.com
Subject: Re: [PATCH v2 1/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824
Date: Thu, 3 Jul 2025 15:11:30 +0300 (EEST) [thread overview]
Message-ID: <eb43f8c8-cf36-0c94-e87d-78d8ef8efb9c@linux.intel.com> (raw)
In-Reply-To: <20250702052430.13716-2-mattc@purestorage.com>
[-- Attachment #1: Type: text/plain, Size: 3894 bytes --]
On Tue, 1 Jul 2025, Matthew W Carlis wrote:
Hi Matthew,
I have a few simple style related comments to the patch itself and
questions about this scenario below.
The wording in the shortlog (in subject) sounds a bit clumsy to me,
perhaps change it to something like this:
PCI: Don't use Target Speed quirk if device is not ASM2824
> The pcie_failed_link_retrain() was added due to a behavior observed with
> a very specific set of circumstances which are in a comment above the
> function. The "quirk" is supposed to force the link down to Gen1 in the
> case where LTSSM is stuck in a loop or failing to train etc. The problem
> is that this "quirk" is applied to any bridge & it can often write the
> Gen1 TLS (Target Link Speed) when it should not. Leaving the port in
> a state that will result in a device linking up at Gen1 when it should not.
> Incorrect action by pcie_failed_link_retrain() has been observed with a
> variety of different NVMe drives using U.2 connectors & in multiple different
> hardware designs. Directly attached to the root port, downstream of a
> PCIe switch (Microchip/Broadcom) with different generations of Intel CPU.
> All of these systems were configured without power controller capability.
> They were also all in compliance with the Async Hot-Plug Reference model in
> PCI Express® Base Specification Revision 6.0 Appendix I. for OS controlled
> DPC Hot-Plug.
> The issue appears to be more likely to hit to be applied when using
> OOB PD (out-of band presence detect), but has also been observed without
> OOB PD support ('DLL State Changed' or 'In-Band PD').
> Powering off or power cycling the slot via an out-of-band power control
> mechanism with OOB PD is extremely likely to hit since the kernel would
> see that slot presence is true. Physical Hot-insertion is also extremly
extremely
> likely to hit this issue with OOB PD with U.2 drives due to timing
> between presence assertion and the actual power-on/link-up of the NVMe
> drive itself. When the device eventually does power-up the TLS would
> have been left forced to Gen1. This is similarly true to the case of
> power cycling or powering off the slot.
> Exact circumstances for when this issue has been hit in a system without
> OOB PD due hasn't been fully understood to due having less reproductions
> as well as having reverted this patch for this configurations.
Paragraphs should be separated with empty lines and started without spaces
as indent.
This description did not answer to the key question, why does
pcie_lbms_seen() returns true in these case which is required for 2.5GT/s
to be set for the bridge? Is it a stale indication? Would LBMS get
cleared but quirk runs too soon to see that?
Is this mainly related to some artificial test that rapidly fires event
after another (which is known to confuse the quirk)? ...I mean, you say
"extremely likely".
I suppose when the problem occurs and the bridge remains at 2.5GT/s, is it
possible to restore the higher speed using the pcie_cooling device
associated with the bridge / bwctrl? You can find the correct cooling
device with this:
grep -H . /sys/class/thermal/cooling_device*/type | grep PCIe_
...and then write to cur_state.
> Signed-off-by: Matthew W Carlis <mattc@purestorage.com>
> ---
> drivers/pci/quirks.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index d7f4ee634263..39bb0c025119 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -100,6 +100,8 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
> };
> u16 lnksta, lnkctl2;
> int ret = -ENOTTY;
As per the coding style, please add an empty line after the local
variables.
> + if (!pci_match_id(ids, dev))
> + return ret;
--
i.
next prev parent reply other threads:[~2025-07-03 12:11 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-02 5:24 [PATCH v2 0/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824 Matthew W Carlis
2025-07-02 5:24 ` [PATCH v2 1/1] " Matthew W Carlis
2025-07-03 12:11 ` Ilpo Järvinen [this message]
2025-07-03 23:53 ` [PATCH v2 0/1] " Matthew W Carlis
2025-07-04 10:20 ` Ilpo Järvinen
2025-07-08 22:49 ` Matthew W Carlis
2025-07-09 9:45 ` Ilpo Järvinen
2025-07-09 18:52 ` Matthew W Carlis
2025-07-09 20:27 ` Matthew W Carlis
2025-07-11 13:46 ` Ilpo Järvinen
2025-07-16 13:01 ` Maciej W. Rozycki
2025-07-23 19:13 ` Matthew W Carlis
2025-08-01 16:04 ` Maciej W. Rozycki
2025-08-15 0:35 ` Matthew W Carlis
2025-12-01 6:33 ` Maciej W. Rozycki
2025-12-23 1:12 ` Matthew W Carlis
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=eb43f8c8-cf36-0c94-e87d-78d8ef8efb9c@linux.intel.com \
--to=ilpo.jarvinen@linux.intel.com \
--cc=ashishk@purestorage.com \
--cc=bhelgaas@google.com \
--cc=linux-pci@vger.kernel.org \
--cc=macro@orcam.me.uk \
--cc=mattc@purestorage.com \
--cc=msaggi@purestorage.com \
--cc=sconnor@purestorage.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.