linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824
@ 2025-06-27  0:26 Matthew W Carlis
  2025-06-27  0:26 ` [PATCH 1/1] " Matthew W Carlis
  2025-06-27  0:54 ` [PATCH 0/1] " Maciej W. Rozycki
  0 siblings, 2 replies; 5+ messages in thread
From: Matthew W Carlis @ 2025-06-27  0:26 UTC (permalink / raw)
  To: linux-pci; +Cc: bhelgaas, ashishk, macro, msaggi, sconnor, Matthew W Carlis

  I have reached out several times about issues caused by the
pcie_failed_link_retrain() quirk. Initially there were some additional changes
that we made to try and reduce the occurrences, but I have continued to
observe issues where hot-plug slots end up at Gen1 speed when they should not
have been or the quirk invoked when the link is not actually training at all.
Realistically speaking this quirk is a large regression to hot-plug
functionality in the kernel & therefore I am submitting this patch to restrict
the quirk to the ASMedia device where the LTSSM problem was actually observed
in the first place.
  The comment above the quirk states that the bad behavior was observed when the
Asmedia ASM2824 switch was upstream of some other device. Then further asserts
that the issue could in theory happen if the ASM2824 was downstream &
therefore I believe this is why it was concluded that the quirk should be
invoked on any device. i.e If the ASM2824 is the downstream device then we
could not use its device ID to trigger the quirk action.
  This is flawed in the sense that it was not actually observed and may not even
be an actual configuration anywhere in any device. It very well may not be the
case that ASM2824 could have this issue when it is the downstream device
because the issue was never root caused as far as I can tell & there is no
analyzer trace. The author had a noble goal, but it seems quite difficult to
capture the correct trigger & sequence of actions for every device considering
we're trying to address an issue beyond compliance with the spec afaik.

  In my testing I have encountered alarming rates of the quirk being invoked
when it should not be invoked & frequently degrading a link that would have
otherwise trained to full speed/width. The impact to hot-plug reliability is
observed to be extreme & therefore we believe cannot be justified to be
broadly applied. In the case of hot-insert the rate of being incorrectly forced
to Gen1 has been observed to be as high as 15% with some U.2 NVMe drives.
  It has been observed in several different system configurations with several
different U.2 NVMe drives from different vendors. All of the systems that we
have reproduced this issue on comply with PCI Express® Base Specification
Revision 6.0 Appendix I. Async Hot-Plug Reference Model for OS controlled DPC
or are very near to it. None of the systems I have tested implement Power
Controller capability.
  The largest occurrence of this issue has been observed on systems with OOB PD
(out-of-band presence detect), but it has also been observed in systems that
do not have OOB PD (Using Inband-PD or DLL State Changed).
  Actions likely to trigger the condition where the quirk forces the link to
Gen1 include physical hot-insert, slot power cycle, slot power on, toggle of
fundamental reset. By observation I believe there are several timing hazards
with DPC especially when using EDR. The expectation by DPC that the link should
recover before returning from DPC handling work is additionally a questionable
expectation in the case of a port being Hot Plug capable. Further, it appears
that the quirk can be invoked two times by the DPC driver. In the case of not
using HotPlug- with DPC it appears even more likely for invocation of the quirk
due to different handling around SDES (Surprise down error). In my mind this
makes a very complicated set of interactions even more complicated...

  In the case of hot-insert it appears that the power-up sequencing of drives &
their boot times directly contribute to the invocation of the quirk & the link
being forced to Gen1. For example, presence interrupt comes quickly
(first-to-mate in U.2) however the power pins are last-to-mate (ground pins
second to mate). Therefore presence can be seen even before power-up
sequencing in the drive is complete. If the drive powers-up, boots and the
link becomes active just after the quirk has written TLS (target link speed)
to Gen1 then your drive is forced to Gen1. If the sequence takes even longer
then you would see in the log "broken device" & "retraining failed", but
then later DLLSC would initiate the pciehp device add sequence again which
creates extreme confusion for most readers.
  In the case of power cycling the slot (without power controller capability)
then there are differences between OOB-PD enabled systems vs systems using DLL
State Change interrupts. In the case of OOB-PD the kernel will declare "Link
Down", set the ctrl state to OFF_STATE, remove the device, but then
immediately declare "Card present" & run down the pciehp_enable_slot() path,
but would run into the quirk since the slot power be off & it not see the link
train before timing out. Disabling OOB-PD & using recently deprecated
Inband-PD avoids the trap more often since presence is synthesized by LTSSM &
only asserted when the link is active however link degradation was still
observed in pcie resilience torture testing. Unfortunately I don't have a
meaningful characterization of the Inband-PD reproductions.
  With & without HotPlug capability the quirk becomes harder to hit after
pulling in the pci/pcie/bwctrl.c changes, but is still observable in several
circumstances. Those being observed around the handling of DPC with and with
EDR. The bottom line from my perspective is that even with bwctrl.c we still
observe a significant regression in hot-plug reliability in terms of arriving
at the correct speed. In my experience the link issue observed by the author
of the quirk is most likely an incompatibility between specific devices
as opposed to being something that could result from degraded link integrity or
device again & therefore should be restricted to the particular device where
observed.

Matthew W Carlis (1):
  PCI: pcie_failed_link_retrain() return if dev is not ASM2824

 drivers/pci/quirks.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

-- 
2.46.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824
  2025-06-27  0:26 [PATCH 0/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824 Matthew W Carlis
@ 2025-06-27  0:26 ` Matthew W Carlis
  2025-06-29  1:24   ` kernel test robot
  2025-06-27  0:54 ` [PATCH 0/1] " Maciej W. Rozycki
  1 sibling, 1 reply; 5+ messages in thread
From: Matthew W Carlis @ 2025-06-27  0:26 UTC (permalink / raw)
  To: linux-pci; +Cc: bhelgaas, ashishk, macro, msaggi, sconnor, Matthew W Carlis

  The pcie_failed_link_retrain() was added due to a behavior observed with
a very specific set of circumstances which are in a comment above the
function. The "quirk" is supposed to force the link down to Gen1 in the
case where LTSSM is stuck in a loop or failing to train etc. The problem
is that this "quirk" is applied to any bridge & it can often write the
Gen1 TLS (Target Link Speed) when it should not. Leaving the port in
a state that will result in a device linking up at Gen1 when it should not.
  Incorrect action by pcie_failed_link_retrain() has been observed with a
variety of different NVMe drives using U.2 connectors & in multiple different
hardware designs. Directly attached to the root port, downstream of a
PCIe switch (Microchip/Broadcom) with different generations of Intel CPU.
All of these systems were configured without power controller capability.
They were also all in compliance with the Async Hot-Plug Reference model in
PCI Express® Base Specification Revision 6.0 Appendix I. for OS controlled
DPC Hot-Plug.
  The issue appears to be more likely to hit to be applied when using
OOB PD (out-of band presence detect), but has also been observed without
OOB PD support ('DLL State Changed' or 'In-Band PD').
  Powering off or power cycling the slot via an out-of-band power control
mechanism with OOB PD is extremely likely to hit since the kernel would
see that slot presence is true. Physical Hot-insertion is also extremly
likely to hit this issue with OOB PD with U.2 drives due to timing
between presence assertion and the actual power-on/link-up of the NVMe
drive itself. When the device eventually does power-up the TLS would
have been left forced to Gen1. This is similarly true to the case of
power cycling or powering off the slot.
  Exact circumstances for when this issue has been hit in a system without
OOB PD due hasn't been fully understood to due having less reproductions
as well as having reverted this patch for this configurations.

Signed-off-by: Matthew W Carlis <mattc@purestorage.com>

 100.0% drivers/pci/
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index d7f4ee634263..497e4acc52c8 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -100,6 +100,8 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
 	};
 	u16 lnksta, lnkctl2;
 	int ret = -ENOTTY;
+	if (!pci_match_id(ids, dev)
+		return ret;
 
 	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
 	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
@@ -124,8 +126,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
 	}
 
 	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
-	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
-	    pci_match_id(ids, dev)) {
+	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
 		u32 lnkcap;
 
 		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824
  2025-06-27  0:26 [PATCH 0/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824 Matthew W Carlis
  2025-06-27  0:26 ` [PATCH 1/1] " Matthew W Carlis
@ 2025-06-27  0:54 ` Maciej W. Rozycki
  2025-06-27 16:20   ` Matthew W Carlis
  1 sibling, 1 reply; 5+ messages in thread
From: Maciej W. Rozycki @ 2025-06-27  0:54 UTC (permalink / raw)
  To: Matthew W Carlis; +Cc: linux-pci, Bjorn Helgaas, ashishk, msaggi, sconnor

On Thu, 26 Jun 2025, Matthew W Carlis wrote:

>   I have reached out several times about issues caused by the
> pcie_failed_link_retrain() quirk. Initially there were some additional changes
> that we made to try and reduce the occurrences, but I have continued to
> observe issues where hot-plug slots end up at Gen1 speed when they should not
> have been or the quirk invoked when the link is not actually training at all.

 Have you verified that with a fix[1] applied for a regression introduced 
by commit de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set 
PCIe Link Speed") discussed in the other thread[2] you can still see those 
issues?

References:

[1] "PCI: Fix the issue of failed speed limit lifting", 
    <https://lore.kernel.org/r/20250123055155.22648-1-sjiwei@163.com/>

[2] "PCI: Fix link speed calculation on retrain failure",
    <https://lore.kernel.org/r/1c92ef6bcb314ee6977839b46b393282e4f52e74.1750684771.git.lukas@wunner.de/>

  Maciej

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 0/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824
  2025-06-27  0:54 ` [PATCH 0/1] " Maciej W. Rozycki
@ 2025-06-27 16:20   ` Matthew W Carlis
  0 siblings, 0 replies; 5+ messages in thread
From: Matthew W Carlis @ 2025-06-27 16:20 UTC (permalink / raw)
  To: macro; +Cc: ashishk, bhelgaas, linux-pci, mattc, msaggi, sconnor

On Fri, 27 Jun 2025, Maciej W. Rozycki wrote:
> Have you verified that with a fix[1] applied for a regression introduced 
> by commit de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set 
> PCIe Link Speed") discussed in the other thread[2] you can still see those 
> issues?

When I tested with bwctrl.c I also picked the mentioned patch with it.
Certainly the patch series related to bwctrl.c & the patch you mention helped
significantly, but it seems that it is still possible to have the quirk degrade
a 'would be healthy' link. What I have seen so far with the bwctrl patch series
have been during PCIe error injection testing & the quirk running at the end of
DPC handling during pci_bridge_wait_for_secondary_bus(). I believe sometimes an
endpoint isn't ready for the link to be active again during DPC handling as
soon as DPC status is cleared & therefore the quirk may be triggered. One case
I haven't quite fully explained was the quirk running twice when DPC is in
pci_bridge_wait_for_secondary_bus().. It means that both
pcie_wait_for_link_delay() and the final pci_dev_wait() must have been the
callers of the quirk(), but I think it implies that the first must have thought
the link initially wasn't working then the quirk thought forcing to Gen1 fixed
the issue however the drive must not have been read to respond to command
register reads soon enough.

On another note I realized I left off a closing parenthesis in my patch... I
guess I would have to re-submit if we move forward on this.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824
  2025-06-27  0:26 ` [PATCH 1/1] " Matthew W Carlis
@ 2025-06-29  1:24   ` kernel test robot
  0 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2025-06-29  1:24 UTC (permalink / raw)
  To: Matthew W Carlis, linux-pci
  Cc: llvm, oe-kbuild-all, bhelgaas, ashishk, macro, msaggi, sconnor,
	Matthew W Carlis

Hi Matthew,

kernel test robot noticed the following build errors:

[auto build test ERROR on pci/next]
[also build test ERROR on pci/for-linus linus/master v6.16-rc3 next-20250627]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-W-Carlis/PCI-pcie_failed_link_retrain-return-if-dev-is-not-ASM2824/20250628-042356
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/20250627002652.22920-2-mattc%40purestorage.com
patch subject: [PATCH 1/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20250629/202506290953.x9ODOCCs-lkp@intel.com/config)
compiler: clang version 20.1.7 (https://github.com/llvm/llvm-project 6146a88f60492b520a36f8f8f3231e15f3cc6082)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250629/202506290953.x9ODOCCs-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202506290953.x9ODOCCs-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/pci/quirks.c:104:3: error: expected ')'
     104 |                 return ret;
         |                 ^
   drivers/pci/quirks.c:103:5: note: to match this '('
     103 |         if (!pci_match_id(ids, dev)
         |            ^
   1 error generated.


vim +104 drivers/pci/quirks.c

    46	
    47	/*
    48	 * Retrain the link of a downstream PCIe port by hand if necessary.
    49	 *
    50	 * This is needed at least where a downstream port of the ASMedia ASM2824
    51	 * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
    52	 * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
    53	 * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
    54	 * board.
    55	 *
    56	 * In such a configuration the switches are supposed to negotiate the link
    57	 * speed of preferably 5.0GT/s, falling back to 2.5GT/s.  However the link
    58	 * continues switching between the two speeds indefinitely and the data
    59	 * link layer never reaches the active state, with link training reported
    60	 * repeatedly active ~84% of the time.  Forcing the target link speed to
    61	 * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
    62	 * each other correctly however.  And more interestingly retraining with a
    63	 * higher target link speed afterwards lets the two successfully negotiate
    64	 * 5.0GT/s.
    65	 *
    66	 * With the ASM2824 we can rely on the otherwise optional Data Link Layer
    67	 * Link Active status bit and in the failed link training scenario it will
    68	 * be off along with the Link Bandwidth Management Status indicating that
    69	 * hardware has changed the link speed or width in an attempt to correct
    70	 * unreliable link operation.  For a port that has been left unconnected
    71	 * both bits will be clear.  So use this information to detect the problem
    72	 * rather than polling the Link Training bit and watching out for flips or
    73	 * at least the active status.
    74	 *
    75	 * Since the exact nature of the problem isn't known and in principle this
    76	 * could trigger where an ASM2824 device is downstream rather upstream,
    77	 * apply this erratum workaround to any downstream ports as long as they
    78	 * support Link Active reporting and have the Link Control 2 register.
    79	 * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
    80	 * request a retrain and check the result.
    81	 *
    82	 * If this turns out successful and we know by the Vendor:Device ID it is
    83	 * safe to do so, then lift the restriction, letting the devices negotiate
    84	 * a higher speed.  Also check for a similar 2.5GT/s speed restriction the
    85	 * firmware may have already arranged and lift it with ports that already
    86	 * report their data link being up.
    87	 *
    88	 * Otherwise revert the speed to the original setting and request a retrain
    89	 * again to remove any residual state, ignoring the result as it's supposed
    90	 * to fail anyway.
    91	 *
    92	 * Return 0 if the link has been successfully retrained.  Return an error
    93	 * if retraining was not needed or we attempted a retrain and it failed.
    94	 */
    95	int pcie_failed_link_retrain(struct pci_dev *dev)
    96	{
    97		static const struct pci_device_id ids[] = {
    98			{ PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
    99			{}
   100		};
   101		u16 lnksta, lnkctl2;
   102		int ret = -ENOTTY;
   103		if (!pci_match_id(ids, dev)
 > 104			return ret;
   105	
   106		if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
   107		    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
   108			return ret;
   109	
   110		pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
   111		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
   112		if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) {
   113			u16 oldlnkctl2 = lnkctl2;
   114	
   115			pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
   116	
   117			ret = pcie_set_target_speed(dev, PCIE_SPEED_2_5GT, false);
   118			if (ret) {
   119				pci_info(dev, "retraining failed\n");
   120				pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2),
   121						      true);
   122				return ret;
   123			}
   124	
   125			pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
   126		}
   127	
   128		if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
   129		    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
   130			u32 lnkcap;
   131	
   132			pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
   133			pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
   134			ret = pcie_set_target_speed(dev, PCIE_LNKCAP_SLS2SPEED(lnkcap), false);
   135			if (ret) {
   136				pci_info(dev, "retraining failed\n");
   137				return ret;
   138			}
   139		}
   140	
   141		return ret;
   142	}
   143	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-06-29  1:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-27  0:26 [PATCH 0/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824 Matthew W Carlis
2025-06-27  0:26 ` [PATCH 1/1] " Matthew W Carlis
2025-06-29  1:24   ` kernel test robot
2025-06-27  0:54 ` [PATCH 0/1] " Maciej W. Rozycki
2025-06-27 16:20   ` Matthew W Carlis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).