public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/4] PCI: Rework error reporting with PCIe failed link retraining
@ 2024-08-25 13:47 Maciej W. Rozycki
  2024-08-25 13:47 ` [PATCH v3 1/4] PCI: Clear the LBMS bit after a link retrain Maciej W. Rozycki
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Maciej W. Rozycki @ 2024-08-25 13:47 UTC (permalink / raw)
  To: Ilpo Järvinen, Matthew W Carlis, Bjorn Helgaas
  Cc: Mika Westerberg, Oliver O'Halloran, linux-pci, linux-kernel

Hi,

 This is a minor update to the patch series posted here: 
<https://lore.kernel.org/r/alpine.DEB.2.21.2408091017050.61955@angie.orcam.me.uk> 
which includes a change to 2/4 suggested by Ilpo to wait on LT rather than 
DLLLA with the recovery call to `pcie_retrain_link' in the failure path, 
so as to avoid an excessive delay where we expect training to fail anyway.

 This patch series addresses issues observed by Ilpo as reported here: 
<https://lore.kernel.org/r/aa2d1c4e-9961-d54a-00c7-ddf8e858a9b0@linux.intel.com/>, 
one with excessive delays happening when `pcie_failed_link_retrain' is 
called, but link retraining has not been actually attempted, and another 
one with an API misuse caused by a merge mistake.

 It also addresses an issue observed by Matthew as discussed here: 
<https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/> 
and here: 
<https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/>. 
where a stale LBMS bit state causes `pcie_failed_link_retrain', in the 
absence of a downstream device, to leave the link stuck at the 2.5GT/s 
speed rate, which then negatively impacts devices plugged in in the 
future.

 See individual change descriptions for further details.

 Original submission at:
<https://lore.kernel.org/r/alpine.DEB.2.21.2402092125070.2376@angie.orcam.me.uk/>.

  Maciej

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 1/4] PCI: Clear the LBMS bit after a link retrain
  2024-08-25 13:47 [PATCH v3 0/4] PCI: Rework error reporting with PCIe failed link retraining Maciej W. Rozycki
@ 2024-08-25 13:47 ` Maciej W. Rozycki
  2024-08-25 13:47 ` [PATCH v3 2/4] PCI: Revert to the original speed after PCIe failed link retraining Maciej W. Rozycki
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Maciej W. Rozycki @ 2024-08-25 13:47 UTC (permalink / raw)
  To: Ilpo Järvinen, Matthew W Carlis, Bjorn Helgaas
  Cc: Mika Westerberg, Oliver O'Halloran, linux-pci, linux-kernel

The LBMS bit, where implemented, is set by hardware either in response 
to the completion of retraining caused by writing 1 to the Retrain Link 
bit or whenever hardware has changed the link speed or width in attempt 
to correct unreliable link operation.  It is never cleared by hardware 
other than by software writing 1 to the bit position in the Link Status 
register and we never do such a write.

We currently have two places, namely `apply_bad_link_workaround' and 
`pcie_failed_link_retrain' in drivers/pci/controller/dwc/pcie-tegra194.c 
and drivers/pci/quirks.c respectively where we check the state of the 
LBMS bit and neither is interested in the state of the bit resulting 
from the completion of retraining, both check for a link fault.

And in particular `pcie_failed_link_retrain' causes issues consequently, 
by trying to retrain a link where there's no downstream device anymore 
and the state of 1 in the LBMS bit has been retained from when there was 
a device downstream that has since been removed.

Clear the LBMS bit then at the conclusion of `pcie_retrain_link', so 
that we have a single place that controls it and that our code can track 
link speed or width changes resulting from unreliable link operation.

Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Reported-by: Matthew W Carlis <mattc@purestorage.com>
Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/
Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: stable@vger.kernel.org # v6.5+
---
No change from v2.

New change in v2.
---
 drivers/pci/pci.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

linux-pcie-retrain-link-lbms-clear.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -4718,7 +4718,15 @@ int pcie_retrain_link(struct pci_dev *pd
 		pcie_capability_clear_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
 	}
 
-	return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
+	rc = pcie_wait_for_link_status(pdev, use_lt, !use_lt);
+
+	/*
+	 * Clear LBMS after a manual retrain so that the bit can be used
+	 * to track link speed or width changes made by hardware itself
+	 * in attempt to correct unreliable link operation.
+	 */
+	pcie_capability_write_word(pdev, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);
+	return rc;
 }
 
 /**

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 2/4] PCI: Revert to the original speed after PCIe failed link retraining
  2024-08-25 13:47 [PATCH v3 0/4] PCI: Rework error reporting with PCIe failed link retraining Maciej W. Rozycki
  2024-08-25 13:47 ` [PATCH v3 1/4] PCI: Clear the LBMS bit after a link retrain Maciej W. Rozycki
@ 2024-08-25 13:47 ` Maciej W. Rozycki
  2024-08-26  9:16   ` Ilpo Järvinen
  2024-08-25 13:47 ` [PATCH v3 3/4] PCI: Correct error reporting with " Maciej W. Rozycki
  2024-08-25 13:47 ` [PATCH v3 4/4] PCI: Use an error code " Maciej W. Rozycki
  3 siblings, 1 reply; 8+ messages in thread
From: Maciej W. Rozycki @ 2024-08-25 13:47 UTC (permalink / raw)
  To: Ilpo Järvinen, Matthew W Carlis, Bjorn Helgaas
  Cc: Mika Westerberg, Oliver O'Halloran, linux-pci, linux-kernel

When `pcie_failed_link_retrain' has failed to retrain the link by hand 
it leaves the link speed restricted to 2.5GT/s, which will then affect 
any device that has been plugged in later on, which may not suffer from 
the problem that caused the speed restriction to have been attempted.  
Consequently such a downstream device will suffer from an unnecessary 
communication throughput limitation and therefore performance loss.

Remove the speed restriction then and revert the Link Control 2 register 
to its original state if link retraining with the speed restriction in 
place has failed.  Retrain the link again afterwards so as to remove any 
residual state, waiting on LT rather than DLLLA to avoid an excessive 
delay and ignoring the result as this training is supposed to fail anyway.

Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Reported-by: Matthew W Carlis <mattc@purestorage.com>
Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/
Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: stable@vger.kernel.org # v6.5+
---
Changes from v2:

- Wait on LT rather than DLLLA with clean-up retraining with the speed 
  restriction lifted, so as to avoid an excessive delay as it's supposed 
  to fail anyway.

New change in v2.
---
 drivers/pci/quirks.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

linux-pcie-failed-link-retrain-fail-unclamp.diff
Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -66,7 +66,7 @@
  * apply this erratum workaround to any downstream ports as long as they
  * support Link Active reporting and have the Link Control 2 register.
  * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
- * request a retrain and wait 200ms for the data link to go up.
+ * request a retrain and check the result.
  *
  * If this turns out successful and we know by the Vendor:Device ID it is
  * safe to do so, then lift the restriction, letting the devices negotiate
@@ -74,6 +74,10 @@
  * firmware may have already arranged and lift it with ports that already
  * report their data link being up.
  *
+ * Otherwise revert the speed to the original setting and request a retrain
+ * again to remove any residual state, ignoring the result as it's supposed
+ * to fail anyway.
+ *
  * Return TRUE if the link has been successfully retrained, otherwise FALSE.
  */
 bool pcie_failed_link_retrain(struct pci_dev *dev)
@@ -92,6 +96,8 @@ bool pcie_failed_link_retrain(struct pci
 	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
 	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
 	    PCI_EXP_LNKSTA_LBMS) {
+		u16 oldlnkctl2 = lnkctl2;
+
 		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
 
 		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
@@ -100,6 +106,9 @@ bool pcie_failed_link_retrain(struct pci
 
 		if (pcie_retrain_link(dev, false)) {
 			pci_info(dev, "retraining failed\n");
+			pcie_capability_write_word(dev, PCI_EXP_LNKCTL2,
+						   oldlnkctl2);
+			pcie_retrain_link(dev, true);
 			return false;
 		}
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 3/4] PCI: Correct error reporting with PCIe failed link retraining
  2024-08-25 13:47 [PATCH v3 0/4] PCI: Rework error reporting with PCIe failed link retraining Maciej W. Rozycki
  2024-08-25 13:47 ` [PATCH v3 1/4] PCI: Clear the LBMS bit after a link retrain Maciej W. Rozycki
  2024-08-25 13:47 ` [PATCH v3 2/4] PCI: Revert to the original speed after PCIe failed link retraining Maciej W. Rozycki
@ 2024-08-25 13:47 ` Maciej W. Rozycki
  2024-08-25 13:47 ` [PATCH v3 4/4] PCI: Use an error code " Maciej W. Rozycki
  3 siblings, 0 replies; 8+ messages in thread
From: Maciej W. Rozycki @ 2024-08-25 13:47 UTC (permalink / raw)
  To: Ilpo Järvinen, Matthew W Carlis, Bjorn Helgaas
  Cc: Mika Westerberg, Oliver O'Halloran, linux-pci, linux-kernel

Only return successful completion status from `pcie_failed_link_retrain' 
if retraining has actually been done, preventing excessive delays from 
being triggered at call sites in a hope that communication will finally 
be established with the downstream device where in fact nothing has been 
done about the link in question that would justify such a hope.

Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/aa2d1c4e-9961-d54a-00c7-ddf8e858a9b0@linux.intel.com/
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org # v6.5+
---
No changes from v2.

Changes from v1 (superseding 1/2):

- Regenerate on top of 2/4.

- Reword the comment update for clarity.

- Go back to returning explicit FALSE rather than `ret' where it is known 
  that we failed.
---
 drivers/pci/quirks.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

linux-pcie-failed-link-retrain-status-fix.diff
Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -78,7 +78,8 @@
  * again to remove any residual state, ignoring the result as it's supposed
  * to fail anyway.
  *
- * Return TRUE if the link has been successfully retrained, otherwise FALSE.
+ * Return TRUE if the link has been successfully retrained.  Return FALSE
+ * if retraining was not needed or we attempted a retrain and it failed.
  */
 bool pcie_failed_link_retrain(struct pci_dev *dev)
 {
@@ -87,6 +88,7 @@ bool pcie_failed_link_retrain(struct pci
 		{}
 	};
 	u16 lnksta, lnkctl2;
+	bool ret = false;
 
 	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
 	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
@@ -104,7 +106,8 @@ bool pcie_failed_link_retrain(struct pci
 		lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
 		pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
 
-		if (pcie_retrain_link(dev, false)) {
+		ret = pcie_retrain_link(dev, false) == 0;
+		if (!ret) {
 			pci_info(dev, "retraining failed\n");
 			pcie_capability_write_word(dev, PCI_EXP_LNKCTL2,
 						   oldlnkctl2);
@@ -126,13 +129,14 @@ bool pcie_failed_link_retrain(struct pci
 		lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
 		pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
 
-		if (pcie_retrain_link(dev, false)) {
+		ret = pcie_retrain_link(dev, false) == 0;
+		if (!ret) {
 			pci_info(dev, "retraining failed\n");
 			return false;
 		}
 	}
 
-	return true;
+	return ret;
 }
 
 static ktime_t fixup_debug_start(struct pci_dev *dev,

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 4/4] PCI: Use an error code with PCIe failed link retraining
  2024-08-25 13:47 [PATCH v3 0/4] PCI: Rework error reporting with PCIe failed link retraining Maciej W. Rozycki
                   ` (2 preceding siblings ...)
  2024-08-25 13:47 ` [PATCH v3 3/4] PCI: Correct error reporting with " Maciej W. Rozycki
@ 2024-08-25 13:47 ` Maciej W. Rozycki
  3 siblings, 0 replies; 8+ messages in thread
From: Maciej W. Rozycki @ 2024-08-25 13:47 UTC (permalink / raw)
  To: Ilpo Järvinen, Matthew W Carlis, Bjorn Helgaas
  Cc: Mika Westerberg, Oliver O'Halloran, linux-pci, linux-kernel

Given how the call place in `pcie_wait_for_link_delay' got structured 
now, and that `pcie_retrain_link' returns a potentially useful error 
code, convert `pcie_failed_link_retrain' to return an error code rather 
than a boolean status, fixing handling at the call site mentioned.  
Update the other call site accordingly.

Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'")
Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/aa2d1c4e-9961-d54a-00c7-ddf8e858a9b0@linux.intel.com/
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org # v6.5+
---
Changes from v2:

- Regenerate following changes to 2/4, no functional change.

Changes from v1 (superseding 2/2):

- Regenerate following changes to 3/4, no functional change.
---
 drivers/pci/pci.c    |    2 +-
 drivers/pci/pci.h    |    6 +++---
 drivers/pci/quirks.c |   20 ++++++++++----------
 3 files changed, 14 insertions(+), 14 deletions(-)

linux-pcie-failed-link-retrain-status-int.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -1324,7 +1324,7 @@ static int pci_dev_wait(struct pci_dev *
 		if (delay > PCI_RESET_WAIT) {
 			if (retrain) {
 				retrain = false;
-				if (pcie_failed_link_retrain(bridge)) {
+				if (pcie_failed_link_retrain(bridge) == 0) {
 					delay = 1;
 					continue;
 				}
Index: linux-macro/drivers/pci/pci.h
===================================================================
--- linux-macro.orig/drivers/pci/pci.h
+++ linux-macro/drivers/pci/pci.h
@@ -606,7 +606,7 @@ void pci_acs_init(struct pci_dev *dev);
 int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags);
 int pci_dev_specific_enable_acs(struct pci_dev *dev);
 int pci_dev_specific_disable_acs_redir(struct pci_dev *dev);
-bool pcie_failed_link_retrain(struct pci_dev *dev);
+int pcie_failed_link_retrain(struct pci_dev *dev);
 #else
 static inline int pci_dev_specific_acs_enabled(struct pci_dev *dev,
 					       u16 acs_flags)
@@ -621,9 +621,9 @@ static inline int pci_dev_specific_disab
 {
 	return -ENOTTY;
 }
-static inline bool pcie_failed_link_retrain(struct pci_dev *dev)
+static inline int pcie_failed_link_retrain(struct pci_dev *dev)
 {
-	return false;
+	return -ENOTTY;
 }
 #endif
 
Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -78,21 +78,21 @@
  * again to remove any residual state, ignoring the result as it's supposed
  * to fail anyway.
  *
- * Return TRUE if the link has been successfully retrained.  Return FALSE
+ * Return 0 if the link has been successfully retrained.  Return an error
  * if retraining was not needed or we attempted a retrain and it failed.
  */
-bool pcie_failed_link_retrain(struct pci_dev *dev)
+int pcie_failed_link_retrain(struct pci_dev *dev)
 {
 	static const struct pci_device_id ids[] = {
 		{ PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
 		{}
 	};
 	u16 lnksta, lnkctl2;
-	bool ret = false;
+	int ret = -ENOTTY;
 
 	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
 	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
-		return false;
+		return ret;
 
 	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
 	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
@@ -106,13 +106,13 @@ bool pcie_failed_link_retrain(struct pci
 		lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
 		pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
 
-		ret = pcie_retrain_link(dev, false) == 0;
-		if (!ret) {
+		ret = pcie_retrain_link(dev, false);
+		if (ret) {
 			pci_info(dev, "retraining failed\n");
 			pcie_capability_write_word(dev, PCI_EXP_LNKCTL2,
 						   oldlnkctl2);
 			pcie_retrain_link(dev, true);
-			return false;
+			return ret;
 		}
 
 		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
@@ -129,10 +129,10 @@ bool pcie_failed_link_retrain(struct pci
 		lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
 		pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
 
-		ret = pcie_retrain_link(dev, false) == 0;
-		if (!ret) {
+		ret = pcie_retrain_link(dev, false);
+		if (ret) {
 			pci_info(dev, "retraining failed\n");
-			return false;
+			return ret;
 		}
 	}
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 2/4] PCI: Revert to the original speed after PCIe failed link retraining
  2024-08-25 13:47 ` [PATCH v3 2/4] PCI: Revert to the original speed after PCIe failed link retraining Maciej W. Rozycki
@ 2024-08-26  9:16   ` Ilpo Järvinen
  2024-09-09 12:50     ` Ilpo Järvinen
  0 siblings, 1 reply; 8+ messages in thread
From: Ilpo Järvinen @ 2024-08-26  9:16 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Matthew W Carlis, Bjorn Helgaas, Mika Westerberg,
	Oliver O'Halloran, linux-pci, LKML

[-- Attachment #1: Type: text/plain, Size: 3616 bytes --]

On Sun, 25 Aug 2024, Maciej W. Rozycki wrote:

> When `pcie_failed_link_retrain' has failed to retrain the link by hand 
> it leaves the link speed restricted to 2.5GT/s, which will then affect 
> any device that has been plugged in later on, which may not suffer from 
> the problem that caused the speed restriction to have been attempted.  
> Consequently such a downstream device will suffer from an unnecessary 
> communication throughput limitation and therefore performance loss.
> 
> Remove the speed restriction then and revert the Link Control 2 register 
> to its original state if link retraining with the speed restriction in 
> place has failed.  Retrain the link again afterwards so as to remove any 
> residual state, waiting on LT rather than DLLLA to avoid an excessive 
> delay and ignoring the result as this training is supposed to fail anyway.
> 
> Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
> Reported-by: Matthew W Carlis <mattc@purestorage.com>
> Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/
> Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/
> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
> Cc: stable@vger.kernel.org # v6.5+
> ---
> Changes from v2:
> 
> - Wait on LT rather than DLLLA with clean-up retraining with the speed 
>   restriction lifted, so as to avoid an excessive delay as it's supposed 
>   to fail anyway.
> 
> New change in v2.
> ---
>  drivers/pci/quirks.c |   11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> linux-pcie-failed-link-retrain-fail-unclamp.diff
> Index: linux-macro/drivers/pci/quirks.c
> ===================================================================
> --- linux-macro.orig/drivers/pci/quirks.c
> +++ linux-macro/drivers/pci/quirks.c
> @@ -66,7 +66,7 @@
>   * apply this erratum workaround to any downstream ports as long as they
>   * support Link Active reporting and have the Link Control 2 register.
>   * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
> - * request a retrain and wait 200ms for the data link to go up.
> + * request a retrain and check the result.
>   *
>   * If this turns out successful and we know by the Vendor:Device ID it is
>   * safe to do so, then lift the restriction, letting the devices negotiate
> @@ -74,6 +74,10 @@
>   * firmware may have already arranged and lift it with ports that already
>   * report their data link being up.
>   *
> + * Otherwise revert the speed to the original setting and request a retrain
> + * again to remove any residual state, ignoring the result as it's supposed
> + * to fail anyway.
> + *
>   * Return TRUE if the link has been successfully retrained, otherwise FALSE.
>   */
>  bool pcie_failed_link_retrain(struct pci_dev *dev)
> @@ -92,6 +96,8 @@ bool pcie_failed_link_retrain(struct pci
>  	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
>  	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
>  	    PCI_EXP_LNKSTA_LBMS) {
> +		u16 oldlnkctl2 = lnkctl2;
> +
>  		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
>  
>  		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> @@ -100,6 +106,9 @@ bool pcie_failed_link_retrain(struct pci
>  
>  		if (pcie_retrain_link(dev, false)) {
>  			pci_info(dev, "retraining failed\n");
> +			pcie_capability_write_word(dev, PCI_EXP_LNKCTL2,
> +						   oldlnkctl2);
> +			pcie_retrain_link(dev, true);

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

-- 
 i.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 2/4] PCI: Revert to the original speed after PCIe failed link retraining
  2024-08-26  9:16   ` Ilpo Järvinen
@ 2024-09-09 12:50     ` Ilpo Järvinen
  2024-09-09 14:11       ` Krzysztof Wilczyński
  0 siblings, 1 reply; 8+ messages in thread
From: Ilpo Järvinen @ 2024-09-09 12:50 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Maciej W. Rozycki, Matthew W Carlis, Mika Westerberg,
	Oliver O'Halloran, linux-pci, LKML

[-- Attachment #1: Type: text/plain, Size: 4050 bytes --]

Hi Bjorn,

You still seem to have the old version of this patch in enumeration 
branch.

Could you please consider replacing it with this v3 one that is slightly 
better (use_lt was changed to true because it makes more sense).

On Mon, 26 Aug 2024, Ilpo Järvinen wrote:
> On Sun, 25 Aug 2024, Maciej W. Rozycki wrote:
> 
> > When `pcie_failed_link_retrain' has failed to retrain the link by hand 
> > it leaves the link speed restricted to 2.5GT/s, which will then affect 
> > any device that has been plugged in later on, which may not suffer from 
> > the problem that caused the speed restriction to have been attempted.  
> > Consequently such a downstream device will suffer from an unnecessary 
> > communication throughput limitation and therefore performance loss.
> > 
> > Remove the speed restriction then and revert the Link Control 2 register 
> > to its original state if link retraining with the speed restriction in 
> > place has failed.  Retrain the link again afterwards so as to remove any 
> > residual state, waiting on LT rather than DLLLA to avoid an excessive 
> > delay and ignoring the result as this training is supposed to fail anyway.
> > 
> > Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
> > Reported-by: Matthew W Carlis <mattc@purestorage.com>
> > Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/
> > Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/
> > Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
> > Cc: stable@vger.kernel.org # v6.5+
> > ---
> > Changes from v2:
> > 
> > - Wait on LT rather than DLLLA with clean-up retraining with the speed 
> >   restriction lifted, so as to avoid an excessive delay as it's supposed 
> >   to fail anyway.
> > 
> > New change in v2.
> > ---
> >  drivers/pci/quirks.c |   11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > linux-pcie-failed-link-retrain-fail-unclamp.diff
> > Index: linux-macro/drivers/pci/quirks.c
> > ===================================================================
> > --- linux-macro.orig/drivers/pci/quirks.c
> > +++ linux-macro/drivers/pci/quirks.c
> > @@ -66,7 +66,7 @@
> >   * apply this erratum workaround to any downstream ports as long as they
> >   * support Link Active reporting and have the Link Control 2 register.
> >   * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
> > - * request a retrain and wait 200ms for the data link to go up.
> > + * request a retrain and check the result.
> >   *
> >   * If this turns out successful and we know by the Vendor:Device ID it is
> >   * safe to do so, then lift the restriction, letting the devices negotiate
> > @@ -74,6 +74,10 @@
> >   * firmware may have already arranged and lift it with ports that already
> >   * report their data link being up.
> >   *
> > + * Otherwise revert the speed to the original setting and request a retrain
> > + * again to remove any residual state, ignoring the result as it's supposed
> > + * to fail anyway.
> > + *
> >   * Return TRUE if the link has been successfully retrained, otherwise FALSE.
> >   */
> >  bool pcie_failed_link_retrain(struct pci_dev *dev)
> > @@ -92,6 +96,8 @@ bool pcie_failed_link_retrain(struct pci
> >  	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> >  	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
> >  	    PCI_EXP_LNKSTA_LBMS) {
> > +		u16 oldlnkctl2 = lnkctl2;
> > +
> >  		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
> >  
> >  		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> > @@ -100,6 +106,9 @@ bool pcie_failed_link_retrain(struct pci
> >  
> >  		if (pcie_retrain_link(dev, false)) {
> >  			pci_info(dev, "retraining failed\n");
> > +			pcie_capability_write_word(dev, PCI_EXP_LNKCTL2,
> > +						   oldlnkctl2);
> > +			pcie_retrain_link(dev, true);
> 
> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>


-- 
 i.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 2/4] PCI: Revert to the original speed after PCIe failed link retraining
  2024-09-09 12:50     ` Ilpo Järvinen
@ 2024-09-09 14:11       ` Krzysztof Wilczyński
  0 siblings, 0 replies; 8+ messages in thread
From: Krzysztof Wilczyński @ 2024-09-09 14:11 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Bjorn Helgaas, Maciej W. Rozycki, Matthew W Carlis,
	Mika Westerberg, Oliver O'Halloran, linux-pci, LKML

Hello,

> You still seem to have the old version of this patch in enumeration 
> branch.
> 
> Could you please consider replacing it with this v3 one that is slightly 
> better (use_lt was changed to true because it makes more sense).

Done.  Should be up to date now.

	Krzysztof

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-09-09 14:11 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-25 13:47 [PATCH v3 0/4] PCI: Rework error reporting with PCIe failed link retraining Maciej W. Rozycki
2024-08-25 13:47 ` [PATCH v3 1/4] PCI: Clear the LBMS bit after a link retrain Maciej W. Rozycki
2024-08-25 13:47 ` [PATCH v3 2/4] PCI: Revert to the original speed after PCIe failed link retraining Maciej W. Rozycki
2024-08-26  9:16   ` Ilpo Järvinen
2024-09-09 12:50     ` Ilpo Järvinen
2024-09-09 14:11       ` Krzysztof Wilczyński
2024-08-25 13:47 ` [PATCH v3 3/4] PCI: Correct error reporting with " Maciej W. Rozycki
2024-08-25 13:47 ` [PATCH v3 4/4] PCI: Use an error code " Maciej W. Rozycki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox