[PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures
@ 2023-06-11 17:19 Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 01/14] PCI: pciehp: Rely on `link_active_reporting' Maciej W. Rozycki
                   ` (14 more replies)
  0 siblings, 15 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:19 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Hi,

 This is v9 of the change to work around a PCIe link training phenomenon 
where a pair of devices both capable of operating at a link speed above 
2.5GT/s seems unable to negotiate the link speed and continues training 
indefinitely with the Link Training bit switching on and off repeatedly 
and the data link layer never reaching the active state.

 With several requests addressed and a few extra issues spotted this
version has now grown to 14 patches.  It has been verified for device 
enumeration with and without PCI_QUIRKS enabled, using the same piece of 
RISC-V hardware as previously.  Hot plug or reset events have not been 
verified, as this is difficult if at all feasible with hardware in 
question.

 Last iteration: 
<https://lore.kernel.org/r/alpine.DEB.2.21.2304060100160.13659@angie.orcam.me.uk/>, 
and my input to it:
<https://lore.kernel.org/r/alpine.DEB.2.21.2306080224280.36323@angie.orcam.me.uk/>.

  Maciej

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 01/14] PCI: pciehp: Rely on `link_active_reporting'
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
@ 2023-06-11 17:19 ` Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 02/14] PCI: Export PCIe link retrain timeout Maciej W. Rozycki
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:19 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Use `link_active_reporting' to determine whether Data Link Layer Link 
Active Reporting is available rather than re-retrieving the capability.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
---
NB this has been compile-tested only with PPC64LE and x86-64
configurations.

No change from v8.

Changes from v7:

- Add Reviewed-by: tag by Lukas Wunner.

- Reorder from 6/7.

No change from v6.

New change in v6.
---
 drivers/pci/hotplug/pciehp_hpc.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

linux-pcie-link-active-reporting-hpc.diff
Index: linux-macro/drivers/pci/hotplug/pciehp_hpc.c
===================================================================
--- linux-macro.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-macro/drivers/pci/hotplug/pciehp_hpc.c
@@ -984,7 +984,7 @@ static inline int pcie_hotplug_depth(str
 struct controller *pcie_init(struct pcie_device *dev)
 {
 	struct controller *ctrl;
-	u32 slot_cap, slot_cap2, link_cap;
+	u32 slot_cap, slot_cap2;
 	u8 poweron;
 	struct pci_dev *pdev = dev->port;
 	struct pci_bus *subordinate = pdev->subordinate;
@@ -1030,9 +1030,6 @@ struct controller *pcie_init(struct pcie
 	if (dmi_first_match(inband_presence_disabled_dmi_table))
 		ctrl->inband_presence_disabled = 1;
 
-	/* Check if Data Link Layer Link Active Reporting is implemented */
-	pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &link_cap);
-
 	/* Clear all remaining event bits in Slot Status register. */
 	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
 		PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
@@ -1051,7 +1048,7 @@ struct controller *pcie_init(struct pcie
 		FLAG(slot_cap, PCI_EXP_SLTCAP_EIP),
 		FLAG(slot_cap, PCI_EXP_SLTCAP_NCCS),
 		FLAG(slot_cap2, PCI_EXP_SLTCAP2_IBPD),
-		FLAG(link_cap, PCI_EXP_LNKCAP_DLLLARC),
+		FLAG(pdev->link_active_reporting, true),
 		pdev->broken_cmd_compl ? " (with Cmd Compl erratum)" : "");
 
 	/*

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 02/14] PCI: Export PCIe link retrain timeout
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 01/14] PCI: pciehp: Rely on `link_active_reporting' Maciej W. Rozycki
@ 2023-06-11 17:19 ` Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 03/14] PCI: Execute `quirk_enable_clear_retrain_link' earlier Maciej W. Rozycki
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:19 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Convert LINK_RETRAIN_TIMEOUT from jiffies to milliseconds, accordingly 
rename to PCIE_LINK_RETRAIN_TIMEOUT_MS, and make available via "pci.h" 
for PCI drivers to use.  Use in `pcie_wait_for_link_delay'.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
Changes from v8:

- Convert LINK_RETRAIN_TIMEOUT from jiffies to milliseconds, rename it to
  PCIE_LINK_RETRAIN_TIMEOUT_MS rather than PCIE_LINK_RETRAIN_TIMEOUT, and 
  adjust its use accordingly.

- Also replace hardcoded 1000 in `pcie_wait_for_link_delay'.

- Correct the change heading, s/PCI/PCIe/ for the link reference.

Changes from v7:

- Reorder from 1/7.

No change from v6.

No change from v5.

New change in v5.
---
 drivers/pci/pci.c       |    2 +-
 drivers/pci/pci.h       |    2 ++
 drivers/pci/pcie/aspm.c |    4 +---
 3 files changed, 4 insertions(+), 4 deletions(-)

linux-pcie-link-retrain-timeout.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -4860,7 +4860,7 @@ static int pci_pm_reset(struct pci_dev *
 static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
 				     int delay)
 {
-	int timeout = 1000;
+	int timeout = PCIE_LINK_RETRAIN_TIMEOUT_MS;
 	bool ret;
 	u16 lnk_status;
 
Index: linux-macro/drivers/pci/pci.h
===================================================================
--- linux-macro.orig/drivers/pci/pci.h
+++ linux-macro/drivers/pci/pci.h
@@ -11,6 +11,8 @@
 
 #define PCI_VSEC_ID_INTEL_TBT	0x1234	/* Thunderbolt */
 
+#define PCIE_LINK_RETRAIN_TIMEOUT_MS	1000
+
 extern const unsigned char pcie_link_speed[];
 extern bool pci_early_dump;
 
Index: linux-macro/drivers/pci/pcie/aspm.c
===================================================================
--- linux-macro.orig/drivers/pci/pcie/aspm.c
+++ linux-macro/drivers/pci/pcie/aspm.c
@@ -90,8 +90,6 @@ static const char *policy_str[] = {
 	[POLICY_POWER_SUPERSAVE] = "powersupersave"
 };
 
-#define LINK_RETRAIN_TIMEOUT HZ
-
 /*
  * The L1 PM substate capability is only implemented in function 0 in a
  * multi function device.
@@ -213,7 +211,7 @@ static bool pcie_retrain_link(struct pci
 	}
 
 	/* Wait for link training end. Break out after waiting for timeout */
-	end_jiffies = jiffies + LINK_RETRAIN_TIMEOUT;
+	end_jiffies = jiffies + msecs_to_jiffies(PCIE_LINK_RETRAIN_TIMEOUT_MS);
 	do {
 		pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &reg16);
 		if (!(reg16 & PCI_EXP_LNKSTA_LT))

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 03/14] PCI: Execute `quirk_enable_clear_retrain_link' earlier
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 01/14] PCI: pciehp: Rely on `link_active_reporting' Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 02/14] PCI: Export PCIe link retrain timeout Maciej W. Rozycki
@ 2023-06-11 17:19 ` Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 04/14] PCI: Initialize `link_active_reporting' earlier Maciej W. Rozycki
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:19 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Make `quirk_enable_clear_retrain_link' `pci_fixup_early' so that any later 
fixups can rely on `clear_retrain_link' to have been already initialised.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
No change from v8.

Changes from v7:

- Reorder from 2/7.

No change from v6.

No change from v5.

New change in v5.
---
 drivers/pci/quirks.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

linux-pcie-clear-retrain-link-early.diff
Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -2407,9 +2407,9 @@ static void quirk_enable_clear_retrain_l
 	dev->clear_retrain_link = 1;
 	pci_info(dev, "Enable PCIe Retrain Link quirk\n");
 }
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PERICOM, 0xe110, quirk_enable_clear_retrain_link);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PERICOM, 0xe111, quirk_enable_clear_retrain_link);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PERICOM, 0xe130, quirk_enable_clear_retrain_link);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_PERICOM, 0xe110, quirk_enable_clear_retrain_link);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_PERICOM, 0xe111, quirk_enable_clear_retrain_link);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_PERICOM, 0xe130, quirk_enable_clear_retrain_link);
 
 static void fixup_rev1_53c810(struct pci_dev *dev)
 {

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 04/14] PCI: Initialize `link_active_reporting' earlier
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (2 preceding siblings ...)
  2023-06-11 17:19 ` [PATCH v9 03/14] PCI: Execute `quirk_enable_clear_retrain_link' earlier Maciej W. Rozycki
@ 2023-06-11 17:19 ` Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 05/14] powerpc/eeh: Rely on `link_active_reporting' Maciej W. Rozycki
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:19 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Determine whether Data Link Layer Link Active Reporting is available 
ahead of calling any fixups so that the cached value can be used there 
and later on.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
No change from v8.

Changes from v7:

- Reorder from 3/7.

Changes from v6:

- Regenerate against 6.3-rc5.

New change in v6.
---
 drivers/pci/probe.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

linux-pcie-link-active-reporting-early.diff
Index: linux-macro/drivers/pci/probe.c
===================================================================
--- linux-macro.orig/drivers/pci/probe.c
+++ linux-macro/drivers/pci/probe.c
@@ -820,7 +820,6 @@ static void pci_set_bus_speed(struct pci
 
 		pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap);
 		bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS];
-		bridge->link_active_reporting = !!(linkcap & PCI_EXP_LNKCAP_DLLLARC);
 
 		pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
 		pcie_update_link_speed(bus, linksta);
@@ -1829,6 +1828,7 @@ int pci_setup_device(struct pci_dev *dev
 	int err, pos = 0;
 	struct pci_bus_region region;
 	struct resource *res;
+	u32 linkcap;
 
 	hdr_type = pci_hdr_type(dev);
 
@@ -1876,6 +1876,10 @@ int pci_setup_device(struct pci_dev *dev
 	/* "Unknown power state" */
 	dev->current_state = PCI_UNKNOWN;
 
+	/* Set it early to make it available to fixups, etc.  */
+	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
+	dev->link_active_reporting = !!(linkcap & PCI_EXP_LNKCAP_DLLLARC);
+
 	/* Early fixups, before probing the BARs */
 	pci_fixup_device(pci_fixup_early, dev);
 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 05/14] powerpc/eeh: Rely on `link_active_reporting'
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (3 preceding siblings ...)
  2023-06-11 17:19 ` [PATCH v9 04/14] PCI: Initialize `link_active_reporting' earlier Maciej W. Rozycki
@ 2023-06-11 17:19 ` Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 06/14] net/mlx5: " Maciej W. Rozycki
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:19 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Use `link_active_reporting' to determine whether Data Link Layer Link 
Active Reporting is available rather than re-retrieving the capability.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
NB this has been compile-tested only with a PPC64LE configuration.

No change from v8.

Changes from v7:

- Reorder from 4/7.

No change from v6.

New change in v6.
---
 arch/powerpc/kernel/eeh_pe.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

linux-pcie-link-active-reporting-eeh.diff
Index: linux-macro/arch/powerpc/kernel/eeh_pe.c
===================================================================
--- linux-macro.orig/arch/powerpc/kernel/eeh_pe.c
+++ linux-macro/arch/powerpc/kernel/eeh_pe.c
@@ -671,9 +671,8 @@ static void eeh_bridge_check_link(struct
 	eeh_ops->write_config(edev, cap + PCI_EXP_LNKCTL, 2, val);
 
 	/* Check link */
-	eeh_ops->read_config(edev, cap + PCI_EXP_LNKCAP, 4, &val);
-	if (!(val & PCI_EXP_LNKCAP_DLLLARC)) {
-		eeh_edev_dbg(edev, "No link reporting capability (0x%08x) \n", val);
+	if (!edev->pdev->link_active_reporting) {
+		eeh_edev_dbg(edev, "No link reporting capability\n");
 		msleep(1000);
 		return;
 	}

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 06/14] net/mlx5: Rely on `link_active_reporting'
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (4 preceding siblings ...)
  2023-06-11 17:19 ` [PATCH v9 05/14] powerpc/eeh: Rely on `link_active_reporting' Maciej W. Rozycki
@ 2023-06-11 17:19 ` Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 07/14] PCI: Export `pcie_retrain_link' for use outside ASPM Maciej W. Rozycki
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:19 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Use `link_active_reporting' to determine whether Data Link Layer Link 
Active Reporting is available rather than re-retrieving the capability.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
NB this has been compile-tested only with PPC64LE and x86-64 
configurations.

No change from v8.

Changes from v7:

- Reorder from 5/7.

Changes from v6:

- Regenerate against 6.3-rc5.

New change in v6.
---
 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

linux-pcie-link-active-reporting-mlx5.diff
Index: linux-macro/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
===================================================================
--- linux-macro.orig/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ linux-macro/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -307,7 +307,6 @@ static int mlx5_pci_link_toggle(struct m
 	unsigned long timeout;
 	struct pci_dev *sdev;
 	int cap, err;
-	u32 reg32;
 
 	/* Check that all functions under the pci bridge are PFs of
 	 * this device otherwise fail this function.
@@ -346,11 +345,8 @@ static int mlx5_pci_link_toggle(struct m
 		return err;
 
 	/* Check link */
-	err = pci_read_config_dword(bridge, cap + PCI_EXP_LNKCAP, &reg32);
-	if (err)
-		return err;
-	if (!(reg32 & PCI_EXP_LNKCAP_DLLLARC)) {
-		mlx5_core_warn(dev, "No PCI link reporting capability (0x%08x)\n", reg32);
+	if (!bridge->link_active_reporting) {
+		mlx5_core_warn(dev, "No PCI link reporting capability\n");
 		msleep(1000);
 		goto restore;
 	}

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 07/14] PCI: Export `pcie_retrain_link' for use outside ASPM
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (5 preceding siblings ...)
  2023-06-11 17:19 ` [PATCH v9 06/14] net/mlx5: " Maciej W. Rozycki
@ 2023-06-11 17:19 ` Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 08/14] PCI: Use distinct local vars in `pcie_retrain_link' Maciej W. Rozycki
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:19 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Export `pcie_retrain_link' for link retrain needs outside ASPM.  There 
is no functional change at this point, but `struct pcie_link_state' is 
local to ASPM and not used within `pcie_retrain_link' other than to get 
at the associated PCI device, so change the operand and adjust the lone 
call site accordingly.  Document the interface.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
There's a missing full stop added in a comment in the course of the move, 
not worth mentioning in the change description IMHO and not worth its own 
change either.  This comment will go away in a subsequent change anyway.

New change in v9.
---
 drivers/pci/pci.c       |   36 ++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h       |    1 +
 drivers/pci/pcie/aspm.c |   32 +-------------------------------
 3 files changed, 38 insertions(+), 31 deletions(-)

linux-pcie-retrain-link-export.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -4912,6 +4912,42 @@ bool pcie_wait_for_link(struct pci_dev *
 	return pcie_wait_for_link_delay(pdev, active, 100);
 }
 
+/**
+ * pcie_retrain_link - Request a link retrain and wait for it to complete
+ * @pdev: Device whose link to retrain.
+ *
+ * Return TRUE if successful, or FALSE if training has not completed
+ * within PCIE_LINK_RETRAIN_TIMEOUT_MS milliseconds.
+ */
+bool pcie_retrain_link(struct pci_dev *pdev)
+{
+	unsigned long end_jiffies;
+	u16 reg16;
+
+	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &reg16);
+	reg16 |= PCI_EXP_LNKCTL_RL;
+	pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, reg16);
+	if (pdev->clear_retrain_link) {
+		/*
+		 * Due to an erratum in some devices the Retrain Link bit
+		 * needs to be cleared again manually to allow the link
+		 * training to succeed.
+		 */
+		reg16 &= ~PCI_EXP_LNKCTL_RL;
+		pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, reg16);
+	}
+
+	/* Wait for link training end. Break out after waiting for timeout. */
+	end_jiffies = jiffies + msecs_to_jiffies(PCIE_LINK_RETRAIN_TIMEOUT_MS);
+	do {
+		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &reg16);
+		if (!(reg16 & PCI_EXP_LNKSTA_LT))
+			break;
+		msleep(1);
+	} while (time_before(jiffies, end_jiffies));
+	return !(reg16 & PCI_EXP_LNKSTA_LT);
+}
+
 /*
  * Find maximum D3cold delay required by all the devices on the bus.  The
  * spec says 100 ms, but firmware can lower it and we allow drivers to
Index: linux-macro/drivers/pci/pci.h
===================================================================
--- linux-macro.orig/drivers/pci/pci.h
+++ linux-macro/drivers/pci/pci.h
@@ -561,6 +561,7 @@ pci_ers_result_t pcie_do_recovery(struct
 		pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev));
 
 bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
+bool pcie_retrain_link(struct pci_dev *pdev);
 #ifdef CONFIG_PCIEASPM
 void pcie_aspm_init_link_state(struct pci_dev *pdev);
 void pcie_aspm_exit_link_state(struct pci_dev *pdev);
Index: linux-macro/drivers/pci/pcie/aspm.c
===================================================================
--- linux-macro.orig/drivers/pci/pcie/aspm.c
+++ linux-macro/drivers/pci/pcie/aspm.c
@@ -191,36 +191,6 @@ static void pcie_clkpm_cap_init(struct p
 	link->clkpm_disable = blacklist ? 1 : 0;
 }
 
-static bool pcie_retrain_link(struct pcie_link_state *link)
-{
-	struct pci_dev *parent = link->pdev;
-	unsigned long end_jiffies;
-	u16 reg16;
-
-	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
-	reg16 |= PCI_EXP_LNKCTL_RL;
-	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
-	if (parent->clear_retrain_link) {
-		/*
-		 * Due to an erratum in some devices the Retrain Link bit
-		 * needs to be cleared again manually to allow the link
-		 * training to succeed.
-		 */
-		reg16 &= ~PCI_EXP_LNKCTL_RL;
-		pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
-	}
-
-	/* Wait for link training end. Break out after waiting for timeout */
-	end_jiffies = jiffies + msecs_to_jiffies(PCIE_LINK_RETRAIN_TIMEOUT_MS);
-	do {
-		pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &reg16);
-		if (!(reg16 & PCI_EXP_LNKSTA_LT))
-			break;
-		msleep(1);
-	} while (time_before(jiffies, end_jiffies));
-	return !(reg16 & PCI_EXP_LNKSTA_LT);
-}
-
 /*
  * pcie_aspm_configure_common_clock: check if the 2 ends of a link
  *   could use common clock. If they are, configure them to use the
@@ -287,7 +257,7 @@ static void pcie_aspm_configure_common_c
 		reg16 &= ~PCI_EXP_LNKCTL_CCC;
 	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
 
-	if (pcie_retrain_link(link))
+	if (pcie_retrain_link(link->pdev))
 		return;
 
 	/* Training failed. Restore common clock configurations */

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 08/14] PCI: Use distinct local vars in `pcie_retrain_link'
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (6 preceding siblings ...)
  2023-06-11 17:19 ` [PATCH v9 07/14] PCI: Export `pcie_retrain_link' for use outside ASPM Maciej W. Rozycki
@ 2023-06-11 17:19 ` Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 09/14] PCI: Factor our waiting for link training end Maciej W. Rozycki
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:19 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Use separate local variables to hold the respective values retrieved 
from the Link Control Register and the Link Status Register.  Not only 
it improves readability, but it makes it possible for the compiler to 
detect actual uninitialised use should this code change in the future.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
New change in v9.
---
 drivers/pci/pci.c |   19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

linux-pcie-retrain-link-lnkctl-lnksta.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -4922,30 +4922,31 @@ bool pcie_wait_for_link(struct pci_dev *
 bool pcie_retrain_link(struct pci_dev *pdev)
 {
 	unsigned long end_jiffies;
-	u16 reg16;
+	u16 lnkctl;
+	u16 lnksta;
 
-	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &reg16);
-	reg16 |= PCI_EXP_LNKCTL_RL;
-	pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, reg16);
+	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnkctl);
+	lnkctl |= PCI_EXP_LNKCTL_RL;
+	pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
 	if (pdev->clear_retrain_link) {
 		/*
 		 * Due to an erratum in some devices the Retrain Link bit
 		 * needs to be cleared again manually to allow the link
 		 * training to succeed.
 		 */
-		reg16 &= ~PCI_EXP_LNKCTL_RL;
-		pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, reg16);
+		lnkctl &= ~PCI_EXP_LNKCTL_RL;
+		pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
 	}
 
 	/* Wait for link training end. Break out after waiting for timeout. */
 	end_jiffies = jiffies + msecs_to_jiffies(PCIE_LINK_RETRAIN_TIMEOUT_MS);
 	do {
-		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &reg16);
-		if (!(reg16 & PCI_EXP_LNKSTA_LT))
+		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
+		if (!(lnksta & PCI_EXP_LNKSTA_LT))
 			break;
 		msleep(1);
 	} while (time_before(jiffies, end_jiffies));
-	return !(reg16 & PCI_EXP_LNKSTA_LT);
+	return !(lnksta & PCI_EXP_LNKSTA_LT);
 }
 
 /*

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 09/14] PCI: Factor our waiting for link training end
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (7 preceding siblings ...)
  2023-06-11 17:19 ` [PATCH v9 08/14] PCI: Use distinct local vars in `pcie_retrain_link' Maciej W. Rozycki
@ 2023-06-11 17:19 ` Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 10/14] PCI: Add support for polling DLLLA to `pcie_retrain_link' Maciej W. Rozycki
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:19 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Move code polling for the Link Training bit to clear into a function of 
its own.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
New change in v9.
---
 drivers/pci/pci.c |   37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

linux-pcie-wait-for-link-status.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -4850,6 +4850,28 @@ static int pci_pm_reset(struct pci_dev *
 }
 
 /**
+ * pcie_wait_for_link_status - Wait for link training end
+ * @pdev: Device whose link to wait for.
+ *
+ * Return TRUE if successful, or FALSE if training has not completed
+ * within PCIE_LINK_RETRAIN_TIMEOUT_MS milliseconds.
+ */
+static bool pcie_wait_for_link_status(struct pci_dev *pdev)
+{
+	unsigned long end_jiffies;
+	u16 lnksta;
+
+	end_jiffies = jiffies + msecs_to_jiffies(PCIE_LINK_RETRAIN_TIMEOUT_MS);
+	do {
+		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
+		if (!(lnksta & PCI_EXP_LNKSTA_LT))
+			break;
+		msleep(1);
+	} while (time_before(jiffies, end_jiffies));
+	return !(lnksta & PCI_EXP_LNKSTA_LT);
+}
+
+/**
  * pcie_wait_for_link_delay - Wait until link is active or inactive
  * @pdev: Bridge device
  * @active: waiting for active or inactive?
@@ -4916,14 +4938,11 @@ bool pcie_wait_for_link(struct pci_dev *
  * pcie_retrain_link - Request a link retrain and wait for it to complete
  * @pdev: Device whose link to retrain.
  *
- * Return TRUE if successful, or FALSE if training has not completed
- * within PCIE_LINK_RETRAIN_TIMEOUT_MS milliseconds.
+ * Return TRUE if successful, or FALSE if training has not completed.
  */
 bool pcie_retrain_link(struct pci_dev *pdev)
 {
-	unsigned long end_jiffies;
 	u16 lnkctl;
-	u16 lnksta;
 
 	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnkctl);
 	lnkctl |= PCI_EXP_LNKCTL_RL;
@@ -4938,15 +4957,7 @@ bool pcie_retrain_link(struct pci_dev *p
 		pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
 	}
 
-	/* Wait for link training end. Break out after waiting for timeout. */
-	end_jiffies = jiffies + msecs_to_jiffies(PCIE_LINK_RETRAIN_TIMEOUT_MS);
-	do {
-		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
-		if (!(lnksta & PCI_EXP_LNKSTA_LT))
-			break;
-		msleep(1);
-	} while (time_before(jiffies, end_jiffies));
-	return !(lnksta & PCI_EXP_LNKSTA_LT);
+	return pcie_wait_for_link_status(pdev);
 }
 
 /*

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 10/14] PCI: Add support for polling DLLLA to `pcie_retrain_link'
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (8 preceding siblings ...)
  2023-06-11 17:19 ` [PATCH v9 09/14] PCI: Factor our waiting for link training end Maciej W. Rozycki
@ 2023-06-11 17:19 ` Maciej W. Rozycki
  2023-06-11 17:19 ` [PATCH v9 11/14] PCI: Use `pcie_wait_for_link_status' in `pcie_wait_for_link_delay' Maciej W. Rozycki
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:19 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Let the caller of `pcie_retrain_link' specify whether they want to use 
the LT bit or the DLLLA bit of the Link Status Register to determine if 
link training has completed.  It is up to the caller to verify whether 
the use of the DLLLA bit, the implementation of which is optional, is 
valid for the device requested.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
New change in v9.
---
 drivers/pci/pci.c       |   28 ++++++++++++++++++++--------
 drivers/pci/pci.h       |    2 +-
 drivers/pci/pcie/aspm.c |    2 +-
 3 files changed, 22 insertions(+), 10 deletions(-)

linux-pcie-retrain-link-use-lt.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -4850,25 +4850,32 @@ static int pci_pm_reset(struct pci_dev *
 }
 
 /**
- * pcie_wait_for_link_status - Wait for link training end
+ * pcie_wait_for_link_status - Wait for link status change
  * @pdev: Device whose link to wait for.
+ * @use_lt: Use the LT bit if TRUE, or the DLLLA bit if FALSE.
+ * @active: Waiting for active or inactive?
  *
- * Return TRUE if successful, or FALSE if training has not completed
- * within PCIE_LINK_RETRAIN_TIMEOUT_MS milliseconds.
+ * Return TRUE if successful, or FALSE if status has not changed within
+ * PCIE_LINK_RETRAIN_TIMEOUT_MS milliseconds.
  */
-static bool pcie_wait_for_link_status(struct pci_dev *pdev)
+static bool pcie_wait_for_link_status(struct pci_dev *pdev,
+				      bool use_lt, bool active)
 {
+	u16 lnksta_mask, lnksta_match;
 	unsigned long end_jiffies;
 	u16 lnksta;
 
+	lnksta_mask = use_lt ? PCI_EXP_LNKSTA_LT : PCI_EXP_LNKSTA_DLLLA;
+	lnksta_match = active ? lnksta_mask : 0;
+
 	end_jiffies = jiffies + msecs_to_jiffies(PCIE_LINK_RETRAIN_TIMEOUT_MS);
 	do {
 		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
-		if (!(lnksta & PCI_EXP_LNKSTA_LT))
+		if ((lnksta & lnksta_mask) == lnksta_match)
 			break;
 		msleep(1);
 	} while (time_before(jiffies, end_jiffies));
-	return !(lnksta & PCI_EXP_LNKSTA_LT);
+	return (lnksta & lnksta_mask) == lnksta_match;
 }
 
 /**
@@ -4937,10 +4944,15 @@ bool pcie_wait_for_link(struct pci_dev *
 /**
  * pcie_retrain_link - Request a link retrain and wait for it to complete
  * @pdev: Device whose link to retrain.
+ * @use_lt: Use the LT bit if TRUE, or the DLLLA bit if FALSE, for status.
+ *
+ * Retrain completion status is retrieved from the Link Status Register
+ * according to @use_lt.  It is not verified whether the use of the DLLLA
+ * bit is valid.
  *
  * Return TRUE if successful, or FALSE if training has not completed.
  */
-bool pcie_retrain_link(struct pci_dev *pdev)
+bool pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
 {
 	u16 lnkctl;
 
@@ -4957,7 +4969,7 @@ bool pcie_retrain_link(struct pci_dev *p
 		pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
 	}
 
-	return pcie_wait_for_link_status(pdev);
+	return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
 }
 
 /*
Index: linux-macro/drivers/pci/pci.h
===================================================================
--- linux-macro.orig/drivers/pci/pci.h
+++ linux-macro/drivers/pci/pci.h
@@ -561,7 +561,7 @@ pci_ers_result_t pcie_do_recovery(struct
 		pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev));
 
 bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
-bool pcie_retrain_link(struct pci_dev *pdev);
+bool pcie_retrain_link(struct pci_dev *pdev, bool use_lt);
 #ifdef CONFIG_PCIEASPM
 void pcie_aspm_init_link_state(struct pci_dev *pdev);
 void pcie_aspm_exit_link_state(struct pci_dev *pdev);
Index: linux-macro/drivers/pci/pcie/aspm.c
===================================================================
--- linux-macro.orig/drivers/pci/pcie/aspm.c
+++ linux-macro/drivers/pci/pcie/aspm.c
@@ -257,7 +257,7 @@ static void pcie_aspm_configure_common_c
 		reg16 &= ~PCI_EXP_LNKCTL_CCC;
 	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
 
-	if (pcie_retrain_link(link->pdev))
+	if (pcie_retrain_link(link->pdev, true))
 		return;
 
 	/* Training failed. Restore common clock configurations */

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 11/14] PCI: Use `pcie_wait_for_link_status' in `pcie_wait_for_link_delay'
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (9 preceding siblings ...)
  2023-06-11 17:19 ` [PATCH v9 10/14] PCI: Add support for polling DLLLA to `pcie_retrain_link' Maciej W. Rozycki
@ 2023-06-11 17:19 ` Maciej W. Rozycki
  2023-06-11 17:20 ` [PATCH v9 12/14] PCI: Provide stub failed link recovery for device probing and hot plug Maciej W. Rozycki
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:19 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Remove a DLLLA status bit polling loop from `pcie_wait_for_link_delay' 
and call almost identical code in `pcie_wait_for_link_status' instead.  
This reduces the lower bound on the polling interval from 10ms to 1ms, 
possibly increasing the CPU load on the system in favour to reducing 
the wait time.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
New change in v9.
---
 drivers/pci/pci.c |   17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

linux-pcie-wait-for-link-delay-status.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -4889,16 +4889,14 @@ static bool pcie_wait_for_link_status(st
 static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
 				     int delay)
 {
-	int timeout = PCIE_LINK_RETRAIN_TIMEOUT_MS;
 	bool ret;
-	u16 lnk_status;
 
 	/*
 	 * Some controllers might not implement link active reporting. In this
 	 * case, we wait for 1000 ms + any delay requested by the caller.
 	 */
 	if (!pdev->link_active_reporting) {
-		msleep(timeout + delay);
+		msleep(PCIE_LINK_RETRAIN_TIMEOUT_MS + delay);
 		return true;
 	}
 
@@ -4913,20 +4911,11 @@ static bool pcie_wait_for_link_delay(str
 	 */
 	if (active)
 		msleep(20);
-	for (;;) {
-		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
-		ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
-		if (ret == active)
-			break;
-		if (timeout <= 0)
-			break;
-		msleep(10);
-		timeout -= 10;
-	}
+	ret = pcie_wait_for_link_status(pdev, false, active);
 	if (active && ret)
 		msleep(delay);
 
-	return ret == active;
+	return ret;
 }
 
 /**

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 12/14] PCI: Provide stub failed link recovery for device probing and hot plug
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (10 preceding siblings ...)
  2023-06-11 17:19 ` [PATCH v9 11/14] PCI: Use `pcie_wait_for_link_status' in `pcie_wait_for_link_delay' Maciej W. Rozycki
@ 2023-06-11 17:20 ` Maciej W. Rozycki
  2024-07-22 19:34   ` PCI: Work around PCIe link training failures Matthew W Carlis
  2023-06-11 17:20 ` [PATCH v9 13/14] PCI: Add failed link recovery for device reset events Maciej W. Rozycki
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:20 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

This now fails unconditionally and will be always optimised away, but 
provides for quirks to implement recovery for failed links detected in 
device probing and device hot plug events.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
New change in v9, factored out from 7/7:

- Rename `pcie_downstream_link_retrain' to `pcie_failed_link_retrain'.

- Add stub implementation in "pci.h".
---
 drivers/pci/pci.c   |    2 ++
 drivers/pci/pci.h   |    4 ++++
 drivers/pci/probe.c |    2 ++
 3 files changed, 8 insertions(+)

linux-pcie-failed-link-retrain.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -4912,6 +4912,8 @@ static bool pcie_wait_for_link_delay(str
 	if (active)
 		msleep(20);
 	ret = pcie_wait_for_link_status(pdev, false, active);
+	if (active && !ret)
+		ret = pcie_failed_link_retrain(pdev);
 	if (active && ret)
 		msleep(delay);
 
Index: linux-macro/drivers/pci/pci.h
===================================================================
--- linux-macro.orig/drivers/pci/pci.h
+++ linux-macro/drivers/pci/pci.h
@@ -554,6 +554,10 @@ static inline int pci_dev_specific_disab
 	return -ENOTTY;
 }
 #endif
+static inline bool pcie_failed_link_retrain(struct pci_dev *dev)
+{
+	return false;
+}
 
 /* PCI error reporting and recovery */
 pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
Index: linux-macro/drivers/pci/probe.c
===================================================================
--- linux-macro.orig/drivers/pci/probe.c
+++ linux-macro/drivers/pci/probe.c
@@ -2549,6 +2549,8 @@ void pci_device_add(struct pci_dev *dev,
 	dma_set_max_seg_size(&dev->dev, 65536);
 	dma_set_seg_boundary(&dev->dev, 0xffffffff);
 
+	pcie_failed_link_retrain(dev);
+
 	/* Fix up broken headers */
 	pci_fixup_device(pci_fixup_header, dev);
 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 13/14] PCI: Add failed link recovery for device reset events
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (11 preceding siblings ...)
  2023-06-11 17:20 ` [PATCH v9 12/14] PCI: Provide stub failed link recovery for device probing and hot plug Maciej W. Rozycki
@ 2023-06-11 17:20 ` Maciej W. Rozycki
  2023-06-11 17:20 ` [PATCH v9 14/14] PCI: Work around PCIe link training failures Maciej W. Rozycki
  2023-06-14 23:12 ` [PATCH v9 00/14] pci: Work around ASMedia ASM2824 " Bjorn Helgaas
  14 siblings, 0 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:20 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Request failed link recovery with any upstream bridge where a device has 
not come back after reset within PCI_RESET_WAIT time.  Reset the polling 
interval if recovery succeeded, otherwise continue as usual.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
New change in v9, factored out from 7/7:

- Remove duplicate succesful completion report previously added (not sure 
  where it came from, possibly an unnoticed leftover from experiments).

- Make the type of `retrain' variable `bool' rather than `int' and invert
  the logic used.

- Rename `pcie_downstream_link_retrain' to `pcie_failed_link_retrain'.

- Rename `pcie_upstream_link_retrain' to `pcie_parent_link_retrain'.  Add 
  documentation.
---
 drivers/pci/pci.c |   39 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 5 deletions(-)

linux-pcie-dev-wait-link-retrain.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -1146,10 +1146,27 @@ void pci_resume_bus(struct pci_bus *bus)
 		pci_walk_bus(bus, pci_resume_one, NULL);
 }
 
+/**
+ * pcie_parent_link_retrain - Check and retrain link we are downstream from
+ * @dev: PCI device to handle.
+ *
+ * Return TRUE if the link was retrained, FALSE otherwise.
+ */
+static bool pcie_parent_link_retrain(struct pci_dev *dev)
+{
+	struct pci_dev *bridge;
+
+	bridge = pci_upstream_bridge(dev);
+	if (bridge)
+		return pcie_failed_link_retrain(bridge);
+	else
+		return false;
+}
+
 static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 {
+	bool retrain = true;
 	int delay = 1;
-	u32 id;
 
 	/*
 	 * After reset, the device should not silently discard config
@@ -1163,21 +1180,33 @@ static int pci_dev_wait(struct pci_dev *
 	 * Command register instead of Vendor ID so we don't have to
 	 * contend with the CRS SV value.
 	 */
-	pci_read_config_dword(dev, PCI_COMMAND, &id);
-	while (PCI_POSSIBLE_ERROR(id)) {
+	for (;;) {
+		u32 id;
+
+		pci_read_config_dword(dev, PCI_COMMAND, &id);
+		if (!PCI_POSSIBLE_ERROR(id))
+			break;
+
 		if (delay > timeout) {
 			pci_warn(dev, "not ready %dms after %s; giving up\n",
 				 delay - 1, reset_type);
 			return -ENOTTY;
 		}
 
-		if (delay > PCI_RESET_WAIT)
+		if (delay > PCI_RESET_WAIT) {
+			if (retrain) {
+				retrain = false;
+				if (pcie_parent_link_retrain(dev)) {
+					delay = 1;
+					continue;
+				}
+			}
 			pci_info(dev, "not ready %dms after %s; waiting\n",
 				 delay - 1, reset_type);
+		}
 
 		msleep(delay);
 		delay *= 2;
-		pci_read_config_dword(dev, PCI_COMMAND, &id);
 	}
 
 	if (delay > PCI_RESET_WAIT)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v9 14/14] PCI: Work around PCIe link training failures
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (12 preceding siblings ...)
  2023-06-11 17:20 ` [PATCH v9 13/14] PCI: Add failed link recovery for device reset events Maciej W. Rozycki
@ 2023-06-11 17:20 ` Maciej W. Rozycki
  2023-06-14 23:12 ` [PATCH v9 00/14] pci: Work around ASMedia ASM2824 " Bjorn Helgaas
  14 siblings, 0 replies; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-11 17:20 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Attempt to handle cases such as with a downstream port of the ASMedia 
ASM2824 PCIe switch where link training never completes and the link 
continues switching between speeds indefinitely with the data link layer 
never reaching the active state.

It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 
switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, 
P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the 
switches are supposed to negotiate the link speed of preferably 5.0GT/s, 
falling back to 2.5GT/s.

Instead the link continues oscillating between the two speeds, at the 
rate of 34-35 times per second, with link training reported repeatedly 
active ~84% of the time.  Forcibly limiting the target link speed to 
2.5GT/s with the upstream ASM2824 device however makes the two switches 
communicate correctly.  Removing the speed restriction afterwards makes 
the two devices switch to 5.0GT/s then.

Make use of these observations then and detect the inability to train 
the link, by checking for the Data Link Layer Link Active status bit 
being off while the Link Bandwidth Management Status indicating that 
hardware has changed the link speed or width in an attempt to correct 
unreliable link operation.

Restrict the speed to 2.5GT/s then with the Target Link Speed field, 
request a retrain and wait 200ms for the data link to go up.  If this 
turns out successful, then lift the restriction, letting the devices 
negotiate a higher speed.

Also check for a 2.5GT/s speed restriction the firmware may have already 
arranged and lift it too with ports of devices known to continue working 
afterwards, currently the ASM2824 only, that already report their data 
link being up.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2203022037020.56670@angie.orcam.me.uk/
Link: https://source.denx.de/u-boot/u-boot/-/commit/a398a51ccc68
---
Changes from v8:

- Rename `pcie_downstream_link_retrain' to `pcie_failed_link_retrain', add 
  a prototype in "pci.h", moving the stub implementation under !PCI_QUIRKS 
  umbrella.

- Move back to quirks.c, though as an internal API call rather than a 
  regular quirk.

- Adjust for PCIE_LINK_RETRAIN_TIMEOUT_MS expressed in milliseconds rather 
  than jiffies.

- Use a `pcie_retrain_link' call rather than retraining inline, and also 
  use it in the restriction lift path, making it another possible failure 
  point.

No changes from v7.

Changes from v6:

- Regenerate against 6.3-rc5.

- Shorten the lore.kernel.org archive link in the change description.

Changes from v5:

- Move from a quirk into PCI core and call at device probing, hot-plug,
  reset and resume.  Keep the ASMedia part under CONFIG_PCI_QUIRKS.

- Rely on `dev->link_active_reporting' rather than re-retrieving the 
  capability.

Changes from v4:

- Remove <linux/bug.h> inclusion no longer needed.

- Make the quirk generic based on probing device features rather than 
  specific to the ASM2824 part only; take the Retrain Link bit erratum 
  into account.

- Still lift the 2.5GT/s speed restriction with the ASM2824 only.

- Increase retrain timeout from 200ms to 1s (PCIE_LINK_RETRAIN_TIMEOUT).

- Remove retrain success notification.

- Use PCIe helpers rather than generic PCI functions throughout.

- Trim down and update the wording of the change description for the 
  switch from an ASM2824-specific to a generic fixup.

Changes from v3:

- Remove the <linux/pci_ids.h> entry for the ASM2824.

Changes from v2:

- Regenerate for 5.17-rc2 for a merge conflict.

- Replace BUG_ON for a missing PCI Express capability with WARN_ON and an
  early return.

Changes from v1:

- Regenerate for a merge conflict.
---
 drivers/pci/pci.h    |    3 +
 drivers/pci/quirks.c |   93 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+), 1 deletion(-)

linux-pcie-asm2824-manual-retrain.diff
Index: linux-macro/drivers/pci/pci.h
===================================================================
--- linux-macro.orig/drivers/pci/pci.h
+++ linux-macro/drivers/pci/pci.h
@@ -539,6 +539,7 @@ void pci_acs_init(struct pci_dev *dev);
 int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags);
 int pci_dev_specific_enable_acs(struct pci_dev *dev);
 int pci_dev_specific_disable_acs_redir(struct pci_dev *dev);
+bool pcie_failed_link_retrain(struct pci_dev *dev);
 #else
 static inline int pci_dev_specific_acs_enabled(struct pci_dev *dev,
 					       u16 acs_flags)
@@ -553,11 +554,11 @@ static inline int pci_dev_specific_disab
 {
 	return -ENOTTY;
 }
-#endif
 static inline bool pcie_failed_link_retrain(struct pci_dev *dev)
 {
 	return false;
 }
+#endif
 
 /* PCI error reporting and recovery */
 pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -33,6 +33,99 @@
 #include <linux/switchtec.h>
 #include "pci.h"
 
+/*
+ * Retrain the link of a downstream PCIe port by hand if necessary.
+ *
+ * This is needed at least where a downstream port of the ASMedia ASM2824
+ * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
+ * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
+ * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
+ * board.
+ *
+ * In such a configuration the switches are supposed to negotiate the link
+ * speed of preferably 5.0GT/s, falling back to 2.5GT/s.  However the link
+ * continues switching between the two speeds indefinitely and the data
+ * link layer never reaches the active state, with link training reported
+ * repeatedly active ~84% of the time.  Forcing the target link speed to
+ * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
+ * each other correctly however.  And more interestingly retraining with a
+ * higher target link speed afterwards lets the two successfully negotiate
+ * 5.0GT/s.
+ *
+ * With the ASM2824 we can rely on the otherwise optional Data Link Layer
+ * Link Active status bit and in the failed link training scenario it will
+ * be off along with the Link Bandwidth Management Status indicating that
+ * hardware has changed the link speed or width in an attempt to correct
+ * unreliable link operation.  For a port that has been left unconnected
+ * both bits will be clear.  So use this information to detect the problem
+ * rather than polling the Link Training bit and watching out for flips or
+ * at least the active status.
+ *
+ * Since the exact nature of the problem isn't known and in principle this
+ * could trigger where an ASM2824 device is downstream rather upstream,
+ * apply this erratum workaround to any downstream ports as long as they
+ * support Link Active reporting and have the Link Control 2 register.
+ * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
+ * request a retrain and wait 200ms for the data link to go up.
+ *
+ * If this turns out successful and we know by the Vendor:Device ID it is
+ * safe to do so, then lift the restriction, letting the devices negotiate
+ * a higher speed.  Also check for a similar 2.5GT/s speed restriction the
+ * firmware may have already arranged and lift it with ports that already
+ * report their data link being up.
+ *
+ * Return TRUE if the link has been successfully retrained, otherwise FALSE.
+ */
+bool pcie_failed_link_retrain(struct pci_dev *dev)
+{
+	static const struct pci_device_id ids[] = {
+		{ PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
+		{}
+	};
+	u16 lnksta, lnkctl2;
+
+	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
+	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
+		return false;
+
+	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
+	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
+	    PCI_EXP_LNKSTA_LBMS) {
+		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
+
+		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+		lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
+		pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
+
+		if (!pcie_retrain_link(dev, false)) {
+			pci_info(dev, "retraining failed\n");
+			return false;
+		}
+
+		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+	}
+
+	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
+	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
+	    pci_match_id(ids, dev)) {
+		u32 lnkcap;
+
+		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
+		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+		lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
+		pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
+
+		if (!pcie_retrain_link(dev, false)) {
+			pci_info(dev, "retraining failed\n");
+			return false;
+		}
+	}
+
+	return true;
+}
+
 static ktime_t fixup_debug_start(struct pci_dev *dev,
 				 void (*fn)(struct pci_dev *dev))
 {

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures
  2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (13 preceding siblings ...)
  2023-06-11 17:20 ` [PATCH v9 14/14] PCI: Work around PCIe link training failures Maciej W. Rozycki
@ 2023-06-14 23:12 ` Bjorn Helgaas
  2023-06-15  0:41   ` Maciej W. Rozycki
  14 siblings, 1 reply; 44+ messages in thread
From: Bjorn Helgaas @ 2023-06-14 23:12 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: linux-pci, linux-kernel, Eric Dumazet, Oliver O'Halloran,
	Stefan Roese, Leon Romanovsky, linux-rdma, Jakub Kicinski,
	Paolo Abeni, Jim Wilson, Nicholas Piggin, Alex Williamson,
	Bjorn Helgaas, Mika Westerberg, David Abdurachmanov, linuxppc-dev,
	Mahesh J Salgaonkar, David S. Miller, Lukas Wunner, netdev,
	Pali Rohár, Saeed Mahameed

On Sun, Jun 11, 2023 at 06:19:08PM +0100, Maciej W. Rozycki wrote:
> Hi,
> 
>  This is v9 of the change to work around a PCIe link training phenomenon 
> where a pair of devices both capable of operating at a link speed above 
> 2.5GT/s seems unable to negotiate the link speed and continues training 
> indefinitely with the Link Training bit switching on and off repeatedly 
> and the data link layer never reaching the active state.
> 
>  With several requests addressed and a few extra issues spotted this
> version has now grown to 14 patches.  It has been verified for device 
> enumeration with and without PCI_QUIRKS enabled, using the same piece of 
> RISC-V hardware as previously.  Hot plug or reset events have not been 
> verified, as this is difficult if at all feasible with hardware in 
> question.
> 
>  Last iteration: 
> <https://lore.kernel.org/r/alpine.DEB.2.21.2304060100160.13659@angie.orcam.me.uk/>, 
> and my input to it:
> <https://lore.kernel.org/r/alpine.DEB.2.21.2306080224280.36323@angie.orcam.me.uk/>.

Thanks, I applied these to pci/enumeration for v6.5.

I tweaked a few things, so double-check to be sure I didn't break
something:

  - Moved dev->link_active_reporting init to set_pcie_port_type()
    because it does other PCIe-related stuff.

  - Reordered to keep all the link_active_reporting things together.

  - Reordered to clean up & factor pcie_retrain_link() before exposing
    it to the rest of the PCI core.

  - Moved pcie_retrain_link() a little earlier to keep it next to
    pcie_wait_for_link_status().

  - Squashed the stubs into the actual quirk so we don't have the
    intermediate state where we call the stubs but they never do
    anything (let me know if there's a reason we need your order).

  - Inline pcie_parent_link_retrain(), which seemed like it didn't add
    enough to be worthwhile.

Interdiff below:

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 80694e2574b8..f11268924c8f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1153,27 +1153,16 @@ void pci_resume_bus(struct pci_bus *bus)
 		pci_walk_bus(bus, pci_resume_one, NULL);
 }
 
-/**
- * pcie_parent_link_retrain - Check and retrain link we are downstream from
- * @dev: PCI device to handle.
- *
- * Return TRUE if the link was retrained, FALSE otherwise.
- */
-static bool pcie_parent_link_retrain(struct pci_dev *dev)
-{
-	struct pci_dev *bridge;
-
-	bridge = pci_upstream_bridge(dev);
-	if (bridge)
-		return pcie_failed_link_retrain(bridge);
-	else
-		return false;
-}
-
 static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 {
-	bool retrain = true;
 	int delay = 1;
+	bool retrain = false;
+	struct pci_dev *bridge;
+
+	if (pci_is_pcie(dev)) {
+		retrain = true;
+		bridge = pci_upstream_bridge(dev);
+	}
 
 	/*
 	 * After reset, the device should not silently discard config
@@ -1201,9 +1190,9 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 		}
 
 		if (delay > PCI_RESET_WAIT) {
-			if (retrain) {
+			if (retrain && bridge) {
 				retrain = false;
-				if (pcie_parent_link_retrain(dev)) {
+				if (pcie_failed_link_retrain(bridge)) {
 					delay = 1;
 					continue;
 				}
@@ -4914,6 +4903,38 @@ static bool pcie_wait_for_link_status(struct pci_dev *pdev,
 	return (lnksta & lnksta_mask) == lnksta_match;
 }
 
+/**
+ * pcie_retrain_link - Request a link retrain and wait for it to complete
+ * @pdev: Device whose link to retrain.
+ * @use_lt: Use the LT bit if TRUE, or the DLLLA bit if FALSE, for status.
+ *
+ * Retrain completion status is retrieved from the Link Status Register
+ * according to @use_lt.  It is not verified whether the use of the DLLLA
+ * bit is valid.
+ *
+ * Return TRUE if successful, or FALSE if training has not completed
+ * within PCIE_LINK_RETRAIN_TIMEOUT_MS milliseconds.
+ */
+bool pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
+{
+	u16 lnkctl;
+
+	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnkctl);
+	lnkctl |= PCI_EXP_LNKCTL_RL;
+	pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+	if (pdev->clear_retrain_link) {
+		/*
+		 * Due to an erratum in some devices the Retrain Link bit
+		 * needs to be cleared again manually to allow the link
+		 * training to succeed.
+		 */
+		lnkctl &= ~PCI_EXP_LNKCTL_RL;
+		pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+	}
+
+	return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
+}
+
 /**
  * pcie_wait_for_link_delay - Wait until link is active or inactive
  * @pdev: Bridge device
@@ -4968,37 +4989,6 @@ bool pcie_wait_for_link(struct pci_dev *pdev, bool active)
 	return pcie_wait_for_link_delay(pdev, active, 100);
 }
 
-/**
- * pcie_retrain_link - Request a link retrain and wait for it to complete
- * @pdev: Device whose link to retrain.
- * @use_lt: Use the LT bit if TRUE, or the DLLLA bit if FALSE, for status.
- *
- * Retrain completion status is retrieved from the Link Status Register
- * according to @use_lt.  It is not verified whether the use of the DLLLA
- * bit is valid.
- *
- * Return TRUE if successful, or FALSE if training has not completed.
- */
-bool pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
-{
-	u16 lnkctl;
-
-	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnkctl);
-	lnkctl |= PCI_EXP_LNKCTL_RL;
-	pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
-	if (pdev->clear_retrain_link) {
-		/*
-		 * Due to an erratum in some devices the Retrain Link bit
-		 * needs to be cleared again manually to allow the link
-		 * training to succeed.
-		 */
-		lnkctl &= ~PCI_EXP_LNKCTL_RL;
-		pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
-	}
-
-	return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
-}
-
 /*
  * Find maximum D3cold delay required by all the devices on the bus.  The
  * spec says 100 ms, but firmware can lower it and we allow drivers to
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 016a9d4a61f7..f547db0a728f 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1526,6 +1526,7 @@ void set_pcie_port_type(struct pci_dev *pdev)
 {
 	int pos;
 	u16 reg16;
+	u32 reg32;
 	int type;
 	struct pci_dev *parent;
 
@@ -1539,6 +1540,10 @@ void set_pcie_port_type(struct pci_dev *pdev)
 	pci_read_config_dword(pdev, pos + PCI_EXP_DEVCAP, &pdev->devcap);
 	pdev->pcie_mpss = FIELD_GET(PCI_EXP_DEVCAP_PAYLOAD, pdev->devcap);
 
+	pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &reg32);
+	if (reg32 & PCI_EXP_LNKCAP_DLLLARC)
+		pdev->link_active_reporting = 1;
+
 	parent = pci_upstream_bridge(pdev);
 	if (!parent)
 		return;
@@ -1828,7 +1833,6 @@ int pci_setup_device(struct pci_dev *dev)
 	int err, pos = 0;
 	struct pci_bus_region region;
 	struct resource *res;
-	u32 linkcap;
 
 	hdr_type = pci_hdr_type(dev);
 
@@ -1876,10 +1880,6 @@ int pci_setup_device(struct pci_dev *dev)
 	/* "Unknown power state" */
 	dev->current_state = PCI_UNKNOWN;
 
-	/* Set it early to make it available to fixups, etc.  */
-	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
-	dev->link_active_reporting = !!(linkcap & PCI_EXP_LNKCAP_DLLLARC);
-
 	/* Early fixups, before probing the BARs */
 	pci_fixup_device(pci_fixup_early, dev);
 

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures
  2023-06-14 23:12 ` [PATCH v9 00/14] pci: Work around ASMedia ASM2824 " Bjorn Helgaas
@ 2023-06-15  0:41   ` Maciej W. Rozycki
  2023-06-15 18:37     ` Bjorn Helgaas
  0 siblings, 1 reply; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-15  0:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Eric Dumazet, Oliver O'Halloran,
	Stefan Roese, Leon Romanovsky, linux-rdma, Jakub Kicinski,
	Paolo Abeni, Jim Wilson, Nicholas Piggin, Alex Williamson,
	Bjorn Helgaas, Mika Westerberg, David Abdurachmanov, linuxppc-dev,
	Mahesh J Salgaonkar, David S. Miller, Lukas Wunner, netdev,
	Pali Rohár, Saeed Mahameed

On Wed, 14 Jun 2023, Bjorn Helgaas wrote:

> >  This is v9 of the change to work around a PCIe link training phenomenon 
> > where a pair of devices both capable of operating at a link speed above 
> > 2.5GT/s seems unable to negotiate the link speed and continues training 
> > indefinitely with the Link Training bit switching on and off repeatedly 
> > and the data link layer never reaching the active state.
> > 
> >  With several requests addressed and a few extra issues spotted this
> > version has now grown to 14 patches.  It has been verified for device 
> > enumeration with and without PCI_QUIRKS enabled, using the same piece of 
> > RISC-V hardware as previously.  Hot plug or reset events have not been 
> > verified, as this is difficult if at all feasible with hardware in 
> > question.
> > 
> >  Last iteration: 
> > <https://lore.kernel.org/r/alpine.DEB.2.21.2304060100160.13659@angie.orcam.me.uk/>, 
> > and my input to it:
> > <https://lore.kernel.org/r/alpine.DEB.2.21.2306080224280.36323@angie.orcam.me.uk/>.
> 
> Thanks, I applied these to pci/enumeration for v6.5.

 Great, thanks!

> I tweaked a few things, so double-check to be sure I didn't break
> something:
> 
>   - Moved dev->link_active_reporting init to set_pcie_port_type()
>     because it does other PCIe-related stuff.
> 
>   - Reordered to keep all the link_active_reporting things together.
> 
>   - Reordered to clean up & factor pcie_retrain_link() before exposing
>     it to the rest of the PCI core.
> 
>   - Moved pcie_retrain_link() a little earlier to keep it next to
>     pcie_wait_for_link_status().
> 
>   - Squashed the stubs into the actual quirk so we don't have the
>     intermediate state where we call the stubs but they never do
>     anything (let me know if there's a reason we need your order).
> 
>   - Inline pcie_parent_link_retrain(), which seemed like it didn't add
>     enough to be worthwhile.

 Ack, I'll double-check and report back.  A minor nit I've spotted below:

>  static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>  {
> -	bool retrain = true;
>  	int delay = 1;
> +	bool retrain = false;
> +	struct pci_dev *bridge;
> +
> +	if (pci_is_pcie(dev)) {
> +		retrain = true;
> +		bridge = pci_upstream_bridge(dev);
> +	}

 If doing it this way, which I actually like, I think it would be a little 
bit better performance- and style-wise if this was written as:

	if (pci_is_pcie(dev)) {
		bridge = pci_upstream_bridge(dev);
		retrain = !!bridge;
	}

(or "retrain = bridge != NULL" if you prefer this style), and then we 
don't have to repeatedly check two variables iff (pcie && !bridge) in the 
loop below:

> @@ -1201,9 +1190,9 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>  		}
>  
>  		if (delay > PCI_RESET_WAIT) {
> -			if (retrain) {
> +			if (retrain && bridge) {

-- i.e. code can stay then as:

			if (retrain) {

here.  I hope you find this observation rather obvious, so will you amend 
your tree, or shall I send an incremental update?

 Otherwise I don't find anything suspicious with the interdiff itself 
(thanks for posting it, that's really useful indeed!), but as I say I'll 
yet double-check how things look and work with your tree.  Hopefully 
tomorrow (Thu), as I have other stuff yet to complete tonight.

  Maciej

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures
  2023-06-15  0:41   ` Maciej W. Rozycki
@ 2023-06-15 18:37     ` Bjorn Helgaas
  2023-06-16 12:27       ` Maciej W. Rozycki
  0 siblings, 1 reply; 44+ messages in thread
From: Bjorn Helgaas @ 2023-06-15 18:37 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: linux-pci, linux-kernel, Eric Dumazet, Oliver O'Halloran,
	Stefan Roese, Leon Romanovsky, linux-rdma, Jakub Kicinski,
	Paolo Abeni, Jim Wilson, Nicholas Piggin, Alex Williamson,
	Bjorn Helgaas, Mika Westerberg, David Abdurachmanov, linuxppc-dev,
	Mahesh J Salgaonkar, David S. Miller, Lukas Wunner, netdev,
	Pali Rohár, Saeed Mahameed

On Thu, Jun 15, 2023 at 01:41:10AM +0100, Maciej W. Rozycki wrote:
> On Wed, 14 Jun 2023, Bjorn Helgaas wrote:
> 
> > >  This is v9 of the change to work around a PCIe link training phenomenon 
> > > where a pair of devices both capable of operating at a link speed above 
> > > 2.5GT/s seems unable to negotiate the link speed and continues training 
> > > indefinitely with the Link Training bit switching on and off repeatedly 
> > > and the data link layer never reaching the active state.
> > > 
> > >  With several requests addressed and a few extra issues spotted this
> > > version has now grown to 14 patches.  It has been verified for device 
> > > enumeration with and without PCI_QUIRKS enabled, using the same piece of 
> > > RISC-V hardware as previously.  Hot plug or reset events have not been 
> > > verified, as this is difficult if at all feasible with hardware in 
> > > question.

> >  static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> >  {
> > -	bool retrain = true;
> >  	int delay = 1;
> > +	bool retrain = false;
> > +	struct pci_dev *bridge;
> > +
> > +	if (pci_is_pcie(dev)) {
> > +		retrain = true;
> > +		bridge = pci_upstream_bridge(dev);
> > +	}
> 
>  If doing it this way, which I actually like, I think it would be a little 
> bit better performance- and style-wise if this was written as:
> 
> 	if (pci_is_pcie(dev)) {
> 		bridge = pci_upstream_bridge(dev);
> 		retrain = !!bridge;
> 	}
> 
> (or "retrain = bridge != NULL" if you prefer this style), and then we 
> don't have to repeatedly check two variables iff (pcie && !bridge) in the 
> loop below:

Done, thanks, I do like that better.  I did:

  bridge = pci_upstream_bridge(dev);
  if (bridge)
    retrain = true;

because it seems like it flows more naturally when reading.

Bjorn

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures
  2023-06-15 18:37     ` Bjorn Helgaas
@ 2023-06-16 12:27       ` Maciej W. Rozycki
  2023-06-16 20:29         ` Bjorn Helgaas
  0 siblings, 1 reply; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-16 12:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Eric Dumazet, Oliver O'Halloran,
	Stefan Roese, Leon Romanovsky, linux-rdma, Jakub Kicinski,
	Paolo Abeni, Jim Wilson, Nicholas Piggin, Alex Williamson,
	Bjorn Helgaas, Mika Westerberg, David Abdurachmanov, linuxppc-dev,
	Mahesh J Salgaonkar, David S. Miller, Lukas Wunner, netdev,
	Pali Rohár, Saeed Mahameed

On Thu, 15 Jun 2023, Bjorn Helgaas wrote:

> >  If doing it this way, which I actually like, I think it would be a little 
> > bit better performance- and style-wise if this was written as:
> > 
> > 	if (pci_is_pcie(dev)) {
> > 		bridge = pci_upstream_bridge(dev);
> > 		retrain = !!bridge;
> > 	}
> > 
> > (or "retrain = bridge != NULL" if you prefer this style), and then we 
> > don't have to repeatedly check two variables iff (pcie && !bridge) in the 
> > loop below:
> 
> Done, thanks, I do like that better.  I did:
> 
>   bridge = pci_upstream_bridge(dev);
>   if (bridge)
>     retrain = true;
> 
> because it seems like it flows more naturally when reading.

 Perfect, and good timing too, as I have just started checking your tree 
as your message arrived.  I ran my usual tests with and w/o PCI_QUIRKS 
enabled and results were as expected.  As before I didn't check hot plug 
and reset paths as these features are awkward with the HiFive Unmatched 
system involved.

 I have skimmed over the changes as committed to pci/enumeration and found 
nothing suspicious.  I have verified that the tree builds as at each of 
them with my configuration.

 As per my earlier remark:

> I think making a system halfway-fixed would make little sense, but with
> the actual fix actually made last as you suggested I think this can be
> split off, because it'll make no functional change by itself.

I am not perfectly happy with your rearrangement to fold the !PCI_QUIRKS 
stub into the change carrying the actual workaround and then have the 
reset path update with a follow-up change only, but I won't fight over it.  
It's only one tree revision that will be in this halfway-fixed state and 
I'll trust your judgement here.

 Let me know if anything pops up related to these changes anytime and I'll 
be happy to look into it.  The system involved is nearing two years since 
its deployment already, but hopefully it has many years to go yet and will 
continue being ready to verify things.  It's not that there's lots of real 
RISC-V hardware available, let alone with PCI/e connectivity.

 Thank you for staying with me and reviewing this patch series through all 
the iterations.

  Maciej

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures
  2023-06-16 12:27       ` Maciej W. Rozycki
@ 2023-06-16 20:29         ` Bjorn Helgaas
  2023-06-20  9:54           ` Maciej W. Rozycki
  0 siblings, 1 reply; 44+ messages in thread
From: Bjorn Helgaas @ 2023-06-16 20:29 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: linux-pci, linux-kernel, Eric Dumazet, Oliver O'Halloran,
	Stefan Roese, Leon Romanovsky, linux-rdma, Jakub Kicinski,
	Paolo Abeni, Jim Wilson, Nicholas Piggin, Alex Williamson,
	Bjorn Helgaas, Mika Westerberg, David Abdurachmanov, linuxppc-dev,
	Mahesh J Salgaonkar, David S. Miller, Lukas Wunner, netdev,
	Pali Rohár, Saeed Mahameed

On Fri, Jun 16, 2023 at 01:27:52PM +0100, Maciej W. Rozycki wrote:
> On Thu, 15 Jun 2023, Bjorn Helgaas wrote:

>  As per my earlier remark:
> 
> > I think making a system halfway-fixed would make little sense, but with
> > the actual fix actually made last as you suggested I think this can be
> > split off, because it'll make no functional change by itself.
> 
> I am not perfectly happy with your rearrangement to fold the !PCI_QUIRKS 
> stub into the change carrying the actual workaround and then have the 
> reset path update with a follow-up change only, but I won't fight over it.  
> It's only one tree revision that will be in this halfway-fixed state and 
> I'll trust your judgement here.

Thanks for raising this.  Here's my thought process:

  12 PCI: Provide stub failed link recovery for device probing and hot plug
  13 PCI: Add failed link recovery for device reset events
  14 PCI: Work around PCIe link training failures

Patch 12 [1] adds calls to pcie_failed_link_retrain(), which does
nothing and returns false.  Functionally, it's a no-op, but the
structure is important later.

Patch 13 [2] claims to request failed link recovery after resets, but
actually doesn't do anything yet because pcie_failed_link_retrain() is
still a no-op, so this was a bit confusing.

Patch 14 [3] implements pcie_failed_link_retrain(), so the recovery
mentioned in 12 and 13 actually happens.  But this patch doesn't add
the call to pcie_failed_link_retrain(), so it's a little bit hard to
connect the dots.

I agree that as I rearranged it, the workaround doesn't apply in all
cases simultaneously.  Maybe not ideal, but maybe not terrible either.
Looking at it again, maybe it would have made more sense to move the
pcie_wait_for_link_delay() change to the last patch along with the
pci_dev_wait() change.  I dunno.

Bjorn

[1] 12 https://lore.kernel.org/r/alpine.DEB.2.21.2306111619570.64925@angie.orcam.me.uk
[2] 13 https://lore.kernel.org/r/alpine.DEB.2.21.2306111631050.64925@angie.orcam.me.uk
[3] 14 https://lore.kernel.org/r/alpine.DEB.2.21.2305310038540.59226@angie.orcam.me.uk

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures
  2023-06-16 20:29         ` Bjorn Helgaas
@ 2023-06-20  9:54           ` Maciej W. Rozycki
  2024-08-06  0:06             ` PCI: Work around " Matthew W Carlis
  0 siblings, 1 reply; 44+ messages in thread
From: Maciej W. Rozycki @ 2023-06-20  9:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Eric Dumazet, Oliver O'Halloran,
	Stefan Roese, Leon Romanovsky, linux-rdma, Jakub Kicinski,
	Paolo Abeni, Jim Wilson, Nicholas Piggin, Alex Williamson,
	Bjorn Helgaas, Mika Westerberg, David Abdurachmanov, linuxppc-dev,
	Mahesh J Salgaonkar, David S. Miller, Lukas Wunner, netdev,
	Pali Rohár, Saeed Mahameed

On Fri, 16 Jun 2023, Bjorn Helgaas wrote:

> I agree that as I rearranged it, the workaround doesn't apply in all
> cases simultaneously.  Maybe not ideal, but maybe not terrible either.
> Looking at it again, maybe it would have made more sense to move the
> pcie_wait_for_link_delay() change to the last patch along with the
> pci_dev_wait() change.  I dunno.

 I think the order of the changes is not important enough to justify 
spending a lot of time and mental effort on it.  You decided, so be it.  
Thank you for your effort made with this review.

 With this series out of the way I have now posted a small clean-up for 
SBR code duplication between PCI core and an InfiniBand driver I came 
across in the course of working on this series.  See 
<https://lore.kernel.org/r/alpine.DEB.2.21.2306200153110.14084@angie.orcam.me.uk/>.

 Please have a look at your convenience.

  Maciej

^ permalink raw reply	[flat|nested] 44+ messages in thread

* PCI: Work around PCIe link training failures
  2023-06-11 17:20 ` [PATCH v9 12/14] PCI: Provide stub failed link recovery for device probing and hot plug Maciej W. Rozycki
@ 2024-07-22 19:34   ` Matthew W Carlis
  2024-07-22 20:40     ` Maciej W. Rozycki
  0 siblings, 1 reply; 44+ messages in thread
From: Matthew W Carlis @ 2024-07-22 19:34 UTC (permalink / raw)
  To: macro
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma,
	christophe.leroy, kuba, pabeni, wilson, linuxppc-dev, npiggin,
	alex.williamson, bhelgaas, mika.westerberg, david.abdurachmanov,
	saeedm, linux-kernel, lukas, netdev, pali, davem

Sorry to resurrect this one, but I was wondering why the
PCI device ID in drivers/pci/quirks.c for the ASMedia ASM2824
isn't checked before forcing the link down to Gen1... We have
had to revert this patch during our kernel migration due to it
interacting poorly with at least one older Gen3 PLX PCIe switch
vendor/generation while using DPC. In another context we have
found similar issues during system bringup without DPC while
using a more legacy hot-plug model (BIOS defaults for us..).
In both contexts our devices are stuck at Gen1 after physical
hot-plug/insert, power-cycle.

Tried reading through the patch history/review but it was still
a little bit unclear to me. Can we add the device ID check as a
precondition to forcing link to Gen1?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PCI: Work around PCIe link training failures
  2024-07-22 19:34   ` PCI: Work around PCIe link training failures Matthew W Carlis
@ 2024-07-22 20:40     ` Maciej W. Rozycki
  2024-07-24 19:18       ` Matthew W Carlis
  0 siblings, 1 reply; 44+ messages in thread
From: Maciej W. Rozycki @ 2024-07-22 20:40 UTC (permalink / raw)
  To: Matthew W Carlis
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma,
	christophe.leroy, kuba, pabeni, Jim Wilson, linuxppc-dev, npiggin,
	alex.williamson, Ilpo Järvinen, Bjorn Helgaas,
	mika.westerberg, david.abdurachmanov, netdev, linux-kernel, lukas,
	saeedm, pali, David S. Miller

[+cc Ilpo for his previous involvement here]

On Mon, 22 Jul 2024, Matthew W Carlis wrote:

> Sorry to resurrect this one, but I was wondering why the
> PCI device ID in drivers/pci/quirks.c for the ASMedia ASM2824
> isn't checked before forcing the link down to Gen1... We have
> had to revert this patch during our kernel migration due to it
> interacting poorly with at least one older Gen3 PLX PCIe switch
> vendor/generation while using DPC. In another context we have
> found similar issues during system bringup without DPC while
> using a more legacy hot-plug model (BIOS defaults for us..).
> In both contexts our devices are stuck at Gen1 after physical
> hot-plug/insert, power-cycle.

 Sorry to hear about your problems.  However the workaround is supposed to 
only trigger if the link has already failed negotiation.  Could you please 
be more specific as to the actual scenario where it triggers?

 A scenario was mentioned earlier on, where a downstream device has been 
removed from a slot and left behind the LBMS bit set in the corresponding 
downstream port of the upstream device.  It then triggered the workaround 
where the port was rescanned with the slot still empty, which then left 
the link capped at 2.5GT/s for a device subsequently inserted.  Is it what 
happens for you?

 As I recall Ilpo has been working on changes that among others should 
make sure no stale LBMS bit has been left set, but I'm not sure what the 
state of affairs has been here.  Myself I've been too swamped in the 
recent months and consequently didn't look into any improvements in this 
area (and unrelated issues involving the system in question in my remote 
lab have further impeded me).

> Tried reading through the patch history/review but it was still
> a little bit unclear to me. Can we add the device ID check as a
> precondition to forcing link to Gen1?

 The main reason is it is believed that it is the downstream device 
causing the issue, and obviously you can't fetch its ID if you can't 
negotiate link so as to talk to it in the first place.

  Maciej

^ permalink raw reply	[flat|nested] 44+ messages in thread

* PCI: Work around PCIe link training failures
  2024-07-22 20:40     ` Maciej W. Rozycki
@ 2024-07-24 19:18       ` Matthew W Carlis
  2024-07-26  8:04         ` Matthew W Carlis
  0 siblings, 1 reply; 44+ messages in thread
From: Matthew W Carlis @ 2024-07-24 19:18 UTC (permalink / raw)
  To: macro
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma,
	christophe.leroy, kuba, pabeni, wilson, npiggin, alex.williamson,
	ilpo.jarvinen, bhelgaas, mika.westerberg, pali,
	david.abdurachmanov, netdev, linux-kernel, lukas, mattc, saeedm,
	linuxppc-dev, davem

Sorry for belated response. I wasn't really sure when you first asked & I
still only have a 'hand wavy' theory here. I think one thing that is getting
us in trouble is when we turn the endpoint device on, then off, wait for a
little while then turn it back on. It seems that the port here in this case
is forced to Gen1 & there is not any path for the kernel to allow it to
try another alternative again without an informed user to write the register.

I'm still trying to barter for the time to really deeply dive into this so
must apologize if this sounds crazy or couldn't be correct.

- Matt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* PCI: Work around PCIe link training failures
  2024-07-24 19:18       ` Matthew W Carlis
@ 2024-07-26  8:04         ` Matthew W Carlis
  2024-07-29 10:27           ` Ilpo Järvinen
  0 siblings, 1 reply; 44+ messages in thread
From: Matthew W Carlis @ 2024-07-26  8:04 UTC (permalink / raw)
  To: macro
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma,
	christophe.leroy, kuba, pabeni, wilson, npiggin, alex.williamson,
	ilpo.jarvinen, bhelgaas, mika.westerberg, pali,
	david.abdurachmanov, netdev, linux-kernel, lukas, mattc, saeedm,
	linuxppc-dev, davem

On Mon, 22 Jul 2024, Maciej W. Rozycki wrote:

> The main reason is it is believed that it is the downstream device
> causing the issue, and obviously you can't fetch its ID if you can't
> negotiate link so as to talk to it in the first place.

Have had some more time to look into this issue. So, I think the problem
with this change is that it is quite strict in its assumptions about what
it means when a device fails to train, but in an environment where hot-plug
is exercised frequently you are essentially bound have something interrupt
the link training. In the first case where we caught this problem our test
automation was doing some power cycle tortures on our endpoints. If you catch
the right timing the link will be forced down to Gen1 forever without some other
automation to recover you unless your device is the one single device in the
allowlist which had the hardware bug in the first place.

I wonder if we can come up with some kind of alternative.

- Matt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PCI: Work around PCIe link training failures
  2024-07-26  8:04         ` Matthew W Carlis
@ 2024-07-29 10:27           ` Ilpo Järvinen
  2024-07-29 14:51             ` Maciej W. Rozycki
  0 siblings, 1 reply; 44+ messages in thread
From: Ilpo Järvinen @ 2024-07-29 10:27 UTC (permalink / raw)
  To: Matthew W Carlis
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma,
	christophe.leroy, kuba, pabeni, wilson, linuxppc-dev, npiggin,
	alex.williamson, bhelgaas, Mika Westerberg, david.abdurachmanov,
	Netdev, LKML, Lukas Wunner, saeedm, pali, davem, macro

On Fri, 26 Jul 2024, Matthew W Carlis wrote:

> On Mon, 22 Jul 2024, Maciej W. Rozycki wrote:
> 
> > The main reason is it is believed that it is the downstream device
> > causing the issue, and obviously you can't fetch its ID if you can't
> > negotiate link so as to talk to it in the first place.
> 
> Have had some more time to look into this issue. So, I think the problem
> with this change is that it is quite strict in its assumptions about what
> it means when a device fails to train, but in an environment where hot-plug
> is exercised frequently you are essentially bound have something interrupt
> the link training. In the first case where we caught this problem our test
> automation was doing some power cycle tortures on our endpoints. If you catch
> the right timing the link will be forced down to Gen1 forever without some other
> automation to recover you unless your device is the one single device in the
> allowlist which had the hardware bug in the first place.
> 
> I wonder if we can come up with some kind of alternative.

The most obvious solution is to not leave the speed at Gen1 on failure in 
Target Speed quirk but to restore the original Target Speed value. The 
downside with that is if the current retraining interface (function) is 
used, it adds delay. But the retraining functions could be reworked such 
that the retraining is only triggered in case the Target Speed quirk 
fails but we don't wait for its result (which will very likely fail 
anyway).

-- 
 i.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PCI: Work around PCIe link training failures
  2024-07-29 10:27           ` Ilpo Järvinen
@ 2024-07-29 14:51             ` Maciej W. Rozycki
  2024-07-29 18:56               ` Matthew W Carlis
  0 siblings, 1 reply; 44+ messages in thread
From: Maciej W. Rozycki @ 2024-07-29 14:51 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma,
	christophe.leroy, kuba, pabeni, Jim Wilson, linuxppc-dev, npiggin,
	alex.williamson, Bjorn Helgaas, Mika Westerberg,
	david.abdurachmanov, Netdev, LKML, Lukas Wunner, Matthew W Carlis,
	saeedm, pali, David S. Miller

On Mon, 29 Jul 2024, Ilpo Järvinen wrote:

> > > The main reason is it is believed that it is the downstream device
> > > causing the issue, and obviously you can't fetch its ID if you can't
> > > negotiate link so as to talk to it in the first place.
> > 
> > Have had some more time to look into this issue. So, I think the problem
> > with this change is that it is quite strict in its assumptions about what
> > it means when a device fails to train, but in an environment where hot-plug
> > is exercised frequently you are essentially bound have something interrupt
> > the link training. In the first case where we caught this problem our test
> > automation was doing some power cycle tortures on our endpoints. If you catch
> > the right timing the link will be forced down to Gen1 forever without some other
> > automation to recover you unless your device is the one single device in the
> > allowlist which had the hardware bug in the first place.
> > 
> > I wonder if we can come up with some kind of alternative.
> 
> The most obvious solution is to not leave the speed at Gen1 on failure in 
> Target Speed quirk but to restore the original Target Speed value. The 
> downside with that is if the current retraining interface (function) is 
> used, it adds delay. But the retraining functions could be reworked such 
> that the retraining is only triggered in case the Target Speed quirk 
> fails but we don't wait for its result (which will very likely fail 
> anyway).

 This is what I have also been thinking of.

 After these many years it took from the inception of this change until it 
landed upstream I'm not sure anymore what my original idea was behind 
leaving the link clamped on a retrain failure, but I think it was either 
not to fiddle with the setting beyond the absolute necessity at hand 
(which the scenarios such as Matthew's prove wrong) or to leave the 
setting in a hope that training will eventually have succeeded (but it 
seems to make little sense as there'll be nothing there to actually 
observe the success unless the bus gets rescanned for another reason).

 I'll be at my lab towards the end of the week with a maintenance visit, 
so I'll allocate some time to fiddle with this issue on that occasion and 
implement such an update.

  Maciej

^ permalink raw reply	[flat|nested] 44+ messages in thread

* PCI: Work around PCIe link training failures
  2024-07-29 14:51             ` Maciej W. Rozycki
@ 2024-07-29 18:56               ` Matthew W Carlis
  0 siblings, 0 replies; 44+ messages in thread
From: Matthew W Carlis @ 2024-07-29 18:56 UTC (permalink / raw)
  To: macro
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma,
	christophe.leroy, kuba, pabeni, wilson, npiggin, alex.williamson,
	ilpo.jarvinen, bhelgaas, mika.westerberg, pali,
	david.abdurachmanov, netdev, linux-kernel, lukas, mattc, saeedm,
	linuxppc-dev, davem

On Mon, 29 July 2024, Ilpo Järvinen wrote:

> The most obvious solution is to not leave the speed at Gen1 on failure in
> Target Speed quirk but to restore the original Target Speed value. The
> downside with that is if the current retraining interface (function) is
> used, it adds delay.

Tends to be that I care less about how long a device is gone & more about
how it will behave once it reappears. For our purposes we don't even tend
to notice a few seconds of wiggle in this area, but we do notice impact
if the kernel creates the nvme device & it is degraded in some way. Even
though we might have automation to recover the device we will have lost
more time already than by the purposed delay afaik.

Some of the time a human would have hot-insert'ed a new device, but much of
the time perhaps the device will be coming back from downstream port containment
where there won't be a person to ensure the correctness of link speed/width.
In the DPC case perhaps the endpoint itself will have reset/rebooted/crashed
where you already suffer a few hundred ms of delay from EP's boot time.

I would be interested to know what kind of maximum delay we would all be
willing to tolerate & what applications might care.

On Mon, 29 Jul 2024, Maciej W. Rozycki wrote:

> After these many years it took from the inception of this change until it
> landed upstream I'm not sure anymore what my original idea was behind
> leaving the link clamped

A familiar question I have been known to ask myself. - "Why did I do this again?"
The scary/funny thing is that there is almost always a reason.

I do think there might be some benefit to overall system stability to have some
kind of damping on link retraining rate because I have also seen device stuck
in an infinite cycle of many retrains per second, but each time we come through
the hot-insert code path kernel should let the link partners try to get to their
maximum speeds because it could in theory be a totally new EP. In the handful
I have seen there was some kind of defect with a particular device & replacement
resolved it.

- Matt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* PCI: Work around PCIe link training failures
  2023-06-20  9:54           ` Maciej W. Rozycki
@ 2024-08-06  0:06             ` Matthew W Carlis
  2024-08-06 19:36               ` Bjorn Helgaas
  2024-08-07 11:49               ` Maciej W. Rozycki
  0 siblings, 2 replies; 44+ messages in thread
From: Matthew W Carlis @ 2024-08-06  0:06 UTC (permalink / raw)
  To: macro
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma,
	helgaas, kuba, pabeni, wilson, linuxppc-dev, npiggin,
	alex.williamson, bhelgaas, mika.westerberg, david.abdurachmanov,
	saeedm, linux-kernel, lukas, netdev, pali, davem

Hello again. I just realized that my first response to this thread two weeks
ago was not actually starting from the end of the discussion. I hope I found
it now... Must say sorry for this I am still figuring out how to follow these
threads.
I need to ask if we can either revert this patch or only modify the quirk to
only run on the device in mention (ASMedia ASM2824). We have now identified
it as causing devices to get stuck at Gen1 in multiple generations of our
hardware & across product lines on ports were hot-plug is common. To be a
little more specific it includes Intel root ports and Broadcomm PCIe switch
ports and also Microchip PCIe switch ports.
The most common place where we see our systems getting stuck at Gen1 is with
device power cycling. If a device is powered on and then off quickly then the
link will of course fail to train & the consequence here is that the port is
forced to Gen1 forever. Does anybody know why the patch will only remove the
forced Gen1 speed from the ASMedia device?

- Matt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PCI: Work around PCIe link training failures
  2024-08-06  0:06             ` PCI: Work around " Matthew W Carlis
@ 2024-08-06 19:36               ` Bjorn Helgaas
  2024-08-07  8:43                 ` Matthew W Carlis
  2024-08-07 11:49               ` Maciej W. Rozycki
  1 sibling, 1 reply; 44+ messages in thread
From: Bjorn Helgaas @ 2024-08-06 19:36 UTC (permalink / raw)
  To: Matthew W Carlis
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma, kuba,
	pabeni, wilson, linuxppc-dev, npiggin, alex.williamson, bhelgaas,
	mika.westerberg, david.abdurachmanov, saeedm, linux-kernel, lukas,
	netdev, pali, davem, macro

On Mon, Aug 05, 2024 at 06:06:59PM -0600, Matthew W Carlis wrote:
> Hello again. I just realized that my first response to this thread two weeks
> ago was not actually starting from the end of the discussion. I hope I found
> it now... Must say sorry for this I am still figuring out how to follow these
> threads.
> I need to ask if we can either revert this patch or only modify the quirk to
> only run on the device in mention (ASMedia ASM2824). We have now identified
> it as causing devices to get stuck at Gen1 in multiple generations of our
> hardware & across product lines on ports were hot-plug is common. To be a
> little more specific it includes Intel root ports and Broadcomm PCIe switch
> ports and also Microchip PCIe switch ports.
> The most common place where we see our systems getting stuck at Gen1 is with
> device power cycling. If a device is powered on and then off quickly then the
> link will of course fail to train & the consequence here is that the port is
> forced to Gen1 forever. Does anybody know why the patch will only remove the
> forced Gen1 speed from the ASMedia device?

Thanks for keeping this thread alive.  I don't know the fix, but it
does seem like this series made wASMedia ASM2824 work better but
caused regressions elsewhere, so maybe we just need to accept that
ASM2824 is slightly broken and doesn't work as well as it should.

Bjorn

^ permalink raw reply	[flat|nested] 44+ messages in thread

* PCI: Work around PCIe link training failures
  2024-08-06 19:36               ` Bjorn Helgaas
@ 2024-08-07  8:43                 ` Matthew W Carlis
  2024-08-07 11:14                   ` Maciej W. Rozycki
  0 siblings, 1 reply; 44+ messages in thread
From: Matthew W Carlis @ 2024-08-07  8:43 UTC (permalink / raw)
  To: helgaas
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma, kuba,
	pabeni, wilson, linuxppc-dev, npiggin, alex.williamson, bhelgaas,
	mika.westerberg, david.abdurachmanov, saeedm, linux-kernel, lukas,
	netdev, mattc, pali, davem, macro

On Tues, 06 Aug 2024 Bjorn Helgaas wrote:
> it does seem like this series made wASMedia ASM2824 work better but
> caused regressions elsewhere, so maybe we just need to accept that
> ASM2824 is slightly broken and doesn't work as well as it should.

One of my colleagues challenged me to provide a more concrete example
where the change will cause problems. One such configuration would be not
implementing the Power Controller Control in the Slot Capabilities Register.
Then, Powering off the slot via out-of-band interfaces would result in the
kernel forcing the DSP to Gen1 100% of the time as far as I can tell. 
The aspect of this force to Gen1 that is the most concerning to my team is
that it isn't cleaned up even if we replaced the EP with some other EP.

I was curious about the PCIe devices mentioned in the commit because I
look at crazy malfunctioning devices too often so I pasted the following:
"Delock Riser Card PCI Expres 41433" into Google. 
I'm not really a physical layer guy, but is it possible that the reported
issue be due to signal integrity? I'm not sure if sending PCIe over a USB
cable is "reliable".

I've never worked with an ASMedia switch and don't have a reliable way to
reproduce anything like the interaction between the two device at hand. As
much as I hate to make the request my thinking is that the patch should be
reverted until there is a solution that doesn't leave the link forced to
Gen1 forever for every EP thereafter.

- Matt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PCI: Work around PCIe link training failures
  2024-08-07  8:43                 ` Matthew W Carlis
@ 2024-08-07 11:14                   ` Maciej W. Rozycki
  2024-08-07 12:29                     ` Oliver O'Halloran
  0 siblings, 1 reply; 44+ messages in thread
From: Maciej W. Rozycki @ 2024-08-07 11:14 UTC (permalink / raw)
  To: Matthew W Carlis, Ilpo Järvinen
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma,
	helgaas, kuba, pabeni, Jim Wilson, linuxppc-dev, npiggin,
	alex.williamson, Bjorn Helgaas, mika.westerberg,
	david.abdurachmanov, saeedm, linux-kernel, lukas, netdev, pali,
	David S. Miller

On Wed, 7 Aug 2024, Matthew W Carlis wrote:

> > it does seem like this series made wASMedia ASM2824 work better but
> > caused regressions elsewhere, so maybe we just need to accept that
> > ASM2824 is slightly broken and doesn't work as well as it should.
> 
> One of my colleagues challenged me to provide a more concrete example
> where the change will cause problems. One such configuration would be not
> implementing the Power Controller Control in the Slot Capabilities Register.
> Then, Powering off the slot via out-of-band interfaces would result in the
> kernel forcing the DSP to Gen1 100% of the time as far as I can tell. 
> The aspect of this force to Gen1 that is the most concerning to my team is
> that it isn't cleaned up even if we replaced the EP with some other EP.

 Why does that happen?

 For the quirk to trigger, the link has to be down and there has to be the 
LBMS Link Status bit set from link management events as per the PCIe spec 
while the link was previously up, and then both of that while rescanning 
the PCIe device in question, so there's a lot of conditions to meet.  Is 
it the case that in your setup there is no device at this point, but one 
gets plugged in later?

 One aspect to mention here is that when taking a device offline the LBMS 
Link Status bit really ought to be cleared by Linux in the corresponding 
downstream port of the parent device.  As I recall Ilpo was working on a 
broader link bandwidth management subsystem for Linux, which would do that 
among others.  He asked me to make experiments with my problematic machine 
to see if this would interfere, but unrelated issues with DRAM controller 
(now fixed by reducing the DDR clock rate) have prevented me from doing 
that.  I'll see if I can get back to that soon.

> I was curious about the PCIe devices mentioned in the commit because I
> look at crazy malfunctioning devices too often so I pasted the following:
> "Delock Riser Card PCI Expres 41433" into Google. 
> I'm not really a physical layer guy, but is it possible that the reported
> issue be due to signal integrity? I'm not sure if sending PCIe over a USB
> cable is "reliable".

 Well, it's a transmission line and as long as bandwidth and latency 
requirements are met it's as good as any.  My understanding has been one 
of the objectives of PCIe over conventional PCI was to make external links 
such as to expansion boxes easier to implement.

 Please note that it is a purpose-made cable too, rather than just an 
off-the-shelf USB cable.  Then I have solutions deployed with PCIe routed 
over DVI cables too.

 I doubt it could be a signal integrity issue, because once the link has 
negotiated the highest speed possible (Gen2) via this complicated dance it 
works with no issues for months.  I'm not sure what the highest uptime for 
the system in question exactly was, but it was in the range of half a 
year, and I have a network interface downstream which I regularly use for 
heavy NFS traffic in GNU tool chain regression verification, so any issue 
would have shown up pretty quickly.

 Given how switching between speed rates works with PCIe (by establishing 
a link at 2.5GT/s first and then exchanging rates available as data before 
choosing the highest supported by both endpoints) I suspect that it is a 
protocol issue: either or both devices have got it slightly wrong, which 
breaks it when they're combined together.  Otherwise why would retraining 
to 5GT/s by hand work while it doesn't if to be done by hardware itself?  
There is not much if any difference here between both scenarios really.

> I've never worked with an ASMedia switch and don't have a reliable way to
> reproduce anything like the interaction between the two device at hand. As
> much as I hate to make the request my thinking is that the patch should be
> reverted until there is a solution that doesn't leave the link forced to
> Gen1 forever for every EP thereafter.

 I'm working on such a change this week.

 It's just that my primary objectives for this maintenance visit at my lab 
were to fix a pair of broken PSUs (which I have now done) and upgrade the 
firmware of a console manager device to fix an issue with a remote serial 
BREAK feature affecting Magic SysRq (also completed), plus I have a day 
job too that is unrelated.

 I also bought a PCIe to dual M.2 M-Key option card with the ASM2824 
switch onboard to have a different setup to evaluate and determine if this 
issue is specific to the RISC-V board or not.  Unfortunately the ASM2824 
does not bring the downstream ports up if the card is placed in a slot 
that does not supply Vaux (which is a conforming arrangement according to 
the PCIe spec), apparently due to a quirk in the ASM2824 switch according 
to the option card manufacturer, and there is no software workaround 
possible.

 So I won't be able to use this alternative arrangement until I have 
modified the option card, which I won't be able to do before October the 
earliest.  

 I just can't help with that there is so many broken hardware designs out 
there and I have to strive to navigate through and do my job regardless.

  Maciej

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PCI: Work around PCIe link training failures
  2024-08-06  0:06             ` PCI: Work around " Matthew W Carlis
  2024-08-06 19:36               ` Bjorn Helgaas
@ 2024-08-07 11:49               ` Maciej W. Rozycki
  2024-08-08  2:07                 ` Matthew W Carlis
  1 sibling, 1 reply; 44+ messages in thread
From: Maciej W. Rozycki @ 2024-08-07 11:49 UTC (permalink / raw)
  To: Matthew W Carlis
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma,
	helgaas, kuba, pabeni, Jim Wilson, linuxppc-dev, npiggin,
	alex.williamson, Bjorn Helgaas, mika.westerberg,
	david.abdurachmanov, saeedm, linux-kernel, lukas, netdev, pali,
	David S. Miller

On Mon, 5 Aug 2024, Matthew W Carlis wrote:

> The most common place where we see our systems getting stuck at Gen1 is with
> device power cycling. If a device is powered on and then off quickly then the
> link will of course fail to train & the consequence here is that the port is
> forced to Gen1 forever. Does anybody know why the patch will only remove the
> forced Gen1 speed from the ASMedia device?

 I know, as the author of the change.

 The reason for this is safety: it's better to have a link run at 2.5GT/s 
than not at all, and from the nature of the issue there is no guarantee 
that if you remove the speed clamp, then the link won't go back into the 
infinite retraining loop that the workaround is supposed to break.

 I was only able to verify that the speed clamp is safe to lift with this 
particular device, but if you hit the actual issue with any other device 
and find that it is safe to remove the clamp as well, then you're welcome 
to add it to the list too.

  Maciej

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PCI: Work around PCIe link training failures
  2024-08-07 11:14                   ` Maciej W. Rozycki
@ 2024-08-07 12:29                     ` Oliver O'Halloran
  0 siblings, 0 replies; 44+ messages in thread
From: Oliver O'Halloran @ 2024-08-07 12:29 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: linux-pci, linux-kernel, edumazet, sr, leon, linux-rdma, helgaas,
	kuba, pabeni, Jim Wilson, npiggin, alex.williamson,
	Ilpo Järvinen, Bjorn Helgaas, mika.westerberg, pali,
	david.abdurachmanov, netdev, mahesh, David S. Miller, lukas,
	Matthew W Carlis, linuxppc-dev, saeedm

On Wed, Aug 7, 2024 at 9:14 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote:
>
> On Wed, 7 Aug 2024, Matthew W Carlis wrote:
>
> > > it does seem like this series made wASMedia ASM2824 work better but
> > > caused regressions elsewhere, so maybe we just need to accept that
> > > ASM2824 is slightly broken and doesn't work as well as it should.
> >
> > One of my colleagues challenged me to provide a more concrete example
> > where the change will cause problems. One such configuration would be not
> > implementing the Power Controller Control in the Slot Capabilities Register.
> > Then, Powering off the slot via out-of-band interfaces would result in the
> > kernel forcing the DSP to Gen1 100% of the time as far as I can tell.
> > The aspect of this force to Gen1 that is the most concerning to my team is
> > that it isn't cleaned up even if we replaced the EP with some other EP.
>
>  Why does that happen?
>
>  For the quirk to trigger, the link has to be down and there has to be the
> LBMS Link Status bit set from link management events as per the PCIe spec
> while the link was previously up, and then both of that while rescanning
> the PCIe device in question, so there's a lot of conditions to meet.  Is
> it the case that in your setup there is no device at this point, but one
> gets plugged in later?

My read was that Matt is essentially doing a surprise hot-unplug by
removing power to the card without notifying the OS. I thought the
LBMS bit wouldn't be set in that case since the link goes down rather
than changes speed, but the spec is a little vague and that appears to
be happening in Matt's testing. It might be worth disabling the
workaround if the port has the surprise hotplug capability bit set.
It's fairly common for ports on NVMe drive backplanes to have it set
and a lot of people would be unhappy about those being forced to Gen 1
by accident.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* PCI: Work around PCIe link training failures
  2024-08-07 11:49               ` Maciej W. Rozycki
@ 2024-08-08  2:07                 ` Matthew W Carlis
  2024-08-08 23:13                   ` Oliver O'Halloran
  2024-08-09 13:34                   ` Maciej W. Rozycki
  0 siblings, 2 replies; 44+ messages in thread
From: Matthew W Carlis @ 2024-08-08  2:07 UTC (permalink / raw)
  To: macro
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma,
	helgaas, kuba, pabeni, wilson, linuxppc-dev, npiggin,
	alex.williamson, bhelgaas, mika.westerberg, david.abdurachmanov,
	saeedm, linux-kernel, lukas, netdev, mattc, pali, davem

On Wed, 7 Aug 2024 22:29:35 +1000 Oliver O'Halloran Wrote
> My read was that Matt is essentially doing a surprise hot-unplug by
> removing power to the card without notifying the OS. I thought the
> LBMS bit wouldn't be set in that case since the link goes down rather
> than changes speed, but the spec is a little vague and that appears to
> be happening in Matt's testing. It might be worth disabling the
> workaround if the port has the surprise hotplug capability bit set.

Most of the systems I have are using downstream port containment which does
not recommend setting the Hot-Plug Surprise in Slot Capabilities & therefore
we do not. The first time we noticed an issue with this patch was in test
automation which was power cycling the endpoints & injecting uncorrectable
errors to ensure our hosts are robust in the face of PCIe chaos & that they
will recover. Later we started to see other teams on other products
encountering the same bug in simpler cases where humans turn on and off
EP power for development purposes. Although we are not using Hot-Plug Surprise
we are often triggering DPC on the Surprise Down Uncorrectable Error when
we hit this Gen1 issue.

On Wed, 7 Aug 2024 12:14:13 +0100 Maciej W. Rozycki Wrote
> For the quirk to trigger, the link has to be down and there has to be the
> LBMS Link Status bit set from link management events as per the PCIe spec
> while the link was previously up, and then both of that while rescanning
> the PCIe device in question, so there's a lot of conditions to meet.

If there is nothing clearing the bit then why is there any expectation that
it wouldn't be set? We have 20/30/40 endpoints in a host that can be hot-removed,
hot-added at any point in time in any combination & its often the case that
the system uptime be hundreds of days. Eventually the bit will just become set
as a result of time and scale.

On Wed, 7 Aug 2024 12:14:13 +0100 Maciej W. Rozycki Wrote
> The reason for this is safety: it's better to have a link run at 2.5GT/s
> than not at all, and from the nature of the issue there is no guarantee
> that if you remove the speed clamp, then the link won't go back into the
> infinite retraining loop that the workaround is supposed to break.

I guess I don't really understand why it wouldn't be safe for every device
other than the ASMedia since they aren't the device that had the issue in the
first place. The main problem in my mind is the system doesn't actually have to
be retraining at all, it can occur the first time you replace a device or
power cycle the device or the first time the device goes into DPC & comes back.
If the quirk simply returned without doing anything when the ASMedia is not in the
allow-list it would make me more at ease. I can also imagine some other implementations
that would work well, but it doesn't seem correct that we could only give a single
opportunity to a device before forcing it to live at Gen1. Perhaps it should be
aware of the rate or a count or something...

I can only assume that there will be more systems that start to run into issues
with the patch as they strive to keep up with LTS & they exercise the hot-plug
path.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PCI: Work around PCIe link training failures
  2024-08-08  2:07                 ` Matthew W Carlis
@ 2024-08-08 23:13                   ` Oliver O'Halloran
  2024-08-09 13:34                   ` Maciej W. Rozycki
  1 sibling, 0 replies; 44+ messages in thread
From: Oliver O'Halloran @ 2024-08-08 23:13 UTC (permalink / raw)
  To: Matthew W Carlis
  Cc: linux-pci, mahesh, edumazet, sr, leon, linux-rdma, helgaas, kuba,
	pabeni, wilson, linuxppc-dev, npiggin, alex.williamson, bhelgaas,
	mika.westerberg, david.abdurachmanov, saeedm, linux-kernel, lukas,
	netdev, pali, davem, macro

On Thu, Aug 8, 2024 at 12:08 PM Matthew W Carlis <mattc@purestorage.com> wrote:
>
> On Wed, 7 Aug 2024 22:29:35 +1000 Oliver O'Halloran Wrote
> > My read was that Matt is essentially doing a surprise hot-unplug by
> > removing power to the card without notifying the OS. I thought the
> > LBMS bit wouldn't be set in that case since the link goes down rather
> > than changes speed, but the spec is a little vague and that appears to
> > be happening in Matt's testing. It might be worth disabling the
> > workaround if the port has the surprise hotplug capability bit set.
>
> Most of the systems I have are using downstream port containment which does
> not recommend setting the Hot-Plug Surprise in Slot Capabilities & therefore
> we do not. The first time we noticed an issue with this patch was in test
> automation which was power cycling the endpoints & injecting uncorrectable
> errors to ensure our hosts are robust in the face of PCIe chaos & that they
> will recover. Later we started to see other teams on other products
> encountering the same bug in simpler cases where humans turn on and off
> EP power for development purposes.

Ok? If we have to check for DPC being enabled in addition to checking
the surprise bit in the slot capabilities then that's fine, we can do
that. The question to be answered here is: how should this feature
work on ports where it's normal for a device to be removed without any
notice?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PCI: Work around PCIe link training failures
  2024-08-08  2:07                 ` Matthew W Carlis
  2024-08-08 23:13                   ` Oliver O'Halloran
@ 2024-08-09 13:34                   ` Maciej W. Rozycki
  2024-08-15 19:40                     ` Matthew W Carlis
  1 sibling, 1 reply; 44+ messages in thread
From: Maciej W. Rozycki @ 2024-08-09 13:34 UTC (permalink / raw)
  To: Matthew W Carlis
  Cc: linux-pci, mahesh, edumazet, oohall, sr, leon, linux-rdma,
	helgaas, kuba, pabeni, Jim Wilson, linuxppc-dev, npiggin,
	alex.williamson, Bjorn Helgaas, mika.westerberg,
	david.abdurachmanov, saeedm, linux-kernel, lukas, netdev, pali,
	David S. Miller

On Wed, 7 Aug 2024, Matthew W Carlis wrote:

> > For the quirk to trigger, the link has to be down and there has to be the
> > LBMS Link Status bit set from link management events as per the PCIe spec
> > while the link was previously up, and then both of that while rescanning
> > the PCIe device in question, so there's a lot of conditions to meet.
> 
> If there is nothing clearing the bit then why is there any expectation that
> it wouldn't be set? We have 20/30/40 endpoints in a host that can be hot-removed,
> hot-added at any point in time in any combination & its often the case that
> the system uptime be hundreds of days. Eventually the bit will just become set
> as a result of time and scale.

 Well, in principle in a setup with reliable links the LBMS bit may never 
be set, e.g. this system of mine has been in 24/7 operation since the last 
reboot 410 days ago and for the devices that support Link Active reporting 
it shows:

LnkSta:Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-
LnkSta:Speed 5GT/s, Width x8, TrErr- Train- SlotClk- DLActive+ BWMgmt+ ABWMgmt+
LnkSta:Speed 5GT/s, Width x1, TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt+
LnkSta:Speed 5GT/s, Width x2, TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt+
LnkSta:Speed 5GT/s, Width x1, TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt+
LnkSta:Speed 8GT/s, Width x16, TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt+
LnkSta:Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
LnkSta:Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
LnkSta:Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
LnkSta:Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
LnkSta:Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-

so out of 11 devices 6 have the LBMS bit clear.  But then 5 have it set, 
perhaps worryingly, so of course you're right, that it will get set in the 
field, though it's not enough by itself for your problem to trigger.

 Then there is manual link retraining, which we do from time to time as 
well, which will set the bit too, which I overlooked.

 To cut the story short, it was an oversight of mine, especially as I 
don't really make any use myself of hot plug scenarios.

> > The reason for this is safety: it's better to have a link run at 2.5GT/s
> > than not at all, and from the nature of the issue there is no guarantee
> > that if you remove the speed clamp, then the link won't go back into the
> > infinite retraining loop that the workaround is supposed to break.
> 
> I guess I don't really understand why it wouldn't be safe for every device
> other than the ASMedia since they aren't the device that had the issue in the
> first place. The main problem in my mind is the system doesn't actually have to
> be retraining at all, it can occur the first time you replace a device or
> power cycle the device or the first time the device goes into DPC & comes back.
> If the quirk simply returned without doing anything when the ASMedia is not in the
> allow-list it would make me more at ease. I can also imagine some other implementations
> that would work well, but it doesn't seem correct that we could only give a single
> opportunity to a device before forcing it to live at Gen1. Perhaps it should be
> aware of the rate or a count or something...

 It's a complex matter.  For a starter please read the introduction at 
`pcie_failed_link_retrain'.

 When the problem triggers the link goes into an infinite link training 
loop, with the Link Training (LT) bit flipping on and off.  I've made a 
complementary workaround for U-Boot (since your bootstrap device can be 
downstream such a broken link), where a statistical probe is done for the 
LT bit flipping as discovered by polling the bit in a tight loop.  This is 
fine for a piece of firmware that has nothing better to do, but in an OS 
kernel we just can't do it.

 Also U-Boot does not remove the 2.5GT/s clamp because it does not have to 
set up the system in an optimal way, but just sufficiently to successfully 
boot.  An OS kernel can then adjust the configuration if it knows about 
the workaround, or leave it as it is.  In the latter case things will 
continue working across PCIe resets, because the TLS field is sticky.

 For Linux I went for just checking the DLLLA bit as it seems good enough 
for devices that support Link Active reporting.  And I admit that the 
workaround is quite aggressive, but this was a deliberate decision 
following the robustness principle: the end user may not be qualified 
enough to diagnose the problem and given its nature not even think it's 
something that can be made to work.  The downstream device does not show 
up in the PCIe hierarchy and just looks like it's broken.  It takes a lot 
of knowledge and experience to notice the LT bit flipping and even myself, 
quite a seasoned Linux kernel developer, only noticed it by chance.

 It was discussed to death with the original submission, the rationale is 
given in the introductory comment and I'd prefer that it stays as it is.  
The ASM2824 switch works just fine with other downstream devices and so 
does the PI7C9X2G304 switch with other upstream devices.  Both vendors 
refused to help narrow it down and declined to comment (a person from 
Diodes née Pericom replied, but they learnt it's about a PCIe switch they 
told me they were from another department and couldn't help), which would 
have helped (I now just recommend staying away from both manufacturers).  
Therefore my assumption is this can potentially trigger with any pair of 
devices.

 I have now posted a patch series to address this problem of yours and a 
previous issue reported by Ilpo.  See: 
<https://patchwork.kernel.org/project/linux-pci/list/?series=878216>, and 
I've included you in the list of direct recipients too.

 I will appreciate if you give it a try, and again, apologies for the 
trouble caused, but such things happen in engineering.

  Maciej

^ permalink raw reply	[flat|nested] 44+ messages in thread

* PCI: Work around PCIe link training failures
  2024-08-09 13:34                   ` Maciej W. Rozycki
@ 2024-08-15 19:40                     ` Matthew W Carlis
  2024-08-16 13:57                       ` Maciej W. Rozycki
  0 siblings, 1 reply; 44+ messages in thread
From: Matthew W Carlis @ 2024-08-15 19:40 UTC (permalink / raw)
  To: macro
  Cc: alex.williamson, bhelgaas, davem, david.abdurachmanov, edumazet,
	helgaas, kuba, leon, linux-kernel, linux-pci, linux-rdma,
	linuxppc-dev, lukas, mahesh, mattc, mika.westerberg, netdev,
	npiggin, oohall, pabeni, pali, saeedm, sr, wilson

Sorry for the delay in my responses here I had some things get in my way.

On Fri, 9 Aug 2024 09:13:52 Oliver O'Halloran <oohall@gmail.com> wrote:

> Ok? If we have to check for DPC being enabled in addition to checking
> the surprise bit in the slot capabilities then that's fine, we can do
> that. The question to be answered here is: how should this feature
> work on ports where it's normal for a device to be removed without any
> notice?

I'm not sure if its the correct thing to check however. I assumed that ports
using the pciehp driver would usually consider it "normal" for a device to
be removed actually, but maybe I have the idea of hp reversed.

On Fri, 9 Aug 2024 14:34:04 Maciej W. Rozycki <macro@orcam.me.uk> wrote:

> Well, in principle in a setup with reliable links the LBMS bit may never 
> be set, e.g. this system of mine has been in 24/7 operation since the last 
> reboot 410 days ago and for the devices that support Link Active reporting 
> it shows:
> ...
> so out of 11 devices 6 have the LBMS bit clear.  But then 5 have it set, 
> perhaps worryingly, so of course you're right, that it will get set in the 
> field, though it's not enough by itself for your problem to trigger.

The way I look at it is that its essentially a probability distribution with time,
but I try to avoid learning too much about the physical layer because I would find
myself debugging more hardware issues lol. I also don't think LBMS/LABS being set
by itself is very interesting without knowing the rate at which it is being set.
FWIW I have seen some devices in the past going into recovery state many times a
second & still never downtrain, but at the same time they were setting the
LBMS/LABS bits which maybe not quite spec compliant.

I would like to help test these changes, but I would like to avoid having to test
each mentioned change individually. Does anyone have any preferences in how I batch
the patches for testing? Would it be ok if I just pulled them all together on one go?

- Matt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PCI: Work around PCIe link training failures
  2024-08-15 19:40                     ` Matthew W Carlis
@ 2024-08-16 13:57                       ` Maciej W. Rozycki
  2024-10-01 21:04                         ` Matthew W Carlis
  0 siblings, 1 reply; 44+ messages in thread
From: Maciej W. Rozycki @ 2024-08-16 13:57 UTC (permalink / raw)
  To: Matthew W Carlis
  Cc: alex.williamson, Bjorn Helgaas, David S. Miller,
	david.abdurachmanov, edumazet, helgaas, kuba, leon, linux-kernel,
	linux-pci, linux-rdma, linuxppc-dev, lukas, mahesh,
	mika.westerberg, netdev, npiggin, oohall, pabeni, pali, saeedm,
	sr, Jim Wilson

On Thu, 15 Aug 2024, Matthew W Carlis wrote:

> > Well, in principle in a setup with reliable links the LBMS bit may never 
> > be set, e.g. this system of mine has been in 24/7 operation since the last 
> > reboot 410 days ago and for the devices that support Link Active reporting 
> > it shows:
> > ...
> > so out of 11 devices 6 have the LBMS bit clear.  But then 5 have it set, 
> > perhaps worryingly, so of course you're right, that it will get set in the 
> > field, though it's not enough by itself for your problem to trigger.
> 
> The way I look at it is that its essentially a probability distribution with time,
> but I try to avoid learning too much about the physical layer because I would find
> myself debugging more hardware issues lol. I also don't think LBMS/LABS being set
> by itself is very interesting without knowing the rate at which it is being set.

 Agreed.  Ilpo's upcoming bandwidth controller will hopefully give us such 
data.

> FWIW I have seen some devices in the past going into recovery state many times a
> second & still never downtrain, but at the same time they were setting the
> LBMS/LABS bits which maybe not quite spec compliant.
> 
> I would like to help test these changes, but I would like to avoid having to test
> each mentioned change individually. Does anyone have any preferences in how I batch
> the patches for testing? Would it be ok if I just pulled them all together on one go?

 Certainly fine with me, especially as 3/4 and 4/4 aren't really related 
to your failure scenario, and then you need 1/4 and 2/4 both at a time to 
address both aspects of the issue you have reported.

  Maciej


^ permalink raw reply	[flat|nested] 44+ messages in thread

* PCI: Work around PCIe link training failures
  2024-08-16 13:57                       ` Maciej W. Rozycki
@ 2024-10-01 21:04                         ` Matthew W Carlis
  2024-10-02 12:58                           ` Maciej W. Rozycki
  0 siblings, 1 reply; 44+ messages in thread
From: Matthew W Carlis @ 2024-10-01 21:04 UTC (permalink / raw)
  To: macro
  Cc: alex.williamson, bhelgaas, davem, david.abdurachmanov, edumazet,
	helgaas, kuba, leon, linux-kernel, linux-pci, linux-rdma,
	linuxppc-dev, lukas, mahesh, mattc, mika.westerberg, netdev,
	npiggin, oohall, pabeni, pali, saeedm, sr, wilson

I just wanted to follow up with our testing results for the mentioned
patches. It took me a while to get them running in our test pool, but
we just got it going yesterday and the initial results look really good.
We will continue running them in our testing from now on & if any issues
come up I'll try to report them, but otherwise I wanted to say thank you
for entertaining the discussion & reacting so quickly with new patches.

The patches we pulled into our testing:

[PATCH v3 1/4] PCI: Clear the LBMS bit after a link retrain
[PATCH v3 2/4] PCI: Revert to the original speed after PCIe failed link retraining

- Matt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PCI: Work around PCIe link training failures
  2024-10-01 21:04                         ` Matthew W Carlis
@ 2024-10-02 12:58                           ` Maciej W. Rozycki
  2024-10-02 20:55                             ` Bjorn Helgaas
  0 siblings, 1 reply; 44+ messages in thread
From: Maciej W. Rozycki @ 2024-10-02 12:58 UTC (permalink / raw)
  To: Matthew W Carlis
  Cc: alex.williamson, Bjorn Helgaas, David S. Miller,
	david.abdurachmanov, edumazet, helgaas, kuba, leon, linux-kernel,
	linux-pci, linux-rdma, linuxppc-dev, lukas, mahesh,
	mika.westerberg, netdev, npiggin, oohall, pabeni, pali, saeedm,
	sr, Jim Wilson

On Tue, 1 Oct 2024, Matthew W Carlis wrote:

> I just wanted to follow up with our testing results for the mentioned
> patches. It took me a while to get them running in our test pool, but
> we just got it going yesterday and the initial results look really good.
> We will continue running them in our testing from now on & if any issues
> come up I'll try to report them, but otherwise I wanted to say thank you
> for entertaining the discussion & reacting so quickly with new patches.

 My pleasure.  I'm glad that the solution works for you.  Let me know if 
you need anything else.

  Maciej


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PCI: Work around PCIe link training failures
  2024-10-02 12:58                           ` Maciej W. Rozycki
@ 2024-10-02 20:55                             ` Bjorn Helgaas
  2024-10-03 10:39                               ` Maciej W. Rozycki
  0 siblings, 1 reply; 44+ messages in thread
From: Bjorn Helgaas @ 2024-10-02 20:55 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Matthew W Carlis, alex.williamson, Bjorn Helgaas, David S. Miller,
	david.abdurachmanov, edumazet, kuba, leon, linux-kernel,
	linux-pci, linux-rdma, linuxppc-dev, lukas, mahesh,
	mika.westerberg, netdev, npiggin, oohall, pabeni, pali, saeedm,
	sr, Jim Wilson

On Wed, Oct 02, 2024 at 01:58:15PM +0100, Maciej W. Rozycki wrote:
> On Tue, 1 Oct 2024, Matthew W Carlis wrote:
> 
> > I just wanted to follow up with our testing results for the mentioned
> > patches. It took me a while to get them running in our test pool, but
> > we just got it going yesterday and the initial results look really good.
> > We will continue running them in our testing from now on & if any issues
> > come up I'll try to report them, but otherwise I wanted to say thank you
> > for entertaining the discussion & reacting so quickly with new patches.
> 
>  My pleasure.  I'm glad that the solution works for you.  Let me know if 
> you need anything else.

If there's anything missing that still needs to be added to v6.13-rc1,
can somebody repost those?  I lost track of what's still outstanding.

Bjorn


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PCI: Work around PCIe link training failures
  2024-10-02 20:55                             ` Bjorn Helgaas
@ 2024-10-03 10:39                               ` Maciej W. Rozycki
  2025-06-10  7:00                                 ` Matthew W Carlis
  0 siblings, 1 reply; 44+ messages in thread
From: Maciej W. Rozycki @ 2024-10-03 10:39 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Matthew W Carlis, alex.williamson, Bjorn Helgaas, David S. Miller,
	david.abdurachmanov, edumazet, kuba, leon, linux-kernel,
	linux-pci, linux-rdma, linuxppc-dev, lukas, mahesh,
	mika.westerberg, netdev, npiggin, oohall, pabeni, pali, saeedm,
	sr, Jim Wilson

On Wed, 2 Oct 2024, Bjorn Helgaas wrote:

> If there's anything missing that still needs to be added to v6.13-rc1,
> can somebody repost those?  I lost track of what's still outstanding.

 I have nothing outstanding to add right away.  Thank you for asking.

  Maciej


^ permalink raw reply	[flat|nested] 44+ messages in thread

* PCI: Work around PCIe link training failures
  2024-10-03 10:39                               ` Maciej W. Rozycki
@ 2025-06-10  7:00                                 ` Matthew W Carlis
  0 siblings, 0 replies; 44+ messages in thread
From: Matthew W Carlis @ 2025-06-10  7:00 UTC (permalink / raw)
  To: macro
  Cc: alex.williamson, bhelgaas, davem, david.abdurachmanov, edumazet,
	helgaas, kuba, leon, linux-kernel, linux-pci, linux-rdma,
	linuxppc-dev, lukas, mahesh, mattc, mika.westerberg, netdev,
	npiggin, oohall, pabeni, pali, saeedm, sr, wilson

Hello again.. It looks like there are specific system configurations that are
extremely likely to have issues with this patch & result in undesirable
system behavior..

  Specifically hot-plug systems with side-band presence detection & without
Power Controls (i.e PwrCtrl-) given to config space. It may also be related
to presence on U.2 connectors being first-to-mate/last-to-break, but
I don't have much experience with the different connectors. The main
issue is that the quirk is invoked in at least two common cases when
it is not expected that the link would train. 
  For example, if I power off the slot via an out-of-band vendor specific
mechanism we see the kernel decide that the link should be training,
presumable because it will see PresDet+ in Slot Status. In this case it
decides the link failed to train, writes the Gen1 speed value into TLS
(target link speed) & returns after waiting for the link one more time.
The next time the slot is powered on the link will train to Gen1 due to TLS.
  Another problematic case is when we physically insert a device. In my case
I am using nvme drives with U.2 connectors from several different vendors.
The presence event is often generated before the device is fully powered on
due to U.2 assigning presence as a first-to-mate & power being last-to-mate.
I believe in this case that the kernel is expecting the link to train too
soon & therefore we find that the quirk often applies the Gen1 TLS speed.
Later, when the link comes up it is frequently observed at Gen1. I tried
to unset bit 3 in Slot Control (Presence Detect Changed Enable), but we
still hit the first case I described with powering off slots.
  I should be clear and say that we are able to see devices forced to Gen1
extremely often in the side-band presence configuration. We would really like
to see this "quirk" removed or put behind an opt-in config since it causes
significant regression in common configurations of pcie-hotplug. I have tried
to come up with ideas to modify/improve the quirk, but I am not very
confident that there is a good solution if the kernel cannot know for certain
whether the link is expected to train.

Thanks,
-Matt

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2025-06-10  7:00 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 01/14] PCI: pciehp: Rely on `link_active_reporting' Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 02/14] PCI: Export PCIe link retrain timeout Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 03/14] PCI: Execute `quirk_enable_clear_retrain_link' earlier Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 04/14] PCI: Initialize `link_active_reporting' earlier Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 05/14] powerpc/eeh: Rely on `link_active_reporting' Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 06/14] net/mlx5: " Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 07/14] PCI: Export `pcie_retrain_link' for use outside ASPM Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 08/14] PCI: Use distinct local vars in `pcie_retrain_link' Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 09/14] PCI: Factor our waiting for link training end Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 10/14] PCI: Add support for polling DLLLA to `pcie_retrain_link' Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 11/14] PCI: Use `pcie_wait_for_link_status' in `pcie_wait_for_link_delay' Maciej W. Rozycki
2023-06-11 17:20 ` [PATCH v9 12/14] PCI: Provide stub failed link recovery for device probing and hot plug Maciej W. Rozycki
2024-07-22 19:34   ` PCI: Work around PCIe link training failures Matthew W Carlis
2024-07-22 20:40     ` Maciej W. Rozycki
2024-07-24 19:18       ` Matthew W Carlis
2024-07-26  8:04         ` Matthew W Carlis
2024-07-29 10:27           ` Ilpo Järvinen
2024-07-29 14:51             ` Maciej W. Rozycki
2024-07-29 18:56               ` Matthew W Carlis
2023-06-11 17:20 ` [PATCH v9 13/14] PCI: Add failed link recovery for device reset events Maciej W. Rozycki
2023-06-11 17:20 ` [PATCH v9 14/14] PCI: Work around PCIe link training failures Maciej W. Rozycki
2023-06-14 23:12 ` [PATCH v9 00/14] pci: Work around ASMedia ASM2824 " Bjorn Helgaas
2023-06-15  0:41   ` Maciej W. Rozycki
2023-06-15 18:37     ` Bjorn Helgaas
2023-06-16 12:27       ` Maciej W. Rozycki
2023-06-16 20:29         ` Bjorn Helgaas
2023-06-20  9:54           ` Maciej W. Rozycki
2024-08-06  0:06             ` PCI: Work around " Matthew W Carlis
2024-08-06 19:36               ` Bjorn Helgaas
2024-08-07  8:43                 ` Matthew W Carlis
2024-08-07 11:14                   ` Maciej W. Rozycki
2024-08-07 12:29                     ` Oliver O'Halloran
2024-08-07 11:49               ` Maciej W. Rozycki
2024-08-08  2:07                 ` Matthew W Carlis
2024-08-08 23:13                   ` Oliver O'Halloran
2024-08-09 13:34                   ` Maciej W. Rozycki
2024-08-15 19:40                     ` Matthew W Carlis
2024-08-16 13:57                       ` Maciej W. Rozycki
2024-10-01 21:04                         ` Matthew W Carlis
2024-10-02 12:58                           ` Maciej W. Rozycki
2024-10-02 20:55                             ` Bjorn Helgaas
2024-10-03 10:39                               ` Maciej W. Rozycki
2025-06-10  7:00                                 ` Matthew W Carlis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).