* [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
@ 2025-12-01 3:52 Maciej W. Rozycki
2025-12-01 9:45 ` Ilpo Järvinen
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Maciej W. Rozycki @ 2025-12-01 3:52 UTC (permalink / raw)
To: Bjorn Helgaas, Matthew W Carlis, ALOK TIWARI
Cc: ashishk, bamstadt, msaggi, sconnor, Lukas Wunner,
Ilpo Järvinen, Jiwei, guojinhui.liam, ahuang12, sunjw10,
linux-pci, linux-kernel
Discard Vendor:Device ID matching in the PCIe failed link retraining
quirk and ignore the link status for the removal of the 2.5GT/s speed
clamp, whether applied by the quirk itself or the firmware earlier on.
Revert to the original target link speed if this final link retraining
has failed.
This is so that link training noise in hot-plug scenarios does not make
a link remain clamped to the 2.5GT/s speed where an event race has led
the quirk to apply the speed clamp for one device, only to leave it in
place for a subsequent device to be plugged in.
Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: <stable@vger.kernel.org> # v6.5+
---
drivers/pci/quirks.c | 50 ++++++++++++++++++--------------------------------
1 file changed, 18 insertions(+), 32 deletions(-)
linux-pcie-failed-link-retrain-unclamp-always.diff
Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -79,11 +79,10 @@ static bool pcie_lbms_seen(struct pci_de
* Restrict the speed to 2.5GT/s then with the Target Link Speed field,
* request a retrain and check the result.
*
- * If this turns out successful and we know by the Vendor:Device ID it is
- * safe to do so, then lift the restriction, letting the devices negotiate
- * a higher speed. Also check for a similar 2.5GT/s speed restriction the
- * firmware may have already arranged and lift it with ports that already
- * report their data link being up.
+ * If this turns out successful, or where a 2.5GT/s speed restriction has
+ * been previously arranged by the firmware and the port reports its link
+ * already being up, lift the restriction, in a hope it is safe to do so,
+ * letting the devices negotiate a higher speed.
*
* Otherwise revert the speed to the original setting and request a retrain
* again to remove any residual state, ignoring the result as it's supposed
@@ -94,52 +93,39 @@ static bool pcie_lbms_seen(struct pci_de
*/
int pcie_failed_link_retrain(struct pci_dev *dev)
{
- static const struct pci_device_id ids[] = {
- { PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
- {}
- };
- u16 lnksta, lnkctl2;
+ u16 lnksta, lnkctl2, oldlnkctl2;
int ret = -ENOTTY;
+ u32 lnkcap;
if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
!pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
return ret;
pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &oldlnkctl2);
if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) {
- u16 oldlnkctl2;
-
pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
- pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &oldlnkctl2);
ret = pcie_set_target_speed(dev, PCIE_SPEED_2_5GT, false);
- if (ret) {
- pci_info(dev, "retraining failed\n");
- pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2),
- true);
- return ret;
- }
-
- pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ if (ret)
+ goto err;
}
pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
-
- if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
- (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
- pci_match_id(ids, dev)) {
- u32 lnkcap;
-
+ pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+ if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
+ (lnkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB) {
pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
- pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
ret = pcie_set_target_speed(dev, PCIE_LNKCAP_SLS2SPEED(lnkcap), false);
- if (ret) {
- pci_info(dev, "retraining failed\n");
- return ret;
- }
+ if (ret)
+ goto err;
}
return ret;
+err:
+ pci_info(dev, "retraining failed\n");
+ pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2), true);
+ return ret;
}
static ktime_t fixup_debug_start(struct pci_dev *dev,
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
2025-12-01 3:52 [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining Maciej W. Rozycki
@ 2025-12-01 9:45 ` Ilpo Järvinen
2025-12-01 13:55 ` Maciej W. Rozycki
2025-12-02 13:49 ` [External] : " ALOK TIWARI
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Ilpo Järvinen @ 2025-12-01 9:45 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: Bjorn Helgaas, Matthew W Carlis, ALOK TIWARI, ashishk, bamstadt,
msaggi, sconnor, Lukas Wunner, Jiwei, guojinhui.liam, ahuang12,
sunjw10, linux-pci, LKML
On Mon, 1 Dec 2025, Maciej W. Rozycki wrote:
> Discard Vendor:Device ID matching in the PCIe failed link retraining
> quirk and ignore the link status for the removal of the 2.5GT/s speed
> clamp, whether applied by the quirk itself or the firmware earlier on.
> Revert to the original target link speed if this final link retraining
> has failed.
>
> This is so that link training noise in hot-plug scenarios does not make
> a link remain clamped to the 2.5GT/s speed where an event race has led
> the quirk to apply the speed clamp for one device, only to leave it in
> place for a subsequent device to be plugged in.
>
> Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
> Cc: <stable@vger.kernel.org> # v6.5+
> ---
> drivers/pci/quirks.c | 50 ++++++++++++++++++--------------------------------
> 1 file changed, 18 insertions(+), 32 deletions(-)
>
> linux-pcie-failed-link-retrain-unclamp-always.diff
> Index: linux-macro/drivers/pci/quirks.c
> ===================================================================
> --- linux-macro.orig/drivers/pci/quirks.c
> +++ linux-macro/drivers/pci/quirks.c
> @@ -79,11 +79,10 @@ static bool pcie_lbms_seen(struct pci_de
> * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
> * request a retrain and check the result.
> *
> - * If this turns out successful and we know by the Vendor:Device ID it is
> - * safe to do so, then lift the restriction, letting the devices negotiate
> - * a higher speed. Also check for a similar 2.5GT/s speed restriction the
> - * firmware may have already arranged and lift it with ports that already
> - * report their data link being up.
> + * If this turns out successful, or where a 2.5GT/s speed restriction has
> + * been previously arranged by the firmware and the port reports its link
> + * already being up, lift the restriction, in a hope it is safe to do so,
> + * letting the devices negotiate a higher speed.
> *
> * Otherwise revert the speed to the original setting and request a retrain
> * again to remove any residual state, ignoring the result as it's supposed
> @@ -94,52 +93,39 @@ static bool pcie_lbms_seen(struct pci_de
> */
> int pcie_failed_link_retrain(struct pci_dev *dev)
> {
> - static const struct pci_device_id ids[] = {
> - { PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
> - {}
> - };
> - u16 lnksta, lnkctl2;
> + u16 lnksta, lnkctl2, oldlnkctl2;
> int ret = -ENOTTY;
> + u32 lnkcap;
>
> if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
> !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
> return ret;
>
> pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &oldlnkctl2);
> if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) {
> - u16 oldlnkctl2;
> -
> pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
>
> - pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &oldlnkctl2);
> ret = pcie_set_target_speed(dev, PCIE_SPEED_2_5GT, false);
> - if (ret) {
> - pci_info(dev, "retraining failed\n");
> - pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2),
> - true);
> - return ret;
> - }
> -
> - pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> + if (ret)
> + goto err;
> }
>
> pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
> -
> - if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
> - (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> - pci_match_id(ids, dev)) {
> - u32 lnkcap;
> -
> + pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> + if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> + (lnkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB) {
I'm trying to recall, if there was some particular reason why
->supported_speeds couldn't be used in this function. It would avoid the
need to read LinkCap at all.
> pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
> - pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> ret = pcie_set_target_speed(dev, PCIE_LNKCAP_SLS2SPEED(lnkcap), false);
> - if (ret) {
> - pci_info(dev, "retraining failed\n");
> - return ret;
> - }
> + if (ret)
> + goto err;
> }
>
> return ret;
return 0;
> +err:
> + pci_info(dev, "retraining failed\n");
> + pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2), true);
> + return ret;
> }
>
> static ktime_t fixup_debug_start(struct pci_dev *dev,
>
--
i.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
2025-12-01 9:45 ` Ilpo Järvinen
@ 2025-12-01 13:55 ` Maciej W. Rozycki
2025-12-01 16:48 ` Ilpo Järvinen
2025-12-08 19:24 ` Maciej W. Rozycki
0 siblings, 2 replies; 13+ messages in thread
From: Maciej W. Rozycki @ 2025-12-01 13:55 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Bjorn Helgaas, Matthew W Carlis, ALOK TIWARI, ashishk, msaggi,
sconnor, Lukas Wunner, Jiwei, guojinhui.liam, ahuang12, sunjw10,
linux-pci, LKML
On Mon, 1 Dec 2025, Ilpo Järvinen wrote:
> > + pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > + if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> > + (lnkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB) {
>
> I'm trying to recall, if there was some particular reason why
> ->supported_speeds couldn't be used in this function. It would avoid the
> need to read LinkCap at all.
Thanks for the hint. There's probably none and it's just me missing some
of the zillion bits and pieces. I'll wait a couple of days for any other
people to chime in and respin with this update included if everyone is
otherwise happy to proceed with this update.
> > + if (ret)
> > + goto err;
> > }
> >
> > return ret;
>
> return 0;
It can still return -ENOTTY if neither of the two latter conditionals
matched, meaning the quirk was not applicable after all. ISTR you had
issues with the structure of this code before; I am not sure if it can
be made any better in a reasonable way. It is not a failure per se, so
the newly-added common error path does not apply. This is the case for:
"Return an error if retraining was not needed[...]" from the introductory
comment.
Shall I add a comment above the return statement referring to this?
Maciej
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
2025-12-01 13:55 ` Maciej W. Rozycki
@ 2025-12-01 16:48 ` Ilpo Järvinen
2025-12-08 19:24 ` Maciej W. Rozycki
1 sibling, 0 replies; 13+ messages in thread
From: Ilpo Järvinen @ 2025-12-01 16:48 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: Bjorn Helgaas, Matthew W Carlis, ALOK TIWARI, ashishk, msaggi,
sconnor, Lukas Wunner, Jiwei, guojinhui.liam, ahuang12, sunjw10,
linux-pci, LKML
[-- Attachment #1: Type: text/plain, Size: 1732 bytes --]
On Mon, 1 Dec 2025, Maciej W. Rozycki wrote:
> On Mon, 1 Dec 2025, Ilpo Järvinen wrote:
>
> > > + pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > > + if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> > > + (lnkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB) {
> >
> > I'm trying to recall, if there was some particular reason why
> > ->supported_speeds couldn't be used in this function. It would avoid the
> > need to read LinkCap at all.
>
> Thanks for the hint. There's probably none and it's just me missing some
> of the zillion bits and pieces. I'll wait a couple of days for any other
> people to chime in and respin with this update included if everyone is
> otherwise happy to proceed with this update.
>
> > > + if (ret)
> > > + goto err;
> > > }
> > >
> > > return ret;
> >
> > return 0;
>
> It can still return -ENOTTY if neither of the two latter conditionals
> matched, meaning the quirk was not applicable after all. ISTR you had
> issues with the structure of this code before; I am not sure if it can
> be made any better in a reasonable way. It is not a failure per se, so
> the newly-added common error path does not apply. This is the case for:
> "Return an error if retraining was not needed[...]" from the introductory
> comment.
>
> Shall I add a comment above the return statement referring to this?
I think it's fine as is, I just didn't review with enough context to
notice what it was initialized to (the usual thing when adding a
rollback path is to forget to change the normal path to return 0, thus
"auto commenting" it without checking enough, I'm sorry about that).
--
i.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [External] : [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
2025-12-01 3:52 [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining Maciej W. Rozycki
2025-12-01 9:45 ` Ilpo Järvinen
@ 2025-12-02 13:49 ` ALOK TIWARI
2025-12-02 16:07 ` Ilpo Järvinen
2025-12-04 1:40 ` [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing Matthew W Carlis
2025-12-04 18:30 ` [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining Matthew W Carlis
3 siblings, 1 reply; 13+ messages in thread
From: ALOK TIWARI @ 2025-12-02 13:49 UTC (permalink / raw)
To: Maciej W. Rozycki, Bjorn Helgaas, Matthew W Carlis
Cc: ashishk, bamstadt, msaggi, sconnor, Lukas Wunner,
Ilpo Järvinen, Jiwei, guojinhui.liam, ahuang12, sunjw10,
linux-pci, linux-kernel
On 12/1/2025 9:22 AM, Maciej W. Rozycki wrote:
> Discard Vendor:Device ID matching in the PCIe failed link retraining
> quirk and ignore the link status for the removal of the 2.5GT/s speed
> clamp, whether applied by the quirk itself or the firmware earlier on.
> Revert to the original target link speed if this final link retraining
> has failed.
>
> This is so that link training noise in hot-plug scenarios does not make
> a link remain clamped to the 2.5GT/s speed where an event race has led
> the quirk to apply the speed clamp for one device, only to leave it in
> place for a subsequent device to be plugged in.
>
> Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
> Signed-off-by: Maciej W. Rozycki<macro@orcam.me.uk>
> Cc:<stable@vger.kernel.org> # v6.5+
> ---
> drivers/pci/quirks.c | 50 ++++++++++++++++++--------------------------------
> 1 file changed, 18 insertions(+), 32 deletions(-)
>
> linux-pcie-failed-link-retrain-unclamp-always.diff
> Index: linux-macro/drivers/pci/quirks.c
> ===================================================================
> --- linux-macro.orig/drivers/pci/quirks.c
> +++ linux-macro/drivers/pci/quirks.c
Thanks a lot for your patch.
The patch works, and the issue has been resolved in our testing.
However, this patch does not cleanly apply to the 6.12 LTS kernel.
To apply the fix cleanly, a series of patches is required.
PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
2389d8dc38fee PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
15b8968dcb90f PCI/bwctrl: Fix NULL pointer deref on unbind and bind
e3f30d563a388 PCI: Make pci_destroy_dev() concurrent safe
667f053b05f00 PCI/bwctrl: Fix NULL pointer dereference on bus number
exhaustion
e93d9fcfd7dc6 PCI: Refactor pcie_update_link_speed()
9989e0ca7462c PCI: Fix link speed calculation on retrain failure
b85af48de3ece PCI: Adjust the position of reading the Link Control 2
register
026e4bffb0af9 PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type
d278b098282d1 thermal: Add PCIe cooling driver
de9a6c8d5dbfe PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed
665745f274870 PCI/bwctrl: Re-add BW notification portdrv as PCIe BW
controller
3491f50966686 PCI: Abstract LBMS seen check into pcie_lbms_seen()
Could you please provide a version of this patch that can be
cleanly cherry-picked for the 6.12 LTS (6.12.y) branch?
Alternatively, is it okay to back-port the above patch series to 6.12.y?
Tested-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Thanks,
Alok
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [External] : [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
2025-12-02 13:49 ` [External] : " ALOK TIWARI
@ 2025-12-02 16:07 ` Ilpo Järvinen
2025-12-03 19:01 ` Maciej W. Rozycki
0 siblings, 1 reply; 13+ messages in thread
From: Ilpo Järvinen @ 2025-12-02 16:07 UTC (permalink / raw)
To: ALOK TIWARI
Cc: Maciej W. Rozycki, Bjorn Helgaas, Matthew W Carlis, ashishk,
bamstadt, msaggi, sconnor, Lukas Wunner, Jiwei, guojinhui.liam,
ahuang12, sunjw10, linux-pci, LKML
On Tue, 2 Dec 2025, ALOK TIWARI wrote:
> On 12/1/2025 9:22 AM, Maciej W. Rozycki wrote:
> > Discard Vendor:Device ID matching in the PCIe failed link retraining
> > quirk and ignore the link status for the removal of the 2.5GT/s speed
> > clamp, whether applied by the quirk itself or the firmware earlier on.
> > Revert to the original target link speed if this final link retraining
> > has failed.
> >
> > This is so that link training noise in hot-plug scenarios does not make
> > a link remain clamped to the 2.5GT/s speed where an event race has led
> > the quirk to apply the speed clamp for one device, only to leave it in
> > place for a subsequent device to be plugged in.
> >
> > Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
> > Signed-off-by: Maciej W. Rozycki<macro@orcam.me.uk>
> > Cc:<stable@vger.kernel.org> # v6.5+
> > ---
> > drivers/pci/quirks.c | 50
> > ++++++++++++++++++--------------------------------
> > 1 file changed, 18 insertions(+), 32 deletions(-)
> >
> > linux-pcie-failed-link-retrain-unclamp-always.diff
> > Index: linux-macro/drivers/pci/quirks.c
> > ===================================================================
> > --- linux-macro.orig/drivers/pci/quirks.c
> > +++ linux-macro/drivers/pci/quirks.c
>
> Thanks a lot for your patch.
> The patch works, and the issue has been resolved in our testing.
>
> However, this patch does not cleanly apply to the 6.12 LTS kernel.
> To apply the fix cleanly, a series of patches is required.
>
> PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
> 2389d8dc38fee PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
> 15b8968dcb90f PCI/bwctrl: Fix NULL pointer deref on unbind and bind
> e3f30d563a388 PCI: Make pci_destroy_dev() concurrent safe
> 667f053b05f00 PCI/bwctrl: Fix NULL pointer dereference on bus number
> exhaustion
> e93d9fcfd7dc6 PCI: Refactor pcie_update_link_speed()
> 9989e0ca7462c PCI: Fix link speed calculation on retrain failure
> b85af48de3ece PCI: Adjust the position of reading the Link Control 2 register
> 026e4bffb0af9 PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type
> d278b098282d1 thermal: Add PCIe cooling driver
> de9a6c8d5dbfe PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed
> 665745f274870 PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller
> 3491f50966686 PCI: Abstract LBMS seen check into pcie_lbms_seen()
>
> Could you please provide a version of this patch that can be
> cleanly cherry-picked for the 6.12 LTS (6.12.y) branch?
>
> Alternatively, is it okay to back-port the above patch series to 6.12.y?
>
>
> Tested-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Thanks for testing.
As the change has a Fixes tag, stable maintainers will eventually get to
it (after the change has progressed into Linus' tree).
As per usual, if the patch won't apply to some old kernel version cleanly,
stable maintainers will decide themselves whether they end up taking some
extra changes or ask for a backport from the submitter of the original
change.
--
i.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [External] : [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
2025-12-02 16:07 ` Ilpo Järvinen
@ 2025-12-03 19:01 ` Maciej W. Rozycki
2025-12-08 19:25 ` Maciej W. Rozycki
0 siblings, 1 reply; 13+ messages in thread
From: Maciej W. Rozycki @ 2025-12-03 19:01 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: ALOK TIWARI, Bjorn Helgaas, Matthew W Carlis, ashishk, msaggi,
sconnor, Lukas Wunner, Jiwei, guojinhui.liam, ahuang12, sunjw10,
linux-pci, LKML
On Tue, 2 Dec 2025, Ilpo Järvinen wrote:
> > > linux-pcie-failed-link-retrain-unclamp-always.diff
> > > Index: linux-macro/drivers/pci/quirks.c
> > > ===================================================================
> > > --- linux-macro.orig/drivers/pci/quirks.c
> > > +++ linux-macro/drivers/pci/quirks.c
> >
> > Thanks a lot for your patch.
> > The patch works, and the issue has been resolved in our testing.
Great, thanks for running the verification!
> > However, this patch does not cleanly apply to the 6.12 LTS kernel.
> > To apply the fix cleanly, a series of patches is required.
> >
> > PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
> > 2389d8dc38fee PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
> > 15b8968dcb90f PCI/bwctrl: Fix NULL pointer deref on unbind and bind
> > e3f30d563a388 PCI: Make pci_destroy_dev() concurrent safe
> > 667f053b05f00 PCI/bwctrl: Fix NULL pointer dereference on bus number
> > exhaustion
> > e93d9fcfd7dc6 PCI: Refactor pcie_update_link_speed()
> > 9989e0ca7462c PCI: Fix link speed calculation on retrain failure
> > b85af48de3ece PCI: Adjust the position of reading the Link Control 2 register
> > 026e4bffb0af9 PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type
> > d278b098282d1 thermal: Add PCIe cooling driver
> > de9a6c8d5dbfe PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed
> > 665745f274870 PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller
> > 3491f50966686 PCI: Abstract LBMS seen check into pcie_lbms_seen()
Thank you for identifying the dependencies. Clearly the majority of the
changes is not functionally related to this fix at all and any need for
them is purely mechanical.
> As the change has a Fixes tag, stable maintainers will eventually get to
> it (after the change has progressed into Linus' tree).
>
> As per usual, if the patch won't apply to some old kernel version cleanly,
> stable maintainers will decide themselves whether they end up taking some
> extra changes or ask for a backport from the submitter of the original
> change.
Dependencies can be listed if need be along with a backport request.
However in this case there have been lots of syntactic updates to this
function, which do not necessarily qualify for backporting, as that is
supposed to include critical fixes only. Certainly changes to prevent
systems from breaking do qualify, but helper macros and functions, or code
restructuring do not.
Since I'm going to wait for further feedback and respin anyway, I will
check if there are critical dependencies required that are missing on the
stable branches and list any along with the backport request. Then any
remaining syntactic sugar can, as you say, be handled on a case by case
basis in the backport process.
Maciej
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
2025-12-01 3:52 [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining Maciej W. Rozycki
2025-12-01 9:45 ` Ilpo Järvinen
2025-12-02 13:49 ` [External] : " ALOK TIWARI
@ 2025-12-04 1:40 ` Matthew W Carlis
2025-12-04 23:43 ` Maciej W. Rozycki
2025-12-04 18:30 ` [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining Matthew W Carlis
3 siblings, 1 reply; 13+ messages in thread
From: Matthew W Carlis @ 2025-12-04 1:40 UTC (permalink / raw)
To: macro
Cc: ahuang12, alok.a.tiwari, ashishk, bamstadt, bhelgaas,
guojinhui.liam, ilpo.jarvinen, jiwei.sun.bj, linux-kernel,
linux-pci, lukas, mattc, msaggi, sconnor, sunjw10
On Mon, 1 Dec 2025, Maciej W. Rozycki wrote:
> Discard Vendor:Device ID matching in the PCIe failed link retraining
> quirk and ignore the link status for the removal of the 2.5GT/s speed
> clamp, whether applied by the quirk itself or the firmware earlier on.
> Revert to the original target link speed if this final link retraining
> has failed.
I think we should either remove the quirk or only execute the quirk when the
downstream port is the specific ASMedia 0x2824. Hardware companies that
develop PCIe devices rely on the linux kernel for a significant amount of
their testing & the action taken by this quirk is going to introduce
noise into those tests by initiating unexpected speed changes etc.
As long as we have this quirk messing with link speeds we'll just
continue to see issue reports over time in my opinion.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
2025-12-01 3:52 [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining Maciej W. Rozycki
` (2 preceding siblings ...)
2025-12-04 1:40 ` [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing Matthew W Carlis
@ 2025-12-04 18:30 ` Matthew W Carlis
2025-12-08 19:25 ` Maciej W. Rozycki
3 siblings, 1 reply; 13+ messages in thread
From: Matthew W Carlis @ 2025-12-04 18:30 UTC (permalink / raw)
To: macro
Cc: ahuang12, alok.a.tiwari, ashishk, bamstadt, bhelgaas,
guojinhui.liam, ilpo.jarvinen, jiwei.sun.bj, linux-kernel,
linux-pci, lukas, mattc, msaggi, sconnor, sunjw10
I'm sorry my last response kind of messed up the threading in this chain...
I don't understand why we still think its a good idea to have this action
in the kernel for any device type since it seems to only help Maciej W. Rozycki's
specific situation which is very likely to be the only one of its kind. In
addition the Delock adapter isn't what I would consider a particularly
"solid" device.
For what its worth I have a particular Gen5 device in my lab here that gets stuck
in an infinite link up-down loop when you force the speed to Gen2 when also combined
with a specific downstream port... I'm sure there is another device somewhere else
in the world that has the same issue when you force it to Gen1.
The kernel should assume a PCIe link will automatically train to the best
speed that the devices can achieve & if the link fails to train then it should
be up to the user space to decide what to do with it in my opinion.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
2025-12-04 1:40 ` [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing Matthew W Carlis
@ 2025-12-04 23:43 ` Maciej W. Rozycki
0 siblings, 0 replies; 13+ messages in thread
From: Maciej W. Rozycki @ 2025-12-04 23:43 UTC (permalink / raw)
To: Matthew W Carlis
Cc: ahuang12, alok.a.tiwari, ashishk, Bjorn Helgaas, guojinhui.liam,
Ilpo Järvinen, jiwei.sun.bj, linux-kernel, linux-pci,
Lukas Wunner, msaggi, sconnor, sunjw10
On Wed, 3 Dec 2025, Matthew W Carlis wrote:
> > Discard Vendor:Device ID matching in the PCIe failed link retraining
> > quirk and ignore the link status for the removal of the 2.5GT/s speed
> > clamp, whether applied by the quirk itself or the firmware earlier on.
> > Revert to the original target link speed if this final link retraining
> > has failed.
>
> I think we should either remove the quirk or only execute the quirk when the
> downstream port is the specific ASMedia 0x2824. Hardware companies that
> develop PCIe devices rely on the linux kernel for a significant amount of
> their testing & the action taken by this quirk is going to introduce
> noise into those tests by initiating unexpected speed changes etc.
Conversely, ISTM this could be good motivation for hardware designers to
reduce hot-plug noise. After all LBMS is only supposed to be ever set for
links in the active state and not while training, so perhaps debouncing is
needed for the transient state?
> As long as we have this quirk messing with link speeds we'll just
> continue to see issue reports over time in my opinion.
Well, the issues happened because I made an unfortunate design decision
with the original implementation which did not clean up after itself, just
because I have no use for hot-plug scenarios and didn't envisage it could
be causing issues.
The most recent update brings the device back to its original state
whether retraining succeeded or not, so it should now be transparent to
your noisy hot-plug scenarios, while helping stubborn devices at the same
time. You might have not noticed this code existed if it had been in its
currently proposed shape right from the beginning.
It's only those who do nothing that make no mistakes.
Maciej
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
2025-12-01 13:55 ` Maciej W. Rozycki
2025-12-01 16:48 ` Ilpo Järvinen
@ 2025-12-08 19:24 ` Maciej W. Rozycki
1 sibling, 0 replies; 13+ messages in thread
From: Maciej W. Rozycki @ 2025-12-08 19:24 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Bjorn Helgaas, Matthew W Carlis, ALOK TIWARI, ashishk, msaggi,
sconnor, Lukas Wunner, Jiwei, guojinhui.liam, ahuang12, sunjw10,
linux-pci, LKML
On Mon, 1 Dec 2025, Maciej W. Rozycki wrote:
> > > + pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > > + if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> > > + (lnkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB) {
> >
> > I'm trying to recall, if there was some particular reason why
> > ->supported_speeds couldn't be used in this function. It would avoid the
> > need to read LinkCap at all.
>
> Thanks for the hint. There's probably none and it's just me missing some
> of the zillion bits and pieces. I'll wait a couple of days for any other
> people to chime in and respin with this update included if everyone is
> otherwise happy to proceed with this update.
I take it no further feedback will be gathered, so I've sent v2 now, but
I've figured out backporting v1 as it is will result in less intrusion to
the trunk commit, so I have only made a change to use `->supported_speeds'
a follow-up patch in a series. Please let me know if this works for you.
Maciej
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [External] : [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
2025-12-03 19:01 ` Maciej W. Rozycki
@ 2025-12-08 19:25 ` Maciej W. Rozycki
0 siblings, 0 replies; 13+ messages in thread
From: Maciej W. Rozycki @ 2025-12-08 19:25 UTC (permalink / raw)
To: Ilpo Järvinen, ALOK TIWARI
Cc: Bjorn Helgaas, Matthew W Carlis, ashishk, msaggi, sconnor,
Lukas Wunner, Jiwei, guojinhui.liam, ahuang12, sunjw10, linux-pci,
LKML
On Wed, 3 Dec 2025, Maciej W. Rozycki wrote:
> Since I'm going to wait for further feedback and respin anyway, I will
> check if there are critical dependencies required that are missing on the
> stable branches and list any along with the backport request. Then any
> remaining syntactic sugar can, as you say, be handled on a case by case
> basis in the backport process.
I've gone through the relevant active stable branches and there's no fix
required to be additionally backported to either 6.6 or 6.12. There are
only mechanical updates needed, which I've included in the change below,
and which works for me with 6.12. Also nothing extra is needed for 6.17
as the code is the same as in the trunk.
Maciej
---
drivers/pci/quirks.c | 53 ++++++++++++++++++++-------------------------------
1 file changed, 21 insertions(+), 32 deletions(-)
Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -68,11 +68,10 @@
* Restrict the speed to 2.5GT/s then with the Target Link Speed field,
* request a retrain and check the result.
*
- * If this turns out successful and we know by the Vendor:Device ID it is
- * safe to do so, then lift the restriction, letting the devices negotiate
- * a higher speed. Also check for a similar 2.5GT/s speed restriction the
- * firmware may have already arranged and lift it with ports that already
- * report their data link being up.
+ * If this turns out successful, or where a 2.5GT/s speed restriction has
+ * been previously arranged by the firmware and the port reports its link
+ * already being up, lift the restriction, in a hope it is safe to do so,
+ * letting the devices negotiate a higher speed.
*
* Otherwise revert the speed to the original setting and request a retrain
* again to remove any residual state, ignoring the result as it's supposed
@@ -83,23 +82,19 @@
*/
int pcie_failed_link_retrain(struct pci_dev *dev)
{
- static const struct pci_device_id ids[] = {
- { PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
- {}
- };
- u16 lnksta, lnkctl2;
+ u16 lnksta, lnkctl2, oldlnkctl2;
int ret = -ENOTTY;
+ u32 lnkcap;
if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
!pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
return ret;
- pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &oldlnkctl2);
+ lnkctl2 = oldlnkctl2;
if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
PCI_EXP_LNKSTA_LBMS) {
- u16 oldlnkctl2 = lnkctl2;
-
pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
@@ -107,36 +102,30 @@ int pcie_failed_link_retrain(struct pci_
pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
ret = pcie_retrain_link(dev, false);
- if (ret) {
- pci_info(dev, "retraining failed\n");
- pcie_capability_write_word(dev, PCI_EXP_LNKCTL2,
- oldlnkctl2);
- pcie_retrain_link(dev, true);
- return ret;
- }
-
- pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ if (ret)
+ goto err;
}
- if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
- (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
- pci_match_id(ids, dev)) {
- u32 lnkcap;
-
+ pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+ if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
+ (lnkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB) {
pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
- pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+
lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
ret = pcie_retrain_link(dev, false);
- if (ret) {
- pci_info(dev, "retraining failed\n");
- return ret;
- }
+ if (ret)
+ goto err;
}
return ret;
+err:
+ pci_info(dev, "retraining failed\n");
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, oldlnkctl2);
+ pcie_retrain_link(dev, true);
+ return ret;
}
static ktime_t fixup_debug_start(struct pci_dev *dev,
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
2025-12-04 18:30 ` [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining Matthew W Carlis
@ 2025-12-08 19:25 ` Maciej W. Rozycki
0 siblings, 0 replies; 13+ messages in thread
From: Maciej W. Rozycki @ 2025-12-08 19:25 UTC (permalink / raw)
To: Matthew W Carlis
Cc: ahuang12, alok.a.tiwari, ashishk, Bjorn Helgaas, guojinhui.liam,
Ilpo Järvinen, jiwei.sun.bj, linux-kernel, linux-pci,
Lukas Wunner, msaggi, sconnor, sunjw10
On Thu, 4 Dec 2025, Matthew W Carlis wrote:
> For what its worth I have a particular Gen5 device in my lab here that gets stuck
> in an infinite link up-down loop when you force the speed to Gen2 when also combined
> with a specific downstream port... I'm sure there is another device somewhere else
> in the world that has the same issue when you force it to Gen1.
Well, it's an erratum vs another erratum case. Then should such a device
at its maximum link speed trigger the workaround somehow, with this fix in
place any temporary clamp will be lifted anyway and the link retrained, so
it will recover from the link up-down loop. So no functional regression.
> The kernel should assume a PCIe link will automatically train to the best
> speed that the devices can achieve & if the link fails to train then it should
> be up to the user space to decide what to do with it in my opinion.
It might be tough where you have your rootfs device down the problematic
link.
Thank you for your input.
Maciej
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-12-08 19:25 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-01 3:52 [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining Maciej W. Rozycki
2025-12-01 9:45 ` Ilpo Järvinen
2025-12-01 13:55 ` Maciej W. Rozycki
2025-12-01 16:48 ` Ilpo Järvinen
2025-12-08 19:24 ` Maciej W. Rozycki
2025-12-02 13:49 ` [External] : " ALOK TIWARI
2025-12-02 16:07 ` Ilpo Järvinen
2025-12-03 19:01 ` Maciej W. Rozycki
2025-12-08 19:25 ` Maciej W. Rozycki
2025-12-04 1:40 ` [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing Matthew W Carlis
2025-12-04 23:43 ` Maciej W. Rozycki
2025-12-04 18:30 ` [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining Matthew W Carlis
2025-12-08 19:25 ` Maciej W. Rozycki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox