* [PATCH] PCI: Fix link speed calculation on retrain failure @ 2025-06-23 13:22 Lukas Wunner 2025-06-23 13:49 ` Sathyanarayanan Kuppuswamy ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Lukas Wunner @ 2025-06-23 13:22 UTC (permalink / raw) To: Bjorn Helgaas Cc: Andrew, Ilpo Jarvinen, Maciej W. Rozycki, Matthew W Carlis, linux-pci When pcie_failed_link_retrain() fails to retrain, it tries to revert to the previous link speed. However it calculates that speed from the Link Control 2 register without masking out non-speed bits first. PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to PCI_SPEED_UNKNOWN, which in turn causes a WARN splat in pcie_set_target_speed(): pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s pci 0000:00:01.1: retraining failed WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000 pcie_failed_link_retrain pci_device_add pci_scan_single_device pci_scan_slot pci_scan_child_bus_extend acpi_pci_root_create pci_acpi_scan_root acpi_pci_root_add acpi_bus_attach device_for_each_child acpi_dev_for_each_child acpi_bus_attach device_for_each_child acpi_dev_for_each_child acpi_bus_attach acpi_bus_scan acpi_scan_init acpi_init Per the calling convention of the System V AMD64 ABI, the arguments to pcie_set_target_speed(struct pci_dev *, enum pci_bus_speed, bool) are stored in RDI, RSI, RDX. As visible above, RSI contains 0xff, i.e. PCI_SPEED_UNKNOWN. Fixes: f68dea13405c ("PCI: Revert to the original speed after PCIe failed link retraining") Reported-by: Andrew <andreasx0@protonmail.com> Closes: https://lore.kernel.org/r/7iNzXbCGpf8yUMJZBQjLdbjPcXrEJqBxy5-bHfppz0ek-h4_-G93b1KUrm106r2VNF2FV_sSq0nENv4RsRIUGnlYZMlQr2ZD2NyB5sdj5aU=@protonmail.com/ Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v6.12+ --- drivers/pci/quirks.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index d7f4ee6..deaaf4f 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -108,7 +108,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) { - u16 oldlnkctl2 = lnkctl2; + u16 oldlnkctl2 = lnkctl2 & PCI_EXP_LNKCTL2_TLS; pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n"); -- 2.47.2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] PCI: Fix link speed calculation on retrain failure 2025-06-23 13:22 [PATCH] PCI: Fix link speed calculation on retrain failure Lukas Wunner @ 2025-06-23 13:49 ` Sathyanarayanan Kuppuswamy 2025-06-24 11:23 ` Ilpo Järvinen 2025-06-24 16:48 ` Bjorn Helgaas 2 siblings, 0 replies; 10+ messages in thread From: Sathyanarayanan Kuppuswamy @ 2025-06-23 13:49 UTC (permalink / raw) To: Lukas Wunner, Bjorn Helgaas Cc: Andrew, Ilpo Jarvinen, Maciej W. Rozycki, Matthew W Carlis, linux-pci On 6/23/25 6:22 AM, Lukas Wunner wrote: > When pcie_failed_link_retrain() fails to retrain, it tries to revert to > the previous link speed. However it calculates that speed from the Link > Control 2 register without masking out non-speed bits first. > > PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to > PCI_SPEED_UNKNOWN, which in turn causes a WARN splat in > pcie_set_target_speed(): > > pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port > pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s > pci 0000:00:01.1: retraining failed > WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed > RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000 > pcie_failed_link_retrain > pci_device_add > pci_scan_single_device > pci_scan_slot > pci_scan_child_bus_extend > acpi_pci_root_create > pci_acpi_scan_root > acpi_pci_root_add > acpi_bus_attach > device_for_each_child > acpi_dev_for_each_child > acpi_bus_attach > device_for_each_child > acpi_dev_for_each_child > acpi_bus_attach > acpi_bus_scan > acpi_scan_init > acpi_init > > Per the calling convention of the System V AMD64 ABI, the arguments to > pcie_set_target_speed(struct pci_dev *, enum pci_bus_speed, bool) are > stored in RDI, RSI, RDX. As visible above, RSI contains 0xff, i.e. > PCI_SPEED_UNKNOWN. > > Fixes: f68dea13405c ("PCI: Revert to the original speed after PCIe failed link retraining") > Reported-by: Andrew <andreasx0@protonmail.com> > Closes: https://lore.kernel.org/r/7iNzXbCGpf8yUMJZBQjLdbjPcXrEJqBxy5-bHfppz0ek-h4_-G93b1KUrm106r2VNF2FV_sSq0nENv4RsRIUGnlYZMlQr2ZD2NyB5sdj5aU=@protonmail.com/ > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Cc: stable@vger.kernel.org # v6.12+ > --- Looks good to me. Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> > drivers/pci/quirks.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index d7f4ee6..deaaf4f 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -108,7 +108,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); > pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); > if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) { > - u16 oldlnkctl2 = lnkctl2; > + u16 oldlnkctl2 = lnkctl2 & PCI_EXP_LNKCTL2_TLS; > > pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n"); > -- Sathyanarayanan Kuppuswamy Linux Kernel Developer ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] PCI: Fix link speed calculation on retrain failure 2025-06-23 13:22 [PATCH] PCI: Fix link speed calculation on retrain failure Lukas Wunner 2025-06-23 13:49 ` Sathyanarayanan Kuppuswamy @ 2025-06-24 11:23 ` Ilpo Järvinen 2025-06-24 12:19 ` Lukas Wunner 2025-06-24 16:48 ` Bjorn Helgaas 2 siblings, 1 reply; 10+ messages in thread From: Ilpo Järvinen @ 2025-06-24 11:23 UTC (permalink / raw) To: Lukas Wunner Cc: Bjorn Helgaas, Andrew, Maciej W. Rozycki, Matthew W Carlis, linux-pci [-- Attachment #1: Type: text/plain, Size: 2759 bytes --] On Mon, 23 Jun 2025, Lukas Wunner wrote: > When pcie_failed_link_retrain() fails to retrain, it tries to revert to > the previous link speed. However it calculates that speed from the Link > Control 2 register without masking out non-speed bits first. > > PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to > PCI_SPEED_UNKNOWN, which in turn causes a WARN splat in > pcie_set_target_speed(): > > pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port > pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s > pci 0000:00:01.1: retraining failed > WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed > RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000 > pcie_failed_link_retrain > pci_device_add > pci_scan_single_device > pci_scan_slot > pci_scan_child_bus_extend > acpi_pci_root_create > pci_acpi_scan_root > acpi_pci_root_add > acpi_bus_attach > device_for_each_child > acpi_dev_for_each_child > acpi_bus_attach > device_for_each_child > acpi_dev_for_each_child > acpi_bus_attach > acpi_bus_scan > acpi_scan_init > acpi_init > > Per the calling convention of the System V AMD64 ABI, the arguments to > pcie_set_target_speed(struct pci_dev *, enum pci_bus_speed, bool) are > stored in RDI, RSI, RDX. As visible above, RSI contains 0xff, i.e. > PCI_SPEED_UNKNOWN. > > Fixes: f68dea13405c ("PCI: Revert to the original speed after PCIe failed link retraining") > Reported-by: Andrew <andreasx0@protonmail.com> > Closes: https://lore.kernel.org/r/7iNzXbCGpf8yUMJZBQjLdbjPcXrEJqBxy5-bHfppz0ek-h4_-G93b1KUrm106r2VNF2FV_sSq0nENv4RsRIUGnlYZMlQr2ZD2NyB5sdj5aU=@protonmail.com/ > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Cc: stable@vger.kernel.org # v6.12+ > --- > drivers/pci/quirks.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index d7f4ee6..deaaf4f 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -108,7 +108,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); > pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); > if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) { > - u16 oldlnkctl2 = lnkctl2; > + u16 oldlnkctl2 = lnkctl2 & PCI_EXP_LNKCTL2_TLS; > > pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n"); Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> IIRC, there was a patch from somebody else which fixed this a bit differently but never got applied (many months ago by now). -- i. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] PCI: Fix link speed calculation on retrain failure 2025-06-24 11:23 ` Ilpo Järvinen @ 2025-06-24 12:19 ` Lukas Wunner 2025-06-24 12:28 ` Ilpo Järvinen 0 siblings, 1 reply; 10+ messages in thread From: Lukas Wunner @ 2025-06-24 12:19 UTC (permalink / raw) To: Ilpo Järvinen Cc: Bjorn Helgaas, Andrew, Maciej W. Rozycki, Matthew W Carlis, linux-pci On Tue, Jun 24, 2025 at 02:23:33PM +0300, Ilpo Järvinen wrote: > On Mon, 23 Jun 2025, Lukas Wunner wrote: > > --- a/drivers/pci/quirks.c > > +++ b/drivers/pci/quirks.c > > @@ -108,7 +108,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > > pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); > > pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); > > if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) { > > - u16 oldlnkctl2 = lnkctl2; > > + u16 oldlnkctl2 = lnkctl2 & PCI_EXP_LNKCTL2_TLS; > > > > pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n"); > > IIRC, there was a patch from somebody else which fixed this a bit > differently but never got applied (many months ago by now). Must be this one, still marked "New" in patchwork: https://patchwork.kernel.org/project/linux-pci/patch/20250123055155.22648-2-sjiwei@163.com/ I don't care which one gets applied, as long as the issue is fixed. Thanks, Lukas ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] PCI: Fix link speed calculation on retrain failure 2025-06-24 12:19 ` Lukas Wunner @ 2025-06-24 12:28 ` Ilpo Järvinen 0 siblings, 0 replies; 10+ messages in thread From: Ilpo Järvinen @ 2025-06-24 12:28 UTC (permalink / raw) To: Lukas Wunner, Bjorn Helgaas Cc: Andrew, Maciej W. Rozycki, Matthew W Carlis, linux-pci [-- Attachment #1: Type: text/plain, Size: 1331 bytes --] On Tue, 24 Jun 2025, Lukas Wunner wrote: > On Tue, Jun 24, 2025 at 02:23:33PM +0300, Ilpo Järvinen wrote: > > On Mon, 23 Jun 2025, Lukas Wunner wrote: > > > --- a/drivers/pci/quirks.c > > > +++ b/drivers/pci/quirks.c > > > @@ -108,7 +108,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > > > pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); > > > pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); > > > if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) { > > > - u16 oldlnkctl2 = lnkctl2; > > > + u16 oldlnkctl2 = lnkctl2 & PCI_EXP_LNKCTL2_TLS; > > > > > > pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n"); > > > > IIRC, there was a patch from somebody else which fixed this a bit > > differently but never got applied (many months ago by now). > > Must be this one, still marked "New" in patchwork: > > https://patchwork.kernel.org/project/linux-pci/patch/20250123055155.22648-2-sjiwei@163.com/ > > I don't care which one gets applied, as long as the issue is fixed. Yes, that's the one. It seemed pointless to me to require callers to apply the mask to the register value when the conversion helper could do that itself, especially when the macro takes in a parameter called "lnkctl2". -- i. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] PCI: Fix link speed calculation on retrain failure 2025-06-23 13:22 [PATCH] PCI: Fix link speed calculation on retrain failure Lukas Wunner 2025-06-23 13:49 ` Sathyanarayanan Kuppuswamy 2025-06-24 11:23 ` Ilpo Järvinen @ 2025-06-24 16:48 ` Bjorn Helgaas 2025-06-24 18:13 ` Sathyanarayanan Kuppuswamy 2 siblings, 1 reply; 10+ messages in thread From: Bjorn Helgaas @ 2025-06-24 16:48 UTC (permalink / raw) To: Lukas Wunner Cc: Andrew, Ilpo Jarvinen, Maciej W. Rozycki, Matthew W Carlis, linux-pci, Sathyanarayanan Kuppuswamy, Jiwei Sun, Adrian Huang12 [+cc Sathy, Jiwei, Adrian] On Mon, Jun 23, 2025 at 03:22:14PM +0200, Lukas Wunner wrote: > When pcie_failed_link_retrain() fails to retrain, it tries to revert to > the previous link speed. However it calculates that speed from the Link > Control 2 register without masking out non-speed bits first. > > PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to > PCI_SPEED_UNKNOWN, which in turn causes a WARN splat in > pcie_set_target_speed(): > > pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port > pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s > pci 0000:00:01.1: retraining failed > WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed > RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000 > pcie_failed_link_retrain > pci_device_add > pci_scan_single_device > pci_scan_slot > pci_scan_child_bus_extend > acpi_pci_root_create > pci_acpi_scan_root > acpi_pci_root_add > acpi_bus_attach > device_for_each_child > acpi_dev_for_each_child > acpi_bus_attach > device_for_each_child > acpi_dev_for_each_child > acpi_bus_attach > acpi_bus_scan > acpi_scan_init > acpi_init > > Per the calling convention of the System V AMD64 ABI, the arguments to > pcie_set_target_speed(struct pci_dev *, enum pci_bus_speed, bool) are > stored in RDI, RSI, RDX. As visible above, RSI contains 0xff, i.e. > PCI_SPEED_UNKNOWN. > > Fixes: f68dea13405c ("PCI: Revert to the original speed after PCIe failed link retraining") > Reported-by: Andrew <andreasx0@protonmail.com> > Closes: https://lore.kernel.org/r/7iNzXbCGpf8yUMJZBQjLdbjPcXrEJqBxy5-bHfppz0ek-h4_-G93b1KUrm106r2VNF2FV_sSq0nENv4RsRIUGnlYZMlQr2ZD2NyB5sdj5aU=@protonmail.com/ > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Cc: stable@vger.kernel.org # v6.12+ I like the brevity of this patch, but I do worry that if we ever have other users of PCIE_LNKCTL2_TLS2SPEED(), we might have the same problem again. Also, it looks like PCIE_LNKCAP_SLS2SPEED() has the same problem. f68dea13405c predates PCIE_LNKCTL2_TLS2SPEED(), and I don't think this problem existed as of f68dea13405c. I think the Fixes: tag should be for de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed"), which added PCIE_LNKCTL2_TLS2SPEED() and PCIE_LNKCAP_SLS2SPEED() without masking out the other bits. I think I'll take Jiwei's patch [1], which fixes PCIE_LNKCTL2_TLS2SPEED() and PCIE_LNKCAP_SLS2SPEED() without requiring changes in the users. I'll add the details of Andrew's report to the commit log. [1] https://lore.kernel.org/all/20250123055155.22648-2-sjiwei@163.com/ > --- > drivers/pci/quirks.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index d7f4ee6..deaaf4f 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -108,7 +108,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); > pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); > if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) { > - u16 oldlnkctl2 = lnkctl2; > + u16 oldlnkctl2 = lnkctl2 & PCI_EXP_LNKCTL2_TLS; > > pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n"); > > -- > 2.47.2 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] PCI: Fix link speed calculation on retrain failure 2025-06-24 16:48 ` Bjorn Helgaas @ 2025-06-24 18:13 ` Sathyanarayanan Kuppuswamy 2025-06-25 16:06 ` andreasx0 0 siblings, 1 reply; 10+ messages in thread From: Sathyanarayanan Kuppuswamy @ 2025-06-24 18:13 UTC (permalink / raw) To: Bjorn Helgaas, Lukas Wunner Cc: Andrew, Ilpo Jarvinen, Maciej W. Rozycki, Matthew W Carlis, linux-pci, Jiwei Sun, Adrian Huang12 On 6/24/25 9:48 AM, Bjorn Helgaas wrote: > [+cc Sathy, Jiwei, Adrian] > > On Mon, Jun 23, 2025 at 03:22:14PM +0200, Lukas Wunner wrote: >> When pcie_failed_link_retrain() fails to retrain, it tries to revert to >> the previous link speed. However it calculates that speed from the Link >> Control 2 register without masking out non-speed bits first. >> >> PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to >> PCI_SPEED_UNKNOWN, which in turn causes a WARN splat in >> pcie_set_target_speed(): >> >> pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port >> pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s >> pci 0000:00:01.1: retraining failed >> WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed >> RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000 >> pcie_failed_link_retrain >> pci_device_add >> pci_scan_single_device >> pci_scan_slot >> pci_scan_child_bus_extend >> acpi_pci_root_create >> pci_acpi_scan_root >> acpi_pci_root_add >> acpi_bus_attach >> device_for_each_child >> acpi_dev_for_each_child >> acpi_bus_attach >> device_for_each_child >> acpi_dev_for_each_child >> acpi_bus_attach >> acpi_bus_scan >> acpi_scan_init >> acpi_init >> >> Per the calling convention of the System V AMD64 ABI, the arguments to >> pcie_set_target_speed(struct pci_dev *, enum pci_bus_speed, bool) are >> stored in RDI, RSI, RDX. As visible above, RSI contains 0xff, i.e. >> PCI_SPEED_UNKNOWN. >> >> Fixes: f68dea13405c ("PCI: Revert to the original speed after PCIe failed link retraining") >> Reported-by: Andrew <andreasx0@protonmail.com> >> Closes: https://lore.kernel.org/r/7iNzXbCGpf8yUMJZBQjLdbjPcXrEJqBxy5-bHfppz0ek-h4_-G93b1KUrm106r2VNF2FV_sSq0nENv4RsRIUGnlYZMlQr2ZD2NyB5sdj5aU=@protonmail.com/ >> Signed-off-by: Lukas Wunner <lukas@wunner.de> >> Cc: stable@vger.kernel.org # v6.12+ > I like the brevity of this patch, but I do worry that if we ever have > other users of PCIE_LNKCTL2_TLS2SPEED(), we might have the same > problem again. > > Also, it looks like PCIE_LNKCAP_SLS2SPEED() has the same problem. > > f68dea13405c predates PCIE_LNKCTL2_TLS2SPEED(), and I don't think this > problem existed as of f68dea13405c. I think the Fixes: tag should be > for de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe > Link Speed"), which added PCIE_LNKCTL2_TLS2SPEED() and > PCIE_LNKCAP_SLS2SPEED() without masking out the other bits. > > I think I'll take Jiwei's patch [1], which fixes > PCIE_LNKCTL2_TLS2SPEED() and PCIE_LNKCAP_SLS2SPEED() without requiring > changes in the users. I'll add the details of Andrew's report to the > commit log. Agree. It is better to fix it in the macro. > > [1] https://lore.kernel.org/all/20250123055155.22648-2-sjiwei@163.com/ > >> --- >> drivers/pci/quirks.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c >> index d7f4ee6..deaaf4f 100644 >> --- a/drivers/pci/quirks.c >> +++ b/drivers/pci/quirks.c >> @@ -108,7 +108,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) >> pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); >> pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); >> if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) { >> - u16 oldlnkctl2 = lnkctl2; >> + u16 oldlnkctl2 = lnkctl2 & PCI_EXP_LNKCTL2_TLS; >> >> pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n"); >> >> -- >> 2.47.2 >> -- Sathyanarayanan Kuppuswamy Linux Kernel Developer ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] PCI: Fix link speed calculation on retrain failure 2025-06-24 18:13 ` Sathyanarayanan Kuppuswamy @ 2025-06-25 16:06 ` andreasx0 2025-06-25 17:46 ` Bjorn Helgaas 0 siblings, 1 reply; 10+ messages in thread From: andreasx0 @ 2025-06-25 16:06 UTC (permalink / raw) To: Sathyanarayanan Kuppuswamy Cc: Bjorn Helgaas, Lukas Wunner, Ilpo Jarvinen, Maciej W. Rozycki, Matthew W Carlis, linux-pci, Jiwei Sun, Adrian Huang12 [-- Attachment #1.1: Type: text/plain, Size: 4188 bytes --] Again. As said the patch from Lucas fixed the warning that was caused because the discrete nvidia gpu was disabled by bios. On Tuesday, June 24th, 2025 at 21:13, Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: > On 6/24/25 9:48 AM, Bjorn Helgaas wrote: > > > [+cc Sathy, Jiwei, Adrian] > > > > On Mon, Jun 23, 2025 at 03:22:14PM +0200, Lukas Wunner wrote: > > > > > When pcie_failed_link_retrain() fails to retrain, it tries to revert to > > > the previous link speed. However it calculates that speed from the Link > > > Control 2 register without masking out non-speed bits first. > > > > > > PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to > > > PCI_SPEED_UNKNOWN, which in turn causes a WARN splat in > > > pcie_set_target_speed(): > > > > > > pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port > > > pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s > > > pci 0000:00:01.1: retraining failed > > > WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed > > > RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000 > > > pcie_failed_link_retrain > > > pci_device_add > > > pci_scan_single_device > > > pci_scan_slot > > > pci_scan_child_bus_extend > > > acpi_pci_root_create > > > pci_acpi_scan_root > > > acpi_pci_root_add > > > acpi_bus_attach > > > device_for_each_child > > > acpi_dev_for_each_child > > > acpi_bus_attach > > > device_for_each_child > > > acpi_dev_for_each_child > > > acpi_bus_attach > > > acpi_bus_scan > > > acpi_scan_init > > > acpi_init > > > > > > Per the calling convention of the System V AMD64 ABI, the arguments to > > > pcie_set_target_speed(struct pci_dev *, enum pci_bus_speed, bool) are > > > stored in RDI, RSI, RDX. As visible above, RSI contains 0xff, i.e. > > > PCI_SPEED_UNKNOWN. > > > > > > Fixes: f68dea13405c ("PCI: Revert to the original speed after PCIe failed link retraining") > > > Reported-by: Andrew andreasx0@protonmail.com > > > Closes: https://lore.kernel.org/r/7iNzXbCGpf8yUMJZBQjLdbjPcXrEJqBxy5-bHfppz0ek-h4_-G93b1KUrm106r2VNF2FV_sSq0nENv4RsRIUGnlYZMlQr2ZD2NyB5sdj5aU=@protonmail.com/ > > > Signed-off-by: Lukas Wunner lukas@wunner.de > > > Cc: stable@vger.kernel.org # v6.12+ > > > I like the brevity of this patch, but I do worry that if we ever have > > > other users of PCIE_LNKCTL2_TLS2SPEED(), we might have the same > > > problem again. > > > > Also, it looks like PCIE_LNKCAP_SLS2SPEED() has the same problem. > > > > f68dea13405c predates PCIE_LNKCTL2_TLS2SPEED(), and I don't think this > > problem existed as of f68dea13405c. I think the Fixes: tag should be > > for de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe > > Link Speed"), which added PCIE_LNKCTL2_TLS2SPEED() and > > PCIE_LNKCAP_SLS2SPEED() without masking out the other bits. > > > > I think I'll take Jiwei's patch [1], which fixes > > PCIE_LNKCTL2_TLS2SPEED() and PCIE_LNKCAP_SLS2SPEED() without requiring > > changes in the users. I'll add the details of Andrew's report to the > > commit log. > > > Agree. It is better to fix it in the macro. > > > [1] https://lore.kernel.org/all/20250123055155.22648-2-sjiwei@163.com/ > > > > > --- > > > drivers/pci/quirks.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > > > index d7f4ee6..deaaf4f 100644 > > > --- a/drivers/pci/quirks.c > > > +++ b/drivers/pci/quirks.c > > > @@ -108,7 +108,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > > > pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); > > > pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); > > > if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) { > > > - u16 oldlnkctl2 = lnkctl2; > > > + u16 oldlnkctl2 = lnkctl2 & PCI_EXP_LNKCTL2_TLS; > > > > > > pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n"); > > > > > > -- > > > 2.47.2 > > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer [-- Attachment #1.2: publickey - andreasx0@protonmail.com - 0xF61BB148.asc --] [-- Type: application/pgp-keys, Size: 661 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 343 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] PCI: Fix link speed calculation on retrain failure 2025-06-25 16:06 ` andreasx0 @ 2025-06-25 17:46 ` Bjorn Helgaas 2025-06-26 22:33 ` andreasx0 0 siblings, 1 reply; 10+ messages in thread From: Bjorn Helgaas @ 2025-06-25 17:46 UTC (permalink / raw) To: andreasx0 Cc: Sathyanarayanan Kuppuswamy, Lukas Wunner, Ilpo Jarvinen, Maciej W. Rozycki, Matthew W Carlis, linux-pci, Jiwei Sun, Adrian Huang12 On Wed, Jun 25, 2025 at 04:06:58PM +0000, andreasx0 wrote: > Again. As said the patch from Lucas fixed the warning that was > caused because the discrete nvidia gpu was disabled by bios. The series I applied is at https://lore.kernel.org/all/20250123055155.22648-1-sjiwei@163.com/. The patches currently queued are at https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h=enumeration I cc'd you on my response to that series, so if you think the commit log needs a change, feel free to suggest something in that thread. It's a generic problem, not anything specific to the GPU, so I just included the log messages a user would see when the problem happens. I added your Reported-by because I think the first patch [2] *should* fix the problem you saw. If it doesn't, please let me know. If you test it and it does fix the problem, I'd be happy to add your Tested-by as well. Thanks very much for reporting this issue and giving it a nudge to get it fixed! [2] https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?id=9989e0ca7462 > On Tuesday, June 24th, 2025 at 21:13, Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: > > > On 6/24/25 9:48 AM, Bjorn Helgaas wrote: > > > > > > [+cc Sathy, Jiwei, Adrian] > > > > > > > On Mon, Jun 23, 2025 at 03:22:14PM +0200, Lukas Wunner wrote: > > > > > > > > When pcie_failed_link_retrain() fails to retrain, it tries to revert to > > > > the previous link speed. However it calculates that speed from the Link > > > > Control 2 register without masking out non-speed bits first. > > > > > > > > > PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to > > > > PCI_SPEED_UNKNOWN, which in turn causes a WARN splat in > > > > pcie_set_target_speed(): > > > > > > > > > pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port > > > > pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s > > > > pci 0000:00:01.1: retraining failed > > > > WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed > > > > RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000 > > > > pcie_failed_link_retrain > > > > pci_device_add > > > > pci_scan_single_device > > > > pci_scan_slot > > > > pci_scan_child_bus_extend > > > > acpi_pci_root_create > > > > pci_acpi_scan_root > > > > acpi_pci_root_add > > > > acpi_bus_attach > > > > device_for_each_child > > > > acpi_dev_for_each_child > > > > acpi_bus_attach > > > > device_for_each_child > > > > acpi_dev_for_each_child > > > > acpi_bus_attach > > > > acpi_bus_scan > > > > acpi_scan_init > > > > acpi_init > > > > > > > > > Per the calling convention of the System V AMD64 ABI, the arguments to > > > > pcie_set_target_speed(struct pci_dev *, enum pci_bus_speed, bool) are > > > > stored in RDI, RSI, RDX. As visible above, RSI contains 0xff, i.e. > > > > PCI_SPEED_UNKNOWN. > > > > > > > > > Fixes: f68dea13405c ("PCI: Revert to the original speed after PCIe failed link retraining") > > > > Reported-by: Andrew andreasx0@protonmail.com > > > > Closes: https://lore.kernel.org/r/7iNzXbCGpf8yUMJZBQjLdbjPcXrEJqBxy5-bHfppz0ek-h4_-G93b1KUrm106r2VNF2FV_sSq0nENv4RsRIUGnlYZMlQr2ZD2NyB5sdj5aU=@protonmail.com/ > > > > Signed-off-by: Lukas Wunner lukas@wunner.de > > > > Cc: stable@vger.kernel.org # v6.12+ > > > > I like the brevity of this patch, but I do worry that if we ever have > > > > other users of PCIE_LNKCTL2_TLS2SPEED(), we might have the same > > > > problem again. > > > > > > > Also, it looks like PCIE_LNKCAP_SLS2SPEED() has the same problem. > > > > > > > f68dea13405c predates PCIE_LNKCTL2_TLS2SPEED(), and I don't think this > > > problem existed as of f68dea13405c. I think the Fixes: tag should be > > > for de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe > > > Link Speed"), which added PCIE_LNKCTL2_TLS2SPEED() and > > > PCIE_LNKCAP_SLS2SPEED() without masking out the other bits. > > > > > > > I think I'll take Jiwei's patch [1], which fixes > > > PCIE_LNKCTL2_TLS2SPEED() and PCIE_LNKCAP_SLS2SPEED() without requiring > > > changes in the users. I'll add the details of Andrew's report to the > > > commit log. > > > > > > > > Agree. It is better to fix it in the macro. > > > > > > [1] https://lore.kernel.org/all/20250123055155.22648-2-sjiwei@163.com/ > > > > > > > > --- > > > > drivers/pci/quirks.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > > > > index d7f4ee6..deaaf4f 100644 > > > > --- a/drivers/pci/quirks.c > > > > +++ b/drivers/pci/quirks.c > > > > @@ -108,7 +108,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > > > > pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); > > > > pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); > > > > if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) { > > > > - u16 oldlnkctl2 = lnkctl2; > > > > + u16 oldlnkctl2 = lnkctl2 & PCI_EXP_LNKCTL2_TLS; > > > > > > > > > pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n"); > > > > > > > > > -- > > > > 2.47.2 > > > > > -- > > Sathyanarayanan Kuppuswamy > > Linux Kernel Developer ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] PCI: Fix link speed calculation on retrain failure 2025-06-25 17:46 ` Bjorn Helgaas @ 2025-06-26 22:33 ` andreasx0 0 siblings, 0 replies; 10+ messages in thread From: andreasx0 @ 2025-06-26 22:33 UTC (permalink / raw) To: Bjorn Helgaas Cc: Sathyanarayanan Kuppuswamy, Lukas Wunner, Ilpo Jarvinen, Maciej W. Rozycki, Matthew W Carlis, linux-pci, Jiwei Sun, Adrian Huang12 [-- Attachment #1.1: Type: text/plain, Size: 5783 bytes --] Tested and passed. But what i mean is why try to retrain the line of the connected bios disabled discrete GPU? Is the normal of disabled to drop it to 2.5? On Wednesday, June 25th, 2025 at 20:46, Bjorn Helgaas <helgaas@kernel.org> wrote: > On Wed, Jun 25, 2025 at 04:06:58PM +0000, andreasx0 wrote: > > > Again. As said the patch from Lucas fixed the warning that was > > caused because the discrete nvidia gpu was disabled by bios. > > > The series I applied is at > https://lore.kernel.org/all/20250123055155.22648-1-sjiwei@163.com/. > The patches currently queued are at > https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h=enumeration > > I cc'd you on my response to that series, so if you think the commit > log needs a change, feel free to suggest something in that thread. > It's a generic problem, not anything specific to the GPU, so I just > included the log messages a user would see when the problem happens. > > I added your Reported-by because I think the first patch [2] should > fix the problem you saw. If it doesn't, please let me know. If you > test it and it does fix the problem, I'd be happy to add your > Tested-by as well. > > Thanks very much for reporting this issue and giving it a nudge to get > it fixed! > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?id=9989e0ca7462 > > > On Tuesday, June 24th, 2025 at 21:13, Sathyanarayanan Kuppuswamy sathyanarayanan.kuppuswamy@linux.intel.com wrote: > > > > > On 6/24/25 9:48 AM, Bjorn Helgaas wrote: > > > > > > [+cc Sathy, Jiwei, Adrian] > > > > > > On Mon, Jun 23, 2025 at 03:22:14PM +0200, Lukas Wunner wrote: > > > > > > > When pcie_failed_link_retrain() fails to retrain, it tries to revert to > > > > > the previous link speed. However it calculates that speed from the Link > > > > > Control 2 register without masking out non-speed bits first. > > > > > > > PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to > > > > > PCI_SPEED_UNKNOWN, which in turn causes a WARN splat in > > > > > pcie_set_target_speed(): > > > > > > > pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port > > > > > pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s > > > > > pci 0000:00:01.1: retraining failed > > > > > WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed > > > > > RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000 > > > > > pcie_failed_link_retrain > > > > > pci_device_add > > > > > pci_scan_single_device > > > > > pci_scan_slot > > > > > pci_scan_child_bus_extend > > > > > acpi_pci_root_create > > > > > pci_acpi_scan_root > > > > > acpi_pci_root_add > > > > > acpi_bus_attach > > > > > device_for_each_child > > > > > acpi_dev_for_each_child > > > > > acpi_bus_attach > > > > > device_for_each_child > > > > > acpi_dev_for_each_child > > > > > acpi_bus_attach > > > > > acpi_bus_scan > > > > > acpi_scan_init > > > > > acpi_init > > > > > > > Per the calling convention of the System V AMD64 ABI, the arguments to > > > > > pcie_set_target_speed(struct pci_dev *, enum pci_bus_speed, bool) are > > > > > stored in RDI, RSI, RDX. As visible above, RSI contains 0xff, i.e. > > > > > PCI_SPEED_UNKNOWN. > > > > > > > Fixes: f68dea13405c ("PCI: Revert to the original speed after PCIe failed link retraining") > > > > > Reported-by: Andrew andreasx0@protonmail.com > > > > > Closes: https://lore.kernel.org/r/7iNzXbCGpf8yUMJZBQjLdbjPcXrEJqBxy5-bHfppz0ek-h4_-G93b1KUrm106r2VNF2FV_sSq0nENv4RsRIUGnlYZMlQr2ZD2NyB5sdj5aU=@protonmail.com/ > > > > > Signed-off-by: Lukas Wunner lukas@wunner.de > > > > > Cc: stable@vger.kernel.org # v6.12+ > > > > > I like the brevity of this patch, but I do worry that if we ever have > > > > > other users of PCIE_LNKCTL2_TLS2SPEED(), we might have the same > > > > > problem again. > > > > > > Also, it looks like PCIE_LNKCAP_SLS2SPEED() has the same problem. > > > > > > f68dea13405c predates PCIE_LNKCTL2_TLS2SPEED(), and I don't think this > > > > problem existed as of f68dea13405c. I think the Fixes: tag should be > > > > for de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe > > > > Link Speed"), which added PCIE_LNKCTL2_TLS2SPEED() and > > > > PCIE_LNKCAP_SLS2SPEED() without masking out the other bits. > > > > > > I think I'll take Jiwei's patch [1], which fixes > > > > PCIE_LNKCTL2_TLS2SPEED() and PCIE_LNKCAP_SLS2SPEED() without requiring > > > > changes in the users. I'll add the details of Andrew's report to the > > > > commit log. > > > > > Agree. It is better to fix it in the macro. > > > > > > [1] https://lore.kernel.org/all/20250123055155.22648-2-sjiwei@163.com/ > > > > > > > --- > > > > > drivers/pci/quirks.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > > > > > index d7f4ee6..deaaf4f 100644 > > > > > --- a/drivers/pci/quirks.c > > > > > +++ b/drivers/pci/quirks.c > > > > > @@ -108,7 +108,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > > > > > pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); > > > > > pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); > > > > > if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) { > > > > > - u16 oldlnkctl2 = lnkctl2; > > > > > + u16 oldlnkctl2 = lnkctl2 & PCI_EXP_LNKCTL2_TLS; > > > > > > > pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n"); > > > > > > > -- > > > > > 2.47.2 > > > > > -- > > > Sathyanarayanan Kuppuswamy > > > Linux Kernel Developer > > > > [-- Attachment #1.2: publickey - andreasx0@protonmail.com - 0xF61BB148.asc --] [-- Type: application/pgp-keys, Size: 661 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 343 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-06-26 22:33 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-06-23 13:22 [PATCH] PCI: Fix link speed calculation on retrain failure Lukas Wunner 2025-06-23 13:49 ` Sathyanarayanan Kuppuswamy 2025-06-24 11:23 ` Ilpo Järvinen 2025-06-24 12:19 ` Lukas Wunner 2025-06-24 12:28 ` Ilpo Järvinen 2025-06-24 16:48 ` Bjorn Helgaas 2025-06-24 18:13 ` Sathyanarayanan Kuppuswamy 2025-06-25 16:06 ` andreasx0 2025-06-25 17:46 ` Bjorn Helgaas 2025-06-26 22:33 ` andreasx0
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).