From: Bjorn Helgaas <helgaas@kernel.org>
To: andreasx0 <andreasx0@protonmail.com>
Cc: Sathyanarayanan Kuppuswamy
<sathyanarayanan.kuppuswamy@linux.intel.com>,
Lukas Wunner <lukas@wunner.de>,
Ilpo Jarvinen <ilpo.jarvinen@linux.intel.com>,
"Maciej W. Rozycki" <macro@orcam.me.uk>,
Matthew W Carlis <mattc@purestorage.com>,
linux-pci@vger.kernel.org, Jiwei Sun <sunjw10@lenovo.com>,
Adrian Huang12 <ahuang12@lenovo.com>
Subject: Re: [PATCH] PCI: Fix link speed calculation on retrain failure
Date: Wed, 25 Jun 2025 12:46:52 -0500
Message-ID: <20250625174652.GA1578845@bhelgaas>
In-Reply-To: <9GZ44D4l8VOon-B2Uc15vxasiaSrnTLkvk18qrogb08_K_aCKBPOep6JxmMQRK8UuxTnv0ZxgxIOFA8v8e3yJZuVtLLPzZsmmwRc7BODcVs=@protonmail.com>
On Wed, Jun 25, 2025 at 04:06:58PM +0000, andreasx0 wrote:
> Again, as I said, the patch from Lukas fixed the warning that was
> caused because the discrete NVIDIA GPU was disabled by the BIOS.

The series I applied is at
https://lore.kernel.org/all/20250123055155.22648-1-sjiwei@163.com/.
The patches currently queued are at
https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h=enumeration

I cc'd you on my response to that series, so if you think the commit
log needs a change, feel free to suggest something in that thread.

It's a generic problem, not anything specific to the GPU, so I just
included the log messages a user would see when the problem happens.

I added your Reported-by because I think the first patch [2] *should*
fix the problem you saw. If it doesn't, please let me know. If you
test it and it does fix the problem, I'd be happy to add your
Tested-by as well.

Thanks very much for reporting this issue and giving it a nudge to get
it fixed!

[2] https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?id=9989e0ca7462
> On Tuesday, June 24th, 2025 at 21:13, Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com> wrote:
> > On 6/24/25 9:48 AM, Bjorn Helgaas wrote:
> > > [+cc Sathy, Jiwei, Adrian]
> > >
> > > On Mon, Jun 23, 2025 at 03:22:14PM +0200, Lukas Wunner wrote:
> > > > When pcie_failed_link_retrain() fails to retrain, it tries to revert to
> > > > the previous link speed. However it calculates that speed from the Link
> > > > Control 2 register without masking out non-speed bits first.
> > > >
> > > > PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to
> > > > PCI_SPEED_UNKNOWN, which in turn causes a WARN splat in
> > > > pcie_set_target_speed():
> > > >
> > > > pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port
> > > > pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s
> > > > pci 0000:00:01.1: retraining failed
> > > > WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed
> > > > RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000
> > > > pcie_failed_link_retrain
> > > > pci_device_add
> > > > pci_scan_single_device
> > > > pci_scan_slot
> > > > pci_scan_child_bus_extend
> > > > acpi_pci_root_create
> > > > pci_acpi_scan_root
> > > > acpi_pci_root_add
> > > > acpi_bus_attach
> > > > device_for_each_child
> > > > acpi_dev_for_each_child
> > > > acpi_bus_attach
> > > > device_for_each_child
> > > > acpi_dev_for_each_child
> > > > acpi_bus_attach
> > > > acpi_bus_scan
> > > > acpi_scan_init
> > > > acpi_init
> > > >
> > > > Per the calling convention of the System V AMD64 ABI, the arguments to
> > > > pcie_set_target_speed(struct pci_dev *, enum pci_bus_speed, bool) are
> > > > stored in RDI, RSI, RDX. As visible above, RSI contains 0xff, i.e.
> > > > PCI_SPEED_UNKNOWN.
> > > >
> > > > Fixes: f68dea13405c ("PCI: Revert to the original speed after PCIe failed link retraining")
> > > > Reported-by: Andrew <andreasx0@protonmail.com>
> > > > Closes: https://lore.kernel.org/r/7iNzXbCGpf8yUMJZBQjLdbjPcXrEJqBxy5-bHfppz0ek-h4_-G93b1KUrm106r2VNF2FV_sSq0nENv4RsRIUGnlYZMlQr2ZD2NyB5sdj5aU=@protonmail.com/
> > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > > Cc: stable@vger.kernel.org # v6.12+
> > >
> > > I like the brevity of this patch, but I do worry that if we ever have
> > > other users of PCIE_LNKCTL2_TLS2SPEED(), we might have the same
> > > problem again.
> > >
> > > Also, it looks like PCIE_LNKCAP_SLS2SPEED() has the same problem.
> > >
> > > f68dea13405c predates PCIE_LNKCTL2_TLS2SPEED(), and I don't think this
> > > problem existed as of f68dea13405c. I think the Fixes: tag should be
> > > for de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe
> > > Link Speed"), which added PCIE_LNKCTL2_TLS2SPEED() and
> > > PCIE_LNKCAP_SLS2SPEED() without masking out the other bits.
> > >
> > > I think I'll take Jiwei's patch [1], which fixes
> > > PCIE_LNKCTL2_TLS2SPEED() and PCIE_LNKCAP_SLS2SPEED() without requiring
> > > changes in the users. I'll add the details of Andrew's report to the
> > > commit log.
> >
> > Agree. It is better to fix it in the macro.
> >
> > > [1] https://lore.kernel.org/all/20250123055155.22648-2-sjiwei@163.com/
> > >
> > > > ---
> > > > drivers/pci/quirks.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > index d7f4ee6..deaaf4f 100644
> > > > --- a/drivers/pci/quirks.c
> > > > +++ b/drivers/pci/quirks.c
> > > > @@ -108,7 +108,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
> > > >  	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
> > > >  	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> > > >  	if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) {
> > > > -		u16 oldlnkctl2 = lnkctl2;
> > > > +		u16 oldlnkctl2 = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
> > > >
> > > >  		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
> > > >
> > > > --
> > > > 2.47.2
> >
> > --
> > Sathyanarayanan Kuppuswamy
> > Linux Kernel Developer