From: Jiwei Sun <jiwei.sun.bj@qq.com>
To: macro@orcam.me.uk, ilpo.jarvinen@linux.intel.com, bhelgaas@google.com
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
guojinhui.liam@bytedance.com, helgaas@kernel.org,
lukas@wunner.de, ahuang12@lenovo.com, sunjw10@lenovo.com,
jiwei.sun.bj@qq.com
Subject: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
Date: Fri, 10 Jan 2025 21:44:01 +0800
Message-ID: <tencent_B9290375427BDF73A2DC855F50397CC9FA08@qq.com>
From: Jiwei Sun <sunjw10@lenovo.com>
When we run a quick hot-add/hot-remove test (within 1 second) with a PCIe
Gen 5 NVMe disk, the PCIe bridge can drop from 32GT/s to 2.5GT/s:
pcieport 10002:00:04.0: pciehp: Slot(75): Link Down
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
...
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: broken device, retraining non-functional downstream link at 2.5GT/s
pcieport 10002:00:04.0: pciehp: Slot(75): No link
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): Link Up
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pci 10002:02:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint
pci 10002:02:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit]
pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit]
pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs
pci 10002:02:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 10002:00:04.0 (capable of 126.028 Gb/s with 32.0 GT/s PCIe x4 link)
If an NVMe disk is hot-removed, the pciehp interrupt is triggered and the
kernel thread pciehp_ist is woken up. pcie_failed_link_retrain() is then
called, as shown in the following call trace:
irq/87-pciehp-2524 [121] ..... 152046.006765: pcie_failed_link_retrain <-pcie_wait_for_link
irq/87-pciehp-2524 [121] ..... 152046.006782: <stack trace>
=> [FTRACE TRAMPOLINE]
=> pcie_failed_link_retrain
=> pcie_wait_for_link
=> pciehp_check_link_status
=> pciehp_enable_slot
=> pciehp_handle_presence_or_link_change
=> pciehp_ist
=> irq_thread_fn
=> irq_thread
=> kthread
=> ret_from_fork
=> ret_from_fork_asm
According to our investigation, the issue is caused by the following scenario:
NVMe disk             pciehp hardirq
hot-remove            top-half               pciehp irq kernel thread
======================================================================
pciehp hardirq
will be triggered
                      cpu handle pciehp
                      hardirq
                      pciehp irq kthread
                      will be woken up
                                             pciehp_ist
                                               ...
                                               pcie_failed_link_retrain
                                                 read PCI_EXP_LNKCTL2 register
                                                 read PCI_EXP_LNKSTA register
If NVMe disk
hot-add before
calling pcie_retrain_link()
                                                 set target speed to 2_5GT
                                                   pcie_bwctrl_change_speed
                                                     pcie_retrain_link
                                                 : the retrain work will be
                                                   successful; because
                                                   pci_match_id() returns 0 in
                                                   pcie_failed_link_retrain(),
                                                   the Target Link Speed
                                                   field of the Link Control
                                                   2 Register stays at 0x1.
To fix the issue, skip the retraining work for every device except the
ASMedia ASM2824.
Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Reported-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Jiwei Sun <sunjw10@lenovo.com>
---
 drivers/pci/quirks.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 605628c810a5..ff04ebd9ae16 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -104,6 +104,9 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
 	u16 lnksta, lnkctl2;
 	int ret = -ENOTTY;
 
+	if (!pci_match_id(ids, dev))
+		return 0;
+
 	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
 	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
 		return ret;
@@ -129,8 +132,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
 	}
 
 	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
-	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
-	    pci_match_id(ids, dev)) {
+	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
 		u32 lnkcap;
 
 		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
--
2.34.1