* [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
@ 2025-01-10 13:44 Jiwei Sun
2025-01-11 16:00 ` Maciej W. Rozycki
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Jiwei Sun @ 2025-01-10 13:44 UTC (permalink / raw)
To: macro, ilpo.jarvinen, bhelgaas
Cc: linux-pci, linux-kernel, guojinhui.liam, helgaas, lukas, ahuang12,
sunjw10, jiwei.sun.bj
From: Jiwei Sun <sunjw10@lenovo.com>
When running a quick hot-add/hot-remove test (within one second) with a PCIe
Gen 5 NVMe disk, there is a possibility that the PCIe bridge speed will drop
from 32GT/s to 2.5GT/s:
pcieport 10002:00:04.0: pciehp: Slot(75): Link Down
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
...
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: broken device, retraining non-functional downstream link at 2.5GT/s
pcieport 10002:00:04.0: pciehp: Slot(75): No link
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): Link Up
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pci 10002:02:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint
pci 10002:02:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit]
pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit]
pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs
pci 10002:02:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 10002:00:04.0 (capable of 126.028 Gb/s with 32.0 GT/s PCIe x4 link)
If an NVMe disk is hot-removed, the pciehp interrupt is triggered and the
kernel thread pciehp_ist is woken up; pcie_failed_link_retrain() is then
called, as shown in the following call trace:
irq/87-pciehp-2524 [121] ..... 152046.006765: pcie_failed_link_retrain <-pcie_wait_for_link
irq/87-pciehp-2524 [121] ..... 152046.006782: <stack trace>
=> [FTRACE TRAMPOLINE]
=> pcie_failed_link_retrain
=> pcie_wait_for_link
=> pciehp_check_link_status
=> pciehp_enable_slot
=> pciehp_handle_presence_or_link_change
=> pciehp_ist
=> irq_thread_fn
=> irq_thread
=> kthread
=> ret_from_fork
=> ret_from_fork_asm
According to our investigation, the issue is caused by the following scenario:
NVMe disk          pciehp hardirq
hot-remove         top-half           pciehp irq kernel thread
======================================================================
pciehp hardirq
will be triggered
                   cpu handle pciehp
                   hardirq
                   pciehp irq kthread will
                   be woken up
                                      pciehp_ist
                                      ...
                                      pcie_failed_link_retrain
                                        read PCI_EXP_LNKCTL2 register
                                        read PCI_EXP_LNKSTA register
If NVMe disk
hot-add before
calling pcie_retrain_link()
                                        set target speed to 2_5GT
                                        pcie_bwctrl_change_speed
                                        pcie_retrain_link
                                      : the retrain work will be
                                        successful, because
                                        pci_match_id() will be
                                        0 in
                                        pcie_failed_link_retrain()
                                        the target link speed
                                        field of the Link Control
                                        2 Register will keep 0x1.
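The race above can be condensed into a small userspace C model. This is an
illustrative sketch only: `struct bridge`, `retrain_at_2_5gt()` and
`quirk_race()` are made-up names standing in for the kernel logic, not the
real API.

```c
#include <assert.h>
#include <stdbool.h>

#define TLS_2_5GT 0x1	/* Target Link Speed field, Link Control 2 */
#define TLS_32GT  0x5

/* Mocked bridge state: the LNKCTL2 TLS field plus whether the device
 * would match the quirk's ids[] table (ASMedia ASM2824). */
struct bridge {
	unsigned int tls;
	bool on_id_list;
};

/* Step 1 of the quirk: the link looks dead, so clamp the target speed
 * to 2.5GT/s and retrain.  If the disk was hot-added in the meantime,
 * the retrain succeeds -- but at Gen 1. */
static bool retrain_at_2_5gt(struct bridge *b, bool disk_hot_added)
{
	b->tls = TLS_2_5GT;
	return disk_hot_added;
}

/* Step 2: the restriction is lifted only for devices on the ID list;
 * a generic bridge keeps TLS at 0x1, which is the bug described here. */
static unsigned int quirk_race(struct bridge *b)
{
	if (retrain_at_2_5gt(b, true) && b->on_id_list)
		b->tls = TLS_32GT;	/* restore full speed */
	return b->tls;			/* 0x1 => stuck at Gen 1 */
}
```

With `on_id_list = false` the model ends at `TLS_2_5GT` (0x1): the bridge
stays clamped, matching the "limited by 2.5 GT/s" message in the log.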
To fix the issue, skip the retraining workaround unless the device is an
ASMedia ASM2824.
Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Reported-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Jiwei Sun <sunjw10@lenovo.com>
---
drivers/pci/quirks.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 605628c810a5..ff04ebd9ae16 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -104,6 +104,9 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
 	u16 lnksta, lnkctl2;
 	int ret = -ENOTTY;
 
+	if (!pci_match_id(ids, dev))
+		return 0;
+
 	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
 	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
 		return ret;
@@ -129,8 +132,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
 	}
 
 	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
-	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
-	    pci_match_id(ids, dev)) {
+	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
 		u32 lnkcap;
 
 		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
--
2.34.1
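A rough before/after model of the patch's effect on the quirk's control
flow. This is a sketch under the assumption that all other entry conditions
of the quirk hold; `tls_after_hotplug()` is an illustrative name, not a
kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns the final LNKCTL2 Target Link Speed field value (0x1 =
 * 2.5GT/s, 0x5 = 32GT/s) after a hot-add races with the quirk. */
static unsigned int tls_after_hotplug(bool patched, bool matches_id)
{
	unsigned int tls = 0x5;		/* bridge starts capable of 32GT/s */

	/* Patched flow: non-matching devices return before TLS is touched. */
	if (patched && !matches_id)
		return tls;

	tls = 0x1;			/* step 1: clamp to 2.5GT/s, retrain */
	if (matches_id)
		tls = 0x5;		/* step 2: restriction lifted */
	return tls;
}
```

Only the unpatched, non-matching case ends at 0x1, i.e. the stuck-at-Gen-1
state reported in the commit message.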
^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
  2025-01-10 13:44 [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing Jiwei Sun
@ 2025-01-11 16:00 ` Maciej W. Rozycki
  2025-01-13 12:44   ` Jiwei
  2025-01-13 15:08 ` Ilpo Järvinen
  2025-09-09 12:33 ` [External] : " ALOK TIWARI
  2 siblings, 1 reply; 13+ messages in thread
From: Maciej W. Rozycki @ 2025-01-11 16:00 UTC (permalink / raw)
To: Jiwei Sun
Cc: ilpo.jarvinen, Bjorn Helgaas, linux-pci, linux-kernel,
	guojinhui.liam, helgaas, lukas, ahuang12, sunjw10

On Fri, 10 Jan 2025, Jiwei Sun wrote:

> In order to fix the issue, don't do the retraining work except ASMedia
> ASM2824.

 I yet need to go through all of your submission in detail, but this
assumption defeats the purpose of the workaround, as the current
understanding of the origin of the training failure and the reason to
retrain by hand with the speed limited to 2.5GT/s is the *downstream*
device rather than the ASMedia ASM2824 switch.

 It is also why the quirk has been wired to run everywhere rather than
having been keyed by VID:DID, and the VID:DID of the switch is only
listed, conservatively, because it seems safe with the switch to lift
the speed restriction once the link has successfully completed training.

 Overall I think we need to get your problem sorted differently, because
I suppose in principle your hot-plug scenario could also happen with the
ASMedia ASM2824 switch as the upstream device and your NVMe storage
element as the downstream device.  Perhaps the speed restriction could
be always lifted, and then the bandwidth controller infrastructure used
for that, so that it doesn't have to happen within
`pcie_failed_link_retrain'?

  Maciej

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
  2025-01-11 16:00 ` Maciej W. Rozycki
@ 2025-01-13 12:44   ` Jiwei
  0 siblings, 0 replies; 13+ messages in thread
From: Jiwei @ 2025-01-13 12:44 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: ilpo.jarvinen, Bjorn Helgaas, linux-pci, linux-kernel,
	guojinhui.liam, helgaas, lukas, ahuang12, sunjw10

On 1/12/25 00:00, Maciej W. Rozycki wrote:
> On Fri, 10 Jan 2025, Jiwei Sun wrote:
> 
>> In order to fix the issue, don't do the retraining work except ASMedia
>> ASM2824.
> 
>  I yet need to go through all of your submission in detail, but this
> assumption defeats the purpose of the workaround, as the current
> understanding of the origin of the training failure and the reason to
> retrain by hand with the speed limited to 2.5GT/s is the *downstream*
> device rather than the ASMedia ASM2824 switch.
> 
>  It is also why the quirk has been wired to run everywhere rather than
> having been keyed by VID:DID, and the VID:DID of the switch is only
> listed, conservatively, because it seems safe with the switch to lift
> the speed restriction once the link has successfully completed training.
> 
>  Overall I think we need to get your problem sorted differently, because
> I suppose in principle your hot-plug scenario could also happen with the
> ASMedia ASM2824 switch as the upstream device and your NVMe storage
> element as the downstream device.  Perhaps the speed restriction could
> be always lifted, and then the bandwidth controller infrastructure used
> for that, so that it doesn't have to happen within
> `pcie_failed_link_retrain'?

According to our test, the following modification can fix the issue in our
test machine.

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 02d2e16672a8..9ca051b86878 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -97,10 +97,6 @@ static bool pcie_lbms_seen(struct pci_dev *dev, u16 lnksta)
  */
 int pcie_failed_link_retrain(struct pci_dev *dev)
 {
-	static const struct pci_device_id ids[] = {
-		{ PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
-		{}
-	};
 	u16 lnksta, lnkctl2;
 	int ret = -ENOTTY;
 
@@ -128,8 +124,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
 	}
 
 	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
-	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
-	    pci_match_id(ids, dev)) {
+	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) ==
+	    PCI_EXP_LNKCTL2_TLS_2_5GT) {
 		u32 lnkcap;
 
 		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");

But I don't know if the above modification will have any other negative
effects on other devices. Could you please share your thoughts?

Thanks,
Regards,
Jiwei

> 
>   Maciej

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
  2025-01-10 13:44 [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing Jiwei Sun
  2025-01-11 16:00 ` Maciej W. Rozycki
@ 2025-01-13 15:08 ` Ilpo Järvinen
  2025-01-14 15:04   ` Jiwei
  2025-09-09 12:33 ` [External] : " ALOK TIWARI
  2 siblings, 1 reply; 13+ messages in thread
From: Ilpo Järvinen @ 2025-01-13 15:08 UTC (permalink / raw)
To: Jiwei Sun
Cc: macro, bhelgaas, linux-pci, LKML, guojinhui.liam, helgaas,
	Lukas Wunner, ahuang12, sunjw10

On Fri, 10 Jan 2025, Jiwei Sun wrote:

> From: Jiwei Sun <sunjw10@lenovo.com>
>
> When we do the quick hot-add/hot-remove test (within 1 second) with a PCIE
> Gen 5 NVMe disk, there is a possibility that the PCIe bridge will decrease
> to 2.5GT/s from 32GT/s
>
> pcieport 10002:00:04.0: pciehp: Slot(75): Link Down
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> ...
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: broken device, retraining non-functional downstream link at 2.5GT/s
> pcieport 10002:00:04.0: pciehp: Slot(75): No link
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): Link Up
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pci 10002:02:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint
> pci 10002:02:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit]
> pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit]
> pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs
> pci 10002:02:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 10002:00:04.0 (capable of 126.028 Gb/s with 32.0 GT/s PCIe x4 link)
>
> If a NVMe disk is hot removed, the pciehp interrupt will be triggered, and
> the kernel thread pciehp_ist will be woken up, the
> pcie_failed_link_retrain() will be called as the following call trace.
>
> irq/87-pciehp-2524 [121] ..... 152046.006765: pcie_failed_link_retrain <-pcie_wait_for_link
> irq/87-pciehp-2524 [121] ..... 152046.006782: <stack trace>
> => [FTRACE TRAMPOLINE]
> => pcie_failed_link_retrain
> => pcie_wait_for_link
> => pciehp_check_link_status
> => pciehp_enable_slot
> => pciehp_handle_presence_or_link_change
> => pciehp_ist
> => irq_thread_fn
> => irq_thread
> => kthread
> => ret_from_fork
> => ret_from_fork_asm
>
> Accorind to investigation, the issue is caused by the following scenerios,
>
> NVMe disk          pciehp hardirq
> hot-remove         top-half           pciehp irq kernel thread
> ======================================================================
> pciehp hardirq
> will be triggered
>                    cpu handle pciehp
>                    hardirq
>                    pciehp irq kthread will
>                    be woken up
>                                       pciehp_ist
>                                       ...
>                                       pcie_failed_link_retrain
>                                         read PCI_EXP_LNKCTL2 register
>                                         read PCI_EXP_LNKSTA register
> If NVMe disk
> hot-add before
> calling pcie_retrain_link()
>                                         set target speed to 2_5GT

This assumes LBMS has been seen but DLLLA isn't? Why is that?

>                                         pcie_bwctrl_change_speed
>                                         pcie_retrain_link

>                                       : the retrain work will be
>                                         successful, because
>                                         pci_match_id() will be
>                                         0 in
>                                         pcie_failed_link_retrain()

There's no pci_match_id() in pcie_retrain_link() ?? What does that : mean?
I think the nesting level is wrong in your flow description?

I don't understand how retrain success relates to the pci_match_id() as
there are two different steps in pcie_failed_link_retrain().

In step 1, pcie_failed_link_retrain() sets speed to 2.5GT/s if DLLLA=0 and
LBMS has been seen. Why is that condition happening in your case? You
didn't explain LBMS (nor DLLLA) in the above sequence so it's hard to
follow what is going on here. LBMS in particular is of high interest here
because I'm trying to understand if something should clear it on the
hotplug side (there's already one call to clear it in remove_board()).

In step 2 (pcie_set_target_speed() in step 1 succeeded),
pcie_failed_link_retrain() attempts to restore >2.5GT/s speed, this only
occurs when pci_match_id() matches.

I guess you're trying to say that step 2 is not taken because
pci_match_id() is not matching but the wording above is very confusing.

Overall, I failed to understand the scenario here fully despite trying to
think it through over these few days.

> the target link speed
> field of the Link Control
> 2 Register will keep 0x1.
>
> In order to fix the issue, don't do the retraining work except ASMedia
> ASM2824.
>
> Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
> Reported-by: Adrian Huang <ahuang12@lenovo.com>
> Signed-off-by: Jiwei Sun <sunjw10@lenovo.com>
> ---
>  drivers/pci/quirks.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 605628c810a5..ff04ebd9ae16 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -104,6 +104,9 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
>  	u16 lnksta, lnkctl2;
>  	int ret = -ENOTTY;
>
> +	if (!pci_match_id(ids, dev))
> +		return 0;
> +
>  	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
>  	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
>  		return ret;
> @@ -129,8 +132,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
>  	}
>
>  	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
> -	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> -	    pci_match_id(ids, dev)) {
> +	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
>  		u32 lnkcap;
>
>  		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
> --

i.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
  2025-01-13 15:08 ` Ilpo Järvinen
@ 2025-01-14 15:04   ` Jiwei
  2025-01-14 18:25     ` Ilpo Järvinen
  0 siblings, 1 reply; 13+ messages in thread
From: Jiwei @ 2025-01-14 15:04 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: macro, bhelgaas, linux-pci, LKML, guojinhui.liam, helgaas,
	Lukas Wunner, ahuang12, sunjw10

On 1/13/25 23:08, Ilpo Järvinen wrote:
> On Fri, 10 Jan 2025, Jiwei Sun wrote:
> 
>> From: Jiwei Sun <sunjw10@lenovo.com>
>>
>> When we do the quick hot-add/hot-remove test (within 1 second) with a PCIE
>> Gen 5 NVMe disk, there is a possibility that the PCIe bridge will decrease
>> to 2.5GT/s from 32GT/s
>>
>> pcieport 10002:00:04.0: pciehp: Slot(75): Link Down
>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>> ...
>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>> pcieport 10002:00:04.0: broken device, retraining non-functional downstream link at 2.5GT/s
>> pcieport 10002:00:04.0: pciehp: Slot(75): No link
>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>> pcieport 10002:00:04.0: pciehp: Slot(75): Link Up
>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>> pci 10002:02:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint
>> pci 10002:02:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit]
>> pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit]
>> pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs
>> pci 10002:02:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 10002:00:04.0 (capable of 126.028 Gb/s with 32.0 GT/s PCIe x4 link)
>>
>> If a NVMe disk is hot removed, the pciehp interrupt will be triggered, and
>> the kernel thread pciehp_ist will be woken up, the
>> pcie_failed_link_retrain() will be called as the following call trace.
>>
>> irq/87-pciehp-2524 [121] ..... 152046.006765: pcie_failed_link_retrain <-pcie_wait_for_link
>> irq/87-pciehp-2524 [121] ..... 152046.006782: <stack trace>
>> => [FTRACE TRAMPOLINE]
>> => pcie_failed_link_retrain
>> => pcie_wait_for_link
>> => pciehp_check_link_status
>> => pciehp_enable_slot
>> => pciehp_handle_presence_or_link_change
>> => pciehp_ist
>> => irq_thread_fn
>> => irq_thread
>> => kthread
>> => ret_from_fork
>> => ret_from_fork_asm
>>
>> Accorind to investigation, the issue is caused by the following scenerios,
>>
>> NVMe disk          pciehp hardirq
>> hot-remove         top-half           pciehp irq kernel thread
>> ======================================================================
>> pciehp hardirq
>> will be triggered
>>                    cpu handle pciehp
>>                    hardirq
>>                    pciehp irq kthread will
>>                    be woken up
>>                                       pciehp_ist
>>                                       ...
>>                                       pcie_failed_link_retrain
>>                                         read PCI_EXP_LNKCTL2 register
>>                                         read PCI_EXP_LNKSTA register
>> If NVMe disk
>> hot-add before
>> calling pcie_retrain_link()
>>                                         set target speed to 2_5GT
> 
> This assumes LBMS has been seen but DLLLA isn't? Why is that?

Please look at the content below.

> 
>>                                         pcie_bwctrl_change_speed
>>                                         pcie_retrain_link
> 
>>                                       : the retrain work will be
>>                                         successful, because
>>                                         pci_match_id() will be
>>                                         0 in
>>                                         pcie_failed_link_retrain()
> 
> There's no pci_match_id() in pcie_retrain_link() ?? What does that : mean?
> I think the nesting level is wrong in your flow description?

Sorry for the confusing information, the complete meaning I want to
express is as follows,

NVMe disk          pciehp hardirq
hot-remove         top-half           pciehp irq kernel thread
======================================================================
pciehp hardirq
will be triggered
                   cpu handle pciehp
                   hardirq
                   "pciehp" irq kthread will
                   be woken up
                                      pciehp_ist
                                      ...
                                      pcie_failed_link_retrain
                                        pcie_capability_read_word(PCI_EXP_LNKCTL2)
                                        pcie_capability_read_word(PCI_EXP_LNKSTA)
If NVMe disk
hot-add before
calling pcie_retrain_link()
                                        pcie_set_target_speed(PCIE_SPEED_2_5GT)
                                          pcie_bwctrl_change_speed
                                            pcie_retrain_link
                                        // (1) The target link speed field
                                        //     of LNKCTL2 was set to 0x1,
                                        //     the retrain work will be
                                        //     successful.
                                        // (2) Return to
                                        //     pcie_failed_link_retrain()
                                        pcie_capability_read_word(PCI_EXP_LNKSTA)
                                        if lnksta & PCI_EXP_LNKSTA_DLLLA and
                                           PCI_EXP_LNKCTL2_TLS_2_5GT was set and
                                           pci_match_id
                                          pcie_capability_read_dword(PCI_EXP_LNKCAP)
                                          pcie_set_target_speed(PCIE_LNKCAP_SLS2SPEED(lnkcap))
                                        // Although the target link speed
                                        // field of LNKCTL2 was set to 0x1,
                                        // however the dev is not in ids[],
                                        // the removing downstream link
                                        // speed restriction can not be
                                        // executed.  The target link speed
                                        // field of LNKCTL2 could not be
                                        // restored.

Due to the limitation of a length of 75 characters per line, the original
explanation omitted many details.

> 
> I don't understand how retrain success relates to the pci_match_id() as
> there are two different steps in pcie_failed_link_retrain().
> 
> In step 1, pcie_failed_link_retrain() sets speed to 2.5GT/s if DLLLA=0 and
> LBMS has been seen. Why is that condition happening in your case? You

According to our test result, it seems so. Maybe it is related to our test.
Our test involves plugging and unplugging multiple times within a second.
Below is the dmesg log taken from our testing process. The log below is a
portion of the dmesg log that I have captured. (Please allow me to retain
the timestamps, as this information is important.)

-------------------------------dmesg log-----------------------------------------
[  537.981302] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841
[  537.981329] ==== pcie_bwnotif_irq 256 lbms_count++
[  537.981338] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
[  538.014638] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  538.014662] ==== pciehp_ist 703 start running
[  538.014678] pcieport 10001:80:02.0: pciehp: Slot(77): Link Down
[  538.199104] ==== pcie_reset_lbms_count 281 lbms_count set to 0
[  538.199130] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
[  538.567377] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[  538.567393] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[  538.616219] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
[  538.617594] ======pcie_wait_for_link_delay 4787,wait for linksta:0
[  539.362382] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841
[  539.362393] ==== pcie_bwnotif_irq 256 lbms_count++
[  539.362400] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
[  539.395720] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  539.787501] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
[  539.787514] ==== pciehp_ist 759 stop running
[  539.787521] ==== pciehp_ist 703 start running
[  539.787533] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
[  539.914182] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  540.503965] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  540.808415] ======pcie_wait_for_link_delay 4787,wait for linksta:-110
[  540.808430] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 116, lnkctl2:0x5, lnksta:0x1041
[  540.808440] ==== pcie_lbms_seen 48 count:0x1
[  540.808448] pcieport 10001:80:02.0: broken device, retraining non-functional downstream link at 2.5GT/s
[  540.808452] ========== pcie_set_target_speed 172, speed has been set
[  540.808459] pcieport 10001:80:02.0: retraining sucessfully, but now is in Gen 1
[  540.808466] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 135, oldlnkctl2:0x5,newlnkctl2:0x5,newlnksta:0x1041
[  541.041386] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[  541.041398] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[  541.091231] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
[  541.568126] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
[  541.568135] ==== pcie_bwnotif_irq 256 lbms_count++
[  541.568142] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
[  541.568168] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  542.029334] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
[  542.029347] ==== pciehp_ist 759 stop running
[  542.029353] ==== pciehp_ist 703 start running
[  542.029362] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
[  542.120676] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[  542.120687] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[  542.170424] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
[  542.172337] ======pcie_wait_for_link_delay 4787,wait for linksta:0
[  542.223909] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841
[  542.223917] ==== pcie_bwnotif_irq 256 lbms_count++
[  542.223924] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
[  542.257249] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  542.809830] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[  542.809841] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[  542.859463] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
[  543.097871] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
[  543.097879] ==== pcie_bwnotif_irq 256 lbms_count++
[  543.097885] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
[  543.097905] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  543.391250] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
[  543.391260] ==== pciehp_ist 759 stop running
[  543.391265] ==== pciehp_ist 703 start running
[  543.391273] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
[  543.650507] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[  543.650517] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[  543.700174] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
[  543.700205] ======pcie_wait_for_link_delay 4787,wait for linksta:0
[  544.296255] pci 10001:81:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint
[  544.296298] pci 10001:81:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit]
[  544.296515] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit]
[  544.296522] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs
[  544.297256] pcieport 10001:80:02.0: bridge window [io  0x1000-0x0fff] to [bus 81] add_size 1000
[  544.297279] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: can't assign; no space
[  544.297288] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: failed to assign
[  544.297295] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: can't assign; no space
[  544.297301] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: failed to assign
[  544.297314] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned
[  544.297337] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space
[  544.297344] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign
[  544.297352] pcieport 10001:80:02.0: PCI bridge to [bus 81]
[  544.297363] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]
[  544.297373] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref]
[  544.297385] PCI: No. 2 try to assign unassigned res
[  544.297390] release child resource [mem 0xbb000000-0xbb007fff 64bit]
[  544.297396] pcieport 10001:80:02.0: resource 14 [mem 0xbb000000-0xbb0fffff] released
[  544.297403] pcieport 10001:80:02.0: PCI bridge to [bus 81]
[  544.297412] pcieport 10001:80:02.0: bridge window [io  0x1000-0x0fff] to [bus 81] add_size 1000
[  544.297422] pcieport 10001:80:02.0: bridge window [mem 0x00100000-0x001fffff] to [bus 81] add_size 300000 add_align 100000
[  544.297438] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: can't assign; no space
[  544.297444] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: failed to assign
[  544.297451] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: can't assign; no space
[  544.297457] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: failed to assign
[  544.297464] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: assigned
[  544.297473] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to expand by 0x300000
[  544.297481] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to add 300000
[  544.297488] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: can't assign; no space
[  544.297494] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: failed to assign
[  544.297503] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned
[  544.297524] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space
[  544.297530] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign
[  544.297538] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned
[  544.297558] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space
[  544.297563] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign
[  544.297569] pcieport 10001:80:02.0: PCI bridge to [bus 81]
[  544.297579] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]
[  544.297588] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref]
[  544.298256] nvme nvme1: pci function 10001:81:00.0
[  544.298278] nvme 10001:81:00.0: enabling device (0000 -> 0002)
[  544.298291] pcieport 10001:80:02.0: can't derive routing for PCI INT A
[  544.298298] nvme 10001:81:00.0: PCI INT A: no GSI
[  544.875198] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
[  544.875208] ==== pcie_bwnotif_irq 256 lbms_count++
[  544.875215] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
[  544.875231] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  544.875910] ==== pciehp_ist 759 stop running
[  544.875920] ==== pciehp_ist 703 start running
[  544.875928] pcieport 10001:80:02.0: pciehp: Slot(77): Link Down
[  544.876857] ==== pcie_reset_lbms_count 281 lbms_count set to 0
[  544.876868] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
[  545.427157] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[  545.427169] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[  545.476411] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
[  545.478099] ======pcie_wait_for_link_delay 4787,wait for linksta:0
[  545.857887] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
[  545.857896] ==== pcie_bwnotif_irq 256 lbms_count++
[  545.857902] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
[  545.857929] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  546.410193] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[  546.410205] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[  546.460531] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
[  546.697008] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
[  546.697020] ==== pciehp_ist 759 stop running
[  546.697025] ==== pciehp_ist 703 start running
[  546.697034] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
[  546.697039] pcieport 10001:80:02.0: pciehp: Slot(77): Link Up
[  546.718015] ======pcie_wait_for_link_delay 4787,wait for linksta:0
[  546.987498] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
[  546.987507] ==== pcie_bwnotif_irq 256 lbms_count++
[  546.987514] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
[  546.987542] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  547.539681] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[  547.539693] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[  547.589214] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
[  547.850003] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
[  547.850011] ==== pcie_bwnotif_irq 256 lbms_count++
[  547.850018] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
[  547.850046] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  547.996918] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
[  547.996930] ==== pciehp_ist 759 stop running
[  547.996934] ==== pciehp_ist 703 start running
[  547.996944] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
[  548.401899] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[  548.401911] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[  548.451186] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
[  548.452886] ======pcie_wait_for_link_delay 4787,wait for linksta:0
[  548.682838] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
[  548.682846] ==== pcie_bwnotif_irq 256 lbms_count++
[  548.682852] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
[  548.682871] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  549.235408] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[  549.235420] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[  549.284761] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
[  549.654883] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
[  549.654892] ==== pcie_bwnotif_irq 256 lbms_count++
[  549.654899] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
[  549.654926] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  549.738806] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
[  549.738815] ==== pciehp_ist 759 stop running
[  549.738819] ==== pciehp_ist 703 start running
[  549.738829] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
[  550.207186] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[  550.207198] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[  550.256868] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
[  550.256890] ======pcie_wait_for_link_delay 4787,wait for linksta:0
[  550.575344] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
[  550.575353] ==== pcie_bwnotif_irq 256 lbms_count++
[  550.575360] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
[  550.575386] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[  551.127757] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[  551.127768] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[  551.177224] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
[  551.477699] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
[  551.477711] ==== pciehp_ist 759 stop running
[  551.477716] ==== pciehp_ist 703 start running
[  551.477725] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
[  551.477730] pcieport 10001:80:02.0: pciehp: Slot(77): Link Up
[  551.498667] ======pcie_wait_for_link_delay 4787,wait for linksta:0
[  551.788685] pci 10001:81:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint
[  551.788723] pci 10001:81:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit]
[  551.788933] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit]
[  551.788941] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs
[  551.789619] pcieport 10001:80:02.0: bridge window [io  0x1000-0x0fff] to [bus 81] add_size 1000
[  551.789653] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: can't assign; no space
[  551.789663] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: failed to assign
[  551.789672] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: can't assign; no space
[  551.789677] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: failed to assign
[  551.789688] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned
[  551.789708] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space
[  551.789715] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign
[  551.789722] pcieport 10001:80:02.0: PCI bridge to [bus 81]
[  551.789733] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]
[  551.789743] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref]
[  551.789755] PCI: No. 2 try to assign unassigned res
[  551.789759] release child resource [mem 0xbb000000-0xbb007fff 64bit]
[  551.789764] pcieport 10001:80:02.0: resource 14 [mem 0xbb000000-0xbb0fffff] released
[  551.789771] pcieport 10001:80:02.0: PCI bridge to [bus 81]
[  551.789779] pcieport 10001:80:02.0: bridge window [io  0x1000-0x0fff] to [bus 81] add_size 1000
[  551.789790] pcieport 10001:80:02.0: bridge window [mem 0x00100000-0x001fffff] to [bus 81] add_size 300000 add_align 100000
[  551.789804] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: can't assign; no space
[  551.789811] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: failed to assign
[  551.789817] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: can't assign; no space
[  551.789823] pcieport 10001:80:02.0: bridge window [io  size 0x1000]: failed to assign
[  551.789831] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: assigned
[  551.789839] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to expand by 0x300000
[  551.789847] pcieport
10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to add 300000 [ 551.789854] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space [ 551.789860] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign [ 551.789869] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned [ 551.789889] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space [ 551.789895] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign [ 551.789903] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned [ 551.789921] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space [ 551.789927] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign [ 551.789933] pcieport 10001:80:02.0: PCI bridge to [bus 81] [ 551.789942] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] [ 551.789951] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] [ 551.790638] nvme nvme1: pci function 10001:81:00.0 [ 551.790656] nvme 10001:81:00.0: enabling device (0000 -> 0002) [ 551.790667] pcieport 10001:80:02.0: can't derive routing for PCI INT A [ 551.790674] nvme 10001:81:00.0: PCI INT A: no GSI [ 552.546963] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 [ 552.546973] ==== pcie_bwnotif_irq 256 lbms_count++ [ 552.546980] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 [ 552.546996] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 552.547590] ==== pciehp_ist 759 stop running [ 552.547598] ==== pciehp_ist 703 start running [ 552.547605] pcieport 10001:80:02.0: pciehp: Slot(77): Link Down [ 552.548215] ==== pcie_reset_lbms_count 281 lbms_count set to 0 [ 552.548224] pcieport 10001:80:02.0: pciehp: Slot(77): Card present [ 553.098957] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 [ 553.098969] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 [ 553.148031] ==== 
pcie_bwnotif_irq 247(start running),link_status:0x3045 [ 553.149553] ======pcie_wait_for_link_delay 4787,wait for linksta:0 [ 553.499647] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 [ 553.499654] ==== pcie_bwnotif_irq 256 lbms_count++ [ 553.499660] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 [ 553.499683] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 554.052313] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 [ 554.052325] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 [ 554.102175] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 [ 554.265181] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 [ 554.265188] ==== pcie_bwnotif_irq 256 lbms_count++ [ 554.265194] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 [ 554.265217] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 554.453449] pcieport 10001:80:02.0: pciehp: Slot(77): No device found [ 554.453458] ==== pciehp_ist 759 stop running [ 554.453463] ==== pciehp_ist 703 start running [ 554.453472] pcieport 10001:80:02.0: pciehp: Slot(77): Card present [ 554.743040] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 555.475369] ======pcie_wait_for_link_delay 4787,wait for linksta:-110 [ 555.475384] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 116, lnkctl2:0x5, lnksta:0x1041 [ 555.475392] ==== pcie_lbms_seen 48 count:0x2 [ 555.475398] pcieport 10001:80:02.0: broken device, retraining non-functional downstream link at 2.5GT/s [ 555.475404] ========== pcie_set_target_speed 172, speed has been set [ 555.475409] pcieport 10001:80:02.0: retraining sucessfully, but now is in Gen 1 [ 555.475417] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 135, oldlnkctl2:0x5,newlnkctl2:0x5,newlnksta:0x1041 [ 556.633310] pcieport 10001:80:02.0: pciehp: Slot(77): No device found [ 556.633322] ==== pciehp_ist 759 stop running [ 556.633328] ==== pciehp_ist 703 start running [ 556.633336] 
==== pciehp_ist 759 stop running [ 556.828412] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 556.828440] ==== pciehp_ist 703 start running [ 556.828448] pcieport 10001:80:02.0: pciehp: Slot(77): Card present [ 557.017389] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 [ 557.017400] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 [ 557.066666] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 [ 557.066688] ======pcie_wait_for_link_delay 4787,wait for linksta:0 [ 557.209334] pci 10001:81:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint [ 557.209374] pci 10001:81:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit] [ 557.209585] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit] [ 557.209592] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs [ 557.210275] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 [ 557.210292] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space [ 557.210300] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign [ 557.210307] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space [ 557.210312] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign [ 557.210322] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned [ 557.210342] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space [ 557.210349] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign [ 557.210356] pcieport 10001:80:02.0: PCI bridge to [bus 81] [ 557.210366] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] [ 557.210376] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] [ 557.210388] PCI: No. 
2 try to assign unassigned res [ 557.210392] release child resource [mem 0xbb000000-0xbb007fff 64bit] [ 557.210397] pcieport 10001:80:02.0: resource 14 [mem 0xbb000000-0xbb0fffff] released [ 557.210405] pcieport 10001:80:02.0: PCI bridge to [bus 81] [ 557.210414] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 [ 557.210424] pcieport 10001:80:02.0: bridge window [mem 0x00100000-0x001fffff] to [bus 81] add_size 300000 add_align 100000 [ 557.210438] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: can't assign; no space [ 557.210445] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: failed to assign [ 557.210451] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space [ 557.210457] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign [ 557.210464] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: assigned [ 557.210472] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to expand by 0x300000 [ 557.210479] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to add 300000 [ 557.210487] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space [ 557.210492] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign [ 557.210501] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned [ 557.210521] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space [ 557.210527] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign [ 557.210534] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned [ 557.210553] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space [ 557.210559] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign [ 557.210565] pcieport 10001:80:02.0: PCI bridge to [bus 81] [ 557.210574] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] [ 
557.210583] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] [ 557.211286] nvme nvme1: pci function 10001:81:00.0 [ 557.211303] nvme 10001:81:00.0: enabling device (0000 -> 0002) [ 557.211315] pcieport 10001:80:02.0: can't derive routing for PCI INT A [ 557.211322] nvme 10001:81:00.0: PCI INT A: no GSI [ 557.565811] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 [ 557.565820] ==== pcie_bwnotif_irq 256 lbms_count++ [ 557.565827] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 [ 557.565842] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 557.566410] ==== pciehp_ist 759 stop running [ 557.566416] ==== pciehp_ist 703 start running [ 557.566423] pcieport 10001:80:02.0: pciehp: Slot(77): Link Down [ 557.567592] ==== pcie_reset_lbms_count 281 lbms_count set to 0 [ 557.567602] pcieport 10001:80:02.0: pciehp: Slot(77): Card present [ 558.117581] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 [ 558.117594] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 [ 558.166639] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 [ 558.168190] ======pcie_wait_for_link_delay 4787,wait for linksta:0 [ 558.376176] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 [ 558.376184] ==== pcie_bwnotif_irq 256 lbms_count++ [ 558.376190] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 [ 558.376208] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 558.928611] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 [ 558.928621] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 [ 558.977769] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 [ 559.186385] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 [ 559.186394] ==== pcie_bwnotif_irq 256 lbms_count++ [ 559.186400] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 [ 559.186419] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 559.459099] pcieport 10001:80:02.0: 
pciehp: Slot(77): No device found [ 559.459111] ==== pciehp_ist 759 stop running [ 559.459116] ==== pciehp_ist 703 start running [ 559.459124] pcieport 10001:80:02.0: pciehp: Slot(77): Card present [ 559.738599] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 [ 559.738610] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 [ 559.787690] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 [ 559.787712] ======pcie_wait_for_link_delay 4787,wait for linksta:0 [ 560.307243] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 [ 560.307253] ==== pcie_bwnotif_irq 256 lbms_count++ [ 560.307260] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 [ 560.307282] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 560.978997] pcieport 10001:80:02.0: pciehp: Slot(77): No device found [ 560.979007] ==== pciehp_ist 759 stop running [ 560.979013] ==== pciehp_ist 703 start running [ 560.979022] pcieport 10001:80:02.0: pciehp: Slot(77): Card present [ 561.410141] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 [ 561.410153] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 [ 561.459064] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 [ 561.459087] ======pcie_wait_for_link_delay 4787,wait for linksta:0 [ 561.648520] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 [ 561.648528] ==== pcie_bwnotif_irq 256 lbms_count++ [ 561.648536] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 [ 561.648559] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 562.247076] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 [ 562.247087] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 [ 562.296600] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 [ 562.454228] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841 [ 562.454236] ==== pcie_bwnotif_irq 256 lbms_count++ [ 562.454244] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841 [ 562.487632] 
==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 562.674863] pcieport 10001:80:02.0: pciehp: Slot(77): No device found [ 562.674874] ==== pciehp_ist 759 stop running [ 562.674879] ==== pciehp_ist 703 start running [ 562.674888] pcieport 10001:80:02.0: pciehp: Slot(77): Card present [ 563.696784] ======pcie_wait_for_link_delay 4787,wait for linksta:-110 [ 563.696798] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 116, lnkctl2:0x5, lnksta:0x1041 [ 563.696806] ==== pcie_lbms_seen 48 count:0x5 [ 563.696813] pcieport 10001:80:02.0: broken device, retraining non-functional downstream link at 2.5GT/s [ 563.696817] ========== pcie_set_target_speed 172, speed has been set [ 563.696823] pcieport 10001:80:02.0: retraining sucessfully, but now is in Gen 1 [ 563.696830] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 135, oldlnkctl2:0x5,newlnkctl2:0x5,newlnksta:0x1041 [ 564.133582] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 [ 564.133594] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 [ 564.183003] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 [ 564.364911] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 [ 564.364921] ==== pcie_bwnotif_irq 256 lbms_count++ [ 564.364930] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 [ 564.364954] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 564.889708] pcieport 10001:80:02.0: pciehp: Slot(77): No device found [ 564.889719] ==== pciehp_ist 759 stop running [ 564.889724] ==== pciehp_ist 703 start running [ 564.889732] pcieport 10001:80:02.0: pciehp: Slot(77): Card present [ 565.493151] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 [ 565.493162] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 [ 565.542478] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 [ 565.542501] ======pcie_wait_for_link_delay 4787,wait for linksta:0 [ 565.752276] ==== pcie_bwnotif_irq 247(start 
running),link_status:0x5041 [ 565.752285] ==== pcie_bwnotif_irq 256 lbms_count++ [ 565.752291] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 [ 565.752316] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 566.359793] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 [ 566.359804] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 [ 566.408820] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 [ 566.581150] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841 [ 566.581159] ==== pcie_bwnotif_irq 256 lbms_count++ [ 566.581166] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841 [ 566.614491] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 566.755582] pcieport 10001:80:02.0: pciehp: Slot(77): No device found [ 566.755591] ==== pciehp_ist 759 stop running [ 566.755596] ==== pciehp_ist 703 start running [ 566.755605] pcieport 10001:80:02.0: pciehp: Slot(77): Card present [ 567.751399] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 [ 567.751412] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 [ 567.776517] ======pcie_wait_for_link_delay 4787,wait for linksta:-110 [ 567.776529] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 116, lnkctl2:0x5, lnksta:0x1845 [ 567.776538] ==== pcie_lbms_seen 48 count:0x8 [ 567.776544] pcieport 10001:80:02.0: broken device, retraining non-functional downstream link at 2.5GT/s [ 567.801147] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 [ 567.801177] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841 [ 567.801184] ==== pcie_bwnotif_irq 256 lbms_count++ [ 567.801192] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841 [ 567.801201] ==== pcie_reset_lbms_count 281 lbms_count set to 0 [ 567.801207] ========== pcie_set_target_speed 189, bwctl change speed ret:0x0 [ 567.801214] pcieport 10001:80:02.0: retraining sucessfully, but now is in Gen 1 [ 567.801220] pcieport 10001:80:02.0: ============ 
pcie_failed_link_retrain 135, oldlnkctl2:0x5,newlnkctl2:0x1,newlnksta:0x3841 [ 567.815102] ==== pcie_bwnotif_irq 247(start running),link_status:0x7041 [ 567.815110] ==== pcie_bwnotif_irq 256 lbms_count++ [ 567.815117] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7041 [ 567.910155] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 [ 568.961434] pcieport 10001:80:02.0: pciehp: Slot(77): No device found [ 568.961444] ==== pciehp_ist 759 stop running [ 568.961450] ==== pciehp_ist 703 start running [ 568.961459] pcieport 10001:80:02.0: pciehp: Slot(77): Card present [ 569.008665] ==== pcie_bwnotif_irq 247(start running),link_status:0x3041 [ 569.010428] ======pcie_wait_for_link_delay 4787,wait for linksta:0 [ 569.391482] pci 10001:81:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint [ 569.391549] pci 10001:81:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit] [ 569.391968] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit] [ 569.391975] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs [ 569.392869] pci 10001:81:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 10001:80:02.0 (capable of 126.028 Gb/s with 32.0 GT/s PCIe x4 link) [ 569.393233] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 [ 569.393249] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space [ 569.393257] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign [ 569.393264] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space [ 569.393270] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign [ 569.393279] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned [ 569.393315] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space [ 569.393322] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign [ 569.393329] pcieport 10001:80:02.0: 
PCI bridge to [bus 81] [ 569.393340] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] [ 569.393350] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] [ 569.393362] PCI: No. 2 try to assign unassigned res [ 569.393366] release child resource [mem 0xbb000000-0xbb007fff 64bit] [ 569.393371] pcieport 10001:80:02.0: resource 14 [mem 0xbb000000-0xbb0fffff] released [ 569.393378] pcieport 10001:80:02.0: PCI bridge to [bus 81] [ 569.393404] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 [ 569.393414] pcieport 10001:80:02.0: bridge window [mem 0x00100000-0x001fffff] to [bus 81] add_size 300000 add_align 100000 [ 569.393430] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: can't assign; no space [ 569.393438] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: failed to assign [ 569.393445] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space [ 569.393451] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign [ 569.393458] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: assigned [ 569.393466] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to expand by 0x300000 [ 569.393474] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to add 300000 [ 569.393481] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space [ 569.393487] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign [ 569.393495] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned [ 569.393529] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space [ 569.393536] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign [ 569.393543] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned [ 569.393576] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space [ 569.393582] 
pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign [ 569.393588] pcieport 10001:80:02.0: PCI bridge to [bus 81] [ 569.393597] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] [ 569.393606] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] [ 569.394076] nvme nvme1: pci function 10001:81:00.0 [ 569.394095] nvme 10001:81:00.0: enabling device (0000 -> 0002) [ 569.394109] pcieport 10001:80:02.0: can't derive routing for PCI INT A [ 569.394116] nvme 10001:81:00.0: PCI INT A: no GSI [ 570.158994] nvme nvme1: D3 entry latency set to 10 seconds [ 570.239267] nvme nvme1: 127/0/0 default/read/poll queues [ 570.287896] ==== pciehp_ist 759 stop running [ 570.287911] ==== pciehp_ist 703 start running [ 570.287918] ==== pciehp_ist 759 stop running [ 570.288953] nvme1n1: p1 p2 p3 p4 p5 p6 p7 -------------------------------dmesg log----------------------------------------- From the log above, it can be seen that I added some debugging codes in the kernel. 
The specific modifications are as follows:

-------------------------------diff file-----------------------------------------
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index bb5a8d9f03ad..c9f3ed86a084 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -700,6 +700,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 	irqreturn_t ret;
 	u32 events;
 
+	printk("==== %s %d start running\n", __func__, __LINE__);
 	ctrl->ist_running = true;
 	pci_config_pm_runtime_get(pdev);
 
@@ -755,6 +756,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 	pci_config_pm_runtime_put(pdev);
 	ctrl->ist_running = false;
 	wake_up(&ctrl->requester);
+	printk("==== %s %d stop running\n", __func__, __LINE__);
 	return ret;
 }
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 661f98c6c63a..ffa58f389456 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4784,6 +4784,7 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
 	if (active)
 		msleep(20);
 	rc = pcie_wait_for_link_status(pdev, false, active);
+	printk("======%s %d,wait for linksta:%d\n", __func__, __LINE__, rc);
 	if (active) {
 		if (rc)
 			rc = pcie_failed_link_retrain(pdev);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 2e40fc63ba31..b7e5af859517 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -337,12 +337,13 @@ void pci_bus_put(struct pci_bus *bus);
 
 #define PCIE_LNKCAP_SLS2SPEED(lnkcap) \
 ({ \
-	((lnkcap) == PCI_EXP_LNKCAP_SLS_64_0GB ? PCIE_SPEED_64_0GT : \
-	 (lnkcap) == PCI_EXP_LNKCAP_SLS_32_0GB ? PCIE_SPEED_32_0GT : \
-	 (lnkcap) == PCI_EXP_LNKCAP_SLS_16_0GB ? PCIE_SPEED_16_0GT : \
-	 (lnkcap) == PCI_EXP_LNKCAP_SLS_8_0GB ? PCIE_SPEED_8_0GT : \
-	 (lnkcap) == PCI_EXP_LNKCAP_SLS_5_0GB ? PCIE_SPEED_5_0GT : \
-	 (lnkcap) == PCI_EXP_LNKCAP_SLS_2_5GB ? PCIE_SPEED_2_5GT : \
+	u32 __lnkcap = (lnkcap) & PCI_EXP_LNKCAP_SLS; \
+	(__lnkcap == PCI_EXP_LNKCAP_SLS_64_0GB ? PCIE_SPEED_64_0GT : \
+	 __lnkcap == PCI_EXP_LNKCAP_SLS_32_0GB ? PCIE_SPEED_32_0GT : \
+	 __lnkcap == PCI_EXP_LNKCAP_SLS_16_0GB ? PCIE_SPEED_16_0GT : \
+	 __lnkcap == PCI_EXP_LNKCAP_SLS_8_0GB ? PCIE_SPEED_8_0GT : \
+	 __lnkcap == PCI_EXP_LNKCAP_SLS_5_0GB ? PCIE_SPEED_5_0GT : \
+	 __lnkcap == PCI_EXP_LNKCAP_SLS_2_5GB ? PCIE_SPEED_2_5GT : \
 	 PCI_SPEED_UNKNOWN); \
 })
 
@@ -357,13 +358,16 @@ void pci_bus_put(struct pci_bus *bus);
 	 PCI_SPEED_UNKNOWN)
 
 #define PCIE_LNKCTL2_TLS2SPEED(lnkctl2) \
-	((lnkctl2) == PCI_EXP_LNKCTL2_TLS_64_0GT ? PCIE_SPEED_64_0GT : \
-	 (lnkctl2) == PCI_EXP_LNKCTL2_TLS_32_0GT ? PCIE_SPEED_32_0GT : \
-	 (lnkctl2) == PCI_EXP_LNKCTL2_TLS_16_0GT ? PCIE_SPEED_16_0GT : \
-	 (lnkctl2) == PCI_EXP_LNKCTL2_TLS_8_0GT ? PCIE_SPEED_8_0GT : \
-	 (lnkctl2) == PCI_EXP_LNKCTL2_TLS_5_0GT ? PCIE_SPEED_5_0GT : \
-	 (lnkctl2) == PCI_EXP_LNKCTL2_TLS_2_5GT ? PCIE_SPEED_2_5GT : \
-	 PCI_SPEED_UNKNOWN)
+({ \
+	u16 __lnkctl2 = (lnkctl2) & PCI_EXP_LNKCTL2_TLS; \
+	(__lnkctl2 == PCI_EXP_LNKCTL2_TLS_64_0GT ? PCIE_SPEED_64_0GT : \
+	 __lnkctl2 == PCI_EXP_LNKCTL2_TLS_32_0GT ? PCIE_SPEED_32_0GT : \
+	 __lnkctl2 == PCI_EXP_LNKCTL2_TLS_16_0GT ? PCIE_SPEED_16_0GT : \
+	 __lnkctl2 == PCI_EXP_LNKCTL2_TLS_8_0GT ? PCIE_SPEED_8_0GT : \
+	 __lnkctl2 == PCI_EXP_LNKCTL2_TLS_5_0GT ? PCIE_SPEED_5_0GT : \
+	 __lnkctl2 == PCI_EXP_LNKCTL2_TLS_2_5GT ? PCIE_SPEED_2_5GT : \
+	 PCI_SPEED_UNKNOWN); \
+})
 
 /* PCIe speed to Mb/s reduced by encoding overhead */
 #define PCIE_SPEED2MBS_ENC(speed) \
diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
index b59cacc740fa..a8ce09f67d3b 100644
--- a/drivers/pci/pcie/bwctrl.c
+++ b/drivers/pci/pcie/bwctrl.c
@@ -168,8 +168,10 @@ int pcie_set_target_speed(struct pci_dev *port, enum pci_bus_speed speed_req,
 	if (WARN_ON_ONCE(!pcie_valid_speed(speed_req)))
 		return -EINVAL;
 
-	if (bus && bus->cur_bus_speed == speed_req)
+	if (bus && bus->cur_bus_speed == speed_req) {
+		printk("========== %s %d, speed has been set\n", __func__, __LINE__);
 		return 0;
+	}
 
 	target_speed = pcie_bwctrl_select_speed(port, speed_req);
 
@@ -184,6 +186,7 @@ int pcie_set_target_speed(struct pci_dev *port, enum pci_bus_speed speed_req,
 		mutex_lock(&data->set_speed_mutex);
 
 	ret = pcie_bwctrl_change_speed(port, target_speed, use_lt);
+	printk("========== %s %d, bwctl change speed ret:0x%x\n", __func__, __LINE__,ret);
 	if (data)
 		mutex_unlock(&data->set_speed_mutex);
 
@@ -209,8 +212,10 @@ static void pcie_bwnotif_enable(struct pcie_device *srv)
 
 	/* Count LBMS seen so far as one */
 	ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
-	if (ret == PCIBIOS_SUCCESSFUL && link_status & PCI_EXP_LNKSTA_LBMS)
+	if (ret == PCIBIOS_SUCCESSFUL && link_status & PCI_EXP_LNKSTA_LBMS) {
+		printk("==== %s %d lbms_count++\n", __func__, __LINE__);
 		atomic_inc(&data->lbms_count);
+	}
 
 	pcie_capability_set_word(port, PCI_EXP_LNKCTL,
 				 PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);
@@ -239,6 +244,7 @@ static irqreturn_t pcie_bwnotif_irq(int irq, void *context)
 	int ret;
 
 	ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
+	printk("==== %s %d(start running),link_status:0x%x\n", __func__, __LINE__,link_status);
 	if (ret != PCIBIOS_SUCCESSFUL)
 		return IRQ_NONE;
 
@@ -246,8 +252,10 @@ static irqreturn_t pcie_bwnotif_irq(int irq, void *context)
 	if (!events)
 		return IRQ_NONE;
 
-	if (events & PCI_EXP_LNKSTA_LBMS)
+	if (events & PCI_EXP_LNKSTA_LBMS) {
+		printk("==== %s %d lbms_count++\n", __func__, __LINE__);
 		atomic_inc(&data->lbms_count);
+	}
 
 	pcie_capability_write_word(port, PCI_EXP_LNKSTA, events);
 
@@ -258,6 +266,7 @@ static irqreturn_t pcie_bwnotif_irq(int irq, void *context)
 	 * cleared to avoid missing link speed changes.
 	 */
 	pcie_update_link_speed(port->subordinate);
+	printk("==== %s %d(stop running),link_status:0x%x\n", __func__, __LINE__,link_status);
 
 	return IRQ_HANDLED;
 }
@@ -268,8 +277,10 @@ void pcie_reset_lbms_count(struct pci_dev *port)
 	guard(rwsem_read)(&pcie_bwctrl_lbms_rwsem);
 	data = port->link_bwctrl;
-	if (data)
+	if (data) {
+		printk("==== %s %d lbms_count set to 0\n", __func__, __LINE__);
 		atomic_set(&data->lbms_count, 0);
+	}
 	else
 		pcie_capability_write_word(port, PCI_EXP_LNKSTA,
 					   PCI_EXP_LNKSTA_LBMS);
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 76f4df75b08a..a602f9aa5d6a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -41,8 +41,11 @@ static bool pcie_lbms_seen(struct pci_dev *dev, u16 lnksta)
 	int ret;
 
 	ret = pcie_lbms_count(dev, &count);
-	if (ret < 0)
+	if (ret < 0) {
+		printk("==== %s %d lnksta(0x%x) & LBMS\n", __func__, __LINE__, lnksta);
 		return lnksta & PCI_EXP_LNKSTA_LBMS;
+	}
 
+	printk("==== %s %d count:0x%lx\n", __func__, __LINE__, count);
 	return count > 0;
 }
 
@@ -110,6 +113,8 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
 
 	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
 	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+	pci_info(dev, "============ %s %d, lnkctl2:0x%x, lnksta:0x%x\n",
+		 __func__, __LINE__, lnkctl2, lnksta);
 	if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) {
 		u16 oldlnkctl2 = lnkctl2;
 
@@ -121,9 +126,14 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
 			pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2),
 					      true);
 			return ret;
+		} else {
+			pci_info(dev, "retraining sucessfully, but now is in Gen 1\n");
 		}
 
+		pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
 		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+		pci_info(dev, "============ %s %d, oldlnkctl2:0x%x,newlnkctl2:0x%x,newlnksta:0x%x\n",
+			 __func__, __LINE__, oldlnkctl2, lnkctl2, lnksta);
 	}
 
 	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
-------------------------------diff file-----------------------------------------

Based on the log entries from 566.755596 to 567.801220, the issue has been reproduced. Between 566 and 567 seconds, the pcie_bwnotif_irq interrupt was triggered four times, which indicates that the NVMe drive was plugged and unplugged multiple times during this period.

Thanks,
Regards,
Jiwei

> didn't explain LBMS (nor DLLLA) in the above sequence so it's hard to
> follow what is going on here. LBMS in particular is of high interest here
> because I'm trying to understand if something should clear it on the
> hotplug side (there's already one call to clear it in remove_board()).
>
> In step 2 (pcie_set_target_speed() in step 1 succeeded),
> pcie_failed_link_retrain() attempts to restore >2.5GT/s speed, this only
> occurs when pci_match_id() matches. I guess you're trying to say that step
> 2 is not taken because pci_match_id() is not matching but the wording
> above is very confusing.
>
> Overall, I failed to understand the scenario here fully despite trying to
> think it through over these few days.
>
>> the target link speed field of the Link Control 2 Register will keep 0x1.
>>
>> In order to fix the issue, don't do the retraining work except ASMedia
>> ASM2824.
>>
>> Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
>> Reported-by: Adrian Huang <ahuang12@lenovo.com>
>> Signed-off-by: Jiwei Sun <sunjw10@lenovo.com>
>> ---
>>  drivers/pci/quirks.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 605628c810a5..ff04ebd9ae16 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -104,6 +104,9 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
>>  	u16 lnksta, lnkctl2;
>>  	int ret = -ENOTTY;
>>
>> +	if (!pci_match_id(ids, dev))
>> +		return 0;
>> +
>>  	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
>>  	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
>>  		return ret;
>> @@ -129,8 +132,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
>>  	}
>>
>>  	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
>> -	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
>> -	    pci_match_id(ids, dev)) {
>> +	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
>>  		u32 lnkcap;
>>
>>  		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
>>
>

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
  2025-01-14 15:04             ` Jiwei
@ 2025-01-14 18:25               ` Ilpo Järvinen
  2025-01-15 10:18                 ` Lukas Wunner
  2025-01-15 11:39                 ` Jiwei
  0 siblings, 2 replies; 13+ messages in thread
From: Ilpo Järvinen @ 2025-01-14 18:25 UTC (permalink / raw)
To: Jiwei, Lukas Wunner
Cc: macro, bhelgaas, linux-pci, LKML, guojinhui.liam, helgaas,
	ahuang12, sunjw10

[-- Attachment #1: Type: text/plain, Size: 56668 bytes --]

On Tue, 14 Jan 2025, Jiwei wrote:

> On 1/13/25 23:08, Ilpo Järvinen wrote:
> > On Fri, 10 Jan 2025, Jiwei Sun wrote:
> >
> >> From: Jiwei Sun <sunjw10@lenovo.com>
> >>
> >> When we do the quick hot-add/hot-remove test (within 1 second) with a PCIE
> >> Gen 5 NVMe disk, there is a possibility that the PCIe bridge will decrease
> >> to 2.5GT/s from 32GT/s
> >>
> >> pcieport 10002:00:04.0: pciehp: Slot(75): Link Down
> >> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> >> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> >> ...
> >> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> >> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> >> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> >> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> >> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> >> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> >> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> >> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> >> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> >> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> >> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> >> pcieport 10002:00:04.0: broken device, retraining non-functional downstream link at 2.5GT/s
> >> pcieport 10002:00:04.0: pciehp: Slot(75): No link
> >> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> >> pcieport 10002:00:04.0: pciehp: Slot(75): Link Up
> >> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> >> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> >> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> >> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> >> pci 10002:02:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint
> >> pci 10002:02:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit]
> >> pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit]
> >> pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs
> >> pci 10002:02:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 10002:00:04.0 (capable of 126.028 Gb/s with 32.0 GT/s PCIe x4 link)
> >>
> >> If a NVMe disk is hot removed, the pciehp interrupt will be triggered, and
> >> the kernel thread pciehp_ist will be woken up, the
> >> pcie_failed_link_retrain() will be called as the following call trace.
> >>
> >> irq/87-pciehp-2524 [121] ..... 152046.006765: pcie_failed_link_retrain <-pcie_wait_for_link
> >> irq/87-pciehp-2524 [121] ..... 152046.006782: <stack trace>
> >> => [FTRACE TRAMPOLINE]
> >> => pcie_failed_link_retrain
> >> => pcie_wait_for_link
> >> => pciehp_check_link_status
> >> => pciehp_enable_slot
> >> => pciehp_handle_presence_or_link_change
> >> => pciehp_ist
> >> => irq_thread_fn
> >> => irq_thread
> >> => kthread
> >> => ret_from_fork
> >> => ret_from_fork_asm
> >>
> >> According to investigation, the issue is caused by the following scenarios,
> >>
> >> NVMe disk        pciehp hardirq
> >> hot-remove       top-half            pciehp irq kernel thread
> >> ======================================================================
> >>                  pciehp hardirq
> >>                  will be triggered
> >>                  cpu handle pciehp
> >>                  hardirq
> >>                                      pciehp irq kthread will
> >>                                      be woken up
> >>                                      pciehp_ist
> >>                                      ...
> >>                                      pcie_failed_link_retrain
> >>                                      read PCI_EXP_LNKCTL2 register
> >>                                      read PCI_EXP_LNKSTA register
> >> If NVMe disk
> >> hot-add before
> >> calling pcie_retrain_link()
> >>                                      set target speed to 2_5GT
> >
> > This assumes LBMS has been seen but DLLLA isn't? Why is that?

Please look at the content below.

> >
> >> pcie_bwctrl_change_speed
> >> pcie_retrain_link
> >
> >> : the retrain work will be
> >> successful, because
> >> pci_match_id() will be
> >> 0 in
> >> pcie_failed_link_retrain()
> >
> > There's no pci_match_id() in pcie_retrain_link() ?? What does that : mean?
> > I think the nesting level is wrong in your flow description?
>
> Sorry for the confusing information, the complete meaning I want to express
> is as follows,
> NVMe disk        pciehp hardirq
> hot-remove       top-half             pciehp irq kernel thread
> ======================================================================
>                  pciehp hardirq
>                  will be triggered
>                  cpu handle pciehp
>                  hardirq
>                                      "pciehp" irq kthread
>                                      will be woken up
>                                      pciehp_ist
>                                      ...
>                                      pcie_failed_link_retrain
>                                      pcie_capability_read_word(PCI_EXP_LNKCTL2)
>                                      pcie_capability_read_word(PCI_EXP_LNKSTA)
> If NVMe disk
> hot-add before
> calling pcie_retrain_link()
>                                      pcie_set_target_speed(PCIE_SPEED_2_5GT)
>                                      pcie_bwctrl_change_speed
>                                      pcie_retrain_link
>                                      // (1) The target link speed field of LNKCTL2 was set to 0x1,
>                                      //     the retrain work will be successful.
>                                      // (2) Return to pcie_failed_link_retrain()
>                                      pcie_capability_read_word(PCI_EXP_LNKSTA)
>                                      if lnksta & PCI_EXP_LNKSTA_DLLLA
>                                      and PCI_EXP_LNKCTL2_TLS_2_5GT was set
>                                      and pci_match_id
>                                          pcie_capability_read_dword(PCI_EXP_LNKCAP)
>                                          pcie_set_target_speed(PCIE_LNKCAP_SLS2SPEED(lnkcap))
>
> // Although the target link speed field of LNKCTL2 was set to 0x1,
> // however the dev is not in ids[], the removing downstream
> // link speed restriction can not be executed.
> // The target link speed field of LNKCTL2 could not be restored.
>
> Due to the limitation of a length of 75 characters per line, the original
> explanation omitted many details.
>
> > I don't understand how retrain success relates to the pci_match_id() as
> > there are two different steps in pcie_failed_link_retrain().
> >
> > In step 1, pcie_failed_link_retrain() sets speed to 2.5GT/s if DLLLA=0 and
> > LBMS has been seen. Why is that condition happening in your case? You
>
> According to our test result, it seems so.
> Maybe it is related to our test. Our test involves plugging and unplugging
> multiple times within a second. Below is the dmesg log taken from our testing
> process. The log below is a portion of the dmesg log that I have captured,
> (Please allow me to retain the timestamps, as this information is important.)
>
> -------------------------------dmesg log-----------------------------------------
>
> [ 537.981302] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841
> [ 537.981329] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 537.981338] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
> [ 538.014638] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 538.014662] ==== pciehp_ist 703 start running
> [ 538.014678] pcieport 10001:80:02.0: pciehp: Slot(77): Link Down
> [ 538.199104] ==== pcie_reset_lbms_count 281 lbms_count set to 0
> [ 538.199130] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 538.567377] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 538.567393] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845

DLLLA=0 & LBMS=0

> [ 538.616219] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045

DLLLA=1 & LBMS=0

Are all of these for the same device? It would be nice to print the
pci_name() too so it's clear what device it's about.

> [ 538.617594] ======pcie_wait_for_link_delay 4787,wait for linksta:0
> [ 539.362382] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841
> [ 539.362393] ==== pcie_bwnotif_irq 256 lbms_count++

DLLLA=1 & LBMS=1

> [ 539.362400] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
> [ 539.395720] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041

DLLLA=0

But LBMS did not get reset. So is this perhaps because hotplug cannot keep
up with the rapid remove/add going on, and thus will not always call the
remove_board() even if the device went away?

Lukas, do you know if there's a good way to resolve this within hotplug
side?
> [ 539.787501] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
> [ 539.787514] ==== pciehp_ist 759 stop running
> [ 539.787521] ==== pciehp_ist 703 start running
> [ 539.787533] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 539.914182] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 540.503965] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 540.808415] ======pcie_wait_for_link_delay 4787,wait for linksta:-110
> [ 540.808430] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 116, lnkctl2:0x5, lnksta:0x1041
> [ 540.808440] ==== pcie_lbms_seen 48 count:0x1
> [ 540.808448] pcieport 10001:80:02.0: broken device, retraining non-functional downstream link at 2.5GT/s
> [ 540.808452] ========== pcie_set_target_speed 172, speed has been set
> [ 540.808459] pcieport 10001:80:02.0: retraining sucessfully, but now is in Gen 1
> [ 540.808466] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 135, oldlnkctl2:0x5,newlnkctl2:0x5,newlnksta:0x1041

--
 i.

> [ 541.041386] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 541.041398] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 541.091231] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 541.568126] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 541.568135] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 541.568142] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 541.568168] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 542.029334] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
> [ 542.029347] ==== pciehp_ist 759 stop running
> [ 542.029353] ==== pciehp_ist 703 start running
> [ 542.029362] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 542.120676] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 542.120687] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 542.170424] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 542.172337] ======pcie_wait_for_link_delay 4787,wait for linksta:0
> [ 542.223909] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841
> [ 542.223917] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 542.223924] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
> [ 542.257249] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 542.809830] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 542.809841] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 542.859463] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 543.097871] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 543.097879] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 543.097885] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 543.097905] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 543.391250] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
> [ 543.391260] ==== pciehp_ist 759 stop running
> [ 543.391265] ==== pciehp_ist 703 start running
> [ 543.391273] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 543.650507] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 543.650517] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 543.700174] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 543.700205] ======pcie_wait_for_link_delay 4787,wait for linksta:0
> [ 544.296255] pci 10001:81:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint
> [ 544.296298] pci 10001:81:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit]
> [ 544.296515] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit]
> [ 544.296522] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs
> [ 544.297256] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000
> [ 544.297279] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space
> [ 544.297288] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign
> [ 544.297295] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space
> [ 544.297301] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign
> [ 544.297314] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned
> [ 544.297337] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space
> [ 544.297344] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign
> [ 544.297352] pcieport 10001:80:02.0: PCI bridge to [bus 81]
> [ 544.297363] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]
> [ 544.297373] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref]
> [ 544.297385] PCI: No. 2 try to assign unassigned res
> [ 544.297390] release child resource [mem 0xbb000000-0xbb007fff 64bit]
> [ 544.297396] pcieport 10001:80:02.0: resource 14 [mem 0xbb000000-0xbb0fffff] released
> [ 544.297403] pcieport 10001:80:02.0: PCI bridge to [bus 81]
> [ 544.297412] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000
> [ 544.297422] pcieport 10001:80:02.0: bridge window [mem 0x00100000-0x001fffff] to [bus 81] add_size 300000 add_align 100000
> [ 544.297438] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: can't assign; no space
> [ 544.297444] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: failed to assign
> [ 544.297451] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space
> [ 544.297457] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign
> [ 544.297464] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: assigned
> [ 544.297473] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to expand by 0x300000
> [ 544.297481] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to add 300000
> [ 544.297488] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space
> [ 544.297494] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign
> [ 544.297503] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned
> [ 544.297524] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space
> [ 544.297530] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign
> [ 544.297538] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned
> [ 544.297558] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space
> [ 544.297563] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign
> [ 544.297569] pcieport 10001:80:02.0: PCI bridge to [bus 81]
> [ 544.297579] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]
> [ 544.297588] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref]
> [ 544.298256] nvme nvme1: pci function 10001:81:00.0
> [ 544.298278] nvme 10001:81:00.0: enabling device (0000 -> 0002)
> [ 544.298291] pcieport 10001:80:02.0: can't derive routing for PCI INT A
> [ 544.298298] nvme 10001:81:00.0: PCI INT A: no GSI
> [ 544.875198] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 544.875208] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 544.875215] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 544.875231] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 544.875910] ==== pciehp_ist 759 stop running
> [ 544.875920] ==== pciehp_ist 703 start running
> [ 544.875928] pcieport 10001:80:02.0: pciehp: Slot(77): Link Down
> [ 544.876857] ==== pcie_reset_lbms_count 281 lbms_count set to 0
> [ 544.876868] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 545.427157] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 545.427169] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 545.476411] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 545.478099] ======pcie_wait_for_link_delay 4787,wait for linksta:0
> [ 545.857887] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 545.857896] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 545.857902] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 545.857929] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 546.410193] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 546.410205] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 546.460531] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 546.697008] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
> [ 546.697020] ==== pciehp_ist 759 stop running
> [ 546.697025] ==== pciehp_ist 703 start running
> [ 546.697034] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 546.697039] pcieport 10001:80:02.0: pciehp: Slot(77): Link Up
> [ 546.718015] ======pcie_wait_for_link_delay 4787,wait for linksta:0
> [ 546.987498] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 546.987507] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 546.987514] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 546.987542] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 547.539681] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 547.539693] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 547.589214] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 547.850003] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 547.850011] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 547.850018] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 547.850046] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 547.996918] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
> [ 547.996930] ==== pciehp_ist 759 stop running
> [ 547.996934] ==== pciehp_ist 703 start running
> [ 547.996944] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 548.401899] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 548.401911] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 548.451186] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 548.452886] ======pcie_wait_for_link_delay 4787,wait for linksta:0
> [ 548.682838] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 548.682846] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 548.682852] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 548.682871] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 549.235408] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 549.235420] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 549.284761] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 549.654883] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 549.654892] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 549.654899] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 549.654926] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 549.738806] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
> [ 549.738815] ==== pciehp_ist 759 stop running
> [ 549.738819] ==== pciehp_ist 703 start running
> [ 549.738829] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 550.207186] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 550.207198] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 550.256868] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 550.256890] ======pcie_wait_for_link_delay 4787,wait for linksta:0
> [ 550.575344] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 550.575353] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 550.575360] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 550.575386] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 551.127757] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 551.127768] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 551.177224] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 551.477699] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
> [ 551.477711] ==== pciehp_ist 759 stop running
> [ 551.477716] ==== pciehp_ist 703 start running
> [ 551.477725] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 551.477730] pcieport 10001:80:02.0: pciehp: Slot(77): Link Up
> [ 551.498667] ======pcie_wait_for_link_delay 4787,wait for linksta:0
> [ 551.788685] pci 10001:81:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint
> [ 551.788723] pci 10001:81:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit]
> [ 551.788933] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit]
> [ 551.788941] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs
> [ 551.789619] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000
> [ 551.789653] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space
> [ 551.789663] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign
> [ 551.789672] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space
> [ 551.789677] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign
> [ 551.789688] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned
> [ 551.789708] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space
> [ 551.789715] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign
> [ 551.789722] pcieport 10001:80:02.0: PCI bridge to [bus 81]
> [ 551.789733] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]
> [ 551.789743] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref]
> [ 551.789755] PCI: No. 2 try to assign unassigned res
> [ 551.789759] release child resource [mem 0xbb000000-0xbb007fff 64bit]
> [ 551.789764] pcieport 10001:80:02.0: resource 14 [mem 0xbb000000-0xbb0fffff] released
> [ 551.789771] pcieport 10001:80:02.0: PCI bridge to [bus 81]
> [ 551.789779] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000
> [ 551.789790] pcieport 10001:80:02.0: bridge window [mem 0x00100000-0x001fffff] to [bus 81] add_size 300000 add_align 100000
> [ 551.789804] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: can't assign; no space
> [ 551.789811] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: failed to assign
> [ 551.789817] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space
> [ 551.789823] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign
> [ 551.789831] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: assigned
> [ 551.789839] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to expand by 0x300000
> [ 551.789847] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to add 300000
> [ 551.789854] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space
> [ 551.789860] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign
> [ 551.789869] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned
> [ 551.789889] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space
> [ 551.789895] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign
> [ 551.789903] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned
> [ 551.789921] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space
> [ 551.789927] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign
> [ 551.789933] pcieport 10001:80:02.0: PCI bridge to [bus 81]
> [ 551.789942] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]
> [ 551.789951] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref]
> [ 551.790638] nvme nvme1: pci function 10001:81:00.0
> [ 551.790656] nvme 10001:81:00.0: enabling device (0000 -> 0002)
> [ 551.790667] pcieport 10001:80:02.0: can't derive routing for PCI INT A
> [ 551.790674] nvme 10001:81:00.0: PCI INT A: no GSI
> [ 552.546963] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 552.546973] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 552.546980] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 552.546996] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 552.547590] ==== pciehp_ist 759 stop running
> [ 552.547598] ==== pciehp_ist 703 start running
> [ 552.547605] pcieport 10001:80:02.0: pciehp: Slot(77): Link Down
> [ 552.548215] ==== pcie_reset_lbms_count 281 lbms_count set to 0
> [ 552.548224] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 553.098957] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 553.098969] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 553.148031] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 553.149553] ======pcie_wait_for_link_delay 4787,wait for linksta:0
> [ 553.499647] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 553.499654] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 553.499660] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 553.499683] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 554.052313] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 554.052325] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 554.102175] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 554.265181] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 554.265188] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 554.265194] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 554.265217] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 554.453449] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
> [ 554.453458] ==== pciehp_ist 759 stop running
> [ 554.453463] ==== pciehp_ist 703 start running
> [ 554.453472] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 554.743040] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 555.475369] ======pcie_wait_for_link_delay 4787,wait for linksta:-110
> [ 555.475384] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 116, lnkctl2:0x5, lnksta:0x1041
> [ 555.475392] ==== pcie_lbms_seen 48 count:0x2
> [ 555.475398] pcieport 10001:80:02.0: broken device, retraining non-functional downstream link at 2.5GT/s
> [ 555.475404] ========== pcie_set_target_speed 172, speed has been set
> [ 555.475409] pcieport 10001:80:02.0: retraining sucessfully, but now is in Gen 1
> [ 555.475417] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 135, oldlnkctl2:0x5,newlnkctl2:0x5,newlnksta:0x1041
> [ 556.633310] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
> [ 556.633322] ==== pciehp_ist 759 stop running
> [ 556.633328] ==== pciehp_ist 703 start running
> [ 556.633336] ==== pciehp_ist 759 stop running
> [ 556.828412] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 556.828440] ==== pciehp_ist 703 start running
> [ 556.828448] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 557.017389] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 557.017400] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 557.066666] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 557.066688] ======pcie_wait_for_link_delay 4787,wait for linksta:0
> [ 557.209334] pci 10001:81:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint
> [ 557.209374] pci 10001:81:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit]
> [ 557.209585] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit]
> [ 557.209592] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs
> [ 557.210275] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000
> [ 557.210292] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space
> [ 557.210300] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign
> [ 557.210307] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space
> [ 557.210312] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign
> [ 557.210322] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned
> [ 557.210342] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space
> [ 557.210349] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign
> [ 557.210356] pcieport 10001:80:02.0: PCI bridge to [bus 81]
> [ 557.210366] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]
> [ 557.210376] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref]
> [ 557.210388] PCI: No. 2 try to assign unassigned res
> [ 557.210392] release child resource [mem 0xbb000000-0xbb007fff 64bit]
> [ 557.210397] pcieport 10001:80:02.0: resource 14 [mem 0xbb000000-0xbb0fffff] released
> [ 557.210405] pcieport 10001:80:02.0: PCI bridge to [bus 81]
> [ 557.210414] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000
> [ 557.210424] pcieport 10001:80:02.0: bridge window [mem 0x00100000-0x001fffff] to [bus 81] add_size 300000 add_align 100000
> [ 557.210438] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: can't assign; no space
> [ 557.210445] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: failed to assign
> [ 557.210451] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space
> [ 557.210457] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign
> [ 557.210464] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: assigned
> [ 557.210472] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to expand by 0x300000
> [ 557.210479] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to add 300000
> [ 557.210487] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space
> [ 557.210492] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign
> [ 557.210501] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned
> [ 557.210521] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space
> [ 557.210527] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign
> [ 557.210534] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned
> [ 557.210553] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space
> [ 557.210559] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign
> [ 557.210565] pcieport 10001:80:02.0: PCI bridge to [bus 81]
> [ 557.210574] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]
> [ 557.210583] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref]
> [ 557.211286] nvme nvme1: pci function 10001:81:00.0
> [ 557.211303] nvme 10001:81:00.0: enabling device (0000 -> 0002)
> [ 557.211315] pcieport 10001:80:02.0: can't derive routing for PCI INT A
> [ 557.211322] nvme 10001:81:00.0: PCI INT A: no GSI
> [ 557.565811] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 557.565820] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 557.565827] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 557.565842] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 557.566410] ==== pciehp_ist 759 stop running
> [ 557.566416] ==== pciehp_ist 703 start running
> [ 557.566423] pcieport 10001:80:02.0: pciehp: Slot(77): Link Down
> [ 557.567592] ==== pcie_reset_lbms_count 281 lbms_count set to 0
> [ 557.567602] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 558.117581] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 558.117594] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 558.166639] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 558.168190] ======pcie_wait_for_link_delay 4787,wait for linksta:0
> [ 558.376176] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 558.376184] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 558.376190] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 558.376208] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 558.928611] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 558.928621] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 558.977769] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 559.186385] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 559.186394] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 559.186400] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 559.186419] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 559.459099] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
> [ 559.459111] ==== pciehp_ist 759 stop running
> [ 559.459116] ==== pciehp_ist 703 start running
> [ 559.459124] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 559.738599] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 559.738610] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 559.787690] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 559.787712] ======pcie_wait_for_link_delay 4787,wait for linksta:0
> [ 560.307243] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 560.307253] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 560.307260] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 560.307282] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 560.978997] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
> [ 560.979007] ==== pciehp_ist 759 stop running
> [ 560.979013] ==== pciehp_ist 703 start running
> [ 560.979022] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
> [ 561.410141] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 561.410153] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 561.459064] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 561.459087] ======pcie_wait_for_link_delay 4787,wait for linksta:0
> [ 561.648520] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041
> [ 561.648528] ==== pcie_bwnotif_irq 256 lbms_count++
> [ 561.648536] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041
> [ 561.648559] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
> [ 562.247076] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
> [ 562.247087] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
> [ 562.296600] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
> [ 562.454228] ==== pcie_bwnotif_irq 247(start
running),link_status:0x7841 > [ 562.454236] ==== pcie_bwnotif_irq 256 lbms_count++ > [ 562.454244] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841 > [ 562.487632] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 > [ 562.674863] pcieport 10001:80:02.0: pciehp: Slot(77): No device found > [ 562.674874] ==== pciehp_ist 759 stop running > [ 562.674879] ==== pciehp_ist 703 start running > [ 562.674888] pcieport 10001:80:02.0: pciehp: Slot(77): Card present > [ 563.696784] ======pcie_wait_for_link_delay 4787,wait for linksta:-110 > [ 563.696798] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 116, lnkctl2:0x5, lnksta:0x1041 > [ 563.696806] ==== pcie_lbms_seen 48 count:0x5 > [ 563.696813] pcieport 10001:80:02.0: broken device, retraining non-functional downstream link at 2.5GT/s > [ 563.696817] ========== pcie_set_target_speed 172, speed has been set > [ 563.696823] pcieport 10001:80:02.0: retraining sucessfully, but now is in Gen 1 > [ 563.696830] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 135, oldlnkctl2:0x5,newlnkctl2:0x5,newlnksta:0x1041 > [ 564.133582] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 > [ 564.133594] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 > [ 564.183003] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 > [ 564.364911] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 > [ 564.364921] ==== pcie_bwnotif_irq 256 lbms_count++ > [ 564.364930] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 > [ 564.364954] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 > [ 564.889708] pcieport 10001:80:02.0: pciehp: Slot(77): No device found > [ 564.889719] ==== pciehp_ist 759 stop running > [ 564.889724] ==== pciehp_ist 703 start running > [ 564.889732] pcieport 10001:80:02.0: pciehp: Slot(77): Card present > [ 565.493151] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 > [ 565.493162] ==== pcie_bwnotif_irq 269(stop 
running),link_status:0x9845 > [ 565.542478] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 > [ 565.542501] ======pcie_wait_for_link_delay 4787,wait for linksta:0 > [ 565.752276] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 > [ 565.752285] ==== pcie_bwnotif_irq 256 lbms_count++ > [ 565.752291] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 > [ 565.752316] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 > [ 566.359793] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 > [ 566.359804] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 > [ 566.408820] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 > [ 566.581150] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841 > [ 566.581159] ==== pcie_bwnotif_irq 256 lbms_count++ > [ 566.581166] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841 > [ 566.614491] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 > [ 566.755582] pcieport 10001:80:02.0: pciehp: Slot(77): No device found > [ 566.755591] ==== pciehp_ist 759 stop running > [ 566.755596] ==== pciehp_ist 703 start running > [ 566.755605] pcieport 10001:80:02.0: pciehp: Slot(77): Card present > [ 567.751399] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 > [ 567.751412] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 > [ 567.776517] ======pcie_wait_for_link_delay 4787,wait for linksta:-110 > [ 567.776529] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 116, lnkctl2:0x5, lnksta:0x1845 > [ 567.776538] ==== pcie_lbms_seen 48 count:0x8 > [ 567.776544] pcieport 10001:80:02.0: broken device, retraining non-functional downstream link at 2.5GT/s > [ 567.801147] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 > [ 567.801177] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841 > [ 567.801184] ==== pcie_bwnotif_irq 256 lbms_count++ > [ 567.801192] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841 > [ 567.801201] 
==== pcie_reset_lbms_count 281 lbms_count set to 0 > [ 567.801207] ========== pcie_set_target_speed 189, bwctl change speed ret:0x0 > [ 567.801214] pcieport 10001:80:02.0: retraining sucessfully, but now is in Gen 1 > [ 567.801220] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 135, oldlnkctl2:0x5,newlnkctl2:0x1,newlnksta:0x3841 > [ 567.815102] ==== pcie_bwnotif_irq 247(start running),link_status:0x7041 > [ 567.815110] ==== pcie_bwnotif_irq 256 lbms_count++ > [ 567.815117] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7041 > [ 567.910155] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 > [ 568.961434] pcieport 10001:80:02.0: pciehp: Slot(77): No device found > [ 568.961444] ==== pciehp_ist 759 stop running > [ 568.961450] ==== pciehp_ist 703 start running > [ 568.961459] pcieport 10001:80:02.0: pciehp: Slot(77): Card present > [ 569.008665] ==== pcie_bwnotif_irq 247(start running),link_status:0x3041 > [ 569.010428] ======pcie_wait_for_link_delay 4787,wait for linksta:0 > [ 569.391482] pci 10001:81:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint > [ 569.391549] pci 10001:81:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit] > [ 569.391968] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit] > [ 569.391975] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs > [ 569.392869] pci 10001:81:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 10001:80:02.0 (capable of 126.028 Gb/s with 32.0 GT/s PCIe x4 link) > [ 569.393233] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 > [ 569.393249] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space > [ 569.393257] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign > [ 569.393264] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space > [ 569.393270] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to 
assign > [ 569.393279] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned > [ 569.393315] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space > [ 569.393322] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign > [ 569.393329] pcieport 10001:80:02.0: PCI bridge to [bus 81] > [ 569.393340] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] > [ 569.393350] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] > [ 569.393362] PCI: No. 2 try to assign unassigned res > [ 569.393366] release child resource [mem 0xbb000000-0xbb007fff 64bit] > [ 569.393371] pcieport 10001:80:02.0: resource 14 [mem 0xbb000000-0xbb0fffff] released > [ 569.393378] pcieport 10001:80:02.0: PCI bridge to [bus 81] > [ 569.393404] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 > [ 569.393414] pcieport 10001:80:02.0: bridge window [mem 0x00100000-0x001fffff] to [bus 81] add_size 300000 add_align 100000 > [ 569.393430] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: can't assign; no space > [ 569.393438] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: failed to assign > [ 569.393445] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space > [ 569.393451] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign > [ 569.393458] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: assigned > [ 569.393466] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to expand by 0x300000 > [ 569.393474] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to add 300000 > [ 569.393481] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space > [ 569.393487] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign > [ 569.393495] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned > [ 569.393529] pci 
10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space > [ 569.393536] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign > [ 569.393543] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned > [ 569.393576] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space > [ 569.393582] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign > [ 569.393588] pcieport 10001:80:02.0: PCI bridge to [bus 81] > [ 569.393597] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] > [ 569.393606] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] > [ 569.394076] nvme nvme1: pci function 10001:81:00.0 > [ 569.394095] nvme 10001:81:00.0: enabling device (0000 -> 0002) > [ 569.394109] pcieport 10001:80:02.0: can't derive routing for PCI INT A > [ 569.394116] nvme 10001:81:00.0: PCI INT A: no GSI > [ 570.158994] nvme nvme1: D3 entry latency set to 10 seconds > [ 570.239267] nvme nvme1: 127/0/0 default/read/poll queues > [ 570.287896] ==== pciehp_ist 759 stop running > [ 570.287911] ==== pciehp_ist 703 start running > [ 570.287918] ==== pciehp_ist 759 stop running > [ 570.288953] nvme1n1: p1 p2 p3 p4 p5 p6 p7 > > -------------------------------dmesg log----------------------------------------- > > >From the log above, it can be seen that I added some debugging codes in the kernel. 
> The specific modifications are as follows: > > -------------------------------diff file----------------------------------------- > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c > index bb5a8d9f03ad..c9f3ed86a084 100644 > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -700,6 +700,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id) > irqreturn_t ret; > u32 events; > > + printk("==== %s %d start running\n", __func__, __LINE__); > ctrl->ist_running = true; > pci_config_pm_runtime_get(pdev); > > @@ -755,6 +756,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id) > pci_config_pm_runtime_put(pdev); > ctrl->ist_running = false; > wake_up(&ctrl->requester); > + printk("==== %s %d stop running\n", __func__, __LINE__); > return ret; > } > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 661f98c6c63a..ffa58f389456 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4784,6 +4784,7 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active, > if (active) > msleep(20); > rc = pcie_wait_for_link_status(pdev, false, active); > + printk("======%s %d,wait for linksta:%d\n", __func__, __LINE__, rc); > if (active) { > if (rc) > rc = pcie_failed_link_retrain(pdev); > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 2e40fc63ba31..b7e5af859517 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -337,12 +337,13 @@ void pci_bus_put(struct pci_bus *bus); > > #define PCIE_LNKCAP_SLS2SPEED(lnkcap) \ > ({ \ > - ((lnkcap) == PCI_EXP_LNKCAP_SLS_64_0GB ? PCIE_SPEED_64_0GT : \ > - (lnkcap) == PCI_EXP_LNKCAP_SLS_32_0GB ? PCIE_SPEED_32_0GT : \ > - (lnkcap) == PCI_EXP_LNKCAP_SLS_16_0GB ? PCIE_SPEED_16_0GT : \ > - (lnkcap) == PCI_EXP_LNKCAP_SLS_8_0GB ? PCIE_SPEED_8_0GT : \ > - (lnkcap) == PCI_EXP_LNKCAP_SLS_5_0GB ? PCIE_SPEED_5_0GT : \ > - (lnkcap) == PCI_EXP_LNKCAP_SLS_2_5GB ? 
PCIE_SPEED_2_5GT : \ > + u32 __lnkcap = (lnkcap) & PCI_EXP_LNKCAP_SLS; \ > + (__lnkcap == PCI_EXP_LNKCAP_SLS_64_0GB ? PCIE_SPEED_64_0GT : \ > + __lnkcap == PCI_EXP_LNKCAP_SLS_32_0GB ? PCIE_SPEED_32_0GT : \ > + __lnkcap == PCI_EXP_LNKCAP_SLS_16_0GB ? PCIE_SPEED_16_0GT : \ > + __lnkcap == PCI_EXP_LNKCAP_SLS_8_0GB ? PCIE_SPEED_8_0GT : \ > + __lnkcap == PCI_EXP_LNKCAP_SLS_5_0GB ? PCIE_SPEED_5_0GT : \ > + __lnkcap == PCI_EXP_LNKCAP_SLS_2_5GB ? PCIE_SPEED_2_5GT : \ > PCI_SPEED_UNKNOWN); \ > }) > > @@ -357,13 +358,16 @@ void pci_bus_put(struct pci_bus *bus); > PCI_SPEED_UNKNOWN) > > #define PCIE_LNKCTL2_TLS2SPEED(lnkctl2) \ > - ((lnkctl2) == PCI_EXP_LNKCTL2_TLS_64_0GT ? PCIE_SPEED_64_0GT : \ > - (lnkctl2) == PCI_EXP_LNKCTL2_TLS_32_0GT ? PCIE_SPEED_32_0GT : \ > - (lnkctl2) == PCI_EXP_LNKCTL2_TLS_16_0GT ? PCIE_SPEED_16_0GT : \ > - (lnkctl2) == PCI_EXP_LNKCTL2_TLS_8_0GT ? PCIE_SPEED_8_0GT : \ > - (lnkctl2) == PCI_EXP_LNKCTL2_TLS_5_0GT ? PCIE_SPEED_5_0GT : \ > - (lnkctl2) == PCI_EXP_LNKCTL2_TLS_2_5GT ? PCIE_SPEED_2_5GT : \ > - PCI_SPEED_UNKNOWN) > +({ \ > + u16 __lnkctl2 = (lnkctl2) & PCI_EXP_LNKCTL2_TLS; \ > + (__lnkctl2 == PCI_EXP_LNKCTL2_TLS_64_0GT ? PCIE_SPEED_64_0GT : \ > + __lnkctl2 == PCI_EXP_LNKCTL2_TLS_32_0GT ? PCIE_SPEED_32_0GT : \ > + __lnkctl2 == PCI_EXP_LNKCTL2_TLS_16_0GT ? PCIE_SPEED_16_0GT : \ > + __lnkctl2 == PCI_EXP_LNKCTL2_TLS_8_0GT ? PCIE_SPEED_8_0GT : \ > + __lnkctl2 == PCI_EXP_LNKCTL2_TLS_5_0GT ? PCIE_SPEED_5_0GT : \ > + __lnkctl2 == PCI_EXP_LNKCTL2_TLS_2_5GT ? 
PCIE_SPEED_2_5GT : \ > + PCI_SPEED_UNKNOWN); \ > +}) > > /* PCIe speed to Mb/s reduced by encoding overhead */ > #define PCIE_SPEED2MBS_ENC(speed) \ > diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c > index b59cacc740fa..a8ce09f67d3b 100644 > --- a/drivers/pci/pcie/bwctrl.c > +++ b/drivers/pci/pcie/bwctrl.c > @@ -168,8 +168,10 @@ int pcie_set_target_speed(struct pci_dev *port, enum pci_bus_speed speed_req, > if (WARN_ON_ONCE(!pcie_valid_speed(speed_req))) > return -EINVAL; > > - if (bus && bus->cur_bus_speed == speed_req) > + if (bus && bus->cur_bus_speed == speed_req) { > + printk("========== %s %d, speed has been set\n", __func__, __LINE__); > return 0; > + } > > target_speed = pcie_bwctrl_select_speed(port, speed_req); > > @@ -184,6 +186,7 @@ int pcie_set_target_speed(struct pci_dev *port, enum pci_bus_speed speed_req, > mutex_lock(&data->set_speed_mutex); > > ret = pcie_bwctrl_change_speed(port, target_speed, use_lt); > + printk("========== %s %d, bwctl change speed ret:0x%x\n", __func__, __LINE__,ret); > > if (data) > mutex_unlock(&data->set_speed_mutex); > @@ -209,8 +212,10 @@ static void pcie_bwnotif_enable(struct pcie_device *srv) > > /* Count LBMS seen so far as one */ > ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status); > - if (ret == PCIBIOS_SUCCESSFUL && link_status & PCI_EXP_LNKSTA_LBMS) > + if (ret == PCIBIOS_SUCCESSFUL && link_status & PCI_EXP_LNKSTA_LBMS) { > + printk("==== %s %d lbms_count++\n", __func__, __LINE__); > atomic_inc(&data->lbms_count); > + } > > pcie_capability_set_word(port, PCI_EXP_LNKCTL, > PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE); > @@ -239,6 +244,7 @@ static irqreturn_t pcie_bwnotif_irq(int irq, void *context) > int ret; > > ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status); > + printk("==== %s %d(start running),link_status:0x%x\n", __func__, __LINE__,link_status); > if (ret != PCIBIOS_SUCCESSFUL) > return IRQ_NONE; > > @@ -246,8 +252,10 @@ static irqreturn_t 
pcie_bwnotif_irq(int irq, void *context) > if (!events) > return IRQ_NONE; > > - if (events & PCI_EXP_LNKSTA_LBMS) > + if (events & PCI_EXP_LNKSTA_LBMS) { > + printk("==== %s %d lbms_count++\n", __func__, __LINE__); > atomic_inc(&data->lbms_count); > + } > > pcie_capability_write_word(port, PCI_EXP_LNKSTA, events); > > @@ -258,6 +266,7 @@ static irqreturn_t pcie_bwnotif_irq(int irq, void *context) > * cleared to avoid missing link speed changes. > */ > pcie_update_link_speed(port->subordinate); > + printk("==== %s %d(stop running),link_status:0x%x\n", __func__, __LINE__,link_status); > > return IRQ_HANDLED; > } > @@ -268,8 +277,10 @@ void pcie_reset_lbms_count(struct pci_dev *port) > > guard(rwsem_read)(&pcie_bwctrl_lbms_rwsem); > data = port->link_bwctrl; > - if (data) > + if (data) { > + printk("==== %s %d lbms_count set to 0\n", __func__, __LINE__); > atomic_set(&data->lbms_count, 0); > + } > else > pcie_capability_write_word(port, PCI_EXP_LNKSTA, > PCI_EXP_LNKSTA_LBMS); > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index 76f4df75b08a..a602f9aa5d6a 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -41,8 +41,11 @@ static bool pcie_lbms_seen(struct pci_dev *dev, u16 lnksta) > int ret; > > ret = pcie_lbms_count(dev, &count); > - if (ret < 0) > + if (ret < 0) { > + printk("==== %s %d lnksta(0x%x) & LBMS\n", __func__, __LINE__, lnksta); > return lnksta & PCI_EXP_LNKSTA_LBMS; > + } > + printk("==== %s %d count:0x%lx\n", __func__, __LINE__, count); > > return count > 0; > } > @@ -110,6 +113,8 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > > pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); > pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); > + pci_info(dev, "============ %s %d, lnkctl2:0x%x, lnksta:0x%x\n", > + __func__, __LINE__, lnkctl2, lnksta); > if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) { > u16 oldlnkctl2 = lnkctl2; > > @@ -121,9 +126,14 @@ int pcie_failed_link_retrain(struct 
pci_dev *dev)
>  	pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2),
>  			      true);
>  		return ret;
> +	} else {
> +		pci_info(dev, "retraining sucessfully, but now is in Gen 1\n");
>  	}
>
> +	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
>  	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> +	pci_info(dev, "============ %s %d, oldlnkctl2:0x%x,newlnkctl2:0x%x,newlnksta:0x%x\n",
> +		 __func__, __LINE__, oldlnkctl2, lnkctl2, lnksta);
>  	}
>
>  	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
>
> -------------------------------diff file-----------------------------------------
>
> Based on the information in the log from 566.755596 to 567.801220, the issue
> has been reproduced. Between 566 and 567 seconds, the pcie_bwnotif_irq interrupt
> was triggered 4 times, which indicates that during this period the NVMe drive
> was plugged and unplugged multiple times.
>
> Thanks,
> Regards,
> Jiwei
>
> > didn't explain LBMS (nor DLLLA) in the above sequence so it's hard to
> > follow what is going on here. LBMS in particular is of high interest here
> > because I'm trying to understand if something should clear it on the
> > hotplug side (there's already one call to clear it in remove_board()).
> >
> > In step 2 (pcie_set_target_speed() in step 1 succeeded),
> > pcie_failed_link_retrain() attempts to restore >2.5GT/s speed, this only
> > occurs when pci_match_id() matches. I guess you're trying to say that step
> > 2 is not taken because pci_match_id() is not matching but the wording
> > above is very confusing.
> >
> > Overall, I failed to understand the scenario here fully despite trying to
> > think it through over these few days.
> >
> >> the target link speed
> >> field of the Link Control
> >> 2 Register will keep 0x1.
> >>
> >> In order to fix the issue, don't do the retraining work except ASMedia
> >> ASM2824.
> >> > >> Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures") > >> Reported-by: Adrian Huang <ahuang12@lenovo.com> > >> Signed-off-by: Jiwei Sun <sunjw10@lenovo.com> > >> --- > >> drivers/pci/quirks.c | 6 ++++-- > >> 1 file changed, 4 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > >> index 605628c810a5..ff04ebd9ae16 100644 > >> --- a/drivers/pci/quirks.c > >> +++ b/drivers/pci/quirks.c > >> @@ -104,6 +104,9 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > >> u16 lnksta, lnkctl2; > >> int ret = -ENOTTY; > >> > >> + if (!pci_match_id(ids, dev)) > >> + return 0; > >> + > >> if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) || > >> !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting) > >> return ret; > >> @@ -129,8 +132,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > >> } > >> > >> if ((lnksta & PCI_EXP_LNKSTA_DLLLA) && > >> - (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT && > >> - pci_match_id(ids, dev)) { > >> + (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) { > >> u32 lnkcap; > >> > >> pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n"); > >> > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
  2025-01-14 18:25 ` Ilpo Järvinen
@ 2025-01-15 10:18 ` Lukas Wunner
  2025-11-25 19:23 ` [External] : " ALOK TIWARI
  2025-01-15 11:39 ` Jiwei
  1 sibling, 1 reply; 13+ messages in thread
From: Lukas Wunner @ 2025-01-15 10:18 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Jiwei, macro, bhelgaas, linux-pci, LKML, guojinhui.liam, helgaas,
	ahuang12, sunjw10

On Tue, Jan 14, 2025 at 08:25:04PM +0200, Ilpo Järvinen wrote:
> On Tue, 14 Jan 2025, Jiwei wrote:
> > [ 539.362400] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
> > [ 539.395720] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
>
> DLLLA=0
>
> But LBMS did not get reset.
>
> So is this perhaps because hotplug cannot keep up with the rapid
> remove/add going on, and thus will not always call the remove_board()
> even if the device went away?
>
> Lukas, do you know if there's a good way to resolve this within hotplug
> side?

I believe the pciehp code is fine and suspect this is an issue
in the quirk.  We've been dealing with rapid add/remove in pciehp
for years without issues.

I don't understand the quirk sufficiently to make a guess
what's going wrong, but I'm wondering if there could be
a race accessing the lbms_count?

Maybe if lbms_count is replaced by a flag in pci_dev->priv_flags
as we've discussed, with proper memory barriers where necessary,
this problem will solve itself?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [External] : Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
  2025-01-15 10:18 ` Lukas Wunner
@ 2025-11-25 19:23 ` ALOK TIWARI
  2025-12-01  3:54 ` Maciej W. Rozycki
  0 siblings, 1 reply; 13+ messages in thread
From: ALOK TIWARI @ 2025-11-25 19:23 UTC (permalink / raw)
To: Lukas Wunner, Ilpo Järvinen, bhelgaas
Cc: Jiwei, macro, linux-pci, LKML, guojinhui.liam, helgaas, ahuang12,
	sunjw10

Hi,

On 1/15/2025 3:48 PM, Lukas Wunner wrote:
> On Tue, Jan 14, 2025 at 08:25:04PM +0200, Ilpo Järvinen wrote:
>> On Tue, 14 Jan 2025, Jiwei wrote:
>>> [ 539.362400] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
>>> [ 539.395720] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
>>
>> DLLLA=0
>>
>> But LBMS did not get reset.
>>
>> So is this perhaps because hotplug cannot keep up with the rapid
>> remove/add going on, and thus will not always call the remove_board()
>> even if the device went away?
>>
>> Lukas, do you know if there's a good way to resolve this within hotplug
>> side?
>
> I believe the pciehp code is fine and suspect this is an issue
> in the quirk. We've been dealing with rapid add/remove in pciehp
> for years without issues.
>
> I don't understand the quirk sufficiently to make a guess
> what's going wrong, but I'm wondering if there could be
> a race accessing the lbms_count?
>
> Maybe if lbms_count is replaced by a flag in pci_dev->priv_flags
> as we've discussed, with proper memory barriers where necessary,
> this problem will solve itself?
>
> Thanks,
>
> Lukas

We are testing hot-add/hot-remove behavior and observed the same issue
as mentioned, where the PCIe bridge link speed drops from 32 GT/s to
2.5 GT/s.

My understanding is that pcie_failed_link_retrain should only apply to
devices matched by PCI_VDEVICE(ASMEDIA, 0x2824), but the current
implementation appears to affect all devices that take longer to
establish a link.

We are unsure if this is intentional, but it effectively allows such
devices to continue operating at a reduced speed.

If we extend PCIE_LINK_RETRAIN_TIMEOUT_MS to 3000 ms, these slower
devices are able to complete link training, and the problem is no
longer observed in our testing.  Therefore, increasing
PCIE_LINK_RETRAIN_TIMEOUT_MS to 3000 ms seems to resolve the issue
for us.

Would it be acceptable to increase PCIE_LINK_RETRAIN_TIMEOUT_MS from
1000 to 3000 ms in this case?

Thanks,
Alok

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [External] : Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
  2025-11-25 19:23 ` [External] : " ALOK TIWARI
@ 2025-12-01  3:54 ` Maciej W. Rozycki
  0 siblings, 0 replies; 13+ messages in thread
From: Maciej W. Rozycki @ 2025-12-01 3:54 UTC (permalink / raw)
To: ALOK TIWARI
Cc: Lukas Wunner, Ilpo Järvinen, Bjorn Helgaas, Jiwei, linux-pci, LKML,
	guojinhui.liam, Bjorn Helgaas, ahuang12, sunjw10

On Wed, 26 Nov 2025, ALOK TIWARI wrote:

> We are testing hot-add/hot-remove behavior and observed the same issue
> as mentioned, where the PCIe bridge link speed drops from 32 GT/s to
> 2.5 GT/s.
>
> My understanding is that pcie_failed_link_retrain should only apply to
> devices matched by PCI_VDEVICE(ASMEDIA, 0x2824), but the current
> implementation appears to affect all devices that take longer to
> establish a link.

 Thank you for your report.

 No, there seems to be nothing wrong with said device by itself, and the
problem is either with the downstream device (which obviously cannot be
discovered until a link has actually been established), or with the
particular device pair or setup.  I originally implemented matching for
this particular device out of an abundance of caution, in case the
removal of the speed restriction for other upstream devices (should the
quirk trigger there) would cause the link to go back into the infinite
retraining loop.

> We are unsure if this is intentional, but it effectively allows such
> devices to continue operating at a reduced speed.

 It was intentional, but it didn't take into account noisy hot-plug
scenarios, which are not a part of my lab setup.

> If we extend PCIE_LINK_RETRAIN_TIMEOUT_MS to 3000 ms, these slower
> devices are able to complete link training, and the problem is no
> longer observed in our testing. Therefore, increasing
> PCIE_LINK_RETRAIN_TIMEOUT_MS to 3000 ms seems to resolve the issue
> for us.
>
> Would it be acceptable to increase PCIE_LINK_RETRAIN_TIMEOUT_MS from
> 1000 to 3000 ms in this case?

 FWIW, my understanding is this actually goes beyond the spec.  However,
given other reports I've given more thought to my idea previously
shared, which has sadly received no feedback to motivate me further, and
implemented a yet more simplified approach, where the 2.5GT/s speed
clamp is always removed regardless of the link state, and if that fails,
then any clamp originally in place at entry to the quirk is restored.

 This I hope will prove robust enough not to cause further issues with
hot-plug scenarios.  Please give it a try and let me know if it's fixed
your issue:

<https://lore.kernel.org/r/alpine.DEB.2.21.2511290245460.36486@angie.orcam.me.uk/>

  Maciej

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
  2025-01-14 18:25   ` Ilpo Järvinen
  2025-01-15 10:18     ` Lukas Wunner
@ 2025-01-15 11:39     ` Jiwei
  1 sibling, 0 replies; 13+ messages in thread
From: Jiwei @ 2025-01-15 11:39 UTC (permalink / raw)
To: Ilpo Järvinen, Lukas Wunner
Cc: macro, bhelgaas, linux-pci, LKML, guojinhui.liam, helgaas, ahuang12,
	sunjw10

On 1/15/25 02:25, Ilpo Järvinen wrote:
> On Tue, 14 Jan 2025, Jiwei wrote:
>> On 1/13/25 23:08, Ilpo Järvinen wrote:
>>> On Fri, 10 Jan 2025, Jiwei Sun wrote:
>>>
>>>> From: Jiwei Sun <sunjw10@lenovo.com>
>>>>
>>>> When we do a quick hot-add/hot-remove test (within 1 second) with a PCIe
>>>> Gen 5 NVMe disk, there is a possibility that the PCIe bridge link speed
>>>> will drop from 32GT/s to 2.5GT/s:
>>>>
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): Link Down
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>>>> ...
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>>>> pcieport 10002:00:04.0: broken device, retraining non-functional downstream link at 2.5GT/s
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): No link
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): Link Up
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
>>>> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
>>>> pci 10002:02:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint
>>>> pci 10002:02:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit]
>>>> pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit]
>>>> pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs
>>>> pci 10002:02:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 10002:00:04.0 (capable of 126.028 Gb/s with 32.0 GT/s PCIe x4 link)
>>>>
>>>> If an NVMe disk is hot-removed, the pciehp interrupt is triggered and the
>>>> kernel thread pciehp_ist is woken up; pcie_failed_link_retrain() is then
>>>> called, as shown in the following call trace.
>>>>
>>>> irq/87-pciehp-2524 [121] ..... 152046.006765: pcie_failed_link_retrain <-pcie_wait_for_link
>>>> irq/87-pciehp-2524 [121] ..... 152046.006782: <stack trace>
>>>> => [FTRACE TRAMPOLINE]
>>>> => pcie_failed_link_retrain
>>>> => pcie_wait_for_link
>>>> => pciehp_check_link_status
>>>> => pciehp_enable_slot
>>>> => pciehp_handle_presence_or_link_change
>>>> => pciehp_ist
>>>> => irq_thread_fn
>>>> => irq_thread
>>>> => kthread
>>>> => ret_from_fork
>>>> => ret_from_fork_asm
>>>>
>>>> According to our investigation, the issue is caused by the following
>>>> scenario:
>>>>
>>>> NVMe disk          pciehp hardirq
>>>> hot-remove         top-half            pciehp irq kernel thread
>>>> ======================================================================
>>>>                    pciehp hardirq
>>>>                    will be triggered
>>>>                    cpu handles pciehp
>>>>                    hardirq
>>>>                                        pciehp irq kthread will
>>>>                                        be woken up
>>>>                                        pciehp_ist
>>>>                                          ...
>>>>                                          pcie_failed_link_retrain
>>>>                                            read PCI_EXP_LNKCTL2 register
>>>>                                            read PCI_EXP_LNKSTA register
>>>> If NVMe disk
>>>> hot-add before
>>>> calling pcie_retrain_link()
>>>>                                            set target speed to 2_5GT
>>>
>>> This assumes LBMS has been seen but DLLLA isn't?
>>> Why is that?
>>
>> Please look at the content below.
>>
>>>
>>>> pcie_bwctrl_change_speed
>>>> pcie_retrain_link
>>>
>>>> : the retrain work will be
>>>> successful, because
>>>> pci_match_id() will be
>>>> 0 in
>>>> pcie_failed_link_retrain()
>>>
>>> There's no pci_match_id() in pcie_retrain_link() ?? What does that : mean?
>>> I think the nesting level is wrong in your flow description?
>>
>> Sorry for the confusing information; the complete meaning I want to
>> express is as follows:
>>
>> NVMe disk          pciehp hardirq
>> hot-remove         top-half            pciehp irq kernel thread
>> ======================================================================
>>                    pciehp hardirq
>>                    will be triggered
>>                    cpu handles pciehp
>>                    hardirq
>>                                        "pciehp" irq kthread
>>                                        will be woken up
>>                                        pciehp_ist
>>                                          ...
>>                                          pcie_failed_link_retrain
>>                                            pcie_capability_read_word(PCI_EXP_LNKCTL2)
>>                                            pcie_capability_read_word(PCI_EXP_LNKSTA)
>> If NVMe disk
>> hot-add before
>> calling pcie_retrain_link()
>>                                            pcie_set_target_speed(PCIE_SPEED_2_5GT)
>>                                              pcie_bwctrl_change_speed
>>                                                pcie_retrain_link
>>                                            // (1) The target link speed field of
>>                                            //     LNKCTL2 was set to 0x1, so the
>>                                            //     retrain succeeds.
>>                                            // (2) Return to pcie_failed_link_retrain()
>>                                            pcie_capability_read_word(PCI_EXP_LNKSTA)
>>                                            if lnksta & PCI_EXP_LNKSTA_DLLLA
>>                                               and PCI_EXP_LNKCTL2_TLS_2_5GT was set
>>                                               and pci_match_id
>>                                                 pcie_capability_read_dword(PCI_EXP_LNKCAP)
>>                                                 pcie_set_target_speed(PCIE_LNKCAP_SLS2SPEED(lnkcap))
>>
>> // Although the target link speed field of LNKCTL2 was set to 0x1,
>> // the dev is not in ids[], so removing the downstream link speed
>> // restriction is never executed, and the target link speed field
>> // of LNKCTL2 cannot be restored.
>>
>> Due to the 75-characters-per-line limit, the original explanation
>> omitted many details.
>>
>>> I don't understand how retrain success relates to the pci_match_id() as
>>> there are two different steps in pcie_failed_link_retrain().
>>>
>>> In step 1, pcie_failed_link_retrain() sets speed to 2.5GT/s if DLLLA=0 and
>>> LBMS has been seen. Why is that condition happening in your case? You
>>
>> According to our test results, it seems so.
>> It may be related to our test, which plugs and unplugs the device
>> multiple times within a second. Below is a portion of the dmesg log
>> captured during the test. (Please allow me to retain the timestamps,
>> as this information is important.)
>>
>> -------------------------------dmesg log-----------------------------------------
>>
>> [ 537.981302] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841
>> [ 537.981329] ==== pcie_bwnotif_irq 256 lbms_count++
>> [ 537.981338] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
>> [ 538.014638] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
>> [ 538.014662] ==== pciehp_ist 703 start running
>> [ 538.014678] pcieport 10001:80:02.0: pciehp: Slot(77): Link Down
>> [ 538.199104] ==== pcie_reset_lbms_count 281 lbms_count set to 0
>> [ 538.199130] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
>> [ 538.567377] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
>> [ 538.567393] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
>
> DLLLA=0 & LBMS=0
>
>> [ 538.616219] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
>
> DLLLA=1 & LBMS=0
>
> Are all of these for the same device? It would be nice to print the
> pci_name() too so it's clear what device it's about.

Yes, they are all from the same device. The following log prints the
device name.
[ 5218.875059] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 247(start running),link_status:0x7841
[ 5218.875080] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 256 lbms_count++
[ 5218.875090] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
[ 5218.908398] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[ 5218.908420] pcieport 10001:80:02.0: pciehp: ==== pciehp_ist 703 start running
[ 5218.908432] pcieport 10001:80:02.0: pciehp: Slot(77): Link Down
[ 5219.104559] pcieport 10001:80:02.0: bwctrl: ==== pcie_reset_lbms_count 281 lbms_count set to 0
[ 5219.104582] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
[ 5219.460832] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[ 5219.460848] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[ 5219.519595] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 247(start running),link_status:0x5841
[ 5219.519604] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 256 lbms_count++
[ 5219.519613] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 269(stop running),link_status:0x5841
[ 5220.104919] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 247(start running),link_status:0x9845
[ 5220.104931] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845
[ 5220.124727] pcieport 10001:80:02.0: ======pcie_wait_for_link_delay 4787,wait for linksta:-110
[ 5220.124740] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 116, lnkctl2:0x5, lnksta:0x1845
[ 5220.124750] pcieport 10001:80:02.0: ==== pcie_lbms_seen 48 count:0x1
[ 5220.124758] pcieport 10001:80:02.0: broken device, retraining non-functional downstream link at 2.5GT/s
[ 5220.154323] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 247(start running),link_status:0x3045
[ 5220.154351] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 247(start running),link_status:0x7841
[ 5220.154358] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 256 lbms_count++
[ 5220.154366] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
[ 5220.154374] pcieport 10001:80:02.0: bwctrl: ==== pcie_reset_lbms_count 281 lbms_count set to 0
[ 5220.154380] pcieport 10001:80:02.0: bwctrl: ========== pcie_set_target_speed 189, bwctl change speed ret:0x0
[ 5220.154389] pcieport 10001:80:02.0: retraining sucessfully, but now is in Gen 1
[ 5220.154395] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 135, oldlnkctl2:0x5,newlnkctl2:0x1,newlnksta:0x3841
[ 5220.168291] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 247(start running),link_status:0x7041
[ 5220.168299] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 256 lbms_count++
[ 5220.168308] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 269(stop running),link_status:0x7041
[ 5220.259128] pcieport 10001:80:02.0: bwctrl: ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
[ 5221.311642] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
[ 5221.311652] pcieport 10001:80:02.0: pciehp: ==== pciehp_ist 759 stop running

According to the above log, I have simplified the code execution flow
and provided an analysis of the key steps.
PCIe bwctrl irq handler
(top-half)                          pciehp irq handler (kernel thread)
=================================================================================
pcie_bwnotif_irq
  atomic_inc(&data->lbms_count)
  //link_status:0x7841
  //(LBMS==1 & DLLLA==1)
  //lbms_count++
pcie_bwnotif_irq
  //link_status:0x1041
  //(LBMS==0 & DLLLA==0)
                                    pciehp_ist
                                      pciehp_handle_presence_or_link_change
                                        pciehp_disable_slot
                                          __pciehp_disable_slot
                                            remove_board
                                              pcie_reset_lbms_count
                                              // set lbms_count = 0
                                        pciehp_enable_slot
pcie_bwnotif_irq
  //link_status:0x9845
  //(LBMS==0 & DLLLA==0)
                                          __pciehp_enable_slot
pcie_bwnotif_irq
  atomic_inc(&data->lbms_count)
  //link_status:0x5841
  //(LBMS==1 & DLLLA==0)
  //lbms_count++, now lbms_count=1
                                            board_added
                                              pciehp_check_link_status
pcie_bwnotif_irq
  //link_status:0x9845
  //(LBMS==0 & DLLLA==0)
                                                pcie_wait_for_link
                                                  pcie_wait_for_link_delay
                                                    pcie_wait_for_link_status
                                                  pcie_failed_link_retrain
                                                  //lnksta:0x1845
                                                  // because lbms_count=1 and DLLLA == 0,
                                                  // pcie_set_target_speed will be executed
                                                    pcie_set_target_speed(PCIE_SPEED_2_5GT)
                                                    // because the current link speed
                                                    // field of lnksta is 0x5,
                                                    // lnkctl2 will be set to 0x1;
                                                    // the speed will be limited to Gen 1

Based on the above, while pciehp_ist() is running, multiple rapid
remove/add events occur between remove_board() and
pcie_wait_for_link_status(). These events cause the previously cleared
lbms_count to be incremented again, which ultimately leads to entering
the pcie_set_target_speed(PCIE_SPEED_2_5GT) path.

Thanks,
Regards,
Jiwei

>
>> [ 538.617594] ======pcie_wait_for_link_delay 4787,wait for linksta:0
>> [ 539.362382] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841
>> [ 539.362393] ==== pcie_bwnotif_irq 256 lbms_count++
>
> DLLLA=1 & LBMS=1
>
>> [ 539.362400] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
>> [ 539.395720] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
>
> DLLLA=0
>
> But LBMS did not get reset.
>
> So is this perhaps because hotplug cannot keep up with the rapid
> remove/add going on, and thus will not always call the remove_board()
> even if the device went away?
>
> Lukas, do you know if there's a good way to resolve this within hotplug
> side?
>
>> [ 539.787501] pcieport 10001:80:02.0: pciehp: Slot(77): No device found
>> [ 539.787514] ==== pciehp_ist 759 stop running
>> [ 539.787521] ==== pciehp_ist 703 start running
>> [ 539.787533] pcieport 10001:80:02.0: pciehp: Slot(77): Card present
>> [ 539.914182] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
>> [ 540.503965] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
>> [ 540.808415] ======pcie_wait_for_link_delay 4787,wait for linksta:-110
>> [ 540.808430] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 116, lnkctl2:0x5, lnksta:0x1041
>> [ 540.808440] ==== pcie_lbms_seen 48 count:0x1
>> [ 540.808448] pcieport 10001:80:02.0: broken device, retraining non-functional downstream link at 2.5GT/s
>> [ 540.808452] ========== pcie_set_target_speed 172, speed has been set
>> [ 540.808459] pcieport 10001:80:02.0: retraining sucessfully, but now is in Gen 1
>> [ 540.808466] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 135, oldlnkctl2:0x5,newlnkctl2:0x5,newlnksta:0x1041
>
> --
> i.
> >> [ 541.041386] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 541.041398] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 541.091231] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 541.568126] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 541.568135] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 541.568142] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 541.568168] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 542.029334] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 542.029347] ==== pciehp_ist 759 stop running >> [ 542.029353] ==== pciehp_ist 703 start running >> [ 542.029362] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 542.120676] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 542.120687] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 542.170424] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 542.172337] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 542.223909] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841 >> [ 542.223917] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 542.223924] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841 >> [ 542.257249] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 542.809830] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 542.809841] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 542.859463] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 543.097871] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 543.097879] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 543.097885] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 543.097905] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 543.391250] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 543.391260] ==== pciehp_ist 759 stop running >> [ 
543.391265] ==== pciehp_ist 703 start running >> [ 543.391273] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 543.650507] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 543.650517] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 543.700174] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 543.700205] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 544.296255] pci 10001:81:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint >> [ 544.296298] pci 10001:81:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit] >> [ 544.296515] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit] >> [ 544.296522] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs >> [ 544.297256] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 >> [ 544.297279] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 544.297288] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 544.297295] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 544.297301] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 544.297314] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned >> [ 544.297337] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space >> [ 544.297344] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign >> [ 544.297352] pcieport 10001:80:02.0: PCI bridge to [bus 81] >> [ 544.297363] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] >> [ 544.297373] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] >> [ 544.297385] PCI: No. 
2 try to assign unassigned res >> [ 544.297390] release child resource [mem 0xbb000000-0xbb007fff 64bit] >> [ 544.297396] pcieport 10001:80:02.0: resource 14 [mem 0xbb000000-0xbb0fffff] released >> [ 544.297403] pcieport 10001:80:02.0: PCI bridge to [bus 81] >> [ 544.297412] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 >> [ 544.297422] pcieport 10001:80:02.0: bridge window [mem 0x00100000-0x001fffff] to [bus 81] add_size 300000 add_align 100000 >> [ 544.297438] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: can't assign; no space >> [ 544.297444] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: failed to assign >> [ 544.297451] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 544.297457] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 544.297464] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: assigned >> [ 544.297473] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to expand by 0x300000 >> [ 544.297481] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to add 300000 >> [ 544.297488] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 544.297494] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 544.297503] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned >> [ 544.297524] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space >> [ 544.297530] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign >> [ 544.297538] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned >> [ 544.297558] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space >> [ 544.297563] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign >> [ 544.297569] pcieport 10001:80:02.0: PCI bridge to [bus 81] >> [ 544.297579] 
pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] >> [ 544.297588] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] >> [ 544.298256] nvme nvme1: pci function 10001:81:00.0 >> [ 544.298278] nvme 10001:81:00.0: enabling device (0000 -> 0002) >> [ 544.298291] pcieport 10001:80:02.0: can't derive routing for PCI INT A >> [ 544.298298] nvme 10001:81:00.0: PCI INT A: no GSI >> [ 544.875198] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 544.875208] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 544.875215] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 544.875231] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 544.875910] ==== pciehp_ist 759 stop running >> [ 544.875920] ==== pciehp_ist 703 start running >> [ 544.875928] pcieport 10001:80:02.0: pciehp: Slot(77): Link Down >> [ 544.876857] ==== pcie_reset_lbms_count 281 lbms_count set to 0 >> [ 544.876868] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 545.427157] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 545.427169] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 545.476411] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 545.478099] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 545.857887] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 545.857896] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 545.857902] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 545.857929] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 546.410193] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 546.410205] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 546.460531] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 546.697008] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 546.697020] ==== pciehp_ist 759 stop running >> [ 546.697025] ==== pciehp_ist 703 start 
running >> [ 546.697034] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 546.697039] pcieport 10001:80:02.0: pciehp: Slot(77): Link Up >> [ 546.718015] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 546.987498] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 546.987507] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 546.987514] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 546.987542] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 547.539681] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 547.539693] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 547.589214] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 547.850003] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 547.850011] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 547.850018] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 547.850046] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 547.996918] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 547.996930] ==== pciehp_ist 759 stop running >> [ 547.996934] ==== pciehp_ist 703 start running >> [ 547.996944] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 548.401899] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 548.401911] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 548.451186] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 548.452886] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 548.682838] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 548.682846] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 548.682852] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 548.682871] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 549.235408] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 549.235420] ==== pcie_bwnotif_irq 269(stop 
running),link_status:0x9845 >> [ 549.284761] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 549.654883] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 549.654892] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 549.654899] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 549.654926] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 549.738806] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 549.738815] ==== pciehp_ist 759 stop running >> [ 549.738819] ==== pciehp_ist 703 start running >> [ 549.738829] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 550.207186] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 550.207198] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 550.256868] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 550.256890] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 550.575344] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 550.575353] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 550.575360] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 550.575386] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 551.127757] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 551.127768] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 551.177224] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 551.477699] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 551.477711] ==== pciehp_ist 759 stop running >> [ 551.477716] ==== pciehp_ist 703 start running >> [ 551.477725] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 551.477730] pcieport 10001:80:02.0: pciehp: Slot(77): Link Up >> [ 551.498667] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 551.788685] pci 10001:81:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint >> [ 551.788723] pci 10001:81:00.0: BAR 0 [mem 0x00000000-0x00007fff 
64bit] >> [ 551.788933] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit] >> [ 551.788941] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs >> [ 551.789619] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 >> [ 551.789653] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 551.789663] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 551.789672] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 551.789677] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 551.789688] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned >> [ 551.789708] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space >> [ 551.789715] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign >> [ 551.789722] pcieport 10001:80:02.0: PCI bridge to [bus 81] >> [ 551.789733] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] >> [ 551.789743] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] >> [ 551.789755] PCI: No. 
2 try to assign unassigned res >> [ 551.789759] release child resource [mem 0xbb000000-0xbb007fff 64bit] >> [ 551.789764] pcieport 10001:80:02.0: resource 14 [mem 0xbb000000-0xbb0fffff] released >> [ 551.789771] pcieport 10001:80:02.0: PCI bridge to [bus 81] >> [ 551.789779] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 >> [ 551.789790] pcieport 10001:80:02.0: bridge window [mem 0x00100000-0x001fffff] to [bus 81] add_size 300000 add_align 100000 >> [ 551.789804] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: can't assign; no space >> [ 551.789811] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: failed to assign >> [ 551.789817] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 551.789823] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 551.789831] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: assigned >> [ 551.789839] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to expand by 0x300000 >> [ 551.789847] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to add 300000 >> [ 551.789854] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 551.789860] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 551.789869] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned >> [ 551.789889] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space >> [ 551.789895] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign >> [ 551.789903] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned >> [ 551.789921] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space >> [ 551.789927] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign >> [ 551.789933] pcieport 10001:80:02.0: PCI bridge to [bus 81] >> [ 551.789942] 
pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] >> [ 551.789951] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] >> [ 551.790638] nvme nvme1: pci function 10001:81:00.0 >> [ 551.790656] nvme 10001:81:00.0: enabling device (0000 -> 0002) >> [ 551.790667] pcieport 10001:80:02.0: can't derive routing for PCI INT A >> [ 551.790674] nvme 10001:81:00.0: PCI INT A: no GSI >> [ 552.546963] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 552.546973] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 552.546980] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 552.546996] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 552.547590] ==== pciehp_ist 759 stop running >> [ 552.547598] ==== pciehp_ist 703 start running >> [ 552.547605] pcieport 10001:80:02.0: pciehp: Slot(77): Link Down >> [ 552.548215] ==== pcie_reset_lbms_count 281 lbms_count set to 0 >> [ 552.548224] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 553.098957] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 553.098969] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 553.148031] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 553.149553] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 553.499647] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 553.499654] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 553.499660] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 553.499683] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 554.052313] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 554.052325] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 554.102175] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 554.265181] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 554.265188] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 554.265194] ==== 
pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 554.265217] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 554.453449] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 554.453458] ==== pciehp_ist 759 stop running >> [ 554.453463] ==== pciehp_ist 703 start running >> [ 554.453472] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 554.743040] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 555.475369] ======pcie_wait_for_link_delay 4787,wait for linksta:-110 >> [ 555.475384] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 116, lnkctl2:0x5, lnksta:0x1041 >> [ 555.475392] ==== pcie_lbms_seen 48 count:0x2 >> [ 555.475398] pcieport 10001:80:02.0: broken device, retraining non-functional downstream link at 2.5GT/s >> [ 555.475404] ========== pcie_set_target_speed 172, speed has been set >> [ 555.475409] pcieport 10001:80:02.0: retraining sucessfully, but now is in Gen 1 >> [ 555.475417] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 135, oldlnkctl2:0x5,newlnkctl2:0x5,newlnksta:0x1041 >> [ 556.633310] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 556.633322] ==== pciehp_ist 759 stop running >> [ 556.633328] ==== pciehp_ist 703 start running >> [ 556.633336] ==== pciehp_ist 759 stop running >> [ 556.828412] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 556.828440] ==== pciehp_ist 703 start running >> [ 556.828448] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 557.017389] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 557.017400] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 557.066666] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 557.066688] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 557.209334] pci 10001:81:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint >> [ 557.209374] pci 10001:81:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit] >> [ 557.209585] 
pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit] >> [ 557.209592] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs >> [ 557.210275] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 >> [ 557.210292] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 557.210300] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 557.210307] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 557.210312] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 557.210322] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned >> [ 557.210342] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space >> [ 557.210349] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign >> [ 557.210356] pcieport 10001:80:02.0: PCI bridge to [bus 81] >> [ 557.210366] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] >> [ 557.210376] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] >> [ 557.210388] PCI: No. 
2 try to assign unassigned res >> [ 557.210392] release child resource [mem 0xbb000000-0xbb007fff 64bit] >> [ 557.210397] pcieport 10001:80:02.0: resource 14 [mem 0xbb000000-0xbb0fffff] released >> [ 557.210405] pcieport 10001:80:02.0: PCI bridge to [bus 81] >> [ 557.210414] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 >> [ 557.210424] pcieport 10001:80:02.0: bridge window [mem 0x00100000-0x001fffff] to [bus 81] add_size 300000 add_align 100000 >> [ 557.210438] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: can't assign; no space >> [ 557.210445] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: failed to assign >> [ 557.210451] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 557.210457] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 557.210464] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: assigned >> [ 557.210472] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to expand by 0x300000 >> [ 557.210479] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to add 300000 >> [ 557.210487] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 557.210492] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 557.210501] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned >> [ 557.210521] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space >> [ 557.210527] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign >> [ 557.210534] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned >> [ 557.210553] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space >> [ 557.210559] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign >> [ 557.210565] pcieport 10001:80:02.0: PCI bridge to [bus 81] >> [ 557.210574] 
pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] >> [ 557.210583] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] >> [ 557.211286] nvme nvme1: pci function 10001:81:00.0 >> [ 557.211303] nvme 10001:81:00.0: enabling device (0000 -> 0002) >> [ 557.211315] pcieport 10001:80:02.0: can't derive routing for PCI INT A >> [ 557.211322] nvme 10001:81:00.0: PCI INT A: no GSI >> [ 557.565811] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 557.565820] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 557.565827] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 557.565842] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 557.566410] ==== pciehp_ist 759 stop running >> [ 557.566416] ==== pciehp_ist 703 start running >> [ 557.566423] pcieport 10001:80:02.0: pciehp: Slot(77): Link Down >> [ 557.567592] ==== pcie_reset_lbms_count 281 lbms_count set to 0 >> [ 557.567602] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 558.117581] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 558.117594] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 558.166639] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 558.168190] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 558.376176] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 558.376184] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 558.376190] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 558.376208] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 558.928611] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 558.928621] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 558.977769] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 559.186385] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 559.186394] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 559.186400] ==== 
pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 559.186419] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 559.459099] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 559.459111] ==== pciehp_ist 759 stop running >> [ 559.459116] ==== pciehp_ist 703 start running >> [ 559.459124] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 559.738599] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 559.738610] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 559.787690] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 559.787712] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 560.307243] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 560.307253] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 560.307260] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 560.307282] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 560.978997] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 560.979007] ==== pciehp_ist 759 stop running >> [ 560.979013] ==== pciehp_ist 703 start running >> [ 560.979022] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 561.410141] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 561.410153] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 561.459064] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 561.459087] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 561.648520] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 561.648528] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 561.648536] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 561.648559] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 562.247076] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 562.247087] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 562.296600] ==== pcie_bwnotif_irq 
247(start running),link_status:0x3045 >> [ 562.454228] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841 >> [ 562.454236] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 562.454244] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841 >> [ 562.487632] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 562.674863] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 562.674874] ==== pciehp_ist 759 stop running >> [ 562.674879] ==== pciehp_ist 703 start running >> [ 562.674888] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 563.696784] ======pcie_wait_for_link_delay 4787,wait for linksta:-110 >> [ 563.696798] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 116, lnkctl2:0x5, lnksta:0x1041 >> [ 563.696806] ==== pcie_lbms_seen 48 count:0x5 >> [ 563.696813] pcieport 10001:80:02.0: broken device, retraining non-functional downstream link at 2.5GT/s >> [ 563.696817] ========== pcie_set_target_speed 172, speed has been set >> [ 563.696823] pcieport 10001:80:02.0: retraining sucessfully, but now is in Gen 1 >> [ 563.696830] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 135, oldlnkctl2:0x5,newlnkctl2:0x5,newlnksta:0x1041 >> [ 564.133582] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 564.133594] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 564.183003] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 564.364911] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 564.364921] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 564.364930] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 564.364954] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 564.889708] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 564.889719] ==== pciehp_ist 759 stop running >> [ 564.889724] ==== pciehp_ist 703 start running >> [ 564.889732] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 565.493151] ==== 
pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 565.493162] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 565.542478] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 565.542501] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 565.752276] ==== pcie_bwnotif_irq 247(start running),link_status:0x5041 >> [ 565.752285] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 565.752291] ==== pcie_bwnotif_irq 269(stop running),link_status:0x5041 >> [ 565.752316] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 566.359793] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 566.359804] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 566.408820] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 566.581150] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841 >> [ 566.581159] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 566.581166] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841 >> [ 566.614491] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 566.755582] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 566.755591] ==== pciehp_ist 759 stop running >> [ 566.755596] ==== pciehp_ist 703 start running >> [ 566.755605] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 567.751399] ==== pcie_bwnotif_irq 247(start running),link_status:0x9845 >> [ 567.751412] ==== pcie_bwnotif_irq 269(stop running),link_status:0x9845 >> [ 567.776517] ======pcie_wait_for_link_delay 4787,wait for linksta:-110 >> [ 567.776529] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 116, lnkctl2:0x5, lnksta:0x1845 >> [ 567.776538] ==== pcie_lbms_seen 48 count:0x8 >> [ 567.776544] pcieport 10001:80:02.0: broken device, retraining non-functional downstream link at 2.5GT/s >> [ 567.801147] ==== pcie_bwnotif_irq 247(start running),link_status:0x3045 >> [ 567.801177] ==== pcie_bwnotif_irq 247(start running),link_status:0x7841 >> [ 567.801184] 
==== pcie_bwnotif_irq 256 lbms_count++ >> [ 567.801192] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841 >> [ 567.801201] ==== pcie_reset_lbms_count 281 lbms_count set to 0 >> [ 567.801207] ========== pcie_set_target_speed 189, bwctl change speed ret:0x0 >> [ 567.801214] pcieport 10001:80:02.0: retraining sucessfully, but now is in Gen 1 >> [ 567.801220] pcieport 10001:80:02.0: ============ pcie_failed_link_retrain 135, oldlnkctl2:0x5,newlnkctl2:0x1,newlnksta:0x3841 >> [ 567.815102] ==== pcie_bwnotif_irq 247(start running),link_status:0x7041 >> [ 567.815110] ==== pcie_bwnotif_irq 256 lbms_count++ >> [ 567.815117] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7041 >> [ 567.910155] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041 >> [ 568.961434] pcieport 10001:80:02.0: pciehp: Slot(77): No device found >> [ 568.961444] ==== pciehp_ist 759 stop running >> [ 568.961450] ==== pciehp_ist 703 start running >> [ 568.961459] pcieport 10001:80:02.0: pciehp: Slot(77): Card present >> [ 569.008665] ==== pcie_bwnotif_irq 247(start running),link_status:0x3041 >> [ 569.010428] ======pcie_wait_for_link_delay 4787,wait for linksta:0 >> [ 569.391482] pci 10001:81:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint >> [ 569.391549] pci 10001:81:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit] >> [ 569.391968] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit] >> [ 569.391975] pci 10001:81:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs >> [ 569.392869] pci 10001:81:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 10001:80:02.0 (capable of 126.028 Gb/s with 32.0 GT/s PCIe x4 link) >> [ 569.393233] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 >> [ 569.393249] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 569.393257] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 569.393264] pcieport 
10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 569.393270] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 569.393279] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned >> [ 569.393315] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space >> [ 569.393322] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign >> [ 569.393329] pcieport 10001:80:02.0: PCI bridge to [bus 81] >> [ 569.393340] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] >> [ 569.393350] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] >> [ 569.393362] PCI: No. 2 try to assign unassigned res >> [ 569.393366] release child resource [mem 0xbb000000-0xbb007fff 64bit] >> [ 569.393371] pcieport 10001:80:02.0: resource 14 [mem 0xbb000000-0xbb0fffff] released >> [ 569.393378] pcieport 10001:80:02.0: PCI bridge to [bus 81] >> [ 569.393404] pcieport 10001:80:02.0: bridge window [io 0x1000-0x0fff] to [bus 81] add_size 1000 >> [ 569.393414] pcieport 10001:80:02.0: bridge window [mem 0x00100000-0x001fffff] to [bus 81] add_size 300000 add_align 100000 >> [ 569.393430] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: can't assign; no space >> [ 569.393438] pcieport 10001:80:02.0: bridge window [mem size 0x00400000]: failed to assign >> [ 569.393445] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 569.393451] pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 569.393458] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: assigned >> [ 569.393466] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to expand by 0x300000 >> [ 569.393474] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff]: failed to add 300000 >> [ 569.393481] pcieport 10001:80:02.0: bridge window [io size 0x1000]: can't assign; no space >> [ 569.393487] 
pcieport 10001:80:02.0: bridge window [io size 0x1000]: failed to assign >> [ 569.393495] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned >> [ 569.393529] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space >> [ 569.393536] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign >> [ 569.393543] pci 10001:81:00.0: BAR 0 [mem 0xbb000000-0xbb007fff 64bit]: assigned >> [ 569.393576] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: can't assign; no space >> [ 569.393582] pci 10001:81:00.0: VF BAR 0 [mem size 0x00200000 64bit]: failed to assign >> [ 569.393588] pcieport 10001:80:02.0: PCI bridge to [bus 81] >> [ 569.393597] pcieport 10001:80:02.0: bridge window [mem 0xbb000000-0xbb0fffff] >> [ 569.393606] pcieport 10001:80:02.0: bridge window [mem 0xbbd00000-0xbbefffff 64bit pref] >> [ 569.394076] nvme nvme1: pci function 10001:81:00.0 >> [ 569.394095] nvme 10001:81:00.0: enabling device (0000 -> 0002) >> [ 569.394109] pcieport 10001:80:02.0: can't derive routing for PCI INT A >> [ 569.394116] nvme 10001:81:00.0: PCI INT A: no GSI >> [ 570.158994] nvme nvme1: D3 entry latency set to 10 seconds >> [ 570.239267] nvme nvme1: 127/0/0 default/read/poll queues >> [ 570.287896] ==== pciehp_ist 759 stop running >> [ 570.287911] ==== pciehp_ist 703 start running >> [ 570.287918] ==== pciehp_ist 759 stop running >> [ 570.288953] nvme1n1: p1 p2 p3 p4 p5 p6 p7 >> >> -------------------------------dmesg log----------------------------------------- >> >> >From the log above, it can be seen that I added some debugging codes in the kernel. 
>> The specific modifications are as follows: >> >> -------------------------------diff file----------------------------------------- >> >> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c >> index bb5a8d9f03ad..c9f3ed86a084 100644 >> --- a/drivers/pci/hotplug/pciehp_hpc.c >> +++ b/drivers/pci/hotplug/pciehp_hpc.c >> @@ -700,6 +700,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id) >> irqreturn_t ret; >> u32 events; >> >> + printk("==== %s %d start running\n", __func__, __LINE__); >> ctrl->ist_running = true; >> pci_config_pm_runtime_get(pdev); >> >> @@ -755,6 +756,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id) >> pci_config_pm_runtime_put(pdev); >> ctrl->ist_running = false; >> wake_up(&ctrl->requester); >> + printk("==== %s %d stop running\n", __func__, __LINE__); >> return ret; >> } >> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >> index 661f98c6c63a..ffa58f389456 100644 >> --- a/drivers/pci/pci.c >> +++ b/drivers/pci/pci.c >> @@ -4784,6 +4784,7 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active, >> if (active) >> msleep(20); >> rc = pcie_wait_for_link_status(pdev, false, active); >> + printk("======%s %d,wait for linksta:%d\n", __func__, __LINE__, rc); >> if (active) { >> if (rc) >> rc = pcie_failed_link_retrain(pdev); >> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h >> index 2e40fc63ba31..b7e5af859517 100644 >> --- a/drivers/pci/pci.h >> +++ b/drivers/pci/pci.h >> @@ -337,12 +337,13 @@ void pci_bus_put(struct pci_bus *bus); >> >> #define PCIE_LNKCAP_SLS2SPEED(lnkcap) \ >> ({ \ >> - ((lnkcap) == PCI_EXP_LNKCAP_SLS_64_0GB ? PCIE_SPEED_64_0GT : \ >> - (lnkcap) == PCI_EXP_LNKCAP_SLS_32_0GB ? PCIE_SPEED_32_0GT : \ >> - (lnkcap) == PCI_EXP_LNKCAP_SLS_16_0GB ? PCIE_SPEED_16_0GT : \ >> - (lnkcap) == PCI_EXP_LNKCAP_SLS_8_0GB ? PCIE_SPEED_8_0GT : \ >> - (lnkcap) == PCI_EXP_LNKCAP_SLS_5_0GB ? PCIE_SPEED_5_0GT : \ >> - (lnkcap) == PCI_EXP_LNKCAP_SLS_2_5GB ? 
PCIE_SPEED_2_5GT : \ >> + u32 __lnkcap = (lnkcap) & PCI_EXP_LNKCAP_SLS; \ >> + (__lnkcap == PCI_EXP_LNKCAP_SLS_64_0GB ? PCIE_SPEED_64_0GT : \ >> + __lnkcap == PCI_EXP_LNKCAP_SLS_32_0GB ? PCIE_SPEED_32_0GT : \ >> + __lnkcap == PCI_EXP_LNKCAP_SLS_16_0GB ? PCIE_SPEED_16_0GT : \ >> + __lnkcap == PCI_EXP_LNKCAP_SLS_8_0GB ? PCIE_SPEED_8_0GT : \ >> + __lnkcap == PCI_EXP_LNKCAP_SLS_5_0GB ? PCIE_SPEED_5_0GT : \ >> + __lnkcap == PCI_EXP_LNKCAP_SLS_2_5GB ? PCIE_SPEED_2_5GT : \ >> PCI_SPEED_UNKNOWN); \ >> }) >> >> @@ -357,13 +358,16 @@ void pci_bus_put(struct pci_bus *bus); >> PCI_SPEED_UNKNOWN) >> >> #define PCIE_LNKCTL2_TLS2SPEED(lnkctl2) \ >> - ((lnkctl2) == PCI_EXP_LNKCTL2_TLS_64_0GT ? PCIE_SPEED_64_0GT : \ >> - (lnkctl2) == PCI_EXP_LNKCTL2_TLS_32_0GT ? PCIE_SPEED_32_0GT : \ >> - (lnkctl2) == PCI_EXP_LNKCTL2_TLS_16_0GT ? PCIE_SPEED_16_0GT : \ >> - (lnkctl2) == PCI_EXP_LNKCTL2_TLS_8_0GT ? PCIE_SPEED_8_0GT : \ >> - (lnkctl2) == PCI_EXP_LNKCTL2_TLS_5_0GT ? PCIE_SPEED_5_0GT : \ >> - (lnkctl2) == PCI_EXP_LNKCTL2_TLS_2_5GT ? PCIE_SPEED_2_5GT : \ >> - PCI_SPEED_UNKNOWN) >> +({ \ >> + u16 __lnkctl2 = (lnkctl2) & PCI_EXP_LNKCTL2_TLS; \ >> + (__lnkctl2 == PCI_EXP_LNKCTL2_TLS_64_0GT ? PCIE_SPEED_64_0GT : \ >> + __lnkctl2 == PCI_EXP_LNKCTL2_TLS_32_0GT ? PCIE_SPEED_32_0GT : \ >> + __lnkctl2 == PCI_EXP_LNKCTL2_TLS_16_0GT ? PCIE_SPEED_16_0GT : \ >> + __lnkctl2 == PCI_EXP_LNKCTL2_TLS_8_0GT ? PCIE_SPEED_8_0GT : \ >> + __lnkctl2 == PCI_EXP_LNKCTL2_TLS_5_0GT ? PCIE_SPEED_5_0GT : \ >> + __lnkctl2 == PCI_EXP_LNKCTL2_TLS_2_5GT ? 
PCIE_SPEED_2_5GT : \ >> + PCI_SPEED_UNKNOWN); \ >> +}) >> >> /* PCIe speed to Mb/s reduced by encoding overhead */ >> #define PCIE_SPEED2MBS_ENC(speed) \ >> diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c >> index b59cacc740fa..a8ce09f67d3b 100644 >> --- a/drivers/pci/pcie/bwctrl.c >> +++ b/drivers/pci/pcie/bwctrl.c >> @@ -168,8 +168,10 @@ int pcie_set_target_speed(struct pci_dev *port, enum pci_bus_speed speed_req, >> if (WARN_ON_ONCE(!pcie_valid_speed(speed_req))) >> return -EINVAL; >> >> - if (bus && bus->cur_bus_speed == speed_req) >> + if (bus && bus->cur_bus_speed == speed_req) { >> + printk("========== %s %d, speed has been set\n", __func__, __LINE__); >> return 0; >> + } >> >> target_speed = pcie_bwctrl_select_speed(port, speed_req); >> >> @@ -184,6 +186,7 @@ int pcie_set_target_speed(struct pci_dev *port, enum pci_bus_speed speed_req, >> mutex_lock(&data->set_speed_mutex); >> >> ret = pcie_bwctrl_change_speed(port, target_speed, use_lt); >> + printk("========== %s %d, bwctl change speed ret:0x%x\n", __func__, __LINE__,ret); >> >> if (data) >> mutex_unlock(&data->set_speed_mutex); >> @@ -209,8 +212,10 @@ static void pcie_bwnotif_enable(struct pcie_device *srv) >> >> /* Count LBMS seen so far as one */ >> ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status); >> - if (ret == PCIBIOS_SUCCESSFUL && link_status & PCI_EXP_LNKSTA_LBMS) >> + if (ret == PCIBIOS_SUCCESSFUL && link_status & PCI_EXP_LNKSTA_LBMS) { >> + printk("==== %s %d lbms_count++\n", __func__, __LINE__); >> atomic_inc(&data->lbms_count); >> + } >> >> pcie_capability_set_word(port, PCI_EXP_LNKCTL, >> PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE); >> @@ -239,6 +244,7 @@ static irqreturn_t pcie_bwnotif_irq(int irq, void *context) >> int ret; >> >> ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status); >> + printk("==== %s %d(start running),link_status:0x%x\n", __func__, __LINE__,link_status); >> if (ret != PCIBIOS_SUCCESSFUL) >> return IRQ_NONE; >> >> @@ 
-246,8 +252,10 @@ static irqreturn_t pcie_bwnotif_irq(int irq, void *context) >> if (!events) >> return IRQ_NONE; >> >> - if (events & PCI_EXP_LNKSTA_LBMS) >> + if (events & PCI_EXP_LNKSTA_LBMS) { >> + printk("==== %s %d lbms_count++\n", __func__, __LINE__); >> atomic_inc(&data->lbms_count); >> + } >> >> pcie_capability_write_word(port, PCI_EXP_LNKSTA, events); >> >> @@ -258,6 +266,7 @@ static irqreturn_t pcie_bwnotif_irq(int irq, void *context) >> * cleared to avoid missing link speed changes. >> */ >> pcie_update_link_speed(port->subordinate); >> + printk("==== %s %d(stop running),link_status:0x%x\n", __func__, __LINE__,link_status); >> >> return IRQ_HANDLED; >> } >> @@ -268,8 +277,10 @@ void pcie_reset_lbms_count(struct pci_dev *port) >> >> guard(rwsem_read)(&pcie_bwctrl_lbms_rwsem); >> data = port->link_bwctrl; >> - if (data) >> + if (data) { >> + printk("==== %s %d lbms_count set to 0\n", __func__, __LINE__); >> atomic_set(&data->lbms_count, 0); >> + } >> else >> pcie_capability_write_word(port, PCI_EXP_LNKSTA, >> PCI_EXP_LNKSTA_LBMS); >> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c >> index 76f4df75b08a..a602f9aa5d6a 100644 >> --- a/drivers/pci/quirks.c >> +++ b/drivers/pci/quirks.c >> @@ -41,8 +41,11 @@ static bool pcie_lbms_seen(struct pci_dev *dev, u16 lnksta) >> int ret; >> >> ret = pcie_lbms_count(dev, &count); >> - if (ret < 0) >> + if (ret < 0) { >> + printk("==== %s %d lnksta(0x%x) & LBMS\n", __func__, __LINE__, lnksta); >> return lnksta & PCI_EXP_LNKSTA_LBMS; >> + } >> + printk("==== %s %d count:0x%lx\n", __func__, __LINE__, count); >> >> return count > 0; >> } >> @@ -110,6 +113,8 @@ int pcie_failed_link_retrain(struct pci_dev *dev) >> >> pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); >> pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); >> + pci_info(dev, "============ %s %d, lnkctl2:0x%x, lnksta:0x%x\n", >> + __func__, __LINE__, lnkctl2, lnksta); >> if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) { 
>> u16 oldlnkctl2 = lnkctl2; >> >> @@ -121,9 +126,14 @@ int pcie_failed_link_retrain(struct pci_dev *dev) >> pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2), >> true); >> return ret; >> + } else { >> + pci_info(dev, "retraining sucessfully, but now is in Gen 1\n"); >> } >> >> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); >> pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); >> + pci_info(dev, "============ %s %d, oldlnkctl2:0x%x,newlnkctl2:0x%x,newlnksta:0x%x\n", >> + __func__, __LINE__, oldlnkctl2, lnkctl2, lnksta); >> } >> >> if ((lnksta & PCI_EXP_LNKSTA_DLLLA) && >> >> -------------------------------diff file----------------------------------------- >> >> Based on the information in the log from 566.755596 to 567.801220, the issue >> has been reproduced. Between 566 and 567 seconds, the pcie_bwnotif_irq interrupt >> was triggered 4 times, this indicates that during this period, the NVMe drive >> was plugged and unplugged multiple times. >> >> Thanks, >> Regards, >> Jiwei >> >>> didn't explain LBMS (nor DLLLA) in the above sequence so it's hard to >>> follow what is going on here. LBMS in particular is of high interest here >>> because I'm trying to understand if something should clear it on the >>> hotplug side (there's already one call to clear it in remove_board()). >>> >>> In step 2 (pcie_set_target_speed() in step 1 succeeded), >>> pcie_failed_link_retrain() attempts to restore >2.5GT/s speed, this only >>> occurs when pci_match_id() matches. I guess you're trying to say that step >>> 2 is not taken because pci_match_id() is not matching but the wording >>> above is very confusing. >>> >>> Overall, I failed to understand the scenario here fully despite trying to >>> think it through over these few days. >>> >>>> the target link speed >>>> field of the Link Control >>>> 2 Register will keep 0x1. >>>> >>>> In order to fix the issue, don't do the retraining work except ASMedia >>>> ASM2824. 
>>>> >>>> Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures") >>>> Reported-by: Adrian Huang <ahuang12@lenovo.com> >>>> Signed-off-by: Jiwei Sun <sunjw10@lenovo.com> >>>> --- >>>> drivers/pci/quirks.c | 6 ++++-- >>>> 1 file changed, 4 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c >>>> index 605628c810a5..ff04ebd9ae16 100644 >>>> --- a/drivers/pci/quirks.c >>>> +++ b/drivers/pci/quirks.c >>>> @@ -104,6 +104,9 @@ int pcie_failed_link_retrain(struct pci_dev *dev) >>>> u16 lnksta, lnkctl2; >>>> int ret = -ENOTTY; >>>> >>>> + if (!pci_match_id(ids, dev)) >>>> + return 0; >>>> + >>>> if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) || >>>> !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting) >>>> return ret; >>>> @@ -129,8 +132,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) >>>> } >>>> >>>> if ((lnksta & PCI_EXP_LNKSTA_DLLLA) && >>>> - (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT && >>>> - pci_match_id(ids, dev)) { >>>> + (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) { >>>> u32 lnkcap; >>>> >>>> pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n"); >>>> >>> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [External] : [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing 2025-01-10 13:44 [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing Jiwei Sun 2025-01-11 16:00 ` Maciej W. Rozycki 2025-01-13 15:08 ` Ilpo Järvinen @ 2025-09-09 12:33 ` ALOK TIWARI 2 siblings, 0 replies; 13+ messages in thread From: ALOK TIWARI @ 2025-09-09 12:33 UTC (permalink / raw) To: Jiwei Sun, macro, ilpo.jarvinen, bhelgaas Cc: linux-pci, linux-kernel, guojinhui.liam, helgaas, lukas, ahuang12, sunjw10 On 1/10/2025 7:14 PM, Jiwei Sun wrote: > From: Jiwei Sun <sunjw10@lenovo.com> > > When we do the quick hot-add/hot-remove test (within 1 second) with a PCIE > Gen 5 NVMe disk, there is a possibility that the PCIe bridge will decrease > to 2.5GT/s from 32GT/s > > pcieport 10002:00:04.0: pciehp: Slot(75): Link Down > pcieport 10002:00:04.0: pciehp: Slot(75): Card present > pcieport 10002:00:04.0: pciehp: Slot(75): No device found > ... > pcieport 10002:00:04.0: pciehp: Slot(75): Card present > pcieport 10002:00:04.0: pciehp: Slot(75): No device found > pcieport 10002:00:04.0: pciehp: Slot(75): Card present > pcieport 10002:00:04.0: pciehp: Slot(75): No device found > pcieport 10002:00:04.0: pciehp: Slot(75): Card present > pcieport 10002:00:04.0: pciehp: Slot(75): No device found > pcieport 10002:00:04.0: pciehp: Slot(75): Card present > pcieport 10002:00:04.0: pciehp: Slot(75): No device found > pcieport 10002:00:04.0: pciehp: Slot(75): Card present > pcieport 10002:00:04.0: pciehp: Slot(75): No device found > pcieport 10002:00:04.0: pciehp: Slot(75): Card present > pcieport 10002:00:04.0: broken device, retraining non-functional downstream link at 2.5GT/s > pcieport 10002:00:04.0: pciehp: Slot(75): No link > pcieport 10002:00:04.0: pciehp: Slot(75): Card present > pcieport 10002:00:04.0: pciehp: Slot(75): Link Up > pcieport 10002:00:04.0: pciehp: Slot(75): No device found > pcieport 10002:00:04.0: pciehp: Slot(75): Card present > 
pcieport 10002:00:04.0: pciehp: Slot(75): No device found > pcieport 10002:00:04.0: pciehp: Slot(75): Card present > pci 10002:02:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint > pci 10002:02:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit] > pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit] > pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs > pci 10002:02:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 10002:00:04.0 (capable of 126.028 Gb/s with 32.0 GT/s PCIe x4 link) > > If a NVMe disk is hot removed, the pciehp interrupt will be triggered, and > the kernel thread pciehp_ist will be woken up, the > pcie_failed_link_retrain() will be called as the following call trace. > > irq/87-pciehp-2524 [121] ..... 152046.006765: pcie_failed_link_retrain <-pcie_wait_for_link > irq/87-pciehp-2524 [121] ..... 152046.006782: <stack trace> > => [FTRACE TRAMPOLINE] > => pcie_failed_link_retrain > => pcie_wait_for_link > => pciehp_check_link_status > => pciehp_enable_slot > => pciehp_handle_presence_or_link_change > => pciehp_ist > => irq_thread_fn > => irq_thread > => kthread > => ret_from_fork > => ret_from_fork_asm > > Accorind to investigation, the issue is caused by the following scenerios, > > NVMe disk pciehp hardirq > hot-remove top-half pciehp irq kernel thread > ====================================================================== > pciehp hardirq > will be triggered > cpu handle pciehp > hardirq > pciehp irq kthread will > be woken up > pciehp_ist > ... > pcie_failed_link_retrain > read PCI_EXP_LNKCTL2 register > read PCI_EXP_LNKSTA register > If NVMe disk > hot-add before > calling pcie_retrain_link() > set target speed to 2_5GT > pcie_bwctrl_change_speed > pcie_retrain_link > : the retrain work will be > successful, because > pci_match_id() will be > 0 in > pcie_failed_link_retrain() > the target link speed > field of the Link Control > 2 Register will keep 0x1. 
> > In order to fix the issue, don't do the retraining work except ASMedia > ASM2824. > > Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures") > Reported-by: Adrian Huang <ahuang12@lenovo.com> > Signed-off-by: Jiwei Sun <sunjw10@lenovo.com> > --- > drivers/pci/quirks.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index 605628c810a5..ff04ebd9ae16 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -104,6 +104,9 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > u16 lnksta, lnkctl2; > int ret = -ENOTTY; > > + if (!pci_match_id(ids, dev)) > + return 0; > + > if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) || > !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting) > return ret; > @@ -129,8 +132,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > } > > if ((lnksta & PCI_EXP_LNKSTA_DLLLA) && > - (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT && > - pci_match_id(ids, dev)) { > + (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) { > u32 lnkcap; > > pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n"); Sorry for the noise. As a follow-up to this patch, we are seeing a similar issue in our testing. After applying this patch, things look good. Do we have a final fix for this? Thanks, Alok ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
@ 2025-12-01 3:52 Maciej W. Rozycki
2025-12-04 1:40 ` [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing Matthew W Carlis
0 siblings, 1 reply; 13+ messages in thread
From: Maciej W. Rozycki @ 2025-12-01 3:52 UTC (permalink / raw)
To: Bjorn Helgaas, Matthew W Carlis, ALOK TIWARI
Cc: ashishk, bamstadt, msaggi, sconnor, Lukas Wunner,
Ilpo Järvinen, Jiwei, guojinhui.liam, ahuang12, sunjw10,
linux-pci, linux-kernel
Discard Vendor:Device ID matching in the PCIe failed link retraining
quirk and ignore the link status for the removal of the 2.5GT/s speed
clamp, whether applied by the quirk itself or the firmware earlier on.
Revert to the original target link speed if this final link retraining
has failed.
This is so that link training noise in hot-plug scenarios does not make
a link remain clamped to the 2.5GT/s speed where an event race has led
the quirk to apply the speed clamp for one device, only to leave it in
place for a subsequent device to be plugged in.
Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: <stable@vger.kernel.org> # v6.5+
---
drivers/pci/quirks.c | 50 ++++++++++++++++++--------------------------------
1 file changed, 18 insertions(+), 32 deletions(-)
linux-pcie-failed-link-retrain-unclamp-always.diff
Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -79,11 +79,10 @@ static bool pcie_lbms_seen(struct pci_de
* Restrict the speed to 2.5GT/s then with the Target Link Speed field,
* request a retrain and check the result.
*
- * If this turns out successful and we know by the Vendor:Device ID it is
- * safe to do so, then lift the restriction, letting the devices negotiate
- * a higher speed. Also check for a similar 2.5GT/s speed restriction the
- * firmware may have already arranged and lift it with ports that already
- * report their data link being up.
+ * If this turns out successful, or where a 2.5GT/s speed restriction has
+ * been previously arranged by the firmware and the port reports its link
+ * already being up, lift the restriction, in a hope it is safe to do so,
+ * letting the devices negotiate a higher speed.
*
* Otherwise revert the speed to the original setting and request a retrain
* again to remove any residual state, ignoring the result as it's supposed
@@ -94,52 +93,39 @@ static bool pcie_lbms_seen(struct pci_de
*/
int pcie_failed_link_retrain(struct pci_dev *dev)
{
- static const struct pci_device_id ids[] = {
- { PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
- {}
- };
- u16 lnksta, lnkctl2;
+ u16 lnksta, lnkctl2, oldlnkctl2;
int ret = -ENOTTY;
+ u32 lnkcap;
if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
!pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
return ret;
pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &oldlnkctl2);
if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) {
- u16 oldlnkctl2;
-
pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
- pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &oldlnkctl2);
ret = pcie_set_target_speed(dev, PCIE_SPEED_2_5GT, false);
- if (ret) {
- pci_info(dev, "retraining failed\n");
- pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2),
- true);
- return ret;
- }
-
- pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ if (ret)
+ goto err;
}
pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
-
- if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
- (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
- pci_match_id(ids, dev)) {
- u32 lnkcap;
-
+ pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+ if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
+ (lnkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB) {
pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
- pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
ret = pcie_set_target_speed(dev, PCIE_LNKCAP_SLS2SPEED(lnkcap), false);
- if (ret) {
- pci_info(dev, "retraining failed\n");
- return ret;
- }
+ if (ret)
+ goto err;
}
return ret;
+err:
+ pci_info(dev, "retraining failed\n");
+ pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2), true);
+ return ret;
}
static ktime_t fixup_debug_start(struct pci_dev *dev,
* Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
  2025-12-01  3:52 [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining Maciej W. Rozycki
@ 2025-12-04  1:40 ` Matthew W Carlis
  2025-12-04 23:43   ` Maciej W. Rozycki
  0 siblings, 1 reply; 13+ messages in thread
From: Matthew W Carlis @ 2025-12-04 1:40 UTC (permalink / raw)
To: macro
Cc: ahuang12, alok.a.tiwari, ashishk, bamstadt, bhelgaas,
	guojinhui.liam, ilpo.jarvinen, jiwei.sun.bj, linux-kernel,
	linux-pci, lukas, mattc, msaggi, sconnor, sunjw10

On Mon, 1 Dec 2025, Maciej W. Rozycki wrote:

> Discard Vendor:Device ID matching in the PCIe failed link retraining
> quirk and ignore the link status for the removal of the 2.5GT/s speed
> clamp, whether applied by the quirk itself or the firmware earlier on.
> Revert to the original target link speed if this final link retraining
> has failed.

I think we should either remove the quirk or only execute the quirk when
the downstream port is the specific ASMedia 0x2824. Hardware companies
that develop PCIe devices rely on the linux kernel for a significant
amount of their testing & the action taken by this quirk is going to
introduce noise into those tests by initiating unexpected speed changes
etc. As long as we have this quirk messing with link speeds we'll just
continue to see issue reports over time in my opinion.
* Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
  2025-12-04  1:40 ` [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing Matthew W Carlis
@ 2025-12-04 23:43 ` Maciej W. Rozycki
  0 siblings, 0 replies; 13+ messages in thread
From: Maciej W. Rozycki @ 2025-12-04 23:43 UTC (permalink / raw)
To: Matthew W Carlis
Cc: ahuang12, alok.a.tiwari, ashishk, Bjorn Helgaas, guojinhui.liam,
	Ilpo Järvinen, jiwei.sun.bj, linux-kernel, linux-pci,
	Lukas Wunner, msaggi, sconnor, sunjw10

On Wed, 3 Dec 2025, Matthew W Carlis wrote:

> > Discard Vendor:Device ID matching in the PCIe failed link retraining
> > quirk and ignore the link status for the removal of the 2.5GT/s speed
> > clamp, whether applied by the quirk itself or the firmware earlier on.
> > Revert to the original target link speed if this final link retraining
> > has failed.
>
> I think we should either remove the quirk or only execute the quirk when
> the downstream port is the specific ASMedia 0x2824. Hardware companies
> that develop PCIe devices rely on the linux kernel for a significant
> amount of their testing & the action taken by this quirk is going to
> introduce noise into those tests by initiating unexpected speed changes
> etc.

Conversely, ISTM this could be good motivation for hardware designers to
reduce hot-plug noise. After all, LBMS is only supposed to be ever set
for links in the active state and not while training, so perhaps
debouncing is needed for the transient state?

> As long as we have this quirk messing with link speeds we'll just
> continue to see issue reports over time in my opinion.

Well, the issues happened because I made an unfortunate design decision
with the original implementation, which did not clean up after itself,
just because I have no use for hot-plug scenarios and didn't envisage it
could be causing issues.
The most recent update brings the device back to its original state
whether retraining succeeded or not, so it should now be transparent to
your noisy hot-plug scenarios, while helping stubborn devices at the
same time. You might not have noticed this code existed if it had been
in its currently proposed shape right from the beginning. It's only
those who do nothing that make no mistakes.

  Maciej
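The debouncing idea floated in this exchange (only trusting a latched status bit such as LBMS once it has been stable across several samples, rather than reacting to transients seen while the link is still training) can be sketched as follows. This is a hypothetical illustration; the demo_ names are not kernel APIs, and the callback merely stands in for a config-space read.

```c
#include <assert.h>
#include <stdint.h>

/* Canned sequence of register samples, standing in for repeated
 * config-space reads of the Link Status register. */
struct demo_seq {
	const uint16_t *vals;
	int idx;
};

static uint16_t demo_sample(void *p)
{
	struct demo_seq *s = p;
	return s->vals[s->idx++];	/* return the next canned sample */
}

/* Report the bit as genuinely set only if it stays set across 'need'
 * consecutive samples; any dropout is treated as a transient. */
static int demo_debounced_set(uint16_t (*sample)(void *), void *ctx,
			      uint16_t mask, int need)
{
	while (need--)
		if (!(sample(ctx) & mask))
			return 0;	/* bit dropped: transient, ignore */
	return 1;			/* bit stable across all samples */
}
```

A real implementation would also need a delay between samples and a bound on total wait time; those details are deliberately left out of the sketch.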
end of thread, other threads:[~2025-12-04 23:44 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2025-01-10 13:44 [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing Jiwei Sun
2025-01-11 16:00 ` Maciej W. Rozycki
2025-01-13 12:44   ` Jiwei
2025-01-13 15:08     ` Ilpo Järvinen
2025-01-14 15:04       ` Jiwei
2025-01-14 18:25         ` Ilpo Järvinen
2025-01-15 10:18           ` Lukas Wunner
2025-11-25 19:23             ` [External] : " ALOK TIWARI
2025-12-01  3:54               ` Maciej W. Rozycki
2025-01-15 11:39           ` Jiwei
2025-09-09 12:33         ` [External] : " ALOK TIWARI
-- strict thread matches above, loose matches on Subject: below --
2025-12-01  3:52 [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining Maciej W. Rozycki
2025-12-04  1:40 ` [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing Matthew W Carlis
2025-12-04 23:43   ` Maciej W. Rozycki