public inbox for linux-pci@vger.kernel.org
* [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
@ 2025-06-11 10:14 Hui Wang
  2025-06-12 16:48 ` Bjorn Helgaas
  2025-08-14 15:55 ` Bjorn Helgaas
  0 siblings, 2 replies; 21+ messages in thread
From: Hui Wang @ 2025-06-11 10:14 UTC (permalink / raw)
  To: linux-pci, bhelgaas
  Cc: raphael.norwitz, alay.shah, suresh.gumpula, ilpo.jarvinen,
	hui.wang

Prior to commit d591f6804e7e ("PCI: Wait for device readiness with
Configuration RRS"), this Intel NVMe device [8086:0a54] worked well.
Since that commit was merged, the device stops working.

Through debugging, we found that the commit introduces RRS polling in
pci_dev_wait(). For this device, polling PCI_VENDOR_ID returns ~0 if
the config access is not ready yet, but the polling expects either
0x0001 (the RRS completion value) or a valid Vendor ID, so RRS polling
doesn't work for this device.

Add a PCI quirk to disable RRS polling for the device.

Fixes: d591f6804e7e ("PCI: Wait for device readiness with Configuration RRS")
Link: https://bugs.launchpad.net/bugs/2111521
Signed-off-by: Hui Wang <hui.wang@canonical.com>
---
 drivers/pci/probe.c  |  3 ++-
 drivers/pci/quirks.c | 18 ++++++++++++++++++
 include/linux/pci.h  |  2 ++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 4b8693ec9e4c..848fa0e6cf60 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1270,7 +1270,8 @@ static void pci_enable_rrs_sv(struct pci_dev *pdev)
 	if (root_cap & PCI_EXP_RTCAP_RRS_SV) {
 		pcie_capability_set_word(pdev, PCI_EXP_RTCTL,
 					 PCI_EXP_RTCTL_RRS_SVE);
-		pdev->config_rrs_sv = 1;
+		if (!(pdev->dev_flags & PCI_DEV_FLAGS_NO_RRS_SV))
+			pdev->config_rrs_sv = 1;
 	}
 }
 
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index cf483d82572c..519e48ff6448 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6336,3 +6336,21 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
 #endif
+
+/*
+ * Although the Root Port claims to support RRS, some devices don't work
+ * with RRS polling: when the Vendor ID is read, they just return ~0 if
+ * the config access is not ready, which breaks pci_dev_wait().  Forcibly
+ * disable RRS polling for this type of device.
+ */
+static void quirk_no_rrs_sv(struct pci_dev *dev)
+{
+	struct pci_dev *root;
+
+	root = pcie_find_root_port(dev);
+	if (root) {
+		root->dev_flags |= PCI_DEV_FLAGS_NO_RRS_SV;
+		root->config_rrs_sv = 0;
+	}
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0a54, quirk_no_rrs_sv);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 05e68f35f392..f4dd9ada12e4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -247,6 +247,8 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_HAS_MSI_MASKING = (__force pci_dev_flags_t) (1 << 12),
 	/* Device requires write to PCI_MSIX_ENTRY_DATA before any MSIX reads */
 	PCI_DEV_FLAGS_MSIX_TOUCH_ENTRY_DATA_FIRST = (__force pci_dev_flags_t) (1 << 13),
+	/* Do not use Configuration Request Retry Status polling in pci_dev_wait() */
+	PCI_DEV_FLAGS_NO_RRS_SV = (__force pci_dev_flags_t) (1 << 14),
 };
 
 enum pci_irq_reroute_variant {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-06-11 10:14 [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme Hui Wang
@ 2025-06-12 16:48 ` Bjorn Helgaas
  2025-06-16 11:55   ` Hui Wang
  2025-08-14 15:55 ` Bjorn Helgaas
  1 sibling, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2025-06-12 16:48 UTC (permalink / raw)
  To: Hui Wang
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Nirmal Patel, Jonathan Derrick

[+cc VMD folks]

On Wed, Jun 11, 2025 at 06:14:42PM +0800, Hui Wang wrote:
> Prior to commit d591f6804e7e ("PCI: Wait for device readiness with
> Configuration RRS"), this Intel nvme [8086:0a54] works well. Since
> that patch is merged to the kernel, this nvme stops working.
> 
> Through debugging, we found that commit introduces the RRS polling in
> the pci_dev_wait(), for this nvme, when polling the PCI_VENDOR_ID, it
> will return ~0 if the config access is not ready yet, but the polling
> expects a return value of 0x0001 or a valid vendor_id, so the RRS
> polling doesn't work for this nvme.

Sorry for breaking this, and thanks for all your work in debugging
this!  Issues like this are really hard to track down.

I would think we would have heard about this earlier if the NVMe
device were broken on all systems.  Maybe there's some connection with
VMD?  From the non-working dmesg log in your bug report
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/+attachment/5879970/+files/dmesg-60.txt):

  DMI: ASUSTeK COMPUTER INC. ESC8000 G4/Z11PG-D24 Series, BIOS 5501 04/17/2019
  vmd 0000:d7:05.5: PCI host bridge to bus 10000:00
  pci 10000:00:02.0: [8086:2032] type 01 class 0x060400 PCIe Root Port
  pci 10000:00:02.0: PCI bridge to [bus 01]
  pci 10000:00:02.0: bridge window [mem 0xf8000000-0xf81fffff]: assigned
  pci 10000:01:00.0: [8086:0a54] type 00 class 0x010802 PCIe Endpoint
  pci 10000:01:00.0: BAR 0 [mem 0x00000000-0x00003fff 64bit]

  <I think vmd_enable_domain() calls pci_reset_bus() here>

  pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
  pci 10000:01:00.0: BAR 0: error updating (high 0x00000000 != 0xffffffff)
  pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
  pci 10000:01:00.0: BAR 0: error updating (0xf8010004 != 0xffffffff)
  nvme nvme0: pci function 10000:01:00.0
  nvme 10000:01:00.0: enabling device (0000 -> 0002)

Things I notice:

  - The 10000:01:00.0 NVMe device is behind a VMD bridge

  - We successfully read the Vendor & Device IDs (8086:0a54)

  - The NVMe device is uninitialized.  We successfully sized the BAR,
    which included successful config reads and writes.  The BAR
    wasn't assigned by BIOS, which is normal since it's behind VMD.

  - We allocated space for BAR 0 but the config writes to program the
    BAR failed.  The read back from the BAR was 0xffffffff; probably a
    PCIe error, e.g., the NVMe device didn't respond.

  - The device *did* respond when nvme_probe() enabled it: the
    "enabling device (0000 -> 0002)" means pci_enable_resources() read
    PCI_COMMAND and got 0x0000.

  - The dmesg from the working config doesn't include the "enabling
    device" line, which suggests that pci_enable_resources() saw
    PCI_COMMAND_MEMORY (0x0002) already set and didn't bother setting
    it again.  I don't know why it would already be set.
   
d591f6804e7e really only changes pci_dev_wait(), which is used after
device resets.  I think vmd_enable_domain() resets the VMD Root Ports
after pci_scan_child_bus(), and maybe we're not waiting long enough
afterwards.

My guess is that we got the ~0 because we did a config read too soon
after reset and the device didn't respond.  The Root Port would time
out, log an error, and synthesize ~0 data to complete the CPU read
(see PCIe r6.0, sec 2.3.2 implementation note).

It's *possible* that we waited long enough but the NVMe device is
broken and didn't respond when it should have, but my money is on a
software defect.

There are a few pci_dbg() calls about these delays; can you set
CONFIG_DYNAMIC_DEBUG=y and boot with dyndbg="file drivers/pci/* +p" to
collect that output?  Please also collect the "sudo lspci -vv" output
from a working system.

Bjorn


* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-06-12 16:48 ` Bjorn Helgaas
@ 2025-06-16 11:55   ` Hui Wang
  2025-06-16 13:38     ` Hui Wang
  0 siblings, 1 reply; 21+ messages in thread
From: Hui Wang @ 2025-06-16 11:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Nirmal Patel, Jonathan Derrick


On 6/13/25 00:48, Bjorn Helgaas wrote:
> [+cc VMD folks]
>
> On Wed, Jun 11, 2025 at 06:14:42PM +0800, Hui Wang wrote:
>> Prior to commit d591f6804e7e ("PCI: Wait for device readiness with
>> Configuration RRS"), this Intel nvme [8086:0a54] works well. Since
>> that patch is merged to the kernel, this nvme stops working.
>>
>> Through debugging, we found that commit introduces the RRS polling in
>> the pci_dev_wait(), for this nvme, when polling the PCI_VENDOR_ID, it
>> will return ~0 if the config access is not ready yet, but the polling
>> expects a return value of 0x0001 or a valid vendor_id, so the RRS
>> polling doesn't work for this nvme.
> Sorry for breaking this, and thanks for all your work in debugging
> this!  Issues like this are really hard to track down.
>
> I would think we would have heard about this earlier if the NVMe
> device were broken on all systems.  Maybe there's some connection with
> VMD?  From the non-working dmesg log in your bug report
> (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/+attachment/5879970/+files/dmesg-60.txt):
>
>    DMI: ASUSTeK COMPUTER INC. ESC8000 G4/Z11PG-D24 Series, BIOS 5501 04/17/2019
>    vmd 0000:d7:05.5: PCI host bridge to bus 10000:00
>    pci 10000:00:02.0: [8086:2032] type 01 class 0x060400 PCIe Root Port
>    pci 10000:00:02.0: PCI bridge to [bus 01]
>    pci 10000:00:02.0: bridge window [mem 0xf8000000-0xf81fffff]: assigned
>    pci 10000:01:00.0: [8086:0a54] type 00 class 0x010802 PCIe Endpoint
>    pci 10000:01:00.0: BAR 0 [mem 0x00000000-0x00003fff 64bit]
>
>    <I think vmd_enable_domain() calls pci_reset_bus() here>

Yes, and pci_dev_wait() is called here. With the RRS polling, it will
get a ~0 from PCI_VENDOR_ID, and then 0xffffffff when configuring
BAR 0 subsequently. With the original polling method, it gets enough
delay in pci_dev_wait(), so the nvme works normally.

The line "[   10.193589] hhhhhhhhhhhhhhhhhhhhhhhhhhhh dev->device = 0a54 
id = ffffffff" is debug output from pci_dev_wait(); please refer to 
https://launchpadlibrarian.net/798708446/LP2111521-dmesg-test9.txt

>
>    pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
>    pci 10000:01:00.0: BAR 0: error updating (high 0x00000000 != 0xffffffff)
>    pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
>    pci 10000:01:00.0: BAR 0: error updating (0xf8010004 != 0xffffffff)
>    nvme nvme0: pci function 10000:01:00.0
>    nvme 10000:01:00.0: enabling device (0000 -> 0002)
>
> Things I notice:
>
>    - The 10000:01:00.0 NVMe device is behind a VMD bridge
>
>    - We successfully read the Vendor & Device IDs (8086:0a54)
>
>    - The NVMe device is uninitialized.  We successfully sized the BAR,
>      which included successful config reads and writes.  The BAR
>      wasn't assigned by BIOS, which is normal since it's behind VMD.
>
>    - We allocated space for BAR 0 but the config writes to program the
>      BAR failed.  The read back from the BAR was 0xffffffff; probably a
>      PCIe error, e.g., the NVMe device didn't respond.
>
>    - The device *did* respond when nvme_probe() enabled it: the
>      "enabling device (0000 -> 0002)" means pci_enable_resources() read
>      PCI_COMMAND and got 0x0000.
>
>    - The dmesg from the working config doesn't include the "enabling
>      device" line, which suggests that pci_enable_resources() saw
>      PCI_COMMAND_MEMORY (0x0002) already set and didn't bother setting
>      it again.  I don't know why it would already be set.
>     
> d591f6804e7e really only changes pci_dev_wait(), which is used after
> device resets.  I think vmd_enable_domain() resets the VMD Root Ports
> after pci_scan_child_bus(), and maybe we're not waiting long enough
> afterwards.
>
> My guess is that we got the ~0 because we did a config read too soon
> after reset and the device didn't respond.  The Root Port would time
> out, log an error, and synthesize ~0 data to complete the CPU read
> (see PCIe r6.0, sec 2.3.2 implementation note).
>
> It's *possible* that we waited long enough but the NVMe device is
> broken and didn't respond when it should have, but my money is on a
> software defect.
>
> There are a few pci_dbg() calls about these delays; can you set
> CONFIG_DYNAMIC_DEBUG=y and boot with dyndbg="file drivers/pci/* +p" to
> collect that output?  Please also collect the "sudo lspci -vv" output
> from a working system.

Already passed the testing request to the bug reporters; waiting for 
their feedback.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/55

Thanks,

Hui.


>
> Bjorn


* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-06-16 11:55   ` Hui Wang
@ 2025-06-16 13:38     ` Hui Wang
  2025-06-17 20:12       ` Bjorn Helgaas
  0 siblings, 1 reply; 21+ messages in thread
From: Hui Wang @ 2025-06-16 13:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Nirmal Patel, Jonathan Derrick


On 6/16/25 19:55, Hui Wang wrote:
>
> On 6/13/25 00:48, Bjorn Helgaas wrote:
>> [+cc VMD folks]
>>
>> On Wed, Jun 11, 2025 at 06:14:42PM +0800, Hui Wang wrote:
>>> Prior to commit d591f6804e7e ("PCI: Wait for device readiness with
>>> Configuration RRS"), this Intel nvme [8086:0a54] works well. Since
>>> that patch is merged to the kernel, this nvme stops working.
>>>
>>> Through debugging, we found that commit introduces the RRS polling in
>>> the pci_dev_wait(), for this nvme, when polling the PCI_VENDOR_ID, it
>>> will return ~0 if the config access is not ready yet, but the polling
>>> expects a return value of 0x0001 or a valid vendor_id, so the RRS
>>> polling doesn't work for this nvme.
>> Sorry for breaking this, and thanks for all your work in debugging
>> this!  Issues like this are really hard to track down.
>>
>> I would think we would have heard about this earlier if the NVMe
>> device were broken on all systems.  Maybe there's some connection with
>> VMD?  From the non-working dmesg log in your bug report
>> (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/+attachment/5879970/+files/dmesg-60.txt): 
>>
>>
>>    DMI: ASUSTeK COMPUTER INC. ESC8000 G4/Z11PG-D24 Series, BIOS 5501 
>> 04/17/2019
>>    vmd 0000:d7:05.5: PCI host bridge to bus 10000:00
>>    pci 10000:00:02.0: [8086:2032] type 01 class 0x060400 PCIe Root Port
>>    pci 10000:00:02.0: PCI bridge to [bus 01]
>>    pci 10000:00:02.0: bridge window [mem 0xf8000000-0xf81fffff]: 
>> assigned
>>    pci 10000:01:00.0: [8086:0a54] type 00 class 0x010802 PCIe Endpoint
>>    pci 10000:01:00.0: BAR 0 [mem 0x00000000-0x00003fff 64bit]
>>
>>    <I think vmd_enable_domain() calls pci_reset_bus() here>
>
> Yes, and the pci_dev_wait() is called here. With the RRS polling, will 
> get a ~0 from PCI_VENDOR_ID, then will get 0xfffffff when configuring 
> the BAR0 subsequently. With the original polling method, it will get 
> enough delay in the pci_dev_wait(), so nvme works normally.
>
> The line "[   10.193589] hhhhhhhhhhhhhhhhhhhhhhhhhhhh dev->device = 
> 0a54 id = ffffffff" is output from pci_dev_wait(), please refer to 
> https://launchpadlibrarian.net/798708446/LP2111521-dmesg-test9.txt
>
>>
>>    pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
>>    pci 10000:01:00.0: BAR 0: error updating (high 0x00000000 != 
>> 0xffffffff)
>>    pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
>>    pci 10000:01:00.0: BAR 0: error updating (0xf8010004 != 0xffffffff)
>>    nvme nvme0: pci function 10000:01:00.0
>>    nvme 10000:01:00.0: enabling device (0000 -> 0002)
>>
>> Things I notice:
>>
>>    - The 10000:01:00.0 NVMe device is behind a VMD bridge
>>
>>    - We successfully read the Vendor & Device IDs (8086:0a54)
>>
>>    - The NVMe device is uninitialized.  We successfully sized the BAR,
>>      which included successful config reads and writes.  The BAR
>>      wasn't assigned by BIOS, which is normal since it's behind VMD.
>>
>>    - We allocated space for BAR 0 but the config writes to program the
>>      BAR failed.  The read back from the BAR was 0xffffffff; probably a
>>      PCIe error, e.g., the NVMe device didn't respond.
>>
>>    - The device *did* respond when nvme_probe() enabled it: the
>>      "enabling device (0000 -> 0002)" means pci_enable_resources() read
>>      PCI_COMMAND and got 0x0000.
>>
>>    - The dmesg from the working config doesn't include the "enabling
>>      device" line, which suggests that pci_enable_resources() saw
>>      PCI_COMMAND_MEMORY (0x0002) already set and didn't bother setting
>>      it again.  I don't know why it would already be set.
>>     d591f6804e7e really only changes pci_dev_wait(), which is used after
>> device resets.  I think vmd_enable_domain() resets the VMD Root Ports
>> after pci_scan_child_bus(), and maybe we're not waiting long enough
>> afterwards.
>>
>> My guess is that we got the ~0 because we did a config read too soon
>> after reset and the device didn't respond.  The Root Port would time
>> out, log an error, and synthesize ~0 data to complete the CPU read
>> (see PCIe r6.0, sec 2.3.2 implementation note).
>>
>> It's *possible* that we waited long enough but the NVMe device is
>> broken and didn't respond when it should have, but my money is on a
>> software defect.
>>
>> There are a few pci_dbg() calls about these delays; can you set
>> CONFIG_DYNAMIC_DEBUG=y and boot with dyndbg="file drivers/pci/* +p" to
>> collect that output?  Please also collect the "sudo lspci -vv" output
>> from a working system.
>
> Already passed the testing request to bug reporters, wait for their 
> feedback.
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/55
>
> Thanks,
>
> Hui.

This is the dmesg with dyndbg="file drivers/pci/* +p"

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/56

And this is the lspci output:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/57

>
>
>>
>> Bjorn


* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-06-16 13:38     ` Hui Wang
@ 2025-06-17 20:12       ` Bjorn Helgaas
  2025-06-23 22:58         ` Bjorn Helgaas
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2025-06-17 20:12 UTC (permalink / raw)
  To: Hui Wang
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Nirmal Patel, Jonathan Derrick,
	Chaitanya Kumar Borah, Ville Syrjälä

[+cc Chaitanya, Ville for possible Intel NVMe contacts]

On Mon, Jun 16, 2025 at 09:38:25PM +0800, Hui Wang wrote:
> On 6/16/25 19:55, Hui Wang wrote:
> > On 6/13/25 00:48, Bjorn Helgaas wrote:
> > > [+cc VMD folks]
> > > 
> > > On Wed, Jun 11, 2025 at 06:14:42PM +0800, Hui Wang wrote:
> > > > Prior to commit d591f6804e7e ("PCI: Wait for device readiness with
> > > > Configuration RRS"), this Intel nvme [8086:0a54] works well. Since
> > > > that patch is merged to the kernel, this nvme stops working.
> > > > 
> > > > Through debugging, we found that commit introduces the RRS polling in
> > > > the pci_dev_wait(), for this nvme, when polling the PCI_VENDOR_ID, it
> > > > will return ~0 if the config access is not ready yet, but the polling
> > > > expects a return value of 0x0001 or a valid vendor_id, so the RRS
> > > > polling doesn't work for this nvme.
> > >
> > > Sorry for breaking this, and thanks for all your work in debugging
> > > this!  Issues like this are really hard to track down.
> > > 
> > > I would think we would have heard about this earlier if the NVMe
> > > device were broken on all systems.  Maybe there's some connection with
> > > VMD?  From the non-working dmesg log in your bug report
> > > (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/+attachment/5879970/+files/dmesg-60.txt):
> > > 
> > > 
> > >    DMI: ASUSTeK COMPUTER INC. ESC8000 G4/Z11PG-D24 Series, BIOS 5501
> > > 04/17/2019
> > >    vmd 0000:d7:05.5: PCI host bridge to bus 10000:00
> > >    pci 10000:00:02.0: [8086:2032] type 01 class 0x060400 PCIe Root Port
> > >    pci 10000:00:02.0: PCI bridge to [bus 01]
> > >    pci 10000:00:02.0: bridge window [mem 0xf8000000-0xf81fffff]:
> > > assigned
> > >    pci 10000:01:00.0: [8086:0a54] type 00 class 0x010802 PCIe Endpoint
> > >    pci 10000:01:00.0: BAR 0 [mem 0x00000000-0x00003fff 64bit]
> > > 
> > >    <I think vmd_enable_domain() calls pci_reset_bus() here>
> > 
> > Yes, and the pci_dev_wait() is called here. With the RRS polling, will
> > get a ~0 from PCI_VENDOR_ID, then will get 0xfffffff when configuring
> > the BAR0 subsequently. With the original polling method, it will get
> > enough delay in the pci_dev_wait(), so nvme works normally.
> > 
> > The line "[   10.193589] hhhhhhhhhhhhhhhhhhhhhhhhhhhh dev->device = 0a54
> > id = ffffffff" is output from pci_dev_wait(), please refer to
> > https://launchpadlibrarian.net/798708446/LP2111521-dmesg-test9.txt
> > 
> > >    pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
> > >    pci 10000:01:00.0: BAR 0: error updating (high 0x00000000 !=
> > > 0xffffffff)
> > >    pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
> > >    pci 10000:01:00.0: BAR 0: error updating (0xf8010004 != 0xffffffff)
> > >    nvme nvme0: pci function 10000:01:00.0
> > >    nvme 10000:01:00.0: enabling device (0000 -> 0002)
> > > 
> > > Things I notice:
> > > 
> > >    - The 10000:01:00.0 NVMe device is behind a VMD bridge
> > > 
> > >    - We successfully read the Vendor & Device IDs (8086:0a54)
> > > 
> > >    - The NVMe device is uninitialized.  We successfully sized the BAR,
> > >      which included successful config reads and writes.  The BAR
> > >      wasn't assigned by BIOS, which is normal since it's behind VMD.
> > > 
> > >    - We allocated space for BAR 0 but the config writes to program the
> > >      BAR failed.  The read back from the BAR was 0xffffffff; probably a
> > >      PCIe error, e.g., the NVMe device didn't respond.
> > > 
> > >    - The device *did* respond when nvme_probe() enabled it: the
> > >      "enabling device (0000 -> 0002)" means pci_enable_resources() read
> > >      PCI_COMMAND and got 0x0000.
> > > 
> > >    - The dmesg from the working config doesn't include the "enabling
> > >      device" line, which suggests that pci_enable_resources() saw
> > >      PCI_COMMAND_MEMORY (0x0002) already set and didn't bother setting
> > >      it again.  I don't know why it would already be set.
> > >      d591f6804e7e really only changes pci_dev_wait(), which is used after
> > >      device resets.  I think vmd_enable_domain() resets the VMD Root Ports
> > >      after pci_scan_child_bus(), and maybe we're not waiting long enough
> > >      afterwards.
> > > 
> > > My guess is that we got the ~0 because we did a config read too soon
> > > after reset and the device didn't respond.  The Root Port would time
> > > out, log an error, and synthesize ~0 data to complete the CPU read
> > > (see PCIe r6.0, sec 2.3.2 implementation note).
> > > 
> > > It's *possible* that we waited long enough but the NVMe device is
> > > broken and didn't respond when it should have, but my money is on a
> > > software defect.
> > > 
> > > There are a few pci_dbg() calls about these delays; can you set
> > > CONFIG_DYNAMIC_DEBUG=y and boot with dyndbg="file drivers/pci/* +p" to
> > > collect that output?  Please also collect the "sudo lspci -vv" output
> > > from a working system.
> > 
> > Already passed the testing request to bug reporters, wait for their
> > feedback.
> > 
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/55
> 
> This is the dmesg with dyndbg="file drivers/pci/* +p"
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/56

Thank you very much!  I'm stumped.

Both Ports here are capable of 8GT/s, and the Root Port is hot-plug
capable and supports Data Link Layer Link Active Reporting but not DRS:

  10000:00:02.0 Intel Sky Lake-E PCI Express Root Port C
    LnkCap: Port #11, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <16us
            ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
    LnkSta: Speed 8GT/s, Width x4
            TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
    SltCap: AttnBtn- PwrCtrl- MRL- AttnInd+ PwrInd+ HotPlug+ Surprise+
    LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-

  10000:01:00.0 Intel NVMe Datacenter SSD
    LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s, Exit Latency L0s <64ns

After a reset, we have to delay before doing config reads.  The wait
time requirements are in PCIe r6.0, sec 6.6.1:

  • ... For cases where system software cannot determine that DRS is
    supported by the attached device, or by the Downstream Port above
    the attached device [as in this case]:

    ◦ ...

    ◦ With a Downstream Port that supports Link speeds greater than
      5.0 GT/s, software must wait a minimum of 100 ms after Link
      training completes before sending a Configuration Request to the
      device immediately below that Port. Software can determine when
      Link training completes by polling the Data Link Layer Link
      Active bit or by setting up an associated interrupt (see §
      Section 6.7.3.3). It is strongly recommended for software to
      use this mechanism whenever the Downstream Port supports it.

Since the Port supports 8GT/s, we should wait 100ms after Link training
completes (PCI_EXP_LNKSTA_DLLLA becomes set), and based on the code
path below, I think we *do*:

  pci_bridge_secondary_bus_reset
    pcibios_reset_secondary_bus
      pci_reset_secondary_bus
        # assert PCI_BRIDGE_CTL_BUS_RESET
    pci_bridge_wait_for_secondary_bus(bridge, "bus reset")
      pci_dbg("waiting %d ms for downstream link, after activation\n") # 10.86
      delay = pci_bus_max_d3cold_delay()    # default 100ms
      pcie_wait_for_link_delay(bridge, active=true, delay=100)
        pcie_wait_for_link_status(use_lt=false, active=true)
          end_jiffies = jiffies + PCIE_LINK_RETRAIN_TIMEOUT_MS
          do {
            pcie_capability_read_word(PCI_EXP_LNKSTA, &lnksta)
            if ((lnksta & PCI_EXP_LNKSTA_DLLLA) == PCI_EXP_LNKSTA_DLLLA)
              return 0
            msleep(1)                       # likely wait several ms for link active
          } while (time_before(jiffies, end_jiffies))
        if (active)                         # (true)
          msleep(delay)                     # wait 100ms here
      pci_dev_wait(child, "bus reset", PCIE_RESET_READY_POLL_MS - delay)
        delay = 1
        for (;;) {
          pci_read_config_dword(PCI_VENDOR_ID, &id)
          if (!pci_bus_rrs_vendor_id(id))   # got 0xffff, assume valid
            break;
        }
        pci_dbg("ready 0ms after bus reset", delay - 1)  # 11.11

And from the dmesg log:

  [   10.862226] pci 10000:00:02.0: waiting 100 ms for downstream link, after activation
  [   11.109581] pci 10000:01:00.0: ready 0ms after bus reset

it looks like we waited about .25s (250ms) after the link came up
before trying to read the Vendor ID, which should be plenty.

I guess maybe this device requires more than 250ms after reset before
it can respond?  That still seems surprising to me; I *assume* Intel
would have tested this for spec conformance.  But maybe there's some
erratum.  I added a couple Intel folks who are mentioned in the nvme
history in case they have pointers.

But it *is* also true that pci_dev_wait() doesn't know how to handle
PCIe errors at all, and maybe there's a way to make it smarter.

Can you try adding the patch below and collect another dmesg log (with
dyndbg="file drivers/pci/* +p" as before)?  This isn't a fix, but
might give a little more insight into what's happening.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e9448d55113b..42a36ff5c6cd 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1268,6 +1268,7 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 	bool retrain = false;
 	struct pci_dev *root, *bridge;
 
+	pci_dbg(dev, "%s: %s timeout %d\n", __func__, reset_type, timeout);
 	root = pcie_find_root_port(dev);
 
 	if (pci_is_pcie(dev)) {
@@ -1305,14 +1306,32 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 
 		if (root && root->config_rrs_sv) {
 			pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
-			if (!pci_bus_rrs_vendor_id(id))
-				break;
+			pci_dbg(dev, "%s: vf %d read %#06x\n", __func__,
+				dev->is_virtfn, id);
+			if (pci_bus_rrs_vendor_id(id))
+				goto retry;
+
+			/*
+			 * We might read 0xffff if the device is a VF and
+			 * the read was successful (the VF Vendor ID is
+			 * 0xffff per spec).
+			 *
+			 * If the device is not a VF, 0xffff likely means
+			 * there was an error on PCIe.  E.g., maybe the
+			 * device couldn't even respond with RRS status,
+			 * and the RC timed out and synthesized ~0 data.
+			 */
+			if (PCI_POSSIBLE_ERROR(id) && !dev->is_virtfn)
+				    goto retry;
+
+			break;
 		} else {
 			pci_read_config_dword(dev, PCI_COMMAND, &id);
 			if (!PCI_POSSIBLE_ERROR(id))
 				break;
 		}
 
+retry:
 		if (delay > timeout) {
 			pci_warn(dev, "not ready %dms after %s; giving up\n",
 				 delay - 1, reset_type);
@@ -4760,6 +4779,8 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
 	 * Some controllers might not implement link active reporting. In this
 	 * case, we wait for 1000 ms + any delay requested by the caller.
 	 */
+	pci_dbg(pdev, "%s: active %d delay %d link_active_reporting %d\n",
+		__func__, active, delay, pdev->link_active_reporting);
 	if (!pdev->link_active_reporting) {
 		msleep(PCIE_LINK_RETRAIN_TIMEOUT_MS + delay);
 		return true;


* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-06-17 20:12       ` Bjorn Helgaas
@ 2025-06-23 22:58         ` Bjorn Helgaas
  2025-06-24  0:58           ` Hui Wang
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2025-06-23 22:58 UTC (permalink / raw)
  To: Hui Wang
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Nirmal Patel, Jonathan Derrick,
	Chaitanya Kumar Borah, Ville Syrjälä

Ping, any chance you could try the patch below to collect a little
more data?  The proposed quirk covers up a deeper problem, and I want
to fix that problem instead of covering it up.

On Tue, Jun 17, 2025 at 03:12:55PM -0500, Bjorn Helgaas wrote:
> [+cc Chaitanya, Ville for possible Intel NVMe contacts]
> 
> On Mon, Jun 16, 2025 at 09:38:25PM +0800, Hui Wang wrote:
> > On 6/16/25 19:55, Hui Wang wrote:
> > > On 6/13/25 00:48, Bjorn Helgaas wrote:
> > > > [+cc VMD folks]
> > > > 
> > > > On Wed, Jun 11, 2025 at 06:14:42PM +0800, Hui Wang wrote:
> > > > > Prior to commit d591f6804e7e ("PCI: Wait for device readiness with
> > > > > Configuration RRS"), this Intel nvme [8086:0a54] works well. Since
> > > > > that patch is merged to the kernel, this nvme stops working.
> > > > > 
> > > > > Through debugging, we found that commit introduces the RRS polling in
> > > > > the pci_dev_wait(), for this nvme, when polling the PCI_VENDOR_ID, it
> > > > > will return ~0 if the config access is not ready yet, but the polling
> > > > > expects a return value of 0x0001 or a valid vendor_id, so the RRS
> > > > > polling doesn't work for this nvme.
> > > >
> > > > Sorry for breaking this, and thanks for all your work in debugging
> > > > this!  Issues like this are really hard to track down.
> > > > 
> > > > I would think we would have heard about this earlier if the NVMe
> > > > device were broken on all systems.  Maybe there's some connection with
> > > > VMD?  From the non-working dmesg log in your bug report
> > > > (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/+attachment/5879970/+files/dmesg-60.txt):
> > > > 
> > > > 
> > > >    DMI: ASUSTeK COMPUTER INC. ESC8000 G4/Z11PG-D24 Series, BIOS 5501
> > > > 04/17/2019
> > > >    vmd 0000:d7:05.5: PCI host bridge to bus 10000:00
> > > >    pci 10000:00:02.0: [8086:2032] type 01 class 0x060400 PCIe Root Port
> > > >    pci 10000:00:02.0: PCI bridge to [bus 01]
> > > >    pci 10000:00:02.0: bridge window [mem 0xf8000000-0xf81fffff]:
> > > > assigned
> > > >    pci 10000:01:00.0: [8086:0a54] type 00 class 0x010802 PCIe Endpoint
> > > >    pci 10000:01:00.0: BAR 0 [mem 0x00000000-0x00003fff 64bit]
> > > > 
> > > >    <I think vmd_enable_domain() calls pci_reset_bus() here>
> > > 
> > > Yes, and pci_dev_wait() is called here. With the RRS polling, it
> > > gets ~0 from PCI_VENDOR_ID, then 0xffffffff when configuring BAR0
> > > subsequently. With the original polling method, it gets enough delay
> > > in pci_dev_wait(), so the NVMe device works normally.
> > > 
> > > The line "[   10.193589] hhhhhhhhhhhhhhhhhhhhhhhhhhhh dev->device = 0a54
> > > id = ffffffff" is output from pci_dev_wait(), please refer to
> > > https://launchpadlibrarian.net/798708446/LP2111521-dmesg-test9.txt
> > > 
> > > >    pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
> > > >    pci 10000:01:00.0: BAR 0: error updating (high 0x00000000 !=
> > > > 0xffffffff)
> > > >    pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
> > > >    pci 10000:01:00.0: BAR 0: error updating (0xf8010004 != 0xffffffff)
> > > >    nvme nvme0: pci function 10000:01:00.0
> > > >    nvme 10000:01:00.0: enabling device (0000 -> 0002)
> > > > 
> > > > Things I notice:
> > > > 
> > > >    - The 10000:01:00.0 NVMe device is behind a VMD bridge
> > > > 
> > > >    - We successfully read the Vendor & Device IDs (8086:0a54)
> > > > 
> > > >    - The NVMe device is uninitialized.  We successfully sized the BAR,
> > > >      which included successful config reads and writes.  The BAR
> > > >      wasn't assigned by BIOS, which is normal since it's behind VMD.
> > > > 
> > > >    - We allocated space for BAR 0 but the config writes to program the
> > > >      BAR failed.  The read back from the BAR was 0xffffffff; probably a
> > > >      PCIe error, e.g., the NVMe device didn't respond.
> > > > 
> > > >    - The device *did* respond when nvme_probe() enabled it: the
> > > >      "enabling device (0000 -> 0002)" means pci_enable_resources() read
> > > >      PCI_COMMAND and got 0x0000.
> > > > 
> > > >    - The dmesg from the working config doesn't include the "enabling
> > > >      device" line, which suggests that pci_enable_resources() saw
> > > >      PCI_COMMAND_MEMORY (0x0002) already set and didn't bother setting
> > > >      it again.  I don't know why it would already be set.
> > > >      d591f6804e7e really only changes pci_dev_wait(), which is used after
> > > >      device resets.  I think vmd_enable_domain() resets the VMD Root Ports
> > > >      after pci_scan_child_bus(), and maybe we're not waiting long enough
> > > >      afterwards.
> > > > 
> > > > My guess is that we got the ~0 because we did a config read too soon
> > > > after reset and the device didn't respond.  The Root Port would time
> > > > out, log an error, and synthesize ~0 data to complete the CPU read
> > > > (see PCIe r6.0, sec 2.3.2 implementation note).
> > > > 
> > > > It's *possible* that we waited long enough but the NVMe device is
> > > > broken and didn't respond when it should have, but my money is on a
> > > > software defect.
> > > > 
> > > > There are a few pci_dbg() calls about these delays; can you set
> > > > CONFIG_DYNAMIC_DEBUG=y and boot with dyndbg="file drivers/pci/* +p" to
> > > > collect that output?  Please also collect the "sudo lspci -vv" output
> > > > from a working system.
> > > 
> > > Already passed the testing request to the bug reporters; waiting for
> > > their feedback.
> > > 
> > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/55
> > 
> > This is the dmesg with dyndbg="file drivers/pci/* +p"
> > 
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/56
> 
> Thank you very much!  I'm stumped.
> 
> Both Ports here are capable of 8GT/s, and the Root Port is hot-plug
> capable and supports Data Link Layer Link Active Reporting but not DRS:
> 
>   10000:00:02.0 Intel Sky Lake-E PCI Express Root Port C
>     LnkCap: Port #11, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <16us
>             ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
>     LnkSta: Speed 8GT/s, Width x4
>             TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
>     SltCap: AttnBtn- PwrCtrl- MRL- AttnInd+ PwrInd+ HotPlug+ Surprise+
>     LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
> 
>   10000:01:00.0 Intel NVMe Datacenter SSD
>     LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s, Exit Latency L0s <64ns
> 
> After a reset, we have to delay before doing config reads.  The wait
> time requirements are in PCIe r6.0, sec 6.6.1:
> 
>   • ... For cases where system software cannot determine that DRS is
>     supported by the attached device, or by the Downstream Port above
>     the attached device [as in this case]:
> 
>     ◦ ...
> 
>     ◦ With a Downstream Port that supports Link speeds greater than
>       5.0 GT/s, software must wait a minimum of 100 ms after Link
>       training completes before sending a Configuration Request to the
>       device immediately below that Port. Software can determine when
>       Link training completes by polling the Data Link Layer Link
>       Active bit or by setting up an associated interrupt (see §
>       Section 6.7.3.3). It is strongly recommended for software to
>       use this mechanism whenever the Downstream Port supports it.
> 
> Since the Port supports 8GT/s, we should wait 100ms after Link training
> completes (PCI_EXP_LNKSTA_DLLLA becomes set), and based on the code
> path below, I think we *do*:
> 
>   pci_bridge_secondary_bus_reset
>     pcibios_reset_secondary_bus
>       pci_reset_secondary_bus
>         # assert PCI_BRIDGE_CTL_BUS_RESET
>     pci_bridge_wait_for_secondary_bus(bridge, "bus reset")
>       pci_dbg("waiting %d ms for downstream link, after activation\n") # 10.86
>       delay = pci_bus_max_d3cold_delay()    # default 100ms
>       pcie_wait_for_link_delay(bridge, active=true, delay=100)
>         pcie_wait_for_link_status(use_lt=false, active=true)
>           end_jiffies = jiffies + PCIE_LINK_RETRAIN_TIMEOUT_MS
>           do {
>             pcie_capability_read_word(PCI_EXP_LNKSTA, &lnksta)
>             if ((lnksta & PCI_EXP_LNKSTA_DLLLA) == PCI_EXP_LNKSTA_DLLLA)
>               return 0
>             msleep(1)                       # likely wait several ms for link active
>           } while (time_before(jiffies, end_jiffies))
>         if (active)                         # (true)
>           msleep(delay)                     # wait 100ms here
>       pci_dev_wait(child, "bus reset", PCIE_RESET_READY_POLL_MS - delay)
>         delay = 1
>         for (;;) {
>           pci_read_config_dword(PCI_VENDOR_ID, &id)
>           if (!pci_bus_rrs_vendor_id(id))   # got 0xffff, assume valid
>             break;
>         }
>         pci_dbg("ready 0ms after bus reset", delay - 1)  # 11.11
> 
> And from the dmesg log:
> 
>   [   10.862226] pci 10000:00:02.0: waiting 100 ms for downstream link, after activation
>   [   11.109581] pci 10000:01:00.0: ready 0ms after bus reset
> 
> it looks like we waited about .25s (250ms) after the link came up
> before trying to read the Vendor ID, which should be plenty.
> 
> I guess maybe this device requires more than 250ms after reset before
> it can respond?  That still seems surprising to me; I *assume* Intel
> would have tested this for spec conformance.  But maybe there's some
> erratum.  I added a couple Intel folks who are mentioned in the nvme
> history in case they have pointers.
> 
> But it *is* also true that pci_dev_wait() doesn't know how to handle
> PCIe errors at all, and maybe there's a way to make it smarter.
> 
> Can you try adding the patch below and collect another dmesg log (with
> dyndbg="file drivers/pci/* +p" as before)?  This isn't a fix, but
> might give a little more insight into what's happening.
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e9448d55113b..42a36ff5c6cd 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1268,6 +1268,7 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>  	bool retrain = false;
>  	struct pci_dev *root, *bridge;
>  
> +	pci_dbg(dev, "%s: %s timeout %d\n", __func__, reset_type, timeout);
>  	root = pcie_find_root_port(dev);
>  
>  	if (pci_is_pcie(dev)) {
> @@ -1305,14 +1306,32 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>  
>  		if (root && root->config_rrs_sv) {
>  			pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> -			if (!pci_bus_rrs_vendor_id(id))
> -				break;
> +			pci_dbg(dev, "%s: vf %d read %#06x\n", __func__,
> +				dev->is_virtfn, id);
> +			if (pci_bus_rrs_vendor_id(id))
> +				goto retry;
> +
> +			/*
> +			 * We might read 0xffff if the device is a VF and
> +			 * the read was successful (the VF Vendor ID is
> +			 * 0xffff per spec).
> +			 *
> +			 * If the device is not a VF, 0xffff likely means
> +			 * there was an error on PCIe.  E.g., maybe the
> +			 * device couldn't even respond with RRS status,
> +			 * and the RC timed out and synthesized ~0 data.
> +			 */
> +			if (PCI_POSSIBLE_ERROR(id) && !dev->is_virtfn)
> +				    goto retry;
> +
> +			break;
>  		} else {
>  			pci_read_config_dword(dev, PCI_COMMAND, &id);
>  			if (!PCI_POSSIBLE_ERROR(id))
>  				break;
>  		}
>  
> +retry:
>  		if (delay > timeout) {
>  			pci_warn(dev, "not ready %dms after %s; giving up\n",
>  				 delay - 1, reset_type);
> @@ -4760,6 +4779,8 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
>  	 * Some controllers might not implement link active reporting. In this
>  	 * case, we wait for 1000 ms + any delay requested by the caller.
>  	 */
> +	pci_dbg(pdev, "%s: active %d delay %d link_active_reporting %d\n",
> +		__func__, active, delay, pdev->link_active_reporting);
>  	if (!pdev->link_active_reporting) {
>  		msleep(PCIE_LINK_RETRAIN_TIMEOUT_MS + delay);
>  		return true;

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-06-23 22:58         ` Bjorn Helgaas
@ 2025-06-24  0:58           ` Hui Wang
  2025-07-01 23:23             ` Bjorn Helgaas
  0 siblings, 1 reply; 21+ messages in thread
From: Hui Wang @ 2025-06-24  0:58 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Nirmal Patel, Jonathan Derrick,
	Chaitanya Kumar Borah, Ville Syrjälä

Sorry for the late response, I was OOO the past week.

This is the log after applying your patch:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/61

Looks like the "retry" makes the NVMe device work.


On 6/24/25 06:58, Bjorn Helgaas wrote:
> Ping, any chance you could try the patch below to collect a little
> more data?  The proposed quirk covers up a deeper problem, and I want
> to fix that problem instead of covering it up.
>
> On Tue, Jun 17, 2025 at 03:12:55PM -0500, Bjorn Helgaas wrote:
>> [+cc Chaitanya, Ville for possible Intel NVMe contacts]
>>
>> On Mon, Jun 16, 2025 at 09:38:25PM +0800, Hui Wang wrote:
>>> On 6/16/25 19:55, Hui Wang wrote:
>>>> On 6/13/25 00:48, Bjorn Helgaas wrote:
>>>>> [+cc VMD folks]
>>>>>
>>>>> On Wed, Jun 11, 2025 at 06:14:42PM +0800, Hui Wang wrote:
>>>>>> Prior to commit d591f6804e7e ("PCI: Wait for device readiness with
>>>>>> Configuration RRS"), this Intel nvme [8086:0a54] works well. Since
>>>>>> that patch is merged to the kernel, this nvme stops working.
>>>>>>
>>>>>> Through debugging, we found that commit introduces the RRS polling in
>>>>>> the pci_dev_wait(), for this nvme, when polling the PCI_VENDOR_ID, it
>>>>>> will return ~0 if the config access is not ready yet, but the polling
>>>>>> expects a return value of 0x0001 or a valid vendor_id, so the RRS
>>>>>> polling doesn't work for this nvme.
>>>>> Sorry for breaking this, and thanks for all your work in debugging
>>>>> this!  Issues like this are really hard to track down.
>>>>>
>>>>> I would think we would have heard about this earlier if the NVMe
>>>>> device were broken on all systems.  Maybe there's some connection with
>>>>> VMD?  From the non-working dmesg log in your bug report
>>>>> (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/+attachment/5879970/+files/dmesg-60.txt):
>>>>>
>>>>>
>>>>>     DMI: ASUSTeK COMPUTER INC. ESC8000 G4/Z11PG-D24 Series, BIOS 5501
>>>>> 04/17/2019
>>>>>     vmd 0000:d7:05.5: PCI host bridge to bus 10000:00
>>>>>     pci 10000:00:02.0: [8086:2032] type 01 class 0x060400 PCIe Root Port
>>>>>     pci 10000:00:02.0: PCI bridge to [bus 01]
>>>>>     pci 10000:00:02.0: bridge window [mem 0xf8000000-0xf81fffff]:
>>>>> assigned
>>>>>     pci 10000:01:00.0: [8086:0a54] type 00 class 0x010802 PCIe Endpoint
>>>>>     pci 10000:01:00.0: BAR 0 [mem 0x00000000-0x00003fff 64bit]
>>>>>
>>>>>     <I think vmd_enable_domain() calls pci_reset_bus() here>
>>>> Yes, and pci_dev_wait() is called here. With the RRS polling, it
>>>> gets ~0 from PCI_VENDOR_ID, then 0xffffffff when configuring BAR0
>>>> subsequently. With the original polling method, it gets enough delay
>>>> in pci_dev_wait(), so the NVMe device works normally.
>>>>
>>>> The line "[   10.193589] hhhhhhhhhhhhhhhhhhhhhhhhhhhh dev->device = 0a54
>>>> id = ffffffff" is output from pci_dev_wait(), please refer to
>>>> https://launchpadlibrarian.net/798708446/LP2111521-dmesg-test9.txt
>>>>
>>>>>     pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
>>>>>     pci 10000:01:00.0: BAR 0: error updating (high 0x00000000 !=
>>>>> 0xffffffff)
>>>>>     pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
>>>>>     pci 10000:01:00.0: BAR 0: error updating (0xf8010004 != 0xffffffff)
>>>>>     nvme nvme0: pci function 10000:01:00.0
>>>>>     nvme 10000:01:00.0: enabling device (0000 -> 0002)
>>>>>
>>>>> Things I notice:
>>>>>
>>>>>     - The 10000:01:00.0 NVMe device is behind a VMD bridge
>>>>>
>>>>>     - We successfully read the Vendor & Device IDs (8086:0a54)
>>>>>
>>>>>     - The NVMe device is uninitialized.  We successfully sized the BAR,
>>>>>       which included successful config reads and writes.  The BAR
>>>>>       wasn't assigned by BIOS, which is normal since it's behind VMD.
>>>>>
>>>>>     - We allocated space for BAR 0 but the config writes to program the
>>>>>       BAR failed.  The read back from the BAR was 0xffffffff; probably a
>>>>>       PCIe error, e.g., the NVMe device didn't respond.
>>>>>
>>>>>     - The device *did* respond when nvme_probe() enabled it: the
>>>>>       "enabling device (0000 -> 0002)" means pci_enable_resources() read
>>>>>       PCI_COMMAND and got 0x0000.
>>>>>
>>>>>     - The dmesg from the working config doesn't include the "enabling
>>>>>       device" line, which suggests that pci_enable_resources() saw
>>>>>       PCI_COMMAND_MEMORY (0x0002) already set and didn't bother setting
>>>>>       it again.  I don't know why it would already be set.
>>>>>       d591f6804e7e really only changes pci_dev_wait(), which is used after
>>>>>       device resets.  I think vmd_enable_domain() resets the VMD Root Ports
>>>>>       after pci_scan_child_bus(), and maybe we're not waiting long enough
>>>>>       afterwards.
>>>>>
>>>>> My guess is that we got the ~0 because we did a config read too soon
>>>>> after reset and the device didn't respond.  The Root Port would time
>>>>> out, log an error, and synthesize ~0 data to complete the CPU read
>>>>> (see PCIe r6.0, sec 2.3.2 implementation note).
>>>>>
>>>>> It's *possible* that we waited long enough but the NVMe device is
>>>>> broken and didn't respond when it should have, but my money is on a
>>>>> software defect.
>>>>>
>>>>> There are a few pci_dbg() calls about these delays; can you set
>>>>> CONFIG_DYNAMIC_DEBUG=y and boot with dyndbg="file drivers/pci/* +p" to
>>>>> collect that output?  Please also collect the "sudo lspci -vv" output
>>>>> from a working system.
>>>> Already passed the testing request to the bug reporters; waiting for
>>>> their feedback.
>>>>
>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/55
>>> This is the dmesg with dyndbg="file drivers/pci/* +p"
>>>
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/56
>> Thank you very much!  I'm stumped.
>>
>> Both Ports here are capable of 8GT/s, and the Root Port is hot-plug
>> capable and supports Data Link Layer Link Active Reporting but not DRS:
>>
>>    10000:00:02.0 Intel Sky Lake-E PCI Express Root Port C
>>      LnkCap: Port #11, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <16us
>>              ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
>>      LnkSta: Speed 8GT/s, Width x4
>>              TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
>>      SltCap: AttnBtn- PwrCtrl- MRL- AttnInd+ PwrInd+ HotPlug+ Surprise+
>>      LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
>>
>>    10000:01:00.0 Intel NVMe Datacenter SSD
>>      LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s, Exit Latency L0s <64ns
>>
>> After a reset, we have to delay before doing config reads.  The wait
>> time requirements are in PCIe r6.0, sec 6.6.1:
>>
>>    • ... For cases where system software cannot determine that DRS is
>>      supported by the attached device, or by the Downstream Port above
>>      the attached device [as in this case]:
>>
>>      ◦ ...
>>
>>      ◦ With a Downstream Port that supports Link speeds greater than
>>        5.0 GT/s, software must wait a minimum of 100 ms after Link
>>        training completes before sending a Configuration Request to the
>>        device immediately below that Port. Software can determine when
>>        Link training completes by polling the Data Link Layer Link
>>        Active bit or by setting up an associated interrupt (see §
>>        Section 6.7.3.3). It is strongly recommended for software to
>>        use this mechanism whenever the Downstream Port supports it.
>>
>> Since the Port supports 8GT/s, we should wait 100ms after Link training
>> completes (PCI_EXP_LNKSTA_DLLLA becomes set), and based on the code
>> path below, I think we *do*:
>>
>>    pci_bridge_secondary_bus_reset
>>      pcibios_reset_secondary_bus
>>        pci_reset_secondary_bus
>>          # assert PCI_BRIDGE_CTL_BUS_RESET
>>      pci_bridge_wait_for_secondary_bus(bridge, "bus reset")
>>        pci_dbg("waiting %d ms for downstream link, after activation\n") # 10.86
>>        delay = pci_bus_max_d3cold_delay()    # default 100ms
>>        pcie_wait_for_link_delay(bridge, active=true, delay=100)
>>          pcie_wait_for_link_status(use_lt=false, active=true)
>>            end_jiffies = jiffies + PCIE_LINK_RETRAIN_TIMEOUT_MS
>>            do {
>>              pcie_capability_read_word(PCI_EXP_LNKSTA, &lnksta)
>>              if ((lnksta & PCI_EXP_LNKSTA_DLLLA) == PCI_EXP_LNKSTA_DLLLA)
>>                return 0
>>              msleep(1)                       # likely wait several ms for link active
>>            } while (time_before(jiffies, end_jiffies))
>>          if (active)                         # (true)
>>            msleep(delay)                     # wait 100ms here
>>        pci_dev_wait(child, "bus reset", PCIE_RESET_READY_POLL_MS - delay)
>>          delay = 1
>>          for (;;) {
>>            pci_read_config_dword(PCI_VENDOR_ID, &id)
>>            if (!pci_bus_rrs_vendor_id(id))   # got 0xffff, assume valid
>>              break;
>>          }
>>          pci_dbg("ready 0ms after bus reset", delay - 1)  # 11.11
>>
>> And from the dmesg log:
>>
>>    [   10.862226] pci 10000:00:02.0: waiting 100 ms for downstream link, after activation
>>    [   11.109581] pci 10000:01:00.0: ready 0ms after bus reset
>>
>> it looks like we waited about .25s (250ms) after the link came up
>> before trying to read the Vendor ID, which should be plenty.
>>
>> I guess maybe this device requires more than 250ms after reset before
>> it can respond?  That still seems surprising to me; I *assume* Intel
>> would have tested this for spec conformance.  But maybe there's some
>> erratum.  I added a couple Intel folks who are mentioned in the nvme
>> history in case they have pointers.
>>
>> But it *is* also true that pci_dev_wait() doesn't know how to handle
>> PCIe errors at all, and maybe there's a way to make it smarter.
>>
>> Can you try adding the patch below and collect another dmesg log (with
>> dyndbg="file drivers/pci/* +p" as before)?  This isn't a fix, but
>> might give a little more insight into what's happening.
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index e9448d55113b..42a36ff5c6cd 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -1268,6 +1268,7 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>>   	bool retrain = false;
>>   	struct pci_dev *root, *bridge;
>>   
>> +	pci_dbg(dev, "%s: %s timeout %d\n", __func__, reset_type, timeout);
>>   	root = pcie_find_root_port(dev);
>>   
>>   	if (pci_is_pcie(dev)) {
>> @@ -1305,14 +1306,32 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>>   
>>   		if (root && root->config_rrs_sv) {
>>   			pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
>> -			if (!pci_bus_rrs_vendor_id(id))
>> -				break;
>> +			pci_dbg(dev, "%s: vf %d read %#06x\n", __func__,
>> +				dev->is_virtfn, id);
>> +			if (pci_bus_rrs_vendor_id(id))
>> +				goto retry;
>> +
>> +			/*
>> +			 * We might read 0xffff if the device is a VF and
>> +			 * the read was successful (the VF Vendor ID is
>> +			 * 0xffff per spec).
>> +			 *
>> +			 * If the device is not a VF, 0xffff likely means
>> +			 * there was an error on PCIe.  E.g., maybe the
>> +			 * device couldn't even respond with RRS status,
>> +			 * and the RC timed out and synthesized ~0 data.
>> +			 */
>> +			if (PCI_POSSIBLE_ERROR(id) && !dev->is_virtfn)
>> +				    goto retry;
>> +
>> +			break;
>>   		} else {
>>   			pci_read_config_dword(dev, PCI_COMMAND, &id);
>>   			if (!PCI_POSSIBLE_ERROR(id))
>>   				break;
>>   		}
>>   
>> +retry:
>>   		if (delay > timeout) {
>>   			pci_warn(dev, "not ready %dms after %s; giving up\n",
>>   				 delay - 1, reset_type);
>> @@ -4760,6 +4779,8 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
>>   	 * Some controllers might not implement link active reporting. In this
>>   	 * case, we wait for 1000 ms + any delay requested by the caller.
>>   	 */
>> +	pci_dbg(pdev, "%s: active %d delay %d link_active_reporting %d\n",
>> +		__func__, active, delay, pdev->link_active_reporting);
>>   	if (!pdev->link_active_reporting) {
>>   		msleep(PCIE_LINK_RETRAIN_TIMEOUT_MS + delay);
>>   		return true;

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-06-24  0:58           ` Hui Wang
@ 2025-07-01 23:23             ` Bjorn Helgaas
  2025-07-02  9:43               ` Hui Wang
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2025-07-01 23:23 UTC (permalink / raw)
  To: Hui Wang
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Nirmal Patel, Jonathan Derrick,
	Chaitanya Kumar Borah, Ville Syrjälä

On Tue, Jun 24, 2025 at 08:58:57AM +0800, Hui Wang wrote:
> Sorry for the late response, I was OOO the past week.
> 
> This is the log after applying your patch:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/61
> 
> Looks like the "retry" makes the NVMe device work.

Thank you!  It seems like we get 0xffffffff (probably PCIe error) for
a long time after we think the device should be able to respond with
RRS.

I always thought the spec required that after the delays, a device
should respond with RRS if it's not ready, but now I guess I'm not
100% sure.  Maybe it's allowed to just do nothing, which would lead to
the Root Port timing out and logging an Unsupported Request error.

Can I trouble you to try the patch below?  I think we might have to
start explicitly checking for that error.  That probably would require
some setup to enable the error, check for it, and clear it.  I hacked
in some of that here, but ultimately some of it should go elsewhere.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e9448d55113b..c276d0a2b522 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1264,10 +1264,13 @@ void pci_resume_bus(struct pci_bus *bus)
 
 static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 {
-	int delay = 1;
+	int delay = 10;
 	bool retrain = false;
 	struct pci_dev *root, *bridge;
+	u16 devctl, devsta;
 
+	pci_info(dev, "%s: VF%c %s timeout %d\n", __func__,
+		 dev->is_virtfn ? '+' : '-', reset_type, timeout);
 	root = pcie_find_root_port(dev);
 
 	if (pci_is_pcie(dev)) {
@@ -1276,6 +1279,19 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 			retrain = true;
 	}
 
+	if (root) {
+		pcie_capability_read_word(root, PCI_EXP_DEVCTL, &devctl);
+		if (!(devctl & PCI_EXP_DEVCTL_URRE))
+			pcie_capability_write_word(root, PCI_EXP_DEVCTL,
+					    devctl | PCI_EXP_DEVCTL_URRE);
+		pcie_capability_read_word(root, PCI_EXP_DEVSTA, &devsta);
+		if (devsta & PCI_EXP_DEVSTA_URD)
+			pcie_capability_write_word(root, PCI_EXP_DEVSTA,
+						   PCI_EXP_DEVSTA_URD);
+		pci_info(root, "%s: DEVCTL %#06x DEVSTA %#06x\n", __func__,
+			 devctl, devsta);
+	}
+
 	/*
 	 * The caller has already waited long enough after a reset that the
 	 * device should respond to config requests, but it may respond
@@ -1305,14 +1321,33 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 
 		if (root && root->config_rrs_sv) {
 			pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
-			if (!pci_bus_rrs_vendor_id(id))
-				break;
+
+			if (pci_bus_rrs_vendor_id(id)) {
+				pci_info(dev, "%s: read %#06x (RRS)\n",
+					 __func__, id);
+				goto retry;
+			}
+
+			if (PCI_POSSIBLE_ERROR(id)) {
+				pcie_capability_read_word(root, PCI_EXP_DEVSTA,
+							  &devsta);
+				if (devsta & PCI_EXP_DEVSTA_URD)
+					pcie_capability_write_word(root,
+							    PCI_EXP_DEVSTA,
+							    PCI_EXP_DEVSTA_URD);
+				pci_info(root, "%s: read %#06x DEVSTA %#06x\n",
+					 __func__, id, devsta);
+				goto retry;
+			}
+
+			break;
 		} else {
 			pci_read_config_dword(dev, PCI_COMMAND, &id);
 			if (!PCI_POSSIBLE_ERROR(id))
 				break;
 		}
 
+retry:
 		if (delay > timeout) {
 			pci_warn(dev, "not ready %dms after %s; giving up\n",
 				 delay - 1, reset_type);
@@ -1332,7 +1367,6 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 		}
 
 		msleep(delay);
-		delay *= 2;
 	}
 
 	if (delay > PCI_RESET_WAIT)
@@ -4670,8 +4704,10 @@ static int pcie_wait_for_link_status(struct pci_dev *pdev,
 	end_jiffies = jiffies + msecs_to_jiffies(PCIE_LINK_RETRAIN_TIMEOUT_MS);
 	do {
 		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
-		if ((lnksta & lnksta_mask) == lnksta_match)
+		if ((lnksta & lnksta_mask) == lnksta_match) {
+			pci_info(pdev, "%s: LNKSTA %#06x\n", __func__, lnksta);
 			return 0;
+		}
 		msleep(1);
 	} while (time_before(jiffies, end_jiffies));
 
@@ -4760,6 +4796,8 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
 	 * Some controllers might not implement link active reporting. In this
 	 * case, we wait for 1000 ms + any delay requested by the caller.
 	 */
+	pci_info(pdev, "%s: active %d delay %d link_active_reporting %d\n",
+		 __func__, active, delay, pdev->link_active_reporting);
 	if (!pdev->link_active_reporting) {
 		msleep(PCIE_LINK_RETRAIN_TIMEOUT_MS + delay);
 		return true;
@@ -4784,6 +4822,7 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
 			return false;
 
 		msleep(delay);
+		pci_info(pdev, "%s: waited %dms\n", __func__, delay);
 		return true;
 	}
 
@@ -4960,6 +4999,7 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
 
 	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
 	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
+	pci_info(dev, "%s: PCI_BRIDGE_CTL_BUS_RESET deasserted\n", __func__);
 }
 
 void __weak pcibios_reset_secondary_bus(struct pci_dev *dev)

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-07-01 23:23             ` Bjorn Helgaas
@ 2025-07-02  9:43               ` Hui Wang
  2025-07-03  0:05                 ` Hui Wang
  0 siblings, 1 reply; 21+ messages in thread
From: Hui Wang @ 2025-07-02  9:43 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Nirmal Patel, Jonathan Derrick,
	Chaitanya Kumar Borah, Ville Syrjälä


On 7/2/25 07:23, Bjorn Helgaas wrote:
> On Tue, Jun 24, 2025 at 08:58:57AM +0800, Hui Wang wrote:
>> Sorry for the late response, I was OOO the past week.
>>
>> This is the log after applying your patch:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/61
>>
>> Looks like the "retry" makes the NVMe device work.
> Thank you!  It seems like we get 0xffffffff (probably PCIe error) for
> a long time after we think the device should be able to respond with
> RRS.
>
> I always thought the spec required that after the delays, a device
> should respond with RRS if it's not ready, but now I guess I'm not
> 100% sure.  Maybe it's allowed to just do nothing, which would lead to
> the Root Port timing out and logging an Unsupported Request error.
>
> Can I trouble you to try the patch below?  I think we might have to
> start explicitly checking for that error.  That probably would require
> some setup to enable the error, check for it, and clear it.  I hacked
> in some of that here, but ultimately some of it should go elsewhere.

OK, I built a test kernel; waiting for the bug reporter to test it and
collect the log.


> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e9448d55113b..c276d0a2b522 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1264,10 +1264,13 @@ void pci_resume_bus(struct pci_bus *bus)
>   
>   static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>   {
> -	int delay = 1;
> +	int delay = 10;
>   	bool retrain = false;
>   	struct pci_dev *root, *bridge;
> +	u16 devctl, devsta;
>   
> +	pci_info(dev, "%s: VF%c %s timeout %d\n", __func__,
> +		 dev->is_virtfn ? '+' : '-', reset_type, timeout);
>   	root = pcie_find_root_port(dev);
>   
>   	if (pci_is_pcie(dev)) {
> @@ -1276,6 +1279,19 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>   			retrain = true;
>   	}
>   
> +	if (root) {
> +		pcie_capability_read_word(root, PCI_EXP_DEVCTL, &devctl);
> +		if (!(devctl & PCI_EXP_DEVCTL_URRE))
> +			pcie_capability_write_word(root, PCI_EXP_DEVCTL,
> +					    devctl | PCI_EXP_DEVCTL_URRE);
> +		pcie_capability_read_word(root, PCI_EXP_DEVSTA, &devsta);
> +		if (devsta & PCI_EXP_DEVSTA_URD)
> +			pcie_capability_write_word(root, PCI_EXP_DEVSTA,
> +						   PCI_EXP_DEVSTA_URD);
> +		pci_info(root, "%s: DEVCTL %#06x DEVSTA %#06x\n", __func__,
> +			 devctl, devsta);
> +	}
> +
>   	/*
>   	 * The caller has already waited long enough after a reset that the
>   	 * device should respond to config requests, but it may respond
> @@ -1305,14 +1321,33 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>   
>   		if (root && root->config_rrs_sv) {
>   			pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> -			if (!pci_bus_rrs_vendor_id(id))
> -				break;
> +
> +			if (pci_bus_rrs_vendor_id(id)) {
> +				pci_info(dev, "%s: read %#06x (RRS)\n",
> +					 __func__, id);
> +				goto retry;
> +			}
> +
> +			if (PCI_POSSIBLE_ERROR(id)) {
> +				pcie_capability_read_word(root, PCI_EXP_DEVSTA,
> +							  &devsta);
> +				if (devsta & PCI_EXP_DEVSTA_URD)
> +					pcie_capability_write_word(root,
> +							    PCI_EXP_DEVSTA,
> +							    PCI_EXP_DEVSTA_URD);
> +				pci_info(root, "%s: read %#06x DEVSTA %#06x\n",
> +					 __func__, id, devsta);
> +				goto retry;
> +			}
> +
> +			break;
>   		} else {
>   			pci_read_config_dword(dev, PCI_COMMAND, &id);
>   			if (!PCI_POSSIBLE_ERROR(id))
>   				break;
>   		}
>   
> +retry:
>   		if (delay > timeout) {
>   			pci_warn(dev, "not ready %dms after %s; giving up\n",
>   				 delay - 1, reset_type);
> @@ -1332,7 +1367,6 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>   		}
>   
>   		msleep(delay);
> -		delay *= 2;
>   	}
>   
>   	if (delay > PCI_RESET_WAIT)
> @@ -4670,8 +4704,10 @@ static int pcie_wait_for_link_status(struct pci_dev *pdev,
>   	end_jiffies = jiffies + msecs_to_jiffies(PCIE_LINK_RETRAIN_TIMEOUT_MS);
>   	do {
>   		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
> -		if ((lnksta & lnksta_mask) == lnksta_match)
> +		if ((lnksta & lnksta_mask) == lnksta_match) {
> +			pci_info(pdev, "%s: LNKSTA %#06x\n", __func__, lnksta);
>   			return 0;
> +		}
>   		msleep(1);
>   	} while (time_before(jiffies, end_jiffies));
>   
> @@ -4760,6 +4796,8 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
>   	 * Some controllers might not implement link active reporting. In this
>   	 * case, we wait for 1000 ms + any delay requested by the caller.
>   	 */
> +	pci_info(pdev, "%s: active %d delay %d link_active_reporting %d\n",
> +		 __func__, active, delay, pdev->link_active_reporting);
>   	if (!pdev->link_active_reporting) {
>   		msleep(PCIE_LINK_RETRAIN_TIMEOUT_MS + delay);
>   		return true;
> @@ -4784,6 +4822,7 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
>   			return false;
>   
>   		msleep(delay);
> +		pci_info(pdev, "%s: waited %dms\n", __func__, delay);
>   		return true;
>   	}
>   
> @@ -4960,6 +4999,7 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
>   
>   	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
>   	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
> +	pci_info(dev, "%s: PCI_BRIDGE_CTL_BUS_RESET deasserted\n", __func__);
>   }
>   
>   void __weak pcibios_reset_secondary_bus(struct pci_dev *dev)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-07-02  9:43               ` Hui Wang
@ 2025-07-03  0:05                 ` Hui Wang
  2025-08-08  2:23                   ` Hui Wang
  2025-08-21 16:39                   ` Bjorn Helgaas
  0 siblings, 2 replies; 21+ messages in thread
From: Hui Wang @ 2025-07-03  0:05 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Nirmal Patel, Jonathan Derrick,
	Chaitanya Kumar Borah, Ville Syrjälä


On 7/2/25 17:43, Hui Wang wrote:
>
> On 7/2/25 07:23, Bjorn Helgaas wrote:
>> On Tue, Jun 24, 2025 at 08:58:57AM +0800, Hui Wang wrote:
>>> Sorry for late response, I was OOO the past week.
>>>
>>> This is the log after applied your patch:
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/61 
>>>
>>>
>>> Looks like the "retry" makes the nvme work.
>> Thank you!  It seems like we get 0xffffffff (probably PCIe error) for
>> a long time after we think the device should be able to respond with
>> RRS.
>>
>> I always thought the spec required that after the delays, a device
>> should respond with RRS if it's not ready, but now I guess I'm not
>> 100% sure.  Maybe it's allowed to just do nothing, which would lead to
>> the Root Port timing out and logging an Unsupported Request error.
>>
>> Can I trouble you to try the patch below?  I think we might have to
>> start explicitly checking for that error.  That probably would require
>> some setup to enable the error, check for it, and clear it.  I hacked
>> in some of that here, but ultimately some of it should go elsewhere.
>
> OK, built a testing kernel, wait for bug reporter to test it and 
> collect the log.
>
Here is the test result and the log: 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/65


>
>> ...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-07-03  0:05                 ` Hui Wang
@ 2025-08-08  2:23                   ` Hui Wang
  2025-08-11 23:04                     ` Bjorn Helgaas
  2025-08-21 16:39                   ` Bjorn Helgaas
  1 sibling, 1 reply; 21+ messages in thread
From: Hui Wang @ 2025-08-08  2:23 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Nirmal Patel, Jonathan Derrick,
	Chaitanya Kumar Borah, Ville Syrjälä, Hui Wang

Hi Bjorn,

Any progress on this issue? Do we have a fix now? The Ubuntu users 
are waiting for a fix :-).

Thanks,

Hui.


On 7/3/25 08:05, Hui Wang wrote:
>
> On 7/2/25 17:43, Hui Wang wrote:
>>
>> On 7/2/25 07:23, Bjorn Helgaas wrote:
>>> On Tue, Jun 24, 2025 at 08:58:57AM +0800, Hui Wang wrote:
>>>> Sorry for late response, I was OOO the past week.
>>>>
>>>> This is the log after applied your patch:
>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/61 
>>>>
>>>>
>>>> Looks like the "retry" makes the nvme work.
>>> Thank you!  It seems like we get 0xffffffff (probably PCIe error) for
>>> a long time after we think the device should be able to respond with
>>> RRS.
>>>
>>> I always thought the spec required that after the delays, a device
>>> should respond with RRS if it's not ready, but now I guess I'm not
>>> 100% sure.  Maybe it's allowed to just do nothing, which would lead to
>>> the Root Port timing out and logging an Unsupported Request error.
>>>
>>> Can I trouble you to try the patch below?  I think we might have to
>>> start explicitly checking for that error.  That probably would require
>>> some setup to enable the error, check for it, and clear it.  I hacked
>>> in some of that here, but ultimately some of it should go elsewhere.
>>
>> OK, built a testing kernel, wait for bug reporter to test it and 
>> collect the log.
>>
> This is the testing result and log. 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/65
>
>
>>
>>> ...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-08-08  2:23                   ` Hui Wang
@ 2025-08-11 23:04                     ` Bjorn Helgaas
  2025-09-11  9:24                       ` Vitaly Chikunov
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2025-08-11 23:04 UTC (permalink / raw)
  To: Hui Wang
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Nirmal Patel, Jonathan Derrick,
	Chaitanya Kumar Borah, Ville Syrjälä

On Fri, Aug 08, 2025 at 10:23:45AM +0800, Hui Wang wrote:
> Hi Bjorn,
> 
> Any progress on this issue, do we have a fix for this now? The
> ubuntu users are waiting for a fix :-).

Not yet, but thanks for the reminder.  Keep bugging me!

PCIe r7.0, sec 2.3.1, makes it clear that devices are permitted to
return RRS after FLR:

  ◦ For Configuration Requests only, if Device Readiness Status is not
    supported, following reset it is permitted for a Function to
    terminate the request and indicate that it is temporarily unable
    to process the Request, but will be able to process the Request in
    the future - in this case, the Request Retry Status (RRS)
    Completion Status must be used (see § Section 6.6). Valid reset
    conditions after which a device/Function is permitted to return
    RRS in response to a Configuration Request are:

    ▪ FLRs

    ...

But I am a little bit concerned because sec 2.3.2, which talks about
how a Root Complex handles that RRS and the RRS Software Visibility
feature, says (note the "system reset" period):

  Root Complex handling of a Completion with Request Retry Status for
  a Configuration Request is implementation specific, except for the
  period following SYSTEM RESET (see § Section 6.6). For Root
  Complexes that support Configuration RRS Software Visibility, the
  following rules apply:

    ◦ If Configuration RRS Software Visibility is enabled:

      ▪ For a Configuration Read Request that includes both bytes of
	the Vendor ID field of a device Function's Configuration Space
	Header, the Root Complex must complete the Request to the host
	by returning a read-data value of 0001h for the Vendor ID
	field and all 1's for any additional bytes included in the
	request.

So I'm worried that the Software Visibility feature might work after
*system reset*, but not necessarily after an FLR.  That might make
sense because I don't think the RC can tell when we are doing an FLR
to a device.

It seems that after FLR, most RCs *do* make RRS visible via SV.  But
if we can't rely on that, I don't know how we're supposed to learn
when a device becomes ready.

Bjorn

> On 7/3/25 08:05, Hui Wang wrote:
> > On 7/2/25 17:43, Hui Wang wrote:
> > > On 7/2/25 07:23, Bjorn Helgaas wrote:
> > > > On Tue, Jun 24, 2025 at 08:58:57AM +0800, Hui Wang wrote:
> > > > > Sorry for late response, I was OOO the past week.
> > > > > 
> > > > > This is the log after applied your patch:
> > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/61
> > > > > 
> > > > > Looks like the "retry" makes the nvme work.
> > > >
> > > > Thank you!  It seems like we get 0xffffffff (probably PCIe
> > > > error) for a long time after we think the device should be
> > > > able to respond with RRS.
> > > > 
> > > > I always thought the spec required that after the delays, a
> > > > device should respond with RRS if it's not ready, but now I
> > > > guess I'm not 100% sure.  Maybe it's allowed to just do
> > > > nothing, which would lead to the Root Port timing out and
> > > > logging an Unsupported Request error.
> > > > 
> > > > Can I trouble you to try the patch below?  I think we might
> > > > have to start explicitly checking for that error.  That
> > > > probably would require some setup to enable the error, check
> > > > for it, and clear it.  I hacked in some of that here, but
> > > > ultimately some of it should go elsewhere.
> ...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-06-11 10:14 [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme Hui Wang
  2025-06-12 16:48 ` Bjorn Helgaas
@ 2025-08-14 15:55 ` Bjorn Helgaas
  1 sibling, 0 replies; 21+ messages in thread
From: Bjorn Helgaas @ 2025-08-14 15:55 UTC (permalink / raw)
  To: Hui Wang
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, regressions

On Wed, Jun 11, 2025 at 06:14:42PM +0800, Hui Wang wrote:
> Prior to commit d591f6804e7e ("PCI: Wait for device readiness with
> Configuration RRS"), this Intel nvme [8086:0a54] works well. Since
> that patch is merged to the kernel, this nvme stops working.
> 
> Through debugging, we found that commit introduces the RRS polling in
> the pci_dev_wait(), for this nvme, when polling the PCI_VENDOR_ID, it
> will return ~0 if the config access is not ready yet, but the polling
> expects a return value of 0x0001 or a valid vendor_id, so the RRS
> polling doesn't work for this nvme.
> 
> Here we add a pci quirk to disable the RRS polling for the device.
> 
> Fixes: d591f6804e7e ("PCI: Wait for device readiness with Configuration RRS")
> Link: https://bugs.launchpad.net/bugs/2111521
> Signed-off-by: Hui Wang <hui.wang@canonical.com>

To help keep this top of mind:

#regzbot introduced: d591f6804e7e ("PCI: Wait for device readiness with Configuration RRS")

> ---
>  drivers/pci/probe.c  |  3 ++-
>  drivers/pci/quirks.c | 18 ++++++++++++++++++
>  include/linux/pci.h  |  2 ++
>  3 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 4b8693ec9e4c..848fa0e6cf60 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1270,7 +1270,8 @@ static void pci_enable_rrs_sv(struct pci_dev *pdev)
>  	if (root_cap & PCI_EXP_RTCAP_RRS_SV) {
>  		pcie_capability_set_word(pdev, PCI_EXP_RTCTL,
>  					 PCI_EXP_RTCTL_RRS_SVE);
> -		pdev->config_rrs_sv = 1;
> +		if (!(pdev->dev_flags & PCI_DEV_FLAGS_NO_RRS_SV))
> +			pdev->config_rrs_sv = 1;
>  	}
>  }
>  
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index cf483d82572c..519e48ff6448 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -6336,3 +6336,21 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
>  #endif
> +
> +/*
> + * Although the root port device claims to support RRS, some devices don't work
> + * with RRS polling, when reading the Vendor ID, they just return ~0 if config
> + * access is not ready, this will break the pci_dev_wait(). Here disable the RRS
> + * forcibly for this type of device.
> + */
> +static void quirk_no_rrs_sv(struct pci_dev *dev)
> +{
> +	struct pci_dev *root;
> +
> +	root = pcie_find_root_port(dev);
> +	if (root) {
> +		root->dev_flags |= PCI_DEV_FLAGS_NO_RRS_SV;
> +		root->config_rrs_sv = 0;
> +	}
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0a54, quirk_no_rrs_sv);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 05e68f35f392..f4dd9ada12e4 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -247,6 +247,8 @@ enum pci_dev_flags {
>  	PCI_DEV_FLAGS_HAS_MSI_MASKING = (__force pci_dev_flags_t) (1 << 12),
>  	/* Device requires write to PCI_MSIX_ENTRY_DATA before any MSIX reads */
>  	PCI_DEV_FLAGS_MSIX_TOUCH_ENTRY_DATA_FIRST = (__force pci_dev_flags_t) (1 << 13),
> +	/* Do not use Configuration Request Retry Status polling in pci_dev_wait() */
> +	PCI_DEV_FLAGS_NO_RRS_SV = (__force pci_dev_flags_t) (1 << 14),
>  };
>  
>  enum pci_irq_reroute_variant {
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-07-03  0:05                 ` Hui Wang
  2025-08-08  2:23                   ` Hui Wang
@ 2025-08-21 16:39                   ` Bjorn Helgaas
  2025-10-08 16:53                     ` Bjorn Helgaas
  1 sibling, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2025-08-21 16:39 UTC (permalink / raw)
  To: Hui Wang
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Nirmal Patel, Jonathan Derrick,
	Chaitanya Kumar Borah, Ville Syrjälä

On Thu, Jul 03, 2025 at 08:05:05AM +0800, Hui Wang wrote:
> On 7/2/25 17:43, Hui Wang wrote:
> > On 7/2/25 07:23, Bjorn Helgaas wrote:
> > > On Tue, Jun 24, 2025 at 08:58:57AM +0800, Hui Wang wrote:
> > > > Sorry for late response, I was OOO the past week.
> > > > 
> > > > This is the log after applied your patch:
> > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/61
> > > > 
> > > > 
> > > > Looks like the "retry" makes the nvme work.
> > > Thank you!  It seems like we get 0xffffffff (probably PCIe error) for
> > > a long time after we think the device should be able to respond with
> > > RRS.
> > > 
> > > I always thought the spec required that after the delays, a device
> > > should respond with RRS if it's not ready, but now I guess I'm not
> > > 100% sure.  Maybe it's allowed to just do nothing, which would lead to
> > > the Root Port timing out and logging an Unsupported Request error.
> > > 
> > > Can I trouble you to try the patch below?  I think we might have to
> > > start explicitly checking for that error.  That probably would require
> > > some setup to enable the error, check for it, and clear it.  I hacked
> > > in some of that here, but ultimately some of it should go elsewhere.
> > 
> > OK, built a testing kernel, wait for bug reporter to test it and collect
> > the log.
> > 
> This is the testing result and log.
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/65

Thanks!  This looks like an Intel S2600WFT, and I assume it has a BMC
that maintains a System Event Log.  Any chance you can check or keep
that log?

> > > @@ -1305,14 +1321,33 @@ static int pci_dev_wait(struct pci_dev *dev,
> > > char *reset_type, int timeout)
> > >             if (root && root->config_rrs_sv) {
> > >               pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> > > -            if (!pci_bus_rrs_vendor_id(id))
> > > -                break;
> > > +
> > > +            if (pci_bus_rrs_vendor_id(id)) {
> > > +                pci_info(dev, "%s: read %#06x (RRS)\n",
> > > +                     __func__, id);
> > > +                goto retry;
> > > +            }
> > > +
> > > +            if (PCI_POSSIBLE_ERROR(id)) {
> > > +                pcie_capability_read_word(root, PCI_EXP_DEVSTA,
> > > +                              &devsta);
> > > +                if (devsta & PCI_EXP_DEVSTA_URD)
> > > +                    pcie_capability_write_word(root,
> > > +                                PCI_EXP_DEVSTA,
> > > +                                PCI_EXP_DEVSTA_URD);
> > > +                pci_info(root, "%s: read %#06x DEVSTA %#06x\n",
> > > +                     __func__, id, devsta);

We're waiting for 01:00.0, and we're seeing the poll message for about
375 ms:

  [   10.334786] pci 10000:01:00.0: pci_dev_wait: VF- bus reset timeout 59900
  [   10.334792] pci 10000:00:02.0: pci_dev_wait: read 0xffffffff DEVSTA 0x0000
  ...
  [   10.708367] pci 10000:00:02.0: pci_dev_wait: read 0xffffffff DEVSTA 0x0000

The 00:02.0 Root Port has RRS enabled, but the config reads of the
01:00.0 Vendor ID did not return the RRS value (0x0001).  Instead,
they returned 0xffffffff, which typically means an error on PCIe.

If an error occurred, I think it *should* set one of the Error
Detected bits in the Device Status register, but we always see 0
there.

I think the platform enabled firmware-first error handling and
declined to give Linux control of AER, so I'm wondering if BIOS is
capturing and clearing those errors before Linux would see them, hence
my question about the SEL.

  [    6.565996] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
  [    6.702329] acpi PNP0A08:00: _OSC: platform does not support [SHPCHotplug AER LTR DPC]
  [    6.702463] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability]

Even if this is the case and the SEL has error info, I don't know how
that would help us, other than maybe to understand why Linux doesn't
see the errors.

Bjorn

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-08-11 23:04                     ` Bjorn Helgaas
@ 2025-09-11  9:24                       ` Vitaly Chikunov
  0 siblings, 0 replies; 21+ messages in thread
From: Vitaly Chikunov @ 2025-09-11  9:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Hui Wang, linux-pci, bhelgaas, raphael.norwitz, alay.shah,
	suresh.gumpula, ilpo.jarvinen, Nirmal Patel, Jonathan Derrick,
	Chaitanya Kumar Borah, Ville Syrjälä

Bjorn,

On Mon, Aug 11, 2025 at 06:04:45PM -0500, Bjorn Helgaas wrote:
> On Fri, Aug 08, 2025 at 10:23:45AM +0800, Hui Wang wrote:
> > Hi Bjorn,
> > 
> > Any progress on this issue, do we have a fix for this now? The
> > ubuntu users are waiting for a fix :-).
> 
> Not yet, but thanks for the reminder.  Keep bugging me!

Other distributions' users are waiting for the fix too!

Thanks,

> 
> PCIe r7.0, sec 2.3.1, makes it clear that devices are permitted to
> return RRS after FLR:
> 
>   ◦ For Configuration Requests only, if Device Readiness Status is not
>     supported, following reset it is permitted for a Function to
>     terminate the request and indicate that it is temporarily unable
>     to process the Request, but will be able to process the Request in
>     the future - in this case, the Request Retry Status (RRS)
>     Completion Status must be used (see § Section 6.6). Valid reset
>     conditions after which a device/Function is permitted to return
>     RRS in response to a Configuration Request are:
> 
>     ▪ FLRs
> 
>     ...
> 
> But I am a little bit concerned because sec 2.3.2, which talks about
> how a Root Complex handles that RRS and the RRS Software Visiblity
> feature, says (note the "system reset" period):
> 
>   Root Complex handling of a Completion with Request Retry Status for
>   a Configuration Request is implementation specific, except for the
>   period following SYSTEM RESET (see § Section 6.6). For Root
>   Complexes that support Configuration RRS Software Visibility, the
>   following rules apply:
> 
>     ◦ If Configuration RRS Software Visibility is enabled:
> 
>       ▪ For a Configuration Read Request that includes both bytes of
> 	the Vendor ID field of a device Function's Configuration Space
> 	Header, the Root Complex must complete the Request to the host
> 	by returning a read-data value of 0001h for the Vendor ID
> 	field and all 1's for any additional bytes included in the
> 	request.
> 
> So I'm worried that the Software Visibility feature might work after
> *system reset*, but not necessarily after an FLR.  That might make
> sense because I don't think the RC can tell when we are doing an FLR
> to a device.
> 
> It seems that after FLR, most RCs *do* make RRS visible via SV.  But
> if we can't rely on that, I don't know how we're supposed to learn
> when a device becomes ready.
> 
> Bjorn
> 
> > On 7/3/25 08:05, Hui Wang wrote:
> > > On 7/2/25 17:43, Hui Wang wrote:
> > > > On 7/2/25 07:23, Bjorn Helgaas wrote:
> > > > > On Tue, Jun 24, 2025 at 08:58:57AM +0800, Hui Wang wrote:
> > > > > > Sorry for late response, I was OOO the past week.
> > > > > > 
> > > > > > This is the log after applied your patch:
> > > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/61
> > > > > > 
> > > > > > Looks like the "retry" makes the nvme work.
> > > > >
> > > > > Thank you!  It seems like we get 0xffffffff (probably PCIe
> > > > > error) for a long time after we think the device should be
> > > > > able to respond with RRS.
> > > > > 
> > > > > I always thought the spec required that after the delays, a
> > > > > device should respond with RRS if it's not ready, but now I
> > > > > guess I'm not 100% sure.  Maybe it's allowed to just do
> > > > > nothing, which would lead to the Root Port timing out and
> > > > > logging an Unsupported Request error.
> > > > > 
> > > > > Can I trouble you to try the patch below?  I think we might
> > > > > have to start explicitly checking for that error.  That
> > > > > probably would require some setup to enable the error, check
> > > > > for it, and clear it.  I hacked in some of that here, but
> > > > > ultimately some of it should go elsewhere.
> > ...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-08-21 16:39                   ` Bjorn Helgaas
@ 2025-10-08 16:53                     ` Bjorn Helgaas
  2026-01-06 13:30                       ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2025-10-08 16:53 UTC (permalink / raw)
  To: Hui Wang, Nirmal Patel, Jonathan Derrick
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Chaitanya Kumar Borah, Ville Syrjälä

Nirmal, Jonathan, can you confirm that when RRS SV is enabled for VMD
Root Ports, we should actually see the 0x0001 value when a device
downstream of VMD responds with RRS?  From the log below, it appears
that we actually get 0xffffffff when reading Device ID after a reset.

On Thu, Aug 21, 2025 at 11:39:36AM -0500, Bjorn Helgaas wrote:
> On Thu, Jul 03, 2025 at 08:05:05AM +0800, Hui Wang wrote:
> > On 7/2/25 17:43, Hui Wang wrote:
> > > On 7/2/25 07:23, Bjorn Helgaas wrote:
> ...

> > > > Thank you!  It seems like we get 0xffffffff (probably PCIe error) for
> > > > a long time after we think the device should be able to respond with
> > > > RRS.
> > > > 
> > > > I always thought the spec required that after the delays, a device
> > > > should respond with RRS if it's not ready, but now I guess I'm not
> > > > 100% sure.  Maybe it's allowed to just do nothing, which would lead to
> > > > the Root Port timing out and logging an Unsupported Request error.
> > > > 
> > > > Can I trouble you to try the patch below?  I think we might have to
> > > > start explicitly checking for that error.  That probably would require
> > > > some setup to enable the error, check for it, and clear it.  I hacked
> > > > in some of that here, but ultimately some of it should go elsewhere.
> ...

The patch is here:
https://lore.kernel.org/r/20250701232341.GA1859056@bhelgaas
and the log with that patch is here:

> > This is the testing result and log.
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/65

> We're waiting for 01:00.0, and we're seeing the poll message for about
> 375 ms:

[   10.177356] pci 10000:00:02.0: PCI bridge to [bus 01]
[   10.182278] pci 10000:01:00.0: [8086:0a54] type 00 class 0x010802 PCIe Endpoint
[   10.195247] pci 10000:00:02.0: pci_reset_secondary_bus: PCI_BRIDGE_CTL_BUS_RESET deasserted
[   10.195464] pci 10000:00:02.0: waiting 100 ms for downstream link, after activation
[   10.195467] pci 10000:00:02.0: pcie_wait_for_link_delay: active 1 delay 100 link_active_reporting 1
[   10.229269] pci 10000:00:02.0: pcie_wait_for_link_status: LNKSTA 0xb043
[   10.334784] pci 10000:00:02.0: pcie_wait_for_link_delay: waited 100ms

>   [   10.334786] pci 10000:01:00.0: pci_dev_wait: VF- bus reset timeout 59900
>   [   10.334792] pci 10000:00:02.0: pci_dev_wait: read 0xffffffff DEVSTA 0x0000
>   ...
>   [   10.708367] pci 10000:00:02.0: pci_dev_wait: read 0xffffffff DEVSTA 0x0000
> 
> The 00:02.0 Root Port has RRS SV enabled, but the config reads of the
> 01:00.0 Vendor ID did not return the RRS value (0x0001).  Instead,
> they returned 0xffffffff, which typically means an error on PCIe.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2025-10-08 16:53                     ` Bjorn Helgaas
@ 2026-01-06 13:30                       ` Linux regression tracking (Thorsten Leemhuis)
  2026-02-08 21:30                         ` Linux-Fan
  0 siblings, 1 reply; 21+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2026-01-06 13:30 UTC (permalink / raw)
  To: Bjorn Helgaas, Hui Wang, Nirmal Patel, Jonathan Derrick
  Cc: linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Chaitanya Kumar Borah, Ville Syrjälä

[top-posting to facilitate processing]

The issue discussed in this thread is still on my list of tracked
regressions, which got me wondering what the status is here. Did it fall
through the cracks? Or was the regression maybe fixed already and I just
missed it?

Ciao, Thorsten

On 10/8/25 18:53, Bjorn Helgaas wrote:
> Nirmal, Jonathan, can you confirm that when RRS SV is enabled for VMD
> Root Ports, we should actually see the 0x0001 value when a device
> downstream of VMD responds with RRS?  From the log below, it appears
> that we actually get 0xffffffff when reading Device ID after a reset.
> 
> On Thu, Aug 21, 2025 at 11:39:36AM -0500, Bjorn Helgaas wrote:
>> On Thu, Jul 03, 2025 at 08:05:05AM +0800, Hui Wang wrote:
>>> On 7/2/25 17:43, Hui Wang wrote:
>>>> On 7/2/25 07:23, Bjorn Helgaas wrote:
>> ...
> 
>>>>> Thank you!  It seems like we get 0xffffffff (probably PCIe error) for
>>>>> a long time after we think the device should be able to respond with
>>>>> RRS.
>>>>>
>>>>> I always thought the spec required that after the delays, a device
>>>>> should respond with RRS if it's not ready, but now I guess I'm not
>>>>> 100% sure.  Maybe it's allowed to just do nothing, which would lead to
>>>>> the Root Port timing out and logging an Unsupported Request error.
>>>>>
>>>>> Can I trouble you to try the patch below?  I think we might have to
>>>>> start explicitly checking for that error.  That probably would require
>>>>> some setup to enable the error, check for it, and clear it.  I hacked
>>>>> in some of that here, but ultimately some of it should go elsewhere.
>> ...
> 
> The patch is here:
> https://lore.kernel.org/r/20250701232341.GA1859056@bhelgaas
> and the log with that patch is here:
> 
>>> This is the testing result and log.
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/65
> 
>> We're waiting for 01:00.0, and we're seeing the poll message for about
>> 375 ms:
> 
> [   10.177356] pci 10000:00:02.0: PCI bridge to [bus 01]
> [   10.182278] pci 10000:01:00.0: [8086:0a54] type 00 class 0x010802 PCIe Endpoint
> [   10.195247] pci 10000:00:02.0: pci_reset_secondary_bus: PCI_BRIDGE_CTL_BUS_RESET deasserted
> [   10.195464] pci 10000:00:02.0: waiting 100 ms for downstream link, after activation
> [   10.195467] pci 10000:00:02.0: pcie_wait_for_link_delay: active 1 delay 100 link_active_reporting 1
> [   10.229269] pci 10000:00:02.0: pcie_wait_for_link_status: LNKSTA 0xb043
> [   10.334784] pci 10000:00:02.0: pcie_wait_for_link_delay: waited 100ms
> 
>>   [   10.334786] pci 10000:01:00.0: pci_dev_wait: VF- bus reset timeout 59900
>>   [   10.334792] pci 10000:00:02.0: pci_dev_wait: read 0xffffffff DEVSTA 0x0000
>>   ...
>>   [   10.708367] pci 10000:00:02.0: pci_dev_wait: read 0xffffffff DEVSTA 0x0000
>>
>> The 00:02.0 Root Port has RRS SV enabled, but the config reads of the
>> 01:00.0 Vendor ID did not return the RRS value (0x0001).  Instead,
>> they returned 0xffffffff, which typically means an error on PCIe.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2026-01-06 13:30                       ` Linux regression tracking (Thorsten Leemhuis)
@ 2026-02-08 21:30                         ` Linux-Fan
  2026-02-09  9:37                           ` Thorsten Leemhuis
  2026-02-09 16:34                           ` Bjorn Helgaas
  0 siblings, 2 replies; 21+ messages in thread
From: Linux-Fan @ 2026-02-08 21:30 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Bjorn Helgaas, Hui Wang, Nirmal Patel, Jonathan Derrick,
	linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Chaitanya Kumar Borah, Ville Syrjälä

Linux regression tracking (Thorsten Leemhuis) writes:

> [top-posting to facilitate processing]
>
> The issue discussed in this thread is still on my list of tracked
> regression, which got me wondering what's the status here. Did it fall
> through the cracks? Or was the regression maybe fixed already and I just
> missed it?

Hello Thorsten,

I saw the regression was closed here:
<https://lore.kernel.org/regressions/e03d4af5-5cdb-4bf4-a568-06b9213247a0@leemhuis.info/>

I am one of the users affected by this issue and am currently running an  
old Linux version (Linux 6.1 from Debian oldstable) to work around it. As a  
user I would very much appreciate an upstream fix.

The Debian bug is still open
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1112238> whereas I think  
Ubuntu got a patch working around the issue for their users:
<https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521>

Is there a chance to include the workaround upstream?

As the issue is reproducible on my system, I should be able to test some  
patches if necessary. Tests are not easy to perform because each one  
degrades the RAID system, and I would like to avoid damaging my  
“production” data, hence I limit the tests to the required minimum.

But if there is some real doubt that the issue is still present, tell me  
and I will check it again with a more recent kernel version.

Thanks in advance and Kind regards
Linux-Fan

> On 10/8/25 18:53, Bjorn Helgaas wrote:
> > Nirmal, Jonathan, can you confirm that when RRS SV is enabled for VMD
> > Root Ports, we should actually see the 0x0001 value when a device
> > downstream of VMD responds with RRS?  From the log below, it appears
> > that we actually get 0xffffffff when reading Device ID after a reset.
> >
> > On Thu, Aug 21, 2025 at 11:39:36AM -0500, Bjorn Helgaas wrote:
> >> On Thu, Jul 03, 2025 at 08:05:05AM +0800, Hui Wang wrote:
> >>> On 7/2/25 17:43, Hui Wang wrote:

[...]

> > The patch is here:
> > https://lore.kernel.org/r/20250701232341.GA1859056@bhelgaas
> > and the log with that patch is here:

[...]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2026-02-08 21:30                         ` Linux-Fan
@ 2026-02-09  9:37                           ` Thorsten Leemhuis
  2026-02-09 16:34                           ` Bjorn Helgaas
  1 sibling, 0 replies; 21+ messages in thread
From: Thorsten Leemhuis @ 2026-02-09  9:37 UTC (permalink / raw)
  To: Linux-Fan, Linux regressions mailing list
  Cc: Bjorn Helgaas, Hui Wang, Nirmal Patel, Jonathan Derrick,
	linux-pci, bhelgaas, raphael.norwitz, alay.shah, suresh.gumpula,
	ilpo.jarvinen, Chaitanya Kumar Borah, Ville Syrjälä

On 2/8/26 22:30, Linux-Fan wrote:
> Linux regression tracking (Thorsten Leemhuis) writes:
> 
>> [top-posting to facilitate processing]
>>
>> The issue discussed in this thread is still on my list of tracked
>> regression, which got me wondering what's the status here. Did it fall
>> through the cracks? Or was the regression maybe fixed already and I just
>> missed it?
> 
> Hello Thorsten,
> 
> I saw the regression was closed here:
> <https://lore.kernel.org/regressions/e03d4af5-5cdb-4bf4-
> a568-06b9213247a0@leemhuis.info/>

That is not very relevant; what is relevant is the bug report and the
activity, in this case this thread.

> I am one of the users affected by this issue and are currently running
> an old Linux version (Linux 6.1 from Debian oldstable) to work around
> it. As a user I would very much appreciate an upstream fix.
> 
> The Debian bug is still open
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1112238> whereas I
> think Ubuntu got a patch working around the issue for their users:
> <https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521>
> 
> Is there a chance to possibly include the workaround in the upstream?

The workaround: maybe. But getting it fixed is what you want, and the
workaround might be the wrong solution for that. That's up to the
developers of the affected area to decide.

But the thing is, this thread got messy and doesn't work on its own
anymore (it should ideally answer questions like "What does the
workaround look like" without providing a URL, as developers are busy
and might not follow it; and by now devs likely have totally forgotten
what this is about, so a kind of re-start might be needed).

But maybe we are lucky, and some developers of this area will reply
within the next two or three days.

If not, I'd suggest checking if 6.19 vanilla is still affected (fixing
6.12.y comes later, so let's ignore that for now). If it is, check if
the workaround from Ubuntu fixes it there. It might even make sense to
check if a revert of the culprit fixes it, too. Then report your
findings and sum up the situation briefly (somewhat similar to a new bug
report; people likely have forgotten what this thread is about). That
should work as a good base to restart things here.

Ciao, Thorsten

> As the issue is reproducible on my system, I should be able to test some
> patches if necessary. Tests are not easy to perform because it degrades
> the RAID system and I would like to avoid damaging my “production” data
> hence limit the tests to the required minimum.
> 
> But if there is some real doubt that the issue is still present, tell me
> and I check it again with a more recent kernel version?
> 
> Thanks in advance and Kind regards
> Linux-Fan
> 
>> On 10/8/25 18:53, Bjorn Helgaas wrote:
>> > Nirmal, Jonathan, can you confirm that when RRS SV is enabled for VMD
>> > Root Ports, we should actually see the 0x0001 value when a device
>> > downstream of VMD responds with RRS?  From the log below, it appears
>> > that we actually get 0xffffffff when reading Device ID after a reset.
>> >
>> > On Thu, Aug 21, 2025 at 11:39:36AM -0500, Bjorn Helgaas wrote:
>> >> On Thu, Jul 03, 2025 at 08:05:05AM +0800, Hui Wang wrote:
>> >>> On 7/2/25 17:43, Hui Wang wrote:
> 
> [...]
> 
>> > The patch is here:
>> > https://lore.kernel.org/r/20250701232341.GA1859056@bhelgaas
>> > and the log with that patch is here:
> 
> [...]
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2026-02-08 21:30                         ` Linux-Fan
  2026-02-09  9:37                           ` Thorsten Leemhuis
@ 2026-02-09 16:34                           ` Bjorn Helgaas
  2026-02-13 19:37                             ` Linux-Fan
  1 sibling, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2026-02-09 16:34 UTC (permalink / raw)
  To: Linux-Fan
  Cc: Linux regressions mailing list, Hui Wang, Nirmal Patel,
	Jonathan Derrick, linux-pci, bhelgaas, raphael.norwitz, alay.shah,
	suresh.gumpula, ilpo.jarvinen, Chaitanya Kumar Borah,
	Ville Syrjälä

On Sun, Feb 08, 2026 at 10:30:39PM +0100, Linux-Fan wrote:
> Linux regression tracking (Thorsten Leemhuis) writes:
> > The issue discussed in this thread is still on my list of tracked
> > regression, which got me wondering what's the status here. Did it
> > fall through the cracks? Or was the regression maybe fixed already
> > and I just missed it?

> I am one of the users affected by this issue and are currently
> running an old Linux version (Linux 6.1 from Debian oldstable) to
> work around it. As a user I would very much appreciate an upstream
> fix.
> 
> The Debian bug is still open
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1112238> whereas
> I think Ubuntu got a patch working around the issue for their users:
> <https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521>
> 
> Is there a chance to possibly include the workaround in the
> upstream?
> 
> As the issue is reproducible on my system, I should be able to test
> some patches if necessary. Tests are not easy to perform because it
> degrades the RAID system and I would like to avoid damaging my
> “production” data hence limit the tests to the required minimum.
> 
> But if there is some real doubt that the issue is still present,
> tell me and I check it again with a more recent kernel version?

As far as I know this issue still exists, and it has unfortunately
fallen through the cracks, mostly because I don't have a good idea how
to debug it.

Thanks for keeping it alive.  I'll try to come up with some ideas.

> > On 10/8/25 18:53, Bjorn Helgaas wrote:
> > > Nirmal, Jonathan, can you confirm that when RRS SV is enabled for VMD
> > > Root Ports, we should actually see the 0x0001 value when a device
> > > downstream of VMD responds with RRS?  From the log below, it appears
> > > that we actually get 0xffffffff when reading Device ID after a reset.
> > >
> > > On Thu, Aug 21, 2025 at 11:39:36AM -0500, Bjorn Helgaas wrote:
> > >> On Thu, Jul 03, 2025 at 08:05:05AM +0800, Hui Wang wrote:
> > >>> On 7/2/25 17:43, Hui Wang wrote:
> 
> [...]
> 
> > > The patch is here:
> > > https://lore.kernel.org/r/20250701232341.GA1859056@bhelgaas
> > > and the log with that patch is here:
> 
> [...]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
  2026-02-09 16:34                           ` Bjorn Helgaas
@ 2026-02-13 19:37                             ` Linux-Fan
  0 siblings, 0 replies; 21+ messages in thread
From: Linux-Fan @ 2026-02-13 19:37 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Linux regressions mailing list, Hui Wang, Nirmal Patel,
	Jonathan Derrick, linux-pci, bhelgaas, raphael.norwitz, alay.shah,
	suresh.gumpula, ilpo.jarvinen, Chaitanya Kumar Borah,
	Ville Syrjälä

[-- Attachment #1: Type: text/plain, Size: 3486 bytes --]

Bjorn Helgaas writes:

> On Sun, Feb 08, 2026 at 10:30:39PM +0100, Linux-Fan wrote:

[...]

> > As the issue is reproducible on my system, I should be able to test
> > some patches if necessary. Tests are not easy to perform because it
> > degrades the RAID system and I would like to avoid damaging my
> > “production” data hence limit the tests to the required minimum.
> >
> > But if there is some real doubt that the issue is still present,
> > tell me and I check it again with a more recent kernel version?
>
> As far as I know this issue still exists, and it has unfortunately
> fallen through the cracks, mostly because I don't have a good idea how
> to debug it.
>
> Thanks for keeping it alive.  I'll try to come up with some ideas.

[...]

In the meantime, I would like to follow up on the suggestion by Thorsten to  
summarize the state of the issue. I hope it is not too long-winded...

On 2025-05-22, an Ubuntu user reported that after updating the Linux kernel,  
their NVMe SSDs (Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock  
Controller] [8086:0a54]) were no longer detected [1].

The issue was bisected to commit
d591f6804e ("PCI: Wait for device readiness with Configuration RRS").

I encountered this issue upon upgrading to Debian Trixie (Linux 6.12.41) and  
for me the issue manifests as only one of two Intel P4510 SSDs being  
recognized, causing RAID arrays to be degraded [2].

I think the most recent state of the error analysis starts here [5].
I cannot summarize it since I know too little about the matter.

During the work on the issue, multiple patches were devised:

 * One was implemented by Ubuntu as a workaround.
   I attach it as `ubuntu.patch`.

 * One was suggested on this very thread for debugging [3].
   I attach it as `patch.patch`.

I have performed additional testing as suggested by Thorsten [4]:

 * Tested with vanilla Linux 6.19 commit 05f7e89ab9 ("Linux 6.19").
   I call it “standard” in my tests below. It recognizes only
   one of two SSDs on my system (issue still present).

 * Tested with reverting the culprit. The revert produced some
   conflicts. I tried to resolve the conflicts (it may be wrong...)
   and have attached the outcome as `revert.patch`.

I applied each of the patches on top of the 6.19 “standard” checkout
(undoing previous patches for the next test of course) and gave the builds  
different names.

The outcome of the test is as follows:

 * standard - FAILS - issue is still present on Linux 6.19 per my testing
 * ubuntu   - WORKS - workaround patch applies and bypasses the issue
 * patch    - WORKS - suggested debugging patch applies and bypasses the issue
 * revert   - WORKS - my experimental revert commit bypasses the issue 

In case it matters I have collected `dmesg` output of all four test runs,  
attached as `dmesgs.tar.gz`.

In case I can help with testing additional patches, I have now set up a  
separate system (on a CF card) that I can boot as an alternative to the  
main system; it leaves the RAID arrays alone, avoiding the RAID  
degradation during testing.

Kind regards
Linux-Fan

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1112238
[3] https://lore.kernel.org/linux-pci/20250701232341.GA1859056@bhelgaas/
[4] https://lore.kernel.org/linux-pci/94b6639b-d1f4-4d03-8ec2-3b24d438bb9b@leemhuis.info/
[5] https://lore.kernel.org/linux-pci/20250821163936.GA681451@bhelgaas/

[-- Attachment #2: patch.patch --]
[-- Type: text/x-diff, Size: 3917 bytes --]

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 13dbb405dc31..21babff6afca 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1208,10 +1208,13 @@ void pci_resume_bus(struct pci_bus *bus)
 
 static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 {
-	int delay = 1;
+	int delay = 10;
 	bool retrain = false;
 	struct pci_dev *root, *bridge;
+	u16 devctl, devsta;
 
+	pci_info(dev, "%s: VF%c %s timeout %d\n", __func__,
+		 dev->is_virtfn ? '+' : '-', reset_type, timeout);
 	root = pcie_find_root_port(dev);
 
 	if (pci_is_pcie(dev)) {
@@ -1220,6 +1223,19 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 			retrain = true;
 	}
 
+	if (root) {
+		pcie_capability_read_word(root, PCI_EXP_DEVCTL, &devctl);
+		if (!(devctl & PCI_EXP_DEVCTL_URRE))
+			pcie_capability_write_word(root, PCI_EXP_DEVCTL,
+					    devctl | PCI_EXP_DEVCTL_URRE);
+		pcie_capability_read_word(root, PCI_EXP_DEVSTA, &devsta);
+		if (devsta & PCI_EXP_DEVSTA_URD)
+			pcie_capability_write_word(root, PCI_EXP_DEVSTA,
+						   PCI_EXP_DEVSTA_URD);
+		pci_info(root, "%s: DEVCTL %#06x DEVSTA %#06x\n", __func__,
+			 devctl, devsta);
+	}
+
 	/*
 	 * The caller has already waited long enough after a reset that the
 	 * device should respond to config requests, but it may respond
@@ -1249,14 +1265,33 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 
 		if (root && root->config_rrs_sv) {
 			pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
-			if (!pci_bus_rrs_vendor_id(id))
-				break;
+
+			if (pci_bus_rrs_vendor_id(id)) {
+				pci_info(dev, "%s: read %#06x (RRS)\n",
+					 __func__, id);
+				goto retry;
+			}
+
+			if (PCI_POSSIBLE_ERROR(id)) {
+				pcie_capability_read_word(root, PCI_EXP_DEVSTA,
+							  &devsta);
+				if (devsta & PCI_EXP_DEVSTA_URD)
+					pcie_capability_write_word(root,
+							    PCI_EXP_DEVSTA,
+							    PCI_EXP_DEVSTA_URD);
+				pci_info(root, "%s: read %#06x DEVSTA %#06x\n",
+					 __func__, id, devsta);
+				goto retry;
+			}
+
+			break;
 		} else {
 			pci_read_config_dword(dev, PCI_COMMAND, &id);
 			if (!PCI_POSSIBLE_ERROR(id))
 				break;
 		}
 
+retry:
 		if (delay > timeout) {
 			pci_warn(dev, "not ready %dms after %s; giving up\n",
 				 delay - 1, reset_type);
@@ -1276,7 +1311,6 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 		}
 
 		msleep(delay);
-		delay *= 2;
 	}
 
 	if (delay > PCI_RESET_WAIT)
@@ -4482,8 +4516,10 @@ static int pcie_wait_for_link_status(struct pci_dev *pdev,
 	end_jiffies = jiffies + msecs_to_jiffies(PCIE_LINK_RETRAIN_TIMEOUT_MS);
 	do {
 		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
-		if ((lnksta & lnksta_mask) == lnksta_match)
+		if ((lnksta & lnksta_mask) == lnksta_match) {
+			pci_info(pdev, "%s: LNKSTA %#06x\n", __func__, lnksta);
 			return 0;
+		}
 		msleep(1);
 	} while (time_before(jiffies, end_jiffies));
 
@@ -4572,6 +4608,8 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
 	 * Some controllers might not implement link active reporting. In this
 	 * case, we wait for 1000 ms + any delay requested by the caller.
 	 */
+	pci_info(pdev, "%s: active %d delay %d link_active_reporting %d\n",
+		 __func__, active, delay, pdev->link_active_reporting);
 	if (!pdev->link_active_reporting) {
 		msleep(PCIE_LINK_RETRAIN_TIMEOUT_MS + delay);
 		return true;
@@ -4596,6 +4634,7 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
 			return false;
 
 		msleep(delay);
+		pci_info(pdev, "%s: waited %dms\n", __func__, delay);
 		return true;
 	}
 
@@ -4772,6 +4811,7 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
 
 	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
 	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
+	pci_info(dev, "%s: PCI_BRIDGE_CTL_BUS_RESET deasserted\n", __func__);
 }
 
 void __weak pcibios_reset_secondary_bus(struct pci_dev *dev)

[-- Attachment #3: ubuntu.patch --]
[-- Type: text/x-diff, Size: 3253 bytes --]

commit 2eecca1afcfae786128a698b07320c3ef74ee477
Author: Hui Wang <hui.wang@canonical.com>
Date:   Tue Aug 19 16:21:23 2025 +0800

    UBUNTU: SAUCE: PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme
    
    BugLink: https://bugs.launchpad.net/bugs/2111521
    
    Prior to commit d591f6804e7e ("PCI: Wait for device readiness with
    Configuration RRS"), this Intel nvme [8086:0a54] works well. Since
    that patch is merged to the kernel, this nvme stops working.
    
    Through debugging, we found that commit introduces the RRS polling in
    the pci_dev_wait(), for this nvme, when polling the PCI_VENDOR_ID, it
    will return ~0 if the config access is not ready yet, but the polling
    expects a return value of 0x0001 or a valid vendor_id, so the RRS
    polling doesn't work for this nvme.
    
    Here we add a pci quirk to disable the RRS polling for the device.
    
    Fixes: d591f6804e7e ("PCI: Wait for device readiness with Configuration RRS")
    Signed-off-by: Hui Wang <hui.wang@canonical.com>
    Acked-by: Zixing Liu <zixing.liu@canonical.com>
    Acked-by: Wen-chien Jesse Sung <jesse.sung@canonical.com>
    Signed-off-by: Timo Aaltonen <timo.aaltonen@canonical.com>

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index f41128f91ca7..fd6cb1400f3e 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1270,7 +1270,8 @@ static void pci_enable_rrs_sv(struct pci_dev *pdev)
 	if (root_cap & PCI_EXP_RTCAP_RRS_SV) {
 		pcie_capability_set_word(pdev, PCI_EXP_RTCTL,
 					 PCI_EXP_RTCTL_RRS_SVE);
-		pdev->config_rrs_sv = 1;
+		if (!(pdev->dev_flags & PCI_DEV_FLAGS_NO_RRS_SV))
+			pdev->config_rrs_sv = 1;
 	}
 }
 
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 767631b3771d..eadce5a8caab 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6353,3 +6353,21 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
 #endif
+
+/*
+ * Although the root port device claims to support RRS, some devices don't work
+ * with RRS polling, when reading the Vendor ID, they just return ~0 if config
+ * access is not ready, this will break the pci_dev_wait(). Here disable the RRS
+ * forcibly for this type of device.
+ */
+static void quirk_no_rrs_sv(struct pci_dev *dev)
+{
+	struct pci_dev *root;
+
+	root = pcie_find_root_port(dev);
+	if (root) {
+		root->dev_flags |= PCI_DEV_FLAGS_NO_RRS_SV;
+		root->config_rrs_sv = 0;
+	}
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0a54, quirk_no_rrs_sv);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 05e85adf7605..fa845398e84d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -247,6 +247,8 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_HAS_MSI_MASKING = (__force pci_dev_flags_t) (1 << 12),
 	/* Device requires write to PCI_MSIX_ENTRY_DATA before any MSIX reads */
 	PCI_DEV_FLAGS_MSIX_TOUCH_ENTRY_DATA_FIRST = (__force pci_dev_flags_t) (1 << 13),
+	/* Do not use Configuration Request Retry Status polling in pci_dev_wait() */
+	PCI_DEV_FLAGS_NO_RRS_SV = (__force pci_dev_flags_t) (1 << 14),
 };
 
 enum pci_irq_reroute_variant {

[-- Attachment #4: revert.patch --]
[-- Type: text/x-diff, Size: 4361 bytes --]

From ea7e9b43b8bdc4633bb264e0bbf7fff8af640ec6 Mon Sep 17 00:00:00 2001
From: Linux-Fan <linux-fan@vm-debian-mdvl-sid.masysma.org>
Date: Thu, 12 Feb 2026 21:01:32 +0100
Subject: [PATCH] Revert "PCI: Wait for device readiness with Configuration
 RRS"

MA TEST

This reverts commit d591f6804e7e1310881c9224d72247a2b65039af.
---
 drivers/pci/pci.c   | 41 +++++++++++++----------------------------
 drivers/pci/probe.c |  4 +---
 include/linux/pci.h |  1 -
 3 files changed, 14 insertions(+), 32 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 13dbb405dc31..aea72367f129 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1210,9 +1210,7 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 {
 	int delay = 1;
 	bool retrain = false;
-	struct pci_dev *root, *bridge;
-
-	root = pcie_find_root_port(dev);
+	struct pci_dev *bridge;
 
 	if (pci_is_pcie(dev)) {
 		bridge = pci_upstream_bridge(dev);
@@ -1221,23 +1219,16 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 	}
 
 	/*
-	 * The caller has already waited long enough after a reset that the
-	 * device should respond to config requests, but it may respond
-	 * with Request Retry Status (RRS) if it needs more time to
-	 * initialize.
-	 *
-	 * If the device is below a Root Port with Configuration RRS
-	 * Software Visibility enabled, reading the Vendor ID returns a
-	 * special data value if the device responded with RRS.  Read the
-	 * Vendor ID until we get non-RRS status.
+	 * After reset, the device should not silently discard config
+	 * requests, but it may still indicate that it needs more time by
+	 * responding to them with CRS completions.  The Root Port will
+	 * generally synthesize ~0 (PCI_ERROR_RESPONSE) data to complete
+	 * the read (except when CRS SV is enabled and the read was for the
+	 * Vendor ID; in that case it synthesizes 0x0001 data).
 	 *
-	 * If there's no Root Port or Configuration RRS Software Visibility
-	 * is not enabled, the device may still respond with RRS, but
-	 * hardware may retry the config request.  If no retries receive
-	 * Successful Completion, hardware generally synthesizes ~0
-	 * (PCI_ERROR_RESPONSE) data to complete the read.  Reading Vendor
-	 * ID for VFs and non-existent devices also returns ~0, so read the
-	 * Command register until it returns something other than ~0.
+	 * Wait for the device to return a non-CRS completion.  Read the
+	 * Command register instead of Vendor ID so we don't have to
+	 * contend with the CRS SV value.
 	 */
 	for (;;) {
 		u32 id;
@@ -1247,15 +1238,9 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 			return -ENOTTY;
 		}
 
-		if (root && root->config_rrs_sv) {
-			pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
-			if (!pci_bus_rrs_vendor_id(id))
-				break;
-		} else {
-			pci_read_config_dword(dev, PCI_COMMAND, &id);
-			if (!PCI_POSSIBLE_ERROR(id))
-				break;
-		}
+		pci_read_config_dword(dev, PCI_COMMAND, &id);
+		if (!PCI_POSSIBLE_ERROR(id))
+			break;
 
 		if (delay > timeout) {
 			pci_warn(dev, "not ready %dms after %s; giving up\n",
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 41183aed8f5d..c33b2401272b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1304,11 +1304,9 @@ static void pci_enable_rrs_sv(struct pci_dev *pdev)
 
 	/* Enable Configuration RRS Software Visibility if supported */
 	pcie_capability_read_word(pdev, PCI_EXP_RTCAP, &root_cap);
-	if (root_cap & PCI_EXP_RTCAP_RRS_SV) {
+	if (root_cap & PCI_EXP_RTCAP_RRS_SV)
 		pcie_capability_set_word(pdev, PCI_EXP_RTCTL,
 					 PCI_EXP_RTCTL_RRS_SVE);
-		pdev->config_rrs_sv = 1;
-	}
 }
 
 static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index b5cc0c2b9906..11673c53ceb1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -388,7 +388,6 @@ struct pci_dev {
 					   can be generated */
 	unsigned int	pme_poll:1;	/* Poll device's PME status bit */
 	unsigned int	pinned:1;	/* Whether this dev is pinned */
-	unsigned int	config_rrs_sv:1; /* Config RRS software visibility */
 	unsigned int	imm_ready:1;	/* Supports Immediate Readiness */
 	unsigned int	d1_support:1;	/* Low power state D1 is supported */
 	unsigned int	d2_support:1;	/* Low power state D2 is supported */
-- 
2.51.0


[-- Attachment #5: dmesgs.tar.gz --]
[-- Type: application/gzip, Size: 185828 bytes --]



Thread overview: 21+ messages
2025-06-11 10:14 [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme Hui Wang
2025-06-12 16:48 ` Bjorn Helgaas
2025-06-16 11:55   ` Hui Wang
2025-06-16 13:38     ` Hui Wang
2025-06-17 20:12       ` Bjorn Helgaas
2025-06-23 22:58         ` Bjorn Helgaas
2025-06-24  0:58           ` Hui Wang
2025-07-01 23:23             ` Bjorn Helgaas
2025-07-02  9:43               ` Hui Wang
2025-07-03  0:05                 ` Hui Wang
2025-08-08  2:23                   ` Hui Wang
2025-08-11 23:04                     ` Bjorn Helgaas
2025-09-11  9:24                       ` Vitaly Chikunov
2025-08-21 16:39                   ` Bjorn Helgaas
2025-10-08 16:53                     ` Bjorn Helgaas
2026-01-06 13:30                       ` Linux regression tracking (Thorsten Leemhuis)
2026-02-08 21:30                         ` Linux-Fan
2026-02-09  9:37                           ` Thorsten Leemhuis
2026-02-09 16:34                           ` Bjorn Helgaas
2026-02-13 19:37                             ` Linux-Fan
2025-08-14 15:55 ` Bjorn Helgaas
