[PATCH v5 0/5] Verify devices transition from D3cold to D0

linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v5 0/5] Verify devices transition from D3cold to D0
@ 2024-08-23 15:40 Mario Limonciello
  2024-08-23 15:40 ` [PATCH v5 1/5] PCI: Use an enum for reset type in pci_dev_wait() Mario Limonciello
                   ` (5 more replies)
  0 siblings, 6 replies; 31+ messages in thread
From: Mario Limonciello @ 2024-08-23 15:40 UTC (permalink / raw)
  To: Bjorn Helgaas, Mathias Nyman, Mika Westerberg
  Cc: open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Gary Li, Greg Kroah-Hartman, Ilpo Järvinen,
	Mario Limonciello

From: Mario Limonciello <mario.limonciello@amd.com>

Gary has reported that when a dock is plugged into a system at the same
time the autosuspend delay has tripped that the USB4 stack malfunctions.

Messages show up like this:

```
thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled
```

Furthermore the USB4 router is non-functional at this point.

Those messages happen because the device is still in D3cold at the time
that the PCI core handed control back to the USB4 connection manager
(thunderbolt).

The issue is that it takes time for a device to enter D3cold and do a
conventional reset, and then more time for it to exit D3cold.

This appears not to be a new problem; previously there were very similar
reports from Ryzen XHCI controllers.  Quirks were added for those.
Furthermore; adding extra logging it's apparent that other PCI devices
in the system can take more than 10ms to recover from D3cold as well.

This series add a wait into pci_power_up() specifically for D3cold exit and
then drops the quirks that were previously used for the Ryzen XHCI controllers.

Mario Limonciello (5):
  PCI: Use an enum for reset type in pci_dev_wait()
  PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  PCI: Verify functions currently in D3cold have entered D0
  PCI: Allow Ryzen XHCI controllers into D3cold and drop delays
  PCI: Drop Radeon quirk for Macbook Pro 8.2

 drivers/pci/pci-driver.c    |  2 +-
 drivers/pci/pci.c           | 70 +++++++++++++++++++++++++++----------
 drivers/pci/pci.h           | 13 ++++++-
 drivers/pci/pcie/dpc.c      |  2 +-
 drivers/pci/quirks.c        | 25 -------------
 drivers/usb/host/xhci-pci.c | 11 ------
 6 files changed, 66 insertions(+), 57 deletions(-)

-- 
2.43.0

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v5 1/5] PCI: Use an enum for reset type in pci_dev_wait()
  2024-08-23 15:40 [PATCH v5 0/5] Verify devices transition from D3cold to D0 Mario Limonciello
@ 2024-08-23 15:40 ` Mario Limonciello
  2024-08-23 15:40 ` [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND " Mario Limonciello
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Mario Limonciello @ 2024-08-23 15:40 UTC (permalink / raw)
  To: Bjorn Helgaas, Mathias Nyman, Mika Westerberg
  Cc: open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Gary Li, Greg Kroah-Hartman, Ilpo Järvinen,
	Mario Limonciello

From: Mario Limonciello <mario.limonciello@amd.com>

A string is passed to all callers of pci_dev_wait() which is utilized
to demonstrate what kind of reset happened when there was a problem.

This doesn't allow making the behavior for different reset types
conditional though. Lay some plumbing to allow making comparisons of
reset types with integers instead. No functional changes.

Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
v4->v5:
 * Add static assert and PCI_DEV_WAIT_MAX
v3->v4:
 * Use index-based array initialization format for pci_reset_types
 * Fix LKP reported sparse issue
---
 drivers/pci/pci-driver.c |  2 +-
 drivers/pci/pci.c        | 30 ++++++++++++++++++++----------
 drivers/pci/pci.h        | 12 +++++++++++-
 drivers/pci/pcie/dpc.c   |  2 +-
 4 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index f412ef73a6e4b..ac3cfbfa137d9 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -572,7 +572,7 @@ static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
 {
 	int ret;
 
-	ret = pci_bridge_wait_for_secondary_bus(pci_dev, "resume");
+	ret = pci_bridge_wait_for_secondary_bus(pci_dev, PCI_DEV_WAIT_RESUME);
 	if (ret) {
 		/*
 		 * The downstream link failed to come up, so mark the
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ffaaca0978cbc..1e219057a5069 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -181,6 +181,16 @@ static int __init pcie_port_pm_setup(char *str)
 }
 __setup("pcie_port_pm=", pcie_port_pm_setup);
 
+static const char * const pci_reset_types[] = {
+	[PCI_DEV_WAIT_FLR] = "FLR",
+	[PCI_DEV_WAIT_AF_FLR] = "AF_FLR",
+	[PCI_DEV_WAIT_D3HOT_D0] = "PM D3HOT->D0",
+	[PCI_DEV_WAIT_BUS_RESET] = "bus reset",
+	[PCI_DEV_WAIT_RESUME] = "resume",
+	[PCI_DEV_WAIT_DPC] = "DPC",
+};
+static_assert(ARRAY_SIZE(pci_reset_types) == PCI_DEV_WAIT_MAX);
+
 /**
  * pci_bus_max_busnr - returns maximum PCI bus number of given bus' children
  * @bus: pointer to PCI bus structure to search
@@ -1279,7 +1289,7 @@ void pci_resume_bus(struct pci_bus *bus)
 		pci_walk_bus(bus, pci_resume_one, NULL);
 }
 
-static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
+static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int timeout)
 {
 	int delay = 1;
 	bool retrain = false;
@@ -1317,7 +1327,7 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 
 		if (delay > timeout) {
 			pci_warn(dev, "not ready %dms after %s; giving up\n",
-				 delay - 1, reset_type);
+				 delay - 1, pci_reset_types[reset_type]);
 			return -ENOTTY;
 		}
 
@@ -1330,7 +1340,7 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 				}
 			}
 			pci_info(dev, "not ready %dms after %s; waiting\n",
-				 delay - 1, reset_type);
+				 delay - 1, pci_reset_types[reset_type]);
 		}
 
 		msleep(delay);
@@ -1339,10 +1349,10 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 
 	if (delay > PCI_RESET_WAIT)
 		pci_info(dev, "ready %dms after %s\n", delay - 1,
-			 reset_type);
+			 pci_reset_types[reset_type]);
 	else
 		pci_dbg(dev, "ready %dms after %s\n", delay - 1,
-			reset_type);
+			pci_reset_types[reset_type]);
 
 	return 0;
 }
@@ -4536,7 +4546,7 @@ int pcie_flr(struct pci_dev *dev)
 	 */
 	msleep(100);
 
-	return pci_dev_wait(dev, "FLR", PCIE_RESET_READY_POLL_MS);
+	return pci_dev_wait(dev, PCI_DEV_WAIT_FLR, PCIE_RESET_READY_POLL_MS);
 }
 EXPORT_SYMBOL_GPL(pcie_flr);
 
@@ -4603,7 +4613,7 @@ static int pci_af_flr(struct pci_dev *dev, bool probe)
 	 */
 	msleep(100);
 
-	return pci_dev_wait(dev, "AF_FLR", PCIE_RESET_READY_POLL_MS);
+	return pci_dev_wait(dev, PCI_DEV_WAIT_AF_FLR, PCIE_RESET_READY_POLL_MS);
 }
 
 /**
@@ -4648,7 +4658,7 @@ static int pci_pm_reset(struct pci_dev *dev, bool probe)
 	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, csr);
 	pci_dev_d3_sleep(dev);
 
-	return pci_dev_wait(dev, "PM D3hot->D0", PCIE_RESET_READY_POLL_MS);
+	return pci_dev_wait(dev, PCI_DEV_WAIT_D3HOT_D0, PCIE_RESET_READY_POLL_MS);
 }
 
 /**
@@ -4822,7 +4832,7 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus)
  * Return 0 on success or -ENOTTY if the first device on the secondary bus
  * failed to become accessible.
  */
-int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
+int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, enum pci_reset_type reset_type)
 {
 	struct pci_dev *child __free(pci_dev_put) = NULL;
 	int delay;
@@ -4959,7 +4969,7 @@ int pci_bridge_secondary_bus_reset(struct pci_dev *dev)
 			      __builtin_return_address(0));
 	pcibios_reset_secondary_bus(dev);
 
-	return pci_bridge_wait_for_secondary_bus(dev, "bus reset");
+	return pci_bridge_wait_for_secondary_bus(dev, PCI_DEV_WAIT_BUS_RESET);
 }
 EXPORT_SYMBOL_GPL(pci_bridge_secondary_bus_reset);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 79c8398f39384..be06c38342c76 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -4,6 +4,16 @@
 
 #include <linux/pci.h>
 
+enum pci_reset_type {
+	PCI_DEV_WAIT_FLR,
+	PCI_DEV_WAIT_AF_FLR,
+	PCI_DEV_WAIT_D3HOT_D0,
+	PCI_DEV_WAIT_BUS_RESET,
+	PCI_DEV_WAIT_RESUME,
+	PCI_DEV_WAIT_DPC,
+	PCI_DEV_WAIT_MAX,
+};
+
 /* Number of possible devfns: 0.0 to 1f.7 inclusive */
 #define MAX_NR_DEVFNS 256
 
@@ -137,7 +147,7 @@ void pci_msi_init(struct pci_dev *dev);
 void pci_msix_init(struct pci_dev *dev);
 bool pci_bridge_d3_possible(struct pci_dev *dev);
 void pci_bridge_d3_update(struct pci_dev *dev);
-int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type);
+int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, enum pci_reset_type reset_type);
 
 static inline void pci_wakeup_event(struct pci_dev *dev)
 {
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 2b6ef7efa3c11..95cd985244729 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -174,7 +174,7 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
 			      PCI_EXP_DPC_STATUS_TRIGGER);
 
-	if (pci_bridge_wait_for_secondary_bus(pdev, "DPC")) {
+	if (pci_bridge_wait_for_secondary_bus(pdev, PCI_DEV_WAIT_DPC)) {
 		clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
 		ret = PCI_ERS_RESULT_DISCONNECT;
 	} else {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-08-23 15:40 [PATCH v5 0/5] Verify devices transition from D3cold to D0 Mario Limonciello
  2024-08-23 15:40 ` [PATCH v5 1/5] PCI: Use an enum for reset type in pci_dev_wait() Mario Limonciello
@ 2024-08-23 15:40 ` Mario Limonciello
  2024-08-23 19:54   ` Bjorn Helgaas
  2024-08-30  0:01   ` Bjorn Helgaas
  2024-08-23 15:40 ` [PATCH v5 3/5] PCI: Verify functions currently in D3cold have entered D0 Mario Limonciello
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 31+ messages in thread
From: Mario Limonciello @ 2024-08-23 15:40 UTC (permalink / raw)
  To: Bjorn Helgaas, Mathias Nyman, Mika Westerberg
  Cc: open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Gary Li, Greg Kroah-Hartman, Ilpo Järvinen,
	Mario Limonciello

From: Mario Limonciello <mario.limonciello@amd.com>

If a dock is plugged in at the same time as autosuspend delay then this
can cause malfunctions in the USB4 stack. This happens because the
device is still in D3cold at the time that the PCI core handed
control back to the USB4 stack.

A device that has gone through a reset may return a value in PCI_COMMAND
but that doesn't mean it's finished transitioning to D0.  For evices that
support power management explicitly check PCI_PM_CTRL on everything but
system resume to ensure the transition happened.

Devices that don't support power management and system resume will
continue to use PCI_COMMAND.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
v4->v5:
 * Fix misleading indentation
 * Amend commit message
---
 drivers/pci/pci.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1e219057a5069..f032a4aaec268 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1309,21 +1309,33 @@ static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int
 	 * the read (except when CRS SV is enabled and the read was for the
 	 * Vendor ID; in that case it synthesizes 0x0001 data).
 	 *
-	 * Wait for the device to return a non-CRS completion.  Read the
-	 * Command register instead of Vendor ID so we don't have to
-	 * contend with the CRS SV value.
+	 * Wait for the device to return a non-CRS completion.  On devices
+	 * that support PM control and on waits that aren't part of system
+	 * resume read the PM control register to ensure the device has
+	 * transitioned to D0.  On devices that don't support PM control,
+	 * or during system resume read the command register to instead of
+	 * Vendor ID so we don't have to contend with the CRS SV value.
 	 */
 	for (;;) {
-		u32 id;
-
 		if (pci_dev_is_disconnected(dev)) {
 			pci_dbg(dev, "disconnected; not waiting\n");
 			return -ENOTTY;
 		}
 
-		pci_read_config_dword(dev, PCI_COMMAND, &id);
-		if (!PCI_POSSIBLE_ERROR(id))
-			break;
+		if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) {
+			u16 pmcsr;
+
+			pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+			if (!PCI_POSSIBLE_ERROR(pmcsr) &&
+			    (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
+				break;
+		} else {
+			u32 id;
+
+			pci_read_config_dword(dev, PCI_COMMAND, &id);
+			if (!PCI_POSSIBLE_ERROR(id))
+				break;
+		}
 
 		if (delay > timeout) {
 			pci_warn(dev, "not ready %dms after %s; giving up\n",
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 3/5] PCI: Verify functions currently in D3cold have entered D0
  2024-08-23 15:40 [PATCH v5 0/5] Verify devices transition from D3cold to D0 Mario Limonciello
  2024-08-23 15:40 ` [PATCH v5 1/5] PCI: Use an enum for reset type in pci_dev_wait() Mario Limonciello
  2024-08-23 15:40 ` [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND " Mario Limonciello
@ 2024-08-23 15:40 ` Mario Limonciello
  2024-08-23 15:40 ` [PATCH v5 4/5] PCI: Allow Ryzen XHCI controllers into D3cold and drop delays Mario Limonciello
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Mario Limonciello @ 2024-08-23 15:40 UTC (permalink / raw)
  To: Bjorn Helgaas, Mathias Nyman, Mika Westerberg
  Cc: open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Gary Li, Greg Kroah-Hartman, Ilpo Järvinen,
	Mario Limonciello

From: Mario Limonciello <mario.limonciello@amd.com>

It is reported that USB4 routers and downstream devices may behave
incorrectly if a dock cable is plugged in at approximately the time that
the autosuspend_delay is configured. In this situation the device has
attempted to enter D3cold, but didn't finish D3cold entry when the PCI
core tried to transition it back to D0.

Empirically measuring this situation an "aborted" D3cold exit takes
~60ms and a "normal" D3cold exit takes ~6ms.

The PCI-PM 1.2 spec specifies that the restore time for functions
in D3cold is either 'Full context restore or boot latency'.

As PCIe r6.0 sec 5.8 specifies that the device will have gone
through a conventional reset, it may take some time for the
device to be ready.

Wait up to 1 sec as specified in PCIe r6.0 sec 6.6.1 for a device
in D3cold to return to D0.

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
v4->v5:
 * Add missing string for new pci_reset_type
---
 drivers/pci/pci.c | 12 ++++++++++++
 drivers/pci/pci.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f032a4aaec268..02d94e0527e6a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -188,6 +188,7 @@ static const char * const pci_reset_types[] = {
 	[PCI_DEV_WAIT_BUS_RESET] = "bus reset",
 	[PCI_DEV_WAIT_RESUME] = "resume",
 	[PCI_DEV_WAIT_DPC] = "DPC",
+	[PCI_DEV_WAIT_D3COLD_D0] = "D3cold->D0",
 };
 static_assert(ARRAY_SIZE(pci_reset_types) == PCI_DEV_WAIT_MAX);
 
@@ -1426,6 +1427,17 @@ int pci_power_up(struct pci_dev *dev)
 	else if (state == PCI_D2)
 		udelay(PCI_PM_D2_DELAY);
 
+	/*
+	 * D3cold -> D0 will have gone through a conventional reset and may need
+	 * time to be ready.
+	 */
+	if (dev->current_state == PCI_D3cold) {
+		int ret;
+
+		ret = pci_dev_wait(dev, PCI_DEV_WAIT_D3COLD_D0, PCI_RESET_WAIT);
+		if (ret)
+			return ret;
+	}
 end:
 	dev->current_state = PCI_D0;
 	if (need_restore)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index be06c38342c76..6a48419924ea1 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -11,6 +11,7 @@ enum pci_reset_type {
 	PCI_DEV_WAIT_BUS_RESET,
 	PCI_DEV_WAIT_RESUME,
 	PCI_DEV_WAIT_DPC,
+	PCI_DEV_WAIT_D3COLD_D0,
 	PCI_DEV_WAIT_MAX,
 };
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 4/5] PCI: Allow Ryzen XHCI controllers into D3cold and drop delays
  2024-08-23 15:40 [PATCH v5 0/5] Verify devices transition from D3cold to D0 Mario Limonciello
                   ` (2 preceding siblings ...)
  2024-08-23 15:40 ` [PATCH v5 3/5] PCI: Verify functions currently in D3cold have entered D0 Mario Limonciello
@ 2024-08-23 15:40 ` Mario Limonciello
  2024-08-23 15:40 ` [PATCH v5 5/5] PCI: Drop Radeon quirk for Macbook Pro 8.2 Mario Limonciello
  2024-12-04 17:30 ` [PATCH v5 0/5] Verify devices transition from D3cold to D0 Mario Limonciello
  5 siblings, 0 replies; 31+ messages in thread
From: Mario Limonciello @ 2024-08-23 15:40 UTC (permalink / raw)
  To: Bjorn Helgaas, Mathias Nyman, Mika Westerberg
  Cc: open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Gary Li, Greg Kroah-Hartman, Ilpo Järvinen,
	Mario Limonciello

From: Mario Limonciello <mario.limonciello@amd.com>

As the PCI core now has a delay after D3cold exit, the Ryzen XHCI
controllers that were quirked to not use D3cold and to add a delay
on D3hot no longer need these quirks.

Drop both the PCI and XHCI sets of quirks.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/pci/quirks.c        | 17 -----------------
 drivers/usb/host/xhci-pci.c | 11 -----------
 2 files changed, 28 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index a2ce4e08edf5a..3480a0445ff50 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2059,23 +2059,6 @@ DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
 			      PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8,
 			      quirk_nvidia_hda_pm);
 
-/*
- * Ryzen5/7 XHCI controllers fail upon resume from runtime suspend or s2idle.
- * https://bugzilla.kernel.org/show_bug.cgi?id=205587
- *
- * The kernel attempts to transition these devices to D3cold, but that seems
- * to be ineffective on the platforms in question; the PCI device appears to
- * remain on in D3hot state. The D3hot-to-D0 transition then requires an
- * extended delay in order to succeed.
- */
-static void quirk_ryzen_xhci_d3hot(struct pci_dev *dev)
-{
-	quirk_d3hot_delay(dev, 20);
-}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e0, quirk_ryzen_xhci_d3hot);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e1, quirk_ryzen_xhci_d3hot);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x1639, quirk_ryzen_xhci_d3hot);
-
 #ifdef CONFIG_X86_IO_APIC
 static int dmi_disable_ioapicreroute(const struct dmi_system_id *d)
 {
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index dc1e345ab67ea..d726810a04838 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -316,10 +316,6 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
 		(pdev->device == PCI_DEVICE_ID_AMD_PROMONTORYA_1)))
 		xhci->quirks |= XHCI_U2_DISABLE_WAKE;
 
-	if (pdev->vendor == PCI_VENDOR_ID_AMD &&
-		pdev->device == PCI_DEVICE_ID_AMD_RENOIR_XHCI)
-		xhci->quirks |= XHCI_BROKEN_D3COLD_S2I;
-
 	if (pdev->vendor == PCI_VENDOR_ID_INTEL) {
 		xhci->quirks |= XHCI_LPM_SUPPORT;
 		xhci->quirks |= XHCI_INTEL_HOST;
@@ -752,13 +748,6 @@ static int xhci_pci_suspend(struct usb_hcd *hcd, bool do_wakeup)
 	if (xhci->quirks & XHCI_COMP_MODE_QUIRK)
 		pci_d3cold_disable(pdev);
 
-#ifdef CONFIG_SUSPEND
-	/* d3cold is broken, but only when s2idle is used */
-	if (pm_suspend_target_state == PM_SUSPEND_TO_IDLE &&
-	    xhci->quirks & (XHCI_BROKEN_D3COLD_S2I))
-		pci_d3cold_disable(pdev);
-#endif
-
 	if (xhci->quirks & XHCI_PME_STUCK_QUIRK)
 		xhci_pme_quirk(hcd);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 5/5] PCI: Drop Radeon quirk for Macbook Pro 8.2
  2024-08-23 15:40 [PATCH v5 0/5] Verify devices transition from D3cold to D0 Mario Limonciello
                   ` (3 preceding siblings ...)
  2024-08-23 15:40 ` [PATCH v5 4/5] PCI: Allow Ryzen XHCI controllers into D3cold and drop delays Mario Limonciello
@ 2024-08-23 15:40 ` Mario Limonciello
  2024-12-04 17:30 ` [PATCH v5 0/5] Verify devices transition from D3cold to D0 Mario Limonciello
  5 siblings, 0 replies; 31+ messages in thread
From: Mario Limonciello @ 2024-08-23 15:40 UTC (permalink / raw)
  To: Bjorn Helgaas, Mathias Nyman, Mika Westerberg
  Cc: open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Gary Li, Greg Kroah-Hartman, Ilpo Järvinen,
	Mario Limonciello

From: Mario Limonciello <mario.limonciello@amd.com>

commit 5938628c51a7 ("drm/radeon: make MacBook Pro d3_delay quirk more
generic") introduced a generic quirk for Macbook Pro 8.2s that contain
Radeon graphics to ensure that enough time had past when the device
was powered on.

As the PCI core now verifies the device is in D0 during power on this
extra artificial delay is no longer necessary.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/pci/quirks.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 3480a0445ff50..e76ff1037fb35 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2038,14 +2038,6 @@ static void quirk_d3hot_delay(struct pci_dev *dev, unsigned int delay)
 		 dev->d3hot_delay);
 }
 
-static void quirk_radeon_pm(struct pci_dev *dev)
-{
-	if (dev->subsystem_vendor == PCI_VENDOR_ID_APPLE &&
-	    dev->subsystem_device == 0x00e2)
-		quirk_d3hot_delay(dev, 20);
-}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6741, quirk_radeon_pm);
-
 /*
  * NVIDIA Ampere-based HDA controllers can wedge the whole device if a bus
  * reset is performed too soon after transition to D0, extend d3hot_delay
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-08-23 15:40 ` [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND " Mario Limonciello
@ 2024-08-23 19:54   ` Bjorn Helgaas
  2024-08-26 19:16     ` Mario Limonciello
  2024-08-30  0:01   ` Bjorn Helgaas
  1 sibling, 1 reply; 31+ messages in thread
From: Bjorn Helgaas @ 2024-08-23 19:54 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Bjorn Helgaas, Mathias Nyman, Mika Westerberg,
	open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Gary Li, Greg Kroah-Hartman, Ilpo Järvinen,
	Mario Limonciello, Duc Dang, Alex Williamson

[+cc Duc, Alex]

On Fri, Aug 23, 2024 at 10:40:20AM -0500, Mario Limonciello wrote:
> If a dock is plugged in at the same time as autosuspend delay then this
> can cause malfunctions in the USB4 stack. This happens because the
> device is still in D3cold at the time that the PCI core handed
> control back to the USB4 stack.

I assume the USB device in question is in the dock that was hot-added?
This patch suggests that pci_dev_wait() has waited for a read of
PCI_COMMAND to respond with something other than ~0, but the device is
still in D3cold.  I suppose we got to pci_dev_wait() via
pci_pm_bridge_power_up_actions() calling
pci_bridge_wait_for_secondary_bus(), since I wouldn't expect a reset
in the hot-add case.

> A device that has gone through a reset may return a value in PCI_COMMAND
> but that doesn't mean it's finished transitioning to D0.  For evices that
> support power management explicitly check PCI_PM_CTRL on everything but
> system resume to ensure the transition happened.

s/evices/devices/

> Devices that don't support power management and system resume will
> continue to use PCI_COMMAND.

Is there a bugzilla or other report with more details that we can
include here?

> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
> v4->v5:
>  * Fix misleading indentation
>  * Amend commit message
> ---
>  drivers/pci/pci.c | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 1e219057a5069..f032a4aaec268 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1309,21 +1309,33 @@ static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int
>  	 * the read (except when CRS SV is enabled and the read was for the
>  	 * Vendor ID; in that case it synthesizes 0x0001 data).
>  	 *
> -	 * Wait for the device to return a non-CRS completion.  Read the
> -	 * Command register instead of Vendor ID so we don't have to
> -	 * contend with the CRS SV value.
> +	 * Wait for the device to return a non-CRS completion.  On devices
> +	 * that support PM control and on waits that aren't part of system
> +	 * resume read the PM control register to ensure the device has
> +	 * transitioned to D0.  On devices that don't support PM control,
> +	 * or during system resume read the command register to instead of
> +	 * Vendor ID so we don't have to contend with the CRS SV value.
>  	 */
>  	for (;;) {
> -		u32 id;
> -
>  		if (pci_dev_is_disconnected(dev)) {
>  			pci_dbg(dev, "disconnected; not waiting\n");
>  			return -ENOTTY;
>  		}
>  
> -		pci_read_config_dword(dev, PCI_COMMAND, &id);
> -		if (!PCI_POSSIBLE_ERROR(id))
> -			break;
> +		if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) {
> +			u16 pmcsr;
> +
> +			pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> +			if (!PCI_POSSIBLE_ERROR(pmcsr) &&
> +			    (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
> +				break;
> +		} else {
> +			u32 id;
> +
> +			pci_read_config_dword(dev, PCI_COMMAND, &id);
> +			if (!PCI_POSSIBLE_ERROR(id))
> +				break;
> +		}

What is the rationale behind using PCI_PM_CTRL in some cases and
PCI_COMMAND in others?  Is there some spec language we can cite for
this?

IIUC, pci_dev_wait() waits for a device to be ready after a reset
(FLR, D3hot->D0 transition for devices where No_Soft_Reset is clear,
DPC) and after power-up from D3cold (pci_pm_bridge_power_up_actions()).
I think device readiness in all of these cases is covered by PCIe
r6.0, sec 6.6.1.

If the Root Port above the device supports Configuration RRS Software
Visibility, I think we probably should use that here instead of either
PCI_COMMAND or PCI_PM_CTRL.  SR-IOV VFs don't implement Vendor ID,
which might complicate this a little.  But it niggles in my mind that
there may be some other problem beyond that.  Maybe Alex remembers.

Anyway, if we meet the requirements of sec 6.6.1, the device should be
ready to respond to config requests, and I assume that also means
the device is in D0.

>  		if (delay > timeout) {
>  			pci_warn(dev, "not ready %dms after %s; giving up\n",
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-08-23 19:54   ` Bjorn Helgaas
@ 2024-08-26 19:16     ` Mario Limonciello
  2024-08-27 17:43       ` Mario Limonciello
  2024-08-27 19:44       ` Bjorn Helgaas
  0 siblings, 2 replies; 31+ messages in thread
From: Mario Limonciello @ 2024-08-26 19:16 UTC (permalink / raw)
  To: Bjorn Helgaas, Mario Limonciello, Gary Li
  Cc: Bjorn Helgaas, Mathias Nyman, Mika Westerberg,
	open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Greg Kroah-Hartman, Ilpo Järvinen, Duc Dang,
	Alex Williamson

On 8/23/2024 14:54, Bjorn Helgaas wrote:
> [+cc Duc, Alex]
> 
> On Fri, Aug 23, 2024 at 10:40:20AM -0500, Mario Limonciello wrote:
>> If a dock is plugged in at the same time as autosuspend delay then this
>> can cause malfunctions in the USB4 stack. This happens because the
>> device is still in D3cold at the time that the PCI core handed
>> control back to the USB4 stack.
> 
> I assume the USB device in question is in the dock that was hot-added?

No; it's actually the USB4 router that is malfunctioning.  The CM 
(thunderbolt.ko) thinks the router is in D0 already, but when it 
attempts to do a register read it gets back all F's and it trusts that.

> This patch suggests that pci_dev_wait() has waited for a read of
> PCI_COMMAND to respond with something other than ~0, but the device is
> still in D3cold.  I suppose we got to pci_dev_wait() via
> pci_pm_bridge_power_up_actions() calling
> pci_bridge_wait_for_secondary_bus(), since I wouldn't expect a reset
> in the hot-add case.

As I said it's the router (part of the SoC). The device never 
disappears.  It's the action of plugging in/out the the dock that causes 
it to change power states.

We didn't try it, but I wouldn't be surprised if it could be reproduced 
with a script that turned on/off runtime PM on very tight timing around 
the autosuspend delay.

> 
>> A device that has gone through a reset may return a value in PCI_COMMAND
>> but that doesn't mean it's finished transitioning to D0.  For evices that
>> support power management explicitly check PCI_PM_CTRL on everything but
>> system resume to ensure the transition happened.
> 
> s/evices/devices/

Thanks.

> 
>> Devices that don't support power management and system resume will
>> continue to use PCI_COMMAND.
> 
> Is there a bugzilla or other report with more details that we can
> include here?

No, unfortunately in this case it was only reported internally at AMD.

Gary who is on CC brought it to me though, and if you think there are 
some other specific details needed but are missing we can see what else 
can be added to the commit message.

> 
>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>> ---
>> v4->v5:
>>   * Fix misleading indentation
>>   * Amend commit message
>> ---
>>   drivers/pci/pci.c | 28 ++++++++++++++++++++--------
>>   1 file changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 1e219057a5069..f032a4aaec268 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -1309,21 +1309,33 @@ static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int
>>   	 * the read (except when CRS SV is enabled and the read was for the
>>   	 * Vendor ID; in that case it synthesizes 0x0001 data).
>>   	 *
>> -	 * Wait for the device to return a non-CRS completion.  Read the
>> -	 * Command register instead of Vendor ID so we don't have to
>> -	 * contend with the CRS SV value.
>> +	 * Wait for the device to return a non-CRS completion.  On devices
>> +	 * that support PM control and on waits that aren't part of system
>> +	 * resume read the PM control register to ensure the device has
>> +	 * transitioned to D0.  On devices that don't support PM control,
>> +	 * or during system resume read the command register to instead of
>> +	 * Vendor ID so we don't have to contend with the CRS SV value.
>>   	 */
>>   	for (;;) {
>> -		u32 id;
>> -
>>   		if (pci_dev_is_disconnected(dev)) {
>>   			pci_dbg(dev, "disconnected; not waiting\n");
>>   			return -ENOTTY;
>>   		}
>>   
>> -		pci_read_config_dword(dev, PCI_COMMAND, &id);
>> -		if (!PCI_POSSIBLE_ERROR(id))
>> -			break;
>> +		if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) {
>> +			u16 pmcsr;
>> +
>> +			pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>> +			if (!PCI_POSSIBLE_ERROR(pmcsr) &&
>> +			    (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
>> +				break;
>> +		} else {
>> +			u32 id;
>> +
>> +			pci_read_config_dword(dev, PCI_COMMAND, &id);
>> +			if (!PCI_POSSIBLE_ERROR(id))
>> +				break;
>> +		}
> 
> What is the rationale behind using PCI_PM_CTRL in some cases and
> PCI_COMMAND in others?

We saw a deadlock during resume from suspend when PCI_PM_CTRL was used 
for all cases that supported dev->pm_cap.

> Is there some spec language we can cite for
> this?

Perhaps it being a "cold reset" during resume?

> 
> IIUC, pci_dev_wait() waits for a device to be ready after a reset
> (FLR, D3hot->D0 transition for devices where No_Soft_Reset is clear,
> DPC) and after power-up from D3cold (pci_pm_bridge_power_up_actions()).
> I think device readiness in all of these cases is covered by PCIe
> r6.0, sec 6.6.1.

Would it be helpful to you to get a dump_stack() call trace to 
pci_power_up() the specific call flow that needed this fix?

Gary is able to to reproduce this at will, I think he should be able to 
gather that using an unpatched kernel to help this conversation.

> 
> If the Root Port above the device supports Configuration RRS Software
> Visibility, I think we probably should use that here instead of either
> PCI_COMMAND or PCI_PM_CTRL.  

I did check and in this case the root port the USB4 routers are 
connected to support this.

How do you think this should be done instead?

> SR-IOV VFs don't implement Vendor ID,
> which might complicate this a little.  But it niggles in my mind that
> there may be some other problem beyond that.  Maybe Alex remembers.


> 
> Anyway, if we meet the requirements of sec 6.6.1, the device should be
> ready to respond to config requests, and I assume that also means
> the device is in D0.
> 

Within that section there is a quote to point out:

"
To allow components to perform internal initialization, system software 
must wait a specified minimum period
following exit from a Conventional Reset of one or more devices before 
it is permitted to issue Configuration
Requests to those devices
"

In pci_power_up() I don't really see any hardcoded delays specifically 
for this case of leaving D3cold. The PCI PM spec specifies that it will 
take "Full context restore or boot latency".  I don't think it's 
reasonable to have NO delay.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-08-26 19:16     ` Mario Limonciello
@ 2024-08-27 17:43       ` Mario Limonciello
  2024-08-27 19:44       ` Bjorn Helgaas
  1 sibling, 0 replies; 31+ messages in thread
From: Mario Limonciello @ 2024-08-27 17:43 UTC (permalink / raw)
  To: Bjorn Helgaas, Mario Limonciello, Gary Li
  Cc: Bjorn Helgaas, Mathias Nyman, Mika Westerberg,
	open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Greg Kroah-Hartman, Ilpo Järvinen, Duc Dang,
	Alex Williamson

On 8/26/2024 14:16, Mario Limonciello wrote:
> On 8/23/2024 14:54, Bjorn Helgaas wrote:
>> [+cc Duc, Alex]
>>
>> On Fri, Aug 23, 2024 at 10:40:20AM -0500, Mario Limonciello wrote:
>>> If a dock is plugged in at the same time as autosuspend delay then this
>>> can cause malfunctions in the USB4 stack. This happens because the
>>> device is still in D3cold at the time that the PCI core handed
>>> control back to the USB4 stack.
>>
>> I assume the USB device in question is in the dock that was hot-added?
> 
> No; it's actually the USB4 router that is malfunctioning.  The CM 
> (thunderbolt.ko) thinks the router is in D0 already, but when it 
> attempts to do a register read it gets back all F's and it trusts that.
> 
>> This patch suggests that pci_dev_wait() has waited for a read of
>> PCI_COMMAND to respond with something other than ~0, but the device is
>> still in D3cold.  I suppose we got to pci_dev_wait() via
>> pci_pm_bridge_power_up_actions() calling
>> pci_bridge_wait_for_secondary_bus(), since I wouldn't expect a reset
>> in the hot-add case.
> 
> As I said it's the router (part of the SoC). The device never 
> disappears.  It's the action of plugging in/out the the dock that causes 
> it to change power states.
> 
> We didn't try it, but I wouldn't be surprised if it could be reproduced 
> with a script that turned on/off runtime PM on very tight timing around 
> the autosuspend delay.
> 
>>
>>> A device that has gone through a reset may return a value in PCI_COMMAND
>>> but that doesn't mean it's finished transitioning to D0.  For evices 
>>> that
>>> support power management explicitly check PCI_PM_CTRL on everything but
>>> system resume to ensure the transition happened.
>>
>> s/evices/devices/
> 
> Thanks.
> 
>>
>>> Devices that don't support power management and system resume will
>>> continue to use PCI_COMMAND.
>>
>> Is there a bugzilla or other report with more details that we can
>> include here?
> 
> No, unfortunately in this case it was only reported internally at AMD.
> 
> Gary who is on CC brought it to me though, and if you think there are 
> some other specific details needed but are missing we can see what else 
> can be added to the commit message.
> 
>>
>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>> ---
>>> v4->v5:
>>>   * Fix misleading indentation
>>>   * Amend commit message
>>> ---
>>>   drivers/pci/pci.c | 28 ++++++++++++++++++++--------
>>>   1 file changed, 20 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>> index 1e219057a5069..f032a4aaec268 100644
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -1309,21 +1309,33 @@ static int pci_dev_wait(struct pci_dev *dev, 
>>> enum pci_reset_type reset_type, int
>>>        * the read (except when CRS SV is enabled and the read was for 
>>> the
>>>        * Vendor ID; in that case it synthesizes 0x0001 data).
>>>        *
>>> -     * Wait for the device to return a non-CRS completion.  Read the
>>> -     * Command register instead of Vendor ID so we don't have to
>>> -     * contend with the CRS SV value.
>>> +     * Wait for the device to return a non-CRS completion.  On devices
>>> +     * that support PM control and on waits that aren't part of system
>>> +     * resume read the PM control register to ensure the device has
>>> +     * transitioned to D0.  On devices that don't support PM control,
>>> +     * or during system resume read the command register to instead of
>>> +     * Vendor ID so we don't have to contend with the CRS SV value.
>>>        */
>>>       for (;;) {
>>> -        u32 id;
>>> -
>>>           if (pci_dev_is_disconnected(dev)) {
>>>               pci_dbg(dev, "disconnected; not waiting\n");
>>>               return -ENOTTY;
>>>           }
>>> -        pci_read_config_dword(dev, PCI_COMMAND, &id);
>>> -        if (!PCI_POSSIBLE_ERROR(id))
>>> -            break;
>>> +        if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) {
>>> +            u16 pmcsr;
>>> +
>>> +            pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 
>>> &pmcsr);
>>> +            if (!PCI_POSSIBLE_ERROR(pmcsr) &&
>>> +                (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
>>> +                break;
>>> +        } else {
>>> +            u32 id;
>>> +
>>> +            pci_read_config_dword(dev, PCI_COMMAND, &id);
>>> +            if (!PCI_POSSIBLE_ERROR(id))
>>> +                break;
>>> +        }
>>
>> What is the rationale behind using PCI_PM_CTRL in some cases and
>> PCI_COMMAND in others?
> 
> We saw a deadlock during resume from suspend when PCI_PM_CTRL was used 
> for all cases that supported dev->pm_cap.
> 
>> Is there some spec language we can cite for
>> this?
> 
> Perhaps it being a "cold reset" during resume?
> 
>>
>> IIUC, pci_dev_wait() waits for a device to be ready after a reset
>> (FLR, D3hot->D0 transition for devices where No_Soft_Reset is clear,
>> DPC) and after power-up from D3cold (pci_pm_bridge_power_up_actions()).
>> I think device readiness in all of these cases is covered by PCIe
>> r6.0, sec 6.6.1.
> 
> Would it be helpful to you to get a dump_stack() call trace to 
> pci_power_up() the specific call flow that needed this fix?
> 
> Gary is able to to reproduce this at will, I think he should be able to 
> gather that using an unpatched kernel to help this conversation.

Here is the kernel trace with a dump_stack() and a printk of current 
inserted into pci_power_up() right before the failure occurs.

https://gist.github.com/superm1/cb407766ab15f42f12a6fe9d1196f6fc

Also the failure is visible right after.

> 
>>
>> If the Root Port above the device supports Configuration RRS Software
>> Visibility, I think we probably should use that here instead of either
>> PCI_COMMAND or PCI_PM_CTRL. 
> 
> I did check and in this case the root port the USB4 routers are 
> connected to support this.
> 
> How do you think this should be done instead?
> 
>> SR-IOV VFs don't implement Vendor ID,
>> which might complicate this a little.  But it niggles in my mind that
>> there may be some other problem beyond that.  Maybe Alex remembers.
> 
> 
>>
>> Anyway, if we meet the requirements of sec 6.6.1, the device should be
>> ready to respond to config requests, and I assume that also means
>> the device is in D0.
>>
> 
> Within that section there is a quote to point out:
> 
> "
> To allow components to perform internal initialization, system software 
> must wait a specified minimum period
> following exit from a Conventional Reset of one or more devices before 
> it is permitted to issue Configuration
> Requests to those devices
> "
> 
> In pci_power_up() I don't really see any hardcoded delays specifically 
> for this case of leaving D3cold. The PCI PM spec specifies that it will 
> take "Full context restore or boot latency".  I don't think it's 
> reasonable to have NO delay.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-08-26 19:16     ` Mario Limonciello
  2024-08-27 17:43       ` Mario Limonciello
@ 2024-08-27 19:44       ` Bjorn Helgaas
  1 sibling, 0 replies; 31+ messages in thread
From: Bjorn Helgaas @ 2024-08-27 19:44 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Mario Limonciello, Gary Li, Bjorn Helgaas, Mathias Nyman,
	Mika Westerberg, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Greg Kroah-Hartman,
	Ilpo Järvinen, Duc Dang, Alex Williamson, Rafael J. Wysocki

[+cc Rafael for ACPI power transition from D3cold to D0 delay]

On Mon, Aug 26, 2024 at 02:16:34PM -0500, Mario Limonciello wrote:
> On 8/23/2024 14:54, Bjorn Helgaas wrote:
> > On Fri, Aug 23, 2024 at 10:40:20AM -0500, Mario Limonciello wrote:
> > > If a dock is plugged in at the same time as autosuspend delay
> > > then this can cause malfunctions in the USB4 stack. This happens
> > > because the device is still in D3cold at the time that the PCI
> > > core handed control back to the USB4 stack.
> ...

> > > A device that has gone through a reset may return a value in
> > > PCI_COMMAND but that doesn't mean it's finished transitioning to
> > > D0.  For devices that support power management explicitly check
> > > PCI_PM_CTRL on everything but system resume to ensure the
> > > transition happened.
> ...

> > > Devices that don't support power management and system resume
> > > will continue to use PCI_COMMAND.

> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -1309,21 +1309,33 @@ static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int
> > >   	 * the read (except when CRS SV is enabled and the read was for the
> > >   	 * Vendor ID; in that case it synthesizes 0x0001 data).
> > >   	 *
> > > -	 * Wait for the device to return a non-CRS completion.  Read the
> > > -	 * Command register instead of Vendor ID so we don't have to
> > > -	 * contend with the CRS SV value.
> > > +	 * Wait for the device to return a non-CRS completion.  On devices
> > > +	 * that support PM control and on waits that aren't part of system
> > > +	 * resume read the PM control register to ensure the device has
> > > +	 * transitioned to D0.  On devices that don't support PM control,
> > > +	 * or during system resume read the command register to instead of
> > > +	 * Vendor ID so we don't have to contend with the CRS SV value.
> > >   	 */
> > >   	for (;;) {
> > > -		u32 id;
> > > -
> > >   		if (pci_dev_is_disconnected(dev)) {
> > >   			pci_dbg(dev, "disconnected; not waiting\n");
> > >   			return -ENOTTY;
> > >   		}
> > > -		pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > -		if (!PCI_POSSIBLE_ERROR(id))
> > > -			break;
> > > +		if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) {
> > > +			u16 pmcsr;
> > > +
> > > +			pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> > > +			if (!PCI_POSSIBLE_ERROR(pmcsr) &&
> > > +			    (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
> > > +				break;
> > > +		} else {
> > > +			u32 id;
> > > +
> > > +			pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > +			if (!PCI_POSSIBLE_ERROR(id))
> > > +				break;
> > > +		}
> > 
> > What is the rationale behind using PCI_PM_CTRL in some cases and
> > PCI_COMMAND in others?
> 
> We saw a deadlock during resume from suspend when PCI_PM_CTRL was used for
> all cases that supported dev->pm_cap.
> 
> > Is there some spec language we can cite for
> > this?
> 
> Perhaps it being a "cold reset" during resume?
> 
> > IIUC, pci_dev_wait() waits for a device to be ready after a reset
> > (FLR, D3hot->D0 transition for devices where No_Soft_Reset is clear,
> > DPC) and after power-up from D3cold (pci_pm_bridge_power_up_actions()).
> > I think device readiness in all of these cases is covered by PCIe
> > r6.0, sec 6.6.1.
> ...

> > If the Root Port above the device supports Configuration RRS
> > Software Visibility, I think we probably should use that here
> > instead of either PCI_COMMAND or PCI_PM_CTRL.
> 
> I did check and in this case the root port the USB4 routers are
> connected to support this.
> 
> How do you think this should be done instead?
> 
> > SR-IOV VFs don't implement Vendor ID, which might complicate this
> > a little.  But it niggles in my mind that there may be some other
> > problem beyond that.  Maybe Alex remembers.
> > 
> > Anyway, if we meet the requirements of sec 6.6.1, the device
> > should be ready to respond to config requests, and I assume that
> > also means the device is in D0.
> 
> Within that section there is a quote to point out:
> 
> "
> To allow components to perform internal initialization, system
> software must wait a specified minimum period following exit from a
> Conventional Reset of one or more devices before it is permitted to
> issue Configuration Requests to those devices
> "
> 
> In pci_power_up() I don't really see any hardcoded delays
> specifically for this case of leaving D3cold. The PCI PM spec
> specifies that it will take "Full context restore or boot latency".
> I don't think it's reasonable to have NO delay.

I agree, leaving D3cold means "applying power to the device", and PCIe
r6.0, sec 6.6.1, says that's a Fundamental Reset:

  A Fundamental Reset must occur following the application of power to
  the component. This is called a Cold Reset.

So we need a delay similar to what we do in
pci_bridge_wait_for_secondary_bus().  I don't know whether that is
supposed to happen somewhere on the ACPI side or in the PCI side, but
my inclination would be the PCI side because the delay isn't really
platform-specific, it's specified by the PCI specs, and the OS needs
to manage the RRS Software Visibility retries and features like
Readiness Notifications.

pci_set_low_power_state() contains corresponding delays for putting
devices in D1, D2, and D3hot.

Maybe the D3cold -> D0 delays should be in
platform_pci_set_power_state()?  I think pci_power_up() is the only
caller that sets the state to D0, and it assumes the device is
Configuration Ready when platform_pci_set_power_state() returns:

  pci_power_up
    platform_pci_set_power_state
    pci_read_config_word(PCI_PM_CTRL)

So I think we need basically the same delays and pci_dev_wait() stuff
in this path.

Bjorn

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-08-23 15:40 ` [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND " Mario Limonciello
  2024-08-23 19:54   ` Bjorn Helgaas
@ 2024-08-30  0:01   ` Bjorn Helgaas
  2024-09-03 16:29     ` Mario Limonciello
  1 sibling, 1 reply; 31+ messages in thread
From: Bjorn Helgaas @ 2024-08-30  0:01 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Bjorn Helgaas, Mathias Nyman, Mika Westerberg,
	open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Gary Li, Greg Kroah-Hartman, Ilpo Järvinen,
	Mario Limonciello

On Fri, Aug 23, 2024 at 10:40:20AM -0500, Mario Limonciello wrote:
> From: Mario Limonciello <mario.limonciello@amd.com>
> 
> If a dock is plugged in at the same time as autosuspend delay then
> this can cause malfunctions in the USB4 stack. This happens because
> the device is still in D3cold at the time that the PCI core handed
> control back to the USB4 stack.
> 
> A device that has gone through a reset may return a value in
> PCI_COMMAND but that doesn't mean it's finished transitioning to D0.
> For devices that support power management explicitly check
> PCI_PM_CTRL on everything but system resume to ensure the transition
> happened.

Still trying to understand what's going on here.

I posted a change to pci_dev_wait() to read Vendor ID, look for Config
RRS status, and wait for a successful completion (when RRS Software
Visibility is enabled) [1].

You tested that and found that it didn't help with *this* issue [2].
I assume you tested something like v6.11-rc plus the patches from [1],
i.e., without the PCI_PM_CTRL changes in this series.

  1) Initially the device is in D0

  2) We put it in D3cold (using some ACPI method) because the
  autosuspend delay expired (?)

  3) Plugging in the dock wakes up the device, so we power up the
  device (via pci_power_up(), which again uses some ACPI method), and
  it should transition to D0uninitialized

  4) The USB4 stack sees the device but thinks it's in D3cold (?)

If your testing only included [1], but did not include the
pci_power_up() change from patch 3/5 "Verify functions currently in
D3cold have entered D0", I don't think we would call pci_dev_wait(),
so I wouldn't expect [1] to make any difference.

If you *did* include both [1] and patch 3/5, the implication would be
that pci_dev_wait() successfully read the Vendor ID, meaning the
device is not in D3cold when pci_power_up() returns.

Can you clarify what you see and possibly expand/correct my timeline
above?

[1] https://lore.kernel.org/linux-pci/20240827234848.4429-1-helgaas@kernel.org/
[2] https://lore.kernel.org/linux-pci/30d9589a-8050-421b-a9a5-ad3422feadad@amd.com/

> Devices that don't support power management and system resume will
> continue to use PCI_COMMAND.
> 
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
> v4->v5:
>  * Fix misleading indentation
>  * Amend commit message
> ---
>  drivers/pci/pci.c | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 1e219057a5069..f032a4aaec268 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1309,21 +1309,33 @@ static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int
>  	 * the read (except when CRS SV is enabled and the read was for the
>  	 * Vendor ID; in that case it synthesizes 0x0001 data).
>  	 *
> -	 * Wait for the device to return a non-CRS completion.  Read the
> -	 * Command register instead of Vendor ID so we don't have to
> -	 * contend with the CRS SV value.
> +	 * Wait for the device to return a non-CRS completion.  On devices
> +	 * that support PM control and on waits that aren't part of system
> +	 * resume read the PM control register to ensure the device has
> +	 * transitioned to D0.  On devices that don't support PM control,
> +	 * or during system resume read the command register to instead of
> +	 * Vendor ID so we don't have to contend with the CRS SV value.
>  	 */
>  	for (;;) {
> -		u32 id;
> -
>  		if (pci_dev_is_disconnected(dev)) {
>  			pci_dbg(dev, "disconnected; not waiting\n");
>  			return -ENOTTY;
>  		}
>  
> -		pci_read_config_dword(dev, PCI_COMMAND, &id);
> -		if (!PCI_POSSIBLE_ERROR(id))
> -			break;
> +		if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) {
> +			u16 pmcsr;
> +
> +			pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> +			if (!PCI_POSSIBLE_ERROR(pmcsr) &&
> +			    (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
> +				break;
> +		} else {
> +			u32 id;
> +
> +			pci_read_config_dword(dev, PCI_COMMAND, &id);
> +			if (!PCI_POSSIBLE_ERROR(id))
> +				break;
> +		}
>  
>  		if (delay > timeout) {
>  			pci_warn(dev, "not ready %dms after %s; giving up\n",
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-08-30  0:01   ` Bjorn Helgaas
@ 2024-09-03 16:29     ` Mario Limonciello
  2024-09-03 17:11       ` Bjorn Helgaas
  0 siblings, 1 reply; 31+ messages in thread
From: Mario Limonciello @ 2024-09-03 16:29 UTC (permalink / raw)
  To: Bjorn Helgaas, Mario Limonciello
  Cc: Bjorn Helgaas, Mathias Nyman, Mika Westerberg,
	open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Gary Li, Greg Kroah-Hartman, Ilpo Järvinen

On 8/29/2024 19:01, Bjorn Helgaas wrote:
> On Fri, Aug 23, 2024 at 10:40:20AM -0500, Mario Limonciello wrote:
>> From: Mario Limonciello <mario.limonciello@amd.com>
>>
>> If a dock is plugged in at the same time as autosuspend delay then
>> this can cause malfunctions in the USB4 stack. This happens because
>> the device is still in D3cold at the time that the PCI core handed
>> control back to the USB4 stack.
>>
>> A device that has gone through a reset may return a value in
>> PCI_COMMAND but that doesn't mean it's finished transitioning to D0.
>> For devices that support power management explicitly check
>> PCI_PM_CTRL on everything but system resume to ensure the transition
>> happened.
> 
> Still trying to understand what's going on here.
> 
> I posted a change to pci_dev_wait() to read Vendor ID, look for Config
> RRS status, and wait for a successful completion (when RRS Software
> Visibility is enabled) [1].
> 
> You tested that and found that it didn't help with *this* issue [2].
> I assume you tested something like v6.11-rc plus the patches from [1],
> i.e., without the PCI_PM_CTRL changes in this series.
> 
>    1) Initially the device is in D0
> 
>    2) We put it in D3cold (using some ACPI method) because the
>    autosuspend delay expired (?)
> 
>    3) Plugging in the dock wakes up the device, so we power up the
>    device (via pci_power_up(), which again uses some ACPI method), and
>    it should transition to D0uninitialized
> 
>    4) The USB4 stack sees the device but thinks it's in D3cold (?)
> 
> If your testing only included [1], but did not include the
> pci_power_up() change from patch 3/5 "Verify functions currently in
> D3cold have entered D0", I don't think we would call pci_dev_wait(),
> so I wouldn't expect [1] to make any difference.
> 
> If you *did* include both [1] and patch 3/5, the implication would be
> that pci_dev_wait() successfully read the Vendor ID, meaning the
> device is not in D3cold when pci_power_up() returns.

The testing here was from the LTS 6.6.y kernel with both [1] and patch 
3/5 from this series.

I can get testing from 6.11-rc6 with a combination of patches if you 
would like.

> 
> Can you clarify what you see and possibly expand/correct my timeline
> above?

The timeline you shared is nearly correct.  The USB4 stack *thinks* the 
device is in D0 because of the return of pci_power_up().

As by polling PCI_PM_CTRL we can see it's still in D3, so it hasn't made 
it to D0uninitialized yet.

I guess I reading between the lines you have an assumption that you 
can't read the vendor ID from D3; which doesn't appear to be the case 
from our testing.

> 
> [1] https://lore.kernel.org/linux-pci/20240827234848.4429-1-helgaas@kernel.org/
> [2] https://lore.kernel.org/linux-pci/30d9589a-8050-421b-a9a5-ad3422feadad@amd.com/
> 
>> Devices that don't support power management and system resume will
>> continue to use PCI_COMMAND.
>>
>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>> ---
>> v4->v5:
>>   * Fix misleading indentation
>>   * Amend commit message
>> ---
>>   drivers/pci/pci.c | 28 ++++++++++++++++++++--------
>>   1 file changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 1e219057a5069..f032a4aaec268 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -1309,21 +1309,33 @@ static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int
>>   	 * the read (except when CRS SV is enabled and the read was for the
>>   	 * Vendor ID; in that case it synthesizes 0x0001 data).
>>   	 *
>> -	 * Wait for the device to return a non-CRS completion.  Read the
>> -	 * Command register instead of Vendor ID so we don't have to
>> -	 * contend with the CRS SV value.
>> +	 * Wait for the device to return a non-CRS completion.  On devices
>> +	 * that support PM control and on waits that aren't part of system
>> +	 * resume read the PM control register to ensure the device has
>> +	 * transitioned to D0.  On devices that don't support PM control,
>> +	 * or during system resume read the command register to instead of
>> +	 * Vendor ID so we don't have to contend with the CRS SV value.
>>   	 */
>>   	for (;;) {
>> -		u32 id;
>> -
>>   		if (pci_dev_is_disconnected(dev)) {
>>   			pci_dbg(dev, "disconnected; not waiting\n");
>>   			return -ENOTTY;
>>   		}
>>   
>> -		pci_read_config_dword(dev, PCI_COMMAND, &id);
>> -		if (!PCI_POSSIBLE_ERROR(id))
>> -			break;
>> +		if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) {
>> +			u16 pmcsr;
>> +
>> +			pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>> +			if (!PCI_POSSIBLE_ERROR(pmcsr) &&
>> +			    (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
>> +				break;
>> +		} else {
>> +			u32 id;
>> +
>> +			pci_read_config_dword(dev, PCI_COMMAND, &id);
>> +			if (!PCI_POSSIBLE_ERROR(id))
>> +				break;
>> +		}
>>   
>>   		if (delay > timeout) {
>>   			pci_warn(dev, "not ready %dms after %s; giving up\n",
>> -- 
>> 2.43.0
>>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-03 16:29     ` Mario Limonciello
@ 2024-09-03 17:11       ` Bjorn Helgaas
  2024-09-03 17:31         ` Mario Limonciello
  0 siblings, 1 reply; 31+ messages in thread
From: Bjorn Helgaas @ 2024-09-03 17:11 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Mario Limonciello, Bjorn Helgaas, Mathias Nyman, Mika Westerberg,
	open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Gary Li, Greg Kroah-Hartman, Ilpo Järvinen

On Tue, Sep 03, 2024 at 11:29:23AM -0500, Mario Limonciello wrote:
> On 8/29/2024 19:01, Bjorn Helgaas wrote:
> > On Fri, Aug 23, 2024 at 10:40:20AM -0500, Mario Limonciello wrote:
> > > From: Mario Limonciello <mario.limonciello@amd.com>
> > > 
> > > If a dock is plugged in at the same time as autosuspend delay then
> > > this can cause malfunctions in the USB4 stack. This happens because
> > > the device is still in D3cold at the time that the PCI core handed
> > > control back to the USB4 stack.
> > > 
> > > A device that has gone through a reset may return a value in
> > > PCI_COMMAND but that doesn't mean it's finished transitioning to D0.
> > > For devices that support power management explicitly check
> > > PCI_PM_CTRL on everything but system resume to ensure the transition
> > > happened.
> > 
> > Still trying to understand what's going on here.
> > 
> > I posted a change to pci_dev_wait() to read Vendor ID, look for Config
> > RRS status, and wait for a successful completion (when RRS Software
> > Visibility is enabled) [1].
> > 
> > You tested that and found that it didn't help with *this* issue [2].
> > I assume you tested something like v6.11-rc plus the patches from [1],
> > i.e., without the PCI_PM_CTRL changes in this series.
> > 
> >    1) Initially the device is in D0
> > 
> >    2) We put it in D3cold (using some ACPI method) because the
> >    autosuspend delay expired (?)
> > 
> >    3) Plugging in the dock wakes up the device, so we power up the
> >    device (via pci_power_up(), which again uses some ACPI method), and
> >    it should transition to D0uninitialized
> > 
> >    4) The USB4 stack sees the device but thinks it's in D3cold (?)
> ...

> > If you *did* include both [1] and patch 3/5, the implication would be
> > that pci_dev_wait() successfully read the Vendor ID, meaning the
> > device is not in D3cold when pci_power_up() returns.
> 
> The testing here was from the LTS 6.6.y kernel with both [1] and
> patch 3/5 from this series.
> 
> > Can you clarify what you see and possibly expand/correct my
> > timeline above?
> 
> The timeline you shared is nearly correct.  The USB4 stack *thinks*
> the device is in D0 because of the return of pci_power_up().
> 
> As by polling PCI_PM_CTRL we can see it's still in D3, so it hasn't
> made it to D0uninitialized yet.
> 
> I guess I reading between the lines you have an assumption that you
> can't read the vendor ID from D3; which doesn't appear to be the
> case from our testing.

A Vendor ID read of a device in D3hot should definitely work.
Obviously if the device were in D3cold, we'd get no response at all,
so the requester should log a UR error and fabricate ~0 data.

But if the device starts out in D3cold and we power it up, it should
not go through D3hot.  The only legal transition from D3cold is to
D0uninitialized (PCIe r6.0, sec 5.8).

OK, so with [1] and patch 3/5:

  1) Initially the device is in D0

  2) We put it in D3cold (using some ACPI method) because the
  autosuspend delay expired (?)

  3) Plugging in the dock wakes up the device, so we power up the
  device (via pci_power_up(), which again uses some ACPI method), and
  it should transition to D0uninitialized

  4) With patch 3/5, pci_power_up() calls pci_dev_wait() because
  dev->current_state == PCI_D3cold

  5) I *assume* RRS SV is enabled (lspci -vv of Root Port would
  confirm this; maybe we should add a pci_dbg message about which
  register we're polling).  If so, patch [1] means we should poll
  Vendor ID until successful completion.

  6) pci_dbg log should confirm the device is ready with a "ready %dms
  after D3cold->D0" message, which would mean we got a successful
  completion when reading Vendor ID

  7) For debugging purposes, it would be interesting to read and log
  the PCI_PM_CTRL value here.  Per sec 2.3.1, the device is not
  allowed to return RRS at this point since we already got a
  successful completion.

  8) The USB4 stack sees the device and assumes it is in D0, but it
  seems to still be in D3cold.  What is this based on?  Is there a
  config read that returns ~0 data when it shouldn't?

> > [1] https://lore.kernel.org/linux-pci/20240827234848.4429-1-helgaas@kernel.org/
> > [2] https://lore.kernel.org/linux-pci/30d9589a-8050-421b-a9a5-ad3422feadad@amd.com/
> > 
> > > Devices that don't support power management and system resume will
> > > continue to use PCI_COMMAND.
> > > 
> > > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> > > ---
> > > v4->v5:
> > >   * Fix misleading indentation
> > >   * Amend commit message
> > > ---
> > >   drivers/pci/pci.c | 28 ++++++++++++++++++++--------
> > >   1 file changed, 20 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 1e219057a5069..f032a4aaec268 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -1309,21 +1309,33 @@ static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int
> > >   	 * the read (except when CRS SV is enabled and the read was for the
> > >   	 * Vendor ID; in that case it synthesizes 0x0001 data).
> > >   	 *
> > > -	 * Wait for the device to return a non-CRS completion.  Read the
> > > -	 * Command register instead of Vendor ID so we don't have to
> > > -	 * contend with the CRS SV value.
> > > +	 * Wait for the device to return a non-CRS completion.  On devices
> > > +	 * that support PM control and on waits that aren't part of system
> > > +	 * resume read the PM control register to ensure the device has
> > > +	 * transitioned to D0.  On devices that don't support PM control,
> > > +	 * or during system resume read the command register to instead of
> > > +	 * Vendor ID so we don't have to contend with the CRS SV value.
> > >   	 */
> > >   	for (;;) {
> > > -		u32 id;
> > > -
> > >   		if (pci_dev_is_disconnected(dev)) {
> > >   			pci_dbg(dev, "disconnected; not waiting\n");
> > >   			return -ENOTTY;
> > >   		}
> > > -		pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > -		if (!PCI_POSSIBLE_ERROR(id))
> > > -			break;
> > > +		if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) {
> > > +			u16 pmcsr;
> > > +
> > > +			pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> > > +			if (!PCI_POSSIBLE_ERROR(pmcsr) &&
> > > +			    (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
> > > +				break;
> > > +		} else {
> > > +			u32 id;
> > > +
> > > +			pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > +			if (!PCI_POSSIBLE_ERROR(id))
> > > +				break;
> > > +		}
> > >   		if (delay > timeout) {
> > >   			pci_warn(dev, "not ready %dms after %s; giving up\n",
> > > -- 
> > > 2.43.0
> > > 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-03 17:11       ` Bjorn Helgaas
@ 2024-09-03 17:31         ` Mario Limonciello
  2024-09-03 18:25           ` Bjorn Helgaas
  0 siblings, 1 reply; 31+ messages in thread
From: Mario Limonciello @ 2024-09-03 17:31 UTC (permalink / raw)
  To: Bjorn Helgaas, Gary Li
  Cc: Mario Limonciello, Bjorn Helgaas, Mathias Nyman, Mika Westerberg,
	open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Greg Kroah-Hartman, Ilpo Järvinen

On 9/3/2024 12:11, Bjorn Helgaas wrote:

<snip>

>>
>> I guess I reading between the lines you have an assumption that you
>> can't read the vendor ID from D3; which doesn't appear to be the
>> case from our testing.
> 
> A Vendor ID read of a device in D3hot should definitely work.
> Obviously if the device were in D3cold, we'd get no response at all,
> so the requester should log a UR error and fabricate ~0 data.
> 
> But if the device starts out in D3cold and we power it up, it should
> not go through D3hot.  The only legal transition from D3cold is to
> D0uninitialized (PCIe r6.0, sec 5.8).

Right.  The issue is it didn't finish getting into D3 at the time that 
we attempted to go to D0 though.  So all this extra time is basically 
waiting for the D0->D3 transition to finish followed by D3->D0uninitialized.

The best description I could offer is to call it an "aborted" D3.

> 
> OK, so with [1] and patch 3/5:
> 
>    1) Initially the device is in D0
> 
>    2) We put it in D3cold (using some ACPI method) because the
>    autosuspend delay expired (?)
> 
>    3) Plugging in the dock wakes up the device, so we power up the
>    device (via pci_power_up(), which again uses some ACPI method), and
>    it should transition to D0uninitialized
> 
>    4) With patch 3/5, pci_power_up() calls pci_dev_wait() because
>    dev->current_state == PCI_D3cold
> 
>    5) I *assume* RRS SV is enabled (lspci -vv of Root Port would
>    confirm this; maybe we should add a pci_dbg message about which
>    register we're polling).  If so, patch [1] means we should poll
>    Vendor ID until successful completion.
> 

Yup.
                 RootCap: CRSVisible+
                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- 
PMEIntEna+ CRSVisible+

>    6) pci_dbg log should confirm the device is ready with a "ready %dms
>    after D3cold->D0" message, which would mean we got a successful
>    completion when reading Vendor ID
> 
>    7) For debugging purposes, it would be interesting to read and log
>    the PCI_PM_CTRL value here.  Per sec 2.3.1, the device is not
>    allowed to return RRS at this point since we already got a
>    successful completion.
> 

OK let me get a debug log with [1], 3/5, 6.11-rc6 and a message added 
about this value to share back.

>    8) The USB4 stack sees the device and assumes it is in D0, but it
>    seems to still be in D3cold.  What is this based on?  Is there a
>    config read that returns ~0 data when it shouldn't?
> 

Yes there is.  From earlier in the thread I have a [log] I shared.

The message emitted is from ring_interrupt_active():

"thunderbolt 0000:e5:00.5: interrupt for TX ring 0 is already enabled"

[log] https://gist.github.com/superm1/cb407766ab15f42f12a6fe9d1196f6fc

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-03 17:31         ` Mario Limonciello
@ 2024-09-03 18:25           ` Bjorn Helgaas
  2024-09-03 18:32             ` Mario Limonciello
  0 siblings, 1 reply; 31+ messages in thread
From: Bjorn Helgaas @ 2024-09-03 18:25 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Gary Li, Mario Limonciello, Bjorn Helgaas, Mathias Nyman,
	Mika Westerberg, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Greg Kroah-Hartman,
	Ilpo Järvinen

On Tue, Sep 03, 2024 at 12:31:00PM -0500, Mario Limonciello wrote:
> On 9/3/2024 12:11, Bjorn Helgaas wrote:
> ...

> >    8) The USB4 stack sees the device and assumes it is in D0, but it
> >    seems to still be in D3cold.  What is this based on?  Is there a
> >    config read that returns ~0 data when it shouldn't?
> 
> Yes there is.  From earlier in the thread I have a [log] I shared.
> 
> The message emitted is from ring_interrupt_active():
> 
> "thunderbolt 0000:e5:00.5: interrupt for TX ring 0 is already enabled"

Right, that's in the cover letter, but I can't tell from this what the
ioread32(ring->nhi->iobase + reg) returned.  It looks like this is an
MMIO read of BAR 0, not a config read.

> [log] https://gist.github.com/superm1/cb407766ab15f42f12a6fe9d1196f6fc

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-03 18:25           ` Bjorn Helgaas
@ 2024-09-03 18:32             ` Mario Limonciello
  2024-09-03 21:32               ` Bjorn Helgaas
  2024-09-04 12:05               ` Mika Westerberg
  0 siblings, 2 replies; 31+ messages in thread
From: Mario Limonciello @ 2024-09-03 18:32 UTC (permalink / raw)
  To: Bjorn Helgaas, Mika Westerberg
  Cc: Gary Li, Mario Limonciello, Bjorn Helgaas, Mathias Nyman,
	open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Greg Kroah-Hartman, Ilpo Järvinen

On 9/3/2024 13:25, Bjorn Helgaas wrote:
> On Tue, Sep 03, 2024 at 12:31:00PM -0500, Mario Limonciello wrote:
>> On 9/3/2024 12:11, Bjorn Helgaas wrote:
>> ...
> 
>>>     8) The USB4 stack sees the device and assumes it is in D0, but it
>>>     seems to still be in D3cold.  What is this based on?  Is there a
>>>     config read that returns ~0 data when it shouldn't?
>>
>> Yes there is.  From earlier in the thread I have a [log] I shared.
>>
>> The message emitted is from ring_interrupt_active():
>>
>> "thunderbolt 0000:e5:00.5: interrupt for TX ring 0 is already enabled"
> 
> Right, that's in the cover letter, but I can't tell from this what the
> ioread32(ring->nhi->iobase + reg) returned.  It looks like this is an
> MMIO read of BAR 0, not a config read.
> 

Yeah.  I suppose another way to approach this problem is to make 
something else in the call chain poll PCI_PM_CTRL.

Polling at the start of nhi_runtime_resume() should also work.  For the 
"normal" scenario it would just be a single read to PCI_PM_CTRL.

Mika, thoughts?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-03 18:32             ` Mario Limonciello
@ 2024-09-03 21:32               ` Bjorn Helgaas
  2024-09-04 12:05               ` Mika Westerberg
  1 sibling, 0 replies; 31+ messages in thread
From: Bjorn Helgaas @ 2024-09-03 21:32 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Mika Westerberg, Gary Li, Mario Limonciello, Bjorn Helgaas,
	Mathias Nyman, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Greg Kroah-Hartman,
	Ilpo Järvinen

On Tue, Sep 03, 2024 at 01:32:30PM -0500, Mario Limonciello wrote:
> On 9/3/2024 13:25, Bjorn Helgaas wrote:
> > On Tue, Sep 03, 2024 at 12:31:00PM -0500, Mario Limonciello wrote:
> > > On 9/3/2024 12:11, Bjorn Helgaas wrote:
> > > ...
> > 
> > > >     8) The USB4 stack sees the device and assumes it is in D0, but it
> > > >     seems to still be in D3cold.  What is this based on?  Is there a
> > > >     config read that returns ~0 data when it shouldn't?
> > > 
> > > Yes there is.  From earlier in the thread I have a [log] I shared.
> > > 
> > > The message emitted is from ring_interrupt_active():
> > > 
> > > "thunderbolt 0000:e5:00.5: interrupt for TX ring 0 is already enabled"
> > 
> > Right, that's in the cover letter, but I can't tell from this what the
> > ioread32(ring->nhi->iobase + reg) returned.  It looks like this is an
> > MMIO read of BAR 0, not a config read.
> 
> Yeah.  I suppose another way to approach this problem is to make something
> else in the call chain poll PCI_PM_CTRL.
> 
> Polling at the start of nhi_runtime_resume() should also work.  For the
> "normal" scenario it would just be a single read to PCI_PM_CTRL.

It doesn't sound like this is anything specific to Thunderbolt, so
there should be a generic solution.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-03 18:32             ` Mario Limonciello
  2024-09-03 21:32               ` Bjorn Helgaas
@ 2024-09-04 12:05               ` Mika Westerberg
  2024-09-04 15:24                 ` Mario Limonciello
  1 sibling, 1 reply; 31+ messages in thread
From: Mika Westerberg @ 2024-09-04 12:05 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Bjorn Helgaas, Gary Li, Mario Limonciello, Bjorn Helgaas,
	Mathias Nyman, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Greg Kroah-Hartman,
	Ilpo Järvinen

Hi,

On Tue, Sep 03, 2024 at 01:32:30PM -0500, Mario Limonciello wrote:
> On 9/3/2024 13:25, Bjorn Helgaas wrote:
> > On Tue, Sep 03, 2024 at 12:31:00PM -0500, Mario Limonciello wrote:
> > > On 9/3/2024 12:11, Bjorn Helgaas wrote:
> > > ...
> > 
> > > >     8) The USB4 stack sees the device and assumes it is in D0, but it
> > > >     seems to still be in D3cold.  What is this based on?  Is there a
> > > >     config read that returns ~0 data when it shouldn't?
> > > 
> > > Yes there is.  From earlier in the thread I have a [log] I shared.
> > > 
> > > The message emitted is from ring_interrupt_active():
> > > 
> > > "thunderbolt 0000:e5:00.5: interrupt for TX ring 0 is already enabled"
> > 
> > Right, that's in the cover letter, but I can't tell from this what the
> > ioread32(ring->nhi->iobase + reg) returned.  It looks like this is an
> > MMIO read of BAR 0, not a config read.
> > 
> 
> Yeah.  I suppose another way to approach this problem is to make something
> else in the call chain poll PCI_PM_CTRL.
> 
> Polling at the start of nhi_runtime_resume() should also work.  For the
> "normal" scenario it would just be a single read to PCI_PM_CTRL.
> 
> Mika, thoughts?

I'm starting to wonder if we are looking at the correct place ;-) This
reminds me that our PCIe SV people recently reported a couple of Linux
related issues which they recommended to fix, and these are on my list
but I'll share them because maybe they are related?

First problem, and actually a PCI spec violation, is that Linux does not
clear Bus Master, MMIO and IO space enables when it programs the device
to D3 on runtime suspend path. It does so on system sleep path though.
Something like below (untested) should do that:

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index f412ef73a6e4..79a566376301 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1332,6 +1332,7 @@ static int pci_pm_runtime_suspend(struct device *dev)
 
 	if (!pci_dev->state_saved) {
 		pci_save_state(pci_dev);
+		pci_pm_default_suspend(pci_dev);
 		pci_finish_runtime_suspend(pci_dev);
 	}
 

The second thing is that Thunderbolt driver, for historical reasons,
leaves the MSI enabled when entering D3. This too might be related. I
think we can unconditionally disable it so below hack should do that
(untested as well). I wonder if you could try if any of these or both
can help here? Both of these issues can result unwanted events during D3
entry as far as I understand.

diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
index dc1f456736dc..73b815fbbceb 100644
--- a/drivers/thunderbolt/ctl.c
+++ b/drivers/thunderbolt/ctl.c
@@ -659,12 +659,11 @@ struct tb_ctl *tb_ctl_alloc(struct tb_nhi *nhi, int index, int timeout_msec,
 	if (!ctl->frame_pool)
 		goto err;
 
-	ctl->tx = tb_ring_alloc_tx(nhi, 0, 10, RING_FLAG_NO_SUSPEND);
+	ctl->tx = tb_ring_alloc_tx(nhi, 0, 10, 0);
 	if (!ctl->tx)
 		goto err;
 
-	ctl->rx = tb_ring_alloc_rx(nhi, 0, 10, RING_FLAG_NO_SUSPEND, 0, 0xffff,
-				   0xffff, NULL, NULL);
+	ctl->rx = tb_ring_alloc_rx(nhi, 0, 10, 0, 0, 0xffff, 0xffff, NULL, NULL);
 	if (!ctl->rx)
 		goto err;
 

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-04 12:05               ` Mika Westerberg
@ 2024-09-04 15:24                 ` Mario Limonciello
  2024-09-05  9:33                   ` Mika Westerberg
  0 siblings, 1 reply; 31+ messages in thread
From: Mario Limonciello @ 2024-09-04 15:24 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Gary Li, Mario Limonciello, Bjorn Helgaas,
	Mathias Nyman, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Greg Kroah-Hartman,
	Ilpo Järvinen

On 9/4/2024 07:05, Mika Westerberg wrote:
> Hi,
> 
> On Tue, Sep 03, 2024 at 01:32:30PM -0500, Mario Limonciello wrote:
>> On 9/3/2024 13:25, Bjorn Helgaas wrote:
>>> On Tue, Sep 03, 2024 at 12:31:00PM -0500, Mario Limonciello wrote:
>>>> On 9/3/2024 12:11, Bjorn Helgaas wrote:
>>>> ...
>>>
>>>>>      8) The USB4 stack sees the device and assumes it is in D0, but it
>>>>>      seems to still be in D3cold.  What is this based on?  Is there a
>>>>>      config read that returns ~0 data when it shouldn't?
>>>>
>>>> Yes there is.  From earlier in the thread I have a [log] I shared.
>>>>
>>>> The message emitted is from ring_interrupt_active():
>>>>
>>>> "thunderbolt 0000:e5:00.5: interrupt for TX ring 0 is already enabled"
>>>
>>> Right, that's in the cover letter, but I can't tell from this what the
>>> ioread32(ring->nhi->iobase + reg) returned.  It looks like this is an
>>> MMIO read of BAR 0, not a config read.
>>>
>>
>> Yeah.  I suppose another way to approach this problem is to make something
>> else in the call chain poll PCI_PM_CTRL.
>>
>> Polling at the start of nhi_runtime_resume() should also work.  For the
>> "normal" scenario it would just be a single read to PCI_PM_CTRL.
>>
>> Mika, thoughts?

We did this experiment to throw code to poll PCI_PM_CTRL at the start of 
nhi_runtime_resume() but this also fails.  From that I would hypothesize 
the device transitioned to D0uninitialized sometime in the middle of 
pci_pm_runtime_resume() before the call to pm->runtime_resume(dev);

> 
> I'm starting to wonder if we are looking at the correct place ;-) This
> reminds me that our PCIe SV people recently reported a couple of Linux
> related issues which they recommended to fix, and these are on my list
> but I'll share them because maybe they are related?

Thanks for sharing those.  We had a try with them but sorry to say no 
improvements to the issue at hand.

> 
> First problem, and actually a PCI spec violation, is that Linux does not
> clear Bus Master, MMIO and IO space enables when it programs the device
> to D3 on runtime suspend path. It does so on system sleep path though.
> Something like below (untested) should do that:
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index f412ef73a6e4..79a566376301 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1332,6 +1332,7 @@ static int pci_pm_runtime_suspend(struct device *dev)
>   
>   	if (!pci_dev->state_saved) {
>   		pci_save_state(pci_dev);
> +		pci_pm_default_suspend(pci_dev);
>   		pci_finish_runtime_suspend(pci_dev);
>   	}
>   
> 
> The second thing is that Thunderbolt driver, for historical reasons,
> leaves the MSI enabled when entering D3. This too might be related. I
> think we can unconditionally disable it so below hack should do that
> (untested as well). I wonder if you could try if any of these or both
> can help here? Both of these issues can result unwanted events during D3
> entry as far as I understand.
> 
> diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
> index dc1f456736dc..73b815fbbceb 100644
> --- a/drivers/thunderbolt/ctl.c
> +++ b/drivers/thunderbolt/ctl.c
> @@ -659,12 +659,11 @@ struct tb_ctl *tb_ctl_alloc(struct tb_nhi *nhi, int index, int timeout_msec,
>   	if (!ctl->frame_pool)
>   		goto err;
>   
> -	ctl->tx = tb_ring_alloc_tx(nhi, 0, 10, RING_FLAG_NO_SUSPEND);
> +	ctl->tx = tb_ring_alloc_tx(nhi, 0, 10, 0);
>   	if (!ctl->tx)
>   		goto err;
>   
> -	ctl->rx = tb_ring_alloc_rx(nhi, 0, 10, RING_FLAG_NO_SUSPEND, 0, 0xffff,
> -				   0xffff, NULL, NULL);
> +	ctl->rx = tb_ring_alloc_rx(nhi, 0, 10, 0, 0, 0xffff, 0xffff, NULL, NULL);
>   	if (!ctl->rx)
>   		goto err;
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-04 15:24                 ` Mario Limonciello
@ 2024-09-05  9:33                   ` Mika Westerberg
  2024-09-09 20:40                     ` Mario Limonciello
  0 siblings, 1 reply; 31+ messages in thread
From: Mika Westerberg @ 2024-09-05  9:33 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Bjorn Helgaas, Gary Li, Mario Limonciello, Bjorn Helgaas,
	Mathias Nyman, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Greg Kroah-Hartman,
	Ilpo Järvinen

Hi,

On Wed, Sep 04, 2024 at 10:24:26AM -0500, Mario Limonciello wrote:
> On 9/4/2024 07:05, Mika Westerberg wrote:
> > Hi,
> > 
> > On Tue, Sep 03, 2024 at 01:32:30PM -0500, Mario Limonciello wrote:
> > > On 9/3/2024 13:25, Bjorn Helgaas wrote:
> > > > On Tue, Sep 03, 2024 at 12:31:00PM -0500, Mario Limonciello wrote:
> > > > > On 9/3/2024 12:11, Bjorn Helgaas wrote:
> > > > > ...
> > > > 
> > > > > >      8) The USB4 stack sees the device and assumes it is in D0, but it
> > > > > >      seems to still be in D3cold.  What is this based on?  Is there a
> > > > > >      config read that returns ~0 data when it shouldn't?
> > > > > 
> > > > > Yes there is.  From earlier in the thread I have a [log] I shared.
> > > > > 
> > > > > The message emitted is from ring_interrupt_active():
> > > > > 
> > > > > "thunderbolt 0000:e5:00.5: interrupt for TX ring 0 is already enabled"
> > > > 
> > > > Right, that's in the cover letter, but I can't tell from this what the
> > > > ioread32(ring->nhi->iobase + reg) returned.  It looks like this is an
> > > > MMIO read of BAR 0, not a config read.
> > > > 
> > > 
> > > Yeah.  I suppose another way to approach this problem is to make something
> > > else in the call chain poll PCI_PM_CTRL.
> > > 
> > > Polling at the start of nhi_runtime_resume() should also work.  For the
> > > "normal" scenario it would just be a single read to PCI_PM_CTRL.
> > > 
> > > Mika, thoughts?
> 
> We did this experiment to throw code to poll PCI_PM_CTRL at the start of
> nhi_runtime_resume() but this also fails.  From that I would hypothesize the
> device transitioned to D0uninitialized sometime in the middle of
> pci_pm_runtime_resume() before the call to pm->runtime_resume(dev);
> 
> > 
> > I'm starting to wonder if we are looking at the correct place ;-) This
> > reminds me that our PCIe SV people recently reported a couple of Linux
> > related issues which they recommended to fix, and these are on my list
> > but I'll share them because maybe they are related?
> 
> Thanks for sharing those.  We had a try with them but sorry to say no
> improvements to the issue at hand.

Okay, thanks for checking.

Few additional side paths here, though. This is supposed to work so that
once the host router sleep bit is set the driver is supposed to allow
the domain to enter sleep (e.g it should not be waken up before it is
fully transitioned). That's what we do:

1. All tunneled PCIe Root/Downstream ports are in D3.
2. All tunneled USB 3.x ports are in U3.
3. No DisplayPort is tunneled.
4. Thunderbolt driver enables wakes.
5. Thunderbolt driver writes sleep ready bit of the host router.
6. Thunderbolt driver runtime suspend is complete.
7. ACPI method is called (_PS3 or _PR3.OFF) that will trigger the "Sleep
Event".

If between 5 and 7 there is device connected, it should not "abort" the
sequence. Unfortunately this is not explict in the USB4 spec but the
connection manager guide has similar note. Even if the connect happens
there the "Sleep Event" should happen but after that it can trigger
normal wakeup which will then bring everything back.

Would it be possible to enable tracing around these steps so that we
could see if there is hotplug notification somewhere there that is not
expected? Here are instructions how to get pretty accurate trace:

https://github.com/intel/tbtools?tab=readme-ov-file#tracing

Please also take full dmesg.

It is entirely possible that this has nothing to do with the issue but I
think it is worth checking.

The second thing we could try is to check the wake status bits after
this has happened, like:

  # tbdump -r 0 -a <ADAPTER> -vv -N 1 PORT_CS_18

(where <ADAPTER> is the lane 0 adapter of the USB4 port the device was
connected).

The third thing to try is to comment out TB_WAKE_ON_CONNECT in
tb_switch_suspend(). This should result no wake even if the device is
connected. This tells us that it is really the connect on USB4 port that
triggered the wake.

These could (also) explain why the host router appears to be in D3 even
if it should be in D0 already.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-05  9:33                   ` Mika Westerberg
@ 2024-09-09 20:40                     ` Mario Limonciello
  2024-09-10  9:13                       ` Mika Westerberg
  0 siblings, 1 reply; 31+ messages in thread
From: Mario Limonciello @ 2024-09-09 20:40 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Gary Li, Mario Limonciello, Bjorn Helgaas,
	Mathias Nyman, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Greg Kroah-Hartman,
	Ilpo Järvinen

On 9/5/2024 04:33, Mika Westerberg wrote:
> Hi,
> 
> On Wed, Sep 04, 2024 at 10:24:26AM -0500, Mario Limonciello wrote:
>> On 9/4/2024 07:05, Mika Westerberg wrote:
>>> Hi,
>>>
>>> On Tue, Sep 03, 2024 at 01:32:30PM -0500, Mario Limonciello wrote:
>>>> On 9/3/2024 13:25, Bjorn Helgaas wrote:
>>>>> On Tue, Sep 03, 2024 at 12:31:00PM -0500, Mario Limonciello wrote:
>>>>>> On 9/3/2024 12:11, Bjorn Helgaas wrote:
>>>>>> ...
>>>>>
>>>>>>>       8) The USB4 stack sees the device and assumes it is in D0, but it
>>>>>>>       seems to still be in D3cold.  What is this based on?  Is there a
>>>>>>>       config read that returns ~0 data when it shouldn't?
>>>>>>
>>>>>> Yes there is.  From earlier in the thread I have a [log] I shared.
>>>>>>
>>>>>> The message emitted is from ring_interrupt_active():
>>>>>>
>>>>>> "thunderbolt 0000:e5:00.5: interrupt for TX ring 0 is already enabled"
>>>>>
>>>>> Right, that's in the cover letter, but I can't tell from this what the
>>>>> ioread32(ring->nhi->iobase + reg) returned.  It looks like this is an
>>>>> MMIO read of BAR 0, not a config read.
>>>>>
>>>>
>>>> Yeah.  I suppose another way to approach this problem is to make something
>>>> else in the call chain poll PCI_PM_CTRL.
>>>>
>>>> Polling at the start of nhi_runtime_resume() should also work.  For the
>>>> "normal" scenario it would just be a single read to PCI_PM_CTRL.
>>>>
>>>> Mika, thoughts?
>>
>> We did this experiment to throw code to poll PCI_PM_CTRL at the start of
>> nhi_runtime_resume() but this also fails.  From that I would hypothesize the
>> device transitioned to D0uninitialized sometime in the middle of
>> pci_pm_runtime_resume() before the call to pm->runtime_resume(dev);
>>
>>>
>>> I'm starting to wonder if we are looking at the correct place ;-) This
>>> reminds me that our PCIe SV people recently reported a couple of Linux
>>> related issues which they recommended to fix, and these are on my list
>>> but I'll share them because maybe they are related?
>>
>> Thanks for sharing those.  We had a try with them but sorry to say no
>> improvements to the issue at hand.
> 
> Okay, thanks for checking.
> 
> Few additional side paths here, though. This is supposed to work so that
> once the host router sleep bit is set the driver is supposed to allow
> the domain to enter sleep (e.g it should not be waken up before it is
> fully transitioned). That's what we do:
> 
> 1. All tunneled PCIe Root/Downstream ports are in D3.
> 2. All tunneled USB 3.x ports are in U3.
> 3. No DisplayPort is tunneled.
> 4. Thunderbolt driver enables wakes.
> 5. Thunderbolt driver writes sleep ready bit of the host router.
> 6. Thunderbolt driver runtime suspend is complete.
> 7. ACPI method is called (_PS3 or _PR3.OFF) that will trigger the "Sleep
> Event".
> 
> If between 5 and 7 there is device connected, it should not "abort" the
> sequence. Unfortunately this is not explict in the USB4 spec but the
> connection manager guide has similar note. Even if the connect happens
> there the "Sleep Event" should happen but after that it can trigger
> normal wakeup which will then bring everything back.
> 
> Would it be possible to enable tracing around these steps so that we
> could see if there is hotplug notification somewhere there that is not
> expected? Here are instructions how to get pretty accurate trace:
> 
> https://github.com/intel/tbtools?tab=readme-ov-file#tracing
> 
> Please also take full dmesg.

Sure, here is the dmesg with tracing enabled:

https://gist.github.com/superm1/5186e0023c8a5d2ecd75c50fd2168308
> 
> It is entirely possible that this has nothing to do with the issue but I
> think it is worth checking.
> 
> The second thing we could try is to check the wake status bits after
> this has happened, like:
> 
>    # tbdump -r 0 -a <ADAPTER> -vv -N 1 PORT_CS_18
> 
> (where <ADAPTER> is the lane 0 adapter of the USB4 port the device was
> connected).
> 

Unfortunately the adapter is in such a bad state at this time that 
tbdump doesn't work.

> The third thing to try is to comment out TB_WAKE_ON_CONNECT in
> tb_switch_suspend(). This should result no wake even if the device is
> connected. This tells us that it is really the connect on USB4 port that
> triggered the wake.

Yup that's correct; there is no action on the hotplug with this change.

> 
> These could (also) explain why the host router appears to be in D3 even
> if it should be in D0 already.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-09 20:40                     ` Mario Limonciello
@ 2024-09-10  9:13                       ` Mika Westerberg
  2024-09-13  4:12                         ` Mario Limonciello
  0 siblings, 1 reply; 31+ messages in thread
From: Mika Westerberg @ 2024-09-10  9:13 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Bjorn Helgaas, Gary Li, Mario Limonciello, Bjorn Helgaas,
	Mathias Nyman, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Greg Kroah-Hartman,
	Ilpo Järvinen

Hi,

On Mon, Sep 09, 2024 at 03:40:54PM -0500, Mario Limonciello wrote:
> > Few additional side paths here, though. This is supposed to work so that
> > once the host router sleep bit is set the driver is supposed to allow
> > the domain to enter sleep (e.g it should not be waken up before it is
> > fully transitioned). That's what we do:
> > 
> > 1. All tunneled PCIe Root/Downstream ports are in D3.
> > 2. All tunneled USB 3.x ports are in U3.
> > 3. No DisplayPort is tunneled.
> > 4. Thunderbolt driver enables wakes.
> > 5. Thunderbolt driver writes sleep ready bit of the host router.
> > 6. Thunderbolt driver runtime suspend is complete.
> > 7. ACPI method is called (_PS3 or _PR3.OFF) that will trigger the "Sleep
> > Event".
> > 
> > If between 5 and 7 there is device connected, it should not "abort" the
> > sequence. Unfortunately this is not explict in the USB4 spec but the
> > connection manager guide has similar note. Even if the connect happens
> > there the "Sleep Event" should happen but after that it can trigger
> > normal wakeup which will then bring everything back.
> > 
> > Would it be possible to enable tracing around these steps so that we
> > could see if there is hotplug notification somewhere there that is not
> > expected? Here are instructions how to get pretty accurate trace:
> > 
> > https://github.com/intel/tbtools?tab=readme-ov-file#tracing
> > 
> > Please also take full dmesg.
> 
> Sure, here is the dmesg with tracing enabled:
> 
> https://gist.github.com/superm1/5186e0023c8a5d2ecd75c50fd2168308

Thanks! I meant also enable control channel tracing as explained in the
above linked page so we could maybe see the hotplug packet coming from
the lane adapter too. Anyhow,

[  560.789681] thunderbolt 0-2: device disconnected
[  560.789771] thunderbolt 0000:e5:00.5: bandwidth consumption changed, re-calculating estimated bandwidth
[  560.789775] thunderbolt 0000:e5:00.5: bandwidth re-calculation done
[  560.789778] thunderbolt 0000:e5:00.5: looking for DP IN <-> DP OUT pairs:
[  560.789800] thunderbolt 0000:e5:00.5: 0:6: DP IN available
[  560.789803] thunderbolt 0000:e5:00.5: 0:6: no suitable DP OUT adapter available, not tunneling
[  560.790484] thunderbolt 0000:e5:00.5: 0:7: DP IN available
[  560.790487] thunderbolt 0000:e5:00.5: 0:7: no suitable DP OUT adapter available, not tunneling
...
[  578.677640] thunderbolt 0000:e5:00.5: nhi_runtime_suspend() - enter, pdev->current_state = 0
[  578.677648] thunderbolt 0000:e5:00.5: 0: suspending switch
[  578.677653] thunderbolt 0000:e5:00.5: 0: enabling wakeup: 0x3f
[  578.678248] thunderbolt 0000:e5:00.5: stopping RX ring 0
[  578.678256] thunderbolt 0000:e5:00.5: disabling interrupt at register 0x38200 bit 3 (0x9 -> 0x1)
[  578.678272] thunderbolt 0000:e5:00.5: stopping TX ring 0
[  578.678277] thunderbolt 0000:e5:00.5: disabling interrupt at register 0x38200 bit 0 (0x1 -> 0x0)
[  578.678287] thunderbolt 0000:e5:00.5: control channel stopped
...
pci_pm_runtime_resume()
  pci_pm_default_resume_early(pci_dev);
    pci_pm_power_up_and_verify_state(pci_dev);
      pci_power_up(pci_dev);
        platform_pci_set_power_state(dev, PCI_D0);
	// here we should see acpi_pci_set_power_state() printing "xxx: power
	// state changed by ACPI to D0" but we don't.
	//
	// also pdev->current_state is set to PCI_D0
      pci_update_current_state(pci_dev, PCI_D0);
        // this one reads it back, it should be 0..
  pm->runtime_resume(dev);
    nhi_runtime_resume(dev)
[  578.773503] thunderbolt 0000:e5:00.5: nhi_runtime_resume() - enter, pdev->current_state = 3

// .. but it is not. It seems to be powered off, D3cold.

[  578.773516] thunderbolt 0000:e5:00.5: control channel starting...
[  578.773518] thunderbolt 0000:e5:00.5: starting TX ring 0
[  578.773524] thunderbolt 0000:e5:00.5: enabling interrupt at register 0x38200 bit 0 (0xffffffff -> 0xffffffff)

// As here too.

It would be interesting to see what ACPI does here, I mean enabling
dynamic debugging of acpi_device_set_power() and friends. One suspect is
that when the runtime suspends happens, the power resource or _PS3()
gets called to put the device into D3cold and it has some sort of
Sleep() inside. This allows the OS to relese the CPU to do other things
too according to ACPI spec (and Linux does this) so now when you plug in
the device the GPE triggers wake and Linux starts processing that one.
We see the above runtime resume but then the Sleep() in the _PS3()
completes and cuts the power from the NHI while we are in the middle of
resuming it already (or something along those lines).

We saw similar "context switch" happening when PCIe L2 failed, BIOS
implemented the timeout using Sleep() that allowed Linux to process
other things which resulted unexpected wake (not the same as here
though).

So one thing to check too is how the ACPI methods get called and
especially if they somehow end up messing with each other.

> > It is entirely possible that this has nothing to do with the issue but I
> > think it is worth checking.
> > 
> > The second thing we could try is to check the wake status bits after
> > this has happened, like:
> > 
> >    # tbdump -r 0 -a <ADAPTER> -vv -N 1 PORT_CS_18
> > 
> > (where <ADAPTER> is the lane 0 adapter of the USB4 port the device was
> > connected).
> > 
> 
> Unfortunately the adapter is in such a bad state at this time that tbdump
> doesn't work.

Oh, indeed it is.

> > The third thing to try is to comment out TB_WAKE_ON_CONNECT in
> > tb_switch_suspend(). This should result no wake even if the device is
> > connected. This tells us that it is really the connect on USB4 port that
> > triggered the wake.
> 
> Yup that's correct; there is no action on the hotplug with this change.

Okay thanks for checking.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-10  9:13                       ` Mika Westerberg
@ 2024-09-13  4:12                         ` Mario Limonciello
  2024-09-13  4:58                           ` Mika Westerberg
  0 siblings, 1 reply; 31+ messages in thread
From: Mario Limonciello @ 2024-09-13  4:12 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Gary Li, Mario Limonciello, Bjorn Helgaas,
	Mathias Nyman, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Greg Kroah-Hartman,
	Ilpo Järvinen

On 9/10/2024 04:13, Mika Westerberg wrote:
> Hi,
> 
> On Mon, Sep 09, 2024 at 03:40:54PM -0500, Mario Limonciello wrote:
>>> Few additional side paths here, though. This is supposed to work so that
>>> once the host router sleep bit is set the driver is supposed to allow
>>> the domain to enter sleep (e.g it should not be waken up before it is
>>> fully transitioned). That's what we do:
>>>
>>> 1. All tunneled PCIe Root/Downstream ports are in D3.
>>> 2. All tunneled USB 3.x ports are in U3.
>>> 3. No DisplayPort is tunneled.
>>> 4. Thunderbolt driver enables wakes.
>>> 5. Thunderbolt driver writes sleep ready bit of the host router.
>>> 6. Thunderbolt driver runtime suspend is complete.
>>> 7. ACPI method is called (_PS3 or _PR3.OFF) that will trigger the "Sleep
>>> Event".
>>>
>>> If between 5 and 7 there is device connected, it should not "abort" the
>>> sequence. Unfortunately this is not explict in the USB4 spec but the
>>> connection manager guide has similar note. Even if the connect happens
>>> there the "Sleep Event" should happen but after that it can trigger
>>> normal wakeup which will then bring everything back.
>>>
>>> Would it be possible to enable tracing around these steps so that we
>>> could see if there is hotplug notification somewhere there that is not
>>> expected? Here are instructions how to get pretty accurate trace:
>>>
>>> https://github.com/intel/tbtools?tab=readme-ov-file#tracing
>>>
>>> Please also take full dmesg.
>>
>> Sure, here is the dmesg with tracing enabled:
>>
>> https://gist.github.com/superm1/5186e0023c8a5d2ecd75c50fd2168308
> 
> Thanks! I meant also enable control channel tracing as explained in the
> above linked page so we could maybe see the hotplug packet coming from
> the lane adapter too. Anyhow,
> 
> [  560.789681] thunderbolt 0-2: device disconnected
> [  560.789771] thunderbolt 0000:e5:00.5: bandwidth consumption changed, re-calculating estimated bandwidth
> [  560.789775] thunderbolt 0000:e5:00.5: bandwidth re-calculation done
> [  560.789778] thunderbolt 0000:e5:00.5: looking for DP IN <-> DP OUT pairs:
> [  560.789800] thunderbolt 0000:e5:00.5: 0:6: DP IN available
> [  560.789803] thunderbolt 0000:e5:00.5: 0:6: no suitable DP OUT adapter available, not tunneling
> [  560.790484] thunderbolt 0000:e5:00.5: 0:7: DP IN available
> [  560.790487] thunderbolt 0000:e5:00.5: 0:7: no suitable DP OUT adapter available, not tunneling
> ...
> [  578.677640] thunderbolt 0000:e5:00.5: nhi_runtime_suspend() - enter, pdev->current_state = 0
> [  578.677648] thunderbolt 0000:e5:00.5: 0: suspending switch
> [  578.677653] thunderbolt 0000:e5:00.5: 0: enabling wakeup: 0x3f
> [  578.678248] thunderbolt 0000:e5:00.5: stopping RX ring 0
> [  578.678256] thunderbolt 0000:e5:00.5: disabling interrupt at register 0x38200 bit 3 (0x9 -> 0x1)
> [  578.678272] thunderbolt 0000:e5:00.5: stopping TX ring 0
> [  578.678277] thunderbolt 0000:e5:00.5: disabling interrupt at register 0x38200 bit 0 (0x1 -> 0x0)
> [  578.678287] thunderbolt 0000:e5:00.5: control channel stopped
> ...
> pci_pm_runtime_resume()
>    pci_pm_default_resume_early(pci_dev);
>      pci_pm_power_up_and_verify_state(pci_dev);
>        pci_power_up(pci_dev);
>          platform_pci_set_power_state(dev, PCI_D0);
> 	// here we should see acpi_pci_set_power_state() printing "xxx: power
> 	// state changed by ACPI to D0" but we don't.
> 	//
> 	// also pdev->current_state is set to PCI_D0
>        pci_update_current_state(pci_dev, PCI_D0);
>          // this one reads it back, it should be 0..
>    pm->runtime_resume(dev);
>      nhi_runtime_resume(dev)
> [  578.773503] thunderbolt 0000:e5:00.5: nhi_runtime_resume() - enter, pdev->current_state = 3
> 
> // .. but it is not. It seems to be powered off, D3cold.
> 
> [  578.773516] thunderbolt 0000:e5:00.5: control channel starting...
> [  578.773518] thunderbolt 0000:e5:00.5: starting TX ring 0
> [  578.773524] thunderbolt 0000:e5:00.5: enabling interrupt at register 0x38200 bit 0 (0xffffffff -> 0xffffffff)
> 
> // As here too.
> 
> It would be interesting to see what ACPI does here, I mean enabling
> dynamic debugging of acpi_device_set_power() and friends. 

Here is a log with dyndbg turned up.
https://gist.github.com/superm1/2f7ca6123431365d11e9bc3cc9329c14

You can see firmware node \_SB_.PCI0.BUSC.NHI0 is physical device 
0000:e5:00.5.

> One suspect is
> that when the runtime suspends happens, the power resource or _PS3()
> gets called to put the device into D3cold and it has some sort of
> Sleep() inside. This allows the OS to relese the CPU to do other things
> too according to ACPI spec (and Linux does this) so now when you plug in
> the device the GPE triggers wake and Linux starts processing that one.
> We see the above runtime resume but then the Sleep() in the _PS3()
> completes and cuts the power from the NHI while we are in the middle of
> resuming it already (or something along those lines).
> 
> We saw similar "context switch" happening when PCIe L2 failed, BIOS
> implemented the timeout using Sleep() that allowed Linux to process
> other things which resulted unexpected wake (not the same as here
> though).
> 
> So one thing to check too is how the ACPI methods get called and
> especially if they somehow end up messing with each other.

At least to me it looks pretty straightforward that each D0 sequence is 
going through and nothing looked out of order:

acpi_pci_set_power_state()
->acpi_device_set_power()
->->acpi_dev_pm_explicit_set()


There is a Sleep() call in M232() which is a SMI used in both _PS0 and 
_PS3, but as far as I can tell it's not actually called as the case 
always has zero as an argument.

Here's some snippets of ASL from the NHI0 device to see.
https://gist.github.com/superm1/32ef1f822a6220a41b19574024717f79

I feel your theory is right about it being an aborted D3cold request, 
but I don't believe it's from concurrency issue of _PS3.  I feel there 
isn't any guard rails either in Linux or the AML for ensuring that
the transition actually finished.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-13  4:12                         ` Mario Limonciello
@ 2024-09-13  4:58                           ` Mika Westerberg
  2024-09-13  7:23                             ` Mika Westerberg
  0 siblings, 1 reply; 31+ messages in thread
From: Mika Westerberg @ 2024-09-13  4:58 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Bjorn Helgaas, Gary Li, Mario Limonciello, Bjorn Helgaas,
	Mathias Nyman, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Greg Kroah-Hartman,
	Ilpo Järvinen

Hi,

On Thu, Sep 12, 2024 at 11:12:11PM -0500, Mario Limonciello wrote:
> On 9/10/2024 04:13, Mika Westerberg wrote:
> > Hi,
> > 
> > On Mon, Sep 09, 2024 at 03:40:54PM -0500, Mario Limonciello wrote:
> > > > Few additional side paths here, though. This is supposed to work so that
> > > > once the host router sleep bit is set the driver is supposed to allow
> > > > the domain to enter sleep (e.g it should not be waken up before it is
> > > > fully transitioned). That's what we do:
> > > > 
> > > > 1. All tunneled PCIe Root/Downstream ports are in D3.
> > > > 2. All tunneled USB 3.x ports are in U3.
> > > > 3. No DisplayPort is tunneled.
> > > > 4. Thunderbolt driver enables wakes.
> > > > 5. Thunderbolt driver writes sleep ready bit of the host router.
> > > > 6. Thunderbolt driver runtime suspend is complete.
> > > > 7. ACPI method is called (_PS3 or _PR3.OFF) that will trigger the "Sleep
> > > > Event".
> > > > 
> > > > If between 5 and 7 there is device connected, it should not "abort" the
> > > > sequence. Unfortunately this is not explict in the USB4 spec but the
> > > > connection manager guide has similar note. Even if the connect happens
> > > > there the "Sleep Event" should happen but after that it can trigger
> > > > normal wakeup which will then bring everything back.
> > > > 
> > > > Would it be possible to enable tracing around these steps so that we
> > > > could see if there is hotplug notification somewhere there that is not
> > > > expected? Here are instructions how to get pretty accurate trace:
> > > > 
> > > > https://github.com/intel/tbtools?tab=readme-ov-file#tracing
> > > > 
> > > > Please also take full dmesg.
> > > 
> > > Sure, here is the dmesg with tracing enabled:
> > > 
> > > https://gist.github.com/superm1/5186e0023c8a5d2ecd75c50fd2168308
> > 
> > Thanks! I meant also enable control channel tracing as explained in the
> > above linked page so we could maybe see the hotplug packet coming from
> > the lane adapter too. Anyhow,
> > 
> > [  560.789681] thunderbolt 0-2: device disconnected
> > [  560.789771] thunderbolt 0000:e5:00.5: bandwidth consumption changed, re-calculating estimated bandwidth
> > [  560.789775] thunderbolt 0000:e5:00.5: bandwidth re-calculation done
> > [  560.789778] thunderbolt 0000:e5:00.5: looking for DP IN <-> DP OUT pairs:
> > [  560.789800] thunderbolt 0000:e5:00.5: 0:6: DP IN available
> > [  560.789803] thunderbolt 0000:e5:00.5: 0:6: no suitable DP OUT adapter available, not tunneling
> > [  560.790484] thunderbolt 0000:e5:00.5: 0:7: DP IN available
> > [  560.790487] thunderbolt 0000:e5:00.5: 0:7: no suitable DP OUT adapter available, not tunneling
> > ...
> > [  578.677640] thunderbolt 0000:e5:00.5: nhi_runtime_suspend() - enter, pdev->current_state = 0
> > [  578.677648] thunderbolt 0000:e5:00.5: 0: suspending switch
> > [  578.677653] thunderbolt 0000:e5:00.5: 0: enabling wakeup: 0x3f
> > [  578.678248] thunderbolt 0000:e5:00.5: stopping RX ring 0
> > [  578.678256] thunderbolt 0000:e5:00.5: disabling interrupt at register 0x38200 bit 3 (0x9 -> 0x1)
> > [  578.678272] thunderbolt 0000:e5:00.5: stopping TX ring 0
> > [  578.678277] thunderbolt 0000:e5:00.5: disabling interrupt at register 0x38200 bit 0 (0x1 -> 0x0)
> > [  578.678287] thunderbolt 0000:e5:00.5: control channel stopped
> > ...
> > pci_pm_runtime_resume()
> >    pci_pm_default_resume_early(pci_dev);
> >      pci_pm_power_up_and_verify_state(pci_dev);
> >        pci_power_up(pci_dev);
> >          platform_pci_set_power_state(dev, PCI_D0);
> > 	// here we should see acpi_pci_set_power_state() printing "xxx: power
> > 	// state changed by ACPI to D0" but we don't.
> > 	//
> > 	// also pdev->current_state is set to PCI_D0
> >        pci_update_current_state(pci_dev, PCI_D0);
> >          // this one reads it back, it should be 0..
> >    pm->runtime_resume(dev);
> >      nhi_runtime_resume(dev)
> > [  578.773503] thunderbolt 0000:e5:00.5: nhi_runtime_resume() - enter, pdev->current_state = 3
> > 
> > // .. but it is not. It seems to be powered off, D3cold.
> > 
> > [  578.773516] thunderbolt 0000:e5:00.5: control channel starting...
> > [  578.773518] thunderbolt 0000:e5:00.5: starting TX ring 0
> > [  578.773524] thunderbolt 0000:e5:00.5: enabling interrupt at register 0x38200 bit 0 (0xffffffff -> 0xffffffff)
> > 
> > // As here too.
> > 
> > It would be interesting to see what ACPI does here, I mean enabling
> > dynamic debugging of acpi_device_set_power() and friends.
> 
> Here is a log with dyndbg turned up.
> https://gist.github.com/superm1/2f7ca6123431365d11e9bc3cc9329c14
> 
> You can see firmware node \_SB_.PCI0.BUSC.NHI0 is physical device
> 0000:e5:00.5.

Thanks!

> > One suspect is
> > that when the runtime suspends happens, the power resource or _PS3()
> > gets called to put the device into D3cold and it has some sort of
> > Sleep() inside. This allows the OS to relese the CPU to do other things
> > too according to ACPI spec (and Linux does this) so now when you plug in
> > the device the GPE triggers wake and Linux starts processing that one.
> > We see the above runtime resume but then the Sleep() in the _PS3()
> > completes and cuts the power from the NHI while we are in the middle of
> > resuming it already (or something along those lines).
> > 
> > We saw similar "context switch" happening when PCIe L2 failed, BIOS
> > implemented the timeout using Sleep() that allowed Linux to process
> > other things which resulted unexpected wake (not the same as here
> > though).
> > 
> > So one thing to check too is how the ACPI methods get called and
> > especially if they somehow end up messing with each other.
> 
> At least to me it looks pretty straightforward that each D0 sequence is
> going through and nothing looked out of order:
> 
> acpi_pci_set_power_state()
> ->acpi_device_set_power()
> ->->acpi_dev_pm_explicit_set()
> 
> 
> There is a Sleep() call in M232() which is a SMI used in both _PS0 and _PS3,
> but as far as I can tell it's not actually called as the case always has
> zero as an argument.
> 
> Here's some snippets of ASL from the NHI0 device to see.
> https://gist.github.com/superm1/32ef1f822a6220a41b19574024717f79
> 
> I feel your theory is right about it being an aborted D3cold request, but I
> don't believe it's from concurrency issue of _PS3.  I feel there isn't any
> guard rails either in Linux or the AML for ensuring that
> the transition actually finished.

Yeah, I agree now. It does not look like the methods are messing each
other here. We don't see the GPE handler being run but I don't think it
matters here. For some reason the device just is not yet ready when it
is supposed to be in D0.

Sorry for wasting your time with these suspects.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-13  4:58                           ` Mika Westerberg
@ 2024-09-13  7:23                             ` Mika Westerberg
  2024-09-13 20:56                               ` Mario Limonciello
  0 siblings, 1 reply; 31+ messages in thread
From: Mika Westerberg @ 2024-09-13  7:23 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Bjorn Helgaas, Gary Li, Mario Limonciello, Bjorn Helgaas,
	Mathias Nyman, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Greg Kroah-Hartman,
	Ilpo Järvinen

Hi again,

On Fri, Sep 13, 2024 at 07:58:07AM +0300, Mika Westerberg wrote:
> Yeah, I agree now. It does not look like the methods are messing each
> other here. We don't see the GPE handler being run but I don't think it
> matters here. For some reason the device just is not yet ready when it
> is supposed to be in D0.
> 
> Sorry for wasting your time with these suspects.

One more suggestion though ;-) I realized that my hack patch to disable
I/O and MMIO did not actually do that because it looks like we don't
clear those bits ever. I wonder if you could still check if the below
changes anything? At least it should now "disable" the device to follow
the spec.

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index f412ef73a6e4..79a566376301 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1332,6 +1332,7 @@ static int pci_pm_runtime_suspend(struct device *dev)
 
 	if (!pci_dev->state_saved) {
 		pci_save_state(pci_dev);
+		pci_pm_default_suspend(pci_dev);
 		pci_finish_runtime_suspend(pci_dev);
 	}
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ffaaca0978cb..91f4e7a03c94 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2218,6 +2218,13 @@ static void do_pci_disable_device(struct pci_dev *dev)
 		pci_command &= ~PCI_COMMAND_MASTER;
 		pci_write_config_word(dev, PCI_COMMAND, pci_command);
 	}
+	/*
+	 * PCI PM 1.2 sec 8.2.2 says that when a function is put into D3
+	 * the OS needs to disable I/O and MMIO space in addition to bus
+	 * mastering so do that here.
+	 */
+	pci_command &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
+	pci_write_config_word(dev, PCI_COMMAND, pci_command);
 
 	pcibios_disable_device(dev);
 }

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-13  7:23                             ` Mika Westerberg
@ 2024-09-13 20:56                               ` Mario Limonciello
  2024-09-15  7:07                                 ` Mika Westerberg
  0 siblings, 1 reply; 31+ messages in thread
From: Mario Limonciello @ 2024-09-13 20:56 UTC (permalink / raw)
  To: Mika Westerberg, Bjorn Helgaas
  Cc: Gary Li, Mario Limonciello, Bjorn Helgaas, Mathias Nyman,
	open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Greg Kroah-Hartman, Ilpo Järvinen

On 9/13/2024 02:23, Mika Westerberg wrote:
> Hi again,
> 
> On Fri, Sep 13, 2024 at 07:58:07AM +0300, Mika Westerberg wrote:
>> Yeah, I agree now. It does not look like the methods are messing each
>> other here. We don't see the GPE handler being run but I don't think it
>> matters here. For some reason the device just is not yet ready when it
>> is supposed to be in D0.
>>
>> Sorry for wasting your time with these suspects.

Don't worry about it.  It validates that we're coming to the right 
conclusions about where the problem is.

> 
> One more suggestion though ;-) I realized that my hack patch to disable
> I/O and MMIO did not actually do that because it looks like we don't
> clear those bits ever. I wonder if you could still check if the below
> changes anything? At least it should now "disable" the device to follow
> the spec.

This actually causes the system to fail to boot.  I guess some deadlock 
from other callers to pci_disable_device().

We also double checked putting the PCI_COMMAND writes just into the 
runtime suspend call backs instead (to narrow down if that is part of 
the issue here).  Putting it there fixed the boot hang, but no change to 
the actual issue behavior.

Bjorn,

Considering this finding, are you more amenable to the polling approach?

> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index f412ef73a6e4..79a566376301 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1332,6 +1332,7 @@ static int pci_pm_runtime_suspend(struct device *dev)
>   
>   	if (!pci_dev->state_saved) {
>   		pci_save_state(pci_dev);
> +		pci_pm_default_suspend(pci_dev);
>   		pci_finish_runtime_suspend(pci_dev);
>   	}
>   
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ffaaca0978cb..91f4e7a03c94 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2218,6 +2218,13 @@ static void do_pci_disable_device(struct pci_dev *dev)
>   		pci_command &= ~PCI_COMMAND_MASTER;
>   		pci_write_config_word(dev, PCI_COMMAND, pci_command);
>   	}
> +	/*
> +	 * PCI PM 1.2 sec 8.2.2 says that when a function is put into D3
> +	 * the OS needs to disable I/O and MMIO space in addition to bus
> +	 * mastering so do that here.
> +	 */
> +	pci_command &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
> +	pci_write_config_word(dev, PCI_COMMAND, pci_command);
>   
>   	pcibios_disable_device(dev);
>   }


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
  2024-09-13 20:56                               ` Mario Limonciello
@ 2024-09-15  7:07                                 ` Mika Westerberg
  0 siblings, 0 replies; 31+ messages in thread
From: Mika Westerberg @ 2024-09-15  7:07 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Bjorn Helgaas, Gary Li, Mario Limonciello, Bjorn Helgaas,
	Mathias Nyman, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Greg Kroah-Hartman,
	Ilpo Järvinen

Hi,

On Fri, Sep 13, 2024 at 03:56:46PM -0500, Mario Limonciello wrote:
> > One more suggestion though ;-) I realized that my hack patch to disable
> > I/O and MMIO did not actually do that because it looks like we don't
> > clear those bits ever. I wonder if you could still check if the below
> > changes anything? At least it should now "disable" the device to follow
> > the spec.
> 
> This actually causes the system to fail to boot.  I guess some deadlock from
> other callers to pci_disable_device().
> 
> We also double checked putting the PCI_COMMAND writes just into the runtime
> suspend call backs instead (to narrow down if that is part of the issue
> here).  Putting it there fixed the boot hang, but no change to the actual
> issue behavior.

Okay, thanks again for checking!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 0/5] Verify devices transition from D3cold to D0
  2024-08-23 15:40 [PATCH v5 0/5] Verify devices transition from D3cold to D0 Mario Limonciello
                   ` (4 preceding siblings ...)
  2024-08-23 15:40 ` [PATCH v5 5/5] PCI: Drop Radeon quirk for Macbook Pro 8.2 Mario Limonciello
@ 2024-12-04 17:30 ` Mario Limonciello
  2024-12-04 23:45   ` Bjorn Helgaas
  5 siblings, 1 reply; 31+ messages in thread
From: Mario Limonciello @ 2024-12-04 17:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: open list : PCI SUBSYSTEM, open list, open list : USB XHCI DRIVER,
	Daniel Drake, Gary Li, Greg Kroah-Hartman, Ilpo Järvinen,
	Mathias Nyman, Mika Westerberg

On 8/23/2024 10:40, Mario Limonciello wrote:
> From: Mario Limonciello <mario.limonciello@amd.com>
> 
> Gary has reported that when a dock is plugged into a system at the same
> time the autosuspend delay has tripped that the USB4 stack malfunctions.
> 
> Messages show up like this:
> 
> ```
> thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled
> ```
> 
> Furthermore the USB4 router is non-functional at this point.
> 
> Those messages happen because the device is still in D3cold at the time
> that the PCI core handed control back to the USB4 connection manager
> (thunderbolt).
> 
> The issue is that it takes time for a device to enter D3cold and do a
> conventional reset, and then more time for it to exit D3cold.
> 
> This appears not to be a new problem; previously there were very similar
> reports from Ryzen XHCI controllers.  Quirks were added for those.
> Furthermore; adding extra logging it's apparent that other PCI devices
> in the system can take more than 10ms to recover from D3cold as well.
> 
> This series add a wait into pci_power_up() specifically for D3cold exit and
> then drops the quirks that were previously used for the Ryzen XHCI controllers.
> 
> Mario Limonciello (5):
>    PCI: Use an enum for reset type in pci_dev_wait()
>    PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
>    PCI: Verify functions currently in D3cold have entered D0
>    PCI: Allow Ryzen XHCI controllers into D3cold and drop delays
>    PCI: Drop Radeon quirk for Macbook Pro 8.2
> 
>   drivers/pci/pci-driver.c    |  2 +-
>   drivers/pci/pci.c           | 70 +++++++++++++++++++++++++++----------
>   drivers/pci/pci.h           | 13 ++++++-
>   drivers/pci/pcie/dpc.c      |  2 +-
>   drivers/pci/quirks.c        | 25 -------------
>   drivers/usb/host/xhci-pci.c | 11 ------
>   6 files changed, 66 insertions(+), 57 deletions(-)
> 

Bjorn,

This series has stalled a while.

Mika and I went back and forth and I think are generally in agreement so 
I think it's waiting on your feedback.

Can you take another look?

The alternative is to add some more piles of quirks, but I'm hoping that 
we can go this direction and drop a bunch of the old ones instead.

LMK if you want me to rebase it on 6.13-rc1 and resend a v6.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 0/5] Verify devices transition from D3cold to D0
  2024-12-04 17:30 ` [PATCH v5 0/5] Verify devices transition from D3cold to D0 Mario Limonciello
@ 2024-12-04 23:45   ` Bjorn Helgaas
  2024-12-05  3:44     ` Mario Limonciello
  0 siblings, 1 reply; 31+ messages in thread
From: Bjorn Helgaas @ 2024-12-04 23:45 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Bjorn Helgaas, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Gary Li,
	Greg Kroah-Hartman, Ilpo Järvinen, Mathias Nyman,
	Mika Westerberg

On Wed, Dec 04, 2024 at 11:30:51AM -0600, Mario Limonciello wrote:
> On 8/23/2024 10:40, Mario Limonciello wrote:
> > From: Mario Limonciello <mario.limonciello@amd.com>
> > 
> > Gary has reported that when a dock is plugged into a system at the same
> > time the autosuspend delay has tripped that the USB4 stack malfunctions.
> > 
> > Messages show up like this:
> > 
> > ```
> > thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled
> > ```
> > 
> > Furthermore the USB4 router is non-functional at this point.
> > 
> > Those messages happen because the device is still in D3cold at the time
> > that the PCI core handed control back to the USB4 connection manager
> > (thunderbolt).
> > 
> > The issue is that it takes time for a device to enter D3cold and do a
> > conventional reset, and then more time for it to exit D3cold.
> > 
> > This appears not to be a new problem; previously there were very similar
> > reports from Ryzen XHCI controllers.  Quirks were added for those.
> > Furthermore; adding extra logging it's apparent that other PCI devices
> > in the system can take more than 10ms to recover from D3cold as well.
> > 
> > This series add a wait into pci_power_up() specifically for D3cold exit and
> > then drops the quirks that were previously used for the Ryzen XHCI controllers.
> > 
> > Mario Limonciello (5):
> >    PCI: Use an enum for reset type in pci_dev_wait()
> >    PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
> >    PCI: Verify functions currently in D3cold have entered D0
> >    PCI: Allow Ryzen XHCI controllers into D3cold and drop delays
> >    PCI: Drop Radeon quirk for Macbook Pro 8.2
> > 
> >   drivers/pci/pci-driver.c    |  2 +-
> >   drivers/pci/pci.c           | 70 +++++++++++++++++++++++++++----------
> >   drivers/pci/pci.h           | 13 ++++++-
> >   drivers/pci/pcie/dpc.c      |  2 +-
> >   drivers/pci/quirks.c        | 25 -------------
> >   drivers/usb/host/xhci-pci.c | 11 ------
> >   6 files changed, 66 insertions(+), 57 deletions(-)
> 
> Bjorn,
> 
> This series has stalled a while.
> 
> Mika and I went back and forth and I think are generally in agreement so I
> think it's waiting on your feedback.
> 
> Can you take another look?
> 
> The alternative is to add some more piles of quirks, but I'm hoping that we
> can go this direction and drop a bunch of the old ones instead.
> 
> LMK if you want me to rebase it on 6.13-rc1 and resend a v6.

I'm still stuck on patch 2/5 because I'm not aware of any spec
language about polling PCI_PM_CTRL to wait for a power state
transition, so it seems really ad hoc.

If you do rebase to v6.13-rc1, in the 2/5 commit log,
s/evices/devices/.

I guess that whole patch and commit log needs updating since the RRS
code was added to pci_dev_wait() in the interim, so the "device that
has gone through a reset may return a value in PCI_COMMAND but that
doesn't mean it's finished transitioning to D0" doesn't directly apply
anymore.

Bjorn

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 0/5] Verify devices transition from D3cold to D0
  2024-12-04 23:45   ` Bjorn Helgaas
@ 2024-12-05  3:44     ` Mario Limonciello
  2024-12-05 18:12       ` Bjorn Helgaas
  0 siblings, 1 reply; 31+ messages in thread
From: Mario Limonciello @ 2024-12-05  3:44 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Gary Li,
	Greg Kroah-Hartman, Ilpo Järvinen, Mathias Nyman,
	Mika Westerberg

On 12/4/2024 17:45, Bjorn Helgaas wrote:
> On Wed, Dec 04, 2024 at 11:30:51AM -0600, Mario Limonciello wrote:
>> On 8/23/2024 10:40, Mario Limonciello wrote:
>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>
>>> Gary has reported that when a dock is plugged into a system at the same
>>> time the autosuspend delay has tripped that the USB4 stack malfunctions.
>>>
>>> Messages show up like this:
>>>
>>> ```
>>> thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled
>>> ```
>>>
>>> Furthermore the USB4 router is non-functional at this point.
>>>
>>> Those messages happen because the device is still in D3cold at the time
>>> that the PCI core handed control back to the USB4 connection manager
>>> (thunderbolt).
>>>
>>> The issue is that it takes time for a device to enter D3cold and do a
>>> conventional reset, and then more time for it to exit D3cold.
>>>
>>> This appears not to be a new problem; previously there were very similar
>>> reports from Ryzen XHCI controllers.  Quirks were added for those.
>>> Furthermore; adding extra logging it's apparent that other PCI devices
>>> in the system can take more than 10ms to recover from D3cold as well.
>>>
>>> This series add a wait into pci_power_up() specifically for D3cold exit and
>>> then drops the quirks that were previously used for the Ryzen XHCI controllers.
>>>
>>> Mario Limonciello (5):
>>>     PCI: Use an enum for reset type in pci_dev_wait()
>>>     PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
>>>     PCI: Verify functions currently in D3cold have entered D0
>>>     PCI: Allow Ryzen XHCI controllers into D3cold and drop delays
>>>     PCI: Drop Radeon quirk for Macbook Pro 8.2
>>>
>>>    drivers/pci/pci-driver.c    |  2 +-
>>>    drivers/pci/pci.c           | 70 +++++++++++++++++++++++++++----------
>>>    drivers/pci/pci.h           | 13 ++++++-
>>>    drivers/pci/pcie/dpc.c      |  2 +-
>>>    drivers/pci/quirks.c        | 25 -------------
>>>    drivers/usb/host/xhci-pci.c | 11 ------
>>>    6 files changed, 66 insertions(+), 57 deletions(-)
>>
>> Bjorn,
>>
>> This series has stalled a while.
>>
>> Mika and I went back and forth and I think are generally in agreement so I
>> think it's waiting on your feedback.
>>
>> Can you take another look?
>>
>> The alternative is to add some more piles of quirks, but I'm hoping that we
>> can go this direction and drop a bunch of the old ones instead.
>>
>> LMK if you want me to rebase it on 6.13-rc1 and resend a v6.
> 
> I'm still stuck on patch 2/5 because I'm not aware of any spec
> language about polling PCI_PM_CTRL to wait for a power state
> transition, so it seems really ad hoc.

I'm not really sure how to overcome this.  If I rebase everything I'll 
give specs another read through in case I missed anything, but I suspect 
you know these specs better than anyoe on this list.

Is it worth raising this to PCI-SIG to discuss?
Did you perhaps already do that?

> 
> If you do rebase to v6.13-rc1, in the 2/5 commit log,
> s/evices/devices/.
> 
> I guess that whole patch and commit log needs updating since the RRS
> code was added to pci_dev_wait() in the interim, so the "device that
> has gone through a reset may return a value in PCI_COMMAND but that
> doesn't mean it's finished transitioning to D0" doesn't directly apply
> anymore.
> 
> Bjorn


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 0/5] Verify devices transition from D3cold to D0
  2024-12-05  3:44     ` Mario Limonciello
@ 2024-12-05 18:12       ` Bjorn Helgaas
  0 siblings, 0 replies; 31+ messages in thread
From: Bjorn Helgaas @ 2024-12-05 18:12 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Bjorn Helgaas, open list : PCI SUBSYSTEM, open list,
	open list : USB XHCI DRIVER, Daniel Drake, Gary Li,
	Greg Kroah-Hartman, Ilpo Järvinen, Mathias Nyman,
	Mika Westerberg

On Wed, Dec 04, 2024 at 09:44:05PM -0600, Mario Limonciello wrote:
> On 12/4/2024 17:45, Bjorn Helgaas wrote:
> > On Wed, Dec 04, 2024 at 11:30:51AM -0600, Mario Limonciello wrote:
> > > On 8/23/2024 10:40, Mario Limonciello wrote:
> > > > From: Mario Limonciello <mario.limonciello@amd.com>
> > > > 
> > > > Gary has reported that when a dock is plugged into a system at the same
> > > > time the autosuspend delay has tripped that the USB4 stack malfunctions.
> > > > 
> > > > Messages show up like this:
> > > > 
> > > > ```
> > > > thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled
> > > > ```
> > > > 
> > > > Furthermore the USB4 router is non-functional at this point.
> > > > 
> > > > Those messages happen because the device is still in D3cold at the time
> > > > that the PCI core handed control back to the USB4 connection manager
> > > > (thunderbolt).
> > > > 
> > > > The issue is that it takes time for a device to enter D3cold and do a
> > > > conventional reset, and then more time for it to exit D3cold.
> > > > 
> > > > This appears not to be a new problem; previously there were very similar
> > > > reports from Ryzen XHCI controllers.  Quirks were added for those.
> > > > Furthermore; adding extra logging it's apparent that other PCI devices
> > > > in the system can take more than 10ms to recover from D3cold as well.
> > > > 
> > > > This series add a wait into pci_power_up() specifically for D3cold exit and
> > > > then drops the quirks that were previously used for the Ryzen XHCI controllers.
> > > > 
> > > > Mario Limonciello (5):
> > > >     PCI: Use an enum for reset type in pci_dev_wait()
> > > >     PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
> > > >     PCI: Verify functions currently in D3cold have entered D0
> > > >     PCI: Allow Ryzen XHCI controllers into D3cold and drop delays
> > > >     PCI: Drop Radeon quirk for Macbook Pro 8.2
> > > > 
> > > >    drivers/pci/pci-driver.c    |  2 +-
> > > >    drivers/pci/pci.c           | 70 +++++++++++++++++++++++++++----------
> > > >    drivers/pci/pci.h           | 13 ++++++-
> > > >    drivers/pci/pcie/dpc.c      |  2 +-
> > > >    drivers/pci/quirks.c        | 25 -------------
> > > >    drivers/usb/host/xhci-pci.c | 11 ------
> > > >    6 files changed, 66 insertions(+), 57 deletions(-)
> > > 
> > > Bjorn,
> > > 
> > > This series has stalled a while.
> > > 
> > > Mika and I went back and forth and I think are generally in agreement so I
> > > think it's waiting on your feedback.
> > > 
> > > Can you take another look?
> > > 
> > > The alternative is to add some more piles of quirks, but I'm hoping that we
> > > can go this direction and drop a bunch of the old ones instead.
> > > 
> > > LMK if you want me to rebase it on 6.13-rc1 and resend a v6.
> > 
> > I'm still stuck on patch 2/5 because I'm not aware of any spec
> > language about polling PCI_PM_CTRL to wait for a power state
> > transition, so it seems really ad hoc.
> 
> I'm not really sure how to overcome this.  If I rebase everything I'll give
> specs another read through in case I missed anything, but I suspect you know
> these specs better than anyoe on this list.
> 
> Is it worth raising this to PCI-SIG to discuss?
> Did you perhaps already do that?

I haven't, but it seems like that might be appropriate.  Can we
formulate a question in terms of PCIe concepts?

I guess some of the pieces are:

  - ACPI puts device in D3cold
  - ACPI restores device to D0
  - Device is not yet in D0 when ACPI method has completed
  - OS read from device causes UR or RRS, so OS reads ~0

It's much more complicated if ACPI is involved because then we have
question of whether the ACPI method should have waited.

Patch 3/5 mentions "Full context restore or boot latency" language in
PCI PM r1.2, but I couldn't find section.  I don't know how a generic
spec could constrain D0->D3cold or D3cold->D0 transition time because
that depends on platform-specific power management hardware.

Bjorn

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2024-12-05 18:12 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-23 15:40 [PATCH v5 0/5] Verify devices transition from D3cold to D0 Mario Limonciello
2024-08-23 15:40 ` [PATCH v5 1/5] PCI: Use an enum for reset type in pci_dev_wait() Mario Limonciello
2024-08-23 15:40 ` [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND " Mario Limonciello
2024-08-23 19:54   ` Bjorn Helgaas
2024-08-26 19:16     ` Mario Limonciello
2024-08-27 17:43       ` Mario Limonciello
2024-08-27 19:44       ` Bjorn Helgaas
2024-08-30  0:01   ` Bjorn Helgaas
2024-09-03 16:29     ` Mario Limonciello
2024-09-03 17:11       ` Bjorn Helgaas
2024-09-03 17:31         ` Mario Limonciello
2024-09-03 18:25           ` Bjorn Helgaas
2024-09-03 18:32             ` Mario Limonciello
2024-09-03 21:32               ` Bjorn Helgaas
2024-09-04 12:05               ` Mika Westerberg
2024-09-04 15:24                 ` Mario Limonciello
2024-09-05  9:33                   ` Mika Westerberg
2024-09-09 20:40                     ` Mario Limonciello
2024-09-10  9:13                       ` Mika Westerberg
2024-09-13  4:12                         ` Mario Limonciello
2024-09-13  4:58                           ` Mika Westerberg
2024-09-13  7:23                             ` Mika Westerberg
2024-09-13 20:56                               ` Mario Limonciello
2024-09-15  7:07                                 ` Mika Westerberg
2024-08-23 15:40 ` [PATCH v5 3/5] PCI: Verify functions currently in D3cold have entered D0 Mario Limonciello
2024-08-23 15:40 ` [PATCH v5 4/5] PCI: Allow Ryzen XHCI controllers into D3cold and drop delays Mario Limonciello
2024-08-23 15:40 ` [PATCH v5 5/5] PCI: Drop Radeon quirk for Macbook Pro 8.2 Mario Limonciello
2024-12-04 17:30 ` [PATCH v5 0/5] Verify devices transition from D3cold to D0 Mario Limonciello
2024-12-04 23:45   ` Bjorn Helgaas
2024-12-05  3:44     ` Mario Limonciello
2024-12-05 18:12       ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).