* [PATCH 0/4] Verify devices transition from D3cold to D0
@ 2024-06-13 5:42 Mario Limonciello
2024-06-13 5:42 ` [PATCH 1/4] PCI: Check PCI_PM_CTRL in pci_dev_wait() Mario Limonciello
` (5 more replies)
0 siblings, 6 replies; 16+ messages in thread
From: Mario Limonciello @ 2024-06-13 5:42 UTC (permalink / raw)
To: Bjorn Helgaas, Mathias Nyman, Greg Kroah-Hartman
Cc: open list:PCI SUBSYSTEM, open list, open list:USB XHCI DRIVER,
Daniel Drake, Gary Li, Mika Westerberg, Mario Limonciello
Gary has reported that when a dock is plugged into a system at the same
time the autosuspend delay has tripped that the USB4 stack malfunctions.
Messages show up like this:
```
thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled
```
Furthermore the USB4 router is non-functional at this point.
Those messages happen because the device is still in D3cold at the time
that the PCI core handed control back to the USB4 connection manager
(thunderbolt).
The issue is that it takes time for a device to enter D3cold and do a
conventional reset, and then more time for it to exit D3cold.
This appears not to be a new problem; previously there were very similar
reports from Ryzen XHCI controllers. Quirks were added for those.
Furthermore; adding extra logging it's apparent that other PCI devices
in the system can take more than 10ms to recover from D3cold as well.
This series add a wait into pci_power_up() specifically for D3cold exit and
then drops the quirks that were previously used for the Ryzen XHCI controllers.
Mario Limonciello (4):
PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
PCI: Verify functions currently in D3cold have entered D0
PCI: Allow Ryzen XHCI controllers into D3cold and drop delays
PCI: Drop Radeon quirk for Macbook Pro 8.2
drivers/pci/pci.c | 21 ++++++++++++++++-----
drivers/pci/quirks.c | 25 -------------------------
drivers/usb/host/xhci-pci.c | 11 -----------
3 files changed, 16 insertions(+), 41 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/4] PCI: Check PCI_PM_CTRL in pci_dev_wait()
2024-06-13 5:42 [PATCH 0/4] Verify devices transition from D3cold to D0 Mario Limonciello
@ 2024-06-13 5:42 ` Mario Limonciello
2024-06-19 18:33 ` Markus Elfring
2024-06-13 5:42 ` [PATCH 2/4] PCI: Verify functions currently in D3cold have entered D0 Mario Limonciello
` (4 subsequent siblings)
5 siblings, 1 reply; 16+ messages in thread
From: Mario Limonciello @ 2024-06-13 5:42 UTC (permalink / raw)
To: Bjorn Helgaas, Mathias Nyman, Greg Kroah-Hartman
Cc: open list:PCI SUBSYSTEM, open list, open list:USB XHCI DRIVER,
Daniel Drake, Gary Li, Mika Westerberg, Mario Limonciello
A device that has gone through a reset may return a value in PCI_COMMAND
but that doesn't mean it's finished transitioning to D0. On devices that
support power management explicitly check PCI_PM_CTRL to ensure the
transition happened. Devicees that don't support power management will
continue to use PCI_COMMAND.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
drivers/pci/pci.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 59e0949fb079..41961e28a86c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1270,21 +1270,33 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
* the read (except when CRS SV is enabled and the read was for the
* Vendor ID; in that case it synthesizes 0x0001 data).
*
- * Wait for the device to return a non-CRS completion. Read the
- * Command register instead of Vendor ID so we don't have to
- * contend with the CRS SV value.
+ * Wait for the device to return a non-CRS completion. On devices
+ * that support PM control read the PM control register to ensure
+ * the device has transitioned to D0. On devices that don't support
+ * PM control, read the command register to instead of Vendor ID so
+ * we don't have to contend with the CRS SV value.
*/
for (;;) {
- u32 id;
if (pci_dev_is_disconnected(dev)) {
pci_dbg(dev, "disconnected; not waiting\n");
return -ENOTTY;
}
- pci_read_config_dword(dev, PCI_COMMAND, &id);
- if (!PCI_POSSIBLE_ERROR(id))
- break;
+ if (dev->pm_cap) {
+ u16 pmcsr;
+
+ pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+ if (!PCI_POSSIBLE_ERROR(pmcsr) &&
+ (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
+ break;
+ } else {
+ u32 id;
+
+ pci_read_config_dword(dev, PCI_COMMAND, &id);
+ if (!PCI_POSSIBLE_ERROR(id))
+ break;
+ }
if (delay > timeout) {
pci_warn(dev, "not ready %dms after %s; giving up\n",
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/4] PCI: Verify functions currently in D3cold have entered D0
2024-06-13 5:42 [PATCH 0/4] Verify devices transition from D3cold to D0 Mario Limonciello
2024-06-13 5:42 ` [PATCH 1/4] PCI: Check PCI_PM_CTRL in pci_dev_wait() Mario Limonciello
@ 2024-06-13 5:42 ` Mario Limonciello
2024-06-19 18:45 ` Markus Elfring
2024-06-13 5:42 ` [PATCH 3/4] PCI: Allow Ryzen XHCI controllers into D3cold and drop delays Mario Limonciello
` (3 subsequent siblings)
5 siblings, 1 reply; 16+ messages in thread
From: Mario Limonciello @ 2024-06-13 5:42 UTC (permalink / raw)
To: Bjorn Helgaas, Mathias Nyman, Greg Kroah-Hartman
Cc: open list:PCI SUBSYSTEM, open list, open list:USB XHCI DRIVER,
Daniel Drake, Gary Li, Mika Westerberg, Mario Limonciello
It is reported that USB4 routers and downstream devices may behave
incorrectly if a dock cable is plugged in at approximately the time that
the autosuspend_delay is configured. In this situation the device has
attempted to enter D3cold, but didn't finish D3cold entry when the PCI
core tried to transition it back to D0.
Empirically measuring this situation an "aborted" D3cold exit takes
~60ms and a "normal" D3cold exit takes ~10ms.
The PCI-PM 1.2 spec specifies that the restore time for functions
in D3cold is either 'Full context restore or boot latency'.
As PCIe r6.0 sec 5.8 specifies that the device will have gone
through a conventional reset it may take some time for the
device to be ready.
Wait up to 1 sec as specified in PCIe r6.0 sec 6.6.1 for a device
in D3cold to return to D0.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
drivers/pci/pci.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 41961e28a86c..a6ed85dd8d92 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1387,6 +1387,17 @@ int pci_power_up(struct pci_dev *dev)
else if (state == PCI_D2)
udelay(PCI_PM_D2_DELAY);
+ /*
+ * D3cold -> D0 will have gone through a conventional reset and may need
+ * time to be ready.
+ */
+ if (dev->current_state == PCI_D3cold) {
+ int ret;
+
+ ret = pci_dev_wait(dev, "D3cold->D0", PCI_RESET_WAIT);
+ if (ret)
+ return ret;
+ }
end:
dev->current_state = PCI_D0;
if (need_restore)
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/4] PCI: Allow Ryzen XHCI controllers into D3cold and drop delays
2024-06-13 5:42 [PATCH 0/4] Verify devices transition from D3cold to D0 Mario Limonciello
2024-06-13 5:42 ` [PATCH 1/4] PCI: Check PCI_PM_CTRL in pci_dev_wait() Mario Limonciello
2024-06-13 5:42 ` [PATCH 2/4] PCI: Verify functions currently in D3cold have entered D0 Mario Limonciello
@ 2024-06-13 5:42 ` Mario Limonciello
2024-06-13 5:42 ` [PATCH 4/4] PCI: Drop Radeon quirk for Macbook Pro 8.2 Mario Limonciello
` (2 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: Mario Limonciello @ 2024-06-13 5:42 UTC (permalink / raw)
To: Bjorn Helgaas, Mathias Nyman, Greg Kroah-Hartman
Cc: open list:PCI SUBSYSTEM, open list, open list:USB XHCI DRIVER,
Daniel Drake, Gary Li, Mika Westerberg, Mario Limonciello
As the PCI core now waits for D0 after D3cold exit, the Ryzen XHCI
controllers that were quirked to not use D3cold and to add a delay
on D3hot no longer need these quirks.
Drop both the PCI and XHCI sets of quirks.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
drivers/pci/quirks.c | 17 -----------------
drivers/usb/host/xhci-pci.c | 11 -----------
drivers/usb/host/xhci.h | 11 +++++------
3 files changed, 5 insertions(+), 34 deletions(-)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 568410e64ce6..942d0fe12cb1 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2059,23 +2059,6 @@ DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8,
quirk_nvidia_hda_pm);
-/*
- * Ryzen5/7 XHCI controllers fail upon resume from runtime suspend or s2idle.
- * https://bugzilla.kernel.org/show_bug.cgi?id=205587
- *
- * The kernel attempts to transition these devices to D3cold, but that seems
- * to be ineffective on the platforms in question; the PCI device appears to
- * remain on in D3hot state. The D3hot-to-D0 transition then requires an
- * extended delay in order to succeed.
- */
-static void quirk_ryzen_xhci_d3hot(struct pci_dev *dev)
-{
- quirk_d3hot_delay(dev, 20);
-}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e0, quirk_ryzen_xhci_d3hot);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e1, quirk_ryzen_xhci_d3hot);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x1639, quirk_ryzen_xhci_d3hot);
-
#ifdef CONFIG_X86_IO_APIC
static int dmi_disable_ioapicreroute(const struct dmi_system_id *d)
{
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 0614d002aba1..ae330d3df6de 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -314,10 +314,6 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
(pdev->device == PCI_DEVICE_ID_AMD_PROMONTORYA_1)))
xhci->quirks |= XHCI_U2_DISABLE_WAKE;
- if (pdev->vendor == PCI_VENDOR_ID_AMD &&
- pdev->device == PCI_DEVICE_ID_AMD_RENOIR_XHCI)
- xhci->quirks |= XHCI_BROKEN_D3COLD_S2I;
-
if (pdev->vendor == PCI_VENDOR_ID_INTEL) {
xhci->quirks |= XHCI_LPM_SUPPORT;
xhci->quirks |= XHCI_INTEL_HOST;
@@ -743,13 +739,6 @@ static int xhci_pci_suspend(struct usb_hcd *hcd, bool do_wakeup)
if (xhci->quirks & XHCI_COMP_MODE_QUIRK)
pci_d3cold_disable(pdev);
-#ifdef CONFIG_SUSPEND
- /* d3cold is broken, but only when s2idle is used */
- if (pm_suspend_target_state == PM_SUSPEND_TO_IDLE &&
- xhci->quirks & (XHCI_BROKEN_D3COLD_S2I))
- pci_d3cold_disable(pdev);
-#endif
-
if (xhci->quirks & XHCI_PME_STUCK_QUIRK)
xhci_pme_quirk(hcd);
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 30415158ed3c..b84b1c48c517 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1621,12 +1621,11 @@ struct xhci_hcd {
#define XHCI_DISABLE_SPARSE BIT_ULL(38)
#define XHCI_SG_TRB_CACHE_SIZE_QUIRK BIT_ULL(39)
#define XHCI_NO_SOFT_RETRY BIT_ULL(40)
-#define XHCI_BROKEN_D3COLD_S2I BIT_ULL(41)
-#define XHCI_EP_CTX_BROKEN_DCS BIT_ULL(42)
-#define XHCI_SUSPEND_RESUME_CLKS BIT_ULL(43)
-#define XHCI_RESET_TO_DEFAULT BIT_ULL(44)
-#define XHCI_ZHAOXIN_TRB_FETCH BIT_ULL(45)
-#define XHCI_ZHAOXIN_HOST BIT_ULL(46)
+#define XHCI_EP_CTX_BROKEN_DCS BIT_ULL(41)
+#define XHCI_SUSPEND_RESUME_CLKS BIT_ULL(42)
+#define XHCI_RESET_TO_DEFAULT BIT_ULL(43)
+#define XHCI_ZHAOXIN_TRB_FETCH BIT_ULL(44)
+#define XHCI_ZHAOXIN_HOST BIT_ULL(45)
unsigned int num_active_eps;
unsigned int limit_active_eps;
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 4/4] PCI: Drop Radeon quirk for Macbook Pro 8.2
2024-06-13 5:42 [PATCH 0/4] Verify devices transition from D3cold to D0 Mario Limonciello
` (2 preceding siblings ...)
2024-06-13 5:42 ` [PATCH 3/4] PCI: Allow Ryzen XHCI controllers into D3cold and drop delays Mario Limonciello
@ 2024-06-13 5:42 ` Mario Limonciello
2024-06-18 13:14 ` [PATCH 0/4] Verify devices transition from D3cold to D0 Mika Westerberg
2024-07-09 3:07 ` Mario Limonciello
5 siblings, 0 replies; 16+ messages in thread
From: Mario Limonciello @ 2024-06-13 5:42 UTC (permalink / raw)
To: Bjorn Helgaas, Mathias Nyman, Greg Kroah-Hartman
Cc: open list:PCI SUBSYSTEM, open list, open list:USB XHCI DRIVER,
Daniel Drake, Gary Li, Mika Westerberg, Mario Limonciello
commit 5938628c51a7 ("drm/radeon: make MacBook Pro d3_delay quirk more
generic") introduced a generic quirk for Macbook Pro 8.2s that contain
Radeon graphics to ensure that enough time had past when the device
was powered on.
As the PCI core now verifies the device is in D0 during power on this
extra artificial delay is no longer necessary.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
drivers/pci/quirks.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 942d0fe12cb1..19be953c9f37 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2038,14 +2038,6 @@ static void quirk_d3hot_delay(struct pci_dev *dev, unsigned int delay)
dev->d3hot_delay);
}
-static void quirk_radeon_pm(struct pci_dev *dev)
-{
- if (dev->subsystem_vendor == PCI_VENDOR_ID_APPLE &&
- dev->subsystem_device == 0x00e2)
- quirk_d3hot_delay(dev, 20);
-}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6741, quirk_radeon_pm);
-
/*
* NVIDIA Ampere-based HDA controllers can wedge the whole device if a bus
* reset is performed too soon after transition to D0, extend d3hot_delay
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] Verify devices transition from D3cold to D0
2024-06-13 5:42 [PATCH 0/4] Verify devices transition from D3cold to D0 Mario Limonciello
` (3 preceding siblings ...)
2024-06-13 5:42 ` [PATCH 4/4] PCI: Drop Radeon quirk for Macbook Pro 8.2 Mario Limonciello
@ 2024-06-18 13:14 ` Mika Westerberg
2024-06-18 16:56 ` Mario Limonciello
2024-07-09 3:07 ` Mario Limonciello
5 siblings, 1 reply; 16+ messages in thread
From: Mika Westerberg @ 2024-06-18 13:14 UTC (permalink / raw)
To: Mario Limonciello
Cc: Bjorn Helgaas, Mathias Nyman, Greg Kroah-Hartman,
open list:PCI SUBSYSTEM, open list, open list:USB XHCI DRIVER,
Daniel Drake, Gary Li
Hi Mario,
On Thu, Jun 13, 2024 at 12:42:00AM -0500, Mario Limonciello wrote:
> Gary has reported that when a dock is plugged into a system at the same
> time the autosuspend delay has tripped that the USB4 stack malfunctions.
>
> Messages show up like this:
>
> ```
> thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled
> ```
>
> Furthermore the USB4 router is non-functional at this point.
Once the USB4 domain starts the sleep transition, it cannot be
interrupted by anything so it always should go through full sleep
transition and only then back from sleep.
> Those messages happen because the device is still in D3cold at the time
> that the PCI core handed control back to the USB4 connection manager
> (thunderbolt).
This is weird. Yes we should be getting the wake from the hotplug but
that should happen only after the domain is fully in sleep (D3cold). The
BIOS ACPI code is supposed to deal with this.
> The issue is that it takes time for a device to enter D3cold and do a
> conventional reset, and then more time for it to exit D3cold.
>
> This appears not to be a new problem; previously there were very similar
> reports from Ryzen XHCI controllers. Quirks were added for those.
> Furthermore; adding extra logging it's apparent that other PCI devices
> in the system can take more than 10ms to recover from D3cold as well.
They can take anything up to 100ms after the link has trained.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] Verify devices transition from D3cold to D0
2024-06-18 13:14 ` [PATCH 0/4] Verify devices transition from D3cold to D0 Mika Westerberg
@ 2024-06-18 16:56 ` Mario Limonciello
2024-06-19 5:29 ` Mika Westerberg
0 siblings, 1 reply; 16+ messages in thread
From: Mario Limonciello @ 2024-06-18 16:56 UTC (permalink / raw)
To: Mika Westerberg
Cc: Bjorn Helgaas, Mathias Nyman, Greg Kroah-Hartman,
open list:PCI SUBSYSTEM, open list, open list:USB XHCI DRIVER,
Daniel Drake, Gary Li
On 6/18/2024 08:14, Mika Westerberg wrote:
> Hi Mario,
>
> On Thu, Jun 13, 2024 at 12:42:00AM -0500, Mario Limonciello wrote:
>> Gary has reported that when a dock is plugged into a system at the same
>> time the autosuspend delay has tripped that the USB4 stack malfunctions.
>>
>> Messages show up like this:
>>
>> ```
>> thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled
>> ```
>>
>> Furthermore the USB4 router is non-functional at this point.
>
> Once the USB4 domain starts the sleep transition, it cannot be
> interrupted by anything so it always should go through full sleep
> transition and only then back from sleep.
>
>> Those messages happen because the device is still in D3cold at the time
>> that the PCI core handed control back to the USB4 connection manager
>> (thunderbolt).
>
> This is weird. Yes we should be getting the wake from the hotplug but
> that should happen only after the domain is fully in sleep (D3cold). The
> BIOS ACPI code is supposed to deal with this.
Is that from from experience or do you mean there is a spec behavior?
IE I'm wondering if we have different "expectations" from different
company's hardware designers.
>
>> The issue is that it takes time for a device to enter D3cold and do a
>> conventional reset, and then more time for it to exit D3cold.
>>
>> This appears not to be a new problem; previously there were very similar
>> reports from Ryzen XHCI controllers. Quirks were added for those.
>> Furthermore; adding extra logging it's apparent that other PCI devices
>> in the system can take more than 10ms to recover from D3cold as well.
>
> They can take anything up to 100ms after the link has trained.
Right; so currently there is nothing checking they really made it back
to D0 after calling pci_power_up(). I feel like we've "mostly" gotten
lucky.
If you guys agree with this patch series direction this could
potentially mean cleaning up more code that exists for random delays in
the PCI core too.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] Verify devices transition from D3cold to D0
2024-06-18 16:56 ` Mario Limonciello
@ 2024-06-19 5:29 ` Mika Westerberg
2024-06-19 18:50 ` Mario Limonciello
0 siblings, 1 reply; 16+ messages in thread
From: Mika Westerberg @ 2024-06-19 5:29 UTC (permalink / raw)
To: Mario Limonciello
Cc: Bjorn Helgaas, Mathias Nyman, Greg Kroah-Hartman,
open list:PCI SUBSYSTEM, open list, open list:USB XHCI DRIVER,
Daniel Drake, Gary Li
On Tue, Jun 18, 2024 at 11:56:50AM -0500, Mario Limonciello wrote:
> On 6/18/2024 08:14, Mika Westerberg wrote:
> > Hi Mario,
> >
> > On Thu, Jun 13, 2024 at 12:42:00AM -0500, Mario Limonciello wrote:
> > > Gary has reported that when a dock is plugged into a system at the same
> > > time the autosuspend delay has tripped that the USB4 stack malfunctions.
> > >
> > > Messages show up like this:
> > >
> > > ```
> > > thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled
> > > ```
> > >
> > > Furthermore the USB4 router is non-functional at this point.
> >
> > Once the USB4 domain starts the sleep transition, it cannot be
> > interrupted by anything so it always should go through full sleep
> > transition and only then back from sleep.
> >
> > > Those messages happen because the device is still in D3cold at the time
> > > that the PCI core handed control back to the USB4 connection manager
> > > (thunderbolt).
> >
> > This is weird. Yes we should be getting the wake from the hotplug but
> > that should happen only after the domain is fully in sleep (D3cold). The
> > BIOS ACPI code is supposed to deal with this.
>
> Is that from from experience or do you mean there is a spec behavior?
>
> IE I'm wondering if we have different "expectations" from different
> company's hardware designers.
The spec and the CM guide "imply" this behaviour as far as I can tell,
so that the "sleep event" is done completely once started. I guess this
can be interpreted differently too because it is not explicitly said
there.
Can you ask AMD HW folks if this is their interpretation too? Basically
when we get "Sleep Ready" bit set for all the routers in the domain and
turn off power (send PERST) there cannot be wake events until that is
fully completed.
There is typically a timeout mechanism in the BIOS side (part of the
power off method) that waits for the PCIe links to enter L2 before it
triggers PERST. We have seen an issue on our side that if this L2
transition is not completed in time a wake event triggered but that was
a BIOS issue.
> > > The issue is that it takes time for a device to enter D3cold and do a
> > > conventional reset, and then more time for it to exit D3cold.
> > >
> > > This appears not to be a new problem; previously there were very similar
> > > reports from Ryzen XHCI controllers. Quirks were added for those.
> > > Furthermore; adding extra logging it's apparent that other PCI devices
> > > in the system can take more than 10ms to recover from D3cold as well.
> >
> > They can take anything up to 100ms after the link has trained.
>
> Right; so currently there is nothing checking they really made it back to D0
> after calling pci_power_up(). I feel like we've "mostly" gotten lucky.
We do have pci_bridge_wait_for_secondary_bus() there but I guess that's
not called for integrated PCIe endpoints such as xHCI and the USB4 Host
Interface. They too enter D3cold once power is turned off so I agree we
might have gotten lucky here with the D3hot 10ms delay.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] PCI: Check PCI_PM_CTRL in pci_dev_wait()
2024-06-13 5:42 ` [PATCH 1/4] PCI: Check PCI_PM_CTRL in pci_dev_wait() Mario Limonciello
@ 2024-06-19 18:33 ` Markus Elfring
2024-06-19 18:44 ` Mario Limonciello
0 siblings, 1 reply; 16+ messages in thread
From: Markus Elfring @ 2024-06-19 18:33 UTC (permalink / raw)
To: Mario Limonciello, linux-pci, linux-usb, Bjorn Helgaas,
Greg Kroah-Hartman, Mathias Nyman
Cc: LKML, Daniel Drake, Gary Li, Mika Westerberg
> A device that has gone through a reset may return a value in PCI_COMMAND
> but that doesn't mean it's finished transitioning to D0. On devices that
> support power management explicitly check PCI_PM_CTRL to ensure the
> transition happened. Devicees that don't support power management will
Devices?
> continue to use PCI_COMMAND.
Would the tag “Fixes” be relevant for such a change description?
Regards,
Markus
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] PCI: Check PCI_PM_CTRL in pci_dev_wait()
2024-06-19 18:33 ` Markus Elfring
@ 2024-06-19 18:44 ` Mario Limonciello
0 siblings, 0 replies; 16+ messages in thread
From: Mario Limonciello @ 2024-06-19 18:44 UTC (permalink / raw)
To: Markus Elfring, linux-pci, linux-usb, Bjorn Helgaas,
Greg Kroah-Hartman, Mathias Nyman
Cc: LKML, Daniel Drake, Gary Li, Mika Westerberg
On 6/19/2024 13:33, Markus Elfring wrote:
>> A device that has gone through a reset may return a value in PCI_COMMAND
>> but that doesn't mean it's finished transitioning to D0. On devices that
>> support power management explicitly check PCI_PM_CTRL to ensure the
>> transition happened. Devicees that don't support power management will
>
> Devices?
Yes, thanks. I'll fix that up for the next version once we have some
alignment on the functionality outlined in these patches.
>
>
>> continue to use PCI_COMMAND.
>
> Would the tag “Fixes” be relevant for such a change description?
>
> Regards,
> Markus
I did trace back the history of the wait function and it goes back to
4.6. In my mind yes; it is a fix, but I don't think it should go that
far back automatically. I think we should prioritize getting it fixed
for 6.11 or 6.12 and then can revisit how far back to do a stable backport.
For example AMD Rembrandt (where this race condition was found) isn't
enabled until 5.17 or 5.18 IIRC.
The backports would have a dependency on 08e3ed12ca861 (from 6.5-rc1)
and bae26849372b8 (from 5.5-rc1) and 821cdad5c46ca (from 4.14) and
5adecf817dd63 (from 4.6-rc1).
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/4] PCI: Verify functions currently in D3cold have entered D0
2024-06-13 5:42 ` [PATCH 2/4] PCI: Verify functions currently in D3cold have entered D0 Mario Limonciello
@ 2024-06-19 18:45 ` Markus Elfring
2024-06-19 18:46 ` Mario Limonciello
0 siblings, 1 reply; 16+ messages in thread
From: Markus Elfring @ 2024-06-19 18:45 UTC (permalink / raw)
To: Mario Limonciello, linux-pci, linux-usb, Bjorn Helgaas,
Greg Kroah-Hartman, Mathias Nyman
Cc: LKML, Daniel Drake, Gary Li, Mika Westerberg
…
> As PCIe r6.0 sec 5.8 specifies that the device will have gone
> through a conventional reset it may take some time for the
> device to be ready.
>
> Wait up to 1 sec as specified in PCIe r6.0 sec 6.6.1 for a device
> in D3cold to return to D0.
Would you like to add the tag “Fixes” accordingly?
Regards,
Markus
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/4] PCI: Verify functions currently in D3cold have entered D0
2024-06-19 18:45 ` Markus Elfring
@ 2024-06-19 18:46 ` Mario Limonciello
0 siblings, 0 replies; 16+ messages in thread
From: Mario Limonciello @ 2024-06-19 18:46 UTC (permalink / raw)
To: Markus Elfring, linux-pci, linux-usb, Bjorn Helgaas,
Greg Kroah-Hartman, Mathias Nyman
Cc: LKML, Daniel Drake, Gary Li, Mika Westerberg
On 6/19/2024 13:45, Markus Elfring wrote:
> …
>> As PCIe r6.0 sec 5.8 specifies that the device will have gone
>> through a conventional reset it may take some time for the
>> device to be ready.
>>
>> Wait up to 1 sec as specified in PCIe r6.0 sec 6.6.1 for a device
>> in D3cold to return to D0.
>
> Would you like to add the tag “Fixes” accordingly?
>
> Regards,
> Markus
The first patch is a strong dependency for this to work, and if we don't
bring the first patch back automatically we shouldn't do this one.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] Verify devices transition from D3cold to D0
2024-06-19 5:29 ` Mika Westerberg
@ 2024-06-19 18:50 ` Mario Limonciello
2024-06-25 15:43 ` Mario Limonciello
0 siblings, 1 reply; 16+ messages in thread
From: Mario Limonciello @ 2024-06-19 18:50 UTC (permalink / raw)
To: Mika Westerberg, Gary Li
Cc: Bjorn Helgaas, Mathias Nyman, Greg Kroah-Hartman,
open list:PCI SUBSYSTEM, open list, open list:USB XHCI DRIVER,
Daniel Drake
On 6/19/2024 00:29, Mika Westerberg wrote:
> On Tue, Jun 18, 2024 at 11:56:50AM -0500, Mario Limonciello wrote:
>> On 6/18/2024 08:14, Mika Westerberg wrote:
>>> Hi Mario,
>>>
>>> On Thu, Jun 13, 2024 at 12:42:00AM -0500, Mario Limonciello wrote:
>>>> Gary has reported that when a dock is plugged into a system at the same
>>>> time the autosuspend delay has tripped that the USB4 stack malfunctions.
>>>>
>>>> Messages show up like this:
>>>>
>>>> ```
>>>> thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled
>>>> ```
>>>>
>>>> Furthermore the USB4 router is non-functional at this point.
>>>
>>> Once the USB4 domain starts the sleep transition, it cannot be
>>> interrupted by anything so it always should go through full sleep
>>> transition and only then back from sleep.
>>>
>>>> Those messages happen because the device is still in D3cold at the time
>>>> that the PCI core handed control back to the USB4 connection manager
>>>> (thunderbolt).
>>>
>>> This is weird. Yes we should be getting the wake from the hotplug but
>>> that should happen only after the domain is fully in sleep (D3cold). The
>>> BIOS ACPI code is supposed to deal with this.
>>
>> Is that from from experience or do you mean there is a spec behavior?
>>
>> IE I'm wondering if we have different "expectations" from different
>> company's hardware designers.
>
> The spec and the CM guide "imply" this behaviour as far as I can tell,
> so that the "sleep event" is done completely once started. I guess this
> can be interpreted differently too because it is not explicitly said
> there.
>
> Can you ask AMD HW folks if this is their interpretation too? Basically
> when we get "Sleep Ready" bit set for all the routers in the domain and
> turn off power (send PERST) there cannot be wake events until that is
> fully completed.
>
> There is typically a timeout mechanism in the BIOS side (part of the
> power off method) that waits for the PCIe links to enter L2 before it
> triggers PERST. We have seen an issue on our side that if this L2
> transition is not completed in time a wake event triggered but that was
> a BIOS issue.
Sure thing. I'll discuss it with them and get back with the results.
>
>>>> The issue is that it takes time for a device to enter D3cold and do a
>>>> conventional reset, and then more time for it to exit D3cold.
>>>>
>>>> This appears not to be a new problem; previously there were very similar
>>>> reports from Ryzen XHCI controllers. Quirks were added for those.
>>>> Furthermore; adding extra logging it's apparent that other PCI devices
>>>> in the system can take more than 10ms to recover from D3cold as well.
>>>
>>> They can take anything up to 100ms after the link has trained.
>>
>> Right; so currently there is nothing checking they really made it back to D0
>> after calling pci_power_up(). I feel like we've "mostly" gotten lucky.
>
> We do have pci_bridge_wait_for_secondary_bus() there but I guess that's
> not called for integrated PCIe endpoints such as xHCI and the USB4 Host
> Interface. They too enter D3cold once power is turned off so I agree we
> might have gotten lucky here with the D3hot 10ms delay.
Yup; exactly. I think when Gary found this he was seeing other
integrated devices not being ready in time too, but IIRC there wasn't
functional impact from them like there is on XHCI and USB4.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] Verify devices transition from D3cold to D0
2024-06-19 18:50 ` Mario Limonciello
@ 2024-06-25 15:43 ` Mario Limonciello
2024-06-26 7:53 ` Mika Westerberg
0 siblings, 1 reply; 16+ messages in thread
From: Mario Limonciello @ 2024-06-25 15:43 UTC (permalink / raw)
To: Mika Westerberg, Bjorn Helgaas
Cc: Mathias Nyman, Greg Kroah-Hartman, open list:PCI SUBSYSTEM,
open list, open list:USB XHCI DRIVER, Daniel Drake, Gary Li
On 6/19/2024 13:50, Mario Limonciello wrote:
> On 6/19/2024 00:29, Mika Westerberg wrote:
>> On Tue, Jun 18, 2024 at 11:56:50AM -0500, Mario Limonciello wrote:
>>> On 6/18/2024 08:14, Mika Westerberg wrote:
>>>> Hi Mario,
>>>>
>>>> On Thu, Jun 13, 2024 at 12:42:00AM -0500, Mario Limonciello wrote:
>>>>> Gary has reported that when a dock is plugged into a system at the
>>>>> same
>>>>> time the autosuspend delay has tripped that the USB4 stack
>>>>> malfunctions.
>>>>>
>>>>> Messages show up like this:
>>>>>
>>>>> ```
>>>>> thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX
>>>>> ring 0 is already enabled
>>>>> ```
>>>>>
>>>>> Furthermore the USB4 router is non-functional at this point.
>>>>
>>>> Once the USB4 domain starts the sleep transition, it cannot be
>>>> interrupted by anything so it always should go through full sleep
>>>> transition and only then back from sleep.
>>>>
>>>>> Those messages happen because the device is still in D3cold at the
>>>>> time
>>>>> that the PCI core handed control back to the USB4 connection manager
>>>>> (thunderbolt).
>>>>
>>>> This is weird. Yes we should be getting the wake from the hotplug but
>>>> that should happen only after the domain is fully in sleep (D3cold).
>>>> The
>>>> BIOS ACPI code is supposed to deal with this.
>>>
>>> Is that from from experience or do you mean there is a spec behavior?
>>>
>>> IE I'm wondering if we have different "expectations" from different
>>> company's hardware designers.
>>
>> The spec and the CM guide "imply" this behaviour as far as I can tell,
>> so that the "sleep event" is done completely once started. I guess this
>> can be interpreted differently too because it is not explicitly said
>> there.
>>
>> Can you ask AMD HW folks if this is their interpretation too? Basically
>> when we get "Sleep Ready" bit set for all the routers in the domain and
>> turn off power (send PERST) there cannot be wake events until that is
>> fully completed.
>>
>> There is typically a timeout mechanism in the BIOS side (part of the
>> power off method) that waits for the PCIe links to enter L2 before it
>> triggers PERST. We have seen an issue on our side that if this L2
>> transition is not completed in time a wake event triggered but that was
>> a BIOS issue.
>
> Sure thing. I'll discuss it with them and get back with the results.
From the hardware team they describe this as an abnormal state that
they don't expect. I don't believe there is anything in the BIOS to
prevent it though.
I could discuss options for this with the BIOS team in the future for
the USB4 router ACPI device, but as this "seems" to be the same problem
as XHCI controllers going back at least 5 generations with those quirks
I put reverts in this series I think a general kernel solution to make
"sure" that devices have transitioned is the better way to go.
>
>>
>>>>> The issue is that it takes time for a device to enter D3cold and do a
>>>>> conventional reset, and then more time for it to exit D3cold.
>>>>>
>>>>> This appears not to be a new problem; previously there were very
>>>>> similar
>>>>> reports from Ryzen XHCI controllers. Quirks were added for those.
>>>>> Furthermore; adding extra logging it's apparent that other PCI devices
>>>>> in the system can take more than 10ms to recover from D3cold as well.
>>>>
>>>> They can take anything up to 100ms after the link has trained.
>>>
>>> Right; so currently there is nothing checking they really made it
>>> back to D0
>>> after calling pci_power_up(). I feel like we've "mostly" gotten lucky.
>>
>> We do have pci_bridge_wait_for_secondary_bus() there but I guess that's
>> not called for integrated PCIe endpoints such as xHCI and the USB4 Host
>> Interface. They too enter D3cold once power is turned off so I agree we
>> might have gotten lucky here with the D3hot 10ms delay.
>
> Yup; exactly. I think when Gary found this he was seeing other
> integrated devices not being ready in time too, but IIRC there wasn't
> functional impact from them like there is on XHCI and USB4.
>
>
Bjorn,
Considering the above, can you please review this series?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] Verify devices transition from D3cold to D0
2024-06-25 15:43 ` Mario Limonciello
@ 2024-06-26 7:53 ` Mika Westerberg
0 siblings, 0 replies; 16+ messages in thread
From: Mika Westerberg @ 2024-06-26 7:53 UTC (permalink / raw)
To: Mario Limonciello
Cc: Bjorn Helgaas, Mathias Nyman, Greg Kroah-Hartman,
open list:PCI SUBSYSTEM, open list, open list:USB XHCI DRIVER,
Daniel Drake, Gary Li
On Tue, Jun 25, 2024 at 10:43:20AM -0500, Mario Limonciello wrote:
> On 6/19/2024 13:50, Mario Limonciello wrote:
> > On 6/19/2024 00:29, Mika Westerberg wrote:
> > > On Tue, Jun 18, 2024 at 11:56:50AM -0500, Mario Limonciello wrote:
> > > > On 6/18/2024 08:14, Mika Westerberg wrote:
> > > > > Hi Mario,
> > > > >
> > > > > On Thu, Jun 13, 2024 at 12:42:00AM -0500, Mario Limonciello wrote:
> > > > > > Gary has reported that when a dock is plugged into a
> > > > > > system at the same
> > > > > > time the autosuspend delay has tripped that the USB4
> > > > > > stack malfunctions.
> > > > > >
> > > > > > Messages show up like this:
> > > > > >
> > > > > > ```
> > > > > > thunderbolt 0000:e5:00.6: ring_interrupt_active:
> > > > > > interrupt for TX ring 0 is already enabled
> > > > > > ```
> > > > > >
> > > > > > Furthermore the USB4 router is non-functional at this point.
> > > > >
> > > > > Once the USB4 domain starts the sleep transition, it cannot be
> > > > > interrupted by anything so it always should go through full sleep
> > > > > transition and only then back from sleep.
> > > > >
> > > > > > Those messages happen because the device is still in
> > > > > > D3cold at the time
> > > > > > that the PCI core handed control back to the USB4 connection manager
> > > > > > (thunderbolt).
> > > > >
> > > > > This is weird. Yes we should be getting the wake from the hotplug but
> > > > > that should happen only after the domain is fully in sleep
> > > > > (D3cold). The
> > > > > BIOS ACPI code is supposed to deal with this.
> > > >
> > > > Is that from from experience or do you mean there is a spec behavior?
> > > >
> > > > IE I'm wondering if we have different "expectations" from different
> > > > company's hardware designers.
> > >
> > > The spec and the CM guide "imply" this behaviour as far as I can tell,
> > > so that the "sleep event" is done completely once started. I guess this
> > > can be interpreted differently too because it is not explicitly said
> > > there.
> > >
> > > Can you ask AMD HW folks if this is their interpretation too? Basically
> > > when we get "Sleep Ready" bit set for all the routers in the domain and
> > > turn off power (send PERST) there cannot be wake events until that is
> > > fully completed.
> > >
> > > There is typically a timeout mechanism in the BIOS side (part of the
> > > power off method) that waits for the PCIe links to enter L2 before it
> > > triggers PERST. We have seen an issue on our side that if this L2
> > > transition is not completed in time a wake event triggered but that was
> > > a BIOS issue.
> >
> > Sure thing. I'll discuss it with them and get back with the results.
>
> From the hardware team they describe this as an abnormal state that they
> don't expect. I don't believe there is anything in the BIOS to prevent it
> though.
Okay thanks for checking.
>
> I could discuss options for this with the BIOS team in the future for the
> USB4 router ACPI device, but as this "seems" to be the same problem as XHCI
> controllers going back at least 5 generations with those quirks I put
> reverts in this series I think a general kernel solution to make "sure" that
> devices have transitioned is the better way to go.
Agreed.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] Verify devices transition from D3cold to D0
2024-06-13 5:42 [PATCH 0/4] Verify devices transition from D3cold to D0 Mario Limonciello
` (4 preceding siblings ...)
2024-06-18 13:14 ` [PATCH 0/4] Verify devices transition from D3cold to D0 Mika Westerberg
@ 2024-07-09 3:07 ` Mario Limonciello
5 siblings, 0 replies; 16+ messages in thread
From: Mario Limonciello @ 2024-07-09 3:07 UTC (permalink / raw)
To: Bjorn Helgaas, Mathias Nyman, Mika Westerberg
Cc: open list:PCI SUBSYSTEM, open list, open list:USB XHCI DRIVER,
Daniel Drake, Gary Li, Greg Kroah-Hartman
On 6/13/2024 0:42, Mario Limonciello wrote:
> Gary has reported that when a dock is plugged into a system at the same
> time the autosuspend delay has tripped that the USB4 stack malfunctions.
>
> Messages show up like this:
>
> ```
> thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled
> ```
>
> Furthermore the USB4 router is non-functional at this point.
>
> Those messages happen because the device is still in D3cold at the time
> that the PCI core handed control back to the USB4 connection manager
> (thunderbolt).
>
> The issue is that it takes time for a device to enter D3cold and do a
> conventional reset, and then more time for it to exit D3cold.
>
> This appears not to be a new problem; previously there were very similar
> reports from Ryzen XHCI controllers. Quirks were added for those.
> Furthermore; adding extra logging it's apparent that other PCI devices
> in the system can take more than 10ms to recover from D3cold as well.
>
> This series add a wait into pci_power_up() specifically for D3cold exit and
> then drops the quirks that were previously used for the Ryzen XHCI controllers.
>
> Mario Limonciello (4):
> PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
> PCI: Verify functions currently in D3cold have entered D0
> PCI: Allow Ryzen XHCI controllers into D3cold and drop delays
> PCI: Drop Radeon quirk for Macbook Pro 8.2
>
> drivers/pci/pci.c | 21 ++++++++++++++++-----
> drivers/pci/quirks.c | 25 -------------------------
> drivers/usb/host/xhci-pci.c | 11 -----------
> 3 files changed, 16 insertions(+), 41 deletions(-)
>
Bjorn, Mathias,
Any feedback for this series? I did check and it still applies to
6.10-rc7 as is, but if you want me to rebase on linux-pci happy to do so.
At least Mika and I seem in agreement from other comments in the thread
on this directionally.
Thanks,
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2024-07-09 3:07 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-13 5:42 [PATCH 0/4] Verify devices transition from D3cold to D0 Mario Limonciello
2024-06-13 5:42 ` [PATCH 1/4] PCI: Check PCI_PM_CTRL in pci_dev_wait() Mario Limonciello
2024-06-19 18:33 ` Markus Elfring
2024-06-19 18:44 ` Mario Limonciello
2024-06-13 5:42 ` [PATCH 2/4] PCI: Verify functions currently in D3cold have entered D0 Mario Limonciello
2024-06-19 18:45 ` Markus Elfring
2024-06-19 18:46 ` Mario Limonciello
2024-06-13 5:42 ` [PATCH 3/4] PCI: Allow Ryzen XHCI controllers into D3cold and drop delays Mario Limonciello
2024-06-13 5:42 ` [PATCH 4/4] PCI: Drop Radeon quirk for Macbook Pro 8.2 Mario Limonciello
2024-06-18 13:14 ` [PATCH 0/4] Verify devices transition from D3cold to D0 Mika Westerberg
2024-06-18 16:56 ` Mario Limonciello
2024-06-19 5:29 ` Mika Westerberg
2024-06-19 18:50 ` Mario Limonciello
2024-06-25 15:43 ` Mario Limonciello
2024-06-26 7:53 ` Mika Westerberg
2024-07-09 3:07 ` Mario Limonciello
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox