linux-pci.vger.kernel.org archive mirror
* [PATCH] PCI/PM: Mark devices disconnected if their upstream PCIe link is down on resume
@ 2023-09-18  5:30 Mika Westerberg
  2023-09-18  8:37 ` Lukas Wunner
  2023-09-21 20:19 ` Bjorn Helgaas
  0 siblings, 2 replies; 10+ messages in thread
From: Mika Westerberg @ 2023-09-18  5:30 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Lukas Wunner, Mark Blakeney, Mika Westerberg, linux-pci

Mark Blakeney reported that when suspending a system with a Thunderbolt
dock connected and then unplugging the dock before resume (which is
a pretty normal flow with laptops), resuming takes a long time.

What happens is that the PCIe link from the root port to the PCIe switch
inside the Thunderbolt device does not train (as expected, the link is
unplugged):

[   34.903158] pcieport 0000:00:07.2: restoring config space at offset 0x24 (was 0x3bf12001, writing 0x3bf12001)
[   34.903231] pcieport 0000:00:07.0: waiting 100 ms for downstream link
[   36.140616] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up

However, at this point we still try to resume the devices below that
unplugged link:

[   36.140741] pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
...
[   36.142235] pcieport 0000:01:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
...
[   36.144702] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation

And this is the link from PCIe switch downstream port to the xHCI on the
dock:

[   38.380618] xhci_hcd 0000:03:00.0: not ready 1023ms after resume; waiting
[   39.420587] xhci_hcd 0000:03:00.0: not ready 2047ms after resume; waiting
[   41.527250] xhci_hcd 0000:03:00.0: not ready 4095ms after resume; waiting
[   45.793957] xhci_hcd 0000:03:00.0: not ready 8191ms after resume; waiting
[   54.113950] xhci_hcd 0000:03:00.0: not ready 16383ms after resume; waiting
[   71.180576] xhci_hcd 0000:03:00.0: not ready 32767ms after resume; waiting
...
[  105.313963] xhci_hcd 0000:03:00.0: not ready 65535ms after resume; giving up
[  105.314037] xhci_hcd 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  105.315640] xhci_hcd 0000:03:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
...

This ends up slowing down the resume time considerably. For this reason
mark these devices as disconnected if the link above them did not train
properly.

Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/pci-driver.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index a79c110c7e51..51ec9e7e784f 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -572,7 +572,19 @@ static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
 
 static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
 {
-	pci_bridge_wait_for_secondary_bus(pci_dev, "resume");
+	int ret;
+
+	ret = pci_bridge_wait_for_secondary_bus(pci_dev, "resume");
+	if (ret) {
+		/*
+		 * The downstream link failed to come up, so mark the
+		 * devices below as disconnected to make sure we don't
+		 * attempt to resume them.
+		 */
+		pci_walk_bus(pci_dev->subordinate, pci_dev_set_disconnected,
+			     NULL);
+		return;
+	}
 
 	/*
 	 * When powering on a bridge from D3cold, the whole hierarchy may be
-- 
2.40.1
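
The effect of the early return in the patch above can be illustrated with a
small user-space model. This is only a sketch under stated assumptions: the
toy_* names, the struct layout, and the four-device chain (root port, switch
upstream port, switch downstream port, xHCI) are hypothetical stand-ins that
mirror the topology in the log, not kernel APIs. It counts how many times
the resume path waits on a link that never trains, with and without marking
the subtree as disconnected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Toy stand-in for a PCI device; every name here is hypothetical. */
struct toy_dev {
	const char *name;
	bool link_up;		/* does the link below this bridge train? */
	bool disconnected;	/* set once the device is known to be gone */
	struct toy_dev *child[TOY_MAX_CHILDREN];
	size_t nr_children;
};

/* Mark every device below @bridge as disconnected (depth-first). */
static void toy_mark_subtree_disconnected(struct toy_dev *bridge)
{
	for (size_t i = 0; i < bridge->nr_children; i++) {
		bridge->child[i]->disconnected = true;
		toy_mark_subtree_disconnected(bridge->child[i]);
	}
}

/*
 * Resume @dev and everything below it. Returns the number of waits that
 * ended in a timeout because a downstream link never came up.
 */
static int toy_resume(struct toy_dev *dev, bool mark_on_failure)
{
	int timeouts = 0;

	/* Disconnected devices are skipped; leaves have no link to wait for. */
	if (dev->disconnected || dev->nr_children == 0)
		return 0;

	if (!dev->link_up) {
		timeouts++;	/* waited for a link that never trained */
		if (mark_on_failure) {
			toy_mark_subtree_disconnected(dev);
			return timeouts;
		}
	}

	for (size_t i = 0; i < dev->nr_children; i++)
		timeouts += toy_resume(dev->child[i], mark_on_failure);

	return timeouts;
}

/* Build the unplugged-dock chain from the log and count the timeouts. */
static int toy_unplugged_chain_timeouts(bool mark_on_failure)
{
	struct toy_dev xhci = { .name = "0000:03:00.0" };
	struct toy_dev down = { .name = "0000:02:02.0",
				.child = { &xhci }, .nr_children = 1 };
	struct toy_dev up   = { .name = "0000:01:00.0",
				.child = { &down }, .nr_children = 1 };
	struct toy_dev root = { .name = "0000:00:07.0",
				.child = { &up }, .nr_children = 1 };

	assert(!root.link_up);	/* dock unplugged: no link trains */
	return toy_resume(&root, mark_on_failure);
}
```

In this model, toy_unplugged_chain_timeouts(false) walks the whole chain and
times out at every bridge, while toy_unplugged_chain_timeouts(true) times out
once at the root port and skips everything below it.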


* Re: [PATCH] PCI/PM: Mark devices disconnected if their upstream PCIe link is down on resume
  2023-09-18  5:30 [PATCH] PCI/PM: Mark devices disconnected if their upstream PCIe link is down on resume Mika Westerberg
@ 2023-09-18  8:37 ` Lukas Wunner
  2023-09-21 20:19 ` Bjorn Helgaas
  1 sibling, 0 replies; 10+ messages in thread
From: Lukas Wunner @ 2023-09-18  8:37 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: Bjorn Helgaas, Mark Blakeney, linux-pci

On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> Mark Blakeney reported that when suspending a system with a Thunderbolt
> dock connected and then unplugging the dock before resume (which is
> a pretty normal flow with laptops), resuming takes a long time.
> 
> What happens is that the PCIe link from the root port to the PCIe switch
> inside the Thunderbolt device does not train (as expected, the link is
> unplugged):
> 
> [   34.903158] pcieport 0000:00:07.2: restoring config space at offset 0x24 (was 0x3bf12001, writing 0x3bf12001)
> [   34.903231] pcieport 0000:00:07.0: waiting 100 ms for downstream link
> [   36.140616] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up
> 
> However, at this point we still try to resume the devices below that
> unplugged link:
> 
> [   36.140741] pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
> ...
> [   36.142235] pcieport 0000:01:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> ...
> [   36.144702] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation
> 
> And this is the link from PCIe switch downstream port to the xHCI on the
> dock:
> 
> [   38.380618] xhci_hcd 0000:03:00.0: not ready 1023ms after resume; waiting
> [   39.420587] xhci_hcd 0000:03:00.0: not ready 2047ms after resume; waiting
> [   41.527250] xhci_hcd 0000:03:00.0: not ready 4095ms after resume; waiting
> [   45.793957] xhci_hcd 0000:03:00.0: not ready 8191ms after resume; waiting
> [   54.113950] xhci_hcd 0000:03:00.0: not ready 16383ms after resume; waiting
> [   71.180576] xhci_hcd 0000:03:00.0: not ready 32767ms after resume; waiting
> ...
> [  105.313963] xhci_hcd 0000:03:00.0: not ready 65535ms after resume; giving up
> [  105.314037] xhci_hcd 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
> [  105.315640] xhci_hcd 0000:03:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
> ...
> 
> This ends up slowing down the resume time considerably. For this reason
> mark these devices as disconnected if the link above them did not train
> properly.
> 
> Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
> Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>

Reviewed-by: Lukas Wunner <lukas@wunner.de>

* Re: [PATCH] PCI/PM: Mark devices disconnected if their upstream PCIe link is down on resume
  2023-09-18  5:30 [PATCH] PCI/PM: Mark devices disconnected if their upstream PCIe link is down on resume Mika Westerberg
  2023-09-18  8:37 ` Lukas Wunner
@ 2023-09-21 20:19 ` Bjorn Helgaas
  2023-09-22  4:42   ` Mika Westerberg
                     ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2023-09-21 20:19 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Lukas Wunner, Mark Blakeney, Kamil Paral,
	Chris Chiu, linux-pci

[+cc Kamil, Chris]

On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> Mark Blakeney reported that when suspending a system with a Thunderbolt
> dock connected and then unplugging the dock before resume (which is
> a pretty normal flow with laptops), resuming takes a long time.
> 
> What happens is that the PCIe link from the root port to the PCIe switch
> inside the Thunderbolt device does not train (as expected, the link is
> unplugged):
> 
> [   34.903158] pcieport 0000:00:07.2: restoring config space at offset 0x24 (was 0x3bf12001, writing 0x3bf12001)
> [   34.903231] pcieport 0000:00:07.0: waiting 100 ms for downstream link
> [   36.140616] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up
> 
> However, at this point we still try to resume the devices below that
> unplugged link:
> 
> [   36.140741] pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
> ...
> [   36.142235] pcieport 0000:01:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> ...
> [   36.144702] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation
> 
> And this is the link from PCIe switch downstream port to the xHCI on the
> dock:
> 
> [   38.380618] xhci_hcd 0000:03:00.0: not ready 1023ms after resume; waiting
> [   39.420587] xhci_hcd 0000:03:00.0: not ready 2047ms after resume; waiting
> [   41.527250] xhci_hcd 0000:03:00.0: not ready 4095ms after resume; waiting
> [   45.793957] xhci_hcd 0000:03:00.0: not ready 8191ms after resume; waiting
> [   54.113950] xhci_hcd 0000:03:00.0: not ready 16383ms after resume; waiting
> [   71.180576] xhci_hcd 0000:03:00.0: not ready 32767ms after resume; waiting
> ...
> [  105.313963] xhci_hcd 0000:03:00.0: not ready 65535ms after resume; giving up
> [  105.314037] xhci_hcd 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
> [  105.315640] xhci_hcd 0000:03:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
> ...
> 
> This ends up slowing down the resume time considerably. For this reason
> mark these devices as disconnected if the link above them did not train
> properly.
> 
> Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
> Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>

Applied with Lukas' Reviewed-by to pm for v6.7.

e8b908146d44 appeared in v6.4.  Seems like maybe a candidate for
stable?  IIUC, resume actually does work, but takes 65+ seconds longer
than it should?

Kamil also bisected a 60+ second resume delay to e8b908146d44
(https://lore.kernel.org/r/CA+cBOTeWrsTyANjLZQ=bGoBQ_yOkkV1juyRvJq-C8GOrbW6t9Q@mail.gmail.com),
but IIUC at
https://lore.kernel.org/linux-pci/20230824114300.GU3465@black.fi.intel.com/T/#u
you concluded that Kamil's issue was related to firmware and actually
had nothing to do with e8b908146d44.

Do you still think Kamil's issue is unrelated to e8b908146d44 and this
patch?  If so, how do we handle Kamil's issue?  An answer like "users
of v6.4+ must upgrade their Thunderbolt firmware" seems like it would
be kind of a nightmare for users.

> ---
>  drivers/pci/pci-driver.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a79c110c7e51..51ec9e7e784f 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -572,7 +572,19 @@ static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
>  
>  static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
>  {
> -	pci_bridge_wait_for_secondary_bus(pci_dev, "resume");
> +	int ret;
> +
> +	ret = pci_bridge_wait_for_secondary_bus(pci_dev, "resume");
> +	if (ret) {
> +		/*
> +		 * The downstream link failed to come up, so mark the
> +		 * devices below as disconnected to make sure we don't
> +		 * attempt to resume them.
> +		 */
> +		pci_walk_bus(pci_dev->subordinate, pci_dev_set_disconnected,
> +			     NULL);
> +		return;
> +	}
>  
>  	/*
>  	 * When powering on a bridge from D3cold, the whole hierarchy may be
> -- 
> 2.40.1
> 

* Re: [PATCH] PCI/PM: Mark devices disconnected if their upstream PCIe link is down on resume
  2023-09-21 20:19 ` Bjorn Helgaas
@ 2023-09-22  4:42   ` Mika Westerberg
  2023-09-22 12:59     ` Bjorn Helgaas
  2023-09-22 11:45   ` Thorsten Leemhuis
  2023-09-29 22:45   ` Bjorn Helgaas
  2 siblings, 1 reply; 10+ messages in thread
From: Mika Westerberg @ 2023-09-22  4:42 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Lukas Wunner, Mark Blakeney, Kamil Paral,
	Chris Chiu, linux-pci

Hi Bjorn,

On Thu, Sep 21, 2023 at 03:19:45PM -0500, Bjorn Helgaas wrote:
> [+cc Kamil, Chris]
> 
> On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> > Mark Blakeney reported that when suspending a system with a Thunderbolt
> > dock connected and then unplugging the dock before resume (which is
> > a pretty normal flow with laptops), resuming takes a long time.
> > 
> > What happens is that the PCIe link from the root port to the PCIe switch
> > inside the Thunderbolt device does not train (as expected, the link is
> > unplugged):
> > 
> > [   34.903158] pcieport 0000:00:07.2: restoring config space at offset 0x24 (was 0x3bf12001, writing 0x3bf12001)
> > [   34.903231] pcieport 0000:00:07.0: waiting 100 ms for downstream link
> > [   36.140616] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up
> > 
> > However, at this point we still try to resume the devices below that
> > unplugged link:
> > 
> > [   36.140741] pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
> > ...
> > [   36.142235] pcieport 0000:01:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> > ...
> > [   36.144702] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation
> > 
> > And this is the link from PCIe switch downstream port to the xHCI on the
> > dock:
> > 
> > [   38.380618] xhci_hcd 0000:03:00.0: not ready 1023ms after resume; waiting
> > [   39.420587] xhci_hcd 0000:03:00.0: not ready 2047ms after resume; waiting
> > [   41.527250] xhci_hcd 0000:03:00.0: not ready 4095ms after resume; waiting
> > [   45.793957] xhci_hcd 0000:03:00.0: not ready 8191ms after resume; waiting
> > [   54.113950] xhci_hcd 0000:03:00.0: not ready 16383ms after resume; waiting
> > [   71.180576] xhci_hcd 0000:03:00.0: not ready 32767ms after resume; waiting
> > ...
> > [  105.313963] xhci_hcd 0000:03:00.0: not ready 65535ms after resume; giving up
> > [  105.314037] xhci_hcd 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
> > [  105.315640] xhci_hcd 0000:03:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
> > ...
> > 
> > This ends up slowing down the resume time considerably. For this reason
> > mark these devices as disconnected if the link above them did not train
> > properly.
> > 
> > Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
> > Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> 
> Applied with Lukas' Reviewed-by to pm for v6.7.

Thanks!

> e8b908146d44 appeared in v6.4.  Seems like maybe a candidate for
> stable?  IIUC, resume actually does work, but takes 65+ seconds longer
> than it should?

Yes, I think it should be tagged for stable.

> Kamil also bisected a 60+ second resume delay to e8b908146d44
> (https://lore.kernel.org/r/CA+cBOTeWrsTyANjLZQ=bGoBQ_yOkkV1juyRvJq-C8GOrbW6t9Q@mail.gmail.com),
> but IIUC at
> https://lore.kernel.org/linux-pci/20230824114300.GU3465@black.fi.intel.com/T/#u
> you concluded that Kamil's issue was related to firmware and actually
> had nothing to do with e8b908146d44.
> 
> Do you still think Kamil's issue is unrelated to e8b908146d44 and this
> patch?  If so, how do we handle Kamil's issue?  An answer like "users
> of v6.4+ must upgrade their Thunderbolt firmware" seems like it would
> be kind of a nightmare for users.

It's a different issue. What happens in his system is that the link went
down even though the dock was still connected and this should not happen
(the firmware should bring the link up during resume). The delay was
just a "symptom".

What happens here is that the user suspends the device and deliberately
disconnects the dock.

* Re: [PATCH] PCI/PM: Mark devices disconnected if their upstream PCIe link is down on resume
  2023-09-21 20:19 ` Bjorn Helgaas
  2023-09-22  4:42   ` Mika Westerberg
@ 2023-09-22 11:45   ` Thorsten Leemhuis
  2023-09-22 12:41     ` Bjorn Helgaas
  2023-09-29 22:45   ` Bjorn Helgaas
  2 siblings, 1 reply; 10+ messages in thread
From: Thorsten Leemhuis @ 2023-09-22 11:45 UTC (permalink / raw)
  To: Bjorn Helgaas, Mika Westerberg
  Cc: Bjorn Helgaas, Lukas Wunner, Mark Blakeney, Kamil Paral,
	Chris Chiu, linux-pci, Linux kernel regressions list

On 21.09.23 22:19, Bjorn Helgaas wrote:
> [+cc Kamil, Chris]
> 
> On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
>> Mark Blakeney reported that when suspending a system with a Thunderbolt
>> dock connected and then unplugging the dock before resume (which is
>> a pretty normal flow with laptops), resuming takes a long time.
>>
>> What happens is that the PCIe link from the root port to the PCIe switch
>> inside the Thunderbolt device does not train (as expected, the link is
>> unplugged):
> [...]
>> Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
>> Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
>> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> 
> Applied with Lukas' Reviewed-by to pm for v6.7.
>
> e8b908146d44 appeared in v6.4. 

Then why did you apply this for 6.7 and not to a branch targeting the
current cycle? Linus wants regressions introduced during roughly the
last 12 months to be handled like regressions from the current cycle,
unless there is some good reason to treat the fix differently (a big risk
of other regressions, for example).

> Seems like maybe a candidate for stable? 

+1

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

* Re: [PATCH] PCI/PM: Mark devices disconnected if their upstream PCIe link is down on resume
  2023-09-22 11:45   ` Thorsten Leemhuis
@ 2023-09-22 12:41     ` Bjorn Helgaas
  2023-09-22 12:53       ` Thorsten Leemhuis
  0 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2023-09-22 12:41 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Mika Westerberg, Bjorn Helgaas, Lukas Wunner, Mark Blakeney,
	Kamil Paral, Chris Chiu, linux-pci, Linux kernel regressions list

On Fri, Sep 22, 2023 at 01:45:58PM +0200, Thorsten Leemhuis wrote:
> On 21.09.23 22:19, Bjorn Helgaas wrote:
> > On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> >> Mark Blakeney reported that when suspending a system with a Thunderbolt
> >> dock connected and then unplugging the dock before resume (which is
> >> a pretty normal flow with laptops), resuming takes a long time.
> >>
> >> What happens is that the PCIe link from the root port to the PCIe switch
> >> inside the Thunderbolt device does not train (as expected, the link is
> >> unplugged):
> > [...]
> >> Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
> >> Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
> >> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
> >> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > 
> > Applied with Lukas' Reviewed-by to pm for v6.7.
> >
> > e8b908146d44 appeared in v6.4. 
> 
> Then why did you apply this for 6.7 and not to a branch targeting the
> current cycle? Linus wants regressions introduced during roughly the
> last 12 months to be handled like regressions from the current cycle,

I was not aware of the last 12 months rule.  Happy to change if that's
the guideline.  My previous rule of thumb was: fixes for regressions
in the most recent merge window always go to current cycle, fixes for
older regressions case-by-case.

Bjorn

* Re: [PATCH] PCI/PM: Mark devices disconnected if their upstream PCIe link is down on resume
  2023-09-22 12:41     ` Bjorn Helgaas
@ 2023-09-22 12:53       ` Thorsten Leemhuis
  0 siblings, 0 replies; 10+ messages in thread
From: Thorsten Leemhuis @ 2023-09-22 12:53 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Bjorn Helgaas, Lukas Wunner, Mark Blakeney,
	Kamil Paral, Chris Chiu, linux-pci, Linux kernel regressions list

On 22.09.23 14:41, Bjorn Helgaas wrote:
> On Fri, Sep 22, 2023 at 01:45:58PM +0200, Thorsten Leemhuis wrote:
>> On 21.09.23 22:19, Bjorn Helgaas wrote:
>>> On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
>>>> Mark Blakeney reported that when suspending a system with a Thunderbolt
>>>> dock connected and then unplugging the dock before resume (which is
>>>> a pretty normal flow with laptops), resuming takes a long time.
>>>>
>>>> What happens is that the PCIe link from the root port to the PCIe switch
>>>> inside the Thunderbolt device does not train (as expected, the link is
>>>> unplugged):
>>> [...]
>>>> Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
>>>> Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
>>>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
>>>> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>>>
>>> Applied with Lukas' Reviewed-by to pm for v6.7.
>>>
>>> e8b908146d44 appeared in v6.4. 
>>
>> Then why did you apply this for 6.7 and not to a branch targeting the
>> current cycle? Linus wants regressions introduced during roughly the
>> last 12 months to be handled like regressions from the current cycle,
> 
> I was not aware of the last 12 months rule.  Happy to change if that's
> the guideline.  

Thx. FWIW, if you want to know what Linus said exactly, check these out:

https://lore.kernel.org/all/CAHk-=wis_qQy4oDNynNKi5b7Qhosmxtoj1jxo5wmB6SRUwQUBQ@mail.gmail.com/
https://lore.kernel.org/all/CAHk-=wgD98pmSK3ZyHk_d9kZ2bhgN6DuNZMAJaV0WTtbkf=RDw@mail.gmail.com/

> My previous rule of thumb was: fixes for regressions
> in the most recent merge window always go to current cycle, fixes for
> older regressions case-by-case.

Yeah, there are cases where waiting is the right thing, but most of the
time it's not I'd say.

Ciao, Thorsten

* Re: [PATCH] PCI/PM: Mark devices disconnected if their upstream PCIe link is down on resume
  2023-09-22  4:42   ` Mika Westerberg
@ 2023-09-22 12:59     ` Bjorn Helgaas
  2023-09-24 13:44       ` Mika Westerberg
  0 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2023-09-22 12:59 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Lukas Wunner, Mark Blakeney, Kamil Paral,
	Chris Chiu, linux-pci, Thorsten Leemhuis

[+cc Thorsten]

On Fri, Sep 22, 2023 at 07:42:37AM +0300, Mika Westerberg wrote:
> On Thu, Sep 21, 2023 at 03:19:45PM -0500, Bjorn Helgaas wrote:
> > On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> ...

> > Kamil also bisected a 60+ second resume delay to e8b908146d44
> > (https://lore.kernel.org/r/CA+cBOTeWrsTyANjLZQ=bGoBQ_yOkkV1juyRvJq-C8GOrbW6t9Q@mail.gmail.com),
> > but IIUC at
> > https://lore.kernel.org/linux-pci/20230824114300.GU3465@black.fi.intel.com/T/#u
> > you concluded that Kamil's issue was related to firmware and actually
> > had nothing to do with e8b908146d44.
> > 
> > Do you still think Kamil's issue is unrelated to e8b908146d44 and this
> > patch?  If so, how do we handle Kamil's issue?  An answer like "users
> > of v6.4+ must upgrade their Thunderbolt firmware" seems like it would
> > be kind of a nightmare for users.
> 
> It's a different issue. What happens in his system is that the link went
> down even though the dock was still connected and this should not happen
> (the firmware should bring the link up during resume). The delay was
> just a "symptom".

Do you have any leads for Kamil's issue?  If we had known that
e8b908146d44 would cause that problem, we never would have applied it
in the first place.

No OS would accept that resume delay, so there must be some way to fix
that in the OS without requiring a firmware update.

If Kamil's issue is that firmware doesn't bring up the link during
resume, how *does* the link get brought up, and what does the delay
have to do with it?

Bjorn

* Re: [PATCH] PCI/PM: Mark devices disconnected if their upstream PCIe link is down on resume
  2023-09-22 12:59     ` Bjorn Helgaas
@ 2023-09-24 13:44       ` Mika Westerberg
  0 siblings, 0 replies; 10+ messages in thread
From: Mika Westerberg @ 2023-09-24 13:44 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Lukas Wunner, Mark Blakeney, Kamil Paral,
	Chris Chiu, linux-pci, Thorsten Leemhuis

On Fri, Sep 22, 2023 at 07:59:26AM -0500, Bjorn Helgaas wrote:
> [+cc Thorsten]
> 
> On Fri, Sep 22, 2023 at 07:42:37AM +0300, Mika Westerberg wrote:
> > On Thu, Sep 21, 2023 at 03:19:45PM -0500, Bjorn Helgaas wrote:
> > > On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> > ...
> 
> > > Kamil also bisected a 60+ second resume delay to e8b908146d44
> > > (https://lore.kernel.org/r/CA+cBOTeWrsTyANjLZQ=bGoBQ_yOkkV1juyRvJq-C8GOrbW6t9Q@mail.gmail.com),
> > > but IIUC at
> > > https://lore.kernel.org/linux-pci/20230824114300.GU3465@black.fi.intel.com/T/#u
> > > you concluded that Kamil's issue was related to firmware and actually
> > > had nothing to do with e8b908146d44.
> > > 
> > > Do you still think Kamil's issue is unrelated to e8b908146d44 and this
> > > patch?  If so, how do we handle Kamil's issue?  An answer like "users
> > > of v6.4+ must upgrade their Thunderbolt firmware" seems like it would
> > > be kind of a nightmare for users.
> > 
> > It's a different issue. What happens in his system is that the link went
> > down even though the dock was still connected and this should not happen
> > (the firmware should bring the link up during resume). The delay was
> > just a "symptom".
> 
> Do you have any leads for Kamil's issue?  If we had known that
> e8b908146d44 would cause that problem, we never would have applied it
> in the first place.

I explained it in the other email I just sent. I should mention here
that the two issues are different.

> No OS would accept that resume delay, so there must be some way to fix
> that in the OS without requiring a firmware update.

It is not a "resume" delay. It is the time we wait for the device to
become ready before we decide it is not functional or disconnected. That
delay is completely arbitrary.

> If Kamil's issue is that firmware doesn't bring up the link during
> resume, how *does* the link get brought up, and what does the delay
> have to do with it?

The PCIe tunnel (the "link" above) gets established after D3cold by the
Thunderbolt firmware running inside the host controller. The trigger is
typically a call to the _PR0 ACPI method, which sends a special command
through the mailbox that makes the firmware re-connect all the tunnels
that were previously connected.

The delay we are talking about here is the delay that the PCIe spec
requires the OS to observe after a device has gone through a reset,
before it can send configuration requests to that device. Now, the PCIe
spec does not specify how long the OS should wait for a device on a link
that does not come up. We increased that delay to ~60s to fix another
issue on an xHCI controller but forgot that when the device is
deliberately unplugged we still wait for the ~60s, which is wasted effort
and just ends up annoying users.
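
The "not ready Nms after resume" values in the log earlier in this thread
(1023, 2047, ..., 65535) are consistent with a polling loop whose delay
doubles on every attempt. The following is a minimal user-space sketch of
such a loop, not the kernel implementation: the function name is
hypothetical, and the 60000 ms and 1000 ms values used below are assumptions
picked to reproduce the log lines for a device that never becomes ready.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical model of a doubling-delay readiness poll for a device that
 * never responds. Gives up once the next delay would exceed @timeout_ms
 * and returns the cumulative wait, in ms, reported at that point. Progress
 * messages are printed only once the delay exceeds @report_after_ms.
 */
static int model_wait_not_ready(int timeout_ms, int report_after_ms)
{
	int delay = 1;

	assert(timeout_ms > 0);

	for (;;) {
		/* delay - 1 equals the sum of all earlier delays (1+2+4+...). */
		if (delay > timeout_ms) {
			printf("not ready %dms after resume; giving up\n",
			       delay - 1);
			return delay - 1;
		}
		if (delay > report_after_ms)
			printf("not ready %dms after resume; waiting\n",
			       delay - 1);

		/* A real implementation would sleep for `delay` ms here. */
		delay *= 2;
	}
}
```

Calling model_wait_not_ready(60000, 1000) reports 1023 through 32767 as
"waiting" and then gives up at 65535, while model_wait_not_ready(1000, 1000)
gives up immediately at 1023. Because each delay is double the previous one,
the final attempt alone accounts for roughly half of the total wait.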

* Re: [PATCH] PCI/PM: Mark devices disconnected if their upstream PCIe link is down on resume
  2023-09-21 20:19 ` Bjorn Helgaas
  2023-09-22  4:42   ` Mika Westerberg
  2023-09-22 11:45   ` Thorsten Leemhuis
@ 2023-09-29 22:45   ` Bjorn Helgaas
  2 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2023-09-29 22:45 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Lukas Wunner, Mark Blakeney, Kamil Paral,
	Chris Chiu, linux-pci, Thorsten Leemhuis

[+cc Thorsten]

On Thu, Sep 21, 2023 at 03:19:45PM -0500, Bjorn Helgaas wrote:
> On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> > Mark Blakeney reported that when suspending a system with a Thunderbolt
> > dock connected and then unplugging the dock before resume (which is
> > a pretty normal flow with laptops), resuming takes a long time.
> > ...

> > Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
> > Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> 
> Applied with Lukas' Reviewed-by to pm for v6.7.
> 
> e8b908146d44 appeared in v6.4.  Seems like maybe a candidate for
> stable?  IIUC, resume actually does work, but takes 65+ seconds longer
> than it should?

I moved this to for-linus for v6.6 and added a stable tag for v6.4+.

Bjorn
