linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Kamil Paral <kparal@redhat.com>
Cc: linux-pci@vger.kernel.org, regressions@lists.linux.dev,
	mika.westerberg@linux.intel.com, bhelgaas@google.com,
	chris.chiu@canonical.com
Subject: Re: [REGRESSION] resume with a Thunderbolt dock broke with commit e8b908146d44 "PCI/PM: Increase wait time after resume"
Date: Mon, 21 Aug 2023 14:10:55 -0500	[thread overview]
Message-ID: <20230821191055.GA362994@bhelgaas> (raw)
In-Reply-To: <CA+cBOTeWrsTyANjLZQ=bGoBQ_yOkkV1juyRvJq-C8GOrbW6t9Q@mail.gmail.com>

On Mon, Aug 21, 2023 at 12:39:02PM +0200, Kamil Paral wrote:
> = Summary =
> A Thinkpad T480s laptop with a Thinkpad Thunderbolt 3 Dock connected
> can no longer resume from suspend. The problem was introduced in
> e8b908146d44 "PCI/PM: Increase wait time after resume".
> 
> = Detailed description =
> When running a kernel containing the identified commit and trying to
> resume the laptop from sleep, the laptop's power light changes from
> blinking (sleep state) to shining (running state), but the display
> stays black and it doesn't respond to any keyboard input, nor to
> ping/ssh, and no logs are written to the disk (which means I don't
> know how to gather error logs). It needs to be force-rebooted. I
> bisected the kernel and identified the commit which causes this
> behavior. I used the vanilla kernel with a Fedora kernel config.
> 
> The reproducer is:
> 1. Connect the dock to the laptop.
> 2. Boot the laptop (in my case, to the gdm).
> 3. Suspend the laptop.
> 4. Resume the laptop. This is successful before the identified commit
> (the last tested good commit was cc8a983d0fce), and unsuccessful
> (black screen, frozen system) after the identified commit
> (e8b908146d44).
> 
> The reproducibility is 100%, I tested it many many times in a row.
> 
> When the dock is unplugged, suspend and resume works as expected. When
> I connect a different laptop to the dock (Thinkpad P1 gen3), I don't
> see any resume failure. So this is somehow related to the particular
> combination of Thinkpad T480s and Thinkpad Thunderbolt 3 Dock. The
> dock is running the latest firmware.

Thanks very much for the report.

Wow, this is super interesting.  e8b908146d44 literally just increases
a timeout; the complete patch is:

   static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
   {
  -       pci_bridge_wait_for_secondary_bus(pci_dev, "resume", PCI_RESET_WAIT);
  +       pci_bridge_wait_for_secondary_bus(pci_dev, "resume",
  +                                         PCIE_RESET_READY_POLL_MS);

Increasing a timeout should never cause a failure like this, so
there must be something really unexpected going on.

> I also tested "pcie_aspm=off",
> and that allows the laptop to resume properly, but then there's a race
> condition whether devices on the dock are visible to the OS or not
> after resume, so this is not useful even just as a workaround.

Wow, also shocking.  I see you have lspci output attached to the RH
bugzilla, but it doesn't include the details.  Would you mind
collecting the output of "sudo lspci -vv" both with and without
"pcie_aspm=off"?  No need to try suspend/resume to collect these.

Also, what does this race condition look like?  Dock devices are
visible before suspend, but sometimes none of them are visible *after*
resume?  We don't re-enumerate on resume, so does this mean they still
appear in lspci output but they just don't work?

If you can collect the complete dmesg log and "sudo lspci -vv" output
after a resume when the dock devices aren't visible, maybe there would
be a clue there.

> I already created a downstream Fedora bug report in Red Hat Bugzilla:
> https://bugzilla.redhat.com/show_bug.cgi?id=2230357
> 
> lspci of the laptop:
> https://bugzilla-attachments.redhat.com/attachment.cgi?id=1982541
> git bisect log:
> https://bugzilla-attachments.redhat.com/attachment.cgi?id=1983351
> 
> 
> The commit which broke resume is the following:
> 
> e8b908146d44310473e43b3382eca126e12d279c is the first bad commit
> commit e8b908146d44310473e43b3382eca126e12d279c
> Author: Mika Westerberg <mika.westerberg.com>
> Date:   Tue Apr 4 08:27:13 2023 +0300
> 
>     PCI/PM: Increase wait time after resume
> 
>     PCIe r6.0 sec 6.6.1 prescribes that a device must be able to respond to
>     config requests within 1.0 s (PCI_RESET_WAIT) after exiting conventional
>     reset and this same delay is prescribed when coming out of D3cold (as that
>     involves reset too).
> 
>     A device that requires more than 1 second to initialize after reset may
>     respond to config requests with Request Retry Status completions (sec
>     2.3.1), and we accommodate that in Linux with a 60 second cap
>     (PCIE_RESET_READY_POLL_MS).
> 
>     Previously we waited up to PCIE_RESET_READY_POLL_MS only in the reset code
>     path, not in the resume path.  However, a device has surfaced, namely Intel
>     Titan Ridge xHCI, which requires a longer delay also in the resume code
>     path.
> 
>     Make the resume code path to use this same extended delay as the reset
>     path.
> 
>     Link: https://bugzilla.kernel.org/show_bug.cgi?id=216728
>     Link: https://lore.kernel.org/r/20230404052714.51315-2-mika.westerberg@linux.intel.com
>     Reported-by: Chris Chiu <chris.chiu>
>     Signed-off-by: Mika Westerberg <mika.westerberg.com>
>     Signed-off-by: Bjorn Helgaas <bhelgaas>
>     Cc: Lukas Wunner <lukas>
> 
>  drivers/pci/pci-driver.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> 
> I'm happy to add further details, perform additional debugging, or
> test some experimental patches in order to resolve this regression.
> Please CC me in your replies, I'm not subscribed to this list. Thank
> you!
> Kamil Páral
> 
> 
> #regzbot introduced: e8b908146d44
> 

  parent reply	other threads:[~2023-08-21 19:11 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-21 10:39 [REGRESSION] resume with a Thunderbolt dock broke with commit e8b908146d44 "PCI/PM: Increase wait time after resume" Kamil Paral
2023-08-21 13:12 ` Mika Westerberg
2023-08-22 16:43   ` Kamil Paral
2023-08-23  5:07     ` Mika Westerberg
2023-08-23  7:00       ` Kamil Paral
2023-08-23  7:44         ` Mika Westerberg
2023-08-23  7:56           ` Mika Westerberg
2023-08-23  8:20             ` Kamil Paral
2023-08-23  9:05               ` Mika Westerberg
2023-08-23 14:02                 ` Kamil Paral
2023-08-24 11:43                   ` Mika Westerberg
2023-08-25  8:42                     ` Kamil Paral
2023-08-25  9:46                       ` Mika Westerberg
2023-08-25 11:42                         ` Kamil Paral
2023-09-23 22:46                       ` Bjorn Helgaas
2023-09-24 13:27                         ` Mika Westerberg
2023-09-24 20:18                           ` Bjorn Helgaas
2023-09-25  4:59                             ` Mika Westerberg
2023-09-25 13:48                               ` Bjorn Helgaas
2023-09-25 14:19                                 ` Lukas Wunner
2023-09-26 17:55                                   ` Bjorn Helgaas
2023-09-27  5:16                                     ` Mika Westerberg
2023-09-27 11:57                                       ` Bjorn Helgaas
2023-09-27 12:47                                         ` Mika Westerberg
2023-09-27 14:31                                           ` Lukas Wunner
2023-09-27 14:42                                             ` Mika Westerberg
2023-09-27 15:36                                               ` Mika Westerberg
2023-09-27 16:50                                           ` Bjorn Helgaas
2023-09-27 17:01                                             ` Mika Westerberg
2023-09-27 17:24                                               ` Bjorn Helgaas
2023-09-27 18:02                                                 ` Mika Westerberg
2023-09-27 19:41                                                   ` Bjorn Helgaas
2023-09-28  4:42                                                     ` Mika Westerberg
2023-09-28 15:49                                                       ` Bjorn Helgaas
2023-10-05 13:01                                                         ` Kamil Paral
2023-10-05 19:00                                                           ` Bjorn Helgaas
     [not found]                                       ` <CA+cBOTds9k1Q2haC_gTpsUvjP02dHOv9vSconFEAu-Fsxwf36A@mail.gmail.com>
2023-09-27 13:53                                         ` Mika Westerberg
2023-09-27 14:12                                           ` Kamil Paral
2023-10-05 12:54                                             ` Kamil Paral
2023-10-05 13:09                                               ` Mika Westerberg
2023-09-27 14:08                                         ` Kamil Paral
2023-08-21 19:10 ` Bjorn Helgaas [this message]
2023-08-22 16:36   ` Kamil Paral
2023-11-01 10:59 ` Linux regression tracking (Thorsten Leemhuis)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230821191055.GA362994@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=chris.chiu@canonical.com \
    --cc=kparal@redhat.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=mika.westerberg@linux.intel.com \
    --cc=regressions@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).