From: Mika Westerberg <mika.westerberg@linux.intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Lukas Wunner <lukas@wunner.de>, Kamil Paral <kparal@redhat.com>,
linux-pci@vger.kernel.org, regressions@lists.linux.dev,
bhelgaas@google.com, chris.chiu@canonical.com
Subject: Re: [REGRESSION] resume with a Thunderbolt dock broke with commit e8b908146d44 "PCI/PM: Increase wait time after resume"
Date: Wed, 27 Sep 2023 20:01:14 +0300
Message-ID: <20230927170114.GP3208943@black.fi.intel.com>
In-Reply-To: <20230927165036.GA450933@bhelgaas>

On Wed, Sep 27, 2023 at 11:50:36AM -0500, Bjorn Helgaas wrote:
> On Wed, Sep 27, 2023 at 03:47:32PM +0300, Mika Westerberg wrote:
> > On Wed, Sep 27, 2023 at 06:57:03AM -0500, Bjorn Helgaas wrote:
> > > On Wed, Sep 27, 2023 at 08:16:02AM +0300, Mika Westerberg wrote:
> > > > On Tue, Sep 26, 2023 at 12:55:30PM -0500, Bjorn Helgaas wrote:
> > > > > On Mon, Sep 25, 2023 at 04:19:30PM +0200, Lukas Wunner wrote:
> > > > > > On Mon, Sep 25, 2023 at 08:48:41AM -0500, Bjorn Helgaas wrote:
> > > > > > > Now pciehp thinks the slot is occupied and the link is up, so we
> > > > > > > re-enumerate the hierarchy. Is this because thunderbolt did something
> > > > > > > to 06:00.0 that made the link from 05:01.0 come up?
> > > > > >
> > > > > > PCIe TLPs are encapsulated into Thunderbolt packets and transmitted
> > > > > > alongside DisplayPort and other data over the same physical link.
> > > > > >
> > > > > > For this to work, PCIe tunnels need to be set up between the Thunderbolt
> > > > > > host controller and attached devices. Once a tunnel is established,
> > > > > > the PCIe link magically goes up and TLPs can be transmitted.
> > > > > >
> > > > > > There are two ways to establish those tunnels:
> > > > > >
> > > > > > 1/ By a firmware in the Thunderbolt host controller.
> > > > > > (firmware or "internal" connection manager, drivers/thunderbolt/icm.c)
> > > > > >
> > > > > > 2/ Natively by the kernel.
> > > > > > (software connection manager)
> > > > > >
> > > > > > I'm assuming that the laptop in question exclusively uses the firmware
> > > > > > connection manager, hence the kernel is reliant on that firmware to
> > > > > > establish tunnels and can't really do anything if it fails to do so.
> > > > >
> > > > > Thanks for the background; that improves my meager understanding a
> > > > > lot.
> > > > >
> > > > > Since this seems to be a firmware issue, it does sound like this
> > > > > laptop uses a firmware connection manager. But there still seems to
> > > > > be some kernel connection because pre-e8b908146d44, the link came up
> > > > > in <5 seconds, and after the minor e8b908146d44 change, it takes >60
> > > > > seconds.
> > > >
> > > > In both cases (with or without) the commit what happens is that after
> > > > resume is finished the firmware connection manager notices the
> > > > connection, announces it to the Thunderbolt driver that exposes it to
> > > > the userspace where boltd re-authorizes the device. This brings up the
> > > > PCIe tunnel again and things get working.
> > > >
> > > > (What is expected to happen is that during the resume the firmware
> > > > connection manager re-connects the PCIe tunnel.)
> > > >
> > > > This took previously the ~5s before resume is complete so that the above
> > > > steps can happen where as after the commit it got delayed more up to the
> > > > arbitrary ~60s because we started to use that with the commit
> > > > e8b908146d44 (PCIE_RESET_READY_POLL_MS).
> > >
> > > Why does the kernel delay affect the timing of when the firmware
> > > connection manager notices the connection? It seems like Linux waits
> > > for the timeout, then Linux does something that kicks the firmware
> > > connection manager. That's why I asked about this sequence:
> > >
> > > [ 118.985530] pcieport 0000:05:01.0: Data Link Layer Link Active not set in 1000 msec
> > > [ 190.090902] pcieport 0000:05:01.0: pciehp: Slot(1): Card not present
> > > [ 191.754347] thunderbolt 0000:06:00.0: 1: DROM version: 1
> > > [ 191.762638] thunderbolt 0-1: new device found, vendor=0x108 device=0x1630
> > > [ 191.762641] thunderbolt 0-1: Lenovo ThinkPad Thunderbolt 3 Dock
> > > [ 191.943506] pcieport 0000:05:01.0: pciehp: Slot(1): Card present
> > >
> > > where we wait for the timeout, decide the device is gone, remove
> > > everything, and then the thunderbolt driver does something, and we
> > > notice the device is magically back.
> >
> > Well the delay delays the whole resume and this includes Thunderbolt
> > driver resume too, and userspace (where the bolt daemon authorizes the
> > device again).
>
> I don't know how the Thunderbolt driver works. I assume this refers
> to "thunderbolt 0000:06:00.0"? Is the 06:00.0 resume related to the
> firmware connection manager somehow?
>
> The removal affects the sub-hierarchy below 05:01.0 (bus 07-3b).
> 06:00.0 is below 05:00.0, so it's in a different sub-hierarchy. I
> don't think there's a PCIe requirement that 05:01.0 be resumed before
> 05:00.0, or even that they be serialized at all.
>
> The hierarchy:
>
> pci 0000:00:1c.4: PCI bridge to [bus 04-3c]
> pci 0000:04:00.0: PCI bridge to [bus 05-3c]
> pci 0000:05:00.0: PCI bridge to [bus 06]
> pci 0000:05:01.0: PCI bridge to [bus 07-3b]
>
> It looks like we start the 06:00.0 resume first (118.9), but it
> doesn't complete until after the timeout (191.7):
>
> [ 118.915870] thunderbolt 0000:06:00.0: control channel starting...
> [ 118.985530] pcieport 0000:05:01.0: Data Link Layer Link Active not set in 1000 msec
> [ 190.090902] pcieport 0000:05:01.0: pciehp: Slot(1): Card not present
> [ 191.754347] thunderbolt 0000:06:00.0: 1: DROM version: 1
> [ 191.762638] thunderbolt 0-1: new device found, vendor=0x108 device=0x1630
> [ 191.762641] thunderbolt 0-1: Lenovo ThinkPad Thunderbolt 3 Dock
> [ 191.943506] pcieport 0000:05:01.0: pciehp: Slot(1): Card present
>
> Did the Thunderbolt driver do something to 06:00.0 that caused the
> 05:01.0 link to come up, or is the timing just coincidental?

Yes, it sent the firmware a command saying that the driver is ready
again, and the firmware then sent back a notification that there is a
new device:

[ 191.754347] thunderbolt 0000:06:00.0: 1: DROM version: 1
[ 191.762638] thunderbolt 0-1: new device found, vendor=0x108 device=0x1630
[ 191.762641] thunderbolt 0-1: Lenovo ThinkPad Thunderbolt 3 Dock

This is then sent to userspace via a uevent, where bolt authorizes the
device. That results in the tunnel being created, which shows up in the
log as:

[ 191.943506] pcieport 0000:05:01.0: pciehp: Slot(1): Card present
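
To put numbers on the sequence above, here is a minimal Python sketch that computes the gaps between the log lines quoted in this thread. The helper names (parse, first_time, gap) are made up for this illustration and are not part of any kernel or bolt tooling; the only input is the dmesg excerpt already shown. It suggests the port waited roughly 71 seconds before giving up, while the tunnel came up well under a second after the dock was announced.

```python
import re

# dmesg excerpt copied from the messages quoted in this thread.
LOG = """\
[ 118.915870] thunderbolt 0000:06:00.0: control channel starting...
[ 118.985530] pcieport 0000:05:01.0: Data Link Layer Link Active not set in 1000 msec
[ 190.090902] pcieport 0000:05:01.0: pciehp: Slot(1): Card not present
[ 191.754347] thunderbolt 0000:06:00.0: 1: DROM version: 1
[ 191.762638] thunderbolt 0-1: new device found, vendor=0x108 device=0x1630
[ 191.762641] thunderbolt 0-1: Lenovo ThinkPad Thunderbolt 3 Dock
[ 191.943506] pcieport 0000:05:01.0: pciehp: Slot(1): Card present
"""

# "[ <seconds>] <message>", with optional padding inside the brackets.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(log):
    """Return a list of (timestamp_seconds, message) tuples."""
    events = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def first_time(events, needle):
    """Timestamp of the first message containing `needle`."""
    for timestamp, message in events:
        if needle in message:
            return timestamp
    raise ValueError(f"no log line contains {needle!r}")


def gap(events, start, end):
    """Seconds elapsed between the first `start` and first `end` message."""
    return first_time(events, end) - first_time(events, start)


EVENTS = parse(LOG)

# Time spent waiting on 05:01.0 before pciehp declared the slot empty.
wait_s = gap(EVENTS, "Link Active not set", "Card not present")
# Time from the firmware announcing the dock to the tunnel being up.
tunnel_s = gap(EVENTS, "new device found", "Card present")

print(f"wait before 'Card not present': {wait_s:.3f} s")
print(f"'new device found' to 'Card present': {tunnel_s:.3f} s")
```

The second gap covers the uevent, the bolt authorization, and the tunnel setup together; the log alone does not say how that time splits between them.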