From: Bjorn Helgaas <helgaas@kernel.org>
To: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Lukas Wunner <lukas@wunner.de>, Kamil Paral <kparal@redhat.com>,
linux-pci@vger.kernel.org, regressions@lists.linux.dev,
bhelgaas@google.com, chris.chiu@canonical.com
Subject: Re: [REGRESSION] resume with a Thunderbolt dock broke with commit e8b908146d44 "PCI/PM: Increase wait time after resume"
Date: Wed, 27 Sep 2023 12:24:55 -0500
Message-ID: <20230927172455.GA455882@bhelgaas>
In-Reply-To: <20230927170114.GP3208943@black.fi.intel.com>

On Wed, Sep 27, 2023 at 08:01:14PM +0300, Mika Westerberg wrote:
> On Wed, Sep 27, 2023 at 11:50:36AM -0500, Bjorn Helgaas wrote:
> > On Wed, Sep 27, 2023 at 03:47:32PM +0300, Mika Westerberg wrote:
> > > On Wed, Sep 27, 2023 at 06:57:03AM -0500, Bjorn Helgaas wrote:
> > > > On Wed, Sep 27, 2023 at 08:16:02AM +0300, Mika Westerberg wrote:
> > > > > On Tue, Sep 26, 2023 at 12:55:30PM -0500, Bjorn Helgaas wrote:
> > > > > > On Mon, Sep 25, 2023 at 04:19:30PM +0200, Lukas Wunner wrote:
> > > > > > > On Mon, Sep 25, 2023 at 08:48:41AM -0500, Bjorn Helgaas wrote:
> > > > > > > > Now pciehp thinks the slot is occupied and the link is up, so we
> > > > > > > > re-enumerate the hierarchy. Is this because thunderbolt did something
> > > > > > > > to 06:00.0 that made the link from 05:01.0 come up?
> > > > > > >
> > > > > > > PCIe TLPs are encapsulated into Thunderbolt packets and transmitted
> > > > > > > alongside DisplayPort and other data over the same physical link.
> > > > > > >
> > > > > > > For this to work, PCIe tunnels need to be set up between the Thunderbolt
> > > > > > > host controller and attached devices. Once a tunnel is established,
> > > > > > > the PCIe link magically goes up and TLPs can be transmitted.
> > > > > > >
> > > > > > > There are two ways to establish those tunnels:
> > > > > > >
> > > > > > > 1/ By a firmware in the Thunderbolt host controller.
> > > > > > > (firmware or "internal" connection manager, drivers/thunderbolt/icm.c)
> > > > > > >
> > > > > > > 2/ Natively by the kernel.
> > > > > > > (software connection manager)
> > > > > > >
> > > > > > > I'm assuming that the laptop in question exclusively uses the firmware
> > > > > > > connection manager, hence the kernel is reliant on that firmware to
> > > > > > > establish tunnels and can't really do anything if it fails to do so.
> > > > > >
> > > > > > Thanks for the background; that improves my meager understanding a
> > > > > > lot.
> > > > > >
> > > > > > Since this seems to be a firmware issue, it does sound like this
> > > > > > laptop uses a firmware connection manager. But there still seems to
> > > > > > be some kernel connection because pre-e8b908146d44, the link came up
> > > > > > in <5 seconds, and after the minor e8b908146d44 change, it takes >60
> > > > > > seconds.
> > > > >
> > > > > In both cases (with or without) the commit what happens is that after
> > > > > resume is finished the firmware connection manager notices the
> > > > > connection, announces it to the Thunderbolt driver that exposes it to
> > > > > the userspace where boltd re-authorizes the device. This brings up the
> > > > > PCIe tunnel again and things get working.
> > > > >
> > > > > (What is expected to happen is that during the resume the firmware
> > > > > connection manager re-connects the PCIe tunnel.)
> > > > >
> > > > > This took previously the ~5s before resume is complete so that the above
> > > > > steps can happen where as after the commit it got delayed more up to the
> > > > > arbitrary ~60s because we started to use that with the commit
> > > > > e8b908146d44 (PCIE_RESET_READY_POLL_MS).
> > > >
> > > > Why does the kernel delay affect the timing of when the firmware
> > > > connection manager notices the connection? It seems like Linux waits
> > > > for the timeout, then Linux does something that kicks the firmware
> > > > connection manager. That's why I asked about this sequence:
> > > >
> > > > [ 118.985530] pcieport 0000:05:01.0: Data Link Layer Link Active not set in 1000 msec
> > > > [ 190.090902] pcieport 0000:05:01.0: pciehp: Slot(1): Card not present
> > > > [ 191.754347] thunderbolt 0000:06:00.0: 1: DROM version: 1
> > > > [ 191.762638] thunderbolt 0-1: new device found, vendor=0x108 device=0x1630
> > > > [ 191.762641] thunderbolt 0-1: Lenovo ThinkPad Thunderbolt 3 Dock
> > > > [ 191.943506] pcieport 0000:05:01.0: pciehp: Slot(1): Card present
> > > >
> > > > where we wait for the timeout, decide the device is gone, remove
> > > > everything, and then the thunderbolt driver does something, and we
> > > > notice the device is magically back.
> > >
> > > Well the delay delays the whole resume and this includes Thunderbolt
> > > driver resume too, and userspace (where the bolt daemon authorizes the
> > > device again).
> >
> > I don't know how the Thunderbolt driver works. I assume this refers
> > to "thunderbolt 0000:06:00.0"? Is the 06:00.0 resume related to the
> > firmware connection manager somehow?
> >
> > The removal affects the sub-hierarchy below 05:01.0 (bus 07-3b).
> > 06:00.0 is below 05:00.0, so it's in a different sub-hierarchy. I
> > don't think there's a PCIe requirement that 05:01.0 be resumed before
> > 05:00.0, or even that they be serialized at all.
> >
> > The hierarchy:
> >
> > pci 0000:00:1c.4: PCI bridge to [bus 04-3c]
> > pci 0000:04:00.0: PCI bridge to [bus 05-3c]
> > pci 0000:05:00.0: PCI bridge to [bus 06]
> > pci 0000:05:01.0: PCI bridge to [bus 07-3b]
> >
> > It looks like we start the 06:00.0 resume first (118.9), but it
> > doesn't complete until after the timeout (191.7):
> >
> > [ 118.915870] thunderbolt 0000:06:00.0: control channel starting...
> > [ 118.985530] pcieport 0000:05:01.0: Data Link Layer Link Active not set in 1000 msec
> > [ 190.090902] pcieport 0000:05:01.0: pciehp: Slot(1): Card not present
> > [ 191.754347] thunderbolt 0000:06:00.0: 1: DROM version: 1
> > [ 191.762638] thunderbolt 0-1: new device found, vendor=0x108 device=0x1630
> > [ 191.762641] thunderbolt 0-1: Lenovo ThinkPad Thunderbolt 3 Dock
> > [ 191.943506] pcieport 0000:05:01.0: pciehp: Slot(1): Card present
> >
> > Did the Thunderbolt driver do something to 06:00.0 that caused the
> > 05:01.0 link to come up, or is the timing just coincidental?
>
> Yes it sent the firmware a command telling that the driver is ready
> again, then the firmware sends back notification that there is a new
> device:
>
> [ 191.754347] thunderbolt 0000:06:00.0: 1: DROM version: 1
> [ 191.762638] thunderbolt 0-1: new device found, vendor=0x108 device=0x1630
> [ 191.762641] thunderbolt 0-1: Lenovo ThinkPad Thunderbolt 3 Dock
>
> this then is send to the userspace via uevent where bolt goes and
> authorizes it and this results the tunnel to be created which show in
> the log as:
>
> [ 191.943506] pcieport 0000:05:01.0: pciehp: Slot(1): Card present

So the obvious next question is why we have to wait for the 05:01.0
link timeout before sending the command to the 06:00.0 firmware, since
there's no PCI connection between them.
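
To put numbers on that wait, here is a minimal sketch that computes the
gap between consecutive lines of the dmesg excerpt quoted above. The
helper names (parse, gaps) are made up for illustration; only the log
lines themselves come from this thread. The largest gap, about 71.1
seconds, sits between the "Data Link Layer Link Active not set" message
and "Card not present".

```python
import re

# Log lines copied from the dmesg excerpt quoted earlier in this thread.
LOG = """\
[ 118.915870] thunderbolt 0000:06:00.0: control channel starting...
[ 118.985530] pcieport 0000:05:01.0: Data Link Layer Link Active not set in 1000 msec
[ 190.090902] pcieport 0000:05:01.0: pciehp: Slot(1): Card not present
[ 191.754347] thunderbolt 0000:06:00.0: 1: DROM version: 1
[ 191.762638] thunderbolt 0-1: new device found, vendor=0x108 device=0x1630
[ 191.762641] thunderbolt 0-1: Lenovo ThinkPad Thunderbolt 3 Dock
[ 191.943506] pcieport 0000:05:01.0: pciehp: Slot(1): Card present
"""

# Matches "[ <seconds>.<microseconds>] <message>".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(log):
    """Return a list of (timestamp_seconds, message) for well-formed lines."""
    entries = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def gaps(entries):
    """Return (delta_seconds, message) pairs, measured from the previous line."""
    return [
        (round(cur[0] - prev[0], 6), cur[1])
        for prev, cur in zip(entries, entries[1:])
    ]


if __name__ == "__main__":
    for delta, message in gaps(parse(LOG)):
        print(f"+{delta:10.6f}s  {message}")
```

This only measures the log; it says nothing about which code path is
doing the waiting.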

But there must be *some* connection between the 05:01.0 link coming up
and the 06:00.0 behavior. Maybe this is related to the
nhi_resume_noirq() comment about "The tunneled pci bridges are
siblings of us. Use resume_noirq to reenable the tunnels asap. A
corresponding pci quirk blocks the downstream bridges resume_noirq
until we are done." Unfortunately the comment doesn't mention the NAME
of the quirk, so I lost the trail there.

Maybe there's an opportunity for a quirk that says "this Thunderbolt
device should never need a whole second for the link to come up", for
example.
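
As a rough illustration of that idea only (a toy model, not kernel
code): a per-device table caps how long we poll for the link, and
everything else falls back to the long default. All names here are
hypothetical, the vendor/device pair is a placeholder rather than a real
ID, and the 1000 ms cap is an arbitrary stand-in for "never needs a
whole second". The 60000 ms default mirrors the ~60s wait mentioned
earlier in the thread.

```python
import time

# Assumption: models the ~60 s wait discussed in this thread.
DEFAULT_LINK_WAIT_MS = 60_000

# Hypothetical quirk table: (vendor, device) -> maximum link wait in ms.
# 0xFFFF/0xFFFF is a placeholder, not a real device ID.
LINK_WAIT_QUIRKS_MS = {
    (0xFFFF, 0xFFFF): 1_000,
}


def link_wait_budget_ms(vendor, device):
    """Return the quirked wait budget for a device, or the default."""
    return LINK_WAIT_QUIRKS_MS.get((vendor, device), DEFAULT_LINK_WAIT_MS)


def wait_for_link(link_is_up, budget_ms, poll_ms=100, sleep=time.sleep):
    """Poll link_is_up() until it returns True or the budget is used up.

    Returns the number of milliseconds waited, or None on timeout.
    The sleep function is injectable so the loop can be tested quickly.
    """
    waited = 0
    while True:
        if link_is_up():
            return waited
        if waited >= budget_ms:
            return None
        sleep(poll_ms / 1000)
        waited += poll_ms


if __name__ == "__main__":
    budget = link_wait_budget_ms(0xFFFF, 0xFFFF)
    # A link that never comes up gives up after the quirked budget.
    result = wait_for_link(lambda: False, budget, sleep=lambda s: None)
    print(budget, result)
```

Whether such a cap is safe for any real device is exactly the open
question; the sketch only shows the shape of the policy.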

Bjorn