From: Mika Westerberg <mika.westerberg@linux.intel.com>
To: Mario Limonciello <mario.limonciello@amd.com>
Cc: "Bjorn Helgaas" <helgaas@kernel.org>, "Gary Li" <Gary.Li@amd.com>,
"Mario Limonciello" <superm1@kernel.org>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Mathias Nyman" <mathias.nyman@intel.com>,
"open list : PCI SUBSYSTEM" <linux-pci@vger.kernel.org>,
"open list" <linux-kernel@vger.kernel.org>,
"open list : USB XHCI DRIVER" <linux-usb@vger.kernel.org>,
"Daniel Drake" <drake@endlessos.org>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
Subject: Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
Date: Thu, 5 Sep 2024 12:33:25 +0300
Message-ID: <20240905093325.GJ1532424@black.fi.intel.com>
In-Reply-To: <2bf715fb-509b-4b00-a28d-1cc83c0bb588@amd.com>
Hi,
On Wed, Sep 04, 2024 at 10:24:26AM -0500, Mario Limonciello wrote:
> On 9/4/2024 07:05, Mika Westerberg wrote:
> > Hi,
> >
> > On Tue, Sep 03, 2024 at 01:32:30PM -0500, Mario Limonciello wrote:
> > > On 9/3/2024 13:25, Bjorn Helgaas wrote:
> > > > On Tue, Sep 03, 2024 at 12:31:00PM -0500, Mario Limonciello wrote:
> > > > > On 9/3/2024 12:11, Bjorn Helgaas wrote:
> > > > > ...
> > > >
> > > > > > 8) The USB4 stack sees the device and assumes it is in D0, but it
> > > > > > seems to still be in D3cold. What is this based on? Is there a
> > > > > > config read that returns ~0 data when it shouldn't?
> > > > >
> > > > > Yes there is. From earlier in the thread I have a [log] I shared.
> > > > >
> > > > > The message emitted is from ring_interrupt_active():
> > > > >
> > > > > "thunderbolt 0000:e5:00.5: interrupt for TX ring 0 is already enabled"
> > > >
> > > > Right, that's in the cover letter, but I can't tell from this what the
> > > > ioread32(ring->nhi->iobase + reg) returned. It looks like this is an
> > > > MMIO read of BAR 0, not a config read.
> > > >
> > >
> > > Yeah. I suppose another way to approach this problem is to make something
> > > else in the call chain poll PCI_PM_CTRL.
> > >
> > > Polling at the start of nhi_runtime_resume() should also work. For the
> > > "normal" scenario it would just be a single read to PCI_PM_CTRL.
> > >
> > > Mika, thoughts?
>
> We did this experiment to throw code to poll PCI_PM_CTRL at the start of
> nhi_runtime_resume() but this also fails. From that I would hypothesize the
> device transitioned to D0uninitialized sometime in the middle of
> pci_pm_runtime_resume() before the call to pm->runtime_resume(dev);
>
> >
> > I'm starting to wonder if we are looking at the correct place ;-) This
> > reminds me that our PCIe SV people recently reported a couple of Linux
> > related issues which they recommended to fix, and these are on my list
> > but I'll share them because maybe they are related?
>
> Thanks for sharing those. We had a try with them but sorry to say no
> improvements to the issue at hand.
Okay, thanks for checking.
A few additional side paths here, though. This is supposed to work so that
once the host router sleep ready bit is set, the driver allows the domain
to enter sleep (i.e. it should not be woken up before it has fully
transitioned). That's what we do:
1. All tunneled PCIe Root/Downstream ports are in D3.
2. All tunneled USB 3.x ports are in U3.
3. No DisplayPort is tunneled.
4. Thunderbolt driver enables wakes.
5. Thunderbolt driver writes sleep ready bit of the host router.
6. Thunderbolt driver runtime suspend is complete.
7. The ACPI method (_PS3 or _PR3.OFF) is called, which triggers the "Sleep
Event".
If a device is connected between steps 5 and 7, it should not "abort" the
sequence. Unfortunately this is not explicit in the USB4 spec, but the
connection manager guide has a similar note. Even if the connect happens
there, the "Sleep Event" should still happen, and only after that can the
connect trigger a normal wakeup, which then brings everything back.
Would it be possible to enable tracing around these steps so that we
could see if there is a hotplug notification somewhere there that is not
expected? Here are instructions on how to get a pretty accurate trace:
https://github.com/intel/tbtools?tab=readme-ov-file#tracing
Please also capture the full dmesg.
It is entirely possible that this has nothing to do with the issue but I
think it is worth checking.
The second thing we could try is to check the wake status bits after
this has happened, like:
# tbdump -r 0 -a <ADAPTER> -vv -N 1 PORT_CS_18
(where <ADAPTER> is the lane 0 adapter of the USB4 port the device was
connected to).
The third thing to try is to comment out TB_WAKE_ON_CONNECT in
tb_switch_suspend(). This should result in no wake even if the device is
connected. If so, that tells us it really was the connect on the USB4 port
that triggered the wake.
These could (also) explain why the host router appears to be in D3 even
if it should be in D0 already.