linux-usb.vger.kernel.org archive mirror
* [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
@ 2024-10-02 17:42 Harry Wentland
  2024-10-02 19:39 ` Mario Limonciello
  2024-10-03  5:47 ` Mika Westerberg
  0 siblings, 2 replies; 21+ messages in thread
From: Harry Wentland @ 2024-10-02 17:42 UTC
  To: linux-usb, mathias.nyman
  Cc: regressions, Limonciello, Mario, mika.westerberg, Raju.Rangoju,
	Sanath.S, Greg KH

I was checking out the 6.12-rc1 (through drm-next) kernel and found
my system hung at boot. No meaningful message showed on the kernel
boot screen.

A bisect revealed the culprit to be

commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
Author: Mathias Nyman <mathias.nyman@linux.intel.com>
Date:   Fri Aug 30 18:26:29 2024 +0300

    usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface

A revert of this single patch "fixes" the issue and I can boot again.
    
The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
It's running Arch Linux but I doubt that's of consequence.

lspci output:
    https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
acpidump:
    https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089

Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
Another suggestion to do usbcore.nousb lets me boot to the desktop
on a kernel with the faulty patch, without USB functionality, obviously.

I'd be happy to try any patches, provide more data, or run experiments.

Thanks,
Harry

* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-02 17:42 [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface Harry Wentland
@ 2024-10-02 19:39 ` Mario Limonciello
  2024-10-03  5:47 ` Mika Westerberg
  1 sibling, 0 replies; 21+ messages in thread
From: Mario Limonciello @ 2024-10-02 19:39 UTC
  To: Harry Wentland, linux-usb, mathias.nyman
  Cc: regressions, mika.westerberg, Raju.Rangoju, Sanath.S, Greg KH

On 10/2/2024 12:42, Harry Wentland wrote:
> I was checking out the 6.12 rc1 (through drm-next) kernel and found
> my system hung at boot. No meaningful message showed on the kernel
> boot screen.
> 
> A bisect revealed the culprit to be
> 
> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
> Author: Mathias Nyman <mathias.nyman@linux.intel.com>
> Date:   Fri Aug 30 18:26:29 2024 +0300
> 
>      usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
> 
> A revert of this single patch "fixes" the issue and I can boot again.
>      
> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
> It's running Arch Linux but I doubt that's of consequence.
> 
> lspci output:
>      https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
> acpidump:
>      https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
> 
> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
> Another suggestion to do usbcore.nousb lets me boot to the desktop
> on a kernel with the faulty patch, without USB functionality, obviously.
> 
> I'd be happy to try any patches, provide more data, or run experiments.
> 
> Thanks,
> Harry

FWIW I did take another Lenovo laptop (Z13) w/ a CPU from the same 
generation (Ryzen 5 PRO 6650U) and 6.12-rc1 but can't reproduce this issue.

* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-02 17:42 [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface Harry Wentland
  2024-10-02 19:39 ` Mario Limonciello
@ 2024-10-03  5:47 ` Mika Westerberg
  2024-10-03 13:10   ` Mario Limonciello
  1 sibling, 1 reply; 21+ messages in thread
From: Mika Westerberg @ 2024-10-03  5:47 UTC
  To: Harry Wentland
  Cc: linux-usb, mathias.nyman, regressions, Limonciello, Mario,
	Raju.Rangoju, Sanath.S, Greg KH

Hi Harry,

On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
> I was checking out the 6.12 rc1 (through drm-next) kernel and found
> my system hung at boot. No meaningful message showed on the kernel
> boot screen.
> 
> A bisect revealed the culprit to be
> 
> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
> Author: Mathias Nyman <mathias.nyman@linux.intel.com>
> Date:   Fri Aug 30 18:26:29 2024 +0300
> 
>     usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
> 
> A revert of this single patch "fixes" the issue and I can boot again.
>     
> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
> It's running Arch Linux but I doubt that's of consequence.
> 
> lspci output:
>     https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
> acpidump:
>     https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
> 
> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
> Another suggestion to do usbcore.nousb lets me boot to the desktop
> on a kernel with the faulty patch, without USB functionality, obviously.
> 
> I'd be happy to try any patches, provide more data, or run experiments.

Do you boot with any device connected?

Second thing that I noticed, though I'm not familiar with AMD hardware,
but from your lspci dump, I do not see the PCIe ports that are being
used to tunnel PCIe. Does this system have PCIe tunneling disabled
somehow?

You don't see anything on the console? It's all blank or it just hangs
after some messages?

Can you also provide full dmesg with that commit reverted with
"thunderbolt.dyndbg=+p" in the kernel command line?

* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-03  5:47 ` Mika Westerberg
@ 2024-10-03 13:10   ` Mario Limonciello
  2024-10-03 13:27     ` Mika Westerberg
  0 siblings, 1 reply; 21+ messages in thread
From: Mario Limonciello @ 2024-10-03 13:10 UTC
  To: Mika Westerberg, Harry Wentland
  Cc: linux-usb, mathias.nyman, regressions, Raju.Rangoju, Sanath.S,
	Greg KH

On 10/3/2024 00:47, Mika Westerberg wrote:
> Hi Harry,
> 
> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>> my system hung at boot. No meaningful message showed on the kernel
>> boot screen.
>>
>> A bisect revealed the culprit to be
>>
>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>> Author: Mathias Nyman <mathias.nyman@linux.intel.com>
>> Date:   Fri Aug 30 18:26:29 2024 +0300
>>
>>      usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>
>> A revert of this single patch "fixes" the issue and I can boot again.
>>      
>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>> It's running Arch Linux but I doubt that's of consequence.
>>
>> lspci output:
>>      https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>> acpidump:
>>      https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>
>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>> on a kernel with the faulty patch, without USB functionality, obviously.
>>
>> I'd be happy to try any patches, provide more data, or run experiments.
> 
> Do you boot with any device connected?
>
> Second thing that I noticed, though I'm not familiar with AMD hardware,
> but from your lspci dump, I do not see the PCIe ports that are being
> used to tunnel PCIe. Does this system have PCIe tunneling disabled
> somehow?

On some OEM systems it's possible to lock down from BIOS to turn off 
PCIe tunneling, and I agree that looks like the most common cause.

This is what you would see on a system that has tunnels (I checked on my 
side w/ Z series laptop w/ Rembrandt and a dock connected):

            +-03.0
            +-03.1-[03-32]--
            +-04.0
            +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
            |                               \-04.0-[36-62]--

00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 
17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h 
USB4/Thunderbolt PCIe tunnel [1022:14cd]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 
17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h 
USB4/Thunderbolt PCIe tunnel [1022:14cd]

> 
> You don't see anything on the console? It's all blank or it just hangs
> after some messages?

I guess it is getting stuck on fwnode_find_reference() because it never 
finds the given node?

> 
> Can you also provide full dmesg with that commit reverted with
> "thunderbolt.dyndbg=+p" in the kernel command line?


* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-03 13:10   ` Mario Limonciello
@ 2024-10-03 13:27     ` Mika Westerberg
  2024-10-03 13:42       ` Mario Limonciello
  0 siblings, 1 reply; 21+ messages in thread
From: Mika Westerberg @ 2024-10-03 13:27 UTC
  To: Mario Limonciello
  Cc: Harry Wentland, linux-usb, mathias.nyman, regressions,
	Raju.Rangoju, Sanath.S, Greg KH

On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
> On 10/3/2024 00:47, Mika Westerberg wrote:
> > Hi Harry,
> > 
> > On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
> > > I was checking out the 6.12 rc1 (through drm-next) kernel and found
> > > my system hung at boot. No meaningful message showed on the kernel
> > > boot screen.
> > > 
> > > A bisect revealed the culprit to be
> > > 
> > > commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
> > > Author: Mathias Nyman <mathias.nyman@linux.intel.com>
> > > Date:   Fri Aug 30 18:26:29 2024 +0300
> > > 
> > >      usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
> > > 
> > > A revert of this single patch "fixes" the issue and I can boot again.
> > > The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
> > > It's running Arch Linux but I doubt that's of consequence.
> > > 
> > > lspci output:
> > >      https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
> > > acpidump:
> > >      https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
> > > 
> > > Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
> > > Another suggestion to do usbcore.nousb lets me boot to the desktop
> > > on a kernel with the faulty patch, without USB functionality, obviously.
> > > 
> > > I'd be happy to try any patches, provide more data, or run experiments.
> > 
> > Do you boot with any device connected?
> >
> > Second thing that I noticed, though I'm not familiar with AMD hardware,
> > but from your lspci dump, I do not see the PCIe ports that are being
> > used to tunnel PCIe. Does this system have PCIe tunneling disabled
> > somehow?
> 
> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
> tunneling, and I agree that looks like the most common cause.
> 
> This is what you would see on a system that has tunnels (I checked on my
> side w/ Z series laptop w/ Rembrandt and a dock connected):
> 
>            +-03.0
>            +-03.1-[03-32]--
>            +-04.0
>            +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>            |                               \-04.0-[36-62]--
> 
> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
> USB4/Thunderbolt PCIe tunnel [1022:14cd]
> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
> USB4/Thunderbolt PCIe tunnel [1022:14cd]

Okay this is more like what I expected, although probably not the
reason here.

Are you able to replicate the issue if you disable PCIe tunneling from
the BIOS on your reference system? (Probably not but just in case).

> > You don't see anything on the console? It's all blank or it just hangs
> > after some messages?
> 
> I guess it is getting stuck on fwnode_find_reference() because it never
> finds the given node?

Looking at the code, I don't see where it could get stuck. If for some
reason there is no such reference (there is based on the ACPI dump) then
it should not affect the boot. It only matters when power management is
involved.
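
The "missing reference should not affect the boot" argument rests on the kernel's error-pointer convention: a lookup that fails returns an errno encoded in the pointer value, and the caller bails out via IS_ERR(). Below is a minimal userspace sketch of that control flow. The helpers imitate linux/err.h; `fake_fwnode`, `fake_find_reference()` and `fake_add_devlink()` are hypothetical stand-ins, and the -ENOENT return for a missing reference is an assumption made for illustration, not taken from the kernel source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitation of the kernel's error-pointer helpers (linux/err.h). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a firmware node. */
struct fake_fwnode {
	const char *name;
};

static struct fake_fwnode fake_nhi = { "nhi0" };

/* Hypothetical stand-in for the reference lookup; -ENOENT is an assumption. */
static struct fake_fwnode *fake_find_reference(int has_reference)
{
	if (!has_reference)
		return ERR_PTR(-ENOENT);
	return &fake_nhi;
}

/* Mirrors the early-return shape: no reference means no link and no error. */
static int fake_add_devlink(int has_reference, int *links_created)
{
	struct fake_fwnode *nhi;

	assert(links_created != NULL);

	nhi = fake_find_reference(has_reference);
	if (IS_ERR(nhi))
		return 0;	/* nothing to link against; not treated as a failure */

	(*links_created)++;
	return 0;
}
```

In this sketch both paths return 0, so a caller cannot tell "no reference" from "link created" by the return value alone; only the side effect differs.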

* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-03 13:27     ` Mika Westerberg
@ 2024-10-03 13:42       ` Mario Limonciello
  2024-10-03 13:47         ` Mika Westerberg
  0 siblings, 1 reply; 21+ messages in thread
From: Mario Limonciello @ 2024-10-03 13:42 UTC
  To: Mika Westerberg
  Cc: Harry Wentland, linux-usb, mathias.nyman, regressions,
	Raju.Rangoju, Sanath.S, Greg KH

On 10/3/2024 08:27, Mika Westerberg wrote:
> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>> Hi Harry,
>>>
>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>> my system hung at boot. No meaningful message showed on the kernel
>>>> boot screen.
>>>>
>>>> A bisect revealed the culprit to be
>>>>
>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>> Author: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>> Date:   Fri Aug 30 18:26:29 2024 +0300
>>>>
>>>>       usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>
>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>
>>>> lspci output:
>>>>       https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>> acpidump:
>>>>       https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>
>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>
>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>
>>> Do you boot with any device connected?
>>>
>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>> but from your lspci dump, I do not see the PCIe ports that are being
>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>> somehow?
>>
>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>> tunneling, and I agree that looks like the most common cause.
>>
>> This is what you would see on a system that has tunnels (I checked on my
>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>
>>             +-03.0
>>             +-03.1-[03-32]--
>>             +-04.0
>>             +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>             |                               \-04.0-[36-62]--
>>
>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
> 
> Okay this is more like what I expected, although probably not the
> reason here.
> 
> Are you able to replicate the issue if you disable PCIe tunneling from
> the BIOS on your reference system? (Probably not but just in case).

I checked on the Lenovo Z13 laptop I have and turned off "USB port" in 
BIOS setup and this caused the endpoints 3.1 and 4.1 I listed above to 
disappear but the system still boots up just fine for me on 6.12-rc1.

> 
>>> You don't see anything on the console? It's all blank or it just hangs
>>> after some messages?
>>
>> I guess it is getting stuck on fwnode_find_reference() because it never
>> finds the given node?
> 
> Looking at the code, I don't see where it could get stuck. If for some
> reason there is no such reference (there is based on the ACPI dump) then
> it should not affect the boot. It only matters when power management is
> involved.

Nothing jumps out to me either.  Maybe this is a situation that Harry 
can sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to 
enlighten what's going on (assuming the console output is "working" when 
this happened).


* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-03 13:42       ` Mario Limonciello
@ 2024-10-03 13:47         ` Mika Westerberg
  2024-10-03 18:23           ` Harry Wentland
  2024-10-09 21:52           ` Mathias Nyman
  0 siblings, 2 replies; 21+ messages in thread
From: Mika Westerberg @ 2024-10-03 13:47 UTC
  To: Mario Limonciello
  Cc: Harry Wentland, linux-usb, mathias.nyman, regressions,
	Raju.Rangoju, Sanath.S, Greg KH

On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
> On 10/3/2024 08:27, Mika Westerberg wrote:
> > On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
> > > On 10/3/2024 00:47, Mika Westerberg wrote:
> > > > Hi Harry,
> > > > 
> > > > On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
> > > > > I was checking out the 6.12 rc1 (through drm-next) kernel and found
> > > > > my system hung at boot. No meaningful message showed on the kernel
> > > > > boot screen.
> > > > > 
> > > > > A bisect revealed the culprit to be
> > > > > 
> > > > > commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
> > > > > Author: Mathias Nyman <mathias.nyman@linux.intel.com>
> > > > > Date:   Fri Aug 30 18:26:29 2024 +0300
> > > > > 
> > > > >       usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
> > > > > 
> > > > > A revert of this single patch "fixes" the issue and I can boot again.
> > > > > The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
> > > > > It's running Arch Linux but I doubt that's of consequence.
> > > > > 
> > > > > lspci output:
> > > > >       https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
> > > > > acpidump:
> > > > >       https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
> > > > > 
> > > > > Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
> > > > > Another suggestion to do usbcore.nousb lets me boot to the desktop
> > > > > on a kernel with the faulty patch, without USB functionality, obviously.
> > > > > 
> > > > > I'd be happy to try any patches, provide more data, or run experiments.
> > > > 
> > > > Do you boot with any device connected?
> > > >
> > > > Second thing that I noticed, though I'm not familiar with AMD hardware,
> > > > but from your lspci dump, I do not see the PCIe ports that are being
> > > > used to tunnel PCIe. Does this system have PCIe tunneling disabled
> > > > somehow?
> > > 
> > > On some OEM systems it's possible to lock down from BIOS to turn off PCIe
> > > tunneling, and I agree that looks like the most common cause.
> > > 
> > > This is what you would see on a system that has tunnels (I checked on my
> > > side w/ Z series laptop w/ Rembrandt and a dock connected):
> > > 
> > >             +-03.0
> > >             +-03.1-[03-32]--
> > >             +-04.0
> > >             +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
> > >             |                               \-04.0-[36-62]--
> > > 
> > > 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
> > > 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
> > > 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
> > > USB4/Thunderbolt PCIe tunnel [1022:14cd]
> > > 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
> > > 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
> > > 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
> > > USB4/Thunderbolt PCIe tunnel [1022:14cd]
> > 
> > Okay this is more like what I expected, although probably not the
> > reason here.
> > 
> > Are you able to replicate the issue if you disable PCIe tunneling from
> > the BIOS on your reference system? (Probably not but just in case).
> 
> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
> but the system still boots up just fine for me on 6.12-rc1.

Okay thanks for checking!

> > > > You don't see anything on the console? It's all blank or it just hangs
> > > > after some messages?
> > > 
> > > I guess it is getting stuck on fwnode_find_reference() because it never
> > > finds the given node?
> > 
> > Looking at the code, I don't see where it could get stuck. If for some
> > reason there is no such reference (there is based on the ACPI dump) then
> > it should not affect the boot. It only matters when power management is
> > involved.
> 
> Nothing jumps out to me either.  Maybe this is a situation that Harry can
> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
> enlighten what's going on (assuming the console output is "working" when
> this happened).

There are a couple of places there that may cause it to crash, I think.
And the __free() magic is something I cannot wrap my head around :(
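
For reference, the kernel's __free() helper (include/linux/cleanup.h) is built on the compiler's `cleanup` variable attribute: the named function runs automatically when the annotated variable goes out of scope, on every return path. The test patch in this message replaces it with explicit fwnode_handle_put() calls. A minimal userspace sketch of the two styles follows, assuming GCC or Clang. `fake_node`, `fake_node_get()`, `fake_node_put()` and the `auto_put` macro are hypothetical names for illustration, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct fwnode_handle. */
struct fake_node {
	int refcount;
};

static struct fake_node *fake_node_get(struct fake_node *node)
{
	if (node)
		node->refcount++;
	return node;
}

static void fake_node_put(struct fake_node *node)
{
	if (node) {
		assert(node->refcount > 0);
		node->refcount--;
	}
}

/* The compiler passes a pointer to the annotated variable, not its value. */
static void fake_node_cleanup(struct fake_node **nodep)
{
	fake_node_put(*nodep);
}

/* Simplified analogue of __free(); the macro name is hypothetical. */
#define auto_put __attribute__((cleanup(fake_node_cleanup)))

/* Scope-based release: every return path drops the reference implicitly. */
static int use_node_scoped(struct fake_node *shared, int fail_early)
{
	struct fake_node *node auto_put = fake_node_get(shared);

	if (fail_early)
		return -1;	/* fake_node_cleanup() still runs here */
	return 0;
}

/* Manual release: each return path needs its own put, as in the test patch. */
static int use_node_manual(struct fake_node *shared, int fail_early)
{
	struct fake_node *node = fake_node_get(shared);

	if (fail_early) {
		fake_node_put(node);
		return -1;
	}

	fake_node_put(node);
	return 0;
}
```

Both functions leave the reference count where it started. The scoped form makes a forgotten put on an error path impossible, while the manual form makes each release visible at the call site, which is convenient when adding debug output around it.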

Anyways, Harry can you try the below patch and see if it makes any
difference? Also if it does please provide dmesg.

diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
index 21585ed89ef8..90360f7ca905 100644
--- a/drivers/usb/core/usb-acpi.c
+++ b/drivers/usb/core/usb-acpi.c
@@ -157,6 +157,7 @@ EXPORT_SYMBOL_GPL(usb_acpi_set_power_state);
  */
 static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
 {
+	struct fwnode_handle *nhi_fwnode;
 	const struct device_link *link;
 	struct usb_port *port_dev;
 	struct usb_hub *hub;
@@ -165,11 +166,12 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
 		return 0;
 
 	hub = usb_hub_to_struct_hub(udev->parent);
-	port_dev = hub->ports[udev->portnum - 1];
+	if (WARN_ON(!hub))
+		return 0;
 
-	struct fwnode_handle *nhi_fwnode __free(fwnode_handle) =
-		fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
+	port_dev = hub->ports[udev->portnum - 1];
 
+	nhi_fwnode = fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
 	if (IS_ERR(nhi_fwnode))
 		return 0;
 
@@ -180,12 +182,14 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
 	if (!link) {
 		dev_err(&port_dev->dev, "Failed to created device link from %s to %s\n",
 			dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
+		fwnode_handle_put(nhi_fwnode);
 		return -EINVAL;
 	}
 
-	dev_dbg(&port_dev->dev, "Created device link from %s to %s\n",
-		dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
+	dev_info(&port_dev->dev, "Created device link from %s to %s\n",
+		 dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
 
+	fwnode_handle_put(nhi_fwnode);
 	return 0;
 }
 


* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-03 13:47         ` Mika Westerberg
@ 2024-10-03 18:23           ` Harry Wentland
  2024-10-03 18:51             ` Harry Wentland
  2024-10-09 21:52           ` Mathias Nyman
  1 sibling, 1 reply; 21+ messages in thread
From: Harry Wentland @ 2024-10-03 18:23 UTC
  To: Mika Westerberg, Mario Limonciello
  Cc: linux-usb, mathias.nyman, regressions, Raju.Rangoju, Sanath.S,
	Greg KH



On 2024-10-03 09:47, Mika Westerberg wrote:
> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>> Hi Harry,
>>>>>
>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>> boot screen.
>>>>>>
>>>>>> A bisect revealed the culprit to be
>>>>>>
>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>> Author: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>>>> Date:   Fri Aug 30 18:26:29 2024 +0300
>>>>>>
>>>>>>       usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>>>
>>>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>
>>>>>> lspci output:
>>>>>>       https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>> acpidump:
>>>>>>       https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>
>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>>>
>>>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>>>
>>>>> Do you boot with any device connected?

Great question. A Thinkpad USB-C dock. When I unplug the dock at boot it
boots fine and when I plug it in later the laptop charges from it and the
dock's audio output works fine.

In the midst of my experiments I also noticed at one point the dock
wasn't charging my laptop and hard-resetting the laptop didn't fix that.
I had to unplug the dock from the wall and plug it back. So there is
likely some interaction going on with this particular dock that must've
sent the dock's FW into a bad state.

The dmesg with the revert and thunderbolt.dyndbg=+p is here
https://gist.github.com/hwentland/7e25dedd3e707fdae1185d65224d4d66

I don't see any PCIe tunneling option in my BIOS.

>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>> somehow?
>>>>
>>>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>>>> tunneling, and I agree that looks like the most common cause.
>>>>
>>>> This is what you would see on a system that has tunnels (I checked on my
>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>
>>>>             +-03.0
>>>>             +-03.1-[03-32]--
>>>>             +-04.0
>>>>             +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>>             |                               \-04.0-[36-62]--
>>>>
>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>
>>> Okay this is more like what I expected, although probably not the
>>> reason here.
>>>
>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>> the BIOS on your reference system? (Probably not but just in case).
>>
>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
>> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
>> but the system still boots up just fine for me on 6.12-rc1.
> 
> Okay thanks for checking!
> 
>>>>> You don't see anything on the console? It's all blank or it just hangs
>>>>> after some messages?
>>>>

It hangs after some messages.

>>>> I guess it is getting stuck on fwnode_find_reference() because it never
>>>> finds the given node?
>>>
>>> Looking at the code, I don't see where it could get stuck. If for some
>>> reason there is no such reference (there is based on the ACPI dump) then
>>> it should not affect the boot. It only matters when power management is
>>> involved.
>>
>> Nothing jumps out to me either.  Maybe this is a situation that Harry can
>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>> enlighten what's going on (assuming the console output is "working" when
>> this happened).
> 
> There are a couple of places there that may cause it to crash, I think.
> And the __free() magic is something I cannot wrap my head around :(
> 
> Anyways, Harry can you try the below patch and see if it makes any
> difference? Also if it does please provide dmesg.
> 

The patch doesn't seem to make a difference. Same hang on boot.

Harry

> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
> index 21585ed89ef8..90360f7ca905 100644
> --- a/drivers/usb/core/usb-acpi.c
> +++ b/drivers/usb/core/usb-acpi.c
> @@ -157,6 +157,7 @@ EXPORT_SYMBOL_GPL(usb_acpi_set_power_state);
>   */
>  static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>  {
> +	struct fwnode_handle *nhi_fwnode;
>  	const struct device_link *link;
>  	struct usb_port *port_dev;
>  	struct usb_hub *hub;
> @@ -165,11 +166,12 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>  		return 0;
>  
>  	hub = usb_hub_to_struct_hub(udev->parent);
> -	port_dev = hub->ports[udev->portnum - 1];
> +	if (WARN_ON(!hub))
> +		return 0;
>  
> -	struct fwnode_handle *nhi_fwnode __free(fwnode_handle) =
> -		fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
> +	port_dev = hub->ports[udev->portnum - 1];
>  
> +	nhi_fwnode = fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>  	if (IS_ERR(nhi_fwnode))
>  		return 0;
>  
> @@ -180,12 +182,14 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>  	if (!link) {
>  		dev_err(&port_dev->dev, "Failed to created device link from %s to %s\n",
>  			dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
> +		fwnode_handle_put(nhi_fwnode);
>  		return -EINVAL;
>  	}
>  
> -	dev_dbg(&port_dev->dev, "Created device link from %s to %s\n",
> -		dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
> +	dev_info(&port_dev->dev, "Created device link from %s to %s\n",
> +		 dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>  
> +	fwnode_handle_put(nhi_fwnode);
>  	return 0;
>  }
>  
> 


* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-03 18:23           ` Harry Wentland
@ 2024-10-03 18:51             ` Harry Wentland
  2024-10-03 19:09               ` Mario Limonciello
  2024-10-03 19:48               ` Michał Pecio
  0 siblings, 2 replies; 21+ messages in thread
From: Harry Wentland @ 2024-10-03 18:51 UTC
  To: Mika Westerberg, Mario Limonciello
  Cc: linux-usb, mathias.nyman, regressions, Raju.Rangoju, Sanath.S,
	Greg KH



On 2024-10-03 14:23, Harry Wentland wrote:
> 
> 
> On 2024-10-03 09:47, Mika Westerberg wrote:
>> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>>> Hi Harry,
>>>>>>
>>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>>> boot screen.
>>>>>>>
>>>>>>> A bisect revealed the culprit to be
>>>>>>>
>>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>>> Author: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>>>>> Date:   Fri Aug 30 18:26:29 2024 +0300
>>>>>>>
>>>>>>>       usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>>>>
>>>>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>>
>>>>>>> lspci output:
>>>>>>>       https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>>> acpidump:
>>>>>>>       https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>>
>>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>>>>
>>>>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>>>>
>>>>>> Do you boot with any device connected?
> 
> Great question. A Thinkpad USB-C dock. When I unplug the dock at boot it
> boots fine and when I plug it in later the laptop charges from it and the
> dock's audio output works fine.
> 
> In the midst of my experiments I also noticed at one point the dock
> wasn't charging my laptop and hard-resetting the laptop didn't fix that.
> I had to unplug the dock from the wall and plug it back. So there is
> likely some interaction going on with this particular dock that must've
> sent the dock's FW into a bad state.
> 
> The dmesg with the revert and thunderbolt.dyndbg=+p is here
> https://gist.github.com/hwentland/7e25dedd3e707fdae1185d65224d4d66
> 

Apologies, that dmesg was from a build with a bad .config and has some
FW loading errors. They seem to be unrelated though. This is a dmesg
from a good build. It still has a wlan FW error but that shouldn't have
anything to do with the problem at hand.

https://gist.github.com/hwentland/867f7afbf3df20547a877e794a8d8e6b

> I don't see any PCIe tunneling option in my BIOS.
> 
>>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>>> somehow?
>>>>>
>>>>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>>>>> tunneling, and I agree that looks like the most common cause.
>>>>>
>>>>> This is what you would see on a system that has tunnels (I checked on my
>>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>>
>>>>>             +-03.0
>>>>>             +-03.1-[03-32]--
>>>>>             +-04.0
>>>>>             +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>>>             |                               \-04.0-[36-62]--
>>>>>
>>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>
>>>> Okay this is more like what I expected, although probably not the
>>>> reason here.
>>>>
>>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>>> the BIOS on your reference system? (Probably not but just in case).
>>>
>>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
>>> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
>>> but the system still boots up just fine for me on 6.12-rc1.
>>
>> Okay thanks for checking!
>>
>>>>>> You don't see anything on the console? It's all blank or it just hangs
>>>>>> after some messages?
>>>>>
> 
> It hangs after some messages.
> 
>>>>> I guess it is getting stuck on fwnode_find_reference() because it never
>>>>> finds the given node?
>>>>
>>>> Looking at the code, I don't see where it could get stuck. If for some
>>>> reason there is no such reference (there is based on the ACPI dump) then
>>>> it should not affect the boot. It only matters when power management is
>>>> involved.
>>>
>>> Nothing jumps out to me either.  Maybe this is a situation that Harry can
>>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>>> enlighten what's going on (assuming the console output is "working" when
>>> this happened).
>>

I sprinkled printks but don't see any on the console.

Harry

>> There are a couple of places there that may cause it to crash, I think.
>> And the __free() magic is something I cannot wrap my head around :(
>>
>> Anyways, Harry can you try the below patch and see if it makes any
>> difference? Also if it does please provide dmesg.
>>
> 
> The patch doesn't seem to make a difference. Same hang on boot.
> 
> Harry
> 
>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>> index 21585ed89ef8..90360f7ca905 100644
>> --- a/drivers/usb/core/usb-acpi.c
>> +++ b/drivers/usb/core/usb-acpi.c
>> @@ -157,6 +157,7 @@ EXPORT_SYMBOL_GPL(usb_acpi_set_power_state);
>>   */
>>  static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>  {
>> +	struct fwnode_handle *nhi_fwnode;
>>  	const struct device_link *link;
>>  	struct usb_port *port_dev;
>>  	struct usb_hub *hub;
>> @@ -165,11 +166,12 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>  		return 0;
>>  
>>  	hub = usb_hub_to_struct_hub(udev->parent);
>> -	port_dev = hub->ports[udev->portnum - 1];
>> +	if (WARN_ON(!hub))
>> +		return 0;
>>  
>> -	struct fwnode_handle *nhi_fwnode __free(fwnode_handle) =
>> -		fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>> +	port_dev = hub->ports[udev->portnum - 1];
>>  
>> +	nhi_fwnode = fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>>  	if (IS_ERR(nhi_fwnode))
>>  		return 0;
>>  
>> @@ -180,12 +182,14 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>  	if (!link) {
>>  		dev_err(&port_dev->dev, "Failed to created device link from %s to %s\n",
>>  			dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>> +		fwnode_handle_put(nhi_fwnode);
>>  		return -EINVAL;
>>  	}
>>  
>> -	dev_dbg(&port_dev->dev, "Created device link from %s to %s\n",
>> -		dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>> +	dev_info(&port_dev->dev, "Created device link from %s to %s\n",
>> +		 dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>  
>> +	fwnode_handle_put(nhi_fwnode);
>>  	return 0;
>>  }
>>  
>>
> 
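
For readers puzzled by the __free() annotation mentioned in the quoted
text: it builds on the compiler's cleanup attribute, which runs a handler
when a variable goes out of scope. What follows is only a userspace
sketch under stated assumptions, not the kernel's implementation. The
mock_* names and the mock_free macro are hypothetical stand-ins, and the
code needs GCC or Clang. It contrasts the scope-based style with the
open-coded put calls used in the test patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted firmware node. */
struct mock_node {
	int refcount;
};

static struct mock_node *mock_node_get(struct mock_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void mock_node_put(struct mock_node *n)
{
	if (n)
		n->refcount--;
}

/* A cleanup handler receives a pointer to the variable leaving scope. */
static void mock_node_cleanup(struct mock_node **p)
{
	mock_node_put(*p);
}

/* Hypothetical macro, loosely modeled on the idea behind __free(). */
#define mock_free __attribute__((cleanup(mock_node_cleanup)))

/* Scope-based style: the reference is dropped on every return path. */
static int use_node_scoped(struct mock_node *n, int fail)
{
	struct mock_node *ref mock_free = mock_node_get(n);

	if (fail)
		return -1;
	return 0;
}

/* Open-coded style: each return path needs its own explicit put. */
static int use_node_manual(struct mock_node *n, int fail)
{
	struct mock_node *ref = mock_node_get(n);

	if (fail) {
		mock_node_put(ref);
		return -1;
	}
	mock_node_put(ref);
	return 0;
}
```

Both variants leave the reference count unchanged on success and on
failure. The difference is only in who remembers to drop the reference.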



* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-03 18:51             ` Harry Wentland
@ 2024-10-03 19:09               ` Mario Limonciello
  2024-10-03 19:41                 ` Harry Wentland
  2024-10-03 19:48               ` Michał Pecio
  1 sibling, 1 reply; 21+ messages in thread
From: Mario Limonciello @ 2024-10-03 19:09 UTC (permalink / raw)
  To: Harry Wentland, Mika Westerberg
  Cc: linux-usb, mathias.nyman, regressions, Raju.Rangoju, Sanath.S,
	Greg KH

On 10/3/2024 13:51, Harry Wentland wrote:
> 
> 
> On 2024-10-03 14:23, Harry Wentland wrote:
>>
>>
>> On 2024-10-03 09:47, Mika Westerberg wrote:
>>> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>>>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>>>> Hi Harry,
>>>>>>>
>>>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>>>> boot screen.
>>>>>>>>
>>>>>>>> A bisect revealed the culprit to be
>>>>>>>>
>>>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>>>> Author: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>>>>>> Date:   Fri Aug 30 18:26:29 2024 +0300
>>>>>>>>
>>>>>>>>        usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>>>>>
>>>>>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>>>
>>>>>>>> lspci output:
>>>>>>>>        https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>>>> acpidump:
>>>>>>>>        https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>>>
>>>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>>>>>
>>>>>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>>>>>
>>>>>>> Do you boot with any device connected?
>>
>> Great question. A Thinkpad USB-C dock. When I unplug the dock at boot it
>> boots fine and when I plug it in later the laptop charges from it and the
>> dock's audio output works fine.
>>
>> In the midst of my experiments I also noticed at one point the dock
>> wasn't charging my laptop and hard-resetting the laptop didn't fix that.
>> I had to unplug the dock from the wall and plug it back. So there is
>> likely some interaction going on with this particular dock that must've
>> sent the dock's FW into a bad state.
>>
>> The dmesg with the revert and thunderbolt.dyndbg=+p is here
>> https://gist.github.com/hwentland/7e25dedd3e707fdae1185d65224d4d66
>>
> 
> Apologies, that dmesg was from a build with a bad .config and has some
> FW loading errors. They seem to be unrelated though. This is a dmesg
> from a good build. It still has a wlan FW error but that shouldn't have
> anything to do with the problem at hand.
> 
> https://gist.github.com/hwentland/867f7afbf3df20547a877e794a8d8e6b
> 
>> I don't see any PCIe tunneling option in my BIOS.
>>
>>>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>>>> somehow?
>>>>>>
>>>>>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>>>>>> tunneling, and I agree that looks like the most common cause.
>>>>>>
>>>>>> This is what you would see on a system that has tunnels (I checked on my
>>>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>>>
>>>>>>              +-03.0
>>>>>>              +-03.1-[03-32]--
>>>>>>              +-04.0
>>>>>>              +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>>>>              |                               \-04.0-[36-62]--
>>>>>>
>>>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>
>>>>> Okay this is more like what I expected, although probably not the
>>>>> reason here.
>>>>>
>>>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>>>> the BIOS on your reference system? (Probably not but just in case).
>>>>
>>>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
>>>> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
>>>> but the system still boots up just fine for me on 6.12-rc1.
>>>
>>> Okay thanks for checking!
>>>
>>>>>>> You don't see anything on the console? It's all blank or it just hangs
>>>>>>> after some messages?
>>>>>>
>>
>> It hangs after some messages.
>>
>>>>>> I guess it is getting stuck on fwnode_find_reference() because it never
>>>>>> finds the given node?
>>>>>
>>>>> Looking at the code, I don't see where it could get stuck. If for some
>>>>> reason there is no such reference (there is based on the ACPI dump) then
>>>>> it should not affect the boot. It only matters when power management is
>>>>> involved.
>>>>
>>>> Nothing jumps out to me either.  Maybe this is a situation that Harry can
>>>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>>>> enlighten what's going on (assuming the console output is "working" when
>>>> this happened).
>>>
> 
> I sprinkled printks but don't see any on the console.
> 

You said it can work properly without the revert if you don't boot with 
the dock plugged in?

How about if you unplug it, does it unhang and you get everything flushed 
to the console?

Or maybe magic sysrq with a backtrace (l) can help see where something 
is spinning.

> Harry
> 
>>> There are a couple of places there that may cause it to crash, I think.
>>> And the __free() magic is something I cannot wrap my head around :(
>>>
>>> Anyways, Harry can you try the below patch and see if it makes any
>>> difference? Also if it does please provide dmesg.
>>>
>>
>> The patch doesn't seem to make a difference. Same hang on boot.
>>
>> Harry
>>
>>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>>> index 21585ed89ef8..90360f7ca905 100644
>>> --- a/drivers/usb/core/usb-acpi.c
>>> +++ b/drivers/usb/core/usb-acpi.c
>>> @@ -157,6 +157,7 @@ EXPORT_SYMBOL_GPL(usb_acpi_set_power_state);
>>>    */
>>>   static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>   {
>>> +	struct fwnode_handle *nhi_fwnode;
>>>   	const struct device_link *link;
>>>   	struct usb_port *port_dev;
>>>   	struct usb_hub *hub;
>>> @@ -165,11 +166,12 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>   		return 0;
>>>   
>>>   	hub = usb_hub_to_struct_hub(udev->parent);
>>> -	port_dev = hub->ports[udev->portnum - 1];
>>> +	if (WARN_ON(!hub))
>>> +		return 0;
>>>   
>>> -	struct fwnode_handle *nhi_fwnode __free(fwnode_handle) =
>>> -		fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>>> +	port_dev = hub->ports[udev->portnum - 1];
>>>   
>>> +	nhi_fwnode = fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>>>   	if (IS_ERR(nhi_fwnode))
>>>   		return 0;
>>>   
>>> @@ -180,12 +182,14 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>   	if (!link) {
>>>   		dev_err(&port_dev->dev, "Failed to created device link from %s to %s\n",
>>>   			dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>> +		fwnode_handle_put(nhi_fwnode);
>>>   		return -EINVAL;
>>>   	}
>>>   
>>> -	dev_dbg(&port_dev->dev, "Created device link from %s to %s\n",
>>> -		dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>> +	dev_info(&port_dev->dev, "Created device link from %s to %s\n",
>>> +		 dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>>   
>>> +	fwnode_handle_put(nhi_fwnode);
>>>   	return 0;
>>>   }
>>>   
>>>
>>
> 
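
The IS_ERR() check discussed in the quoted text relies on the kernel
convention of encoding a negative errno in the top of the pointer range.
Here is a minimal userspace sketch of that convention, assuming a flat
address space. The sketch_* names are hypothetical stand-ins rather than
kernel APIs, and 4095 mirrors the kernel's MAX_ERRNO. Note that NULL is
not an error pointer under this scheme.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer lies in the last SKETCH_MAX_ERRNO addresses. */
static inline bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/*
 * Hypothetical caller shaped like the function under discussion:
 * a failed lookup is not fatal, the link is simply skipped.
 */
static int sketch_add_link(const void *ref)
{
	if (sketch_is_err(ref))
		return 0;	/* no reference found: nothing to do */
	return 1;		/* a device link would be created here */
}
```

With this shape a missing reference returns early, which matches the
observation that a failed lookup by itself should not affect the boot.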



* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-03 19:09               ` Mario Limonciello
@ 2024-10-03 19:41                 ` Harry Wentland
  2024-10-04  6:20                   ` Mika Westerberg
  0 siblings, 1 reply; 21+ messages in thread
From: Harry Wentland @ 2024-10-03 19:41 UTC (permalink / raw)
  To: Mario Limonciello, Mika Westerberg
  Cc: linux-usb, mathias.nyman, regressions, Raju.Rangoju, Sanath.S,
	Greg KH



On 2024-10-03 15:09, Mario Limonciello wrote:
> On 10/3/2024 13:51, Harry Wentland wrote:
>>
>>
>> On 2024-10-03 14:23, Harry Wentland wrote:
>>>
>>>
>>> On 2024-10-03 09:47, Mika Westerberg wrote:
>>>> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>>>>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>>>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>>>>> Hi Harry,
>>>>>>>>
>>>>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>>>>> boot screen.
>>>>>>>>>
>>>>>>>>> A bisect revealed the culprit to be
>>>>>>>>>
>>>>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>>>>> Author: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>>>>>>> Date:   Fri Aug 30 18:26:29 2024 +0300
>>>>>>>>>
>>>>>>>>>        usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>>>>>>
>>>>>>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>>>>
>>>>>>>>> lspci output:
>>>>>>>>>        https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>>>>> acpidump:
>>>>>>>>>        https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>>>>
>>>>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>>>>>>
>>>>>>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>>>>>>
>>>>>>>> Do you boot with any device connected?
>>>
>>> Great question. A Thinkpad USB-C dock. When I unplug the dock at boot it
>>> boots fine and when I plug it in later the laptop charges from it and the
>>> dock's audio output works fine.
>>>
>>> In the midst of my experiments I also noticed at one point the dock
>>> wasn't charging my laptop and hard-resetting the laptop didn't fix that.
>>> I had to unplug the dock from the wall and plug it back. So there is
>>> likely some interaction going on with this particular dock that must've
>>> sent the dock's FW into a bad state.
>>>
>>> The dmesg with the revert and thunderbolt.dyndbg=+p is here
>>> https://gist.github.com/hwentland/7e25dedd3e707fdae1185d65224d4d66
>>>
>>
>> Apologies, that dmesg was from a build with a bad .config and has some
>> FW loading errors. They seem to be unrelated though. This is a dmesg
>> from a good build. It still has a wlan FW error but that shouldn't have
>> anything to do with the problem at hand.
>>
>> https://gist.github.com/hwentland/867f7afbf3df20547a877e794a8d8e6b
>>
>>> I don't see any PCIe tunneling option in my BIOS.
>>>
>>>>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>>>>> somehow?
>>>>>>>
>>>>>>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>>>>>>> tunneling, and I agree that looks like the most common cause.
>>>>>>>
>>>>>>> This is what you would see on a system that has tunnels (I checked on my
>>>>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>>>>
>>>>>>>              +-03.0
>>>>>>>              +-03.1-[03-32]--
>>>>>>>              +-04.0
>>>>>>>              +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>>>>>              |                               \-04.0-[36-62]--
>>>>>>>
>>>>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>>
>>>>>> Okay this is more like what I expected, although probably not the
>>>>>> reason here.
>>>>>>
>>>>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>>>>> the BIOS on your reference system? (Probably not but just in case).
>>>>>
>>>>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
>>>>> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
>>>>> but the system still boots up just fine for me on 6.12-rc1.
>>>>
>>>> Okay thanks for checking!
>>>>
>>>>>>>> You don't see anything on the console? It's all blank or it just hangs
>>>>>>>> after some messages?
>>>>>>>
>>>
>>> It hangs after some messages.
>>>
>>>>>>> I guess it is getting stuck on fwnode_find_reference() because it never
>>>>>>> finds the given node?
>>>>>>
>>>>>> Looking at the code, I don't see where it could get stuck. If for some
>>>>>> reason there is no such reference (there is based on the ACPI dump) then
>>>>>> it should not affect the boot. It only matters when power management is
>>>>>> involved.
>>>>>
>>>>> Nothing jumps out to me either.  Maybe this is a situation that Harry can
>>>>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>>>>> enlighten what's going on (assuming the console output is "working" when
>>>>> this happened).
>>>>
>>
>> I sprinkled printks but don't see any on the console.
>>
> 
> You said it can work properly without the revert if you don't boot with the dock plugged in?
> 

It can work properly without the revert if I boot without the dock plugged in.

> How about if you unplug it, does it unhang and you get everything flushed to the console?
> 

Nothing happens.

> Or maybe magic sysrq with a backtrace (l) can help see where something is spinning.

Nothing happens. CONFIG_MAGIC_SYSRQ is enabled in my kernel.

Harry

> 
>> Harry
>>
>>>> There are a couple of places there that may cause it to crash, I think.
>>>> And the __free() magic is something I cannot wrap my head around :(
>>>>
>>>> Anyways, Harry can you try the below patch and see if it makes any
>>>> difference? Also if it does please provide dmesg.
>>>>
>>>
>>> The patch doesn't seem to make a difference. Same hang on boot.
>>>
>>> Harry
>>>
>>>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>>>> index 21585ed89ef8..90360f7ca905 100644
>>>> --- a/drivers/usb/core/usb-acpi.c
>>>> +++ b/drivers/usb/core/usb-acpi.c
>>>> @@ -157,6 +157,7 @@ EXPORT_SYMBOL_GPL(usb_acpi_set_power_state);
>>>>    */
>>>>   static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>>   {
>>>> +    struct fwnode_handle *nhi_fwnode;
>>>>       const struct device_link *link;
>>>>       struct usb_port *port_dev;
>>>>       struct usb_hub *hub;
>>>> @@ -165,11 +166,12 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>>           return 0;
>>>>   
>>>>       hub = usb_hub_to_struct_hub(udev->parent);
>>>> -    port_dev = hub->ports[udev->portnum - 1];
>>>> +    if (WARN_ON(!hub))
>>>> +        return 0;
>>>>   
>>>> -    struct fwnode_handle *nhi_fwnode __free(fwnode_handle) =
>>>> -        fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>>>> +    port_dev = hub->ports[udev->portnum - 1];
>>>>   
>>>> +    nhi_fwnode = fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>>>>       if (IS_ERR(nhi_fwnode))
>>>>           return 0;
>>>>   
>>>> @@ -180,12 +182,14 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>>       if (!link) {
>>>>           dev_err(&port_dev->dev, "Failed to created device link from %s to %s\n",
>>>>               dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>>> +        fwnode_handle_put(nhi_fwnode);
>>>>           return -EINVAL;
>>>>       }
>>>>   
>>>> -    dev_dbg(&port_dev->dev, "Created device link from %s to %s\n",
>>>> -        dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>>> +    dev_info(&port_dev->dev, "Created device link from %s to %s\n",
>>>> +         dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>>>   
>>>> +    fwnode_handle_put(nhi_fwnode);
>>>>       return 0;
>>>>   }
>>>>   
>>>>
>>>
>>
> 



* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-03 18:51             ` Harry Wentland
  2024-10-03 19:09               ` Mario Limonciello
@ 2024-10-03 19:48               ` Michał Pecio
  2024-10-03 20:43                 ` Harry Wentland
  1 sibling, 1 reply; 21+ messages in thread
From: Michał Pecio @ 2024-10-03 19:48 UTC (permalink / raw)
  To: harry.wentland
  Cc: Raju.Rangoju, Sanath.S, gregkh, linux-usb, mario.limonciello,
	mathias.nyman, mika.westerberg, regressions

> It hangs after some messages.
What are those messages?

> I sprinkled printks but don't see any on the console.
Did you use loglevel=3 on the kernel command line? No surprise then,
unless you turn all messages you care about into 'printk(KERN_ERR ...)'
or 'dev_err(...)'.

I would also recommend increasing loglevel to 4 (permanently), because
KERN_WARNING is rare and usually tells you about pathological conditions.
The 'quiet' option is an alias for that and used to be a quite popular
default in distributions.

Regards,
Michal


* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-03 19:48               ` Michał Pecio
@ 2024-10-03 20:43                 ` Harry Wentland
  2024-10-03 21:42                   ` Michał Pecio
  0 siblings, 1 reply; 21+ messages in thread
From: Harry Wentland @ 2024-10-03 20:43 UTC (permalink / raw)
  To: Michał Pecio
  Cc: Raju.Rangoju, Sanath.S, gregkh, linux-usb, mario.limonciello,
	mathias.nyman, mika.westerberg, regressions

On 2024-10-03 15:48, Michał Pecio wrote:
>> It hangs after some messages.
> What are those messages?
> 

https://pasteboard.co/dsPP640WFSBq.jpg

>> I sprinkled printks but don't see any on the console.
> Did you use loglevel=3 on the kernel command line? No surprise then,
> unless you turn all messages you care about into 'printk(KERN_ERR ...)'
> or 'dev_err(...)'.
> 
> I would also recommend increasing loglevel to 4 (permanently), because
> KERN_WARNING is rare and usually tells you about pathological conditions.
> The 'quiet' option is an alias for that and used to be a quite popular
> default in distributions.
> 

No 'quiet' option and with loglevel at 4 I still don't seem to see
any prints.

This journalctl -k -b -1 might help:
https://gist.github.com/hwentland/7564cd08950e38dc4b7d305352481c85

I see this, but the same message appears on a good boot:
Oct 03 16:30:02 hwentlanrmb kernel: hub 6-0:1.0: config failed, hub doesn't have any ports! (err -19)

Interestingly, if I set loglevel to 6 my boot gets much farther but then
gets stuck at "A start job is running for Load Kernel Modules"
with ever increasing timeout values.

Harry

> Regards,
> Michal



* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-03 20:43                 ` Harry Wentland
@ 2024-10-03 21:42                   ` Michał Pecio
  0 siblings, 0 replies; 21+ messages in thread
From: Michał Pecio @ 2024-10-03 21:42 UTC (permalink / raw)
  To: Harry Wentland
  Cc: Raju.Rangoju, Sanath.S, gregkh, linux-usb, mario.limonciello,
	mathias.nyman, mika.westerberg, regressions

> No 'quiet' option and with loglevel at 4 I still don't seem to see
> any prints.
What's the loglevel of your printks? If they are INFO, NOTICE or DEBUG
they will not appear on loglevel 4. In particular, the patch you are
testing added a dev_info, which is INFO and will not show on level 4.
I would simply change it to dev_err, then it should show.
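
The filtering rule can be sketched as follows. This is only an
illustration, assuming the documented behaviour that a message reaches
the console when its level is numerically smaller than the console
loglevel. The LVL_* names and shown_on_console() are hypothetical; the
numeric values follow the kernel's KERN_EMERG (0) through KERN_DEBUG (7)
ordering.

```c
#include <stdbool.h>

/* Message severities in printk order; a lower number is more severe. */
enum sketch_level {
	LVL_EMERG = 0,
	LVL_ALERT,
	LVL_CRIT,
	LVL_ERR,
	LVL_WARNING,
	LVL_NOTICE,
	LVL_INFO,
	LVL_DEBUG,
};

/* A message is printed only if it is more severe than the threshold. */
static bool shown_on_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

Under this rule a dev_info() message (level 6) is filtered at loglevel 4,
while a dev_err() message (level 3) gets through.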

> Interestingly, if I set loglevel to 6 my boot gets much farther but
> then gets stuck at "A start job is running for Load Kernel Modules"
> with ever increasing timeout values.
So the kernel isn't crashing, but probably some module gets stuck on
initialization and 'modprobe' never completes.

Not sure if it really gets much further or simply prints more noise?

If Magic SysRq doesn't work, maybe systemd simply disabled it? It does
so by default, unless overridden by custom config files. It looks like
your root FS gets mounted, so those configs may work.

You could also try blacklisting xhci_pci. I don't know what sort of
kernel config you are using, but on Arch Linux xhci_hcd is built-in so
your blacklisting attempt failed, but xhci_pci is a loadable module.

If this helps, then the next logical step is to 'modprobe xhci_pci'
while running 'dmesg -w' and see what happens.

Regards,
Michal


* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-03 19:41                 ` Harry Wentland
@ 2024-10-04  6:20                   ` Mika Westerberg
  0 siblings, 0 replies; 21+ messages in thread
From: Mika Westerberg @ 2024-10-04  6:20 UTC (permalink / raw)
  To: Harry Wentland
  Cc: Mario Limonciello, linux-usb, mathias.nyman, regressions,
	Raju.Rangoju, Sanath.S, Greg KH

Hi Harry,

On Thu, Oct 03, 2024 at 03:41:42PM -0400, Harry Wentland wrote:
> > You said it can work properly without the revert if you don't boot with the dock plugged in?
> > 
> It can work properly without the revert if I boot without the dock plugged in.

Can you provide full dmesg of that with "thunderbolt.dyndbg=+p" in the
command line?


* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-03 13:47         ` Mika Westerberg
  2024-10-03 18:23           ` Harry Wentland
@ 2024-10-09 21:52           ` Mathias Nyman
  2024-10-10  2:23             ` Mario Limonciello
  1 sibling, 1 reply; 21+ messages in thread
From: Mathias Nyman @ 2024-10-09 21:52 UTC (permalink / raw)
  To: Mika Westerberg, Mario Limonciello, Harry Wentland
  Cc: linux-usb, regressions, Raju.Rangoju, Sanath.S, Greg KH

On 3.10.2024 16.47, Mika Westerberg wrote:
> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>> Hi Harry,
>>>>>
>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>> boot screen.
>>>>>>
>>>>>> A bisect revealed the culprit to be
>>>>>>
>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>> Author: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>>>> Date:   Fri Aug 30 18:26:29 2024 +0300
>>>>>>
>>>>>>        usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>>>
>>>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>
>>>>>> lspci output:
>>>>>>        https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>> acpidump:
>>>>>>        https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>
>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>>>
>>>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>>>
>>>>> Do you boot with any device connected?
>>>>>
>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>> somehow?
>>>>
>>>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>>>> tunneling, and I agree that looks like the most common cause.
>>>>
>>>> This is what you would see on a system that has tunnels (I checked on my
>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>
>>>>              +-03.0
>>>>              +-03.1-[03-32]--
>>>>              +-04.0
>>>>              +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>>              |                               \-04.0-[36-62]--
>>>>
>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>
>>> Okay this is more like what I expected, although probably not the
>>> reason here.
>>>
>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>> the BIOS on your reference system? (Probably not but just in case).
>>
>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
>> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
>> but the system still boots up just fine for me on 6.12-rc1.
> 
> Okay thanks for checking!
> 
>>>>> You don't see anything on the console? It's all blank or it just hangs
>>>>> after some messages?
>>>>
>>>> I guess it is getting stuck on fwnode_find_reference() because it never
>>>> finds the given node?
>>>
>>> Looking at the code, I don't see where it could get stuck. If for some
>>> reason there is no such reference (there is based on the ACPI dump) then
>>> it should not affect the boot. It only matters when power management is
>>> involved.
>>
>> Nothing jumps out to me either.  Maybe this is a situation that Harry can
>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>> enlighten what's going on (assuming the console output is "working" when
>> this happened).
> 
> There are couple of places there that may cause it to crash, I think.

It's possible we end up trying to create a device link during usb3 device
"consumer" enumeration before the "supplier" NHI device is properly bound to a driver.

This is something driver-api/device_link.rst states can cause issues.

This could happen if xhci isn't capable of detecting tunneled devices,
but ACPI tables contain all info needed to assume device might be tunneled.
i.e. udev->tunnel_mode == USB_LINK_UNKNOWN.

Harry, could you test if the code below helps?

diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
index 21585ed89ef8..94c335a7b933 100644
--- a/drivers/usb/core/usb-acpi.c
+++ b/drivers/usb/core/usb-acpi.c
@@ -173,6 +173,13 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
         if (IS_ERR(nhi_fwnode))
                 return 0;
  
+       if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) {
+               dev_info(&port_dev->dev, "%s not tunneled as it probed before USB4 Host Interface\n",
+                        dev_name(&port_dev->child->dev));
+               udev->tunnel_mode = USB_LINK_NATIVE;
+               return 0;
+       }
+
         link = device_link_add(&port_dev->child->dev, nhi_fwnode->dev,
                                DL_FLAG_AUTOREMOVE_CONSUMER |
                                DL_FLAG_RPM_ACTIVE |





^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-09 21:52           ` Mathias Nyman
@ 2024-10-10  2:23             ` Mario Limonciello
  2024-10-10 12:01               ` Mathias Nyman
  0 siblings, 1 reply; 21+ messages in thread
From: Mario Limonciello @ 2024-10-10  2:23 UTC (permalink / raw)
  To: Mathias Nyman, Mika Westerberg, Harry Wentland
  Cc: linux-usb, regressions, Raju.Rangoju, Sanath.S, Greg KH

On 10/9/2024 16:52, Mathias Nyman wrote:
> On 3.10.2024 16.47, Mika Westerberg wrote:
>> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>>> Hi Harry,
>>>>>>
>>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>>> boot screen.
>>>>>>>
>>>>>>> A bisect revealed the culprit to be
>>>>>>>
>>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>>> Author: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>>>>> Date:   Fri Aug 30 18:26:29 2024 +0300
>>>>>>>
>>>>>>>        usb: acpi: add device link between tunneled USB3 device 
>>>>>>> and USB4 Host Interface
>>>>>>>
>>>>>>> A revert of this single patch "fixes" the issue and I can boot 
>>>>>>> again.
>>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U 
>>>>>>> CPU.
>>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>>
>>>>>>> lspci output:
>>>>>>>        https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>>> acpidump:
>>>>>>>        https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>>
>>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did 
>>>>>>> nothing.
>>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>>> on a kernel with the faulty patch, without USB functionality, 
>>>>>>> obviously.
>>>>>>>
>>>>>>> I'd be happy to try any patches, provide more data, or run 
>>>>>>> experiments.
>>>>>>
>>>>>> Do you boot with any device connected?
>>>>>>
>>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>>> somehow?
>>>>>
>>>>> On some OEM systems it's possible to lock down from BIOS to turn 
>>>>> off PCIe
>>>>> tunneling, and I agree that looks like the most common cause.
>>>>>
>>>>> This is what you would see on a system that has tunnels (I checked 
>>>>> on my
>>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>>
>>>>>              +-03.0
>>>>>              +-03.1-[03-32]--
>>>>>              +-04.0
>>>>>              +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>>>              |                               \-04.0-[36-62]--
>>>>>
>>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 
>>>>> Family 19h
>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 
>>>>> Family 19h
>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>
>>>> Okay this is more like what I expected, although probably not the
>>>> reason here.
>>>>
>>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>>> the BIOS on your reference system? (Probably not but just in case).
>>>
>>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" 
>>> in BIOS
>>> setup and this caused the endpoints 3.1 and 4.1 I listed above to 
>>> disappear
>>> but the system still boots up just fine for me on 6.12-rc1.
>>
>> Okay thanks for checking!
>>
>>>>>> You don't see anything on the console? It's all blank or it just 
>>>>>> hangs
>>>>>> after some messages?
>>>>>
>>>>> I guess it is getting stuck on fwnode_find_reference() because it 
>>>>> never
>>>>> finds the given node?
>>>>
>>>> Looking at the code, I don't see where it could get stuck. If for some
>>>> reason there is no such reference (there is based on the ACPI dump) 
>>>> then
>>>> it should not affect the boot. It only matters when power management is
>>>> involved.
>>>
>>> Nothing jumps out to me either.  Maybe this is a situation that Harry 
>>> can
>>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>>> enlighten what's going on (assuming the console output is "working" when
>>> this happened).
>>
>> There are couple of places there that may cause it to crash, I think.
> 
> It's possible we end up trying to create a device link during usb3 device
> "consumer" enumeration before the "supplier" NHI device is properly 
> bound to a driver.
> 
> This is something driver-api/device_link.rst states can cause issues.
> 
> This could happen if xhci isn't capable of detecting tunneled devices,
> but ACPI tables contain all info needed to assume device might be tunneled.
> i.e. udev->tunnel_mode == USB_LINK_UNKNOWN.
> 
> Harry, could you test if the code below helps?
> 
> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
> index 21585ed89ef8..94c335a7b933 100644
> --- a/drivers/usb/core/usb-acpi.c
> +++ b/drivers/usb/core/usb-acpi.c
> @@ -173,6 +173,13 @@ static int usb_acpi_add_usb4_devlink(struct 
> usb_device *udev)
>          if (IS_ERR(nhi_fwnode))
>                  return 0;
> 
> +       if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) {
> +               dev_info(&port_dev->dev, "%s not tunneled as it probed 
> before USB4 Host Interface\n",

I'm aware this message is mostly there to prove whether this is the actual
issue. But if this patch does fix Harry's problem and you keep a message
in what goes upstream, I don't think this wording is accurate for all
cases.

If you have a Pre-OS CM, it might build tunnels and those could be 
active until the USB4 CM loads and resets them (by the default behavior).

So I think a more accurate message would just be "%s probed before USB4 
host interface".

> +                        dev_name(&port_dev->child->dev));
> +               udev->tunnel_mode = USB_LINK_NATIVE;
> +               return 0;
> +       }
> +
>          link = device_link_add(&port_dev->child->dev, nhi_fwnode->dev,
>                                 DL_FLAG_AUTOREMOVE_CONSUMER |
>                                 DL_FLAG_RPM_ACTIVE |
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-10  2:23             ` Mario Limonciello
@ 2024-10-10 12:01               ` Mathias Nyman
  2024-10-16 19:48                 ` Harry Wentland
  0 siblings, 1 reply; 21+ messages in thread
From: Mathias Nyman @ 2024-10-10 12:01 UTC (permalink / raw)
  To: Mario Limonciello, Mika Westerberg, Harry Wentland
  Cc: linux-usb, regressions, Raju.Rangoju, Sanath.S, Greg KH

On 10.10.2024 5.23, Mario Limonciello wrote:
> On 10/9/2024 16:52, Mathias Nyman wrote:
>> On 3.10.2024 16.47, Mika Westerberg wrote:
>>> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>>>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>>>> Hi Harry,
>>>>>>>
>>>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>>>> boot screen.
>>>>>>>>
>>>>>>>> A bisect revealed the culprit to be
>>>>>>>>
>>>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>>>> Author: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>>>>>> Date:   Fri Aug 30 18:26:29 2024 +0300
>>>>>>>>
>>>>>>>>        usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>>>>>
>>>>>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>>>
>>>>>>>> lspci output:
>>>>>>>>        https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>>>> acpidump:
>>>>>>>>        https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>>>
>>>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>>>>>
>>>>>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>>>>>
>>>>>>> Do you boot with any device connected?
>>>>>>>
>>>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>>>> somehow?
>>>>>>
>>>>>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>>>>>> tunneling, and I agree that looks like the most common cause.
>>>>>>
>>>>>> This is what you would see on a system that has tunnels (I checked on my
>>>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>>>
>>>>>>              +-03.0
>>>>>>              +-03.1-[03-32]--
>>>>>>              +-04.0
>>>>>>              +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>>>>              |                               \-04.0-[36-62]--
>>>>>>
>>>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>
>>>>> Okay this is more like what I expected, although probably not the
>>>>> reason here.
>>>>>
>>>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>>>> the BIOS on your reference system? (Probably not but just in case).
>>>>
>>>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
>>>> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
>>>> but the system still boots up just fine for me on 6.12-rc1.
>>>
>>> Okay thanks for checking!
>>>
>>>>>>> You don't see anything on the console? It's all blank or it just hangs
>>>>>>> after some messages?
>>>>>>
>>>>>> I guess it is getting stuck on fwnode_find_reference() because it never
>>>>>> finds the given node?
>>>>>
>>>>> Looking at the code, I don't see where it could get stuck. If for some
>>>>> reason there is no such reference (there is based on the ACPI dump) then
>>>>> it should not affect the boot. It only matters when power management is
>>>>> involved.
>>>>
>>>> Nothing jumps out to me either.  Maybe this is a situation that Harry can
>>>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>>>> enlighten what's going on (assuming the console output is "working" when
>>>> this happened).
>>>
>>> There are couple of places there that may cause it to crash, I think.
>>
>> It's possible we end up trying to create a device link during usb3 device
>> "consumer" enumeration before the "supplier" NHI device is properly bound to a driver.
>>
>> This is something driver-api/device_link.rst states can cause issues.
>>
>> This could happen if xhci isn't capable of detecting tunneled devices,
>> but ACPI tables contain all info needed to assume device might be tunneled.
>> i.e. udev->tunnel_mode == USB_LINK_UNKNOWN.
>>
>> Harry, could you test if the code below helps?
>>
>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>> index 21585ed89ef8..94c335a7b933 100644
>> --- a/drivers/usb/core/usb-acpi.c
>> +++ b/drivers/usb/core/usb-acpi.c
>> @@ -173,6 +173,13 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>          if (IS_ERR(nhi_fwnode))
>>                  return 0;
>>
>> +       if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) {
>> +               dev_info(&port_dev->dev, "%s not tunneled as it probed before USB4 Host Interface\n",
> 
> I'm aware this message is mostly to prove whether this is the actual issue but I do want to say if this patch indeed helps Harry's problem and you keep a message in what goes upstream I don't think this is accurate for all cases.
> 
> If you have a Pre-OS CM, it might build tunnels and those could be active until the USB4 CM loads and resets them (by the default behavior).
> 
> So I think a more accurate message would just be "%s probed before USB4 host interface".

Makes sense, I'll tune the message in the final patch if this works

Thanks
Mathias


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-10 12:01               ` Mathias Nyman
@ 2024-10-16 19:48                 ` Harry Wentland
  2024-10-21 10:56                   ` Mathias Nyman
  0 siblings, 1 reply; 21+ messages in thread
From: Harry Wentland @ 2024-10-16 19:48 UTC (permalink / raw)
  To: Mathias Nyman, Mario Limonciello, Mika Westerberg
  Cc: linux-usb, regressions, Raju.Rangoju, Sanath.S, Greg KH



On 2024-10-10 08:01, Mathias Nyman wrote:
> On 10.10.2024 5.23, Mario Limonciello wrote:
>> On 10/9/2024 16:52, Mathias Nyman wrote:
>>> On 3.10.2024 16.47, Mika Westerberg wrote:
>>>> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>>>>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>>>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>>>>> Hi Harry,
>>>>>>>>
>>>>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>>>>> boot screen.
>>>>>>>>>
>>>>>>>>> A bisect revealed the culprit to be
>>>>>>>>>
>>>>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>>>>> Author: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>>>>>>> Date:   Fri Aug 30 18:26:29 2024 +0300
>>>>>>>>>
>>>>>>>>>        usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>>>>>>
>>>>>>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>>>>
>>>>>>>>> lspci output:
>>>>>>>>>        https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>>>>> acpidump:
>>>>>>>>>        https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>>>>
>>>>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>>>>>>
>>>>>>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>>>>>>
>>>>>>>> Do you boot with any device connected?
>>>>>>>>
>>>>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>>>>> somehow?
>>>>>>>
>>>>>>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>>>>>>> tunneling, and I agree that looks like the most common cause.
>>>>>>>
>>>>>>> This is what you would see on a system that has tunnels (I checked on my
>>>>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>>>>
>>>>>>>              +-03.0
>>>>>>>              +-03.1-[03-32]--
>>>>>>>              +-04.0
>>>>>>>              +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>>>>>              |                               \-04.0-[36-62]--
>>>>>>>
>>>>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>>
>>>>>> Okay this is more like what I expected, although probably not the
>>>>>> reason here.
>>>>>>
>>>>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>>>>> the BIOS on your reference system? (Probably not but just in case).
>>>>>
>>>>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
>>>>> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
>>>>> but the system still boots up just fine for me on 6.12-rc1.
>>>>
>>>> Okay thanks for checking!
>>>>
>>>>>>>> You don't see anything on the console? It's all blank or it just hangs
>>>>>>>> after some messages?
>>>>>>>
>>>>>>> I guess it is getting stuck on fwnode_find_reference() because it never
>>>>>>> finds the given node?
>>>>>>
>>>>>> Looking at the code, I don't see where it could get stuck. If for some
>>>>>> reason there is no such reference (there is based on the ACPI dump) then
>>>>>> it should not affect the boot. It only matters when power management is
>>>>>> involved.
>>>>>
>>>>> Nothing jumps out to me either.  Maybe this is a situation that Harry can
>>>>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>>>>> enlighten what's going on (assuming the console output is "working" when
>>>>> this happened).
>>>>
>>>> There are couple of places there that may cause it to crash, I think.
>>>
>>> It's possible we end up trying to create a device link during usb3 device
>>> "consumer" enumeration before the "supplier" NHI device is properly bound to a driver.
>>>
>>> This is something driver-api/device_link.rst states can cause issues.
>>>
>>> This could happen if xhci isn't capable of detecting tunneled devices,
>>> but ACPI tables contain all info needed to assume device might be tunneled.
>>> i.e. udev->tunnel_mode == USB_LINK_UNKNOWN.
>>>
>>> Harry, could you test if the code below helps?
>>>
>>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>>> index 21585ed89ef8..94c335a7b933 100644
>>> --- a/drivers/usb/core/usb-acpi.c
>>> +++ b/drivers/usb/core/usb-acpi.c
>>> @@ -173,6 +173,13 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>          if (IS_ERR(nhi_fwnode))
>>>                  return 0;
>>>
>>> +       if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) {
>>> +               dev_info(&port_dev->dev, "%s not tunneled as it probed before USB4 Host Interface\n",
>>
>> I'm aware this message is mostly to prove whether this is the actual issue but I do want to say if this patch indeed helps Harry's problem and you keep a message in what goes upstream I don't think this is accurate for all cases.
>>
>> If you have a Pre-OS CM, it might build tunnels and those could be active until the USB4 CM loads and resets them (by the default behavior).
>>
>> So I think a more accurate message would just be "%s probed before USB4 host interface".
> 
> Makes sense, I'll tune the message in the final patch if this works
> 

Apologies for the late response. I was traveling last week.

This patch does the trick, i.e., no more hangs on boot when
connected to the Lenovo USB dock.

Harry


> Thanks
> Mathias
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-16 19:48                 ` Harry Wentland
@ 2024-10-21 10:56                   ` Mathias Nyman
  2024-10-22 12:33                     ` Mathias Nyman
  0 siblings, 1 reply; 21+ messages in thread
From: Mathias Nyman @ 2024-10-21 10:56 UTC (permalink / raw)
  To: Harry Wentland, Mario Limonciello, Mika Westerberg
  Cc: linux-usb, regressions, Raju.Rangoju, Sanath.S, Greg KH

On 16.10.2024 22.48, Harry Wentland wrote:
> 
> 
> On 2024-10-10 08:01, Mathias Nyman wrote:
>> On 10.10.2024 5.23, Mario Limonciello wrote:
>>> On 10/9/2024 16:52, Mathias Nyman wrote:
>>>> On 3.10.2024 16.47, Mika Westerberg wrote:
>>>>> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>>>>>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>>>>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>>>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>>>>>> Hi Harry,
>>>>>>>>>
>>>>>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>>>>>> boot screen.
>>>>>>>>>>
>>>>>>>>>> A bisect revealed the culprit to be
>>>>>>>>>>
>>>>>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>>>>>> Author: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>>>>>>>> Date:   Fri Aug 30 18:26:29 2024 +0300
>>>>>>>>>>
>>>>>>>>>>         usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>>>>>>>
>>>>>>>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>>>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>>>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>>>>>
>>>>>>>>>> lspci output:
>>>>>>>>>>         https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>>>>>> acpidump:
>>>>>>>>>>         https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>>>>>
>>>>>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>>>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>>>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>>>>>>>
>>>>>>>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>>>>>>>
>>>>>>>>> Do you boot with any device connected?
>>>>>>>>>
>>>>>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>>>>>> somehow?
>>>>>>>>
>>>>>>>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>>>>>>>> tunneling, and I agree that looks like the most common cause.
>>>>>>>>
>>>>>>>> This is what you would see on a system that has tunnels (I checked on my
>>>>>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>>>>>
>>>>>>>>               +-03.0
>>>>>>>>               +-03.1-[03-32]--
>>>>>>>>               +-04.0
>>>>>>>>               +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>>>>>>               |                               \-04.0-[36-62]--
>>>>>>>>
>>>>>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>>>
>>>>>>> Okay this is more like what I expected, although probably not the
>>>>>>> reason here.
>>>>>>>
>>>>>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>>>>>> the BIOS on your reference system? (Probably not but just in case).
>>>>>>
>>>>>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
>>>>>> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
>>>>>> but the system still boots up just fine for me on 6.12-rc1.
>>>>>
>>>>> Okay thanks for checking!
>>>>>
>>>>>>>>> You don't see anything on the console? It's all blank or it just hangs
>>>>>>>>> after some messages?
>>>>>>>>
>>>>>>>> I guess it is getting stuck on fwnode_find_reference() because it never
>>>>>>>> finds the given node?
>>>>>>>
>>>>>>> Looking at the code, I don't see where it could get stuck. If for some
>>>>>>> reason there is no such reference (there is based on the ACPI dump) then
>>>>>>> it should not affect the boot. It only matters when power management is
>>>>>>> involved.
>>>>>>
>>>>>> Nothing jumps out to me either.  Maybe this is a situation that Harry can
>>>>>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>>>>>> enlighten what's going on (assuming the console output is "working" when
>>>>>> this happened).
>>>>>
>>>>> There are couple of places there that may cause it to crash, I think.
>>>>
>>>> It's possible we end up trying to create a device link during usb3 device
>>>> "consumer" enumeration before the "supplier" NHI device is properly bound to a driver.
>>>>
>>>> This is something driver-api/device_link.rst states can cause issues.
>>>>
>>>> This could happen if xhci isn't capable of detecting tunneled devices,
>>>> but ACPI tables contain all info needed to assume device might be tunneled.
>>>> i.e. udev->tunnel_mode == USB_LINK_UNKNOWN.
>>>>
>>>> Harry, could you test if the code below helps?
>>>>
>>>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>>>> index 21585ed89ef8..94c335a7b933 100644
>>>> --- a/drivers/usb/core/usb-acpi.c
>>>> +++ b/drivers/usb/core/usb-acpi.c
>>>> @@ -173,6 +173,13 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>>           if (IS_ERR(nhi_fwnode))
>>>>                   return 0;
>>>>
>>>> +       if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) {
>>>> +               dev_info(&port_dev->dev, "%s not tunneled as it probed before USB4 Host Interface\n",
>>>
>>> I'm aware this message is mostly there to prove whether this is the actual issue. Still, if this patch does fix Harry's problem and you keep a message in what goes upstream, I don't think the wording is accurate for all cases.
>>>
>>> If you have a Pre-OS CM, it might build tunnels, and those could be active until the USB4 CM loads and resets them (the default behavior).
>>>
>>> So I think a more accurate message would just be "%s probed before USB4 host interface".
>>
>> Makes sense, I'll tune the message in the final patch if this works.
>>
> 
> Apologies for the late response. I was traveling last week.
> 
> This patch does the trick, i.e., no more hangs on boot when
> connected to the Lenovo USB dock.
> 
> Harry
> 

Thanks for testing,

I'm now seeing some issues of my own with this fix.
It's not creating the device link when it should, due to the !device_is_bound(nhi_fwnode->dev) check.

I need to look into this a bit more.

Thanks
Mathias



* Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
  2024-10-21 10:56                   ` Mathias Nyman
@ 2024-10-22 12:33                     ` Mathias Nyman
  0 siblings, 0 replies; 21+ messages in thread
From: Mathias Nyman @ 2024-10-22 12:33 UTC (permalink / raw)
  To: Harry Wentland, Mario Limonciello, Mika Westerberg
  Cc: linux-usb, regressions, Raju.Rangoju, Sanath.S, Greg KH

On 21.10.2024 13.56, Mathias Nyman wrote:
> On 16.10.2024 22.48, Harry Wentland wrote:

>> This patch does the trick, i.e., no more hangs on boot when
>> connected to the Lenovo USB dock.
>>
>> Harry
>>
> 
> Thanks for testing,
> 
> I'm now seeing some issues of my own with this fix.
> It's not creating the device link when it should, due to the !device_is_bound(nhi_fwnode->dev) check.
> 
> I need to look into this a bit more.
> 

It was an unrelated issue in my setup.
It works for me now; I'll post the patch.

Thanks
Mathias
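
A minimal user-space sketch of the ordering guard discussed in this thread:
the link from the USB3 "consumer" is only created once the "supplier" exists
and has a driver bound. Everything here is an illustrative assumption, not
kernel code: mock_device, mock_fwnode, supplier_ready() and try_add_devlink()
are hypothetical stand-ins. The check in the diff quoted earlier in the thread
tests nhi_fwnode->dev and device_is_bound() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's struct device or fwnode. */
struct mock_device {
	bool bound;                /* true once a driver has bound to the device */
};

struct mock_fwnode {
	struct mock_device *dev;   /* NULL until a device is attached to the node */
};

/* Same shape as the proposed guard: the supplier must exist and be bound. */
bool supplier_ready(const struct mock_fwnode *nhi)
{
	return nhi != NULL && nhi->dev != NULL && nhi->dev->bound;
}

/*
 * Returns 1 and bumps *links_created when the link would be created,
 * returns 0 when creation is skipped because the supplier is not ready.
 */
int try_add_devlink(const struct mock_fwnode *nhi, int *links_created)
{
	assert(links_created != NULL);

	if (!supplier_ready(nhi))
		return 0;              /* consumer probed before supplier: skip the link */

	(*links_created)++;
	return 1;
}
```

Skipping is treated as a non-error here, in line with the remark earlier in
the thread that a missing link should not affect boot and only matters for
power management.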



end of thread

Thread overview: 21+ messages
2024-10-02 17:42 [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface Harry Wentland
2024-10-02 19:39 ` Mario Limonciello
2024-10-03  5:47 ` Mika Westerberg
2024-10-03 13:10   ` Mario Limonciello
2024-10-03 13:27     ` Mika Westerberg
2024-10-03 13:42       ` Mario Limonciello
2024-10-03 13:47         ` Mika Westerberg
2024-10-03 18:23           ` Harry Wentland
2024-10-03 18:51             ` Harry Wentland
2024-10-03 19:09               ` Mario Limonciello
2024-10-03 19:41                 ` Harry Wentland
2024-10-04  6:20                   ` Mika Westerberg
2024-10-03 19:48               ` Michał Pecio
2024-10-03 20:43                 ` Harry Wentland
2024-10-03 21:42                   ` Michał Pecio
2024-10-09 21:52           ` Mathias Nyman
2024-10-10  2:23             ` Mario Limonciello
2024-10-10 12:01               ` Mathias Nyman
2024-10-16 19:48                 ` Harry Wentland
2024-10-21 10:56                   ` Mathias Nyman
2024-10-22 12:33                     ` Mathias Nyman
