public inbox for linux-usb@vger.kernel.org
* So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
@ 2025-03-02  4:57 Kenneth Crudup
  2025-03-02  5:36 ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-02  4:57 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: linux-usb


Remember all those "__tb_path_deactivate_hop" messages you'd seen in my
previous pstore dumps? That's because, on resumes that didn't crash from
my NVMe adaptor (the bug you traced to 9d573d1954), I was getting these
whenever I had an external monitor connected (all USB-C DP tunneled):

----
<4>[21119.295762][T22907] thunderbolt 0000:00:0d.2: 0:5: path does not end on a DP adapter, cleaning up
<4>[21119.297327][T22907] Oops: Oops: 0000 [#1] PREEMPT SMP
<4>[21119.297334][T22907] CPU: 4 UID: 0 PID: 22907 Comm: systemd-sleep Tainted: G S   U             6.14.0-rc4-kenny+ #1
<4>[21119.297342][T22907] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
<4>[21119.297344][T22907] Hardware name: Dell Inc. XPS 9320/0KNXGD, BIOS 2.18.1 12/24/2024
<4>[21119.297347][T22907] RIP: 0010:__tb_path_deactivate_hop+0x5a/0x332
<4>[21119.297359][T22907] Code: 75 d0 41 89 d6 48 89 fa 48 c7 c7 68 49 fe a9 e8 dc 83 f8 ff 49 8b 47 20 41 0f b6 4f 50 41 b9 91 01 00 00 49 c7 c0 70 93 ab a9 <8b> b0 00 03 00 00 8b 90 04 03 00 00 48 8b 80 30 03 00 00 81 e2 ff
<4>[21119.297363][T22907] RSP: 0000:ffffab7a1f7f37a8 EFLAGS: 00010246
<4>[21119.297368][T22907] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
<4>[21119.297371][T22907] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8c00af51b780
<4>[21119.297375][T22907] RBP: ffffab7a1f7f37e8 R08: ffffffffa9ab9370 R09: 0000000000000191
<4>[21119.297379][T22907] R10: ffffffffaad58d88 R11: 0000000000000003 R12: 0000000051c7dd20
<4>[21119.297382][T22907] R13: ffffab7a1f7f37b0 R14: 000000000000001a R15: ffffab7a00801b00
<4>[21119.297387][T22907] FS:  00007f4822dde940(0000) GS:ffff8c00af500000(0000) knlGS:0000000000000000
<4>[21119.297393][T22907] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[21119.297397][T22907] CR2: 0000000000000300 CR3: 0000000424911002 CR4: 0000000000770ef0
<4>[21119.297401][T22907] PKRU: 55555554
<4>[21119.297404][T22907] Call Trace:
<4>[21119.297407][T22907]  <TASK>
<4>[21119.297413][T22907]  ? show_regs.part.0+0x1d/0x20
<4>[21119.297425][T22907]  ? __die+0x52/0x91
<4>[21119.297436][T22907]  ? page_fault_oops+0x9a/0x220
<4>[21119.297444][T22907]  ? up+0x2d/0x60
<4>[21119.297450][T22907]  ? exc_page_fault+0x2fc/0x5c0
<4>[21119.297460][T22907]  ? asm_exc_page_fault+0x27/0x30
<4>[21119.297469][T22907]  ? __tb_path_deactivate_hop+0x5a/0x332
<4>[21119.297476][T22907]  ? __tb_path_deactivate_hop+0x44/0x332
<4>[21119.297483][T22907]  __tb_path_deactivate_hops.cold+0x2e/0xaa
<4>[21119.297490][T22907]  tb_path_deactivate+0x1e/0x110
<4>[21119.297496][T22907]  tb_tunnel_deactivate+0x65/0x120
----
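For reading the Oops above: the bytes from `<8b>` onward in the Code: line, `8b b0 00 03 00 00`, decode as `mov esi, dword ptr [rax+0x300]` (opcode 0x8b; ModRM 0xb0 gives mod=2, so a 32-bit displacement follows, reg=6 is ESI and rm=0 is RAX). With RAX = 0 that load reads address 0x300, which is exactly CR2, so the fault is consistent with a field at offset 0x300 being read through a NULL pointer. The sketch below only checks that arithmetic; the helper names are hypothetical and it handles just this one MOV form (no REX prefix, no SIB byte).

```c
#include <stdint.h>

/* Hypothetical decoder for the single form "mov r32, [base + disp32]":
 * opcode 0x8b with ModRM mod == 2, no REX prefix and no SIB byte. */
struct mov_load {
	unsigned int reg;  /* destination register number (6 == ESI) */
	unsigned int base; /* base register number (0 == RAX) */
	int32_t disp;      /* signed 32-bit displacement */
};

int decode_mov_load(const uint8_t insn[6], struct mov_load *out)
{
	uint8_t modrm = insn[1];

	if (insn[0] != 0x8b)
		return -1; /* not MOV r32, r/m32 */
	if ((modrm >> 6) != 2 || (modrm & 7) == 4)
		return -1; /* only [reg + disp32] without a SIB byte */

	out->reg = (modrm >> 3) & 7;
	out->base = modrm & 7;
	/* The displacement is stored little-endian after the ModRM byte. */
	out->disp = (int32_t)((uint32_t)insn[2] | (uint32_t)insn[3] << 8 |
			      (uint32_t)insn[4] << 16 | (uint32_t)insn[5] << 24);
	return 0;
}

/* Address the CPU tried to read: base register value plus displacement. */
uint64_t fault_address(uint64_t base_value, int32_t disp)
{
	return base_value + (uint64_t)(int64_t)disp;
}
```

Feeding it `{0x8b, 0xb0, 0x00, 0x03, 0x00, 0x00}` and the Oops' RAX of 0 gives reg 6, base 0, displacement 0x300 and a fault address of 0x300, matching CR2.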

So when I got home this afternoon I kept adding more pr_info()
checkpoints all over, and found that this was the culprit (lines 436-437 of
".../drivers/thunderbolt/path.c"):
----
return tb_port_write(port, &hop, TB_CFG_HOPS, 2 * hop_index, 2);
----
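For reference, the arguments in that call follow from the hop table layout. Assuming, as the `2 * hop_index, 2` arguments suggest, that each entry in the TB_CFG_HOPS config space is two 32-bit dwords and that tb_port_write() takes its offset and length in dwords, HopID N sits at dword offset 2 * N and a full entry is 2 dwords long. The sketch below reproduces only that arithmetic with a hypothetical two-dword stand-in; it is not the kernel's own hop register definition.

```c
#include <stdint.h>

/* Hypothetical stand-in for one hop (path) config entry: two dwords.
 * The real bitfield layout in the kernel is not reproduced here. */
struct hop_entry {
	uint32_t dword0;
	uint32_t dword1;
};

_Static_assert(sizeof(struct hop_entry) == 2 * sizeof(uint32_t),
	       "a hop entry is two dwords");

/* Length argument for writing one whole entry, in dwords. */
uint32_t hop_entry_dwords(void)
{
	return (uint32_t)(sizeof(struct hop_entry) / sizeof(uint32_t));
}

/* Dword offset of entry @hop_index, i.e. the "2 * hop_index" argument. */
uint32_t hop_dword_offset(uint32_t hop_index)
{
	return hop_entry_dwords() * hop_index;
}

/* The same location expressed in bytes. */
uint32_t hop_byte_offset(uint32_t hop_index)
{
	return hop_dword_offset(hop_index) * (uint32_t)sizeof(uint32_t);
}
```

For example, HopID 8 maps to dword offset 16 (byte offset 64), written as a 2-dword block.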

So I wrapped tb_port_write() with pr_info() calls looking for bogus values
and found none (nor any in the call just above it).

Looking at the underlying call to tb_cfg_write() didn't turn up anything
obvious either, so on a whim I ran a git log on .../drivers/thunderbolt,
took a chance and reverted the Subject: commit, and haven't had a
resume/hibernate crash since. (9d573d1954 is also reverted.)

My typical topology is XPS-9320 -> TB hub (I have a CalDigit TS4, a
Plugable TBT4-HUB3C, and a Belkin Thunderbolt 3 Dock Core; it happens on
all of them) plus either a USB-C DP portable monitor or, at home, a monitor
on a USB-C-to-DisplayPort cable.

If there's any other information you need to help fix this, let me know.

-K

-- 
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange 
County CA


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-02  4:57 So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes Kenneth Crudup
@ 2025-03-02  5:36 ` Kenneth Crudup
  2025-03-02 16:26   ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-02  5:36 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: linux-usb, Kenneth Crudup


Thinking it might be related to timeouts (my Samsung Odyssey monitor can
sometimes take 15 seconds to come out of sleep and start displaying), I
set thunderbolt.dprx_timeout=100000, to no avail.

-K





* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-02  5:36 ` Kenneth Crudup
@ 2025-03-02 16:26   ` Kenneth Crudup
  2025-03-02 16:30     ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-02 16:26 UTC (permalink / raw)
  To: Mika Westerberg, Kenneth Crudup; +Cc: linux-usb


FWIW, I'm seeing a metric f'ton (13171) of these after testing a hibernate
cycle; I guess now that my resumes are completing, these are showing up:

thunderbolt 0000:00:0d.3: hotplug event from non existent switch 1:d (unplug: 0)

This is one of my onboard ports (presumably the right one I use all the 
time at home):

----
0000:00:0d.3 USB controller [0c03]: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #1 [8086:466d] (rev 02) (prog-if 40 [USB4 Host Interface])
         Subsystem: Dell Device [1028:0af3]
         Flags: bus master, fast devsel, latency 0, IRQ 16, IOMMU group 8
         Memory at 6040200000 (64-bit, non-prefetchable) [size=256K]
         Memory at 60402e3000 (64-bit, non-prefetchable) [size=4K]
         Capabilities: [80] Power Management version 3
         Capabilities: [88] MSI: Enable- Count=1/1 Maskable- 64bit+
         Capabilities: [a0] MSI-X: Enable+ Count=16 Masked-
         Kernel driver in use: thunderbolt
----
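With thousands of identical warnings, it can help to tally them and pull out the route and port they name. In `switch 1:d` the `d` can only be hexadecimal, so the sketch below reads both the route and the port as hex (port 0xd = 13). That reading, and the helper names, are assumptions for illustration rather than anything taken from the driver.

```c
#include <stdio.h>
#include <string.h>

/* Parse the "route:port (unplug: N)" part of a line such as
 * "... hotplug event from non existent switch 1:d (unplug: 0)".
 * Route and port are both read as hex. Returns 0 on success. */
int parse_hotplug_warning(const char *line, unsigned long long *route,
			  unsigned int *port, int *unplug)
{
	static const char marker[] = "non existent switch ";
	const char *p = strstr(line, marker);

	if (!p)
		return -1;
	p += sizeof(marker) - 1; /* skip past the marker text */
	if (sscanf(p, "%llx:%x (unplug: %d)", route, port, unplug) != 3)
		return -1;
	return 0;
}

/* Count how many lines contain @needle, e.g. to tally repeated warnings. */
size_t count_matching(const char *const lines[], size_t n, const char *needle)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		if (strstr(lines[i], needle))
			hits++;
	}
	return hits;
}
```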

-K




* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-02 16:26   ` Kenneth Crudup
@ 2025-03-02 16:30     ` Kenneth Crudup
  2025-03-03 10:46       ` Mika Westerberg
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-02 16:30 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb

[-- Attachment #1: Type: text/plain, Size: 1225 bytes --]


Forgot to add the dmesg.

-K



[-- Attachment #2: dmesg-202503020829.bz2 --]
[-- Type: application/x-bzip, Size: 103504 bytes --]


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-02 16:30     ` Kenneth Crudup
@ 2025-03-03 10:46       ` Mika Westerberg
  2025-03-03 11:02         ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Mika Westerberg @ 2025-03-03 10:46 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

Hi Kenneth,

As discussed, let's deal with one issue at a time.

It is really hard to debug anything if you keep changing the steps, so
please let's keep these as separate issues:

1) Hang/crash during resume when dock + NVMe is disconnected before resume.
2) Monitor issue over DP tunnel.

For the first one: is this now solved if you revert
9d573d19547b3fae0c1d4e5fce52bdad3fda3664?

You can isolate this completely to the PCIe side by doing the steps with
the commit, but without connecting any monitors. Do just these steps (don't
throw in any additional steps unless you think they are needed, and if so,
mention them):

1. Boot the system up, nothing connected.
2. Connect TBT 4 dock to the host (no monitors)
3. Connect TBT 3 NVMe to the TBT 4 dock (no monitors)
4. Verify that the PCIe devices such as the NVMe are visible and working.
5. Suspend the system by closing the lid.
6. Unplug the device chain from the host.
7. Resume the system by opening the lid.

Expectation: System resumes just fine, PCIe devices are gone but system is
responsive.
Actual result: System does not resume and is not responsive.
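For step 4, one quick check is to count the `nvme*` entries under `/sys/block` before and after connecting the enclosure. If the laptop's internal SSD is also NVMe it shows up there too, so compare against the count with nothing attached rather than expecting zero. This is a hypothetical helper, not part of any existing tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <string.h>

/* Count entries in @dir whose names start with @prefix, e.g. "nvme" in
 * /sys/block. Returns -1 if the directory cannot be opened. */
int count_prefixed_entries(const char *dir, const char *prefix)
{
	DIR *d = opendir(dir);
	struct dirent *ent;
	size_t len = strlen(prefix);
	int count = 0;

	if (!d)
		return -1;
	while ((ent = readdir(d)) != NULL) {
		if (strncmp(ent->d_name, prefix, len) == 0)
			count++;
	}
	closedir(d);
	return count;
}
```

For example, call `count_prefixed_entries("/sys/block", "nvme")` once with nothing connected and again after step 3; the second result should be higher if the enclosure's drive was enumerated.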

If this gets solved by the revert then that's one issue nailed, good.

----------------------------------

For the second issue, I'm not sure I know the steps, but since you mention
reverting d6d458d42e1e ("thunderbolt: Handle DisplayPort tunnel activation
asynchronously"), it should trigger pretty much any time you plug in a
monitor, so we can follow different and hopefully simpler steps:

1. Boot the system up, nothing connected.
2. Connect TBT 4 dock to the system.
3. Connect monitor to the TBT 4 dock.

Expectation: Monitor shows the screen properly.
Actual result: Blank screen.

For this issue please enable "thunderbolt.dyndbg=+p" on the kernel command
line so we can see in dmesg what is going on. Once you reproduce it (if with
the above steps, no need to mention them; if with different steps, describe
exactly the simplest steps you use to reproduce), provide the full dmesg of
this run. I will
then take a look.
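Dynamic debug flags are enabled with `+p`. A hypothetical helper like the one below can confirm that the exact token made it onto the running kernel's command line (for instance the contents of /proc/cmdline, which the caller reads), so a typo such as `p+` is caught before a long reproduction run.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if @param appears in @cmdline as a whole
 * whitespace-separated token (so "x=+p" does not match "x=+pt"). */
bool cmdline_has_param(const char *cmdline, const char *param)
{
	size_t len = strlen(param);
	const char *p = cmdline;

	if (len == 0)
		return false;

	while ((p = strstr(p, param)) != NULL) {
		bool starts = p == cmdline || p[-1] == ' ' || p[-1] == '\t';
		char end = p[len];
		bool ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';

		if (starts && ends)
			return true;
		p++; /* keep searching after this partial match */
	}
	return false;
}
```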

Thanks!





* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 10:46       ` Mika Westerberg
@ 2025-03-03 11:02         ` Kenneth Crudup
  2025-03-03 11:21           ` Mika Westerberg
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-03 11:02 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb


On 3/3/25 02:46, Mika Westerberg wrote:

> Like discussed, let's deal one issue at the time.

Understood, but when my machine keeps locking up after resuming from
suspend/hibernate in normal use, I really need(ed) to get to the bottom of
things, as it was beginning to interfere with my workflow.

OK, so first:

> 1) Hang/crash during resume when dock + NVMe is disconnected before resume.
> For the first is this now solved if you revert
> 9d573d19547b3fae0c1d4e5fce52bdad3fda3664 ?

Yes, and thank you for that.

> You can "isolate" this to PCIe side completely by doing the steps with the
> commit but don't connect any monitors.

Yeah, that's how I'd started to verify that, as the DP tunnel crashing 
issue was getting in the way of testing.

> If this gets solved by the revert then that's one issue nailed, good.

After several cycles this appears to be the case.
Now I'd like to help you guys figure out what was causing the panics.

> For the second issue, I'm not sure I know the steps but since you mention
> reverting d6d458d42e1e ("thunderbolt: Handle DisplayPort tunnel activation
> asynchronously"), it should trigger pretty much any time you plug in 
> monitor so we can follow different and hopefully simpler steps:
> 
> 1. Boot the system up, nothing connected.
> 2. Connect TBT 4 dock to the system.
> 3. Connect monitor to the TBT 4 dock.
> 
> Expectation: Monitor shows the screen properly.
> Actual result: Blank screen.

Actually, connecting a monitor at any time worked as expected. The issue
was that, most of the time after a resume from suspend/hibernate, if I had
an external (DP tunneled) monitor connected, I'd get OOPSes at the line
mentioned in my first e-mail. From tracing, they appeared to come from
trying to write to a TB tunnel(?) that no longer existed. My (totally wild)
guess was that some race between resuming the machine and re-enumerating
the tunnels, my monitors taking their time coming out of sleep, and
"something" happening with the async tunnel activation meant it was
hitting a NULL pointer dereference somewhere.

Bottom line: I've done quite a bit of testing with these reverts and have
yet to get a single failed resume from suspend/hibernate since.

... and as with 9d573d19, I'd like to help fix this underlying issue, as
maybe there's something unique to my laptop's chipset(?) (I have different
docks and monitors at home and on the road, and it happens in both
setups).

-Kenny



* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 11:02         ` Kenneth Crudup
@ 2025-03-03 11:21           ` Mika Westerberg
  2025-03-03 11:38             ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Mika Westerberg @ 2025-03-03 11:21 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

On Mon, Mar 03, 2025 at 03:02:30AM -0800, Kenneth Crudup wrote:
> > For the second issue, I'm not sure I know the steps but since you mention
> > reverting d6d458d42e1e ("thunderbolt: Handle DisplayPort tunnel activation
> > asynchronously"), it should trigger pretty much any time you plug in
> > monitor so we can follow different and hopefully simpler steps:
> > 
> > 1. Boot the system up, nothing connected.
> > 2. Connect TBT 4 dock to the system.
> > 3. Connect monitor to the TBT 4 dock.
> > 
> > Expectation: Monitor shows the screen properly.
> > Actual result: Blank screen.
> 
> Actually, what was happening was connecting a monitor at any time worked as
> expected. The issue was approximately most of the time after a resume from
> suspend/hibernate, if I had an external (DP tunneled) monitor connected, I'd
> get OOPSes in the line mentioned in my first E-mail, which appeared from
> tracing to come from trying to write to a TB tunnel(?) which no longer
> existed; my (totally wild) guess was that some race condition between:
> resuming the machine and reenumerating the tunnels, my monitors taking their
> time coming out of sleep, and "something" happening with the async tunnel
> activation means it was hitting an NPE somewhere.

Let's not guess, let's try to figure out the root cause.

> Bottom line is I've done quite a bit of testing with these reverts and have
> yet to get any resume from S/H failures since.

Okay that's good.

Now you say that you don't reproduce the DP tunnel issue if you simply plug
in a monitor, so let's try to figure out reliable steps to reproduce it so
we can investigate.

In theory it should trigger any time you plug in a monitor, since the paths
are the same, but okay. Then let's limit this to a single monitor (the one
where you see this most reliably), and let's stick with suspend since
hibernate is more complex.

So with only 9d573d19547b3 reverted, no other changes to the kernel, and
"thunderbolt.dyndbg=+p" on the command line, do the following steps:

1. Boot the system up, nothing connected.
2. Connect TBT 4 dock to the system.
3. Connect monitor to the TBT 4 dock.
4. Suspend the system by closing the lid.
5. Resume the system by opening the lid.

Expectation: Monitor over Thunderbolt still shows picture.
Actual result: Screen is blank.

If these are not accurate, can you write down your steps in the same format
(try to keep them minimal)? Then, since resume now at least completes, you
can provide the full dmesg to me and I can try to figure out what is wrong
there.

> ... and as with 9d573d19, I'd like to help fix this underlying issue, as
> maybe there's something unique to my laptop's chipset(?) (as I have
> different docks and monitors at home and when on the road but it happens in
> both scenarios).

I don't think this is a hardware issue; I see it on my hardware too. The
commit just somehow ends up in a deadlock and of course needs to be
investigated, but for the time being we can use the revert as a workaround
and concentrate on the DP issue at hand.


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 11:21           ` Mika Westerberg
@ 2025-03-03 11:38             ` Kenneth Crudup
  2025-03-03 11:45               ` Kenneth Crudup
  2025-03-03 11:53               ` Mika Westerberg
  0 siblings, 2 replies; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-03 11:38 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb


On 3/3/25 03:21, Mika Westerberg wrote:

> Now you say that you don't reproduce the DP tunnel issue if you simply plug
> in monitor so let's try to figure out reliable steps to repro so we can
> investigate.
...
> So with 9d573d19547b3 only reverted, no other changes to the kernel and
> "thunderbolt.dyndbg=+p" in the command line do following steps:
> 
> 1. Boot the system up, nothing connected.
> 2. Connect TBT 4 dock to the system.
> 3. Connect monitor to the TBT 4 dock.
> 4. Suspend the system by closing the lid.
> 5. Resume the system by opening the lid.
> 
> Expectation: Monitor over Thunderbolt still shows picture.
> Actual result: Screen is blank.

I will do this in a couple of days (got a few things to do first), but
what actually happens is I get an OOPS and have to power-button-reset to
recover; not even SysRq-B gets me out of it (I've since enabled "Panic on
OOPS" with a 15-second timeout while I was trying to figure out this DP
monitor issue).

> Then since now resume at least
> completes you can provide full dmesg to me and I can try to figure out what
> is wrong there.

It'll have to be more pstore dumps, as resume doesn't (usually) complete 
with d6d458d42e1.

And apparently this became two issues, as d6d458d42e1 was added to Linus'
master somewhat recently, after I'd started chasing the NVMe OOPSes on
resume, so I'd get two different crashes depending on what was connected
when I suspended and what was connected when I resumed.

Once you'd discovered 9d5.., which I reverted that same day, that isolated
the second failure mode (the source of those __tb_path_deactivate_hop()s in
the dmesg/pstore dumps I'd been sending; I'd thought they were related to
the original NVMe adaptor crashes, but they then helped me find this
current problematic commit).

>> ... and as with 9d573d19

> I don't think this is hardware issue, I see it too in my hardware

Sorry, I wasn't clear; what I meant was "Just as with [the revert of]
9d573d19, I'd like to help figure out what's causing d6d458d42e1 to OOPS
my machine on resume."

-Kenny




* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 11:38             ` Kenneth Crudup
@ 2025-03-03 11:45               ` Kenneth Crudup
  2025-03-03 11:55                 ` Mika Westerberg
  2025-03-03 11:53               ` Mika Westerberg
  1 sibling, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-03 11:45 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb



On 3/3/25 03:38, Kenneth Crudup wrote:

> It'll have to be more pstore dumps, as resume doesn't (usually) complete 
> with d6d458d42e1.

I should clarify this to read "... as resume doesn't usually complete 
with d6d... if an external USB-C DP tunneled monitor is connected."

And suspend/hibernate resumes work flawlessly on my machine with Linus'
unchanged master as long as I have no NVMe adaptor connected at suspend and
no USB-C DP tunneled monitor connected at resume.

-K




* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 11:38             ` Kenneth Crudup
  2025-03-03 11:45               ` Kenneth Crudup
@ 2025-03-03 11:53               ` Mika Westerberg
  2025-03-03 12:33                 ` Kenneth Crudup
  1 sibling, 1 reply; 34+ messages in thread
From: Mika Westerberg @ 2025-03-03 11:53 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

On Mon, Mar 03, 2025 at 03:38:58AM -0800, Kenneth Crudup wrote:
> 
> On 3/3/25 03:21, Mika Westerberg wrote:
> 
> > Now you say that you don't reproduce the DP tunnel issue if you simply plug
> > in monitor so let's try to figure out reliable steps to repro so we can
> > investigate.
> ...
> > So with 9d573d19547b3 only reverted, no other changes to the kernel and
> > "thunderbolt.dyndbg=+p" in the command line do following steps:
> > 
> > 1. Boot the system up, nothing connected.
> > 2. Connect TBT 4 dock to the system.
> > 3. Connect monitor to the TBT 4 dock.
> > 4. Suspend the system by closing the lid.
> > 5. Resume the system by opening the lid.
> > 
> > Expectation: Monitor over Thunderbolt still shows picture.
> > Actual result: Screen is blank.
> 
> I will do this in a couple of days (got a few things to do first), but what
> actually happens is I get an OOPS and have to power-button-reset to recover,
> not even SysRq-B gets me out of it (I've since added "Panic on OOPS" with a
> 15-second timeout while I was trying to figure out this DP monitor issue).

I thought the system resumes fine after you reverted the other commit
(9d573d19), no? It's just that you don't get the display tunneled, so for
example if you log in over Ethernet (ssh) you should still be able to get
the full dmesg.

> > Then since now resume at least
> > completes you can provide full dmesg to me and I can try to figure out what
> > is wrong there.
> 
> It'll have to be more pstore dumps, as resume doesn't (usually) complete
> with d6d458d42e1.

What do you mean, it does not usually complete?

Let's try to figure out how we can get the full dmesg of the crash without
needing to dig into the pstore stuff, if at all possible. I was hoping it
now resumes and you just don't see a picture on the screen.

We can actually take PCIe out of the equation by asking "boltctl" to
forget the device temporarily (or from the GNOME settings "privacy and
security" -> "Thunderbolt", then "forget device" for each). This means your
docks won't work fully, but the display should, and then we hopefully can
get the dmesg.

I will try to reproduce it on my end too. I have a serial port connected,
so I can see all the messages even if the kernel crashes.


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 11:45               ` Kenneth Crudup
@ 2025-03-03 11:55                 ` Mika Westerberg
  2025-03-03 12:39                   ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Mika Westerberg @ 2025-03-03 11:55 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

On Mon, Mar 03, 2025 at 03:45:36AM -0800, Kenneth Crudup wrote:
> 
> 
> On 3/3/25 03:38, Kenneth Crudup wrote:
> 
> > It'll have to be more pstore dumps, as resume doesn't (usually) complete
> > with d6d458d42e1.
> 
> I should clarify this to read "... as resume doesn't usually complete with
> d6d... if an external USB-C DP tunneled monitor is connected."

Okay, and this "external USB-C DP tunneled" monitor, do you have it
connected to the TBT dock during the suspend? So you don't unplug/replug or
anything like that, just suspend the system and then resume?


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 11:53               ` Mika Westerberg
@ 2025-03-03 12:33                 ` Kenneth Crudup
  2025-03-03 13:13                   ` Mika Westerberg
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-03 12:33 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb


OK, I may not be explaining the history properly, so more background:

(I tend to run Linus' master, which I pull every few days, partially
'cause I like to see all the new fixes and features, and partially 'cause
over the years I've stumbled over bugs and helped the subsystems'
maintainer(s) fix the problems.)

Anyway, late last year I noticed (it hadn't been happening before) that
once I got to the office, my laptop would be hard-hung on resume, which I
eventually traced back to having my NVMe adaptor connected to my TB dock
when I suspended/hibernated. I started trying to bisect it, but couldn't
find a good starting point (or found one too far back) and had to give up
'cause I ran out of time. However, I mentioned the issue on the mailing
lists, hoping for a solution, and that's when you discovered 9d573d19.

But between your NVMe discovery (and by this time I was mostly :(
careful about disconnecting the NVMe adaptor before suspend) and
sometime around the beginning of the year, I was also getting occasional
hard hangs on resume even if I hadn't had the NVMe adaptor connected at
suspend. I'd seen the pstore dumps pointing to the display driver, so I
switched back to i915 from the xe driver, but that didn't fix it either.
In the meantime, having seen one of the OOPSes land in
__tb_path_deactivate_hop(), I dropped some printks (actually
"tb_port_info()", I think) at various points, printing the line number so
I could tell approximately where the crash occurred (yeah, I know I need to
get my ksymoops up and running :) ). I hadn't yet made the correlation with
having an external monitor connected or not, and having seen a number of
xe/i915/DP/Thunderbolt changes come through, I was both hoping the fix
would be reported and corrected, and trying to find time to work out why
it was happening via my tracing.

So by late February I had two failure modes in Linus' master:
- 9d573d19 (NVMe adaptor connected on suspend causing an OOPS on resume)
- d6d458d4 (OOPS if external USB-C DP monitor connected on resume)

I couldn't/didn't recognize the 2nd issue fully until you'd discovered 
the cause of the first one.

At home I have a Samsung Odyssey monitor connected via a USB-C-to-DP 2.1
cable to a TB port on a CalDigit TB4 dock.

My travel bag has a generic Chinese USB-C DP tunneling portable monitor 
which is usually connected to a Plugable TB hub.

In any case, the resume failures happen with either one.

On 3/3/25 03:53, Mika Westerberg wrote:

> I thought the system resumes fine after you reverted the other commit
> (9d573d19), no? Just you don't get display tunneled so for example if you
> login over ethernet (ssh) you should still be able to get full dmesg.

Nah, it usually hard hangs if a monitor is connected when I resume; has 
to be power-cycled at that point.

> We can actually take PCIe out of the equation so that you ask "boltctl" to
> forget the device temporarily (or from the GNOME settings "privacy and
> security" -> "Thunderbolt" then "forget device" for each).  This means your
> docks do not work fully but display should and then we hopefully can get
> the dmesg.

Well my topology is almost always Laptop -> Dock -> Monitor .

This workflow came about, ironically enough, 'cause my client has given me
an MS Surface (Windows) machine with only one TB/USB-C port, and since I
physically move the cable over when I switch to my own machine, to minimize
setup changes I just use the "one cable for all" approach (i.e., never
connecting the external monitor to the other TB port on my XPS-9320).

Oh and the failure mode for d6d458d4 is ALWAYS this, and always(?) from 
line 436/7 of ".../drivers/thunderbolt/path.c", a call to tb_port_write() :

----
<4>[  236.546634][ T4600] Oops: general protection fault, probably for non-canonical address 0xba65fbf27d6de496: 0000 [#1] PREEMPT SMP
<4>[  236.546646][ T4600] CPU: 7 UID: 0 PID: 4600 Comm: systemd-sleep Tainted: G S   U  W          6.14.0-rc4-kenny+ #10
<4>[  236.546655][ T4600] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER, [W]=WARN
<4>[  236.546657][ T4600] Hardware name: Dell Inc. XPS 9320/0KNXGD, BIOS 2.18.1 12/24/2024
<4>[  236.546660][ T4600] RIP: 0010:__tb_path_deactivate_hop+0x11/0x49a
<4>[  236.546673][ T4600] Code: f5 f5 db 00 5a 48 8d 65 e8 5b 41 5c 41 5d 5d c3 b8 ed ff ff ff c3 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 18 <4c> 8b 47 20 48 85 ff 65 4c 8b 34 25 28 00 00 00 4c 89 75 d0 49 89
<4>[  236.546677][ T4600] RSP: 0018:ffffbe85080a77f0 EFLAGS: 00010286
<4>[  236.546682][ T4600] RAX: ffff957ee8373a20 RBX: 0000000000000000 RCX: 0000000000000002
<4>[  236.546686][ T4600] RDX: 000000000000007d RSI: 0000000011000010 RDI: ba65fbf27d6de476
<4>[  236.546689][ T4600] RBP: ffffbe85080a7830 R08: 0000000000000000 R09: ffffffff84255760
<4>[  236.546691][ T4600] R10: 0000000000000000 R11: 0000000000000000 R12: ffff957ee8373a00
<4>[  236.546693][ T4600] R13: 0000000000000000 R14: ffffbe85080a78a0 R15: ffffbe85080a7820
<4>[  236.546696][ T4600] FS:  00007f2fcaa4a940(0000) GS:ffff9585af5c0000(0000) knlGS:0000000000000000
<4>[  236.546700][ T4600] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  236.546703][ T4600] CR2: 0000000000000000 CR3: 00000001f0833002 CR4: 0000000000770ef0
<4>[  236.546705][ T4600] PKRU: 55555554
<4>[  236.546707][ T4600] Call Trace:
<4>[  236.546710][ T4600]  <TASK>
<4>[  236.546713][ T4600]  ? show_regs.part.0+0x1d/0x20
<4>[  236.546722][ T4600]  ? die_addr.cold+0x8/0xd
<4>[  236.546729][ T4600]  ? exc_general_protection+0x1c0/0x490
<4>[  236.546740][ T4600]  ? asm_exc_general_protection+0x27/0x30
<4>[  236.546747][ T4600]  ? __tb_path_deactivate_hop+0x11/0x49a
<4>[  236.546754][ T4600]  __tb_path_deactivate_hops.cold+0x2e/0xaa
<4>[  236.546760][ T4600]  tb_path_deactivate+0x1e/0x110
<4>[  236.546769][ T4600]  tb_tunnel_deactivate+0x65/0x120
<4>[  236.546775][ T4600]  tb_resume_noirq+0xc2/0x2a0
<4>[  236.546779][ T4600]  tb_domain_resume_noirq+0x3f/0x60
<4>[  236.546787][ T4600]  nhi_resume_noirq+0x34/0x90
<4>[  236.546795][ T4600]  pci_pm_restore_noirq+0x71/0xc0
<4>[  236.546801][ T4600]  ? new_id_store+0x1b0/0x1b0
<4>[  236.546807][ T4600]  dpm_run_callback+0x40/0xb0
<4>[  236.546812][ T4600]  device_resume_noirq+0xc4/0x2a0
<4>[  236.546817][ T4600]  dpm_noirq_resume_devices+0x11b/0x150
<4>[  236.546822][ T4600]  dpm_resume_start+0xc/0x30
<4>[  236.546827][ T4600]  hibernation_snapshot+0x26d/0x430
<4>[  236.546835][ T4600]  hibernate.cold+0x9c/0x333
<4>[  236.546840][ T4600]  state_store+0xbe/0xc0
<4>[  236.546845][ T4600]  kobj_attr_store+0xf/0x20
<4>[  236.546854][ T4600]  sysfs_kf_write+0x34/0x40
<4>[  236.546861][ T4600]  kernfs_fop_write_iter+0x134/0x1e0
<4>[  236.546868][ T4600]  vfs_write+0x244/0x410
<4>[  236.546878][ T4600]  ksys_write+0x63/0xd0
<4>[  236.546885][ T4600]  __x64_sys_write+0x14/0x20
<4>[  236.546892][ T4600]  x64_sys_call+0x9eb/0xa00
<4>[  236.546899][ T4600]  do_syscall_64+0x63/0xf0
<4>[  236.546906][ T4600]  ? do_syscall_64+0x6f/0xf0
<4>[  236.546913][ T4600]  ? do_filp_open+0xbe/0x170
<4>[  236.546919][ T4600]  ? from_kgid_munged+0xd/0x20
<4>[  236.546924][ T4600]  ? cp_new_stat+0x14a/0x180
<4>[  236.546931][ T4600]  ? do_wp_page+0x7f3/0xe80
<4>[  236.546936][ T4600]  ? ___pte_offset_map+0x17/0xe0
<4>[  236.546944][ T4600]  ? __handle_mm_fault+0xb13/0x1160
<4>[  236.546951][ T4600]  ? __count_memcg_events+0x49/0xe0
<4>[  236.546956][ T4600]  ? handle_mm_fault+0x181/0x2a0
<4>[  236.546961][ T4600]  ? irqentry_exit+0x4a/0x60
<4>[  236.546964][ T4600]  ? exc_page_fault+0x196/0x5c0
<4>[  236.546972][ T4600]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
<4>[  236.546977][ T4600] RIP: 0033:0x7f2fca926274
<4>[  236.546984][ T4600] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d f5 2d 0f 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
<4>[  236.546987][ T4600] RSP: 002b:00007ffec678fb58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
<4>[  236.546992][ T4600] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f2fca926274
<4>[  236.546994][ T4600] RDX: 0000000000000005 RSI: 000055f4304eb730 RDI: 0000000000000007
<4>[  236.546996][ T4600] RBP: 00007ffec678fb80 R08: 0000000000000000 R09: 0000000000000001
<4>[  236.546998][ T4600] R10: 000055f4304eb720 R11: 0000000000000202 R12: 0000000000000005
<4>[  236.547000][ T4600] R13: 000055f4304eb730 R14: 000055f4304e12a0 R15: 00007f2fcaa0fea0
<4>[  236.547004][ T4600]  </TASK>
<4>[  236.547006][ T4600] Modules linked in: vmw_vmci btusb btintel snd_soc_sof_sdw snd_soc_sdw_utils snd_sof_probes iwlmvm mei_hdcp mei_pxp mac80211 snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic snd_sof_pci soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda iwlwifi snd_sof_intel_hda_mlink snd_sof_intel_hda mei_me cfg80211 ov01a10 xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec i915 drm_buddy intel_gtt drm_display_helper cec ttm
<4>[  236.547061][ T4600] ---[ end trace 0000000000000000 ]---
----

-Kenny

-- 
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange 
County CA



* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 11:55                 ` Mika Westerberg
@ 2025-03-03 12:39                   ` Kenneth Crudup
  2025-03-03 12:51                     ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-03 12:39 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb


On 3/3/25 03:55, Mika Westerberg wrote:

>> I should clarify this to read "... as resume doesn't usually complete with
>> d6d... if an external USB-C DP tunneled monitor is connected."
> 
> Okay and this "external USB-C DP tunneled" monitor you have it connected to
> the TBT dock during the suspend? So you don't unplug plug or anything like
> that, just suspend the system and then resume?

While I was figuring it out, I found that the issue only manifested itself
if a USB-C DP monitor was connected on resume; it didn't matter whether it
was connected on suspend or not (nor which monitor).

-Kenny

-- 
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange 
County CA



* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 12:39                   ` Kenneth Crudup
@ 2025-03-03 12:51                     ` Kenneth Crudup
  0 siblings, 0 replies; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-03 12:51 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb


... and FWIW I've been tempting fate while writing these emails this 
morning too, as I've found nothing breaks a thing faster than declaring 
it fixed.

With both reverts on Linus' master (6.14-rc5) I've since moved from the
CalDigit setup (with attached NVMe), to resuming from hibernate with
nothing connected to the laptop at all, to finally suspending and then
moving to the downstairs setup, which is a standard LED monitor connected
via a DP cable to a Belkin TB dock; I've been 3 for 3 with no failures.

LMK if there's anything I can do to help diagnose d6d4...; judging from
the commit message, it looks like quite a useful commit (and maybe one
solvable with a race-condition fix?).

-K

-- 
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange 
County CA



* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 12:33                 ` Kenneth Crudup
@ 2025-03-03 13:13                   ` Mika Westerberg
  2025-03-03 13:19                     ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Mika Westerberg @ 2025-03-03 13:13 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

On Mon, Mar 03, 2025 at 04:33:08AM -0800, Kenneth Crudup wrote:
> 
> OK, I may not be explaining the history properly, so more background:
> 
> (I tend to run Linus' master that I pull every few days, partially 'cause I
> like to see all the new fixes and features, and partially 'cause over the
> years I'll stumble over bugs and help the subsystems' Maintainer(s) fix the
> problems.)
> 
> Anyway, late last year I noticed (it wasn't happening before) that once I
> got to the office, my laptop would be hard-hung on resume, which I
> eventually traced back to having my NVMe adaptor connected to my TB Dock
> when I suspended/hibernated. I started trying to bisect it, but couldn't
> find a good starting point (or found one too far back) and would have to
> give up 'cause I'd run out of time. However, I mentioned the issue on the
> mailing lists, hoping for a solution, and that's when you discovered 9d573d19.
> 
> But between your NVMe discovery (by this time I was mostly :( careful about
> disconnecting the NVMe adaptor before suspend) and sometime around the
> beginning of the year, I was also getting occasional hard-hangs on resume
> even if I hadn't had the NVMe adaptor connected on suspend. I'd seen the
> pstore dumps pointing to the display driver, so I switched back to i915
> from the xe driver, but that didn't fix it either. In the meantime, having
> seen one of the OOPSes land in __tb_path_deactivate_hop(), I dropped some
> printks (actually "tb_port_info()", I think) at various points, printing the
> line number so I could tell approximately where the crash occurred (yeah, I
> know I need to get my ksymoops up and running :) ). I hadn't yet correlated
> the hangs with whether an external monitor was connected, and having seen a
> number of xe/i915/DP/Thunderbolt changes come through, I was hoping either
> that a fix would be reported and merged, or that I'd find time to work out
> why it was happening via my tracing.
> 
> So by late February I had two failure modes in Linus' master:
> - 9d573d19 (NVMe adaptor connected on suspend causing an OOPS on resume)
> - d6d458d4 (OOPS if external USB-C DP monitor connected on resume)
> 
> I couldn't/didn't recognize the 2nd issue fully until you'd discovered the
> cause of the first one.
> 
> At home I have a Samsung Odyssey monitor connected via a USB-C-to-DP 2.1
> cable to a TB port on a CalDigit TB4 dock.
> 
> My travel bag has a generic Chinese USB-C DP tunneling portable monitor
> which is usually connected to a Plugable TB hub.
> 
> In any case, the resume failures happen with either one.

Okay thanks for elaborating that.

> On 3/3/25 03:53, Mika Westerberg wrote:
> 
> > I thought the system resumes fine after you reverted the other commit
> > (9d573d19), no? Just you don't get display tunneled so for example if you
> > login over ethernet (ssh) you should still be able to get full dmesg.
> 
> Nah, it usually hard hangs if a monitor is connected when I resume; has to
> be power-cycled at that point.
> 
> > We can actually take PCIe out of the equation so that you ask "boltctl" to
> > forget the device temporarily (or from the GNOME settings "privacy and
> > security" -> "Thunderbolt" then "forget device" for each).  This means your
> > docks do not work fully but display should and then we hopefully can get
> > the dmesg.
> 
> Well my topology is almost always Laptop -> Dock -> Monitor .

Okay.

> This workflow came about, ironically enough, 'cause my client has given me
> an MS Surface (Windows) machine with only one TB/USB-C port, and since I
> physically move the cable over when I switch to my own machine, to minimize
> setup changes I just use the "one cable for all" approach (i.e., never
> connecting the external monitor to the other TB port on my XPS-9320).
> 
> Oh and the failure mode for d6d458d4 is ALWAYS this, and always(?) from line
> 436/7 of ".../drivers/thunderbolt/path.c", a call to tb_port_write() :

That's also weird because we don't do anything for DP tunnels on resume so
what this code is doing is to clean up for the tunnels left by the boot
kernel (since this is hibernate). The code added by d6d458d4 does not run at
that point; it only runs later, when we get hotplug events from the DP OUT
adapter of the connected device. I will see if I can reproduce this on my
setup next.


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 13:13                   ` Mika Westerberg
@ 2025-03-03 13:19                     ` Kenneth Crudup
  2025-03-03 13:23                       ` Mika Westerberg
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-03 13:19 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb


>> Oh and the failure mode for d6d458d4 is ALWAYS this, and always(?) from line
>> 436/7 of ".../drivers/thunderbolt/path.c", a call to tb_port_write():

I was looking thru the pstore dumps; it may not ALWAYS be line 436, but 
it is always in "tb_port_write()".

On 3/3/25 05:13, Mika Westerberg wrote:

> That's also weird because we don't do anything for DP tunnels on resume so
> what this code is doing is to clean up for the tunnels left by the boot
> kernel (since this is hibernate). 

OK, but for completeness' sake, the crash happens on resume from suspend
as well (with the same failure mode as resume from hibernate).

-Kenny

-- 
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange 
County CA



* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 13:19                     ` Kenneth Crudup
@ 2025-03-03 13:23                       ` Mika Westerberg
  2025-03-03 13:46                         ` Mika Westerberg
  0 siblings, 1 reply; 34+ messages in thread
From: Mika Westerberg @ 2025-03-03 13:23 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

On Mon, Mar 03, 2025 at 05:19:27AM -0800, Kenneth Crudup wrote:
> 
> >> Oh and the failure mode for d6d458d4 is ALWAYS this, and always(?) from line
> >> 436/7 of ".../drivers/thunderbolt/path.c", a call to tb_port_write():
> 
> I was looking thru the pstore dumps; it may not ALWAYS be line 436, but it
> is always in "tb_port_write()".
> 
> On 3/3/25 05:13, Mika Westerberg wrote:
> 
> > That's also weird because we don't do anything for DP tunnels on resume so
> > what this code is doing is to clean up for the tunnels left by the boot
> > kernel (since this is hibernate).
> 
> OK, but for completeness' sake, the crash happens on resume from suspend
> as well (with the same failure mode as resume from hibernate).

On suspend/resume we don't do anything for the DP tunnels, not even the
cleanup we do on hibernate, so it's doubly weird that d6d458d4 affects it
in any way at that point.

Since it is happening over suspend/resume as well, I suggest we stick with
that for the repro because it involves simpler code paths.


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 13:23                       ` Mika Westerberg
@ 2025-03-03 13:46                         ` Mika Westerberg
  2025-03-03 13:53                           ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Mika Westerberg @ 2025-03-03 13:46 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

On Mon, Mar 03, 2025 at 03:23:27PM +0200, Mika Westerberg wrote:
> Since it is happening over suspend/resume as well, I suggest we stick with
> that for the repro because it involves simpler code paths.

Okay, I'm now trying the following.

0. "Forget" all devices from boltctl to make sure PCIe is not involved.

1. Boot the system, nothing connected.
2. Plug in TBT 4 dock.
3. Plug in DP monitor through DP to Type-C adapter to the TBT 4 dock.
4. Verify that there is picture on that monitor.
5. Enter system suspend (s2idle):

  # rtcwake -s 30 -mmem

6. Once it wakes up, verify that the monitor displays a picture.
7. Repeat steps 5. and 6. several times.

I did several iterations (will do more) but I did not see any issues.

Can you try this one first when you have time? If you see the issue, try
to capture the full dmesg.

---------------------------------------------------------------------

Then I tried another flow.

0. "Forget" all devices from boltctl to make sure PCIe is not involved.

1. Boot the system, nothing connected.
2. Plug in TBT 4 dock.
3. Plug in DP monitor through DP to Type-C adapter to the TBT 4 dock.
4. Verify that there is picture on that monitor.
5. Enter system suspend (s2idle):

  # rtcwake -s 30 -mmem

6. Once the system is suspended, unplug the monitor.
7. Once system resumes it should stay responsive.
8. Repeat steps 3. - 7. several times.

Here too, I don't see any issues. Please try this too if you have not been
able to reproduce with the first flow.

---------------------------------------------------------------------

Then yet another flow

0. "Forget" all devices from boltctl to make sure PCIe is not involved.

1. Boot the system, nothing connected.
2. Plug in TBT 4 dock.
3. Plug in DP monitor through DP to Type-C adapter to the TBT 4 dock.
4. Verify that there is picture on that monitor.
5. Enter system suspend (s2idle):

  # rtcwake -s 30 -mmem

6. Once the system is suspended, unplug the monitor.
7. Plug it back while the system is still suspended. You can use different
   Type-C port too.
8. Once system resumes the monitor should come up and show picture.
9. Repeat steps 5. - 8. several times.

I was not able to trigger any issues with this flow either but I'll
continue with them. Please try this one too if you don't manage to
reproduce the issue with the above two.


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 13:46                         ` Mika Westerberg
@ 2025-03-03 13:53                           ` Kenneth Crudup
  2025-03-03 14:01                             ` Mika Westerberg
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-03 13:53 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb


No failures for you at all? OK. It'll take me a couple of days to do all 
the steps, but I'll get on it and get back to you.

> "Forget" all devices from boltctl to make sure PCIe is not involved.

What does this do? And my system recognizes all new TB devices
automatically (no manual intervention required).

----
$ boltctl

<snip>

  ● Belkin Thunderbolt 3 Dock Core
    ├─ type:          peripheral
    ├─ name:          Thunderbolt 3 Dock Core
    ├─ vendor:        Belkin
    ├─ uuid:          c2010000-0072-7c1e-8310-72c784524a06
    ├─ generation:    Thunderbolt 3
    ├─ status:        authorized
    │  ├─ domain:     60838780-4021-ceb2-ffff-ffffffffffff
    │  ├─ rx speed:   40 Gb/s = 2 lanes * 20 Gb/s
    │  ├─ tx speed:   40 Gb/s = 2 lanes * 20 Gb/s
    │  └─ authflags:  none
    ├─ authorized:    Mon 03 Mar 2025 11:59:48 AM UTC
    ├─ connected:     Mon 03 Mar 2025 11:59:48 AM UTC
    └─ stored:        Sat 29 Jun 2024 11:03:09 PM UTC
       ├─ policy:     iommu
       └─ key:        no
----

On 3/3/25 05:46, Mika Westerberg wrote:
> On Mon, Mar 03, 2025 at 03:23:27PM +0200, Mika Westerberg wrote:
>> Since if it is happening over suspend resume, I suggest we stick with that
>> for repro because it involves simpler code paths.

-Kenny

-- 
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange 
County CA



* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 13:53                           ` Kenneth Crudup
@ 2025-03-03 14:01                             ` Mika Westerberg
  2025-03-03 14:10                               ` Kenneth Crudup
  2025-03-03 14:17                               ` Kenneth Crudup
  0 siblings, 2 replies; 34+ messages in thread
From: Mika Westerberg @ 2025-03-03 14:01 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

On Mon, Mar 03, 2025 at 05:53:22AM -0800, Kenneth Crudup wrote:
> 
> No failures for you at all? OK. It'll take me a couple of days to do all the
> steps, but I'll get on it and get back to you.
> 
> > "Forget" all devices from boltctl to make sure PCIe is not involved.
> 
> What does this do? And my system recognizes all new TB devices automatically
> (no manual intervention required).

Right, it does that if you have the screen unlocked.

If you "forget" them then it should in theory at least keep from creating
PCIe tunnels, so keeping them out of the equation (we just want to
concentrate on the TB/DP side here).

Actually, if you use GNOME there is a better way: in the same dialog, switch off

  "Allow direct access to devices such as docks and external GPUs."

this will temporarily stop authorizing PCIe tunnels. You can switch it back
on when you are done reproducing.


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 14:01                             ` Mika Westerberg
@ 2025-03-03 14:10                               ` Kenneth Crudup
  2025-03-03 14:20                                 ` Mika Westerberg
  2025-03-03 14:17                               ` Kenneth Crudup
  1 sibling, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-03 14:10 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb


>> And my system recognizes all new TB devices automatically
>> (no manual intervention required).

On 3/3/25 06:01, Mika Westerberg wrote:

> Right, it does that if you have the screen unlocked.

I'm running Kubuntu (24.10); AFAIK it just allows them anyway. The 
"System Settings" dialog is just an Enable/Disable toggle.

> If you "forget" them then it should in theory at least keep from creating
> PCIe tunnels, so keeping them out of the equation (we just want to
> concentrate on the TB/DP side here).

But what I can try is just connecting the monitors directly; the 
portable monitor directly to one of the laptop's USB-Cs, and the 
Odyssey's USB-C-to-DP w/o using the dock.

Now I'm curious, so I may take an hour or so to try this stuff out.

-Kenny

-- 
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange 
County CA



* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 14:01                             ` Mika Westerberg
  2025-03-03 14:10                               ` Kenneth Crudup
@ 2025-03-03 14:17                               ` Kenneth Crudup
  1 sibling, 0 replies; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-03 14:17 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb

[-- Attachment #1: Type: text/plain, Size: 403 bytes --]


Oh, and to be complete I should point out that I carry the attached
commit on top of Linus' master; otherwise I don't get full power savings
during s0ix sleep.

(I've been trying to get the PM people to get a version of this in for 
nearly a year now.)

I don't think it's relevant here, but just in case...

-Kenny

-- 
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange 
County CA

[-- Attachment #2: 0001-PCI-ASPM-Fixup-ASPM-for-VMD-bridges.patch --]
[-- Type: text/x-patch, Size: 2110 bytes --]

From 849e27052d5a1e279cce7b6ab14e40a39c3b2b24 Mon Sep 17 00:00:00 2001
From: "Kenneth R. Crudup" <kenny@panix.com>
Date: Fri, 13 Dec 2024 15:28:42 -0800
Subject: [PATCH] PCI/ASPM: Fixup ASPM for VMD bridges

Effectively a squashed commit of:
UBUNTU: SAUCE: PCI/ASPM: Enable ASPM for links under VMD domain
UBUNTU: SAUCE: PCI/ASPM: Enable LTR for endpoints behind VMD
UBUNTU: SAUCE: vmd: fixup bridge ASPM by driver name instead
---
 drivers/pci/pcie/aspm.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 28567d457613..a5df6230cf3c 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -768,6 +768,31 @@ static void aspm_l1ss_init(struct pcie_link_state *link)
 		aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
 }
 
+/*
+ * BIOS may not be able to access config space of devices under VMD domain, so
+ * it relies on software to enable ASPM for links under VMD.
+ */
+static bool pci_fixup_vmd_bridge_enable_aspm(struct pci_dev *pdev)
+{
+       struct pci_bus *bus = pdev->bus;
+       struct device *dev;
+       struct pci_driver *pdrv;
+
+       if (!pci_is_root_bus(bus))
+               return false;
+
+       dev = bus->bridge->parent;
+       if (dev == NULL)
+               return false;
+
+       pdrv = pci_dev_driver(to_pci_dev(dev));
+       if (pdrv == NULL || strcmp("vmd", pdrv->name))
+               return false;
+
+       pci_info(pdev, "enable ASPM for pci bridge behind vmd");
+       return true;
+}
+
 static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
 {
 	struct pci_dev *child = link->downstream, *parent = link->pdev;
@@ -846,7 +871,8 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
 	}
 
 	/* Save default state */
-	link->aspm_default = link->aspm_enabled;
+	link->aspm_default = pci_fixup_vmd_bridge_enable_aspm(parent) ?
+		PCIE_LINK_STATE_ASPM_ALL : link->aspm_enabled;
 
 	/* Setup initial capable state. Will be updated later */
 	link->aspm_capable = link->aspm_support;
-- 
2.45.2



* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 14:10                               ` Kenneth Crudup
@ 2025-03-03 14:20                                 ` Mika Westerberg
  2025-03-03 14:33                                   ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Mika Westerberg @ 2025-03-03 14:20 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

On Mon, Mar 03, 2025 at 06:10:09AM -0800, Kenneth Crudup wrote:
> 
> > > And my system recognizes all new TB devices automatically
> > > (no manual intervention required).
> 
> On 3/3/25 06:01, Mika Westerberg wrote:
> 
> > Right, it does that if you have the screen unlocked.
> 
> I'm running Kubuntu (24.10); AFAIK it just allows them anyway. The "System
> Settings" dialog is just an Enable/Disable toggle.

Ah, okay, never mind then.

> > If you "forget" them then it should in theory at least keep from creating
> > PCIe tunnels, so keeping them out of the equation (we just want to
> > concentrate on the TB/DP side here).
> 
> But what I can try is just connecting the monitors directly; the portable
> monitor directly to one of the laptop's USB-Cs, and the Odyssey's
> USB-C-to-DP w/o using the dock.

Actually just managed to reproduce this with hibernate \o/ so debugging
now.

My steps:

(I run buildroot based distro on my test systems so there is nothing
 authorizing PCIe tunnels by default)

1. Boot the system up, nothing connected.
2. Connect TBT 4 dock to the host.
3. Connect monitor to the TBT 4 dock.
4. Verify picture on screen.
5. Enter hibernate

  # rtcwake -s 60 -m disk

6. Once booted up and resumed from disk verify that the monitor displays
   correctly.


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 14:20                                 ` Mika Westerberg
@ 2025-03-03 14:33                                   ` Kenneth Crudup
  2025-03-03 17:58                                     ` Mika Westerberg
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-03 14:33 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb


On 3/3/25 06:20, Mika Westerberg wrote:

> Actually just managed to reproduce this with hibernate \o/ so debugging
> now.

OK, this is good ... but now you've got me wondering whether I indeed saw
it during suspend cycles as well (I usually only suspend, and then systemd
initiates a hibernation after 4 hours, so just going back and forth to the
office shouldn't trigger this).

Waiting to see what you find,

-Kenny

-- 
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange 
County CA



* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 14:33                                   ` Kenneth Crudup
@ 2025-03-03 17:58                                     ` Mika Westerberg
  2025-03-03 18:20                                       ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Mika Westerberg @ 2025-03-03 17:58 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

On Mon, Mar 03, 2025 at 06:33:06AM -0800, Kenneth Crudup wrote:
> 
> On 3/3/25 06:20, Mika Westerberg wrote:
> 
> > Actually just managed to reproduce this with hibernate \o/ so debugging
> > now.
> 
> OK, this is good ... but now you've got me wondering whether I indeed saw it
> during suspend cycles as well (I usually only suspend, and then systemd
> initiates a hibernation after 4 hours, so just going back and forth to the
> office shouldn't trigger this).
> 
> Waiting to see what you find,

Okay, I think I figured out what is going on. Indeed d6d458d4 is buggy, but
not in the way I thought it was ;-) What actually happens is that once we
resume from hibernate, we discover the tunnels created by the boot kernel
and tear them down. For discovered tunnels we never start the DPRX
negotiation, but we still ended up calling tb_dp_dprx_stop(), which does
tb_tunnel_put() and releases the tunnel object. All accesses after this
end up touching already freed memory!

I've played with the below patch for a while and I have not seen that issue
anymore. Can you try it out on your end too?

diff --git a/drivers/thunderbolt/tunnel.c b/drivers/thunderbolt/tunnel.c
index 8229a6fbda5a..717b31d78728 100644
--- a/drivers/thunderbolt/tunnel.c
+++ b/drivers/thunderbolt/tunnel.c
@@ -1009,6 +1009,8 @@ static int tb_dp_dprx_start(struct tb_tunnel *tunnel)
 	 */
 	tb_tunnel_get(tunnel);
 
+	tunnel->dprx_started = true;
+
 	if (tunnel->callback) {
 		tunnel->dprx_timeout = dprx_timeout_to_ktime(dprx_timeout);
 		queue_delayed_work(tunnel->tb->wq, &tunnel->dprx_work, 0);
@@ -1021,9 +1023,12 @@ static int tb_dp_dprx_start(struct tb_tunnel *tunnel)
 
 static void tb_dp_dprx_stop(struct tb_tunnel *tunnel)
 {
-	tunnel->dprx_canceled = true;
-	cancel_delayed_work(&tunnel->dprx_work);
-	tb_tunnel_put(tunnel);
+	if (tunnel->dprx_started) {
+		tunnel->dprx_started = false;
+		tunnel->dprx_canceled = true;
+		cancel_delayed_work(&tunnel->dprx_work);
+		tb_tunnel_put(tunnel);
+	}
 }
 
 static int tb_dp_activate(struct tb_tunnel *tunnel, bool active)
diff --git a/drivers/thunderbolt/tunnel.h b/drivers/thunderbolt/tunnel.h
index 7f6d3a18a41e..8a0a0cb21a89 100644
--- a/drivers/thunderbolt/tunnel.h
+++ b/drivers/thunderbolt/tunnel.h
@@ -63,6 +63,7 @@ enum tb_tunnel_state {
  * @allocated_down: Allocated downstream bandwidth (only for USB3)
  * @bw_mode: DP bandwidth allocation mode registers can be used to
  *	     determine consumed and allocated bandwidth
+ * @dprx_started: DPRX negotiation was started (tb_dp_dprx_start() was called for it)
  * @dprx_canceled: Was DPRX capabilities read poll canceled
  * @dprx_timeout: If set DPRX capabilities read poll work will timeout after this passes
  * @dprx_work: Worker that is scheduled to poll completion of DPRX capabilities read
@@ -100,6 +101,7 @@ struct tb_tunnel {
 	int allocated_up;
 	int allocated_down;
 	bool bw_mode;
+	bool dprx_started;
 	bool dprx_canceled;
 	ktime_t dprx_timeout;
 	struct delayed_work dprx_work;


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 17:58                                     ` Mika Westerberg
@ 2025-03-03 18:20                                       ` Kenneth Crudup
  2025-03-03 19:44                                         ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-03 18:20 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb



On 3/3/25 09:58, Mika Westerberg wrote:
> For discovered tunnels we never start the DPRX negotiation, but we still
> ended up calling tb_dp_dprx_stop(), which does tb_tunnel_put() and
> releases the tunnel object. All accesses after this end up touching
> already freed memory!

> I've played with the below patch for a while and I have not seen that issue
> anymore. Can you try it out on your end too?

Building now. Fingers crossed!

-K


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 18:20                                       ` Kenneth Crudup
@ 2025-03-03 19:44                                         ` Kenneth Crudup
  2025-03-04  8:27                                           ` Mika Westerberg
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-03 19:44 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb


On 3/3/25 10:20, Kenneth Crudup wrote:

> Building now. Fingers crossed!

So far, so good: I tried a variety of suspend/hibernate with/without 
scenarios with none, one and two connected monitors, and I can't get any 
resume OOPSes. Nice!

I did see one anomaly I haven't seen before, but I'm not sure whether it's 
related to this patch (or to the original commit, previously masked by the 
OOPS). For some reason, after the 2nd or 3rd hibernation cycle, boltd 
couldn't authorize my Belkin dock when I plugged it in after resume. It 
did get authorized the first time after hibernating with the new code (it 
was plugged in at the time of resume):

----
2025-03-03T10:39:34.405568-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] parent is 63ae8780-500c...
2025-03-03T10:39:34.406815-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] connected: connected (/sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1)
2025-03-03T10:39:34.406995-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] auto-auth: authmode: enabled, policy: iommu, iommu: yes -> ok
2025-03-03T10:39:34.407094-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] auto-auth: security: iommu+user mode, key: no -> ok
2025-03-03T10:39:34.407287-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] authorize: authorization prepared for 'user' level
2025-03-03T10:39:34.408876-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] udev: device changed: authorizing -> authorizing
2025-03-03T10:39:34.412223-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] udev: device changed: authorizing -> authorizing
2025-03-03T10:39:34.417191-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] authorize: finished: ok (status: authorized, flags: 0)
2025-03-03T10:39:34.417414-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] auto-auth: authorization successful
2025-03-03T10:39:34.419207-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] udev: device changed: authorized -> authorized
2025-03-03T10:47:42.252854-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] disconnected (/sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1)
----

But after that, it wouldn't get authorized again until I'd rebooted:
----
2025-03-03T10:47:42.252854-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] disconnected (/sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1)
2025-03-03T10:49:24.319123-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] parent is 63ae8780-500c...
2025-03-03T10:49:24.320239-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] connected: connected (/sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1)
2025-03-03T10:49:24.320320-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] auto-auth: authmode: enabled, policy: iommu, iommu: yes -> ok
2025-03-03T10:49:24.320368-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] auto-auth: security: iommu+user mode, key: no -> ok
2025-03-03T10:49:24.320449-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] authorize: authorization prepared for 'user' level
2025-03-03T10:49:24.320539-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] udev: device changed: authorizing -> authorizing
2025-03-03T10:49:24.321698-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] udev: device changed: authorizing -> authorizing
2025-03-03T10:49:24.335697-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] authorize: finished: FAIL (status: auth-error, flags: 0)
2025-03-03T10:49:24.335817-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] auto-auth: authorization failed: kernel error: write error: Cannot allocate memory
2025-03-03T10:49:59.011121-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] disconnected (/sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1)
----

Oh, and while I couldn't see any of the dock's USB functions, the 
DP tunnel was working: the external (cable-attached) monitor was on. 
There were no kernel messages from the failure either (but I didn't have 
TB dyndbg turned on).

Several reconnection attempts, and a fully disconnected power-cycle of 
the dock, gave the same error until I rebooted the laptop. Interestingly, 
my CalDigit dock had no problem being recognized when I plugged it in 
during these failures:

----
2025-03-03T11:03:33.383513-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] parent is 833f8780-3179...
2025-03-03T11:03:33.385441-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] connected: connected (/sys/devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1)
2025-03-03T11:03:33.385585-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] auto-auth: authmode: enabled, policy: iommu, iommu: yes -> ok
2025-03-03T11:03:33.385635-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] auto-auth: security: iommu+user mode, key: no -> ok
2025-03-03T11:03:33.385733-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] authorize: authorization prepared for 'user' level
2025-03-03T11:03:33.387211-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] udev: device changed: authorizing -> authorizing
2025-03-03T11:03:33.389891-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] udev: device changed: authorizing -> authorizing
2025-03-03T11:03:34.395468-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] authorize: finished: ok (status: authorized, flags: 0)
2025-03-03T11:03:34.395641-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] auto-auth: authorization successful
2025-03-03T11:03:34.395943-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] udev: device changed: authorized -> authorized
----

I'll keep an eye out for it if it happens again, but at least it's not 
crashing!

-Kenny


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-03 19:44                                         ` Kenneth Crudup
@ 2025-03-04  8:27                                           ` Mika Westerberg
  2025-03-04 12:52                                             ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Mika Westerberg @ 2025-03-04  8:27 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

On Mon, Mar 03, 2025 at 11:44:13AM -0800, Kenneth Crudup wrote:
> 
> On 3/3/25 10:20, Kenneth Crudup wrote:
> 
> > Building now. Fingers crossed!
> 
> So far, so good: I tried a variety of suspend/hibernate with/without
> scenarios with none, one and two connected monitors, and I can't get any
> resume OOPSes. Nice!

Okay cool, let me know any findings.

> I did see one anomaly I haven't seen before, but I'm not sure whether it's
> related to this patch (or to the original commit, previously masked by the
> OOPS). For some reason, after the 2nd or 3rd hibernation cycle, boltd
> couldn't authorize my Belkin dock when I plugged it in after resume. It
> did get authorized the first time after hibernating with the new code (it
> was plugged in at the time of resume):
> 
> ----
> 2025-03-03T10:39:34.405568-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] parent is 63ae8780-500c...
> 2025-03-03T10:39:34.406815-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] connected: connected (/sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1)
> 2025-03-03T10:39:34.406995-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] auto-auth: authmode: enabled, policy: iommu, iommu: yes -> ok
> 2025-03-03T10:39:34.407094-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] auto-auth: security: iommu+user mode, key: no -> ok
> 2025-03-03T10:39:34.407287-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] authorize: authorization prepared for 'user' level
> 2025-03-03T10:39:34.408876-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] udev: device changed: authorizing -> authorizing
> 2025-03-03T10:39:34.412223-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] udev: device changed: authorizing -> authorizing
> 2025-03-03T10:39:34.417191-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] authorize: finished: ok (status: authorized, flags: 0)
> 2025-03-03T10:39:34.417414-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] auto-auth: authorization successful
> 2025-03-03T10:39:34.419207-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] udev: device changed: authorized -> authorized
> 2025-03-03T10:47:42.252854-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] disconnected (/sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1)
> ----
> 
> But after that, it wouldn't get authorized again until I'd rebooted:
> ----
> 2025-03-03T10:47:42.252854-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] disconnected (/sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1)
> 2025-03-03T10:49:24.319123-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] parent is 63ae8780-500c...
> 2025-03-03T10:49:24.320239-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] connected: connected (/sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1)
> 2025-03-03T10:49:24.320320-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] auto-auth: authmode: enabled, policy: iommu, iommu: yes -> ok
> 2025-03-03T10:49:24.320368-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] auto-auth: security: iommu+user mode, key: no -> ok
> 2025-03-03T10:49:24.320449-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] authorize: authorization prepared for 'user' level
> 2025-03-03T10:49:24.320539-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] udev: device changed: authorizing -> authorizing
> 2025-03-03T10:49:24.321698-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] udev: device changed: authorizing -> authorizing
> 2025-03-03T10:49:24.335697-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] authorize: finished: FAIL (status: auth-error, flags: 0)
> 2025-03-03T10:49:24.335817-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] auto-auth: authorization failed: kernel error: write error: Cannot allocate memory

This can happen if you unplug the device (or the link goes down) in the
middle of creating the PCIe tunnel; it ends up returning -ENOMEM. If you
have dmesg with "thunderbolt.dyndbg=+p", that would help to confirm.

In any other case (e.g. you did not unplug in the middle), this is unexpected.

> 2025-03-03T10:49:59.011121-08:00 xps-9320 boltd[1240]: [c2010000-0072-Thunderbolt 3 Dock Core    ] disconnected (/sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1)
> ----
> 
> Oh, and while I couldn't see any of the dock's USB functions, the DP
> tunnel was working: the external (cable-attached) monitor was on. There were
> no kernel messages from the failure either (but I didn't have TB dyndbg
> turned on).
> 
> Several reconnection attempts, and a fully disconnected power-cycle of the
> dock, gave the same error until I rebooted the laptop. Interestingly, my
> CalDigit dock had no problem being recognized when I plugged it in during
> these failures:
> 
> ----
> 2025-03-03T11:03:33.383513-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] parent is 833f8780-3179...
> 2025-03-03T11:03:33.385441-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] connected: connected (/sys/devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1)
> 2025-03-03T11:03:33.385585-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] auto-auth: authmode: enabled, policy: iommu, iommu: yes -> ok
> 2025-03-03T11:03:33.385635-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] auto-auth: security: iommu+user mode, key: no -> ok
> 2025-03-03T11:03:33.385733-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] authorize: authorization prepared for 'user' level
> 2025-03-03T11:03:33.387211-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] udev: device changed: authorizing -> authorizing
> 2025-03-03T11:03:33.389891-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] udev: device changed: authorizing -> authorizing
> 2025-03-03T11:03:34.395468-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] authorize: finished: ok (status: authorized, flags: 0)
> 2025-03-03T11:03:34.395641-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] auto-auth: authorization successful
> 2025-03-03T11:03:34.395943-08:00 xps-9320 boltd[1240]: [80a78780-00b3-TS4                        ] udev: device changed: authorized -> authorized
> ----
> 
> I'll keep an eye out for it if it happens again, but at least it's not
> crashing!

If possible add "thunderbolt.dyndbg=+p" now to your kernel command line so
if this happens again, we hopefully have full dmesg to investigate.
Otherwise it is hard to diagnose.


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-04  8:27                                           ` Mika Westerberg
@ 2025-03-04 12:52                                             ` Kenneth Crudup
  2025-03-04 13:40                                               ` Mika Westerberg
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-04 12:52 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb


On 3/4/25 00:27, Mika Westerberg wrote:

> If possible add "thunderbolt.dyndbg=+p" now to your kernel command line so
> if this happens again, we hopefully have full dmesg to investigate.

I've not seen any further instances of weird behavior, but I've added 
that to the command line going forward.

But I have been doing a fair amount of testing of the kernel with your 
patch and Lukas' NVMe adaptor (etc.) patch, and am concerned that you're 
still seeing his issue, as it (at least so far) hasn't occurred here 
since I applied it.

In any case, on the next reboot it'll be applied.

-Kenny


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-04 12:52                                             ` Kenneth Crudup
@ 2025-03-04 13:40                                               ` Mika Westerberg
  2025-03-04 13:48                                                 ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Mika Westerberg @ 2025-03-04 13:40 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

On Tue, Mar 04, 2025 at 04:52:19AM -0800, Kenneth Crudup wrote:
> 
> On 3/4/25 00:27, Mika Westerberg wrote:
> 
> > If possible add "thunderbolt.dyndbg=+p" now to your kernel command line so
> > if this happens again, we hopefully have full dmesg to investigate.
> 
> I've not seen any further instances of weird behavior, but I've added that
> to the command line going forward.

Okay thanks!

> But I have been doing a fair amount of testing of the kernel with your patch
> and Lukas' NVMe adaptor (etc.) patch, and am concerned that you're still
> seeing his issue, as it (at least so far) hasn't occurred here since I
> applied it.

It only happens if you have a TBT dock and the NVMe connected and you
disconnect them while the system is suspended.

I suggest trying that a couple of times to see if it happens. For me it
happened pretty much on the first suspend cycle.


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-04 13:40                                               ` Mika Westerberg
@ 2025-03-04 13:48                                                 ` Kenneth Crudup
  2025-03-04 13:51                                                   ` Mika Westerberg
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-04 13:48 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb



On 3/4/25 05:40, Mika Westerberg wrote:

> On Tue, Mar 04, 2025 at 04:52:19AM -0800, Kenneth Crudup wrote:

>> But I have been doing a fair amount of testing of the kernel with your patch
>> and Lukas' NVMe adaptor (etc.) patch, and am concerned that you're still
>> seeing his issue, as it (at least so far) hasn't occurred here since I
>> applied it.

> It only happens if you have a TBT dock and the NVMe connected and you
> disconnect them while the system is suspended.
> I suggest trying that a couple of times to see if it happens. For me it
> happened pretty much on the first suspend cycle.

That's exactly the failure mode I was testing for, though ... I've run a 
few iterations with the fix (and I'm about to do one more as I head to a 
client's office) and so far, so good.

Is your patch from yesterday applied as well? It is here.

-Kenny


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-04 13:48                                                 ` Kenneth Crudup
@ 2025-03-04 13:51                                                   ` Mika Westerberg
  2025-03-04 17:29                                                     ` Kenneth Crudup
  0 siblings, 1 reply; 34+ messages in thread
From: Mika Westerberg @ 2025-03-04 13:51 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

On Tue, Mar 04, 2025 at 05:48:12AM -0800, Kenneth Crudup wrote:
> 
> 
> On 3/4/25 05:40, Mika Westerberg wrote:
> 
> > On Tue, Mar 04, 2025 at 04:52:19AM -0800, Kenneth Crudup wrote:
> 
> > > But I have been doing a fair amount of testing of the kernel with your patch
> > > and Lukas' NVMe adaptor (etc.) patch, and am concerned that you're still
> > > seeing his issue, as it (at least so far) hasn't occurred here since I
> > > applied it.
> 
> > It only happens if you have a TBT dock and the NVMe connected and you
> > disconnect them while the system is suspended. I suggest trying that a
> > couple of times to see if it happens. For me it
> > happened pretty much on the first suspend cycle.
> 
> That's exactly the failure mode I was testing for, though ... I've run a few
> iterations with the fix (and I'm about to do one more as I head to a
> client's office) and so far, so good.
> 
> Is your patch from yesterday applied as well? It is here.

Yes it is.

I don't have a display tunneled, though, only PCIe (well, and USB 3.x).


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-04 13:51                                                   ` Mika Westerberg
@ 2025-03-04 17:29                                                     ` Kenneth Crudup
  2025-03-05  8:31                                                       ` Mika Westerberg
  0 siblings, 1 reply; 34+ messages in thread
From: Kenneth Crudup @ 2025-03-04 17:29 UTC (permalink / raw)
  To: Mika Westerberg, Me; +Cc: linux-usb


On 3/4/25 05:51, Mika Westerberg wrote:

>>> It only happens if you have a TBT dock and the NVMe connected and you
>>> disconnect them while the system is suspended. I suggest trying that a
>>> couple of times to see if it happens. For me it
>>> happened pretty much on the first suspend cycle.

So I've tried it twice more today:

1 - CalDigit dock, NVMe adaptor. Put it to sleep, disconnected 
everything, even waited a while (call me crazy, but I swear how long the 
system is suspended seems to make a difference). Opened the lid, and it 
came right up.

2 - CalDigit dock, NVMe adaptor. Hibernated, drove to clients' offices. 
Resumed, came up OK.

Now I'm curious what difference the "4. Authorize both PCIe tunnels, 
verify devices are there." makes to your system, as I have "boltd" 
running and that handles it for me.

Tell ya what: if Linus pushes anything to master today, I'll 
pull/build/boot it and, since TB dyndbg is on, I'll post the dmesg 
from the runs so you can see them when you get in tomorrow.

-Kenny


* Re: So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes
  2025-03-04 17:29                                                     ` Kenneth Crudup
@ 2025-03-05  8:31                                                       ` Mika Westerberg
  0 siblings, 0 replies; 34+ messages in thread
From: Mika Westerberg @ 2025-03-05  8:31 UTC (permalink / raw)
  To: Kenneth Crudup; +Cc: linux-usb

On Tue, Mar 04, 2025 at 09:29:49AM -0800, Kenneth Crudup wrote:
> 
> On 3/4/25 05:51, Mika Westerberg wrote:
> 
> > > > It only happens if you have a TBT dock and the NVMe connected and you
> > > > disconnect them while the system is suspended. I suggest trying that a
> > > > couple of times to see if it happens. For me it
> > > > happened pretty much on the first suspend cycle.
> 
> So I've tried it twice more today:
> 
> 1 - CalDigit dock, NVMe adaptor. Put it to sleep, disconnected everything,
> even waited a while (call me crazy, but I swear how long the system is
> suspended seems to make a difference). Opened the lid, and it came right up.
> 
> 2 - CalDigit dock, NVMe adaptor. Hibernated, drove to clients' offices.
> Resumed, came up OK.
> 
> Now I'm curious what difference the "4. Authorize both PCIe tunnels, verify
> devices are there." makes to your system, as I have "boltd" running and that
> handles it for me.

It should not matter; the underlying mechanism is the same. boltd is fine
here.

Can you try the more "synthetic" way to see if that makes any difference?
I.e. do exactly the following steps. Do not connect any monitors, to keep
DP out of this.

Also do this first without the latest patch from Lukas so you can see that
the issue actually triggers. Then apply the patch (just that patch, nothing
else) and try again.

1. Boot the system up, nothing connected.
2. Plug in TBT 4 dock to the host.
3. Plug in TBT NVMe to the TBT 4 dock.
4. Verify that the devices are there (lspci)
5. Enter s2idle:

  # rtcwake -s 30 -m mem

6. Once the system suspends, unplug the device chain.
7. Wait for the system to wake up (it wakes up automatically in 30s).

Repeat steps 2-7 several times in a row (say, 10).



Thread overview: 34+ messages
2025-03-02  4:57 So, I had to revert d6d458d42e1 ("Handle DisplayPort tunnel activation asynchronously") too, to stop my resume crashes Kenneth Crudup
2025-03-02  5:36 ` Kenneth Crudup
2025-03-02 16:26   ` Kenneth Crudup
2025-03-02 16:30     ` Kenneth Crudup
2025-03-03 10:46       ` Mika Westerberg
2025-03-03 11:02         ` Kenneth Crudup
2025-03-03 11:21           ` Mika Westerberg
2025-03-03 11:38             ` Kenneth Crudup
2025-03-03 11:45               ` Kenneth Crudup
2025-03-03 11:55                 ` Mika Westerberg
2025-03-03 12:39                   ` Kenneth Crudup
2025-03-03 12:51                     ` Kenneth Crudup
2025-03-03 11:53               ` Mika Westerberg
2025-03-03 12:33                 ` Kenneth Crudup
2025-03-03 13:13                   ` Mika Westerberg
2025-03-03 13:19                     ` Kenneth Crudup
2025-03-03 13:23                       ` Mika Westerberg
2025-03-03 13:46                         ` Mika Westerberg
2025-03-03 13:53                           ` Kenneth Crudup
2025-03-03 14:01                             ` Mika Westerberg
2025-03-03 14:10                               ` Kenneth Crudup
2025-03-03 14:20                                 ` Mika Westerberg
2025-03-03 14:33                                   ` Kenneth Crudup
2025-03-03 17:58                                     ` Mika Westerberg
2025-03-03 18:20                                       ` Kenneth Crudup
2025-03-03 19:44                                         ` Kenneth Crudup
2025-03-04  8:27                                           ` Mika Westerberg
2025-03-04 12:52                                             ` Kenneth Crudup
2025-03-04 13:40                                               ` Mika Westerberg
2025-03-04 13:48                                                 ` Kenneth Crudup
2025-03-04 13:51                                                   ` Mika Westerberg
2025-03-04 17:29                                                     ` Kenneth Crudup
2025-03-05  8:31                                                       ` Mika Westerberg
2025-03-03 14:17                               ` Kenneth Crudup
