public inbox for linux-usb@vger.kernel.org
* [PATCH] [RFC] thunderbolt: Add delay for Dell U2725QE link width
@ 2025-12-09  5:41 Chia-Lin Kao (AceLan)
From: Chia-Lin Kao (AceLan) @ 2025-12-09  5:41 UTC
  To: Andreas Noever, Mika Westerberg, Yehezkel Bernat, linux-usb,
	linux-kernel

When plugging in a Dell U2725QE Thunderbolt monitor, the kernel produces
a call trace during initial enumeration. The device automatically
disconnects and reconnects ~3 seconds later, and works correctly on the
second attempt.

Issue Description:
==================
The Dell U2725QE (USB4 device 8087:b26) requires additional time during
link width negotiation from single lane to dual lane. On first plug, the
following sequence occurs:

1. Port state reaches TB_PORT_UP (link established, single lane)
2. Path activation begins immediately
3. tb_path_activate() -> tb_port_write() returns -ENOTCONN (error -107)
4. Call trace is generated at tb_path_activate()
5. Device disconnects/reconnects automatically after ~3 seconds
6. Second attempt succeeds with full dual-lane bandwidth

First attempt dmesg (failure):
-------------------------------
[   36.030347] thunderbolt 0000:c7:00.6: 2:16: available bandwidth for new USB3 tunnel 9000/9000 Mb/s
[   36.030613] thunderbolt 0000:c7:00.6: 2: USB3 tunnel creation failed
[   36.031530] thunderbolt 0000:c7:00.6: PCIe Down path activation failed
[   36.031531] WARNING: drivers/thunderbolt/path.c:589 at 0x0, CPU#12: pool-/usr/libex/3145

Second attempt dmesg (success):
--------------------------------
[   40.440012] thunderbolt 0000:c7:00.6: 2:16: available bandwidth for new USB3 tunnel 36000/36000 Mb/s
[   40.440261] thunderbolt 0000:c7:00.6: 2:16: maximum required bandwidth for USB3 tunnel 9000 Mb/s
[   40.440269] thunderbolt 0000:c7:00.6: 0:4 <-> 2:16 (USB3): activating
[   40.440271] thunderbolt 0000:c7:00.6: 0:4 <-> 2:16 (USB3): allocating initial bandwidth 9000/9000 Mb/s

The bandwidth difference (9000 vs 36000 Mb/s) indicates the first attempt
occurs while the link is still in single-lane mode.

Root Cause Analysis:
====================
The error originates from the Thunderbolt/USB4 device hardware itself:

1. Port config space read/write returns TB_CFG_ERROR_PORT_NOT_CONNECTED
2. This gets translated to -ENOTCONN in tb_cfg_get_error()
3. The port's control channel is temporarily unavailable during state
   transition from single lane to dual lane (lane bonding)

The comment in drivers/thunderbolt/ctl.c explains this is expected:
  "Port is not connected. This can happen during surprise removal.
   Do not warn."
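
The translation in steps 1 and 2 above can be illustrated with a small
userspace sketch. It is not the driver code: the enum, its values, and
both helper names are stand-ins invented for this example, and the
fallback to -EIO for other errors is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Simplified stand-ins for control-channel error codes. Only the
 * "port not connected" case is taken from the description above;
 * the numeric values are arbitrary.
 */
enum cfg_error {
	CFG_ERROR_PORT_NOT_CONNECTED,
	CFG_ERROR_OTHER,
};

/* Hypothetical helper: map a control-channel error to a negative errno. */
static int cfg_error_to_errno(enum cfg_error err)
{
	switch (err) {
	case CFG_ERROR_PORT_NOT_CONNECTED:
		return -ENOTCONN;
	default:
		return -EIO;	/* assumed generic fallback */
	}
}

/*
 * Hypothetical helper: -ENOTCONN is expected while a port is going
 * away or changing state, so a caller may choose not to warn on it.
 */
static bool cfg_error_should_warn(int ret)
{
	return ret < 0 && ret != -ENOTCONN;
}
```

A caller that checked cfg_error_should_warn() before emitting a warning
would stay quiet for the -ENOTCONN seen during lane bonding while still
reporting other failures.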

Attempted Solutions:
====================
1. Retry logic on -ENOTCONN in tb_path_activate():
   Result: Caused host port (0:0) lockup with hundreds of "downstream
   port is locked" errors. Rejected by user.

2. Increased tb_port_wait_for_link_width() timeout from 100ms to 3000ms:
   Result: Did not resolve the issue. The timeout increase alone is
   insufficient because the port state hasn't reached TB_PORT_UP when
   lane bonding is attempted.

3. Added msleep(2000) at various points in enumeration flow:
   Locations tested:
   - Before tb_switch_configure(): Works ✓
   - Before tb_switch_add(): Works ✓
   - Before usb4_port_hotplug_enable(): Works ✓
   - After tb_switch_add(): Doesn't work ✗
   - In tb_configure_link(): Doesn't work ✗
   - In tb_switch_lane_bonding_enable(): Doesn't work ✗
   - In tb_port_wait_for_link_width(): Doesn't work ✗

   The pattern shows the delay must occur BEFORE hotplug enable, which
   happens early in tb_switch_port_hotplug_enable() -> usb4_port_hotplug_enable().

Current Workaround:
===================
Add a 2-second delay in tb_wait_for_port() when the port state reaches
TB_PORT_UP. This is the earliest point where we know:
- The link is physically established
- The device is responsive
- But lane width negotiation may still be in progress

This location is chosen because:
1. It is called during port enumeration, before any tunnel is created
2. The port has just transitioned to the TB_PORT_UP state
3. The delay gives lane bonding enough time to complete
4. It avoids affecting other code paths

Testing Results:
================
With this patch:
- No call trace on first plug
- Device enumerates correctly on first attempt
- Full bandwidth (36000 Mb/s) available immediately
- No disconnect/reconnect cycle
- USB3 and PCIe tunnels are created successfully

Without this patch:
- Call trace on every first plug
- Only 9000 Mb/s bandwidth (single lane) on first attempt
- Automatic disconnect/reconnect after ~3 seconds
- Second attempt works with 36000 Mb/s

Discussion Points for RFC:
===========================
1. Is a fixed 2-second delay acceptable, or should we poll for a
   specific hardware state?

2. Should we check PORT_CS_18_TIP (Transition In Progress) bit instead
   of using a fixed delay?

3. Is there a better location for this delay in the enumeration flow?

4. Should this be device-specific (based on vendor/device ID) or apply
   to all USB4 devices?

5. The 100ms link width wait in tb_switch_lane_bonding_enable() may be
   too short for other devices as well. Should we increase it universally?
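
As a starting point for discussion points 1 and 2, a bounded poll could
replace the fixed delay. The sketch below is plain userspace C, not
driver code: the bit position, the callback types, and every function
name are hypothetical, and the status read and the sleep are injected so
the loop can be exercised without hardware.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder bit, NOT the real PORT_CS_18 register layout. */
#define FAKE_TIP_BIT	(1u << 0)

/* Hypothetical callbacks: read the port status word, sleep for ms. */
typedef uint32_t (*read_status_fn)(void *ctx);
typedef void (*sleep_ms_fn)(unsigned int ms);

/*
 * Poll until the transition-in-progress bit clears, sleeping step_ms
 * between reads. Returns 0 once the bit is clear, or -ETIMEDOUT if it
 * is still set after roughly timeout_ms of waiting.
 */
static int wait_transition_done(read_status_fn read_status, void *ctx,
				sleep_ms_fn sleep_ms,
				unsigned int timeout_ms,
				unsigned int step_ms)
{
	unsigned int waited = 0;

	if (step_ms == 0)
		step_ms = 1;	/* guarantee forward progress */

	for (;;) {
		if (!(read_status(ctx) & FAKE_TIP_BIT))
			return 0;
		if (waited >= timeout_ms)
			return -ETIMEDOUT;
		sleep_ms(step_ms);
		waited += step_ms;
	}
}

/* Fake port for experiments: reports "in progress" for N reads. */
struct fake_port {
	unsigned int reads_until_done;
};

static uint32_t fake_read_status(void *ctx)
{
	struct fake_port *p = ctx;

	if (p->reads_until_done == 0)
		return 0;
	p->reads_until_done--;
	return FAKE_TIP_BIT;
}

/* No-op sleep so the sketch runs instantly. */
static void no_sleep(unsigned int ms)
{
	(void)ms;
}
```

With a fake port that clears the bit after three reads, the call
wait_transition_done(fake_read_status, &port, no_sleep, 2000, 50)
returns 0 after three sleep steps instead of always waiting the full
2 seconds. Whether the real TIP bit is readable at this point in
enumeration is exactly the open question in point 2.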

Hardware Details:
=================
Device: Dell U2725QE Thunderbolt Monitor
USB4 Router: 8087:b26 (Intel USB4 controller)
Host: AMD Thunderbolt 4 controller (0000:c7:00.6)
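
If the delay were made device-specific (discussion point 4), a lookup
keyed on the router ID above might look like the following userspace
sketch. The struct, the table, and the function name are hypothetical
and do not reflect the driver's existing quirk mechanism; "8087:b26" is
read here as a hexadecimal vendor:device pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device entry: how long to let the link settle. */
struct link_quirk {
	uint16_t vendor;
	uint16_t device;
	unsigned int settle_ms;
};

/* Single entry taken from the Hardware Details above. */
static const struct link_quirk link_quirks[] = {
	{ 0x8087, 0x0b26, 2000 },
};

/* Return the settle delay for a router, or 0 if no quirk applies. */
static unsigned int link_settle_delay_ms(uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(link_quirks) / sizeof(link_quirks[0]); i++) {
		if (link_quirks[i].vendor == vendor &&
		    link_quirks[i].device == device)
			return link_quirks[i].settle_ms;
	}
	return 0;
}
```

Routers that are not listed get a delay of 0, so other devices would
see no change in enumeration time. Since the ID belongs to the Intel
router rather than to the monitor itself, such a match would presumably
also cover any other product built on the same router.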

Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
---
Full dmesg log available at: https://paste.ubuntu.com/p/CXs2T4XzZ3/
---
 drivers/thunderbolt/switch.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
index b3948aad0b955..e0c65e5fb0dca 100644
--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -530,6 +530,8 @@ int tb_wait_for_port(struct tb_port *port, bool wait_if_unplugged)
 			return 0;
 
 		case TB_PORT_UP:
+			msleep(2000);
+			fallthrough;
 		case TB_PORT_TX_CL0S:
 		case TB_PORT_RX_CL0S:
 		case TB_PORT_CL1:
-- 
2.43.0


Thread overview: 21+ messages
2025-12-09  5:41 [PATCH] [RFC] thunderbolt: Add delay for Dell U2725QE link width Chia-Lin Kao (AceLan)
2025-12-09  7:06 ` Mika Westerberg
2025-12-09 16:49   ` Mario Limonciello
2025-12-10  5:33     ` Chia-Lin Kao (AceLan)
2025-12-10  3:15   ` Chia-Lin Kao (AceLan)
2025-12-10  7:41     ` Mika Westerberg
2025-12-10 21:42       ` Mario Limonciello
     [not found]         ` <coxrm5gishdztghznuvzafg2pbdk4qk3ttbkbq7t5whsfv2lk5@3gqepcs6h4uc>
2025-12-12 12:39           ` Mika Westerberg
2025-12-12 14:40             ` Mario Limonciello
2025-12-17  3:06               ` AceLan Kao
2025-12-17 12:55                 ` Mika Westerberg
2025-12-17 15:53                   ` Mario Limonciello
2025-12-18  1:38                     ` AceLan Kao
2025-12-18  7:21                       ` Mika Westerberg
     [not found]                         ` <6inne3luvw4ot3wqnsaw3gzhlxtd4756i465oto6so5ox3syxp@kibuv4vhvexx>
2025-12-18 10:20                           ` Mika Westerberg
2025-12-22  1:33                             ` Chia-Lin Kao (AceLan)
2025-12-30  7:30                               ` Mika Westerberg
2025-12-31  1:33                                 ` Chia-Lin Kao (AceLan)
2025-12-31  6:03                                   ` Mika Westerberg
2026-01-02  2:03                                     ` Chia-Lin Kao (AceLan)
2026-01-05 11:19                                       ` Mika Westerberg
