* [PATCH 1/1] thunderbolt: Retry DP tunnel setup on DPRX negotiation failure
2026-04-30 7:31 [PATCH 0/1] thunderbolt: Fix blank external display after HRR on USB4 v2 Chia-Lin Kao (AceLan)
@ 2026-04-30 7:31 ` Chia-Lin Kao (AceLan)
2026-04-30 10:03 ` [PATCH 0/1] thunderbolt: Fix blank external display after HRR on USB4 v2 Mika Westerberg
1 sibling, 0 replies; 3+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2026-04-30 7:31 UTC
To: Mika Westerberg, Andreas Noever, Yehezkel Bernat; +Cc: linux-usb, linux-kernel
On USB4 v2 routers the Host Router Reset (HRR) performed during
nhi_probe() destroys all BIOS-established tunnels. When the driver
re-creates the DP tunnel after hotplug re-discovery, DPRX negotiation
may fail because the graphics driver is not yet ready.

Currently the driver permanently removes the DP IN adapter from the
available resources on the first DPRX failure, leaving the external
display blank until the next reboot.

Fix this by retrying the DP tunnel setup up to 3 times with a 5-second
delay between attempts, giving the graphics driver time to initialize.
The retry counter is reset on success and on suspend.
Cc: linux-usb@vger.kernel.org
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
---
drivers/thunderbolt/tb.c | 63 +++++++++++++++++++++++++++++++++-------
1 file changed, 52 insertions(+), 11 deletions(-)
diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
index c69c323e6952a..19052cac078a2 100644
--- a/drivers/thunderbolt/tb.c
+++ b/drivers/thunderbolt/tb.c
@@ -25,6 +25,15 @@
*/
#define TB_BW_ALLOC_RETRIES 3
+/*
+ * Number of retries for DP tunnel DPRX negotiation if it fails during
+ * boot. This commonly happens on USB4 v2 routers where Host Router
+ * Reset (HRR) destroys BIOS-established tunnels and the Thunderbolt
+ * driver re-creates them before the graphics driver is ready.
+ */
+#define TB_DP_ACTIVATE_RETRIES 3
+#define TB_DP_ACTIVATE_DELAY 5000 /* ms */
+
/*
* Minimum bandwidth (in Mb/s) that is needed in the single transmitter/receiver
* direction. This is 40G - 10% guard band bandwidth.
@@ -59,6 +68,8 @@ MODULE_PARM_DESC(asym_threshold,
* after cfg has been paused.
* @remove_work: Work used to remove any unplugged routers after
* runtime resume
+ * @dp_retry_work: Work used to retry DP tunnel setup after DPRX failure
+ * @dp_retries: Number of remaining DP tunnel activation retries
* @groups: Bandwidth groups used in this domain.
*/
struct tb_cm {
@@ -66,6 +77,8 @@ struct tb_cm {
struct list_head dp_resources;
bool hotplug_active;
struct delayed_work remove_work;
+ struct delayed_work dp_retry_work;
+ int dp_retries;
struct tb_bandwidth_group groups[MAX_GROUPS];
};
@@ -1903,11 +1916,25 @@ static struct tb_port *tb_find_dp_out(struct tb *tb, struct tb_port *in)
return NULL;
}
+static void tb_tunnel_dp(struct tb *tb);
+
+static void tb_dp_retry_work_fn(struct work_struct *work)
+{
+ struct tb_cm *tcm = container_of(work, struct tb_cm,
+ dp_retry_work.work);
+ struct tb *tb = tcm_to_tb(tcm);
+
+ mutex_lock(&tb->lock);
+ tb_tunnel_dp(tb);
+ mutex_unlock(&tb->lock);
+}
+
static void tb_dp_tunnel_active(struct tb_tunnel *tunnel, void *data)
{
struct tb_port *in = tunnel->src_port;
struct tb_port *out = tunnel->dst_port;
struct tb *tb = data;
+ struct tb_cm *tcm = tb_priv(tb);
mutex_lock(&tb->lock);
if (tb_tunnel_is_active(tunnel)) {
@@ -1915,6 +1942,8 @@ static void tb_dp_tunnel_active(struct tb_tunnel *tunnel, void *data)
tb_tunnel_dbg(tunnel, "DPRX capabilities read completed\n");
+ tcm->dp_retries = 0;
+
/* If fail reading tunnel's consumed bandwidth, tear it down */
ret = tb_tunnel_consumed_bandwidth(tunnel, &consumed_up,
&consumed_down);
@@ -1943,8 +1972,6 @@ static void tb_dp_tunnel_active(struct tb_tunnel *tunnel, void *data)
tb_increase_tmu_accuracy(tunnel);
}
} else {
- struct tb_port *in = tunnel->src_port;
-
/*
* This tunnel failed to establish. This means DPRX
* negotiation most likely did not complete which
@@ -1952,16 +1979,26 @@ static void tb_dp_tunnel_active(struct tb_tunnel *tunnel, void *data)
* loaded or not all DP cables where connected to the
* discrete router.
*
- * In both cases we remove the DP IN adapter from the
- * available resources as it is not usable. This will
- * also tear down the tunnel and try to re-use the
- * released DP OUT.
- *
- * It will be added back only if there is hotplug for
- * the DP IN again.
+ * On USB4 v2 routers Host Router Reset (HRR) at boot
+ * destroys BIOS-established tunnels and the driver
+ * re-creates them before the graphics driver is ready.
+ * Retry a few times to allow the graphics driver to
+ * come up.
*/
- tb_tunnel_warn(tunnel, "not active, tearing down\n");
- tb_dp_resource_unavailable(tb, in, "DPRX negotiation failed");
+ if (tcm->dp_retries < TB_DP_ACTIVATE_RETRIES) {
+ tcm->dp_retries++;
+ tb_tunnel_warn(tunnel,
+ "not active, retrying in %d ms (attempt %d/%d)\n",
+ TB_DP_ACTIVATE_DELAY, tcm->dp_retries,
+ TB_DP_ACTIVATE_RETRIES);
+ tb_deactivate_and_free_tunnel(tunnel);
+ queue_delayed_work(tb->wq, &tcm->dp_retry_work,
+ msecs_to_jiffies(TB_DP_ACTIVATE_DELAY));
+ } else {
+ tb_tunnel_warn(tunnel, "not active, tearing down\n");
+ tb_dp_resource_unavailable(tb, in,
+ "DPRX negotiation failed");
+ }
}
mutex_unlock(&tb->lock);
@@ -2937,6 +2974,7 @@ static void tb_stop(struct tb *tb)
struct tb_tunnel *n;
cancel_delayed_work(&tcm->remove_work);
+ cancel_delayed_work(&tcm->dp_retry_work);
/* tunnels are only present after everything has been initialized */
list_for_each_entry_safe(tunnel, n, &tcm->tunnel_list, list) {
/*
@@ -3073,6 +3111,8 @@ static int tb_suspend_noirq(struct tb *tb)
tb_switch_exit_redrive(tb->root_switch);
tb_switch_suspend(tb->root_switch, false);
tcm->hotplug_active = false; /* signal tb_handle_hotplug to quit */
+ cancel_delayed_work(&tcm->dp_retry_work);
+ tcm->dp_retries = 0;
tb_dbg(tb, "suspend finished\n");
return 0;
@@ -3383,6 +3423,7 @@ struct tb *tb_probe(struct tb_nhi *nhi)
INIT_LIST_HEAD(&tcm->tunnel_list);
INIT_LIST_HEAD(&tcm->dp_resources);
INIT_DELAYED_WORK(&tcm->remove_work, tb_remove_work);
+ INIT_DELAYED_WORK(&tcm->dp_retry_work, tb_dp_retry_work_fn);
tb_init_bandwidth_groups(tcm);
tb_dbg(tb, "using software connection manager\n");
--
2.53.0
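
tb_dp_retry_work_fn() in the patch is handed only a struct work_struct pointer; because dp_retry_work is embedded in struct tb_cm, container_of() on dp_retry_work.work recovers the owning structure. A userspace sketch of that pattern follows. struct tb_cm_model, the trimmed work_struct/delayed_work stand-ins, tcm_model_init() and run_work() are hypothetical simplifications, not the kernel definitions, and the real workqueue, timer and tb->lock are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of(); the kernel version adds type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins; the real kernel structures carry more fields. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

struct delayed_work {
	struct work_struct work;
	unsigned long expires;	/* stand-in for the embedded timer */
};

struct tb_cm_model {
	int hotplug_active;
	struct delayed_work remove_work;
	struct delayed_work dp_retry_work;
	int dp_retries;
	int retry_runs;		/* demo only: counts handler invocations */
};

/* Same shape as tb_dp_retry_work_fn(): only the work pointer comes in. */
static void dp_retry_work_fn(struct work_struct *work)
{
	struct tb_cm_model *tcm = container_of(work, struct tb_cm_model,
					       dp_retry_work.work);

	/* The patch takes tb->lock and calls tb_tunnel_dp() at this point. */
	tcm->retry_runs++;
}

void tcm_model_init(struct tb_cm_model *tcm)
{
	*tcm = (struct tb_cm_model){ 0 };
	/* Counterpart of INIT_DELAYED_WORK(&tcm->dp_retry_work, ...). */
	tcm->dp_retry_work.work.func = dp_retry_work_fn;
}

/* What a worker thread eventually does with a queued item. */
void run_work(struct work_struct *work)
{
	assert(work != NULL && work->func != NULL);
	work->func(work);
}
```

Because the handler derives its owner from the address of the embedded member, the item must not run once struct tb_cm is gone, which is the purpose of the cancel_delayed_work() call the patch adds to tb_stop(). Note that cancel_delayed_work() only removes a pending item, whereas cancel_delayed_work_sync() also waits for a handler that is already running.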

* Re: [PATCH 0/1] thunderbolt: Fix blank external display after HRR on USB4 v2
From: Mika Westerberg @ 2026-04-30 10:03 UTC
To: Chia-Lin Kao (AceLan)
Cc: Mika Westerberg, Andreas Noever, Yehezkel Bernat, linux-usb,
linux-kernel
Hi,
On Thu, Apr 30, 2026 at 03:31:42PM +0800, Chia-Lin Kao (AceLan) wrote:
> Hi,
>
> On Dell XPS 14 (Panther Lake) with a WD22TB4 Thunderbolt dock and BenQ
> PD2725U external display, the display goes permanently blank on ~50% of
> boots. The only way to recover is a full reboot — re-plugging the
> monitor or dock does not help.
>
> The root cause is a race between the USB4 v2 Host Router Reset (HRR)
> and the graphics driver initialization:
>
> 1. nhi_probe() performs HRR at ~t=1s, destroying BIOS-established
> DP tunnels.
> 2. The Thunderbolt driver re-discovers the dock via hotplug at ~t=4s
> and attempts to re-create the DP tunnel.
> 3. DPRX negotiation fails because the graphics driver (xe) is not yet
> ready — the 12-second timeout expires at ~t=18s.
> 4. tb_dp_tunnel_active() permanently removes the DP IN adapter from
> available resources on the first failure, so the display never
> recovers.
>
> The fix adds a retry mechanism: on DPRX negotiation failure, the driver
> retries up to 3 times with a 5-second delay, giving the graphics driver
> time to come up.
>
> Tested with 13 boot cycles on the affected machine:
> - 6 boots hit the HRR + DPRX race: all recovered via retry, display
> came online after 3 retry attempts (~58s).
> - 5 clean boots (no HRR): DP tunnel established immediately.
> - 2 boots with HRR where DPRX succeeded on first try.
> - 0 teardowns: the retry mechanism was never exhausted.
>
> Full dmesg log - https://people.canonical.com/~acelan/bugs/dp-retry-on-hrr/

I'm looking at that, but the first thing that stands out is this:

[ 1.051684] thunderbolt: loading out-of-tree module taints kernel.

This tells me that the kernel may carry modifications outside of
mainline.

The second thing is that the log was captured without
"thunderbolt.dyndbg=+p", which would show what is going on there. I
suggest adding that pretty much always.

Yes, this can happen, and the idea of the 12 s timeout was that it
accounts for the time it may take to boot up (as well as the polling
i915 does if it is runtime suspended). I would say that whatever is
delaying the boot should be investigated first, because that's not
really a good user experience.

Aside from that, if you add "thunderbolt.dprx_timeout=-1", does it
work? If really needed, we can increase the timeout a bit, but I'm not
too enthusiastic about adding retry code because we already have this
timeout that we can adjust as needed (we can make the default higher).
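
The timing reported in the cover letter can be checked against the two options discussed in this thread: three retries of a 12 s DPRX wait versus a single longer timeout (for example a higher thunderbolt.dprx_timeout). A sketch of the arithmetic follows, assuming each failed attempt uses its full 12 s timeout, the 5 s TB_DP_ACTIVATE_DELAY separates attempts, and the first failure lands at ~t=18 s as reported; retry_start(), final_teardown() and equivalent_timeout() are hypothetical helpers.

```c
#include <assert.h>

/* Figures quoted in this thread (seconds). */
#define DPRX_TIMEOUT_S	 12	/* DPRX capability read timeout */
#define RETRY_DELAY_S	  5	/* TB_DP_ACTIVATE_DELAY */
#define MAX_RETRIES	  3	/* TB_DP_ACTIVATE_RETRIES */
#define FIRST_FAILURE_S	 18	/* observed first DPRX failure */

/*
 * Boot time at which retry n (1-based) starts waiting for DPRX,
 * assuming every earlier attempt ran into its full timeout.
 */
int retry_start(int n)
{
	assert(n >= 1 && n <= MAX_RETRIES);
	return FIRST_FAILURE_S + n * RETRY_DELAY_S + (n - 1) * DPRX_TIMEOUT_S;
}

/* Boot time at which the DP IN would finally be marked unavailable. */
int final_teardown(void)
{
	return retry_start(MAX_RETRIES) + DPRX_TIMEOUT_S;
}

/*
 * Single-wait timeout that would stretch the first attempt's deadline
 * to ready_s, given that the first wait expired at FIRST_FAILURE_S.
 */
int equivalent_timeout(int ready_s)
{
	return DPRX_TIMEOUT_S + (ready_s - FIRST_FAILURE_S);
}
```

With these figures the retries start at 23, 40 and 57 s; retry_start(3) = 57 lines up with the display coming online at ~58 s on the third retry, and final_teardown() = 69 s is when the patch would give up if that attempt failed too. equivalent_timeout(58) = 52, so on those boots a single wait of roughly 52 s would have reached the same point, under the assumptions above.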