* [PATCH 0/1] thunderbolt: Fix blank external display after HRR on USB4 v2
@ 2026-04-30 7:31 Chia-Lin Kao (AceLan)
2026-04-30 7:31 ` [PATCH 1/1] thunderbolt: Retry DP tunnel setup on DPRX negotiation failure Chia-Lin Kao (AceLan)
2026-04-30 10:03 ` [PATCH 0/1] thunderbolt: Fix blank external display after HRR on USB4 v2 Mika Westerberg
0 siblings, 2 replies; 5+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2026-04-30 7:31 UTC (permalink / raw)
To: Mika Westerberg, Andreas Noever, Yehezkel Bernat; +Cc: linux-usb, linux-kernel
Hi,
On Dell XPS 14 (Panther Lake) with a WD22TB4 Thunderbolt dock and BenQ
PD2725U external display, the display goes permanently blank on ~50% of
boots. The only way to recover is a full reboot — re-plugging the
monitor or dock does not help.
The root cause is a race between the USB4 v2 Host Router Reset (HRR)
and the graphics driver initialization:
1. nhi_probe() performs HRR at ~t=1s, destroying BIOS-established
DP tunnels.
2. The Thunderbolt driver re-discovers the dock via hotplug at ~t=4s
and attempts to re-create the DP tunnel.
3. DPRX negotiation fails because the graphics driver (xe) is not yet
ready — the 12-second timeout expires at ~t=18s.
4. tb_dp_tunnel_active() permanently removes the DP IN adapter from
available resources on the first failure, so the display never
recovers.
The fix adds a retry mechanism: on DPRX negotiation failure, the driver
retries up to 3 times with a 5-second delay, giving the graphics driver
time to come up.
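
The retry budget described above can be modelled in a few lines of user-space C. This is only a sketch of the counting logic, not the driver code: the two constants carry the values used in the patch, while `struct dp_retry_state`, `enum dp_action`, `dp_on_failure()` and `dp_on_success()` are hypothetical names introduced for illustration. The reset on success follows the patch description ("The retry counter is reset on success and on suspend").

```c
/* Values as defined in the patch; everything else is a hypothetical model. */
#define TB_DP_ACTIVATE_RETRIES	3
#define TB_DP_ACTIVATE_DELAY	5000 /* ms */

/* Hypothetical stand-in for the per-domain retry bookkeeping. */
struct dp_retry_state {
	int retries; /* number of retries already used */
};

enum dp_action {
	DP_ACTION_RETRY,   /* free the tunnel and try again after the delay */
	DP_ACTION_GIVE_UP, /* budget exhausted: mark the DP IN unavailable */
};

/* Decide what to do when DPRX negotiation fails. */
enum dp_action dp_on_failure(struct dp_retry_state *st)
{
	if (st->retries < TB_DP_ACTIVATE_RETRIES) {
		st->retries++;
		return DP_ACTION_RETRY;
	}
	return DP_ACTION_GIVE_UP;
}

/* A successful negotiation restores the full retry budget. */
void dp_on_success(struct dp_retry_state *st)
{
	st->retries = 0;
}
```

With a zero-initialised state, three consecutive failures each return `DP_ACTION_RETRY` and the fourth returns `DP_ACTION_GIVE_UP`, which corresponds to the pre-patch behaviour of removing the DP IN adapter.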
Tested with 13 boot cycles on the affected machine:
- 6 boots hit the HRR + DPRX race: all recovered via retry, display
came online after 3 retry attempts (~58s).
- 5 clean boots (no HRR): DP tunnel established immediately.
- 2 boots with HRR where DPRX succeeded on first try.
- 0 teardowns: the retry mechanism was never exhausted.
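
As a rough cross-check of the ~58s figure, the start time of each attempt can be estimated with simple arithmetic. The sketch below assumes that every failed attempt consumes the full 12-second DPRX timeout and that the first attempt starts at about t=6s; the latter is only inferred from the timeout expiring at ~t=18s and is not stated explicitly in the logs. The function name `dp_attempt_start_s()` is hypothetical.

```c
/*
 * Estimated start time, in seconds since boot, of DP tunnel attempt
 * number `attempt` (0 = initial attempt, 1 = first retry, ...).
 * Assumes every earlier attempt ran into the full timeout and was
 * followed by the fixed retry delay.
 */
int dp_attempt_start_s(int first_start_s, int timeout_s, int delay_s,
		       int attempt)
{
	return first_start_s + attempt * (timeout_s + delay_s);
}
```

Under these assumptions `dp_attempt_start_s(6, 12, 5, 3)` gives 57, so the third retry would start at roughly t=57s, which is in line with the display coming online at ~58s.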
Full dmesg log - https://people.canonical.com/~acelan/bugs/dp-retry-on-hrr/
Thanks,
AceLan Kao
Chia-Lin Kao (AceLan) (1):
thunderbolt: Retry DP tunnel setup on DPRX negotiation failure
drivers/thunderbolt/tb.c | 63 +++++++++++++++++++++++++++++++++-------
1 file changed, 52 insertions(+), 11 deletions(-)
--
2.53.0
* [PATCH 1/1] thunderbolt: Retry DP tunnel setup on DPRX negotiation failure
From: Chia-Lin Kao (AceLan) @ 2026-04-30 7:31 UTC (permalink / raw)
To: Mika Westerberg, Andreas Noever, Yehezkel Bernat; +Cc: linux-usb, linux-kernel

On USB4 v2 routers the Host Router Reset (HRR) performed during
nhi_probe() destroys all BIOS-established tunnels. When the driver
re-creates the DP tunnel after hotplug re-discovery, DPRX negotiation
may fail because the graphics driver is not yet ready.

Currently the driver permanently removes the DP IN adapter from the
available resources on the first DPRX failure, leaving the external
display blank until the next reboot.

Fix this by retrying the DP tunnel setup up to 3 times with a 5-second
delay between attempts, giving the graphics driver time to initialize.
The retry counter is reset on success and on suspend.

Cc: linux-usb@vger.kernel.org
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
---
 drivers/thunderbolt/tb.c | 63 +++++++++++++++++++++++++++++++++-------
 1 file changed, 52 insertions(+), 11 deletions(-)

diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
index c69c323e6952a..19052cac078a2 100644
--- a/drivers/thunderbolt/tb.c
+++ b/drivers/thunderbolt/tb.c
@@ -25,6 +25,15 @@
  */
 #define TB_BW_ALLOC_RETRIES	3
 
+/*
+ * Number of retries for DP tunnel DPRX negotiation if it fails during
+ * boot. This commonly happens on USB4 v2 routers where Host Router
+ * Reset (HRR) destroys BIOS-established tunnels and the Thunderbolt
+ * driver re-creates them before the graphics driver is ready.
+ */
+#define TB_DP_ACTIVATE_RETRIES	3
+#define TB_DP_ACTIVATE_DELAY	5000 /* ms */
+
 /*
  * Minimum bandwidth (in Mb/s) that is needed in the single transmitter/receiver
  * direction. This is 40G - 10% guard band bandwidth.
@@ -59,6 +68,8 @@ MODULE_PARM_DESC(asym_threshold,
  *		    after cfg has been paused.
  * @remove_work: Work used to remove any unplugged routers after
  *		 runtime resume
+ * @dp_retry_work: Work used to retry DP tunnel setup after DPRX failure
+ * @dp_retries: Number of remaining DP tunnel activation retries
  * @groups: Bandwidth groups used in this domain.
  */
 struct tb_cm {
@@ -66,6 +77,8 @@ struct tb_cm {
 	struct list_head dp_resources;
 	bool hotplug_active;
 	struct delayed_work remove_work;
+	struct delayed_work dp_retry_work;
+	int dp_retries;
 	struct tb_bandwidth_group groups[MAX_GROUPS];
 };
 
@@ -1903,11 +1916,25 @@ static struct tb_port *tb_find_dp_out(struct tb *tb, struct tb_port *in)
 	return NULL;
 }
 
+static void tb_tunnel_dp(struct tb *tb);
+
+static void tb_dp_retry_work_fn(struct work_struct *work)
+{
+	struct tb_cm *tcm = container_of(work, struct tb_cm,
+					 dp_retry_work.work);
+	struct tb *tb = tcm_to_tb(tcm);
+
+	mutex_lock(&tb->lock);
+	tb_tunnel_dp(tb);
+	mutex_unlock(&tb->lock);
+}
+
 static void tb_dp_tunnel_active(struct tb_tunnel *tunnel, void *data)
 {
 	struct tb_port *in = tunnel->src_port;
 	struct tb_port *out = tunnel->dst_port;
 	struct tb *tb = data;
+	struct tb_cm *tcm = tb_priv(tb);
 
 	mutex_lock(&tb->lock);
 	if (tb_tunnel_is_active(tunnel)) {
@@ -1915,6 +1942,8 @@ static void tb_dp_tunnel_active(struct tb_tunnel *tunnel, void *data)
 
 		tb_tunnel_dbg(tunnel, "DPRX capabilities read completed\n");
 
+		tcm->dp_retries = 0;
+
 		/* If fail reading tunnel's consumed bandwidth, tear it down */
 		ret = tb_tunnel_consumed_bandwidth(tunnel, &consumed_up,
 						   &consumed_down);
@@ -1943,8 +1972,6 @@ static void tb_dp_tunnel_active(struct tb_tunnel *tunnel, void *data)
 			tb_increase_tmu_accuracy(tunnel);
 		}
 	} else {
-		struct tb_port *in = tunnel->src_port;
-
 		/*
 		 * This tunnel failed to establish. This means DPRX
 		 * negotiation most likely did not complete which
@@ -1952,16 +1979,26 @@ static void tb_dp_tunnel_active(struct tb_tunnel *tunnel, void *data)
 		 * loaded or not all DP cables where connected to the
 		 * discrete router.
 		 *
-		 * In both cases we remove the DP IN adapter from the
-		 * available resources as it is not usable. This will
-		 * also tear down the tunnel and try to re-use the
-		 * released DP OUT.
-		 *
-		 * It will be added back only if there is hotplug for
-		 * the DP IN again.
+		 * On USB4 v2 routers Host Router Reset (HRR) at boot
+		 * destroys BIOS-established tunnels and the driver
+		 * re-creates them before the graphics driver is ready.
+		 * Retry a few times to allow the graphics driver to
+		 * come up.
 		 */
-		tb_tunnel_warn(tunnel, "not active, tearing down\n");
-		tb_dp_resource_unavailable(tb, in, "DPRX negotiation failed");
+		if (tcm->dp_retries < TB_DP_ACTIVATE_RETRIES) {
+			tcm->dp_retries++;
+			tb_tunnel_warn(tunnel,
+				       "not active, retrying in %d ms (attempt %d/%d)\n",
+				       TB_DP_ACTIVATE_DELAY, tcm->dp_retries,
+				       TB_DP_ACTIVATE_RETRIES);
+			tb_deactivate_and_free_tunnel(tunnel);
+			queue_delayed_work(tb->wq, &tcm->dp_retry_work,
+					   msecs_to_jiffies(TB_DP_ACTIVATE_DELAY));
+		} else {
+			tb_tunnel_warn(tunnel, "not active, tearing down\n");
+			tb_dp_resource_unavailable(tb, in,
+						   "DPRX negotiation failed");
+		}
 	}
 	mutex_unlock(&tb->lock);
 
@@ -2937,6 +2974,7 @@ static void tb_stop(struct tb *tb)
 	struct tb_tunnel *n;
 
 	cancel_delayed_work(&tcm->remove_work);
+	cancel_delayed_work(&tcm->dp_retry_work);
 	/* tunnels are only present after everything has been initialized */
 	list_for_each_entry_safe(tunnel, n, &tcm->tunnel_list, list) {
 		/*
@@ -3073,6 +3111,8 @@ static int tb_suspend_noirq(struct tb *tb)
 	tb_switch_exit_redrive(tb->root_switch);
 	tb_switch_suspend(tb->root_switch, false);
 	tcm->hotplug_active = false; /* signal tb_handle_hotplug to quit */
+	cancel_delayed_work(&tcm->dp_retry_work);
+	tcm->dp_retries = 0;
 	tb_dbg(tb, "suspend finished\n");
 
 	return 0;
@@ -3383,6 +3423,7 @@ struct tb *tb_probe(struct tb_nhi *nhi)
 	INIT_LIST_HEAD(&tcm->tunnel_list);
 	INIT_LIST_HEAD(&tcm->dp_resources);
 	INIT_DELAYED_WORK(&tcm->remove_work, tb_remove_work);
+	INIT_DELAYED_WORK(&tcm->dp_retry_work, tb_dp_retry_work_fn);
 	tb_init_bandwidth_groups(tcm);
 
 	tb_dbg(tb, "using software connection manager\n");
--
2.53.0

* Re: [PATCH 0/1] thunderbolt: Fix blank external display after HRR on USB4 v2
From: Mika Westerberg @ 2026-04-30 10:03 UTC (permalink / raw)
To: Chia-Lin Kao (AceLan)
Cc: Mika Westerberg, Andreas Noever, Yehezkel Bernat, linux-usb, linux-kernel

Hi,

On Thu, Apr 30, 2026 at 03:31:42PM +0800, Chia-Lin Kao (AceLan) wrote:
> Hi,
>
> On Dell XPS 14 (Panther Lake) with a WD22TB4 Thunderbolt dock and BenQ
> PD2725U external display, the display goes permanently blank on ~50% of
> boots. The only way to recover is a full reboot — re-plugging the
> monitor or dock does not help.
>
> The root cause is a race between the USB4 v2 Host Router Reset (HRR)
> and the graphics driver initialization:
>
> 1. nhi_probe() performs HRR at ~t=1s, destroying BIOS-established
>    DP tunnels.
> 2. The Thunderbolt driver re-discovers the dock via hotplug at ~t=4s
>    and attempts to re-create the DP tunnel.
> 3. DPRX negotiation fails because the graphics driver (xe) is not yet
>    ready — the 12-second timeout expires at ~t=18s.
> 4. tb_dp_tunnel_active() permanently removes the DP IN adapter from
>    available resources on the first failure, so the display never
>    recovers.
>
> The fix adds a retry mechanism: on DPRX negotiation failure, the driver
> retries up to 3 times with a 5-second delay, giving the graphics driver
> time to come up.
>
> Tested with 13 boot cycles on the affected machine:
> - 6 boots hit the HRR + DPRX race: all recovered via retry, display
>   came online after 3 retry attempts (~58s).
> - 5 clean boots (no HRR): DP tunnel established immediately.
> - 2 boots with HRR where DPRX succeeded on first try.
> - 0 teardowns: the retry mechanism was never exhausted.
>
> Full dmesg log - https://people.canonical.com/~acelan/bugs/dp-retry-on-hrr/

I'm looking at that but the first thing that stands out is this:

[ 1.051684] thunderbolt: loading out-of-tree module taints kernel.

Which tells me that this has some potential modifications outside of the
mainline.

Second thing is that it's missing "thunderbolt.dyndbg=+p" that could show
what is going on there. I suggest adding that pretty much always.

Yes, this can happen and the 12 s idea was that it accounts for the
possible time that it takes to boot up (as well as the polling the i915
does if it is runtime suspended). I would say that whatever is delaying the
boot time should be investigated first because that's not really good user
experience.

Aside from that if you add "thunderbolt.dprx_timeout=-1" does it work? If
really needed we can increase that a bit but I'm not too enthusiastic
about adding code for retrying this because we do have this timeout that
we can adjust as needed (we can make the default higher).

* Re: [PATCH 0/1] thunderbolt: Fix blank external display after HRR on USB4 v2
From: Chia-Lin Kao (AceLan) @ 2026-05-28 3:43 UTC (permalink / raw)
To: Mika Westerberg
Cc: Mika Westerberg, Andreas Noever, Yehezkel Bernat, linux-usb, linux-kernel

Hi Mika,

Sorry for the late reply — I was away for two weeks in early May.

On Thu, Apr 30, 2026 at 12:03:11PM +0200, Mika Westerberg wrote:
> Hi,
>
> On Thu, Apr 30, 2026 at 03:31:42PM +0800, Chia-Lin Kao (AceLan) wrote:
> > [...]
> > Full dmesg log - https://people.canonical.com/~acelan/bugs/dp-retry-on-hrr/
>
> I'm looking at that but the first thing that stands out is this:
>
> [ 1.051684] thunderbolt: loading out-of-tree module taints kernel.
>
> Which tells me that this has some potential modifications outside of the
> mainline.
>
> Second thing is that it's missing "thunderbolt.dyndbg=+p" that could show
> what is going on there. I suggest adding that pretty much always.
>
> Yes, this can happen and the 12 s idea was that it accounts for the
> possible time that it takes to boot up (as well as the polling the i915
> does if it is runtime suspended). I would say that whatever is delaying the
> boot time should be investigated first because that's not really good user
> experience.
>
> Aside from that if you add "thunderbolt.dprx_timeout=-1" does it work? If
> really needed we can increase that a bit but I'm not too enthusiastic
> about adding code for retrying this because we do have this timeout that
> we can adjust as needed (we can make the default higher).

Thank you for reviewing and for the helpful suggestions.

I have an update on this issue: we've since discovered that a BIOS update
(from 1.2.1/1.3.1 to 1.5.1) on this Dell XPS 14 (Panther Lake) appears to
have resolved the blank display problem.

Looking at what changed: with the old BIOS, the firmware pre-established
PCIe tunnels through the dock during early boot — the dock's xHCI
(07:00.0) and the OWC NVMe (18:00.0) were already enumerated by BIOS
before the kernel started. When nhi_probe() performed HRR at ~t=1s, it
destroyed those BIOS-established tunnels, killing xHCI mid-probe
("HC died; cleaning up") and causing the NVMe probe to fail with -EIO.
The subsequent DP tunnel re-creation then hit the DPRX timeout because
the graphics driver wasn't ready yet.

With BIOS 1.5.1, the firmware no longer pre-establishes PCIe tunnels to
dock devices — the TBT root port (00:07.0) doesn't even have IO port
space allocated anymore. This means HRR has nothing to destroy, and the
Thunderbolt driver handles all tunnel setup from scratch. We ran 30 reboot
cycles with the full device set (WD22TB4 dock, BenQ monitor, OWC Envoy
Express storage) and saw 0% blank display rate.

So it seems the root cause was the BIOS establishing tunnels that the
kernel's HRR would then tear down, creating the race condition. The BIOS
vendor fixed it by leaving tunnel establishment to the kernel entirely.

Given this, I think the retry patch is no longer needed for this specific
platform. That said, the underlying race (HRR destroying BIOS tunnels →
DPRX timeout → permanent DP IN removal) could still affect other USB4 v2
platforms where the BIOS does pre-establish tunnels. Would it still be
worth considering either:

a) increasing the default dprx_timeout, or
b) at minimum, not permanently removing the DP IN adapter on the first
   DPRX failure?

Thanks again for the guidance.

Best regards,
AceLan Kao.

* Re: [PATCH 0/1] thunderbolt: Fix blank external display after HRR on USB4 v2
From: Mika Westerberg @ 2026-05-28 10:34 UTC (permalink / raw)
To: Chia-Lin Kao (AceLan), Mika Westerberg, Andreas Noever, Yehezkel Bernat, linux-usb, linux-kernel

Hi,

On Thu, May 28, 2026 at 11:43:47AM +0800, Chia-Lin Kao (AceLan) wrote:
> Hi Mika,
>
> Sorry for the late reply — I was away for two weeks in early May.
>
> On Thu, Apr 30, 2026 at 12:03:11PM +0200, Mika Westerberg wrote:
> > [...]
>
> Thank you for reviewing and for the helpful suggestions.
>
> I have an update on this issue: we've since discovered that a BIOS update
> (from 1.2.1/1.3.1 to 1.5.1) on this Dell XPS 14 (Panther Lake) appears to
> have resolved the blank display problem.
>
> Looking at what changed: with the old BIOS, the firmware pre-established
> PCIe tunnels through the dock during early boot — the dock's xHCI
> (07:00.0) and the OWC NVMe (18:00.0) were already enumerated by BIOS
> before the kernel started. When nhi_probe() performed HRR at ~t=1s, it
> destroyed those BIOS-established tunnels, killing xHCI mid-probe
> ("HC died; cleaning up") and causing the NVMe probe to fail with -EIO.
> The subsequent DP tunnel re-creation then hit the DPRX timeout because
> the graphics driver wasn't ready yet.
>
> With BIOS 1.5.1, the firmware no longer pre-establishes PCIe tunnels to
> dock devices — the TBT root port (00:07.0) doesn't even have IO port
> space allocated anymore. This means HRR has nothing to destroy, and the
> Thunderbolt driver handles all tunnel setup from scratch. We ran 30 reboot
> cycles with the full device set (WD22TB4 dock, BenQ monitor, OWC Envoy
> Express storage) and saw 0% blank display rate.

Okay thanks for the update.

> So it seems the root cause was the BIOS establishing tunnels that the
> kernel's HRR would then tear down, creating the race condition. The BIOS
> vendor fixed it by leaving tunnel establishment to the kernel entirely.
>
> Given this, I think the retry patch is no longer needed for this specific
> platform. That said, the underlying race (HRR destroying BIOS tunnels →
> DPRX timeout → permanent DP IN removal) could still affect other USB4 v2
> platforms where the BIOS does pre-establish tunnels. Would it still be
> worth considering either:

It is not just USB4 v2, it includes v1 too (PTL is v1).

> a) increasing the default dprx_timeout, or
> b) at minimum, not permanently removing the DP IN adapter on the first
>    DPRX failure?

There is another discussion about this, started by your colleague at
Canonical I believe. Let's continue there.