From: Mika Westerberg <mika.westerberg@linux.intel.com>
To: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>,
Mika Westerberg <westeri@kernel.org>,
Andreas Noever <andreas.noever@gmail.com>,
Yehezkel Bernat <YehezkelShB@gmail.com>,
linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/1] thunderbolt: Fix blank external display after HRR on USB4 v2
Date: Thu, 28 May 2026 12:34:33 +0200
Message-ID: <20260528103433.GI3102@black.igk.intel.com>
In-Reply-To: <ahe3zpDyPVj2QRcL@acelan-Precision-5480>

Hi,

On Thu, May 28, 2026 at 11:43:47AM +0800, Chia-Lin Kao (AceLan) wrote:
> Hi Mika,
>
> Sorry for the late reply — I was away for two weeks in early May.
>
> On Thu, Apr 30, 2026 at 12:03:11PM +0200, Mika Westerberg wrote:
> > Hi,
> >
> > On Thu, Apr 30, 2026 at 03:31:42PM +0800, Chia-Lin Kao (AceLan) wrote:
> > > Hi,
> > >
> > > On Dell XPS 14 (Panther Lake) with a WD22TB4 Thunderbolt dock and BenQ
> > > PD2725U external display, the display goes permanently blank on ~50% of
> > > boots. The only way to recover is a full reboot — re-plugging the
> > > monitor or dock does not help.
> > >
> > > The root cause is a race between the USB4 v2 Host Router Reset (HRR)
> > > and the graphics driver initialization:
> > >
> > > 1. nhi_probe() performs HRR at ~t=1s, destroying BIOS-established
> > > DP tunnels.
> > > 2. The Thunderbolt driver re-discovers the dock via hotplug at ~t=4s
> > > and attempts to re-create the DP tunnel.
> > > 3. DPRX negotiation fails because the graphics driver (xe) is not yet
> > > ready — the 12-second timeout expires at ~t=18s.
> > > 4. tb_dp_tunnel_active() permanently removes the DP IN adapter from
> > > available resources on the first failure, so the display never
> > > recovers.
> > >
> > > The fix adds a retry mechanism: on DPRX negotiation failure, the driver
> > > retries up to 3 times with a 5-second delay, giving the graphics driver
> > > time to come up.
> > >
> > > Tested with 13 boot cycles on the affected machine:
> > > - 6 boots hit the HRR + DPRX race: all recovered via retry, display
> > > came online after 3 retry attempts (~58s).
> > > - 5 clean boots (no HRR): DP tunnel established immediately.
> > > - 2 boots with HRR where DPRX succeeded on first try.
> > > - 0 teardowns: the retry mechanism was never exhausted.
> > >
> > > Full dmesg log - https://people.canonical.com/~acelan/bugs/dp-retry-on-hrr/
> >
> > I'm looking at that but the first thing that stands out is this:
> >
> > [ 1.051684] thunderbolt: loading out-of-tree module taints kernel.
> >
> > Which tells me that this has some potential modifications outside of the
> > mainline.
> >
> > Second thing is that it's missing "thunderbolt.dyndbg=+p" that could show
> > what is going on there. I suggest adding that pretty much always.
> >
> > Yes, this can happen and the 12 s idea was that it accounts for the
> > possible time that it takes to boot up (as well as the polling the i915
> > does if it is runtime suspended). I would say that whatever is delaying the
> > boot time should be investigated first because that's not really good user
> > experience.
> >
> > Aside from that if you add "thunderbolt.dprx_timeout=-1" does it work? If
> > really needed we can increase that a bit but I'm not too enthusiastic
> > adding code for retrying this because we do have this timeout that we can
> > adjust as needed (we can make the default higher).
>
> Thank you for reviewing and for the helpful suggestions.
>
> I have an update on this issue: we've since discovered that a BIOS update
> (from 1.2.1/1.3.1 to 1.5.1) on this Dell XPS 14 (Panther Lake) appears to
> have resolved the blank display problem.
>
> Looking at what changed: with the old BIOS, the firmware pre-established
> PCIe tunnels through the dock during early boot — the dock's xHCI
> (07:00.0) and the OWC NVMe (18:00.0) were already enumerated by BIOS
> before the kernel started. When nhi_probe() performed HRR at ~t=1s, it
> destroyed those BIOS-established tunnels, killing xHCI mid-probe
> ("HC died; cleaning up") and causing the NVMe probe to fail with -EIO.
> The subsequent DP tunnel re-creation then hit the DPRX timeout because
> the graphics driver wasn't ready yet.
>
> With BIOS 1.5.1, the firmware no longer pre-establishes PCIe tunnels to
> dock devices — the TBT root port (00:07.0) doesn't even have IO port
> space allocated anymore. This means HRR has nothing to destroy, and the
> Thunderbolt driver handles all tunnel setup from scratch. We ran 30 reboot
> cycles with the full device set (WD22TB4 dock, BenQ monitor, OWC Envoy
> Express storage) and saw 0% blank display rate.

Okay, thanks for the update.

> So it seems the root cause was the BIOS establishing tunnels that the
> kernel's HRR would then tear down, creating the race condition. The BIOS
> vendor fixed it by leaving tunnel establishment to the kernel entirely.
>
> Given this, I think the retry patch is no longer needed for this specific
> platform. That said, the underlying race (HRR destroying BIOS tunnels →
> DPRX timeout → permanent DP IN removal) could still affect other USB4 v2
> platforms where the BIOS does pre-establish tunnels. Would it still be
> worth considering either:

It is not just USB4 v2; it affects v1 too (PTL is v1).

> a) increasing the default dprx_timeout, or
> b) at minimum, not permanently removing the DP IN adapter on the first
> DPRX failure?

There is another discussion about this, started by your colleague at
Canonical I believe. Let's continue there.