Linux USB
From: Mika Westerberg <mika.westerberg@linux.intel.com>
To: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>,
	Mika Westerberg <westeri@kernel.org>,
	Andreas Noever <andreas.noever@gmail.com>,
	Yehezkel Bernat <YehezkelShB@gmail.com>,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/1] thunderbolt: Fix blank external display after HRR on USB4 v2
Date: Thu, 28 May 2026 12:34:33 +0200
Message-ID: <20260528103433.GI3102@black.igk.intel.com>
In-Reply-To: <ahe3zpDyPVj2QRcL@acelan-Precision-5480>

Hi,

On Thu, May 28, 2026 at 11:43:47AM +0800, Chia-Lin Kao (AceLan) wrote:
> Hi Mika,
> 
> Sorry for the late reply — I was away for two weeks in early May.
> 
> On Thu, Apr 30, 2026 at 12:03:11PM +0200, Mika Westerberg wrote:
> > Hi,
> >
> > On Thu, Apr 30, 2026 at 03:31:42PM +0800, Chia-Lin Kao (AceLan) wrote:
> > > Hi,
> > >
> > > On Dell XPS 14 (Panther Lake) with a WD22TB4 Thunderbolt dock and BenQ
> > > PD2725U external display, the display goes permanently blank on ~50% of
> > > boots. The only way to recover is a full reboot — re-plugging the
> > > monitor or dock does not help.
> > >
> > > The root cause is a race between the USB4 v2 Host Router Reset (HRR)
> > > and the graphics driver initialization:
> > >
> > >   1. nhi_probe() performs HRR at ~t=1s, destroying BIOS-established
> > >      DP tunnels.
> > >   2. The Thunderbolt driver re-discovers the dock via hotplug at ~t=4s
> > >      and attempts to re-create the DP tunnel.
> > >   3. DPRX negotiation fails because the graphics driver (xe) is not yet
> > >      ready — the 12-second timeout expires at ~t=18s.
> > >   4. tb_dp_tunnel_active() permanently removes the DP IN adapter from
> > >      available resources on the first failure, so the display never
> > >      recovers.
> > >
> > > The fix adds a retry mechanism: on DPRX negotiation failure, the driver
> > > retries up to 3 times with a 5-second delay, giving the graphics driver
> > > time to come up.
> > >
> > > Tested with 13 boot cycles on the affected machine:
> > >   - 6 boots hit the HRR + DPRX race: all recovered via retry, display
> > >     came online after 3 retry attempts (~58s).
> > >   - 5 clean boots (no HRR): DP tunnel established immediately.
> > >   - 2 boots with HRR where DPRX succeeded on first try.
> > >   - 0 teardowns: the retry mechanism was never exhausted.
> > >
> > > Full dmesg log - https://people.canonical.com/~acelan/bugs/dp-retry-on-hrr/
> >
> > I'm looking at that but the first thing that stands out is this:
> >
> > [    1.051684] thunderbolt: loading out-of-tree module taints kernel.
> >
> > Which tells me that this has some potential modifications outside of the
> > mainline.
> >
> > Second thing is that it's missing "thunderbolt.dyndbg=+p" that could show
> > what is going on there. I suggest adding that pretty much always.
> >
> > Yes, this can happen. The idea behind the 12 s timeout was that it accounts
> > for the time it may take to boot up (as well as the polling that i915 does
> > if it is runtime suspended). I would say that whatever is delaying the boot
> > should be investigated first because that's not really a good user
> > experience.
> >
> > Aside from that, if you add "thunderbolt.dprx_timeout=-1" does it work? If
> > really needed we can increase that a bit, but I'm not too enthusiastic about
> > adding code for retrying this because we do have this timeout that we can
> > adjust as needed (we can make the default higher).
>
> Thank you for reviewing and for the helpful suggestions.
> 
> I have an update on this issue: we've since discovered that a BIOS update
> (from 1.2.1/1.3.1 to 1.5.1) on this Dell XPS 14 (Panther Lake) appears to
> have resolved the blank display problem.
> 
> Looking at what changed: with the old BIOS, the firmware pre-established
> PCIe tunnels through the dock during early boot — the dock's xHCI
> (07:00.0) and the OWC NVMe (18:00.0) were already enumerated by BIOS
> before the kernel started. When nhi_probe() performed HRR at ~t=1s, it
> destroyed those BIOS-established tunnels, killing xHCI mid-probe
> ("HC died; cleaning up") and causing the NVMe probe to fail with -EIO.
> The subsequent DP tunnel re-creation then hit the DPRX timeout because
> the graphics driver wasn't ready yet.
> 
> With BIOS 1.5.1, the firmware no longer pre-establishes PCIe tunnels to
> dock devices — the TBT root port (00:07.0) doesn't even have IO port
> space allocated anymore. This means HRR has nothing to destroy, and the
> Thunderbolt driver handles all tunnel setup from scratch. We ran 30 reboot
> cycles with the full device set (WD22TB4 dock, BenQ monitor, OWC Envoy
> Express storage) and saw 0% blank display rate.

Okay, thanks for the update.

> So it seems the root cause was the BIOS establishing tunnels that the
> kernel's HRR would then tear down, creating the race condition. The BIOS
> vendor fixed it by leaving tunnel establishment to the kernel entirely.
> 
> Given this, I think the retry patch is no longer needed for this specific
> platform. That said, the underlying race (HRR destroying BIOS tunnels →
> DPRX timeout → permanent DP IN removal) could still affect other USB4 v2
> platforms where the BIOS does pre-establish tunnels. Would it still be
> worth considering either:

It is not just USB4 v2; it includes v1 too (PTL is v1).

>   a) increasing the default dprx_timeout, or
>   b) at minimum, not permanently removing the DP IN adapter on the first
>      DPRX failure?

There is another discussion about this, started by your colleague at
Canonical I believe. Let's continue there.
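
For readers following the thread, the timing in the original report can be
reproduced with a small model. This is only an illustrative sketch, not the
driver's logic: simulate() and its parameters are hypothetical names, the 12 s
timeout, 5 s delay and 3 retries are the figures quoted above, and the model
assumes each failed attempt consumes the full DPRX timeout. The first attempt
is placed at t=6s so that the first timeout lands at the reported ~t=18s, and
the reported ~58s display-online time stands in for "graphics driver ready".

```python
DPRX_TIMEOUT_S = 12   # default DPRX timeout mentioned in the thread
RETRY_DELAY_S = 5     # delay between retries in the proposed patch
MAX_RETRIES = 3       # retry budget in the proposed patch


def simulate(first_attempt_s, gfx_ready_s, max_retries):
    """Return (time the display comes up, retries used).

    The time is None if every attempt timed out.

    Assumption (not taken from the driver): an attempt succeeds if the
    graphics driver becomes ready before that attempt's timeout expires;
    otherwise it fails after the full timeout and the next attempt starts
    RETRY_DELAY_S later.
    """
    start = first_attempt_s
    for retries_used in range(max_retries + 1):
        deadline = start + DPRX_TIMEOUT_S
        if gfx_ready_s <= deadline:
            # The tunnel comes up as soon as both sides are ready.
            return max(start, gfx_ready_s), retries_used
        start = deadline + RETRY_DELAY_S
    return None, max_retries


# With the retry budget: timeouts at 18s, 35s and 52s, success on retry 3.
print(simulate(6, 58, MAX_RETRIES))   # (58, 3)
# Without retries the single timeout at 18s is final.
print(simulate(6, 58, 0))             # (None, 0)
```

Under these assumptions a single, longer timeout (or dprx_timeout=-1) would
reach the same end state as the retry loop, which is the trade-off discussed
above.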

Thread overview: 5+ messages
2026-04-30  7:31 [PATCH 0/1] thunderbolt: Fix blank external display after HRR on USB4 v2 Chia-Lin Kao (AceLan)
2026-04-30  7:31 ` [PATCH 1/1] thunderbolt: Retry DP tunnel setup on DPRX negotiation failure Chia-Lin Kao (AceLan)
2026-04-30 10:03 ` [PATCH 0/1] thunderbolt: Fix blank external display after HRR on USB4 v2 Mika Westerberg
2026-05-28  3:43   ` Chia-Lin Kao (AceLan)
2026-05-28 10:34     ` Mika Westerberg [this message]
