From: bugzilla-daemon@kernel.org
To: linux-usb@vger.kernel.org
Subject: [Bug 221699] New: PCIe link drops during GPU BAR2 init — Minisforum DEG2 eGPU dock over Thunderbolt 4
Date: Sun, 28 Jun 2026 07:21:41 +0000
Message-ID: <bug-221699-208809@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=221699

            Bug ID: 221699
           Summary: PCIe link drops during GPU BAR2 init — Minisforum DEG2
                    eGPU dock over Thunderbolt 4
           Product: Drivers
           Version: 2.5
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: USB
          Assignee: drivers_usb@kernel-bugs.kernel.org
          Reporter: artiom@flexlabs.org
        Regression: No

Created attachment 310385
  --> https://bugzilla.kernel.org/attachment.cgi?id=310385&action=edit
dmesg logs: Connecting the dock with the GPU attached, USB and NVMe
disconnected, using Nvidia driver

An RTX 3080 Ti connected via a Minisforum DEG2 Thunderbolt 5 eGPU dock to a
Thunderbolt 4 laptop running Linux enumerates correctly over the PCIe tunnel,
but the PCIe link then dies roughly 8–10 seconds later, while the GPU driver is
in its early BAR2 (instance-memory / page-table aperture) initialization phase.
This reproduces under both the proprietary NVIDIA driver and the in-tree
nouveau driver, which strongly suggests the fault is below the driver layer —
in the Thunderbolt/PCIe tunnel itself (the kernel thunderbolt/pciehp path, the
dock firmware, or the Intel Barlow Ridge-class controller).

The same dock, cable, and GPU work without issue under Windows (tested on a
different host), including under a sustained 3D stress test.

## System configuration

* Laptop: Lenovo ThinkPad, model 21VVCTO1WW, BIOS N4QET33W (1.12)
* CPU/platform: Intel Panther Lake, integrated Arc B390 iGPU (xe driver)
* Thunderbolt host: Intel Panther Lake Thunderbolt 4 NHI (USB4)
* OS / kernel: Fedora 44, 7.0.13-200.fc44.x86_64
* eGPU dock: Minisforum DEG2 in Thunderbolt mode. Enumerates over Thunderbolt
as "Micro Computer (HK) Tech. Ltd. TBGAA"
* GPU: NVIDIA RTX 3080 Ti
* Cable: DEG2-supplied TB5 cable (confirmed functional under Windows 3D load)
* Dock PSU: Corsair SF850
* Drivers tested: NVIDIA open kernel module 595.80 (GSP-based); nouveau
(in-tree, GSP/r535 path, firmware RM version 570.144)


## Reproducible symptom

1. Dock connected (reproduces on both cold-plug and hot-plug).
2. Thunderbolt/USB4 link authorizes; PCIe hotplug bridge reports pciehp:
Slot(10-2): Card present → Link Up.
3. Full PCIe enumeration succeeds: the Intel switch, the GPU (82:00.0) and its
audio function (82:00.1) appear, and BARs are assigned (BAR1 prefetchable 256
MB, BAR3 prefetchable 32 MB, BAR0 16 MB, etc.).
4. The GPU driver begins bring-up and reaches BAR2 initialization.
5. Within 5 seconds after connecting the cable, the OS becomes visually
unresponsive.
6. Approximately 8–10 seconds after Link Up, the GPU stops responding on the
PCIe link:

  * NVIDIA driver (~10 s): NVRM: Xid (PCI:0000:82:00): 79, GPU has fallen off
the bus, immediately followed by BAR2 setup failing as a consequence —
kbusInitVirtualBar2_HAL ... kern_bus.c:469, kbusInitBar2_HAL ...
kern_bus_gm107.c:332 — then pciehp: Slot(10-2): Link Down and RM resource-leak
assertions during teardown.
  * nouveau driver (~8 s): the driver is actively performing BAR2 bring-up
(gf100_bar_oneinit_bar → nvkm_vmm_boot → nvkm_bar_bar2_init/r535_bar_bar2_init)
when MMIO accesses begin returning all-ones: nouveau: timer: stalled at
ffffffffffffffff in g84_bar_flush/tu102_vmm_flush, looping for several seconds,
then pciehp: pcie_do_write_cmd: no response from device.

7. Unplugging the dock immediately restores stability; replugging reproduces
the fault. If the dock is left plugged in, the system stays unresponsive and
resets roughly 5 seconds later.
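
The interval in step 6 can be measured from a saved dmesg capture. The
following is a minimal Python sketch, assuming the standard
`[seconds.microseconds]` dmesg timestamp prefix. The function name
`link_up_to_failure` is hypothetical; the marker strings are the ones quoted in
the steps above.

```python
import re

# Standard dmesg prefix: "[  123.456789] message"
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

LINK_UP = "Link Up"

# Failure markers quoted in this report (assumed to appear verbatim in the log).
FAILURE_MARKERS = (
    "fallen off the bus",
    "stalled at ffffffffffffffff",
    "no response from device",
    "Link Down",
)


def link_up_to_failure(log_text):
    """Return (seconds, message) for the first failure marker seen after
    the first "Link Up" line, or None if either event is missing."""
    link_up_at = None
    for line in log_text.splitlines():
        match = TS_RE.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        ts, msg = float(match.group(1)), match.group(2)
        if link_up_at is None:
            if LINK_UP in msg:
                link_up_at = ts
            continue
        if any(marker in msg for marker in FAILURE_MARKERS):
            return ts - link_up_at, msg
    return None
```

It can be called on the text of any of the attached logs, for example
`link_up_to_failure(open(path).read())`, where `path` is wherever the
attachment was saved.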

The key shared detail: under both drivers, the device falls off the bus during
early BAR2 initialization. NVIDIA logs its "fallen off the bus" detection and its
BAR2-init failure at the same moment, and nouveau hangs inside the BAR2 flush
itself. The two drivers share no relevant code here, so this points away from a
driver bug.
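
A device that has fallen off the bus typically answers config-space reads with
all-ones, the same pattern nouveau reports for MMIO. As a rough,
driver-independent check, the sketch below reads the first four bytes
(vendor/device ID) of the function's sysfs `config` file. It assumes the usual
`/sys/bus/pci/devices/<BDF>/config` layout; the helper names `reads_all_ones`
and `device_state` are hypothetical.

```python
from pathlib import Path

# Assumed default sysfs location of PCI devices.
SYSFS_PCI = Path("/sys/bus/pci/devices")


def reads_all_ones(config: bytes) -> bool:
    """True if the vendor/device ID dword (first 4 bytes) is 0xFFFFFFFF."""
    return len(config) >= 4 and config[:4] == b"\xff\xff\xff\xff"


def device_state(bdf: str, root: Path = SYSFS_PCI) -> str:
    """Classify a PCI function as 'absent', 'all-ones', or 'responding'."""
    try:
        with open(root / bdf / "config", "rb") as f:
            head = f.read(4)
    except FileNotFoundError:
        return "absent"  # function already removed from sysfs
    if len(head) < 4:
        return "absent"  # truncated read, treat as not usable
    return "all-ones" if reads_all_ones(head) else "responding"
```

Polling `device_state("0000:82:00.0")` once per second after Link Up would
show when the GPU stops answering, independent of which driver is bound.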

## Directly observed evidence

* Cross-driver reproduction at the same stage. Two independent drivers fail at
BAR2 init (function names above), with the link dropping ~8–10 s after
enumeration.
* GPU presence is the determining variable. With the GPU removed from the riser
and only the dock's onboard NVMe SSD connected, the dock ran stably for ~37 s
(49 s → 86 s, ended by manual disconnect) with clean USB/UAS/SCSI enumeration
and no link drop or errors. The fault appears only when the GPU is present and
BAR2 init begins.
* Windows is unaffected. The same dock + cable + GPU sustained a 3D stress test
under Windows on a different host with no link loss, indicating the GPU, cable,
dock power delivery, and dock hardware are functional.

## Kernel parameters tried

* Default parameters: GPU enumerates fully, link dies at BAR2 init as described
above. (This is the failure being reported.)
* pcie_aspm=off: system boots, but the eGPU's PCIe link does not initialize —
the GPU does not reach the enumeration/BAR2 stage at all. This is a different
behavior from the default-parameter crash, not a reproduction of it.
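
Because the two configurations above behave differently, it helps to record the
active boot parameters next to each log. A minimal sketch, assuming
`/proc/cmdline` is readable; `parse_cmdline` and `read_cmdline` are
hypothetical helper names, and quoted values containing spaces are not handled.

```python
def parse_cmdline(text: str) -> dict:
    """Map each kernel parameter to its value (True for bare flags)."""
    params = {}
    for token in text.split():
        # Split on the first "=" only, so values such as UUID=... stay intact.
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def read_cmdline(path: str = "/proc/cmdline") -> dict:
    """Parse the running kernel's command line."""
    with open(path) as f:
        return parse_cmdline(f.read())
```

For example, `read_cmdline().get("pcie_aspm")` returns `"off"` on the second
configuration and `None` on the default one.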

## Other potential factors

* GPU, cable, dock power delivery: Functional, tested by sustained 3D stress
test on Windows
* BIOS configuration: All Thunderbolt options enabled, including PCIe
tunneling
* DEG2 configuration: Verified that the mode switch is set to Thunderbolt
(the OCuLink setting was also tried, just to be safe; Thunderbolt mode is
confirmed working on Windows)

## Suspected root-cause area (hypothesis)

* A kernel thunderbolt/pciehp interaction specific to a discrete GPU's large
memory-mapped BAR2 traffic tunneled through this Barlow Ridge-class controller
over a TB4 host.
* DEG2 / controller firmware failing to sustain the PCIe tunnel once large
BAR-mapped GPU MMIO begins (vs. the lighter NVMe/USB traffic that runs fine).
* A Linux-specific gap in tunnel setup/maintenance that Windows handles
differently (Windows is unaffected under heavier load).

## Attached log files

1. GPU only: Connecting the dock with the GPU attached, USB and NVMe
disconnected, using Nvidia driver
2. NVMe only: Connecting the dock with GPU removed, only NVMe drive attached
3. Nouveau driver: Connecting the dock with the NVIDIA driver disabled in GRUB,
using the nouveau driver instead

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
