From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Cristian Cocos <cristi@ieee.org>
Cc: linux-pci@vger.kernel.org
Subject: Re: ReBAR over Thunderbolt
Date: Mon, 30 Mar 2026 16:23:40 +0300 (EEST)
Message-ID: <2f6838fa-dd7f-848d-4788-8b04a23b5753@linux.intel.com>
In-Reply-To: <46877507be26e594e42ccf8aa9daaac2656dbd1d.camel@ieee.org>
On Sat, 28 Mar 2026, Cristian Cocos wrote:
> Here below is the log excerpt annotated with section headers as
> promised. Full raw log file to be provided upon request.
>
> === ReBAR Hotplug Resize Failure on Thunderbolt 4 ===
> Hardware: AMD RX 9060 XT (Navi 44, PCI 1002:7590, 16 GB VRAM) in Razer
> Core X V2 eGPU
> Host: Framework Laptop 16 (AMD Ryzen 7 7840HS), TB4 via USB4 root port
> Kernel: linux-zen 6.19.8.zen1-1-zen (Arch)
> Driver: amdgpu (rebar=0 NOT set — allowing resize attempt)
> Boot: eGPU disconnected at boot, then hotplugged after login
>
> PCI topology:
> 00:07.0 (root port) → 02:00.0 (TB switch) → 03:00.0 (DS port)
> → 04:00.0 (AMD switch) → 05:00.0 (AMD switch) → 06:00.0 (GPU)
>
> Note: When this GPU is connected at boot with BIOS "OS Native Resource
> Balance" disabled AND thunderbolt.host_reset=0 on the kernel cmdline,
> the firmware pre-assigns a full 16 GB BAR and the driver uses it
> without needing to resize. The hotplug path below shows where the
> kernel's runtime resize fails.
>
> === Relevant dmesg lines (BAR discovery, resize attempt, failure) ===
>
> --- Initial BAR discovery during TB hotplug enumeration ---
> [ 289.893050] pci 0000:06:00.0: BAR 0 [mem 0x00000000-0x0fffffff 64bit
> pref]
> [ 289.893057] pci 0000:06:00.0: BAR 2 [mem 0x00000000-0x001fffff 64bit
> pref]
> [ 289.893062] pci 0000:06:00.0: BAR 4 [io 0x0000-0x00ff]
> [ 289.893067] pci 0000:06:00.0: BAR 5 [mem 0x00000000-0x0007ffff]
>
> --- Bridge window assignment (root port allocates 4 GB prefetchable) ---
> [ 289.895531] pci 0000:02:00.0: bridge window [mem 0x10000000-
> 0x204fffff 64bit pref] to [bus 03-2b] add_size 600000 add_align
> 10000000
> [ 289.895535] pci 0000:02:00.0: bridge window [mem 0x00100000-
> 0x005fffff] to [bus 03-2b] add_size 600000 add_align 100000
> [ 292.204937] pcieport 0000:00:07.0: bridge window [mem 0x6000000000-
> 0x60ffffffff 64bit pref]: was not released (still contains assigned
> resources)
>
> --- Initial BAR assignment (256 MB) ---
> [ 289.895644] pci 0000:06:00.0: BAR 0 [mem 0x6000000000-0x600fffffff
> 64bit pref]: assigned
> [ 289.895663] pci 0000:06:00.0: BAR 2 [mem 0x6010000000-0x60101fffff
> 64bit pref]: assigned
> [ 289.895681] pci 0000:06:00.0: BAR 5 [mem 0x60000000-0x6007ffff]:
> assigned
>
> --- amdgpu driver loads, releases BARs for resize ---
> [ 292.191384] amdgpu 0000:06:00.0: amdgpu: Fetched VBIOS from ROM BAR
> [ 292.204930] amdgpu 0000:06:00.0: BAR 0 [mem 0x6000000000-
> 0x600fffffff 64bit pref]: releasing
> [ 292.204932] amdgpu 0000:06:00.0: BAR 2 [mem 0x6010000000-
> 0x60101fffff 64bit pref]: releasing
> [ 292.204933] pcieport 0000:05:00.0: bridge window [mem 0x6000000000-
> 0x60101fffff 64bit pref]: releasing
> [ 292.204934] pcieport 0000:04:00.0: bridge window [mem 0x6000000000-
> 0x60101fffff 64bit pref]: releasing
> [ 292.204935] pcieport 0000:03:00.0: bridge window [mem 0x6000000000-
> 0x60101fffff 64bit pref]: releasing
> [ 292.204936] pcieport 0000:02:00.0: bridge window [mem 0x6000000000-
> 0x60f04fffff 64bit pref]: was not released (still contains assigned
> resources)
> [ 292.204937] pcieport 0000:00:07.0: bridge window [mem 0x6000000000-
> 0x60ffffffff 64bit pref]: was not released (still contains assigned
> resources)
Hi,
The last two quoted lines above (the "was not released" messages for
0000:02:00.0 and 0000:00:07.0) likely indicate where the cause lies. The
BAR resize attempted to release these windows, but some sibling resources
prevent the resize from succeeding. I cannot tell from these excerpts
what those resources are.
Basically, the resize walks upwards in the PCI hierarchy and tries to free
every bridge window it encounters so that the windows can be sized freely
(enlarged). The device whose BAR is being resized has its resources
released before the resize commences, but other devices under any of
those bridge windows are not released, and they pin their bridge windows
in place (it won't be possible to release them in the general case, as
some of those devices may be in use at this point).
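To illustrate the walk described above, here is a minimal Python sketch
of a toy model, not the kernel's actual code. The Node class and
release_for_resize() are names invented for this example, and the
sibling under 02:00.0 is hypothetical: the log does not show which
device really pins that window.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A PCI device or bridge in a toy model of the hierarchy."""
    name: str
    # True while this node holds an assigned resource inside its parent's
    # bridge window (a BAR for a device, a window for a bridge).
    assigned: bool = True
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def release_for_resize(dev: Node) -> Tuple[List[str], List[str]]:
    """Release dev's own resource, then try to release each upstream window."""
    dev.assigned = False
    released, pinned = [], []
    bridge = dev.parent
    while bridge is not None:
        if any(child.assigned for child in bridge.children):
            # Something else still lives in this window, so it stays put.
            pinned.append(bridge.name)
        else:
            bridge.assigned = False
            released.append(bridge.name)
        bridge = bridge.parent
    return released, pinned


def build_example() -> Node:
    """Topology from the report plus one HYPOTHETICAL sibling under 02:00.0."""
    root = Node("00:07.0")
    tb_switch = root.add(Node("02:00.0"))
    ds_port = tb_switch.add(Node("03:00.0"))
    tb_switch.add(Node("sibling (hypothetical)"))  # assumption, not in the log
    amd_up = ds_port.add(Node("04:00.0"))
    amd_down = amd_up.add(Node("05:00.0"))
    return amd_down.add(Node("06:00.0"))


released, pinned = release_for_resize(build_example())
print("released:", released)
print("not released:", pinned)
```

In this model the windows of 05:00.0, 04:00.0 and 03:00.0 are released,
while 02:00.0 and 00:07.0 stay pinned, which is the same pattern as in
the quoted log.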
It may be possible to work around this problem by manually removing the
sibling devices that pin the relevant bridge windows, performing the
resize manually through sysfs, and then rescanning (though with GPUs this
may be problematic if it's the primary GPU).
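The manual steps could be sketched roughly as follows in Python. This is
an untested outline under several assumptions: the sibling address
0000:03:01.0 is only a placeholder (the thread does not identify the
pinning devices), the GPU driver is unbound first because, as far as I
understand the sysfs ABI, resourceN_resize refuses to work on a bound
device, and the final drivers_probe write is my addition to get the
driver attached again. plan_manual_resize() and apply() are names made
up for this example. The value written to resourceN_resize is the bit
position of the size, i.e. log2 of the size in MiB.

```python
from pathlib import Path
from typing import List, Tuple

SYSFS_PCI = Path("/sys/bus/pci")
MIB = 1 << 20


def resize_bit(size_bytes: int) -> int:
    """Encode a power-of-two BAR size as log2(size in MiB)."""
    if size_bytes < MIB or size_bytes & (size_bytes - 1):
        raise ValueError("BAR size must be a power of two of at least 1 MiB")
    return (size_bytes // MIB).bit_length() - 1


def plan_manual_resize(gpu: str, siblings: List[str], bar: int,
                       size_bytes: int) -> List[Tuple[Path, str]]:
    """Return the (sysfs file, value) writes in order, without doing them."""
    dev = SYSFS_PCI / "devices" / gpu
    plan = [(dev / "driver" / "unbind", gpu)]  # assumption: driver is bound
    plan += [(SYSFS_PCI / "devices" / s / "remove", "1") for s in siblings]
    plan.append((dev / f"resource{bar}_resize", str(resize_bit(size_bytes))))
    plan.append((SYSFS_PCI / "rescan", "1"))
    plan.append((SYSFS_PCI / "drivers_probe", gpu))  # assumption: re-probe
    return plan


def apply(plan: List[Tuple[Path, str]]) -> None:
    """Perform the writes; needs root and can disrupt a running system."""
    for path, value in plan:
        path.write_text(value)


# Dry run: print the equivalent shell commands instead of writing anything.
for path, value in plan_manual_resize(
        "0000:06:00.0", ["0000:03:01.0"], 0, 16 << 30):  # sibling: placeholder
    print(f"echo {value} > {path}")
```

Nothing is written unless apply() is called explicitly. Reading
resource0_resize beforehand shows the bitmask of sizes the device
supports.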
Providing a large enough pci=hpmmioprefsize=xx parameter might make the
bridge windows big enough from boot (combining it with pci=realloc may be
required as well). But it could also make things worse (as the realloc
algorithm lacks safeguards to detect when the resource fit is worse than
the original). And you could run out of iomem space with a large
hpmmioprefsize, as it's not targeted specifically at the bridge window of
interest.
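For a sense of the sizes involved, the arithmetic from the quoted log can
be checked with a few lines of Python. The numbers are taken from the log;
hp_pref_cmdline() is a helper invented here, and the value it prints is
simply the failing bridge window request rounded up to whole MiB, not a
tested recommendation. A real setup may need more headroom.

```python
MIB = 1 << 20
GIB = 1 << 30


def window_size(start: int, end: int) -> int:
    """Size in bytes of an inclusive [start, end] address range."""
    return end - start + 1


def hp_pref_cmdline(min_bytes: int) -> str:
    """Format a kernel command-line fragment for at least min_bytes."""
    size_mib = -(-min_bytes // MIB)  # ceiling division to whole MiB
    return f"pci=realloc,hpmmioprefsize={size_mib}M"


# Root port 00:07.0 prefetchable window from the log.
root_window = window_size(0x6000000000, 0x60FFFFFFFF)
# Bridge window size requested during the failed resize (BAR 0 + BAR 2).
needed = 0x400200000

print(f"root port window: {root_window // GIB} GiB")
print(f"requested window: {needed // MIB} MiB")
print(hp_pref_cmdline(needed))
```

The root port window is 4 GiB while the resize asks for 16 GiB plus
2 MiB, so the request cannot fit unless the upstream windows themselves
grow.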
--
i.
> --- 16 GB resize request fails at every bridge level ---
> [ 292.204953] pcieport 0000:03:00.0: bridge window [mem size
> 0x400200000 64bit pref]: can't assign; no space
> [ 292.204954] pcieport 0000:03:00.0: bridge window [mem size
> 0x400200000 64bit pref]: failed to assign
> [ 292.204955] pcieport 0000:03:00.0: bridge window [io size 0x1000]:
> can't assign; no space
> [ 292.204956] pcieport 0000:03:00.0: bridge window [io size 0x1000]:
> failed to assign
> [ 292.204956] pcieport 0000:03:01.0: bridge window [io size 0x1000]:
> can't assign; no space
> [ 292.204957] pcieport 0000:03:01.0: bridge window [io size 0x1000]:
> failed to assign
> [ 292.204958] pcieport 0000:03:02.0: bridge window [io size 0x1000]:
> can't assign; no space
> [ 292.204958] pcieport 0000:03:02.0: bridge window [io size 0x1000]:
> failed to assign
> [ 292.204959] pcieport 0000:03:03.0: bridge window [io size 0x1000]:
> can't assign; no space
> [ 292.204960] pcieport 0000:03:03.0: bridge window [io size 0x1000]:
> failed to assign
> [ 292.204961] pcieport 0000:03:00.0: bridge window [mem size
> 0x400200000 64bit pref]: can't assign; no space
> [ 292.204962] pcieport 0000:03:00.0: bridge window [mem size
> 0x400200000 64bit pref]: failed to assign
> [ 292.204962] pcieport 0000:03:00.0: bridge window [io size 0x1000]:
> can't assign; no space
> [ 292.204963] pcieport 0000:03:00.0: bridge window [io size 0x1000]:
> failed to assign
> [ 292.204964] pcieport 0000:03:01.0: bridge window [io size 0x1000]:
> can't assign; no space
> [ 292.204964] pcieport 0000:03:01.0: bridge window [io size 0x1000]:
> failed to assign
> [ 292.204965] pcieport 0000:03:02.0: bridge window [io size 0x1000]:
> can't assign; no space
> [ 292.204965] pcieport 0000:03:02.0: bridge window [io size 0x1000]:
> failed to assign
> [ 292.204966] pcieport 0000:03:03.0: bridge window [io size 0x1000]:
> can't assign; no space
> [ 292.204967] pcieport 0000:03:03.0: bridge window [io size 0x1000]:
> failed to assign
> [ 292.204968] pcieport 0000:04:00.0: bridge window [mem size
> 0x400200000 64bit pref]: can't assign; no space
> [ 292.204969] pcieport 0000:04:00.0: bridge window [mem size
> 0x400200000 64bit pref]: failed to assign
> [ 292.204973] pcieport 0000:04:00.0: bridge window [io size 0x1000]:
> can't assign; no space
> [ 292.204974] pcieport 0000:04:00.0: bridge window [io size 0x1000]:
> failed to assign
> [ 292.204976] pcieport 0000:04:00.0: bridge window [mem size
> 0x400200000 64bit pref]: can't assign; no space
> [ 292.204976] pcieport 0000:04:00.0: bridge window [mem size
> 0x400200000 64bit pref]: failed to assign
> [ 292.204977] pcieport 0000:04:00.0: bridge window [io size 0x1000]:
> can't assign; no space
> [ 292.204978] pcieport 0000:04:00.0: bridge window [io size 0x1000]:
> failed to assign
> [ 292.204979] pcieport 0000:05:00.0: bridge window [mem size
> 0x400200000 64bit pref]: can't assign; no space
> [ 292.204980] pcieport 0000:05:00.0: bridge window [mem size
> 0x400200000 64bit pref]: failed to assign
> [ 292.204980] pcieport 0000:05:00.0: bridge window [io size 0x1000]:
> can't assign; no space
> [ 292.204981] pcieport 0000:05:00.0: bridge window [io size 0x1000]:
> failed to assign
> [ 292.204982] pcieport 0000:05:00.0: bridge window [mem size
> 0x400200000 64bit pref]: can't assign; no space
> [ 292.204982] pcieport 0000:05:00.0: bridge window [mem size
> 0x400200000 64bit pref]: failed to assign
> [ 292.204983] pcieport 0000:05:00.0: bridge window [io size 0x1000]:
> can't assign; no space
> [ 292.204984] pcieport 0000:05:00.0: bridge window [io size 0x1000]:
> failed to assign
> [ 292.204985] amdgpu 0000:06:00.0: BAR 0 [mem size 0x400000000 64bit
> pref]: can't assign; no space
> [ 292.204986] amdgpu 0000:06:00.0: BAR 0 [mem size 0x400000000 64bit
> pref]: failed to assign
> [ 292.204987] amdgpu 0000:06:00.0: BAR 2 [mem size 0x00200000 64bit
> pref]: can't assign; no space
> [ 292.204987] amdgpu 0000:06:00.0: BAR 2 [mem size 0x00200000 64bit
> pref]: failed to assign
> [ 292.204988] amdgpu 0000:06:00.0: BAR 0 [mem size 0x400000000 64bit
> pref]: can't assign; no space
> [ 292.204989] amdgpu 0000:06:00.0: BAR 0 [mem size 0x400000000 64bit
> pref]: failed to assign
> [ 292.204990] amdgpu 0000:06:00.0: BAR 2 [mem size 0x00200000 64bit
> pref]: can't assign; no space
> [ 292.204990] amdgpu 0000:06:00.0: BAR 2 [mem size 0x00200000 64bit
> pref]: failed to assign
>
> --- Fallback: old 256 MB values restored ---
> [ 292.205079] amdgpu 0000:06:00.0: BAR 2 [mem 0x6010000000-
> 0x60101fffff 64bit pref]: old value restored
> [ 292.205095] amdgpu 0000:06:00.0: BAR 0 [mem 0x6000000000-
> 0x600fffffff 64bit pref]: old value restored
> [ 292.205096] amdgpu 0000:06:00.0: amdgpu: Not enough PCI address
> space for a large BAR.
> [ 292.205100] amdgpu 0000:06:00.0: amdgpu: VRAM: 16304M
> 0x0000008000000000 - 0x00000083FAFFFFFF (16304M used)
> [ 292.205102] amdgpu 0000:06:00.0: amdgpu: GART: 512M
> 0x0000000000000000 - 0x000000001FFFFFFF
> [ 292.205113] [drm] Detected VRAM RAM=16304M, BAR=256M
>
> === END ===
>
> On Fri, 2026-03-27 at 12:15 +0200, Ilpo Järvinen wrote:
> > On Thu, 26 Mar 2026, Cristian Cocos wrote:
> >
> > > [This has been previously posted on the linux-hotplug list, though
> > > it
> > > has been suggested that I post it here as well.]
> > >
> > > The short story is that ReBAR does not work in eGPU hotplug(!)
> > > scenarios: hotplugged Thunderbolt eGPUs are forced onto a 256MB BAR
> > > regardless of the system's ReBAR capabilities, and this because
> > > during
> > > hotplug the Linux kernel does not consult the ReBAR capability. As
> > > eGPUs are becoming more and more popular, accommodating eGPU
> > > hotplug
> > > scenarios has become imperative.
> > >
> > > The longer story:
> > >
> > > The current Thunderbolt eGPU *hotplug* sequence of events is this:
> > > BAR
> > > 2's hardware register powers up at 256 MB — the default size
> > > programmed
> > > into the BAR's address decoder by Intel at the factory. The PCIe
> > > Resizable BAR capability advertises support for up to 16 GB, but
> > > it's
> > > passive — software must explicitly exercise it. When a Thunderbolt
> > > eGPU
> > > is hotplugged at runtime, the kernel's PCI subsystem enumerates the
> > > new
> > > device, reads the BAR at its 256 MB default, sizes the bridge
> > > windows
> > > to match, and assigns addresses — all before any driver loads. The
> > > ReBAR capability is never consulted(!) during this process.
> >
> > Hi,
> >
> > The intel GPU drivers do attempt to perform the resize at probe time,
> > though it's often not successful.
> >
> > You haven't indicated what kernel you run nor provided any logs, but
> > there
> > have been some improvements to this in general in the recent kernels.
> > And
> > then there's also this patch which relates to thunderbolt cases
> > (which was
> > only accepted yesterday):
> >
> > https://lore.kernel.org/linux-pci/20260219153951.68869-1-
> > ilpo.jarvinen@linux.intel.com
> >
> > > A workaround is available for cold-plug scenarios only, and is
> > > achieved by means of the **thunderbolt.host_reset=0** kernel
> > > parameter:
> > > this preserves the BIOS's PCIe tunnel and BAR assignments from POST
> > > (where the BIOS *does* exercise ReBAR). This delivers the full 16
> > > GB
> > > BAR but, as just mentioned, only works for cold-plug(!) scenarios —
> > > if
> > > the eGPU is power-cycled at runtime, the new tunnel gets the 256 MB
> > > default.
> > >
> > > The proper fix would be for the kernel's PCI hotplug resource
> > > assignment to *first* check for ReBAR capability during
> > > enumeration,
> > > resize the BAR to the largest supported size that fits within
> > > available
> > > bridge headroom, and *then* commit bridge windows and assign
> > > addresses.
> >
> > That's being worked on. But it's not as easy as you depict it as
> > kernel
> > needs to be also able to fallback to some intermediate size if the
> > largest
> > supported size fails.
> >
> > Thanks to how size calculations and assignment are done at different
> > phases of the algorithm, it's not as easy to fallback to something
> > else as
> > you indicate as the secondary failures would still persist from the
> > attempt with the largest window. So effectively, the entire sizing
> > has to
> > be recalculated.
> >
> > > This is essentially what the BIOS does during POST. It hasn't been
> > > implemented yet because eGPU-over-Thunderbolt-with-ReBAR is (was?)
> > > a
> > > niche use case.
> > >
> > > Seeing as eGPU-over-Thunderbolt is poised to become mainstream,
> > > this
> > > can no longer be considered a niche use case.
> >
> > We don't consider it unimportant usecase. It's just the resource
> > fitting
> > and assignment algorithm is very complex and hard to improve, because
> > something usually breaks when trying to improve it. Thus, the
> > progress is
> > slow and everything has to be done carefully (when things break badly
> > in
> > this area, system doesn't even come up so there's not much room for
> > error).
>