On Sun, 31 Aug 2025, Steve Oswald wrote:

> Hello,
>
> I've encountered an issue with a Thunderbolt eGPU (an external GPU
> connected via Thunderbolt 4). The change from kernel 6.10.14 to 6.11.0
> broke the PCI memory assignment of the external PCIe device. I figured
> out which version broke it by using Ubuntu 25.04 and downgrading the
> kernel (https://raw.githubusercontent.com/pimlie/ubuntu-mainline-kernel.sh/master/ubuntu-mainline-kernel.sh).
>
> From the dmesg output, on the broken 6.11.0 I see 'failed to assign'.
> The issue almost never occurs on the previous kernel version, 6.10.14.
> Using pci=realloc did not change the behavior (I can produce the dmesg
> output if necessary).
>
> The issue was tested with two eGPUs (Radeon Instinct MI50 32GB, NVIDIA
> 3080 10GB). Both the AMD and the NVIDIA driver fail to initialize the
> device because they cannot write the PCIe messages.
>
> System details:
> - Kernels: Linux 6.10.14-061014-generic and 6.11.0-061100 (Ubuntu builds)
> - Laptop: TUXEDO InfinityBook Pro 16 - Gen8 with Thunderbolt 4
> - eGPU: Radeon Instinct MI50 32GB, NVIDIA 3080 10GB
>
> Steps to reproduce:
> 1. Boot the system with the eGPU.
> 2. Observe the PCI BAR messages in `dmesg`.
>
> Logs:
> Kernel messages for both versions and lspci output can be found here:
> https://gist.github.com/stepeos/cd060c7d66ab195f51ab4d5675b4e4af
> Raw files:
> - dmesg_linux_6.11.0.log
>   https://gist.githubusercontent.com/stepeos/cd060c7d66ab195f51ab4d5675b4e4af/raw/f9470a06ff929d386c50ec6b5d07e0ff3f053dcf/dmesg_linux_6.11.0.log
> - dmesg_linux_6.10.14.log
>   https://gist.githubusercontent.com/stepeos/cd060c7d66ab195f51ab4d5675b4e4af/raw/f9470a06ff929d386c50ec6b5d07e0ff3f053dcf/dmesg_linux_6.10.14.log
>
> If additional info is needed, I'm happy to help.

Hi Steve,

Thanks for the report.
My analysis is that the problem boils down to this line being absent
with 6.11:

pcieport 0000:00:07.0: resource 15 [mem 0x6000000000-0x601bffffff 64bit pref] released

Its absence means one of the upstream bridge windows could not be
released for resizing. The message is printed from
pci_reassign_bridge_resources(), which is likely reached from a
pci_resize_resource() call made by amdgpu(?).

The very likely cause is this check:

	/* Ignore BARs which are still in use */
	if (res->child)
		continue;

...which (until very recently) was entirely silent, so there's no
warning whatsoever about what the root cause is.

What this means is that with 6.11 there's some assigned resource
underneath 0000:00:07.0 that wasn't there with 6.10. That is because
6.11 tried harder to get your resources assigned and succeeded here and
there, which pinned the bridge window in place, whereas 6.10 failed to
assign the same resource.

Could you provide /proc/iomem (it's enough to do that for 6.11 for now)?

You could try using hpmmioprefsize= on the kernel's command line to
reserve more space for the bridge windows. The default is only 2M, and
these GPUs need orders of magnitude more (gigabytes). You can check from
6.10 what the sizes of the BARs on the GPU are, then round their sum up
to the next power of two.

I'd also be interested to see why pci=realloc failed to solve this
problem, as it should reconfigure the entire resource tree, so please
provide the logs with that as well. Please take lspci with -vvv.

-- 
 i.