Linux PCI subsystem development
From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Steve Oswald <stevepeter.oswald@gmail.com>
Cc: linux-pci@vger.kernel.org
Subject: Re: [BUG] Thunderbolt eGPU PCI BARs incorrectly assigned, fails to assign memory
Date: Mon, 1 Sep 2025 16:25:26 +0300 (EEST)
Message-ID: <9254be77-46ea-992f-a1bd-98bea3943520@linux.intel.com>
In-Reply-To: <CAN95MYEaO8QYYL=5cN19nv_qDGuuP5QOD17pD_ed6a7UqFVZ-g@mail.gmail.com>

On Sun, 31 Aug 2025, Steve Oswald wrote:

> Hello,
> 
> I’ve encountered an issue with Thunderbolt eGPU (externally connected
> gpu via thunderbolt 4). The change from kernel 6.10.14 to 6.11.0 broke
> the pci memory assignment of the external pcie device. I figured out
> which version broke it by using ubuntu 25.04 and downgrading the
> kernel (https://raw.githubusercontent.com/pimlie/ubuntu-mainline-kernel.sh/master/ubuntu-mainline-kernel.sh).
> 
> From the dmesg output, on the broken 6.11.0 I see 'failed to assign'.
> The issue almost never occurs on the previous kernel version, 6.10.14.
> Using pci=realloc did not change the behavior (I can produce the dmesg
> output if necessary).
> 
> The issue was tested with 2 egpus (Radeon Instinct MI50 32GB, NVIDIA
> 3080 10GB). Both the amd and the nvidia driver fail to initialize the
> device because they cannot write the pcie messages.
> 
> System details:
> - Kernel: Linux 6.10.14-061014-generic (Ubuntu build) > 6.11.0-061100
> - Laptop: TUXEDO InfinityBook Pro 16 - Gen8 with Thunderbolt 4
> - eGPU: Radeon Instinct MI50 32GB, NVIDIA 3080 10GB
> 
> Steps to reproduce:
> 1. Boot the system with the eGPU.
> 2. Observe PCI BAR message in `dmesg`.
> 
> Logs:
> both kernel messages, lspci can be found here:
> https://gist.github.com/stepeos/cd060c7d66ab195f51ab4d5675b4e4af
> raw files:
> - dmesg_linux_6.11.0.log
> https://gist.githubusercontent.com/stepeos/cd060c7d66ab195f51ab4d5675b4e4af/raw/f9470a06ff929d386c50ec6b5d07e0ff3f053dcf/dmesg_linux_6.11.0.log
> - dmesg_linux_6.10.14.log
> https://gist.githubusercontent.com/stepeos/cd060c7d66ab195f51ab4d5675b4e4af/raw/f9470a06ff929d386c50ec6b5d07e0ff3f053dcf/dmesg_linux_6.10.14.log
> 
> If additional info is needed, I'm happy to help.

Hi Steve,

Thanks for the report.

My analysis is that the problem boils down to the absence of this line in
the 6.11 log:

pcieport 0000:00:07.0: resource 15 [mem 0x6000000000-0x601bffffff 64bit pref] released

That line is printed from pci_reassign_bridge_resources(), which is most
likely reached through a pci_resize_resource() call from amdgpu(?). Its
absence means one of the upstream bridge windows could not be released for
the resize.
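
For scale, the window named in that log line spans 448 MiB
(0x601bffffff - 0x6000000000 + 1 = 0x1c000000 bytes). A minimal sketch of
that arithmetic follows; mem_range_size() is a hypothetical helper name
used only for illustration, not something from the kernel:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "[mem 0xSTART-0xEND" in a dmesg line and
 * return the inclusive size of the range in bytes, or 0 if no valid
 * range is found.
 */
static uint64_t mem_range_size(const char *line)
{
	unsigned long long start, end;
	const char *p = strstr(line, "[mem ");

	if (!p || sscanf(p, "[mem %llx-%llx", &start, &end) != 2 || end < start)
		return 0;

	return (uint64_t)(end - start + 1);
}
```

Dividing the result by 1024 * 1024 gives the size in MiB, which is easier
to compare against the BAR sizes lspci reports for the GPU.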

The very likely cause is this check:

                        /* Ignore BARs which are still in use */
                        if (res->child)
                                continue;

...which (until very recently) was entirely silent, so there is no warning
whatsoever about the root cause.
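
The effect of that check can be illustrated with a userspace toy model.
This is not the kernel code: struct toy_resource and
release_unused_windows() are made-up names, and only the child pointer is
modelled.

```c
#include <stddef.h>

/* Toy stand-in for struct resource; only what the check needs. */
struct toy_resource {
	const char *name;
	struct toy_resource *child;	/* non-NULL: something is assigned inside */
	int released;			/* set to 1 once the window is released */
};

/*
 * Walk the candidate windows the way the check above does: a window
 * that still has a child is skipped without any message, the others
 * are marked released. Returns the number of windows released.
 */
static size_t release_unused_windows(struct toy_resource *win, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		/* Ignore windows which are still in use */
		if (win[i].child)
			continue;

		win[i].released = 1;
		count++;
	}

	return count;
}
```

A single assigned resource below a window is enough to keep that window
where it is, and in this model nothing is reported for the skipped window
either.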

What this means is that there is some assigned resource underneath
0000:00:07.0 with 6.11 that wasn't there with 6.10. That is because 6.11
tried harder to get your resources assigned and succeeded here and there,
which pins the bridge window in place, whereas 6.10 failed to assign the
same resource.

Could you provide /proc/iomem (it's enough to do that for 6.11 for now)?
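
When reading /proc/iomem, note that it must be read as root (otherwise
the addresses show up as zeros) and that each nesting level is indented by
two spaces. A rough sketch of picking out entries that sit inside a given
window; the struct and function names are assumptions made up for this
example:

```c
#include <stdint.h>
#include <stdio.h>

struct iomem_entry {
	int depth;		/* nesting level, two spaces of indent each */
	uint64_t start, end;	/* inclusive address range */
	char name[64];
};

/* Parse one /proc/iomem line; returns 1 on success, 0 otherwise. */
static int parse_iomem_line(const char *line, struct iomem_entry *e)
{
	unsigned long long start, end;
	int spaces = 0;

	while (line[spaces] == ' ')
		spaces++;

	if (sscanf(line + spaces, "%llx-%llx : %63[^\n]",
		   &start, &end, e->name) != 3)
		return 0;

	e->depth = spaces / 2;
	e->start = start;
	e->end = end;
	return 1;
}

/* True if the entry lies entirely inside [win_start, win_end]. */
static int inside_window(const struct iomem_entry *e,
			 uint64_t win_start, uint64_t win_end)
{
	return e->start >= win_start && e->end <= win_end;
}
```

Entries that fall inside the 0000:00:07.0 window at a deeper nesting level
are the candidates for whatever is pinning that window.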


You could try hpmmioprefsize= (a pci= suboption, i.e.
pci=hpmmioprefsize=nn[KMG]) on the kernel command line to reserve more
space for the bridge windows. The default is only 2M and these GPUs need
orders of magnitude more (gigabytes). You can check from 6.10 what the
sizes of the BARs on the GPU are and round their sum up to the next power
of two.
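
The sizing rule can be sketched as below. The function names are
hypothetical, and the BAR sizes in the usage note are made-up values, not
taken from the logs in this thread:

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two >= v; returns 0 if it does not fit in 64 bits. */
static uint64_t round_up_pow2(uint64_t v)
{
	uint64_t p = 1;

	if (v > (UINT64_C(1) << 63))
		return 0;

	while (p < v)
		p <<= 1;

	return p;
}

/* Sum the BAR sizes (in bytes) and round up to the next power of two. */
static uint64_t suggested_window_size(const uint64_t *bar_sizes, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += bar_sizes[i];

	return round_up_pow2(sum);
}
```

For example, with assumed prefetchable BARs of 32 GiB, 2 MiB and 512 KiB
the sum is slightly above 32 GiB, so the result is 64 GiB, which would be
passed as pci=hpmmioprefsize=64G.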


I'd also be interested to see why pci=realloc failed to solve this problem,
as it should reconfigure the entire resource tree, so please provide the
logs from a boot with that option. Please also take lspci with -vvv.


-- 
 i.

Thread overview: 11+ messages
2025-08-31 10:51 [BUG] Thunderbolt eGPU PCI BARs incorrectly assigned, fails to assign memory Steve Oswald
2025-09-01 13:25 ` Ilpo Järvinen [this message]
2025-09-01 15:50   ` Ilpo Järvinen
2025-09-01 16:06     ` Ilpo Järvinen
2025-09-01 16:18       ` Steve Oswald
2025-09-01 16:28         ` Ilpo Järvinen
2025-09-03 13:09         ` Ilpo Järvinen
2025-10-08 10:43           ` Ilpo Järvinen
2025-10-11 14:12             ` Steve Oswald
2025-11-07 16:22               ` Ilpo Järvinen
2025-09-01 16:10     ` Steve Oswald
