Linux PCI subsystem development
 help / color / mirror / Atom feed
From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Steve Oswald <stevepeter.oswald@gmail.com>
Cc: linux-pci@vger.kernel.org
Subject: Re: [BUG] Thunderbolt eGPU PCI BARs incorrectly assigned, fails to assign memory
Date: Wed, 8 Oct 2025 13:43:53 +0300 (EEST)	[thread overview]
Message-ID: <0263ea7c-8df9-1dba-f797-f18e253b3f30@linux.intel.com> (raw)
In-Reply-To: <eb70b817-175c-7a34-d2bf-9472019afa47@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 10507 bytes --]

On Wed, 3 Sep 2025, Ilpo Järvinen wrote:

> On Mon, 1 Sep 2025, Steve Oswald wrote:
> 
> > I've added the dmesg output fordyndbg="file drivers/pci/*. I wasn't
> > sure if I added it with the escaped quotes correctly.
> > https://gist.github.com/stepeos/cd060c7d66ab195f51ab4d5675b4e4af/raw/9cf5fc3a8c4f13588a33d61865f804f85e50470a/dmesg_linux_6.11.0_dyndbg.log
> 
> Hi again,
> 
> I think the patch below should solve your issue. If you manage to get it 
> tested and are confident enough it fixed your issue, you may provide your 
> Tested-by tag so I can include it into the official submission of the fix.

Hi Steve,

Did you manage to test this patch?

-- 
 i.

> 
> 
> From 1c8ef31c6ac6616869b447a473e879b233df62db Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen@linux.intel.com>
> Date: Wed, 3 Sep 2025 15:53:48 +0300
> Subject: [PATCH 1/1] PCI: Prevent shrinking bridge window from its required
>  size
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> pci_bridge_distribute_available_resources() -> ... ->
> adjust_bridge_window() is called in between __pci_bus_size_bridges()
> and assigning the resources. Since the commit 948675736a77 ("PCI: Allow
> adjust_bridge_window() to shrink resource if necessary")
> adjust_bridge_window() can also shrink the bridge window to force it to
> a smaller size. The shrunken size, however, conflicts what
> __pci_bus_size_bridges() -> pbus_size_mem() calculated as the required
> bridge window size. By shrinking the resource size,
> adjust_bridge_window() prevents rest of the resource fitting algorithm
> working as intended. Resource fitting logic is expecting assignment
> failures when bridge windows need resizing, but there are cases where
> failures are no longer happening after the commit 948675736a77 ("PCI:
> Allow adjust_bridge_window() to shrink resource if necessary").
> 
> The commit 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink
> resource if necessary") justifies the change by the extra reservation
> made due to hpmemsize parameter, however, the kernel code contradicts
> with that statement. (For simplicity, finer-grained hpmmiosize and
> hpmmiopref parameters that can be used to the same effect as hpmemsize
> are ignored in this description.)
> 
> pbus_size_mem() calls calculate_memsize() twice. First with add_size=0
> to find out the minimal required resource size. The second call occurs
> with add_size=hpmemsize (effectively) but the result does not directly
> affect the resource size only resulting in an entry on the realloc_head
> list (a.k.a. add_list). Yet, adjust_bridge_window() directly changes
> the resource size which does not include what is reserved due to
> hpmemsize. Also, if the required size for the bridge window exceeds
> hpmemsize, the parameter does not have any effect even on the second
> size calculcation made by pbus_size_mem(); from calculate_memsize():
> 
>   size = max(size, add_size) + children_add_size;
> 
> The commit ae4611f1d7e9 ("PCI: Set resource size directly in
> adjust_bridge_window()") that precedes the commit 948675736a77 ("PCI:
> Allow adjust_bridge_window() to shrink resource if necessary") is also
> related to causing this problem. Its changelog explicitly states
> adjust_bridge_window() wants to "guarantee" allocation success.
> Guaranteed allocations, however, are incompatible with how the other
> parts of the resource fitting algorithm work. The given justification
> fails to explain why guaranteed allocations at this stage are required
> nor why forcing window to a smaller value than what was calculated by
> pbus_size_mem() is correct. The the change might have worked by chance
> in some test scenario, too small bridge window does not "guarantee"
> success from the point of view of the endpoint device resource
> assignments. No issue is mentioned within the changelog so it's unclear
> if the change was made to fix some observed issue nor and what that
> issue was.
> 
> The unwanted shrinking of a bridge window occurs, e.g., when a device
> with large BARs such as eGPU is attached using Thunderbolt and the Root
> Port holds less than enough resource space for the eGPU. The GPU
> resources are in order of GBs and the default hotplug allocation is
> mere 2M (DEFAULT_HOTPLUG_MMIO_PREF_SIZE). The problem is illustrated by
> this log (filtered to the relevant content only):
> 
> pci 0000:00:07.0: PCI bridge to [bus 03-2c]
> pci 0000:00:07.0:   bridge window [mem 0x6000000000-0x601bffffff 64bit pref]
> pci 0000:03:00.0: PCI bridge to [bus 00]
> pci 0000:03:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
> pci 0000:03:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> pci 0000:03:00.0: PCI bridge to [bus 04-2c]
> pcieport 0000:00:07.0: Assigned bridge window [mem 0x6000000000-0x601bffffff 64bit pref] to [bus 03-2c] cannot fit 0xc00000000 required for 0000:03:00.0 bridging to [bus 04-2c]
> pci 0000:03:00.0: bridge window [mem 0x800000000-0x10003fffff 64bit pref] to [bus 04-2c] add_size 100000 add_align 100000
> pcieport 0000:00:07.0: distributing available resources
> pci 0000:03:00.0: bridge window [mem 0x800000000-0x10003fffff 64bit pref] shrunken by 0x00000007e4400000
> pci 0000:03:00.0: bridge window [mem 0x6000000000-0x601bffffff 64bit pref]: assigned
> 
> The initial size of the Root Port's window is 448MB (0x601bffffff -
> 0x6000000000). __pci_bus_size_bridges() -> pbus_size_mem() calculates
> the required size to be 32772 MB (0x10003fffff - 0x800000000) which
> would fit the eGPU resources. adjust_bridge_window() then shrinks the
> bridge window down to what is guaranteed to fit into the Root Port's
> bridge window. The bridge window for 03:00.0 is also eliminated from
> the add_list (a.k.a. realloc_head) list by adjust_bridge_window().
> 
> After adjustment, the resources are assigned and as the bridge window
> for 03:00.0 is assigned successfully, no failure is recorded. Without a
> failure, no attempt to resize the window of the Root Port is required.
> The end result is eGPU not having large enough resources to work.
> 
> The commit 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink
> resource if necessary") also claims nested bridge windows are sized the
> same, which is false. pbus_size_mem() calculates the size for the
> parent bridge window by summing all the downstream resources so the
> resource fitting calculates larger bridge window for the parent to
> accomodate the childen. That is, hpmemsize does not result the same
> size for the case where there are nested bridge windows.
> 
> In order to fix the most immediate problem, don't shrink the resource
> size as hpmemsize had nothing to do with it. When considering add_size,
> only reduce it up to what is added due to hpmemsize (if required size
> is larger than hpmemsize, the parameter has no impact, see
> calculate_memsize()).
> 
> This is not exactly a revert of the commits e4611f1d7e9 ("PCI: Set
> resource size directly in adjust_bridge_window()") and 948675736a77
> ("PCI: Allow adjust_bridge_window() to shrink resource if necessary")
> as shrinking still remains in place but is implemented differently,
> and the end result behaves very differently.
> 
> It is possible that those two commits fixed some other issue that is
> not described with enough detail in the changelog and undoing parts of
> them results in another regression due to behavioral change.
> Nonetheless, as described above, the solution by those two commits was
> flawed and the issue, if one exists, should be solved in a way that is
> compatible with the rest of the resource fitting algorithm instead of
> working against it.
> 
> Besides shrinking, the case where adjust_bridge_window() expands the
> bridge window is likely somewhat wrong as well because it removes the
> entry from add_list (a.k.a. realloc_head), but it is less damaging as
> that only impacts optional resources and may have no impact if
> expanding by hpmemsize is larger than what add_size was. Fixing it is
> left as further work.
> 
> Reported-by: Steve Oswald <stevepeter.oswald@gmail.com>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>  drivers/pci/setup-bus.c | 42 +++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 23082bc0ca37..4dd618bc4196 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1828,6 +1828,7 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
>  				 resource_size_t new_size)
>  {
>  	resource_size_t add_size, size = resource_size(res);
> +	struct pci_dev_resource *dev_res;
>  
>  	if (res->parent)
>  		return;
> @@ -1840,9 +1841,46 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
>  		pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
>  			&add_size);
>  	} else if (new_size < size) {
> +		int idx = pci_resource_num(bridge, res);
> +
> +		/*
> +		 * hpio/mmio/mmioprefsize hasn't been included at all? See the
> +		 * add_size param at the callsites of calculate_memsize().
> +		 */
> +		if (!add_list)
> +			return;
> +
> +		/* Only shrink if the hotplug extra relates to window size. */
> +		switch (idx) {
> +			case PCI_BRIDGE_IO_WINDOW:
> +				if (size > pci_hotplug_io_size)
> +					return;
> +				break;
> +			case PCI_BRIDGE_MEM_WINDOW:
> +				if (size > pci_hotplug_mmio_size)
> +					return;
> +				break;
> +			case PCI_BRIDGE_PREF_MEM_WINDOW:
> +				if (size > pci_hotplug_mmio_pref_size)
> +					return;
> +				break;
> +			default:
> +				break;
> +		}
> +
> +		dev_res = res_to_dev_res(add_list, res);
>  		add_size = size - new_size;
> -		pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
> -			&add_size);
> +		if (add_size < dev_res->add_size) {
> +			dev_res->add_size -= add_size;
> +			pci_dbg(bridge, "bridge window %pR optional size shrunken by %pa\n",
> +				res, &add_size);
> +		} else {
> +			pci_dbg(bridge, "bridge window %pR optional size removed\n",
> +				res);
> +			remove_from_list(add_list, res);
> +		}
> +		return;
> +
>  	} else {
>  		return;
>  	}
> 
> base-commit: f6d41443f54856ceece0d5b584f47f681513bde4
> 

  reply	other threads:[~2025-10-08 10:43 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-31 10:51 [BUG] Thunderbolt eGPU PCI BARs incorrectly assigned, fails to assign memory Steve Oswald
2025-09-01 13:25 ` Ilpo Järvinen
2025-09-01 15:50   ` Ilpo Järvinen
2025-09-01 16:06     ` Ilpo Järvinen
2025-09-01 16:18       ` Steve Oswald
2025-09-01 16:28         ` Ilpo Järvinen
2025-09-03 13:09         ` Ilpo Järvinen
2025-10-08 10:43           ` Ilpo Järvinen [this message]
2025-10-11 14:12             ` Steve Oswald
2025-11-07 16:22               ` Ilpo Järvinen
2025-09-01 16:10     ` Steve Oswald

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0263ea7c-8df9-1dba-f797-f18e253b3f30@linux.intel.com \
    --to=ilpo.jarvinen@linux.intel.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=stevepeter.oswald@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox