From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: "Jonas Höglund" <firefly@firefly.nu>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>,
Bjorn Helgaas <bhelgaas@google.com>,
linux-pci@vger.kernel.org, regressions@lists.linux.dev
Subject: Re: [REGRESSION] amdgpu with Thunderbolt eGPU bracket fails since new bridge window alignment calculation code
Date: Mon, 30 Mar 2026 19:32:28 +0300 (EEST)
Message-ID: <4a7704af-4c30-3050-e8a6-cb1fa3fd7ec9@linux.intel.com>
In-Reply-To: <52614de3-9658-4390-8e0e-689963f364a4@app.fastmail.com>
On Mon, 30 Mar 2026, Jonas Höglund wrote:
> On Mon, 30 Mar 2026, at 14:33, Ilpo Järvinen wrote:
> > I'm skeptical it's exactly the same issue even if the end result is the
> > same.
> >
> > The resource fitting algorithm has been in a state of constant flux due to
> > various fixes and improvements to it over the past two years.
> > Unfortunately, fixing one thing (or even moving towards fixing an issue)
> > may break another thing due to how different resources interact.
>
> Ok, yeah, I don't envy having to deal with that. You're probably right
> it's more BAR-related, I mostly keyed in on the very similar symptom.

The GPU driver could certainly handle a resource issue better than by
calling something that triggers a sanity check somewhere, but that's a
secondary problem.

> > That "PCI: Improve head free space usage" series certainly fixes two
> > known corner cases with commit 3958bf16e2fe ("PCI: Stop over-estimating
> > bridge window size"), but with only heavily filtered logs, I'm unable to
> > confirm whether it applies to this case as well.
>
> Sorry for not providing full logs from the get-go; I couldn't think of a
> suitable location. Here's a full dmesg showing the crash as it
> manifests on 7.0.0-rc5:
>
> https://up.firefly.nu/pub/amdgpu-egpu-crash-7.0.0-rc5.dmesg.txt
>
>
> > From the limited logs, I suspect this is primarily a BAR resize rollback
> > failure, which leaves the resources in a worse state than they were in
> > prior to the resize. Commit 337b1b566db0 ("PCI: Fix restoring BARs on BAR
> > resize rollback path") attempts to rectify that. The entire series is here
> > (not all of it went to stable):
>
> > https://lore.kernel.org/all/20251113162628.5946-1-ilpo.jarvinen@linux.intel.com/T/#m9b0e316c94f7abc0686e58f902d05ff35aeac3ac
> >
> > The fixes to that series are here:
> >
> > 5528fd38f230 ("PCI: Fix Resizable BAR restore order")
> > 08d9eae76b85 ("PCI: Fix BAR resize rollback path overwriting ret")
>
> Unless I misread something, they should both be included in the recently
> tagged 7.0.0-rc6--I'll try building it and see if the issue is resolved.
>
> I'll reply once I've tested 7.0.0-rc6.

Hi again,

Now that I've looked more closely at the logs, that probably won't help.
For some reason, it seems that the resize is not even attempted, and the
errno is -EINVAL, which is a bit unexpected.

I'm starting to wonder whether the problem fixed by this patch is once
again rearing its ugly head (it's currently in the pci/resource branch, so
it won't appear until 7.1-rc1):

https://lore.kernel.org/linux-pci/20260326200427.GA1340256@bhelgaas/

I still don't understand why pbus_select_window() would return NULL in
this case, but it looks like the most likely place the -EINVAL could come
from. (If that is the case, I also don't understand what cleared the
resource's flags, but it still seems the best explanation.)

Please take logs from this point on with dyndbg="file drivers/pci/*.c +p"
on the kernel's command line so there's a little bit of extra info (and
check that you are building with CONFIG_DYNAMIC_DEBUG).
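
If it helps, both requirements can be double-checked on the test machine
before collecting logs. Below is a minimal Python sketch, assuming the
running kernel's config is exposed at /proc/config.gz (which needs
CONFIG_IKCONFIG_PROC) or installed as /boot/config-<release> (a common
distro convention, not a kernel guarantee). The helper names are made up
for illustration and are not part of any kernel tooling.

```python
import gzip
import os
import shlex
from typing import Optional


def config_has_dynamic_debug(config_text: str) -> bool:
    # An enabled bool option appears in .config as "CONFIG_FOO=y".
    return any(line.strip() == "CONFIG_DYNAMIC_DEBUG=y"
               for line in config_text.splitlines())


def cmdline_dyndbg(cmdline: str) -> Optional[str]:
    # shlex honours the double quotes, so dyndbg="file drivers/pci/*.c +p"
    # comes back as a single token; return the part after "dyndbg=".
    for token in shlex.split(cmdline):
        if token.startswith("dyndbg="):
            return token[len("dyndbg="):]
    return None


def read_running_config() -> Optional[str]:
    # Assumed locations; which one exists depends on the kernel build/distro.
    candidates = ["/proc/config.gz", f"/boot/config-{os.uname().release}"]
    for path in candidates:
        try:
            if path.endswith(".gz"):
                with gzip.open(path, "rt") as f:
                    return f.read()
            with open(path) as f:
                return f.read()
        except OSError:
            continue  # missing or unreadable; try the next candidate
    return None


if __name__ == "__main__":
    config = read_running_config()
    if config is None:
        print("kernel config not found; check CONFIG_DYNAMIC_DEBUG manually")
    else:
        print("CONFIG_DYNAMIC_DEBUG=y:", config_has_dynamic_debug(config))
    try:
        with open("/proc/cmdline") as f:
            print("dyndbg:", cmdline_dyndbg(f.read()))
    except (OSError, ValueError):
        # ValueError: shlex could not parse the line (e.g. unbalanced quote).
        print("could not read or parse /proc/cmdline")
```

Run it after rebooting with the new command line. It only reads the config
and /proc/cmdline; it does not enable dynamic debug by itself.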
--
i.