public inbox for linux-pci@vger.kernel.org
From: Cristian Cocos <cristi@ieee.org>
To: "Geramy Loveless" <gloveless@jqluv.com>,
	"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
Cc: linux-pci@vger.kernel.org, "Christian König" <christian.koenig@amd.com>
Subject: Re: [PATCH v2] PCI: release empty sibling bridge windows during rebar expansion
Date: Fri, 10 Apr 2026 14:58:50 -0400
Message-ID: <ec0d287c884cbdd5131e1b09147b9b3cd56faf1d.camel@ieee.org>
In-Reply-To: <CAGpo2mdZ6Ge9ZSYK4kKYJ7etBu5JqoVR-_G7jnxfZYhfQxxryA@mail.gmail.com>

My experience with my 9060XT Thunderbolt eGPU is that the current
amdgpu driver is full of bugs, specifically in a Thunderbolt eGPU
configuration. I have attempted to document some of these bugs here:
https://pcforum.amd.com/s/question/0D5Pd00001S3Av9KAF/linux-9060xt-egpuoverthunderbolt-bugs-galore

Apologies for posting this here, as most of these may not be relevant
to ReBAR, yet an AMD representative may still benefit from this
collection of bug reports.

C

On Fri, 2026-04-10 at 10:53 -0700, Geramy Loveless wrote:
> I'm going to loop in Christian Koenig over at AMD; he has been working
> with me on resolving, or at least figuring out, what's going on with
> my gfx1201 connected to the host through a TB5 dock.
> I am currently having problems with the GPU basically losing MMIO and
> crashing randomly. I believe this recent patch change helped, but it's
> really hard to say at this point.
> Without this patch, of course, the BAR size would be 256MB, causing
> huge performance problems or feature loss. I am able to load up AI
> models and run workloads at nearly 100% GPU usage; I'm seeing 205W
> power draw out of the maximum 300W. But after sustained load I still
> get a crash.
> 
> Maybe you would have an idea as to what is causing that crash, or where
> I should be looking to find the cause?
> Here are some relevant logs. From what I can tell, something is going
> on with MMIO, but the config bar, as I understand it, is still alive.
> This led me to believe that maybe the router was getting put into
> suspend mode, which wouldn't make sense for a GPU that is active and
> busy, because the PCIe tunnel would be active.
> 
> Any advice or tips would be helpful. Thank you for the suggestions; I
> will get started on writing the patch based on those recommendations.
> 
> ## SMU Firmware Version
> 
> ```
> smu driver if version = 0x0000002e
> smu fw if version = 0x00000032
> smu fw program = 0
> smu fw version = 0x00684b00 (104.75.0)
> ```
> 
> Note: Driver interface version (0x2e / 46) does not match firmware
> interface version (0x32 / 50).
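
The version numbers in the block above can be cross-checked with a short sketch. The field layout used here (major in bits 31..16, minor in bits 15..8, debug in bits 7..0) is an assumption inferred from 0x00684b00 being reported as 104.75.0; the struct and function names are hypothetical and are not taken from the amdgpu driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the three fields of a packed SMU firmware version. */
struct smu_fw_version {
	unsigned int major;
	unsigned int minor;
	unsigned int debug;
};

/*
 * Assumed layout (inferred from 0x00684b00 being printed as 104.75.0):
 * bits 31..16 = major, bits 15..8 = minor, bits 7..0 = debug.
 */
static struct smu_fw_version smu_decode_fw_version(uint32_t raw)
{
	struct smu_fw_version v = {
		.major = (raw >> 16) & 0xffff,
		.minor = (raw >> 8) & 0xff,
		.debug = raw & 0xff,
	};
	return v;
}

/* Returns nonzero when the driver and firmware interface versions differ. */
static int smu_if_version_mismatch(uint32_t driver_if, uint32_t fw_if)
{
	return driver_if != fw_if;
}

/* Usage check against the values quoted in the report. */
static void smu_example(void)
{
	struct smu_fw_version v = smu_decode_fw_version(0x00684b00);

	assert(v.major == 104 && v.minor == 75 && v.debug == 0);
	assert(smu_if_version_mismatch(0x2e, 0x32)); /* 46 vs 50 */
}
```

Calling `smu_example()` from a test harness exercises both helpers with the values from the log; the mismatch check only states that the two interface versions differ, not whether that difference is harmful.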
> 
> ## PCI Topology
> 
> ```
> 65:00.0 PCI bridge: Intel Barlow Ridge Host 80G (rev 84)
> 66:00.0 PCI bridge: Intel Barlow Ridge Host 80G (rev 84) → NHI
> 66:01.0 PCI bridge: Intel Barlow Ridge Host 80G (rev 84) → empty hotplug port
> 66:02.0 PCI bridge: Intel Barlow Ridge Host 80G (rev 84) → USB
> 66:03.0 PCI bridge: Intel Barlow Ridge Host 80G (rev 84) → dock
> 93:00.0 PCI bridge: Intel Barlow Ridge Hub 80G (rev 85) → dock switch
> 94:00.0 PCI bridge: Intel Barlow Ridge Hub 80G (rev 85) → downstream
> 95:00.0 PCI bridge: AMD Navi 10 XL Upstream Port (rev 24)
> 96:00.0 PCI bridge: AMD Navi 10 XL Downstream Port (rev 24)
> 97:00.0 VGA: AMD [1002:7551] (rev c0) ← GPU
> 97:00.1 Audio: AMD [1002:ab40]
> ```
> 
> ## Workload
> 
> GPU compute via llama.cpp (ROCm/HIP backend), running
> Qwen3.5-35B-A3B-Q4_K_M.gguf model (20.49 GiB, fully offloaded to
> VRAM). Flash attention enabled, 128K context, 32 threads.
> 
> ## Crash Timeline
> 
> All timestamps from `dmesg -T`, kernel boot-relative times in
> brackets.
> 
> ### GPU initialization (successful)
> 
> ```
> [603.644s] GPU probe: IP DISCOVERY 0x1002:0x7551
> [603.653s] Detected IP block: smu_v14_0_0, gfx_v12_0_0
> [603.771s] Detected VRAM RAM=32624M, BAR=32768M, RAM width 256bits GDDR6
> [604.014s] SMU driver IF 0x2e, FW IF 0x32, FW version 104.75.0
> [604.049s] SMU is initialized successfully!
> [604.119s] Runtime PM manually disabled (amdgpu.runpm=0)
> [604.119s] Initialized amdgpu 3.64.0 for 0000:97:00.0
> ```
> 
> ### SMU stops responding [T+4238s after init, ~70 minutes]
> 
> ```
> [4841.828s] SMU: No response msg_reg: 12 resp_reg: 0
> [4841.828s] [smu_v14_0_2_get_power_profile_mode] Failed to get activity monitor!
> [4849.393s] SMU: No response msg_reg: 12 resp_reg: 0
> [4849.393s] Failed to export SMU metrics table!
> ```
> 
> 15 consecutive `SMU: No response` messages logged between [4841s] and
> [4948s], approximately every 7-8 seconds. All with `msg_reg: 12
> resp_reg: 0`. Failed operations include:
> - `smu_v14_0_2_get_power_profile_mode` — Failed to get activity
> monitor
> - `Failed to export SMU metrics table`
> - `Failed to get current clock freq`
> 
> ### Page faults begin [T+4345s after init, ~107s after first SMU failure]
> 
> ```
> [4948.927s] [gfxhub] page fault (src_id:0 ring:40 vmid:9 pasid:108)
> Process llama-cli pid 35632
> GCVM_L2_PROTECTION_FAULT_STATUS: 0x00941051
> Faulty UTCL2 client ID: TCP (0x8)
> PERMISSION_FAULTS: 0x5
> WALKER_ERROR: 0x0
> MAPPING_ERROR: 0x0
> RW: 0x1 (write)
> ```
> 
> 10 page faults logged at [4948s], all from TCP (Texture Cache Pipe),
> all PERMISSION_FAULTS=0x5, WALKER_ERROR=0x0, MAPPING_ERROR=0x0. 7
> unique faulting addresses:
> - 0x000072ce90828000
> - 0x000072ce90a88000
> - 0x000072ce90a89000
> - 0x000072ce90cde000
> - 0x000072ce90ce1000
> - 0x000072ce90f51000
> - 0x000072ce90f52000
> 
> ### MES failure and GPU reset [T+4349s]
> 
> ```
> [4952.809s] MES(0) failed to respond to msg=REMOVE_QUEUE
> [4952.809s] failed to remove hardware queue from MES, doorbell=0x1806
> [4952.809s] MES might be in unrecoverable state, issue a GPU reset
> [4952.809s] Failed to evict queue 4
> [4952.809s] Failed to evict process queues
> [4952.809s] GPU reset begin!. Source: 3
> ```
> 
> ### GPU reset fails
> 
> ```
> [4953.121s] Failed to evict queue 4
> [4953.121s] Failed to suspend process pid 28552
> [4953.121s] remove_all_kfd_queues_mes: Failed to remove queue 3 for dev 62536
> ```
> 
> 6 MES(1) REMOVE_QUEUE failures, each timing out after ~2.5 seconds:
> ```
> [4955.720s] MES(1) failed to respond to msg=REMOVE_QUEUE → failed to unmap legacy queue
> [4958.283s] MES(1) failed to respond to msg=REMOVE_QUEUE → failed to unmap legacy queue
> [4960.847s] MES(1) failed to respond to msg=REMOVE_QUEUE → failed to unmap legacy queue
> [4963.411s] MES(1) failed to respond to msg=REMOVE_QUEUE → failed to unmap legacy queue
> [4965.976s] MES(1) failed to respond to msg=REMOVE_QUEUE → failed to unmap legacy queue
> [4968.540s] MES(1) failed to respond to msg=REMOVE_QUEUE → failed to unmap legacy queue
> ```
> 
> ### PSP suspend fails
> 
> ```
> [4971.164s] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x0)
> [4971.164s] Failed to terminate ras ta
> [4971.164s] suspend of IP block <psp> failed -22
> ```
> 
> ### Suspend unwind fails — SMU not ready
> 
> ```
> [4971.164s] SMU is resuming...
> [4971.164s] SMC is not ready
> [4971.164s] SMC engine is not correctly up!
> [4971.164s] resume of IP block <smu> failed -5
> [4971.164s] amdgpu_device_ip_resume_phase2 failed during unwind: -5
> [4971.164s] GPU pre asic reset failed with err, -22 for drm dev, 0000:97:00.0
> ```
> 
> ### MODE1 reset — SMU still dead
> 
> ```
> [4971.164s] MODE1 reset
> [4971.164s] GPU mode1 reset
> [4971.164s] GPU smu mode1 reset
> [4972.193s] GPU reset succeeded, trying to resume
> [4972.193s] VRAM is lost due to GPU reset!
> [4972.193s] SMU is resuming...
> [4972.193s] SMC is not ready
> [4972.193s] SMC engine is not correctly up!
> [4972.193s] resume of IP block <smu> failed -5
> [4972.193s] GPU reset end with ret = -5
> ```
> 
> 
> 
> 
> On Fri, Apr 10, 2026 at 3:09 AM Ilpo Järvinen
> <ilpo.jarvinen@linux.intel.com> wrote:
> > 
> > On Thu, 9 Apr 2026, Geramy Loveless wrote:
> > 
> > > When pbus_reassign_bridge_resources() walks up the bridge
> > > hierarchy
> > > to expand a window (e.g. for resizable BAR), it refuses to
> > > release
> > > any bridge window that has children.  This prevents BAR resize on
> > > devices behind multi-port PCIe switches (such as Thunderbolt
> > > docks)
> > > where empty sibling downstream ports hold small reservations that
> > > block the parent bridge window from being freed and re-sized.
> > > 
> > > Add pci_bus_subtree_empty() to check whether a bus subtree
> > > contains
> > > any assigned device BARs, and pci_bus_release_empty_bridges() to
> > > release bridge window resources of empty sibling bridges, saving
> > > them to the rollback list so failures can be properly unwound.
> > > 
> > > In pbus_reassign_bridge_resources(), call
> > > pci_bus_release_empty_bridges()
> > > before checking res->child, so empty sibling windows are cleared
> > > first
> > > and the parent window can then be released and grown.
> > > 
> > > Uses PCI bus/device iterators rather than walking the raw
> > > resource
> > > tree, which avoids issues with stale sibling pointers after
> > > resource
> > > release.
> > 
> > This paragraph can be dropped. And it's not exactly correct either, as
> > the pointers are only stale for resource entries that reside outside of
> > the resource tree (after they've been released in a specific way), so if
> > you start from a resource tree entry, you should never encounter a stale
> > pointer.
> > 
> > > Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > > Signed-off-by: Geramy Loveless <gloveless@jqluv.com>
> > > ---
> > >  drivers/pci/setup-bus.c | 99
> > > ++++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 97 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > > index 4cf120ebe5a..7a182cd7e4d 100644
> > > --- a/drivers/pci/setup-bus.c
> > > +++ b/drivers/pci/setup-bus.c
> > > @@ -2292,6 +2292,94 @@ void
> > > pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
> > >  }
> > >  EXPORT_SYMBOL_GPL(pci_assign_unassigned_bridge_resources);
> > > 
> > > +/*
> > > + * pci_bus_subtree_empty - Check whether a bus subtree has any
> > > assigned
> > > + * non-bridge device resources.
> > > + * @bus: PCI bus to check
> > > + *
> > > + * Returns true if no device on @bus or its descendant buses has
> > > any
> > > + * assigned BARs (bridge window resources are not considered).
> > > + */
> > > +static bool pci_bus_subtree_empty(struct pci_bus *bus)
> > > +{
> > > +     struct pci_dev *dev;
> > > +
> > > +     list_for_each_entry(dev, &bus->devices, bus_list) {
> > > +             struct resource *r;
> > > +             unsigned int i;
> > > +
> > > +             pci_dev_for_each_resource(dev, r, i) {
> > > +                     if (i >= PCI_BRIDGE_RESOURCES)
> > > +                             break;
> > > +                     if (resource_assigned(r))
> > > +                             return false;
> > > +             }
> > > +
> > > +             if (dev->subordinate &&
> > > +                 !pci_bus_subtree_empty(dev->subordinate))
> > > +                     return false;
> > > +     }
> > > +
> > > +     return true;
> > > +}
> > > +
> > > +/*
> > > + * pci_bus_release_empty_bridges - Release bridge window
> > > resources of
> > > + * empty sibling bridges so the parent window can be freed and
> > > re-sized.
> > > + * @bus: PCI bus whose child bridges to scan
> > > + * @b_win: Parent bridge window resource; only children of this
> > > window
> > > + *         are released
> > > + * @saved: List to save released resources for rollback
> > > + *
> > > + * For each PCI-to-PCI bridge on @bus whose subtree is empty (no
> > > assigned
> > > + * device BARs), releases bridge window resources that are
> > > children of
> > > + * @b_win, saving them for rollback via @saved.
> > > + *
> > > + * Returns 0 on success, negative errno on failure.
> > > + */
> > > +static int pci_bus_release_empty_bridges(struct pci_bus *bus,
> > > +                                      struct resource *b_win,
> > > +                                      struct list_head *saved)
> > > +{
> > > +     struct pci_dev *dev;
> > > +
> > > +     list_for_each_entry(dev, &bus->devices, bus_list) {
> > > +             struct resource *r;
> > > +             unsigned int i;
> > > +
> > > +             if (!dev->subordinate)
> > > +                     continue;
> > > +
> > > +             if ((dev->class >> 8) != PCI_CLASS_BRIDGE_PCI)
> > > +                     continue;
> > 
> > I suppose the dev->subordinate check is enough for what we're doing, so
> > this looks redundant.
> > 
> > > +
> > > +             if (!pci_bus_subtree_empty(dev->subordinate))
> > > +                     continue;
> > > +
> > > +             pci_dev_for_each_resource(dev, r, i) {
> > > +                     int ret;
> > > +
> > > +                     if (!pci_resource_is_bridge_win(i))
> > > +                             continue;
> > > +
> > > +                     if (!resource_assigned(r))
> > > +                             continue;
> > > +
> > > +                     if (r->parent != b_win)
> > > +                             continue;
> > > +
> > > +                     ret = pci_dev_res_add_to_list(saved, dev,
> > > r, 0, 0);
> > > +                     if (ret)
> > > +                             return ret;
> > > +
> > > +                     release_child_resources(r);
> > 
> > Unfortunately you cannot call this low-level function because it
> > recursively frees child resources which means you won't be able to
> > rollback them as they were not added to the saved list.
> > 
> > I think the release algorithm should basically do this:
> > 
> > - Recurse to the subordinate buses
> > - Loop through bridge window resources of this bus
> >         - Skip resources that are not assigned or are not parented by
> >           b_win
> >         - If the resource still has children, leave the resource alone
> >           (+ log it for easier troubleshooting these cases; any failure
> >           will also cascade to upstream so it may be possible to
> >           shortcut something but it will also make the algorithm more
> >           complicated)
> >         - Save and free the resource
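
The release order outlined in the list above can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions, not kernel code: `struct win`, `struct bridge`, `struct saved_list` and all function names are hypothetical stand-ins, each bridge has a single window, and a child counter replaces the real resource tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_CHILD 4
#define MAX_SAVED 16

/* Toy stand-in for struct resource: a parent link and a child count only. */
struct win {
	const char *name;
	struct win *parent;	/* enclosing window; NULL once released */
	int nr_children;	/* resources still nested inside this window */
	bool assigned;
};

/* Toy bridge: one window (real bridges have several) plus downstream bridges. */
struct bridge {
	struct win window;
	struct bridge *child[MAX_CHILD];
	int nr_child;
};

/* Rollback list: each window is recorded here before it is released. */
struct saved_list {
	struct win *res[MAX_SAVED];
	struct win *old_parent[MAX_SAVED];
	int n;
};

static void win_attach(struct win *w, struct win *parent)
{
	w->parent = parent;
	w->assigned = true;
	parent->nr_children++;
}

static void win_release(struct win *w)
{
	assert(w->parent && w->parent->nr_children > 0);
	w->parent->nr_children--;
	w->parent = NULL;
	w->assigned = false;
}

/*
 * Release the windows of bridges below @br that hold nothing, deepest first.
 * Returns 0 on success, -1 if the rollback list is full.
 */
static int release_empty_windows(struct bridge *br, struct saved_list *saved)
{
	struct win *b_win = &br->window;

	for (int i = 0; i < br->nr_child; i++) {
		struct bridge *c = br->child[i];
		struct win *w = &c->window;

		/* Recurse first so that emptied descendants no longer pin @w. */
		if (release_empty_windows(c, saved))
			return -1;

		/* Skip windows that are unassigned or not parented by b_win. */
		if (!w->assigned || w->parent != b_win)
			continue;

		/* Leave windows that still contain something, but log them. */
		if (w->nr_children) {
			printf("%s: not released, still has children\n", w->name);
			continue;
		}

		/* Save first, then free, so the release can be undone. */
		if (saved->n == MAX_SAVED)
			return -1;
		saved->res[saved->n] = w;
		saved->old_parent[saved->n] = w->parent;
		saved->n++;
		win_release(w);
	}
	return 0;
}

/* Undo every release in reverse order, so parents come back before children. */
static void rollback(struct saved_list *saved)
{
	while (saved->n > 0) {
		saved->n--;
		win_attach(saved->res[saved->n], saved->old_parent[saved->n]);
	}
}
```

With a root window holding an empty hotplug port and a dock whose downstream port contains an assigned GPU BAR, only the hotplug port's window ends up on the saved list; the dock windows are logged and left in place, and `rollback()` reattaches the hotplug window. Because every released window is saved individually, nothing is freed that the rollback list does not know about.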
> > 
> > It might be better to move some of the code from
> > pbus_reassign_bridge_resources() here, as there's overlap with the
> > sketched algorithm (I won't be sure until I see the updated version,
> > but keep this in mind).
> > 
> > Doing pci_bus_subtree_empty() before any removal is fine with me, but I
> > see it as just an optimization.
> > 
> > > +                     pci_release_resource(dev, i);
> > > +             }
> > > +     }
> > > +
> > > +     return 0;
> > > +}
> > > +
> > >  /*
> > >   * Walk to the root bus, find the bridge window relevant for
> > > @res and
> > >   * release it when possible. If the bridge window contains
> > > assigned
> > > @@ -2316,7 +2404,14 @@ static int
> > > pbus_reassign_bridge_resources(struct pci_bus *bus, struct
> > > resource *
> > > 
> > >               i = pci_resource_num(bridge, res);
> > > 
> > > -             /* Ignore BARs which are still in use */
> > 
> > I don't know why you removed this comment (I admit, though, that "BARs"
> > could have been worded better, as it's bridge windows we're dealing
> > with here).
> > 
> > > +             /* Release empty sibling bridge windows first */
> > > +             if (bridge->subordinate) {
> > > +                     ret = pci_bus_release_empty_bridges(
> > > +                                     bridge->subordinate, res,
> > > saved);
> > 
> > First arg fits to the previous line.
> > 
> > Align the second line to (.
> > 
> > But consider also rearranging code as I mentioned above.
> > 
> > > +                     if (ret)
> > > +                             return ret;
> > 
> > Consider proceeding with the resize even if something failed, as there
> > are cases where the bridge windows are large enough (admittedly, you
> > seem to only bail out in case of an alloc error).
> > 
> > In the same vein, there seems to be one existing goto restore (that was
> > added by me), which could probably also do continue instead (but
> > changing it would be worth another patch).
> > 
> > > +             }
> > > +
> > >               if (!res->child) {
> > >                       ret = pci_dev_res_add_to_list(saved,
> > > bridge, res, 0, 0);
> > >                       if (ret)
> > > @@ -2327,7 +2422,7 @@ static int
> > > pbus_reassign_bridge_resources(struct pci_bus *bus, struct
> > > resource *
> > >                       const char *res_name =
> > > pci_resource_name(bridge, i);
> > > 
> > >                       pci_warn(bridge,
> > > -                              "%s %pR: was not released (still
> > > contains assigned resources)\n",
> > > +                              "%s %pR: not released, active
> > > children present\n",
> > >                                res_name, res);
> > >               }
> > > 
> > > 
> > 
> > --
> >  i.

Thread overview: 16+ messages
2026-04-08 22:31 [PATCH] PCI: release empty sibling resources during bridge window resize Geramy Loveless
2026-04-09  8:03 ` Ilpo Järvinen
     [not found]   ` <CAGpo2mcyLhY6muz9Zgg3zD=Ux-HT8RXeMvbUi27a+SX=VxCRPQ@mail.gmail.com>
2026-04-09 13:26     ` Ilpo Järvinen
2026-04-09 19:32       ` Cristian Cocos
2026-04-10  5:26         ` [PATCH v2] PCI: release empty sibling bridge windows during rebar expansion Geramy Loveless
2026-04-10 10:09           ` Ilpo Järvinen
2026-04-10 17:53             ` Geramy Loveless
2026-04-10 18:58               ` Cristian Cocos [this message]
2026-04-10 19:10                 ` [PATCH] PCI: release empty sibling bridge resources during window resize Geramy Loveless
2026-04-13 10:22                   ` Ilpo Järvinen
2026-04-10 19:14                 ` [PATCH v2] PCI: release empty sibling bridge windows during rebar expansion Geramy Loveless
2026-04-10 23:01                   ` Geramy Loveless
2026-04-10 23:21                     ` Cristian Cocos
2026-04-11  3:28                       ` Mario Limonciello
2026-04-11 16:30                         ` Geramy Loveless
2026-04-11 17:42                           ` Mario Limonciello