From: Ard Biesheuvel <ardb@kernel.org>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: linux-pci@vger.kernel.org,
"Lorenzo Pieralisi" <lorenzo.pieralisi@linaro.org>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Christian König" <christian.koenig@amd.com>,
amd-gfx@lists.freedesktop.org,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
"D Scott Phillips" <scott@os.amperecomputing.com>
Subject: Re: 2499f53 (PCI: Rework optional resource handling) regression with AMDGPU on Arm AVA platform
Date: Wed, 22 Oct 2025 19:08:24 +0200
Message-ID: <CAMj1kXHO50CBHLpUChuGpO4REEAifxDqcr_j362OzJSoz4yLTA@mail.gmail.com>
In-Reply-To: <874irqop6b.fsf@draig.linaro.org>
On Wed, 22 Oct 2025 at 18:51, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Hi,
>
> I've been tracking a regression on my Arm64 (Altra) AVA platform between
> 6.14 and 6.15. It looks like the rework commit broke the ability of the
> amdgpu driver to resize its BAR, resulting in an SError and failure to
> boot:
>
...
> From discussions with Ard it seems if the firmware had resized the BAR first,
> and then assigned the resources, there would be no issue. However there
> is no later firmware for the platform.
>
> While the PCI change has provoked this regression I suspect the amdgpu code
> could handle the failure to resize the BAR better and if it can't get
> what it wants just not initialise the driver.
Actually, looking again at the below, which follows the error about
overlapping resource windows, it seems the PCI code is failing to roll
back the changes, and it is not the driver at fault here.
> [ 15.611321] pcieport 000d:00:01.0: PCI bridge to [bus 01-03]
> [ 15.616971] pcieport 000d:00:01.0: bridge window [io size 0x1000]
> [ 15.623315] pcieport 000d:00:01.0: bridge window [mem 0x50000000-0x502fffff]
> [ 15.630527] pcieport 000d:00:01.0: bridge window [mem size 0x18000000 64bit pref]
> [ 15.638174] pcieport 000d:01:00.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: can't claim; no compatible bridge window
> [ 15.650508] pcieport 000d:01:00.0: PCI bridge to [bus 02-03]
> [ 15.656164] pcieport 000d:01:00.0: bridge window [mem 0x50000000-0x501fffff]
> [ 15.663381] pcieport 000d:01:00.0: bridge window [mem size 0x18000000 64bit pref]
> [ 15.671036] pcieport 000d:02:00.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: can't claim; no compatible bridge window
> [ 15.683370] pcieport 000d:02:00.0: PCI bridge to [bus 03]
> [ 15.688764] pcieport 000d:02:00.0: bridge window [mem 0x50000000-0x501fffff]
> [ 15.695982] pcieport 000d:02:00.0: bridge window [mem size 0x18000000 64bit pref]
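For illustration only (this is not the kernel's code), the "roll back" idea can be sketched in user space as "snapshot the windows, try the resize, restore the snapshot on failure". The `struct win` type, `try_resize()` and the `limit` parameter (a stand-in for the real claim/conflict checks) are all hypothetical; the addresses are the ones from the log just above. After a failed attempt the windows should be back at 0x340000000000-0x340017ffffff rather than left as unassigned `[mem size 0x18000000]` entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified bridge window: inclusive [start, end]. */
struct win {
	uint64_t start;
	uint64_t end;
};

#define NWIN 3 /* 000d:00:01.0, 000d:01:00.0, 000d:02:00.0 */

/*
 * Hypothetical resize attempt: grow every window on the path to new_size
 * and fail if any of them would end past 'limit' (standing in for
 * "can't claim"). On failure the snapshot is restored, so the previous,
 * working layout is what remains afterwards.
 */
static bool try_resize(struct win w[NWIN], uint64_t new_size, uint64_t limit)
{
	struct win saved[NWIN];

	memcpy(saved, w, sizeof(saved)); /* snapshot before touching anything */

	for (int i = 0; i < NWIN; i++) {
		w[i].end = w[i].start + new_size - 1;
		if (w[i].end > limit) {
			memcpy(w, saved, sizeof(saved)); /* roll back every window */
			return false;
		}
	}
	return true;
}
```

The all-or-nothing restore is the point of the sketch: a partial failure never leaves a window half-updated.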
>
> [ 15.348097] amdgpu 000d:03:00.0: amdgpu: detected ip block number 8 <vcn_v4_0>
> [ 15.355901] amdgpu 000d:03:00.0: amdgpu: detected ip block number 9 <jpeg_v4_0>
> [ 15.363202] amdgpu 000d:03:00.0: amdgpu: detected ip block number 10 <mes_v11_0>
> [ 15.384163] amdgpu 000d:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
> [ 15.390434] amdgpu: ATOM BIOS: 113-4481LHS-UC1
> [ 15.400079] amdgpu 000d:03:00.0: amdgpu: CP RS64 enable
> [ 15.411830] amdgpu 000d:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
> [ 15.419932] amdgpu 000d:03:00.0: amdgpu: PCIE atomic ops is not supported
> [ 15.426719] [drm] GPU posting now...
> [ 15.430329] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
> [ 15.438871] amdgpu 000d:03:00.0: BAR 2 [mem 0x340010000000-0x3400101fffff 64bit pref]: releasing
> [ 15.447648] amdgpu 000d:03:00.0: BAR 0 [mem 0x340000000000-0x34000fffffff 64bit pref]: releasing
> [ 15.456452] pcieport 000d:02:00.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: releasing
> [ 15.466095] pcieport 000d:01:00.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: releasing
> [ 15.475738] pcieport 000d:00:01.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: releasing
> [ 15.485386] pcieport 000d:00:01.0: bridge window [io 0x1000-0x0fff] to [bus 01-03] add_size 1000
> [ 15.494252] pcieport 000d:00:01.0: bridge window [mem 0x340000000000-0x3402ffffffff 64bit pref]: assigned
> [ 15.503809] pcieport 000d:00:01.0: bridge window [io size 0x1000]: can't assign; no space
> [ 15.512063] pcieport 000d:00:01.0: bridge window [io size 0x1000]: failed to assign
> [ 15.519796] pcieport 000d:00:01.0: bridge window [io size 0x1000]: can't assign; no space
> [ 15.528049] pcieport 000d:00:01.0: bridge window [io size 0x1000]: failed to assign
> [ 15.535787] pcieport 000d:01:00.0: bridge window [mem 0x340000000000-0x3402ffffffff 64bit pref]: assigned
> [ 15.545349] pcieport 000d:02:00.0: bridge window [mem 0x340000000000-0x3402ffffffff 64bit pref]: assigned
> [ 15.554911] amdgpu 000d:03:00.0: BAR 0 [mem 0x340000000000-0x3401ffffffff 64bit pref]: assigned
> [ 15.563612] amdgpu 000d:03:00.0: BAR 2 [mem 0x340200000000-0x3402001fffff 64bit pref]: assigned
> [ 15.572313] pcieport 000d:00:01.0: PCI bridge to [bus 01-03]
> [ 15.577962] pcieport 000d:00:01.0: bridge window [mem 0x50000000-0x502fffff]
> [ 15.585175] pcieport 000d:00:01.0: bridge window [mem 0x340000000000-0x3402ffffffff 64bit pref]
> [ 15.594038] pcieport 000d:00:01.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: can't claim; address conflict with PCI Bus 000d:01 [mem 0x340000000000-0x340017ffffff 64bit pref]
>
> Failure to claim space for the bridge window...
>
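The "address conflict" message reports that the old window being re-claimed overlaps a resource that is already in the tree. A minimal sketch of the standard inclusive-range overlap test, using the ranges printed in the logs, shows why the old 0x18000000-byte window collides with the newly assigned 12 GiB bus window while the 32-bit window does not. The `struct range`, `ranges_overlap()` and `range_size()` names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, as printed in "[mem start-end]" log lines. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Two inclusive ranges overlap iff each starts no later than the other ends. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.start <= b.end && b.start <= a.end;
}

/* Size in bytes of an inclusive range. */
static uint64_t range_size(struct range r)
{
	return r.end - r.start + 1;
}
```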
> [ 15.611321] pcieport 000d:00:01.0: PCI bridge to [bus 01-03]
> [ 15.616971] pcieport 000d:00:01.0: bridge window [io size 0x1000]
> [ 15.623315] pcieport 000d:00:01.0: bridge window [mem 0x50000000-0x502fffff]
> [ 15.630527] pcieport 000d:00:01.0: bridge window [mem size 0x18000000 64bit pref]
> [ 15.638174] pcieport 000d:01:00.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: can't claim; no compatible bridge window
> [ 15.650508] pcieport 000d:01:00.0: PCI bridge to [bus 02-03]
> [ 15.656164] pcieport 000d:01:00.0: bridge window [mem 0x50000000-0x501fffff]
> [ 15.663381] pcieport 000d:01:00.0: bridge window [mem size 0x18000000 64bit pref]
> [ 15.671036] pcieport 000d:02:00.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: can't claim; no compatible bridge window
> [ 15.683370] pcieport 000d:02:00.0: PCI bridge to [bus 03]
> [ 15.688764] pcieport 000d:02:00.0: bridge window [mem 0x50000000-0x501fffff]
> [ 15.695982] pcieport 000d:02:00.0: bridge window [mem size 0x18000000 64bit pref]
> [ 15.703643] [drm] Not enough PCI address space for a large BAR.
>
> Realisation: not enough space for the BAR.
>
> [ 15.703648] amdgpu 000d:03:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
> [ 15.719119] amdgpu 000d:03:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
> [ 15.727470] [drm] Detected VRAM RAM=8176M, BAR=256M
> [ 15.732339] [drm] RAM width 128bits GDDR6
> [ 15.736552] [drm] amdgpu: 8176M of VRAM memory ready
> [ 15.741516] [drm] amdgpu: 15888M of GTT memory ready.
> [ 15.746592] [drm] GART: num cpu pages 131072, num gpu pages 131072
> [ 15.752862] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
> [ 15.850408] [drm] Loading DMUB firmware via PSP: version=0x07002D00
> [ 16.128604] [drm] Found VCN firmware Version ENC: 1.23 DEC: 9 VEP: 0 Revision: 16
> [ 16.446347] SError Interrupt on CPU3, code 0x00000000be000411 -- SError
> [ 16.446354] CPU: 3 UID: 0 PID: 11 Comm: kworker/u128:0 Tainted: G U 6.14.0-rc1-ajb-debian-bisect-00027-g2499f5348431-dirty #68
> [ 16.446359] Tainted: [U]=USER
> [ 16.446360] Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
> [ 16.446362] Workqueue: efi_rts_wq efi_call_rts
> [ 16.446371] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 16.446374] pc : __wake_up_common_lock+0x40/0xc0
> [ 16.446379] lr : __wake_up+0x20/0x40
> [ 16.446382] sp : ffff800080aa3790
> [ 16.446383] x29: ffff800080aa3790 x28: ffff3e8780bcb780 x27: 00000000fa481000
> [ 16.446387] x26: ffff3e87a7e14b98 x25: ffffb6df6e1e2978 x24: ffffb6df6e351ed8
> [ 16.446390] x23: ffff3e87a7e10000 x22: 00000000000000c0 x21: 0000000000000003
> [ 16.446392] x20: 0000000000000000 x19: ffff3e87a7e14b98 x18: 0000000000000000
> [ 16.446395] x17: ffff3e878245d180 x16: ffffb6dfa26e0c28 x15: ffff3e87810bcbc0
> [ 16.446398] x14: 00000000fa481758 x13: 0000000000000000 x12: ffff800080aa3dd7
> [ 16.446401] x11: 0000000000000040 x10: ffff3e87801ba830 x9 : ffffb6dfa26e0c48
> [ 16.446403] x8 : ffff3e8786eb5268 x7 : 0000000000000000 x6 : 0000000000000000
> [ 16.446406] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
> [ 16.446408] x2 : 0000000000000000 x1 : 0000000000000003 x0 : 0000000000000001
> [ 16.446412] Kernel panic - not syncing: Asynchronous SError Interrupt
>
> Boom - unrecoverable bus error triggered by the PCI access.
>
> [ 16.446414] CPU: 3 UID: 0 PID: 11 Comm: kworker/u128:0 Tainted: G U 6.14.0-rc1-ajb-debian-bisect-00027-g2499f5348431-dirty #68
> [ 16.446417] Tainted: [U]=USER
> [ 16.446418] Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
> [ 16.446419] Workqueue: efi_rts_wq efi_call_rts
> [ 16.446424] Call trace:
> [ 16.446425] show_stack+0x34/0x98 (C)
> [ 16.446431] dump_stack_lvl+0x60/0x80
> [ 16.446436] dump_stack+0x18/0x24
> [ 16.446440] panic+0x164/0x378
> [ 16.446443] nmi_panic+0x90/0x98
> [ 16.446448] arm64_serror_panic+0x6c/0x80
> [ 16.446452] do_serror+0x30/0x78
> [ 16.446456] el1h_64_error_handler+0x30/0x50
> [ 16.446462] el1h_64_error+0x6c/0x70
> [ 16.446464] __wake_up_common_lock+0x40/0xc0 (P)
> [ 16.446468] __wake_up+0x20/0x40
> [ 16.446471] amdgpu_ih_process+0x100/0x160 [amdgpu]
> [ 16.447083] amdgpu_irq_handler+0x34/0xa0 [amdgpu]
> [ 16.447637] __handle_irq_event_percpu+0x60/0x1d8
> [ 16.447642] handle_irq_event+0x4c/0x110
> [ 16.447646] handle_fasteoi_irq+0xb4/0x220
> [ 16.447649] handle_irq_desc+0x3c/0x68
> [ 16.447652] generic_handle_domain_irq+0x24/0x40
> [ 16.447656] gic_handle_irq+0x54/0x124
> [ 16.447658] do_interrupt_handler+0x58/0xa0
> [ 16.447661] el1_interrupt+0x34/0x58
> [ 16.447665] el1h_64_irq_handler+0x18/0x28
> [ 16.447669] el1h_64_irq+0x6c/0x70
> [ 16.447672] 0xfad10918 (P)
> [ 16.447674] 0xfabe01c8
> [ 16.447676] 0xfabe02d4
> [ 16.447677] 0xfa3e209c
> [ 16.447679] 0xfa43ae7c
> [ 16.447680] 0xfa43b6bc
> [ 16.447681] 0xfa436e44
> [ 16.447683] 0xfa43c3f8
> [ 16.447684] __efi_rt_asm_wrapper+0x50/0x78
> [ 16.447687] efi_call_rts+0x1c8/0x280
> [ 16.447691] process_one_work+0x178/0x3e0
> [ 16.447695] worker_thread+0x204/0x3f0
> [ 16.447698] kthread+0x10c/0x1f0
> [ 16.447703] ret_from_fork+0x10/0x20
> [ 16.447705] SMP: stopping secondary CPUs
> [ 16.447796] Kernel Offset: 0x36df225a0000 from 0xffff800080000000
> [ 16.447798] PHYS_OFFSET: 0xffffc97880000000
> [ 16.447799] CPU features: 0x200,00002170,00901250,8241720b
> [ 16.447802] Memory Limit: none
> [ 16.471034] pstore: backend (efi_pstore) writing error (-16)
> [ 16.801136] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
>
> The bisection was slightly complicated by the fact I'm carrying some
> additional patches to work around other PCIe issues, which nonetheless work
> fine before the failing commit. For convenience I've pushed a branch with the
> workaround applied here:
>
> https://gitlab.com/stsquad/linux/-/commits/testing/pci-amdgpu-regression-reference
>
> Additional information
>
> lspci -vv info for card
>
> 000d:03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600] (rev cf) (prog-if 00 [VGA controller])
> Subsystem: Sapphire Technology Limited Device e448
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0
> Interrupt: pin A routed to IRQ 151
> NUMA node: 0
> IOMMU group: 21
> Region 0: Memory at 340000000000 (64-bit, prefetchable) [size=8G]
> Region 2: Memory at 340200000000 (64-bit, prefetchable) [size=2M]
> Region 5: Memory at 50000000 (32-bit, non-prefetchable) [size=1M]
> Expansion ROM at 50100000 [disabled] [size=128K]
> Capabilities: [48] Vendor Specific Information: Len=08 <?>
> Capabilities: [50] Power Management version 3
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [64] Express (v2) Legacy Endpoint, IntMsgNum 0
> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- TEE-IO-
> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
> LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM L1, Exit Latency L1 <1us
> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 16GT/s, Width x8
> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
> 10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
> EmergencyPowerReduction Form Factor Dev Specific, EmergencyPowerReductionInit-
> FRS-
> AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
> AtomicOpsCtl: ReqEn-
> IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
> 10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
> LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
> LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
> LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
> EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
> Retimer- 2Retimers- CrosslinkRes: unsupported
> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Address: 00000000ffb77040 Data: 0000
> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> Capabilities: [150 v2] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
> ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
> PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
> ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
> PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
> ECRC- UnsupReq- ACSViol- UncorrIntErr+ BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
> PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF-
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF-
> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> HeaderLog: 00000000 00000000 00000000 00000000
> Capabilities: [200 v1] Physical Resizable BAR
> BAR 0: current size: 8GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB
> BAR 2: current size: 2MB, supported: 2MB 4MB 8MB 16MB 32MB 64MB 128MB 256MB
> Capabilities: [240 v1] Power Budgeting <?>
> Capabilities: [270 v1] Secondary PCI Express
> LnkCtl3: LnkEquIntrruptEn- PerformEqu-
> LaneErrStat: 0
> Capabilities: [2a0 v1] Access Control Services
> ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
> ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
> Capabilities: [2d0 v1] Process Address Space ID (PASID)
> PASIDCap: Exec+ Priv+, Max PASID Width: 10
> PASIDCtl: Enable+ Exec+ Priv+
> Capabilities: [320 v1] Latency Tolerance Reporting
> Max snoop latency: 0ns
> Max no snoop latency: 0ns
> Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> Capabilities: [450 v1] Lane Margining at the Receiver
> PortCap: Uses Driver-
> PortSta: MargReady+ MargSoftReady-
> Kernel driver in use: amdgpu
> Kernel modules: amdgpu
>
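The Resizable BAR capability above lists BAR 0 sizes from 256MB to 8GB. In the Resizable BAR encoding a size value n stands for 2^(n + 20) bytes (1 MiB << n), and the kernel has a helper along these lines, `pci_rebar_bytes_to_size()`. The user-space sketch below re-creates that conversion and adds a hypothetical `rebar_pick()` that chooses the largest supported size not above the wanted one; the mask layout (bit n set when size n is supported) is an assumption of this sketch. With 8176M of VRAM the wanted encoding is 13, i.e. the 8GB BAR that the working boot ends up with.

```c
#include <assert.h>
#include <stdint.h>

/* Resizable BAR size encoding: value n means a BAR of 2^(n + 20) bytes. */
static uint64_t rebar_size_to_bytes(int n)
{
	return (uint64_t)1 << (n + 20);
}

/* Smallest encoding whose BAR covers 'bytes' (power-of-two, minimum 1 MiB). */
static int rebar_bytes_to_size(uint64_t bytes)
{
	int n = 0;

	while (n < 43 && rebar_size_to_bytes(n) < bytes)
		n++;
	return n;
}

/*
 * Largest supported encoding not above 'want'; 'supported' has bit n set
 * when size n is supported (assumed layout). Returns -1 if none fits.
 */
static int rebar_pick(uint32_t supported, int want)
{
	for (int n = want; n >= 0; n--)
		if (supported & (1u << n))
			return n;
	return -1;
}
```

For BAR 0 above, 256MB..8GB corresponds to bits 8..13, i.e. a mask of 0x3F00.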
> iomem layout from a working bootup (e89df6d2beae):
>
> 08000000-0fffffff : PCI Bus 0002:00
> 08000000-081fffff : PCI Bus 0002:01
> 08200000-083fffff : PCI Bus 0002:02
> 20000000-2fffffff : PCI Bus 0004:00
> 20000000-217fffff : PCI Bus 0004:01
> 20000000-217fffff : PCI Bus 0004:02
> 20000000-20ffffff : 0004:02:00.0
> 20000000-202fffff : efifb
> 21000000-2101ffff : 0004:02:00.0
> 21800000-219fffff : PCI Bus 0004:03
> 21800000-21801fff : 0004:03:00.0
> 21800000-21801fff : xhci-hcd
> 21a00000-21bfffff : PCI Bus 0004:04
> 21a00000-21a7ffff : 0004:04:00.0
> 21a00000-21a7ffff : igb
> 21a80000-21a83fff : 0004:04:00.0
> 21a80000-21a83fff : igb
> 21c00000-21dfffff : PCI Bus 0004:05
> 30000000-3fffffff : PCI Bus 0005:00
> 30000000-301fffff : PCI Bus 0005:01
> 30200000-303fffff : PCI Bus 0005:02
> 30400000-305fffff : PCI Bus 0005:03
> 30400000-30403fff : 0005:03:00.0
> 30400000-30403fff : nvme
> 30600000-307fffff : PCI Bus 0005:04
> 30600000-30603fff : 0005:04:00.0
> 30600000-30603fff : nvme
> 40000000-4fffffff : PCI Bus 000c:00
> 40000000-401fffff : PCI Bus 000c:01
> 50000000-5fffffff : PCI Bus 000d:00
> 50000000-502fffff : PCI Bus 000d:01
> 50000000-501fffff : PCI Bus 000d:02
> 50000000-501fffff : PCI Bus 000d:03
> 50000000-500fffff : 000d:03:00.0
> 50100000-5011ffff : 000d:03:00.0
> 50120000-50123fff : 000d:03:00.1
> 50120000-50123fff : ICH HD audio
> 50200000-50203fff : 000d:01:00.0
> 70000000-7fffffff : PCI Bus 0000:00
> 70000000-701fffff : PCI Bus 0000:01
> 88300000-883fffff : reserved
> 88500000-885fffff : IFX0785:00
> 88500000-885fffff : IFX0785:00
> 88900000-8891ffff : AMPC0005:00
> 90000000-91ffffff : System RAM
> 92000000-927bffff : reserved
> 927c0000-f896ffff : System RAM
> d54f0000-d6adffff : Kernel code
> d6ae0000-d6daffff : reserved
> d6db0000-d717ffff : Kernel data
> ef650000-f3650fff : reserved
> f3850000-f49a2fff : reserved
> f88b0000-f88bffff : reserved
> f8970000-f898ffff : reserved
> f8990000-f899ffff : System RAM
> f89a0000-f89fffff : reserved
> f8a00000-f9196fff : System RAM
> f8a00000-f8a00fff : reserved
> f8a02000-f8a02fff : reserved
> f9197000-f91ecfff : reserved
> f91ed000-f94cffff : System RAM
> f91fb000-f91fbfff : reserved
> f94d0000-f950ffff : reserved
> f9510000-f98bffff : System RAM
> f98c0000-f98fffff : reserved
> f9900000-f999ffff : System RAM
> f99a0000-f99dffff : reserved
> f99e0000-f9f4ffff : System RAM
> f9ef0000-f9f1ffff : reserved
> f9f50000-f9f6ffff : reserved
> f9f70000-fa0affff : System RAM
> fa0b0000-fa0effff : reserved
> fa0f0000-fa1cffff : System RAM
> fa1d0000-fa26ffff : reserved
> fa270000-fa33ffff : System RAM
> fa340000-fa4affff : reserved
> fa4b0000-fa4bffff : System RAM
> fa4c0000-fa57ffff : reserved
> fa580000-fa72ffff : System RAM
> fa730000-fa7cffff : reserved
> fa7d0000-faa4ffff : System RAM
> faa50000-faaeffff : reserved
> faaf0000-fab7ffff : System RAM
> fab80000-fac1ffff : reserved
> fac20000-facaffff : System RAM
> facb0000-fad4ffff : reserved
> fad50000-fae1ffff : System RAM
> fae20000-faebffff : reserved
> faec0000-faf4ffff : System RAM
> faf50000-fafeffff : reserved
> faff0000-ffefffff : System RAM
> fbe00000-ffdfffff : reserved
> fff00000-fff4ffff : reserved
> fff50000-fffaffff : System RAM
> fffb0000-fffdffff : reserved
> fffc0000-fffc0fff : reserved
> fffe0000-ffffffff : System RAM
> fffe0000-fffeffff : reserved
> 80000000000-8007fffffff : System RAM
> 800002bc000-800002bcfff : reserved
> 80000840000-8000084ffff : reserved
> 80000850000-8000085ffff : reserved
> 80000860000-8000086ffff : reserved
> 80000870000-8000087ffff : reserved
> 80000880000-8000088ffff : reserved
> 80000890000-8000089ffff : reserved
> 800008a0000-800008affff : reserved
> 800008b0000-800008bffff : reserved
> 800008c0000-800008cffff : reserved
> 800008d0000-800008dffff : reserved
> 800008e0000-800008effff : reserved
> 800008f0000-800008fffff : reserved
> 80000900000-8000090ffff : reserved
> 80000910000-8000091ffff : reserved
> 80000920000-8000092ffff : reserved
> 80000930000-8000093ffff : reserved
> 80000940000-8000094ffff : reserved
> 80000950000-8000095ffff : reserved
> 80000960000-8000096ffff : reserved
> 80000970000-8000097ffff : reserved
> 80000980000-8000098ffff : reserved
> 80000990000-8000099ffff : reserved
> 800009a0000-800009affff : reserved
> 800009b0000-800009bffff : reserved
> 800009c0000-800009cffff : reserved
> 800009d0000-800009dffff : reserved
> 800009e0000-800009effff : reserved
> 800009f0000-800009fffff : reserved
> 80000a00000-80000a0ffff : reserved
> 80000a10000-80000a1ffff : reserved
> 80000a20000-80000a2ffff : reserved
> 80000a30000-80000a3ffff : reserved
> 80000a40000-80000a4ffff : reserved
> 80100000000-807ffffffff : System RAM
> 807d8c10000-807fbffffff : reserved
> 807fc009000-807fc039fff : reserved
> 807fc03c000-807fc03ffff : reserved
> 807fc040000-807fc040fff : reserved
> 807fc041000-807fc044fff : reserved
> 807fc045000-807fc06afff : reserved
> 807fc06b000-807ffffffff : reserved
> 100002600000-100002600fff : ARMH0011:00
> 100002600000-100002600fff : ARMH0011:00 ARMH0011:00
> 100002620000-100002620fff : ARMH0011:01
> 100002620000-100002620fff : ARMH0011:01 ARMH0011:01
> 1000026c0000-1000026cffff : APMC0D0F:00
> 1000026c0000-1000026cffff : APMC0D0F:00 APMC0D0F:00
> 1000026d0000-1000026dffff : APMC0D07:02
> 1000026f0000-1000026fffff : APMC0D07:00
> 100002730000-100002730fff : arch_mem_timer
> 100002750000-10000275ffff : APMC0D0F:01
> 100002750000-10000275ffff : APMC0D0F:01 APMC0D0F:01
> 100002780000-10000278ffff : APMC0D0F:02
> 100002780000-10000278ffff : APMC0D0F:02 APMC0D0F:02
> 1000027b0000-1000027bffff : APMC0D07:01
> 1000027c0000-1000027c0fff : sbsa-gwdt.0
> 1000027c0000-1000027c0fff : sbsa-gwdt.0 sbsa-gwdt.0
> 1000027d0000-1000027d0fff : sbsa-gwdt.0
> 1000027d0000-1000027d0fff : sbsa-gwdt.0 sbsa-gwdt.0
> 100010000000-10001fffffff : ARMHC600:00
> 100012500000-1000164fffff : ARMHC600:00
> 10008c000a00-10008c000bff : ARMHD620:00
> 10008d000a00-10008d000bff : ARMHD620:04
> 100100000000-10010000ffff : GICD
> 100100140000-10010113ffff : GICR
> 200000000000-23ffdfffffff : PCI Bus 0002:00
> 200000000000-2000001fffff : PCI Bus 0002:01
> 200000200000-2000003fffff : PCI Bus 0002:02
> 23ffe0000000-23ffe001ffff : arm-smmu-v3.3.auto
> 23ffe0000000-23ffe0000dff : arm-smmu-v3.3.auto
> 23ffe0010000-23ffe0010dff : arm-smmu-v3.3.auto
> 23fff0000000-23ffffffffff : PCI ECAM
> 27fff0000000-27ffffffffff : pnp 00:00
> 280000000000-2bffdfffffff : PCI Bus 0004:00
> 280000000000-2800001fffff : PCI Bus 0004:01
> 280000200000-2800003fffff : PCI Bus 0004:03
> 280000400000-2800005fffff : PCI Bus 0004:04
> 280000600000-2800007fffff : PCI Bus 0004:05
> 2bffe0000000-2bffe001ffff : arm-smmu-v3.4.auto
> 2bffe0000000-2bffe0000dff : arm-smmu-v3.4.auto
> 2bffe0010000-2bffe0010dff : arm-smmu-v3.4.auto
> 2bfff0000000-2bffffffffff : PCI ECAM
> 2c0000000000-2fffdfffffff : PCI Bus 0005:00
> 2c0000000000-2c00001fffff : PCI Bus 0005:01
> 2c0000200000-2c00003fffff : PCI Bus 0005:02
> 2c0000400000-2c00005fffff : PCI Bus 0005:03
> 2c0000600000-2c00007fffff : PCI Bus 0005:04
> 2fffe0000000-2fffe001ffff : arm-smmu-v3.5.auto
> 2fffe0000000-2fffe0000dff : arm-smmu-v3.5.auto
> 2fffe0010000-2fffe0010dff : arm-smmu-v3.5.auto
> 2ffff0000000-2fffffffffff : PCI ECAM
> 300000000000-33ffdfffffff : PCI Bus 000c:00
> 300000000000-3000001fffff : PCI Bus 000c:01
> 33ffe0000000-33ffe001ffff : arm-smmu-v3.0.auto
> 33ffe0000000-33ffe0000dff : arm-smmu-v3.0.auto
> 33ffe0010000-33ffe0010dff : arm-smmu-v3.0.auto
> 33fff0000000-33ffffffffff : PCI ECAM
> 340000000000-37ffdfffffff : PCI Bus 000d:00
> 340000000000-3402ffffffff : PCI Bus 000d:01
> 340000000000-3402ffffffff : PCI Bus 000d:02
> 340000000000-3402ffffffff : PCI Bus 000d:03
> 340000000000-3401ffffffff : 000d:03:00.0
> 340200000000-3402001fffff : 000d:03:00.0
> 37ffe0000000-37ffe001ffff : arm-smmu-v3.1.auto
> 37ffe0000000-37ffe0000dff : arm-smmu-v3.1.auto
> 37ffe0010000-37ffe0010dff : arm-smmu-v3.1.auto
> 37fff0000000-37ffffffffff : PCI ECAM
> 3bfff0000000-3bffffffffff : pnp 00:00
> 3c0000000000-3fffdfffffff : PCI Bus 0000:00
> 3c0000000000-3c00001fffff : PCI Bus 0000:01
> 3fffe0000000-3fffe001ffff : arm-smmu-v3.2.auto
> 3fffe0000000-3fffe0000dff : arm-smmu-v3.2.auto
> 3fffe0010000-3fffe0010dff : arm-smmu-v3.2.auto
> 3ffff0000000-3fffffffffff : PCI ECAM
> 63fff0000000-63ffffffffff : pnp 00:00
> 67fff0000000-67ffffffffff : pnp 00:00
> 6bfff0000000-6bffffffffff : pnp 00:00
> 6ffff0000000-6fffffffffff : pnp 00:00
> 7bfff0000000-7bffffffffff : pnp 00:00
> 7ffff0000000-7fffffffffff : pnp 00:00
>
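To compare layouts like the one above mechanically, a small parser for /proc/iomem-style lines is handy. This is a hypothetical helper, not an existing tool; it assumes the unquoted file format "start-end : name" with hexadecimal addresses and leading spaces for nesting depth. Applied to the PCI Bus 000d:01 line it yields a 0x300000000-byte (12 GiB) window, which holds the 8 GiB BAR 0 at 0x340000000000 and the 2 MiB BAR 2 at 0x340200000000.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed iomem entry: inclusive range plus its label. */
struct iomem_entry {
	uint64_t start;
	uint64_t end;
	char name[64];
};

/*
 * Parse a line such as "340000000000-3402ffffffff : PCI Bus 000d:01".
 * Leading spaces (nesting depth in the real file) are skipped by sscanf.
 */
static bool parse_iomem_line(const char *line, struct iomem_entry *e)
{
	if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %63[^\n]",
		   &e->start, &e->end, e->name) != 3)
		return false;
	return e->start <= e->end;
}
```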
> working dmesg from same:
>
> [ 15.500492] [drm] GPU posting now...
> [ 15.504110] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
> [ 15.512654] amdgpu 000d:03:00.0: BAR 2 [mem 0x340010000000-0x3400101fffff 64bit pref]: releasing
> [ 15.521431] amdgpu 000d:03:00.0: BAR 0 [mem 0x340000000000-0x34000fffffff 64bit pref]: releasing
> [ 15.530230] pcieport 000d:02:00.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: releasing
> [ 15.539881] pcieport 000d:01:00.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: releasing
> [ 15.549528] pcieport 000d:00:01.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: releasing
> [ 15.549535] pcieport 000d:00:01.0: bridge window [io 0x1000-0x0fff] to [bus 01-03] add_size 1000
> [ 15.549544] pcieport 000d:00:01.0: bridge window [mem 0x340000000000-0x3402ffffffff 64bit pref]: assigned
> [ 15.549546] pcieport 000d:00:01.0: bridge window [io size 0x1000]: can't assign; no space
> [ 15.549549] pcieport 000d:00:01.0: bridge window [io size 0x1000]: failed to assign
> [ 15.596468] pcieport 000d:00:01.0: bridge window [io size 0x1000]: can't assign; no space
> [ 15.607594] pcieport 000d:00:01.0: bridge window [io size 0x1000]: failed to assign
> [ 15.618090] pcieport 000d:00:01.0: bridge window [io size 0x1000]: ignoring failure in optional allocation
> [ 15.618095] pcieport 000d:01:00.0: bridge window [mem 0x340000000000-0x3402ffffffff 64bit pref]: assigned
> [ 15.628249] pcieport 000d:02:00.0: bridge window [mem 0x340000000000-0x3402ffffffff 64bit pref]: assigned
> [ 15.637806] amdgpu 000d:03:00.0: BAR 0 [mem 0x340000000000-0x3401ffffffff 64bit pref]: assigned
> [ 15.646506] amdgpu 000d:03:00.0: BAR 2 [mem 0x340200000000-0x3402001fffff 64bit pref]: assigned
> [ 15.655205] pcieport 000d:00:01.0: PCI bridge to [bus 01-03]
> [ 15.660856] pcieport 000d:00:01.0: bridge window [mem 0x50000000-0x502fffff]
> [ 15.668069] pcieport 000d:00:01.0: bridge window [mem 0x340000000000-0x3402ffffffff 64bit pref]
> [ 15.676931] pcieport 000d:01:00.0: PCI bridge to [bus 02-03]
> [ 15.682586] pcieport 000d:01:00.0: bridge window [mem 0x50000000-0x501fffff]
> [ 15.689804] pcieport 000d:01:00.0: bridge window [mem 0x340000000000-0x3402ffffffff 64bit pref]
> [ 15.698672] pcieport 000d:02:00.0: PCI bridge to [bus 03]
> [ 15.704067] pcieport 000d:02:00.0: bridge window [mem 0x50000000-0x501fffff]
> [ 15.711285] pcieport 000d:02:00.0: bridge window [mem 0x340000000000-0x3402ffffffff 64bit pref]
> [ 15.720157] amdgpu 000d:03:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
> [ 15.729714] amdgpu 000d:03:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
> [ 15.738064] [drm] Detected VRAM RAM=8176M, BAR=8192M
> [ 15.743019] [drm] RAM width 128bits GDDR6
> [ 15.747258] [drm] amdgpu: 8176M of VRAM memory ready
> [ 15.752219] [drm] amdgpu: 15888M of GTT memory ready.
> [ 15.757297] [drm] GART: num cpu pages 131072, num gpu pages 131072
> [ 15.763558] [drm] PCIE GART of 512M enabled (table at 0x00000081FEB00000).
> [ 15.884845] [drm] Loading DMUB firmware via PSP: version=0x07002D00
> [ 16.129125] [drm] Found VCN firmware Version ENC: 1.23 DEC: 9 VEP: 0 Revision: 16
>
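The visible difference between the working and failing logs is the "ignoring failure in optional allocation" line for the 4 KiB io window. The sketch below is a conceptual user-space model, under stated assumptions, of a policy in which an optional request that does not fit is simply skipped while a required one that does not fit undoes everything assigned so far. It does not claim to describe what 2499f53 or its predecessor actually do; `struct request` and `assign_all()` are hypothetical, and the `fits` flag stands in for the real allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocation request for a window or BAR. */
struct request {
	const char *what;
	bool optional;
	bool fits;     /* stand-in for "the allocator found space" */
	bool assigned;
};

/*
 * Assign every request in order. An optional request that does not fit is
 * skipped; a required one that does not fit un-assigns all earlier requests
 * and reports failure, so no half-applied layout is left behind.
 */
static bool assign_all(struct request *req, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (req[i].fits) {
			req[i].assigned = true;
			continue;
		}
		if (req[i].optional)
			continue;
		for (size_t j = 0; j < i; j++)
			req[j].assigned = false;
		return false;
	}
	return true;
}
```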
> From discussions with Ard it seems if the firmware had resized the BAR first,
> and then assigned the resources, there would be no issue. However there
> is no later firmware for the platform.
>
> While the PCI change has provoked this regression I suspect the amdgpu code
> could handle the failure to resize the BAR better and if it can't get
> what it wants just not initialise the driver. I did hit some cases while
> bisecting where the GPU just wasn't visible.
>
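As a hypothetical illustration of "not initialise the driver" (not amdgpu code, and whether such a check belongs in the driver is for the maintainers to judge): before touching MMIO, a probe path could verify that the BAR is assigned and lies inside every upstream bridge window, and bail out cleanly otherwise. In the failing log the bridge windows end up printed as `[mem size 0x18000000 64bit pref]`, i.e. unassigned, which this check would catch. The `struct res` type and `bar_reachable()` are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a resource: inclusive range plus an "assigned" flag. */
struct res {
	uint64_t start;
	uint64_t end;
	bool assigned; /* false for entries logged as "[mem size 0x...]" */
};

/* True if 'inner' lies entirely within 'outer'. */
static bool contains(struct res outer, struct res inner)
{
	return outer.start <= inner.start && inner.end <= outer.end;
}

/*
 * The BAR is only safe to access if it is assigned and every upstream
 * bridge window is assigned and still contains it; otherwise the caller
 * could fail the probe instead of issuing MMIO that nothing routes.
 */
static bool bar_reachable(struct res bar, const struct res *windows, size_t n)
{
	if (!bar.assigned)
		return false;
	for (size_t i = 0; i < n; i++)
		if (!windows[i].assigned || !contains(windows[i], bar))
			return false;
	return true;
}
```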
> I'm available to test patches and generate additional debug info so do
> let me know if there is anything I can do to help.
>
> Thanks,
>
> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro
Thread overview: 4+ messages
2025-10-22 16:51 2499f53 (PCI: Rework optional resource handling) regression with AMDGPU on Arm AVA platform Alex Bennée
2025-10-22 17:08 ` Ard Biesheuvel [this message]
2025-10-23 16:20 ` Bjorn Helgaas
2025-10-23 17:24 ` Ilpo Järvinen