* [REGRESSION] amdgpu fails to load eGPU after 6.19
From: Rio Liu @ 2026-04-14 0:16 UTC
To: Ilpo Järvinen
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
regressions@lists.linux.dev, Bjorn Helgaas
Hello Ilpo,
There seems to be another PCI alignment issue with external amdgpu since 6.19.
Bisecting this time pointed me to this commit
commit bc75c8e5071120e919beb39e69f0979cccfdf219 (HEAD)
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date: Fri Dec 19 19:40:15 2025 +0200
PCI: Rewrite bridge window head alignment function
It looks like the same issue that has happened before in
https://lore.kernel.org/all/o2bL8MtD_40-lf8GlslTw-AZpUPzm8nmfCnJKvS8RQ3NOzOW1uq1dVCEfRpUjJ2i7G2WjfQhk2IWZ7oGp-7G-jXN4qOdtnyOcjRR0PZWK5I=@r26.me/.
It seems like the previous fix with min_align
https://lore.kernel.org/all/20250822123359.16305-2-ilpo.jarvinen@linux.intel.com/
got removed in this commit.
Applying the following patch to the commit fixes the regression. I'm still
looking at how to rebase it onto the latest commit, as there is quite a bit of
code change around it. But the same regression still happens as of v7.0-rc7.
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 80e5a8fc62e7..12ab84271214 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1445,7 +1445,7 @@ static void pbus_size_mem(struct pci_bus *bus, unsigned long type,
if (bus->self && size0 &&
!pbus_upstream_space_available(bus, b_res, size0, min_align)) {
- min_align = calculate_head_align(aligns2, max_order);
+ min_align = min(min_align, calculate_head_align(aligns2, max_order));
size0 = calculate_memsize(size, min_size, 0, 0, old_size, win_align);
resource_set_range(b_res, min_align, size0);
pci_info(bus->self, "bridge window %pR to %pR requires relaxed alignment rules\n",
@@ -1459,7 +1459,7 @@ static void pbus_size_mem(struct pci_bus *bus, unsigned long type,
if (bus->self && size1 &&
!pbus_upstream_space_available(bus, b_res, size1, add_align)) {
- min_align = calculate_head_align(aligns2, max_order);
+ min_align = min(min_align, calculate_head_align(aligns2, max_order));
size1 = calculate_memsize(size, min_size, add_size, children_add_size,
old_size, win_align);
pci_info(bus->self,
---
Relevant errors in dmesg:
[ 10.166037] amdgpu: Virtual CRAT table created for CPU
[ 10.166050] amdgpu: Topology: Add CPU node
[ 10.166166] amdgpu 0000:08:00.0: enabling device (0000 -> 0002)
[ 10.166293] amdgpu 0000:08:00.0: initializing kernel modesetting (SIENNA_CICHLID 0x1002:0x73BF 0x148C:0x2406 0xC1).
[ 10.166345] amdgpu 0000:08:00.0: register mmio base: 0x8C000000
[ 10.166347] amdgpu 0000:08:00.0: register mmio size: 1048576
[ 10.173624] wlan0: Limiting TX power to 30 (30 - 0) dBm as advertised by 72:13:01:87:79:82
[ 10.174898] amdgpu 0000:08:00.0: detected ip block number 0 <common_v1_0_0> (nv_common)
[ 10.174901] amdgpu 0000:08:00.0: detected ip block number 1 <gmc_v10_0_0> (gmc_v10_0)
[ 10.174903] amdgpu 0000:08:00.0: detected ip block number 2 <ih_v5_0_0> (navi10_ih)
[ 10.174904] amdgpu 0000:08:00.0: detected ip block number 3 <psp_v11_0_0> (psp)
[ 10.174906] amdgpu 0000:08:00.0: detected ip block number 4 <smu_v11_0_0> (smu)
[ 10.174907] amdgpu 0000:08:00.0: detected ip block number 5 <dce_v1_0_0> (dm)
[ 10.174908] amdgpu 0000:08:00.0: detected ip block number 6 <gfx_v10_0_0> (gfx_v10_0)
[ 10.174909] amdgpu 0000:08:00.0: detected ip block number 7 <sdma_v5_2_0> (sdma_v5_2)
[ 10.174911] amdgpu 0000:08:00.0: detected ip block number 8 <vcn_v3_0_0> (vcn_v3_0)
[ 10.174912] amdgpu 0000:08:00.0: detected ip block number 9 <jpeg_v3_0_0> (jpeg_v3_0)
[ 10.278772] amdgpu 0000:08:00.0: Fetched VBIOS from ROM BAR
[ 10.278776] amdgpu 0000:08:00.0: [drm] ATOM BIOS: 113-001-X01
[ 10.308408] amdgpu 0000:08:00.0: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 10.308424] amdgpu 0000:08:00.0: PCIE atomic ops is not supported
[ 10.308433] amdgpu 0000:08:00.0: GPU posting now...
[ 10.308461] amdgpu 0000:08:00.0: MEM ECC is not presented.
[ 10.308462] amdgpu 0000:08:00.0: SRAM ECC is not presented.
[ 10.308484] amdgpu 0000:08:00.0: vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ 10.308522] amdgpu 0000:08:00.0: Problem resizing BAR0 (-22).
[ 10.308529] amdgpu 0000:08:00.0: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[ 10.308531] amdgpu 0000:08:00.0: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 10.308545] resource: resource sanity check: requesting [mem 0x0000000000000000-0xffffffffffffffff], which spans more than PCI Bus 0000:00 [mem 0x000a0000-0x000bffff window]
[ 10.308550] ------------[ cut here ]------------
[ 10.308551] WARNING: arch/x86/mm/pat/memtype.c:721 at memtype_reserve_io+0xfc/0x110, CPU#7: (udev-worker)/606
[ 10.308557] Modules linked in: ccm amdgpu(+) amdxcp drm_panel_backlight_quirks gpu_sched drm_exec snd_hda_codec_atihdmi drm_suballoc_helper drm_ttm_helper ntfs3 vfat fat v4l2loopback(OE) snd_seq_midi snd_seq_midi_event snd_seq snd_rawmidi snd_seq_device dm_multipath dm_mod kvmgt mdev vfio_iommu_type1 vfio iommufd crypto_user uinput cmac algif_hash algif_skcipher af_alg uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 btusb videobuf2_common btmtk btrtl videodev btbcm mc btintel snd_hda_codec_intelhdmi snd_hda_codec_hdmi snd_hda_codec_alc269 snd_hda_codec_realtek_lib snd_hda_scodec_component snd_hda_codec_generic snd_hda_intel snd_sof_pci_intel_cnl joydev snd_sof_intel_hda_generic soundwire_intel snd_sof_intel_hda_sdw_bpt snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda intel_rapl_msr soundwire_cadence intel_rapl_common snd_sof_pci snd_sof_xtensa_dsp intel_uncore_frequency intel_uncore_frequency_common snd_sof snd_sof_utils snd_soc_acpi_intel_match
[ 10.308591] snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_sdw_utils snd_soc_acpi intel_tcc_cooling soundwire_bus x86_pkg_temp_thermal intel_powerclamp snd_soc_sdca coretemp crc8 snd_soc_avs kvm_intel snd_soc_hda_codec mousedev snd_hda_ext_core kvm r8169 snd_hda_codec nvme 8021q irqbypass realtek snd_hda_core ghash_clmulni_intel rtsx_pci_sdmmc aesni_intel nvme_core snd_intel_dspcfg garp mdio_devres snd_intel_sdw_acpi spi_nor iwlmvm mrp iTCO_wdt rapl nvme_keyring stp mmc_core libphy intel_cstate nvme_auth snd_hwdep hid_multitouch mei_hdcp mei_pxp ee1004 intel_pmc_bxt mtd intel_wmi_thunderbolt llc clevo_xsm_wmi(OE) i915 intel_uncore snd_soc_core thunderbolt mdio_bus i2c_hid_acpi hkdf mac80211 rtsx_pci i2c_hid snd_compress drm_buddy ptp ac97_bus ttm pps_core snd_pcm_dmaengine i2c_algo_bit intel_oc_wdt libarc4 intel_pmc_core drm_display_helper snd_pcm pmt_telemetry cec pmt_discovery iwlwifi snd_timer i2c_i801 intel_lpss_pci intel_gtt pmt_class mei_me spi_intel_pci i2c_smbus snd intel_lpss
[ 10.308633] intel_pmc_ssram_telemetry intel_hid psmouse spi_intel video idma64 soundcore mei intel_pch_thermal i2c_mux intel_vsec pcspkr mac_hid serio_raw sparse_keymap wmi acpi_pad bnep cfg80211 bluetooth rfkill
[ 10.308645] CPU: 7 UID: 0 PID: 606 Comm: (udev-worker) Tainted: G S OE 7.0.0-rc7 #21 PREEMPT(full) 78435afb69b0b07f3561902db6ca6395f9133c11
[ 10.308648] Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 10.308649] Hardware name: COPELION INTERNATIONAL INC. ZX Series/ZX Series, BIOS 1.07.14RCOP1 12/29/2020
[ 10.308650] RIP: 0010:memtype_reserve_io+0xfc/0x110
[ 10.308652] Code: aa fb ff ff b8 f0 ff ff ff eb 88 8b 54 24 04 4c 89 ee 48 89 df e8 04 fe ff ff 85 c0 75 db 8b 54 24 04 41 89 16 e9 69 ff ff ff <0f> 0b e9 4b ff ff ff e8 48 d2 09 01 0f 1f 84 00 00 00 00 00 90 90
[ 10.308654] RSP: 0018:ffffcf4781c736d0 EFLAGS: 00010286
[ 10.308655] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000027
[ 10.308657] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff9b5f5e70
[ 10.308657] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffefff
[ 10.308658] R10: ffffffff9a85fac0 R11: ffffcf4781c73548 R12: 0000000000000001
[ 10.308659] R13: 0000000000000000 R14: ffffcf4781c7371c R15: 0000000000000001
[ 10.308660] FS: 00007fdda0fa5c80(0000) GS:ffff8b4d39075000(0000) knlGS:0000000000000000
[ 10.308662] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 10.308662] CR2: 00007ffd23cf0ff0 CR3: 0000000109bfa005 CR4: 00000000003726f0
[ 10.308664] Call Trace:
[ 10.308665] <TASK>
[ 10.308666] arch_io_reserve_memtype_wc+0x31/0x50
[ 10.308670] amdgpu_bo_init+0x3e/0x90 [amdgpu 9e1de60160a9bdc6283126cc89fe53e3272a6751]
[ 10.309129] ? amdgpu_gmc_get_vbios_allocations+0xa9/0x140 [amdgpu 9e1de60160a9bdc6283126cc89fe53e3272a6751]
[ 10.309416] gmc_v10_0_sw_init+0x352/0x5d0 [amdgpu 9e1de60160a9bdc6283126cc89fe53e3272a6751]
[ 10.309721] amdgpu_device_init.cold+0x1612/0x22f8 [amdgpu 9e1de60160a9bdc6283126cc89fe53e3272a6751]
[ 10.310080] ? pci_conf1_read+0xb2/0x100
[ 10.310084] ? pci_bus_read_config_word+0x4c/0x80
[ 10.310087] amdgpu_driver_load_kms+0x19/0x80 [amdgpu 9e1de60160a9bdc6283126cc89fe53e3272a6751]
[ 10.310352] amdgpu_pci_probe+0x233/0x480 [amdgpu 9e1de60160a9bdc6283126cc89fe53e3272a6751]
[ 10.310610] local_pci_probe+0x3e/0x90
[ 10.310614] pci_device_probe+0xe1/0x260
[ 10.310616] ? sysfs_do_create_link_sd+0x6d/0xd0
[ 10.310619] really_probe+0xde/0x380
[ 10.310622] __driver_probe_device+0x78/0x150
[ 10.310624] driver_probe_device+0x1f/0xa0
[ 10.310625] ? __pfx___driver_attach+0x10/0x10
[ 10.310627] __driver_attach+0xcb/0x210
[ 10.310628] bus_for_each_dev+0x85/0xd0
[ 10.310632] bus_add_driver+0x118/0x200
[ 10.310634] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu 9e1de60160a9bdc6283126cc89fe53e3272a6751]
[ 10.310890] driver_register+0x75/0xe0
[ 10.310893] ? amdgpu_init+0x36/0xff0 [amdgpu 9e1de60160a9bdc6283126cc89fe53e3272a6751]
[ 10.311150] do_one_initcall+0x5d/0x330
[ 10.311155] do_init_module+0x62/0x250
[ 10.311158] ? init_module_from_file+0xd8/0x140
[ 10.311160] init_module_from_file+0xd8/0x140
[ 10.311163] idempotent_init_module+0x114/0x310
[ 10.311166] __x64_sys_finit_module+0x71/0xe0
[ 10.311167] do_syscall_64+0x11c/0x15f0
[ 10.311170] ? alloc_fd+0x12e/0x190
[ 10.311172] ? do_sys_openat2+0x9a/0xe0
[ 10.311175] ? __x64_sys_openat+0x61/0xa0
[ 10.311177] ? do_syscall_64+0x11c/0x15f0
[ 10.311179] ? alloc_fd+0x12e/0x190
[ 10.311180] ? do_sys_openat2+0x9a/0xe0
[ 10.311182] ? __x64_sys_openat+0x61/0xa0
[ 10.311184] ? do_syscall_64+0x11c/0x15f0
[ 10.311186] ? do_syscall_64+0x2d6/0x15f0
[ 10.311187] ? do_syscall_64+0x11c/0x15f0
[ 10.311189] ? clear_bhb_loop+0x30/0x80
[ 10.311191] ? clear_bhb_loop+0x30/0x80
[ 10.311192] ? clear_bhb_loop+0x30/0x80
[ 10.311193] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 10.311195] RIP: 0033:0x7fdda10b967d
[ 10.311197] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 63 16 0d 00 f7 d8 64 89 01 48
[ 10.311198] RSP: 002b:00007ffea5c85328 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 10.311200] RAX: ffffffffffffffda RBX: 000055c8dfdae250 RCX: 00007fdda10b967d
[ 10.311201] RDX: 0000000000000004 RSI: 00007fdda0f5b2f2 RDI: 0000000000000018
[ 10.311202] RBP: 00007ffea5c853c0 R08: 0000000000000000 R09: 0000000000000000
[ 10.311203] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000020000
[ 10.311204] R13: 000055c8dfdb3130 R14: 000055c8dfdae250 R15: 0000000000000000
[ 10.311205] </TASK>
[ 10.311206] ---[ end trace 0000000000000000 ]---
[ 10.311208] [drm:amdgpu_bo_init [amdgpu]] *ERROR* Unable to set WC memtype for the aperture base
[ 10.311476] amdgpu 0000:08:00.0: sw_init of IP block <gmc_v10_0> failed -22
[ 10.311477] amdgpu 0000:08:00.0: amdgpu_device_ip_init failed
[ 10.311478] amdgpu 0000:08:00.0: Fatal error during GPU init
[ 10.311480] amdgpu 0000:08:00.0: finishing device.
[ 10.311825] amdgpu 0000:08:00.0: probe with driver amdgpu failed with error -22
Best,
Rio
* Re: [REGRESSION] amdgpu fails to load eGPU after 6.19
From: Ilpo Järvinen @ 2026-04-14 14:28 UTC
To: Rio Liu
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
regressions@lists.linux.dev, Bjorn Helgaas
On Tue, 14 Apr 2026, Rio Liu wrote:
> There seems to be another PCI alignment issue with external amdgpu since 6.19.
> Bisecting this time pointed me to this commit
>
> commit bc75c8e5071120e919beb39e69f0979cccfdf219 (HEAD)
> Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Date: Fri Dec 19 19:40:15 2025 +0200
>
> PCI: Rewrite bridge window head alignment function
>
> It looks like the same issue that has happened before in
> https://lore.kernel.org/all/o2bL8MtD_40-lf8GlslTw-AZpUPzm8nmfCnJKvS8RQ3NOzOW1uq1dVCEfRpUjJ2i7G2WjfQhk2IWZ7oGp-7G-jXN4qOdtnyOcjRR0PZWK5I=@r26.me/.
> It seems like the previous fix with min_align
> https://lore.kernel.org/all/20250822123359.16305-2-ilpo.jarvinen@linux.intel.com/
> got removed in this commit.
Hi,
Changing it seems 100% intentional. Even if symptoms you see look similar,
the root cause is likely different.
> Applying the following patch to the commit fixes the regression. I'm still
> looking at how to rebase it onto the latest commit, as there is quite a bit of
> code change around it. But the same regression still happens as of v7.0-rc7.
Yes, definitely, things have changed a lot. The new algorithm works in a
different way, so the "same" fix cannot be used with it, and trying to
forward port the old fix will not be useful.
Please note that there are also some fixes to the new algorithm which are
only queued for v7.1 as is (I expect they'll be backported from there
though). They're currently in the pci/resource branch, waiting for the PCI
maintainer (Bjorn Helgaas) to send a pull request to Linus.
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 80e5a8fc62e7..12ab84271214 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1445,7 +1445,7 @@ static void pbus_size_mem(struct pci_bus *bus, unsigned long type,
>
> if (bus->self && size0 &&
> !pbus_upstream_space_available(bus, b_res, size0, min_align)) {
In the very latest code, this is entirely gone (you likely still see it
only because bisection stopped at a point where just part of the commits
were applied).
> - min_align = calculate_head_align(aligns2, max_order);
> + min_align = min(min_align, calculate_head_align(aligns2, max_order));
> size0 = calculate_memsize(size, min_size, 0, 0, old_size, win_align);
> resource_set_range(b_res, min_align, size0);
> pci_info(bus->self, "bridge window %pR to %pR requires relaxed alignment rules\n",
> @@ -1459,7 +1459,7 @@ static void pbus_size_mem(struct pci_bus *bus, unsigned long type,
>
> if (bus->self && size1 &&
> !pbus_upstream_space_available(bus, b_res, size1, add_align)) {
> - min_align = calculate_head_align(aligns2, max_order);
> + min_align = min(min_align, calculate_head_align(aligns2, max_order));
> size1 = calculate_memsize(size, min_size, add_size, children_add_size,
> old_size, win_align);
> pci_info(bus->self,
> ---
>
> Relevant errors in dmesg:
This snippet only contains collateral damage and does not show where the
problem originates from. Please provide a full dmesg instead with
dyndbg="file drivers/pci/*.c +p" on the kernel command line.
Preferably test with the latest code + fixes that are in pci/resources
branch (you can just take all changes from there) and get the logs from
there. I wouldn't be surprised if your problem is already addressed by
those fixes but we'll see.
--
i.
* Re: [REGRESSION] amdgpu fails to load eGPU after 6.19
From: Rio Liu @ 2026-04-14 23:03 UTC
To: Ilpo Järvinen
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
regressions@lists.linux.dev, Bjorn Helgaas
Hi Ilpo,
> This snippet only contains collateral damage and does not show where the
> problem originates from. Please provide a full dmesg instead with
> dyndbg="file drivers/pci/*.c +p" on the kernel command line.
>
> Preferably test with the latest code + fixes that are in pci/resources
> branch (you can just take all changes from there) and get the logs from
> there. I wouldn't be surprised if your problem is already addressed by
> those fixes but we'll see.
Thanks a lot for the help! I tested on pci/resources
8cb081667377709f4924ab6b3a88a0d7a761fe91 "PCI: Fix alignment calculation for
resource size larger than align", and the eGPU is now loading! There are some
messages in dmesg showing "can't assign; no space" and "failed to assign", but I
don't see any functional impact from them. I'm including the dmesg anyway in case
there's anything important: https://pastebin.com/n5hhtkz0.
Thanks,
Rio
* Re: [REGRESSION] amdgpu fails to load eGPU after 6.19
From: Ilpo Järvinen @ 2026-04-15 9:19 UTC
To: Rio Liu
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
regressions@lists.linux.dev, Bjorn Helgaas
On Tue, 14 Apr 2026, Rio Liu wrote:
> Hi Ilpo,
>
> > This snippet only contains collateral damage and does not show where the
> > problem originates from. Please provide a full dmesg instead with
> > dyndbg="file drivers/pci/*.c +p" on the kernel command line.
> >
> > Preferably test with the latest code + fixes that are in pci/resources
> > branch (you can just take all changes from there) and get the logs from
> > there. I wouldn't be surprised if your problem is already addressed by
> > those fixes but we'll see.
>
> Thanks a lot for the help! I tested on pci/resources
> 8cb081667377709f4924ab6b3a88a0d7a761fe91 "PCI: Fix alignment calculation for
> resource size larger than align", and the eGPU is now loading! There are some
> messages in dmesg showing "can't assign; no space" and "failed to assign", but I
> don't see any functional impact from them. I'm including the dmesg anyway in case
> there's anything important: https://pastebin.com/n5hhtkz0.
Hi,
Thanks for testing.
Most of the assignment failures occur for io resources. On today's large
systems, hotplug reservation tends to consume all io space for things
that don't need any. Usually, the io resource assignment failures are
harmless.
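A minimal sketch of how one might check that split in a saved dmesg,
assuming the failures are reported as "can't assign; no space" or "failed
to assign" right after a bracketed resource such as "[io ...]" or
"[mem ...]". The function name and the sample lines are made up for
illustration; they are not taken from the pastebin.

```python
import re
from collections import Counter

# Resource type inside "[io ...]" / "[mem ...]" on a line that reports
# an assignment failure. The message shape is an assumption.
FAIL_RE = re.compile(
    r"\[(io|mem)\b[^\]]*\]: (?:can't assign; no space|failed to assign)"
)


def tally_assign_failures(dmesg_text):
    """Count assignment-failure lines per resource type ("io" or "mem")."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = FAIL_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = """\
pcieport 0000:04:01.0: bridge window [io  size 0x1000]: can't assign; no space
pcieport 0000:04:01.0: bridge window [io  size 0x1000]: failed to assign
pci 0000:08:00.0: BAR 0 [mem size 0x400000000 64bit pref]: can't assign; no space
pcieport 0000:04:00.0: PME: Signaling with IRQ 125
"""

if __name__ == "__main__":
    for kind, n in sorted(tally_assign_failures(SAMPLE).items()):
        print(f"{kind}: {n}")
```

If the io count dominates, the failures are most likely of the harmless
kind described above; mem failures deserve a closer look.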
Then there's a pin on this bridge window:
pcieport 0000:04:00.0: bridge window [mem 0x6000000000-0x60201fffff 64bit pref]: was not released (still contains assigned resources)
That window is pinned in place by sibling devices, so the kernel cannot
release it prior to attempting BAR resizing, which in turn prevents BAR
resizing from succeeding as the bridge windows cannot be made large
enough.
You should be able to identify the siblings easily from /proc/iomem.
It may be possible to manually remove the siblings first, resize the
eGPU's BAR through sysfs, and finally rescan.
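A minimal sketch of the /proc/iomem lookup, assuming the usual
"start-end : name" line format with hex addresses. The window range is the
one from the log line above; the function name, the sample text and the
device addresses in it are invented for illustration. Real addresses in
/proc/iomem are only visible to root (otherwise they read as zeros).

```python
import re

# One /proc/iomem line: optional indentation, "start-end : name" in hex.
LINE_RE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.*)$")


def entries_inside(iomem_text, win_start, win_end):
    """Return (start, end, name) for entries lying within the window.

    An entry whose range equals the window exactly is treated as the
    window itself and is skipped.
    """
    found = []
    for line in iomem_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        start, end = int(m.group(2), 16), int(m.group(3), 16)
        inside = win_start <= start and end <= win_end
        if inside and (start, end) != (win_start, win_end):
            found.append((start, end, m.group(4)))
    return found


# Invented sample shaped like /proc/iomem, for illustration only.
SAMPLE = """\
6000000000-7fffffffff : PCI Bus 0000:00
  6000000000-60201fffff : PCI Bus 0000:05
    6000000000-600fffffff : 0000:08:00.0
    6020000000-60200fffff : 0000:09:00.0
  6020200000-60202fffff : 0000:00:02.0
"""

if __name__ == "__main__":
    # On a real system, read the text from /proc/iomem as root instead.
    for start, end, name in entries_inside(SAMPLE, 0x6000000000, 0x60201FFFFF):
        print(f"{start:#x}-{end:#x} {name}")
```

Entries other than the eGPU's own BARs are the candidates for the siblings
that keep the window pinned.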
--
i.