From: "Alexey Klimov" <alexey.klimov@linaro.org>
To: <alexander.deucher@amd.com>, <frank.min@amd.com>,
<amd-gfx@lists.freedesktop.org>
Cc: <stable@vger.kernel.org>, <david.belanger@amd.com>,
<christian.koenig@amd.com>, <peter.chen@cixtech.com>,
<cix-kernel-upstream@cixtech.com>,
<linux-arm-kernel@lists.infradead.org>
Subject: [REGRESSION] amdgpu: async system error exception from hdp_v5_0_flush_hdp()
Date: Tue, 15 Apr 2025 19:28:13 +0100 [thread overview]
Message-ID: <D97FB92117J2.PXTNFKCIRWAS@linaro.org> (raw)
#regzbot introduced: v6.12..v6.13
I use RX6600 on arm64 Orion o6 board and it seems that amdgpu is broken on recent kernels, fails on boot:
[drm] amdgpu: 7886M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
SError Interrupt on CPU11, code 0x00000000be000011 -- SError
CPU: 11 UID: 0 PID: 255 Comm: (udev-worker) Tainted: G S 6.15.0-rc2+ #1 VOLUNTARY
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6/Radxa Orion O6, BIOS 1.0 Jan 1 1980
pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : amdgpu_device_rreg+0x60/0xe4 [amdgpu]
lr : hdp_v5_0_flush_hdp+0x6c/0x80 [amdgpu]
sp : ffffffc08321b490
x29: ffffffc08321b490 x28: ffffff80b8b80000 x27: ffffff80b8bd0178
x26: ffffff80b8b8fe88 x25: 0000000000000001 x24: ffffff8081647000
x23: ffffffc079d6e000 x22: ffffff80b8bd5000 x21: 000000000007f000
x20: 000000000001fc00 x19: 00000000ffffffff x18: 00000000000015fc
x17: 00000000000015fc x16: 00000000000015cf x15: 00000000000015ce
x14: 00000000000015d0 x13: 00000000000015d1 x12: 00000000000015d2
x11: 00000000000015d3 x10: 000000000000ec00 x9 : 00000000000015fd
x8 : 00000000000015fd x7 : 0000000000001689 x6 : 0000000000555401
x5 : 0000000000000001 x4 : 0000000000100000 x3 : 0000000000100000
x2 : 0000000000000000 x1 : 000000000007f000 x0 : 0000000000000000
Kernel panic - not syncing: Asynchronous SError Interrupt
CPU: 11 UID: 0 PID: 255 Comm: (udev-worker) Tainted: G S 6.15.0-rc2+ #1 VOLUNTARY
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6/Radxa Orion O6, BIOS 1.0 Jan 1 1980
Call trace:
show_stack+0x2c/0x84 (C)
dump_stack_lvl+0x60/0x80
dump_stack+0x18/0x24
panic+0x148/0x330
add_taint+0x0/0xbc
arm64_serror_panic+0x64/0x7c
do_serror+0x28/0x68
el1h_64_error_handler+0x30/0x48
el1h_64_error+0x6c/0x70
amdgpu_device_rreg+0x60/0xe4 [amdgpu] (P)
hdp_v5_0_flush_hdp+0x6c/0x80 [amdgpu]
gmc_v10_0_hw_init+0xec/0x1fc [amdgpu]
amdgpu_device_init+0x19f8/0x2480 [amdgpu]
amdgpu_driver_load_kms+0x20/0xb0 [amdgpu]
amdgpu_pci_probe+0x1b8/0x5d4 [amdgpu]
pci_device_probe+0xbc/0x1a8
really_probe+0xc0/0x39c
__driver_probe_device+0x7c/0x14c
driver_probe_device+0x3c/0x120
__driver_attach+0xc4/0x200
bus_for_each_dev+0x68/0xb4
driver_attach+0x24/0x30
bus_add_driver+0x110/0x240
driver_register+0x68/0x124
__pci_register_driver+0x44/0x50
amdgpu_init+0x84/0xf94 [amdgpu]
do_one_initcall+0x60/0x1e0
do_init_module+0x54/0x200
load_module+0x18f8/0x1e68
init_module_from_file+0x74/0xa0
__arm64_sys_finit_module+0x1e0/0x3f0
invoke_syscall+0x64/0xe4
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x34/0xd0
el0t_64_sync_handler+0x10c/0x138
el0t_64_sync+0x198/0x19c
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x1000,000000e0,f169a650,9b7ff667
Memory Limit: none
---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
(bios version seems to be 45 years old but that is the state of the board
when I received it)
Also saw this crash with RX6700. Old radeons like HD5450 and nvidia gt1030
work fine on that board.
A little bit of testing showed that it was introduced between 6.12 and 6.13.
Also it seems that changes were taken by some distro kernels already and
different iso images I tried failed to boot before I bumped into some iso
with kernel 6.8 that worked just fine.
The only change related to hdp_v5_0_flush_hdp() was
cf424020e040 drm/amdgpu/hdp5.0: do a posting read when flushing HDP
Reverting that commit ^^ did help and resolved that problem. Before sending
revert as-is I was interested to know if there supposed to be a proper fix
for this or maybe someone is interested to debug this or have any suggestions.
In theory I also need to confirm that exactly that change introduced the
regression.
Thanks,
Alexey
next reply other threads:[~2025-04-16 7:25 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-15 18:28 Alexey Klimov [this message]
2025-04-16 3:12 ` 回复: [REGRESSION] amdgpu: async system error exception from hdp_v5_0_flush_hdp() Fugang Duan
2025-04-16 3:12 ` Fugang Duan
2025-04-16 11:25 ` Alexey Klimov
2025-04-16 14:49 ` Alex Deucher
2025-04-17 0:42 ` 回复: " Fugang Duan
2025-04-17 0:42 ` Fugang Duan
2025-04-17 13:08 ` Alex Deucher
2025-04-18 0:30 ` 回复: " Fugang Duan
2025-04-18 0:30 ` Fugang Duan
2025-04-18 1:10 ` Alex Deucher
2025-04-22 2:20 ` Alexey Klimov
2025-04-22 13:00 ` Alex Deucher
2025-04-22 15:59 ` Alexey Klimov
2025-04-23 14:32 ` Christian König
2025-04-24 15:44 ` Alex Deucher
2025-04-27 1:01 ` Alexey Klimov
2025-04-30 16:55 ` Alex Deucher
2025-05-11 23:24 ` Alexey Klimov
2025-05-12 14:46 ` Alex Deucher
2025-04-16 11:44 ` Christian König
2025-04-22 2:49 ` Alexey Klimov
2025-04-24 11:41 ` Peter Chen
2025-04-24 11:41 ` Peter Chen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=D97FB92117J2.PXTNFKCIRWAS@linaro.org \
--to=alexey.klimov@linaro.org \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=christian.koenig@amd.com \
--cc=cix-kernel-upstream@cixtech.com \
--cc=david.belanger@amd.com \
--cc=frank.min@amd.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=peter.chen@cixtech.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.