From: "Alexey Klimov" <alexey.klimov@linaro.org>
To: <alexander.deucher@amd.com>, <frank.min@amd.com>,
	<amd-gfx@lists.freedesktop.org>
Cc: <stable@vger.kernel.org>, <david.belanger@amd.com>,
	<christian.koenig@amd.com>, <peter.chen@cixtech.com>,
	<cix-kernel-upstream@cixtech.com>,
	<linux-arm-kernel@lists.infradead.org>
Subject: [REGRESSION] amdgpu: async system error exception from hdp_v5_0_flush_hdp()
Date: Tue, 15 Apr 2025 19:28:13 +0100
Message-ID: <D97FB92117J2.PXTNFKCIRWAS@linaro.org>


#regzbot introduced: v6.12..v6.13

I use an RX6600 on an arm64 Radxa Orion O6 board, and amdgpu appears to be
broken on recent kernels; it fails at boot:

[drm] amdgpu: 7886M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
SError Interrupt on CPU11, code 0x00000000be000011 -- SError
CPU: 11 UID: 0 PID: 255 Comm: (udev-worker) Tainted: G S                  6.15.0-rc2+ #1 VOLUNTARY
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6/Radxa Orion O6, BIOS 1.0 Jan  1 1980
pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : amdgpu_device_rreg+0x60/0xe4 [amdgpu]
lr : hdp_v5_0_flush_hdp+0x6c/0x80 [amdgpu]
sp : ffffffc08321b490
x29: ffffffc08321b490 x28: ffffff80b8b80000 x27: ffffff80b8bd0178
x26: ffffff80b8b8fe88 x25: 0000000000000001 x24: ffffff8081647000
x23: ffffffc079d6e000 x22: ffffff80b8bd5000 x21: 000000000007f000
x20: 000000000001fc00 x19: 00000000ffffffff x18: 00000000000015fc
x17: 00000000000015fc x16: 00000000000015cf x15: 00000000000015ce
x14: 00000000000015d0 x13: 00000000000015d1 x12: 00000000000015d2
x11: 00000000000015d3 x10: 000000000000ec00 x9 : 00000000000015fd
x8 : 00000000000015fd x7 : 0000000000001689 x6 : 0000000000555401
x5 : 0000000000000001 x4 : 0000000000100000 x3 : 0000000000100000
x2 : 0000000000000000 x1 : 000000000007f000 x0 : 0000000000000000
Kernel panic - not syncing: Asynchronous SError Interrupt
CPU: 11 UID: 0 PID: 255 Comm: (udev-worker) Tainted: G S                  6.15.0-rc2+ #1 VOLUNTARY
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6/Radxa Orion O6, BIOS 1.0 Jan  1 1980
Call trace:
 show_stack+0x2c/0x84 (C)
 dump_stack_lvl+0x60/0x80
 dump_stack+0x18/0x24
 panic+0x148/0x330
 add_taint+0x0/0xbc
 arm64_serror_panic+0x64/0x7c
 do_serror+0x28/0x68
 el1h_64_error_handler+0x30/0x48
 el1h_64_error+0x6c/0x70
 amdgpu_device_rreg+0x60/0xe4 [amdgpu] (P)
 hdp_v5_0_flush_hdp+0x6c/0x80 [amdgpu]
 gmc_v10_0_hw_init+0xec/0x1fc [amdgpu]
 amdgpu_device_init+0x19f8/0x2480 [amdgpu]
 amdgpu_driver_load_kms+0x20/0xb0 [amdgpu]
 amdgpu_pci_probe+0x1b8/0x5d4 [amdgpu]
 pci_device_probe+0xbc/0x1a8
 really_probe+0xc0/0x39c
 __driver_probe_device+0x7c/0x14c
 driver_probe_device+0x3c/0x120
 __driver_attach+0xc4/0x200
 bus_for_each_dev+0x68/0xb4
 driver_attach+0x24/0x30
 bus_add_driver+0x110/0x240
 driver_register+0x68/0x124
 __pci_register_driver+0x44/0x50
 amdgpu_init+0x84/0xf94 [amdgpu]
 do_one_initcall+0x60/0x1e0
 do_init_module+0x54/0x200
 load_module+0x18f8/0x1e68
 init_module_from_file+0x74/0xa0
 __arm64_sys_finit_module+0x1e0/0x3f0
 invoke_syscall+0x64/0xe4
 el0_svc_common.constprop.0+0x40/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x34/0xd0
 el0t_64_sync_handler+0x10c/0x138
 el0t_64_sync+0x198/0x19c
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x1000,000000e0,f169a650,9b7ff667
Memory Limit: none
---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---

(The BIOS date appears to be 45 years old, but that is the state of the
board as I received it.)

I also saw this crash with an RX6700. Older Radeons such as the HD5450, and
an NVIDIA GT1030, work fine on that board.

A little testing showed that the regression was introduced between 6.12 and
6.13. The change also seems to have been picked up by some distro kernels
already: several ISO images I tried failed to boot, until I found one with
kernel 6.8 that worked just fine.

The only change related to hdp_v5_0_flush_hdp() was
cf424020e040 ("drm/amdgpu/hdp5.0: do a posting read when flushing HDP")

Reverting that commit resolved the problem. Before sending the revert
as-is, I would like to know whether there is supposed to be a proper fix
for this, or whether someone is interested in debugging it or has any
suggestions.

Strictly speaking, I still need to confirm that this exact change
introduced the regression.

Thanks,
Alexey

