From: Amber Lin <Amber.Lin@amd.com>
To: <amd-gfx@lists.freedesktop.org>, <alexdeucher@gmail.com>
Cc: <Shaoyun.Liu@amd.com>, <Michael.Chen@amd.com>,
<Jesse.Zhang@amd.com>, Amber Lin <Amber.Lin@amd.com>,
Alex Deucher <alexander.deucher@amd.com>
Subject: [PATCH v2 08/10] drm/amdkfd: Add detect+reset hangs to GC 12.1
Date: Tue, 24 Mar 2026 13:56:50 -0400
Message-ID: <20260324175653.1325754-9-Amber.Lin@amd.com>
In-Reply-To: <20260324175653.1325754-1-Amber.Lin@amd.com>
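Implement the detect_and_reset_hung_queues MES callback for user mode
compute queues on GC 12.1. The callback submits a MES_SCH_API_RESET
packet on the scheduler pipe in either detect-only or detect-then-reset
mode, and early init sets the hung queue doorbell array size and the
HQD info offset. While here, check only the lower 32 bits of the API
status when polling for completion.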
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
---
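Note on the hung queue array layout: the comment on
MES12_HUNG_DB_OFFSET_ARRAY_SIZE says entries [0:3] hold doorbell offsets
and entries [4:7] hold HQD info. The userspace sketch below shows one way
a consumer could walk that layout. It is not code from this series. The
helper names, the 32-bit element type, the caller-supplied "invalid"
sentinel, and the pairing of slot i with entry 4 + i are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the patch. */
#define MES12_HUNG_DB_OFFSET_ARRAY_SIZE 8 /* [0:3] = db offset [4:7] hqd info */
#define MES12_HUNG_HQD_INFO_OFFSET      4

/*
 * Hypothetical helper: copy the valid doorbell offsets out of the array.
 * ASSUMPTION: entries are 32-bit and an unused slot holds `invalid`,
 * a sentinel chosen by the caller (not taken from the patch).
 * Returns the number of offsets written to `out`.
 */
static size_t collect_hung_doorbells(const uint32_t *db_array, uint32_t invalid,
				     uint32_t *out, size_t out_len)
{
	size_t i, n = 0;

	assert(db_array && out);

	/* Only the first MES12_HUNG_HQD_INFO_OFFSET entries are doorbells. */
	for (i = 0; i < MES12_HUNG_HQD_INFO_OFFSET && n < out_len; i++) {
		if (db_array[i] != invalid)
			out[n++] = db_array[i];
	}

	return n;
}

/*
 * Hypothetical helper: HQD info word for a doorbell slot.
 * ASSUMPTION: slot i is paired with entry MES12_HUNG_HQD_INFO_OFFSET + i.
 */
static uint32_t hqd_info_for_slot(const uint32_t *db_array, size_t slot)
{
	assert(slot < MES12_HUNG_HQD_INFO_OFFSET);
	return db_array[MES12_HUNG_HQD_INFO_OFFSET + slot];
}
```

Keeping the split point in a single define means a consumer written this
way only needs the array base and MES12_HUNG_HQD_INFO_OFFSET to find both
halves.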
drivers/gpu/drm/amd/amdgpu/mes_v12_1.c | 35 +++++++++++++++++++++++++-
1 file changed, 34 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_1.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_1.c
index 7aea3a50e712..ac9e26b8bb52 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_1.c
@@ -46,6 +46,8 @@ static int mes_v12_1_kiq_hw_fini(struct amdgpu_device *adev, uint32_t xcc_id);
static int mes_v12_1_self_test(struct amdgpu_device *adev, int xcc_id);
#define MES_EOP_SIZE 2048
+#define MES12_HUNG_DB_OFFSET_ARRAY_SIZE 8 /* [0:3] = db offset [4:7] hqd info */
+#define MES12_HUNG_HQD_INFO_OFFSET 4
#define regCP_HQD_IB_CONTROL_MES_12_1_DEFAULT 0x100000
#define XCC_MID_MASK 0x41000000
@@ -229,7 +231,7 @@ static int mes_v12_1_submit_pkt_and_poll_completion(struct amdgpu_mes *mes,
xcc_id, pipe, x_pkt->header.opcode);
r = amdgpu_fence_wait_polling(ring, seq, timeout);
- if (r < 1 || !*status_ptr) {
+ if (r < 1 || !lower_32_bits(*status_ptr)) {
if (misc_op_str)
dev_err(adev->dev,
"MES(%d, %d) failed to respond to msg=%s (%s)\n",
@@ -858,6 +860,33 @@ static int mes_v12_1_reset_legacy_queue(struct amdgpu_mes *mes,
}
#endif
+static int mes_v12_1_detect_and_reset_hung_queues(struct amdgpu_mes *mes,
+ struct mes_detect_and_reset_queue_input *input)
+{
+ union MESAPI__RESET mes_reset_queue_pkt;
+
+ memset(&mes_reset_queue_pkt, 0, sizeof(mes_reset_queue_pkt));
+
+ mes_reset_queue_pkt.header.type = MES_API_TYPE_SCHEDULER;
+ mes_reset_queue_pkt.header.opcode = MES_SCH_API_RESET;
+ mes_reset_queue_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
+
+ mes_reset_queue_pkt.queue_type =
+ convert_to_mes_queue_type(input->queue_type);
+ mes_reset_queue_pkt.doorbell_offset_addr =
+ mes->hung_queue_db_array_gpu_addr[0];
+
+ if (input->detect_only)
+ mes_reset_queue_pkt.hang_detect_only = 1;
+ else
+ mes_reset_queue_pkt.hang_detect_then_reset = 1;
+
+ return mes_v12_1_submit_pkt_and_poll_completion(mes,
+ input->xcc_id, AMDGPU_MES_SCHED_PIPE,
+ &mes_reset_queue_pkt, sizeof(mes_reset_queue_pkt),
+ offsetof(union MESAPI__RESET, api_status));
+}
+
static int mes_v12_inv_tlb_convert_hub_id(uint8_t id)
{
/*
@@ -915,6 +944,7 @@ static const struct amdgpu_mes_funcs mes_v12_1_funcs = {
.resume_gang = mes_v12_1_resume_gang,
.misc_op = mes_v12_1_misc_op,
.reset_hw_queue = mes_v12_1_reset_hw_queue,
+ .detect_and_reset_hung_queues = mes_v12_1_detect_and_reset_hung_queues,
.invalidate_tlbs_pasid = mes_v12_1_inv_tlbs_pasid,
};
@@ -1931,6 +1961,9 @@ static int mes_v12_1_early_init(struct amdgpu_ip_block *ip_block)
struct amdgpu_device *adev = ip_block->adev;
int pipe, r;
+ adev->mes.hung_queue_db_array_size = MES12_HUNG_DB_OFFSET_ARRAY_SIZE;
+ adev->mes.hung_queue_hqd_info_offset = MES12_HUNG_HQD_INFO_OFFSET;
+
for (pipe = 0; pipe < AMDGPU_MAX_MES_PIPES; pipe++) {
r = amdgpu_mes_init_microcode(adev, pipe);
if (r)
--
2.43.0
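
Note on the status check: the hunk in
mes_v12_1_submit_pkt_and_poll_completion() changes the failure test from
!*status_ptr to !lower_32_bits(*status_ptr). The sketch below is a
userspace illustration of the difference, assuming the status word is 64
bits wide and that only its low dword signals completion. Both points are
inferred from the diff rather than stated in it. lower_32_bits_sketch()
and the two predicate names are stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's lower_32_bits() helper. */
static inline uint32_t lower_32_bits_sketch(uint64_t v)
{
	return (uint32_t)(v & 0xffffffffu);
}

/* Failure test before the patch: any non-zero bit counts as a response. */
static bool submit_failed_old(long r, uint64_t status)
{
	return r < 1 || !status;
}

/* Failure test after the patch: only the low dword counts as a response. */
static bool submit_failed_new(long r, uint64_t status)
{
	return r < 1 || !lower_32_bits_sketch(status);
}
```

With a status word whose low dword is zero but whose high dword is not,
the old test reports success and the new test reports failure. For any
status that fits in 32 bits the two tests agree.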