From: Amber Lin <Amber.Lin@amd.com>
To: <amd-gfx@lists.freedesktop.org>
Cc: <Shaoyun.Liu@amd.com>, <Michael.Chen@amd.com>,
<Jesse.Zhang@amd.com>, Amber Lin <Amber.Lin@amd.com>,
Jonathan Kim <jonathan.kim@amd.com>
Subject: [PATCH 3/8] drm/amdgpu: Fixup detect and reset
Date: Fri, 20 Mar 2026 16:02:03 -0400
Message-ID: <20260320200208.1188307-4-Amber.Lin@amd.com>
In-Reply-To: <20260320200208.1188307-1-Amber.Lin@amd.com>
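Identify hung queues by matching the doorbells that MES reports in
hqd_info against the doorbells stored in the driver. Also report
detection failures and reset failures separately, so a failed reset
still returns the hung doorbells that MES found.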
Suggested-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 38 ++++++++++++++++---------
1 file changed, 25 insertions(+), 13 deletions(-)
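
The doorbell matching described in the commit message can be sketched in
isolation as below. This is a minimal, self-contained illustration and not
the driver code: struct sketch_queue, sketch_mark_hung_queues() and
SKETCH_INVALID_DB_OFFSET are hypothetical names, and the sentinel value is
only a stand-in for AMDGPU_MES_INVALID_DB_OFFSET.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for AMDGPU_MES_INVALID_DB_OFFSET, value chosen for the sketch. */
#define SKETCH_INVALID_DB_OFFSET 0xffffffffu

/* Hypothetical, simplified view of a queue as tracked by the driver. */
struct sketch_queue {
	uint32_t doorbell_off;	/* doorbell offset assigned by the driver */
	int hung;		/* set to 1 once MES reports this doorbell */
};

/*
 * Mark every queue whose doorbell appears in the array reported by MES.
 * Slots holding the invalid sentinel are skipped.
 * Returns the number of queues newly marked as hung.
 */
static size_t sketch_mark_hung_queues(struct sketch_queue *queues,
				      size_t num_queues,
				      const uint32_t *hung_db, size_t num_db)
{
	size_t i, j, marked = 0;

	assert(queues != NULL || num_queues == 0);
	assert(hung_db != NULL || num_db == 0);

	for (i = 0; i < num_db; i++) {
		if (hung_db[i] == SKETCH_INVALID_DB_OFFSET)
			continue;	/* slot not filled in by MES */

		for (j = 0; j < num_queues; j++) {
			if (queues[j].doorbell_off == hung_db[i] &&
			    !queues[j].hung) {
				queues[j].hung = 1;
				marked++;
				break;
			}
		}
	}

	return marked;
}
```

The linear search keeps the sketch short; a real driver would more likely
look the queue up through whatever doorbell-indexed structure it already
maintains.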
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index b68bf4a9cb40..bea509f6b3ff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -465,23 +465,35 @@ int amdgpu_mes_detect_and_reset_hung_queues(struct amdgpu_device *adev,
r = adev->mes.funcs->detect_and_reset_hung_queues(&adev->mes,
&input);
- if (r) {
- dev_err(adev->dev, "failed to detect and reset\n");
- } else {
- *hung_db_num = 0;
- for (i = 0; i < adev->mes.hung_queue_hqd_info_offset; i++) {
- if (db_array[i] != AMDGPU_MES_INVALID_DB_OFFSET) {
- hung_db_array[i] = db_array[i];
- *hung_db_num += 1;
- }
+
+ if (r && detect_only) {
+ dev_err(adev->dev, "Failed to detect hung queues\n");
+ return r;
+ }
+
+ *hung_db_num = 0;
+ /* MES passes the hung queues' doorbells to the driver */
+ for (i = 0; i < adev->mes.hung_queue_hqd_info_offset; i++) {
+ /* A valid doorbell in db_array[i] identifies a hung queue */
+ if (db_array[i] != AMDGPU_MES_INVALID_DB_OFFSET) {
+ hung_db_array[i] = db_array[i];
+ *hung_db_num += 1;
}
+ }
- /*
- * TODO: return HQD info for MES scheduled user compute queue reset cases
- * stored in hung_db_array hqd info offset to full array size
- */
+ if (r && !*hung_db_num) {
+ dev_err(adev->dev, "Failed to detect and reset hung queues\n");
+ return r;
}
+ /*
+ * TODO: return HQD info for MES scheduled user compute queue reset cases
+ * stored in hung_db_array hqd info offset to full array size
+ */
+
+ if (r)
+ dev_err(adev->dev, "Failed to reset hung queues\n");
+
return r;
}
--
2.43.0
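
For reference, the error paths in the hunk above can be modelled as a small
standalone function. This is only a sketch under stated assumptions: enum
sketch_outcome, sketch_classify() and SKETCH_DB_INVALID are hypothetical
names, the sentinel value stands in for AMDGPU_MES_INVALID_DB_OFFSET, and
the "nothing found" check is assumed to compare the count by value (that
is, it tests *hung_db_num rather than the pointer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for AMDGPU_MES_INVALID_DB_OFFSET, value chosen for the sketch. */
#define SKETCH_DB_INVALID 0xffffffffu

enum sketch_outcome {
	SKETCH_OK,		/* MES call succeeded */
	SKETCH_DETECT_FAILED,	/* detect-only request failed */
	SKETCH_NOTHING_FOUND,	/* call failed and MES reported no hung doorbell */
	SKETCH_RESET_FAILED,	/* hung doorbells reported, but the reset failed */
};

/*
 * r is the return code of the MES call (0 on success).
 * Valid doorbells are copied at the same index they had in db_array,
 * and *hung_db_num counts them. A failed detect-only call returns
 * early and leaves the outputs untouched.
 */
static enum sketch_outcome sketch_classify(int r, bool detect_only,
					   const uint32_t *db_array,
					   size_t slots,
					   uint32_t *hung_db_array,
					   size_t *hung_db_num)
{
	size_t i;

	if (r && detect_only)
		return SKETCH_DETECT_FAILED;

	*hung_db_num = 0;
	for (i = 0; i < slots; i++) {
		if (db_array[i] != SKETCH_DB_INVALID) {
			hung_db_array[i] = db_array[i];
			*hung_db_num += 1;
		}
	}

	if (r && *hung_db_num == 0)
		return SKETCH_NOTHING_FOUND;
	if (r)
		return SKETCH_RESET_FAILED;

	return SKETCH_OK;
}
```

The point of the split is that a caller seeing SKETCH_RESET_FAILED still
gets the list of hung doorbells and can escalate, for example to a pipe
reset, instead of losing the detection result.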
Thread overview: 21+ messages
2026-03-20 20:02 [PATCH 0/8] Support compute queue/pipe reset on gfx 12.1 Amber Lin
2026-03-20 20:02 ` [PATCH 1/8] drm/amdgpu: Fix gfx_hqd_mask in mes 12.1 Amber Lin
2026-03-23 19:03 ` Alex Deucher
2026-03-20 20:02 ` [PATCH 2/8] drm/amdgpu: Fixup boost mes detect hang array size Amber Lin
2026-03-23 19:04 ` Alex Deucher
2026-03-23 19:15 ` Amber Lin
2026-03-20 20:02 ` Amber Lin [this message]
2026-03-23 19:07 ` [PATCH 3/8] drm/amdgpu: Fixup detect and reset Alex Deucher
2026-03-20 20:02 ` [PATCH 4/8] drm/amdgpu: Create hqd info structure Amber Lin
2026-03-23 19:01 ` Alex Deucher
2026-03-23 19:11 ` Amber Lin
2026-03-20 20:02 ` [PATCH 5/8] drm/amdgpu: Missing multi-XCC support in MES Amber Lin
2026-03-23 19:10 ` Alex Deucher
2026-03-23 19:19 ` Amber Lin
2026-03-20 20:02 ` [PATCH 6/8] drm/amdgpu: Enable suspend/resume gang in mes 12.1 Amber Lin
2026-03-23 19:11 ` Alex Deucher
2026-03-20 20:02 ` [PATCH 7/8] drm/amdkfd: Add detect+reset hangs to GC 12.1 Amber Lin
2026-03-23 19:12 ` Alex Deucher
2026-03-20 20:02 ` [PATCH 8/8] drm/amdkfd: Reset queue/pipe in MES Amber Lin
2026-03-23 19:21 ` Alex Deucher
2026-03-23 19:42 ` Amber Lin