From: Amber Lin <Amber.Lin@amd.com>
To: <amd-gfx@lists.freedesktop.org>, <alexdeucher@gmail.com>
Cc: <Shaoyun.Liu@amd.com>, <Michael.Chen@amd.com>,
	<Jesse.Zhang@amd.com>, Amber Lin <Amber.Lin@amd.com>,
	Jonathan Kim <jonathan.kim@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>
Subject: [PATCH v2 03/10] drm/amdgpu: Fixup detect and reset
Date: Tue, 24 Mar 2026 13:56:45 -0400
Message-ID: <20260324175653.1325754-4-Amber.Lin@amd.com>
In-Reply-To: <20260324175653.1325754-1-Amber.Lin@amd.com>

Identify hung queues by matching the doorbells that MES reports in
hqd_info against the doorbells stored in the driver.

Suggested-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
---
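Note: the flow after this change can be modelled in isolation as below.
This is only a rough sketch, not the driver code. The function name
sketch_collect_hung_doorbells, the SKETCH_INVALID_DB_OFFSET value and the
plain int/size_t parameters are hypothetical stand-ins for the real
amdgpu types; logging is left out. It shows that a failed detect-only
call returns early, while a failed reset still reports the doorbells MES
handed back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for AMDGPU_MES_INVALID_DB_OFFSET (value assumed). */
#define SKETCH_INVALID_DB_OFFSET 0xffffffffu

/*
 * Simplified model of the post-patch flow:
 *   r            status returned by the MES detect/reset call
 *   detect_only  non-zero when no reset was requested
 *   db_array     doorbells reported by MES, n entries
 *   n            plays the role of hung_queue_hqd_info_offset
 */
static int sketch_collect_hung_doorbells(int r, int detect_only,
					 const uint32_t *db_array, size_t n,
					 uint32_t *hung_db_array,
					 unsigned int *hung_db_num)
{
	size_t i;

	assert(db_array && hung_db_array && hung_db_num);

	/* Detection itself failed: outputs are left untouched. */
	if (r && detect_only)
		return r;

	*hung_db_num = 0;
	for (i = 0; i < n; i++) {
		/* A valid doorbell marks a hung queue; keep its slot index. */
		if (db_array[i] != SKETCH_INVALID_DB_OFFSET) {
			hung_db_array[i] = db_array[i];
			*hung_db_num += 1;
		}
	}

	/* A reset failure is still returned, with the doorbells filled in. */
	return r;
}
```

A caller in this model can tell "reset failed, but these queues were
identified as hung" (r != 0 and *hung_db_num > 0) apart from "nothing
was identified" (*hung_db_num == 0).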
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 38 ++++++++++++++++---------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index e639d6c329e9..f1f8bbfc31e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -465,23 +465,35 @@ int amdgpu_mes_detect_and_reset_hung_queues(struct amdgpu_device *adev,
 
 	r = adev->mes.funcs->detect_and_reset_hung_queues(&adev->mes,
 							  &input);
-	if (r) {
-		dev_err(adev->dev, "failed to detect and reset\n");
-	} else {
-		*hung_db_num = 0;
-		for (i = 0; i < adev->mes.hung_queue_hqd_info_offset; i++) {
-			if (db_array[i] != AMDGPU_MES_INVALID_DB_OFFSET) {
-				hung_db_array[i] = db_array[i];
-				*hung_db_num += 1;
-			}
+
+	if (r && detect_only) {
+		dev_err(adev->dev, "Failed to detect hung queues\n");
+		return r;
+	}
+
+	*hung_db_num = 0;
+	/* MES passes hung queues' doorbell to driver */
+	for (i = 0; i < adev->mes.hung_queue_hqd_info_offset; i++) {
+		/* Finding hung queues where db_array[i] is a valid doorbell */
+		if (db_array[i] != AMDGPU_MES_INVALID_DB_OFFSET) {
+			hung_db_array[i] = db_array[i];
+			*hung_db_num += 1;
 		}
+	}
 
-		/*
-		 * TODO: return HQD info for MES scheduled user compute queue reset cases
-		 * stored in hung_db_array hqd info offset to full array size
-		 */
+	if (r && !*hung_db_num) {
+		dev_err(adev->dev, "Failed to detect and reset hung queues\n");
+		return r;
 	}
 
+	/*
+	 * TODO: return HQD info for MES scheduled user compute queue reset cases
+	 * stored in hung_db_array hqd info offset to full array size
+	 */
+
+	if (r)
+		dev_err(adev->dev, "Failed to reset hung queues\n");
+
 	return r;
 }
 
-- 
2.43.0

