From: Amber Lin <Amber.Lin@amd.com>
To: <amd-gfx@lists.freedesktop.org>, <alexdeucher@gmail.com>
Cc: <Shaoyun.Liu@amd.com>, <Michael.Chen@amd.com>,
	<Jesse.Zhang@amd.com>, Amber Lin <Amber.Lin@amd.com>,
	Jonathan Kim <jonathan.kim@amd.com>
Subject: [PATCH v2 02/10] drm/amdgpu: Fixup boost mes detect hang array size
Date: Tue, 24 Mar 2026 13:56:44 -0400
Message-ID: <20260324175653.1325754-3-Amber.Lin@amd.com>
In-Reply-To: <20260324175653.1325754-1-Amber.Lin@amd.com>

When allocating the hung queues memory, we need to take the number of
queues into account to cover the worst-case hang.

Suggested-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
---
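Note for readers: the sizing logic added to amdgpu_mes_init() in this
patch can be sketched as standalone C. This is only an illustration
under stated assumptions, not the driver code: the EX_MAX_*_PIPES
limits, struct ex_hqd_masks and the ex_* helpers are hypothetical
names, and popcount32() is a portable stand-in for the kernel's
hweight32(). The factor of two mirrors hung_queue_db_array_size =
num_queues * 2 in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative limits only; the real AMDGPU_MES_MAX_*_PIPES may differ. */
#define EX_MAX_GFX_PIPES     2
#define EX_MAX_COMPUTE_PIPES 8
#define EX_MAX_SDMA_PIPES    2

/* Hypothetical container for the per-pipe HQD masks. */
struct ex_hqd_masks {
	uint32_t gfx[EX_MAX_GFX_PIPES];
	uint32_t compute[EX_MAX_COMPUTE_PIPES];
	uint32_t sdma[EX_MAX_SDMA_PIPES];
};

/* Portable stand-in for hweight32(): number of set bits in v. */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Count every queue enabled in any mask. As in the patch, only the
 * SDMA contribution is multiplied by the number of XCCs.
 */
static unsigned int ex_count_queues(const struct ex_hqd_masks *m,
				    unsigned int num_xcc)
{
	unsigned int i, num_queues = 0;

	assert(m != NULL);

	for (i = 0; i < EX_MAX_GFX_PIPES; i++)
		num_queues += popcount32(m->gfx[i]);

	for (i = 0; i < EX_MAX_COMPUTE_PIPES; i++)
		num_queues += popcount32(m->compute[i]);

	for (i = 0; i < EX_MAX_SDMA_PIPES; i++)
		num_queues += popcount32(m->sdma[i]) * num_xcc;

	return num_queues;
}

/* Bytes for one hung-queue array: num_queues * 2 entries of u32. */
static size_t ex_hung_array_bytes(unsigned int num_queues)
{
	return (size_t)num_queues * 2 * sizeof(uint32_t);
}
```

With made-up masks of one gfx queue, two compute pipes of four queues
each and two SDMA pipes of two queues each, ex_count_queues() gives 13
queues for one XCC, so each array would need 13 * 2 * 4 = 104 bytes.
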
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 33 +++++++++++++++++++------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 0d4c77c1b4b5..e639d6c329e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -103,7 +103,7 @@ static inline u32 amdgpu_mes_get_hqd_mask(u32 num_pipe,
 
 int amdgpu_mes_init(struct amdgpu_device *adev)
 {
-	int i, r, num_pipes;
+	int i, r, num_pipes, num_queues = 0;
 	u32 total_vmid_mask, reserved_vmid_mask;
 	int num_xcc = adev->gfx.xcc_mask ? NUM_XCC(adev->gfx.xcc_mask) : 1;
 	u32 gfx_hqd_mask = amdgpu_mes_get_hqd_mask(adev->gfx.me.num_pipe_per_me,
@@ -159,7 +159,7 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
 		adev->mes.compute_hqd_mask[i] = compute_hqd_mask;
 	}
 
-	num_pipes = adev->sdma.num_instances;
+	num_pipes = adev->sdma.num_inst_per_xcc;
 	if (num_pipes > AMDGPU_MES_MAX_SDMA_PIPES)
 		dev_warn(adev->dev, "more SDMA pipes than supported by MES! (%d vs %d)\n",
 			 num_pipes, AMDGPU_MES_MAX_SDMA_PIPES);
@@ -216,8 +216,27 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
 	if (r)
 		goto error_doorbell;
 
+	if (amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(12, 1, 0)) {
+		/* When queue/pipe reset is done in MES instead of in the
+		 * driver, MES passes hung queues information to the driver in
+		 * hung_queue_hqd_info. Calculate required space to store this
+		 * information.
+		 */
+		for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)
+			num_queues += hweight32(adev->mes.gfx_hqd_mask[i]);
+
+		for (i = 0; i < AMDGPU_MES_MAX_COMPUTE_PIPES; i++)
+			num_queues += hweight32(adev->mes.compute_hqd_mask[i]);
+
+		for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++)
+			num_queues += hweight32(adev->mes.sdma_hqd_mask[i]) * num_xcc;
+
+		adev->mes.hung_queue_hqd_info_offset = num_queues;
+		adev->mes.hung_queue_db_array_size = num_queues * 2;
+	}
+
 	if (adev->mes.hung_queue_db_array_size) {
-		for (i = 0; i < AMDGPU_MAX_MES_PIPES * num_xcc; i++) {
+		for (i = 0; i < AMDGPU_MAX_MES_PIPES; i++) {
 			r = amdgpu_bo_create_kernel(adev,
 						    adev->mes.hung_queue_db_array_size * sizeof(u32),
 						    PAGE_SIZE,
@@ -264,10 +283,10 @@ void amdgpu_mes_fini(struct amdgpu_device *adev)
 			      &adev->mes.event_log_cpu_addr);
 
 	for (i = 0; i < AMDGPU_MAX_MES_PIPES * num_xcc; i++) {
-		amdgpu_bo_free_kernel(&adev->mes.hung_queue_db_array_gpu_obj[i],
-				      &adev->mes.hung_queue_db_array_gpu_addr[i],
-				      &adev->mes.hung_queue_db_array_cpu_addr[i]);
-
+		if (adev->mes.hung_queue_db_array_gpu_obj[i])
+			amdgpu_bo_free_kernel(&adev->mes.hung_queue_db_array_gpu_obj[i],
+					      &adev->mes.hung_queue_db_array_gpu_addr[i],
+					      &adev->mes.hung_queue_db_array_cpu_addr[i]);
 		if (adev->mes.sch_ctx_ptr[i])
 			amdgpu_device_wb_free(adev, adev->mes.sch_ctx_offs[i]);
 		if (adev->mes.query_status_fence_ptr[i])
-- 
2.43.0