From: Amber Lin <Amber.Lin@amd.com>
To: Alex Deucher <alexdeucher@gmail.com>
Cc: amd-gfx@lists.freedesktop.org, Shaoyun.Liu@amd.com,
	Michael.Chen@amd.com,  Jesse.Zhang@amd.com,
	Jonathan Kim <jonathan.kim@amd.com>
Subject: Re: [PATCH 2/8] drm/amdgpu: Fixup boost mes detect hang array size
Date: Mon, 23 Mar 2026 15:15:24 -0400
Message-ID: <6797379d-2f65-49b7-8826-0763efc4a158@amd.com>
In-Reply-To: <CADnq5_OxDUYro8TqWQFuJ1qE9MRwoC1j6=ac_D1moP2Mfss+4Q@mail.gmail.com>


On 3/23/26 15:04, Alex Deucher wrote:
> On Fri, Mar 20, 2026 at 4:09 PM Amber Lin <Amber.Lin@amd.com> wrote:
>> When allocating memory for the hung queues, we need to take the number of
>> queues into account to cover the worst-case hang.
>>
>> Suggested-by: Jonathan Kim <jonathan.kim@amd.com>
>> Signed-off-by: Amber Lin <Amber.Lin@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 33 +++++++++++++++++++------
>>   1 file changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> index 0d4c77c1b4b5..b68bf4a9cb40 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> @@ -103,7 +103,7 @@ static inline u32 amdgpu_mes_get_hqd_mask(u32 num_pipe,
>>
>>   int amdgpu_mes_init(struct amdgpu_device *adev)
>>   {
>> -       int i, r, num_pipes;
>> +       int i, r, num_pipes, num_queues = 0;
>>          u32 total_vmid_mask, reserved_vmid_mask;
>>          int num_xcc = adev->gfx.xcc_mask ? NUM_XCC(adev->gfx.xcc_mask) : 1;
>>          u32 gfx_hqd_mask = amdgpu_mes_get_hqd_mask(adev->gfx.me.num_pipe_per_me,
>> @@ -159,7 +159,7 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
>>                  adev->mes.compute_hqd_mask[i] = compute_hqd_mask;
>>          }
>>
>> -       num_pipes = adev->sdma.num_instances;
>> +       num_pipes = adev->sdma.num_inst_per_xcc;
>>          if (num_pipes > AMDGPU_MES_MAX_SDMA_PIPES)
>>                  dev_warn(adev->dev, "more SDMA pipes than supported by MES! (%d vs %d)\n",
>>                           num_pipes, AMDGPU_MES_MAX_SDMA_PIPES);
>> @@ -216,8 +216,27 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
>>          if (r)
>>                  goto error_doorbell;
>>
>> +       if (amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(12, 0, 0)) {
> Is this 12.0 and higher or 12.1 and higher?
>
> Alex
Thank you for the catch. Yes, it should be 12.1 for now, until MES 12
support is available too. I'll fix it in v2.

Amber
>
>> +               /* When queue/pipe reset is done in MES instead of in the
>> +                * driver, MES passes hung queues information to the driver in
>> +                * hung_queue_hqd_info. Calculate required space to store this
>> +                * information.
>> +                */
>> +               for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)
>> +                       num_queues += hweight32(adev->mes.gfx_hqd_mask[i]);
>> +
>> +               for (i = 0; i < AMDGPU_MES_MAX_COMPUTE_PIPES; i++)
>> +                       num_queues += hweight32(adev->mes.compute_hqd_mask[i]);
>> +
>> +               for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++)
>> +                       num_queues += hweight32(adev->mes.sdma_hqd_mask[i]) * num_xcc;
>> +
>> +               adev->mes.hung_queue_hqd_info_offset = num_queues;
>> +               adev->mes.hung_queue_db_array_size = num_queues * 2;
>> +       }
>> +
>>          if (adev->mes.hung_queue_db_array_size) {
>> -               for (i = 0; i < AMDGPU_MAX_MES_PIPES * num_xcc; i++) {
>> +               for (i = 0; i < AMDGPU_MAX_MES_PIPES; i++) {
>>                          r = amdgpu_bo_create_kernel(adev,
>>                                                      adev->mes.hung_queue_db_array_size * sizeof(u32),
>>                                                      PAGE_SIZE,
>> @@ -264,10 +283,10 @@ void amdgpu_mes_fini(struct amdgpu_device *adev)
>>                                &adev->mes.event_log_cpu_addr);
>>
>>          for (i = 0; i < AMDGPU_MAX_MES_PIPES * num_xcc; i++) {
>> -               amdgpu_bo_free_kernel(&adev->mes.hung_queue_db_array_gpu_obj[i],
>> -                                     &adev->mes.hung_queue_db_array_gpu_addr[i],
>> -                                     &adev->mes.hung_queue_db_array_cpu_addr[i]);
>> -
>> +               if (adev->mes.hung_queue_db_array_gpu_obj[i])
>> +                        amdgpu_bo_free_kernel(&adev->mes.hung_queue_db_array_gpu_obj[i],
>> +                                        &adev->mes.hung_queue_db_array_gpu_addr[i],
>> +                                        &adev->mes.hung_queue_db_array_cpu_addr[i]);
>>                  if (adev->mes.sch_ctx_ptr[i])
>>                          amdgpu_device_wb_free(adev, adev->mes.sch_ctx_offs[i]);
>>                  if (adev->mes.query_status_fence_ptr[i])
>> --
>> 2.43.0
>>
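
The sizing logic in the quoted hunk, together with the 12.1 gate agreed on
above, can be sketched outside the kernel as follows. This is a minimal
illustration under stated assumptions, not the driver code: every sketch_*
name and SKETCH_IP_VERSION are hypothetical, sketch_hweight32() is a portable
stand-in for the kernel's hweight32(), and the version macro only preserves
ordering (the real IP_VERSION() bit layout may differ). The factor of 2 and
the num_xcc multiplier on SDMA queues are taken directly from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's IP_VERSION(); only the ordering of
 * versions matters for this sketch, the real bit layout may differ.
 */
#define SKETCH_IP_VERSION(mj, mn, rv) \
	(((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/* Portable stand-in for hweight32(): count the set bits in a 32-bit word. */
static unsigned int sketch_hweight32(uint32_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Gate as agreed in review: GC 12.1 and newer, not 12.0. */
static int sketch_needs_hqd_info(uint32_t gc_ip_version)
{
	return gc_ip_version >= SKETCH_IP_VERSION(12, 1, 0);
}

/* Worst case: every queue enabled in the HQD masks is hung at once.
 * SDMA queues are multiplied by num_xcc, as in the patch.
 */
static unsigned int sketch_count_queues(const uint32_t *gfx, size_t n_gfx,
					const uint32_t *compute, size_t n_compute,
					const uint32_t *sdma, size_t n_sdma,
					unsigned int num_xcc)
{
	unsigned int num_queues = 0;
	size_t i;

	for (i = 0; i < n_gfx; i++)
		num_queues += sketch_hweight32(gfx[i]);

	for (i = 0; i < n_compute; i++)
		num_queues += sketch_hweight32(compute[i]);

	for (i = 0; i < n_sdma; i++)
		num_queues += sketch_hweight32(sdma[i]) * num_xcc;

	return num_queues;
}

/* The patch stores num_queues as hung_queue_hqd_info_offset and sizes the
 * array at twice that many u32 entries.
 */
static size_t sketch_db_array_bytes(unsigned int num_queues)
{
	return (size_t)num_queues * 2 * sizeof(uint32_t);
}
```

For example, with made-up masks of one gfx queue, four compute pipes with four
queues each, and two SDMA pipes with two queues each on two XCCs, the count is
1 + 16 + 8 = 25 queues, so the array holds 50 u32 entries (200 bytes).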

