public inbox for amd-gfx@lists.freedesktop.org
* [PATCH 0/8] Support compute queue/pipe reset on gfx 12.1
@ 2026-03-24 17:56 Amber Lin
  2026-03-24 17:56 ` [PATCH v2 01/10] drm/amdgpu: Fix gfx_hqd_mask in mes 12.1 Amber Lin
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Amber Lin @ 2026-03-24 17:56 UTC
  To: amd-gfx, alexdeucher; +Cc: Shaoyun.Liu, Michael.Chen, Jesse.Zhang, Amber Lin

Previously, MES did the detection and the driver did the reset. This
series implements compute queue/pipe reset with both detection and
reset done in MES.

When REMOVE_QUEUE fails, the driver takes it to mean that at least one
queue has hung. The driver sends SUSPEND to suspend all queues, then
RESET to reset the hung queues. MES unmaps the hung queues and stores
their information in the doorbell array and hqd_info for the driver.
The driver finds each valid doorbell offset in the doorbell array and
looks up hqd_info for that hung queue's information. Next, the driver
cleans up the hung queues and sends RESUME to resume the healthy queues.
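
For readers following the sequence above, here is a minimal userspace
sketch of the control flow, not the code from this series. Every name
in it (sketch_recover, sketch_mes_ops, sketch_reset_result, the array
size, and the "invalid doorbell" sentinel) is hypothetical and chosen
for illustration only. It assumes the driver pre-fills the doorbell
array with a sentinel so that valid entries can be told apart, which
the cover letter does not spell out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size and sentinel; the real values are defined by MES. */
#define SKETCH_MAX_HUNG 4
#define SKETCH_INVALID_DOORBELL UINT32_MAX

/* Hypothetical per-queue record that MES fills in for a hung queue. */
struct sketch_hqd_info {
	uint32_t pipe;
	uint32_t queue;
};

/* Hypothetical buffers handed to MES along with the RESET request. */
struct sketch_reset_result {
	uint32_t doorbell[SKETCH_MAX_HUNG];
	struct sketch_hqd_info hqd_info[SKETCH_MAX_HUNG];
};

/* Each callback stands in for one request; 0 on success, negative on error. */
struct sketch_mes_ops {
	int (*suspend)(void *ctx);
	int (*reset)(void *ctx, struct sketch_reset_result *out);
	int (*resume)(void *ctx);
	void (*cleanup)(void *ctx, uint32_t doorbell,
			const struct sketch_hqd_info *info);
};

/*
 * Run after REMOVE_QUEUE has failed. Returns the number of hung queues
 * cleaned up, or a negative error from one of the requests.
 */
int sketch_recover(const struct sketch_mes_ops *ops, void *ctx)
{
	struct sketch_reset_result res = {0};
	int cleaned = 0;
	int ret;
	size_t i;

	/* Assumption: mark every slot invalid before MES writes results. */
	for (i = 0; i < SKETCH_MAX_HUNG; i++)
		res.doorbell[i] = SKETCH_INVALID_DOORBELL;

	ret = ops->suspend(ctx);		/* SUSPEND all queues */
	if (ret)
		return ret;
	ret = ops->reset(ctx, &res);		/* RESET hung queues */
	if (ret)
		return ret;

	/* A valid doorbell offset in slot i means hqd_info[i] is valid too. */
	for (i = 0; i < SKETCH_MAX_HUNG; i++) {
		if (res.doorbell[i] == SKETCH_INVALID_DOORBELL)
			continue;
		ops->cleanup(ctx, res.doorbell[i], &res.hqd_info[i]);
		cleaned++;
	}

	ret = ops->resume(ctx);			/* RESUME healthy queues */
	return ret ? ret : cleaned;
}

/* Fake MES, for illustration only: reports one hung queue in slot 1. */
struct sketch_fake_mes {
	int suspended;
	int resumed;
	uint32_t last_doorbell;
};

static int fake_suspend(void *ctx)
{
	((struct sketch_fake_mes *)ctx)->suspended++;
	return 0;
}

static int fake_reset(void *ctx, struct sketch_reset_result *out)
{
	(void)ctx;
	out->doorbell[1] = 0x120;
	out->hqd_info[1].pipe = 0;
	out->hqd_info[1].queue = 3;
	return 0;
}

static int fake_resume(void *ctx)
{
	((struct sketch_fake_mes *)ctx)->resumed++;
	return 0;
}

static void fake_cleanup(void *ctx, uint32_t doorbell,
			 const struct sketch_hqd_info *info)
{
	(void)info;
	((struct sketch_fake_mes *)ctx)->last_doorbell = doorbell;
}

const struct sketch_mes_ops sketch_fake_ops = {
	.suspend = fake_suspend,
	.reset = fake_reset,
	.resume = fake_resume,
	.cleanup = fake_cleanup,
};
```

Calling sketch_recover(&sketch_fake_ops, &fake) on a zero-initialized
struct sketch_fake_mes walks the SUSPEND, RESET, scan, cleanup, RESUME
order described above. What the driver should do when SUSPEND or RESET
itself fails is left out of the sketch; it simply returns the error.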

Amber Lin (8):
  drm/amdgpu: Fix gfx_hqd_mask in mes 12.1
  drm/amdgpu: Fixup boost mes detect hang array size
  drm/amdgpu: Fixup detect and reset
  drm/amdgpu: Create hqd info structure
  drm/amdgpu: Missing multi-XCC support in MES
  drm/amdgpu: Enable suspend/resume gang in mes 12.1
  drm/amdkfd: Add detect+reset hangs to GC 12.1
  drm/amdkfd: Reset queue/pipe in MES

 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |  89 ++++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |  23 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |   2 +-
 drivers/gpu/drm/amd/amdgpu/mes_userqueue.c    |   2 +-
 drivers/gpu/drm/amd/amdgpu/mes_v12_1.c        |  98 ++++++++----
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 151 +++++++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |   1 +
 8 files changed, 306 insertions(+), 61 deletions(-)

-- 
2.43.0



end of thread, other threads:[~2026-03-26 21:35 UTC | newest]

Thread overview: 25+ messages
2026-03-24 17:56 [PATCH 0/8] Support compute queue/pipe reset on gfx 12.1 Amber Lin
2026-03-24 17:56 ` [PATCH v2 01/10] drm/amdgpu: Fix gfx_hqd_mask in mes 12.1 Amber Lin
2026-03-24 17:56 ` [PATCH v2 02/10] drm/amdgpu: Fixup boost mes detect hang array size Amber Lin
2026-03-26 18:03   ` Alex Deucher
2026-03-24 17:56 ` [PATCH v2 03/10] drm/amdgpu: Fixup detect and reset Amber Lin
2026-03-24 17:56 ` [PATCH v2 04/10] drm/amdgpu: Create hqd info structure Amber Lin
2026-03-26 17:56   ` Alex Deucher
2026-03-26 20:34     ` Amber Lin
2026-03-24 17:56 ` [PATCH v2 05/10] drm/amdgpu: Update mes 12.1's suspend/resume Amber Lin
2026-03-26 17:57   ` Alex Deucher
2026-03-24 17:56 ` [PATCH v2 06/10] drm/amdgpu: Missing multi-XCC support in MES Amber Lin
2026-03-26 18:02   ` Alex Deucher
2026-03-24 17:56 ` [PATCH v2 07/10] drm/amdgpu: Enable suspend/resume gang in mes 12.1 Amber Lin
2026-03-26 18:03   ` Alex Deucher
2026-03-24 17:56 ` [PATCH v2 08/10] drm/amdkfd: Add detect+reset hangs to GC 12.1 Amber Lin
2026-03-24 17:56 ` [PATCH v2 09/10] drm/amdkfd: Reset queue/pipe in MES Amber Lin
2026-03-26 16:06   ` Liu, Shaoyun
2026-03-26 17:31     ` Amber Lin
2026-03-26 18:19       ` Liu, Shaoyun
2026-03-26 18:51   ` Alex Deucher
2026-03-26 19:40     ` Amber Lin
2026-03-26 21:08       ` Alex Deucher
2026-03-26 21:35         ` Amber Lin
2026-03-24 17:56 ` [PATCH v2 10/10] drm/amdkfd: Queue reset support in KFD topology Amber Lin
2026-03-26 18:27   ` Alex Deucher
