public inbox for amd-gfx@lists.freedesktop.org
* [PATCH 0/8] Support compute queue/pipe reset on gfx 12.1
@ 2026-03-20 20:02 Amber Lin
  2026-03-20 20:02 ` [PATCH 1/8] drm/amdgpu: Fix gfx_hqd_mask in mes 12.1 Amber Lin
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Amber Lin @ 2026-03-20 20:02 UTC
  To: amd-gfx; +Cc: Shaoyun.Liu, Michael.Chen, Jesse.Zhang, Amber Lin

Instead of having MES do the detection and the driver do the reset, this
series implements compute queue/pipe reset with detection and reset both
done in MES.

When REMOVE_QUEUE fails, the driver treats it as a sign that at least one
queue has hung. The driver sends SUSPEND to suspend all queues, then RESET
to reset the hung queues. MES unmaps the hung queues and stores their
information in the doorbell array and hqd_info for the driver. The driver
finds each valid doorbell offset in the doorbell array and looks up
hqd_info for that hung queue's information. Next, the driver cleans up the
hung queues and sends RESUME to resume the healthy queues.
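
The recovery sequence described above can be sketched roughly as follows.
This is an illustrative model only, not code from the series: every name
here (fake_mes, hang_report, DOORBELL_INVALID, recover_hung_queues, the
hqd_info fields) is hypothetical, the firmware is simulated in plain C, and
the "invalid doorbell" marker and array size are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the amdgpu/amdkfd API. */
#define MAX_HUNG 4
#define DOORBELL_INVALID UINT32_MAX	/* assumed marker for an unused slot */

struct hqd_info { uint32_t pipe, queue; };	/* assumed fields */

struct hang_report {
	uint32_t doorbell[MAX_HUNG];	/* doorbell offsets of hung queues */
	struct hqd_info hqd[MAX_HUNG];	/* hqd[i] describes doorbell[i] */
};

enum mes_cmd { MES_SUSPEND, MES_RESET, MES_RESUME };

/* Simulated firmware state so the flow can be exercised without hardware. */
struct fake_mes {
	bool suspended;
	int cleaned;
	uint32_t last_pipe;
};

int fake_mes_send(struct fake_mes *mes, enum mes_cmd cmd,
		  struct hang_report *report)
{
	switch (cmd) {
	case MES_SUSPEND:
		mes->suspended = true;
		return 0;
	case MES_RESET:
		if (!mes->suspended)
			return -1;	/* this sketch requires SUSPEND first */
		/* Pretend two queues hung; other slots stay invalid. */
		report->doorbell[0] = 0x10;
		report->hqd[0] = (struct hqd_info){ .pipe = 0, .queue = 3 };
		report->doorbell[2] = 0x28;
		report->hqd[2] = (struct hqd_info){ .pipe = 1, .queue = 5 };
		return 0;
	case MES_RESUME:
		mes->suspended = false;
		return 0;
	}
	return -1;
}

/* Stand-in for the driver-side cleanup of one hung queue. */
void cleanup_hung_queue(struct fake_mes *mes, uint32_t doorbell,
			const struct hqd_info *info)
{
	(void)doorbell;
	mes->cleaned++;
	mes->last_pipe = info->pipe;
}

/*
 * Called after REMOVE_QUEUE has failed.
 * Returns the number of hung queues cleaned up, or a negative error.
 */
int recover_hung_queues(struct fake_mes *mes)
{
	struct hang_report report;
	int hung = 0;
	int ret;

	/* Mark every slot unused so only entries filled by RESET count. */
	for (size_t i = 0; i < MAX_HUNG; i++)
		report.doorbell[i] = DOORBELL_INVALID;

	ret = fake_mes_send(mes, MES_SUSPEND, NULL);	/* suspend all queues */
	if (ret)
		return ret;
	ret = fake_mes_send(mes, MES_RESET, &report);	/* reset hung queues */
	if (ret)
		return ret;

	/* Each valid doorbell offset selects the matching hqd_info entry. */
	for (size_t i = 0; i < MAX_HUNG; i++) {
		if (report.doorbell[i] == DOORBELL_INVALID)
			continue;
		cleanup_hung_queue(mes, report.doorbell[i], &report.hqd[i]);
		hung++;
	}

	ret = fake_mes_send(mes, MES_RESUME, NULL);	/* resume healthy queues */
	return ret ? ret : hung;
}
```

Pre-filling the doorbell array with an invalid marker is one way to tell
which slots were written during RESET; the actual convention used between
MES and the driver may differ.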

Amber Lin (8):
  drm/amdgpu: Fix gfx_hqd_mask in mes 12.1
  drm/amdgpu: Fixup boost mes detect hang array size
  drm/amdgpu: Fixup detect and reset
  drm/amdgpu: Create hqd info structure
  drm/amdgpu: Missing multi-XCC support in MES
  drm/amdgpu: Enable suspend/resume gang in mes 12.1
  drm/amdkfd: Add detect+reset hangs to GC 12.1
  drm/amdkfd: Reset queue/pipe in MES

 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |  89 ++++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |  23 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |   2 +-
 drivers/gpu/drm/amd/amdgpu/mes_userqueue.c    |   2 +-
 drivers/gpu/drm/amd/amdgpu/mes_v12_1.c        |  98 ++++++++----
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 151 +++++++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |   1 +
 8 files changed, 306 insertions(+), 61 deletions(-)

-- 
2.43.0


* [PATCH 0/8] Support compute queue/pipe reset on gfx 12.1
@ 2026-03-24 17:56 Amber Lin
  0 siblings, 0 replies; 22+ messages in thread
From: Amber Lin @ 2026-03-24 17:56 UTC
  To: amd-gfx, alexdeucher; +Cc: Shaoyun.Liu, Michael.Chen, Jesse.Zhang, Amber Lin


end of thread, other threads:[~2026-03-24 17:57 UTC | newest]

Thread overview: 22+ messages
2026-03-20 20:02 [PATCH 0/8] Support compute queue/pipe reset on gfx 12.1 Amber Lin
2026-03-20 20:02 ` [PATCH 1/8] drm/amdgpu: Fix gfx_hqd_mask in mes 12.1 Amber Lin
2026-03-23 19:03   ` Alex Deucher
2026-03-20 20:02 ` [PATCH 2/8] drm/amdgpu: Fixup boost mes detect hang array size Amber Lin
2026-03-23 19:04   ` Alex Deucher
2026-03-23 19:15     ` Amber Lin
2026-03-20 20:02 ` [PATCH 3/8] drm/amdgpu: Fixup detect and reset Amber Lin
2026-03-23 19:07   ` Alex Deucher
2026-03-20 20:02 ` [PATCH 4/8] drm/amdgpu: Create hqd info structure Amber Lin
2026-03-23 19:01   ` Alex Deucher
2026-03-23 19:11     ` Amber Lin
2026-03-20 20:02 ` [PATCH 5/8] drm/amdgpu: Missing multi-XCC support in MES Amber Lin
2026-03-23 19:10   ` Alex Deucher
2026-03-23 19:19     ` Amber Lin
2026-03-20 20:02 ` [PATCH 6/8] drm/amdgpu: Enable suspend/resume gang in mes 12.1 Amber Lin
2026-03-23 19:11   ` Alex Deucher
2026-03-20 20:02 ` [PATCH 7/8] drm/amdkfd: Add detect+reset hangs to GC 12.1 Amber Lin
2026-03-23 19:12   ` Alex Deucher
2026-03-20 20:02 ` [PATCH 8/8] drm/amdkfd: Reset queue/pipe in MES Amber Lin
2026-03-23 19:21   ` Alex Deucher
2026-03-23 19:42     ` Amber Lin
  -- strict thread matches above, loose matches on Subject: below --
2026-03-24 17:56 [PATCH 0/8] Support compute queue/pipe reset on gfx 12.1 Amber Lin
