* [PATCH 0/5] Add work pool to reset domain
From: Lijo Lazar @ 2023-08-11  6:02 UTC
  To: amd-gfx; +Cc: Alexander.Deucher, Asad.Kamal, Hawking.Zhang

Presently, there are multiple reset clients, such as RAS, job timeout, KFD hang
detection and the debug method. Instead of each client maintaining its own work
item, the reset work pool is moved to the reset domain. When a client makes a
recovery request, the reset domain allocates a work item and queues it for
execution. For job timeouts, each ring has its own TDR queue to which TDR work
is scheduled. From there, it is further queued to a reset domain based on the
device configuration.
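
The pool idea can be illustrated with a small user-space sketch. It is not the
amdgpu code: every name here (reset_work, reset_domain, RESET_POOL_SIZE,
reset_domain_schedule, reset_domain_run_pending, count_reset) is hypothetical,
and the kernel workqueue and locking that a real implementation would need are
replaced by a single-threaded loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool size; the series does not state one here. */
#define RESET_POOL_SIZE 4

typedef void (*reset_handler_t)(void *ctx);

/* One pre-allocated work slot owned by the reset domain. */
struct reset_work {
	bool in_use;              /* slot is claimed by a pending request */
	reset_handler_t handler;  /* client-specific recovery routine */
	void *ctx;                /* opaque client data */
};

/* The domain, not the client, owns the work items. */
struct reset_domain {
	struct reset_work pool[RESET_POOL_SIZE];
};

/* Claim a free slot; returns NULL when the pool is exhausted. */
struct reset_work *reset_domain_get_work(struct reset_domain *d)
{
	for (size_t i = 0; i < RESET_POOL_SIZE; i++) {
		if (!d->pool[i].in_use) {
			d->pool[i].in_use = true;
			return &d->pool[i];
		}
	}
	return NULL;
}

/* Client entry point: allocate a work item and record the request. */
bool reset_domain_schedule(struct reset_domain *d,
			   reset_handler_t handler, void *ctx)
{
	struct reset_work *w = reset_domain_get_work(d);

	if (!w)
		return false;
	w->handler = handler;
	w->ctx = ctx;
	return true;
}

/* Run pending items in slot order and release each slot; returns the count. */
int reset_domain_run_pending(struct reset_domain *d)
{
	int ran = 0;

	for (size_t i = 0; i < RESET_POOL_SIZE; i++) {
		if (d->pool[i].in_use) {
			d->pool[i].handler(d->pool[i].ctx);
			d->pool[i].in_use = false;
			ran++;
		}
	}
	return ran;
}

/* Example client handler: counts how many resets were executed. */
void count_reset(void *ctx)
{
	(*(int *)ctx)++;
}
```

A client such as RAS or job timeout would only call reset_domain_schedule()
with its handler; it no longer needs to embed a work item of its own.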

This allows the flexibility of having multiple reset domains. For example, when
there are partitions, each partition can maintain its own reset domain, so a job
timeout on one partition doesn't affect jobs on another partition (provided the
jobs don't have any interdependency). The reset logic selects the
appropriate reset domain based on the current device configuration.
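
One way to picture the domain selection is the sketch below. The cover letter
does not describe the actual selection logic, so the rule used here (one
device-wide domain when unpartitioned, one domain per partition otherwise) is
an assumption, and all names (sketch_domain, device_cfg, MAX_PARTITIONS,
select_reset_domain) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical upper bound on partitions. */
#define MAX_PARTITIONS 8

/* Stand-in for a reset domain; only identity matters in this sketch. */
struct sketch_domain {
	int id;
};

struct device_cfg {
	int num_partitions;  /* 0 or 1: device is not partitioned */
	struct sketch_domain device_domain;
	struct sketch_domain partition_domain[MAX_PARTITIONS];
};

/*
 * Pick the domain a recovery request should be queued to.
 * Assumed rule: unpartitioned devices share one domain; partitioned
 * devices use the domain of the partition that raised the request.
 * Returns NULL for an out-of-range partition id.
 */
struct sketch_domain *select_reset_domain(struct device_cfg *dev,
					  int partition_id)
{
	if (dev->num_partitions <= 1)
		return &dev->device_domain;
	if (partition_id < 0 || partition_id >= dev->num_partitions ||
	    partition_id >= MAX_PARTITIONS)
		return NULL;
	return &dev->partition_domain[partition_id];
}
```

Under this assumption, a timeout on partition 0 is queued to a different
domain than one on partition 1, so the two recoveries do not serialize
against each other.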

Lijo Lazar (5):
  drm/amdgpu: Add work pool to reset domain
  drm/amdgpu: Move to reset_schedule_work
  drm/amdgpu: Set flags to cancel all pending resets
  drm/amdgpu: Add API to queue and do reset work
  drm/amdgpu: Add TDR queue for ring

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   2 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  32 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  24 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  40 +++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  16 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  71 ++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  | 122 ++++++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  |  32 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |   5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |   1 -
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c      |  38 +++----
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c      |  44 ++++----
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c      |  33 +++---
 15 files changed, 285 insertions(+), 177 deletions(-)

-- 
2.25.1


Thread overview: 9+ messages
2023-08-11  6:02 [PATCH 0/5] Add work pool to reset domain Lijo Lazar
2023-08-11  6:02 ` [PATCH 1/5] drm/amdgpu: " Lijo Lazar
2023-08-11  6:02 ` [PATCH 2/5] drm/amdgpu: Move to reset_schedule_work Lijo Lazar
2023-08-11  6:02 ` [PATCH 3/5] drm/amdgpu: Set flags to cancel all pending resets Lijo Lazar
2023-08-11  6:02 ` [PATCH 4/5] drm/amdgpu: Add API to queue and do reset work Lijo Lazar
2023-08-11  6:02 ` [PATCH 5/5] drm/amdgpu: Add TDR queue for ring Lijo Lazar
2023-08-12  8:23 ` [PATCH 0/5] Add work pool to reset domain Christian König
2023-08-12 17:08   ` Lazar, Lijo
2023-08-14 11:55     ` Christian König
