AMD-GFX Archive on lore.kernel.org
* [PATCH v2 00/10] drm/amdgpu: prevent concurrent GPU access during reset
@ 2024-05-28 17:23 Yunxiang Li
From: Yunxiang Li @ 2024-05-28 17:23 UTC
  To: amd-gfx
  Cc: Alexander.Deucher, christian.koenig, Likun.Gao, Hawking.Zhang,
	Yunxiang Li

If another thread accesses the GPU while the GPU is being reset, the
reset could fail. This is especially problematic on SRIOV since the host
may reset the GPU even if the guest is not yet ready.

There is code in place that tries to prevent stray access, but over
time bugs have crept in, making it unreliable. This series hopes to
address these bugs.

Likun Gao (1):
  drm/amd/amdgpu: remove unnecessary flush when enable gart

Yunxiang Li (9):
  drm/amdgpu: add skip_hw_access checks for sriov
  drm/amdgpu: fix sriov host flr handler
  drm/amdgpu: abort fence poll if reset is started
  drm/amdgpu/kfd: remove is_hws_hang and is_resetting
  drm/amdgpu: remove tlb flush in amdgpu_gtt_mgr_recover
  drm/amdgpu: use helper in amdgpu_gart_unbind
  drm/amdgpu: fix locking scope when flushing tlb
  drm/amdgpu: fix missing reset domain locks
  Revert "drm/amdgpu: Queue KFD reset workitem in VF FED"

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c      |  9 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c       | 66 ++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c   |  2 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |  8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |  7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c      | 25 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h      |  2 +
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  3 -
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c        |  3 -
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c        |  3 -
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c        |  3 -
 drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c        |  4 -
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |  2 +-
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c        |  2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         | 37 ++++-----
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c         | 37 ++++-----
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c         |  6 --
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  1 -
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 79 ++++++++-----------
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  1 -
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 11 ++-
 .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  4 +-
 .../amd/amdkfd/kfd_process_queue_manager.c    | 13 ++-
 27 files changed, 164 insertions(+), 177 deletions(-)

-- 
2.34.1



Thread overview: 52+ messages
2024-05-28 17:23 [PATCH v2 00/10] drm/amdgpu: prevent concurrent GPU access during reset Yunxiang Li
2024-05-28 17:23 ` [PATCH v2 01/10] drm/amdgpu: add skip_hw_access checks for sriov Yunxiang Li
2024-05-29  6:36   ` Christian König
2024-05-28 17:23 ` [PATCH v2 02/10] drm/amdgpu: fix sriov host flr handler Yunxiang Li
2024-05-29  6:41   ` Christian König
2024-05-28 17:23 ` [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started Yunxiang Li
2024-05-29  6:38   ` Christian König
2024-05-29 13:22     ` Li, Yunxiang (Teddy)
2024-05-29 13:31       ` Christian König
2024-05-29 13:44         ` Li, Yunxiang (Teddy)
2024-05-29 13:55           ` Christian König
2024-05-29 14:31             ` Li, Yunxiang (Teddy)
2024-05-29 14:35               ` Christian König
2024-05-29 14:48                 ` Li, Yunxiang (Teddy)
2024-05-29 15:19                   ` Christian König
2024-05-31 14:44                     ` Liu, Shaoyun
2024-06-03 10:58                       ` Christian König
2024-06-03 18:28                         ` Liu, Shaoyun
2024-06-04  8:07                           ` Christian König
2024-06-05 12:32                             ` Liu, Shaoyun
2024-05-28 17:23 ` [PATCH v2 04/10] drm/amdgpu/kfd: remove is_hws_hang and is_resetting Yunxiang Li
2024-05-29  6:41   ` Christian König
2024-05-29 23:04   ` Felix Kuehling
2024-05-30  0:06     ` Li, Yunxiang (Teddy)
2024-05-28 17:23 ` [PATCH v2 05/10] drm/amd/amdgpu: remove unnecessary flush when enable gart Yunxiang Li
2024-05-29  6:43   ` Christian König
2024-05-28 17:23 ` [PATCH v2 06/10] drm/amdgpu: remove tlb flush in amdgpu_gtt_mgr_recover Yunxiang Li
2024-05-29  6:45   ` Christian König
2024-05-28 17:23 ` [PATCH v2 07/10] drm/amdgpu: use helper in amdgpu_gart_unbind Yunxiang Li
2024-05-29  6:46   ` Christian König
2024-05-28 17:23 ` [PATCH v2 08/10] drm/amdgpu: fix locking scope when flushing tlb Yunxiang Li
2024-05-29  6:49   ` Christian König
2024-05-28 17:23 ` [PATCH v2 09/10] drm/amdgpu: fix missing reset domain locks Yunxiang Li
2024-05-29  6:55   ` Christian König
2024-05-30 22:02   ` Felix Kuehling
2024-05-30 22:35     ` Li, Yunxiang (Teddy)
2024-05-31  6:52     ` Christian König
2024-05-31 15:47       ` Felix Kuehling
2024-06-04 12:52         ` Li, Yunxiang (Teddy)
2024-05-28 17:23 ` [PATCH v2 10/10] Revert "drm/amdgpu: Queue KFD reset workitem in VF FED" Yunxiang Li
2024-05-28 19:04   ` Skvortsov, Victor
2024-05-30 21:47 ` [PATCH v3 0/8] drm/amdgpu: prevent concurrent GPU access during reset Yunxiang Li
2024-05-30 21:47   ` [PATCH v3 1/8] drm/amdgpu: add skip_hw_access checks for sriov Yunxiang Li
2024-05-30 21:47   ` [PATCH v3 2/8] drm/amdgpu: fix sriov host flr handler Yunxiang Li
2024-06-05  1:12     ` Deng, Emily
2024-05-30 21:48   ` [PATCH v3 3/8] drm/amdgpu/kfd: remove is_hws_hang and is_resetting Yunxiang Li
2024-05-30 21:48   ` [PATCH v3 4/8] drm/amd/amdgpu: remove unnecessary flush when enable gart Yunxiang Li
2024-05-30 21:48   ` [PATCH v3 5/8] drm/amdgpu: remove tlb flush in amdgpu_gtt_mgr_recover Yunxiang Li
2024-05-30 21:48   ` [PATCH v3 6/8] drm/amdgpu: use helper in amdgpu_gart_unbind Yunxiang Li
2024-05-30 21:48   ` [PATCH v3 7/8] drm/amdgpu: fix locking scope when flushing tlb Yunxiang Li
2024-05-30 21:48   ` [PATCH v3 8/8] drm/amdgpu: fix missing reset domain locks Yunxiang Li
2024-05-31  6:50     ` Christian König
