* [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves
@ 2025-11-21 10:12 Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 01/28] drm/amdgpu: give each kernel job a unique id Pierre-Eric Pelloux-Prayer
` (27 more replies)
0 siblings, 28 replies; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
Cc: Pierre-Eric Pelloux-Prayer, Christian König, Alex Deucher,
David Airlie, Felix Kuehling, Harry Wentland, Huang Rui, Leo Li,
Maarten Lankhorst, Maxime Ripard, Simona Vetter, Sumit Semwal,
Thomas Zimmermann, amd-gfx, dri-devel, linaro-mm-sig,
linux-kernel, linux-media
The drm/ttm patch modifies TTM to support multiple contexts for the pipelined moves.
Then amdgpu/ttm is updated to express dependencies between jobs explicitly,
instead of relying on the execution ordering guaranteed by the use of a single
instance.
With all of this in place, we can use multiple entities, with each having access
to the available SDMA instances.
This rework also gives the opportunity to merge the clear functions into a single
one and to optimize GART usage a bit.
(The first patch of the series has already been merged through drm-misc but I'm
including it here to reduce conflicts)
For v3 I've kept the series as a whole but I've reorganized the patches so that
everything up to the drm/ttm change can be merged through amd-staging-drm-next
once reviewed.
v3:
- shuffled the patches: everything up to the drm/ttm patch has no dependency
on the ttm change and can be merged independently
- split "drm/amdgpu: pass the entity to use to ttm functions" in 2 commits
- moved AMDGPU_GTT_NUM_TRANSFER_WINDOWS removal to its own commit
- added a ttm job submission helper
- addressed comments from Christian and Felix
v2:
- addressed comments from Christian
- dropped "drm/amdgpu: prepare amdgpu_fill_buffer to use N entities" and
"drm/amdgpu: use multiple entities in amdgpu_fill_buffer"
- added "drm/amdgpu: handle resv dependencies in amdgpu_ttm_map_buffer",
"drm/amdgpu: round robin through clear_entities in amdgpu_fill_buffer"
- reworked how sdma rings/scheds are passed to amdgpu_ttm
v1: https://lists.freedesktop.org/archives/dri-devel/2025-November/534517.html
Pierre-Eric Pelloux-Prayer (28):
drm/amdgpu: give each kernel job a unique id
drm/amdgpu: use ttm_resource_manager_cleanup
drm/amdgpu: remove direct_submit arg from amdgpu_copy_buffer
drm/amdgpu: remove the ring param from ttm functions
drm/amdgpu: introduce amdgpu_ttm_buffer_entity
drm/amdgpu: add amdgpu_ttm_job_submit helper
drm/amdgpu: fix error handling in amdgpu_copy_buffer
drm/amdgpu: pass the entity to use to amdgpu_ttm_map_buffer
drm/amdgpu: pass the entity to use to ttm public functions
drm/amdgpu: add amdgpu_device argument to ttm functions that need it
drm/amdgpu: statically assign gart windows to ttm entities
drm/amdgpu: remove AMDGPU_GTT_NUM_TRANSFER_WINDOWS
drm/amdgpu: add missing lock when using ttm entities
drm/amdgpu: check entity lock is held in amdgpu_ttm_job_submit
drm/amdgpu: double AMDGPU_GTT_MAX_TRANSFER_SIZE
drm/amdgpu: use larger gart window when possible
drm/amdgpu: introduce amdgpu_sdma_set_vm_pte_scheds
drm/amdgpu: move sched status check inside
amdgpu_ttm_set_buffer_funcs_status
drm/ttm: rework pipelined eviction fence handling
drm/amdgpu: allocate multiple clear entities
drm/amdgpu: allocate multiple move entities
drm/amdgpu: round robin through clear_entities in amdgpu_fill_buffer
drm/amdgpu: use TTM_NUM_MOVE_FENCES when reserving fences
drm/amdgpu: use multiple entities in amdgpu_move_blit
drm/amdgpu: pass all the sdma scheds to amdgpu_mman
drm/amdgpu: give ttm entities access to all the sdma scheds
drm/amdgpu: get rid of amdgpu_ttm_clear_buffer
drm/amdgpu: rename amdgpu_fill_buffer as amdgpu_ttm_clear_buffer
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 +
drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 8 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 14 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 8 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 19 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 493 +++++++++++-------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 58 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 11 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 8 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 26 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 12 +-
drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 34 +-
drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 34 +-
drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 34 +-
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 41 +-
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 41 +-
drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 37 +-
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 37 +-
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 32 +-
drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 32 +-
drivers/gpu/drm/amd/amdgpu/si_dma.c | 34 +-
drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 6 +-
drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 6 +-
drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 12 +-
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 33 +-
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 +-
.../amd/display/amdgpu_dm/amdgpu_dm_plane.c | 6 +-
.../drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c | 6 +-
.../gpu/drm/ttm/tests/ttm_bo_validate_test.c | 11 +-
drivers/gpu/drm/ttm/tests/ttm_resource_test.c | 5 +-
drivers/gpu/drm/ttm/ttm_bo.c | 47 +-
drivers/gpu/drm/ttm/ttm_bo_util.c | 38 +-
drivers/gpu/drm/ttm/ttm_resource.c | 31 +-
include/drm/ttm/ttm_resource.h | 29 +-
47 files changed, 706 insertions(+), 615 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v3 01/28] drm/amdgpu: give each kernel job a unique id
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 12:51 ` Christian König
2025-11-21 20:02 ` Felix Kuehling
2025-11-21 10:12 ` [PATCH v3 02/28] drm/amdgpu: use ttm_resource_manager_cleanup Pierre-Eric Pelloux-Prayer
` (26 subsequent siblings)
27 siblings, 2 replies; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Felix Kuehling
Cc: Pierre-Eric Pelloux-Prayer, Arunpravin Paneer Selvam, amd-gfx,
dri-devel, linux-kernel
Userspace jobs use drm_file.client_id as a unique identifier
for the job's owner. For kernel jobs, we can allocate arbitrary
values - the risk of overlap with userspace ids is small (given
that it's a u64 value).
In the unlikely case an overlap happens, it will only impact
trace events.
Since this ID is traced in the gpu_scheduler trace events, it
makes it possible to determine the source of each job sent to
the hardware.
To make grepping easier, the IDs are defined as they will appear
in the trace output.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://lore.kernel.org/r/20250604122827.2191-1-pierre-eric.pelloux-prayer@amd.com
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 19 +++++++++++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 28 +++++++++++++--------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 5 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 8 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c | 4 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 4 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 12 +++++----
drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 6 +++--
drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 6 +++--
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++-
19 files changed, 84 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 5a1904b0b064..1ffbd416a8ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1551,7 +1551,8 @@ static int amdgpu_gfx_run_cleaner_shader_job(struct amdgpu_ring *ring)
owner = (void *)(unsigned long)atomic_inc_return(&counter);
r = amdgpu_job_alloc_with_ib(ring->adev, &entity, owner,
- 64, 0, &job);
+ 64, 0, &job,
+ AMDGPU_KERNEL_JOB_ID_CLEANER_SHADER);
if (r)
goto err;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 0017bd10d452..ea8ec160b98a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -690,7 +690,7 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
r = amdgpu_job_alloc_with_ib(ring->adev, &adev->mman.high_pr,
AMDGPU_FENCE_OWNER_UNDEFINED,
16 * 4, AMDGPU_IB_POOL_IMMEDIATE,
- &job);
+ &job, AMDGPU_KERNEL_JOB_ID_FLUSH_GPU_TLB);
if (r)
goto error_alloc;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index efa3281145f6..b284bd8021df 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -232,11 +232,12 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
struct drm_sched_entity *entity, void *owner,
size_t size, enum amdgpu_ib_pool_type pool_type,
- struct amdgpu_job **job)
+ struct amdgpu_job **job, u64 k_job_id)
{
int r;
- r = amdgpu_job_alloc(adev, NULL, entity, owner, 1, job, 0);
+ r = amdgpu_job_alloc(adev, NULL, entity, owner, 1, job,
+ k_job_id);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index d25f1fcf0242..7abf069d17d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -44,6 +44,22 @@
struct amdgpu_fence;
enum amdgpu_ib_pool_type;
+/* Internal kernel job ids. (decreasing values, starting from U64_MAX). */
+#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE (18446744073709551615ULL)
+#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_PDES (18446744073709551614ULL)
+#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_RANGE (18446744073709551613ULL)
+#define AMDGPU_KERNEL_JOB_ID_VM_PT_CLEAR (18446744073709551612ULL)
+#define AMDGPU_KERNEL_JOB_ID_TTM_MAP_BUFFER (18446744073709551611ULL)
+#define AMDGPU_KERNEL_JOB_ID_TTM_ACCESS_MEMORY_SDMA (18446744073709551610ULL)
+#define AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER (18446744073709551609ULL)
+#define AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE (18446744073709551608ULL)
+#define AMDGPU_KERNEL_JOB_ID_MOVE_BLIT (18446744073709551607ULL)
+#define AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER (18446744073709551606ULL)
+#define AMDGPU_KERNEL_JOB_ID_CLEANER_SHADER (18446744073709551605ULL)
+#define AMDGPU_KERNEL_JOB_ID_FLUSH_GPU_TLB (18446744073709551604ULL)
+#define AMDGPU_KERNEL_JOB_ID_KFD_GART_MAP (18446744073709551603ULL)
+#define AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST (18446744073709551602ULL)
+
struct amdgpu_job {
struct drm_sched_job base;
struct amdgpu_vm *vm;
@@ -97,7 +113,8 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
struct drm_sched_entity *entity, void *owner,
size_t size, enum amdgpu_ib_pool_type pool_type,
- struct amdgpu_job **job);
+ struct amdgpu_job **job,
+ u64 k_job_id);
void amdgpu_job_set_resources(struct amdgpu_job *job, struct amdgpu_bo *gds,
struct amdgpu_bo *gws, struct amdgpu_bo *oa);
void amdgpu_job_free_resources(struct amdgpu_job *job);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
index 91678621f1ff..63ee6ba6a931 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
@@ -196,7 +196,8 @@ static int amdgpu_jpeg_dec_set_reg(struct amdgpu_ring *ring, uint32_t handle,
int i, r;
r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
- AMDGPU_IB_POOL_DIRECT, &job);
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 24ebba43a469..926a3f09a776 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1322,7 +1322,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
if (r)
goto out;
- r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true);
+ r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true,
+ AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
if (WARN_ON(r))
goto out;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 04a79ef05f90..6a1434391fb8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -226,7 +226,8 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
r = amdgpu_job_alloc_with_ib(adev, &adev->mman.high_pr,
AMDGPU_FENCE_OWNER_UNDEFINED,
num_dw * 4 + num_bytes,
- AMDGPU_IB_POOL_DELAYED, &job);
+ AMDGPU_IB_POOL_DELAYED, &job,
+ AMDGPU_KERNEL_JOB_ID_TTM_MAP_BUFFER);
if (r)
return r;
@@ -398,7 +399,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
struct dma_fence *wipe_fence = NULL;
r = amdgpu_fill_buffer(abo, 0, NULL, &wipe_fence,
- false);
+ false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
if (r) {
goto error;
} else if (wipe_fence) {
@@ -1480,7 +1481,8 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
r = amdgpu_job_alloc_with_ib(adev, &adev->mman.high_pr,
AMDGPU_FENCE_OWNER_UNDEFINED,
num_dw * 4, AMDGPU_IB_POOL_DELAYED,
- &job);
+ &job,
+ AMDGPU_KERNEL_JOB_ID_TTM_ACCESS_MEMORY_SDMA);
if (r)
goto out;
@@ -2204,7 +2206,7 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
struct dma_resv *resv,
bool vm_needs_flush,
struct amdgpu_job **job,
- bool delayed)
+ bool delayed, u64 k_job_id)
{
enum amdgpu_ib_pool_type pool = direct_submit ?
AMDGPU_IB_POOL_DIRECT :
@@ -2214,7 +2216,7 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
&adev->mman.high_pr;
r = amdgpu_job_alloc_with_ib(adev, entity,
AMDGPU_FENCE_OWNER_UNDEFINED,
- num_dw * 4, pool, job);
+ num_dw * 4, pool, job, k_job_id);
if (r)
return r;
@@ -2254,7 +2256,8 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
num_loops = DIV_ROUND_UP(byte_count, max_bytes);
num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8);
r = amdgpu_ttm_prepare_job(adev, direct_submit, num_dw,
- resv, vm_needs_flush, &job, false);
+ resv, vm_needs_flush, &job, false,
+ AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER);
if (r)
return r;
@@ -2289,7 +2292,8 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data,
uint64_t dst_addr, uint32_t byte_count,
struct dma_resv *resv,
struct dma_fence **fence,
- bool vm_needs_flush, bool delayed)
+ bool vm_needs_flush, bool delayed,
+ u64 k_job_id)
{
struct amdgpu_device *adev = ring->adev;
unsigned int num_loops, num_dw;
@@ -2302,7 +2306,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data,
num_loops = DIV_ROUND_UP_ULL(byte_count, max_bytes);
num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->fill_num_dw, 8);
r = amdgpu_ttm_prepare_job(adev, false, num_dw, resv, vm_needs_flush,
- &job, delayed);
+ &job, delayed, k_job_id);
if (r)
return r;
@@ -2372,7 +2376,8 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
goto err;
r = amdgpu_ttm_fill_mem(ring, 0, addr, size, resv,
- &next, true, true);
+ &next, true, true,
+ AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
if (r)
goto err;
@@ -2391,7 +2396,8 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
uint32_t src_data,
struct dma_resv *resv,
struct dma_fence **f,
- bool delayed)
+ bool delayed,
+ u64 k_job_id)
{
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
@@ -2421,7 +2427,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
goto error;
r = amdgpu_ttm_fill_mem(ring, src_data, to, cur_size, resv,
- &next, true, delayed);
+ &next, true, delayed, k_job_id);
if (r)
goto error;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 054d48823d5f..577ee04ce0bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -175,7 +175,8 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
uint32_t src_data,
struct dma_resv *resv,
struct dma_fence **fence,
- bool delayed);
+ bool delayed,
+ u64 k_job_id);
int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 74758b5ffc6c..5c38f0d30c87 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -1136,7 +1136,8 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring *ring, struct amdgpu_bo *bo,
r = amdgpu_job_alloc_with_ib(ring->adev, &adev->uvd.entity,
AMDGPU_FENCE_OWNER_UNDEFINED,
64, direct ? AMDGPU_IB_POOL_DIRECT :
- AMDGPU_IB_POOL_DELAYED, &job);
+ AMDGPU_IB_POOL_DELAYED, &job,
+ AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 709ca369cb52..a7d8f1ce6ac2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -491,7 +491,7 @@ static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
r = amdgpu_job_alloc_with_ib(ring->adev, &ring->adev->vce.entity,
AMDGPU_FENCE_OWNER_UNDEFINED,
ib_size_dw * 4, AMDGPU_IB_POOL_DIRECT,
- &job);
+ &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
if (r)
return r;
@@ -582,7 +582,8 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
AMDGPU_FENCE_OWNER_UNDEFINED,
ib_size_dw * 4,
direct ? AMDGPU_IB_POOL_DIRECT :
- AMDGPU_IB_POOL_DELAYED, &job);
+ AMDGPU_IB_POOL_DELAYED, &job,
+ AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 5ae7cc0d5f57..5e0786ea911b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -626,7 +626,7 @@ static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring *ring,
r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL,
64, AMDGPU_IB_POOL_DIRECT,
- &job);
+ &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
if (r)
goto err;
@@ -806,7 +806,7 @@ static int amdgpu_vcn_dec_sw_send_msg(struct amdgpu_ring *ring,
r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL,
ib_size_dw * 4, AMDGPU_IB_POOL_DIRECT,
- &job);
+ &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
if (r)
goto err;
@@ -936,7 +936,7 @@ static int amdgpu_vcn_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t hand
r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL,
ib_size_dw * 4, AMDGPU_IB_POOL_DIRECT,
- &job);
+ &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
if (r)
return r;
@@ -1003,7 +1003,7 @@ static int amdgpu_vcn_enc_get_destroy_msg(struct amdgpu_ring *ring, uint32_t han
r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL,
ib_size_dw * 4, AMDGPU_IB_POOL_DIRECT,
- &job);
+ &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index e2587eea6c4a..193de267984e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -989,7 +989,8 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
params.vm = vm;
params.immediate = immediate;
- r = vm->update_funcs->prepare(¶ms, NULL);
+ r = vm->update_funcs->prepare(¶ms, NULL,
+ AMDGPU_KERNEL_JOB_ID_VM_UPDATE_PDES);
if (r)
goto error;
@@ -1158,7 +1159,8 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
dma_fence_put(tmp);
}
- r = vm->update_funcs->prepare(¶ms, sync);
+ r = vm->update_funcs->prepare(¶ms, sync,
+ AMDGPU_KERNEL_JOB_ID_VM_UPDATE_RANGE);
if (r)
goto error_free;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 330e4bdea387..139642eacdd0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -310,7 +310,7 @@ struct amdgpu_vm_update_params {
struct amdgpu_vm_update_funcs {
int (*map_table)(struct amdgpu_bo_vm *bo);
int (*prepare)(struct amdgpu_vm_update_params *p,
- struct amdgpu_sync *sync);
+ struct amdgpu_sync *sync, u64 k_job_id);
int (*update)(struct amdgpu_vm_update_params *p,
struct amdgpu_bo_vm *bo, uint64_t pe, uint64_t addr,
unsigned count, uint32_t incr, uint64_t flags);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
index 0c1ef5850a5e..22e2e5b47341 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
@@ -40,12 +40,14 @@ static int amdgpu_vm_cpu_map_table(struct amdgpu_bo_vm *table)
*
* @p: see amdgpu_vm_update_params definition
* @sync: sync obj with fences to wait on
+ * @k_job_id: the id for tracing/debug purposes
*
* Returns:
* Negativ errno, 0 for success.
*/
static int amdgpu_vm_cpu_prepare(struct amdgpu_vm_update_params *p,
- struct amdgpu_sync *sync)
+ struct amdgpu_sync *sync,
+ u64 k_job_id)
{
if (!sync)
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index f6ffc207ec2a..c7a7d51080a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -26,6 +26,7 @@
#include "amdgpu.h"
#include "amdgpu_trace.h"
#include "amdgpu_vm.h"
+#include "amdgpu_job.h"
/*
* amdgpu_vm_pt_cursor - state for for_each_amdgpu_vm_pt
@@ -396,7 +397,8 @@ int amdgpu_vm_pt_clear(struct amdgpu_device *adev, struct amdgpu_vm *vm,
params.vm = vm;
params.immediate = immediate;
- r = vm->update_funcs->prepare(¶ms, NULL);
+ r = vm->update_funcs->prepare(¶ms, NULL,
+ AMDGPU_KERNEL_JOB_ID_VM_PT_CLEAR);
if (r)
goto exit;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index 46d9fb433ab2..36805dcfa159 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -40,7 +40,7 @@ static int amdgpu_vm_sdma_map_table(struct amdgpu_bo_vm *table)
/* Allocate a new job for @count PTE updates */
static int amdgpu_vm_sdma_alloc_job(struct amdgpu_vm_update_params *p,
- unsigned int count)
+ unsigned int count, u64 k_job_id)
{
enum amdgpu_ib_pool_type pool = p->immediate ? AMDGPU_IB_POOL_IMMEDIATE
: AMDGPU_IB_POOL_DELAYED;
@@ -56,7 +56,7 @@ static int amdgpu_vm_sdma_alloc_job(struct amdgpu_vm_update_params *p,
ndw = min(ndw, AMDGPU_VM_SDMA_MAX_NUM_DW);
r = amdgpu_job_alloc_with_ib(p->adev, entity, AMDGPU_FENCE_OWNER_VM,
- ndw * 4, pool, &p->job);
+ ndw * 4, pool, &p->job, k_job_id);
if (r)
return r;
@@ -69,16 +69,17 @@ static int amdgpu_vm_sdma_alloc_job(struct amdgpu_vm_update_params *p,
*
* @p: see amdgpu_vm_update_params definition
* @sync: amdgpu_sync object with fences to wait for
+ * @k_job_id: identifier of the job, for tracing purpose
*
* Returns:
* Negativ errno, 0 for success.
*/
static int amdgpu_vm_sdma_prepare(struct amdgpu_vm_update_params *p,
- struct amdgpu_sync *sync)
+ struct amdgpu_sync *sync, u64 k_job_id)
{
int r;
- r = amdgpu_vm_sdma_alloc_job(p, 0);
+ r = amdgpu_vm_sdma_alloc_job(p, 0, k_job_id);
if (r)
return r;
@@ -249,7 +250,8 @@ static int amdgpu_vm_sdma_update(struct amdgpu_vm_update_params *p,
if (r)
return r;
- r = amdgpu_vm_sdma_alloc_job(p, count);
+ r = amdgpu_vm_sdma_alloc_job(p, count,
+ AMDGPU_KERNEL_JOB_ID_VM_UPDATE);
if (r)
return r;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index 1c07b701d0e4..ceb94bbb03a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -217,7 +217,8 @@ static int uvd_v6_0_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t handle
int i, r;
r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
- AMDGPU_IB_POOL_DIRECT, &job);
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
if (r)
return r;
@@ -281,7 +282,8 @@ static int uvd_v6_0_enc_get_destroy_msg(struct amdgpu_ring *ring,
int i, r;
r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
- AMDGPU_IB_POOL_DIRECT, &job);
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
index 9d237b5937fb..1f8866f3f63c 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
@@ -225,7 +225,8 @@ static int uvd_v7_0_enc_get_create_msg(struct amdgpu_ring *ring, u32 handle,
int i, r;
r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
- AMDGPU_IB_POOL_DIRECT, &job);
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
if (r)
return r;
@@ -288,7 +289,8 @@ static int uvd_v7_0_enc_get_destroy_msg(struct amdgpu_ring *ring, u32 handle,
int i, r;
r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
- AMDGPU_IB_POOL_DIRECT, &job);
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 3653c563ee9a..46c84fc60af1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -67,7 +67,8 @@ svm_migrate_gart_map(struct amdgpu_ring *ring, u64 npages,
AMDGPU_FENCE_OWNER_UNDEFINED,
num_dw * 4 + num_bytes,
AMDGPU_IB_POOL_DELAYED,
- &job);
+ &job,
+ AMDGPU_KERNEL_JOB_ID_KFD_GART_MAP);
if (r)
return r;
--
2.43.0
* [PATCH v3 02/28] drm/amdgpu: use ttm_resource_manager_cleanup
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 01/28] drm/amdgpu: give each kernel job a unique id Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 12:52 ` Christian König
2025-11-21 10:12 ` [PATCH v3 03/28] drm/amdgpu: remove direct_submit arg from amdgpu_copy_buffer Pierre-Eric Pelloux-Prayer
` (25 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
Rather than open-coding it.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 6a1434391fb8..8d0043ad5336 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2182,8 +2182,10 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
} else {
drm_sched_entity_destroy(&adev->mman.high_pr);
drm_sched_entity_destroy(&adev->mman.low_pr);
- dma_fence_put(man->move);
- man->move = NULL;
+ /* Drop all the old fences since re-creating the scheduler entities
+ * will allocate new contexts.
+ */
+ ttm_resource_manager_cleanup(man);
}
/* this just adjusts TTM size idea, which sets lpfn to the correct value */
--
2.43.0
* [PATCH v3 03/28] drm/amdgpu: remove direct_submit arg from amdgpu_copy_buffer
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 01/28] drm/amdgpu: give each kernel job a unique id Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 02/28] drm/amdgpu: use ttm_resource_manager_cleanup Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 04/28] drm/amdgpu: remove the ring param from ttm functions Pierre-Eric Pelloux-Prayer
` (24 subsequent siblings)
27 siblings, 0 replies; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Felix Kuehling, Sumit Semwal
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel,
linux-media, linaro-mm-sig
It was always false.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 +++++++------------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
4 files changed, 10 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
index 199693369c7c..02c2479a8840 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
@@ -39,7 +39,7 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size,
for (i = 0; i < n; i++) {
struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
r = amdgpu_copy_buffer(ring, saddr, daddr, size, NULL, &fence,
- false, false, 0);
+ false, 0);
if (r)
goto exit_do_move;
r = dma_fence_wait(fence, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8d0043ad5336..071afbacb3d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -346,7 +346,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
}
r = amdgpu_copy_buffer(ring, from, to, cur_size, resv,
- &next, false, true, copy_flags);
+ &next, true, copy_flags);
if (r)
goto error;
@@ -2203,16 +2203,13 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
}
static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
- bool direct_submit,
unsigned int num_dw,
struct dma_resv *resv,
bool vm_needs_flush,
struct amdgpu_job **job,
bool delayed, u64 k_job_id)
{
- enum amdgpu_ib_pool_type pool = direct_submit ?
- AMDGPU_IB_POOL_DIRECT :
- AMDGPU_IB_POOL_DELAYED;
+ enum amdgpu_ib_pool_type pool = AMDGPU_IB_POOL_DELAYED;
int r;
struct drm_sched_entity *entity = delayed ? &adev->mman.low_pr :
&adev->mman.high_pr;
@@ -2238,7 +2235,7 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
uint64_t dst_offset, uint32_t byte_count,
struct dma_resv *resv,
- struct dma_fence **fence, bool direct_submit,
+ struct dma_fence **fence,
bool vm_needs_flush, uint32_t copy_flags)
{
struct amdgpu_device *adev = ring->adev;
@@ -2248,7 +2245,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
unsigned int i;
int r;
- if (!direct_submit && !ring->sched.ready) {
+ if (!ring->sched.ready) {
dev_err(adev->dev,
"Trying to move memory with ring turned off.\n");
return -EINVAL;
@@ -2257,7 +2254,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
max_bytes = adev->mman.buffer_funcs->copy_max_bytes;
num_loops = DIV_ROUND_UP(byte_count, max_bytes);
num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8);
- r = amdgpu_ttm_prepare_job(adev, direct_submit, num_dw,
+ r = amdgpu_ttm_prepare_job(adev, num_dw,
resv, vm_needs_flush, &job, false,
AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER);
if (r)
@@ -2275,10 +2272,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
amdgpu_ring_pad_ib(ring, &job->ibs[0]);
WARN_ON(job->ibs[0].length_dw > num_dw);
- if (direct_submit)
- r = amdgpu_job_submit_direct(job, ring, fence);
- else
- *fence = amdgpu_job_submit(job);
+ *fence = amdgpu_job_submit(job);
if (r)
goto error_free;
@@ -2307,7 +2301,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data,
max_bytes = adev->mman.buffer_funcs->fill_max_bytes;
num_loops = DIV_ROUND_UP_ULL(byte_count, max_bytes);
num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->fill_num_dw, 8);
- r = amdgpu_ttm_prepare_job(adev, false, num_dw, resv, vm_needs_flush,
+ r = amdgpu_ttm_prepare_job(adev, num_dw, resv, vm_needs_flush,
&job, delayed, k_job_id);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 577ee04ce0bf..50e40380fe95 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -166,7 +166,7 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
uint64_t dst_offset, uint32_t byte_count,
struct dma_resv *resv,
- struct dma_fence **fence, bool direct_submit,
+ struct dma_fence **fence,
bool vm_needs_flush, uint32_t copy_flags);
int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
struct dma_resv *resv,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 46c84fc60af1..378af0b2aaa9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -153,7 +153,7 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
}
r = amdgpu_copy_buffer(ring, gart_s, gart_d, size * PAGE_SIZE,
- NULL, &next, false, true, 0);
+ NULL, &next, true, 0);
if (r) {
dev_err(adev->dev, "fail %d to copy memory\n", r);
goto out_unlock;
--
2.43.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 04/28] drm/amdgpu: remove the ring param from ttm functions
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (2 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 03/28] drm/amdgpu: remove direct_submit arg from amdgpu_copy_buffer Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 12:58 ` Christian König
2025-11-21 10:12 ` [PATCH v3 05/28] drm/amdgpu: introduce amdgpu_ttm_buffer_entity Pierre-Eric Pelloux-Prayer
` (23 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Felix Kuehling
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
With the removal of the direct_submit argument, the ring param
becomes useless: jobs are always submitted to buffer_funcs_ring.
Some functions now take an amdgpu_device argument instead, since
they previously derived it from the ring arg.
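A minimal sketch of the new shape, using illustrative stand-in types rather than the real amdgpu structs: helpers take the device and look up the single valid ring themselves, so callers can no longer pass a mismatched one.

```c
#include <errno.h>

/* Illustrative stand-ins, not the real amdgpu API. */
struct ring { int sched_ready; };
struct device { struct ring buffer_funcs_ring; };

/* After the change, the helper derives the ring from the device instead
 * of taking it as a parameter: there is only one valid choice anyway. */
static int copy_buffer(struct device *adev, unsigned long byte_count)
{
	struct ring *ring = &adev->buffer_funcs_ring;

	if (!ring->sched_ready)
		return -EINVAL;	/* "ring turned off" case from the patch */
	/* ... build and submit the copy job of byte_count bytes ... */
	return 0;
}
```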
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 46 ++++++++++---------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
4 files changed, 27 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
index 02c2479a8840..3636b757c974 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
@@ -37,8 +37,7 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size,
stime = ktime_get();
for (i = 0; i < n; i++) {
- struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
- r = amdgpu_copy_buffer(ring, saddr, daddr, size, NULL, &fence,
+ r = amdgpu_copy_buffer(adev, saddr, daddr, size, NULL, &fence,
false, 0);
if (r)
goto exit_do_move;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 071afbacb3d2..d54b57078946 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -164,11 +164,11 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
/**
* amdgpu_ttm_map_buffer - Map memory into the GART windows
+ * @adev: the device being used
* @bo: buffer object to map
* @mem: memory object to map
* @mm_cur: range to map
* @window: which GART window to use
- * @ring: DMA ring to use for the copy
* @tmz: if we should setup a TMZ enabled mapping
* @size: in number of bytes to map, out number of bytes mapped
* @addr: resulting address inside the MC address space
@@ -176,15 +176,16 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
* Setup one of the GART windows to access a specific piece of memory or return
* the physical address for local memory.
*/
-static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
+static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
+ struct ttm_buffer_object *bo,
struct ttm_resource *mem,
struct amdgpu_res_cursor *mm_cur,
- unsigned int window, struct amdgpu_ring *ring,
+ unsigned int window,
bool tmz, uint64_t *size, uint64_t *addr)
{
- struct amdgpu_device *adev = ring->adev;
unsigned int offset, num_pages, num_dw, num_bytes;
uint64_t src_addr, dst_addr;
+ struct amdgpu_ring *ring;
struct amdgpu_job *job;
void *cpu_addr;
uint64_t flags;
@@ -239,6 +240,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr,
dst_addr, num_bytes, 0);
+ ring = adev->mman.buffer_funcs_ring;
amdgpu_ring_pad_ib(ring, &job->ibs[0]);
WARN_ON(job->ibs[0].length_dw > num_dw);
@@ -286,7 +288,6 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
struct dma_resv *resv,
struct dma_fence **f)
{
- struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
struct amdgpu_res_cursor src_mm, dst_mm;
struct dma_fence *fence = NULL;
int r = 0;
@@ -312,13 +313,13 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
cur_size = min3(src_mm.size, dst_mm.size, 256ULL << 20);
/* Map src to window 0 and dst to window 1. */
- r = amdgpu_ttm_map_buffer(src->bo, src->mem, &src_mm,
- 0, ring, tmz, &cur_size, &from);
+ r = amdgpu_ttm_map_buffer(adev, src->bo, src->mem, &src_mm,
+ 0, tmz, &cur_size, &from);
if (r)
goto error;
- r = amdgpu_ttm_map_buffer(dst->bo, dst->mem, &dst_mm,
- 1, ring, tmz, &cur_size, &to);
+ r = amdgpu_ttm_map_buffer(adev, dst->bo, dst->mem, &dst_mm,
+ 1, tmz, &cur_size, &to);
if (r)
goto error;
@@ -345,7 +346,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
write_compress_disable));
}
- r = amdgpu_copy_buffer(ring, from, to, cur_size, resv,
+ r = amdgpu_copy_buffer(adev, from, to, cur_size, resv,
&next, true, copy_flags);
if (r)
goto error;
@@ -2232,19 +2233,21 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
DMA_RESV_USAGE_BOOKKEEP);
}
-int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
+int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
uint64_t dst_offset, uint32_t byte_count,
struct dma_resv *resv,
struct dma_fence **fence,
bool vm_needs_flush, uint32_t copy_flags)
{
- struct amdgpu_device *adev = ring->adev;
unsigned int num_loops, num_dw;
+ struct amdgpu_ring *ring;
struct amdgpu_job *job;
uint32_t max_bytes;
unsigned int i;
int r;
+ ring = adev->mman.buffer_funcs_ring;
+
if (!ring->sched.ready) {
dev_err(adev->dev,
"Trying to move memory with ring turned off.\n");
@@ -2284,15 +2287,15 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
return r;
}
-static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data,
+static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data,
uint64_t dst_addr, uint32_t byte_count,
struct dma_resv *resv,
struct dma_fence **fence,
bool vm_needs_flush, bool delayed,
u64 k_job_id)
{
- struct amdgpu_device *adev = ring->adev;
unsigned int num_loops, num_dw;
+ struct amdgpu_ring *ring;
struct amdgpu_job *job;
uint32_t max_bytes;
unsigned int i;
@@ -2316,6 +2319,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data,
byte_count -= cur_size;
}
+ ring = adev->mman.buffer_funcs_ring;
amdgpu_ring_pad_ib(ring, &job->ibs[0]);
WARN_ON(job->ibs[0].length_dw > num_dw);
*fence = amdgpu_job_submit(job);
@@ -2338,7 +2342,6 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
struct dma_fence **fence)
{
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
- struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
struct amdgpu_res_cursor cursor;
u64 addr;
int r = 0;
@@ -2366,12 +2369,12 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
/* Never clear more than 256MiB at once to avoid timeouts */
size = min(cursor.size, 256ULL << 20);
- r = amdgpu_ttm_map_buffer(&bo->tbo, bo->tbo.resource, &cursor,
- 1, ring, false, &size, &addr);
+ r = amdgpu_ttm_map_buffer(adev, &bo->tbo, bo->tbo.resource, &cursor,
+ 1, false, &size, &addr);
if (r)
goto err;
- r = amdgpu_ttm_fill_mem(ring, 0, addr, size, resv,
+ r = amdgpu_ttm_fill_mem(adev, 0, addr, size, resv,
&next, true, true,
AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
if (r)
@@ -2396,7 +2399,6 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
u64 k_job_id)
{
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
- struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
struct dma_fence *fence = NULL;
struct amdgpu_res_cursor dst;
int r;
@@ -2417,12 +2419,12 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
/* Never fill more than 256MiB at once to avoid timeouts */
cur_size = min(dst.size, 256ULL << 20);
- r = amdgpu_ttm_map_buffer(&bo->tbo, bo->tbo.resource, &dst,
- 1, ring, false, &cur_size, &to);
+ r = amdgpu_ttm_map_buffer(adev, &bo->tbo, bo->tbo.resource, &dst,
+ 1, false, &cur_size, &to);
if (r)
goto error;
- r = amdgpu_ttm_fill_mem(ring, src_data, to, cur_size, resv,
+ r = amdgpu_ttm_fill_mem(adev, src_data, to, cur_size, resv,
&next, true, delayed, k_job_id);
if (r)
goto error;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 50e40380fe95..76e00a6510c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -163,7 +163,7 @@ int amdgpu_ttm_init(struct amdgpu_device *adev);
void amdgpu_ttm_fini(struct amdgpu_device *adev);
void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
bool enable);
-int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
+int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
uint64_t dst_offset, uint32_t byte_count,
struct dma_resv *resv,
struct dma_fence **fence,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 378af0b2aaa9..0ad44acb08f2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -152,7 +152,7 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
goto out_unlock;
}
- r = amdgpu_copy_buffer(ring, gart_s, gart_d, size * PAGE_SIZE,
+ r = amdgpu_copy_buffer(adev, gart_s, gart_d, size * PAGE_SIZE,
NULL, &next, true, 0);
if (r) {
dev_err(adev->dev, "fail %d to copy memory\n", r);
--
2.43.0
* [PATCH v3 05/28] drm/amdgpu: introduce amdgpu_ttm_buffer_entity
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (3 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 04/28] drm/amdgpu: remove the ring param from ttm functions Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 06/28] drm/amdgpu: add amdgpu_ttm_job_submit helper Pierre-Eric Pelloux-Prayer
` (22 subsequent siblings)
27 siblings, 0 replies; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Felix Kuehling
Cc: Pierre-Eric Pelloux-Prayer, Felix Kuehling, amd-gfx, dri-devel,
linux-kernel
No functional change for now, but this struct will have more
fields added in the next commit.
Technically the change introduces a synchronisation issue, because
dependencies between successive jobs are not handled
properly. For instance, amdgpu_ttm_clear_buffer uses
amdgpu_ttm_map_buffer then amdgpu_ttm_fill_mem, which use
different entities (default_entity then the move/clear entity).
But everything still works as expected, because all entities use the
same sdma instance for now and default_entity has a higher priority,
so its job always gets scheduled first.
The next commits will deal with these dependencies correctly.
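The wrapper pattern used here can be sketched as follows, with illustrative stand-in types (not the real DRM structs): embedding the scheduler entity as a `base` member keeps existing code working while leaving room for the per-entity fields later patches add, and the wrapper can be recovered kernel `container_of`-style.

```c
#include <stddef.h>

/* Illustrative stand-in for struct drm_sched_entity. */
struct sched_entity { int priority; };

/* Wrapping the entity in a driver-specific struct keeps existing code
 * working through ->base while leaving room for the extra per-entity
 * fields the next commits add. */
struct ttm_buffer_entity {
	struct sched_entity base;
	/* more fields added in the next commits */
};

/* Recover the wrapper from the embedded base, container_of-style. */
static struct ttm_buffer_entity *to_buffer_entity(struct sched_entity *e)
{
	return (struct ttm_buffer_entity *)
		((char *)e - offsetof(struct ttm_buffer_entity, base));
}
```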
---
v2: renamed amdgpu_ttm_buffer_entity
---
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 30 +++++++++++++++++-------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 12 ++++++----
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 13 ++++++----
4 files changed, 39 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ea8ec160b98a..94e07b9ec7b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -687,7 +687,7 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
* itself at least for GART.
*/
mutex_lock(&adev->mman.gtt_window_lock);
- r = amdgpu_job_alloc_with_ib(ring->adev, &adev->mman.high_pr,
+ r = amdgpu_job_alloc_with_ib(ring->adev, &adev->mman.default_entity.base,
AMDGPU_FENCE_OWNER_UNDEFINED,
16 * 4, AMDGPU_IB_POOL_IMMEDIATE,
&job, AMDGPU_KERNEL_JOB_ID_FLUSH_GPU_TLB);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index d54b57078946..17e1892c44a2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -224,7 +224,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
num_bytes = num_pages * 8 * AMDGPU_GPU_PAGES_IN_CPU_PAGE;
- r = amdgpu_job_alloc_with_ib(adev, &adev->mman.high_pr,
+ r = amdgpu_job_alloc_with_ib(adev, &adev->mman.default_entity.base,
AMDGPU_FENCE_OWNER_UNDEFINED,
num_dw * 4 + num_bytes,
AMDGPU_IB_POOL_DELAYED, &job,
@@ -1479,7 +1479,7 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
memcpy(adev->mman.sdma_access_ptr, buf, len);
num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
- r = amdgpu_job_alloc_with_ib(adev, &adev->mman.high_pr,
+ r = amdgpu_job_alloc_with_ib(adev, &adev->mman.default_entity.base,
AMDGPU_FENCE_OWNER_UNDEFINED,
num_dw * 4, AMDGPU_IB_POOL_DELAYED,
&job,
@@ -2161,7 +2161,7 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
ring = adev->mman.buffer_funcs_ring;
sched = &ring->sched;
- r = drm_sched_entity_init(&adev->mman.high_pr,
+ r = drm_sched_entity_init(&adev->mman.default_entity.base,
DRM_SCHED_PRIORITY_KERNEL, &sched,
1, NULL);
if (r) {
@@ -2171,18 +2171,30 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
return;
}
- r = drm_sched_entity_init(&adev->mman.low_pr,
+ r = drm_sched_entity_init(&adev->mman.clear_entity.base,
+ DRM_SCHED_PRIORITY_NORMAL, &sched,
+ 1, NULL);
+ if (r) {
+ dev_err(adev->dev,
+ "Failed setting up TTM BO clear entity (%d)\n",
+ r);
+ goto error_free_entity;
+ }
+
+ r = drm_sched_entity_init(&adev->mman.move_entity.base,
DRM_SCHED_PRIORITY_NORMAL, &sched,
1, NULL);
if (r) {
dev_err(adev->dev,
"Failed setting up TTM BO move entity (%d)\n",
r);
+ drm_sched_entity_destroy(&adev->mman.clear_entity.base);
goto error_free_entity;
}
} else {
- drm_sched_entity_destroy(&adev->mman.high_pr);
- drm_sched_entity_destroy(&adev->mman.low_pr);
+ drm_sched_entity_destroy(&adev->mman.default_entity.base);
+ drm_sched_entity_destroy(&adev->mman.clear_entity.base);
+ drm_sched_entity_destroy(&adev->mman.move_entity.base);
/* Drop all the old fences since re-creating the scheduler entities
* will allocate new contexts.
*/
@@ -2200,7 +2212,7 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
return;
error_free_entity:
- drm_sched_entity_destroy(&adev->mman.high_pr);
+ drm_sched_entity_destroy(&adev->mman.default_entity.base);
}
static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
@@ -2212,8 +2224,8 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
{
enum amdgpu_ib_pool_type pool = AMDGPU_IB_POOL_DELAYED;
int r;
- struct drm_sched_entity *entity = delayed ? &adev->mman.low_pr :
- &adev->mman.high_pr;
+ struct drm_sched_entity *entity = delayed ? &adev->mman.clear_entity.base :
+ &adev->mman.move_entity.base;
r = amdgpu_job_alloc_with_ib(adev, entity,
AMDGPU_FENCE_OWNER_UNDEFINED,
num_dw * 4, pool, job, k_job_id);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 76e00a6510c6..41bbc25680a2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -52,6 +52,10 @@ struct amdgpu_gtt_mgr {
spinlock_t lock;
};
+struct amdgpu_ttm_buffer_entity {
+ struct drm_sched_entity base;
+};
+
struct amdgpu_mman {
struct ttm_device bdev;
struct ttm_pool *ttm_pools;
@@ -64,10 +68,10 @@ struct amdgpu_mman {
bool buffer_funcs_enabled;
struct mutex gtt_window_lock;
- /* High priority scheduler entity for buffer moves */
- struct drm_sched_entity high_pr;
- /* Low priority scheduler entity for VRAM clearing */
- struct drm_sched_entity low_pr;
+
+ struct amdgpu_ttm_buffer_entity default_entity;
+ struct amdgpu_ttm_buffer_entity clear_entity;
+ struct amdgpu_ttm_buffer_entity move_entity;
struct amdgpu_vram_mgr vram_mgr;
struct amdgpu_gtt_mgr gtt_mgr;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 0ad44acb08f2..ade1d4068d29 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -45,7 +45,9 @@ svm_migrate_direct_mapping_addr(struct amdgpu_device *adev, u64 addr)
}
static int
-svm_migrate_gart_map(struct amdgpu_ring *ring, u64 npages,
+svm_migrate_gart_map(struct amdgpu_ring *ring,
+ struct amdgpu_ttm_buffer_entity *entity,
+ u64 npages,
dma_addr_t *addr, u64 *gart_addr, u64 flags)
{
struct amdgpu_device *adev = ring->adev;
@@ -63,7 +65,7 @@ svm_migrate_gart_map(struct amdgpu_ring *ring, u64 npages,
num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
num_bytes = npages * 8;
- r = amdgpu_job_alloc_with_ib(adev, &adev->mman.high_pr,
+ r = amdgpu_job_alloc_with_ib(adev, &entity->base,
AMDGPU_FENCE_OWNER_UNDEFINED,
num_dw * 4 + num_bytes,
AMDGPU_IB_POOL_DELAYED,
@@ -128,11 +130,14 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
{
const u64 GTT_MAX_PAGES = AMDGPU_GTT_MAX_TRANSFER_SIZE;
struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+ struct amdgpu_ttm_buffer_entity *entity;
u64 gart_s, gart_d;
struct dma_fence *next;
u64 size;
int r;
+ entity = &adev->mman.move_entity;
+
mutex_lock(&adev->mman.gtt_window_lock);
while (npages) {
@@ -140,10 +145,10 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
if (direction == FROM_VRAM_TO_RAM) {
gart_s = svm_migrate_direct_mapping_addr(adev, *vram);
- r = svm_migrate_gart_map(ring, size, sys, &gart_d, 0);
+ r = svm_migrate_gart_map(ring, entity, size, sys, &gart_d, 0);
} else if (direction == FROM_RAM_TO_VRAM) {
- r = svm_migrate_gart_map(ring, size, sys, &gart_s,
+ r = svm_migrate_gart_map(ring, entity, size, sys, &gart_s,
KFD_IOCTL_SVM_FLAG_GPU_RO);
gart_d = svm_migrate_direct_mapping_addr(adev, *vram);
}
--
2.43.0
* [PATCH v3 06/28] drm/amdgpu: add amdgpu_ttm_job_submit helper
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (4 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 05/28] drm/amdgpu: introduce amdgpu_ttm_buffer_entity Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 13:00 ` Christian König
2025-11-21 10:12 ` [PATCH v3 07/28] drm/amdgpu: fix error handling in amdgpu_copy_buffer Pierre-Eric Pelloux-Prayer
` (21 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Sumit Semwal
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel,
linux-media, linaro-mm-sig
Deduplicate the IB padding code into a helper, which will also
be used later to check locking.
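The factored-out tail sequence can be modeled as below; the names are illustrative only, and the error return stands in for the WARN_ON the real helper keeps.

```c
/* Toy model of the helper: pad the IB to the ring alignment, check the
 * reserved dword budget, then "submit". */
struct ib { unsigned int length_dw; };

static int ttm_job_submit(struct ib *ib, unsigned int num_dw,
			  unsigned int align)
{
	/* stands in for amdgpu_ring_pad_ib() */
	while (ib->length_dw % align)
		ib->length_dw++;
	/* stands in for WARN_ON(job->ibs[0].length_dw > num_dw) */
	if (ib->length_dw > num_dw)
		return -1;
	/* stands in for amdgpu_job_submit() returning the fence */
	return 0;
}
```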
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 34 ++++++++++++-------------
1 file changed, 16 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 17e1892c44a2..be1232b2d55e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -162,6 +162,18 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
*placement = abo->placement;
}
+static struct dma_fence *
+amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_job *job, u32 num_dw)
+{
+ struct amdgpu_ring *ring;
+
+ ring = adev->mman.buffer_funcs_ring;
+ amdgpu_ring_pad_ib(ring, &job->ibs[0]);
+ WARN_ON(job->ibs[0].length_dw > num_dw);
+
+ return amdgpu_job_submit(job);
+}
+
/**
* amdgpu_ttm_map_buffer - Map memory into the GART windows
* @adev: the device being used
@@ -185,7 +197,6 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
{
unsigned int offset, num_pages, num_dw, num_bytes;
uint64_t src_addr, dst_addr;
- struct amdgpu_ring *ring;
struct amdgpu_job *job;
void *cpu_addr;
uint64_t flags;
@@ -240,10 +251,6 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr,
dst_addr, num_bytes, 0);
- ring = adev->mman.buffer_funcs_ring;
- amdgpu_ring_pad_ib(ring, &job->ibs[0]);
- WARN_ON(job->ibs[0].length_dw > num_dw);
-
flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, mem);
if (tmz)
flags |= AMDGPU_PTE_TMZ;
@@ -261,7 +268,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
amdgpu_gart_map_vram_range(adev, pa, 0, num_pages, flags, cpu_addr);
}
- dma_fence_put(amdgpu_job_submit(job));
+ dma_fence_put(amdgpu_ttm_job_submit(adev, job, num_dw));
return 0;
}
@@ -1497,10 +1504,7 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr, dst_addr,
PAGE_SIZE, 0);
- amdgpu_ring_pad_ib(adev->mman.buffer_funcs_ring, &job->ibs[0]);
- WARN_ON(job->ibs[0].length_dw > num_dw);
-
- fence = amdgpu_job_submit(job);
+ fence = amdgpu_ttm_job_submit(adev, job, num_dw);
if (!dma_fence_wait_timeout(fence, false, adev->sdma_timeout))
r = -ETIMEDOUT;
@@ -2285,11 +2289,9 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
byte_count -= cur_size_in_bytes;
}
- amdgpu_ring_pad_ib(ring, &job->ibs[0]);
- WARN_ON(job->ibs[0].length_dw > num_dw);
- *fence = amdgpu_job_submit(job);
if (r)
goto error_free;
+ *fence = amdgpu_ttm_job_submit(adev, job, num_dw);
return r;
@@ -2307,7 +2309,6 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data,
u64 k_job_id)
{
unsigned int num_loops, num_dw;
- struct amdgpu_ring *ring;
struct amdgpu_job *job;
uint32_t max_bytes;
unsigned int i;
@@ -2331,10 +2332,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data,
byte_count -= cur_size;
}
- ring = adev->mman.buffer_funcs_ring;
- amdgpu_ring_pad_ib(ring, &job->ibs[0]);
- WARN_ON(job->ibs[0].length_dw > num_dw);
- *fence = amdgpu_job_submit(job);
+ *fence = amdgpu_ttm_job_submit(adev, job, num_dw);
return 0;
}
--
2.43.0
* [PATCH v3 07/28] drm/amdgpu: fix error handling in amdgpu_copy_buffer
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (5 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 06/28] drm/amdgpu: add amdgpu_ttm_job_submit helper Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 13:04 ` Christian König
2025-11-21 10:12 ` [PATCH v3 08/28] drm/amdgpu: pass the entity to use to amdgpu_ttm_map_buffer Pierre-Eric Pelloux-Prayer
` (20 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
If amdgpu_job_alloc_with_ib fails, amdgpu_ttm_prepare_job should
set the job pointer to NULL; this way the caller can call
amdgpu_job_free on all failure paths without risking a double free.
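The pattern is easy to see in miniature. Below is a hedged sketch with hypothetical names (standard C allocation in place of the amdgpu job API): because the prepare helper clears the out-pointer on failure, the caller's single error label can free unconditionally.

```c
#include <errno.h>
#include <stdlib.h>

struct job { int num_dw; };

/* Toy model of the fix: on allocation failure the out-pointer is set to
 * NULL instead of being left uninitialized or stale. */
static int prepare_job(struct job **job, int simulate_failure)
{
	*job = simulate_failure ? NULL : calloc(1, sizeof(**job));
	return *job ? 0 : -ENOMEM;
}

static int caller(int fail)
{
	struct job *job;
	int r = prepare_job(&job, fail);

	if (r)
		goto error_free;
	/* ... fill in and submit the job ... */
	free(job);
	return 0;
error_free:
	free(job);	/* safe: free(NULL) is a no-op */
	return r;
}
```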
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index be1232b2d55e..353682c0e8f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2233,8 +2233,10 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
r = amdgpu_job_alloc_with_ib(adev, entity,
AMDGPU_FENCE_OWNER_UNDEFINED,
num_dw * 4, pool, job, k_job_id);
- if (r)
+ if (r) {
+ *job = NULL;
return r;
+ }
if (vm_needs_flush) {
(*job)->vm_pd_addr = amdgpu_gmc_pd_addr(adev->gmc.pdb0_bo ?
@@ -2277,7 +2279,7 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
resv, vm_needs_flush, &job, false,
AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER);
if (r)
- return r;
+ goto error_free;
for (i = 0; i < num_loops; i++) {
uint32_t cur_size_in_bytes = min(byte_count, max_bytes);
@@ -2289,11 +2291,9 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
byte_count -= cur_size_in_bytes;
}
- if (r)
- goto error_free;
*fence = amdgpu_ttm_job_submit(adev, job, num_dw);
- return r;
+ return 0;
error_free:
amdgpu_job_free(job);
--
2.43.0
* [PATCH v3 08/28] drm/amdgpu: pass the entity to use to amdgpu_ttm_map_buffer
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (6 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 07/28] drm/amdgpu: fix error handling in amdgpu_copy_buffer Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 13:05 ` Christian König
2025-11-21 10:12 ` [PATCH v3 09/28] drm/amdgpu: pass the entity to use to ttm public functions Pierre-Eric Pelloux-Prayer
` (19 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
This way the caller can select the entity it wants to use.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 55 ++++++++++++++++---------
1 file changed, 35 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 353682c0e8f0..3d850893b97f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -177,6 +177,7 @@ amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_job *job, u32 nu
/**
* amdgpu_ttm_map_buffer - Map memory into the GART windows
* @adev: the device being used
+ * @entity: entity to run the window setup job
* @bo: buffer object to map
* @mem: memory object to map
* @mm_cur: range to map
@@ -189,6 +190,7 @@ amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_job *job, u32 nu
* the physical address for local memory.
*/
static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
+ struct amdgpu_ttm_buffer_entity *entity,
struct ttm_buffer_object *bo,
struct ttm_resource *mem,
struct amdgpu_res_cursor *mm_cur,
@@ -235,7 +237,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
num_bytes = num_pages * 8 * AMDGPU_GPU_PAGES_IN_CPU_PAGE;
- r = amdgpu_job_alloc_with_ib(adev, &adev->mman.default_entity.base,
+ r = amdgpu_job_alloc_with_ib(adev, &entity->base,
AMDGPU_FENCE_OWNER_UNDEFINED,
num_dw * 4 + num_bytes,
AMDGPU_IB_POOL_DELAYED, &job,
@@ -275,6 +277,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
/**
* amdgpu_ttm_copy_mem_to_mem - Helper function for copy
* @adev: amdgpu device
+ * @entity: entity to run the jobs
* @src: buffer/address where to read from
* @dst: buffer/address where to write to
* @size: number of bytes to copy
@@ -289,6 +292,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
*/
__attribute__((nonnull))
static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
+ struct amdgpu_ttm_buffer_entity *entity,
const struct amdgpu_copy_mem *src,
const struct amdgpu_copy_mem *dst,
uint64_t size, bool tmz,
@@ -320,12 +324,14 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
cur_size = min3(src_mm.size, dst_mm.size, 256ULL << 20);
/* Map src to window 0 and dst to window 1. */
- r = amdgpu_ttm_map_buffer(adev, src->bo, src->mem, &src_mm,
+ r = amdgpu_ttm_map_buffer(adev, entity,
+ src->bo, src->mem, &src_mm,
0, tmz, &cur_size, &from);
if (r)
goto error;
- r = amdgpu_ttm_map_buffer(adev, dst->bo, dst->mem, &dst_mm,
+ r = amdgpu_ttm_map_buffer(adev, entity,
+ dst->bo, dst->mem, &dst_mm,
1, tmz, &cur_size, &to);
if (r)
goto error;
@@ -394,7 +400,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
src.offset = 0;
dst.offset = 0;
- r = amdgpu_ttm_copy_mem_to_mem(adev, &src, &dst,
+ r = amdgpu_ttm_copy_mem_to_mem(adev,
+ &adev->mman.move_entity,
+ &src, &dst,
new_mem->size,
amdgpu_bo_encrypted(abo),
bo->base.resv, &fence);
@@ -2220,17 +2228,16 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
}
static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
+ struct amdgpu_ttm_buffer_entity *entity,
unsigned int num_dw,
struct dma_resv *resv,
bool vm_needs_flush,
struct amdgpu_job **job,
- bool delayed, u64 k_job_id)
+ u64 k_job_id)
{
enum amdgpu_ib_pool_type pool = AMDGPU_IB_POOL_DELAYED;
int r;
- struct drm_sched_entity *entity = delayed ? &adev->mman.clear_entity.base :
- &adev->mman.move_entity.base;
- r = amdgpu_job_alloc_with_ib(adev, entity,
+ r = amdgpu_job_alloc_with_ib(adev, &entity->base,
AMDGPU_FENCE_OWNER_UNDEFINED,
num_dw * 4, pool, job, k_job_id);
if (r) {
@@ -2275,8 +2282,8 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
max_bytes = adev->mman.buffer_funcs->copy_max_bytes;
num_loops = DIV_ROUND_UP(byte_count, max_bytes);
num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8);
- r = amdgpu_ttm_prepare_job(adev, num_dw,
- resv, vm_needs_flush, &job, false,
+ r = amdgpu_ttm_prepare_job(adev, &adev->mman.move_entity, num_dw,
+ resv, vm_needs_flush, &job,
AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER);
if (r)
goto error_free;
@@ -2301,11 +2308,13 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
return r;
}
-static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data,
+static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
+ struct amdgpu_ttm_buffer_entity *entity,
+ uint32_t src_data,
uint64_t dst_addr, uint32_t byte_count,
struct dma_resv *resv,
struct dma_fence **fence,
- bool vm_needs_flush, bool delayed,
+ bool vm_needs_flush,
u64 k_job_id)
{
unsigned int num_loops, num_dw;
@@ -2317,8 +2326,8 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data,
max_bytes = adev->mman.buffer_funcs->fill_max_bytes;
num_loops = DIV_ROUND_UP_ULL(byte_count, max_bytes);
num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->fill_num_dw, 8);
- r = amdgpu_ttm_prepare_job(adev, num_dw, resv, vm_needs_flush,
- &job, delayed, k_job_id);
+ r = amdgpu_ttm_prepare_job(adev, entity, num_dw, resv,
+ vm_needs_flush, &job, k_job_id);
if (r)
return r;
@@ -2379,13 +2388,14 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
/* Never clear more than 256MiB at once to avoid timeouts */
size = min(cursor.size, 256ULL << 20);
- r = amdgpu_ttm_map_buffer(adev, &bo->tbo, bo->tbo.resource, &cursor,
+ r = amdgpu_ttm_map_buffer(adev, &adev->mman.clear_entity,
+ &bo->tbo, bo->tbo.resource, &cursor,
1, false, &size, &addr);
if (r)
goto err;
- r = amdgpu_ttm_fill_mem(adev, 0, addr, size, resv,
- &next, true, true,
+ r = amdgpu_ttm_fill_mem(adev, &adev->mman.clear_entity, 0, addr, size, resv,
+ &next, true,
AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
if (r)
goto err;
@@ -2409,10 +2419,14 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
u64 k_job_id)
{
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
+ struct amdgpu_ttm_buffer_entity *entity;
struct dma_fence *fence = NULL;
struct amdgpu_res_cursor dst;
int r;
+ entity = delayed ? &adev->mman.clear_entity :
+ &adev->mman.move_entity;
+
if (!adev->mman.buffer_funcs_enabled) {
dev_err(adev->dev,
"Trying to clear memory with ring turned off.\n");
@@ -2429,13 +2443,14 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
/* Never fill more than 256MiB at once to avoid timeouts */
cur_size = min(dst.size, 256ULL << 20);
- r = amdgpu_ttm_map_buffer(adev, &bo->tbo, bo->tbo.resource, &dst,
+ r = amdgpu_ttm_map_buffer(adev, &adev->mman.default_entity,
+ &bo->tbo, bo->tbo.resource, &dst,
1, false, &cur_size, &to);
if (r)
goto error;
- r = amdgpu_ttm_fill_mem(adev, src_data, to, cur_size, resv,
- &next, true, delayed, k_job_id);
+ r = amdgpu_ttm_fill_mem(adev, entity, src_data, to, cur_size, resv,
+ &next, true, k_job_id);
if (r)
goto error;
--
2.43.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 09/28] drm/amdgpu: pass the entity to use to ttm public functions
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (7 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 08/28] drm/amdgpu: pass the entity to use to amdgpu_ttm_map_buffer Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 13:23 ` Christian König
2025-11-21 10:12 ` [PATCH v3 10/28] drm/amdgpu: add amdgpu_device argument to ttm functions that need it Pierre-Eric Pelloux-Prayer
` (18 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Felix Kuehling, Sumit Semwal
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel,
linux-media, linaro-mm-sig
This way the caller can select the one it wants to use.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 34 +++++++++----------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 16 +++++----
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 +-
5 files changed, 32 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
index 3636b757c974..a050167e76a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
@@ -37,7 +37,8 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size,
stime = ktime_get();
for (i = 0; i < n; i++) {
- r = amdgpu_copy_buffer(adev, saddr, daddr, size, NULL, &fence,
+ r = amdgpu_copy_buffer(adev, &adev->mman.default_entity,
+ saddr, daddr, size, NULL, &fence,
false, 0);
if (r)
goto exit_do_move;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 926a3f09a776..858eb9fa061b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1322,8 +1322,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
if (r)
goto out;
- r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true,
- AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
+ r = amdgpu_fill_buffer(&adev->mman.clear_entity, abo, 0, &bo->base._resv,
+ &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
if (WARN_ON(r))
goto out;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 3d850893b97f..1d3afad885da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -359,7 +359,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
write_compress_disable));
}
- r = amdgpu_copy_buffer(adev, from, to, cur_size, resv,
+ r = amdgpu_copy_buffer(adev, entity, from, to, cur_size, resv,
&next, true, copy_flags);
if (r)
goto error;
@@ -414,8 +414,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
struct dma_fence *wipe_fence = NULL;
- r = amdgpu_fill_buffer(abo, 0, NULL, &wipe_fence,
- false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
+ r = amdgpu_fill_buffer(&adev->mman.move_entity,
+ abo, 0, NULL, &wipe_fence,
+ AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
if (r) {
goto error;
} else if (wipe_fence) {
@@ -2258,7 +2259,9 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
DMA_RESV_USAGE_BOOKKEEP);
}
-int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
+int amdgpu_copy_buffer(struct amdgpu_device *adev,
+ struct amdgpu_ttm_buffer_entity *entity,
+ uint64_t src_offset,
uint64_t dst_offset, uint32_t byte_count,
struct dma_resv *resv,
struct dma_fence **fence,
@@ -2282,7 +2285,7 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
max_bytes = adev->mman.buffer_funcs->copy_max_bytes;
num_loops = DIV_ROUND_UP(byte_count, max_bytes);
num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8);
- r = amdgpu_ttm_prepare_job(adev, &adev->mman.move_entity, num_dw,
+ r = amdgpu_ttm_prepare_job(adev, entity, num_dw,
resv, vm_needs_flush, &job,
AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER);
if (r)
@@ -2411,22 +2414,18 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
return r;
}
-int amdgpu_fill_buffer(struct amdgpu_bo *bo,
- uint32_t src_data,
- struct dma_resv *resv,
- struct dma_fence **f,
- bool delayed,
- u64 k_job_id)
+int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
+ struct amdgpu_bo *bo,
+ uint32_t src_data,
+ struct dma_resv *resv,
+ struct dma_fence **f,
+ u64 k_job_id)
{
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
- struct amdgpu_ttm_buffer_entity *entity;
struct dma_fence *fence = NULL;
struct amdgpu_res_cursor dst;
int r;
- entity = delayed ? &adev->mman.clear_entity :
- &adev->mman.move_entity;
-
if (!adev->mman.buffer_funcs_enabled) {
dev_err(adev->dev,
"Trying to clear memory with ring turned off.\n");
@@ -2443,13 +2442,14 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
/* Never fill more than 256MiB at once to avoid timeouts */
cur_size = min(dst.size, 256ULL << 20);
- r = amdgpu_ttm_map_buffer(adev, &adev->mman.default_entity,
+ r = amdgpu_ttm_map_buffer(adev, entity,
&bo->tbo, bo->tbo.resource, &dst,
1, false, &cur_size, &to);
if (r)
goto error;
- r = amdgpu_ttm_fill_mem(adev, entity, src_data, to, cur_size, resv,
+ r = amdgpu_ttm_fill_mem(adev, entity,
+ src_data, to, cur_size, resv,
&next, true, k_job_id);
if (r)
goto error;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 41bbc25680a2..9288599c9c46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -167,7 +167,9 @@ int amdgpu_ttm_init(struct amdgpu_device *adev);
void amdgpu_ttm_fini(struct amdgpu_device *adev);
void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
bool enable);
-int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
+int amdgpu_copy_buffer(struct amdgpu_device *adev,
+ struct amdgpu_ttm_buffer_entity *entity,
+ uint64_t src_offset,
uint64_t dst_offset, uint32_t byte_count,
struct dma_resv *resv,
struct dma_fence **fence,
@@ -175,12 +177,12 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
struct dma_resv *resv,
struct dma_fence **fence);
-int amdgpu_fill_buffer(struct amdgpu_bo *bo,
- uint32_t src_data,
- struct dma_resv *resv,
- struct dma_fence **fence,
- bool delayed,
- u64 k_job_id);
+int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
+ struct amdgpu_bo *bo,
+ uint32_t src_data,
+ struct dma_resv *resv,
+ struct dma_fence **f,
+ u64 k_job_id);
int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index ade1d4068d29..9c76f1ba0e55 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -157,7 +157,8 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
goto out_unlock;
}
- r = amdgpu_copy_buffer(adev, gart_s, gart_d, size * PAGE_SIZE,
+ r = amdgpu_copy_buffer(adev, entity,
+ gart_s, gart_d, size * PAGE_SIZE,
NULL, &next, true, 0);
if (r) {
dev_err(adev->dev, "fail %d to copy memory\n", r);
--
2.43.0
* [PATCH v3 10/28] drm/amdgpu: add amdgpu_device argument to ttm functions that need it
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (8 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 09/28] drm/amdgpu: pass the entity to use to ttm public functions Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 13:24 ` Christian König
2025-11-21 10:12 ` [PATCH v3 11/28] drm/amdgpu: statically assign gart windows to ttm entities Pierre-Eric Pelloux-Prayer
` (17 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
Instead of getting it through amdgpu_ttm_adev(bo->tbo.bdev).
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 11 ++++++-----
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 6 ++++--
3 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 858eb9fa061b..2ee48f76483d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -725,7 +725,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
bo->tbo.resource->mem_type == TTM_PL_VRAM) {
struct dma_fence *fence;
- r = amdgpu_ttm_clear_buffer(bo, bo->tbo.base.resv, &fence);
+ r = amdgpu_ttm_clear_buffer(adev, bo, bo->tbo.base.resv, &fence);
if (unlikely(r))
goto fail_unreserve;
@@ -1322,7 +1322,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
if (r)
goto out;
- r = amdgpu_fill_buffer(&adev->mman.clear_entity, abo, 0, &bo->base._resv,
+ r = amdgpu_fill_buffer(adev,
+ &adev->mman.clear_entity, abo, 0, &bo->base._resv,
&fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
if (WARN_ON(r))
goto out;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 1d3afad885da..57dff2df433b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -414,7 +414,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
struct dma_fence *wipe_fence = NULL;
- r = amdgpu_fill_buffer(&adev->mman.move_entity,
+ r = amdgpu_fill_buffer(adev, &adev->mman.move_entity,
abo, 0, NULL, &wipe_fence,
AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
if (r) {
@@ -2350,6 +2350,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
/**
* amdgpu_ttm_clear_buffer - clear memory buffers
+ * @adev: amdgpu device object
* @bo: amdgpu buffer object
* @resv: reservation object
* @fence: dma_fence associated with the operation
@@ -2359,11 +2360,11 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
* Returns:
* 0 for success or a negative error code on failure.
*/
-int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
+int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
+ struct amdgpu_bo *bo,
struct dma_resv *resv,
struct dma_fence **fence)
{
- struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
struct amdgpu_res_cursor cursor;
u64 addr;
int r = 0;
@@ -2414,14 +2415,14 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
return r;
}
-int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
+int amdgpu_fill_buffer(struct amdgpu_device *adev,
+ struct amdgpu_ttm_buffer_entity *entity,
struct amdgpu_bo *bo,
uint32_t src_data,
struct dma_resv *resv,
struct dma_fence **f,
u64 k_job_id)
{
- struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
struct dma_fence *fence = NULL;
struct amdgpu_res_cursor dst;
int r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 9288599c9c46..d0f55a7edd30 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -174,10 +174,12 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev,
struct dma_resv *resv,
struct dma_fence **fence,
bool vm_needs_flush, uint32_t copy_flags);
-int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
+int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
+ struct amdgpu_bo *bo,
struct dma_resv *resv,
struct dma_fence **fence);
-int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
+int amdgpu_fill_buffer(struct amdgpu_device *adev,
+ struct amdgpu_ttm_buffer_entity *entity,
struct amdgpu_bo *bo,
uint32_t src_data,
struct dma_resv *resv,
--
2.43.0
* [PATCH v3 11/28] drm/amdgpu: statically assign gart windows to ttm entities
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (9 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 10/28] drm/amdgpu: add amdgpu_device argument to ttm functions that need it Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 13:34 ` Christian König
2025-11-21 10:12 ` [PATCH v3 12/28] drm/amdgpu: remove AMDGPU_GTT_NUM_TRANSFER_WINDOWS Pierre-Eric Pelloux-Prayer
` (16 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Felix Kuehling
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
If multiple entities share the same window, we must make sure
that jobs using it are executed sequentially.
This commit gives separate windows to each entity, so jobs
from multiple entities can execute in parallel if needed.
(For now they all use the first SDMA engine, so this makes no
difference yet.)
The entity stores the GART window offsets to centralize the
"window id" to "window offset" mapping in a single place.
default_entity doesn't get any windows reserved since it has
no use for them.
---
v3:
- renamed gart_window_lock -> lock (Christian)
- added amdgpu_ttm_buffer_entity_init (Christian)
- fixed gart_addr in svm_migrate_gart_map (Felix)
- renamed gart_window_idX -> gart_window_offs[]
- added amdgpu_compute_gart_address
---
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 6 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 56 ++++++++++++++++++------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 14 +++++-
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 9 ++--
4 files changed, 61 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 94e07b9ec7b4..0d2784fe0be3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -686,7 +686,7 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
* translation. Avoid this by doing the invalidation from the SDMA
* itself at least for GART.
*/
- mutex_lock(&adev->mman.gtt_window_lock);
+ mutex_lock(&adev->mman.default_entity.lock);
r = amdgpu_job_alloc_with_ib(ring->adev, &adev->mman.default_entity.base,
AMDGPU_FENCE_OWNER_UNDEFINED,
16 * 4, AMDGPU_IB_POOL_IMMEDIATE,
@@ -699,7 +699,7 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
job->ibs->ptr[job->ibs->length_dw++] = ring->funcs->nop;
amdgpu_ring_pad_ib(ring, &job->ibs[0]);
fence = amdgpu_job_submit(job);
- mutex_unlock(&adev->mman.gtt_window_lock);
+ mutex_unlock(&adev->mman.default_entity.lock);
dma_fence_wait(fence, false);
dma_fence_put(fence);
@@ -707,7 +707,7 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
return;
error_alloc:
- mutex_unlock(&adev->mman.gtt_window_lock);
+ mutex_unlock(&adev->mman.default_entity.lock);
dev_err(adev->dev, "Error flushing GPU TLB using the SDMA (%d)!\n", r);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 57dff2df433b..1371a40d4687 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -229,9 +229,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
*size = min(*size, (uint64_t)num_pages * PAGE_SIZE - offset);
- *addr = adev->gmc.gart_start;
- *addr += (u64)window * AMDGPU_GTT_MAX_TRANSFER_SIZE *
- AMDGPU_GPU_PAGE_SIZE;
+ *addr = amdgpu_compute_gart_address(&adev->gmc, entity, window);
*addr += offset;
num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
@@ -249,7 +247,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
src_addr += job->ibs[0].gpu_addr;
dst_addr = amdgpu_bo_gpu_offset(adev->gart.bo);
- dst_addr += window * AMDGPU_GTT_MAX_TRANSFER_SIZE * 8;
+ dst_addr += (entity->gart_window_offs[window] >> AMDGPU_GPU_PAGE_SHIFT) * 8;
amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr,
dst_addr, num_bytes, 0);
@@ -314,7 +312,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
amdgpu_res_first(src->mem, src->offset, size, &src_mm);
amdgpu_res_first(dst->mem, dst->offset, size, &dst_mm);
- mutex_lock(&adev->mman.gtt_window_lock);
+ mutex_lock(&entity->lock);
while (src_mm.remaining) {
uint64_t from, to, cur_size, tiling_flags;
uint32_t num_type, data_format, max_com, write_compress_disable;
@@ -371,7 +369,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
amdgpu_res_next(&dst_mm, cur_size);
}
error:
- mutex_unlock(&adev->mman.gtt_window_lock);
+ mutex_unlock(&entity->lock);
*f = fence;
return r;
}
@@ -1876,6 +1874,27 @@ static void amdgpu_ttm_mmio_remap_bo_fini(struct amdgpu_device *adev)
adev->rmmio_remap.bo = NULL;
}
+static int amdgpu_ttm_buffer_entity_init(struct amdgpu_ttm_buffer_entity *entity,
+ int starting_gart_window,
+ bool needs_src_gart_window,
+ bool needs_dst_gart_window)
+{
+ mutex_init(&entity->lock);
+ if (needs_src_gart_window) {
+ entity->gart_window_offs[0] =
+ (u64)starting_gart_window * AMDGPU_GTT_MAX_TRANSFER_SIZE *
+ AMDGPU_GPU_PAGE_SIZE;
+ starting_gart_window++;
+ }
+ if (needs_dst_gart_window) {
+ entity->gart_window_offs[1] =
+ (u64)starting_gart_window * AMDGPU_GTT_MAX_TRANSFER_SIZE *
+ AMDGPU_GPU_PAGE_SIZE;
+ starting_gart_window++;
+ }
+ return starting_gart_window;
+}
+
/*
* amdgpu_ttm_init - Init the memory management (ttm) as well as various
* gtt/vram related fields.
@@ -1890,8 +1909,6 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
uint64_t gtt_size;
int r;
- mutex_init(&adev->mman.gtt_window_lock);
-
dma_set_max_seg_size(adev->dev, UINT_MAX);
/* No others user of address space so set it to 0 */
r = ttm_device_init(&adev->mman.bdev, &amdgpu_bo_driver, adev->dev,
@@ -2161,6 +2178,7 @@ void amdgpu_ttm_fini(struct amdgpu_device *adev)
void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
{
struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
+ u32 used_windows;
uint64_t size;
int r;
@@ -2204,6 +2222,14 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
drm_sched_entity_destroy(&adev->mman.clear_entity.base);
goto error_free_entity;
}
+
+ /* Statically assign GART windows to each entity. */
+ used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.default_entity,
+ 0, false, false);
+ used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.move_entity,
+ used_windows, true, true);
+ used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.clear_entity,
+ used_windows, false, true);
} else {
drm_sched_entity_destroy(&adev->mman.default_entity.base);
drm_sched_entity_destroy(&adev->mman.clear_entity.base);
@@ -2365,6 +2391,7 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
struct dma_resv *resv,
struct dma_fence **fence)
{
+ struct amdgpu_ttm_buffer_entity *entity;
struct amdgpu_res_cursor cursor;
u64 addr;
int r = 0;
@@ -2375,11 +2402,12 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
if (!fence)
return -EINVAL;
+ entity = &adev->mman.clear_entity;
*fence = dma_fence_get_stub();
amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &cursor);
- mutex_lock(&adev->mman.gtt_window_lock);
+ mutex_lock(&entity->lock);
while (cursor.remaining) {
struct dma_fence *next = NULL;
u64 size;
@@ -2392,13 +2420,13 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
/* Never clear more than 256MiB at once to avoid timeouts */
size = min(cursor.size, 256ULL << 20);
- r = amdgpu_ttm_map_buffer(adev, &adev->mman.clear_entity,
+ r = amdgpu_ttm_map_buffer(adev, entity,
&bo->tbo, bo->tbo.resource, &cursor,
1, false, &size, &addr);
if (r)
goto err;
- r = amdgpu_ttm_fill_mem(adev, &adev->mman.clear_entity, 0, addr, size, resv,
+ r = amdgpu_ttm_fill_mem(adev, entity, 0, addr, size, resv,
&next, true,
AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
if (r)
@@ -2410,7 +2438,7 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
amdgpu_res_next(&cursor, size);
}
err:
- mutex_unlock(&adev->mman.gtt_window_lock);
+ mutex_unlock(&entity->lock);
return r;
}
@@ -2435,7 +2463,7 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &dst);
- mutex_lock(&adev->mman.gtt_window_lock);
+ mutex_lock(&entity->lock);
while (dst.remaining) {
struct dma_fence *next;
uint64_t cur_size, to;
@@ -2461,7 +2489,7 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
amdgpu_res_next(&dst, cur_size);
}
error:
- mutex_unlock(&adev->mman.gtt_window_lock);
+ mutex_unlock(&entity->lock);
if (f)
*f = dma_fence_get(fence);
dma_fence_put(fence);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index d0f55a7edd30..a7eed678bd3f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -29,6 +29,7 @@
#include <drm/ttm/ttm_placement.h>
#include "amdgpu_vram_mgr.h"
#include "amdgpu_hmm.h"
+#include "amdgpu_gmc.h"
#define AMDGPU_PL_GDS (TTM_PL_PRIV + 0)
#define AMDGPU_PL_GWS (TTM_PL_PRIV + 1)
@@ -39,7 +40,7 @@
#define __AMDGPU_PL_NUM (TTM_PL_PRIV + 6)
#define AMDGPU_GTT_MAX_TRANSFER_SIZE 512
-#define AMDGPU_GTT_NUM_TRANSFER_WINDOWS 2
+#define AMDGPU_GTT_NUM_TRANSFER_WINDOWS 3
extern const struct attribute_group amdgpu_vram_mgr_attr_group;
extern const struct attribute_group amdgpu_gtt_mgr_attr_group;
@@ -54,6 +55,8 @@ struct amdgpu_gtt_mgr {
struct amdgpu_ttm_buffer_entity {
struct drm_sched_entity base;
+ struct mutex lock;
+ u32 gart_window_offs[2];
};
struct amdgpu_mman {
@@ -69,7 +72,7 @@ struct amdgpu_mman {
struct mutex gtt_window_lock;
- struct amdgpu_ttm_buffer_entity default_entity;
+ struct amdgpu_ttm_buffer_entity default_entity; /* has no gart windows */
struct amdgpu_ttm_buffer_entity clear_entity;
struct amdgpu_ttm_buffer_entity move_entity;
@@ -201,6 +204,13 @@ static inline int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo,
}
#endif
+static inline u64 amdgpu_compute_gart_address(struct amdgpu_gmc *gmc,
+ struct amdgpu_ttm_buffer_entity *entity,
+ int index)
+{
+ return gmc->gart_start + entity->gart_window_offs[index];
+}
+
void amdgpu_ttm_tt_set_user_pages(struct ttm_tt *ttm, struct amdgpu_hmm_range *range);
int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
uint64_t *user_addr);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 9c76f1ba0e55..0cc1d2b35026 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -59,8 +59,7 @@ svm_migrate_gart_map(struct amdgpu_ring *ring,
void *cpu_addr;
int r;
- /* use gart window 0 */
- *gart_addr = adev->gmc.gart_start;
+ *gart_addr = amdgpu_compute_gart_address(&adev->gmc, entity, 0);
num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
num_bytes = npages * 8;
@@ -116,7 +115,7 @@ svm_migrate_gart_map(struct amdgpu_ring *ring,
* multiple GTT_MAX_PAGES transfer, all sdma operations are serialized, wait for
* the last sdma finish fence which is returned to check copy memory is done.
*
- * Context: Process context, takes and releases gtt_window_lock
+ * Context: Process context
*
* Return:
* 0 - OK, otherwise error code
@@ -138,7 +137,7 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
entity = &adev->mman.move_entity;
- mutex_lock(&adev->mman.gtt_window_lock);
+ mutex_lock(&entity->lock);
while (npages) {
size = min(GTT_MAX_PAGES, npages);
@@ -175,7 +174,7 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
}
out_unlock:
- mutex_unlock(&adev->mman.gtt_window_lock);
+ mutex_unlock(&entity->lock);
return r;
}
--
2.43.0
* [PATCH v3 12/28] drm/amdgpu: remove AMDGPU_GTT_NUM_TRANSFER_WINDOWS
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (10 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 11/28] drm/amdgpu: statically assign gart windows to ttm entities Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 13:50 ` Christian König
2025-11-21 10:12 ` [PATCH v3 13/28] drm/amdgpu: add missing lock when using ttm entities Pierre-Eric Pelloux-Prayer
` (15 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
ttm is going to use a variable number of windows, so we need to get
rid of the hardcoded value.
Since amdgpu_gtt_mgr_init runs after
amdgpu_ttm_set_buffer_funcs_status is called with enable=false, we
still need to determine the reserved window count before doing
the real initialisation, so a warning is added if the actual value
doesn't match the reserved one.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 8 +++++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 21 ++++++++++++++-------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 7 +++----
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 6 ++++--
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h | 3 ++-
drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 12 ++++--------
6 files changed, 32 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 895c1e4c6747..924151b6cfd3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -269,10 +269,12 @@ static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = {
*
* @adev: amdgpu_device pointer
* @gtt_size: maximum size of GTT
+ * @reserved_windows: number of already reserved windows
*
* Allocate and initialize the GTT manager.
*/
-int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size)
+int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size,
+ u32 reserved_windows)
{
struct amdgpu_gtt_mgr *mgr = &adev->mman.gtt_mgr;
struct ttm_resource_manager *man = &mgr->manager;
@@ -283,8 +285,8 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size)
ttm_resource_manager_init(man, &adev->mman.bdev, gtt_size);
- start = AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
- start += amdgpu_vce_required_gart_pages(adev);
+ start = AMDGPU_GTT_MAX_TRANSFER_SIZE * reserved_windows;
+ start += amdgpu_vce_required_gart_pages(adev, start);
size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
drm_mm_init(&mgr->mm, start, size);
spin_lock_init(&mgr->lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 1371a40d4687..3a0511d1739f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1907,6 +1907,7 @@ static int amdgpu_ttm_buffer_entity_init(struct amdgpu_ttm_buffer_entity *entity
int amdgpu_ttm_init(struct amdgpu_device *adev)
{
uint64_t gtt_size;
+ u32 reserved_windows;
int r;
dma_set_max_seg_size(adev->dev, UINT_MAX);
@@ -1939,7 +1940,7 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
}
/* Change the size here instead of the init above so only lpfn is affected */
- amdgpu_ttm_set_buffer_funcs_status(adev, false);
+ reserved_windows = amdgpu_ttm_set_buffer_funcs_status(adev, false);
#ifdef CONFIG_64BIT
#ifdef CONFIG_X86
if (adev->gmc.xgmi.connected_to_cpu)
@@ -2035,7 +2036,7 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
}
/* Initialize GTT memory pool */
- r = amdgpu_gtt_mgr_init(adev, gtt_size);
+ r = amdgpu_gtt_mgr_init(adev, gtt_size, reserved_windows);
if (r) {
dev_err(adev->dev, "Failed initializing GTT heap.\n");
return r;
@@ -2174,17 +2175,21 @@ void amdgpu_ttm_fini(struct amdgpu_device *adev)
*
* Enable/disable use of buffer functions during suspend/resume. This should
* only be called at bootup or when userspace isn't running.
+ *
* Returns: the number of reserved GART windows
*/
-void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
+u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
{
struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
- u32 used_windows;
+ u32 used_windows, reserved_windows;
uint64_t size;
int r;
+ reserved_windows = 3;
+
if (!adev->mman.initialized || amdgpu_in_reset(adev) ||
adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
- return;
+ return reserved_windows;
if (enable) {
struct amdgpu_ring *ring;
@@ -2199,7 +2204,7 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
dev_err(adev->dev,
"Failed setting up TTM BO move entity (%d)\n",
r);
- return;
+ return 0;
}
r = drm_sched_entity_init(&adev->mman.clear_entity.base,
@@ -2230,6 +2235,7 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
used_windows, true, true);
used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.clear_entity,
used_windows, false, true);
+ WARN_ON(used_windows != reserved_windows);
} else {
drm_sched_entity_destroy(&adev->mman.default_entity.base);
drm_sched_entity_destroy(&adev->mman.clear_entity.base);
@@ -2248,10 +2254,11 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
man->size = size;
adev->mman.buffer_funcs_enabled = enable;
- return;
+ return reserved_windows;
error_free_entity:
drm_sched_entity_destroy(&adev->mman.default_entity.base);
+ return 0;
}
static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index a7eed678bd3f..2a78cf8a3f9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -40,7 +40,6 @@
#define __AMDGPU_PL_NUM (TTM_PL_PRIV + 6)
#define AMDGPU_GTT_MAX_TRANSFER_SIZE 512
-#define AMDGPU_GTT_NUM_TRANSFER_WINDOWS 3
extern const struct attribute_group amdgpu_vram_mgr_attr_group;
extern const struct attribute_group amdgpu_gtt_mgr_attr_group;
@@ -134,7 +133,7 @@ struct amdgpu_copy_mem {
#define AMDGPU_COPY_FLAGS_GET(value, field) \
(((__u32)(value) >> AMDGPU_COPY_FLAGS_##field##_SHIFT) & AMDGPU_COPY_FLAGS_##field##_MASK)
-int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size);
+int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size, u32 reserved_windows);
void amdgpu_gtt_mgr_fini(struct amdgpu_device *adev);
int amdgpu_preempt_mgr_init(struct amdgpu_device *adev);
void amdgpu_preempt_mgr_fini(struct amdgpu_device *adev);
@@ -168,8 +167,8 @@ bool amdgpu_res_cpu_visible(struct amdgpu_device *adev,
int amdgpu_ttm_init(struct amdgpu_device *adev);
void amdgpu_ttm_fini(struct amdgpu_device *adev);
-void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
- bool enable);
+u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
+ bool enable);
int amdgpu_copy_buffer(struct amdgpu_device *adev,
struct amdgpu_ttm_buffer_entity *entity,
uint64_t src_offset,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index a7d8f1ce6ac2..56308efce465 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -459,11 +459,13 @@ void amdgpu_vce_free_handles(struct amdgpu_device *adev, struct drm_file *filp)
* For VCE1, see vce_v1_0_ensure_vcpu_bo_32bit_addr for details.
* For VCE2+, this is not needed so return zero.
*/
-u32 amdgpu_vce_required_gart_pages(struct amdgpu_device *adev)
+u32 amdgpu_vce_required_gart_pages(struct amdgpu_device *adev, u64 gtt_transfer_end)
{
/* VCE IP block not added yet, so can't use amdgpu_ip_version */
- if (adev->family == AMDGPU_FAMILY_SI)
+ if (adev->family == AMDGPU_FAMILY_SI) {
+ adev->vce.gart_page_start = gtt_transfer_end;
return 512;
+ }
return 0;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
index 1c3464ce5037..d07302535d33 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
@@ -52,6 +52,7 @@ struct amdgpu_vce {
uint32_t srbm_soft_reset;
unsigned num_rings;
uint32_t keyselect;
+ u64 gart_page_start;
};
int amdgpu_vce_early_init(struct amdgpu_device *adev);
@@ -61,7 +62,7 @@ int amdgpu_vce_entity_init(struct amdgpu_device *adev, struct amdgpu_ring *ring)
int amdgpu_vce_suspend(struct amdgpu_device *adev);
int amdgpu_vce_resume(struct amdgpu_device *adev);
void amdgpu_vce_free_handles(struct amdgpu_device *adev, struct drm_file *filp);
-u32 amdgpu_vce_required_gart_pages(struct amdgpu_device *adev);
+u32 amdgpu_vce_required_gart_pages(struct amdgpu_device *adev, u64 gtt_transfer_end);
int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p, struct amdgpu_job *job,
struct amdgpu_ib *ib);
int amdgpu_vce_ring_parse_cs_vm(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
index 9ae424618556..dd18fc45225d 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
@@ -47,11 +47,6 @@
#define VCE_V1_0_DATA_SIZE (7808 * (AMDGPU_MAX_VCE_HANDLES + 1))
#define VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK 0x02
-#define VCE_V1_0_GART_PAGE_START \
- (AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS)
-#define VCE_V1_0_GART_ADDR_START \
- (VCE_V1_0_GART_PAGE_START * AMDGPU_GPU_PAGE_SIZE)
-
static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev);
static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev);
@@ -541,6 +536,7 @@ static int vce_v1_0_ensure_vcpu_bo_32bit_addr(struct amdgpu_device *adev)
u64 num_pages = ALIGN(bo_size, AMDGPU_GPU_PAGE_SIZE) / AMDGPU_GPU_PAGE_SIZE;
u64 pa = amdgpu_gmc_vram_pa(adev, adev->vce.vcpu_bo);
u64 flags = AMDGPU_PTE_READABLE | AMDGPU_PTE_WRITEABLE | AMDGPU_PTE_VALID;
+ u64 vce_gart_addr_start = adev->vce.gart_page_start * AMDGPU_GPU_PAGE_SIZE;
/*
* Check if the VCPU BO already has a 32-bit address.
@@ -550,12 +546,12 @@ static int vce_v1_0_ensure_vcpu_bo_32bit_addr(struct amdgpu_device *adev)
return 0;
/* Check if we can map the VCPU BO in GART to a 32-bit address. */
- if (adev->gmc.gart_start + VCE_V1_0_GART_ADDR_START > max_vcpu_bo_addr)
+ if (adev->gmc.gart_start + vce_gart_addr_start > max_vcpu_bo_addr)
return -EINVAL;
- amdgpu_gart_map_vram_range(adev, pa, VCE_V1_0_GART_PAGE_START,
+ amdgpu_gart_map_vram_range(adev, pa, adev->vce.gart_page_start,
num_pages, flags, adev->gart.ptr);
- adev->vce.gpu_addr = adev->gmc.gart_start + VCE_V1_0_GART_ADDR_START;
+ adev->vce.gpu_addr = adev->gmc.gart_start + vce_gart_addr_start;
if (adev->vce.gpu_addr > max_vcpu_bo_addr)
return -EINVAL;
--
2.43.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 13/28] drm/amdgpu: add missing lock when using ttm entities
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (11 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 12/28] drm/amdgpu: remove AMDGPU_GTT_NUM_TRANSFER_WINDOWS Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 13:53 ` Christian König
2025-11-21 10:12 ` [PATCH v3 14/28] drm/amdgpu: check entity lock is held in amdgpu_ttm_job_submit Pierre-Eric Pelloux-Prayer
` (14 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
Taking the entity lock is required to guarantee the ordering of
execution. The next commit will add a check that the lock is
held.
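The ordering problem the lock prevents can be shown with a minimal user-space sketch (a toy model, not the kernel code: "arm" assigns a fence seqno and "push" appends to the entity queue, mirroring drm_sched_job_arm() and drm_sched_entity_push_job(); all names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of job submission: ordering is only correct if arm+push
 * happen atomically, i.e. under the entity lock. */
struct entity {
	unsigned int next_seqno;
	unsigned int queue[8];
	unsigned int queued;
};

static unsigned int job_arm(struct entity *e)
{
	return e->next_seqno++;			/* like drm_sched_job_arm() */
}

static void job_push(struct entity *e, unsigned int s)
{
	e->queue[e->queued++] = s;		/* like drm_sched_entity_push_job() */
}

static bool queue_in_seqno_order(const struct entity *e)
{
	for (unsigned int i = 1; i < e->queued; i++)
		if (e->queue[i - 1] > e->queue[i])
			return false;
	return true;
}

/* Bad: thread B's arm+push interleaves between A's arm and A's push. */
static bool interleaved_submission(void)
{
	struct entity e = { 0 };
	unsigned int a = job_arm(&e);	/* A arms: seqno 0 */
	unsigned int b = job_arm(&e);	/* B arms: seqno 1 */
	job_push(&e, b);		/* B pushes first */
	job_push(&e, a);		/* A pushes late: out of order */
	return queue_in_seqno_order(&e);
}

/* Good: with the entity lock held across arm+push, each pair is atomic. */
static bool locked_submission(void)
{
	struct entity e = { 0 };
	job_push(&e, job_arm(&e));	/* A: arm+push under the lock */
	job_push(&e, job_arm(&e));	/* B: arm+push under the lock */
	return queue_in_seqno_order(&e);
}
```

The interleaved variant queues seqno 1 before seqno 0, which is exactly the reordering the mutex_lock/mutex_unlock pairs below rule out.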
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
index a050167e76a4..832d9ae101f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
@@ -35,6 +35,7 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size,
struct dma_fence *fence;
int i, r;
+ mutex_lock(&adev->mman.default_entity.lock);
stime = ktime_get();
for (i = 0; i < n; i++) {
r = amdgpu_copy_buffer(adev, &adev->mman.default_entity,
@@ -47,6 +48,7 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size,
if (r)
goto exit_do_move;
}
+ mutex_unlock(&adev->mman.default_entity.lock);
exit_do_move:
etime = ktime_get();
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 3a0511d1739f..a803af015d05 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1501,6 +1501,7 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
if (r)
goto out;
+ mutex_lock(&adev->mman.default_entity.lock);
amdgpu_res_first(abo->tbo.resource, offset, len, &src_mm);
src_addr = amdgpu_ttm_domain_start(adev, bo->resource->mem_type) +
src_mm.start;
@@ -1512,6 +1513,7 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
PAGE_SIZE, 0);
fence = amdgpu_ttm_job_submit(adev, job, num_dw);
+ mutex_unlock(&adev->mman.default_entity.lock);
if (!dma_fence_wait_timeout(fence, false, adev->sdma_timeout))
r = -ETIMEDOUT;
--
2.43.0
* [PATCH v3 14/28] drm/amdgpu: check entity lock is held in amdgpu_ttm_job_submit
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (12 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 13/28] drm/amdgpu: add missing lock when using ttm entities Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 13:54 ` Christian König
2025-11-21 10:12 ` [PATCH v3 15/28] drm/amdgpu: double AMDGPU_GTT_MAX_TRANSFER_SIZE Pierre-Eric Pelloux-Prayer
` (13 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
drm_sched_job_arm and drm_sched_entity_push_job must be called
under the same lock to guarantee the order of execution.
This commit adds a check in amdgpu_ttm_job_submit and fixes the
places where the lock was missing.
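The runtime check can be modeled in user space as follows (a simplified sketch: a boolean flag stands in for real lockdep state, and the model returns failure where the kernel's lockdep_assert_held() would splat; struct and function names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of asserting lock ownership at submit time. */
struct ttm_entity_model {
	bool lock_held;		/* stands in for lockdep's "is held" state */
	int submitted;
};

static bool ttm_job_submit_model(struct ttm_entity_model *e)
{
	if (!e->lock_held)	/* models lockdep_assert_held(&entity->lock) */
		return false;	/* the kernel would WARN here, not fail */
	e->submitted++;
	return true;
}
```

With CONFIG_PROVE_LOCKING enabled, the real lockdep_assert_held() turns any submission path that forgot the entity lock into an immediate, actionable warning instead of a rare ordering bug.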
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index a803af015d05..164b49d768d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -163,7 +163,8 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
}
static struct dma_fence *
-amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_job *job, u32 num_dw)
+amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_ttm_buffer_entity *entity,
+ struct amdgpu_job *job, u32 num_dw)
{
struct amdgpu_ring *ring;
@@ -171,6 +172,8 @@ amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_job *job, u32 nu
amdgpu_ring_pad_ib(ring, &job->ibs[0]);
WARN_ON(job->ibs[0].length_dw > num_dw);
+ lockdep_assert_held(&entity->lock);
+
return amdgpu_job_submit(job);
}
@@ -268,7 +271,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
amdgpu_gart_map_vram_range(adev, pa, 0, num_pages, flags, cpu_addr);
}
- dma_fence_put(amdgpu_ttm_job_submit(adev, job, num_dw));
+ dma_fence_put(amdgpu_ttm_job_submit(adev, entity, job, num_dw));
return 0;
}
@@ -1512,7 +1515,7 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr, dst_addr,
PAGE_SIZE, 0);
- fence = amdgpu_ttm_job_submit(adev, job, num_dw);
+ fence = amdgpu_ttm_job_submit(adev, &adev->mman.default_entity, job, num_dw);
mutex_unlock(&adev->mman.default_entity.lock);
if (!dma_fence_wait_timeout(fence, false, adev->sdma_timeout))
@@ -2336,7 +2339,7 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev,
byte_count -= cur_size_in_bytes;
}
- *fence = amdgpu_ttm_job_submit(adev, job, num_dw);
+ *fence = amdgpu_ttm_job_submit(adev, entity, job, num_dw);
return 0;
@@ -2379,7 +2382,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
byte_count -= cur_size;
}
- *fence = amdgpu_ttm_job_submit(adev, job, num_dw);
+ *fence = amdgpu_ttm_job_submit(adev, entity, job, num_dw);
return 0;
}
--
2.43.0
* [PATCH v3 15/28] drm/amdgpu: double AMDGPU_GTT_MAX_TRANSFER_SIZE
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (13 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 14/28] drm/amdgpu: check entity lock is held in amdgpu_ttm_job_submit Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 16/28] drm/amdgpu: use larger gart window when possible Pierre-Eric Pelloux-Prayer
` (12 subsequent siblings)
27 siblings, 0 replies; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
This makes copies and evictions faster when GART windows are required.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 2a78cf8a3f9f..0b3b03f43bab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -39,7 +39,7 @@
#define AMDGPU_PL_MMIO_REMAP (TTM_PL_PRIV + 5)
#define __AMDGPU_PL_NUM (TTM_PL_PRIV + 6)
-#define AMDGPU_GTT_MAX_TRANSFER_SIZE 512
+#define AMDGPU_GTT_MAX_TRANSFER_SIZE 1024
extern const struct attribute_group amdgpu_vram_mgr_attr_group;
extern const struct attribute_group amdgpu_gtt_mgr_attr_group;
--
2.43.0
* [PATCH v3 16/28] drm/amdgpu: use larger gart window when possible
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (14 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 15/28] drm/amdgpu: double AMDGPU_GTT_MAX_TRANSFER_SIZE Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 15:02 ` Christian König
2025-11-21 10:12 ` [PATCH v3 17/28] drm/amdgpu: introduce amdgpu_sdma_set_vm_pte_scheds Pierre-Eric Pelloux-Prayer
` (11 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
Each entity's GART windows are contiguous, so when copying a buffer
whose src doesn't need a GART window, src's window can be used to
extend dst's window (and vice versa).
This doubles the effective GART window size and reduces the number
of jobs required.
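The window assignment in the patch can be condensed into a small stand-alone model (a sketch only: the -1 "no window" convention and the struct/function names are illustrative, and use_two_windows is explicitly initialized here, which the model needs for the neither-side case):

```c
#include <assert.h>
#include <stdbool.h>

/* Each copy job has two GART windows available; a side that can be
 * accessed directly frees its window for the other side to borrow. */
struct window_plan {
	int src_window;		/* -1: src accessed directly */
	int dst_window;		/* -1: dst accessed directly */
	bool use_two_windows;	/* the mapped side gets a double window */
};

static struct window_plan plan_windows(bool src_needs, bool dst_needs)
{
	struct window_plan p = {
		.src_window = -1,
		.dst_window = -1,
		.use_two_windows = false,
	};

	if (src_needs) {
		p.src_window = 0;
		p.use_two_windows = !dst_needs;
	}
	if (dst_needs) {
		p.dst_window = src_needs ? 1 : 0;
		p.use_two_windows = !src_needs;
	}
	return p;
}
```

When both sides need mapping the classic 0/1 split is kept; when only one side does, it gets window 0 plus the neighbor window, halving the number of map-and-copy iterations for that direction.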
---
v2: pass adev instead of ring to amdgpu_ttm_needs_gart_window
---
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 78 ++++++++++++++++++-------
1 file changed, 58 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 164b49d768d8..489880b2fb8e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -177,6 +177,21 @@ amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_ttm_buffer_entit
return amdgpu_job_submit(job);
}
+static bool amdgpu_ttm_needs_gart_window(struct amdgpu_device *adev,
+ struct ttm_resource *mem,
+ struct amdgpu_res_cursor *mm_cur,
+ bool tmz,
+ uint64_t *addr)
+{
+ /* Map only what can't be accessed directly */
+ if (!tmz && mem->start != AMDGPU_BO_INVALID_OFFSET) {
+ *addr = amdgpu_ttm_domain_start(adev, mem->mem_type) +
+ mm_cur->start;
+ return false;
+ }
+ return true;
+}
+
/**
* amdgpu_ttm_map_buffer - Map memory into the GART windows
* @adev: the device being used
@@ -185,6 +200,7 @@ amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_ttm_buffer_entit
* @mem: memory object to map
* @mm_cur: range to map
* @window: which GART window to use
+ * @use_two_windows: if true, use a double window
* @tmz: if we should setup a TMZ enabled mapping
* @size: in number of bytes to map, out number of bytes mapped
* @addr: resulting address inside the MC address space
@@ -198,6 +214,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
struct ttm_resource *mem,
struct amdgpu_res_cursor *mm_cur,
unsigned int window,
+ bool use_two_windows,
bool tmz, uint64_t *size, uint64_t *addr)
{
unsigned int offset, num_pages, num_dw, num_bytes;
@@ -213,13 +230,8 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
if (WARN_ON(mem->mem_type == AMDGPU_PL_PREEMPT))
return -EINVAL;
- /* Map only what can't be accessed directly */
- if (!tmz && mem->start != AMDGPU_BO_INVALID_OFFSET) {
- *addr = amdgpu_ttm_domain_start(adev, mem->mem_type) +
- mm_cur->start;
+ if (!amdgpu_ttm_needs_gart_window(adev, mem, mm_cur, tmz, addr))
return 0;
- }
-
/*
* If start begins at an offset inside the page, then adjust the size
@@ -228,7 +240,8 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
offset = mm_cur->start & ~PAGE_MASK;
num_pages = PFN_UP(*size + offset);
- num_pages = min_t(uint32_t, num_pages, AMDGPU_GTT_MAX_TRANSFER_SIZE);
+ num_pages = min_t(uint32_t,
+ num_pages, AMDGPU_GTT_MAX_TRANSFER_SIZE * (use_two_windows ? 2 : 1));
*size = min(*size, (uint64_t)num_pages * PAGE_SIZE - offset);
@@ -300,7 +313,9 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
struct dma_resv *resv,
struct dma_fence **f)
{
+ bool src_needs_gart_window, dst_needs_gart_window, use_two_gart_windows;
struct amdgpu_res_cursor src_mm, dst_mm;
+ int src_gart_window, dst_gart_window;
struct dma_fence *fence = NULL;
int r = 0;
uint32_t copy_flags = 0;
@@ -324,18 +339,40 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
/* Never copy more than 256MiB at once to avoid a timeout */
cur_size = min3(src_mm.size, dst_mm.size, 256ULL << 20);
- /* Map src to window 0 and dst to window 1. */
- r = amdgpu_ttm_map_buffer(adev, entity,
- src->bo, src->mem, &src_mm,
- 0, tmz, &cur_size, &from);
- if (r)
- goto error;
+ /* If only one direction needs a gart window to access memory, use both
+ * windows for it.
+ */
+ src_needs_gart_window =
+ amdgpu_ttm_needs_gart_window(adev, src->mem, &src_mm, tmz, &from);
+ dst_needs_gart_window =
+ amdgpu_ttm_needs_gart_window(adev, dst->mem, &dst_mm, tmz, &to);
- r = amdgpu_ttm_map_buffer(adev, entity,
- dst->bo, dst->mem, &dst_mm,
- 1, tmz, &cur_size, &to);
- if (r)
- goto error;
+ if (src_needs_gart_window) {
+ src_gart_window = 0;
+ use_two_gart_windows = !dst_needs_gart_window;
+ }
+ if (dst_needs_gart_window) {
+ dst_gart_window = src_needs_gart_window ? 1 : 0;
+ use_two_gart_windows = !src_needs_gart_window;
+ }
+
+ if (src_needs_gart_window) {
+ r = amdgpu_ttm_map_buffer(adev, entity,
+ src->bo, src->mem, &src_mm,
+ src_gart_window, use_two_gart_windows,
+ tmz, &cur_size, &from);
+ if (r)
+ goto error;
+ }
+
+ if (dst_needs_gart_window) {
+ r = amdgpu_ttm_map_buffer(adev, entity,
+ dst->bo, dst->mem, &dst_mm,
+ dst_gart_window, use_two_gart_windows,
+ tmz, &cur_size, &to);
+ if (r)
+ goto error;
+ }
abo_src = ttm_to_amdgpu_bo(src->bo);
abo_dst = ttm_to_amdgpu_bo(dst->bo);
@@ -2434,7 +2471,7 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
r = amdgpu_ttm_map_buffer(adev, entity,
&bo->tbo, bo->tbo.resource, &cursor,
- 1, false, &size, &addr);
+ 1, false, false, &size, &addr);
if (r)
goto err;
@@ -2485,7 +2522,8 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
r = amdgpu_ttm_map_buffer(adev, entity,
&bo->tbo, bo->tbo.resource, &dst,
- 1, false, &cur_size, &to);
+ 1, false, false,
+ &cur_size, &to);
if (r)
goto error;
--
2.43.0
* [PATCH v3 17/28] drm/amdgpu: introduce amdgpu_sdma_set_vm_pte_scheds
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (15 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 16/28] drm/amdgpu: use larger gart window when possible Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 15:05 ` Christian König
2025-11-21 10:12 ` [PATCH v3 18/28] drm/amdgpu: move sched status check inside amdgpu_ttm_set_buffer_funcs_status Pierre-Eric Pelloux-Prayer
` (10 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
All SDMA versions use the same logic, so add a helper and move the
common code to a single place.
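The per-instance scheduler choice the helper centralizes boils down to one branch (a user-space sketch with illustrative struct names and integer ids standing in for scheduler pointers; the real helper also stores the funcs table and instance count):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSTANCES 8

/* Model: each SDMA instance exposes a ring scheduler and, on IPs that
 * have one, a dedicated page-queue scheduler for PTE updates. */
struct sdma_instance_model {
	int ring_sched;
	int page_sched;
};

struct sdma_model {
	bool has_page_queue;
	int num_instances;
	struct sdma_instance_model inst[MAX_INSTANCES];
};

/* Returns the scheduler id picked for VM PTE updates on instance i:
 * prefer the page queue when the IP has one, else fall back to the ring. */
static int pick_vm_pte_sched(const struct sdma_model *s, int i)
{
	return s->has_page_queue ? s->inst[i].page_sched
				 : s->inst[i].ring_sched;
}
```

Every removed *_set_vm_pte_funcs() open-coded this same loop; folding it into amdgpu_sdma_set_vm_pte_scheds() leaves only the per-IP funcs table in each file.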
---
v2: pass amdgpu_vm_pte_funcs as well
v3: drop all the *_set_vm_pte_funcs one liners
---
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 ++++++++++++
drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 31 ++++++---------------
drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 31 ++++++---------------
drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 31 ++++++---------------
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 35 ++++++------------------
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 35 ++++++------------------
drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 31 ++++++---------------
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 31 ++++++---------------
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 29 ++++++--------------
drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 29 ++++++--------------
drivers/gpu/drm/amd/amdgpu/si_dma.c | 31 ++++++---------------
12 files changed, 105 insertions(+), 228 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 790e84fec949..a50e3c0a4b18 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1612,6 +1612,8 @@ struct dma_fence *amdgpu_device_enforce_isolation(struct amdgpu_device *adev,
bool amdgpu_device_has_display_hardware(struct amdgpu_device *adev);
ssize_t amdgpu_get_soft_full_reset_mask(struct amdgpu_ring *ring);
ssize_t amdgpu_show_reset_mask(char *buf, uint32_t supported_reset);
+void amdgpu_sdma_set_vm_pte_scheds(struct amdgpu_device *adev,
+ const struct amdgpu_vm_pte_funcs *vm_pte_funcs);
/* atpx handler */
#if defined(CONFIG_VGA_SWITCHEROO)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 193de267984e..5061d5b0f875 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3228,3 +3228,20 @@ void amdgpu_vm_print_task_info(struct amdgpu_device *adev,
task_info->process_name, task_info->tgid,
task_info->task.comm, task_info->task.pid);
}
+
+void amdgpu_sdma_set_vm_pte_scheds(struct amdgpu_device *adev,
+ const struct amdgpu_vm_pte_funcs *vm_pte_funcs)
+{
+ struct drm_gpu_scheduler *sched;
+ int i;
+
+ for (i = 0; i < adev->sdma.num_instances; i++) {
+ if (adev->sdma.has_page_queue)
+ sched = &adev->sdma.instance[i].page.sched;
+ else
+ sched = &adev->sdma.instance[i].ring.sched;
+ adev->vm_manager.vm_pte_scheds[i] = sched;
+ }
+ adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
+ adev->vm_manager.vm_pte_funcs = vm_pte_funcs;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
index 9e8715b4739d..22780c09177d 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
@@ -53,7 +53,6 @@ static const u32 sdma_offsets[SDMA_MAX_INSTANCE] =
static void cik_sdma_set_ring_funcs(struct amdgpu_device *adev);
static void cik_sdma_set_irq_funcs(struct amdgpu_device *adev);
static void cik_sdma_set_buffer_funcs(struct amdgpu_device *adev);
-static void cik_sdma_set_vm_pte_funcs(struct amdgpu_device *adev);
static int cik_sdma_soft_reset(struct amdgpu_ip_block *ip_block);
u32 amdgpu_cik_gpu_check_soft_reset(struct amdgpu_device *adev);
@@ -919,6 +918,14 @@ static void cik_enable_sdma_mgls(struct amdgpu_device *adev,
}
}
+static const struct amdgpu_vm_pte_funcs cik_sdma_vm_pte_funcs = {
+ .copy_pte_num_dw = 7,
+ .copy_pte = cik_sdma_vm_copy_pte,
+
+ .write_pte = cik_sdma_vm_write_pte,
+ .set_pte_pde = cik_sdma_vm_set_pte_pde,
+};
+
static int cik_sdma_early_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -933,7 +940,7 @@ static int cik_sdma_early_init(struct amdgpu_ip_block *ip_block)
cik_sdma_set_ring_funcs(adev);
cik_sdma_set_irq_funcs(adev);
cik_sdma_set_buffer_funcs(adev);
- cik_sdma_set_vm_pte_funcs(adev);
+ amdgpu_sdma_set_vm_pte_scheds(adev, &cik_sdma_vm_pte_funcs);
return 0;
}
@@ -1337,26 +1344,6 @@ static void cik_sdma_set_buffer_funcs(struct amdgpu_device *adev)
adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
}
-static const struct amdgpu_vm_pte_funcs cik_sdma_vm_pte_funcs = {
- .copy_pte_num_dw = 7,
- .copy_pte = cik_sdma_vm_copy_pte,
-
- .write_pte = cik_sdma_vm_write_pte,
- .set_pte_pde = cik_sdma_vm_set_pte_pde,
-};
-
-static void cik_sdma_set_vm_pte_funcs(struct amdgpu_device *adev)
-{
- unsigned i;
-
- adev->vm_manager.vm_pte_funcs = &cik_sdma_vm_pte_funcs;
- for (i = 0; i < adev->sdma.num_instances; i++) {
- adev->vm_manager.vm_pte_scheds[i] =
- &adev->sdma.instance[i].ring.sched;
- }
- adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
-}
-
const struct amdgpu_ip_block_version cik_sdma_ip_block =
{
.type = AMD_IP_BLOCK_TYPE_SDMA,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
index 92ce580647cd..0090ace49024 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
@@ -51,7 +51,6 @@
static void sdma_v2_4_set_ring_funcs(struct amdgpu_device *adev);
static void sdma_v2_4_set_buffer_funcs(struct amdgpu_device *adev);
-static void sdma_v2_4_set_vm_pte_funcs(struct amdgpu_device *adev);
static void sdma_v2_4_set_irq_funcs(struct amdgpu_device *adev);
MODULE_FIRMWARE("amdgpu/topaz_sdma.bin");
@@ -809,6 +808,14 @@ static void sdma_v2_4_ring_emit_wreg(struct amdgpu_ring *ring,
amdgpu_ring_write(ring, val);
}
+static const struct amdgpu_vm_pte_funcs sdma_v2_4_vm_pte_funcs = {
+ .copy_pte_num_dw = 7,
+ .copy_pte = sdma_v2_4_vm_copy_pte,
+
+ .write_pte = sdma_v2_4_vm_write_pte,
+ .set_pte_pde = sdma_v2_4_vm_set_pte_pde,
+};
+
static int sdma_v2_4_early_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -822,7 +829,7 @@ static int sdma_v2_4_early_init(struct amdgpu_ip_block *ip_block)
sdma_v2_4_set_ring_funcs(adev);
sdma_v2_4_set_buffer_funcs(adev);
- sdma_v2_4_set_vm_pte_funcs(adev);
+ amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v2_4_vm_pte_funcs);
sdma_v2_4_set_irq_funcs(adev);
return 0;
@@ -1232,26 +1239,6 @@ static void sdma_v2_4_set_buffer_funcs(struct amdgpu_device *adev)
adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
}
-static const struct amdgpu_vm_pte_funcs sdma_v2_4_vm_pte_funcs = {
- .copy_pte_num_dw = 7,
- .copy_pte = sdma_v2_4_vm_copy_pte,
-
- .write_pte = sdma_v2_4_vm_write_pte,
- .set_pte_pde = sdma_v2_4_vm_set_pte_pde,
-};
-
-static void sdma_v2_4_set_vm_pte_funcs(struct amdgpu_device *adev)
-{
- unsigned i;
-
- adev->vm_manager.vm_pte_funcs = &sdma_v2_4_vm_pte_funcs;
- for (i = 0; i < adev->sdma.num_instances; i++) {
- adev->vm_manager.vm_pte_scheds[i] =
- &adev->sdma.instance[i].ring.sched;
- }
- adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
-}
-
const struct amdgpu_ip_block_version sdma_v2_4_ip_block = {
.type = AMD_IP_BLOCK_TYPE_SDMA,
.major = 2,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
index 1c076bd1cf73..2526d393162a 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
@@ -51,7 +51,6 @@
static void sdma_v3_0_set_ring_funcs(struct amdgpu_device *adev);
static void sdma_v3_0_set_buffer_funcs(struct amdgpu_device *adev);
-static void sdma_v3_0_set_vm_pte_funcs(struct amdgpu_device *adev);
static void sdma_v3_0_set_irq_funcs(struct amdgpu_device *adev);
MODULE_FIRMWARE("amdgpu/tonga_sdma.bin");
@@ -1082,6 +1081,14 @@ static void sdma_v3_0_ring_emit_wreg(struct amdgpu_ring *ring,
amdgpu_ring_write(ring, val);
}
+static const struct amdgpu_vm_pte_funcs sdma_v3_0_vm_pte_funcs = {
+ .copy_pte_num_dw = 7,
+ .copy_pte = sdma_v3_0_vm_copy_pte,
+
+ .write_pte = sdma_v3_0_vm_write_pte,
+ .set_pte_pde = sdma_v3_0_vm_set_pte_pde,
+};
+
static int sdma_v3_0_early_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -1102,7 +1109,7 @@ static int sdma_v3_0_early_init(struct amdgpu_ip_block *ip_block)
sdma_v3_0_set_ring_funcs(adev);
sdma_v3_0_set_buffer_funcs(adev);
- sdma_v3_0_set_vm_pte_funcs(adev);
+ amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v3_0_vm_pte_funcs);
sdma_v3_0_set_irq_funcs(adev);
return 0;
@@ -1674,26 +1681,6 @@ static void sdma_v3_0_set_buffer_funcs(struct amdgpu_device *adev)
adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
}
-static const struct amdgpu_vm_pte_funcs sdma_v3_0_vm_pte_funcs = {
- .copy_pte_num_dw = 7,
- .copy_pte = sdma_v3_0_vm_copy_pte,
-
- .write_pte = sdma_v3_0_vm_write_pte,
- .set_pte_pde = sdma_v3_0_vm_set_pte_pde,
-};
-
-static void sdma_v3_0_set_vm_pte_funcs(struct amdgpu_device *adev)
-{
- unsigned i;
-
- adev->vm_manager.vm_pte_funcs = &sdma_v3_0_vm_pte_funcs;
- for (i = 0; i < adev->sdma.num_instances; i++) {
- adev->vm_manager.vm_pte_scheds[i] =
- &adev->sdma.instance[i].ring.sched;
- }
- adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
-}
-
const struct amdgpu_ip_block_version sdma_v3_0_ip_block =
{
.type = AMD_IP_BLOCK_TYPE_SDMA,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index f38004e6064e..a35d9951e22a 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -129,7 +129,6 @@ static const struct amdgpu_hwip_reg_entry sdma_reg_list_4_0[] = {
static void sdma_v4_0_set_ring_funcs(struct amdgpu_device *adev);
static void sdma_v4_0_set_buffer_funcs(struct amdgpu_device *adev);
-static void sdma_v4_0_set_vm_pte_funcs(struct amdgpu_device *adev);
static void sdma_v4_0_set_irq_funcs(struct amdgpu_device *adev);
static void sdma_v4_0_set_ras_funcs(struct amdgpu_device *adev);
@@ -1751,6 +1750,14 @@ static bool sdma_v4_0_fw_support_paging_queue(struct amdgpu_device *adev)
}
}
+static const struct amdgpu_vm_pte_funcs sdma_v4_0_vm_pte_funcs = {
+ .copy_pte_num_dw = 7,
+ .copy_pte = sdma_v4_0_vm_copy_pte,
+
+ .write_pte = sdma_v4_0_vm_write_pte,
+ .set_pte_pde = sdma_v4_0_vm_set_pte_pde,
+};
+
static int sdma_v4_0_early_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -1769,7 +1776,7 @@ static int sdma_v4_0_early_init(struct amdgpu_ip_block *ip_block)
sdma_v4_0_set_ring_funcs(adev);
sdma_v4_0_set_buffer_funcs(adev);
- sdma_v4_0_set_vm_pte_funcs(adev);
+ amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v4_0_vm_pte_funcs);
sdma_v4_0_set_irq_funcs(adev);
sdma_v4_0_set_ras_funcs(adev);
@@ -2615,30 +2622,6 @@ static void sdma_v4_0_set_buffer_funcs(struct amdgpu_device *adev)
adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
}
-static const struct amdgpu_vm_pte_funcs sdma_v4_0_vm_pte_funcs = {
- .copy_pte_num_dw = 7,
- .copy_pte = sdma_v4_0_vm_copy_pte,
-
- .write_pte = sdma_v4_0_vm_write_pte,
- .set_pte_pde = sdma_v4_0_vm_set_pte_pde,
-};
-
-static void sdma_v4_0_set_vm_pte_funcs(struct amdgpu_device *adev)
-{
- struct drm_gpu_scheduler *sched;
- unsigned i;
-
- adev->vm_manager.vm_pte_funcs = &sdma_v4_0_vm_pte_funcs;
- for (i = 0; i < adev->sdma.num_instances; i++) {
- if (adev->sdma.has_page_queue)
- sched = &adev->sdma.instance[i].page.sched;
- else
- sched = &adev->sdma.instance[i].ring.sched;
- adev->vm_manager.vm_pte_scheds[i] = sched;
- }
- adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
-}
-
static void sdma_v4_0_get_ras_error_count(uint32_t value,
uint32_t instance,
uint32_t *sec_count)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index a1443990d5c6..7f77367848d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -104,7 +104,6 @@ static const struct amdgpu_hwip_reg_entry sdma_reg_list_4_4_2[] = {
static void sdma_v4_4_2_set_ring_funcs(struct amdgpu_device *adev);
static void sdma_v4_4_2_set_buffer_funcs(struct amdgpu_device *adev);
-static void sdma_v4_4_2_set_vm_pte_funcs(struct amdgpu_device *adev);
static void sdma_v4_4_2_set_irq_funcs(struct amdgpu_device *adev);
static void sdma_v4_4_2_set_ras_funcs(struct amdgpu_device *adev);
static void sdma_v4_4_2_update_reset_mask(struct amdgpu_device *adev);
@@ -1347,6 +1346,14 @@ static const struct amdgpu_sdma_funcs sdma_v4_4_2_sdma_funcs = {
.soft_reset_kernel_queue = &sdma_v4_4_2_soft_reset_engine,
};
+static const struct amdgpu_vm_pte_funcs sdma_v4_4_2_vm_pte_funcs = {
+ .copy_pte_num_dw = 7,
+ .copy_pte = sdma_v4_4_2_vm_copy_pte,
+
+ .write_pte = sdma_v4_4_2_vm_write_pte,
+ .set_pte_pde = sdma_v4_4_2_vm_set_pte_pde,
+};
+
static int sdma_v4_4_2_early_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -1362,7 +1369,7 @@ static int sdma_v4_4_2_early_init(struct amdgpu_ip_block *ip_block)
sdma_v4_4_2_set_ring_funcs(adev);
sdma_v4_4_2_set_buffer_funcs(adev);
- sdma_v4_4_2_set_vm_pte_funcs(adev);
+ amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v4_4_2_vm_pte_funcs);
sdma_v4_4_2_set_irq_funcs(adev);
sdma_v4_4_2_set_ras_funcs(adev);
return 0;
@@ -2316,30 +2323,6 @@ static void sdma_v4_4_2_set_buffer_funcs(struct amdgpu_device *adev)
adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
}
-static const struct amdgpu_vm_pte_funcs sdma_v4_4_2_vm_pte_funcs = {
- .copy_pte_num_dw = 7,
- .copy_pte = sdma_v4_4_2_vm_copy_pte,
-
- .write_pte = sdma_v4_4_2_vm_write_pte,
- .set_pte_pde = sdma_v4_4_2_vm_set_pte_pde,
-};
-
-static void sdma_v4_4_2_set_vm_pte_funcs(struct amdgpu_device *adev)
-{
- struct drm_gpu_scheduler *sched;
- unsigned i;
-
- adev->vm_manager.vm_pte_funcs = &sdma_v4_4_2_vm_pte_funcs;
- for (i = 0; i < adev->sdma.num_instances; i++) {
- if (adev->sdma.has_page_queue)
- sched = &adev->sdma.instance[i].page.sched;
- else
- sched = &adev->sdma.instance[i].ring.sched;
- adev->vm_manager.vm_pte_scheds[i] = sched;
- }
- adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
-}
-
/**
* sdma_v4_4_2_update_reset_mask - update reset mask for SDMA
* @adev: Pointer to the AMDGPU device structure
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 8ddc4df06a1f..7ce13c5d4e61 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -110,7 +110,6 @@ static const struct amdgpu_hwip_reg_entry sdma_reg_list_5_0[] = {
static void sdma_v5_0_set_ring_funcs(struct amdgpu_device *adev);
static void sdma_v5_0_set_buffer_funcs(struct amdgpu_device *adev);
-static void sdma_v5_0_set_vm_pte_funcs(struct amdgpu_device *adev);
static void sdma_v5_0_set_irq_funcs(struct amdgpu_device *adev);
static int sdma_v5_0_stop_queue(struct amdgpu_ring *ring);
static int sdma_v5_0_restore_queue(struct amdgpu_ring *ring);
@@ -1357,6 +1356,13 @@ static const struct amdgpu_sdma_funcs sdma_v5_0_sdma_funcs = {
.soft_reset_kernel_queue = &sdma_v5_0_soft_reset_engine,
};
+static const struct amdgpu_vm_pte_funcs sdma_v5_0_vm_pte_funcs = {
+ .copy_pte_num_dw = 7,
+ .copy_pte = sdma_v5_0_vm_copy_pte,
+ .write_pte = sdma_v5_0_vm_write_pte,
+ .set_pte_pde = sdma_v5_0_vm_set_pte_pde,
+};
+
static int sdma_v5_0_early_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -1368,7 +1374,7 @@ static int sdma_v5_0_early_init(struct amdgpu_ip_block *ip_block)
sdma_v5_0_set_ring_funcs(adev);
sdma_v5_0_set_buffer_funcs(adev);
- sdma_v5_0_set_vm_pte_funcs(adev);
+ amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v5_0_vm_pte_funcs);
sdma_v5_0_set_irq_funcs(adev);
sdma_v5_0_set_mqd_funcs(adev);
@@ -2073,27 +2079,6 @@ static void sdma_v5_0_set_buffer_funcs(struct amdgpu_device *adev)
}
}
-static const struct amdgpu_vm_pte_funcs sdma_v5_0_vm_pte_funcs = {
- .copy_pte_num_dw = 7,
- .copy_pte = sdma_v5_0_vm_copy_pte,
- .write_pte = sdma_v5_0_vm_write_pte,
- .set_pte_pde = sdma_v5_0_vm_set_pte_pde,
-};
-
-static void sdma_v5_0_set_vm_pte_funcs(struct amdgpu_device *adev)
-{
- unsigned i;
-
- if (adev->vm_manager.vm_pte_funcs == NULL) {
- adev->vm_manager.vm_pte_funcs = &sdma_v5_0_vm_pte_funcs;
- for (i = 0; i < adev->sdma.num_instances; i++) {
- adev->vm_manager.vm_pte_scheds[i] =
- &adev->sdma.instance[i].ring.sched;
- }
- adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
- }
-}
-
const struct amdgpu_ip_block_version sdma_v5_0_ip_block = {
.type = AMD_IP_BLOCK_TYPE_SDMA,
.major = 5,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index 51101b0aa2fa..98beff18cf28 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -111,7 +111,6 @@ static const struct amdgpu_hwip_reg_entry sdma_reg_list_5_2[] = {
static void sdma_v5_2_set_ring_funcs(struct amdgpu_device *adev);
static void sdma_v5_2_set_buffer_funcs(struct amdgpu_device *adev);
-static void sdma_v5_2_set_vm_pte_funcs(struct amdgpu_device *adev);
static void sdma_v5_2_set_irq_funcs(struct amdgpu_device *adev);
static int sdma_v5_2_stop_queue(struct amdgpu_ring *ring);
static int sdma_v5_2_restore_queue(struct amdgpu_ring *ring);
@@ -1248,6 +1247,13 @@ static void sdma_v5_2_ring_emit_reg_write_reg_wait(struct amdgpu_ring *ring,
amdgpu_ring_emit_reg_wait(ring, reg1, mask, mask);
}
+static const struct amdgpu_vm_pte_funcs sdma_v5_2_vm_pte_funcs = {
+ .copy_pte_num_dw = 7,
+ .copy_pte = sdma_v5_2_vm_copy_pte,
+ .write_pte = sdma_v5_2_vm_write_pte,
+ .set_pte_pde = sdma_v5_2_vm_set_pte_pde,
+};
+
static int sdma_v5_2_early_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -1259,7 +1265,7 @@ static int sdma_v5_2_early_init(struct amdgpu_ip_block *ip_block)
sdma_v5_2_set_ring_funcs(adev);
sdma_v5_2_set_buffer_funcs(adev);
- sdma_v5_2_set_vm_pte_funcs(adev);
+ amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v5_2_vm_pte_funcs);
sdma_v5_2_set_irq_funcs(adev);
sdma_v5_2_set_mqd_funcs(adev);
@@ -2084,27 +2090,6 @@ static void sdma_v5_2_set_buffer_funcs(struct amdgpu_device *adev)
}
}
-static const struct amdgpu_vm_pte_funcs sdma_v5_2_vm_pte_funcs = {
- .copy_pte_num_dw = 7,
- .copy_pte = sdma_v5_2_vm_copy_pte,
- .write_pte = sdma_v5_2_vm_write_pte,
- .set_pte_pde = sdma_v5_2_vm_set_pte_pde,
-};
-
-static void sdma_v5_2_set_vm_pte_funcs(struct amdgpu_device *adev)
-{
- unsigned i;
-
- if (adev->vm_manager.vm_pte_funcs == NULL) {
- adev->vm_manager.vm_pte_funcs = &sdma_v5_2_vm_pte_funcs;
- for (i = 0; i < adev->sdma.num_instances; i++) {
- adev->vm_manager.vm_pte_scheds[i] =
- &adev->sdma.instance[i].ring.sched;
- }
- adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
- }
-}
-
const struct amdgpu_ip_block_version sdma_v5_2_ip_block = {
.type = AMD_IP_BLOCK_TYPE_SDMA,
.major = 5,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index e3f725bc2f29..c32331b72ba0 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -119,7 +119,6 @@ static const struct amdgpu_hwip_reg_entry sdma_reg_list_6_0[] = {
static void sdma_v6_0_set_ring_funcs(struct amdgpu_device *adev);
static void sdma_v6_0_set_buffer_funcs(struct amdgpu_device *adev);
-static void sdma_v6_0_set_vm_pte_funcs(struct amdgpu_device *adev);
static void sdma_v6_0_set_irq_funcs(struct amdgpu_device *adev);
static int sdma_v6_0_start(struct amdgpu_device *adev);
@@ -1268,6 +1267,13 @@ static void sdma_v6_0_set_ras_funcs(struct amdgpu_device *adev)
}
}
+static const struct amdgpu_vm_pte_funcs sdma_v6_0_vm_pte_funcs = {
+ .copy_pte_num_dw = 7,
+ .copy_pte = sdma_v6_0_vm_copy_pte,
+ .write_pte = sdma_v6_0_vm_write_pte,
+ .set_pte_pde = sdma_v6_0_vm_set_pte_pde,
+};
+
static int sdma_v6_0_early_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -1296,7 +1302,7 @@ static int sdma_v6_0_early_init(struct amdgpu_ip_block *ip_block)
sdma_v6_0_set_ring_funcs(adev);
sdma_v6_0_set_buffer_funcs(adev);
- sdma_v6_0_set_vm_pte_funcs(adev);
+ amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v6_0_vm_pte_funcs);
sdma_v6_0_set_irq_funcs(adev);
sdma_v6_0_set_mqd_funcs(adev);
sdma_v6_0_set_ras_funcs(adev);
@@ -1889,25 +1895,6 @@ static void sdma_v6_0_set_buffer_funcs(struct amdgpu_device *adev)
adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
}
-static const struct amdgpu_vm_pte_funcs sdma_v6_0_vm_pte_funcs = {
- .copy_pte_num_dw = 7,
- .copy_pte = sdma_v6_0_vm_copy_pte,
- .write_pte = sdma_v6_0_vm_write_pte,
- .set_pte_pde = sdma_v6_0_vm_set_pte_pde,
-};
-
-static void sdma_v6_0_set_vm_pte_funcs(struct amdgpu_device *adev)
-{
- unsigned i;
-
- adev->vm_manager.vm_pte_funcs = &sdma_v6_0_vm_pte_funcs;
- for (i = 0; i < adev->sdma.num_instances; i++) {
- adev->vm_manager.vm_pte_scheds[i] =
- &adev->sdma.instance[i].ring.sched;
- }
- adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
-}
-
const struct amdgpu_ip_block_version sdma_v6_0_ip_block = {
.type = AMD_IP_BLOCK_TYPE_SDMA,
.major = 6,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
index 7fee98d37720..9318d23eb71e 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
@@ -119,7 +119,6 @@ static const struct amdgpu_hwip_reg_entry sdma_reg_list_7_0[] = {
static void sdma_v7_0_set_ring_funcs(struct amdgpu_device *adev);
static void sdma_v7_0_set_buffer_funcs(struct amdgpu_device *adev);
-static void sdma_v7_0_set_vm_pte_funcs(struct amdgpu_device *adev);
static void sdma_v7_0_set_irq_funcs(struct amdgpu_device *adev);
static int sdma_v7_0_start(struct amdgpu_device *adev);
@@ -1253,6 +1252,13 @@ static void sdma_v7_0_ring_emit_reg_write_reg_wait(struct amdgpu_ring *ring,
amdgpu_ring_emit_reg_wait(ring, reg1, mask, mask);
}
+static const struct amdgpu_vm_pte_funcs sdma_v7_0_vm_pte_funcs = {
+ .copy_pte_num_dw = 8,
+ .copy_pte = sdma_v7_0_vm_copy_pte,
+ .write_pte = sdma_v7_0_vm_write_pte,
+ .set_pte_pde = sdma_v7_0_vm_set_pte_pde,
+};
+
static int sdma_v7_0_early_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -1283,7 +1289,7 @@ static int sdma_v7_0_early_init(struct amdgpu_ip_block *ip_block)
sdma_v7_0_set_ring_funcs(adev);
sdma_v7_0_set_buffer_funcs(adev);
- sdma_v7_0_set_vm_pte_funcs(adev);
+ amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v7_0_vm_pte_funcs);
sdma_v7_0_set_irq_funcs(adev);
sdma_v7_0_set_mqd_funcs(adev);
@@ -1831,25 +1837,6 @@ static void sdma_v7_0_set_buffer_funcs(struct amdgpu_device *adev)
adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
}
-static const struct amdgpu_vm_pte_funcs sdma_v7_0_vm_pte_funcs = {
- .copy_pte_num_dw = 8,
- .copy_pte = sdma_v7_0_vm_copy_pte,
- .write_pte = sdma_v7_0_vm_write_pte,
- .set_pte_pde = sdma_v7_0_vm_set_pte_pde,
-};
-
-static void sdma_v7_0_set_vm_pte_funcs(struct amdgpu_device *adev)
-{
- unsigned i;
-
- adev->vm_manager.vm_pte_funcs = &sdma_v7_0_vm_pte_funcs;
- for (i = 0; i < adev->sdma.num_instances; i++) {
- adev->vm_manager.vm_pte_scheds[i] =
- &adev->sdma.instance[i].ring.sched;
- }
- adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
-}
-
const struct amdgpu_ip_block_version sdma_v7_0_ip_block = {
.type = AMD_IP_BLOCK_TYPE_SDMA,
.major = 7,
diff --git a/drivers/gpu/drm/amd/amdgpu/si_dma.c b/drivers/gpu/drm/amd/amdgpu/si_dma.c
index 7f18e4875287..b85df997ed49 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_dma.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_dma.c
@@ -37,7 +37,6 @@ const u32 sdma_offsets[SDMA_MAX_INSTANCE] =
static void si_dma_set_ring_funcs(struct amdgpu_device *adev);
static void si_dma_set_buffer_funcs(struct amdgpu_device *adev);
-static void si_dma_set_vm_pte_funcs(struct amdgpu_device *adev);
static void si_dma_set_irq_funcs(struct amdgpu_device *adev);
/**
@@ -473,6 +472,14 @@ static void si_dma_ring_emit_wreg(struct amdgpu_ring *ring,
amdgpu_ring_write(ring, val);
}
+static const struct amdgpu_vm_pte_funcs si_dma_vm_pte_funcs = {
+ .copy_pte_num_dw = 5,
+ .copy_pte = si_dma_vm_copy_pte,
+
+ .write_pte = si_dma_vm_write_pte,
+ .set_pte_pde = si_dma_vm_set_pte_pde,
+};
+
static int si_dma_early_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -481,7 +488,7 @@ static int si_dma_early_init(struct amdgpu_ip_block *ip_block)
si_dma_set_ring_funcs(adev);
si_dma_set_buffer_funcs(adev);
- si_dma_set_vm_pte_funcs(adev);
+ amdgpu_sdma_set_vm_pte_scheds(adev, &si_dma_vm_pte_funcs);
si_dma_set_irq_funcs(adev);
return 0;
@@ -830,26 +837,6 @@ static void si_dma_set_buffer_funcs(struct amdgpu_device *adev)
adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
}
-static const struct amdgpu_vm_pte_funcs si_dma_vm_pte_funcs = {
- .copy_pte_num_dw = 5,
- .copy_pte = si_dma_vm_copy_pte,
-
- .write_pte = si_dma_vm_write_pte,
- .set_pte_pde = si_dma_vm_set_pte_pde,
-};
-
-static void si_dma_set_vm_pte_funcs(struct amdgpu_device *adev)
-{
- unsigned i;
-
- adev->vm_manager.vm_pte_funcs = &si_dma_vm_pte_funcs;
- for (i = 0; i < adev->sdma.num_instances; i++) {
- adev->vm_manager.vm_pte_scheds[i] =
- &adev->sdma.instance[i].ring.sched;
- }
- adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
-}
-
const struct amdgpu_ip_block_version si_dma_ip_block =
{
.type = AMD_IP_BLOCK_TYPE_SDMA,
--
2.43.0
* [PATCH v3 18/28] drm/amdgpu: move sched status check inside amdgpu_ttm_set_buffer_funcs_status
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (16 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 17/28] drm/amdgpu: introduce amdgpu_sdma_set_vm_pte_scheds Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 15:08 ` Christian König
2025-11-21 10:12 ` [PATCH v3 19/28] drm/ttm: rework pipelined eviction fence handling Pierre-Eric Pelloux-Prayer
` (9 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
This avoids duplicating code and makes it possible to output a warning when the scheduler isn't ready.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 ++++---------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 ++++++
2 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 54f7c81f287b..7167db54d722 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3309,9 +3309,7 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
if (r)
goto init_failed;
- if (adev->mman.buffer_funcs_ring &&
- adev->mman.buffer_funcs_ring->sched.ready)
- amdgpu_ttm_set_buffer_funcs_status(adev, true);
+ amdgpu_ttm_set_buffer_funcs_status(adev, true);
/* Don't init kfd if whole hive need to be reset during init */
if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
@@ -4191,8 +4189,7 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
r = amdgpu_device_ip_resume_phase2(adev);
- if (adev->mman.buffer_funcs_ring->sched.ready)
- amdgpu_ttm_set_buffer_funcs_status(adev, true);
+ amdgpu_ttm_set_buffer_funcs_status(adev, true);
if (r)
return r;
@@ -5321,8 +5318,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool notify_clients)
return 0;
unwind_evict:
- if (adev->mman.buffer_funcs_ring->sched.ready)
- amdgpu_ttm_set_buffer_funcs_status(adev, true);
+ amdgpu_ttm_set_buffer_funcs_status(adev, true);
amdgpu_fence_driver_hw_init(adev);
unwind_userq:
@@ -6050,8 +6046,7 @@ int amdgpu_device_reinit_after_reset(struct amdgpu_reset_context *reset_context)
if (r)
goto out;
- if (tmp_adev->mman.buffer_funcs_ring->sched.ready)
- amdgpu_ttm_set_buffer_funcs_status(tmp_adev, true);
+ amdgpu_ttm_set_buffer_funcs_status(tmp_adev, true);
r = amdgpu_device_ip_resume_phase3(tmp_adev);
if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 489880b2fb8e..9024dde0c5a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2233,6 +2233,12 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
return reserved_windows;
+ if ((!adev->mman.buffer_funcs_ring || !adev->mman.buffer_funcs_ring->sched.ready) &&
+ enable) {
+ dev_warn(adev->dev, "Not enabling DMA transfers for in-kernel use\n");
+ return 0;
+ }
+
if (enable) {
struct amdgpu_ring *ring;
struct drm_gpu_scheduler *sched;
--
2.43.0
* [PATCH v3 19/28] drm/ttm: rework pipelined eviction fence handling
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (17 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 18/28] drm/amdgpu: move sched status check inside amdgpu_ttm_set_buffer_funcs_status Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 15:12 ` Christian König
2025-11-21 10:12 ` [PATCH v3 20/28] drm/amdgpu: allocate multiple clear entities Pierre-Eric Pelloux-Prayer
` (8 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Christian Koenig, Huang Rui, Matthew Auld, Matthew Brost,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Sumit Semwal
Cc: Pierre-Eric Pelloux-Prayer, dri-devel, linux-kernel, linux-media,
linaro-mm-sig
Until now, TTM stored a single pipelined eviction fence, which meant
drivers had to use a single entity for these evictions.
To lift this requirement, this commit allows up to 8 entities to
be used.
Ideally a dma_resv object would have been used as a container for
the eviction fences, but the locking rules make this complex:
all dma_resv objects share the same ww_class, which means "Attempting to
lock more mutexes after ww_acquire_done." is an error.
One alternative considered was to introduce a 2nd ww_class for
specific resv objects holding a single "transient" lock (= the resv lock
would only be held for a short period, without taking any other
locks).
The other option, implemented here, is to statically reserve a fence array
and extend the existing code to deal with N fences instead of 1.
The driver is still responsible for reserving the correct number
of fence slots.
---
v2:
- simplified code
- dropped n_fences
- name changes
v3: use ttm_resource_manager_cleanup
---
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
.../gpu/drm/ttm/tests/ttm_bo_validate_test.c | 11 +++--
drivers/gpu/drm/ttm/tests/ttm_resource_test.c | 5 +-
drivers/gpu/drm/ttm/ttm_bo.c | 47 ++++++++++---------
drivers/gpu/drm/ttm/ttm_bo_util.c | 38 ++++++++++++---
drivers/gpu/drm/ttm/ttm_resource.c | 31 +++++++-----
include/drm/ttm/ttm_resource.h | 29 ++++++++----
6 files changed, 104 insertions(+), 57 deletions(-)
diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
index 3148f5d3dbd6..8f71906c4238 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
@@ -651,7 +651,7 @@ static void ttm_bo_validate_move_fence_signaled(struct kunit *test)
int err;
man = ttm_manager_type(priv->ttm_dev, mem_type);
- man->move = dma_fence_get_stub();
+ man->eviction_fences[0] = dma_fence_get_stub();
bo = ttm_bo_kunit_init(test, test->priv, size, NULL);
bo->type = bo_type;
@@ -668,7 +668,7 @@ static void ttm_bo_validate_move_fence_signaled(struct kunit *test)
KUNIT_EXPECT_EQ(test, ctx.bytes_moved, size);
ttm_bo_put(bo);
- dma_fence_put(man->move);
+ dma_fence_put(man->eviction_fences[0]);
}
static const struct ttm_bo_validate_test_case ttm_bo_validate_wait_cases[] = {
@@ -732,9 +732,9 @@ static void ttm_bo_validate_move_fence_not_signaled(struct kunit *test)
spin_lock_init(&fence_lock);
man = ttm_manager_type(priv->ttm_dev, fst_mem);
- man->move = alloc_mock_fence(test);
+ man->eviction_fences[0] = alloc_mock_fence(test);
- task = kthread_create(threaded_fence_signal, man->move, "move-fence-signal");
+ task = kthread_create(threaded_fence_signal, man->eviction_fences[0], "move-fence-signal");
if (IS_ERR(task))
KUNIT_FAIL(test, "Couldn't create move fence signal task\n");
@@ -742,7 +742,8 @@ static void ttm_bo_validate_move_fence_not_signaled(struct kunit *test)
err = ttm_bo_validate(bo, placement_val, &ctx_val);
dma_resv_unlock(bo->base.resv);
- dma_fence_wait_timeout(man->move, false, MAX_SCHEDULE_TIMEOUT);
+ dma_fence_wait_timeout(man->eviction_fences[0], false, MAX_SCHEDULE_TIMEOUT);
+ man->eviction_fences[0] = NULL;
KUNIT_EXPECT_EQ(test, err, 0);
KUNIT_EXPECT_EQ(test, ctx_val.bytes_moved, size);
diff --git a/drivers/gpu/drm/ttm/tests/ttm_resource_test.c b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
index e6ea2bd01f07..c0e4e35e0442 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
@@ -207,6 +207,7 @@ static void ttm_resource_manager_init_basic(struct kunit *test)
struct ttm_resource_test_priv *priv = test->priv;
struct ttm_resource_manager *man;
size_t size = SZ_16K;
+ int i;
man = kunit_kzalloc(test, sizeof(*man), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, man);
@@ -216,8 +217,8 @@ static void ttm_resource_manager_init_basic(struct kunit *test)
KUNIT_ASSERT_PTR_EQ(test, man->bdev, priv->devs->ttm_dev);
KUNIT_ASSERT_EQ(test, man->size, size);
KUNIT_ASSERT_EQ(test, man->usage, 0);
- KUNIT_ASSERT_NULL(test, man->move);
- KUNIT_ASSERT_NOT_NULL(test, &man->move_lock);
+ for (i = 0; i < TTM_NUM_MOVE_FENCES; i++)
+ KUNIT_ASSERT_NULL(test, man->eviction_fences[i]);
for (int i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
KUNIT_ASSERT_TRUE(test, list_empty(&man->lru[i]));
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index f4d9e68b21e7..0b3732ed6f6c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -658,34 +658,35 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo)
EXPORT_SYMBOL(ttm_bo_unpin);
/*
- * Add the last move fence to the BO as kernel dependency and reserve a new
- * fence slot.
- * Add the pipelined eviction fences to the BO as a kernel dependency and reserve new
+ * fence slots.
*/
-static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
- struct ttm_resource_manager *man,
- bool no_wait_gpu)
+static int ttm_bo_add_pipelined_eviction_fences(struct ttm_buffer_object *bo,
+ struct ttm_resource_manager *man,
+ bool no_wait_gpu)
{
struct dma_fence *fence;
- int ret;
+ int i;
- spin_lock(&man->move_lock);
- fence = dma_fence_get(man->move);
- spin_unlock(&man->move_lock);
+ spin_lock(&man->eviction_lock);
+ for (i = 0; i < TTM_NUM_MOVE_FENCES; i++) {
+ fence = man->eviction_fences[i];
+ if (!fence)
+ continue;
- if (!fence)
- return 0;
-
- if (no_wait_gpu) {
- ret = dma_fence_is_signaled(fence) ? 0 : -EBUSY;
- dma_fence_put(fence);
- return ret;
+ if (no_wait_gpu) {
+ if (!dma_fence_is_signaled(fence)) {
+ spin_unlock(&man->eviction_lock);
+ return -EBUSY;
+ }
+ } else {
+ dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_KERNEL);
+ }
}
+ spin_unlock(&man->eviction_lock);
- dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_KERNEL);
-
- ret = dma_resv_reserve_fences(bo->base.resv, 1);
- dma_fence_put(fence);
- return ret;
+ /* TODO: this call should be removed. */
+ return dma_resv_reserve_fences(bo->base.resv, 1);
}
/**
@@ -718,7 +719,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
int i, ret;
ticket = dma_resv_locking_ctx(bo->base.resv);
- ret = dma_resv_reserve_fences(bo->base.resv, 1);
+ ret = dma_resv_reserve_fences(bo->base.resv, TTM_NUM_MOVE_FENCES);
if (unlikely(ret))
return ret;
@@ -757,7 +758,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
return ret;
}
- ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
+ ret = ttm_bo_add_pipelined_eviction_fences(bo, man, ctx->no_wait_gpu);
if (unlikely(ret)) {
ttm_resource_free(bo, res);
if (ret == -EBUSY)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index acbbca9d5c92..2ff35d55e462 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -258,7 +258,7 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
ret = dma_resv_trylock(&fbo->base.base._resv);
WARN_ON(!ret);
- ret = dma_resv_reserve_fences(&fbo->base.base._resv, 1);
+ ret = dma_resv_reserve_fences(&fbo->base.base._resv, TTM_NUM_MOVE_FENCES);
if (ret) {
dma_resv_unlock(&fbo->base.base._resv);
kfree(fbo);
@@ -646,20 +646,44 @@ static void ttm_bo_move_pipeline_evict(struct ttm_buffer_object *bo,
{
struct ttm_device *bdev = bo->bdev;
struct ttm_resource_manager *from;
+ struct dma_fence *tmp;
+ int i;
from = ttm_manager_type(bdev, bo->resource->mem_type);
/**
* BO doesn't have a TTM we need to bind/unbind. Just remember
- * this eviction and free up the allocation
+ * this eviction and free up the allocation.
+ * The fence will be saved in the first free slot or in the slot
+ * already used to store a fence from the same context. Since
+ * drivers can't use more than TTM_NUM_MOVE_FENCES contexts for
+ * evictions we should always find a slot to use.
*/
- spin_lock(&from->move_lock);
- if (!from->move || dma_fence_is_later(fence, from->move)) {
- dma_fence_put(from->move);
- from->move = dma_fence_get(fence);
+ spin_lock(&from->eviction_lock);
+ for (i = 0; i < TTM_NUM_MOVE_FENCES; i++) {
+ tmp = from->eviction_fences[i];
+ if (!tmp)
+ break;
+ if (fence->context != tmp->context)
+ continue;
+ if (dma_fence_is_later(fence, tmp)) {
+ dma_fence_put(tmp);
+ break;
+ }
+ goto unlock;
+ }
+ if (i < TTM_NUM_MOVE_FENCES) {
+ from->eviction_fences[i] = dma_fence_get(fence);
+ } else {
+ WARN(1, "not enough fence slots for all fence contexts");
+ spin_unlock(&from->eviction_lock);
+ dma_fence_wait(fence, false);
+ goto end;
}
- spin_unlock(&from->move_lock);
+unlock:
+ spin_unlock(&from->eviction_lock);
+end:
ttm_resource_free(bo, &bo->resource);
}
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index e2c82ad07eb4..62c34cafa387 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -523,14 +523,15 @@ void ttm_resource_manager_init(struct ttm_resource_manager *man,
{
unsigned i;
- spin_lock_init(&man->move_lock);
man->bdev = bdev;
man->size = size;
man->usage = 0;
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
INIT_LIST_HEAD(&man->lru[i]);
- man->move = NULL;
+ spin_lock_init(&man->eviction_lock);
+ for (i = 0; i < TTM_NUM_MOVE_FENCES; i++)
+ man->eviction_fences[i] = NULL;
}
EXPORT_SYMBOL(ttm_resource_manager_init);
@@ -551,7 +552,7 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
.no_wait_gpu = false,
};
struct dma_fence *fence;
- int ret;
+ int ret, i;
do {
ret = ttm_bo_evict_first(bdev, man, &ctx);
@@ -561,18 +562,24 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
if (ret && ret != -ENOENT)
return ret;
- spin_lock(&man->move_lock);
- fence = dma_fence_get(man->move);
- spin_unlock(&man->move_lock);
+ ret = 0;
- if (fence) {
- ret = dma_fence_wait(fence, false);
- dma_fence_put(fence);
- if (ret)
- return ret;
+ spin_lock(&man->eviction_lock);
+ for (i = 0; i < TTM_NUM_MOVE_FENCES; i++) {
+ fence = man->eviction_fences[i];
+ if (fence && !dma_fence_is_signaled(fence)) {
+ dma_fence_get(fence);
+ spin_unlock(&man->eviction_lock);
+ ret = dma_fence_wait(fence, false);
+ dma_fence_put(fence);
+ if (ret)
+ return ret;
+ spin_lock(&man->eviction_lock);
+ }
}
+ spin_unlock(&man->eviction_lock);
- return 0;
+ return ret;
}
EXPORT_SYMBOL(ttm_resource_manager_evict_all);
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index f49daa504c36..50e6added509 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -50,6 +50,15 @@ struct io_mapping;
struct sg_table;
struct scatterlist;
+/**
+ * define TTM_NUM_MOVE_FENCES - How many entities can be used for evictions
+ *
+ * Pipelined evictions can be spread on multiple entities. This
+ * is the max number of entities that can be used by the driver
+ * for that purpose.
+ */
+#define TTM_NUM_MOVE_FENCES 8
+
/**
* enum ttm_lru_item_type - enumerate ttm_lru_item subclasses
*/
@@ -180,8 +189,8 @@ struct ttm_resource_manager_func {
* @size: Size of the managed region.
* @bdev: ttm device this manager belongs to
* @func: structure pointer implementing the range manager. See above
- * @move_lock: lock for move fence
- * @move: The fence of the last pipelined move operation.
+ * @eviction_lock: lock for eviction fences
+ * @eviction_fences: The fences of the last pipelined move operation.
* @lru: The lru list for this memory type.
*
* This structure is used to identify and manage memory types for a device.
@@ -195,12 +204,12 @@ struct ttm_resource_manager {
struct ttm_device *bdev;
uint64_t size;
const struct ttm_resource_manager_func *func;
- spinlock_t move_lock;
- /*
- * Protected by @move_lock.
+ /* This is very similar to a dma_resv object, but locking rules make
+ * it difficult to use one in this context.
*/
- struct dma_fence *move;
+ spinlock_t eviction_lock;
+ struct dma_fence *eviction_fences[TTM_NUM_MOVE_FENCES];
/*
* Protected by the bdev->lru_lock.
@@ -421,8 +430,12 @@ static inline bool ttm_resource_manager_used(struct ttm_resource_manager *man)
static inline void
ttm_resource_manager_cleanup(struct ttm_resource_manager *man)
{
- dma_fence_put(man->move);
- man->move = NULL;
+ int i;
+
+ for (i = 0; i < TTM_NUM_MOVE_FENCES; i++) {
+ dma_fence_put(man->eviction_fences[i]);
+ man->eviction_fences[i] = NULL;
+ }
}
void ttm_lru_bulk_move_init(struct ttm_lru_bulk_move *bulk);
--
2.43.0
* [PATCH v3 20/28] drm/amdgpu: allocate multiple clear entities
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (18 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 19/28] drm/ttm: rework pipelined eviction fence handling Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 15:34 ` Christian König
2025-11-21 10:12 ` [PATCH v3 21/28] drm/amdgpu: allocate multiple move entities Pierre-Eric Pelloux-Prayer
` (7 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
No functional change for now, as we always use entity 0.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 51 ++++++++++++++--------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 +-
3 files changed, 35 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2ee48f76483d..56663e82efef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1323,7 +1323,7 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
goto out;
r = amdgpu_fill_buffer(adev,
- &adev->mman.clear_entity, abo, 0, &bo->base._resv,
+ &adev->mman.clear_entities[0], abo, 0, &bo->base._resv,
&fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
if (WARN_ON(r))
goto out;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 9024dde0c5a7..d7f041e43eca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2224,10 +2224,12 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
{
struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
u32 used_windows, reserved_windows;
+ u32 num_clear_entities;
uint64_t size;
- int r;
+ int r, i, j;
- reserved_windows = 3;
+ num_clear_entities = adev->sdma.num_instances;
+ reserved_windows = 2 + num_clear_entities;
if (!adev->mman.initialized || amdgpu_in_reset(adev) ||
adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
@@ -2250,21 +2252,11 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
1, NULL);
if (r) {
dev_err(adev->dev,
- "Failed setting up TTM BO move entity (%d)\n",
+ "Failed setting up TTM BO eviction entity (%d)\n",
r);
return 0;
}
- r = drm_sched_entity_init(&adev->mman.clear_entity.base,
- DRM_SCHED_PRIORITY_NORMAL, &sched,
- 1, NULL);
- if (r) {
- dev_err(adev->dev,
- "Failed setting up TTM BO clear entity (%d)\n",
- r);
- goto error_free_entity;
- }
-
r = drm_sched_entity_init(&adev->mman.move_entity.base,
DRM_SCHED_PRIORITY_NORMAL, &sched,
1, NULL);
@@ -2272,26 +2264,48 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
dev_err(adev->dev,
"Failed setting up TTM BO move entity (%d)\n",
r);
- drm_sched_entity_destroy(&adev->mman.clear_entity.base);
goto error_free_entity;
}
+ adev->mman.num_clear_entities = num_clear_entities;
+ adev->mman.clear_entities = kcalloc(num_clear_entities,
+ sizeof(struct amdgpu_ttm_buffer_entity),
+ GFP_KERNEL);
+ if (!adev->mman.clear_entities)
+ goto error_free_entity;
+
+ for (i = 0; i < num_clear_entities; i++) {
+ r = drm_sched_entity_init(&adev->mman.clear_entities[i].base,
+ DRM_SCHED_PRIORITY_NORMAL, &sched,
+ 1, NULL);
+ if (r) {
+ for (j = 0; j < i; j++)
+ drm_sched_entity_destroy(
+ &adev->mman.clear_entities[j].base);
+ kfree(adev->mman.clear_entities);
+ goto error_free_entity;
+ }
+ }
+
/* Statically assign GART windows to each entity. */
used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.default_entity,
0, false, false);
used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.move_entity,
used_windows, true, true);
- used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.clear_entity,
- used_windows, false, true);
+ for (i = 0; i < num_clear_entities; i++)
+ used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.clear_entities[i],
+ used_windows, false, true);
WARN_ON(used_windows != reserved_windows);
} else {
drm_sched_entity_destroy(&adev->mman.default_entity.base);
- drm_sched_entity_destroy(&adev->mman.clear_entity.base);
drm_sched_entity_destroy(&adev->mman.move_entity.base);
+ for (i = 0; i < num_clear_entities; i++)
+ drm_sched_entity_destroy(&adev->mman.clear_entities[i].base);
/* Drop all the old fences since re-creating the scheduler entities
* will allocate new contexts.
*/
ttm_resource_manager_cleanup(man);
+ kfree(adev->mman.clear_entities);
}
/* this just adjusts TTM size idea, which sets lpfn to the correct value */
@@ -2456,8 +2470,7 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
if (!fence)
return -EINVAL;
-
- entity = &adev->mman.clear_entity;
+ entity = &adev->mman.clear_entities[0];
*fence = dma_fence_get_stub();
amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &cursor);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 0b3b03f43bab..250ef54a5550 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -72,8 +72,9 @@ struct amdgpu_mman {
struct mutex gtt_window_lock;
struct amdgpu_ttm_buffer_entity default_entity; /* has no gart windows */
- struct amdgpu_ttm_buffer_entity clear_entity;
struct amdgpu_ttm_buffer_entity move_entity;
+ struct amdgpu_ttm_buffer_entity *clear_entities;
+ u32 num_clear_entities;
struct amdgpu_vram_mgr vram_mgr;
struct amdgpu_gtt_mgr gtt_mgr;
--
2.43.0
* [PATCH v3 21/28] drm/amdgpu: allocate multiple move entities
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (19 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 20/28] drm/amdgpu: allocate multiple clear entities Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 22/28] drm/amdgpu: round robin through clear_entities in amdgpu_fill_buffer Pierre-Eric Pelloux-Prayer
` (6 subsequent siblings)
27 siblings, 0 replies; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Felix Kuehling
Cc: Pierre-Eric Pelloux-Prayer, Felix Kuehling, amd-gfx, dri-devel,
linux-kernel
No functional change for now, as we always use entity 0.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 45 +++++++++++++++---------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 +-
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
3 files changed, 31 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index d7f041e43eca..438e8a3b7a06 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -439,7 +439,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
dst.offset = 0;
r = amdgpu_ttm_copy_mem_to_mem(adev,
- &adev->mman.move_entity,
+ &adev->mman.move_entities[0],
&src, &dst,
new_mem->size,
amdgpu_bo_encrypted(abo),
@@ -452,7 +452,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
struct dma_fence *wipe_fence = NULL;
- r = amdgpu_fill_buffer(adev, &adev->mman.move_entity,
+ r = amdgpu_fill_buffer(adev, &adev->mman.move_entities[0],
abo, 0, NULL, &wipe_fence,
AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
if (r) {
@@ -2224,12 +2224,13 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
{
struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
u32 used_windows, reserved_windows;
- u32 num_clear_entities;
+ u32 num_clear_entities, num_move_entities;
uint64_t size;
int r, i, j;
num_clear_entities = adev->sdma.num_instances;
- reserved_windows = 2 + num_clear_entities;
+ num_move_entities = MIN(adev->sdma.num_instances, TTM_NUM_MOVE_FENCES);
+ reserved_windows = 2 * num_move_entities + num_clear_entities;
if (!adev->mman.initialized || amdgpu_in_reset(adev) ||
adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
@@ -2251,20 +2252,25 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
DRM_SCHED_PRIORITY_KERNEL, &sched,
1, NULL);
if (r) {
- dev_err(adev->dev,
- "Failed setting up TTM BO eviction entity (%d)\n",
+ dev_err(adev->dev, "Failed setting up entity (%d)\n",
r);
return 0;
}
- r = drm_sched_entity_init(&adev->mman.move_entity.base,
- DRM_SCHED_PRIORITY_NORMAL, &sched,
- 1, NULL);
- if (r) {
- dev_err(adev->dev,
- "Failed setting up TTM BO move entity (%d)\n",
- r);
- goto error_free_entity;
+ adev->mman.num_move_entities = num_move_entities;
+ for (i = 0; i < num_move_entities; i++) {
+ r = drm_sched_entity_init(&adev->mman.move_entities[i].base,
+ DRM_SCHED_PRIORITY_NORMAL, &sched,
+ 1, NULL);
+ if (r) {
+ dev_err(adev->dev,
+ "Failed setting up TTM BO move entities (%d)\n",
+ r);
+ for (j = 0; j < i; j++)
+ drm_sched_entity_destroy(
+ &adev->mman.move_entities[j].base);
+ goto error_free_entity;
+ }
}
adev->mman.num_clear_entities = num_clear_entities;
@@ -2279,6 +2285,9 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
DRM_SCHED_PRIORITY_NORMAL, &sched,
1, NULL);
if (r) {
+ for (j = 0; j < num_move_entities; j++)
+ drm_sched_entity_destroy(
+ &adev->mman.move_entities[j].base);
for (j = 0; j < i; j++)
drm_sched_entity_destroy(
&adev->mman.clear_entities[j].base);
@@ -2290,15 +2299,17 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
/* Statically assign GART windows to each entity. */
used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.default_entity,
0, false, false);
- used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.move_entity,
- used_windows, true, true);
+ for (i = 0; i < num_move_entities; i++)
+ used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.move_entities[i],
+ used_windows, true, true);
for (i = 0; i < num_clear_entities; i++)
used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.clear_entities[i],
used_windows, false, true);
WARN_ON(used_windows != reserved_windows);
} else {
drm_sched_entity_destroy(&adev->mman.default_entity.base);
- drm_sched_entity_destroy(&adev->mman.move_entity.base);
+ for (i = 0; i < num_move_entities; i++)
+ drm_sched_entity_destroy(&adev->mman.move_entities[i].base);
for (i = 0; i < num_clear_entities; i++)
drm_sched_entity_destroy(&adev->mman.clear_entities[i].base);
/* Drop all the old fences since re-creating the scheduler entities
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 250ef54a5550..eabc5a1549e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -72,9 +72,10 @@ struct amdgpu_mman {
struct mutex gtt_window_lock;
struct amdgpu_ttm_buffer_entity default_entity; /* has no gart windows */
- struct amdgpu_ttm_buffer_entity move_entity;
struct amdgpu_ttm_buffer_entity *clear_entities;
u32 num_clear_entities;
+ struct amdgpu_ttm_buffer_entity move_entities[TTM_NUM_MOVE_FENCES];
+ u32 num_move_entities;
struct amdgpu_vram_mgr vram_mgr;
struct amdgpu_gtt_mgr gtt_mgr;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 0cc1d2b35026..5dd65f05a1e0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -135,7 +135,7 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
u64 size;
int r;
- entity = &adev->mman.move_entity;
+ entity = &adev->mman.move_entities[0];
mutex_lock(&entity->lock);
--
2.43.0
* [PATCH v3 22/28] drm/amdgpu: round robin through clear_entities in amdgpu_fill_buffer
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (20 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 21/28] drm/amdgpu: allocate multiple move entities Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 15:36 ` Christian König
2025-11-21 10:12 ` [PATCH v3 23/28] drm/amdgpu: use TTM_NUM_MOVE_FENCES when reserving fences Pierre-Eric Pelloux-Prayer
` (5 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
This makes clears of different BOs run in parallel. Partial jobs
clearing a single BO still execute sequentially.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 ++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 ++
3 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 56663e82efef..7d8d70135cc2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1322,8 +1322,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
if (r)
goto out;
- r = amdgpu_fill_buffer(adev,
- &adev->mman.clear_entities[0], abo, 0, &bo->base._resv,
+ r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
+ abo, 0, &bo->base._resv,
&fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
if (WARN_ON(r))
goto out;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 438e8a3b7a06..8d70bea66dd0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2277,6 +2277,7 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
adev->mman.clear_entities = kcalloc(num_clear_entities,
sizeof(struct amdgpu_ttm_buffer_entity),
GFP_KERNEL);
+ atomic_set(&adev->mman.next_clear_entity, 0);
if (!adev->mman.clear_entities)
goto error_free_entity;
@@ -2576,6 +2577,17 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
return r;
}
+struct amdgpu_ttm_buffer_entity *
+amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev)
+{
+ struct amdgpu_mman *mman = &adev->mman;
+ int i;
+
+ i = atomic_inc_return(&mman->next_clear_entity) %
+ mman->num_clear_entities;
+ return &mman->clear_entities[i];
+}
+
/**
* amdgpu_ttm_evict_resources - evict memory buffers
* @adev: amdgpu device object
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index eabc5a1549e9..887531126d9d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -73,6 +73,7 @@ struct amdgpu_mman {
struct amdgpu_ttm_buffer_entity default_entity; /* has no gart windows */
struct amdgpu_ttm_buffer_entity *clear_entities;
+ atomic_t next_clear_entity;
u32 num_clear_entities;
struct amdgpu_ttm_buffer_entity move_entities[TTM_NUM_MOVE_FENCES];
u32 num_move_entities;
@@ -189,6 +190,7 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
struct dma_resv *resv,
struct dma_fence **f,
u64 k_job_id);
+struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev);
int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
--
2.43.0
* [PATCH v3 23/28] drm/amdgpu: use TTM_NUM_MOVE_FENCES when reserving fences
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (21 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 22/28] drm/amdgpu: round robin through clear_entities in amdgpu_fill_buffer Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 15:41 ` Christian König
2025-11-21 10:12 ` [PATCH v3 24/28] drm/amdgpu: use multiple entities in amdgpu_move_blit Pierre-Eric Pelloux-Prayer
` (4 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Felix Kuehling, Harry Wentland, Leo Li, Rodrigo Siqueira
Cc: Pierre-Eric Pelloux-Prayer, Felix Kuehling, amd-gfx, dri-devel,
linux-kernel
Use TTM_NUM_MOVE_FENCES as an upper bound on how many fences
TTM might need for moves/evictions.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
v2: removed drm_err calls
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 6 ++----
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 ++-
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 +--
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 6 ++----
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c | 6 ++----
7 files changed, 12 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d591dce0f3b3..5215238f8fc9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -916,9 +916,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
goto out_free_user_pages;
amdgpu_bo_list_for_each_entry(e, p->bo_list) {
- /* One fence for TTM and one for each CS job */
r = drm_exec_prepare_obj(&p->exec, &e->bo->tbo.base,
- 1 + p->gang_size);
+ TTM_NUM_MOVE_FENCES + p->gang_size);
drm_exec_retry_on_contention(&p->exec);
if (unlikely(r))
goto out_free_user_pages;
@@ -928,7 +927,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
if (p->uf_bo) {
r = drm_exec_prepare_obj(&p->exec, &p->uf_bo->tbo.base,
- 1 + p->gang_size);
+ TTM_NUM_MOVE_FENCES + p->gang_size);
drm_exec_retry_on_contention(&p->exec);
if (unlikely(r))
goto out_free_user_pages;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index f2505ae5fd65..41fd84a19d66 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -354,7 +354,7 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj,
drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
drm_exec_until_all_locked(&exec) {
- r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 1);
+ r = drm_exec_prepare_obj(&exec, &bo->tbo.base, TTM_NUM_MOVE_FENCES);
drm_exec_retry_on_contention(&exec);
if (unlikely(r))
goto out_unlock;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index 79bad9cbe2ab..b92561eea3da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -326,11 +326,9 @@ static int amdgpu_vkms_prepare_fb(struct drm_plane *plane,
return r;
}
- r = dma_resv_reserve_fences(rbo->tbo.base.resv, 1);
- if (r) {
- dev_err(adev->dev, "allocating fence slot failed (%d)\n", r);
+ r = dma_resv_reserve_fences(rbo->tbo.base.resv, TTM_NUM_MOVE_FENCES);
+ if (r)
goto error_unlock;
- }
if (plane->type != DRM_PLANE_TYPE_CURSOR)
domain = amdgpu_display_supported_domains(adev, rbo->flags);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 5061d5b0f875..62f37a9ca966 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2656,7 +2656,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
}
amdgpu_vm_bo_base_init(&vm->root, vm, root_bo);
- r = dma_resv_reserve_fences(root_bo->tbo.base.resv, 1);
+ r = dma_resv_reserve_fences(root_bo->tbo.base.resv,
+ TTM_NUM_MOVE_FENCES);
if (r)
goto error_free_root;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 97c2270f278f..0f8d85ee97fc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -627,9 +627,8 @@ svm_range_vram_node_new(struct kfd_node *node, struct svm_range *prange,
}
}
- r = dma_resv_reserve_fences(bo->tbo.base.resv, 1);
+ r = dma_resv_reserve_fences(bo->tbo.base.resv, TTM_NUM_MOVE_FENCES);
if (r) {
- pr_debug("failed %d to reserve bo\n", r);
amdgpu_bo_unreserve(bo);
goto reserve_bo_failed;
}
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 56cb866ac6f8..ceb55dd183ed 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -952,11 +952,9 @@ static int amdgpu_dm_plane_helper_prepare_fb(struct drm_plane *plane,
return r;
}
- r = dma_resv_reserve_fences(rbo->tbo.base.resv, 1);
- if (r) {
- drm_err(adev_to_drm(adev), "reserving fence slot failed (%d)\n", r);
+ r = dma_resv_reserve_fences(rbo->tbo.base.resv, TTM_NUM_MOVE_FENCES);
+ if (r)
goto error_unlock;
- }
if (plane->type != DRM_PLANE_TYPE_CURSOR)
domain = amdgpu_display_supported_domains(adev, rbo->flags);
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c
index d9527c05fc87..110f0173eee6 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c
@@ -106,11 +106,9 @@ static int amdgpu_dm_wb_prepare_job(struct drm_writeback_connector *wb_connector
return r;
}
- r = dma_resv_reserve_fences(rbo->tbo.base.resv, 1);
- if (r) {
- drm_err(adev_to_drm(adev), "reserving fence slot failed (%d)\n", r);
+ r = dma_resv_reserve_fences(rbo->tbo.base.resv, TTM_NUM_MOVE_FENCES);
+ if (r)
goto error_unlock;
- }
domain = amdgpu_display_supported_domains(adev, rbo->flags);
--
2.43.0
* [PATCH v3 24/28] drm/amdgpu: use multiple entities in amdgpu_move_blit
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (22 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 23/28] drm/amdgpu: use TTM_NUM_MOVE_FENCES when reserving fences Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 25/28] drm/amdgpu: pass all the sdma scheds to amdgpu_mman Pierre-Eric Pelloux-Prayer
` (3 subsequent siblings)
27 siblings, 0 replies; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
Thanks to "drm/ttm: rework pipelined eviction fence handling", TTM can
now correctly handle moves and evictions executed from different
contexts.
Create several entities and use them in a round-robin fashion.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 10 ++++++++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 1 +
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8d70bea66dd0..575a4d4a1747 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -427,6 +427,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
{
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
+ struct amdgpu_ttm_buffer_entity *entity;
struct amdgpu_copy_mem src, dst;
struct dma_fence *fence = NULL;
int r;
@@ -438,8 +439,12 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
src.offset = 0;
dst.offset = 0;
+ int e = atomic_inc_return(&adev->mman.next_move_entity) %
+ adev->mman.num_move_entities;
+ entity = &adev->mman.move_entities[e];
+
r = amdgpu_ttm_copy_mem_to_mem(adev,
- &adev->mman.move_entities[0],
+ entity,
&src, &dst,
new_mem->size,
amdgpu_bo_encrypted(abo),
@@ -452,7 +457,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
struct dma_fence *wipe_fence = NULL;
- r = amdgpu_fill_buffer(adev, &adev->mman.move_entities[0],
+ r = amdgpu_fill_buffer(adev, entity,
abo, 0, NULL, &wipe_fence,
AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
if (r) {
@@ -2258,6 +2263,7 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
}
adev->mman.num_move_entities = num_move_entities;
+ atomic_set(&adev->mman.next_move_entity, 0);
for (i = 0; i < num_move_entities; i++) {
r = drm_sched_entity_init(&adev->mman.move_entities[i].base,
DRM_SCHED_PRIORITY_NORMAL, &sched,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 887531126d9d..0785a2c594f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -76,6 +76,7 @@ struct amdgpu_mman {
atomic_t next_clear_entity;
u32 num_clear_entities;
struct amdgpu_ttm_buffer_entity move_entities[TTM_NUM_MOVE_FENCES];
+ atomic_t next_move_entity;
u32 num_move_entities;
struct amdgpu_vram_mgr vram_mgr;
--
2.43.0
* [PATCH v3 25/28] drm/amdgpu: pass all the sdma scheds to amdgpu_mman
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (23 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 24/28] drm/amdgpu: use multiple entities in amdgpu_move_blit Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 15:52 ` Christian König
2025-11-21 10:12 ` [PATCH v3 26/28] drm/amdgpu: give ttm entities access to all the sdma scheds Pierre-Eric Pelloux-Prayer
` (2 subsequent siblings)
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Felix Kuehling
Cc: Pierre-Eric Pelloux-Prayer, Felix Kuehling, amd-gfx, dri-devel,
linux-kernel
This will allow using all SDMA instances for clear/fill buffer
operations.
Since drm_sched_entity_init requires a scheduler array, we store
schedulers rather than rings. The few places that need access to a
ring can recover it from the sched using container_of.
Since the code is the same for all SDMA versions, add a new helper,
amdgpu_sdma_set_buffer_funcs_scheds, which sets buffer_funcs_scheds
based on the number of SDMA instances.
Note: the new sched array is identical to the amdgpu_vm_manager one;
the two could be merged.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 4 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 31 +++++++++++++++++-----
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 ++-
drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 3 +--
drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 3 +--
drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 3 +--
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 6 +----
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 6 +----
drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 6 ++---
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 6 ++---
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +--
drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 3 +--
drivers/gpu/drm/amd/amdgpu/si_dma.c | 3 +--
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++-
16 files changed, 47 insertions(+), 40 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a50e3c0a4b18..d07075fe2d8c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1614,6 +1614,8 @@ ssize_t amdgpu_get_soft_full_reset_mask(struct amdgpu_ring *ring);
ssize_t amdgpu_show_reset_mask(char *buf, uint32_t supported_reset);
void amdgpu_sdma_set_vm_pte_scheds(struct amdgpu_device *adev,
const struct amdgpu_vm_pte_funcs *vm_pte_funcs);
+void amdgpu_sdma_set_buffer_funcs_scheds(struct amdgpu_device *adev,
+ const struct amdgpu_buffer_funcs *buffer_funcs);
/* atpx handler */
#if defined(CONFIG_VGA_SWITCHEROO)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7167db54d722..9d3931d31d96 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4527,7 +4527,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
adev->num_rings = 0;
RCU_INIT_POINTER(adev->gang_submit, dma_fence_get_stub());
adev->mman.buffer_funcs = NULL;
- adev->mman.buffer_funcs_ring = NULL;
+ adev->mman.num_buffer_funcs_scheds = 0;
adev->vm_manager.vm_pte_funcs = NULL;
adev->vm_manager.vm_pte_num_scheds = 0;
adev->gmc.gmc_funcs = NULL;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 0d2784fe0be3..ff9a066870f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -651,12 +651,14 @@ int amdgpu_gmc_allocate_vm_inv_eng(struct amdgpu_device *adev)
void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
uint32_t vmhub, uint32_t flush_type)
{
- struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+ struct amdgpu_ring *ring;
struct amdgpu_vmhub *hub = &adev->vmhub[vmhub];
struct dma_fence *fence;
struct amdgpu_job *job;
int r;
+ ring = to_amdgpu_ring(adev->mman.buffer_funcs_scheds[0]);
+
if (!hub->sdma_invalidation_workaround || vmid ||
!adev->mman.buffer_funcs_enabled || !adev->ib_pool_ready ||
!ring->sched.ready) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 575a4d4a1747..eec0cab8060c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -168,7 +168,7 @@ amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_ttm_buffer_entit
{
struct amdgpu_ring *ring;
- ring = adev->mman.buffer_funcs_ring;
+ ring = to_amdgpu_ring(adev->mman.buffer_funcs_scheds[0]);
amdgpu_ring_pad_ib(ring, &job->ibs[0]);
WARN_ON(job->ibs[0].length_dw > num_dw);
@@ -2241,18 +2241,16 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
return reserved_windows;
- if ((!adev->mman.buffer_funcs_ring || !adev->mman.buffer_funcs_ring->sched.ready) &&
+ if ((!adev->mman.num_buffer_funcs_scheds || !adev->mman.buffer_funcs_scheds[0]->ready) &&
enable) {
dev_warn(adev->dev, "Not enabling DMA transfers for in kernel use");
return 0;
}
if (enable) {
- struct amdgpu_ring *ring;
struct drm_gpu_scheduler *sched;
- ring = adev->mman.buffer_funcs_ring;
- sched = &ring->sched;
+ sched = adev->mman.buffer_funcs_scheds[0];
r = drm_sched_entity_init(&adev->mman.default_entity.base,
DRM_SCHED_PRIORITY_KERNEL, &sched,
1, NULL);
@@ -2387,7 +2385,7 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev,
unsigned int i;
int r;
- ring = adev->mman.buffer_funcs_ring;
+ ring = to_amdgpu_ring(adev->mman.buffer_funcs_scheds[0]);
if (!ring->sched.ready) {
dev_err(adev->dev,
@@ -2624,6 +2622,27 @@ int amdgpu_ttm_evict_resources(struct amdgpu_device *adev, int mem_type)
return ttm_resource_manager_evict_all(&adev->mman.bdev, man);
}
+void amdgpu_sdma_set_buffer_funcs_scheds(struct amdgpu_device *adev,
+ const struct amdgpu_buffer_funcs *buffer_funcs)
+{
+ struct amdgpu_vmhub *hub = &adev->vmhub[AMDGPU_GFXHUB(0)];
+ struct drm_gpu_scheduler *sched;
+ int i;
+
+ adev->mman.buffer_funcs = buffer_funcs;
+
+ for (i = 0; i < adev->sdma.num_instances; i++) {
+ if (adev->sdma.has_page_queue)
+ sched = &adev->sdma.instance[i].page.sched;
+ else
+ sched = &adev->sdma.instance[i].ring.sched;
+ adev->mman.buffer_funcs_scheds[i] = sched;
+ }
+
+ adev->mman.num_buffer_funcs_scheds = hub->sdma_invalidation_workaround ?
+ 1 : adev->sdma.num_instances;
+}
+
#if defined(CONFIG_DEBUG_FS)
static int amdgpu_ttm_page_pool_show(struct seq_file *m, void *unused)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 0785a2c594f7..653a4d17543e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -66,7 +66,8 @@ struct amdgpu_mman {
/* buffer handling */
const struct amdgpu_buffer_funcs *buffer_funcs;
- struct amdgpu_ring *buffer_funcs_ring;
+ struct drm_gpu_scheduler *buffer_funcs_scheds[AMDGPU_MAX_RINGS];
+ u32 num_buffer_funcs_scheds;
bool buffer_funcs_enabled;
struct mutex gtt_window_lock;
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
index 22780c09177d..26276dcfd458 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
@@ -1340,8 +1340,7 @@ static const struct amdgpu_buffer_funcs cik_sdma_buffer_funcs = {
static void cik_sdma_set_buffer_funcs(struct amdgpu_device *adev)
{
- adev->mman.buffer_funcs = &cik_sdma_buffer_funcs;
- adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
+ amdgpu_sdma_set_buffer_funcs_scheds(adev, &cik_sdma_buffer_funcs);
}
const struct amdgpu_ip_block_version cik_sdma_ip_block =
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
index 0090ace49024..c6a059ca59e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
@@ -1235,8 +1235,7 @@ static const struct amdgpu_buffer_funcs sdma_v2_4_buffer_funcs = {
static void sdma_v2_4_set_buffer_funcs(struct amdgpu_device *adev)
{
- adev->mman.buffer_funcs = &sdma_v2_4_buffer_funcs;
- adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
+ amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v2_4_buffer_funcs);
}
const struct amdgpu_ip_block_version sdma_v2_4_ip_block = {
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
index 2526d393162a..cb516a25210d 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
@@ -1677,8 +1677,7 @@ static const struct amdgpu_buffer_funcs sdma_v3_0_buffer_funcs = {
static void sdma_v3_0_set_buffer_funcs(struct amdgpu_device *adev)
{
- adev->mman.buffer_funcs = &sdma_v3_0_buffer_funcs;
- adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
+ amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v3_0_buffer_funcs);
}
const struct amdgpu_ip_block_version sdma_v3_0_ip_block =
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index a35d9951e22a..f234ee54f39e 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -2615,11 +2615,7 @@ static const struct amdgpu_buffer_funcs sdma_v4_0_buffer_funcs = {
static void sdma_v4_0_set_buffer_funcs(struct amdgpu_device *adev)
{
- adev->mman.buffer_funcs = &sdma_v4_0_buffer_funcs;
- if (adev->sdma.has_page_queue)
- adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].page;
- else
- adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
+ amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v4_0_buffer_funcs);
}
static void sdma_v4_0_get_ras_error_count(uint32_t value,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index 7f77367848d4..cd7627b03066 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -2316,11 +2316,7 @@ static const struct amdgpu_buffer_funcs sdma_v4_4_2_buffer_funcs = {
static void sdma_v4_4_2_set_buffer_funcs(struct amdgpu_device *adev)
{
- adev->mman.buffer_funcs = &sdma_v4_4_2_buffer_funcs;
- if (adev->sdma.has_page_queue)
- adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].page;
- else
- adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
+ amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v4_4_2_buffer_funcs);
}
/**
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 7ce13c5d4e61..5b495fda4f71 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -2073,10 +2073,8 @@ static const struct amdgpu_buffer_funcs sdma_v5_0_buffer_funcs = {
static void sdma_v5_0_set_buffer_funcs(struct amdgpu_device *adev)
{
- if (adev->mman.buffer_funcs == NULL) {
- adev->mman.buffer_funcs = &sdma_v5_0_buffer_funcs;
- adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
- }
+ if (adev->mman.buffer_funcs == NULL)
+ amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v5_0_buffer_funcs);
}
const struct amdgpu_ip_block_version sdma_v5_0_ip_block = {
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index 98beff18cf28..be2d9e57c459 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -2084,10 +2084,8 @@ static const struct amdgpu_buffer_funcs sdma_v5_2_buffer_funcs = {
static void sdma_v5_2_set_buffer_funcs(struct amdgpu_device *adev)
{
- if (adev->mman.buffer_funcs == NULL) {
- adev->mman.buffer_funcs = &sdma_v5_2_buffer_funcs;
- adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
- }
+ if (adev->mman.buffer_funcs == NULL)
+ amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v5_2_buffer_funcs);
}
const struct amdgpu_ip_block_version sdma_v5_2_ip_block = {
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index c32331b72ba0..ed8937fe76ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1891,8 +1891,7 @@ static const struct amdgpu_buffer_funcs sdma_v6_0_buffer_funcs = {
static void sdma_v6_0_set_buffer_funcs(struct amdgpu_device *adev)
{
- adev->mman.buffer_funcs = &sdma_v6_0_buffer_funcs;
- adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
+ amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v6_0_buffer_funcs);
}
const struct amdgpu_ip_block_version sdma_v6_0_ip_block = {
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
index 9318d23eb71e..f4c91153542c 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
@@ -1833,8 +1833,7 @@ static const struct amdgpu_buffer_funcs sdma_v7_0_buffer_funcs = {
static void sdma_v7_0_set_buffer_funcs(struct amdgpu_device *adev)
{
- adev->mman.buffer_funcs = &sdma_v7_0_buffer_funcs;
- adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
+ amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v7_0_buffer_funcs);
}
const struct amdgpu_ip_block_version sdma_v7_0_ip_block = {
diff --git a/drivers/gpu/drm/amd/amdgpu/si_dma.c b/drivers/gpu/drm/amd/amdgpu/si_dma.c
index b85df997ed49..ac6272fcffe9 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_dma.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_dma.c
@@ -833,8 +833,7 @@ static const struct amdgpu_buffer_funcs si_dma_buffer_funcs = {
static void si_dma_set_buffer_funcs(struct amdgpu_device *adev)
{
- adev->mman.buffer_funcs = &si_dma_buffer_funcs;
- adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
+ amdgpu_sdma_set_buffer_funcs_scheds(adev, &si_dma_buffer_funcs);
}
const struct amdgpu_ip_block_version si_dma_ip_block =
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 5dd65f05a1e0..a149265e3611 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -128,13 +128,14 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
struct dma_fence **mfence)
{
const u64 GTT_MAX_PAGES = AMDGPU_GTT_MAX_TRANSFER_SIZE;
- struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+ struct amdgpu_ring *ring;
struct amdgpu_ttm_buffer_entity *entity;
u64 gart_s, gart_d;
struct dma_fence *next;
u64 size;
int r;
+ ring = to_amdgpu_ring(adev->mman.buffer_funcs_scheds[0]);
entity = &adev->mman.move_entities[0];
mutex_lock(&entity->lock);
--
2.43.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 26/28] drm/amdgpu: give ttm entities access to all the sdma scheds
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (24 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 25/28] drm/amdgpu: pass all the sdma scheds to amdgpu_mman Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 27/28] drm/amdgpu: get rid of amdgpu_ttm_clear_buffer Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 28/28] drm/amdgpu: rename amdgpu_fill_buffer as amdgpu_ttm_clear_buffer Pierre-Eric Pelloux-Prayer
27 siblings, 0 replies; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel
With this change we now have as many clear and move entities as we
have sdma engines (capped at TTM_NUM_MOVE_FENCES).
To enable load-balancing, this patch gives all entities access to all
the sdma schedulers.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index eec0cab8060c..39cfe2dbdf03 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2233,8 +2233,8 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
uint64_t size;
int r, i, j;
- num_clear_entities = adev->sdma.num_instances;
- num_move_entities = MIN(adev->sdma.num_instances, TTM_NUM_MOVE_FENCES);
+ num_clear_entities = MIN(adev->mman.num_buffer_funcs_scheds, TTM_NUM_MOVE_FENCES);
+ num_move_entities = MIN(adev->mman.num_buffer_funcs_scheds, TTM_NUM_MOVE_FENCES);
reserved_windows = 2 * num_move_entities + num_clear_entities;
if (!adev->mman.initialized || amdgpu_in_reset(adev) ||
@@ -2248,11 +2248,8 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
}
if (enable) {
- struct drm_gpu_scheduler *sched;
-
- sched = adev->mman.buffer_funcs_scheds[0];
r = drm_sched_entity_init(&adev->mman.default_entity.base,
- DRM_SCHED_PRIORITY_KERNEL, &sched,
+ DRM_SCHED_PRIORITY_KERNEL, adev->mman.buffer_funcs_scheds,
1, NULL);
if (r) {
dev_err(adev->dev, "Failed setting up entity (%d)\n",
@@ -2264,8 +2261,9 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
atomic_set(&adev->mman.next_move_entity, 0);
for (i = 0; i < num_move_entities; i++) {
r = drm_sched_entity_init(&adev->mman.move_entities[i].base,
- DRM_SCHED_PRIORITY_NORMAL, &sched,
- 1, NULL);
+ DRM_SCHED_PRIORITY_NORMAL,
+ adev->mman.buffer_funcs_scheds,
+ adev->mman.num_buffer_funcs_scheds, NULL);
if (r) {
dev_err(adev->dev,
"Failed setting up TTM BO move entities (%d)\n",
@@ -2287,8 +2285,9 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
for (i = 0; i < num_clear_entities; i++) {
r = drm_sched_entity_init(&adev->mman.clear_entities[i].base,
- DRM_SCHED_PRIORITY_NORMAL, &sched,
- 1, NULL);
+ DRM_SCHED_PRIORITY_NORMAL,
+ adev->mman.buffer_funcs_scheds,
+ adev->mman.num_buffer_funcs_scheds, NULL);
if (r) {
for (j = 0; j < num_move_entities; j++)
drm_sched_entity_destroy(
--
2.43.0
* [PATCH v3 27/28] drm/amdgpu: get rid of amdgpu_ttm_clear_buffer
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (25 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 26/28] drm/amdgpu: give ttm entities access to all the sdma scheds Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 15:54 ` Christian König
2025-11-21 10:12 ` [PATCH v3 28/28] drm/amdgpu: rename amdgpu_fill_buffer as amdgpu_ttm_clear_buffer Pierre-Eric Pelloux-Prayer
27 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Sumit Semwal
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel,
linux-media, linaro-mm-sig
It's doing the same thing as amdgpu_fill_buffer() with src_data=0, so drop it.
The only caveat is that amdgpu_res_cleared()'s return value is only valid
right after allocation.
---
v2: introduce new "bool consider_clear_status" arg
---
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 90 +++++-----------------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 7 +-
3 files changed, 33 insertions(+), 80 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 7d8d70135cc2..dccc31d0128e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -725,13 +725,17 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
bo->tbo.resource->mem_type == TTM_PL_VRAM) {
struct dma_fence *fence;
- r = amdgpu_ttm_clear_buffer(adev, bo, bo->tbo.base.resv, &fence);
+ r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
+ bo, 0, NULL, &fence,
+ true, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
if (unlikely(r))
goto fail_unreserve;
- dma_resv_add_fence(bo->tbo.base.resv, fence,
- DMA_RESV_USAGE_KERNEL);
- dma_fence_put(fence);
+ if (fence) {
+ dma_resv_add_fence(bo->tbo.base.resv, fence,
+ DMA_RESV_USAGE_KERNEL);
+ dma_fence_put(fence);
+ }
}
if (!bp->resv)
amdgpu_bo_unreserve(bo);
@@ -1323,8 +1327,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
goto out;
r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
- abo, 0, &bo->base._resv,
- &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
+ abo, 0, &bo->base._resv, &fence,
+ false, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
if (WARN_ON(r))
goto out;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 39cfe2dbdf03..c65c411ce26e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -459,7 +459,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
r = amdgpu_fill_buffer(adev, entity,
abo, 0, NULL, &wipe_fence,
- AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
+ false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
if (r) {
goto error;
} else if (wipe_fence) {
@@ -2459,79 +2459,28 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
}
/**
- * amdgpu_ttm_clear_buffer - clear memory buffers
+ * amdgpu_fill_buffer - fill a buffer with a given value
* @adev: amdgpu device object
- * @bo: amdgpu buffer object
- * @resv: reservation object
- * @fence: dma_fence associated with the operation
+ * @entity: optional entity to use. If NULL, the clearing entities will be
+ * used to load-balance the partial clears
+ * @bo: the bo to fill
+ * @src_data: the value to set
+ * @resv: fences contained in this reservation will be used as dependencies.
+ * @out_fence: the fence from the last clear will be stored here. It might be
+ * NULL if no job was run.
+ * @dependency: optional input dependency fence.
+ * @consider_clear_status: true if regions reported as cleared by amdgpu_res_cleared()
+ * should be skipped.
+ * @k_job_id: trace id
*
- * Clear the memory buffer resource.
- *
- * Returns:
- * 0 for success or a negative error code on failure.
*/
-int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
- struct amdgpu_bo *bo,
- struct dma_resv *resv,
- struct dma_fence **fence)
-{
- struct amdgpu_ttm_buffer_entity *entity;
- struct amdgpu_res_cursor cursor;
- u64 addr;
- int r = 0;
-
- if (!adev->mman.buffer_funcs_enabled)
- return -EINVAL;
-
- if (!fence)
- return -EINVAL;
- entity = &adev->mman.clear_entities[0];
- *fence = dma_fence_get_stub();
-
- amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &cursor);
-
- mutex_lock(&entity->lock);
- while (cursor.remaining) {
- struct dma_fence *next = NULL;
- u64 size;
-
- if (amdgpu_res_cleared(&cursor)) {
- amdgpu_res_next(&cursor, cursor.size);
- continue;
- }
-
- /* Never clear more than 256MiB at once to avoid timeouts */
- size = min(cursor.size, 256ULL << 20);
-
- r = amdgpu_ttm_map_buffer(adev, entity,
- &bo->tbo, bo->tbo.resource, &cursor,
- 1, false, false, &size, &addr);
- if (r)
- goto err;
-
- r = amdgpu_ttm_fill_mem(adev, entity, 0, addr, size, resv,
- &next, true,
- AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
- if (r)
- goto err;
-
- dma_fence_put(*fence);
- *fence = next;
-
- amdgpu_res_next(&cursor, size);
- }
-err:
- mutex_unlock(&entity->lock);
-
- return r;
-}
-
int amdgpu_fill_buffer(struct amdgpu_device *adev,
struct amdgpu_ttm_buffer_entity *entity,
struct amdgpu_bo *bo,
uint32_t src_data,
struct dma_resv *resv,
- struct dma_fence **f,
+ struct dma_fence **out_fence,
+ bool consider_clear_status,
u64 k_job_id)
{
struct dma_fence *fence = NULL;
@@ -2551,6 +2500,11 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
struct dma_fence *next;
uint64_t cur_size, to;
+ if (consider_clear_status && amdgpu_res_cleared(&dst)) {
+ amdgpu_res_next(&dst, dst.size);
+ continue;
+ }
+
/* Never fill more than 256MiB at once to avoid timeouts */
cur_size = min(dst.size, 256ULL << 20);
@@ -2574,9 +2528,7 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
}
error:
mutex_unlock(&entity->lock);
- if (f)
- *f = dma_fence_get(fence);
- dma_fence_put(fence);
+ *out_fence = fence;
return r;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 653a4d17543e..f3bdbcec9afc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -181,16 +181,13 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev,
struct dma_resv *resv,
struct dma_fence **fence,
bool vm_needs_flush, uint32_t copy_flags);
-int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
- struct amdgpu_bo *bo,
- struct dma_resv *resv,
- struct dma_fence **fence);
int amdgpu_fill_buffer(struct amdgpu_device *adev,
struct amdgpu_ttm_buffer_entity *entity,
struct amdgpu_bo *bo,
uint32_t src_data,
struct dma_resv *resv,
- struct dma_fence **f,
+ struct dma_fence **out_fence,
+ bool consider_clear_status,
u64 k_job_id);
struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev);
--
2.43.0
* [PATCH v3 28/28] drm/amdgpu: rename amdgpu_fill_buffer as amdgpu_ttm_clear_buffer
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
` (26 preceding siblings ...)
2025-11-21 10:12 ` [PATCH v3 27/28] drm/amdgpu: get rid of amdgpu_ttm_clear_buffer Pierre-Eric Pelloux-Prayer
@ 2025-11-21 10:12 ` Pierre-Eric Pelloux-Prayer
27 siblings, 0 replies; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 10:12 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Sumit Semwal
Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, linux-kernel,
linux-media, linaro-mm-sig
Clearing (src_data=0) is the only use case left for this function,
so rename it accordingly.
---
v2: amdgpu_ttm_clear_buffer instead of amdgpu_clear_buffer
---
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 +++++-----
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 27 ++++++++++------------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 15 ++++++------
3 files changed, 25 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index dccc31d0128e..ac1727c3634a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -725,9 +725,9 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
bo->tbo.resource->mem_type == TTM_PL_VRAM) {
struct dma_fence *fence;
- r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
- bo, 0, NULL, &fence,
- true, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
+ r = amdgpu_ttm_clear_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
+ bo, NULL, &fence,
+ true, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
if (unlikely(r))
goto fail_unreserve;
@@ -1326,9 +1326,9 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
if (r)
goto out;
- r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
- abo, 0, &bo->base._resv, &fence,
- false, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
+ r = amdgpu_ttm_clear_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
+ abo, &bo->base._resv, &fence,
+ false, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
if (WARN_ON(r))
goto out;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c65c411ce26e..1cc72fd94a4c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -457,9 +457,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
struct dma_fence *wipe_fence = NULL;
- r = amdgpu_fill_buffer(adev, entity,
- abo, 0, NULL, &wipe_fence,
- false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
+ r = amdgpu_ttm_clear_buffer(adev, entity,
+ abo, NULL, &wipe_fence,
+ false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
if (r) {
goto error;
} else if (wipe_fence) {
@@ -2459,29 +2459,26 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
}
/**
- * amdgpu_fill_buffer - fill a buffer with a given value
+ * amdgpu_ttm_clear_buffer - fill a buffer with 0
* @adev: amdgpu device object
* @entity: optional entity to use. If NULL, the clearing entities will be
* used to load-balance the partial clears
* @bo: the bo to fill
- * @src_data: the value to set
* @resv: fences contained in this reservation will be used as dependencies.
* @out_fence: the fence from the last clear will be stored here. It might be
* NULL if no job was run.
- * @dependency: optional input dependency fence.
* @consider_clear_status: true if regions reported as cleared by amdgpu_res_cleared()
* should be skipped.
* @k_job_id: trace id
*
*/
-int amdgpu_fill_buffer(struct amdgpu_device *adev,
- struct amdgpu_ttm_buffer_entity *entity,
- struct amdgpu_bo *bo,
- uint32_t src_data,
- struct dma_resv *resv,
- struct dma_fence **out_fence,
- bool consider_clear_status,
- u64 k_job_id)
+int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
+ struct amdgpu_ttm_buffer_entity *entity,
+ struct amdgpu_bo *bo,
+ struct dma_resv *resv,
+ struct dma_fence **out_fence,
+ bool consider_clear_status,
+ u64 k_job_id)
{
struct dma_fence *fence = NULL;
struct amdgpu_res_cursor dst;
@@ -2516,7 +2513,7 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
goto error;
r = amdgpu_ttm_fill_mem(adev, entity,
- src_data, to, cur_size, resv,
+ 0, to, cur_size, resv,
&next, true, k_job_id);
if (r)
goto error;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index f3bdbcec9afc..fba205c1b5d7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -181,14 +181,13 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev,
struct dma_resv *resv,
struct dma_fence **fence,
bool vm_needs_flush, uint32_t copy_flags);
-int amdgpu_fill_buffer(struct amdgpu_device *adev,
- struct amdgpu_ttm_buffer_entity *entity,
- struct amdgpu_bo *bo,
- uint32_t src_data,
- struct dma_resv *resv,
- struct dma_fence **out_fence,
- bool consider_clear_status,
- u64 k_job_id);
+int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
+ struct amdgpu_ttm_buffer_entity *entity,
+ struct amdgpu_bo *bo,
+ struct dma_resv *resv,
+ struct dma_fence **out_fence,
+ bool consider_clear_status,
+ u64 k_job_id);
struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev);
int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
--
2.43.0
* Re: [PATCH v3 01/28] drm/amdgpu: give each kernel job a unique id
2025-11-21 10:12 ` [PATCH v3 01/28] drm/amdgpu: give each kernel job a unique id Pierre-Eric Pelloux-Prayer
@ 2025-11-21 12:51 ` Christian König
2025-11-21 20:02 ` Felix Kuehling
1 sibling, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 12:51 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter, Felix Kuehling
Cc: Arunpravin Paneer Selvam, amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> Userspace jobs have drm_file.client_id as a unique identifier
> as job's owners. For kernel jobs, we can allocate arbitrary
> values - the risk of overlap with userspace ids is small (given
> that it's a u64 value).
> In the unlikely case the overlap happens, it'll only impact
> trace events.
>
> Since this ID is traced in the gpu_scheduler trace events, this
> allows to determine the source of each job sent to the hardware.
>
> To make grepping easier, the IDs are defined as they will appear
> in the trace output.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> Acked-by: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
> Link: https://lore.kernel.org/r/20250604122827.2191-1-pierre-eric.pelloux-prayer@amd.com
Acked-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 19 +++++++++++++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 28 +++++++++++++--------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 5 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 8 +++---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 +++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c | 4 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 4 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 12 +++++----
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 6 +++--
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 6 +++--
> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++-
> 19 files changed, 84 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 5a1904b0b064..1ffbd416a8ad 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -1551,7 +1551,8 @@ static int amdgpu_gfx_run_cleaner_shader_job(struct amdgpu_ring *ring)
> owner = (void *)(unsigned long)atomic_inc_return(&counter);
>
> r = amdgpu_job_alloc_with_ib(ring->adev, &entity, owner,
> - 64, 0, &job);
> + 64, 0, &job,
> + AMDGPU_KERNEL_JOB_ID_CLEANER_SHADER);
> if (r)
> goto err;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 0017bd10d452..ea8ec160b98a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -690,7 +690,7 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
> r = amdgpu_job_alloc_with_ib(ring->adev, &adev->mman.high_pr,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> 16 * 4, AMDGPU_IB_POOL_IMMEDIATE,
> - &job);
> + &job, AMDGPU_KERNEL_JOB_ID_FLUSH_GPU_TLB);
> if (r)
> goto error_alloc;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index efa3281145f6..b284bd8021df 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -232,11 +232,12 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
> struct drm_sched_entity *entity, void *owner,
> size_t size, enum amdgpu_ib_pool_type pool_type,
> - struct amdgpu_job **job)
> + struct amdgpu_job **job, u64 k_job_id)
> {
> int r;
>
> - r = amdgpu_job_alloc(adev, NULL, entity, owner, 1, job, 0);
> + r = amdgpu_job_alloc(adev, NULL, entity, owner, 1, job,
> + k_job_id);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> index d25f1fcf0242..7abf069d17d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> @@ -44,6 +44,22 @@
> struct amdgpu_fence;
> enum amdgpu_ib_pool_type;
>
> +/* Internal kernel job ids. (decreasing values, starting from U64_MAX). */
> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE (18446744073709551615ULL)
> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_PDES (18446744073709551614ULL)
> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_RANGE (18446744073709551613ULL)
> +#define AMDGPU_KERNEL_JOB_ID_VM_PT_CLEAR (18446744073709551612ULL)
> +#define AMDGPU_KERNEL_JOB_ID_TTM_MAP_BUFFER (18446744073709551611ULL)
> +#define AMDGPU_KERNEL_JOB_ID_TTM_ACCESS_MEMORY_SDMA (18446744073709551610ULL)
> +#define AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER (18446744073709551609ULL)
> +#define AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE (18446744073709551608ULL)
> +#define AMDGPU_KERNEL_JOB_ID_MOVE_BLIT (18446744073709551607ULL)
> +#define AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER (18446744073709551606ULL)
> +#define AMDGPU_KERNEL_JOB_ID_CLEANER_SHADER (18446744073709551605ULL)
> +#define AMDGPU_KERNEL_JOB_ID_FLUSH_GPU_TLB (18446744073709551604ULL)
> +#define AMDGPU_KERNEL_JOB_ID_KFD_GART_MAP (18446744073709551603ULL)
> +#define AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST (18446744073709551602ULL)
> +
> struct amdgpu_job {
> struct drm_sched_job base;
> struct amdgpu_vm *vm;
> @@ -97,7 +113,8 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
> struct drm_sched_entity *entity, void *owner,
> size_t size, enum amdgpu_ib_pool_type pool_type,
> - struct amdgpu_job **job);
> + struct amdgpu_job **job,
> + u64 k_job_id);
> void amdgpu_job_set_resources(struct amdgpu_job *job, struct amdgpu_bo *gds,
> struct amdgpu_bo *gws, struct amdgpu_bo *oa);
> void amdgpu_job_free_resources(struct amdgpu_job *job);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
> index 91678621f1ff..63ee6ba6a931 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
> @@ -196,7 +196,8 @@ static int amdgpu_jpeg_dec_set_reg(struct amdgpu_ring *ring, uint32_t handle,
> int i, r;
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
> - AMDGPU_IB_POOL_DIRECT, &job);
> + AMDGPU_IB_POOL_DIRECT, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 24ebba43a469..926a3f09a776 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -1322,7 +1322,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
> if (r)
> goto out;
>
> - r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true);
> + r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true,
> + AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> if (WARN_ON(r))
> goto out;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 04a79ef05f90..6a1434391fb8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -226,7 +226,8 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
> r = amdgpu_job_alloc_with_ib(adev, &adev->mman.high_pr,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> num_dw * 4 + num_bytes,
> - AMDGPU_IB_POOL_DELAYED, &job);
> + AMDGPU_IB_POOL_DELAYED, &job,
> + AMDGPU_KERNEL_JOB_ID_TTM_MAP_BUFFER);
> if (r)
> return r;
>
> @@ -398,7 +399,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
> struct dma_fence *wipe_fence = NULL;
>
> r = amdgpu_fill_buffer(abo, 0, NULL, &wipe_fence,
> - false);
> + false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
> if (r) {
> goto error;
> } else if (wipe_fence) {
> @@ -1480,7 +1481,8 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
> r = amdgpu_job_alloc_with_ib(adev, &adev->mman.high_pr,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> num_dw * 4, AMDGPU_IB_POOL_DELAYED,
> - &job);
> + &job,
> + AMDGPU_KERNEL_JOB_ID_TTM_ACCESS_MEMORY_SDMA);
> if (r)
> goto out;
>
> @@ -2204,7 +2206,7 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
> struct dma_resv *resv,
> bool vm_needs_flush,
> struct amdgpu_job **job,
> - bool delayed)
> + bool delayed, u64 k_job_id)
> {
> enum amdgpu_ib_pool_type pool = direct_submit ?
> AMDGPU_IB_POOL_DIRECT :
> @@ -2214,7 +2216,7 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
> &adev->mman.high_pr;
> r = amdgpu_job_alloc_with_ib(adev, entity,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> - num_dw * 4, pool, job);
> + num_dw * 4, pool, job, k_job_id);
> if (r)
> return r;
>
> @@ -2254,7 +2256,8 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
> num_loops = DIV_ROUND_UP(byte_count, max_bytes);
> num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8);
> r = amdgpu_ttm_prepare_job(adev, direct_submit, num_dw,
> - resv, vm_needs_flush, &job, false);
> + resv, vm_needs_flush, &job, false,
> + AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER);
> if (r)
> return r;
>
> @@ -2289,7 +2292,8 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data,
> uint64_t dst_addr, uint32_t byte_count,
> struct dma_resv *resv,
> struct dma_fence **fence,
> - bool vm_needs_flush, bool delayed)
> + bool vm_needs_flush, bool delayed,
> + u64 k_job_id)
> {
> struct amdgpu_device *adev = ring->adev;
> unsigned int num_loops, num_dw;
> @@ -2302,7 +2306,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data,
> num_loops = DIV_ROUND_UP_ULL(byte_count, max_bytes);
> num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->fill_num_dw, 8);
> r = amdgpu_ttm_prepare_job(adev, false, num_dw, resv, vm_needs_flush,
> - &job, delayed);
> + &job, delayed, k_job_id);
> if (r)
> return r;
>
> @@ -2372,7 +2376,8 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
> goto err;
>
> r = amdgpu_ttm_fill_mem(ring, 0, addr, size, resv,
> - &next, true, true);
> + &next, true, true,
> + AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
> if (r)
> goto err;
>
> @@ -2391,7 +2396,8 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
> uint32_t src_data,
> struct dma_resv *resv,
> struct dma_fence **f,
> - bool delayed)
> + bool delayed,
> + u64 k_job_id)
> {
> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> @@ -2421,7 +2427,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
> goto error;
>
> r = amdgpu_ttm_fill_mem(ring, src_data, to, cur_size, resv,
> - &next, true, delayed);
> + &next, true, delayed, k_job_id);
> if (r)
> goto error;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index 054d48823d5f..577ee04ce0bf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -175,7 +175,8 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
> uint32_t src_data,
> struct dma_resv *resv,
> struct dma_fence **fence,
> - bool delayed);
> + bool delayed,
> + u64 k_job_id);
>
> int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
> void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> index 74758b5ffc6c..5c38f0d30c87 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> @@ -1136,7 +1136,8 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring *ring, struct amdgpu_bo *bo,
> r = amdgpu_job_alloc_with_ib(ring->adev, &adev->uvd.entity,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> 64, direct ? AMDGPU_IB_POOL_DIRECT :
> - AMDGPU_IB_POOL_DELAYED, &job);
> + AMDGPU_IB_POOL_DELAYED, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index 709ca369cb52..a7d8f1ce6ac2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -491,7 +491,7 @@ static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
> r = amdgpu_job_alloc_with_ib(ring->adev, &ring->adev->vce.entity,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> ib_size_dw * 4, AMDGPU_IB_POOL_DIRECT,
> - &job);
> + &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> @@ -582,7 +582,8 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> ib_size_dw * 4,
> direct ? AMDGPU_IB_POOL_DIRECT :
> - AMDGPU_IB_POOL_DELAYED, &job);
> + AMDGPU_IB_POOL_DELAYED, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> index 5ae7cc0d5f57..5e0786ea911b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> @@ -626,7 +626,7 @@ static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring *ring,
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL,
> 64, AMDGPU_IB_POOL_DIRECT,
> - &job);
> + &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> goto err;
>
> @@ -806,7 +806,7 @@ static int amdgpu_vcn_dec_sw_send_msg(struct amdgpu_ring *ring,
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL,
> ib_size_dw * 4, AMDGPU_IB_POOL_DIRECT,
> - &job);
> + &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> goto err;
>
> @@ -936,7 +936,7 @@ static int amdgpu_vcn_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t hand
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL,
> ib_size_dw * 4, AMDGPU_IB_POOL_DIRECT,
> - &job);
> + &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> @@ -1003,7 +1003,7 @@ static int amdgpu_vcn_enc_get_destroy_msg(struct amdgpu_ring *ring, uint32_t han
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL,
> ib_size_dw * 4, AMDGPU_IB_POOL_DIRECT,
> - &job);
> + &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index e2587eea6c4a..193de267984e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -989,7 +989,8 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
> params.vm = vm;
> params.immediate = immediate;
>
> - r = vm->update_funcs->prepare(&params, NULL);
> + r = vm->update_funcs->prepare(&params, NULL,
> + AMDGPU_KERNEL_JOB_ID_VM_UPDATE_PDES);
> if (r)
> goto error;
>
> @@ -1158,7 +1159,8 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> dma_fence_put(tmp);
> }
>
> - r = vm->update_funcs->prepare(&params, sync);
> + r = vm->update_funcs->prepare(&params, sync,
> + AMDGPU_KERNEL_JOB_ID_VM_UPDATE_RANGE);
> if (r)
> goto error_free;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 330e4bdea387..139642eacdd0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -310,7 +310,7 @@ struct amdgpu_vm_update_params {
> struct amdgpu_vm_update_funcs {
> int (*map_table)(struct amdgpu_bo_vm *bo);
> int (*prepare)(struct amdgpu_vm_update_params *p,
> - struct amdgpu_sync *sync);
> + struct amdgpu_sync *sync, u64 k_job_id);
> int (*update)(struct amdgpu_vm_update_params *p,
> struct amdgpu_bo_vm *bo, uint64_t pe, uint64_t addr,
> unsigned count, uint32_t incr, uint64_t flags);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> index 0c1ef5850a5e..22e2e5b47341 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> @@ -40,12 +40,14 @@ static int amdgpu_vm_cpu_map_table(struct amdgpu_bo_vm *table)
> *
> * @p: see amdgpu_vm_update_params definition
> * @sync: sync obj with fences to wait on
> + * @k_job_id: the id for tracing/debug purposes
> *
> * Returns:
> * Negativ errno, 0 for success.
> */
> static int amdgpu_vm_cpu_prepare(struct amdgpu_vm_update_params *p,
> - struct amdgpu_sync *sync)
> + struct amdgpu_sync *sync,
> + u64 k_job_id)
> {
> if (!sync)
> return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> index f6ffc207ec2a..c7a7d51080a8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> @@ -26,6 +26,7 @@
> #include "amdgpu.h"
> #include "amdgpu_trace.h"
> #include "amdgpu_vm.h"
> +#include "amdgpu_job.h"
>
> /*
> * amdgpu_vm_pt_cursor - state for for_each_amdgpu_vm_pt
> @@ -396,7 +397,8 @@ int amdgpu_vm_pt_clear(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> params.vm = vm;
> params.immediate = immediate;
>
> - r = vm->update_funcs->prepare(&params, NULL);
> + r = vm->update_funcs->prepare(&params, NULL,
> + AMDGPU_KERNEL_JOB_ID_VM_PT_CLEAR);
> if (r)
> goto exit;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> index 46d9fb433ab2..36805dcfa159 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> @@ -40,7 +40,7 @@ static int amdgpu_vm_sdma_map_table(struct amdgpu_bo_vm *table)
>
> /* Allocate a new job for @count PTE updates */
> static int amdgpu_vm_sdma_alloc_job(struct amdgpu_vm_update_params *p,
> - unsigned int count)
> + unsigned int count, u64 k_job_id)
> {
> enum amdgpu_ib_pool_type pool = p->immediate ? AMDGPU_IB_POOL_IMMEDIATE
> : AMDGPU_IB_POOL_DELAYED;
> @@ -56,7 +56,7 @@ static int amdgpu_vm_sdma_alloc_job(struct amdgpu_vm_update_params *p,
> ndw = min(ndw, AMDGPU_VM_SDMA_MAX_NUM_DW);
>
> r = amdgpu_job_alloc_with_ib(p->adev, entity, AMDGPU_FENCE_OWNER_VM,
> - ndw * 4, pool, &p->job);
> + ndw * 4, pool, &p->job, k_job_id);
> if (r)
> return r;
>
> @@ -69,16 +69,17 @@ static int amdgpu_vm_sdma_alloc_job(struct amdgpu_vm_update_params *p,
> *
> * @p: see amdgpu_vm_update_params definition
> * @sync: amdgpu_sync object with fences to wait for
> + * @k_job_id: identifier of the job, for tracing purpose
> *
> * Returns:
> * Negativ errno, 0 for success.
> */
> static int amdgpu_vm_sdma_prepare(struct amdgpu_vm_update_params *p,
> - struct amdgpu_sync *sync)
> + struct amdgpu_sync *sync, u64 k_job_id)
> {
> int r;
>
> - r = amdgpu_vm_sdma_alloc_job(p, 0);
> + r = amdgpu_vm_sdma_alloc_job(p, 0, k_job_id);
> if (r)
> return r;
>
> @@ -249,7 +250,8 @@ static int amdgpu_vm_sdma_update(struct amdgpu_vm_update_params *p,
> if (r)
> return r;
>
> - r = amdgpu_vm_sdma_alloc_job(p, count);
> + r = amdgpu_vm_sdma_alloc_job(p, count,
> + AMDGPU_KERNEL_JOB_ID_VM_UPDATE);
> if (r)
> return r;
> }
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> index 1c07b701d0e4..ceb94bbb03a4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> @@ -217,7 +217,8 @@ static int uvd_v6_0_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t handle
> int i, r;
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
> - AMDGPU_IB_POOL_DIRECT, &job);
> + AMDGPU_IB_POOL_DIRECT, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> @@ -281,7 +282,8 @@ static int uvd_v6_0_enc_get_destroy_msg(struct amdgpu_ring *ring,
> int i, r;
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
> - AMDGPU_IB_POOL_DIRECT, &job);
> + AMDGPU_IB_POOL_DIRECT, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
> index 9d237b5937fb..1f8866f3f63c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
> @@ -225,7 +225,8 @@ static int uvd_v7_0_enc_get_create_msg(struct amdgpu_ring *ring, u32 handle,
> int i, r;
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
> - AMDGPU_IB_POOL_DIRECT, &job);
> + AMDGPU_IB_POOL_DIRECT, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> @@ -288,7 +289,8 @@ static int uvd_v7_0_enc_get_destroy_msg(struct amdgpu_ring *ring, u32 handle,
> int i, r;
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
> - AMDGPU_IB_POOL_DIRECT, &job);
> + AMDGPU_IB_POOL_DIRECT, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index 3653c563ee9a..46c84fc60af1 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -67,7 +67,8 @@ svm_migrate_gart_map(struct amdgpu_ring *ring, u64 npages,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> num_dw * 4 + num_bytes,
> AMDGPU_IB_POOL_DELAYED,
> - &job);
> + &job,
> + AMDGPU_KERNEL_JOB_ID_KFD_GART_MAP);
> if (r)
> return r;
>
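The mechanical change above threads a `k_job_id` value from every call site down to `amdgpu_job_alloc_with_ib()`, so each kernel-submitted job records which code path created it and trace events can tell a TTM clear from a VM update. The pattern can be sketched in isolation as follows; the struct layout, helper names, and enum values below are invented stand-ins for illustration, not the driver's real definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative job-id values; the real driver defines its own
 * AMDGPU_KERNEL_JOB_ID_* constants. */
enum kernel_job_id {
	JOB_ID_TTM_CLEAR_BUFFER = 1,
	JOB_ID_VM_UPDATE,
};

struct job {
	uint64_t k_job_id;
};

/* Bottom-level allocator: records the id once, so it can later be
 * emitted in trace events alongside the job. */
static int job_alloc_with_ib(struct job *job, uint64_t k_job_id)
{
	job->k_job_id = k_job_id;
	return 0;
}

/* Mid-layer helpers simply forward the id from their caller instead of
 * inventing their own, which is exactly what the patch does for
 * amdgpu_ttm_prepare_job(), amdgpu_ttm_fill_mem(), etc. */
static int prepare_job(struct job *job, uint64_t k_job_id)
{
	return job_alloc_with_ib(job, k_job_id);
}
```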
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 02/28] drm/amdgpu: use ttm_resource_manager_cleanup
2025-11-21 10:12 ` [PATCH v3 02/28] drm/amdgpu: use ttm_resource_manager_cleanup Pierre-Eric Pelloux-Prayer
@ 2025-11-21 12:52 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 12:52 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> Rather than open-coding it.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 6a1434391fb8..8d0043ad5336 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -2182,8 +2182,10 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> } else {
> drm_sched_entity_destroy(&adev->mman.high_pr);
> drm_sched_entity_destroy(&adev->mman.low_pr);
> - dma_fence_put(man->move);
> - man->move = NULL;
> + /* Drop all the old fences since re-creating the scheduler entities
> + * will allocate new contexts.
> + */
> + ttm_resource_manager_cleanup(man);
> }
>
> /* this just adjusts TTM size idea, which sets lpfn to the correct value */
* Re: [PATCH v3 04/28] drm/amdgpu: remove the ring param from ttm functions
2025-11-21 10:12 ` [PATCH v3 04/28] drm/amdgpu: remove the ring param from ttm functions Pierre-Eric Pelloux-Prayer
@ 2025-11-21 12:58 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 12:58 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter, Felix Kuehling
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> With the removal of the direct_submit argument, the ring param
> becomes useless: the jobs are always submitted to buffer_funcs_ring.
>
> Some functions are getting an amdgpu_device argument since they
> were getting it from the ring arg.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 46 ++++++++++---------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 +-
> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
> 4 files changed, 27 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> index 02c2479a8840..3636b757c974 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> @@ -37,8 +37,7 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size,
>
> stime = ktime_get();
> for (i = 0; i < n; i++) {
> - struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> - r = amdgpu_copy_buffer(ring, saddr, daddr, size, NULL, &fence,
> + r = amdgpu_copy_buffer(adev, saddr, daddr, size, NULL, &fence,
> false, 0);
> if (r)
> goto exit_do_move;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 071afbacb3d2..d54b57078946 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -164,11 +164,11 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
>
> /**
> * amdgpu_ttm_map_buffer - Map memory into the GART windows
> + * @adev: the device being used
> * @bo: buffer object to map
> * @mem: memory object to map
> * @mm_cur: range to map
> * @window: which GART window to use
> - * @ring: DMA ring to use for the copy
> * @tmz: if we should setup a TMZ enabled mapping
> * @size: in number of bytes to map, out number of bytes mapped
> * @addr: resulting address inside the MC address space
> @@ -176,15 +176,16 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
> * Setup one of the GART windows to access a specific piece of memory or return
> * the physical address for local memory.
> */
> -static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
> +static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
> + struct ttm_buffer_object *bo,
Using this should work as well:
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
Not a must-have, but it is more defensive and saves an extra parameter.
Apart from that the patch looks good to me.
Regards,
Christian.
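For context, the suggestion above works because `amdgpu_ttm_adev()` recovers the containing device from the embedded `ttm_device` via `container_of`, so any function that already receives the bo does not need a separate `adev` parameter. A rough standalone sketch of that pattern, using simplified stand-in structs (in the real driver the `ttm_device` is embedded at `adev->mman.bdev`):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the TTM/amdgpu types, just to show the shape
 * of the pointer recovery; not the driver's real definitions. */
struct ttm_device { int dummy; };
struct ttm_buffer_object { struct ttm_device *bdev; };

struct amdgpu_device {
	int id;
	struct ttm_device base;	/* embedded member, as in adev->mman.bdev */
};

/* container_of: step back from a member pointer to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct amdgpu_device *amdgpu_ttm_adev(struct ttm_device *bdev)
{
	return container_of(bdev, struct amdgpu_device, base);
}
```

With this, `amdgpu_ttm_adev(bo->bdev)` yields the device for any bo, which is why the extra parameter is redundant (if slightly less explicit).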
> struct ttm_resource *mem,
> struct amdgpu_res_cursor *mm_cur,
> - unsigned int window, struct amdgpu_ring *ring,
> + unsigned int window,
> bool tmz, uint64_t *size, uint64_t *addr)
> {
> - struct amdgpu_device *adev = ring->adev;
> unsigned int offset, num_pages, num_dw, num_bytes;
> uint64_t src_addr, dst_addr;
> + struct amdgpu_ring *ring;
> struct amdgpu_job *job;
> void *cpu_addr;
> uint64_t flags;
> @@ -239,6 +240,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
> amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr,
> dst_addr, num_bytes, 0);
>
> + ring = adev->mman.buffer_funcs_ring;
> amdgpu_ring_pad_ib(ring, &job->ibs[0]);
> WARN_ON(job->ibs[0].length_dw > num_dw);
>
> @@ -286,7 +288,6 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> struct dma_resv *resv,
> struct dma_fence **f)
> {
> - struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> struct amdgpu_res_cursor src_mm, dst_mm;
> struct dma_fence *fence = NULL;
> int r = 0;
> @@ -312,13 +313,13 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> cur_size = min3(src_mm.size, dst_mm.size, 256ULL << 20);
>
> /* Map src to window 0 and dst to window 1. */
> - r = amdgpu_ttm_map_buffer(src->bo, src->mem, &src_mm,
> - 0, ring, tmz, &cur_size, &from);
> + r = amdgpu_ttm_map_buffer(adev, src->bo, src->mem, &src_mm,
> + 0, tmz, &cur_size, &from);
> if (r)
> goto error;
>
> - r = amdgpu_ttm_map_buffer(dst->bo, dst->mem, &dst_mm,
> - 1, ring, tmz, &cur_size, &to);
> + r = amdgpu_ttm_map_buffer(adev, dst->bo, dst->mem, &dst_mm,
> + 1, tmz, &cur_size, &to);
> if (r)
> goto error;
>
> @@ -345,7 +346,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> write_compress_disable));
> }
>
> - r = amdgpu_copy_buffer(ring, from, to, cur_size, resv,
> + r = amdgpu_copy_buffer(adev, from, to, cur_size, resv,
> &next, true, copy_flags);
> if (r)
> goto error;
> @@ -2232,19 +2233,21 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
> DMA_RESV_USAGE_BOOKKEEP);
> }
>
> -int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
> +int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
> uint64_t dst_offset, uint32_t byte_count,
> struct dma_resv *resv,
> struct dma_fence **fence,
> bool vm_needs_flush, uint32_t copy_flags)
> {
> - struct amdgpu_device *adev = ring->adev;
> unsigned int num_loops, num_dw;
> + struct amdgpu_ring *ring;
> struct amdgpu_job *job;
> uint32_t max_bytes;
> unsigned int i;
> int r;
>
> + ring = adev->mman.buffer_funcs_ring;
> +
> if (!ring->sched.ready) {
> dev_err(adev->dev,
> "Trying to move memory with ring turned off.\n");
> @@ -2284,15 +2287,15 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
> return r;
> }
>
> -static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data,
> +static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data,
> uint64_t dst_addr, uint32_t byte_count,
> struct dma_resv *resv,
> struct dma_fence **fence,
> bool vm_needs_flush, bool delayed,
> u64 k_job_id)
> {
> - struct amdgpu_device *adev = ring->adev;
> unsigned int num_loops, num_dw;
> + struct amdgpu_ring *ring;
> struct amdgpu_job *job;
> uint32_t max_bytes;
> unsigned int i;
> @@ -2316,6 +2319,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data,
> byte_count -= cur_size;
> }
>
> + ring = adev->mman.buffer_funcs_ring;
> amdgpu_ring_pad_ib(ring, &job->ibs[0]);
> WARN_ON(job->ibs[0].length_dw > num_dw);
> *fence = amdgpu_job_submit(job);
> @@ -2338,7 +2342,6 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
> struct dma_fence **fence)
> {
> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> - struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> struct amdgpu_res_cursor cursor;
> u64 addr;
> int r = 0;
> @@ -2366,12 +2369,12 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
> /* Never clear more than 256MiB at once to avoid timeouts */
> size = min(cursor.size, 256ULL << 20);
>
> - r = amdgpu_ttm_map_buffer(&bo->tbo, bo->tbo.resource, &cursor,
> - 1, ring, false, &size, &addr);
> + r = amdgpu_ttm_map_buffer(adev, &bo->tbo, bo->tbo.resource, &cursor,
> + 1, false, &size, &addr);
> if (r)
> goto err;
>
> - r = amdgpu_ttm_fill_mem(ring, 0, addr, size, resv,
> + r = amdgpu_ttm_fill_mem(adev, 0, addr, size, resv,
> &next, true, true,
> AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
> if (r)
> @@ -2396,7 +2399,6 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
> u64 k_job_id)
> {
> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> - struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> struct dma_fence *fence = NULL;
> struct amdgpu_res_cursor dst;
> int r;
> @@ -2417,12 +2419,12 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
> /* Never fill more than 256MiB at once to avoid timeouts */
> cur_size = min(dst.size, 256ULL << 20);
>
> - r = amdgpu_ttm_map_buffer(&bo->tbo, bo->tbo.resource, &dst,
> - 1, ring, false, &cur_size, &to);
> + r = amdgpu_ttm_map_buffer(adev, &bo->tbo, bo->tbo.resource, &dst,
> + 1, false, &cur_size, &to);
> if (r)
> goto error;
>
> - r = amdgpu_ttm_fill_mem(ring, src_data, to, cur_size, resv,
> + r = amdgpu_ttm_fill_mem(adev, src_data, to, cur_size, resv,
> &next, true, delayed, k_job_id);
> if (r)
> goto error;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index 50e40380fe95..76e00a6510c6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -163,7 +163,7 @@ int amdgpu_ttm_init(struct amdgpu_device *adev);
> void amdgpu_ttm_fini(struct amdgpu_device *adev);
> void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
> bool enable);
> -int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
> +int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
> uint64_t dst_offset, uint32_t byte_count,
> struct dma_resv *resv,
> struct dma_fence **fence,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index 378af0b2aaa9..0ad44acb08f2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -152,7 +152,7 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
> goto out_unlock;
> }
>
> - r = amdgpu_copy_buffer(ring, gart_s, gart_d, size * PAGE_SIZE,
> + r = amdgpu_copy_buffer(adev, gart_s, gart_d, size * PAGE_SIZE,
> NULL, &next, true, 0);
> if (r) {
> dev_err(adev->dev, "fail %d to copy memory\n", r);
* Re: [PATCH v3 06/28] drm/amdgpu: add amdgpu_ttm_job_submit helper
2025-11-21 10:12 ` [PATCH v3 06/28] drm/amdgpu: add amdgpu_ttm_job_submit helper Pierre-Eric Pelloux-Prayer
@ 2025-11-21 13:00 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 13:00 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter, Sumit Semwal
Cc: amd-gfx, dri-devel, linux-kernel, linux-media, linaro-mm-sig
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> Deduplicate the IB padding code and will also be used
> later to check locking.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 34 ++++++++++++-------------
> 1 file changed, 16 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 17e1892c44a2..be1232b2d55e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -162,6 +162,18 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
> *placement = abo->placement;
> }
>
> +static struct dma_fence *
> +amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_job *job, u32 num_dw)
> +{
> + struct amdgpu_ring *ring;
> +
> + ring = adev->mman.buffer_funcs_ring;
> + amdgpu_ring_pad_ib(ring, &job->ibs[0]);
> + WARN_ON(job->ibs[0].length_dw > num_dw);
> +
> + return amdgpu_job_submit(job);
> +}
> +
> /**
> * amdgpu_ttm_map_buffer - Map memory into the GART windows
> * @adev: the device being used
> @@ -185,7 +197,6 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
> {
> unsigned int offset, num_pages, num_dw, num_bytes;
> uint64_t src_addr, dst_addr;
> - struct amdgpu_ring *ring;
> struct amdgpu_job *job;
> void *cpu_addr;
> uint64_t flags;
> @@ -240,10 +251,6 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
> amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr,
> dst_addr, num_bytes, 0);
>
> - ring = adev->mman.buffer_funcs_ring;
> - amdgpu_ring_pad_ib(ring, &job->ibs[0]);
> - WARN_ON(job->ibs[0].length_dw > num_dw);
> -
> flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, mem);
> if (tmz)
> flags |= AMDGPU_PTE_TMZ;
> @@ -261,7 +268,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
> amdgpu_gart_map_vram_range(adev, pa, 0, num_pages, flags, cpu_addr);
> }
>
> - dma_fence_put(amdgpu_job_submit(job));
> + dma_fence_put(amdgpu_ttm_job_submit(adev, job, num_dw));
> return 0;
> }
>
> @@ -1497,10 +1504,7 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
> amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr, dst_addr,
> PAGE_SIZE, 0);
>
> - amdgpu_ring_pad_ib(adev->mman.buffer_funcs_ring, &job->ibs[0]);
> - WARN_ON(job->ibs[0].length_dw > num_dw);
> -
> - fence = amdgpu_job_submit(job);
> + fence = amdgpu_ttm_job_submit(adev, job, num_dw);
>
> if (!dma_fence_wait_timeout(fence, false, adev->sdma_timeout))
> r = -ETIMEDOUT;
> @@ -2285,11 +2289,9 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
> byte_count -= cur_size_in_bytes;
> }
>
> - amdgpu_ring_pad_ib(ring, &job->ibs[0]);
> - WARN_ON(job->ibs[0].length_dw > num_dw);
> - *fence = amdgpu_job_submit(job);
> if (r)
> goto error_free;
> + *fence = amdgpu_ttm_job_submit(adev, job, num_dw);
>
> return r;
>
> @@ -2307,7 +2309,6 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data,
> u64 k_job_id)
> {
> unsigned int num_loops, num_dw;
> - struct amdgpu_ring *ring;
> struct amdgpu_job *job;
> uint32_t max_bytes;
> unsigned int i;
> @@ -2331,10 +2332,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data,
> byte_count -= cur_size;
> }
>
> - ring = adev->mman.buffer_funcs_ring;
> - amdgpu_ring_pad_ib(ring, &job->ibs[0]);
> - WARN_ON(job->ibs[0].length_dw > num_dw);
> - *fence = amdgpu_job_submit(job);
> + *fence = amdgpu_ttm_job_submit(adev, job, num_dw);
> return 0;
> }
>
* Re: [PATCH v3 07/28] drm/amdgpu: fix error handling in amdgpu_copy_buffer
2025-11-21 10:12 ` [PATCH v3 07/28] drm/amdgpu: fix error handling in amdgpu_copy_buffer Pierre-Eric Pelloux-Prayer
@ 2025-11-21 13:04 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 13:04 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> If amdgpu_job_alloc_with_ib fails, amdgpu_ttm_prepare_job should
> clear the pointer to NULL, this way the caller can call
> amdgpu_job_free on all failures without risking a double free.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index be1232b2d55e..353682c0e8f0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -2233,8 +2233,10 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
> r = amdgpu_job_alloc_with_ib(adev, entity,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> num_dw * 4, pool, job, k_job_id);
> - if (r)
> + if (r) {
> + *job = NULL;
Mhm, that is something amdgpu_job_alloc_with_ib() should probably be doing instead.
Apart from that patch looks good to me.
Regards,
Christian.
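The convention being discussed — an allocator that NULLs the caller's output pointer on failure, so every error path can unconditionally call the free routine — can be sketched in isolation. The `job_alloc`/`job_free` helpers below are hypothetical stand-ins, not the amdgpu functions:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct job {
	int id;
};

/* On failure, clear the caller's pointer rather than leaving whatever
 * stale value was there; the caller can then free unconditionally
 * without risking a double free. */
static int job_alloc(struct job **job, int simulate_oom)
{
	if (simulate_oom) {
		*job = NULL;
		return -ENOMEM;
	}
	*job = calloc(1, sizeof(**job));
	return *job ? 0 : -ENOMEM;
}

static void job_free(struct job *job)
{
	free(job);	/* free(NULL) is a no-op, so this is always safe */
}
```

Whether the clearing lives in the allocator (as Christian suggests for `amdgpu_job_alloc_with_ib()`) or in the caller (as this patch does) is a style choice; putting it in the allocator fixes every caller at once.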
> return r;
> + }
>
> if (vm_needs_flush) {
> (*job)->vm_pd_addr = amdgpu_gmc_pd_addr(adev->gmc.pdb0_bo ?
> @@ -2277,7 +2279,7 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
> resv, vm_needs_flush, &job, false,
> AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER);
> if (r)
> - return r;
> + goto error_free;
>
> for (i = 0; i < num_loops; i++) {
> uint32_t cur_size_in_bytes = min(byte_count, max_bytes);
> @@ -2289,11 +2291,9 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
> byte_count -= cur_size_in_bytes;
> }
>
> - if (r)
> - goto error_free;
> *fence = amdgpu_ttm_job_submit(adev, job, num_dw);
>
> - return r;
> + return 0;
>
> error_free:
> amdgpu_job_free(job);
* Re: [PATCH v3 08/28] drm/amdgpu: pass the entity to use to amdgpu_ttm_map_buffer
2025-11-21 10:12 ` [PATCH v3 08/28] drm/amdgpu: pass the entity to use to amdgpu_ttm_map_buffer Pierre-Eric Pelloux-Prayer
@ 2025-11-21 13:05 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 13:05 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> This way the caller can select the one it wants to use.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 55 ++++++++++++++++---------
> 1 file changed, 35 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 353682c0e8f0..3d850893b97f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -177,6 +177,7 @@ amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_job *job, u32 nu
> /**
> * amdgpu_ttm_map_buffer - Map memory into the GART windows
> * @adev: the device being used
> + * @entity: entity to run the window setup job
> * @bo: buffer object to map
> * @mem: memory object to map
> * @mm_cur: range to map
> @@ -189,6 +190,7 @@ amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_job *job, u32 nu
> * the physical address for local memory.
> */
> static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
> + struct amdgpu_ttm_buffer_entity *entity,
> struct ttm_buffer_object *bo,
> struct ttm_resource *mem,
> struct amdgpu_res_cursor *mm_cur,
> @@ -235,7 +237,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
> num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
> num_bytes = num_pages * 8 * AMDGPU_GPU_PAGES_IN_CPU_PAGE;
>
> - r = amdgpu_job_alloc_with_ib(adev, &adev->mman.default_entity.base,
> + r = amdgpu_job_alloc_with_ib(adev, &entity->base,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> num_dw * 4 + num_bytes,
> AMDGPU_IB_POOL_DELAYED, &job,
> @@ -275,6 +277,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
> /**
> * amdgpu_ttm_copy_mem_to_mem - Helper function for copy
> * @adev: amdgpu device
> + * @entity: entity to run the jobs
> * @src: buffer/address where to read from
> * @dst: buffer/address where to write to
> * @size: number of bytes to copy
> @@ -289,6 +292,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
> */
> __attribute__((nonnull))
> static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> + struct amdgpu_ttm_buffer_entity *entity,
> const struct amdgpu_copy_mem *src,
> const struct amdgpu_copy_mem *dst,
> uint64_t size, bool tmz,
> @@ -320,12 +324,14 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> cur_size = min3(src_mm.size, dst_mm.size, 256ULL << 20);
>
> /* Map src to window 0 and dst to window 1. */
> - r = amdgpu_ttm_map_buffer(adev, src->bo, src->mem, &src_mm,
> + r = amdgpu_ttm_map_buffer(adev, entity,
> + src->bo, src->mem, &src_mm,
> 0, tmz, &cur_size, &from);
> if (r)
> goto error;
>
> - r = amdgpu_ttm_map_buffer(adev, dst->bo, dst->mem, &dst_mm,
> + r = amdgpu_ttm_map_buffer(adev, entity,
> + dst->bo, dst->mem, &dst_mm,
> 1, tmz, &cur_size, &to);
> if (r)
> goto error;
> @@ -394,7 +400,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
> src.offset = 0;
> dst.offset = 0;
>
> - r = amdgpu_ttm_copy_mem_to_mem(adev, &src, &dst,
> + r = amdgpu_ttm_copy_mem_to_mem(adev,
> + &adev->mman.move_entity,
> + &src, &dst,
> new_mem->size,
> amdgpu_bo_encrypted(abo),
> bo->base.resv, &fence);
> @@ -2220,17 +2228,16 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> }
>
> static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
> + struct amdgpu_ttm_buffer_entity *entity,
> unsigned int num_dw,
> struct dma_resv *resv,
> bool vm_needs_flush,
> struct amdgpu_job **job,
> - bool delayed, u64 k_job_id)
> + u64 k_job_id)
> {
> enum amdgpu_ib_pool_type pool = AMDGPU_IB_POOL_DELAYED;
> int r;
> - struct drm_sched_entity *entity = delayed ? &adev->mman.clear_entity.base :
> - &adev->mman.move_entity.base;
> - r = amdgpu_job_alloc_with_ib(adev, entity,
> + r = amdgpu_job_alloc_with_ib(adev, &entity->base,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> num_dw * 4, pool, job, k_job_id);
> if (r) {
> @@ -2275,8 +2282,8 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
> max_bytes = adev->mman.buffer_funcs->copy_max_bytes;
> num_loops = DIV_ROUND_UP(byte_count, max_bytes);
> num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8);
> - r = amdgpu_ttm_prepare_job(adev, num_dw,
> - resv, vm_needs_flush, &job, false,
> + r = amdgpu_ttm_prepare_job(adev, &adev->mman.move_entity, num_dw,
> + resv, vm_needs_flush, &job,
> AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER);
> if (r)
> goto error_free;
> @@ -2301,11 +2308,13 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
> return r;
> }
>
> -static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data,
> +static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
> + struct amdgpu_ttm_buffer_entity *entity,
> + uint32_t src_data,
> uint64_t dst_addr, uint32_t byte_count,
> struct dma_resv *resv,
> struct dma_fence **fence,
> - bool vm_needs_flush, bool delayed,
> + bool vm_needs_flush,
> u64 k_job_id)
> {
> unsigned int num_loops, num_dw;
> @@ -2317,8 +2326,8 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data,
> max_bytes = adev->mman.buffer_funcs->fill_max_bytes;
> num_loops = DIV_ROUND_UP_ULL(byte_count, max_bytes);
> num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->fill_num_dw, 8);
> - r = amdgpu_ttm_prepare_job(adev, num_dw, resv, vm_needs_flush,
> - &job, delayed, k_job_id);
> + r = amdgpu_ttm_prepare_job(adev, entity, num_dw, resv,
> + vm_needs_flush, &job, k_job_id);
> if (r)
> return r;
>
> @@ -2379,13 +2388,14 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
> /* Never clear more than 256MiB at once to avoid timeouts */
> size = min(cursor.size, 256ULL << 20);
>
> - r = amdgpu_ttm_map_buffer(adev, &bo->tbo, bo->tbo.resource, &cursor,
> + r = amdgpu_ttm_map_buffer(adev, &adev->mman.clear_entity,
> + &bo->tbo, bo->tbo.resource, &cursor,
> 1, false, &size, &addr);
> if (r)
> goto err;
>
> - r = amdgpu_ttm_fill_mem(adev, 0, addr, size, resv,
> - &next, true, true,
> + r = amdgpu_ttm_fill_mem(adev, &adev->mman.clear_entity, 0, addr, size, resv,
> + &next, true,
> AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
> if (r)
> goto err;
> @@ -2409,10 +2419,14 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
> u64 k_job_id)
> {
> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> + struct amdgpu_ttm_buffer_entity *entity;
> struct dma_fence *fence = NULL;
> struct amdgpu_res_cursor dst;
> int r;
>
> + entity = delayed ? &adev->mman.clear_entity :
> + &adev->mman.move_entity;
> +
> if (!adev->mman.buffer_funcs_enabled) {
> dev_err(adev->dev,
> "Trying to clear memory with ring turned off.\n");
> @@ -2429,13 +2443,14 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
> /* Never fill more than 256MiB at once to avoid timeouts */
> cur_size = min(dst.size, 256ULL << 20);
>
> - r = amdgpu_ttm_map_buffer(adev, &bo->tbo, bo->tbo.resource, &dst,
> + r = amdgpu_ttm_map_buffer(adev, &adev->mman.default_entity,
> + &bo->tbo, bo->tbo.resource, &dst,
> 1, false, &cur_size, &to);
> if (r)
> goto error;
>
> - r = amdgpu_ttm_fill_mem(adev, src_data, to, cur_size, resv,
> - &next, true, delayed, k_job_id);
> + r = amdgpu_ttm_fill_mem(adev, entity, src_data, to, cur_size, resv,
> + &next, true, k_job_id);
> if (r)
> goto error;
>
* Re: [PATCH v3 09/28] drm/amdgpu: pass the entity to use to ttm public functions
2025-11-21 10:12 ` [PATCH v3 09/28] drm/amdgpu: pass the entity to use to ttm public functions Pierre-Eric Pelloux-Prayer
@ 2025-11-21 13:23 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 13:23 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter, Felix Kuehling, Sumit Semwal
Cc: amd-gfx, dri-devel, linux-kernel, linux-media, linaro-mm-sig
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> This way the caller can select the one it wants to use.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
I'm wondering if it wouldn't make sense to put a pointer to adev into each amdgpu_ttm_buffer_entity.
But that is maybe something for another patch. For now:
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +--
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 34 +++++++++----------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 16 +++++----
> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 +-
> 5 files changed, 32 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> index 3636b757c974..a050167e76a4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> @@ -37,7 +37,8 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size,
>
> stime = ktime_get();
> for (i = 0; i < n; i++) {
> - r = amdgpu_copy_buffer(adev, saddr, daddr, size, NULL, &fence,
> + r = amdgpu_copy_buffer(adev, &adev->mman.default_entity,
> + saddr, daddr, size, NULL, &fence,
> false, 0);
> if (r)
> goto exit_do_move;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 926a3f09a776..858eb9fa061b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -1322,8 +1322,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
> if (r)
> goto out;
>
> - r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true,
> - AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> + r = amdgpu_fill_buffer(&adev->mman.clear_entity, abo, 0, &bo->base._resv,
> + &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> if (WARN_ON(r))
> goto out;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 3d850893b97f..1d3afad885da 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -359,7 +359,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> write_compress_disable));
> }
>
> - r = amdgpu_copy_buffer(adev, from, to, cur_size, resv,
> + r = amdgpu_copy_buffer(adev, entity, from, to, cur_size, resv,
> &next, true, copy_flags);
> if (r)
> goto error;
> @@ -414,8 +414,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
> (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
> struct dma_fence *wipe_fence = NULL;
>
> - r = amdgpu_fill_buffer(abo, 0, NULL, &wipe_fence,
> - false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
> + r = amdgpu_fill_buffer(&adev->mman.move_entity,
> + abo, 0, NULL, &wipe_fence,
> + AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
> if (r) {
> goto error;
> } else if (wipe_fence) {
> @@ -2258,7 +2259,9 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
> DMA_RESV_USAGE_BOOKKEEP);
> }
>
> -int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
> +int amdgpu_copy_buffer(struct amdgpu_device *adev,
> + struct amdgpu_ttm_buffer_entity *entity,
> + uint64_t src_offset,
> uint64_t dst_offset, uint32_t byte_count,
> struct dma_resv *resv,
> struct dma_fence **fence,
> @@ -2282,7 +2285,7 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
> max_bytes = adev->mman.buffer_funcs->copy_max_bytes;
> num_loops = DIV_ROUND_UP(byte_count, max_bytes);
> num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8);
> - r = amdgpu_ttm_prepare_job(adev, &adev->mman.move_entity, num_dw,
> + r = amdgpu_ttm_prepare_job(adev, entity, num_dw,
> resv, vm_needs_flush, &job,
> AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER);
> if (r)
> @@ -2411,22 +2414,18 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
> return r;
> }
>
> -int amdgpu_fill_buffer(struct amdgpu_bo *bo,
> - uint32_t src_data,
> - struct dma_resv *resv,
> - struct dma_fence **f,
> - bool delayed,
> - u64 k_job_id)
> +int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
> + struct amdgpu_bo *bo,
> + uint32_t src_data,
> + struct dma_resv *resv,
> + struct dma_fence **f,
> + u64 k_job_id)
> {
> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> - struct amdgpu_ttm_buffer_entity *entity;
> struct dma_fence *fence = NULL;
> struct amdgpu_res_cursor dst;
> int r;
>
> - entity = delayed ? &adev->mman.clear_entity :
> - &adev->mman.move_entity;
> -
> if (!adev->mman.buffer_funcs_enabled) {
> dev_err(adev->dev,
> "Trying to clear memory with ring turned off.\n");
> @@ -2443,13 +2442,14 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
> /* Never fill more than 256MiB at once to avoid timeouts */
> cur_size = min(dst.size, 256ULL << 20);
>
> - r = amdgpu_ttm_map_buffer(adev, &adev->mman.default_entity,
> + r = amdgpu_ttm_map_buffer(adev, entity,
> &bo->tbo, bo->tbo.resource, &dst,
> 1, false, &cur_size, &to);
> if (r)
> goto error;
>
> - r = amdgpu_ttm_fill_mem(adev, entity, src_data, to, cur_size, resv,
> + r = amdgpu_ttm_fill_mem(adev, entity,
> + src_data, to, cur_size, resv,
> &next, true, k_job_id);
> if (r)
> goto error;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index 41bbc25680a2..9288599c9c46 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -167,7 +167,9 @@ int amdgpu_ttm_init(struct amdgpu_device *adev);
> void amdgpu_ttm_fini(struct amdgpu_device *adev);
> void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
> bool enable);
> -int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
> +int amdgpu_copy_buffer(struct amdgpu_device *adev,
> + struct amdgpu_ttm_buffer_entity *entity,
> + uint64_t src_offset,
> uint64_t dst_offset, uint32_t byte_count,
> struct dma_resv *resv,
> struct dma_fence **fence,
> @@ -175,12 +177,12 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
> int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
> struct dma_resv *resv,
> struct dma_fence **fence);
> -int amdgpu_fill_buffer(struct amdgpu_bo *bo,
> - uint32_t src_data,
> - struct dma_resv *resv,
> - struct dma_fence **fence,
> - bool delayed,
> - u64 k_job_id);
> +int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
> + struct amdgpu_bo *bo,
> + uint32_t src_data,
> + struct dma_resv *resv,
> + struct dma_fence **f,
> + u64 k_job_id);
>
> int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
> void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index ade1d4068d29..9c76f1ba0e55 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -157,7 +157,8 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
> goto out_unlock;
> }
>
> - r = amdgpu_copy_buffer(adev, gart_s, gart_d, size * PAGE_SIZE,
> + r = amdgpu_copy_buffer(adev, entity,
> + gart_s, gart_d, size * PAGE_SIZE,
> NULL, &next, true, 0);
> if (r) {
> dev_err(adev->dev, "fail %d to copy memory\n", r);
* Re: [PATCH v3 10/28] drm/amdgpu: add amdgpu_device argument to ttm functions that need it
2025-11-21 10:12 ` [PATCH v3 10/28] drm/amdgpu: add amdgpu_device argument to ttm functions that need it Pierre-Eric Pelloux-Prayer
@ 2025-11-21 13:24 ` Christian König
2025-11-21 16:09 ` Pierre-Eric Pelloux-Prayer
0 siblings, 1 reply; 60+ messages in thread
From: Christian König @ 2025-11-21 13:24 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> Instead of getting it through amdgpu_ttm_adev(bo->tbo.bdev).
Why should that be a good idea?
Regards,
Christian.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 +++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 11 ++++++-----
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 6 ++++--
> 3 files changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 858eb9fa061b..2ee48f76483d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -725,7 +725,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
> bo->tbo.resource->mem_type == TTM_PL_VRAM) {
> struct dma_fence *fence;
>
> - r = amdgpu_ttm_clear_buffer(bo, bo->tbo.base.resv, &fence);
> + r = amdgpu_ttm_clear_buffer(adev, bo, bo->tbo.base.resv, &fence);
> if (unlikely(r))
> goto fail_unreserve;
>
> @@ -1322,7 +1322,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
> if (r)
> goto out;
>
> - r = amdgpu_fill_buffer(&adev->mman.clear_entity, abo, 0, &bo->base._resv,
> + r = amdgpu_fill_buffer(adev,
> + &adev->mman.clear_entity, abo, 0, &bo->base._resv,
> &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> if (WARN_ON(r))
> goto out;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 1d3afad885da..57dff2df433b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -414,7 +414,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
> (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
> struct dma_fence *wipe_fence = NULL;
>
> - r = amdgpu_fill_buffer(&adev->mman.move_entity,
> + r = amdgpu_fill_buffer(adev, &adev->mman.move_entity,
> abo, 0, NULL, &wipe_fence,
> AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
> if (r) {
> @@ -2350,6 +2350,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
>
> /**
> * amdgpu_ttm_clear_buffer - clear memory buffers
> + * @adev: amdgpu device object
> * @bo: amdgpu buffer object
> * @resv: reservation object
> * @fence: dma_fence associated with the operation
> @@ -2359,11 +2360,11 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
> * Returns:
> * 0 for success or a negative error code on failure.
> */
> -int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
> +int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
> + struct amdgpu_bo *bo,
> struct dma_resv *resv,
> struct dma_fence **fence)
> {
> - struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> struct amdgpu_res_cursor cursor;
> u64 addr;
> int r = 0;
> @@ -2414,14 +2415,14 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
> return r;
> }
>
> -int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
> +int amdgpu_fill_buffer(struct amdgpu_device *adev,
> + struct amdgpu_ttm_buffer_entity *entity,
> struct amdgpu_bo *bo,
> uint32_t src_data,
> struct dma_resv *resv,
> struct dma_fence **f,
> u64 k_job_id)
> {
> - struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> struct dma_fence *fence = NULL;
> struct amdgpu_res_cursor dst;
> int r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index 9288599c9c46..d0f55a7edd30 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -174,10 +174,12 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev,
> struct dma_resv *resv,
> struct dma_fence **fence,
> bool vm_needs_flush, uint32_t copy_flags);
> -int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
> +int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
> + struct amdgpu_bo *bo,
> struct dma_resv *resv,
> struct dma_fence **fence);
> -int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
> +int amdgpu_fill_buffer(struct amdgpu_device *adev,
> + struct amdgpu_ttm_buffer_entity *entity,
> struct amdgpu_bo *bo,
> uint32_t src_data,
> struct dma_resv *resv,
* Re: [PATCH v3 11/28] drm/amdgpu: statically assign gart windows to ttm entities
2025-11-21 10:12 ` [PATCH v3 11/28] drm/amdgpu: statically assign gart windows to ttm entities Pierre-Eric Pelloux-Prayer
@ 2025-11-21 13:34 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 13:34 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter, Felix Kuehling
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> If multiple entities share the same window we must make sure
> that jobs using them are executed sequentially.
>
> This commit gives separate windows to each entity, so jobs
> from multiple entities could execute in parallel if needed.
> (for now they all use the first sdma engine, so it makes no
> difference yet).
> The entity stores the GART window offsets to centralize the
> "window id" to "window offset" mapping in a single place.
>
> default_entity doesn't get any windows reserved since there is
> no use for them.
>
> ---
> v3:
> - renamed gart_window_lock -> lock (Christian)
> - added amdgpu_ttm_buffer_entity_init (Christian)
> - fixed gart_addr in svm_migrate_gart_map (Felix)
> - renamed gart_window_idX -> gart_window_offs[]
> - added amdgpu_compute_gart_address
> ---
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 6 +--
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 56 ++++++++++++++++++------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 14 +++++-
> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 9 ++--
> 4 files changed, 61 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 94e07b9ec7b4..0d2784fe0be3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -686,7 +686,7 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
> * translation. Avoid this by doing the invalidation from the SDMA
> * itself at least for GART.
> */
> - mutex_lock(&adev->mman.gtt_window_lock);
> + mutex_lock(&adev->mman.default_entity.lock);
> r = amdgpu_job_alloc_with_ib(ring->adev, &adev->mman.default_entity.base,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> 16 * 4, AMDGPU_IB_POOL_IMMEDIATE,
> @@ -699,7 +699,7 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
> job->ibs->ptr[job->ibs->length_dw++] = ring->funcs->nop;
> amdgpu_ring_pad_ib(ring, &job->ibs[0]);
> fence = amdgpu_job_submit(job);
> - mutex_unlock(&adev->mman.gtt_window_lock);
> + mutex_unlock(&adev->mman.default_entity.lock);
>
> dma_fence_wait(fence, false);
> dma_fence_put(fence);
> @@ -707,7 +707,7 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
> return;
>
> error_alloc:
> - mutex_unlock(&adev->mman.gtt_window_lock);
> + mutex_unlock(&adev->mman.default_entity.lock);
> dev_err(adev->dev, "Error flushing GPU TLB using the SDMA (%d)!\n", r);
> }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 57dff2df433b..1371a40d4687 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -229,9 +229,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
>
> *size = min(*size, (uint64_t)num_pages * PAGE_SIZE - offset);
>
> - *addr = adev->gmc.gart_start;
> - *addr += (u64)window * AMDGPU_GTT_MAX_TRANSFER_SIZE *
> - AMDGPU_GPU_PAGE_SIZE;
> + *addr = amdgpu_compute_gart_address(&adev->gmc, entity, window);
> *addr += offset;
>
> num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
> @@ -249,7 +247,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
> src_addr += job->ibs[0].gpu_addr;
>
> dst_addr = amdgpu_bo_gpu_offset(adev->gart.bo);
> - dst_addr += window * AMDGPU_GTT_MAX_TRANSFER_SIZE * 8;
> + dst_addr += (entity->gart_window_offs[window] >> AMDGPU_GPU_PAGE_SHIFT) * 8;
> amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr,
> dst_addr, num_bytes, 0);
>
> @@ -314,7 +312,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> amdgpu_res_first(src->mem, src->offset, size, &src_mm);
> amdgpu_res_first(dst->mem, dst->offset, size, &dst_mm);
>
> - mutex_lock(&adev->mman.gtt_window_lock);
> + mutex_lock(&entity->lock);
> while (src_mm.remaining) {
> uint64_t from, to, cur_size, tiling_flags;
> uint32_t num_type, data_format, max_com, write_compress_disable;
> @@ -371,7 +369,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> amdgpu_res_next(&dst_mm, cur_size);
> }
> error:
> - mutex_unlock(&adev->mman.gtt_window_lock);
> + mutex_unlock(&entity->lock);
> *f = fence;
> return r;
> }
> @@ -1876,6 +1874,27 @@ static void amdgpu_ttm_mmio_remap_bo_fini(struct amdgpu_device *adev)
> adev->rmmio_remap.bo = NULL;
> }
>
> +static int amdgpu_ttm_buffer_entity_init(struct amdgpu_ttm_buffer_entity *entity,
> + int starting_gart_window,
> + bool needs_src_gart_window,
> + bool needs_dst_gart_window)
> +{
> + mutex_init(&entity->lock);
> + if (needs_src_gart_window) {
> + entity->gart_window_offs[0] =
> + (u64)starting_gart_window * AMDGPU_GTT_MAX_TRANSFER_SIZE *
> + AMDGPU_GPU_PAGE_SIZE;
> + starting_gart_window++;
> + }
> + if (needs_dst_gart_window) {
> + entity->gart_window_offs[1] =
> + (u64)starting_gart_window * AMDGPU_GTT_MAX_TRANSFER_SIZE *
> + AMDGPU_GPU_PAGE_SIZE;
> + starting_gart_window++;
> + }
> + return starting_gart_window;
> +}
> +
> /*
> * amdgpu_ttm_init - Init the memory management (ttm) as well as various
> * gtt/vram related fields.
> @@ -1890,8 +1909,6 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
> uint64_t gtt_size;
> int r;
>
> - mutex_init(&adev->mman.gtt_window_lock);
> -
> dma_set_max_seg_size(adev->dev, UINT_MAX);
> /* No others user of address space so set it to 0 */
> r = ttm_device_init(&adev->mman.bdev, &amdgpu_bo_driver, adev->dev,
> @@ -2161,6 +2178,7 @@ void amdgpu_ttm_fini(struct amdgpu_device *adev)
> void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> {
> struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
> + u32 used_windows;
> uint64_t size;
> int r;
>
> @@ -2204,6 +2222,14 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> drm_sched_entity_destroy(&adev->mman.clear_entity.base);
> goto error_free_entity;
> }
> +
> + /* Statically assign GART windows to each entity. */
> + used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.default_entity,
> + 0, false, false);
> + used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.move_entity,
> + used_windows, true, true);
> + used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.clear_entity,
> + used_windows, false, true);
> } else {
> drm_sched_entity_destroy(&adev->mman.default_entity.base);
> drm_sched_entity_destroy(&adev->mman.clear_entity.base);
> @@ -2365,6 +2391,7 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
> struct dma_resv *resv,
> struct dma_fence **fence)
> {
> + struct amdgpu_ttm_buffer_entity *entity;
> struct amdgpu_res_cursor cursor;
> u64 addr;
> int r = 0;
> @@ -2375,11 +2402,12 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
> if (!fence)
> return -EINVAL;
>
> + entity = &adev->mman.clear_entity;
> *fence = dma_fence_get_stub();
>
> amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &cursor);
>
> - mutex_lock(&adev->mman.gtt_window_lock);
> + mutex_lock(&entity->lock);
> while (cursor.remaining) {
> struct dma_fence *next = NULL;
> u64 size;
> @@ -2392,13 +2420,13 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
> /* Never clear more than 256MiB at once to avoid timeouts */
> size = min(cursor.size, 256ULL << 20);
>
> - r = amdgpu_ttm_map_buffer(adev, &adev->mman.clear_entity,
> + r = amdgpu_ttm_map_buffer(adev, entity,
> &bo->tbo, bo->tbo.resource, &cursor,
> 1, false, &size, &addr);
> if (r)
> goto err;
>
> - r = amdgpu_ttm_fill_mem(adev, &adev->mman.clear_entity, 0, addr, size, resv,
> + r = amdgpu_ttm_fill_mem(adev, entity, 0, addr, size, resv,
> &next, true,
> AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
> if (r)
> @@ -2410,7 +2438,7 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
> amdgpu_res_next(&cursor, size);
> }
> err:
> - mutex_unlock(&adev->mman.gtt_window_lock);
> + mutex_unlock(&entity->lock);
>
> return r;
> }
> @@ -2435,7 +2463,7 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
>
> amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &dst);
>
> - mutex_lock(&adev->mman.gtt_window_lock);
> + mutex_lock(&entity->lock);
> while (dst.remaining) {
> struct dma_fence *next;
> uint64_t cur_size, to;
> @@ -2461,7 +2489,7 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
> amdgpu_res_next(&dst, cur_size);
> }
> error:
> - mutex_unlock(&adev->mman.gtt_window_lock);
> + mutex_unlock(&entity->lock);
> if (f)
> *f = dma_fence_get(fence);
> dma_fence_put(fence);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index d0f55a7edd30..a7eed678bd3f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -29,6 +29,7 @@
> #include <drm/ttm/ttm_placement.h>
> #include "amdgpu_vram_mgr.h"
> #include "amdgpu_hmm.h"
> +#include "amdgpu_gmc.h"
>
> #define AMDGPU_PL_GDS (TTM_PL_PRIV + 0)
> #define AMDGPU_PL_GWS (TTM_PL_PRIV + 1)
> @@ -39,7 +40,7 @@
> #define __AMDGPU_PL_NUM (TTM_PL_PRIV + 6)
>
> #define AMDGPU_GTT_MAX_TRANSFER_SIZE 512
> -#define AMDGPU_GTT_NUM_TRANSFER_WINDOWS 2
> +#define AMDGPU_GTT_NUM_TRANSFER_WINDOWS 3
>
> extern const struct attribute_group amdgpu_vram_mgr_attr_group;
> extern const struct attribute_group amdgpu_gtt_mgr_attr_group;
> @@ -54,6 +55,8 @@ struct amdgpu_gtt_mgr {
>
> struct amdgpu_ttm_buffer_entity {
> struct drm_sched_entity base;
> + struct mutex lock;
> + u32 gart_window_offs[2];
u64 here please. The theoretical size of the GART is way larger than 4GiB.
> };
>
> struct amdgpu_mman {
> @@ -69,7 +72,7 @@ struct amdgpu_mman {
>
> struct mutex gtt_window_lock;
>
> - struct amdgpu_ttm_buffer_entity default_entity;
> + struct amdgpu_ttm_buffer_entity default_entity; /* has no gart windows */
Comments above the field, not after. Best practice would be to use kerneldoc style.
E.g. something like:
/* @default_entity: for workarounds, has no gart windows... */
> struct amdgpu_ttm_buffer_entity clear_entity;
> struct amdgpu_ttm_buffer_entity move_entity;
>
> @@ -201,6 +204,13 @@ static inline int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo,
> }
> #endif
>
> +static inline u64 amdgpu_compute_gart_address(struct amdgpu_gmc *gmc,
> + struct amdgpu_ttm_buffer_entity *entity,
> + int index)
Kerneldoc would be nice to have for that function.
Regards,
Christian.
> +{
> + return gmc->gart_start + entity->gart_window_offs[index];
> +}
> +
> void amdgpu_ttm_tt_set_user_pages(struct ttm_tt *ttm, struct amdgpu_hmm_range *range);
> int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
> uint64_t *user_addr);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index 9c76f1ba0e55..0cc1d2b35026 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -59,8 +59,7 @@ svm_migrate_gart_map(struct amdgpu_ring *ring,
> void *cpu_addr;
> int r;
>
> - /* use gart window 0 */
> - *gart_addr = adev->gmc.gart_start;
> + *gart_addr = amdgpu_compute_gart_address(&adev->gmc, entity, 0);
>
> num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
> num_bytes = npages * 8;
> @@ -116,7 +115,7 @@ svm_migrate_gart_map(struct amdgpu_ring *ring,
> * multiple GTT_MAX_PAGES transfer, all sdma operations are serialized, wait for
> * the last sdma finish fence which is returned to check copy memory is done.
> *
> - * Context: Process context, takes and releases gtt_window_lock
> + * Context: Process context
> *
> * Return:
> * 0 - OK, otherwise error code
> @@ -138,7 +137,7 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
>
> entity = &adev->mman.move_entity;
>
> - mutex_lock(&adev->mman.gtt_window_lock);
> + mutex_lock(&entity->lock);
>
> while (npages) {
> size = min(GTT_MAX_PAGES, npages);
> @@ -175,7 +174,7 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
> }
>
> out_unlock:
> - mutex_unlock(&adev->mman.gtt_window_lock);
> + mutex_unlock(&entity->lock);
>
> return r;
> }
* Re: [PATCH v3 12/28] drm/amdgpu: remove AMDGPU_GTT_NUM_TRANSFER_WINDOWS
2025-11-21 10:12 ` [PATCH v3 12/28] drm/amdgpu: remove AMDGPU_GTT_NUM_TRANSFER_WINDOWS Pierre-Eric Pelloux-Prayer
@ 2025-11-21 13:50 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 13:50 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> ttm is going to use a variable number of windows so we need to get
> rid of the hardcoded value.
>
> Since amdgpu_gtt_mgr_init is called after
> amdgpu_ttm_set_buffer_funcs_status has run with enable=false, we
> still need to determine the reserved window count before doing
> the real initialisation, so a warning is added if the actual value
> doesn't match the reserved one.
Reading that I just realized that we have a chicken and egg problem here.
When initializing the driver we know the maximum number of SDMA engines we might see, but we don't know if all of them are working.
So we need to initialize the GART and GTT manager to bring up and test each SDMA engine before we can figure out how many GART windows we need.
*sigh* We probably need to revisit the idea of dynamically allocating GART windows.
It's most likely not a problem for current production HW, but it will most likely become one sooner or later.
Should we clean that up now or postpone till later?
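For discussion's sake, the dynamic-allocation idea could take roughly this shape — a plain userspace C sketch with hypothetical names (the real code would hang such a pool off struct amdgpu_mman and protect it with a lock), where the window count is only fixed up after each SDMA engine has been brought up and tested:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-device pool of GART transfer windows, tracked as a
 * bitmap instead of being hardcoded at GTT manager init time. */
struct gart_window_pool {
	uint32_t bitmap;      /* bit i set => window i is in use */
	unsigned num_windows; /* discovered after SDMA bring-up */
};

/* Returns a free window index, or -1 if the pool is exhausted. */
static int gart_window_alloc(struct gart_window_pool *pool)
{
	for (unsigned i = 0; i < pool->num_windows; i++) {
		if (!(pool->bitmap & (1u << i))) {
			pool->bitmap |= 1u << i;
			return (int)i;
		}
	}
	return -1;
}

static void gart_window_free(struct gart_window_pool *pool, int idx)
{
	pool->bitmap &= ~(1u << idx);
}
```

The point is that num_windows can be adjusted once the working SDMA instances are known, instead of being wired into amdgpu_gtt_mgr_init.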
Regards,
Christian.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 8 +++++---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 21 ++++++++++++++-------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 7 +++----
> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 6 ++++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h | 3 ++-
> drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 12 ++++--------
> 6 files changed, 32 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> index 895c1e4c6747..924151b6cfd3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> @@ -269,10 +269,12 @@ static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = {
> *
> * @adev: amdgpu_device pointer
> * @gtt_size: maximum size of GTT
> + * @reserved_windows: number of already reserved windows
> *
> * Allocate and initialize the GTT manager.
> */
> -int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size)
> +int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size,
> + u32 reserved_windows)
> {
> struct amdgpu_gtt_mgr *mgr = &adev->mman.gtt_mgr;
> struct ttm_resource_manager *man = &mgr->manager;
> @@ -283,8 +285,8 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size)
>
> ttm_resource_manager_init(man, &adev->mman.bdev, gtt_size);
>
> - start = AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
> - start += amdgpu_vce_required_gart_pages(adev);
> + start = AMDGPU_GTT_MAX_TRANSFER_SIZE * reserved_windows;
> + start += amdgpu_vce_required_gart_pages(adev, start);
> size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
> drm_mm_init(&mgr->mm, start, size);
> spin_lock_init(&mgr->lock);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 1371a40d4687..3a0511d1739f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1907,6 +1907,7 @@ static int amdgpu_ttm_buffer_entity_init(struct amdgpu_ttm_buffer_entity *entity
> int amdgpu_ttm_init(struct amdgpu_device *adev)
> {
> uint64_t gtt_size;
> + u32 reserved_windows;
> int r;
>
> dma_set_max_seg_size(adev->dev, UINT_MAX);
> @@ -1939,7 +1940,7 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
> }
>
> /* Change the size here instead of the init above so only lpfn is affected */
> - amdgpu_ttm_set_buffer_funcs_status(adev, false);
> + reserved_windows = amdgpu_ttm_set_buffer_funcs_status(adev, false);
> #ifdef CONFIG_64BIT
> #ifdef CONFIG_X86
> if (adev->gmc.xgmi.connected_to_cpu)
> @@ -2035,7 +2036,7 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
> }
>
> /* Initialize GTT memory pool */
> - r = amdgpu_gtt_mgr_init(adev, gtt_size);
> + r = amdgpu_gtt_mgr_init(adev, gtt_size, reserved_windows);
> if (r) {
> dev_err(adev->dev, "Failed initializing GTT heap.\n");
> return r;
> @@ -2174,17 +2175,21 @@ void amdgpu_ttm_fini(struct amdgpu_device *adev)
> *
> * Enable/disable use of buffer functions during suspend/resume. This should
> * only be called at bootup or when userspace isn't running.
> + *
> + * Returns: the number of reserved GART windows
> */
> -void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> +u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> {
> struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
> - u32 used_windows;
> + u32 used_windows, reserved_windows;
> uint64_t size;
> int r;
>
> + reserved_windows = 3;
> +
> if (!adev->mman.initialized || amdgpu_in_reset(adev) ||
> adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
> - return;
> + return reserved_windows;
>
> if (enable) {
> struct amdgpu_ring *ring;
> @@ -2199,7 +2204,7 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> dev_err(adev->dev,
> "Failed setting up TTM BO move entity (%d)\n",
> r);
> - return;
> + return 0;
> }
>
> r = drm_sched_entity_init(&adev->mman.clear_entity.base,
> @@ -2230,6 +2235,7 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> used_windows, true, true);
> used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.clear_entity,
> used_windows, false, true);
> + WARN_ON(used_windows != reserved_windows);
> } else {
> drm_sched_entity_destroy(&adev->mman.default_entity.base);
> drm_sched_entity_destroy(&adev->mman.clear_entity.base);
> @@ -2248,10 +2254,11 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> man->size = size;
> adev->mman.buffer_funcs_enabled = enable;
>
> - return;
> + return reserved_windows;
>
> error_free_entity:
> drm_sched_entity_destroy(&adev->mman.default_entity.base);
> + return 0;
> }
>
> static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index a7eed678bd3f..2a78cf8a3f9f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -40,7 +40,6 @@
> #define __AMDGPU_PL_NUM (TTM_PL_PRIV + 6)
>
> #define AMDGPU_GTT_MAX_TRANSFER_SIZE 512
> -#define AMDGPU_GTT_NUM_TRANSFER_WINDOWS 3
>
> extern const struct attribute_group amdgpu_vram_mgr_attr_group;
> extern const struct attribute_group amdgpu_gtt_mgr_attr_group;
> @@ -134,7 +133,7 @@ struct amdgpu_copy_mem {
> #define AMDGPU_COPY_FLAGS_GET(value, field) \
> (((__u32)(value) >> AMDGPU_COPY_FLAGS_##field##_SHIFT) & AMDGPU_COPY_FLAGS_##field##_MASK)
>
> -int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size);
> +int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size, u32 reserved_windows);
> void amdgpu_gtt_mgr_fini(struct amdgpu_device *adev);
> int amdgpu_preempt_mgr_init(struct amdgpu_device *adev);
> void amdgpu_preempt_mgr_fini(struct amdgpu_device *adev);
> @@ -168,8 +167,8 @@ bool amdgpu_res_cpu_visible(struct amdgpu_device *adev,
>
> int amdgpu_ttm_init(struct amdgpu_device *adev);
> void amdgpu_ttm_fini(struct amdgpu_device *adev);
> -void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
> - bool enable);
> +u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
> + bool enable);
> int amdgpu_copy_buffer(struct amdgpu_device *adev,
> struct amdgpu_ttm_buffer_entity *entity,
> uint64_t src_offset,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index a7d8f1ce6ac2..56308efce465 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -459,11 +459,13 @@ void amdgpu_vce_free_handles(struct amdgpu_device *adev, struct drm_file *filp)
> * For VCE1, see vce_v1_0_ensure_vcpu_bo_32bit_addr for details.
> * For VCE2+, this is not needed so return zero.
> */
> -u32 amdgpu_vce_required_gart_pages(struct amdgpu_device *adev)
> +u32 amdgpu_vce_required_gart_pages(struct amdgpu_device *adev, u64 gtt_transfer_end)
> {
> /* VCE IP block not added yet, so can't use amdgpu_ip_version */
> - if (adev->family == AMDGPU_FAMILY_SI)
> + if (adev->family == AMDGPU_FAMILY_SI) {
> + adev->vce.gart_page_start = gtt_transfer_end;
> return 512;
> + }
>
> return 0;
> }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> index 1c3464ce5037..d07302535d33 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> @@ -52,6 +52,7 @@ struct amdgpu_vce {
> uint32_t srbm_soft_reset;
> unsigned num_rings;
> uint32_t keyselect;
> + u64 gart_page_start;
> };
>
> int amdgpu_vce_early_init(struct amdgpu_device *adev);
> @@ -61,7 +62,7 @@ int amdgpu_vce_entity_init(struct amdgpu_device *adev, struct amdgpu_ring *ring)
> int amdgpu_vce_suspend(struct amdgpu_device *adev);
> int amdgpu_vce_resume(struct amdgpu_device *adev);
> void amdgpu_vce_free_handles(struct amdgpu_device *adev, struct drm_file *filp);
> -u32 amdgpu_vce_required_gart_pages(struct amdgpu_device *adev);
> +u32 amdgpu_vce_required_gart_pages(struct amdgpu_device *adev, u64 gtt_transfer_end);
> int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p, struct amdgpu_job *job,
> struct amdgpu_ib *ib);
> int amdgpu_vce_ring_parse_cs_vm(struct amdgpu_cs_parser *p,
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> index 9ae424618556..dd18fc45225d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> @@ -47,11 +47,6 @@
> #define VCE_V1_0_DATA_SIZE (7808 * (AMDGPU_MAX_VCE_HANDLES + 1))
> #define VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK 0x02
>
> -#define VCE_V1_0_GART_PAGE_START \
> - (AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS)
> -#define VCE_V1_0_GART_ADDR_START \
> - (VCE_V1_0_GART_PAGE_START * AMDGPU_GPU_PAGE_SIZE)
> -
> static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev);
> static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev);
>
> @@ -541,6 +536,7 @@ static int vce_v1_0_ensure_vcpu_bo_32bit_addr(struct amdgpu_device *adev)
> u64 num_pages = ALIGN(bo_size, AMDGPU_GPU_PAGE_SIZE) / AMDGPU_GPU_PAGE_SIZE;
> u64 pa = amdgpu_gmc_vram_pa(adev, adev->vce.vcpu_bo);
> u64 flags = AMDGPU_PTE_READABLE | AMDGPU_PTE_WRITEABLE | AMDGPU_PTE_VALID;
> + u64 vce_gart_addr_start = adev->vce.gart_page_start * AMDGPU_GPU_PAGE_SIZE;
>
> /*
> * Check if the VCPU BO already has a 32-bit address.
> @@ -550,12 +546,12 @@ static int vce_v1_0_ensure_vcpu_bo_32bit_addr(struct amdgpu_device *adev)
> return 0;
>
> /* Check if we can map the VCPU BO in GART to a 32-bit address. */
> - if (adev->gmc.gart_start + VCE_V1_0_GART_ADDR_START > max_vcpu_bo_addr)
> + if (adev->gmc.gart_start + vce_gart_addr_start > max_vcpu_bo_addr)
> return -EINVAL;
>
> - amdgpu_gart_map_vram_range(adev, pa, VCE_V1_0_GART_PAGE_START,
> + amdgpu_gart_map_vram_range(adev, pa, adev->vce.gart_page_start,
> num_pages, flags, adev->gart.ptr);
> - adev->vce.gpu_addr = adev->gmc.gart_start + VCE_V1_0_GART_ADDR_START;
> + adev->vce.gpu_addr = adev->gmc.gart_start + vce_gart_addr_start;
> if (adev->vce.gpu_addr > max_vcpu_bo_addr)
> return -EINVAL;
>
* Re: [PATCH v3 13/28] drm/amdgpu: add missing lock when using ttm entities
2025-11-21 10:12 ` [PATCH v3 13/28] drm/amdgpu: add missing lock when using ttm entities Pierre-Eric Pelloux-Prayer
@ 2025-11-21 13:53 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 13:53 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> Taking the entity lock is required to guarantee the ordering of
> execution. The next commit will add a check that the lock is
> held.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
The benchmarking is kind of irrelevant, but adding the other lock should actually be the first patch in the series since it is a bug fix (it needs to grab the gart window lock at this place of course).
Then the patch needs a CC stable and a Fixes tag pointing to the commit that introduced amdgpu_ttm_access_memory_sdma().
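For reference, the conventional way to generate such a trailer from the tree (HEAD here is a placeholder; it would be replaced with the commit that introduced amdgpu_ttm_access_memory_sdma()):

```shell
# Produce a "Fixes:" trailer in the canonical 12-character form.
git log -1 --abbrev=12 --format='Fixes: %h ("%s")' HEAD
```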
With that done: Reviewed-by: Christian König <christian.koenig@amd.com>
Regards,
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 2 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> index a050167e76a4..832d9ae101f0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> @@ -35,6 +35,7 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size,
> struct dma_fence *fence;
> int i, r;
>
> + mutex_lock(&adev->mman.default_entity.lock);
> stime = ktime_get();
> for (i = 0; i < n; i++) {
> r = amdgpu_copy_buffer(adev, &adev->mman.default_entity,
> @@ -47,6 +48,7 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size,
> if (r)
> goto exit_do_move;
> }
> + mutex_unlock(&adev->mman.default_entity.lock);
>
> exit_do_move:
> etime = ktime_get();
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 3a0511d1739f..a803af015d05 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1501,6 +1501,7 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
> if (r)
> goto out;
>
> + mutex_lock(&adev->mman.default_entity.lock);
> amdgpu_res_first(abo->tbo.resource, offset, len, &src_mm);
> src_addr = amdgpu_ttm_domain_start(adev, bo->resource->mem_type) +
> src_mm.start;
> @@ -1512,6 +1513,7 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
> PAGE_SIZE, 0);
>
> fence = amdgpu_ttm_job_submit(adev, job, num_dw);
> + mutex_unlock(&adev->mman.default_entity.lock);
>
> if (!dma_fence_wait_timeout(fence, false, adev->sdma_timeout))
> r = -ETIMEDOUT;
* Re: [PATCH v3 14/28] drm/amdgpu: check entity lock is held in amdgpu_ttm_job_submit
2025-11-21 10:12 ` [PATCH v3 14/28] drm/amdgpu: check entity lock is held in amdgpu_ttm_job_submit Pierre-Eric Pelloux-Prayer
@ 2025-11-21 13:54 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 13:54 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> drm_sched_job_arm and drm_sched_entity_push_job must be called
> under the same lock to guarantee the order of execution.
>
> This commit adds a check in amdgpu_ttm_job_submit and fixes the
> places where the lock was missing.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
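The invariant being enforced can be modeled outside the kernel: arming assigns the sequence number and pushing appends to the queue, so doing both under one entity lock guarantees queue order matches seqno order. A toy pthread sketch (hypothetical names, not the real drm_sched API):

```c
#include <assert.h>
#include <pthread.h>

/* Toy model of drm_sched_job_arm + drm_sched_entity_push_job: arming
 * assigns a monotonically increasing seqno, pushing appends to a queue.
 * Doing both under entity->lock keeps queue order == seqno order. */
struct toy_entity {
	pthread_mutex_t lock;
	unsigned long next_seqno;
	unsigned long queue[16];
	unsigned queued;
};

static unsigned long toy_job_arm(struct toy_entity *e)
{
	return e->next_seqno++;           /* caller must hold e->lock */
}

static void toy_push_job(struct toy_entity *e, unsigned long seqno)
{
	e->queue[e->queued++] = seqno;    /* caller must hold e->lock */
}

static void toy_submit(struct toy_entity *e)
{
	pthread_mutex_lock(&e->lock);
	unsigned long s = toy_job_arm(e); /* both steps under one lock */
	toy_push_job(e, s);
	pthread_mutex_unlock(&e->lock);
}
```

If the lock were dropped between the two steps, another submitter could push a higher seqno first, which is exactly what the lockdep assert guards against.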
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index a803af015d05..164b49d768d8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -163,7 +163,8 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
> }
>
> static struct dma_fence *
> -amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_job *job, u32 num_dw)
> +amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_ttm_buffer_entity *entity,
> + struct amdgpu_job *job, u32 num_dw)
> {
> struct amdgpu_ring *ring;
>
> @@ -171,6 +172,8 @@ amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_job *job, u32 nu
> amdgpu_ring_pad_ib(ring, &job->ibs[0]);
> WARN_ON(job->ibs[0].length_dw > num_dw);
>
> + lockdep_assert_held(&entity->lock);
> +
> return amdgpu_job_submit(job);
> }
>
> @@ -268,7 +271,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
> amdgpu_gart_map_vram_range(adev, pa, 0, num_pages, flags, cpu_addr);
> }
>
> - dma_fence_put(amdgpu_ttm_job_submit(adev, job, num_dw));
> + dma_fence_put(amdgpu_ttm_job_submit(adev, entity, job, num_dw));
> return 0;
> }
>
> @@ -1512,7 +1515,7 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
> amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr, dst_addr,
> PAGE_SIZE, 0);
>
> - fence = amdgpu_ttm_job_submit(adev, job, num_dw);
> + fence = amdgpu_ttm_job_submit(adev, &adev->mman.default_entity, job, num_dw);
> mutex_unlock(&adev->mman.default_entity.lock);
>
> if (!dma_fence_wait_timeout(fence, false, adev->sdma_timeout))
> @@ -2336,7 +2339,7 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev,
> byte_count -= cur_size_in_bytes;
> }
>
> - *fence = amdgpu_ttm_job_submit(adev, job, num_dw);
> + *fence = amdgpu_ttm_job_submit(adev, entity, job, num_dw);
>
> return 0;
>
> @@ -2379,7 +2382,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
> byte_count -= cur_size;
> }
>
> - *fence = amdgpu_ttm_job_submit(adev, job, num_dw);
> + *fence = amdgpu_ttm_job_submit(adev, entity, job, num_dw);
> return 0;
> }
>
* Re: [PATCH v3 16/28] drm/amdgpu: use larger gart window when possible
2025-11-21 10:12 ` [PATCH v3 16/28] drm/amdgpu: use larger gart window when possible Pierre-Eric Pelloux-Prayer
@ 2025-11-21 15:02 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 15:02 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> Entities' gart windows are contiguous, so when copying a buffer,
> if src doesn't need a gart window, its window can be used to
> extend the dst one (and vice versa).
>
> This doubles the gart window size and reduces the number of jobs
> required.
>
> ---
> v2: pass adev instead of ring to amdgpu_ttm_needs_gart_window
> ---
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 78 ++++++++++++++++++-------
> 1 file changed, 58 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 164b49d768d8..489880b2fb8e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -177,6 +177,21 @@ amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_ttm_buffer_entit
> return amdgpu_job_submit(job);
> }
>
> +static bool amdgpu_ttm_needs_gart_window(struct amdgpu_device *adev,
> + struct ttm_resource *mem,
> + struct amdgpu_res_cursor *mm_cur,
> + bool tmz,
> + uint64_t *addr)
> +{
> + /* Map only what can't be accessed directly */
> + if (!tmz && mem->start != AMDGPU_BO_INVALID_OFFSET) {
Checking mem->start here is actually a really bad idea.
IIRC Amar was once working on patches to fix that, but never finished them.
On the other hand, that is certainly not a problem for this patch here.
> + *addr = amdgpu_ttm_domain_start(adev, mem->mem_type) +
> + mm_cur->start;
> + return false;
> + }
> + return true;
> +}
> +
> /**
> * amdgpu_ttm_map_buffer - Map memory into the GART windows
> * @adev: the device being used
> @@ -185,6 +200,7 @@ amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_ttm_buffer_entit
> * @mem: memory object to map
> * @mm_cur: range to map
> * @window: which GART window to use
> + * @use_two_windows: if true, use a double window
> * @tmz: if we should setup a TMZ enabled mapping
> * @size: in number of bytes to map, out number of bytes mapped
> * @addr: resulting address inside the MC address space
> @@ -198,6 +214,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
> struct ttm_resource *mem,
> struct amdgpu_res_cursor *mm_cur,
> unsigned int window,
> + bool use_two_windows,
> bool tmz, uint64_t *size, uint64_t *addr)
> {
> unsigned int offset, num_pages, num_dw, num_bytes;
> @@ -213,13 +230,8 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
> if (WARN_ON(mem->mem_type == AMDGPU_PL_PREEMPT))
> return -EINVAL;
>
> - /* Map only what can't be accessed directly */
> - if (!tmz && mem->start != AMDGPU_BO_INVALID_OFFSET) {
> - *addr = amdgpu_ttm_domain_start(adev, mem->mem_type) +
> - mm_cur->start;
> + if (!amdgpu_ttm_needs_gart_window(adev, mem, mm_cur, tmz, addr))
> return 0;
> - }
> -
When that is now checked outside of the function, this shouldn't be necessary any more and we should be able to remove it here.
>
> /*
> * If start begins at an offset inside the page, then adjust the size
> @@ -228,7 +240,8 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
> offset = mm_cur->start & ~PAGE_MASK;
>
> num_pages = PFN_UP(*size + offset);
> - num_pages = min_t(uint32_t, num_pages, AMDGPU_GTT_MAX_TRANSFER_SIZE);
> + num_pages = min_t(uint32_t,
> + num_pages, AMDGPU_GTT_MAX_TRANSFER_SIZE * (use_two_windows ? 2 : 1));
Rather use two separate calls to amdgpu_ttm_map_buffer() and just increment the cursor before the second call.
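The two-call alternative suggested above can be sketched in self-contained C (toy cursor and window size, not the amdgpu structs): call the single-window mapper once per window and advance the cursor in between, so the mapper never needs to know about a "double window".

```c
#include <assert.h>
#include <stdint.h>

#define TOY_WINDOW_BYTES 512

/* Toy stand-in for struct amdgpu_res_cursor. */
struct toy_cursor {
	uint64_t start;
	uint64_t remaining;
};

/* Toy stand-in for amdgpu_ttm_map_buffer(): covers at most one window
 * starting at the cursor, returns the number of bytes covered. */
static uint64_t toy_map_one_window(const struct toy_cursor *cur)
{
	return cur->remaining < TOY_WINDOW_BYTES ?
	       cur->remaining : TOY_WINDOW_BYTES;
}

static void toy_cursor_advance(struct toy_cursor *cur, uint64_t bytes)
{
	cur->start += bytes;
	cur->remaining -= bytes;
}

/* Two back-to-back calls cover up to 2 * TOY_WINDOW_BYTES. */
static uint64_t toy_map_two_windows(struct toy_cursor *cur)
{
	uint64_t first = toy_map_one_window(cur);
	toy_cursor_advance(cur, first);
	uint64_t second = toy_map_one_window(cur);
	toy_cursor_advance(cur, second);
	return first + second;
}
```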
>
> *size = min(*size, (uint64_t)num_pages * PAGE_SIZE - offset);
>
> @@ -300,7 +313,9 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> struct dma_resv *resv,
> struct dma_fence **f)
> {
> + bool src_needs_gart_window, dst_needs_gart_window, use_two_gart_windows;
> struct amdgpu_res_cursor src_mm, dst_mm;
> + int src_gart_window, dst_gart_window;
> struct dma_fence *fence = NULL;
> int r = 0;
> uint32_t copy_flags = 0;
> @@ -324,18 +339,40 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> /* Never copy more than 256MiB at once to avoid a timeout */
> cur_size = min3(src_mm.size, dst_mm.size, 256ULL << 20);
>
> - /* Map src to window 0 and dst to window 1. */
> - r = amdgpu_ttm_map_buffer(adev, entity,
> - src->bo, src->mem, &src_mm,
> - 0, tmz, &cur_size, &from);
> - if (r)
> - goto error;
> + /* If only one direction needs a gart window to access memory, use both
> + * windows for it.
> + */
> + src_needs_gart_window =
> + amdgpu_ttm_needs_gart_window(adev, src->mem, &src_mm, tmz, &from);
> + dst_needs_gart_window =
> + amdgpu_ttm_needs_gart_window(adev, dst->mem, &dst_mm, tmz, &to);
The coding style looks a bit odd.
Regards,
Christian.
>
> - r = amdgpu_ttm_map_buffer(adev, entity,
> - dst->bo, dst->mem, &dst_mm,
> - 1, tmz, &cur_size, &to);
> - if (r)
> - goto error;
> + if (src_needs_gart_window) {
> + src_gart_window = 0;
> + use_two_gart_windows = !dst_needs_gart_window;
> + }
> + if (dst_needs_gart_window) {
> + dst_gart_window = src_needs_gart_window ? 1 : 0;
> + use_two_gart_windows = !src_needs_gart_window;
> + }
> +
> + if (src_needs_gart_window) {
> + r = amdgpu_ttm_map_buffer(adev, entity,
> + src->bo, src->mem, &src_mm,
> + src_gart_window, use_two_gart_windows,
> + tmz, &cur_size, &from);
> + if (r)
> + goto error;
> + }
> +
> + if (dst_needs_gart_window) {
> + r = amdgpu_ttm_map_buffer(adev, entity,
> + dst->bo, dst->mem, &dst_mm,
> + dst_gart_window, use_two_gart_windows,
> + tmz, &cur_size, &to);
> + if (r)
> + goto error;
> + }
>
> abo_src = ttm_to_amdgpu_bo(src->bo);
> abo_dst = ttm_to_amdgpu_bo(dst->bo);
> @@ -2434,7 +2471,7 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
>
> r = amdgpu_ttm_map_buffer(adev, entity,
> &bo->tbo, bo->tbo.resource, &cursor,
> - 1, false, &size, &addr);
> + 1, false, false, &size, &addr);
> if (r)
> goto err;
>
> @@ -2485,7 +2522,8 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
>
> r = amdgpu_ttm_map_buffer(adev, entity,
> &bo->tbo, bo->tbo.resource, &dst,
> - 1, false, &cur_size, &to);
> + 1, false, false,
> + &cur_size, &to);
> if (r)
> goto error;
>
* Re: [PATCH v3 17/28] drm/amdgpu: introduce amdgpu_sdma_set_vm_pte_scheds
2025-11-21 10:12 ` [PATCH v3 17/28] drm/amdgpu: introduce amdgpu_sdma_set_vm_pte_scheds Pierre-Eric Pelloux-Prayer
@ 2025-11-21 15:05 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 15:05 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> All sdma versions used the same logic, so add a helper and move the
> common code to a single place.
>
> ---
> v2: pass amdgpu_vm_pte_funcs as well
> v3: drop all the *_set_vm_pte_funcs one liners
> ---
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 ++++++++++++
> drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 31 ++++++---------------
> drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 31 ++++++---------------
> drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 31 ++++++---------------
> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 35 ++++++------------------
> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 35 ++++++------------------
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 31 ++++++---------------
> drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 31 ++++++---------------
> drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 29 ++++++--------------
> drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 29 ++++++--------------
> drivers/gpu/drm/amd/amdgpu/si_dma.c | 31 ++++++---------------
> 12 files changed, 105 insertions(+), 228 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 790e84fec949..a50e3c0a4b18 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1612,6 +1612,8 @@ struct dma_fence *amdgpu_device_enforce_isolation(struct amdgpu_device *adev,
> bool amdgpu_device_has_display_hardware(struct amdgpu_device *adev);
> ssize_t amdgpu_get_soft_full_reset_mask(struct amdgpu_ring *ring);
> ssize_t amdgpu_show_reset_mask(char *buf, uint32_t supported_reset);
> +void amdgpu_sdma_set_vm_pte_scheds(struct amdgpu_device *adev,
> + const struct amdgpu_vm_pte_funcs *vm_pte_funcs);
>
> /* atpx handler */
> #if defined(CONFIG_VGA_SWITCHEROO)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 193de267984e..5061d5b0f875 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3228,3 +3228,20 @@ void amdgpu_vm_print_task_info(struct amdgpu_device *adev,
> task_info->process_name, task_info->tgid,
> task_info->task.comm, task_info->task.pid);
> }
> +
> +void amdgpu_sdma_set_vm_pte_scheds(struct amdgpu_device *adev,
> + const struct amdgpu_vm_pte_funcs *vm_pte_funcs)
> +{
> + struct drm_gpu_scheduler *sched;
> + int i;
> +
> + for (i = 0; i < adev->sdma.num_instances; i++) {
> + if (adev->sdma.has_page_queue)
> + sched = &adev->sdma.instance[i].page.sched;
> + else
> + sched = &adev->sdma.instance[i].ring.sched;
> + adev->vm_manager.vm_pte_scheds[i] = sched;
> + }
> + adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
> + adev->vm_manager.vm_pte_funcs = vm_pte_funcs;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> index 9e8715b4739d..22780c09177d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> @@ -53,7 +53,6 @@ static const u32 sdma_offsets[SDMA_MAX_INSTANCE] =
> static void cik_sdma_set_ring_funcs(struct amdgpu_device *adev);
> static void cik_sdma_set_irq_funcs(struct amdgpu_device *adev);
> static void cik_sdma_set_buffer_funcs(struct amdgpu_device *adev);
> -static void cik_sdma_set_vm_pte_funcs(struct amdgpu_device *adev);
> static int cik_sdma_soft_reset(struct amdgpu_ip_block *ip_block);
>
> u32 amdgpu_cik_gpu_check_soft_reset(struct amdgpu_device *adev);
> @@ -919,6 +918,14 @@ static void cik_enable_sdma_mgls(struct amdgpu_device *adev,
> }
> }
>
> +static const struct amdgpu_vm_pte_funcs cik_sdma_vm_pte_funcs = {
> + .copy_pte_num_dw = 7,
> + .copy_pte = cik_sdma_vm_copy_pte,
> +
> + .write_pte = cik_sdma_vm_write_pte,
> + .set_pte_pde = cik_sdma_vm_set_pte_pde,
> +};
> +
> static int cik_sdma_early_init(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
> @@ -933,7 +940,7 @@ static int cik_sdma_early_init(struct amdgpu_ip_block *ip_block)
> cik_sdma_set_ring_funcs(adev);
> cik_sdma_set_irq_funcs(adev);
> cik_sdma_set_buffer_funcs(adev);
> - cik_sdma_set_vm_pte_funcs(adev);
> + amdgpu_sdma_set_vm_pte_scheds(adev, &cik_sdma_vm_pte_funcs);
>
> return 0;
> }
> @@ -1337,26 +1344,6 @@ static void cik_sdma_set_buffer_funcs(struct amdgpu_device *adev)
> adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> }
>
> -static const struct amdgpu_vm_pte_funcs cik_sdma_vm_pte_funcs = {
> - .copy_pte_num_dw = 7,
> - .copy_pte = cik_sdma_vm_copy_pte,
> -
> - .write_pte = cik_sdma_vm_write_pte,
> - .set_pte_pde = cik_sdma_vm_set_pte_pde,
> -};
> -
> -static void cik_sdma_set_vm_pte_funcs(struct amdgpu_device *adev)
> -{
> - unsigned i;
> -
> - adev->vm_manager.vm_pte_funcs = &cik_sdma_vm_pte_funcs;
> - for (i = 0; i < adev->sdma.num_instances; i++) {
> - adev->vm_manager.vm_pte_scheds[i] =
> - &adev->sdma.instance[i].ring.sched;
> - }
> - adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
> -}
> -
> const struct amdgpu_ip_block_version cik_sdma_ip_block =
> {
> .type = AMD_IP_BLOCK_TYPE_SDMA,
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> index 92ce580647cd..0090ace49024 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> @@ -51,7 +51,6 @@
>
> static void sdma_v2_4_set_ring_funcs(struct amdgpu_device *adev);
> static void sdma_v2_4_set_buffer_funcs(struct amdgpu_device *adev);
> -static void sdma_v2_4_set_vm_pte_funcs(struct amdgpu_device *adev);
> static void sdma_v2_4_set_irq_funcs(struct amdgpu_device *adev);
>
> MODULE_FIRMWARE("amdgpu/topaz_sdma.bin");
> @@ -809,6 +808,14 @@ static void sdma_v2_4_ring_emit_wreg(struct amdgpu_ring *ring,
> amdgpu_ring_write(ring, val);
> }
>
> +static const struct amdgpu_vm_pte_funcs sdma_v2_4_vm_pte_funcs = {
> + .copy_pte_num_dw = 7,
> + .copy_pte = sdma_v2_4_vm_copy_pte,
> +
> + .write_pte = sdma_v2_4_vm_write_pte,
> + .set_pte_pde = sdma_v2_4_vm_set_pte_pde,
> +};
> +
> static int sdma_v2_4_early_init(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
> @@ -822,7 +829,7 @@ static int sdma_v2_4_early_init(struct amdgpu_ip_block *ip_block)
>
> sdma_v2_4_set_ring_funcs(adev);
> sdma_v2_4_set_buffer_funcs(adev);
> - sdma_v2_4_set_vm_pte_funcs(adev);
> + amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v2_4_vm_pte_funcs);
> sdma_v2_4_set_irq_funcs(adev);
>
> return 0;
> @@ -1232,26 +1239,6 @@ static void sdma_v2_4_set_buffer_funcs(struct amdgpu_device *adev)
> adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> }
>
> -static const struct amdgpu_vm_pte_funcs sdma_v2_4_vm_pte_funcs = {
> - .copy_pte_num_dw = 7,
> - .copy_pte = sdma_v2_4_vm_copy_pte,
> -
> - .write_pte = sdma_v2_4_vm_write_pte,
> - .set_pte_pde = sdma_v2_4_vm_set_pte_pde,
> -};
> -
> -static void sdma_v2_4_set_vm_pte_funcs(struct amdgpu_device *adev)
> -{
> - unsigned i;
> -
> - adev->vm_manager.vm_pte_funcs = &sdma_v2_4_vm_pte_funcs;
> - for (i = 0; i < adev->sdma.num_instances; i++) {
> - adev->vm_manager.vm_pte_scheds[i] =
> - &adev->sdma.instance[i].ring.sched;
> - }
> - adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
> -}
> -
> const struct amdgpu_ip_block_version sdma_v2_4_ip_block = {
> .type = AMD_IP_BLOCK_TYPE_SDMA,
> .major = 2,
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> index 1c076bd1cf73..2526d393162a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> @@ -51,7 +51,6 @@
>
> static void sdma_v3_0_set_ring_funcs(struct amdgpu_device *adev);
> static void sdma_v3_0_set_buffer_funcs(struct amdgpu_device *adev);
> -static void sdma_v3_0_set_vm_pte_funcs(struct amdgpu_device *adev);
> static void sdma_v3_0_set_irq_funcs(struct amdgpu_device *adev);
>
> MODULE_FIRMWARE("amdgpu/tonga_sdma.bin");
> @@ -1082,6 +1081,14 @@ static void sdma_v3_0_ring_emit_wreg(struct amdgpu_ring *ring,
> amdgpu_ring_write(ring, val);
> }
>
> +static const struct amdgpu_vm_pte_funcs sdma_v3_0_vm_pte_funcs = {
> + .copy_pte_num_dw = 7,
> + .copy_pte = sdma_v3_0_vm_copy_pte,
> +
> + .write_pte = sdma_v3_0_vm_write_pte,
> + .set_pte_pde = sdma_v3_0_vm_set_pte_pde,
> +};
> +
> static int sdma_v3_0_early_init(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
> @@ -1102,7 +1109,7 @@ static int sdma_v3_0_early_init(struct amdgpu_ip_block *ip_block)
>
> sdma_v3_0_set_ring_funcs(adev);
> sdma_v3_0_set_buffer_funcs(adev);
> - sdma_v3_0_set_vm_pte_funcs(adev);
> + amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v3_0_vm_pte_funcs);
> sdma_v3_0_set_irq_funcs(adev);
>
> return 0;
> @@ -1674,26 +1681,6 @@ static void sdma_v3_0_set_buffer_funcs(struct amdgpu_device *adev)
> adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> }
>
> -static const struct amdgpu_vm_pte_funcs sdma_v3_0_vm_pte_funcs = {
> - .copy_pte_num_dw = 7,
> - .copy_pte = sdma_v3_0_vm_copy_pte,
> -
> - .write_pte = sdma_v3_0_vm_write_pte,
> - .set_pte_pde = sdma_v3_0_vm_set_pte_pde,
> -};
> -
> -static void sdma_v3_0_set_vm_pte_funcs(struct amdgpu_device *adev)
> -{
> - unsigned i;
> -
> - adev->vm_manager.vm_pte_funcs = &sdma_v3_0_vm_pte_funcs;
> - for (i = 0; i < adev->sdma.num_instances; i++) {
> - adev->vm_manager.vm_pte_scheds[i] =
> - &adev->sdma.instance[i].ring.sched;
> - }
> - adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
> -}
> -
> const struct amdgpu_ip_block_version sdma_v3_0_ip_block =
> {
> .type = AMD_IP_BLOCK_TYPE_SDMA,
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index f38004e6064e..a35d9951e22a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -129,7 +129,6 @@ static const struct amdgpu_hwip_reg_entry sdma_reg_list_4_0[] = {
>
> static void sdma_v4_0_set_ring_funcs(struct amdgpu_device *adev);
> static void sdma_v4_0_set_buffer_funcs(struct amdgpu_device *adev);
> -static void sdma_v4_0_set_vm_pte_funcs(struct amdgpu_device *adev);
> static void sdma_v4_0_set_irq_funcs(struct amdgpu_device *adev);
> static void sdma_v4_0_set_ras_funcs(struct amdgpu_device *adev);
>
> @@ -1751,6 +1750,14 @@ static bool sdma_v4_0_fw_support_paging_queue(struct amdgpu_device *adev)
> }
> }
>
> +static const struct amdgpu_vm_pte_funcs sdma_v4_0_vm_pte_funcs = {
> + .copy_pte_num_dw = 7,
> + .copy_pte = sdma_v4_0_vm_copy_pte,
> +
> + .write_pte = sdma_v4_0_vm_write_pte,
> + .set_pte_pde = sdma_v4_0_vm_set_pte_pde,
> +};
> +
> static int sdma_v4_0_early_init(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
> @@ -1769,7 +1776,7 @@ static int sdma_v4_0_early_init(struct amdgpu_ip_block *ip_block)
>
> sdma_v4_0_set_ring_funcs(adev);
> sdma_v4_0_set_buffer_funcs(adev);
> - sdma_v4_0_set_vm_pte_funcs(adev);
> + amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v4_0_vm_pte_funcs);
> sdma_v4_0_set_irq_funcs(adev);
> sdma_v4_0_set_ras_funcs(adev);
>
> @@ -2615,30 +2622,6 @@ static void sdma_v4_0_set_buffer_funcs(struct amdgpu_device *adev)
> adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> }
>
> -static const struct amdgpu_vm_pte_funcs sdma_v4_0_vm_pte_funcs = {
> - .copy_pte_num_dw = 7,
> - .copy_pte = sdma_v4_0_vm_copy_pte,
> -
> - .write_pte = sdma_v4_0_vm_write_pte,
> - .set_pte_pde = sdma_v4_0_vm_set_pte_pde,
> -};
> -
> -static void sdma_v4_0_set_vm_pte_funcs(struct amdgpu_device *adev)
> -{
> - struct drm_gpu_scheduler *sched;
> - unsigned i;
> -
> - adev->vm_manager.vm_pte_funcs = &sdma_v4_0_vm_pte_funcs;
> - for (i = 0; i < adev->sdma.num_instances; i++) {
> - if (adev->sdma.has_page_queue)
> - sched = &adev->sdma.instance[i].page.sched;
> - else
> - sched = &adev->sdma.instance[i].ring.sched;
> - adev->vm_manager.vm_pte_scheds[i] = sched;
> - }
> - adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
> -}
> -
> static void sdma_v4_0_get_ras_error_count(uint32_t value,
> uint32_t instance,
> uint32_t *sec_count)
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> index a1443990d5c6..7f77367848d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> @@ -104,7 +104,6 @@ static const struct amdgpu_hwip_reg_entry sdma_reg_list_4_4_2[] = {
>
> static void sdma_v4_4_2_set_ring_funcs(struct amdgpu_device *adev);
> static void sdma_v4_4_2_set_buffer_funcs(struct amdgpu_device *adev);
> -static void sdma_v4_4_2_set_vm_pte_funcs(struct amdgpu_device *adev);
> static void sdma_v4_4_2_set_irq_funcs(struct amdgpu_device *adev);
> static void sdma_v4_4_2_set_ras_funcs(struct amdgpu_device *adev);
> static void sdma_v4_4_2_update_reset_mask(struct amdgpu_device *adev);
> @@ -1347,6 +1346,14 @@ static const struct amdgpu_sdma_funcs sdma_v4_4_2_sdma_funcs = {
> .soft_reset_kernel_queue = &sdma_v4_4_2_soft_reset_engine,
> };
>
> +static const struct amdgpu_vm_pte_funcs sdma_v4_4_2_vm_pte_funcs = {
> + .copy_pte_num_dw = 7,
> + .copy_pte = sdma_v4_4_2_vm_copy_pte,
> +
> + .write_pte = sdma_v4_4_2_vm_write_pte,
> + .set_pte_pde = sdma_v4_4_2_vm_set_pte_pde,
> +};
> +
> static int sdma_v4_4_2_early_init(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
> @@ -1362,7 +1369,7 @@ static int sdma_v4_4_2_early_init(struct amdgpu_ip_block *ip_block)
>
> sdma_v4_4_2_set_ring_funcs(adev);
> sdma_v4_4_2_set_buffer_funcs(adev);
> - sdma_v4_4_2_set_vm_pte_funcs(adev);
> + amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v4_4_2_vm_pte_funcs);
> sdma_v4_4_2_set_irq_funcs(adev);
> sdma_v4_4_2_set_ras_funcs(adev);
> return 0;
> @@ -2316,30 +2323,6 @@ static void sdma_v4_4_2_set_buffer_funcs(struct amdgpu_device *adev)
> adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> }
>
> -static const struct amdgpu_vm_pte_funcs sdma_v4_4_2_vm_pte_funcs = {
> - .copy_pte_num_dw = 7,
> - .copy_pte = sdma_v4_4_2_vm_copy_pte,
> -
> - .write_pte = sdma_v4_4_2_vm_write_pte,
> - .set_pte_pde = sdma_v4_4_2_vm_set_pte_pde,
> -};
> -
> -static void sdma_v4_4_2_set_vm_pte_funcs(struct amdgpu_device *adev)
> -{
> - struct drm_gpu_scheduler *sched;
> - unsigned i;
> -
> - adev->vm_manager.vm_pte_funcs = &sdma_v4_4_2_vm_pte_funcs;
> - for (i = 0; i < adev->sdma.num_instances; i++) {
> - if (adev->sdma.has_page_queue)
> - sched = &adev->sdma.instance[i].page.sched;
> - else
> - sched = &adev->sdma.instance[i].ring.sched;
> - adev->vm_manager.vm_pte_scheds[i] = sched;
> - }
> - adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
> -}
> -
> /**
> * sdma_v4_4_2_update_reset_mask - update reset mask for SDMA
> * @adev: Pointer to the AMDGPU device structure
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> index 8ddc4df06a1f..7ce13c5d4e61 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> @@ -110,7 +110,6 @@ static const struct amdgpu_hwip_reg_entry sdma_reg_list_5_0[] = {
>
> static void sdma_v5_0_set_ring_funcs(struct amdgpu_device *adev);
> static void sdma_v5_0_set_buffer_funcs(struct amdgpu_device *adev);
> -static void sdma_v5_0_set_vm_pte_funcs(struct amdgpu_device *adev);
> static void sdma_v5_0_set_irq_funcs(struct amdgpu_device *adev);
> static int sdma_v5_0_stop_queue(struct amdgpu_ring *ring);
> static int sdma_v5_0_restore_queue(struct amdgpu_ring *ring);
> @@ -1357,6 +1356,13 @@ static const struct amdgpu_sdma_funcs sdma_v5_0_sdma_funcs = {
> .soft_reset_kernel_queue = &sdma_v5_0_soft_reset_engine,
> };
>
> +static const struct amdgpu_vm_pte_funcs sdma_v5_0_vm_pte_funcs = {
> + .copy_pte_num_dw = 7,
> + .copy_pte = sdma_v5_0_vm_copy_pte,
> + .write_pte = sdma_v5_0_vm_write_pte,
> + .set_pte_pde = sdma_v5_0_vm_set_pte_pde,
> +};
> +
> static int sdma_v5_0_early_init(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
> @@ -1368,7 +1374,7 @@ static int sdma_v5_0_early_init(struct amdgpu_ip_block *ip_block)
>
> sdma_v5_0_set_ring_funcs(adev);
> sdma_v5_0_set_buffer_funcs(adev);
> - sdma_v5_0_set_vm_pte_funcs(adev);
> + amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v5_0_vm_pte_funcs);
> sdma_v5_0_set_irq_funcs(adev);
> sdma_v5_0_set_mqd_funcs(adev);
>
> @@ -2073,27 +2079,6 @@ static void sdma_v5_0_set_buffer_funcs(struct amdgpu_device *adev)
> }
> }
>
> -static const struct amdgpu_vm_pte_funcs sdma_v5_0_vm_pte_funcs = {
> - .copy_pte_num_dw = 7,
> - .copy_pte = sdma_v5_0_vm_copy_pte,
> - .write_pte = sdma_v5_0_vm_write_pte,
> - .set_pte_pde = sdma_v5_0_vm_set_pte_pde,
> -};
> -
> -static void sdma_v5_0_set_vm_pte_funcs(struct amdgpu_device *adev)
> -{
> - unsigned i;
> -
> - if (adev->vm_manager.vm_pte_funcs == NULL) {
> - adev->vm_manager.vm_pte_funcs = &sdma_v5_0_vm_pte_funcs;
> - for (i = 0; i < adev->sdma.num_instances; i++) {
> - adev->vm_manager.vm_pte_scheds[i] =
> - &adev->sdma.instance[i].ring.sched;
> - }
> - adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
> - }
> -}
> -
> const struct amdgpu_ip_block_version sdma_v5_0_ip_block = {
> .type = AMD_IP_BLOCK_TYPE_SDMA,
> .major = 5,
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> index 51101b0aa2fa..98beff18cf28 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> @@ -111,7 +111,6 @@ static const struct amdgpu_hwip_reg_entry sdma_reg_list_5_2[] = {
>
> static void sdma_v5_2_set_ring_funcs(struct amdgpu_device *adev);
> static void sdma_v5_2_set_buffer_funcs(struct amdgpu_device *adev);
> -static void sdma_v5_2_set_vm_pte_funcs(struct amdgpu_device *adev);
> static void sdma_v5_2_set_irq_funcs(struct amdgpu_device *adev);
> static int sdma_v5_2_stop_queue(struct amdgpu_ring *ring);
> static int sdma_v5_2_restore_queue(struct amdgpu_ring *ring);
> @@ -1248,6 +1247,13 @@ static void sdma_v5_2_ring_emit_reg_write_reg_wait(struct amdgpu_ring *ring,
> amdgpu_ring_emit_reg_wait(ring, reg1, mask, mask);
> }
>
> +static const struct amdgpu_vm_pte_funcs sdma_v5_2_vm_pte_funcs = {
> + .copy_pte_num_dw = 7,
> + .copy_pte = sdma_v5_2_vm_copy_pte,
> + .write_pte = sdma_v5_2_vm_write_pte,
> + .set_pte_pde = sdma_v5_2_vm_set_pte_pde,
> +};
> +
> static int sdma_v5_2_early_init(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
> @@ -1259,7 +1265,7 @@ static int sdma_v5_2_early_init(struct amdgpu_ip_block *ip_block)
>
> sdma_v5_2_set_ring_funcs(adev);
> sdma_v5_2_set_buffer_funcs(adev);
> - sdma_v5_2_set_vm_pte_funcs(adev);
> + amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v5_2_vm_pte_funcs);
> sdma_v5_2_set_irq_funcs(adev);
> sdma_v5_2_set_mqd_funcs(adev);
>
> @@ -2084,27 +2090,6 @@ static void sdma_v5_2_set_buffer_funcs(struct amdgpu_device *adev)
> }
> }
>
> -static const struct amdgpu_vm_pte_funcs sdma_v5_2_vm_pte_funcs = {
> - .copy_pte_num_dw = 7,
> - .copy_pte = sdma_v5_2_vm_copy_pte,
> - .write_pte = sdma_v5_2_vm_write_pte,
> - .set_pte_pde = sdma_v5_2_vm_set_pte_pde,
> -};
> -
> -static void sdma_v5_2_set_vm_pte_funcs(struct amdgpu_device *adev)
> -{
> - unsigned i;
> -
> - if (adev->vm_manager.vm_pte_funcs == NULL) {
> - adev->vm_manager.vm_pte_funcs = &sdma_v5_2_vm_pte_funcs;
> - for (i = 0; i < adev->sdma.num_instances; i++) {
> - adev->vm_manager.vm_pte_scheds[i] =
> - &adev->sdma.instance[i].ring.sched;
> - }
> - adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
> - }
> -}
> -
> const struct amdgpu_ip_block_version sdma_v5_2_ip_block = {
> .type = AMD_IP_BLOCK_TYPE_SDMA,
> .major = 5,
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> index e3f725bc2f29..c32331b72ba0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> @@ -119,7 +119,6 @@ static const struct amdgpu_hwip_reg_entry sdma_reg_list_6_0[] = {
>
> static void sdma_v6_0_set_ring_funcs(struct amdgpu_device *adev);
> static void sdma_v6_0_set_buffer_funcs(struct amdgpu_device *adev);
> -static void sdma_v6_0_set_vm_pte_funcs(struct amdgpu_device *adev);
> static void sdma_v6_0_set_irq_funcs(struct amdgpu_device *adev);
> static int sdma_v6_0_start(struct amdgpu_device *adev);
>
> @@ -1268,6 +1267,13 @@ static void sdma_v6_0_set_ras_funcs(struct amdgpu_device *adev)
> }
> }
>
> +static const struct amdgpu_vm_pte_funcs sdma_v6_0_vm_pte_funcs = {
> + .copy_pte_num_dw = 7,
> + .copy_pte = sdma_v6_0_vm_copy_pte,
> + .write_pte = sdma_v6_0_vm_write_pte,
> + .set_pte_pde = sdma_v6_0_vm_set_pte_pde,
> +};
> +
> static int sdma_v6_0_early_init(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
> @@ -1296,7 +1302,7 @@ static int sdma_v6_0_early_init(struct amdgpu_ip_block *ip_block)
>
> sdma_v6_0_set_ring_funcs(adev);
> sdma_v6_0_set_buffer_funcs(adev);
> - sdma_v6_0_set_vm_pte_funcs(adev);
> + amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v6_0_vm_pte_funcs);
> sdma_v6_0_set_irq_funcs(adev);
> sdma_v6_0_set_mqd_funcs(adev);
> sdma_v6_0_set_ras_funcs(adev);
> @@ -1889,25 +1895,6 @@ static void sdma_v6_0_set_buffer_funcs(struct amdgpu_device *adev)
> adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> }
>
> -static const struct amdgpu_vm_pte_funcs sdma_v6_0_vm_pte_funcs = {
> - .copy_pte_num_dw = 7,
> - .copy_pte = sdma_v6_0_vm_copy_pte,
> - .write_pte = sdma_v6_0_vm_write_pte,
> - .set_pte_pde = sdma_v6_0_vm_set_pte_pde,
> -};
> -
> -static void sdma_v6_0_set_vm_pte_funcs(struct amdgpu_device *adev)
> -{
> - unsigned i;
> -
> - adev->vm_manager.vm_pte_funcs = &sdma_v6_0_vm_pte_funcs;
> - for (i = 0; i < adev->sdma.num_instances; i++) {
> - adev->vm_manager.vm_pte_scheds[i] =
> - &adev->sdma.instance[i].ring.sched;
> - }
> - adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
> -}
> -
> const struct amdgpu_ip_block_version sdma_v6_0_ip_block = {
> .type = AMD_IP_BLOCK_TYPE_SDMA,
> .major = 6,
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
> index 7fee98d37720..9318d23eb71e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
> @@ -119,7 +119,6 @@ static const struct amdgpu_hwip_reg_entry sdma_reg_list_7_0[] = {
>
> static void sdma_v7_0_set_ring_funcs(struct amdgpu_device *adev);
> static void sdma_v7_0_set_buffer_funcs(struct amdgpu_device *adev);
> -static void sdma_v7_0_set_vm_pte_funcs(struct amdgpu_device *adev);
> static void sdma_v7_0_set_irq_funcs(struct amdgpu_device *adev);
> static int sdma_v7_0_start(struct amdgpu_device *adev);
>
> @@ -1253,6 +1252,13 @@ static void sdma_v7_0_ring_emit_reg_write_reg_wait(struct amdgpu_ring *ring,
> amdgpu_ring_emit_reg_wait(ring, reg1, mask, mask);
> }
>
> +static const struct amdgpu_vm_pte_funcs sdma_v7_0_vm_pte_funcs = {
> + .copy_pte_num_dw = 8,
> + .copy_pte = sdma_v7_0_vm_copy_pte,
> + .write_pte = sdma_v7_0_vm_write_pte,
> + .set_pte_pde = sdma_v7_0_vm_set_pte_pde,
> +};
> +
> static int sdma_v7_0_early_init(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
> @@ -1283,7 +1289,7 @@ static int sdma_v7_0_early_init(struct amdgpu_ip_block *ip_block)
>
> sdma_v7_0_set_ring_funcs(adev);
> sdma_v7_0_set_buffer_funcs(adev);
> - sdma_v7_0_set_vm_pte_funcs(adev);
> + amdgpu_sdma_set_vm_pte_scheds(adev, &sdma_v7_0_vm_pte_funcs);
> sdma_v7_0_set_irq_funcs(adev);
> sdma_v7_0_set_mqd_funcs(adev);
>
> @@ -1831,25 +1837,6 @@ static void sdma_v7_0_set_buffer_funcs(struct amdgpu_device *adev)
> adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> }
>
> -static const struct amdgpu_vm_pte_funcs sdma_v7_0_vm_pte_funcs = {
> - .copy_pte_num_dw = 8,
> - .copy_pte = sdma_v7_0_vm_copy_pte,
> - .write_pte = sdma_v7_0_vm_write_pte,
> - .set_pte_pde = sdma_v7_0_vm_set_pte_pde,
> -};
> -
> -static void sdma_v7_0_set_vm_pte_funcs(struct amdgpu_device *adev)
> -{
> - unsigned i;
> -
> - adev->vm_manager.vm_pte_funcs = &sdma_v7_0_vm_pte_funcs;
> - for (i = 0; i < adev->sdma.num_instances; i++) {
> - adev->vm_manager.vm_pte_scheds[i] =
> - &adev->sdma.instance[i].ring.sched;
> - }
> - adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
> -}
> -
> const struct amdgpu_ip_block_version sdma_v7_0_ip_block = {
> .type = AMD_IP_BLOCK_TYPE_SDMA,
> .major = 7,
> diff --git a/drivers/gpu/drm/amd/amdgpu/si_dma.c b/drivers/gpu/drm/amd/amdgpu/si_dma.c
> index 7f18e4875287..b85df997ed49 100644
> --- a/drivers/gpu/drm/amd/amdgpu/si_dma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/si_dma.c
> @@ -37,7 +37,6 @@ const u32 sdma_offsets[SDMA_MAX_INSTANCE] =
>
> static void si_dma_set_ring_funcs(struct amdgpu_device *adev);
> static void si_dma_set_buffer_funcs(struct amdgpu_device *adev);
> -static void si_dma_set_vm_pte_funcs(struct amdgpu_device *adev);
> static void si_dma_set_irq_funcs(struct amdgpu_device *adev);
>
> /**
> @@ -473,6 +472,14 @@ static void si_dma_ring_emit_wreg(struct amdgpu_ring *ring,
> amdgpu_ring_write(ring, val);
> }
>
> +static const struct amdgpu_vm_pte_funcs si_dma_vm_pte_funcs = {
> + .copy_pte_num_dw = 5,
> + .copy_pte = si_dma_vm_copy_pte,
> +
> + .write_pte = si_dma_vm_write_pte,
> + .set_pte_pde = si_dma_vm_set_pte_pde,
> +};
> +
> static int si_dma_early_init(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
> @@ -481,7 +488,7 @@ static int si_dma_early_init(struct amdgpu_ip_block *ip_block)
>
> si_dma_set_ring_funcs(adev);
> si_dma_set_buffer_funcs(adev);
> - si_dma_set_vm_pte_funcs(adev);
> + amdgpu_sdma_set_vm_pte_scheds(adev, &si_dma_vm_pte_funcs);
> si_dma_set_irq_funcs(adev);
>
> return 0;
> @@ -830,26 +837,6 @@ static void si_dma_set_buffer_funcs(struct amdgpu_device *adev)
> adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> }
>
> -static const struct amdgpu_vm_pte_funcs si_dma_vm_pte_funcs = {
> - .copy_pte_num_dw = 5,
> - .copy_pte = si_dma_vm_copy_pte,
> -
> - .write_pte = si_dma_vm_write_pte,
> - .set_pte_pde = si_dma_vm_set_pte_pde,
> -};
> -
> -static void si_dma_set_vm_pte_funcs(struct amdgpu_device *adev)
> -{
> - unsigned i;
> -
> - adev->vm_manager.vm_pte_funcs = &si_dma_vm_pte_funcs;
> - for (i = 0; i < adev->sdma.num_instances; i++) {
> - adev->vm_manager.vm_pte_scheds[i] =
> - &adev->sdma.instance[i].ring.sched;
> - }
> - adev->vm_manager.vm_pte_num_scheds = adev->sdma.num_instances;
> -}
> -
> const struct amdgpu_ip_block_version si_dma_ip_block =
> {
> .type = AMD_IP_BLOCK_TYPE_SDMA,
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 18/28] drm/amdgpu: move sched status check inside amdgpu_ttm_set_buffer_funcs_status
2025-11-21 10:12 ` [PATCH v3 18/28] drm/amdgpu: move sched status check inside amdgpu_ttm_set_buffer_funcs_status Pierre-Eric Pelloux-Prayer
@ 2025-11-21 15:08 ` Christian König
2025-11-21 15:12 ` Pierre-Eric Pelloux-Prayer
0 siblings, 1 reply; 60+ messages in thread
From: Christian König @ 2025-11-21 15:08 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> It avoids duplicating code and allows us to output a warning.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 ++++---------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 ++++++
> 2 files changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 54f7c81f287b..7167db54d722 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3309,9 +3309,7 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
> if (r)
> goto init_failed;
>
> - if (adev->mman.buffer_funcs_ring &&
> - adev->mman.buffer_funcs_ring->sched.ready)
> - amdgpu_ttm_set_buffer_funcs_status(adev, true);
> + amdgpu_ttm_set_buffer_funcs_status(adev, true);
>
> /* Don't init kfd if whole hive need to be reset during init */
> if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
> @@ -4191,8 +4189,7 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
>
> r = amdgpu_device_ip_resume_phase2(adev);
>
> - if (adev->mman.buffer_funcs_ring->sched.ready)
> - amdgpu_ttm_set_buffer_funcs_status(adev, true);
> + amdgpu_ttm_set_buffer_funcs_status(adev, true);
>
> if (r)
> return r;
> @@ -5321,8 +5318,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool notify_clients)
> return 0;
>
> unwind_evict:
> - if (adev->mman.buffer_funcs_ring->sched.ready)
> - amdgpu_ttm_set_buffer_funcs_status(adev, true);
> + amdgpu_ttm_set_buffer_funcs_status(adev, true);
> amdgpu_fence_driver_hw_init(adev);
>
> unwind_userq:
> @@ -6050,8 +6046,7 @@ int amdgpu_device_reinit_after_reset(struct amdgpu_reset_context *reset_context)
> if (r)
> goto out;
>
> - if (tmp_adev->mman.buffer_funcs_ring->sched.ready)
> - amdgpu_ttm_set_buffer_funcs_status(tmp_adev, true);
> + amdgpu_ttm_set_buffer_funcs_status(tmp_adev, true);
>
> r = amdgpu_device_ip_resume_phase3(tmp_adev);
> if (r)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 489880b2fb8e..9024dde0c5a7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -2233,6 +2233,12 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
> return reserved_windows;
>
> + if ((!adev->mman.buffer_funcs_ring || !adev->mman.buffer_funcs_ring->sched.ready) &&
> + enable) {
> + dev_warn(adev->dev, "Not enabling DMA transfers for in kernel use");
> + return 0;
> + }
> +
Only check that when enabling the functions. When disabling them, sched.ready may already be set to false.
Apart from that, it looks good to me.
Regards,
Christian.
> if (enable) {
> struct amdgpu_ring *ring;
> struct drm_gpu_scheduler *sched;
* Re: [PATCH v3 18/28] drm/amdgpu: move sched status check inside amdgpu_ttm_set_buffer_funcs_status
2025-11-21 15:08 ` Christian König
@ 2025-11-21 15:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 15:23 ` Christian König
0 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 15:12 UTC (permalink / raw)
To: Christian König, Pierre-Eric Pelloux-Prayer, Alex Deucher,
David Airlie, Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/2025 at 16:08, Christian König wrote:
>
>
> On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
>> It avoids duplicating code and allows us to output a warning.
>>
>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 ++++---------
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 ++++++
>> 2 files changed, 10 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 54f7c81f287b..7167db54d722 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3309,9 +3309,7 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>> if (r)
>> goto init_failed;
>>
>> - if (adev->mman.buffer_funcs_ring &&
>> - adev->mman.buffer_funcs_ring->sched.ready)
>> - amdgpu_ttm_set_buffer_funcs_status(adev, true);
>> + amdgpu_ttm_set_buffer_funcs_status(adev, true);
>>
>> /* Don't init kfd if whole hive need to be reset during init */
>> if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
>> @@ -4191,8 +4189,7 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
>>
>> r = amdgpu_device_ip_resume_phase2(adev);
>>
>> - if (adev->mman.buffer_funcs_ring->sched.ready)
>> - amdgpu_ttm_set_buffer_funcs_status(adev, true);
>> + amdgpu_ttm_set_buffer_funcs_status(adev, true);
>>
>> if (r)
>> return r;
>> @@ -5321,8 +5318,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool notify_clients)
>> return 0;
>>
>> unwind_evict:
>> - if (adev->mman.buffer_funcs_ring->sched.ready)
>> - amdgpu_ttm_set_buffer_funcs_status(adev, true);
>> + amdgpu_ttm_set_buffer_funcs_status(adev, true);
>> amdgpu_fence_driver_hw_init(adev);
>>
>> unwind_userq:
>> @@ -6050,8 +6046,7 @@ int amdgpu_device_reinit_after_reset(struct amdgpu_reset_context *reset_context)
>> if (r)
>> goto out;
>>
>> - if (tmp_adev->mman.buffer_funcs_ring->sched.ready)
>> - amdgpu_ttm_set_buffer_funcs_status(tmp_adev, true);
>> + amdgpu_ttm_set_buffer_funcs_status(tmp_adev, true);
>>
>> r = amdgpu_device_ip_resume_phase3(tmp_adev);
>> if (r)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index 489880b2fb8e..9024dde0c5a7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -2233,6 +2233,12 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
>> adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
>> return reserved_windows;
>>
>> + if ((!adev->mman.buffer_funcs_ring || !adev->mman.buffer_funcs_ring->sched.ready) &&
>> + enable) {
>> + dev_warn(adev->dev, "Not enabling DMA transfers for in kernel use");
>> + return 0;
>> + }
>> +
>
> Only check that when enabling the functions. When disabling them, sched.ready may already be set to false.
The check already has a "&& enable" condition. Are you suggesting something
different?
PE
>
> Apart from that looks good to me.
>
> Regards,
> Christian.
>
>> if (enable) {
>> struct amdgpu_ring *ring;
>> struct drm_gpu_scheduler *sched;
* Re: [PATCH v3 19/28] drm/ttm: rework pipelined eviction fence handling
2025-11-21 10:12 ` [PATCH v3 19/28] drm/ttm: rework pipelined eviction fence handling Pierre-Eric Pelloux-Prayer
@ 2025-11-21 15:12 ` Christian König
2025-11-26 15:34 ` Maarten Lankhorst
0 siblings, 1 reply; 60+ messages in thread
From: Christian König @ 2025-11-21 15:12 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Huang Rui, Matthew Auld,
Matthew Brost, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal
Cc: dri-devel, linux-kernel, linux-media, linaro-mm-sig
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> Until now, ttm stored a single pipelined eviction fence, which meant
> drivers had to use a single entity for these evictions.
>
> To lift this requirement, this commit allows up to 8 entities to
> be used.
>
> Ideally a dma_resv object would have been used as a container for
> the eviction fences, but the locking rules make it complex:
> dma_resv objects all share the same ww_class, which means "Attempting to
> lock more mutexes after ww_acquire_done." is an error.
>
> One alternative considered was to introduce a 2nd ww_class for
> specific resv to hold a single "transient" lock (= the resv lock
> would only be held for a short period, without taking any other
> locks).
>
> The other option is to statically reserve a fence array and
> extend the existing code to deal with N fences, instead of 1.
>
> The driver is still responsible for reserving the correct number
> of fence slots.
>
> ---
> v2:
> - simplified code
> - dropped n_fences
> - name changes
> v3: use ttm_resource_manager_cleanup
> ---
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Going to push separately to drm-misc-next on Monday.
Regards,
Christian.
> ---
> .../gpu/drm/ttm/tests/ttm_bo_validate_test.c | 11 +++--
> drivers/gpu/drm/ttm/tests/ttm_resource_test.c | 5 +-
> drivers/gpu/drm/ttm/ttm_bo.c | 47 ++++++++++---------
> drivers/gpu/drm/ttm/ttm_bo_util.c | 38 ++++++++++++---
> drivers/gpu/drm/ttm/ttm_resource.c | 31 +++++++-----
> include/drm/ttm/ttm_resource.h | 29 ++++++++----
> 6 files changed, 104 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
> index 3148f5d3dbd6..8f71906c4238 100644
> --- a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
> +++ b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
> @@ -651,7 +651,7 @@ static void ttm_bo_validate_move_fence_signaled(struct kunit *test)
> int err;
>
> man = ttm_manager_type(priv->ttm_dev, mem_type);
> - man->move = dma_fence_get_stub();
> + man->eviction_fences[0] = dma_fence_get_stub();
>
> bo = ttm_bo_kunit_init(test, test->priv, size, NULL);
> bo->type = bo_type;
> @@ -668,7 +668,7 @@ static void ttm_bo_validate_move_fence_signaled(struct kunit *test)
> KUNIT_EXPECT_EQ(test, ctx.bytes_moved, size);
>
> ttm_bo_put(bo);
> - dma_fence_put(man->move);
> + dma_fence_put(man->eviction_fences[0]);
> }
>
> static const struct ttm_bo_validate_test_case ttm_bo_validate_wait_cases[] = {
> @@ -732,9 +732,9 @@ static void ttm_bo_validate_move_fence_not_signaled(struct kunit *test)
>
> spin_lock_init(&fence_lock);
> man = ttm_manager_type(priv->ttm_dev, fst_mem);
> - man->move = alloc_mock_fence(test);
> + man->eviction_fences[0] = alloc_mock_fence(test);
>
> - task = kthread_create(threaded_fence_signal, man->move, "move-fence-signal");
> + task = kthread_create(threaded_fence_signal, man->eviction_fences[0], "move-fence-signal");
> if (IS_ERR(task))
> KUNIT_FAIL(test, "Couldn't create move fence signal task\n");
>
> @@ -742,7 +742,8 @@ static void ttm_bo_validate_move_fence_not_signaled(struct kunit *test)
> err = ttm_bo_validate(bo, placement_val, &ctx_val);
> dma_resv_unlock(bo->base.resv);
>
> - dma_fence_wait_timeout(man->move, false, MAX_SCHEDULE_TIMEOUT);
> + dma_fence_wait_timeout(man->eviction_fences[0], false, MAX_SCHEDULE_TIMEOUT);
> + man->eviction_fences[0] = NULL;
>
> KUNIT_EXPECT_EQ(test, err, 0);
> KUNIT_EXPECT_EQ(test, ctx_val.bytes_moved, size);
> diff --git a/drivers/gpu/drm/ttm/tests/ttm_resource_test.c b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
> index e6ea2bd01f07..c0e4e35e0442 100644
> --- a/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
> +++ b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
> @@ -207,6 +207,7 @@ static void ttm_resource_manager_init_basic(struct kunit *test)
> struct ttm_resource_test_priv *priv = test->priv;
> struct ttm_resource_manager *man;
> size_t size = SZ_16K;
> + int i;
>
> man = kunit_kzalloc(test, sizeof(*man), GFP_KERNEL);
> KUNIT_ASSERT_NOT_NULL(test, man);
> @@ -216,8 +217,8 @@ static void ttm_resource_manager_init_basic(struct kunit *test)
> KUNIT_ASSERT_PTR_EQ(test, man->bdev, priv->devs->ttm_dev);
> KUNIT_ASSERT_EQ(test, man->size, size);
> KUNIT_ASSERT_EQ(test, man->usage, 0);
> - KUNIT_ASSERT_NULL(test, man->move);
> - KUNIT_ASSERT_NOT_NULL(test, &man->move_lock);
> + for (i = 0; i < TTM_NUM_MOVE_FENCES; i++)
> + KUNIT_ASSERT_NULL(test, man->eviction_fences[i]);
>
> for (int i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
> KUNIT_ASSERT_TRUE(test, list_empty(&man->lru[i]));
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index f4d9e68b21e7..0b3732ed6f6c 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -658,34 +658,35 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo)
> EXPORT_SYMBOL(ttm_bo_unpin);
>
> /*
> - * Add the last move fence to the BO as kernel dependency and reserve a new
> - * fence slot.
> + * Add the pipelined eviction fences to the BO as kernel dependencies and reserve new
> + * fence slots.
> */
> -static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
> - struct ttm_resource_manager *man,
> - bool no_wait_gpu)
> +static int ttm_bo_add_pipelined_eviction_fences(struct ttm_buffer_object *bo,
> + struct ttm_resource_manager *man,
> + bool no_wait_gpu)
> {
> struct dma_fence *fence;
> - int ret;
> + int i;
>
> - spin_lock(&man->move_lock);
> - fence = dma_fence_get(man->move);
> - spin_unlock(&man->move_lock);
> + spin_lock(&man->eviction_lock);
> + for (i = 0; i < TTM_NUM_MOVE_FENCES; i++) {
> + fence = man->eviction_fences[i];
> + if (!fence)
> + continue;
>
> - if (!fence)
> - return 0;
> -
> - if (no_wait_gpu) {
> - ret = dma_fence_is_signaled(fence) ? 0 : -EBUSY;
> - dma_fence_put(fence);
> - return ret;
> + if (no_wait_gpu) {
> + if (!dma_fence_is_signaled(fence)) {
> + spin_unlock(&man->eviction_lock);
> + return -EBUSY;
> + }
> + } else {
> + dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_KERNEL);
> + }
> }
> + spin_unlock(&man->eviction_lock);
>
> - dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_KERNEL);
> -
> - ret = dma_resv_reserve_fences(bo->base.resv, 1);
> - dma_fence_put(fence);
> - return ret;
> + /* TODO: this call should be removed. */
> + return dma_resv_reserve_fences(bo->base.resv, 1);
> }
>
> /**
> @@ -718,7 +719,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
> int i, ret;
>
> ticket = dma_resv_locking_ctx(bo->base.resv);
> - ret = dma_resv_reserve_fences(bo->base.resv, 1);
> + ret = dma_resv_reserve_fences(bo->base.resv, TTM_NUM_MOVE_FENCES);
> if (unlikely(ret))
> return ret;
>
> @@ -757,7 +758,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
> return ret;
> }
>
> - ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
> + ret = ttm_bo_add_pipelined_eviction_fences(bo, man, ctx->no_wait_gpu);
> if (unlikely(ret)) {
> ttm_resource_free(bo, res);
> if (ret == -EBUSY)
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index acbbca9d5c92..2ff35d55e462 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -258,7 +258,7 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
> ret = dma_resv_trylock(&fbo->base.base._resv);
> WARN_ON(!ret);
>
> - ret = dma_resv_reserve_fences(&fbo->base.base._resv, 1);
> + ret = dma_resv_reserve_fences(&fbo->base.base._resv, TTM_NUM_MOVE_FENCES);
> if (ret) {
> dma_resv_unlock(&fbo->base.base._resv);
> kfree(fbo);
> @@ -646,20 +646,44 @@ static void ttm_bo_move_pipeline_evict(struct ttm_buffer_object *bo,
> {
> struct ttm_device *bdev = bo->bdev;
> struct ttm_resource_manager *from;
> + struct dma_fence *tmp;
> + int i;
>
> from = ttm_manager_type(bdev, bo->resource->mem_type);
>
> /**
> * BO doesn't have a TTM we need to bind/unbind. Just remember
> - * this eviction and free up the allocation
> + * this eviction and free up the allocation.
> + * The fence will be saved in the first free slot or in the slot
> + * already used to store a fence from the same context. Since
> + * drivers can't use more than TTM_NUM_MOVE_FENCES contexts for
> + * evictions, we should always find a slot to use.
> */
> - spin_lock(&from->move_lock);
> - if (!from->move || dma_fence_is_later(fence, from->move)) {
> - dma_fence_put(from->move);
> - from->move = dma_fence_get(fence);
> + spin_lock(&from->eviction_lock);
> + for (i = 0; i < TTM_NUM_MOVE_FENCES; i++) {
> + tmp = from->eviction_fences[i];
> + if (!tmp)
> + break;
> + if (fence->context != tmp->context)
> + continue;
> + if (dma_fence_is_later(fence, tmp)) {
> + dma_fence_put(tmp);
> + break;
> + }
> + goto unlock;
> + }
> + if (i < TTM_NUM_MOVE_FENCES) {
> + from->eviction_fences[i] = dma_fence_get(fence);
> + } else {
> + WARN(1, "not enough fence slots for all fence contexts");
> + spin_unlock(&from->eviction_lock);
> + dma_fence_wait(fence, false);
> + goto end;
> }
> - spin_unlock(&from->move_lock);
>
> +unlock:
> + spin_unlock(&from->eviction_lock);
> +end:
> ttm_resource_free(bo, &bo->resource);
> }
>
> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
> index e2c82ad07eb4..62c34cafa387 100644
> --- a/drivers/gpu/drm/ttm/ttm_resource.c
> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> @@ -523,14 +523,15 @@ void ttm_resource_manager_init(struct ttm_resource_manager *man,
> {
> unsigned i;
>
> - spin_lock_init(&man->move_lock);
> man->bdev = bdev;
> man->size = size;
> man->usage = 0;
>
> for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
> INIT_LIST_HEAD(&man->lru[i]);
> - man->move = NULL;
> + spin_lock_init(&man->eviction_lock);
> + for (i = 0; i < TTM_NUM_MOVE_FENCES; i++)
> + man->eviction_fences[i] = NULL;
> }
> EXPORT_SYMBOL(ttm_resource_manager_init);
>
> @@ -551,7 +552,7 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
> .no_wait_gpu = false,
> };
> struct dma_fence *fence;
> - int ret;
> + int ret, i;
>
> do {
> ret = ttm_bo_evict_first(bdev, man, &ctx);
> @@ -561,18 +562,24 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
> if (ret && ret != -ENOENT)
> return ret;
>
> - spin_lock(&man->move_lock);
> - fence = dma_fence_get(man->move);
> - spin_unlock(&man->move_lock);
> + ret = 0;
>
> - if (fence) {
> - ret = dma_fence_wait(fence, false);
> - dma_fence_put(fence);
> - if (ret)
> - return ret;
> + spin_lock(&man->eviction_lock);
> + for (i = 0; i < TTM_NUM_MOVE_FENCES; i++) {
> + fence = man->eviction_fences[i];
> + if (fence && !dma_fence_is_signaled(fence)) {
> + dma_fence_get(fence);
> + spin_unlock(&man->eviction_lock);
> + ret = dma_fence_wait(fence, false);
> + dma_fence_put(fence);
> + if (ret)
> + return ret;
> + spin_lock(&man->eviction_lock);
> + }
> }
> + spin_unlock(&man->eviction_lock);
>
> - return 0;
> + return ret;
> }
> EXPORT_SYMBOL(ttm_resource_manager_evict_all);
>
> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
> index f49daa504c36..50e6added509 100644
> --- a/include/drm/ttm/ttm_resource.h
> +++ b/include/drm/ttm/ttm_resource.h
> @@ -50,6 +50,15 @@ struct io_mapping;
> struct sg_table;
> struct scatterlist;
>
> +/**
> + * define TTM_NUM_MOVE_FENCES - How many entities can be used for evictions
> + *
> + * Pipelined evictions can be spread on multiple entities. This
> + * is the max number of entities that can be used by the driver
> + * for that purpose.
> + */
> +#define TTM_NUM_MOVE_FENCES 8
> +
> /**
> * enum ttm_lru_item_type - enumerate ttm_lru_item subclasses
> */
> @@ -180,8 +189,8 @@ struct ttm_resource_manager_func {
> * @size: Size of the managed region.
> * @bdev: ttm device this manager belongs to
> * @func: structure pointer implementing the range manager. See above
> - * @move_lock: lock for move fence
> - * @move: The fence of the last pipelined move operation.
> + * @eviction_lock: lock for eviction fences
> + * @eviction_fences: The fences of the last pipelined move operations.
> * @lru: The lru list for this memory type.
> *
> * This structure is used to identify and manage memory types for a device.
> @@ -195,12 +204,12 @@ struct ttm_resource_manager {
> struct ttm_device *bdev;
> uint64_t size;
> const struct ttm_resource_manager_func *func;
> - spinlock_t move_lock;
>
> - /*
> - * Protected by @move_lock.
> + /* This is very similar to a dma_resv object, but locking rules make
> + * it difficult to use one in this context.
> */
> - struct dma_fence *move;
> + spinlock_t eviction_lock;
> + struct dma_fence *eviction_fences[TTM_NUM_MOVE_FENCES];
>
> /*
> * Protected by the bdev->lru_lock.
> @@ -421,8 +430,12 @@ static inline bool ttm_resource_manager_used(struct ttm_resource_manager *man)
> static inline void
> ttm_resource_manager_cleanup(struct ttm_resource_manager *man)
> {
> - dma_fence_put(man->move);
> - man->move = NULL;
> + int i;
> +
> + for (i = 0; i < TTM_NUM_MOVE_FENCES; i++) {
> + dma_fence_put(man->eviction_fences[i]);
> + man->eviction_fences[i] = NULL;
> + }
> }
>
> void ttm_lru_bulk_move_init(struct ttm_lru_bulk_move *bulk);
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 18/28] drm/amdgpu: move sched status check inside amdgpu_ttm_set_buffer_funcs_status
2025-11-21 15:12 ` Pierre-Eric Pelloux-Prayer
@ 2025-11-21 15:23 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 15:23 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Pierre-Eric Pelloux-Prayer,
Alex Deucher, David Airlie, Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 16:12, Pierre-Eric Pelloux-Prayer wrote:
>
>
> Le 21/11/2025 à 16:08, Christian König a écrit :
>>
>>
>> On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> It avoids duplicated code and allows outputting a warning.
>>>
>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 ++++---------
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 ++++++
>>> 2 files changed, 10 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 54f7c81f287b..7167db54d722 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -3309,9 +3309,7 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>>> if (r)
>>> goto init_failed;
>>> - if (adev->mman.buffer_funcs_ring &&
>>> - adev->mman.buffer_funcs_ring->sched.ready)
>>> - amdgpu_ttm_set_buffer_funcs_status(adev, true);
>>> + amdgpu_ttm_set_buffer_funcs_status(adev, true);
>>> /* Don't init kfd if whole hive need to be reset during init */
>>> if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
>>> @@ -4191,8 +4189,7 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
>>> r = amdgpu_device_ip_resume_phase2(adev);
>>> - if (adev->mman.buffer_funcs_ring->sched.ready)
>>> - amdgpu_ttm_set_buffer_funcs_status(adev, true);
>>> + amdgpu_ttm_set_buffer_funcs_status(adev, true);
>>> if (r)
>>> return r;
>>> @@ -5321,8 +5318,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool notify_clients)
>>> return 0;
>>> unwind_evict:
>>> - if (adev->mman.buffer_funcs_ring->sched.ready)
>>> - amdgpu_ttm_set_buffer_funcs_status(adev, true);
>>> + amdgpu_ttm_set_buffer_funcs_status(adev, true);
>>> amdgpu_fence_driver_hw_init(adev);
>>> unwind_userq:
>>> @@ -6050,8 +6046,7 @@ int amdgpu_device_reinit_after_reset(struct amdgpu_reset_context *reset_context)
>>> if (r)
>>> goto out;
>>> - if (tmp_adev->mman.buffer_funcs_ring->sched.ready)
>>> - amdgpu_ttm_set_buffer_funcs_status(tmp_adev, true);
>>> + amdgpu_ttm_set_buffer_funcs_status(tmp_adev, true);
>>> r = amdgpu_device_ip_resume_phase3(tmp_adev);
>>> if (r)
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index 489880b2fb8e..9024dde0c5a7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -2233,6 +2233,12 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
>>> adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
>>> return reserved_windows;
>>> + if ((!adev->mman.buffer_funcs_ring || !adev->mman.buffer_funcs_ring->sched.ready) &&
>>> + enable) {
>>> + dev_warn(adev->dev, "Not enabling DMA transfers for in kernel use");
>>> + return 0;
>>> + }
>>> +
>>
>> Only check that when enabling the functions. Could be that when disabling them we have sched.ready set to false already.
>
> The check already has a "&& enable" condition. Are you suggesting something different?
Ah, missed that. But you have an "if (enable) {" right below it. So I suggest just moving the check in there.
Regards,
Christian.
>
> PE
>
>
>>
>> Apart from that looks good to me.
>>
>> Regards,
>> Christian.
>>
>>> if (enable) {
>>> struct amdgpu_ring *ring;
>>> struct drm_gpu_scheduler *sched;
* Re: [PATCH v3 20/28] drm/amdgpu: allocate multiple clear entities
2025-11-21 10:12 ` [PATCH v3 20/28] drm/amdgpu: allocate multiple clear entities Pierre-Eric Pelloux-Prayer
@ 2025-11-21 15:34 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 15:34 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> No functional change for now, as we always use entity 0.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 51 ++++++++++++++--------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 +-
> 3 files changed, 35 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 2ee48f76483d..56663e82efef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -1323,7 +1323,7 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
> goto out;
>
> r = amdgpu_fill_buffer(adev,
> - &adev->mman.clear_entity, abo, 0, &bo->base._resv,
> + &adev->mman.clear_entities[0], abo, 0, &bo->base._resv,
> &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> if (WARN_ON(r))
> goto out;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 9024dde0c5a7..d7f041e43eca 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -2224,10 +2224,12 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> {
> struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
> u32 used_windows, reserved_windows;
> + u32 num_clear_entities;
> uint64_t size;
> - int r;
> + int r, i, j;
>
> - reserved_windows = 3;
> + num_clear_entities = adev->sdma.num_instances;
Actually each SDMA instance has two ring buffers.
Additionally, using adev->sdma.num_instances is not a good idea here. That should probably rather come from the copy buffer funcs.
Regards,
Christian.
> + reserved_windows = 2 + num_clear_entities;
>
> if (!adev->mman.initialized || amdgpu_in_reset(adev) ||
> adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
> @@ -2250,21 +2252,11 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> 1, NULL);
> if (r) {
> dev_err(adev->dev,
> - "Failed setting up TTM BO move entity (%d)\n",
> + "Failed setting up TTM BO eviction entity (%d)\n",
> r);
> return 0;
> }
>
> - r = drm_sched_entity_init(&adev->mman.clear_entity.base,
> - DRM_SCHED_PRIORITY_NORMAL, &sched,
> - 1, NULL);
> - if (r) {
> - dev_err(adev->dev,
> - "Failed setting up TTM BO clear entity (%d)\n",
> - r);
> - goto error_free_entity;
> - }
> -
> r = drm_sched_entity_init(&adev->mman.move_entity.base,
> DRM_SCHED_PRIORITY_NORMAL, &sched,
> 1, NULL);
> @@ -2272,26 +2264,48 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> dev_err(adev->dev,
> "Failed setting up TTM BO move entity (%d)\n",
> r);
> - drm_sched_entity_destroy(&adev->mman.clear_entity.base);
> goto error_free_entity;
> }
>
> + adev->mman.num_clear_entities = num_clear_entities;
> + adev->mman.clear_entities = kcalloc(num_clear_entities,
> + sizeof(struct amdgpu_ttm_buffer_entity),
> + GFP_KERNEL);
> + if (!adev->mman.clear_entities)
> + goto error_free_entity;
> +
> + for (i = 0; i < num_clear_entities; i++) {
> + r = drm_sched_entity_init(&adev->mman.clear_entities[i].base,
> + DRM_SCHED_PRIORITY_NORMAL, &sched,
> + 1, NULL);
> + if (r) {
> + for (j = 0; j < i; j++)
> + drm_sched_entity_destroy(
> + &adev->mman.clear_entities[j].base);
> + kfree(adev->mman.clear_entities);
> + goto error_free_entity;
> + }
> + }
> +
> /* Statically assign GART windows to each entity. */
> used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.default_entity,
> 0, false, false);
> used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.move_entity,
> used_windows, true, true);
> - used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.clear_entity,
> - used_windows, false, true);
> + for (i = 0; i < num_clear_entities; i++)
> + used_windows = amdgpu_ttm_buffer_entity_init(&adev->mman.clear_entities[i],
> + used_windows, false, true);
> WARN_ON(used_windows != reserved_windows);
> } else {
> drm_sched_entity_destroy(&adev->mman.default_entity.base);
> - drm_sched_entity_destroy(&adev->mman.clear_entity.base);
> drm_sched_entity_destroy(&adev->mman.move_entity.base);
> + for (i = 0; i < num_clear_entities; i++)
> + drm_sched_entity_destroy(&adev->mman.clear_entities[i].base);
> /* Drop all the old fences since re-creating the scheduler entities
> * will allocate new contexts.
> */
> ttm_resource_manager_cleanup(man);
> + kfree(adev->mman.clear_entities);
> }
>
> /* this just adjusts TTM size idea, which sets lpfn to the correct value */
> @@ -2456,8 +2470,7 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
>
> if (!fence)
> return -EINVAL;
> -
> - entity = &adev->mman.clear_entity;
> + entity = &adev->mman.clear_entities[0];
> *fence = dma_fence_get_stub();
>
> amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &cursor);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index 0b3b03f43bab..250ef54a5550 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -72,8 +72,9 @@ struct amdgpu_mman {
> struct mutex gtt_window_lock;
>
> struct amdgpu_ttm_buffer_entity default_entity; /* has no gart windows */
> - struct amdgpu_ttm_buffer_entity clear_entity;
> struct amdgpu_ttm_buffer_entity move_entity;
> + struct amdgpu_ttm_buffer_entity *clear_entities;
> + u32 num_clear_entities;
>
> struct amdgpu_vram_mgr vram_mgr;
> struct amdgpu_gtt_mgr gtt_mgr;
* Re: [PATCH v3 22/28] drm/amdgpu: round robin through clear_entities in amdgpu_fill_buffer
2025-11-21 10:12 ` [PATCH v3 22/28] drm/amdgpu: round robin through clear_entities in amdgpu_fill_buffer Pierre-Eric Pelloux-Prayer
@ 2025-11-21 15:36 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 15:36 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> This lets clears of different BOs run in parallel. Partial jobs to
> clear a single BO still execute sequentially.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 ++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 ++
> 3 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 56663e82efef..7d8d70135cc2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -1322,8 +1322,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
> if (r)
> goto out;
>
> - r = amdgpu_fill_buffer(adev,
> - &adev->mman.clear_entities[0], abo, 0, &bo->base._resv,
> + r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
> + abo, 0, &bo->base._resv,
> &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> if (WARN_ON(r))
> goto out;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 438e8a3b7a06..8d70bea66dd0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -2277,6 +2277,7 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> adev->mman.clear_entities = kcalloc(num_clear_entities,
> sizeof(struct amdgpu_ttm_buffer_entity),
> GFP_KERNEL);
> + atomic_set(&adev->mman.next_clear_entity, 0);
> if (!adev->mman.clear_entities)
> goto error_free_entity;
>
> @@ -2576,6 +2577,17 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
> return r;
> }
>
> +struct amdgpu_ttm_buffer_entity *
> +amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev)
> +{
> + struct amdgpu_mman *mman = &adev->mman;
> + int i;
> +
> + i = atomic_inc_return(&mman->next_clear_entity) %
> + mman->num_clear_entities;
> + return &mman->clear_entities[i];
> +}
> +
> /**
> * amdgpu_ttm_evict_resources - evict memory buffers
> * @adev: amdgpu device object
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index eabc5a1549e9..887531126d9d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -73,6 +73,7 @@ struct amdgpu_mman {
>
> struct amdgpu_ttm_buffer_entity default_entity; /* has no gart windows */
> struct amdgpu_ttm_buffer_entity *clear_entities;
> + atomic_t next_clear_entity;
> u32 num_clear_entities;
> struct amdgpu_ttm_buffer_entity move_entities[TTM_NUM_MOVE_FENCES];
> u32 num_move_entities;
> @@ -189,6 +190,7 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
> struct dma_resv *resv,
> struct dma_fence **f,
> u64 k_job_id);
> +struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev);
>
> int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
> void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
* Re: [PATCH v3 23/28] drm/amdgpu: use TTM_NUM_MOVE_FENCES when reserving fences
2025-11-21 10:12 ` [PATCH v3 23/28] drm/amdgpu: use TTM_NUM_MOVE_FENCES when reserving fences Pierre-Eric Pelloux-Prayer
@ 2025-11-21 15:41 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 15:41 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter, Felix Kuehling, Harry Wentland, Leo Li,
Rodrigo Siqueira
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> Use TTM_NUM_MOVE_FENCES as an upper bound on how many fences
> TTM might need to deal with for moves/evictions.
>
> ---
> v2: removed drm_err calls
> ---
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> Acked-by: Felix Kuehling <felix.kuehling@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 ++---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 6 ++----
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 ++-
> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 +--
> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 6 ++----
> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c | 6 ++----
> 7 files changed, 12 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index d591dce0f3b3..5215238f8fc9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -916,9 +916,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> goto out_free_user_pages;
>
> amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> - /* One fence for TTM and one for each CS job */
> r = drm_exec_prepare_obj(&p->exec, &e->bo->tbo.base,
> - 1 + p->gang_size);
> + TTM_NUM_MOVE_FENCES + p->gang_size);
> drm_exec_retry_on_contention(&p->exec);
> if (unlikely(r))
> goto out_free_user_pages;
> @@ -928,7 +927,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>
> if (p->uf_bo) {
> r = drm_exec_prepare_obj(&p->exec, &p->uf_bo->tbo.base,
> - 1 + p->gang_size);
> + TTM_NUM_MOVE_FENCES + p->gang_size);
> drm_exec_retry_on_contention(&p->exec);
> if (unlikely(r))
> goto out_free_user_pages;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index f2505ae5fd65..41fd84a19d66 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -354,7 +354,7 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj,
>
> drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
> drm_exec_until_all_locked(&exec) {
> - r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 1);
> + r = drm_exec_prepare_obj(&exec, &bo->tbo.base, TTM_NUM_MOVE_FENCES);
This here can stay 1, we just need a single slot for the VM update fence.
> drm_exec_retry_on_contention(&exec);
> if (unlikely(r))
> goto out_unlock;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
> index 79bad9cbe2ab..b92561eea3da 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
> @@ -326,11 +326,9 @@ static int amdgpu_vkms_prepare_fb(struct drm_plane *plane,
> return r;
> }
>
> - r = dma_resv_reserve_fences(rbo->tbo.base.resv, 1);
> - if (r) {
> - dev_err(adev->dev, "allocating fence slot failed (%d)\n", r);
> + r = dma_resv_reserve_fences(rbo->tbo.base.resv, TTM_NUM_MOVE_FENCES);
> + if (r)
> goto error_unlock;
> - }
>
> if (plane->type != DRM_PLANE_TYPE_CURSOR)
> domain = amdgpu_display_supported_domains(adev, rbo->flags);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 5061d5b0f875..62f37a9ca966 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2656,7 +2656,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> }
>
> amdgpu_vm_bo_base_init(&vm->root, vm, root_bo);
> - r = dma_resv_reserve_fences(root_bo->tbo.base.resv, 1);
> + r = dma_resv_reserve_fences(root_bo->tbo.base.resv,
> + TTM_NUM_MOVE_FENCES);
Here we only need a single slot to clear the root PD during init.
With those two removed the patch is Reviewed-by: Christian König <christian.koenig@amd.com>.
Fingers crossed that we haven't missed any.
Regards,
Christian.
> if (r)
> goto error_free_root;
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 97c2270f278f..0f8d85ee97fc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -627,9 +627,8 @@ svm_range_vram_node_new(struct kfd_node *node, struct svm_range *prange,
> }
> }
>
> - r = dma_resv_reserve_fences(bo->tbo.base.resv, 1);
> + r = dma_resv_reserve_fences(bo->tbo.base.resv, TTM_NUM_MOVE_FENCES);
> if (r) {
> - pr_debug("failed %d to reserve bo\n", r);
> amdgpu_bo_unreserve(bo);
> goto reserve_bo_failed;
> }
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> index 56cb866ac6f8..ceb55dd183ed 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> @@ -952,11 +952,9 @@ static int amdgpu_dm_plane_helper_prepare_fb(struct drm_plane *plane,
> return r;
> }
>
> - r = dma_resv_reserve_fences(rbo->tbo.base.resv, 1);
> - if (r) {
> - drm_err(adev_to_drm(adev), "reserving fence slot failed (%d)\n", r);
> + r = dma_resv_reserve_fences(rbo->tbo.base.resv, TTM_NUM_MOVE_FENCES);
> + if (r)
> goto error_unlock;
> - }
>
> if (plane->type != DRM_PLANE_TYPE_CURSOR)
> domain = amdgpu_display_supported_domains(adev, rbo->flags);
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c
> index d9527c05fc87..110f0173eee6 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c
> @@ -106,11 +106,9 @@ static int amdgpu_dm_wb_prepare_job(struct drm_writeback_connector *wb_connector
> return r;
> }
>
> - r = dma_resv_reserve_fences(rbo->tbo.base.resv, 1);
> - if (r) {
> - drm_err(adev_to_drm(adev), "reserving fence slot failed (%d)\n", r);
> + r = dma_resv_reserve_fences(rbo->tbo.base.resv, TTM_NUM_MOVE_FENCES);
> + if (r)
> goto error_unlock;
> - }
>
> domain = amdgpu_display_supported_domains(adev, rbo->flags);
>
* Re: [PATCH v3 25/28] drm/amdgpu: pass all the sdma scheds to amdgpu_mman
2025-11-21 10:12 ` [PATCH v3 25/28] drm/amdgpu: pass all the sdma scheds to amdgpu_mman Pierre-Eric Pelloux-Prayer
@ 2025-11-21 15:52 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 15:52 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter, Felix Kuehling
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> This will allow the use of all of them for clear/fill buffer
> operations.
> Since drm_sched_entity_init requires a scheduler array, we
> store schedulers rather than rings. For the few places that need
> access to a ring, we can get it from the sched using container_of.
>
> Since the code is the same for all sdma versions, add a new
> helper amdgpu_sdma_set_buffer_funcs_scheds to set buffer_funcs_scheds
> based on the number of sdma instances.
>
> Note: the new sched array is identical to the amdgpu_vm_manager one.
> These two could be merged.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> Acked-by: Felix Kuehling <felix.kuehling@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 4 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 31 +++++++++++++++++-----
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 ++-
> drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 3 +--
> drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 3 +--
> drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 3 +--
> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 6 +----
> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 6 +----
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 6 ++---
> drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 6 ++---
> drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +--
> drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 3 +--
> drivers/gpu/drm/amd/amdgpu/si_dma.c | 3 +--
> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++-
> 16 files changed, 47 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index a50e3c0a4b18..d07075fe2d8c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1614,6 +1614,8 @@ ssize_t amdgpu_get_soft_full_reset_mask(struct amdgpu_ring *ring);
> ssize_t amdgpu_show_reset_mask(char *buf, uint32_t supported_reset);
> void amdgpu_sdma_set_vm_pte_scheds(struct amdgpu_device *adev,
> const struct amdgpu_vm_pte_funcs *vm_pte_funcs);
> +void amdgpu_sdma_set_buffer_funcs_scheds(struct amdgpu_device *adev,
> + const struct amdgpu_buffer_funcs *buffer_funcs);
>
> /* atpx handler */
> #if defined(CONFIG_VGA_SWITCHEROO)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 7167db54d722..9d3931d31d96 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4527,7 +4527,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> adev->num_rings = 0;
> RCU_INIT_POINTER(adev->gang_submit, dma_fence_get_stub());
> adev->mman.buffer_funcs = NULL;
> - adev->mman.buffer_funcs_ring = NULL;
> + adev->mman.num_buffer_funcs_scheds = 0;
> adev->vm_manager.vm_pte_funcs = NULL;
> adev->vm_manager.vm_pte_num_scheds = 0;
> adev->gmc.gmc_funcs = NULL;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 0d2784fe0be3..ff9a066870f2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -651,12 +651,14 @@ int amdgpu_gmc_allocate_vm_inv_eng(struct amdgpu_device *adev)
> void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
> uint32_t vmhub, uint32_t flush_type)
> {
> - struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> + struct amdgpu_ring *ring;
> struct amdgpu_vmhub *hub = &adev->vmhub[vmhub];
> struct dma_fence *fence;
> struct amdgpu_job *job;
> int r;
>
> + ring = to_amdgpu_ring(adev->mman.buffer_funcs_scheds[0]);
> +
> if (!hub->sdma_invalidation_workaround || vmid ||
> !adev->mman.buffer_funcs_enabled || !adev->ib_pool_ready ||
> !ring->sched.ready) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 575a4d4a1747..eec0cab8060c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -168,7 +168,7 @@ amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_ttm_buffer_entit
> {
> struct amdgpu_ring *ring;
>
> - ring = adev->mman.buffer_funcs_ring;
> + ring = to_amdgpu_ring(adev->mman.buffer_funcs_scheds[0]);
> amdgpu_ring_pad_ib(ring, &job->ibs[0]);
> WARN_ON(job->ibs[0].length_dw > num_dw);
>
> @@ -2241,18 +2241,16 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
> adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
> return reserved_windows;
>
> - if ((!adev->mman.buffer_funcs_ring || !adev->mman.buffer_funcs_ring->sched.ready) &&
> + if ((!adev->mman.num_buffer_funcs_scheds || !adev->mman.buffer_funcs_scheds[0]->ready) &&
> enable) {
> dev_warn(adev->dev, "Not enabling DMA transfers for in kernel use");
> return 0;
> }
>
> if (enable) {
> - struct amdgpu_ring *ring;
> struct drm_gpu_scheduler *sched;
>
> - ring = adev->mman.buffer_funcs_ring;
> - sched = &ring->sched;
> + sched = adev->mman.buffer_funcs_scheds[0];
> r = drm_sched_entity_init(&adev->mman.default_entity.base,
> DRM_SCHED_PRIORITY_KERNEL, &sched,
> 1, NULL);
> @@ -2387,7 +2385,7 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev,
> unsigned int i;
> int r;
>
> - ring = adev->mman.buffer_funcs_ring;
> + ring = to_amdgpu_ring(adev->mman.buffer_funcs_scheds[0]);
>
> if (!ring->sched.ready) {
> dev_err(adev->dev,
> @@ -2624,6 +2622,27 @@ int amdgpu_ttm_evict_resources(struct amdgpu_device *adev, int mem_type)
> return ttm_resource_manager_evict_all(&adev->mman.bdev, man);
> }
>
> +void amdgpu_sdma_set_buffer_funcs_scheds(struct amdgpu_device *adev,
> + const struct amdgpu_buffer_funcs *buffer_funcs)
> +{
> + struct amdgpu_vmhub *hub = &adev->vmhub[AMDGPU_GFXHUB(0)];
> + struct drm_gpu_scheduler *sched;
> + int i;
> +
> + adev->mman.buffer_funcs = buffer_funcs;
> +
> + for (i = 0; i < adev->sdma.num_instances; i++) {
> + if (adev->sdma.has_page_queue)
> + sched = &adev->sdma.instance[i].page.sched;
> + else
> + sched = &adev->sdma.instance[i].ring.sched;
> + adev->mman.buffer_funcs_scheds[i] = sched;
> + }
> +
> + adev->mman.num_buffer_funcs_scheds = hub->sdma_invalidation_workaround ?
> + 1 : adev->sdma.num_instances;
That won't work: hub->sdma_invalidation_workaround is only initialized after this function is called.
We need to check it when the TTM functions are enabled instead.
Regards,
Christian.
> +}
> +
> #if defined(CONFIG_DEBUG_FS)
>
> static int amdgpu_ttm_page_pool_show(struct seq_file *m, void *unused)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index 0785a2c594f7..653a4d17543e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -66,7 +66,8 @@ struct amdgpu_mman {
>
> /* buffer handling */
> const struct amdgpu_buffer_funcs *buffer_funcs;
> - struct amdgpu_ring *buffer_funcs_ring;
> + struct drm_gpu_scheduler *buffer_funcs_scheds[AMDGPU_MAX_RINGS];
> + u32 num_buffer_funcs_scheds;
> bool buffer_funcs_enabled;
>
> struct mutex gtt_window_lock;
> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> index 22780c09177d..26276dcfd458 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> @@ -1340,8 +1340,7 @@ static const struct amdgpu_buffer_funcs cik_sdma_buffer_funcs = {
>
> static void cik_sdma_set_buffer_funcs(struct amdgpu_device *adev)
> {
> - adev->mman.buffer_funcs = &cik_sdma_buffer_funcs;
> - adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> + amdgpu_sdma_set_buffer_funcs_scheds(adev, &cik_sdma_buffer_funcs);
> }
>
> const struct amdgpu_ip_block_version cik_sdma_ip_block =
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> index 0090ace49024..c6a059ca59e5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> @@ -1235,8 +1235,7 @@ static const struct amdgpu_buffer_funcs sdma_v2_4_buffer_funcs = {
>
> static void sdma_v2_4_set_buffer_funcs(struct amdgpu_device *adev)
> {
> - adev->mman.buffer_funcs = &sdma_v2_4_buffer_funcs;
> - adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> + amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v2_4_buffer_funcs);
> }
>
> const struct amdgpu_ip_block_version sdma_v2_4_ip_block = {
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> index 2526d393162a..cb516a25210d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> @@ -1677,8 +1677,7 @@ static const struct amdgpu_buffer_funcs sdma_v3_0_buffer_funcs = {
>
> static void sdma_v3_0_set_buffer_funcs(struct amdgpu_device *adev)
> {
> - adev->mman.buffer_funcs = &sdma_v3_0_buffer_funcs;
> - adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> + amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v3_0_buffer_funcs);
> }
>
> const struct amdgpu_ip_block_version sdma_v3_0_ip_block =
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index a35d9951e22a..f234ee54f39e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -2615,11 +2615,7 @@ static const struct amdgpu_buffer_funcs sdma_v4_0_buffer_funcs = {
>
> static void sdma_v4_0_set_buffer_funcs(struct amdgpu_device *adev)
> {
> - adev->mman.buffer_funcs = &sdma_v4_0_buffer_funcs;
> - if (adev->sdma.has_page_queue)
> - adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].page;
> - else
> - adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> + amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v4_0_buffer_funcs);
> }
>
> static void sdma_v4_0_get_ras_error_count(uint32_t value,
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> index 7f77367848d4..cd7627b03066 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> @@ -2316,11 +2316,7 @@ static const struct amdgpu_buffer_funcs sdma_v4_4_2_buffer_funcs = {
>
> static void sdma_v4_4_2_set_buffer_funcs(struct amdgpu_device *adev)
> {
> - adev->mman.buffer_funcs = &sdma_v4_4_2_buffer_funcs;
> - if (adev->sdma.has_page_queue)
> - adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].page;
> - else
> - adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> + amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v4_4_2_buffer_funcs);
> }
>
> /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> index 7ce13c5d4e61..5b495fda4f71 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> @@ -2073,10 +2073,8 @@ static const struct amdgpu_buffer_funcs sdma_v5_0_buffer_funcs = {
>
> static void sdma_v5_0_set_buffer_funcs(struct amdgpu_device *adev)
> {
> - if (adev->mman.buffer_funcs == NULL) {
> - adev->mman.buffer_funcs = &sdma_v5_0_buffer_funcs;
> - adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> - }
> + if (adev->mman.buffer_funcs == NULL)
> + amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v5_0_buffer_funcs);
> }
>
> const struct amdgpu_ip_block_version sdma_v5_0_ip_block = {
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> index 98beff18cf28..be2d9e57c459 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> @@ -2084,10 +2084,8 @@ static const struct amdgpu_buffer_funcs sdma_v5_2_buffer_funcs = {
>
> static void sdma_v5_2_set_buffer_funcs(struct amdgpu_device *adev)
> {
> - if (adev->mman.buffer_funcs == NULL) {
> - adev->mman.buffer_funcs = &sdma_v5_2_buffer_funcs;
> - adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> - }
> + if (adev->mman.buffer_funcs == NULL)
> + amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v5_2_buffer_funcs);
> }
>
> const struct amdgpu_ip_block_version sdma_v5_2_ip_block = {
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> index c32331b72ba0..ed8937fe76ea 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> @@ -1891,8 +1891,7 @@ static const struct amdgpu_buffer_funcs sdma_v6_0_buffer_funcs = {
>
> static void sdma_v6_0_set_buffer_funcs(struct amdgpu_device *adev)
> {
> - adev->mman.buffer_funcs = &sdma_v6_0_buffer_funcs;
> - adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> + amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v6_0_buffer_funcs);
> }
>
> const struct amdgpu_ip_block_version sdma_v6_0_ip_block = {
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
> index 9318d23eb71e..f4c91153542c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
> @@ -1833,8 +1833,7 @@ static const struct amdgpu_buffer_funcs sdma_v7_0_buffer_funcs = {
>
> static void sdma_v7_0_set_buffer_funcs(struct amdgpu_device *adev)
> {
> - adev->mman.buffer_funcs = &sdma_v7_0_buffer_funcs;
> - adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> + amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v7_0_buffer_funcs);
> }
>
> const struct amdgpu_ip_block_version sdma_v7_0_ip_block = {
> diff --git a/drivers/gpu/drm/amd/amdgpu/si_dma.c b/drivers/gpu/drm/amd/amdgpu/si_dma.c
> index b85df997ed49..ac6272fcffe9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/si_dma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/si_dma.c
> @@ -833,8 +833,7 @@ static const struct amdgpu_buffer_funcs si_dma_buffer_funcs = {
>
> static void si_dma_set_buffer_funcs(struct amdgpu_device *adev)
> {
> - adev->mman.buffer_funcs = &si_dma_buffer_funcs;
> - adev->mman.buffer_funcs_ring = &adev->sdma.instance[0].ring;
> + amdgpu_sdma_set_buffer_funcs_scheds(adev, &si_dma_buffer_funcs);
> }
>
> const struct amdgpu_ip_block_version si_dma_ip_block =
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index 5dd65f05a1e0..a149265e3611 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -128,13 +128,14 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
> struct dma_fence **mfence)
> {
> const u64 GTT_MAX_PAGES = AMDGPU_GTT_MAX_TRANSFER_SIZE;
> - struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> + struct amdgpu_ring *ring;
> struct amdgpu_ttm_buffer_entity *entity;
> u64 gart_s, gart_d;
> struct dma_fence *next;
> u64 size;
> int r;
>
> + ring = to_amdgpu_ring(adev->mman.buffer_funcs_scheds[0]);
> entity = &adev->mman.move_entities[0];
>
> mutex_lock(&entity->lock);
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 27/28] drm/amdgpu: get rid of amdgpu_ttm_clear_buffer
2025-11-21 10:12 ` [PATCH v3 27/28] drm/amdgpu: get rid of amdgpu_ttm_clear_buffer Pierre-Eric Pelloux-Prayer
@ 2025-11-21 15:54 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-21 15:54 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter, Sumit Semwal
Cc: amd-gfx, dri-devel, linux-kernel, linux-media, linaro-mm-sig
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> It's doing the same thing as amdgpu_fill_buffer(src_data=0), so drop it.
>
> The only caveat is that the amdgpu_res_cleared() return value is only valid
> right after allocation.
>
> ---
> v2: introduce new "bool consider_clear_status" arg
> ---
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
It would be better to have that earlier in the patch set, but I guess that gives you rebasing problems?
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 90 +++++-----------------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 7 +-
> 3 files changed, 33 insertions(+), 80 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 7d8d70135cc2..dccc31d0128e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -725,13 +725,17 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
> bo->tbo.resource->mem_type == TTM_PL_VRAM) {
> struct dma_fence *fence;
>
> - r = amdgpu_ttm_clear_buffer(adev, bo, bo->tbo.base.resv, &fence);
> + r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
> + bo, 0, NULL, &fence,
> + true, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
> if (unlikely(r))
> goto fail_unreserve;
>
> - dma_resv_add_fence(bo->tbo.base.resv, fence,
> - DMA_RESV_USAGE_KERNEL);
> - dma_fence_put(fence);
> + if (fence) {
> + dma_resv_add_fence(bo->tbo.base.resv, fence,
> + DMA_RESV_USAGE_KERNEL);
> + dma_fence_put(fence);
> + }
> }
> if (!bp->resv)
> amdgpu_bo_unreserve(bo);
> @@ -1323,8 +1327,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
> goto out;
>
> r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
> - abo, 0, &bo->base._resv,
> - &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> + abo, 0, &bo->base._resv, &fence,
> + false, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> if (WARN_ON(r))
> goto out;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 39cfe2dbdf03..c65c411ce26e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -459,7 +459,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
>
> r = amdgpu_fill_buffer(adev, entity,
> abo, 0, NULL, &wipe_fence,
> - AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
> + false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
> if (r) {
> goto error;
> } else if (wipe_fence) {
> @@ -2459,79 +2459,28 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
> }
>
> /**
> - * amdgpu_ttm_clear_buffer - clear memory buffers
> + * amdgpu_fill_buffer - fill a buffer with a given value
> * @adev: amdgpu device object
> - * @bo: amdgpu buffer object
> - * @resv: reservation object
> - * @fence: dma_fence associated with the operation
> + * @entity: optional entity to use. If NULL, the clearing entities will be
> + * used to load-balance the partial clears
> + * @bo: the bo to fill
> + * @src_data: the value to set
> + * @resv: fences contained in this reservation will be used as dependencies.
> + * @out_fence: the fence from the last clear will be stored here. It might be
> + * NULL if no job was run.
> + * @consider_clear_status: true if regions reported as cleared by amdgpu_res_cleared()
> + * are skipped.
> + * @k_job_id: trace id
> *
> - * Clear the memory buffer resource.
> - *
> - * Returns:
> - * 0 for success or a negative error code on failure.
> */
> -int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
> - struct amdgpu_bo *bo,
> - struct dma_resv *resv,
> - struct dma_fence **fence)
> -{
> - struct amdgpu_ttm_buffer_entity *entity;
> - struct amdgpu_res_cursor cursor;
> - u64 addr;
> - int r = 0;
> -
> - if (!adev->mman.buffer_funcs_enabled)
> - return -EINVAL;
> -
> - if (!fence)
> - return -EINVAL;
> - entity = &adev->mman.clear_entities[0];
> - *fence = dma_fence_get_stub();
> -
> - amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &cursor);
> -
> - mutex_lock(&entity->lock);
> - while (cursor.remaining) {
> - struct dma_fence *next = NULL;
> - u64 size;
> -
> - if (amdgpu_res_cleared(&cursor)) {
> - amdgpu_res_next(&cursor, cursor.size);
> - continue;
> - }
> -
> - /* Never clear more than 256MiB at once to avoid timeouts */
> - size = min(cursor.size, 256ULL << 20);
> -
> - r = amdgpu_ttm_map_buffer(adev, entity,
> - &bo->tbo, bo->tbo.resource, &cursor,
> - 1, false, false, &size, &addr);
> - if (r)
> - goto err;
> -
> - r = amdgpu_ttm_fill_mem(adev, entity, 0, addr, size, resv,
> - &next, true,
> - AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
> - if (r)
> - goto err;
> -
> - dma_fence_put(*fence);
> - *fence = next;
> -
> - amdgpu_res_next(&cursor, size);
> - }
> -err:
> - mutex_unlock(&entity->lock);
> -
> - return r;
> -}
> -
> int amdgpu_fill_buffer(struct amdgpu_device *adev,
> struct amdgpu_ttm_buffer_entity *entity,
> struct amdgpu_bo *bo,
> uint32_t src_data,
> struct dma_resv *resv,
> - struct dma_fence **f,
> + struct dma_fence **out_fence,
> + bool consider_clear_status,
> u64 k_job_id)
> {
> struct dma_fence *fence = NULL;
> @@ -2551,6 +2500,11 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
> struct dma_fence *next;
> uint64_t cur_size, to;
>
> + if (consider_clear_status && amdgpu_res_cleared(&dst)) {
> + amdgpu_res_next(&dst, dst.size);
> + continue;
> + }
> +
> /* Never fill more than 256MiB at once to avoid timeouts */
> cur_size = min(dst.size, 256ULL << 20);
>
> @@ -2574,9 +2528,7 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
> }
> error:
> mutex_unlock(&entity->lock);
> - if (f)
> - *f = dma_fence_get(fence);
> - dma_fence_put(fence);
> + *out_fence = fence;
> return r;
> }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index 653a4d17543e..f3bdbcec9afc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -181,16 +181,13 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev,
> struct dma_resv *resv,
> struct dma_fence **fence,
> bool vm_needs_flush, uint32_t copy_flags);
> -int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
> - struct amdgpu_bo *bo,
> - struct dma_resv *resv,
> - struct dma_fence **fence);
> int amdgpu_fill_buffer(struct amdgpu_device *adev,
> struct amdgpu_ttm_buffer_entity *entity,
> struct amdgpu_bo *bo,
> uint32_t src_data,
> struct dma_resv *resv,
> - struct dma_fence **f,
> + struct dma_fence **out_fence,
> + bool consider_clear_status,
> u64 k_job_id);
> struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev);
>
* Re: [PATCH v3 10/28] drm/amdgpu: add amdgpu_device argument to ttm functions that need it
2025-11-21 13:24 ` Christian König
@ 2025-11-21 16:09 ` Pierre-Eric Pelloux-Prayer
2025-11-24 8:42 ` Christian König
0 siblings, 1 reply; 60+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-11-21 16:09 UTC (permalink / raw)
To: Christian König, Pierre-Eric Pelloux-Prayer, Alex Deucher,
David Airlie, Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 14:24, Christian König wrote:
> On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
>> Instead of getting it through amdgpu_ttm_adev(bo->tbo.bdev).
>
> Why should that be a good idea?
IMO explicit parameters are clearer than implicit ones, so if these functions
depend on adev, they might as well take it as an argument instead of fishing it
out of one of their other arguments.
But if you prefer to keep the existing code I can drop this patch.
Pierre-Eric
>
> Regards,
> Christian.
>
>>
>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 +++--
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 11 ++++++-----
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 6 ++++--
>> 3 files changed, 13 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> index 858eb9fa061b..2ee48f76483d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> @@ -725,7 +725,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>> bo->tbo.resource->mem_type == TTM_PL_VRAM) {
>> struct dma_fence *fence;
>>
>> - r = amdgpu_ttm_clear_buffer(bo, bo->tbo.base.resv, &fence);
>> + r = amdgpu_ttm_clear_buffer(adev, bo, bo->tbo.base.resv, &fence);
>> if (unlikely(r))
>> goto fail_unreserve;
>>
>> @@ -1322,7 +1322,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
>> if (r)
>> goto out;
>>
>> - r = amdgpu_fill_buffer(&adev->mman.clear_entity, abo, 0, &bo->base._resv,
>> + r = amdgpu_fill_buffer(adev,
>> + &adev->mman.clear_entity, abo, 0, &bo->base._resv,
>> &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
>> if (WARN_ON(r))
>> goto out;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index 1d3afad885da..57dff2df433b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -414,7 +414,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
>> (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
>> struct dma_fence *wipe_fence = NULL;
>>
>> - r = amdgpu_fill_buffer(&adev->mman.move_entity,
>> + r = amdgpu_fill_buffer(adev, &adev->mman.move_entity,
>> abo, 0, NULL, &wipe_fence,
>> AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
>> if (r) {
>> @@ -2350,6 +2350,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
>>
>> /**
>> * amdgpu_ttm_clear_buffer - clear memory buffers
>> + * @adev: amdgpu device object
>> * @bo: amdgpu buffer object
>> * @resv: reservation object
>> * @fence: dma_fence associated with the operation
>> @@ -2359,11 +2360,11 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
>> * Returns:
>> * 0 for success or a negative error code on failure.
>> */
>> -int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>> +int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
>> + struct amdgpu_bo *bo,
>> struct dma_resv *resv,
>> struct dma_fence **fence)
>> {
>> - struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>> struct amdgpu_res_cursor cursor;
>> u64 addr;
>> int r = 0;
>> @@ -2414,14 +2415,14 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>> return r;
>> }
>>
>> -int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
>> +int amdgpu_fill_buffer(struct amdgpu_device *adev,
>> + struct amdgpu_ttm_buffer_entity *entity,
>> struct amdgpu_bo *bo,
>> uint32_t src_data,
>> struct dma_resv *resv,
>> struct dma_fence **f,
>> u64 k_job_id)
>> {
>> - struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>> struct dma_fence *fence = NULL;
>> struct amdgpu_res_cursor dst;
>> int r;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>> index 9288599c9c46..d0f55a7edd30 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>> @@ -174,10 +174,12 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev,
>> struct dma_resv *resv,
>> struct dma_fence **fence,
>> bool vm_needs_flush, uint32_t copy_flags);
>> -int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>> +int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
>> + struct amdgpu_bo *bo,
>> struct dma_resv *resv,
>> struct dma_fence **fence);
>> -int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
>> +int amdgpu_fill_buffer(struct amdgpu_device *adev,
>> + struct amdgpu_ttm_buffer_entity *entity,
>> struct amdgpu_bo *bo,
>> uint32_t src_data,
>> struct dma_resv *resv,
* Re: [PATCH v3 01/28] drm/amdgpu: give each kernel job a unique id
2025-11-21 10:12 ` [PATCH v3 01/28] drm/amdgpu: give each kernel job a unique id Pierre-Eric Pelloux-Prayer
2025-11-21 12:51 ` Christian König
@ 2025-11-21 20:02 ` Felix Kuehling
1 sibling, 0 replies; 60+ messages in thread
From: Felix Kuehling @ 2025-11-21 20:02 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, Christian König,
David Airlie, Simona Vetter
Cc: Arunpravin Paneer Selvam, amd-gfx, dri-devel, linux-kernel
Patches 1, 3, 4, 5, 9, 11, 21, 23, 25 (the ones with KFD changes) are
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
On 2025-11-21 05:12, Pierre-Eric Pelloux-Prayer wrote:
> Userspace jobs use drm_file.client_id as a unique identifier of the
> job's owner. For kernel jobs, we can allocate arbitrary values - the
> risk of overlap with userspace ids is small (given that it's a u64
> value). In the unlikely case an overlap happens, it only impacts
> trace events.
>
> Since this ID is traced in the gpu_scheduler trace events, this makes it
> possible to determine the source of each job sent to the hardware.
>
> To make grepping easier, the IDs are defined as they will appear
> in the trace output.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> Acked-by: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
> Link: https://lore.kernel.org/r/20250604122827.2191-1-pierre-eric.pelloux-prayer@amd.com
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 19 +++++++++++++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 28 +++++++++++++--------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 5 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 8 +++---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 +++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c | 4 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 4 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 12 +++++----
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 6 +++--
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 6 +++--
> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++-
> 19 files changed, 84 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 5a1904b0b064..1ffbd416a8ad 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -1551,7 +1551,8 @@ static int amdgpu_gfx_run_cleaner_shader_job(struct amdgpu_ring *ring)
> owner = (void *)(unsigned long)atomic_inc_return(&counter);
>
> r = amdgpu_job_alloc_with_ib(ring->adev, &entity, owner,
> - 64, 0, &job);
> + 64, 0, &job,
> + AMDGPU_KERNEL_JOB_ID_CLEANER_SHADER);
> if (r)
> goto err;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 0017bd10d452..ea8ec160b98a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -690,7 +690,7 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
> r = amdgpu_job_alloc_with_ib(ring->adev, &adev->mman.high_pr,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> 16 * 4, AMDGPU_IB_POOL_IMMEDIATE,
> - &job);
> + &job, AMDGPU_KERNEL_JOB_ID_FLUSH_GPU_TLB);
> if (r)
> goto error_alloc;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index efa3281145f6..b284bd8021df 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -232,11 +232,12 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
> struct drm_sched_entity *entity, void *owner,
> size_t size, enum amdgpu_ib_pool_type pool_type,
> - struct amdgpu_job **job)
> + struct amdgpu_job **job, u64 k_job_id)
> {
> int r;
>
> - r = amdgpu_job_alloc(adev, NULL, entity, owner, 1, job, 0);
> + r = amdgpu_job_alloc(adev, NULL, entity, owner, 1, job,
> + k_job_id);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> index d25f1fcf0242..7abf069d17d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> @@ -44,6 +44,22 @@
> struct amdgpu_fence;
> enum amdgpu_ib_pool_type;
>
> +/* Internal kernel job ids. (decreasing values, starting from U64_MAX). */
> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE (18446744073709551615ULL)
> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_PDES (18446744073709551614ULL)
> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_RANGE (18446744073709551613ULL)
> +#define AMDGPU_KERNEL_JOB_ID_VM_PT_CLEAR (18446744073709551612ULL)
> +#define AMDGPU_KERNEL_JOB_ID_TTM_MAP_BUFFER (18446744073709551611ULL)
> +#define AMDGPU_KERNEL_JOB_ID_TTM_ACCESS_MEMORY_SDMA (18446744073709551610ULL)
> +#define AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER (18446744073709551609ULL)
> +#define AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE (18446744073709551608ULL)
> +#define AMDGPU_KERNEL_JOB_ID_MOVE_BLIT (18446744073709551607ULL)
> +#define AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER (18446744073709551606ULL)
> +#define AMDGPU_KERNEL_JOB_ID_CLEANER_SHADER (18446744073709551605ULL)
> +#define AMDGPU_KERNEL_JOB_ID_FLUSH_GPU_TLB (18446744073709551604ULL)
> +#define AMDGPU_KERNEL_JOB_ID_KFD_GART_MAP (18446744073709551603ULL)
> +#define AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST (18446744073709551602ULL)
> +
> struct amdgpu_job {
> struct drm_sched_job base;
> struct amdgpu_vm *vm;
> @@ -97,7 +113,8 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
> struct drm_sched_entity *entity, void *owner,
> size_t size, enum amdgpu_ib_pool_type pool_type,
> - struct amdgpu_job **job);
> + struct amdgpu_job **job,
> + u64 k_job_id);
> void amdgpu_job_set_resources(struct amdgpu_job *job, struct amdgpu_bo *gds,
> struct amdgpu_bo *gws, struct amdgpu_bo *oa);
> void amdgpu_job_free_resources(struct amdgpu_job *job);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
> index 91678621f1ff..63ee6ba6a931 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
> @@ -196,7 +196,8 @@ static int amdgpu_jpeg_dec_set_reg(struct amdgpu_ring *ring, uint32_t handle,
> int i, r;
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
> - AMDGPU_IB_POOL_DIRECT, &job);
> + AMDGPU_IB_POOL_DIRECT, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 24ebba43a469..926a3f09a776 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -1322,7 +1322,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
> if (r)
> goto out;
>
> - r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true);
> + r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true,
> + AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> if (WARN_ON(r))
> goto out;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 04a79ef05f90..6a1434391fb8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -226,7 +226,8 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
> r = amdgpu_job_alloc_with_ib(adev, &adev->mman.high_pr,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> num_dw * 4 + num_bytes,
> - AMDGPU_IB_POOL_DELAYED, &job);
> + AMDGPU_IB_POOL_DELAYED, &job,
> + AMDGPU_KERNEL_JOB_ID_TTM_MAP_BUFFER);
> if (r)
> return r;
>
> @@ -398,7 +399,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
> struct dma_fence *wipe_fence = NULL;
>
> r = amdgpu_fill_buffer(abo, 0, NULL, &wipe_fence,
> - false);
> + false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
> if (r) {
> goto error;
> } else if (wipe_fence) {
> @@ -1480,7 +1481,8 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
> r = amdgpu_job_alloc_with_ib(adev, &adev->mman.high_pr,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> num_dw * 4, AMDGPU_IB_POOL_DELAYED,
> - &job);
> + &job,
> + AMDGPU_KERNEL_JOB_ID_TTM_ACCESS_MEMORY_SDMA);
> if (r)
> goto out;
>
> @@ -2204,7 +2206,7 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
> struct dma_resv *resv,
> bool vm_needs_flush,
> struct amdgpu_job **job,
> - bool delayed)
> + bool delayed, u64 k_job_id)
> {
> enum amdgpu_ib_pool_type pool = direct_submit ?
> AMDGPU_IB_POOL_DIRECT :
> @@ -2214,7 +2216,7 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
> &adev->mman.high_pr;
> r = amdgpu_job_alloc_with_ib(adev, entity,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> - num_dw * 4, pool, job);
> + num_dw * 4, pool, job, k_job_id);
> if (r)
> return r;
>
> @@ -2254,7 +2256,8 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
> num_loops = DIV_ROUND_UP(byte_count, max_bytes);
> num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8);
> r = amdgpu_ttm_prepare_job(adev, direct_submit, num_dw,
> - resv, vm_needs_flush, &job, false);
> + resv, vm_needs_flush, &job, false,
> + AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER);
> if (r)
> return r;
>
> @@ -2289,7 +2292,8 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data,
> uint64_t dst_addr, uint32_t byte_count,
> struct dma_resv *resv,
> struct dma_fence **fence,
> - bool vm_needs_flush, bool delayed)
> + bool vm_needs_flush, bool delayed,
> + u64 k_job_id)
> {
> struct amdgpu_device *adev = ring->adev;
> unsigned int num_loops, num_dw;
> @@ -2302,7 +2306,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data,
> num_loops = DIV_ROUND_UP_ULL(byte_count, max_bytes);
> num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->fill_num_dw, 8);
> r = amdgpu_ttm_prepare_job(adev, false, num_dw, resv, vm_needs_flush,
> - &job, delayed);
> + &job, delayed, k_job_id);
> if (r)
> return r;
>
> @@ -2372,7 +2376,8 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
> goto err;
>
> r = amdgpu_ttm_fill_mem(ring, 0, addr, size, resv,
> - &next, true, true);
> + &next, true, true,
> + AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
> if (r)
> goto err;
>
> @@ -2391,7 +2396,8 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
> uint32_t src_data,
> struct dma_resv *resv,
> struct dma_fence **f,
> - bool delayed)
> + bool delayed,
> + u64 k_job_id)
> {
> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> @@ -2421,7 +2427,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
> goto error;
>
> r = amdgpu_ttm_fill_mem(ring, src_data, to, cur_size, resv,
> - &next, true, delayed);
> + &next, true, delayed, k_job_id);
> if (r)
> goto error;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index 054d48823d5f..577ee04ce0bf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -175,7 +175,8 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
> uint32_t src_data,
> struct dma_resv *resv,
> struct dma_fence **fence,
> - bool delayed);
> + bool delayed,
> + u64 k_job_id);
>
> int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
> void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> index 74758b5ffc6c..5c38f0d30c87 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> @@ -1136,7 +1136,8 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring *ring, struct amdgpu_bo *bo,
> r = amdgpu_job_alloc_with_ib(ring->adev, &adev->uvd.entity,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> 64, direct ? AMDGPU_IB_POOL_DIRECT :
> - AMDGPU_IB_POOL_DELAYED, &job);
> + AMDGPU_IB_POOL_DELAYED, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index 709ca369cb52..a7d8f1ce6ac2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -491,7 +491,7 @@ static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
> r = amdgpu_job_alloc_with_ib(ring->adev, &ring->adev->vce.entity,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> ib_size_dw * 4, AMDGPU_IB_POOL_DIRECT,
> - &job);
> + &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> @@ -582,7 +582,8 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> ib_size_dw * 4,
> direct ? AMDGPU_IB_POOL_DIRECT :
> - AMDGPU_IB_POOL_DELAYED, &job);
> + AMDGPU_IB_POOL_DELAYED, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> index 5ae7cc0d5f57..5e0786ea911b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> @@ -626,7 +626,7 @@ static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring *ring,
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL,
> 64, AMDGPU_IB_POOL_DIRECT,
> - &job);
> + &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> goto err;
>
> @@ -806,7 +806,7 @@ static int amdgpu_vcn_dec_sw_send_msg(struct amdgpu_ring *ring,
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL,
> ib_size_dw * 4, AMDGPU_IB_POOL_DIRECT,
> - &job);
> + &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> goto err;
>
> @@ -936,7 +936,7 @@ static int amdgpu_vcn_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t hand
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL,
> ib_size_dw * 4, AMDGPU_IB_POOL_DIRECT,
> - &job);
> + &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> @@ -1003,7 +1003,7 @@ static int amdgpu_vcn_enc_get_destroy_msg(struct amdgpu_ring *ring, uint32_t han
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL,
> ib_size_dw * 4, AMDGPU_IB_POOL_DIRECT,
> - &job);
> + &job, AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index e2587eea6c4a..193de267984e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -989,7 +989,8 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
> params.vm = vm;
> params.immediate = immediate;
>
> - r = vm->update_funcs->prepare(&params, NULL);
> + r = vm->update_funcs->prepare(&params, NULL,
> + AMDGPU_KERNEL_JOB_ID_VM_UPDATE_PDES);
> if (r)
> goto error;
>
> @@ -1158,7 +1159,8 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> dma_fence_put(tmp);
> }
>
> - r = vm->update_funcs->prepare(&params, sync);
> + r = vm->update_funcs->prepare(&params, sync,
> + AMDGPU_KERNEL_JOB_ID_VM_UPDATE_RANGE);
> if (r)
> goto error_free;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 330e4bdea387..139642eacdd0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -310,7 +310,7 @@ struct amdgpu_vm_update_params {
> struct amdgpu_vm_update_funcs {
> int (*map_table)(struct amdgpu_bo_vm *bo);
> int (*prepare)(struct amdgpu_vm_update_params *p,
> - struct amdgpu_sync *sync);
> + struct amdgpu_sync *sync, u64 k_job_id);
> int (*update)(struct amdgpu_vm_update_params *p,
> struct amdgpu_bo_vm *bo, uint64_t pe, uint64_t addr,
> unsigned count, uint32_t incr, uint64_t flags);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> index 0c1ef5850a5e..22e2e5b47341 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> @@ -40,12 +40,14 @@ static int amdgpu_vm_cpu_map_table(struct amdgpu_bo_vm *table)
> *
> * @p: see amdgpu_vm_update_params definition
> * @sync: sync obj with fences to wait on
> + * @k_job_id: the id for tracing/debug purposes
> *
> * Returns:
> * Negativ errno, 0 for success.
> */
> static int amdgpu_vm_cpu_prepare(struct amdgpu_vm_update_params *p,
> - struct amdgpu_sync *sync)
> + struct amdgpu_sync *sync,
> + u64 k_job_id)
> {
> if (!sync)
> return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> index f6ffc207ec2a..c7a7d51080a8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> @@ -26,6 +26,7 @@
> #include "amdgpu.h"
> #include "amdgpu_trace.h"
> #include "amdgpu_vm.h"
> +#include "amdgpu_job.h"
>
> /*
> * amdgpu_vm_pt_cursor - state for for_each_amdgpu_vm_pt
> @@ -396,7 +397,8 @@ int amdgpu_vm_pt_clear(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> params.vm = vm;
> params.immediate = immediate;
>
> - r = vm->update_funcs->prepare(&params, NULL);
> + r = vm->update_funcs->prepare(&params, NULL,
> + AMDGPU_KERNEL_JOB_ID_VM_PT_CLEAR);
> if (r)
> goto exit;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> index 46d9fb433ab2..36805dcfa159 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> @@ -40,7 +40,7 @@ static int amdgpu_vm_sdma_map_table(struct amdgpu_bo_vm *table)
>
> /* Allocate a new job for @count PTE updates */
> static int amdgpu_vm_sdma_alloc_job(struct amdgpu_vm_update_params *p,
> - unsigned int count)
> + unsigned int count, u64 k_job_id)
> {
> enum amdgpu_ib_pool_type pool = p->immediate ? AMDGPU_IB_POOL_IMMEDIATE
> : AMDGPU_IB_POOL_DELAYED;
> @@ -56,7 +56,7 @@ static int amdgpu_vm_sdma_alloc_job(struct amdgpu_vm_update_params *p,
> ndw = min(ndw, AMDGPU_VM_SDMA_MAX_NUM_DW);
>
> r = amdgpu_job_alloc_with_ib(p->adev, entity, AMDGPU_FENCE_OWNER_VM,
> - ndw * 4, pool, &p->job);
> + ndw * 4, pool, &p->job, k_job_id);
> if (r)
> return r;
>
> @@ -69,16 +69,17 @@ static int amdgpu_vm_sdma_alloc_job(struct amdgpu_vm_update_params *p,
> *
> * @p: see amdgpu_vm_update_params definition
> * @sync: amdgpu_sync object with fences to wait for
> + * @k_job_id: identifier of the job, for tracing purpose
> *
> * Returns:
> * Negativ errno, 0 for success.
> */
> static int amdgpu_vm_sdma_prepare(struct amdgpu_vm_update_params *p,
> - struct amdgpu_sync *sync)
> + struct amdgpu_sync *sync, u64 k_job_id)
> {
> int r;
>
> - r = amdgpu_vm_sdma_alloc_job(p, 0);
> + r = amdgpu_vm_sdma_alloc_job(p, 0, k_job_id);
> if (r)
> return r;
>
> @@ -249,7 +250,8 @@ static int amdgpu_vm_sdma_update(struct amdgpu_vm_update_params *p,
> if (r)
> return r;
>
> - r = amdgpu_vm_sdma_alloc_job(p, count);
> + r = amdgpu_vm_sdma_alloc_job(p, count,
> + AMDGPU_KERNEL_JOB_ID_VM_UPDATE);
> if (r)
> return r;
> }
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> index 1c07b701d0e4..ceb94bbb03a4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> @@ -217,7 +217,8 @@ static int uvd_v6_0_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t handle
> int i, r;
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
> - AMDGPU_IB_POOL_DIRECT, &job);
> + AMDGPU_IB_POOL_DIRECT, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> @@ -281,7 +282,8 @@ static int uvd_v6_0_enc_get_destroy_msg(struct amdgpu_ring *ring,
> int i, r;
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
> - AMDGPU_IB_POOL_DIRECT, &job);
> + AMDGPU_IB_POOL_DIRECT, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
> index 9d237b5937fb..1f8866f3f63c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
> @@ -225,7 +225,8 @@ static int uvd_v7_0_enc_get_create_msg(struct amdgpu_ring *ring, u32 handle,
> int i, r;
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
> - AMDGPU_IB_POOL_DIRECT, &job);
> + AMDGPU_IB_POOL_DIRECT, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> @@ -288,7 +289,8 @@ static int uvd_v7_0_enc_get_destroy_msg(struct amdgpu_ring *ring, u32 handle,
> int i, r;
>
> r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, ib_size_dw * 4,
> - AMDGPU_IB_POOL_DIRECT, &job);
> + AMDGPU_IB_POOL_DIRECT, &job,
> + AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST);
> if (r)
> return r;
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index 3653c563ee9a..46c84fc60af1 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -67,7 +67,8 @@ svm_migrate_gart_map(struct amdgpu_ring *ring, u64 npages,
> AMDGPU_FENCE_OWNER_UNDEFINED,
> num_dw * 4 + num_bytes,
> AMDGPU_IB_POOL_DELAYED,
> - &job);
> + &job,
> + AMDGPU_KERNEL_JOB_ID_KFD_GART_MAP);
> if (r)
> return r;
>
^ permalink raw reply [flat|nested] 60+ messages in thread
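The patch above threads a `k_job_id` through `amdgpu_job_alloc_with_ib()` so that every kernel-submitted job carries an identifier that tracepoints can use to tell clears, moves, VM updates, etc. apart. A toy sketch of that plumbing, with all names and types purely illustrative (not the real amdgpu API):

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t u64;

/* stand-ins for the AMDGPU_KERNEL_JOB_ID_* constants the series adds */
enum {
	TOY_JOB_ID_TTM_COPY_BUFFER = 1,
	TOY_JOB_ID_TTM_CLEAR_BUFFER = 2,
	TOY_JOB_ID_MOVE_BLIT = 3,
};

struct toy_job {
	u64 k_job_id;	/* recorded at allocation, consumed by tracing */
};

/* mirrors the signature change: the caller passes the id at allocation
 * time instead of every kernel job looking anonymous in traces */
static int toy_job_alloc(struct toy_job **job, u64 k_job_id)
{
	*job = malloc(sizeof(**job));
	if (!*job)
		return -1;
	(*job)->k_job_id = k_job_id;
	return 0;
}
```

Every call site then picks the constant describing its purpose, which is exactly the mechanical change the diff above repeats for each `amdgpu_job_alloc_with_ib()` caller.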
* Re: [PATCH v3 10/28] drm/amdgpu: add amdgpu_device argument to ttm functions that need it
2025-11-21 16:09 ` Pierre-Eric Pelloux-Prayer
@ 2025-11-24 8:42 ` Christian König
0 siblings, 0 replies; 60+ messages in thread
From: Christian König @ 2025-11-24 8:42 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Pierre-Eric Pelloux-Prayer,
Alex Deucher, David Airlie, Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 11/21/25 17:09, Pierre-Eric Pelloux-Prayer wrote:
>
>
> Le 21/11/2025 à 14:24, Christian König a écrit :
>> On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
>>> Instead of getting it through amdgpu_ttm_adev(bo->tbo.bdev).
>>
>> Why should that be a good idea?
>
> IMO explicit parameters are clearer than implicit ones, so if these functions depend on adev, they might as well get it as an argument instead of fishing it from one of their other arguments.
Usually defensive code does exactly the opposite: e.g. when you can figure out adev from the BO, then do that.
Only when the adev of the BO and the adev which does the operation can legitimately differ should you provide it as an extra parameter *and* write the code in such a way that this actually works.
Such things can happen in an XGMI hive for example, but that makes the whole thing much more complicated (lock inversions etc...).
> But if you prefer to keep the existing code I can drop this patch.
Yes please do so and if you see other cases please make sure to use the same approach as well.
Thanks,
Christian.
>
> Pierre-Eric
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 +++--
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 11 ++++++-----
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 6 ++++--
>>> 3 files changed, 13 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index 858eb9fa061b..2ee48f76483d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -725,7 +725,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>> bo->tbo.resource->mem_type == TTM_PL_VRAM) {
>>> struct dma_fence *fence;
>>> - r = amdgpu_ttm_clear_buffer(bo, bo->tbo.base.resv, &fence);
>>> + r = amdgpu_ttm_clear_buffer(adev, bo, bo->tbo.base.resv, &fence);
>>> if (unlikely(r))
>>> goto fail_unreserve;
>>> @@ -1322,7 +1322,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
>>> if (r)
>>> goto out;
>>> - r = amdgpu_fill_buffer(&adev->mman.clear_entity, abo, 0, &bo->base._resv,
>>> + r = amdgpu_fill_buffer(adev,
>>> + &adev->mman.clear_entity, abo, 0, &bo->base._resv,
>>> &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
>>> if (WARN_ON(r))
>>> goto out;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index 1d3afad885da..57dff2df433b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -414,7 +414,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
>>> (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
>>> struct dma_fence *wipe_fence = NULL;
>>> - r = amdgpu_fill_buffer(&adev->mman.move_entity,
>>> + r = amdgpu_fill_buffer(adev, &adev->mman.move_entity,
>>> abo, 0, NULL, &wipe_fence,
>>> AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
>>> if (r) {
>>> @@ -2350,6 +2350,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
>>> /**
>>> * amdgpu_ttm_clear_buffer - clear memory buffers
>>> + * @adev: amdgpu device object
>>> * @bo: amdgpu buffer object
>>> * @resv: reservation object
>>> * @fence: dma_fence associated with the operation
>>> @@ -2359,11 +2360,11 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
>>> * Returns:
>>> * 0 for success or a negative error code on failure.
>>> */
>>> -int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>>> +int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
>>> + struct amdgpu_bo *bo,
>>> struct dma_resv *resv,
>>> struct dma_fence **fence)
>>> {
>>> - struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>>> struct amdgpu_res_cursor cursor;
>>> u64 addr;
>>> int r = 0;
>>> @@ -2414,14 +2415,14 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>>> return r;
>>> }
>>> -int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
>>> +int amdgpu_fill_buffer(struct amdgpu_device *adev,
>>> + struct amdgpu_ttm_buffer_entity *entity,
>>> struct amdgpu_bo *bo,
>>> uint32_t src_data,
>>> struct dma_resv *resv,
>>> struct dma_fence **f,
>>> u64 k_job_id)
>>> {
>>> - struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>>> struct dma_fence *fence = NULL;
>>> struct amdgpu_res_cursor dst;
>>> int r;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> index 9288599c9c46..d0f55a7edd30 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> @@ -174,10 +174,12 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev,
>>> struct dma_resv *resv,
>>> struct dma_fence **fence,
>>> bool vm_needs_flush, uint32_t copy_flags);
>>> -int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>>> +int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
>>> + struct amdgpu_bo *bo,
>>> struct dma_resv *resv,
>>> struct dma_fence **fence);
>>> -int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
>>> +int amdgpu_fill_buffer(struct amdgpu_device *adev,
>>> + struct amdgpu_ttm_buffer_entity *entity,
>>> struct amdgpu_bo *bo,
>>> uint32_t src_data,
>>> struct dma_resv *resv,
^ permalink raw reply [flat|nested] 60+ messages in thread
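The defensive pattern Christian recommends in this exchange — derive the device from the buffer object instead of adding a parameter, unless the two can genuinely differ — can be sketched like this. All types and names below are stand-ins, not the real amdgpu structures:

```c
#include <stddef.h>

struct toy_device { int id; };
struct toy_bdev { struct toy_device *dev; };
struct toy_bo { struct toy_bdev *bdev; };

/* an amdgpu_ttm_adev()-style accessor: the BO already knows which
 * device it belongs to, so callers don't have to pass it around */
static struct toy_device *toy_bo_dev(struct toy_bo *bo)
{
	return bo->bdev->dev;
}

/* preferred form: no explicit device parameter, the function fetches
 * the device from the object it operates on */
static int toy_clear_buffer(struct toy_bo *bo)
{
	return toy_bo_dev(bo)->id; /* pretend we submit on this device */
}
```

An explicit `struct toy_device *` argument would only be justified if the device doing the operation could differ from the BO's own device (the XGMI-hive case mentioned above), and the function body actually handled that.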
* Re: [PATCH v3 19/28] drm/ttm: rework pipelined eviction fence handling
2025-11-21 15:12 ` Christian König
@ 2025-11-26 15:34 ` Maarten Lankhorst
2025-11-26 15:36 ` Christian König
0 siblings, 1 reply; 60+ messages in thread
From: Maarten Lankhorst @ 2025-11-26 15:34 UTC (permalink / raw)
To: Christian König, Pierre-Eric Pelloux-Prayer, Huang Rui,
Matthew Auld, Matthew Brost, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Sumit Semwal
Cc: dri-devel, linux-kernel, linux-media, linaro-mm-sig
Hey,
Den 2025-11-21 kl. 16:12, skrev Christian König:
> On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
>> Until now ttm stored a single pipelined eviction fence which means
>> drivers had to use a single entity for these evictions.
>>
>> To lift this requirement, this commit allows up to 8 entities to
>> be used.
>>
>> Ideally a dma_resv object would have been used as a container of
>> the eviction fences, but the locking rules make it complex.
>> dma_resv all have the same ww_class, which means "Attempting to
>> lock more mutexes after ww_acquire_done." is an error.
>>
>> One alternative considered was to introduce a 2nd ww_class for
>> specific resv to hold a single "transient" lock (= the resv lock
>> would only be held for a short period, without taking any other
>> locks).
>>
>> The other option is to statically reserve a fence array, and
>> extend the existing code to deal with N fences, instead of 1.
>>
>> The driver is still responsible to reserve the correct number
>> of fence slots.
>>
>> ---
>> v2:
>> - simplified code
>> - dropped n_fences
>> - name changes
>> v3: use ttm_resource_manager_cleanup
>> ---
>>
>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Going to push separately to drm-misc-next on Monday.
>
Pushing this broke drm-tip, the amd driver fails to build, as it's not using the eviction_fences array.
Kind regards,
~Maarten Lankhorst
^ permalink raw reply [flat|nested] 60+ messages in thread
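The approach the commit message describes — replacing the single pipelined-eviction fence with a small statically sized array so several entities can have evictions in flight — can be sketched as below. The types, field names, and slot-reuse policy are hypothetical simplifications, not the actual TTM code:

```c
#include <stddef.h>

#define TOY_MAX_EVICTION_FENCES 8	/* the series allows up to 8 entities */

struct toy_fence { int signaled; };

struct toy_manager {
	/* was: struct toy_fence *move;  -- one fence, one entity */
	struct toy_fence *eviction_fences[TOY_MAX_EVICTION_FENCES];
};

/* store a new eviction fence in a free (or already-signaled) slot;
 * returns 0 on success, -1 when all slots still hold unsignaled
 * fences and the caller must wait for one of them first */
static int toy_add_eviction_fence(struct toy_manager *man,
				  struct toy_fence *fence)
{
	for (size_t i = 0; i < TOY_MAX_EVICTION_FENCES; i++) {
		if (!man->eviction_fences[i] ||
		    man->eviction_fences[i]->signaled) {
			man->eviction_fences[i] = fence;
			return 0;
		}
	}
	return -1;
}
```

The fixed array sidesteps the ww_mutex problem described above: no second dma_resv lock is ever taken, at the cost of the driver having to reserve the right number of fence slots up front.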
* Re: [PATCH v3 19/28] drm/ttm: rework pipelined eviction fence handling
2025-11-26 15:34 ` Maarten Lankhorst
@ 2025-11-26 15:36 ` Christian König
2025-11-26 15:39 ` Maarten Lankhorst
0 siblings, 1 reply; 60+ messages in thread
From: Christian König @ 2025-11-26 15:36 UTC (permalink / raw)
To: Maarten Lankhorst, Pierre-Eric Pelloux-Prayer, Huang Rui,
Matthew Auld, Matthew Brost, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Sumit Semwal
Cc: dri-devel, linux-kernel, linux-media, linaro-mm-sig
On 11/26/25 16:34, Maarten Lankhorst wrote:
> Hey,
>
> Den 2025-11-21 kl. 16:12, skrev Christian König:
>> On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
>>> Until now ttm stored a single pipelined eviction fence which means
>>> drivers had to use a single entity for these evictions.
>>>
>>> To lift this requirement, this commit allows up to 8 entities to
>>> be used.
>>>
>>> Ideally a dma_resv object would have been used as a container of
>>> the eviction fences, but the locking rules make it complex.
>>> dma_resv all have the same ww_class, which means "Attempting to
>>> lock more mutexes after ww_acquire_done." is an error.
>>>
>>> One alternative considered was to introduce a 2nd ww_class for
>>> specific resv to hold a single "transient" lock (= the resv lock
>>> would only be held for a short period, without taking any other
>>> locks).
>>>
>>> The other option is to statically reserve a fence array, and
>>> extend the existing code to deal with N fences, instead of 1.
>>>
>>> The driver is still responsible to reserve the correct number
>>> of fence slots.
>>>
>>> ---
>>> v2:
>>> - simplified code
>>> - dropped n_fences
>>> - name changes
>>> v3: use ttm_resource_manager_cleanup
>>> ---
>>>
>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
>>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Going to push separately to drm-misc-next on Monday.
>>
> Pushing this broke drm-tip, the amd driver fails to build, as it's not using the eviction_fences array.
Thanks for the note! But hui? We changed amdgpu to not touch the move fence.
Give me a second.
Thanks,
Christian.
>
> Kind regards,
> ~Maarten Lankhorst
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 19/28] drm/ttm: rework pipelined eviction fence handling
2025-11-26 15:36 ` Christian König
@ 2025-11-26 15:39 ` Maarten Lankhorst
2025-11-26 15:48 ` Christian König
0 siblings, 1 reply; 60+ messages in thread
From: Maarten Lankhorst @ 2025-11-26 15:39 UTC (permalink / raw)
To: Christian König, Pierre-Eric Pelloux-Prayer, Huang Rui,
Matthew Auld, Matthew Brost, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Sumit Semwal
Cc: dri-devel, linux-kernel, linux-media, linaro-mm-sig
Hey,
Den 2025-11-26 kl. 16:36, skrev Christian König:
> On 11/26/25 16:34, Maarten Lankhorst wrote:
>> Hey,
>>
>> Den 2025-11-21 kl. 16:12, skrev Christian König:
>>> On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
>>>> Until now ttm stored a single pipelined eviction fence which means
>>>> drivers had to use a single entity for these evictions.
>>>>
>>>> To lift this requirement, this commit allows up to 8 entities to
>>>> be used.
>>>>
>>>> Ideally a dma_resv object would have been used as a container of
>>>> the eviction fences, but the locking rules make it complex.
>>>> dma_resv all have the same ww_class, which means "Attempting to
>>>> lock more mutexes after ww_acquire_done." is an error.
>>>>
>>>> One alternative considered was to introduce a 2nd ww_class for
>>>> specific resv to hold a single "transient" lock (= the resv lock
>>>> would only be held for a short period, without taking any other
>>>> locks).
>>>>
>>>> The other option is to statically reserve a fence array, and
>>>> extend the existing code to deal with N fences, instead of 1.
>>>>
>>>> The driver is still responsible to reserve the correct number
>>>> of fence slots.
>>>>
>>>> ---
>>>> v2:
>>>> - simplified code
>>>> - dropped n_fences
>>>> - name changes
>>>> v3: use ttm_resource_manager_cleanup
>>>> ---
>>>>
>>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
>>>
>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>
>>> Going to push separately to drm-misc-next on Monday.
>>>
>> Pushing this broke drm-tip, the amd driver fails to build, as it's not using the eviction_fences array.
>
> Thanks for the note! But hui? We changed amdgpu to not touch the move fence.
>
> Give me a second.

commit 13bec21f5f4cdabdf06725e5a8dee0b9b56ff671 (HEAD -> drm-tip, drm-tip/drm-tip, drm-tip/HEAD)
Author: Christian König <christian.koenig@amd.com>
Date: Wed Nov 26 13:13:03 2025 +0100
drm-tip: 2025y-11m-26d-12h-12m-41s UTC integration manifest
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2188:34: error: ‘struct ttm_resource_manager’ has no member named ‘move’
2188 | dma_fence_put(man->move);
| ^~
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2189:20: error: ‘struct ttm_resource_manager’ has no member named ‘move’
2189 | man->move = NULL;
| ^~
Is what I see.
Kind regards,
~Maarten Lankhorst
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 19/28] drm/ttm: rework pipelined eviction fence handling
2025-11-26 15:39 ` Maarten Lankhorst
@ 2025-11-26 15:48 ` Christian König
2025-11-26 17:14 ` Maarten Lankhorst
0 siblings, 1 reply; 60+ messages in thread
From: Christian König @ 2025-11-26 15:48 UTC (permalink / raw)
To: Maarten Lankhorst, Pierre-Eric Pelloux-Prayer, Huang Rui,
Matthew Auld, Matthew Brost, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Sumit Semwal
Cc: dri-devel, linux-kernel, linux-media, linaro-mm-sig
On 11/26/25 16:39, Maarten Lankhorst wrote:
> Hey,
>
> Den 2025-11-26 kl. 16:36, skrev Christian König:
>> On 11/26/25 16:34, Maarten Lankhorst wrote:
>>> Hey,
>>>
>>> Den 2025-11-21 kl. 16:12, skrev Christian König:
>>>> On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
>>>>> Until now ttm stored a single pipelined eviction fence which means
>>>>> drivers had to use a single entity for these evictions.
>>>>>
>>>>> To lift this requirement, this commit allows up to 8 entities to
>>>>> be used.
>>>>>
>>>>> Ideally a dma_resv object would have been used as a container of
>>>>> the eviction fences, but the locking rules make it complex.
>>>>> dma_resv all have the same ww_class, which means "Attempting to
>>>>> lock more mutexes after ww_acquire_done." is an error.
>>>>>
>>>>> One alternative considered was to introduce a 2nd ww_class for
>>>>> specific resv to hold a single "transient" lock (= the resv lock
>>>>> would only be held for a short period, without taking any other
>>>>> locks).
>>>>>
>>>>> The other option is to statically reserve a fence array, and
>>>>> extend the existing code to deal with N fences, instead of 1.
>>>>>
>>>>> The driver is still responsible to reserve the correct number
>>>>> of fence slots.
>>>>>
>>>>> ---
>>>>> v2:
>>>>> - simplified code
>>>>> - dropped n_fences
>>>>> - name changes
>>>>> v3: use ttm_resource_manager_cleanup
>>>>> ---
>>>>>
>>>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
>>>>
>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>
>>>> Going to push separately to drm-misc-next on Monday.
>>>>
>>> Pushing this broke drm-tip, the amd driver fails to build, as it's not using the eviction_fences array.
>>
>> Thanks for the note! But hui? We changed amdgpu to not touch the move fence.
>>
>> Give me a second.
>
> commit 13bec21f5f4cdabdf06725e5a8dee0b9b56ff671 (HEAD -> drm-tip, drm-tip/drm-tip, drm-tip/HEAD)
>
> Author: Christian König <christian.koenig@amd.com>
> Date: Wed Nov 26 13:13:03 2025 +0100
>
> drm-tip: 2025y-11m-26d-12h-12m-41s UTC integration manifest
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2188:34: error: ‘struct ttm_resource_manager’ has no member named ‘move’
> 2188 | dma_fence_put(man->move);
> | ^~
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2189:20: error: ‘struct ttm_resource_manager’ has no member named ‘move’
> 2189 | man->move = NULL;
> | ^~
>
> Is what I see.
Ah, crap, I know what's going on.
The patch to remove those lines is queued up to go upstream through amd-staging-drm-next instead of drm-misc-next.
I will push this patch to drm-misc-next and sync up with Alex that it shouldn't go upstream through amd-staging-drm-next.
Going to build-test drm-tip next time.
Thanks,
Christian.
>
> Kind regards,
> ~Maarten Lankhorst
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 19/28] drm/ttm: rework pipelined eviction fence handling
2025-11-26 15:48 ` Christian König
@ 2025-11-26 17:14 ` Maarten Lankhorst
0 siblings, 0 replies; 60+ messages in thread
From: Maarten Lankhorst @ 2025-11-26 17:14 UTC (permalink / raw)
To: Christian König, Pierre-Eric Pelloux-Prayer, Huang Rui,
Matthew Auld, Matthew Brost, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Sumit Semwal
Cc: dri-devel, linux-kernel, linux-media, linaro-mm-sig
Hey,
Den 2025-11-26 kl. 16:48, skrev Christian König:
> On 11/26/25 16:39, Maarten Lankhorst wrote:
>> Hey,
>>
>> Den 2025-11-26 kl. 16:36, skrev Christian König:
>>> On 11/26/25 16:34, Maarten Lankhorst wrote:
>>>> Hey,
>>>>
>>>> Den 2025-11-21 kl. 16:12, skrev Christian König:
>>>>> On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
>>>>>> Until now ttm stored a single pipelined eviction fence which means
>>>>>> drivers had to use a single entity for these evictions.
>>>>>>
>>>>>> To lift this requirement, this commit allows up to 8 entities to
>>>>>> be used.
>>>>>>
>>>>>> Ideally a dma_resv object would have been used as a container of
>>>>>> the eviction fences, but the locking rules make it complex.
>>>>>> dma_resv all have the same ww_class, which means "Attempting to
>>>>>> lock more mutexes after ww_acquire_done." is an error.
>>>>>>
>>>>>> One alternative considered was to introduce a 2nd ww_class for
>>>>>> specific resv to hold a single "transient" lock (= the resv lock
>>>>>> would only be held for a short period, without taking any other
>>>>>> locks).
>>>>>>
>>>>>> The other option is to statically reserve a fence array, and
>>>>>> extend the existing code to deal with N fences, instead of 1.
>>>>>>
>>>>>> The driver is still responsible to reserve the correct number
>>>>>> of fence slots.
>>>>>>
>>>>>> ---
>>>>>> v2:
>>>>>> - simplified code
>>>>>> - dropped n_fences
>>>>>> - name changes
>>>>>> v3: use ttm_resource_manager_cleanup
>>>>>> ---
>>>>>>
>>>>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
>>>>>
>>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>>
>>>>> Going to push separately to drm-misc-next on Monday.
>>>>>
>>>> Pushing this broke drm-tip: the amd driver fails to build, as it's not using the eviction_fences array.
>>>
>>> Thanks for the note! But huh? We changed amdgpu to not touch the move fence.
>>>
>>> Give me a second.
>>
>> commit 13bec21f5f4cdabdf06725e5a8dee0b9b56ff671 (HEAD -> drm-tip, drm-tip/drm-tip, drm-tip/HEAD)
>> Author: Christian König <christian.koenig@amd.com>
>> Date: Wed Nov 26 13:13:03 2025 +0100
>>
>> drm-tip: 2025y-11m-26d-12h-12m-41s UTC integration manifest
>>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2188:34: error: ‘struct ttm_resource_manager’ has no member named ‘move’
>> 2188 | dma_fence_put(man->move);
>> | ^~
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2189:20: error: ‘struct ttm_resource_manager’ has no member named ‘move’
>> 2189 | man->move = NULL;
>> | ^~
>>
>> Is what I see.
>
> Ah, crap, I know what's going on.
>
> The patch to remove those lines is queued up to go upstream through amd-staging-drm-next instead of drm-misc-next.
>
> I will push this patch to drm-misc-next and sync up with Alex that it shouldn't go upstream through amd-staging-drm-next.
>
> Going to build-test drm-tip next time.
Thank you, drm-tip now builds cleanly again!
>
> Thanks,
> Christian.
>
>
Kind regards,
~Maarten Lankhorst
end of thread, other threads:[~2025-11-26 17:15 UTC | newest]
Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-21 10:12 [PATCH v3 00/28] drm/amdgpu: use all SDMA instances for TTM clears and moves Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 01/28] drm/amdgpu: give each kernel job a unique id Pierre-Eric Pelloux-Prayer
2025-11-21 12:51 ` Christian König
2025-11-21 20:02 ` Felix Kuehling
2025-11-21 10:12 ` [PATCH v3 02/28] drm/amdgpu: use ttm_resource_manager_cleanup Pierre-Eric Pelloux-Prayer
2025-11-21 12:52 ` Christian König
2025-11-21 10:12 ` [PATCH v3 03/28] drm/amdgpu: remove direct_submit arg from amdgpu_copy_buffer Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 04/28] drm/amdgpu: remove the ring param from ttm functions Pierre-Eric Pelloux-Prayer
2025-11-21 12:58 ` Christian König
2025-11-21 10:12 ` [PATCH v3 05/28] drm/amdgpu: introduce amdgpu_ttm_buffer_entity Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 06/28] drm/amdgpu: add amdgpu_ttm_job_submit helper Pierre-Eric Pelloux-Prayer
2025-11-21 13:00 ` Christian König
2025-11-21 10:12 ` [PATCH v3 07/28] drm/amdgpu: fix error handling in amdgpu_copy_buffer Pierre-Eric Pelloux-Prayer
2025-11-21 13:04 ` Christian König
2025-11-21 10:12 ` [PATCH v3 08/28] drm/amdgpu: pass the entity to use to amdgpu_ttm_map_buffer Pierre-Eric Pelloux-Prayer
2025-11-21 13:05 ` Christian König
2025-11-21 10:12 ` [PATCH v3 09/28] drm/amdgpu: pass the entity to use to ttm public functions Pierre-Eric Pelloux-Prayer
2025-11-21 13:23 ` Christian König
2025-11-21 10:12 ` [PATCH v3 10/28] drm/amdgpu: add amdgpu_device argument to ttm functions that need it Pierre-Eric Pelloux-Prayer
2025-11-21 13:24 ` Christian König
2025-11-21 16:09 ` Pierre-Eric Pelloux-Prayer
2025-11-24 8:42 ` Christian König
2025-11-21 10:12 ` [PATCH v3 11/28] drm/amdgpu: statically assign gart windows to ttm entities Pierre-Eric Pelloux-Prayer
2025-11-21 13:34 ` Christian König
2025-11-21 10:12 ` [PATCH v3 12/28] drm/amdgpu: remove AMDGPU_GTT_NUM_TRANSFER_WINDOWS Pierre-Eric Pelloux-Prayer
2025-11-21 13:50 ` Christian König
2025-11-21 10:12 ` [PATCH v3 13/28] drm/amdgpu: add missing lock when using ttm entities Pierre-Eric Pelloux-Prayer
2025-11-21 13:53 ` Christian König
2025-11-21 10:12 ` [PATCH v3 14/28] drm/amdgpu: check entity lock is held in amdgpu_ttm_job_submit Pierre-Eric Pelloux-Prayer
2025-11-21 13:54 ` Christian König
2025-11-21 10:12 ` [PATCH v3 15/28] drm/amdgpu: double AMDGPU_GTT_MAX_TRANSFER_SIZE Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 16/28] drm/amdgpu: use larger gart window when possible Pierre-Eric Pelloux-Prayer
2025-11-21 15:02 ` Christian König
2025-11-21 10:12 ` [PATCH v3 17/28] drm/amdgpu: introduce amdgpu_sdma_set_vm_pte_scheds Pierre-Eric Pelloux-Prayer
2025-11-21 15:05 ` Christian König
2025-11-21 10:12 ` [PATCH v3 18/28] drm/amdgpu: move sched status check inside amdgpu_ttm_set_buffer_funcs_status Pierre-Eric Pelloux-Prayer
2025-11-21 15:08 ` Christian König
2025-11-21 15:12 ` Pierre-Eric Pelloux-Prayer
2025-11-21 15:23 ` Christian König
2025-11-21 10:12 ` [PATCH v3 19/28] drm/ttm: rework pipelined eviction fence handling Pierre-Eric Pelloux-Prayer
2025-11-21 15:12 ` Christian König
2025-11-26 15:34 ` Maarten Lankhorst
2025-11-26 15:36 ` Christian König
2025-11-26 15:39 ` Maarten Lankhorst
2025-11-26 15:48 ` Christian König
2025-11-26 17:14 ` Maarten Lankhorst
2025-11-21 10:12 ` [PATCH v3 20/28] drm/amdgpu: allocate multiple clear entities Pierre-Eric Pelloux-Prayer
2025-11-21 15:34 ` Christian König
2025-11-21 10:12 ` [PATCH v3 21/28] drm/amdgpu: allocate multiple move entities Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 22/28] drm/amdgpu: round robin through clear_entities in amdgpu_fill_buffer Pierre-Eric Pelloux-Prayer
2025-11-21 15:36 ` Christian König
2025-11-21 10:12 ` [PATCH v3 23/28] drm/amdgpu: use TTM_NUM_MOVE_FENCES when reserving fences Pierre-Eric Pelloux-Prayer
2025-11-21 15:41 ` Christian König
2025-11-21 10:12 ` [PATCH v3 24/28] drm/amdgpu: use multiple entities in amdgpu_move_blit Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 25/28] drm/amdgpu: pass all the sdma scheds to amdgpu_mman Pierre-Eric Pelloux-Prayer
2025-11-21 15:52 ` Christian König
2025-11-21 10:12 ` [PATCH v3 26/28] drm/amdgpu: give ttm entities access to all the sdma scheds Pierre-Eric Pelloux-Prayer
2025-11-21 10:12 ` [PATCH v3 27/28] drm/amdgpu: get rid of amdgpu_ttm_clear_buffer Pierre-Eric Pelloux-Prayer
2025-11-21 15:54 ` Christian König
2025-11-21 10:12 ` [PATCH v3 28/28] drm/amdgpu: rename amdgpu_fill_buffer as amdgpu_ttm_clear_buffer Pierre-Eric Pelloux-Prayer