All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/9] KFD user queue validation
@ 2024-07-18 21:05 Philip Yang
  2024-07-18 21:05 ` [PATCH v2 1/9] drm/amdkfd: kfd_bo_mapped_dev support partition Philip Yang
                   ` (9 more replies)
  0 siblings, 10 replies; 18+ messages in thread
From: Philip Yang @ 2024-07-18 21:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix.Kuehling, Alexander.Deucher, christian.koenig, Philip Yang

This patch series do additional queue buffers validation in the queue
creation IOCTLS, fail the queue creation if buffers not mapped on the GPU
with the expected size.

Ensure queue buffers residency by tracking the GPUVM virtual addresses
for queue buffers to return error if the user tries to free and unmap them
when the qeueu is active, or evict the queue if SVM memory is unmapped and
freed from CPU.

Patch 1-2 is prepration work and general fix.

v2:
 - patch 3/9, keep wptr_bo_gart in struct queue

Philip Yang (9):
  drm/amdkfd: kfd_bo_mapped_dev support partition
  drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer
  drm/amdkfd: Refactor queue wptr_bo GART mapping
  drm/amdkfd: Validate user queue buffers
  drm/amdkfd: Ensure user queue buffers residency
  drm/amdkfd: Validate user queue svm memory residency
  drm/amdkfd: Validate user queue update
  drm/amdkfd: Store queue cwsr area size to node properties
  drm/amdkfd: Validate queue cwsr area and eop buffer size

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c    |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   6 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  24 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |   6 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  61 +---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |   4 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   8 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c  |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  19 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   2 +-
 .../amd/amdkfd/kfd_process_queue_manager.c    |  79 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_queue.c        | 336 ++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  12 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h          |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h     |   4 +
 16 files changed, 489 insertions(+), 91 deletions(-)

-- 
2.43.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/9] drm/amdkfd: kfd_bo_mapped_dev support partition
  2024-07-18 21:05 [PATCH v2 0/9] KFD user queue validation Philip Yang
@ 2024-07-18 21:05 ` Philip Yang
  2024-07-18 21:05 ` [PATCH v2 2/9] drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer Philip Yang
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Philip Yang @ 2024-07-18 21:05 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, Alexander.Deucher, christian.koenig, Philip Yang,
	Felix Kuehling

Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter
instead of adev, to support spatial partition. This is only used by CRIU
checkpoint restore now. No functional change.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 5 +++--
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index e7bb1ca35801..66b1c72c81e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -345,7 +345,7 @@ void amdgpu_amdkfd_ras_pasid_poison_consumption_handler(struct amdgpu_device *ad
 			pasid_notify pasid_fn, void *data, uint32_t reset);
 
 bool amdgpu_amdkfd_is_fed(struct amdgpu_device *adev);
-bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device *adev, struct kgd_mem *mem);
+bool amdgpu_amdkfd_bo_mapped_to_dev(void *drm_priv, struct kgd_mem *mem);
 void amdgpu_amdkfd_block_mmu_notifications(void *p);
 int amdgpu_amdkfd_criu_resume(void *p);
 bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 11672bfe4fad..199e387d35f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -3200,12 +3200,13 @@ int amdgpu_amdkfd_get_tile_config(struct amdgpu_device *adev,
 	return 0;
 }
 
-bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device *adev, struct kgd_mem *mem)
+bool amdgpu_amdkfd_bo_mapped_to_dev(void *drm_priv, struct kgd_mem *mem)
 {
+	struct amdgpu_vm *vm = drm_priv_to_vm(drm_priv);
 	struct kfd_mem_attachment *entry;
 
 	list_for_each_entry(entry, &mem->attachments, list) {
-		if (entry->is_mapped && entry->adev == adev)
+		if (entry->is_mapped && entry->bo_va->base.vm == vm)
 			return true;
 	}
 	return false;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 32e5db509560..1d9b21628be7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1963,7 +1963,7 @@ static int criu_checkpoint_bos(struct kfd_process *p,
 				bo_bucket->offset = amdgpu_bo_mmap_offset(dumper_bo);
 
 			for (i = 0; i < p->n_pdds; i++) {
-				if (amdgpu_amdkfd_bo_mapped_to_dev(p->pdds[i]->dev->adev, kgd_mem))
+				if (amdgpu_amdkfd_bo_mapped_to_dev(p->pdds[i]->drm_priv, kgd_mem))
 					bo_priv->mapped_gpuids[dev_idx++] = p->pdds[i]->user_gpu_id;
 			}
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/9] drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer
  2024-07-18 21:05 [PATCH v2 0/9] KFD user queue validation Philip Yang
  2024-07-18 21:05 ` [PATCH v2 1/9] drm/amdkfd: kfd_bo_mapped_dev support partition Philip Yang
@ 2024-07-18 21:05 ` Philip Yang
  2024-07-18 21:05 ` [PATCH v2 3/9] drm/amdkfd: Refactor queue wptr_bo GART mapping Philip Yang
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Philip Yang @ 2024-07-18 21:05 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, Alexander.Deucher, christian.koenig, Philip Yang,
	Felix Kuehling

Pass pointer reference to amdgpu_bo_unref to clear the correct pointer,
otherwise amdgpu_bo_unref clear the local variable, the original pointer
not set to NULL, this could cause use-after-free bug.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         | 14 +++++++-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c            |  4 ++--
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           |  2 +-
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |  4 ++--
 8 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 03205e3c3746..c272461d70a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -364,15 +364,15 @@ int amdgpu_amdkfd_alloc_gtt_mem(struct amdgpu_device *adev, size_t size,
 	return r;
 }
 
-void amdgpu_amdkfd_free_gtt_mem(struct amdgpu_device *adev, void *mem_obj)
+void amdgpu_amdkfd_free_gtt_mem(struct amdgpu_device *adev, void **mem_obj)
 {
-	struct amdgpu_bo *bo = (struct amdgpu_bo *) mem_obj;
+	struct amdgpu_bo **bo = (struct amdgpu_bo **) mem_obj;
 
-	amdgpu_bo_reserve(bo, true);
-	amdgpu_bo_kunmap(bo);
-	amdgpu_bo_unpin(bo);
-	amdgpu_bo_unreserve(bo);
-	amdgpu_bo_unref(&(bo));
+	amdgpu_bo_reserve(*bo, true);
+	amdgpu_bo_kunmap(*bo);
+	amdgpu_bo_unpin(*bo);
+	amdgpu_bo_unreserve(*bo);
+	amdgpu_bo_unref(bo);
 }
 
 int amdgpu_amdkfd_alloc_gws(struct amdgpu_device *adev, size_t size,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 66b1c72c81e5..6e591280774b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -235,7 +235,7 @@ int amdgpu_amdkfd_bo_validate_and_fence(struct amdgpu_bo *bo,
 int amdgpu_amdkfd_alloc_gtt_mem(struct amdgpu_device *adev, size_t size,
 				void **mem_obj, uint64_t *gpu_addr,
 				void **cpu_ptr, bool mqd_gfx9);
-void amdgpu_amdkfd_free_gtt_mem(struct amdgpu_device *adev, void *mem_obj);
+void amdgpu_amdkfd_free_gtt_mem(struct amdgpu_device *adev, void **mem_obj);
 int amdgpu_amdkfd_alloc_gws(struct amdgpu_device *adev, size_t size,
 				void **mem_obj);
 void amdgpu_amdkfd_free_gws(struct amdgpu_device *adev, void *mem_obj);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 1d9b21628be7..823f245dc7d0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -423,7 +423,7 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 
 err_create_queue:
 	if (wptr_bo)
-		amdgpu_amdkfd_free_gtt_mem(dev->adev, wptr_bo);
+		amdgpu_amdkfd_free_gtt_mem(dev->adev, (void **)&wptr_bo);
 err_wptr_map_gart:
 err_bind_process:
 err_pdd:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index f4d20adaa068..6619028dd58b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -907,7 +907,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 kfd_doorbell_error:
 	kfd_gtt_sa_fini(kfd);
 kfd_gtt_sa_init_error:
-	amdgpu_amdkfd_free_gtt_mem(kfd->adev, kfd->gtt_mem);
+	amdgpu_amdkfd_free_gtt_mem(kfd->adev, &kfd->gtt_mem);
 alloc_gtt_mem_failure:
 	dev_err(kfd_device,
 		"device %x:%x NOT added due to errors\n",
@@ -925,7 +925,7 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd)
 		kfd_doorbell_fini(kfd);
 		ida_destroy(&kfd->doorbell_ida);
 		kfd_gtt_sa_fini(kfd);
-		amdgpu_amdkfd_free_gtt_mem(kfd->adev, kfd->gtt_mem);
+		amdgpu_amdkfd_free_gtt_mem(kfd->adev, &kfd->gtt_mem);
 	}
 
 	kfree(kfd);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 4f48507418d2..420444eb8e98 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -2621,7 +2621,7 @@ static void deallocate_hiq_sdma_mqd(struct kfd_node *dev,
 {
 	WARN(!mqd, "No hiq sdma mqd trunk to free");
 
-	amdgpu_amdkfd_free_gtt_mem(dev->adev, mqd->gtt_mem);
+	amdgpu_amdkfd_free_gtt_mem(dev->adev, &mqd->gtt_mem);
 }
 
 void device_queue_manager_uninit(struct device_queue_manager *dqm)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
index 50a81da43ce1..d9ae854b6908 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
@@ -225,7 +225,7 @@ void kfd_free_mqd_cp(struct mqd_manager *mm, void *mqd,
 	      struct kfd_mem_obj *mqd_mem_obj)
 {
 	if (mqd_mem_obj->gtt_mem) {
-		amdgpu_amdkfd_free_gtt_mem(mm->dev->adev, mqd_mem_obj->gtt_mem);
+		amdgpu_amdkfd_free_gtt_mem(mm->dev->adev, &mqd_mem_obj->gtt_mem);
 		kfree(mqd_mem_obj);
 	} else {
 		kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index d65974f3b34d..70d6359bb5a3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1048,7 +1048,7 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 
 		if (pdd->dev->kfd->shared_resources.enable_mes)
 			amdgpu_amdkfd_free_gtt_mem(pdd->dev->adev,
-						   pdd->proc_ctx_bo);
+						   &pdd->proc_ctx_bo);
 		/*
 		 * before destroying pdd, make sure to report availability
 		 * for auto suspend
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 21f5a1fb3bf8..36f0460cbffe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -204,9 +204,9 @@ static void pqm_clean_queue_resource(struct process_queue_manager *pqm,
 	}
 
 	if (dev->kfd->shared_resources.enable_mes) {
-		amdgpu_amdkfd_free_gtt_mem(dev->adev, pqn->q->gang_ctx_bo);
+		amdgpu_amdkfd_free_gtt_mem(dev->adev, &pqn->q->gang_ctx_bo);
 		if (pqn->q->wptr_bo)
-			amdgpu_amdkfd_free_gtt_mem(dev->adev, pqn->q->wptr_bo);
+			amdgpu_amdkfd_free_gtt_mem(dev->adev, (void **)&pqn->q->wptr_bo);
 	}
 }
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/9] drm/amdkfd: Refactor queue wptr_bo GART mapping
  2024-07-18 21:05 [PATCH v2 0/9] KFD user queue validation Philip Yang
  2024-07-18 21:05 ` [PATCH v2 1/9] drm/amdkfd: kfd_bo_mapped_dev support partition Philip Yang
  2024-07-18 21:05 ` [PATCH v2 2/9] drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer Philip Yang
@ 2024-07-18 21:05 ` Philip Yang
  2024-07-18 21:05 ` [PATCH v2 4/9] drm/amdkfd: Validate user queue buffers Philip Yang
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Philip Yang @ 2024-07-18 21:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix.Kuehling, Alexander.Deucher, christian.koenig, Philip Yang

Add helper function kfd_queue_acquire_buffers to get queue wptr_bo
reference from queue write_ptr if it is mapped to the KFD node with
expected size.

Add wptr_bo to structure queue_properties because structure queue is
allocated after queue buffers are validated, then we can remove wptr_bo
parameter from pqm_create_queue.

Rename structure queue wptr_bo_gart to hold wptr_bo reference for GART
mapping and umapping. Move MES wptr_bo_gart mapping to init_user_queue,
the same location with queue ctx_bo GART mapping.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 56 +++---------------
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         | 13 +++--
 .../amd/amdkfd/kfd_process_queue_manager.c    | 45 +++++++++++----
 drivers/gpu/drm/amd/amdkfd/kfd_queue.c        | 57 +++++++++++++++++++
 7 files changed, 116 insertions(+), 68 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 6e591280774b..4ed49265c764 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -322,7 +322,7 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_mem *mem,
 					     void **kptr, uint64_t *size);
 void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_mem *mem);
 
-int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_bo *bo);
+int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_bo *bo, struct amdgpu_bo **bo_gart);
 
 int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,
 					    struct dma_fence __rcu **ef);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 199e387d35f4..0ab37e7aec26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2226,11 +2226,12 @@ int amdgpu_amdkfd_gpuvm_sync_memory(
 /**
  * amdgpu_amdkfd_map_gtt_bo_to_gart - Map BO to GART and increment reference count
  * @bo: Buffer object to be mapped
+ * @bo_gart: Return bo reference
  *
  * Before return, bo reference count is incremented. To release the reference and unpin/
  * unmap the BO, call amdgpu_amdkfd_free_gtt_mem.
  */
-int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_bo *bo)
+int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_bo *bo, struct amdgpu_bo **bo_gart)
 {
 	int ret;
 
@@ -2257,7 +2258,7 @@ int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_bo *bo)
 
 	amdgpu_bo_unreserve(bo);
 
-	bo = amdgpu_bo_ref(bo);
+	*bo_gart = amdgpu_bo_ref(bo);
 
 	return 0;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 823f245dc7d0..202f24ee4bd7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -247,8 +247,8 @@ static int set_queue_properties_from_user(struct queue_properties *q_properties,
 	q_properties->priority = args->queue_priority;
 	q_properties->queue_address = args->ring_base_address;
 	q_properties->queue_size = args->ring_size;
-	q_properties->read_ptr = (uint32_t *) args->read_pointer_address;
-	q_properties->write_ptr = (uint32_t *) args->write_pointer_address;
+	q_properties->read_ptr = (void __user *)args->read_pointer_address;
+	q_properties->write_ptr = (void __user *)args->write_pointer_address;
 	q_properties->eop_ring_buffer_address = args->eop_buffer_address;
 	q_properties->eop_ring_buffer_size = args->eop_buffer_size;
 	q_properties->ctx_save_restore_area_address =
@@ -306,7 +306,6 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 	struct kfd_process_device *pdd;
 	struct queue_properties q_properties;
 	uint32_t doorbell_offset_in_process = 0;
-	struct amdgpu_bo *wptr_bo = NULL;
 
 	memset(&q_properties, 0, sizeof(struct queue_properties));
 
@@ -342,53 +341,17 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 		}
 	}
 
-	/* Starting with GFX11, wptr BOs must be mapped to GART for MES to determine work
-	 * on unmapped queues for usermode queue oversubscription (no aggregated doorbell)
-	 */
-	if (dev->kfd->shared_resources.enable_mes &&
-			((dev->adev->mes.sched_version & AMDGPU_MES_API_VERSION_MASK)
-			>> AMDGPU_MES_API_VERSION_SHIFT) >= 2) {
-		struct amdgpu_bo_va_mapping *wptr_mapping;
-		struct amdgpu_vm *wptr_vm;
-
-		wptr_vm = drm_priv_to_vm(pdd->drm_priv);
-		err = amdgpu_bo_reserve(wptr_vm->root.bo, false);
-		if (err)
-			goto err_wptr_map_gart;
-
-		wptr_mapping = amdgpu_vm_bo_lookup_mapping(
-				wptr_vm, args->write_pointer_address >> PAGE_SHIFT);
-		amdgpu_bo_unreserve(wptr_vm->root.bo);
-		if (!wptr_mapping) {
-			pr_err("Failed to lookup wptr bo\n");
-			err = -EINVAL;
-			goto err_wptr_map_gart;
-		}
-
-		wptr_bo = wptr_mapping->bo_va->base.bo;
-		if (wptr_bo->tbo.base.size > PAGE_SIZE) {
-			pr_err("Requested GART mapping for wptr bo larger than one page\n");
-			err = -EINVAL;
-			goto err_wptr_map_gart;
-		}
-		if (dev->adev != amdgpu_ttm_adev(wptr_bo->tbo.bdev)) {
-			pr_err("Queue memory allocated to wrong device\n");
-			err = -EINVAL;
-			goto err_wptr_map_gart;
-		}
-
-		err = amdgpu_amdkfd_map_gtt_bo_to_gart(wptr_bo);
-		if (err) {
-			pr_err("Failed to map wptr bo to GART\n");
-			goto err_wptr_map_gart;
-		}
+	err = kfd_queue_acquire_buffers(pdd, &q_properties);
+	if (err) {
+		pr_debug("failed to acquire user queue buffers\n");
+		goto err_acquire_queue_buf;
 	}
 
 	pr_debug("Creating queue for PASID 0x%x on gpu 0x%x\n",
 			p->pasid,
 			dev->id);
 
-	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id, wptr_bo,
+	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id,
 			NULL, NULL, NULL, &doorbell_offset_in_process);
 	if (err != 0)
 		goto err_create_queue;
@@ -422,9 +385,8 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 	return 0;
 
 err_create_queue:
-	if (wptr_bo)
-		amdgpu_amdkfd_free_gtt_mem(dev->adev, (void **)&wptr_bo);
-err_wptr_map_gart:
+	kfd_queue_release_buffers(pdd, &q_properties);
+err_acquire_queue_buf:
 err_bind_process:
 err_pdd:
 	mutex_unlock(&p->mutex);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 420444eb8e98..fdc76c24b2e7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -208,10 +208,8 @@ static int add_queue_mes(struct device_queue_manager *dqm, struct queue *q,
 	queue_input.mqd_addr = q->gart_mqd_addr;
 	queue_input.wptr_addr = (uint64_t)q->properties.write_ptr;
 
-	if (q->wptr_bo) {
-		wptr_addr_off = (uint64_t)q->properties.write_ptr & (PAGE_SIZE - 1);
-		queue_input.wptr_mc_addr = amdgpu_bo_gpu_offset(q->wptr_bo) + wptr_addr_off;
-	}
+	wptr_addr_off = (uint64_t)q->properties.write_ptr & (PAGE_SIZE - 1);
+	queue_input.wptr_mc_addr = amdgpu_bo_gpu_offset(q->properties.wptr_bo) + wptr_addr_off;
 
 	queue_input.is_kfd_process = 1;
 	queue_input.is_aql_queue = (q->properties.format == KFD_QUEUE_FORMAT_AQL);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 2b3ec92981e8..aba9bcd91f65 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -494,8 +494,8 @@ struct queue_properties {
 	uint64_t  queue_size;
 	uint32_t priority;
 	uint32_t queue_percent;
-	uint32_t *read_ptr;
-	uint32_t *write_ptr;
+	void __user *read_ptr;
+	void __user *write_ptr;
 	void __iomem *doorbell_ptr;
 	uint32_t doorbell_off;
 	bool is_interop;
@@ -522,6 +522,8 @@ struct queue_properties {
 	uint64_t tba_addr;
 	uint64_t tma_addr;
 	uint64_t exception_status;
+
+	struct amdgpu_bo *wptr_bo;
 };
 
 #define QUEUE_IS_ACTIVE(q) ((q).queue_size > 0 &&	\
@@ -604,7 +606,7 @@ struct queue {
 	uint64_t gang_ctx_gpu_addr;
 	void *gang_ctx_cpu_ptr;
 
-	struct amdgpu_bo *wptr_bo;
+	struct amdgpu_bo *wptr_bo_gart;
 };
 
 enum KFD_MQD_TYPE {
@@ -1284,6 +1286,10 @@ int init_queue(struct queue **q, const struct queue_properties *properties);
 void uninit_queue(struct queue *q);
 void print_queue_properties(struct queue_properties *q);
 void print_queue(struct queue *q);
+int kfd_queue_buffer_get(struct amdgpu_vm *vm, void __user *addr, struct amdgpu_bo **pbo,
+			 u64 expected_size);
+int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_properties *properties);
+int kfd_queue_release_buffers(struct kfd_process_device *pdd, struct queue_properties *properties);
 
 struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 		struct kfd_node *dev);
@@ -1320,7 +1326,6 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			    struct file *f,
 			    struct queue_properties *properties,
 			    unsigned int *qid,
-			    struct amdgpu_bo *wptr_bo,
 			    const struct kfd_criu_queue_priv_data *q_data,
 			    const void *restore_mqd,
 			    const void *restore_ctl_stack,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 36f0460cbffe..4947f28b3afb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -205,18 +205,21 @@ static void pqm_clean_queue_resource(struct process_queue_manager *pqm,
 
 	if (dev->kfd->shared_resources.enable_mes) {
 		amdgpu_amdkfd_free_gtt_mem(dev->adev, &pqn->q->gang_ctx_bo);
-		if (pqn->q->wptr_bo)
-			amdgpu_amdkfd_free_gtt_mem(dev->adev, (void **)&pqn->q->wptr_bo);
+		amdgpu_amdkfd_free_gtt_mem(dev->adev, (void **)&pqn->q->wptr_bo_gart);
 	}
 }
 
 void pqm_uninit(struct process_queue_manager *pqm)
 {
 	struct process_queue_node *pqn, *next;
+	struct kfd_process_device *pdd;
 
 	list_for_each_entry_safe(pqn, next, &pqm->queues, process_queue_list) {
-		if (pqn->q)
+		if (pqn->q) {
+			pdd = kfd_get_process_device_data(pqn->q->device, pqm->process);
+			kfd_queue_release_buffers(pdd, &pqn->q->properties);
 			pqm_clean_queue_resource(pqm, pqn);
+		}
 
 		kfd_procfs_del_queue(pqn->q);
 		uninit_queue(pqn->q);
@@ -231,8 +234,7 @@ void pqm_uninit(struct process_queue_manager *pqm)
 static int init_user_queue(struct process_queue_manager *pqm,
 				struct kfd_node *dev, struct queue **q,
 				struct queue_properties *q_properties,
-				struct file *f, struct amdgpu_bo *wptr_bo,
-				unsigned int qid)
+				struct file *f, unsigned int qid)
 {
 	int retval;
 
@@ -263,12 +265,32 @@ static int init_user_queue(struct process_queue_manager *pqm,
 			goto cleanup;
 		}
 		memset((*q)->gang_ctx_cpu_ptr, 0, AMDGPU_MES_GANG_CTX_SIZE);
-		(*q)->wptr_bo = wptr_bo;
+
+		/* Starting with GFX11, wptr BOs must be mapped to GART for MES to determine work
+		 * on unmapped queues for usermode queue oversubscription (no aggregated doorbell)
+		 */
+		if (((dev->adev->mes.sched_version & AMDGPU_MES_API_VERSION_MASK)
+		    >> AMDGPU_MES_API_VERSION_SHIFT) >= 2) {
+			if (dev->adev != amdgpu_ttm_adev(q_properties->wptr_bo->tbo.bdev)) {
+				pr_err("Queue memory allocated to wrong device\n");
+				retval = -EINVAL;
+				goto free_gang_ctx_bo;
+			}
+
+			retval = amdgpu_amdkfd_map_gtt_bo_to_gart(q_properties->wptr_bo,
+								  &(*q)->wptr_bo_gart);
+			if (retval) {
+				pr_err("Failed to map wptr bo to GART\n");
+				goto free_gang_ctx_bo;
+			}
+		}
 	}
 
 	pr_debug("PQM After init queue");
 	return 0;
 
+free_gang_ctx_bo:
+	amdgpu_amdkfd_free_gtt_mem(dev->adev, (*q)->gang_ctx_bo);
 cleanup:
 	uninit_queue(*q);
 	*q = NULL;
@@ -280,7 +302,6 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			    struct file *f,
 			    struct queue_properties *properties,
 			    unsigned int *qid,
-			    struct amdgpu_bo *wptr_bo,
 			    const struct kfd_criu_queue_priv_data *q_data,
 			    const void *restore_mqd,
 			    const void *restore_ctl_stack,
@@ -351,7 +372,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 		 * allocate_sdma_queue() in create_queue() has the
 		 * corresponding check logic.
 		 */
-		retval = init_user_queue(pqm, dev, &q, properties, f, wptr_bo, *qid);
+		retval = init_user_queue(pqm, dev, &q, properties, f, *qid);
 		if (retval != 0)
 			goto err_create_queue;
 		pqn->q = q;
@@ -372,7 +393,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			goto err_create_queue;
 		}
 
-		retval = init_user_queue(pqm, dev, &q, properties, f, wptr_bo, *qid);
+		retval = init_user_queue(pqm, dev, &q, properties, f, *qid);
 		if (retval != 0)
 			goto err_create_queue;
 		pqn->q = q;
@@ -490,6 +511,10 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid)
 	}
 
 	if (pqn->q) {
+		retval = kfd_queue_release_buffers(pdd, &pqn->q->properties);
+		if (retval)
+			goto err_destroy_queue;
+
 		kfd_procfs_del_queue(pqn->q);
 		dqm = pqn->q->device->dqm;
 		retval = dqm->ops.destroy_queue(dqm, &pdd->qpd, pqn->q);
@@ -971,7 +996,7 @@ int kfd_criu_restore_queue(struct kfd_process *p,
 
 	print_queue_properties(&qp);
 
-	ret = pqm_create_queue(&p->pqm, pdd->dev, NULL, &qp, &queue_id, NULL, q_data, mqd, ctl_stack,
+	ret = pqm_create_queue(&p->pqm, pdd->dev, NULL, &qp, &queue_id, q_data, mqd, ctl_stack,
 				NULL);
 	if (ret) {
 		pr_err("Failed to create new queue err:%d\n", ret);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
index 0f6992b1895c..b4529ec298a9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
@@ -82,3 +82,60 @@ void uninit_queue(struct queue *q)
 {
 	kfree(q);
 }
+
+int kfd_queue_buffer_get(struct amdgpu_vm *vm, void __user *addr, struct amdgpu_bo **pbo,
+			 u64 expected_size)
+{
+	struct amdgpu_bo_va_mapping *mapping;
+	u64 user_addr;
+	u64 size;
+
+	user_addr = (u64)addr >> AMDGPU_GPU_PAGE_SHIFT;
+	size = expected_size >> AMDGPU_GPU_PAGE_SHIFT;
+
+	mapping = amdgpu_vm_bo_lookup_mapping(vm, user_addr);
+	if (!mapping)
+		goto out_err;
+
+	if (user_addr != mapping->start || user_addr + size - 1 != mapping->last) {
+		pr_debug("expected size 0x%llx not equal to mapping addr 0x%llx size 0x%llx\n",
+			expected_size, mapping->start << AMDGPU_GPU_PAGE_SHIFT,
+			(mapping->last - mapping->start + 1) << AMDGPU_GPU_PAGE_SHIFT);
+		goto out_err;
+	}
+
+	*pbo = amdgpu_bo_ref(mapping->bo_va->base.bo);
+	return 0;
+
+out_err:
+	*pbo = NULL;
+	return -EINVAL;
+}
+
+int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_properties *properties)
+{
+	struct amdgpu_vm *vm;
+	int err;
+
+	vm = drm_priv_to_vm(pdd->drm_priv);
+	err = amdgpu_bo_reserve(vm->root.bo, false);
+	if (err)
+		return err;
+
+	err = kfd_queue_buffer_get(vm, properties->write_ptr, &properties->wptr_bo, PAGE_SIZE);
+	if (err)
+		goto out_unreserve;
+
+	amdgpu_bo_unreserve(vm->root.bo);
+	return 0;
+
+out_unreserve:
+	amdgpu_bo_unreserve(vm->root.bo);
+	return err;
+}
+
+int kfd_queue_release_buffers(struct kfd_process_device *pdd, struct queue_properties *properties)
+{
+	amdgpu_bo_unref(&properties->wptr_bo);
+	return 0;
+}
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 4/9] drm/amdkfd: Validate user queue buffers
  2024-07-18 21:05 [PATCH v2 0/9] KFD user queue validation Philip Yang
                   ` (2 preceding siblings ...)
  2024-07-18 21:05 ` [PATCH v2 3/9] drm/amdkfd: Refactor queue wptr_bo GART mapping Philip Yang
@ 2024-07-18 21:05 ` Philip Yang
  2025-02-12 22:42   ` Uwe Kleine-König
  2024-07-18 21:05 ` [PATCH v2 5/9] drm/amdkfd: Ensure user queue buffers residency Philip Yang
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Philip Yang @ 2024-07-18 21:05 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, Alexander.Deucher, christian.koenig, Philip Yang,
	Felix Kuehling

Find user queue rptr, ring buf, eop buffer and cwsr area BOs, and
check BOs are mapped on the GPU with correct size and take the BO
reference.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  4 +++
 drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 38 ++++++++++++++++++++++++--
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index aba9bcd91f65..80d8080c5764 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -524,6 +524,10 @@ struct queue_properties {
 	uint64_t exception_status;
 
 	struct amdgpu_bo *wptr_bo;
+	struct amdgpu_bo *rptr_bo;
+	struct amdgpu_bo *ring_bo;
+	struct amdgpu_bo *eop_buf_bo;
+	struct amdgpu_bo *cwsr_bo;
 };
 
 #define QUEUE_IS_ACTIVE(q) ((q).queue_size > 0 &&	\
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
index b4529ec298a9..0e661160c295 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
@@ -97,7 +97,8 @@ int kfd_queue_buffer_get(struct amdgpu_vm *vm, void __user *addr, struct amdgpu_
 	if (!mapping)
 		goto out_err;
 
-	if (user_addr != mapping->start || user_addr + size - 1 != mapping->last) {
+	if (user_addr != mapping->start ||
+	    (size != 0 && user_addr + size - 1 != mapping->last)) {
 		pr_debug("expected size 0x%llx not equal to mapping addr 0x%llx size 0x%llx\n",
 			expected_size, mapping->start << AMDGPU_GPU_PAGE_SHIFT,
 			(mapping->last - mapping->start + 1) << AMDGPU_GPU_PAGE_SHIFT);
@@ -124,18 +125,51 @@ int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_prope
 
 	err = kfd_queue_buffer_get(vm, properties->write_ptr, &properties->wptr_bo, PAGE_SIZE);
 	if (err)
+		goto out_err_unreserve;
+
+	err = kfd_queue_buffer_get(vm, properties->read_ptr, &properties->rptr_bo, PAGE_SIZE);
+	if (err)
+		goto out_err_unreserve;
+
+	err = kfd_queue_buffer_get(vm, (void *)properties->queue_address,
+				   &properties->ring_bo, properties->queue_size);
+	if (err)
+		goto out_err_unreserve;
+
+	/* only compute queue requires EOP buffer and CWSR area */
+	if (properties->type != KFD_QUEUE_TYPE_COMPUTE)
 		goto out_unreserve;
 
+	/* EOP buffer is not required for all ASICs */
+	if (properties->eop_ring_buffer_address) {
+		err = kfd_queue_buffer_get(vm, (void *)properties->eop_ring_buffer_address,
+					   &properties->eop_buf_bo,
+					   properties->eop_ring_buffer_size);
+		if (err)
+			goto out_err_unreserve;
+	}
+
+	err = kfd_queue_buffer_get(vm, (void *)properties->ctx_save_restore_area_address,
+				   &properties->cwsr_bo, 0);
+	if (err)
+		goto out_err_unreserve;
+
+out_unreserve:
 	amdgpu_bo_unreserve(vm->root.bo);
 	return 0;
 
-out_unreserve:
+out_err_unreserve:
 	amdgpu_bo_unreserve(vm->root.bo);
+	kfd_queue_release_buffers(pdd, properties);
 	return err;
 }
 
 int kfd_queue_release_buffers(struct kfd_process_device *pdd, struct queue_properties *properties)
 {
 	amdgpu_bo_unref(&properties->wptr_bo);
+	amdgpu_bo_unref(&properties->rptr_bo);
+	amdgpu_bo_unref(&properties->ring_bo);
+	amdgpu_bo_unref(&properties->eop_buf_bo);
+	amdgpu_bo_unref(&properties->cwsr_bo);
 	return 0;
 }
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 5/9] drm/amdkfd: Ensure user queue buffers residency
  2024-07-18 21:05 [PATCH v2 0/9] KFD user queue validation Philip Yang
                   ` (3 preceding siblings ...)
  2024-07-18 21:05 ` [PATCH v2 4/9] drm/amdkfd: Validate user queue buffers Philip Yang
@ 2024-07-18 21:05 ` Philip Yang
  2024-07-18 21:05 ` [PATCH v2 6/9] drm/amdkfd: Validate user queue svm memory residency Philip Yang
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Philip Yang @ 2024-07-18 21:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix.Kuehling, Alexander.Deucher, christian.koenig, Philip Yang

Add atomic queue_refcount to struct bo_va, return -EBUSY to fail unmap
BO from the GPU if the bo_va queue_refcount is not zero.

Create queue to increase the bo_va queue_refcount, destroy queue to
decrease the bo_va queue_refcount, to ensure the queue buffers mapped on
the GPU when queue is active.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 14 ++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |  6 ++++
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_queue.c        | 34 ++++++++++++++++---
 5 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 0ab37e7aec26..6d5fd371d5ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1252,7 +1252,7 @@ static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
 	return ret;
 }
 
-static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
+static int unmap_bo_from_gpuvm(struct kgd_mem *mem,
 				struct kfd_mem_attachment *entry,
 				struct amdgpu_sync *sync)
 {
@@ -1260,11 +1260,18 @@ static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
 	struct amdgpu_device *adev = entry->adev;
 	struct amdgpu_vm *vm = bo_va->base.vm;
 
+	if (bo_va->queue_refcount) {
+		pr_debug("bo_va->queue_refcount %d\n", bo_va->queue_refcount);
+		return -EBUSY;
+	}
+
 	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
 
 	amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update);
 
 	amdgpu_sync_fence(sync, bo_va->last_pt_update);
+
+	return 0;
 }
 
 static int update_gpuvm_pte(struct kgd_mem *mem,
@@ -2191,7 +2198,10 @@ int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
 		pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n",
 			 entry->va, entry->va + bo_size, entry);
 
-		unmap_bo_from_gpuvm(mem, entry, ctx.sync);
+		ret = unmap_bo_from_gpuvm(mem, entry, ctx.sync);
+		if (ret)
+			goto unreserve_out;
+
 		entry->is_mapped = false;
 
 		mem->mapped_to_gpu_memory--;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index bc42ccbde659..d7e27957013f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -90,6 +90,12 @@ struct amdgpu_bo_va {
 	bool				cleared;
 
 	bool				is_xgmi;
+
+	/*
+	 * protected by vm reservation lock
+	 * if non-zero, cannot unmap from GPU because user queues may still access it
+	 */
+	unsigned int			queue_refcount;
 };
 
 struct amdgpu_bo {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 202f24ee4bd7..65a37ac5a0f0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1384,8 +1384,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
 		err = amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
 			peer_pdd->dev->adev, (struct kgd_mem *)mem, peer_pdd->drm_priv);
 		if (err) {
-			pr_err("Failed to unmap from gpu %d/%d\n",
-			       i, args->n_devices);
+			pr_debug("Failed to unmap from gpu %d/%d\n", i, args->n_devices);
 			goto unmap_memory_from_gpu_failed;
 		}
 		args->n_success = i+1;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 80d8080c5764..c31589043d5b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1292,6 +1292,7 @@ void print_queue_properties(struct queue_properties *q);
 void print_queue(struct queue *q);
 int kfd_queue_buffer_get(struct amdgpu_vm *vm, void __user *addr, struct amdgpu_bo **pbo,
 			 u64 expected_size);
+void kfd_queue_buffer_put(struct amdgpu_vm *vm, struct amdgpu_bo **bo);
 int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_properties *properties);
 int kfd_queue_release_buffers(struct kfd_process_device *pdd, struct queue_properties *properties);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
index 0e661160c295..3fd386dcb011 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
@@ -106,6 +106,7 @@ int kfd_queue_buffer_get(struct amdgpu_vm *vm, void __user *addr, struct amdgpu_
 	}
 
 	*pbo = amdgpu_bo_ref(mapping->bo_va->base.bo);
+	mapping->bo_va->queue_refcount++;
 	return 0;
 
 out_err:
@@ -113,6 +114,19 @@ int kfd_queue_buffer_get(struct amdgpu_vm *vm, void __user *addr, struct amdgpu_
 	return -EINVAL;
 }
 
+void kfd_queue_buffer_put(struct amdgpu_vm *vm, struct amdgpu_bo **bo)
+{
+	if (*bo) {
+		struct amdgpu_bo_va *bo_va;
+
+		bo_va = amdgpu_vm_bo_find(vm, *bo);
+		if (bo_va)
+			bo_va->queue_refcount--;
+	}
+
+	amdgpu_bo_unref(bo);
+}
+
 int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_properties *properties)
 {
 	struct amdgpu_vm *vm;
@@ -166,10 +180,20 @@ int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_prope
 
 int kfd_queue_release_buffers(struct kfd_process_device *pdd, struct queue_properties *properties)
 {
-	amdgpu_bo_unref(&properties->wptr_bo);
-	amdgpu_bo_unref(&properties->rptr_bo);
-	amdgpu_bo_unref(&properties->ring_bo);
-	amdgpu_bo_unref(&properties->eop_buf_bo);
-	amdgpu_bo_unref(&properties->cwsr_bo);
+	struct amdgpu_vm *vm;
+	int err;
+
+	vm = drm_priv_to_vm(pdd->drm_priv);
+	err = amdgpu_bo_reserve(vm->root.bo, false);
+	if (err)
+		return err;
+
+	kfd_queue_buffer_put(vm, &properties->wptr_bo);
+	kfd_queue_buffer_put(vm, &properties->rptr_bo);
+	kfd_queue_buffer_put(vm, &properties->ring_bo);
+	kfd_queue_buffer_put(vm, &properties->eop_buf_bo);
+	kfd_queue_buffer_put(vm, &properties->cwsr_bo);
+
+	amdgpu_bo_unreserve(vm->root.bo);
 	return 0;
 }
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 6/9] drm/amdkfd: Validate user queue svm memory residency
  2024-07-18 21:05 [PATCH v2 0/9] KFD user queue validation Philip Yang
                   ` (4 preceding siblings ...)
  2024-07-18 21:05 ` [PATCH v2 5/9] drm/amdkfd: Ensure user queue buffers residency Philip Yang
@ 2024-07-18 21:05 ` Philip Yang
  2024-07-18 21:05 ` [PATCH v2 7/9] drm/amdkfd: Validate user queue update Philip Yang
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Philip Yang @ 2024-07-18 21:05 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, Alexander.Deucher, christian.koenig, Philip Yang,
	Felix Kuehling

Queue CWSR area maybe registered to GPU as svm memory, create queue to
ensure svm mapped to GPU with KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag.

Add queue_refcount to struct svm_range, to track queue CWSR area usage.

Because unmap mmu notifier callback return value is ignored, if
application unmap the CWSR area while queue is active, pr_warn message
in dmesg log. To be safe, evict user queue.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 110 ++++++++++++++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c   |  12 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h   |   1 +
 3 files changed, 122 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
index 3fd386dcb011..67242ce051b5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
@@ -24,6 +24,7 @@
 
 #include <linux/slab.h>
 #include "kfd_priv.h"
+#include "kfd_svm.h"
 
 void print_queue_properties(struct queue_properties *q)
 {
@@ -83,6 +84,100 @@ void uninit_queue(struct queue *q)
 	kfree(q);
 }
 
+static int kfd_queue_buffer_svm_get(struct kfd_process_device *pdd, u64 addr, u64 size)
+{
+	struct kfd_process *p = pdd->process;
+	struct list_head update_list;
+	struct svm_range *prange;
+	int ret = -EINVAL;
+
+	INIT_LIST_HEAD(&update_list);
+	addr >>= PAGE_SHIFT;
+	size >>= PAGE_SHIFT;
+
+	mutex_lock(&p->svms.lock);
+
+	/*
+	 * range may split to multiple svm pranges aligned to granularity boundaery.
+	 */
+	while (size) {
+		uint32_t gpuid, gpuidx;
+		int r;
+
+		prange = svm_range_from_addr(&p->svms, addr, NULL);
+		if (!prange)
+			break;
+
+		if (!prange->mapped_to_gpu)
+			break;
+
+		r = kfd_process_gpuid_from_node(p, pdd->dev, &gpuid, &gpuidx);
+		if (r < 0)
+			break;
+		if (!test_bit(gpuidx, prange->bitmap_access) &&
+		    !test_bit(gpuidx, prange->bitmap_aip))
+			break;
+
+		if (!(prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED))
+			break;
+
+		list_add(&prange->update_list, &update_list);
+
+		if (prange->last - prange->start + 1 >= size) {
+			size = 0;
+			break;
+		}
+
+		size -= prange->last - prange->start + 1;
+		addr += prange->last - prange->start + 1;
+	}
+	if (size) {
+		pr_debug("[0x%llx 0x%llx] not registered\n", addr, addr + size - 1);
+		goto out_unlock;
+	}
+
+	list_for_each_entry(prange, &update_list, update_list)
+		atomic_inc(&prange->queue_refcount);
+	ret = 0;
+
+out_unlock:
+	mutex_unlock(&p->svms.lock);
+	return ret;
+}
+
+static void kfd_queue_buffer_svm_put(struct kfd_process_device *pdd, u64 addr, u64 size)
+{
+	struct kfd_process *p = pdd->process;
+	struct svm_range *prange, *pchild;
+	struct interval_tree_node *node;
+	unsigned long last;
+
+	addr >>= PAGE_SHIFT;
+	last = addr + (size >> PAGE_SHIFT) - 1;
+
+	mutex_lock(&p->svms.lock);
+
+	node = interval_tree_iter_first(&p->svms.objects, addr, last);
+	while (node) {
+		struct interval_tree_node *next_node;
+		unsigned long next_start;
+
+		prange = container_of(node, struct svm_range, it_node);
+		next_node = interval_tree_iter_next(node, addr, last);
+		next_start = min(node->last, last) + 1;
+
+		if (atomic_add_unless(&prange->queue_refcount, -1, 0)) {
+			list_for_each_entry(pchild, &prange->child_list, child_list)
+				atomic_add_unless(&pchild->queue_refcount, -1, 0);
+		}
+
+		node = next_node;
+		addr = next_start;
+	}
+
+	mutex_unlock(&p->svms.lock);
+}
+
 int kfd_queue_buffer_get(struct amdgpu_vm *vm, void __user *addr, struct amdgpu_bo **pbo,
 			 u64 expected_size)
 {
@@ -165,8 +260,17 @@ int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_prope
 
 	err = kfd_queue_buffer_get(vm, (void *)properties->ctx_save_restore_area_address,
 				   &properties->cwsr_bo, 0);
+	if (!err)
+		goto out_unreserve;
+
+	amdgpu_bo_unreserve(vm->root.bo);
+
+	err = kfd_queue_buffer_svm_get(pdd, properties->ctx_save_restore_area_address,
+				       properties->ctx_save_restore_area_size);
 	if (err)
-		goto out_err_unreserve;
+		goto out_err_release;
+
+	return 0;
 
 out_unreserve:
 	amdgpu_bo_unreserve(vm->root.bo);
@@ -174,6 +278,7 @@ int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_prope
 
 out_err_unreserve:
 	amdgpu_bo_unreserve(vm->root.bo);
+out_err_release:
 	kfd_queue_release_buffers(pdd, properties);
 	return err;
 }
@@ -195,5 +300,8 @@ int kfd_queue_release_buffers(struct kfd_process_device *pdd, struct queue_prope
 	kfd_queue_buffer_put(vm, &properties->cwsr_bo);
 
 	amdgpu_bo_unreserve(vm->root.bo);
+
+	kfd_queue_buffer_svm_put(pdd, properties->ctx_save_restore_area_address,
+				 properties->ctx_save_restore_area_size);
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 7b671aefab01..10b1a1f64198 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1051,6 +1051,7 @@ svm_range_split_adjust(struct svm_range *new, struct svm_range *old,
 	new->mapped_to_gpu = old->mapped_to_gpu;
 	bitmap_copy(new->bitmap_access, old->bitmap_access, MAX_GPU_INSTANCE);
 	bitmap_copy(new->bitmap_aip, old->bitmap_aip, MAX_GPU_INSTANCE);
+	atomic_set(&new->queue_refcount, atomic_read(&old->queue_refcount));
 
 	return 0;
 }
@@ -1986,6 +1987,7 @@ static struct svm_range *svm_range_clone(struct svm_range *old)
 	new->vram_pages = old->vram_pages;
 	bitmap_copy(new->bitmap_access, old->bitmap_access, MAX_GPU_INSTANCE);
 	bitmap_copy(new->bitmap_aip, old->bitmap_aip, MAX_GPU_INSTANCE);
+	atomic_set(&new->queue_refcount, atomic_read(&old->queue_refcount));
 
 	return new;
 }
@@ -2438,6 +2440,16 @@ svm_range_unmap_from_cpu(struct mm_struct *mm, struct svm_range *prange,
 	unsigned long s, l;
 	bool unmap_parent;
 
+	if (atomic_read(&prange->queue_refcount)) {
+		int r;
+
+		pr_warn("Freeing queue vital buffer 0x%lx, queue evicted\n",
+			prange->start << PAGE_SHIFT);
+		r = kgd2kfd_quiesce_mm(mm, KFD_QUEUE_EVICTION_TRIGGER_SVM);
+		if (r)
+			pr_debug("failed %d to quiesce KFD queues\n", r);
+	}
+
 	p = kfd_lookup_process_by_mm(mm);
 	if (!p)
 		return;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index 70c1776611c4..747325a2ea89 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
@@ -137,6 +137,7 @@ struct svm_range {
 	DECLARE_BITMAP(bitmap_access, MAX_GPU_INSTANCE);
 	DECLARE_BITMAP(bitmap_aip, MAX_GPU_INSTANCE);
 	bool				mapped_to_gpu;
+	atomic_t			queue_refcount;
 };
 
 static inline void svm_range_lock(struct svm_range *prange)
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 7/9] drm/amdkfd: Validate user queue update
  2024-07-18 21:05 [PATCH v2 0/9] KFD user queue validation Philip Yang
                   ` (5 preceding siblings ...)
  2024-07-18 21:05 ` [PATCH v2 6/9] drm/amdkfd: Validate user queue svm memory residency Philip Yang
@ 2024-07-18 21:05 ` Philip Yang
  2024-07-18 21:05 ` [PATCH v2 8/9] drm/amdkfd: Store queue cwsr area size to node properties Philip Yang
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Philip Yang @ 2024-07-18 21:05 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, Alexander.Deucher, christian.koenig, Philip Yang,
	Felix Kuehling

Ensure update queue new ring buffer is mapped on GPU with correct size.

Decrease queue old ring_bo queue_refcount and increase new ring_bo
queue_refcount.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
---
 .../amd/amdkfd/kfd_process_queue_manager.c    | 32 ++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 4947f28b3afb..9995dbb43359 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -549,11 +549,41 @@ int pqm_update_queue_properties(struct process_queue_manager *pqm,
 	struct process_queue_node *pqn;
 
 	pqn = get_queue_by_qid(pqm, qid);
-	if (!pqn) {
+	if (!pqn || !pqn->q) {
 		pr_debug("No queue %d exists for update operation\n", qid);
 		return -EFAULT;
 	}
 
+	/*
+	 * Update with NULL ring address is used to disable the queue
+	 */
+	if (p->queue_address && p->queue_size) {
+		struct kfd_process_device *pdd;
+		struct amdgpu_vm *vm;
+		struct queue *q = pqn->q;
+		int err;
+
+		pdd = kfd_get_process_device_data(q->device, q->process);
+		if (!pdd)
+			return -ENODEV;
+		vm = drm_priv_to_vm(pdd->drm_priv);
+		err = amdgpu_bo_reserve(vm->root.bo, false);
+		if (err)
+			return err;
+
+		if (kfd_queue_buffer_get(vm, (void *)p->queue_address, &p->ring_bo,
+					 p->queue_size)) {
+			pr_debug("ring buf 0x%llx size 0x%llx not mapped on GPU\n",
+				 p->queue_address, p->queue_size);
+			return -EFAULT;
+		}
+
+		kfd_queue_buffer_put(vm, &pqn->q->properties.ring_bo);
+		amdgpu_bo_unreserve(vm->root.bo);
+
+		pqn->q->properties.ring_bo = p->ring_bo;
+	}
+
 	pqn->q->properties.queue_address = p->queue_address;
 	pqn->q->properties.queue_size = p->queue_size;
 	pqn->q->properties.queue_percent = p->queue_percent;
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 8/9] drm/amdkfd: Store queue cwsr area size to node properties
  2024-07-18 21:05 [PATCH v2 0/9] KFD user queue validation Philip Yang
                   ` (6 preceding siblings ...)
  2024-07-18 21:05 ` [PATCH v2 7/9] drm/amdkfd: Validate user queue update Philip Yang
@ 2024-07-18 21:05 ` Philip Yang
  2024-07-18 21:05 ` [PATCH v2 9/9] drm/amdkfd: Validate queue cwsr area and eop buffer size Philip Yang
  2024-07-18 21:12 ` [PATCH v2 0/9] KFD user queue validation Felix Kuehling
  9 siblings, 0 replies; 18+ messages in thread
From: Philip Yang @ 2024-07-18 21:05 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, Alexander.Deucher, christian.koenig, Philip Yang,
	Felix Kuehling

Use the queue eop buffer size, cwsr area size, ctl stack size
calculation from Thunk, store the value to KFD node properties.

Those will be used to validate queue eop buffer size, cwsr area size,
ctl stack size when creating KFD user compute queue.

Those will be exposed to user space via sysfs KFD node properties, to
remove the duplicate calculation code from Thunk.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_queue.c    | 75 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  2 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  4 ++
 4 files changed, 82 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index c31589043d5b..b5cae48dff66 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1295,6 +1295,7 @@ int kfd_queue_buffer_get(struct amdgpu_vm *vm, void __user *addr, struct amdgpu_
 void kfd_queue_buffer_put(struct amdgpu_vm *vm, struct amdgpu_bo **bo);
 int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_properties *properties);
 int kfd_queue_release_buffers(struct kfd_process_device *pdd, struct queue_properties *properties);
+void kfd_queue_ctx_save_restore_size(struct kfd_topology_device *dev);
 
 struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 		struct kfd_node *dev);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
index 67242ce051b5..adcda9730c9f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
@@ -24,6 +24,7 @@
 
 #include <linux/slab.h>
 #include "kfd_priv.h"
+#include "kfd_topology.h"
 #include "kfd_svm.h"
 
 void print_queue_properties(struct queue_properties *q)
@@ -305,3 +306,77 @@ int kfd_queue_release_buffers(struct kfd_process_device *pdd, struct queue_prope
 				 properties->ctx_save_restore_area_size);
 	return 0;
 }
+
+#define SGPR_SIZE_PER_CU	0x4000
+#define LDS_SIZE_PER_CU		0x10000
+#define HWREG_SIZE_PER_CU	0x1000
+#define DEBUGGER_BYTES_ALIGN	64
+#define DEBUGGER_BYTES_PER_WAVE	32
+
+static u32 kfd_get_vgpr_size_per_cu(u32 gfxv)
+{
+	u32 vgpr_size = 0x40000;
+
+	if ((gfxv / 100 * 100) == 90400 ||	/* GFX_VERSION_AQUA_VANJARAM */
+	    gfxv == 90010 ||			/* GFX_VERSION_ALDEBARAN */
+	    gfxv == 90008)			/* GFX_VERSION_ARCTURUS */
+		vgpr_size = 0x80000;
+	else if (gfxv == 110000 ||		/* GFX_VERSION_PLUM_BONITO */
+		 gfxv == 110001 ||		/* GFX_VERSION_WHEAT_NAS */
+		 gfxv == 120000 ||		/* GFX_VERSION_GFX1200 */
+		 gfxv == 120001)		/* GFX_VERSION_GFX1201 */
+		vgpr_size = 0x60000;
+
+	return vgpr_size;
+}
+
+#define WG_CONTEXT_DATA_SIZE_PER_CU(gfxv)	\
+	(kfd_get_vgpr_size_per_cu(gfxv) + SGPR_SIZE_PER_CU +\
+	 LDS_SIZE_PER_CU + HWREG_SIZE_PER_CU)
+
+#define CNTL_STACK_BYTES_PER_WAVE(gfxv)	\
+	((gfxv) >= 100100 ? 12 : 8)	/* GFX_VERSION_NAVI10*/
+
+#define SIZEOF_HSA_USER_CONTEXT_SAVE_AREA_HEADER 40
+
+void kfd_queue_ctx_save_restore_size(struct kfd_topology_device *dev)
+{
+	struct kfd_node_properties *props = &dev->node_props;
+	u32 gfxv = props->gfx_target_version;
+	u32 ctl_stack_size;
+	u32 wg_data_size;
+	u32 wave_num;
+	u32 cu_num;
+
+	if (gfxv < 80001)	/* GFX_VERSION_CARRIZO */
+		return;
+
+	cu_num = props->simd_count / props->simd_per_cu / NUM_XCC(dev->gpu->xcc_mask);
+	wave_num = (gfxv < 100100) ?	/* GFX_VERSION_NAVI10 */
+		    min(cu_num * 40, props->array_count / props->simd_arrays_per_engine * 512)
+		    : cu_num * 32;
+
+	wg_data_size = ALIGN(cu_num * WG_CONTEXT_DATA_SIZE_PER_CU(gfxv), PAGE_SIZE);
+	ctl_stack_size = wave_num * CNTL_STACK_BYTES_PER_WAVE(gfxv) + 8;
+	ctl_stack_size = ALIGN(SIZEOF_HSA_USER_CONTEXT_SAVE_AREA_HEADER + ctl_stack_size,
+			       PAGE_SIZE);
+
+	if ((gfxv / 10000 * 10000) == 100000) {
+		/* HW design limits control stack size to 0x7000.
+		 * This is insufficient for theoretical PM4 cases
+		 * but sufficient for AQL, limited by SPI events.
+		 */
+		ctl_stack_size = min(ctl_stack_size, 0x7000);
+	}
+
+	props->ctl_stack_size = ctl_stack_size;
+	props->debug_memory_size = ALIGN(wave_num * DEBUGGER_BYTES_PER_WAVE, DEBUGGER_BYTES_ALIGN);
+	props->cwsr_size = ctl_stack_size + wg_data_size;
+
+	if (gfxv == 80002)	/* GFX_VERSION_TONGA */
+		props->eop_buffer_size = 0x8000;
+	else if ((gfxv / 100 * 100) == 90400)	/* GFX_VERSION_AQUA_VANJARAM */
+		props->eop_buffer_size = 4096;
+	else if (gfxv >= 80000)
+		props->eop_buffer_size = 4096;
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 6f89b06f89d3..a9b3eda65a2c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -2120,6 +2120,8 @@ int kfd_topology_add_device(struct kfd_node *gpu)
 		dev->gpu->adev->gmc.xgmi.connected_to_cpu)
 		dev->node_props.capability |= HSA_CAP_FLAGS_COHERENTHOSTACCESS;
 
+	kfd_queue_ctx_save_restore_size(dev);
+
 	kfd_debug_print_topology();
 
 	kfd_notify_gpu_change(gpu_id, 1);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 2d1c9d771bef..43ba0d32e5bd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -74,6 +74,10 @@ struct kfd_node_properties {
 	uint32_t num_sdma_xgmi_engines;
 	uint32_t num_sdma_queues_per_engine;
 	uint32_t num_cp_queues;
+	uint32_t cwsr_size;
+	uint32_t ctl_stack_size;
+	uint32_t eop_buffer_size;
+	uint32_t debug_memory_size;
 	char name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
 };
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 9/9] drm/amdkfd: Validate queue cwsr area and eop buffer size
  2024-07-18 21:05 [PATCH v2 0/9] KFD user queue validation Philip Yang
                   ` (7 preceding siblings ...)
  2024-07-18 21:05 ` [PATCH v2 8/9] drm/amdkfd: Store queue cwsr area size to node properties Philip Yang
@ 2024-07-18 21:05 ` Philip Yang
  2024-07-18 21:12 ` [PATCH v2 0/9] KFD user queue validation Felix Kuehling
  9 siblings, 0 replies; 18+ messages in thread
From: Philip Yang @ 2024-07-18 21:05 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, Alexander.Deucher, christian.koenig, Philip Yang,
	Felix Kuehling

When creating KFD user compute queue, check if queue eop buffer size,
cwsr area size, ctl stack size equal to the size of KFD node
properities.

Check the entire cwsr area which may split into multiple svm ranges
aligned to gramularity boundary.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 46 +++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
index adcda9730c9f..9807e8adf77d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
@@ -225,9 +225,15 @@ void kfd_queue_buffer_put(struct amdgpu_vm *vm, struct amdgpu_bo **bo)
 
 int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_properties *properties)
 {
+	struct kfd_topology_device *topo_dev;
 	struct amdgpu_vm *vm;
+	u32 total_cwsr_size;
 	int err;
 
+	topo_dev = kfd_topology_device_by_id(pdd->dev->id);
+	if (!topo_dev)
+		return -EINVAL;
+
 	vm = drm_priv_to_vm(pdd->drm_priv);
 	err = amdgpu_bo_reserve(vm->root.bo, false);
 	if (err)
@@ -252,6 +258,12 @@ int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_prope
 
 	/* EOP buffer is not required for all ASICs */
 	if (properties->eop_ring_buffer_address) {
+		if (properties->eop_ring_buffer_size != topo_dev->node_props.eop_buffer_size) {
+			pr_debug("queue eop bo size 0x%lx not equal to node eop buf size 0x%x\n",
+				properties->eop_buf_bo->tbo.base.size,
+				topo_dev->node_props.eop_buffer_size);
+			goto out_err_unreserve;
+		}
 		err = kfd_queue_buffer_get(vm, (void *)properties->eop_ring_buffer_address,
 					   &properties->eop_buf_bo,
 					   properties->eop_ring_buffer_size);
@@ -259,15 +271,33 @@ int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_prope
 			goto out_err_unreserve;
 	}
 
+	if (properties->ctl_stack_size != topo_dev->node_props.ctl_stack_size) {
+		pr_debug("queue ctl stack size 0x%x not equal to node ctl stack size 0x%x\n",
+			properties->ctl_stack_size,
+			topo_dev->node_props.ctl_stack_size);
+		goto out_err_unreserve;
+	}
+
+	if (properties->ctx_save_restore_area_size != topo_dev->node_props.cwsr_size) {
+		pr_debug("queue cwsr size 0x%x not equal to node cwsr size 0x%x\n",
+			properties->ctx_save_restore_area_size,
+			topo_dev->node_props.cwsr_size);
+		goto out_err_unreserve;
+	}
+
+	total_cwsr_size = (topo_dev->node_props.cwsr_size + topo_dev->node_props.debug_memory_size)
+			  * NUM_XCC(pdd->dev->xcc_mask);
+	total_cwsr_size = ALIGN(total_cwsr_size, PAGE_SIZE);
+
 	err = kfd_queue_buffer_get(vm, (void *)properties->ctx_save_restore_area_address,
-				   &properties->cwsr_bo, 0);
+				   &properties->cwsr_bo, total_cwsr_size);
 	if (!err)
 		goto out_unreserve;
 
 	amdgpu_bo_unreserve(vm->root.bo);
 
 	err = kfd_queue_buffer_svm_get(pdd, properties->ctx_save_restore_area_address,
-				       properties->ctx_save_restore_area_size);
+				       total_cwsr_size);
 	if (err)
 		goto out_err_release;
 
@@ -286,7 +316,9 @@ int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_prope
 
 int kfd_queue_release_buffers(struct kfd_process_device *pdd, struct queue_properties *properties)
 {
+	struct kfd_topology_device *topo_dev;
 	struct amdgpu_vm *vm;
+	u32 total_cwsr_size;
 	int err;
 
 	vm = drm_priv_to_vm(pdd->drm_priv);
@@ -302,8 +334,14 @@ int kfd_queue_release_buffers(struct kfd_process_device *pdd, struct queue_prope
 
 	amdgpu_bo_unreserve(vm->root.bo);
 
-	kfd_queue_buffer_svm_put(pdd, properties->ctx_save_restore_area_address,
-				 properties->ctx_save_restore_area_size);
+	topo_dev = kfd_topology_device_by_id(pdd->dev->id);
+	if (!topo_dev)
+		return -EINVAL;
+	total_cwsr_size = (topo_dev->node_props.cwsr_size + topo_dev->node_props.debug_memory_size)
+			  * NUM_XCC(pdd->dev->xcc_mask);
+	total_cwsr_size = ALIGN(total_cwsr_size, PAGE_SIZE);
+
+	kfd_queue_buffer_svm_put(pdd, properties->ctx_save_restore_area_address, total_cwsr_size);
 	return 0;
 }
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/9] KFD user queue validation
  2024-07-18 21:05 [PATCH v2 0/9] KFD user queue validation Philip Yang
                   ` (8 preceding siblings ...)
  2024-07-18 21:05 ` [PATCH v2 9/9] drm/amdkfd: Validate queue cwsr area and eop buffer size Philip Yang
@ 2024-07-18 21:12 ` Felix Kuehling
  2024-07-19  8:45   ` Christian König
  9 siblings, 1 reply; 18+ messages in thread
From: Felix Kuehling @ 2024-07-18 21:12 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: Alexander.Deucher, christian.koenig


On 2024-07-18 17:05, Philip Yang wrote:
> This patch series do additional queue buffers validation in the queue
> creation IOCTLS, fail the queue creation if buffers not mapped on the GPU
> with the expected size.
> 
> Ensure queue buffers residency by tracking the GPUVM virtual addresses
> for queue buffers to return error if the user tries to free and unmap them
> when the qeueu is active, or evict the queue if SVM memory is unmapped and
> freed from CPU.
> 
> Patch 1-2 is prepration work and general fix.
> 
> v2:
>  - patch 3/9, keep wptr_bo_gart in struct queue

The series is
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>

> 
> Philip Yang (9):
>   drm/amdkfd: kfd_bo_mapped_dev support partition
>   drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer
>   drm/amdkfd: Refactor queue wptr_bo GART mapping
>   drm/amdkfd: Validate user queue buffers
>   drm/amdkfd: Ensure user queue buffers residency
>   drm/amdkfd: Validate user queue svm memory residency
>   drm/amdkfd: Validate user queue update
>   drm/amdkfd: Store queue cwsr area size to node properties
>   drm/amdkfd: Validate queue cwsr area and eop buffer size
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c    |  14 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   6 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  24 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |   6 +
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  61 +---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c       |   4 +-
>  .../drm/amd/amdkfd/kfd_device_queue_manager.c |   8 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c  |   2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  19 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   2 +-
>  .../amd/amdkfd/kfd_process_queue_manager.c    |  79 +++-
>  drivers/gpu/drm/amd/amdkfd/kfd_queue.c        | 336 ++++++++++++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  12 +
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.h          |   1 +
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |   2 +
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h     |   4 +
>  16 files changed, 489 insertions(+), 91 deletions(-)
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/9] KFD user queue validation
  2024-07-18 21:12 ` [PATCH v2 0/9] KFD user queue validation Felix Kuehling
@ 2024-07-19  8:45   ` Christian König
  0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2024-07-19  8:45 UTC (permalink / raw)
  To: Felix Kuehling, Philip Yang, amd-gfx; +Cc: Alexander.Deucher

Am 18.07.24 um 23:12 schrieb Felix Kuehling:
> On 2024-07-18 17:05, Philip Yang wrote:
>> This patch series do additional queue buffers validation in the queue
>> creation IOCTLS, fail the queue creation if buffers not mapped on the GPU
>> with the expected size.
>>
>> Ensure queue buffers residency by tracking the GPUVM virtual addresses
>> for queue buffers to return error if the user tries to free and unmap them
>> when the qeueu is active, or evict the queue if SVM memory is unmapped and
>> freed from CPU.
>>
>> Patch 1-2 is prepration work and general fix.
>>
>> v2:
>>   - patch 3/9, keep wptr_bo_gart in struct queue
> The series is
> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>

I only skimmed over it and will probably find something to complain on 
later.

But we need to get this out of the door, so feel free to add Acked-by: 
Christian König <christian.koenig@amd.com> to the series for now.

Thanks,
Christian.

>
>> Philip Yang (9):
>>    drm/amdkfd: kfd_bo_mapped_dev support partition
>>    drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer
>>    drm/amdkfd: Refactor queue wptr_bo GART mapping
>>    drm/amdkfd: Validate user queue buffers
>>    drm/amdkfd: Ensure user queue buffers residency
>>    drm/amdkfd: Validate user queue svm memory residency
>>    drm/amdkfd: Validate user queue update
>>    drm/amdkfd: Store queue cwsr area size to node properties
>>    drm/amdkfd: Validate queue cwsr area and eop buffer size
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c    |  14 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   6 +-
>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  24 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |   6 +
>>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  61 +---
>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c       |   4 +-
>>   .../drm/amd/amdkfd/kfd_device_queue_manager.c |   8 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c  |   2 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  19 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   2 +-
>>   .../amd/amdkfd/kfd_process_queue_manager.c    |  79 +++-
>>   drivers/gpu/drm/amd/amdkfd/kfd_queue.c        | 336 ++++++++++++++++++
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  12 +
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.h          |   1 +
>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |   2 +
>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h     |   4 +
>>   16 files changed, 489 insertions(+), 91 deletions(-)
>>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/9] drm/amdkfd: Validate user queue buffers
  2024-07-18 21:05 ` [PATCH v2 4/9] drm/amdkfd: Validate user queue buffers Philip Yang
@ 2025-02-12 22:42   ` Uwe Kleine-König
  2025-02-12 23:02     ` Philip Yang
  0 siblings, 1 reply; 18+ messages in thread
From: Uwe Kleine-König @ 2025-02-12 22:42 UTC (permalink / raw)
  To: Philip Yang
  Cc: amd-gfx, Felix.Kuehling, Alexander.Deucher, christian.koenig,
	regressions, Dieter Faulbaum

[-- Attachment #1: Type: text/plain, Size: 916 bytes --]

#regzbot introduced: 68e599db7a549f010a329515f3508d8a8c3467a4
#regzbot monitor: https://bugs.debian.org/1093124

Hello,

On Thu, Jul 18, 2024 at 05:05:53PM -0400, Philip Yang wrote:
> Find user queue rptr, ring buf, eop buffer and cwsr area BOs, and
> check BOs are mapped on the GPU with correct size and take the BO
> reference.
> 
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>

This change made it into v6.12-rc1 as 68e599db7a54 ("drm/amdkfd:
Validate user queue buffers"). A Debian user (Dieter Faulbaum, on Cc)
reported that this change introduced a regression using a gfx803 device
resulting in a HSA exception when e.g. darktable is used. I didn't even
try to understand the problem, but maybe one of you have an idea about
the issue?!

See more details in the Debian bug at https://bugs.debian.org/1093124

Best regards
Uwe

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/9] drm/amdkfd: Validate user queue buffers
  2025-02-12 22:42   ` Uwe Kleine-König
@ 2025-02-12 23:02     ` Philip Yang
  2025-02-13  8:43       ` Uwe Kleine-König
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Philip Yang @ 2025-02-12 23:02 UTC (permalink / raw)
  To: Uwe Kleine-König, Philip Yang
  Cc: amd-gfx, Felix.Kuehling, Alexander.Deucher, christian.koenig,
	regressions, Dieter Faulbaum

[-- Attachment #1: Type: text/html, Size: 2165 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/9] drm/amdkfd: Validate user queue buffers
  2025-02-12 23:02     ` Philip Yang
@ 2025-02-13  8:43       ` Uwe Kleine-König
  2025-02-13  9:56       ` Dieter Faulbaum
  2025-03-14 11:06       ` Dieter Faulbaum
  2 siblings, 0 replies; 18+ messages in thread
From: Uwe Kleine-König @ 2025-02-13  8:43 UTC (permalink / raw)
  To: Philip Yang
  Cc: Philip Yang, amd-gfx, Felix.Kuehling, Alexander.Deucher,
	christian.koenig, regressions, Dieter Faulbaum

[-- Attachment #1: Type: text/plain, Size: 640 bytes --]

On Wed, Feb 12, 2025 at 06:02:15PM -0500, Philip Yang wrote:
> [html-content]

Are you aware that your mail is hard for some people (e.g. those like me
who read their mail in a terminal, but also consider people who have to
rely on braille readers) to read and also isn't properly archived on
lore.kernel.org[1]?

Would be great if you could tune the settings of your thunderbird
profile to send text replies. The in-kernel documentation[2] might help
here.

Best regards
Uwe

[1] https://lore.kernel.org/all/315e27ab-f1d1-a681-a32f-1fc28cd81193@amd.com/
[2] https://www.kernel.org/doc/html/v4.11/process/email-clients.html#thunderbird-gui

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/9] drm/amdkfd: Validate user queue buffers
  2025-02-12 23:02     ` Philip Yang
  2025-02-13  8:43       ` Uwe Kleine-König
@ 2025-02-13  9:56       ` Dieter Faulbaum
  2025-03-14 11:06       ` Dieter Faulbaum
  2 siblings, 0 replies; 18+ messages in thread
From: Dieter Faulbaum @ 2025-02-13  9:56 UTC (permalink / raw)
  To: Philip Yang
  Cc: Uwe Kleine-König, Philip Yang, amd-gfx, Felix.Kuehling,
	Alexander.Deucher, christian.koenig, regressions


Hello Philip,

Philip Yang <yangp@amd.com> writes:

> On 2025-02-12 17:42, Uwe Kleine-König wrote:
>
>  #regzbot introduced: 68e599db7a549f010a329515f3508d8a8c3467a4
> #regzbot monitor: https://bugs.debian.org/1093124
>
> Hello,
>
> On Thu, Jul 18, 2024 at 05:05:53PM -0400, Philip Yang wrote:
>
>  Find user queue rptr, ring buf, eop buffer and cwsr area BOs, 
>  and
> check BOs are mapped on the GPU with correct size and take the 
> BO
> reference.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
>
>
> This change made it into v6.12-rc1 as 68e599db7a54 ("drm/amdkfd:
> Validate user queue buffers"). A Debian user (Dieter Faulbaum, 
> on Cc)
> reported that this change introduced a regression using a gfx803 
> device
> resulting in a HSA exception when e.g. darktable is used. I 
> didn't even
> try to understand the problem, but maybe one of you have an idea 
> about
> the issue?!
>
> Try this patch
>
> https://lore.kernel.org/all/20250130000412.29812-1-Philip.Yang@amd.com/T/
Thank you, this patch helps!
Tested with kernel version 6.12.0+

And (for me) the Debian bug report could be closed.

Thanks to all for your help!
Dieter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/9] drm/amdkfd: Validate user queue buffers
  2025-02-12 23:02     ` Philip Yang
  2025-02-13  8:43       ` Uwe Kleine-König
  2025-02-13  9:56       ` Dieter Faulbaum
@ 2025-03-14 11:06       ` Dieter Faulbaum
  2025-03-14 13:12         ` Alex Deucher
  2 siblings, 1 reply; 18+ messages in thread
From: Dieter Faulbaum @ 2025-03-14 11:06 UTC (permalink / raw)
  To: Philip Yang
  Cc: Uwe Kleine-König, Philip Yang, amd-gfx, Felix.Kuehling,
	Alexander.Deucher, christian.koenig, regressions


Hello Philip,

Philip Yang <yangp@amd.com> writes:

> On 2025-02-12 17:42, Uwe Kleine-König wrote:
>
>  #regzbot introduced: 68e599db7a549f010a329515f3508d8a8c3467a4
> #regzbot monitor: https://bugs.debian.org/1093124
>
> Hello,
>
> On Thu, Jul 18, 2024 at 05:05:53PM -0400, Philip Yang wrote:
>
>  Find user queue rptr, ring buf, eop buffer and cwsr area BOs, 
>  and
> check BOs are mapped on the GPU with correct size and take the 
> BO
> reference.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
>
>
> This change made it into v6.12-rc1 as 68e599db7a54 ("drm/amdkfd:
> Validate user queue buffers"). A Debian user (Dieter Faulbaum, 
> on Cc)
> reported that this change introduced a regression using a gfx803 
> device
> resulting in a HSA exception when e.g. darktable is used. I 
> didn't even
> try to understand the problem, but maybe one of you have an idea 
> about
> the issue?!
>
> Try this patch
>
> https://lore.kernel.org/all/20250130000412.29812-1-Philip.Yang@amd.com/T/

It seems for me, that your patch isn't applied in the mainline 
kernel.
What do you think, will it once happen?-)
Is it falling through cracks?



With regards
Dieter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/9] drm/amdkfd: Validate user queue buffers
  2025-03-14 11:06       ` Dieter Faulbaum
@ 2025-03-14 13:12         ` Alex Deucher
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Deucher @ 2025-03-14 13:12 UTC (permalink / raw)
  To: Dieter Faulbaum
  Cc: Philip Yang, Uwe Kleine-König, Philip Yang, amd-gfx,
	Felix.Kuehling, Alexander.Deucher, christian.koenig, regressions

On Fri, Mar 14, 2025 at 8:55 AM Dieter Faulbaum
<dieter@faulbaum.in-berlin.de> wrote:
>
>
> Hello Philip,
>
> Philip Yang <yangp@amd.com> writes:
>
> > On 2025-02-12 17:42, Uwe Kleine-König wrote:
> >
> >  #regzbot introduced: 68e599db7a549f010a329515f3508d8a8c3467a4
> > #regzbot monitor: https://bugs.debian.org/1093124
> >
> > Hello,
> >
> > On Thu, Jul 18, 2024 at 05:05:53PM -0400, Philip Yang wrote:
> >
> >  Find user queue rptr, ring buf, eop buffer and cwsr area BOs,
> >  and
> > check BOs are mapped on the GPU with correct size and take the
> > BO
> > reference.
> >
> > Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> > Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
> >
> >
> > This change made it into v6.12-rc1 as 68e599db7a54 ("drm/amdkfd:
> > Validate user queue buffers"). A Debian user (Dieter Faulbaum,
> > on Cc)
> > reported that this change introduced a regression using a gfx803
> > device
> > resulting in a HSA exception when e.g. darktable is used. I
> > didn't even
> > try to understand the problem, but maybe one of you have an idea
> > about
> > the issue?!
> >
> > Try this patch
> >
> > https://lore.kernel.org/all/20250130000412.29812-1-Philip.Yang@amd.com/T/
>
> It seems for me, that your patch isn't applied in the mainline
> kernel.
> What do you think, will it once happen?-)
> Is it falling through cracks?

It's in drm-next:
https://gitlab.freedesktop.org/drm/kernel/-/commit/e7a477735f1771b9a9346a5fbd09d7ff0641723a

I'll cherry-pick it for stable next week.

Alex

>
>
>
> With regards
> Dieter

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-03-14 13:12 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-18 21:05 [PATCH v2 0/9] KFD user queue validation Philip Yang
2024-07-18 21:05 ` [PATCH v2 1/9] drm/amdkfd: kfd_bo_mapped_dev support partition Philip Yang
2024-07-18 21:05 ` [PATCH v2 2/9] drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer Philip Yang
2024-07-18 21:05 ` [PATCH v2 3/9] drm/amdkfd: Refactor queue wptr_bo GART mapping Philip Yang
2024-07-18 21:05 ` [PATCH v2 4/9] drm/amdkfd: Validate user queue buffers Philip Yang
2025-02-12 22:42   ` Uwe Kleine-König
2025-02-12 23:02     ` Philip Yang
2025-02-13  8:43       ` Uwe Kleine-König
2025-02-13  9:56       ` Dieter Faulbaum
2025-03-14 11:06       ` Dieter Faulbaum
2025-03-14 13:12         ` Alex Deucher
2024-07-18 21:05 ` [PATCH v2 5/9] drm/amdkfd: Ensure user queue buffers residency Philip Yang
2024-07-18 21:05 ` [PATCH v2 6/9] drm/amdkfd: Validate user queue svm memory residency Philip Yang
2024-07-18 21:05 ` [PATCH v2 7/9] drm/amdkfd: Validate user queue update Philip Yang
2024-07-18 21:05 ` [PATCH v2 8/9] drm/amdkfd: Store queue cwsr area size to node properties Philip Yang
2024-07-18 21:05 ` [PATCH v2 9/9] drm/amdkfd: Validate queue cwsr area and eop buffer size Philip Yang
2024-07-18 21:12 ` [PATCH v2 0/9] KFD user queue validation Felix Kuehling
2024-07-19  8:45   ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.