[PATCH 0/2] Defer vm_bo cleanup in GPUVM with DRM_GPUVM_IMMEDIATE

rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] Defer vm_bo cleanup in GPUVM with DRM_GPUVM_IMMEDIATE_MODE
@ 2025-09-05 12:11 Alice Ryhl
  2025-09-05 12:11 ` [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup Alice Ryhl
  2025-09-05 12:11 ` [PATCH 2/2] panthor: use drm_gpuva_unlink_defer() Alice Ryhl
  0 siblings, 2 replies; 27+ messages in thread
From: Alice Ryhl @ 2025-09-05 12:11 UTC (permalink / raw)
  To: Danilo Krummrich, Matthew Brost, Thomas Hellström
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Boris Brezillon, Steven Price, Daniel Almeida,
	Liviu Dudau, dri-devel, linux-kernel, rust-for-linux, Alice Ryhl

There are two main ways that GPUVM might be used:

* staged mode, where VM_BIND ioctls update the GPUVM immediately so that
  the GPUVM reflects the state of the VM *including* staged changes that
  are not yet applied to the GPU's virtual address space.
* immediate mode, where the GPUVM state is updated during run_job(),
  i.e., in the DMA fence signalling critical path, to ensure that the
  GPUVM and the GPU's virtual address space has the same state at all
  times.

Currently, only Panthor uses GPUVM in immediate mode, but the Rust
drivers Tyr and Nova will also use GPUVM in immediate mode, so it is
worth to support both staged and immediate mode well in GPUVM. To use
immediate mode, we must manage the vm_bos and vas during the fence
signalling critical path.

The first part of that work was the introduction of a fence signalling
safe mutex for the GEMs GPUVA list in commit e7fa80e2932c ("drm_gem: add
mutex to drm_gem_object.gpuva").

This is series the second part of that work: Dropping a vm_bo object in
the fence signalling critical path is problematic for two reasons:

* When using DRM_GPUVM_RESV_PROTECTED, you cannot remove the vm_bo from
  the extobj/evicted lists during the fence signalling path.
* Dropping a vm_bo could lead to the GEM object getting destroyed.
  The requirement that GEM object cleanup is fence signalling safe is
  dubious and likely to be violated in practice.

Panthor already has its own custom implementation of postponing vm_bo
cleanup. Take inspiration from that by moving the logic into GPUVM, and
adjust Panthor to use the new GPUVM logic.

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
Alice Ryhl (2):
      drm/gpuvm: add deferred vm_bo cleanup
      panthor: use drm_gpuva_unlink_defer()

 drivers/gpu/drm/drm_gpuvm.c           | 167 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/panthor/panthor_mmu.c | 112 ++++-------------------
 include/drm/drm_gpuvm.h               |  26 ++++++
 3 files changed, 211 insertions(+), 94 deletions(-)
---
base-commit: 4c67b73907214994f87cad795195c46fe63c1e1c
change-id: 20250905-vmbo-defer-3faf90d821f5

Best regards,
-- 
Alice Ryhl <aliceryhl@google.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-05 12:11 [PATCH 0/2] Defer vm_bo cleanup in GPUVM with DRM_GPUVM_IMMEDIATE_MODE Alice Ryhl
@ 2025-09-05 12:11 ` Alice Ryhl
  2025-09-05 13:25   ` Boris Brezillon
  2025-09-05 12:11 ` [PATCH 2/2] panthor: use drm_gpuva_unlink_defer() Alice Ryhl
  1 sibling, 1 reply; 27+ messages in thread
From: Alice Ryhl @ 2025-09-05 12:11 UTC (permalink / raw)
  To: Danilo Krummrich, Matthew Brost, Thomas Hellström
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Boris Brezillon, Steven Price, Daniel Almeida,
	Liviu Dudau, dri-devel, linux-kernel, rust-for-linux, Alice Ryhl

When using GPUVM in immediate mode, it is necessary to call
drm_gpuvm_unlink() from the fence signalling critical path. However,
unlink may call drm_gpuvm_bo_put(), which causes some challenges:

1. drm_gpuvm_bo_put() often requires you to take resv locks, which you
   can't do from the fence signalling critical path.
2. drm_gpuvm_bo_put() calls drm_gem_object_put(), which is often going
   to be unsafe to call from the fence signalling critical path.

To solve these issues, add a deferred version of drm_gpuvm_unlink() that
adds the vm_bo to a deferred cleanup list, and then clean it up later.

The new methods take the GEMs GPUVA lock internally rather than letting
the caller do it because it also needs to perform an operation after
releasing the mutex again. This is to prevent freeing the GEM while
holding the mutex (more info as comments in the patch). This means that
the new methods can only be used with DRM_GPUVM_IMMEDIATE_MODE.

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 drivers/gpu/drm/drm_gpuvm.c | 167 ++++++++++++++++++++++++++++++++++++++++++++
 include/drm/drm_gpuvm.h     |  26 +++++++
 2 files changed, 193 insertions(+)

diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index 78a1a4f095095e9379bdf604d583f6c8b9863ccb..849b6c08f83dcba832eda372bd3ce62b540e144b 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -876,6 +876,25 @@ __drm_gpuvm_bo_list_add(struct drm_gpuvm *gpuvm, spinlock_t *lock,
 	cond_spin_unlock(lock, !!lock);
 }
 
+/**
+ * drm_gpuvm_bo_is_dead() - check whether this vm_bo is scheduled for cleanup
+ * @vm_bo: the &drm_gpuvm_bo
+ *
+ * When a vm_bo is scheduled for cleanup using the bo_defer list, it is not
+ * immediately removed from the evict and extobj lists. Therefore, anyone
+ * iterating these lists should skip entries that are being destroyed.
+ *
+ * Checking the refcount without incrementing it is okay as long as the lock
+ * protecting the evict/extobj list is held for as long as you are using the
+ * vm_bo, because even if the refcount hits zero while you are using it, freeing
+ * the vm_bo requires taking the list's lock.
+ */
+static bool
+drm_gpuvm_bo_is_dead(struct drm_gpuvm_bo *vm_bo)
+{
+	return !kref_read(&vm_bo->kref);
+}
+
 /**
  * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
  * @__vm_bo: the &drm_gpuvm_bo
@@ -1081,6 +1100,9 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name,
 	INIT_LIST_HEAD(&gpuvm->evict.list);
 	spin_lock_init(&gpuvm->evict.lock);
 
+	INIT_LIST_HEAD(&gpuvm->bo_defer.list);
+	spin_lock_init(&gpuvm->bo_defer.lock);
+
 	kref_init(&gpuvm->kref);
 
 	gpuvm->name = name ? name : "unknown";
@@ -1122,6 +1144,8 @@ drm_gpuvm_fini(struct drm_gpuvm *gpuvm)
 		 "Extobj list should be empty.\n");
 	drm_WARN(gpuvm->drm, !list_empty(&gpuvm->evict.list),
 		 "Evict list should be empty.\n");
+	drm_WARN(gpuvm->drm, !list_empty(&gpuvm->bo_defer.list),
+		 "VM BO cleanup list should be empty.\n");
 
 	drm_gem_object_put(gpuvm->r_obj);
 }
@@ -1217,6 +1241,9 @@ drm_gpuvm_prepare_objects_locked(struct drm_gpuvm *gpuvm,
 
 	drm_gpuvm_resv_assert_held(gpuvm);
 	list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
+		if (drm_gpuvm_bo_is_dead(vm_bo))
+			continue;
+
 		ret = exec_prepare_obj(exec, vm_bo->obj, num_fences);
 		if (ret)
 			break;
@@ -1460,6 +1487,9 @@ drm_gpuvm_validate_locked(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
 
 	list_for_each_entry_safe(vm_bo, next, &gpuvm->evict.list,
 				 list.entry.evict) {
+		if (drm_gpuvm_bo_is_dead(vm_bo))
+			continue;
+
 		ret = ops->vm_bo_validate(vm_bo, exec);
 		if (ret)
 			break;
@@ -1560,6 +1590,7 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
 
 	INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
 	INIT_LIST_HEAD(&vm_bo->list.entry.evict);
+	INIT_LIST_HEAD(&vm_bo->list.entry.bo_defer);
 
 	return vm_bo;
 }
@@ -1621,6 +1652,106 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
 }
 EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
 
+static void
+drm_gpuvm_bo_defer_locked(struct kref *kref)
+{
+	struct drm_gpuvm_bo *vm_bo = container_of(kref, struct drm_gpuvm_bo,
+						  kref);
+	struct drm_gpuvm *gpuvm = vm_bo->vm;
+
+	if (!drm_gpuvm_resv_protected(gpuvm)) {
+		drm_gpuvm_bo_list_del(vm_bo, extobj, true);
+		drm_gpuvm_bo_list_del(vm_bo, evict, true);
+	}
+
+	list_del(&vm_bo->list.entry.gem);
+	mutex_unlock(&vm_bo->obj->gpuva.lock);
+}
+
+/**
+ * drm_gpuvm_bo_put_deferred() - drop a struct drm_gpuvm_bo reference with
+ * deferred cleanup
+ * @vm_bo: the &drm_gpuvm_bo to release the reference of
+ *
+ * This releases a reference to @vm_bo.
+ *
+ * This might take and release the GEMs GPUVA lock. You should call
+ * drm_gpuvm_bo_deferred_cleanup() later to complete the cleanup process.
+ *
+ * Returns: true if vm_bo is being destroyed, false otherwise.
+ */
+bool
+drm_gpuvm_bo_put_deferred(struct drm_gpuvm_bo *vm_bo)
+{
+	bool defer;
+
+	drm_WARN_ON(vm_bo->vm->drm, !drm_gpuvm_immediate_mode(vm_bo->vm));
+
+	if (!vm_bo)
+		return false;
+
+	defer = kref_put_mutex(&vm_bo->kref, drm_gpuvm_bo_defer_locked,
+			       &vm_bo->obj->gpuva.lock);
+
+	/*
+	 * It's important that the GEM stays alive for the duration in which
+	 * drm_gpuvm_bo_defer_locked() holds the mutex, but the instant we add
+	 * the vm_bo to bo_defer, another thread might call
+	 * drm_gpuvm_bo_deferred_cleanup() and put the GEM. For this reason, we
+	 * add the vm_bo to bo_defer after releasing the GEM's mutex.
+	 */
+	if (defer)
+		drm_gpuvm_bo_list_add(vm_bo, bo_defer, true);
+
+	return defer;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put_deferred);
+
+/**
+ * drm_gpuvm_bo_deferred_cleanup() - clean up BOs in the deferred list
+ * deferred cleanup
+ * @gpuvm: the VM to clean up
+ *
+ * Cleans up &drm_gpuvm_bo instances in the deferred cleanup list.
+ */
+void
+drm_gpuvm_bo_deferred_cleanup(struct drm_gpuvm *gpuvm)
+{
+	const struct drm_gpuvm_ops *ops = gpuvm->ops;
+	struct drm_gpuvm_bo *vm_bo;
+	struct drm_gem_object *obj;
+	LIST_HEAD(bo_defer);
+
+	spin_lock(&gpuvm->bo_defer.lock);
+	list_replace_init(&gpuvm->bo_defer.list, &bo_defer);
+	spin_unlock(&gpuvm->bo_defer.lock);
+
+	if (drm_gpuvm_resv_protected(gpuvm)) {
+		dma_resv_lock(drm_gpuvm_resv(gpuvm), NULL);
+		list_for_each_entry(vm_bo, &bo_defer, list.entry.bo_defer) {
+			drm_gpuvm_bo_list_del(vm_bo, extobj, false);
+			drm_gpuvm_bo_list_del(vm_bo, evict, false);
+		}
+		dma_resv_unlock(drm_gpuvm_resv(gpuvm));
+	}
+
+	while (!list_empty(&bo_defer)) {
+		vm_bo = list_first_entry(&bo_defer,
+			struct drm_gpuvm_bo, list.entry.bo_defer);
+		obj = vm_bo->obj;
+
+		list_del(&vm_bo->list.entry.bo_defer);
+		if (ops && ops->vm_bo_free)
+			ops->vm_bo_free(vm_bo);
+		else
+			kfree(vm_bo);
+
+		drm_gpuvm_put(gpuvm);
+		drm_gem_object_put(obj);
+	}
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_deferred_cleanup);
+
 static struct drm_gpuvm_bo *
 __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
 		    struct drm_gem_object *obj)
@@ -1948,6 +2079,42 @@ drm_gpuva_unlink(struct drm_gpuva *va)
 }
 EXPORT_SYMBOL_GPL(drm_gpuva_unlink);
 
+/**
+ * drm_gpuva_unlink_defer() - unlink a &drm_gpuva with deferred vm_bo cleanup
+ * @va: the &drm_gpuva to unlink
+ *
+ * Similar to drm_gpuva_unlink(), but uses drm_gpuvm_bo_put_deferred() and takes
+ * the lock for the caller.
+ */
+void
+drm_gpuva_unlink_defer(struct drm_gpuva *va)
+{
+	struct drm_gem_object *obj = va->gem.obj;
+	struct drm_gpuvm_bo *vm_bo = va->vm_bo;
+
+	if (unlikely(!obj))
+		return;
+
+	drm_WARN_ON(vm_bo->vm->drm, !drm_gpuvm_immediate_mode(vm_bo->vm));
+
+	mutex_lock(&obj->gpuva.lock);
+	list_del_init(&va->gem.entry);
+
+	/*
+	 * This is drm_gpuvm_bo_put_deferred() slightly modified since we
+	 * already hold the mutex. It's important that we add the vm_bo to
+	 * bo_defer after releasing the mutex for the same reason as in
+	 * drm_gpuvm_bo_put_deferred().
+	 */
+	if (kref_put(&vm_bo->kref, drm_gpuvm_bo_defer_locked))
+		drm_gpuvm_bo_list_add(vm_bo, bo_defer, true);
+	else
+		mutex_unlock(&obj->gpuva.lock);
+
+	va->vm_bo = NULL;
+}
+EXPORT_SYMBOL_GPL(drm_gpuva_unlink_defer);
+
 /**
  * drm_gpuva_find_first() - find the first &drm_gpuva in the given range
  * @gpuvm: the &drm_gpuvm to search in
diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
index 727b8f336fad0d853998e4379cbd374155468e18..1f80389e14312f552a8abc95d12f52f83beb9be8 100644
--- a/include/drm/drm_gpuvm.h
+++ b/include/drm/drm_gpuvm.h
@@ -152,6 +152,7 @@ void drm_gpuva_remove(struct drm_gpuva *va);
 
 void drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuvm_bo *vm_bo);
 void drm_gpuva_unlink(struct drm_gpuva *va);
+void drm_gpuva_unlink_defer(struct drm_gpuva *va);
 
 struct drm_gpuva *drm_gpuva_find(struct drm_gpuvm *gpuvm,
 				 u64 addr, u64 range);
@@ -331,6 +332,22 @@ struct drm_gpuvm {
 		 */
 		spinlock_t lock;
 	} evict;
+
+	/**
+	 * @bo_defer: structure holding vm_bos that need to be destroyed
+	 */
+	struct {
+		/**
+		 * @bo_defer.list: &list_head storing &drm_gpuvm_bos that need
+		 * to be destroyed
+		 */
+		struct list_head list;
+
+		/**
+		 * @bo_defer.lock: spinlock to protect the bo_defer list
+		 */
+		spinlock_t lock;
+	} bo_defer;
 };
 
 void drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name,
@@ -714,6 +731,12 @@ struct drm_gpuvm_bo {
 			 * &drm_gpuvms evict list.
 			 */
 			struct list_head evict;
+
+			/**
+			 * @list.entry.bo_defer: List entry to attach to
+			 * the &drm_gpuvms bo_defer list.
+			 */
+			struct list_head bo_defer;
 		} entry;
 	} list;
 };
@@ -746,6 +769,9 @@ drm_gpuvm_bo_get(struct drm_gpuvm_bo *vm_bo)
 
 bool drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo);
 
+bool drm_gpuvm_bo_put_deferred(struct drm_gpuvm_bo *vm_bo);
+void drm_gpuvm_bo_deferred_cleanup(struct drm_gpuvm *gpuvm);
+
 struct drm_gpuvm_bo *
 drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
 		  struct drm_gem_object *obj);

-- 
2.51.0.355.g5224444f11-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/2] panthor: use drm_gpuva_unlink_defer()
  2025-09-05 12:11 [PATCH 0/2] Defer vm_bo cleanup in GPUVM with DRM_GPUVM_IMMEDIATE_MODE Alice Ryhl
  2025-09-05 12:11 ` [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup Alice Ryhl
@ 2025-09-05 12:11 ` Alice Ryhl
  2025-09-05 12:52   ` Boris Brezillon
  1 sibling, 1 reply; 27+ messages in thread
From: Alice Ryhl @ 2025-09-05 12:11 UTC (permalink / raw)
  To: Danilo Krummrich, Matthew Brost, Thomas Hellström
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Boris Brezillon, Steven Price, Daniel Almeida,
	Liviu Dudau, dri-devel, linux-kernel, rust-for-linux, Alice Ryhl

Instead of manually deferring cleanup of vm_bos, use the new GPUVM
infrastructure for doing so.

To avoid manual management of vm_bo refcounts, the panthor_vma_link()
and panthor_vma_unlink() methods are changed to get and put a vm_bo
refcount on the vm_bo. This simplifies the code a lot. I preserved the
behavior where panthor_gpuva_sm_step_map() drops the refcount right away
rather than letting panthor_vm_cleanup_op_ctx() do it later.

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 drivers/gpu/drm/panthor/panthor_mmu.c | 112 ++++++----------------------------
 1 file changed, 18 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index 6dec4354e3789d17c5a87fc8de3bc86764b804bc..4922da0b106aec2bdf657ce4c596acb9c63797ce 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -181,20 +181,6 @@ struct panthor_vm_op_ctx {
 		u64 range;
 	} va;
 
-	/**
-	 * @returned_vmas: List of panthor_vma objects returned after a VM operation.
-	 *
-	 * For unmap operations, this will contain all VMAs that were covered by the
-	 * specified VA range.
-	 *
-	 * For map operations, this will contain all VMAs that previously mapped to
-	 * the specified VA range.
-	 *
-	 * Those VMAs, and the resources they point to will be released as part of
-	 * the op_ctx cleanup operation.
-	 */
-	struct list_head returned_vmas;
-
 	/** @map: Fields specific to a map operation. */
 	struct {
 		/** @map.vm_bo: Buffer object to map. */
@@ -1081,47 +1067,18 @@ void panthor_vm_free_va(struct panthor_vm *vm, struct drm_mm_node *va_node)
 	mutex_unlock(&vm->mm_lock);
 }
 
-static void panthor_vm_bo_put(struct drm_gpuvm_bo *vm_bo)
+static void panthor_vm_bo_free(struct drm_gpuvm_bo *vm_bo)
 {
 	struct panthor_gem_object *bo = to_panthor_bo(vm_bo->obj);
-	struct drm_gpuvm *vm = vm_bo->vm;
-	bool unpin;
-
-	/* We must retain the GEM before calling drm_gpuvm_bo_put(),
-	 * otherwise the mutex might be destroyed while we hold it.
-	 * Same goes for the VM, since we take the VM resv lock.
-	 */
-	drm_gem_object_get(&bo->base.base);
-	drm_gpuvm_get(vm);
-
-	/* We take the resv lock to protect against concurrent accesses to the
-	 * gpuvm evicted/extobj lists that are modified in
-	 * drm_gpuvm_bo_destroy(), which is called if drm_gpuvm_bo_put()
-	 * releases sthe last vm_bo reference.
-	 * We take the BO GPUVA list lock to protect the vm_bo removal from the
-	 * GEM vm_bo list.
-	 */
-	dma_resv_lock(drm_gpuvm_resv(vm), NULL);
-	mutex_lock(&bo->base.base.gpuva.lock);
-	unpin = drm_gpuvm_bo_put(vm_bo);
-	mutex_unlock(&bo->base.base.gpuva.lock);
-	dma_resv_unlock(drm_gpuvm_resv(vm));
 
-	/* If the vm_bo object was destroyed, release the pin reference that
-	 * was hold by this object.
-	 */
-	if (unpin && !drm_gem_is_imported(&bo->base.base))
+	if (!drm_gem_is_imported(&bo->base.base))
 		drm_gem_shmem_unpin(&bo->base);
-
-	drm_gpuvm_put(vm);
-	drm_gem_object_put(&bo->base.base);
+	kfree(vm_bo);
 }
 
 static void panthor_vm_cleanup_op_ctx(struct panthor_vm_op_ctx *op_ctx,
 				      struct panthor_vm *vm)
 {
-	struct panthor_vma *vma, *tmp_vma;
-
 	u32 remaining_pt_count = op_ctx->rsvd_page_tables.count -
 				 op_ctx->rsvd_page_tables.ptr;
 
@@ -1134,16 +1091,12 @@ static void panthor_vm_cleanup_op_ctx(struct panthor_vm_op_ctx *op_ctx,
 	kfree(op_ctx->rsvd_page_tables.pages);
 
 	if (op_ctx->map.vm_bo)
-		panthor_vm_bo_put(op_ctx->map.vm_bo);
+		drm_gpuvm_bo_put_deferred(op_ctx->map.vm_bo);
 
 	for (u32 i = 0; i < ARRAY_SIZE(op_ctx->preallocated_vmas); i++)
 		kfree(op_ctx->preallocated_vmas[i]);
 
-	list_for_each_entry_safe(vma, tmp_vma, &op_ctx->returned_vmas, node) {
-		list_del(&vma->node);
-		panthor_vm_bo_put(vma->base.vm_bo);
-		kfree(vma);
-	}
+	drm_gpuvm_bo_deferred_cleanup(&vm->base);
 }
 
 static struct panthor_vma *
@@ -1232,7 +1185,6 @@ static int panthor_vm_prepare_map_op_ctx(struct panthor_vm_op_ctx *op_ctx,
 		return -EINVAL;
 
 	memset(op_ctx, 0, sizeof(*op_ctx));
-	INIT_LIST_HEAD(&op_ctx->returned_vmas);
 	op_ctx->flags = flags;
 	op_ctx->va.range = size;
 	op_ctx->va.addr = va;
@@ -1243,7 +1195,9 @@ static int panthor_vm_prepare_map_op_ctx(struct panthor_vm_op_ctx *op_ctx,
 
 	if (!drm_gem_is_imported(&bo->base.base)) {
 		/* Pre-reserve the BO pages, so the map operation doesn't have to
-		 * allocate.
+		 * allocate. This pin is dropped in panthor_vm_bo_free(), so
+		 * once we call drm_gpuvm_bo_create(), GPUVM will take care of
+		 * dropping the pin for us.
 		 */
 		ret = drm_gem_shmem_pin(&bo->base);
 		if (ret)
@@ -1263,9 +1217,6 @@ static int panthor_vm_prepare_map_op_ctx(struct panthor_vm_op_ctx *op_ctx,
 
 	preallocated_vm_bo = drm_gpuvm_bo_create(&vm->base, &bo->base.base);
 	if (!preallocated_vm_bo) {
-		if (!drm_gem_is_imported(&bo->base.base))
-			drm_gem_shmem_unpin(&bo->base);
-
 		ret = -ENOMEM;
 		goto err_cleanup;
 	}
@@ -1282,16 +1233,6 @@ static int panthor_vm_prepare_map_op_ctx(struct panthor_vm_op_ctx *op_ctx,
 	mutex_unlock(&bo->base.base.gpuva.lock);
 	dma_resv_unlock(panthor_vm_resv(vm));
 
-	/* If the a vm_bo for this <VM,BO> combination exists, it already
-	 * retains a pin ref, and we can release the one we took earlier.
-	 *
-	 * If our pre-allocated vm_bo is picked, it now retains the pin ref,
-	 * which will be released in panthor_vm_bo_put().
-	 */
-	if (preallocated_vm_bo != op_ctx->map.vm_bo &&
-	    !drm_gem_is_imported(&bo->base.base))
-		drm_gem_shmem_unpin(&bo->base);
-
 	op_ctx->map.bo_offset = offset;
 
 	/* L1, L2 and L3 page tables.
@@ -1339,7 +1280,6 @@ static int panthor_vm_prepare_unmap_op_ctx(struct panthor_vm_op_ctx *op_ctx,
 	int ret;
 
 	memset(op_ctx, 0, sizeof(*op_ctx));
-	INIT_LIST_HEAD(&op_ctx->returned_vmas);
 	op_ctx->va.range = size;
 	op_ctx->va.addr = va;
 	op_ctx->flags = DRM_PANTHOR_VM_BIND_OP_TYPE_UNMAP;
@@ -1387,7 +1327,6 @@ static void panthor_vm_prepare_sync_only_op_ctx(struct panthor_vm_op_ctx *op_ctx
 						struct panthor_vm *vm)
 {
 	memset(op_ctx, 0, sizeof(*op_ctx));
-	INIT_LIST_HEAD(&op_ctx->returned_vmas);
 	op_ctx->flags = DRM_PANTHOR_VM_BIND_OP_TYPE_SYNC_ONLY;
 }
 
@@ -2033,26 +1972,12 @@ static void panthor_vma_link(struct panthor_vm *vm,
 
 	mutex_lock(&bo->base.base.gpuva.lock);
 	drm_gpuva_link(&vma->base, vm_bo);
-	drm_WARN_ON(&vm->ptdev->base, drm_gpuvm_bo_put(vm_bo));
 	mutex_unlock(&bo->base.base.gpuva.lock);
 }
 
-static void panthor_vma_unlink(struct panthor_vm *vm,
-			       struct panthor_vma *vma)
+static void panthor_vma_unlink(struct panthor_vma *vma)
 {
-	struct panthor_gem_object *bo = to_panthor_bo(vma->base.gem.obj);
-	struct drm_gpuvm_bo *vm_bo = drm_gpuvm_bo_get(vma->base.vm_bo);
-
-	mutex_lock(&bo->base.base.gpuva.lock);
-	drm_gpuva_unlink(&vma->base);
-	mutex_unlock(&bo->base.base.gpuva.lock);
-
-	/* drm_gpuva_unlink() release the vm_bo, but we manually retained it
-	 * when entering this function, so we can implement deferred VMA
-	 * destruction. Re-assign it here.
-	 */
-	vma->base.vm_bo = vm_bo;
-	list_add_tail(&vma->node, &vm->op_ctx->returned_vmas);
+	drm_gpuva_unlink_defer(&vma->base);
 }
 
 static void panthor_vma_init(struct panthor_vma *vma, u32 flags)
@@ -2084,12 +2009,12 @@ static int panthor_gpuva_sm_step_map(struct drm_gpuva_op *op, void *priv)
 	if (ret)
 		return ret;
 
-	/* Ref owned by the mapping now, clear the obj field so we don't release the
-	 * pinning/obj ref behind GPUVA's back.
-	 */
 	drm_gpuva_map(&vm->base, &vma->base, &op->map);
 	panthor_vma_link(vm, vma, op_ctx->map.vm_bo);
+
+	drm_gpuvm_bo_put_deferred(op_ctx->map.vm_bo);
 	op_ctx->map.vm_bo = NULL;
+
 	return 0;
 }
 
@@ -2128,16 +2053,14 @@ static int panthor_gpuva_sm_step_remap(struct drm_gpuva_op *op,
 		 * owned by the old mapping which will be released when this
 		 * mapping is destroyed, we need to grab a ref here.
 		 */
-		panthor_vma_link(vm, prev_vma,
-				 drm_gpuvm_bo_get(op->remap.unmap->va->vm_bo));
+		panthor_vma_link(vm, prev_vma, op->remap.unmap->va->vm_bo);
 	}
 
 	if (next_vma) {
-		panthor_vma_link(vm, next_vma,
-				 drm_gpuvm_bo_get(op->remap.unmap->va->vm_bo));
+		panthor_vma_link(vm, next_vma, op->remap.unmap->va->vm_bo);
 	}
 
-	panthor_vma_unlink(vm, unmap_vma);
+	panthor_vma_unlink(unmap_vma);
 	return 0;
 }
 
@@ -2154,12 +2077,13 @@ static int panthor_gpuva_sm_step_unmap(struct drm_gpuva_op *op,
 		return ret;
 
 	drm_gpuva_unmap(&op->unmap);
-	panthor_vma_unlink(vm, unmap_vma);
+	panthor_vma_unlink(unmap_vma);
 	return 0;
 }
 
 static const struct drm_gpuvm_ops panthor_gpuvm_ops = {
 	.vm_free = panthor_vm_free,
+	.vm_bo_free = panthor_vm_bo_free,
 	.sm_step_map = panthor_gpuva_sm_step_map,
 	.sm_step_remap = panthor_gpuva_sm_step_remap,
 	.sm_step_unmap = panthor_gpuva_sm_step_unmap,

-- 
2.51.0.355.g5224444f11-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] panthor: use drm_gpuva_unlink_defer()
  2025-09-05 12:11 ` [PATCH 2/2] panthor: use drm_gpuva_unlink_defer() Alice Ryhl
@ 2025-09-05 12:52   ` Boris Brezillon
  2025-09-05 13:01     ` Alice Ryhl
  0 siblings, 1 reply; 27+ messages in thread
From: Boris Brezillon @ 2025-09-05 12:52 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Danilo Krummrich, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Fri, 05 Sep 2025 12:11:29 +0000
Alice Ryhl <aliceryhl@google.com> wrote:

>  static void panthor_vm_cleanup_op_ctx(struct panthor_vm_op_ctx *op_ctx,
>  				      struct panthor_vm *vm)
>  {
> -	struct panthor_vma *vma, *tmp_vma;
> -
>  	u32 remaining_pt_count = op_ctx->rsvd_page_tables.count -
>  				 op_ctx->rsvd_page_tables.ptr;
>  
> @@ -1134,16 +1091,12 @@ static void panthor_vm_cleanup_op_ctx(struct panthor_vm_op_ctx *op_ctx,
>  	kfree(op_ctx->rsvd_page_tables.pages);
>  
>  	if (op_ctx->map.vm_bo)
> -		panthor_vm_bo_put(op_ctx->map.vm_bo);
> +		drm_gpuvm_bo_put_deferred(op_ctx->map.vm_bo);
>  
>  	for (u32 i = 0; i < ARRAY_SIZE(op_ctx->preallocated_vmas); i++)
>  		kfree(op_ctx->preallocated_vmas[i]);
>  
> -	list_for_each_entry_safe(vma, tmp_vma, &op_ctx->returned_vmas, node) {
> -		list_del(&vma->node);
> -		panthor_vm_bo_put(vma->base.vm_bo);
> -		kfree(vma);

Maybe I'm missing something, but I don't see the VMAs being freed in
this new version.

> -	}
> +	drm_gpuvm_bo_deferred_cleanup(&vm->base);
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] panthor: use drm_gpuva_unlink_defer()
  2025-09-05 12:52   ` Boris Brezillon
@ 2025-09-05 13:01     ` Alice Ryhl
  0 siblings, 0 replies; 27+ messages in thread
From: Alice Ryhl @ 2025-09-05 13:01 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Danilo Krummrich, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Fri, Sep 5, 2025 at 2:53 PM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Fri, 05 Sep 2025 12:11:29 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
>
> >  static void panthor_vm_cleanup_op_ctx(struct panthor_vm_op_ctx *op_ctx,
> >                                     struct panthor_vm *vm)
> >  {
> > -     struct panthor_vma *vma, *tmp_vma;
> > -
> >       u32 remaining_pt_count = op_ctx->rsvd_page_tables.count -
> >                                op_ctx->rsvd_page_tables.ptr;
> >
> > @@ -1134,16 +1091,12 @@ static void panthor_vm_cleanup_op_ctx(struct panthor_vm_op_ctx *op_ctx,
> >       kfree(op_ctx->rsvd_page_tables.pages);
> >
> >       if (op_ctx->map.vm_bo)
> > -             panthor_vm_bo_put(op_ctx->map.vm_bo);
> > +             drm_gpuvm_bo_put_deferred(op_ctx->map.vm_bo);
> >
> >       for (u32 i = 0; i < ARRAY_SIZE(op_ctx->preallocated_vmas); i++)
> >               kfree(op_ctx->preallocated_vmas[i]);
> >
> > -     list_for_each_entry_safe(vma, tmp_vma, &op_ctx->returned_vmas, node) {
> > -             list_del(&vma->node);
> > -             panthor_vm_bo_put(vma->base.vm_bo);
> > -             kfree(vma);
>
> Maybe I'm missing something, but I don't see the VMAs being freed in
> this new version.

Sorry you are right. We can kfree the vma right away after unlink(),
but I forgot to add that.

Alice

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-05 12:11 ` [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup Alice Ryhl
@ 2025-09-05 13:25   ` Boris Brezillon
  2025-09-05 18:18     ` Alice Ryhl
  0 siblings, 1 reply; 27+ messages in thread
From: Boris Brezillon @ 2025-09-05 13:25 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Danilo Krummrich, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Fri, 05 Sep 2025 12:11:28 +0000
Alice Ryhl <aliceryhl@google.com> wrote:

> When using GPUVM in immediate mode, it is necessary to call
> drm_gpuvm_unlink() from the fence signalling critical path. However,
> unlink may call drm_gpuvm_bo_put(), which causes some challenges:
> 
> 1. drm_gpuvm_bo_put() often requires you to take resv locks, which you
>    can't do from the fence signalling critical path.
> 2. drm_gpuvm_bo_put() calls drm_gem_object_put(), which is often going
>    to be unsafe to call from the fence signalling critical path.
> 
> To solve these issues, add a deferred version of drm_gpuvm_unlink() that
> adds the vm_bo to a deferred cleanup list, and then clean it up later.
> 
> The new methods take the GEMs GPUVA lock internally rather than letting
> the caller do it because it also needs to perform an operation after
> releasing the mutex again. This is to prevent freeing the GEM while
> holding the mutex (more info as comments in the patch). This means that
> the new methods can only be used with DRM_GPUVM_IMMEDIATE_MODE.
> 
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
>  drivers/gpu/drm/drm_gpuvm.c | 167 ++++++++++++++++++++++++++++++++++++++++++++
>  include/drm/drm_gpuvm.h     |  26 +++++++
>  2 files changed, 193 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index 78a1a4f095095e9379bdf604d583f6c8b9863ccb..849b6c08f83dcba832eda372bd3ce62b540e144b 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -876,6 +876,25 @@ __drm_gpuvm_bo_list_add(struct drm_gpuvm *gpuvm, spinlock_t *lock,
>  	cond_spin_unlock(lock, !!lock);
>  }
>  
> +/**
> + * drm_gpuvm_bo_is_dead() - check whether this vm_bo is scheduled for cleanup
> + * @vm_bo: the &drm_gpuvm_bo
> + *
> + * When a vm_bo is scheduled for cleanup using the bo_defer list, it is not
> + * immediately removed from the evict and extobj lists.

Maybe mention that it can't be, because those lists are protected with
the resv lock and we can't take this lock when we're in immediate mode?

> Therefore, anyone
> + * iterating these lists should skip entries that are being destroyed.
> + *
> + * Checking the refcount without incrementing it is okay as long as the lock
> + * protecting the evict/extobj list is held for as long as you are using the
> + * vm_bo, because even if the refcount hits zero while you are using it, freeing
> + * the vm_bo requires taking the list's lock.
> + */
> +static bool
> +drm_gpuvm_bo_is_dead(struct drm_gpuvm_bo *vm_bo)
> +{
> +	return !kref_read(&vm_bo->kref);

I'm not too sure I like the idea of [ab]using vm_bo::kref to defer the
vm_bo release. I get why it's done like that, but I'm wondering why we
don't defer the release of drm_gpuva objects instead (which is really
what's being released in va_unlink()). I can imagine drivers wanting to
attach resources to the gpuva that can't be released in the
dma-signalling path in the future, and if we're doing that at the gpuva
level, we also get rid of this kref dance, since the va will hold a
vm_bo ref until it's destroyed.

Any particular reason you went for vm_bo destruction deferral instead
of gpuva?

> +}
> +
>  /**
>   * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
>   * @__vm_bo: the &drm_gpuvm_bo
> @@ -1081,6 +1100,9 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name,
>  	INIT_LIST_HEAD(&gpuvm->evict.list);
>  	spin_lock_init(&gpuvm->evict.lock);
>  
> +	INIT_LIST_HEAD(&gpuvm->bo_defer.list);
> +	spin_lock_init(&gpuvm->bo_defer.lock);
> +
>  	kref_init(&gpuvm->kref);
>  
>  	gpuvm->name = name ? name : "unknown";
> @@ -1122,6 +1144,8 @@ drm_gpuvm_fini(struct drm_gpuvm *gpuvm)
>  		 "Extobj list should be empty.\n");
>  	drm_WARN(gpuvm->drm, !list_empty(&gpuvm->evict.list),
>  		 "Evict list should be empty.\n");
> +	drm_WARN(gpuvm->drm, !list_empty(&gpuvm->bo_defer.list),
> +		 "VM BO cleanup list should be empty.\n");
>  
>  	drm_gem_object_put(gpuvm->r_obj);
>  }
> @@ -1217,6 +1241,9 @@ drm_gpuvm_prepare_objects_locked(struct drm_gpuvm *gpuvm,
>  
>  	drm_gpuvm_resv_assert_held(gpuvm);
>  	list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
> +		if (drm_gpuvm_bo_is_dead(vm_bo))
> +			continue;
> +
>  		ret = exec_prepare_obj(exec, vm_bo->obj, num_fences);
>  		if (ret)
>  			break;
> @@ -1460,6 +1487,9 @@ drm_gpuvm_validate_locked(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
>  
>  	list_for_each_entry_safe(vm_bo, next, &gpuvm->evict.list,
>  				 list.entry.evict) {
> +		if (drm_gpuvm_bo_is_dead(vm_bo))
> +			continue;
> +
>  		ret = ops->vm_bo_validate(vm_bo, exec);
>  		if (ret)
>  			break;
> @@ -1560,6 +1590,7 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
>  
>  	INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>  	INIT_LIST_HEAD(&vm_bo->list.entry.evict);
> +	INIT_LIST_HEAD(&vm_bo->list.entry.bo_defer);
>  
>  	return vm_bo;
>  }
> @@ -1621,6 +1652,106 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>  }
>  EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>  
> +static void
> +drm_gpuvm_bo_defer_locked(struct kref *kref)
> +{
> +	struct drm_gpuvm_bo *vm_bo = container_of(kref, struct drm_gpuvm_bo,
> +						  kref);
> +	struct drm_gpuvm *gpuvm = vm_bo->vm;
> +
> +	if (!drm_gpuvm_resv_protected(gpuvm)) {
> +		drm_gpuvm_bo_list_del(vm_bo, extobj, true);
> +		drm_gpuvm_bo_list_del(vm_bo, evict, true);
> +	}
> +
> +	list_del(&vm_bo->list.entry.gem);
> +	mutex_unlock(&vm_bo->obj->gpuva.lock);

I got tricked by this implicit unlock, and the conditional unlocks it
creates in drm_gpuva_unlink_defer(). Honestly, I'd rather see this
unlocked moved to drm_gpuva_unlink_defer() and a conditional unlock
added to drm_gpuvm_bo_put_deferred(), because it's easier to reason
about when the lock/unlock calls are in the same function
(kref_put_mutex() being the equivalent of a conditional lock).

> +}
> +
> +/**
> + * drm_gpuvm_bo_put_deferred() - drop a struct drm_gpuvm_bo reference with
> + * deferred cleanup
> + * @vm_bo: the &drm_gpuvm_bo to release the reference of
> + *
> + * This releases a reference to @vm_bo.
> + *
> + * This might take and release the GEMs GPUVA lock. You should call
> + * drm_gpuvm_bo_deferred_cleanup() later to complete the cleanup process.
> + *
> + * Returns: true if vm_bo is being destroyed, false otherwise.
> + */
> +bool
> +drm_gpuvm_bo_put_deferred(struct drm_gpuvm_bo *vm_bo)
> +{
> +	bool defer;
> +
> +	drm_WARN_ON(vm_bo->vm->drm, !drm_gpuvm_immediate_mode(vm_bo->vm));
> +
> +	if (!vm_bo)
> +		return false;
> +
> +	defer = kref_put_mutex(&vm_bo->kref, drm_gpuvm_bo_defer_locked,
> +			       &vm_bo->obj->gpuva.lock);
> +
> +	/*
> +	 * It's important that the GEM stays alive for the duration in which
> +	 * drm_gpuvm_bo_defer_locked() holds the mutex, but the instant we add
> +	 * the vm_bo to bo_defer, another thread might call
> +	 * drm_gpuvm_bo_deferred_cleanup() and put the GEM. For this reason, we
> +	 * add the vm_bo to bo_defer after releasing the GEM's mutex.
> +	 */
> +	if (defer)
> +		drm_gpuvm_bo_list_add(vm_bo, bo_defer, true);
> +
> +	return defer;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put_deferred);
> +
> +/**
> + * drm_gpuvm_bo_deferred_cleanup() - clean up BOs in the deferred list
> + * deferred cleanup
> + * @gpuvm: the VM to clean up
> + *
> + * Cleans up &drm_gpuvm_bo instances in the deferred cleanup list.
> + */
> +void
> +drm_gpuvm_bo_deferred_cleanup(struct drm_gpuvm *gpuvm)
> +{
> +	const struct drm_gpuvm_ops *ops = gpuvm->ops;
> +	struct drm_gpuvm_bo *vm_bo;
> +	struct drm_gem_object *obj;
> +	LIST_HEAD(bo_defer);
> +
> +	spin_lock(&gpuvm->bo_defer.lock);
> +	list_replace_init(&gpuvm->bo_defer.list, &bo_defer);
> +	spin_unlock(&gpuvm->bo_defer.lock);
> +
> +	if (drm_gpuvm_resv_protected(gpuvm)) {
> +		dma_resv_lock(drm_gpuvm_resv(gpuvm), NULL);
> +		list_for_each_entry(vm_bo, &bo_defer, list.entry.bo_defer) {
> +			drm_gpuvm_bo_list_del(vm_bo, extobj, false);
> +			drm_gpuvm_bo_list_del(vm_bo, evict, false);
> +		}
> +		dma_resv_unlock(drm_gpuvm_resv(gpuvm));
> +	}
> +
> +	while (!list_empty(&bo_defer)) {
> +		vm_bo = list_first_entry(&bo_defer,
> +			struct drm_gpuvm_bo, list.entry.bo_defer);
> +		obj = vm_bo->obj;
> +
> +		list_del(&vm_bo->list.entry.bo_defer);
> +		if (ops && ops->vm_bo_free)
> +			ops->vm_bo_free(vm_bo);
> +		else
> +			kfree(vm_bo);
> +
> +		drm_gpuvm_put(gpuvm);
> +		drm_gem_object_put(obj);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_deferred_cleanup);
> +
>  static struct drm_gpuvm_bo *
>  __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>  		    struct drm_gem_object *obj)
> @@ -1948,6 +2079,42 @@ drm_gpuva_unlink(struct drm_gpuva *va)
>  }
>  EXPORT_SYMBOL_GPL(drm_gpuva_unlink);
>  
> +/**
> + * drm_gpuva_unlink_defer() - unlink a &drm_gpuva with deferred vm_bo cleanup
> + * @va: the &drm_gpuva to unlink
> + *
> + * Similar to drm_gpuva_unlink(), but uses drm_gpuvm_bo_put_deferred() and takes
> + * the lock for the caller.
> + */
> +void
> +drm_gpuva_unlink_defer(struct drm_gpuva *va)
> +{
> +	struct drm_gem_object *obj = va->gem.obj;
> +	struct drm_gpuvm_bo *vm_bo = va->vm_bo;
> +
> +	if (unlikely(!obj))
> +		return;
> +
> +	drm_WARN_ON(vm_bo->vm->drm, !drm_gpuvm_immediate_mode(vm_bo->vm));
> +
> +	mutex_lock(&obj->gpuva.lock);
> +	list_del_init(&va->gem.entry);
> +
> +	/*
> +	 * This is drm_gpuvm_bo_put_deferred() slightly modified since we
> +	 * already hold the mutex. It's important that we add the vm_bo to
> +	 * bo_defer after releasing the mutex for the same reason as in
> +	 * drm_gpuvm_bo_put_deferred().
> +	 */
> +	if (kref_put(&vm_bo->kref, drm_gpuvm_bo_defer_locked))
> +		drm_gpuvm_bo_list_add(vm_bo, bo_defer, true);
> +	else
> +		mutex_unlock(&obj->gpuva.lock);
> +
> +	va->vm_bo = NULL;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuva_unlink_defer);
> +
>  /**
>   * drm_gpuva_find_first() - find the first &drm_gpuva in the given range
>   * @gpuvm: the &drm_gpuvm to search in
> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> index 727b8f336fad0d853998e4379cbd374155468e18..1f80389e14312f552a8abc95d12f52f83beb9be8 100644
> --- a/include/drm/drm_gpuvm.h
> +++ b/include/drm/drm_gpuvm.h
> @@ -152,6 +152,7 @@ void drm_gpuva_remove(struct drm_gpuva *va);
>  
>  void drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuvm_bo *vm_bo);
>  void drm_gpuva_unlink(struct drm_gpuva *va);
> +void drm_gpuva_unlink_defer(struct drm_gpuva *va);
>  
>  struct drm_gpuva *drm_gpuva_find(struct drm_gpuvm *gpuvm,
>  				 u64 addr, u64 range);
> @@ -331,6 +332,22 @@ struct drm_gpuvm {
>  		 */
>  		spinlock_t lock;
>  	} evict;
> +
> +	/**
> +	 * @bo_defer: structure holding vm_bos that need to be destroyed
> +	 */
> +	struct {
> +		/**
> +		 * @bo_defer.list: &list_head storing &drm_gpuvm_bos that need
> +		 * to be destroyed
> +		 */
> +		struct list_head list;
> +
> +		/**
> +		 * @bo_defer.lock: spinlock to protect the bo_defer list
> +		 */
> +		spinlock_t lock;
> +	} bo_defer;
>  };
>  
>  void drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name,
> @@ -714,6 +731,12 @@ struct drm_gpuvm_bo {
>  			 * &drm_gpuvms evict list.
>  			 */
>  			struct list_head evict;
> +
> +			/**
> +			 * @list.entry.bo_defer: List entry to attach to
> +			 * the &drm_gpuvms bo_defer list.
> +			 */
> +			struct list_head bo_defer;
>  		} entry;
>  	} list;
>  };
> @@ -746,6 +769,9 @@ drm_gpuvm_bo_get(struct drm_gpuvm_bo *vm_bo)
>  
>  bool drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo);
>  
> +bool drm_gpuvm_bo_put_deferred(struct drm_gpuvm_bo *vm_bo);
> +void drm_gpuvm_bo_deferred_cleanup(struct drm_gpuvm *gpuvm);
> +
>  struct drm_gpuvm_bo *
>  drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>  		  struct drm_gem_object *obj);
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-05 13:25   ` Boris Brezillon
@ 2025-09-05 18:18     ` Alice Ryhl
  2025-09-05 22:47       ` Danilo Krummrich
  0 siblings, 1 reply; 27+ messages in thread
From: Alice Ryhl @ 2025-09-05 18:18 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Danilo Krummrich, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Fri, Sep 5, 2025 at 3:25 PM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Fri, 05 Sep 2025 12:11:28 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
>
> > When using GPUVM in immediate mode, it is necessary to call
> > drm_gpuvm_unlink() from the fence signalling critical path. However,
> > unlink may call drm_gpuvm_bo_put(), which causes some challenges:
> >
> > 1. drm_gpuvm_bo_put() often requires you to take resv locks, which you
> >    can't do from the fence signalling critical path.
> > 2. drm_gpuvm_bo_put() calls drm_gem_object_put(), which is often going
> >    to be unsafe to call from the fence signalling critical path.
> >
> > To solve these issues, add a deferred version of drm_gpuvm_unlink() that
> > adds the vm_bo to a deferred cleanup list, and then clean it up later.
> >
> > The new methods take the GEMs GPUVA lock internally rather than letting
> > the caller do it because it also needs to perform an operation after
> > releasing the mutex again. This is to prevent freeing the GEM while
> > holding the mutex (more info as comments in the patch). This means that
> > the new methods can only be used with DRM_GPUVM_IMMEDIATE_MODE.
> >
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > ---
> >  drivers/gpu/drm/drm_gpuvm.c | 167 ++++++++++++++++++++++++++++++++++++++++++++
> >  include/drm/drm_gpuvm.h     |  26 +++++++
> >  2 files changed, 193 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > index 78a1a4f095095e9379bdf604d583f6c8b9863ccb..849b6c08f83dcba832eda372bd3ce62b540e144b 100644
> > --- a/drivers/gpu/drm/drm_gpuvm.c
> > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > @@ -876,6 +876,25 @@ __drm_gpuvm_bo_list_add(struct drm_gpuvm *gpuvm, spinlock_t *lock,
> >       cond_spin_unlock(lock, !!lock);
> >  }
> >
> > +/**
> > + * drm_gpuvm_bo_is_dead() - check whether this vm_bo is scheduled for cleanup
> > + * @vm_bo: the &drm_gpuvm_bo
> > + *
> > + * When a vm_bo is scheduled for cleanup using the bo_defer list, it is not
> > + * immediately removed from the evict and extobj lists.
>
> Maybe mention that it can't be, because those lists are protected with
> the resv lock and we can't take this lock when we're in immediate mode?

Ok.

> > Therefore, anyone
> > + * iterating these lists should skip entries that are being destroyed.
> > + *
> > + * Checking the refcount without incrementing it is okay as long as the lock
> > + * protecting the evict/extobj list is held for as long as you are using the
> > + * vm_bo, because even if the refcount hits zero while you are using it, freeing
> > + * the vm_bo requires taking the list's lock.
> > + */
> > +static bool
> > +drm_gpuvm_bo_is_dead(struct drm_gpuvm_bo *vm_bo)
> > +{
> > +     return !kref_read(&vm_bo->kref);
>
> I'm not too sure I like the idea of [ab]using vm_bo::kref to defer the
> vm_bo release. I get why it's done like that, but I'm wondering why we
> don't defer the release of drm_gpuva objects instead (which is really
> what's being released in va_unlink()). I can imagine drivers wanting to
> attach resources to the gpuva that can't be released in the
> dma-signalling path in the future, and if we're doing that at the gpuva
> level, we also get rid of this kref dance, since the va will hold a
> vm_bo ref until it's destroyed.
>
> Any particular reason you went for vm_bo destruction deferral instead
> of gpuva?

All of the things that were unsafe to release in the signalling path
were tied to the vm_bo, so that is why I went for vm_bo cleanup.
Another advantage is that it lets us use the same deferred logic for
the vm_bo_put() call that drops the refcount from vm_bo_obtain().

Of course if gpuvas might have resources that need deferred cleanup,
that might change the situation somewhat.


> > +}
> > +
> >  /**
> >   * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
> >   * @__vm_bo: the &drm_gpuvm_bo
> > @@ -1081,6 +1100,9 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name,
> >       INIT_LIST_HEAD(&gpuvm->evict.list);
> >       spin_lock_init(&gpuvm->evict.lock);
> >
> > +     INIT_LIST_HEAD(&gpuvm->bo_defer.list);
> > +     spin_lock_init(&gpuvm->bo_defer.lock);
> > +
> >       kref_init(&gpuvm->kref);
> >
> >       gpuvm->name = name ? name : "unknown";
> > @@ -1122,6 +1144,8 @@ drm_gpuvm_fini(struct drm_gpuvm *gpuvm)
> >                "Extobj list should be empty.\n");
> >       drm_WARN(gpuvm->drm, !list_empty(&gpuvm->evict.list),
> >                "Evict list should be empty.\n");
> > +     drm_WARN(gpuvm->drm, !list_empty(&gpuvm->bo_defer.list),
> > +              "VM BO cleanup list should be empty.\n");
> >
> >       drm_gem_object_put(gpuvm->r_obj);
> >  }
> > @@ -1217,6 +1241,9 @@ drm_gpuvm_prepare_objects_locked(struct drm_gpuvm *gpuvm,
> >
> >       drm_gpuvm_resv_assert_held(gpuvm);
> >       list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
> > +             if (drm_gpuvm_bo_is_dead(vm_bo))
> > +                     continue;
> > +
> >               ret = exec_prepare_obj(exec, vm_bo->obj, num_fences);
> >               if (ret)
> >                       break;
> > @@ -1460,6 +1487,9 @@ drm_gpuvm_validate_locked(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
> >
> >       list_for_each_entry_safe(vm_bo, next, &gpuvm->evict.list,
> >                                list.entry.evict) {
> > +             if (drm_gpuvm_bo_is_dead(vm_bo))
> > +                     continue;
> > +
> >               ret = ops->vm_bo_validate(vm_bo, exec);
> >               if (ret)
> >                       break;
> > @@ -1560,6 +1590,7 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
> >
> >       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
> >       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
> > +     INIT_LIST_HEAD(&vm_bo->list.entry.bo_defer);
> >
> >       return vm_bo;
> >  }
> > @@ -1621,6 +1652,106 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
> >  }
> >  EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
> >
> > +static void
> > +drm_gpuvm_bo_defer_locked(struct kref *kref)
> > +{
> > +     struct drm_gpuvm_bo *vm_bo = container_of(kref, struct drm_gpuvm_bo,
> > +                                               kref);
> > +     struct drm_gpuvm *gpuvm = vm_bo->vm;
> > +
> > +     if (!drm_gpuvm_resv_protected(gpuvm)) {
> > +             drm_gpuvm_bo_list_del(vm_bo, extobj, true);
> > +             drm_gpuvm_bo_list_del(vm_bo, evict, true);
> > +     }
> > +
> > +     list_del(&vm_bo->list.entry.gem);
> > +     mutex_unlock(&vm_bo->obj->gpuva.lock);
>
> I got tricked by this implicit unlock, and the conditional unlocks it
> creates in drm_gpuva_unlink_defer(). Honestly, I'd rather see this
> unlocked moved to drm_gpuva_unlink_defer() and a conditional unlock
> added to drm_gpuvm_bo_put_deferred(), because it's easier to reason
> about when the lock/unlock calls are in the same function
> (kref_put_mutex() being the equivalent of a conditional lock).

Ok. I followed the docs of kref_put_mutex() that say to unlock it from
the function.

Alice

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-05 18:18     ` Alice Ryhl
@ 2025-09-05 22:47       ` Danilo Krummrich
  2025-09-07 11:15         ` Alice Ryhl
  0 siblings, 1 reply; 27+ messages in thread
From: Danilo Krummrich @ 2025-09-05 22:47 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Boris Brezillon, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Fri Sep 5, 2025 at 8:18 PM CEST, Alice Ryhl wrote:
> On Fri, Sep 5, 2025 at 3:25 PM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
>> On Fri, 05 Sep 2025 12:11:28 +0000
>> Alice Ryhl <aliceryhl@google.com> wrote:
>> > +static bool
>> > +drm_gpuvm_bo_is_dead(struct drm_gpuvm_bo *vm_bo)
>> > +{
>> > +     return !kref_read(&vm_bo->kref);
>>
>> I'm not too sure I like the idea of [ab]using vm_bo::kref to defer the
>> vm_bo release. I get why it's done like that, but I'm wondering why we
>> don't defer the release of drm_gpuva objects instead (which is really
>> what's being released in va_unlink()). I can imagine drivers wanting to
>> attach resources to the gpuva that can't be released in the
>> dma-signalling path in the future, and if we're doing that at the gpuva
>> level, we also get rid of this kref dance, since the va will hold a
>> vm_bo ref until it's destroyed.
>>
>> Any particular reason you went for vm_bo destruction deferral instead
>> of gpuva?
>
> All of the things that were unsafe to release in the signalling path
> were tied to the vm_bo, so that is why I went for vm_bo cleanup.
> Another advantage is that it lets us use the same deferred logic for
> the vm_bo_put() call that drops the refcount from vm_bo_obtain().
>
> Of course if gpuvas might have resources that need deferred cleanup,
> that might change the situation somewhat.

I think we want to track PT(E) allocations, or rather reference counts of page
table structures carried by the drm_gpuva, but we don't need to release them on
drm_gpuva_unlink(), which is where we drop the reference count of the vm_bo.

Deferring drm_gpuva_unlink() isn't really an option I think, the GEMs list of
VM_BOs and the VM_BOs list of VAs is usually used in ttm_device_funcs::move to
map or unmap all VAs associated with a GEM object.

I think PT(E) reference counts etc. should be rather released when the drm_gpuva
is freed, i.e. page table allocations can be bound to the lifetime of a
drm_gpuva. Given that, I think that eventually we'll need a cleanup list for
those as well, since once they're removed from the VM tree (in the fence
signalling critical path), we loose access otherwise.

>> > +static void
>> > +drm_gpuvm_bo_defer_locked(struct kref *kref)
>> > +{
>> > +     struct drm_gpuvm_bo *vm_bo = container_of(kref, struct drm_gpuvm_bo,
>> > +                                               kref);
>> > +     struct drm_gpuvm *gpuvm = vm_bo->vm;
>> > +
>> > +     if (!drm_gpuvm_resv_protected(gpuvm)) {
>> > +             drm_gpuvm_bo_list_del(vm_bo, extobj, true);
>> > +             drm_gpuvm_bo_list_del(vm_bo, evict, true);
>> > +     }
>> > +
>> > +     list_del(&vm_bo->list.entry.gem);
>> > +     mutex_unlock(&vm_bo->obj->gpuva.lock);
>>
>> I got tricked by this implicit unlock, and the conditional unlocks it
>> creates in drm_gpuva_unlink_defer(). Honestly, I'd rather see this
>> unlocked moved to drm_gpuva_unlink_defer() and a conditional unlock
>> added to drm_gpuvm_bo_put_deferred(), because it's easier to reason
>> about when the lock/unlock calls are in the same function
>> (kref_put_mutex() being the equivalent of a conditional lock).
>
> Ok. I followed the docs of kref_put_mutex() that say to unlock it from
> the function.

Yes, please keep it the way it is, I don't want to deviate from what is
documented and everyone else does. Besides that, I also think it's a little
less error prone.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-05 22:47       ` Danilo Krummrich
@ 2025-09-07 11:15         ` Alice Ryhl
  2025-09-07 11:28           ` Danilo Krummrich
  2025-09-08  7:22           ` Boris Brezillon
  0 siblings, 2 replies; 27+ messages in thread
From: Alice Ryhl @ 2025-09-07 11:15 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Boris Brezillon, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Sat, Sep 06, 2025 at 12:47:36AM +0200, Danilo Krummrich wrote:
> On Fri Sep 5, 2025 at 8:18 PM CEST, Alice Ryhl wrote:
> > On Fri, Sep 5, 2025 at 3:25 PM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:
> >> On Fri, 05 Sep 2025 12:11:28 +0000
> >> Alice Ryhl <aliceryhl@google.com> wrote:
> >> > +static bool
> >> > +drm_gpuvm_bo_is_dead(struct drm_gpuvm_bo *vm_bo)
> >> > +{
> >> > +     return !kref_read(&vm_bo->kref);
> >>
> >> I'm not too sure I like the idea of [ab]using vm_bo::kref to defer the
> >> vm_bo release. I get why it's done like that, but I'm wondering why we
> >> don't defer the release of drm_gpuva objects instead (which is really
> >> what's being released in va_unlink()). I can imagine drivers wanting to
> >> attach resources to the gpuva that can't be released in the
> >> dma-signalling path in the future, and if we're doing that at the gpuva
> >> level, we also get rid of this kref dance, since the va will hold a
> >> vm_bo ref until it's destroyed.
> >>
> >> Any particular reason you went for vm_bo destruction deferral instead
> >> of gpuva?
> >
> > All of the things that were unsafe to release in the signalling path
> > were tied to the vm_bo, so that is why I went for vm_bo cleanup.
> > Another advantage is that it lets us use the same deferred logic for
> > the vm_bo_put() call that drops the refcount from vm_bo_obtain().
> >
> > Of course if gpuvas might have resources that need deferred cleanup,
> > that might change the situation somewhat.
> 
> I think we want to track PT(E) allocations, or rather reference counts of page
> table structures carried by the drm_gpuva, but we don't need to release them on
> drm_gpuva_unlink(), which is where we drop the reference count of the vm_bo.
> 
> Deferring drm_gpuva_unlink() isn't really an option I think, the GEMs list of
> VM_BOs and the VM_BOs list of VAs is usually used in ttm_device_funcs::move to
> map or unmap all VAs associated with a GEM object.
> 
> I think PT(E) reference counts etc. should be rather released when the drm_gpuva
> is freed, i.e. page table allocations can be bound to the lifetime of a
> drm_gpuva. Given that, I think that eventually we'll need a cleanup list for
> those as well, since once they're removed from the VM tree (in the fence
> signalling critical path), we loose access otherwise.

Hmm. Another more conceptual issue with deferring gpuva is that
"immediate mode" is defined as having the GPUVM match the GPU's actual
address space at all times, which deferred gpuva cleanup would go
against.

Deferring vm_bo cleanup doesn't have this issue because even though the
vm_bo isn't kfreed immediately, all GPUVM apis still treat it as-if it
isn't there anymore.

> >> > +static void
> >> > +drm_gpuvm_bo_defer_locked(struct kref *kref)
> >> > +{
> >> > +     struct drm_gpuvm_bo *vm_bo = container_of(kref, struct drm_gpuvm_bo,
> >> > +                                               kref);
> >> > +     struct drm_gpuvm *gpuvm = vm_bo->vm;
> >> > +
> >> > +     if (!drm_gpuvm_resv_protected(gpuvm)) {
> >> > +             drm_gpuvm_bo_list_del(vm_bo, extobj, true);
> >> > +             drm_gpuvm_bo_list_del(vm_bo, evict, true);
> >> > +     }
> >> > +
> >> > +     list_del(&vm_bo->list.entry.gem);
> >> > +     mutex_unlock(&vm_bo->obj->gpuva.lock);
> >>
> >> I got tricked by this implicit unlock, and the conditional unlocks it
> >> creates in drm_gpuva_unlink_defer(). Honestly, I'd rather see this
> >> unlocked moved to drm_gpuva_unlink_defer() and a conditional unlock
> >> added to drm_gpuvm_bo_put_deferred(), because it's easier to reason
> >> about when the lock/unlock calls are in the same function
> >> (kref_put_mutex() being the equivalent of a conditional lock).
> >
> > Ok. I followed the docs of kref_put_mutex() that say to unlock it from
> > the function.
> 
> Yes, please keep it the way it is, I don't want to deviate from what is
> documented and everyone else does. Besides that, I also think it's a little
> less error prone.

I gave it a try:

bool
drm_gpuvm_bo_put_deferred(struct drm_gpuvm_bo *vm_bo)
{
	drm_WARN_ON(vm_bo->vm->drm, !drm_gpuvm_immediate_mode(vm_bo->vm));

	if (!vm_bo)
		return false;

	if (kref_put_mutex(&vm_bo->kref, drm_gpuvm_bo_defer_locked,
			   &vm_bo->obj->gpuva.lock)) {
		/*
		 * It's important that the GEM stays alive for the duration in which
		 * drm_gpuvm_bo_defer_locked() holds the mutex, but the instant we add
		 * the vm_bo to bo_defer, another thread might call
		 * drm_gpuvm_bo_deferred_cleanup() and put the GEM. For this reason, we
		 * add the vm_bo to bo_defer *after* releasing the GEM's mutex.
		 */
		mutex_unlock(&vm_bo->obj->gpuva.lock);
		drm_gpuvm_bo_list_add(vm_bo, bo_defer, true);
		return true;
	}

	return false;
}

void
drm_gpuva_unlink_defer(struct drm_gpuva *va)
{
	struct drm_gem_object *obj = va->gem.obj;
	struct drm_gpuvm_bo *vm_bo = va->vm_bo;
	bool should_defer_bo;

	if (unlikely(!obj))
		return;

	drm_WARN_ON(vm_bo->vm->drm, !drm_gpuvm_immediate_mode(vm_bo->vm));

	mutex_lock(&obj->gpuva.lock);
	list_del_init(&va->gem.entry);

	/*
	 * This is drm_gpuvm_bo_put_deferred() slightly modified since we
	 * already hold the mutex. It's important that we add the vm_bo to
	 * bo_defer after releasing the mutex for the same reason as in
	 * drm_gpuvm_bo_put_deferred().
	 */
	should_defer_bo = kref_put(&vm_bo->kref, drm_gpuvm_bo_defer_locked);
	mutex_unlock(&obj->gpuva.lock);
	if (should_defer_bo)
		drm_gpuvm_bo_list_add(vm_bo, bo_defer, true);

	va->vm_bo = NULL;
}

I do think it looks relatively nice like this, particularly
drm_gpuva_unlink_defer(). But that's also the one not using
kref_put_mutex().

Alice

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-07 11:15         ` Alice Ryhl
@ 2025-09-07 11:28           ` Danilo Krummrich
  2025-09-07 11:39             ` Alice Ryhl
  2025-09-08  7:22           ` Boris Brezillon
  1 sibling, 1 reply; 27+ messages in thread
From: Danilo Krummrich @ 2025-09-07 11:28 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Boris Brezillon, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Sun Sep 7, 2025 at 1:15 PM CEST, Alice Ryhl wrote:
> On Sat, Sep 06, 2025 at 12:47:36AM +0200, Danilo Krummrich wrote:
>> On Fri Sep 5, 2025 at 8:18 PM CEST, Alice Ryhl wrote:
>> > On Fri, Sep 5, 2025 at 3:25 PM Boris Brezillon
>> > <boris.brezillon@collabora.com> wrote:
>> >> On Fri, 05 Sep 2025 12:11:28 +0000
>> >> Alice Ryhl <aliceryhl@google.com> wrote:
>> >> > +static bool
>> >> > +drm_gpuvm_bo_is_dead(struct drm_gpuvm_bo *vm_bo)
>> >> > +{
>> >> > +     return !kref_read(&vm_bo->kref);
>> >>
>> >> I'm not too sure I like the idea of [ab]using vm_bo::kref to defer the
>> >> vm_bo release. I get why it's done like that, but I'm wondering why we
>> >> don't defer the release of drm_gpuva objects instead (which is really
>> >> what's being released in va_unlink()). I can imagine drivers wanting to
>> >> attach resources to the gpuva that can't be released in the
>> >> dma-signalling path in the future, and if we're doing that at the gpuva
>> >> level, we also get rid of this kref dance, since the va will hold a
>> >> vm_bo ref until it's destroyed.
>> >>
>> >> Any particular reason you went for vm_bo destruction deferral instead
>> >> of gpuva?
>> >
>> > All of the things that were unsafe to release in the signalling path
>> > were tied to the vm_bo, so that is why I went for vm_bo cleanup.
>> > Another advantage is that it lets us use the same deferred logic for
>> > the vm_bo_put() call that drops the refcount from vm_bo_obtain().
>> >
>> > Of course if gpuvas might have resources that need deferred cleanup,
>> > that might change the situation somewhat.
>> 
>> I think we want to track PT(E) allocations, or rather reference counts of page
>> table structures carried by the drm_gpuva, but we don't need to release them on
>> drm_gpuva_unlink(), which is where we drop the reference count of the vm_bo.
>> 
>> Deferring drm_gpuva_unlink() isn't really an option I think, the GEMs list of
>> VM_BOs and the VM_BOs list of VAs is usually used in ttm_device_funcs::move to
>> map or unmap all VAs associated with a GEM object.
>> 
>> I think PT(E) reference counts etc. should be rather released when the drm_gpuva
>> is freed, i.e. page table allocations can be bound to the lifetime of a
>> drm_gpuva. Given that, I think that eventually we'll need a cleanup list for
>> those as well, since once they're removed from the VM tree (in the fence
>> signalling critical path), we loose access otherwise.
>
> Hmm. Another more conceptual issue with deferring gpuva is that
> "immediate mode" is defined as having the GPUVM match the GPU's actual
> address space at all times, which deferred gpuva cleanup would go
> against.

Depends on what "deferred gpuva cleanup" means.

What needs to happen in the run_job() is drm_gpuva_unlink() and
drm_gpuva_unmap(). Freeing the drm_gpuva, inluding releasing the assoiciated
driver specific resources, can be deferred.

> Deferring vm_bo cleanup doesn't have this issue because even though the
> vm_bo isn't kfreed immediately, all GPUVM apis still treat it as-if it
> isn't there anymore.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-07 11:28           ` Danilo Krummrich
@ 2025-09-07 11:39             ` Alice Ryhl
  2025-09-07 11:44               ` Danilo Krummrich
  2025-09-08  7:11               ` Boris Brezillon
  0 siblings, 2 replies; 27+ messages in thread
From: Alice Ryhl @ 2025-09-07 11:39 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Boris Brezillon, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Sun, Sep 07, 2025 at 01:28:05PM +0200, Danilo Krummrich wrote:
> On Sun Sep 7, 2025 at 1:15 PM CEST, Alice Ryhl wrote:
> > On Sat, Sep 06, 2025 at 12:47:36AM +0200, Danilo Krummrich wrote:
> >> On Fri Sep 5, 2025 at 8:18 PM CEST, Alice Ryhl wrote:
> >> > On Fri, Sep 5, 2025 at 3:25 PM Boris Brezillon
> >> > <boris.brezillon@collabora.com> wrote:
> >> >> On Fri, 05 Sep 2025 12:11:28 +0000
> >> >> Alice Ryhl <aliceryhl@google.com> wrote:
> >> >> > +static bool
> >> >> > +drm_gpuvm_bo_is_dead(struct drm_gpuvm_bo *vm_bo)
> >> >> > +{
> >> >> > +     return !kref_read(&vm_bo->kref);
> >> >>
> >> >> I'm not too sure I like the idea of [ab]using vm_bo::kref to defer the
> >> >> vm_bo release. I get why it's done like that, but I'm wondering why we
> >> >> don't defer the release of drm_gpuva objects instead (which is really
> >> >> what's being released in va_unlink()). I can imagine drivers wanting to
> >> >> attach resources to the gpuva that can't be released in the
> >> >> dma-signalling path in the future, and if we're doing that at the gpuva
> >> >> level, we also get rid of this kref dance, since the va will hold a
> >> >> vm_bo ref until it's destroyed.
> >> >>
> >> >> Any particular reason you went for vm_bo destruction deferral instead
> >> >> of gpuva?
> >> >
> >> > All of the things that were unsafe to release in the signalling path
> >> > were tied to the vm_bo, so that is why I went for vm_bo cleanup.
> >> > Another advantage is that it lets us use the same deferred logic for
> >> > the vm_bo_put() call that drops the refcount from vm_bo_obtain().
> >> >
> >> > Of course if gpuvas might have resources that need deferred cleanup,
> >> > that might change the situation somewhat.
> >> 
> >> I think we want to track PT(E) allocations, or rather reference counts of page
> >> table structures carried by the drm_gpuva, but we don't need to release them on
> >> drm_gpuva_unlink(), which is where we drop the reference count of the vm_bo.
> >> 
> >> Deferring drm_gpuva_unlink() isn't really an option I think, the GEMs list of
> >> VM_BOs and the VM_BOs list of VAs is usually used in ttm_device_funcs::move to
> >> map or unmap all VAs associated with a GEM object.
> >> 
> >> I think PT(E) reference counts etc. should be rather released when the drm_gpuva
> >> is freed, i.e. page table allocations can be bound to the lifetime of a
> >> drm_gpuva. Given that, I think that eventually we'll need a cleanup list for
> >> those as well, since once they're removed from the VM tree (in the fence
> >> signalling critical path), we loose access otherwise.
> >
> > Hmm. Another more conceptual issue with deferring gpuva is that
> > "immediate mode" is defined as having the GPUVM match the GPU's actual
> > address space at all times, which deferred gpuva cleanup would go
> > against.
> 
> Depends on what "deferred gpuva cleanup" means.
> 
> What needs to happen in the run_job() is drm_gpuva_unlink() and
> drm_gpuva_unmap(). Freeing the drm_gpuva, inluding releasing the assoiciated
> driver specific resources, can be deferred.

Yeah I guess we could have unlink remove the gpuva, but then allow the
end-user to attach the gpuva to a list of gpuvas to kfree deferred. That
way, the drm_gpuva_unlink() is not deferred but any resources it has can
be.

Of course, this approach also makes deferred gpuva cleanup somewhat
orthogonal to this patch.

One annoying part is that we don't have an gpuvm ops operation for
freeing gpuva, and if we add one for this, it would *only* be used in
this case as most drivers explicitly kfree gpuvas, which could be
confusing for end-users.

Alice

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-07 11:39             ` Alice Ryhl
@ 2025-09-07 11:44               ` Danilo Krummrich
  2025-09-08  7:11               ` Boris Brezillon
  1 sibling, 0 replies; 27+ messages in thread
From: Danilo Krummrich @ 2025-09-07 11:44 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Boris Brezillon, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Sun Sep 7, 2025 at 1:39 PM CEST, Alice Ryhl wrote:
> On Sun, Sep 07, 2025 at 01:28:05PM +0200, Danilo Krummrich wrote:
>> On Sun Sep 7, 2025 at 1:15 PM CEST, Alice Ryhl wrote:
>> > On Sat, Sep 06, 2025 at 12:47:36AM +0200, Danilo Krummrich wrote:
>> >> On Fri Sep 5, 2025 at 8:18 PM CEST, Alice Ryhl wrote:
>> >> > On Fri, Sep 5, 2025 at 3:25 PM Boris Brezillon
>> >> > <boris.brezillon@collabora.com> wrote:
>> >> >> On Fri, 05 Sep 2025 12:11:28 +0000
>> >> >> Alice Ryhl <aliceryhl@google.com> wrote:
>> >> >> > +static bool
>> >> >> > +drm_gpuvm_bo_is_dead(struct drm_gpuvm_bo *vm_bo)
>> >> >> > +{
>> >> >> > +     return !kref_read(&vm_bo->kref);
>> >> >>
>> >> >> I'm not too sure I like the idea of [ab]using vm_bo::kref to defer the
>> >> >> vm_bo release. I get why it's done like that, but I'm wondering why we
>> >> >> don't defer the release of drm_gpuva objects instead (which is really
>> >> >> what's being released in va_unlink()). I can imagine drivers wanting to
>> >> >> attach resources to the gpuva that can't be released in the
>> >> >> dma-signalling path in the future, and if we're doing that at the gpuva
>> >> >> level, we also get rid of this kref dance, since the va will hold a
>> >> >> vm_bo ref until it's destroyed.
>> >> >>
>> >> >> Any particular reason you went for vm_bo destruction deferral instead
>> >> >> of gpuva?
>> >> >
>> >> > All of the things that were unsafe to release in the signalling path
>> >> > were tied to the vm_bo, so that is why I went for vm_bo cleanup.
>> >> > Another advantage is that it lets us use the same deferred logic for
>> >> > the vm_bo_put() call that drops the refcount from vm_bo_obtain().
>> >> >
>> >> > Of course if gpuvas might have resources that need deferred cleanup,
>> >> > that might change the situation somewhat.
>> >> 
>> >> I think we want to track PT(E) allocations, or rather reference counts of page
>> >> table structures carried by the drm_gpuva, but we don't need to release them on
>> >> drm_gpuva_unlink(), which is where we drop the reference count of the vm_bo.
>> >> 
>> >> Deferring drm_gpuva_unlink() isn't really an option I think, the GEMs list of
>> >> VM_BOs and the VM_BOs list of VAs is usually used in ttm_device_funcs::move to
>> >> map or unmap all VAs associated with a GEM object.
>> >> 
>> >> I think PT(E) reference counts etc. should be rather released when the drm_gpuva
>> >> is freed, i.e. page table allocations can be bound to the lifetime of a
>> >> drm_gpuva. Given that, I think that eventually we'll need a cleanup list for
>> >> those as well, since once they're removed from the VM tree (in the fence
>> >> signalling critical path), we loose access otherwise.
>> >
>> > Hmm. Another more conceptual issue with deferring gpuva is that
>> > "immediate mode" is defined as having the GPUVM match the GPU's actual
>> > address space at all times, which deferred gpuva cleanup would go
>> > against.
>> 
>> Depends on what "deferred gpuva cleanup" means.
>> 
>> What needs to happen in the run_job() is drm_gpuva_unlink() and
>> drm_gpuva_unmap(). Freeing the drm_gpuva, inluding releasing the assoiciated
>> driver specific resources, can be deferred.
>
> Yeah I guess we could have unlink remove the gpuva, but then allow the
> end-user to attach the gpuva to a list of gpuvas to kfree deferred. That
> way, the drm_gpuva_unlink() is not deferred but any resources it has can
> be.
>
> Of course, this approach also makes deferred gpuva cleanup somewhat
> orthogonal to this patch.

Yes, it is.

> One annoying part is that we don't have an gpuvm ops operation for
> freeing gpuva, and if we add one for this, it would *only* be used in
> this case as most drivers explicitly kfree gpuvas, which could be
> confusing for end-users.

I think the reason why I left GPUVA alloc and free to drivers was that I was
expecting them to use a dedicated kmemcache for that.

However, we can still provide drm_gpuva_alloc(), drm_gpuva_free() and va_free(),
va_alloc() callbacks for drivers to implement.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-07 11:39             ` Alice Ryhl
  2025-09-07 11:44               ` Danilo Krummrich
@ 2025-09-08  7:11               ` Boris Brezillon
  2025-09-08  8:26                 ` Alice Ryhl
  1 sibling, 1 reply; 27+ messages in thread
From: Boris Brezillon @ 2025-09-08  7:11 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Danilo Krummrich, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

Hi Alice,

On Sun, 7 Sep 2025 11:39:41 +0000
Alice Ryhl <aliceryhl@google.com> wrote:

> On Sun, Sep 07, 2025 at 01:28:05PM +0200, Danilo Krummrich wrote:
> > On Sun Sep 7, 2025 at 1:15 PM CEST, Alice Ryhl wrote:  
> > > On Sat, Sep 06, 2025 at 12:47:36AM +0200, Danilo Krummrich wrote:  
> > >> On Fri Sep 5, 2025 at 8:18 PM CEST, Alice Ryhl wrote:  
> > >> > On Fri, Sep 5, 2025 at 3:25 PM Boris Brezillon
> > >> > <boris.brezillon@collabora.com> wrote:  
> > >> >> On Fri, 05 Sep 2025 12:11:28 +0000
> > >> >> Alice Ryhl <aliceryhl@google.com> wrote:  
> > >> >> > +static bool
> > >> >> > +drm_gpuvm_bo_is_dead(struct drm_gpuvm_bo *vm_bo)
> > >> >> > +{
> > >> >> > +     return !kref_read(&vm_bo->kref);  
> > >> >>
> > >> >> I'm not too sure I like the idea of [ab]using vm_bo::kref to defer the
> > >> >> vm_bo release. I get why it's done like that, but I'm wondering why we
> > >> >> don't defer the release of drm_gpuva objects instead (which is really
> > >> >> what's being released in va_unlink()). I can imagine drivers wanting to
> > >> >> attach resources to the gpuva that can't be released in the
> > >> >> dma-signalling path in the future, and if we're doing that at the gpuva
> > >> >> level, we also get rid of this kref dance, since the va will hold a
> > >> >> vm_bo ref until it's destroyed.
> > >> >>
> > >> >> Any particular reason you went for vm_bo destruction deferral instead
> > >> >> of gpuva?  
> > >> >
> > >> > All of the things that were unsafe to release in the signalling path
> > >> > were tied to the vm_bo, so that is why I went for vm_bo cleanup.
> > >> > Another advantage is that it lets us use the same deferred logic for
> > >> > the vm_bo_put() call that drops the refcount from vm_bo_obtain().
> > >> >
> > >> > Of course if gpuvas might have resources that need deferred cleanup,
> > >> > that might change the situation somewhat.  
> > >> 
> > >> I think we want to track PT(E) allocations, or rather reference counts of page
> > >> table structures carried by the drm_gpuva, but we don't need to release them on
> > >> drm_gpuva_unlink(), which is where we drop the reference count of the vm_bo.
> > >> 
> > >> Deferring drm_gpuva_unlink() isn't really an option I think, the GEMs list of
> > >> VM_BOs and the VM_BOs list of VAs is usually used in ttm_device_funcs::move to
> > >> map or unmap all VAs associated with a GEM object.
> > >> 
> > >> I think PT(E) reference counts etc. should be rather released when the drm_gpuva
> > >> is freed, i.e. page table allocations can be bound to the lifetime of a
> > >> drm_gpuva. Given that, I think that eventually we'll need a cleanup list for
> > >> those as well, since once they're removed from the VM tree (in the fence
> > >> signalling critical path), we loose access otherwise.  
> > >
> > > Hmm. Another more conceptual issue with deferring gpuva is that
> > > "immediate mode" is defined as having the GPUVM match the GPU's actual
> > > address space at all times, which deferred gpuva cleanup would go
> > > against.  
> > 
> > Depends on what "deferred gpuva cleanup" means.
> > 
> > What needs to happen in the run_job() is drm_gpuva_unlink() and
> > drm_gpuva_unmap(). Freeing the drm_gpuva, inluding releasing the assoiciated
> > driver specific resources, can be deferred.  
> 
> Yeah I guess we could have unlink remove the gpuva, but then allow the
> end-user to attach the gpuva to a list of gpuvas to kfree deferred. That
> way, the drm_gpuva_unlink() is not deferred but any resources it has can
> be.

This ^.

> 
> Of course, this approach also makes deferred gpuva cleanup somewhat
> orthogonal to this patch.

Well, yes and no, because if you go for gpuva deferred cleanup, you
don't really need the fancy kref_put() you have in this patch, it's
just a regular vm_bo_put() that's called in the deferred gpuva path on
the vm_bo attached to the gpuva being released.

> 
> One annoying part is that we don't have an gpuvm ops operation for
> freeing gpuva, and if we add one for this, it would *only* be used in
> this case as most drivers explicitly kfree gpuvas, which could be
> confusing for end-users.

Also not sure ::vm_bo_free() was meant to be used like that. It was for
drivers that need to control the drm_gpuvm_bo allocation, not those
that rely on the default implementation (kmalloc). Given how things
are described in the the doc, it feels weird to have a ::vm_bo_free()
without ::vm_bo_alloc(). So, if we decide to go this way (which I'm
still not convinced we should, given ultimately we might want to defer
gpuvas cleanup), the ::vm_bo_free() doc should be extended to cover
this 'deferred vm_bo free' case.

Regards,

Boris

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-07 11:15         ` Alice Ryhl
  2025-09-07 11:28           ` Danilo Krummrich
@ 2025-09-08  7:22           ` Boris Brezillon
  1 sibling, 0 replies; 27+ messages in thread
From: Boris Brezillon @ 2025-09-08  7:22 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Danilo Krummrich, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Sun, 7 Sep 2025 11:15:20 +0000
Alice Ryhl <aliceryhl@google.com> wrote:

> On Sat, Sep 06, 2025 at 12:47:36AM +0200, Danilo Krummrich wrote:
> > On Fri Sep 5, 2025 at 8:18 PM CEST, Alice Ryhl wrote:  
> > > On Fri, Sep 5, 2025 at 3:25 PM Boris Brezillon
> > > <boris.brezillon@collabora.com> wrote:  
> > >> On Fri, 05 Sep 2025 12:11:28 +0000
> > >> Alice Ryhl <aliceryhl@google.com> wrote:  
> > >> > +static bool
> > >> > +drm_gpuvm_bo_is_dead(struct drm_gpuvm_bo *vm_bo)
> > >> > +{
> > >> > +     return !kref_read(&vm_bo->kref);  
> > >>
> > >> I'm not too sure I like the idea of [ab]using vm_bo::kref to defer the
> > >> vm_bo release. I get why it's done like that, but I'm wondering why we
> > >> don't defer the release of drm_gpuva objects instead (which is really
> > >> what's being released in va_unlink()). I can imagine drivers wanting to
> > >> attach resources to the gpuva that can't be released in the
> > >> dma-signalling path in the future, and if we're doing that at the gpuva
> > >> level, we also get rid of this kref dance, since the va will hold a
> > >> vm_bo ref until it's destroyed.
> > >>
> > >> Any particular reason you went for vm_bo destruction deferral instead
> > >> of gpuva?  
> > >
> > > All of the things that were unsafe to release in the signalling path
> > > were tied to the vm_bo, so that is why I went for vm_bo cleanup.
> > > Another advantage is that it lets us use the same deferred logic for
> > > the vm_bo_put() call that drops the refcount from vm_bo_obtain().
> > >
> > > Of course if gpuvas might have resources that need deferred cleanup,
> > > that might change the situation somewhat.  
> > 
> > I think we want to track PT(E) allocations, or rather reference counts of page
> > table structures carried by the drm_gpuva, but we don't need to release them on
> > drm_gpuva_unlink(), which is where we drop the reference count of the vm_bo.
> > 
> > Deferring drm_gpuva_unlink() isn't really an option I think, the GEMs list of
> > VM_BOs and the VM_BOs list of VAs is usually used in ttm_device_funcs::move to
> > map or unmap all VAs associated with a GEM object.
> > 
> > I think PT(E) reference counts etc. should be rather released when the drm_gpuva
> > is freed, i.e. page table allocations can be bound to the lifetime of a
> > drm_gpuva. Given that, I think that eventually we'll need a cleanup list for
> > those as well, since once they're removed from the VM tree (in the fence
> > signalling critical path), we loose access otherwise.  
> 
> Hmm. Another more conceptual issue with deferring gpuva is that
> "immediate mode" is defined as having the GPUVM match the GPU's actual
> address space at all times, which deferred gpuva cleanup would go
> against.
> 
> Deferring vm_bo cleanup doesn't have this issue because even though the
> vm_bo isn't kfreed immediately, all GPUVM apis still treat it as-if it
> isn't there anymore.
> 
> > >> > +static void
> > >> > +drm_gpuvm_bo_defer_locked(struct kref *kref)
> > >> > +{
> > >> > +     struct drm_gpuvm_bo *vm_bo = container_of(kref, struct drm_gpuvm_bo,
> > >> > +                                               kref);
> > >> > +     struct drm_gpuvm *gpuvm = vm_bo->vm;
> > >> > +
> > >> > +     if (!drm_gpuvm_resv_protected(gpuvm)) {
> > >> > +             drm_gpuvm_bo_list_del(vm_bo, extobj, true);
> > >> > +             drm_gpuvm_bo_list_del(vm_bo, evict, true);
> > >> > +     }
> > >> > +
> > >> > +     list_del(&vm_bo->list.entry.gem);
> > >> > +     mutex_unlock(&vm_bo->obj->gpuva.lock);  
> > >>
> > >> I got tricked by this implicit unlock, and the conditional unlocks it
> > >> creates in drm_gpuva_unlink_defer(). Honestly, I'd rather see this
> > >> unlocked moved to drm_gpuva_unlink_defer() and a conditional unlock
> > >> added to drm_gpuvm_bo_put_deferred(), because it's easier to reason
> > >> about when the lock/unlock calls are in the same function
> > >> (kref_put_mutex() being the equivalent of a conditional lock).  
> > >
> > > Ok. I followed the docs of kref_put_mutex() that say to unlock it from
> > > the function.  
> > 
> > Yes, please keep it the way it is, I don't want to deviate from what is
> > documented and everyone else does. Besides that, I also think it's a little
> > less error prone.  
> 
> I gave it a try:
> 
> bool
> drm_gpuvm_bo_put_deferred(struct drm_gpuvm_bo *vm_bo)
> {
> 	drm_WARN_ON(vm_bo->vm->drm, !drm_gpuvm_immediate_mode(vm_bo->vm));
> 
> 	if (!vm_bo)
> 		return false;
> 
> 	if (kref_put_mutex(&vm_bo->kref, drm_gpuvm_bo_defer_locked,
> 			   &vm_bo->obj->gpuva.lock)) {
> 		/*
> 		 * It's important that the GEM stays alive for the duration in which
> 		 * drm_gpuvm_bo_defer_locked() holds the mutex, but the instant we add
> 		 * the vm_bo to bo_defer, another thread might call
> 		 * drm_gpuvm_bo_deferred_cleanup() and put the GEM. For this reason, we
> 		 * add the vm_bo to bo_defer *after* releasing the GEM's mutex.
> 		 */
> 		mutex_unlock(&vm_bo->obj->gpuva.lock);
> 		drm_gpuvm_bo_list_add(vm_bo, bo_defer, true);
> 		return true;
> 	}
> 
> 	return false;
> }
> 
> void
> drm_gpuva_unlink_defer(struct drm_gpuva *va)
> {
> 	struct drm_gem_object *obj = va->gem.obj;
> 	struct drm_gpuvm_bo *vm_bo = va->vm_bo;
> 	bool should_defer_bo;
> 
> 	if (unlikely(!obj))
> 		return;
> 
> 	drm_WARN_ON(vm_bo->vm->drm, !drm_gpuvm_immediate_mode(vm_bo->vm));
> 
> 	mutex_lock(&obj->gpuva.lock);
> 	list_del_init(&va->gem.entry);
> 
> 	/*
> 	 * This is drm_gpuvm_bo_put_deferred() slightly modified since we
> 	 * already hold the mutex. It's important that we add the vm_bo to
> 	 * bo_defer after releasing the mutex for the same reason as in
> 	 * drm_gpuvm_bo_put_deferred().
> 	 */
> 	should_defer_bo = kref_put(&vm_bo->kref, drm_gpuvm_bo_defer_locked);
> 	mutex_unlock(&obj->gpuva.lock);
> 	if (should_defer_bo)
> 		drm_gpuvm_bo_list_add(vm_bo, bo_defer, true);
> 
> 	va->vm_bo = NULL;
> }
> 
> I do think it looks relatively nice like this, particularly
> drm_gpuva_unlink_defer().

I agree.

> But that's also the one not using
> kref_put_mutex().

Yeah, but that's the thing. I guess if drm_gpuvm_bo_defer_locked() was
only called from kref_put_mutex() this would be okay (though I still
have a hard time with those functions taking locks that have to be
released by the caller, but at least that's a well-known/documented
pattern). But it's also currently called from drm_gpuva_unlink_defer()
where the lock is taken but not released. I guess if the function name
was reflecting that (drm_gpuvm_bo_defer_locked_and_unlock()?), and with
a comment explaining why the lock is conditionally released in the
caller that would be acceptable, but I still find this locking scheme
quite confusing...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-08  7:11               ` Boris Brezillon
@ 2025-09-08  8:26                 ` Alice Ryhl
  2025-09-08  8:47                   ` Danilo Krummrich
  2025-09-08  9:37                   ` Boris Brezillon
  0 siblings, 2 replies; 27+ messages in thread
From: Alice Ryhl @ 2025-09-08  8:26 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Danilo Krummrich, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Mon, Sep 08, 2025 at 09:11:40AM +0200, Boris Brezillon wrote:
> Hi Alice,
> 
> On Sun, 7 Sep 2025 11:39:41 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
> 
> > Yeah I guess we could have unlink remove the gpuva, but then allow the
> > end-user to attach the gpuva to a list of gpuvas to kfree deferred. That
> > way, the drm_gpuva_unlink() is not deferred but any resources it has can
> > be.
> 
> This ^.
> 
> > 
> > Of course, this approach also makes deferred gpuva cleanup somewhat
> > orthogonal to this patch.
> 
> Well, yes and no, because if you go for gpuva deferred cleanup, you
> don't really need the fancy kref_put() you have in this patch, it's
> just a regular vm_bo_put() that's called in the deferred gpuva path on
> the vm_bo attached to the gpuva being released.

Ok, so what you suggest is that on gpuva_unlink() we remove the gpuva
from the vm_bo's list, but then instead of putting the vm_bo's refcount,
we add the gpuva to a list, and in the deferred cleanup codepath we
iterate gpuvas and drop vm_bo refcounts *at that point*. Is that
understood correctly?

That means we don't immediately remove the vm_bo from the gem.gpuva
list, but the gpuva list in the vm_bo will be empty. I guess you already
have to handle such vm_bos anyway since you can already have an empty
vm_bo in between vm_bo_obtain() and the first call to gpuva_link().

One disadvantage is that we might end up preparing or unevicting a GEM
object that doesn't have any VAs left, which the current approach
avoids.

> > One annoying part is that we don't have an gpuvm ops operation for
> > freeing gpuva, and if we add one for this, it would *only* be used in
> > this case as most drivers explicitly kfree gpuvas, which could be
> > confusing for end-users.
> 
> Also not sure ::vm_bo_free() was meant to be used like that. It was for
> drivers that need to control the drm_gpuvm_bo allocation, not those
> that rely on the default implementation (kmalloc). Given how things
> are described in the the doc, it feels weird to have a ::vm_bo_free()
> without ::vm_bo_alloc(). So, if we decide to go this way (which I'm
> still not convinced we should, given ultimately we might want to defer
> gpuvas cleanup), the ::vm_bo_free() doc should be extended to cover
> this 'deferred vm_bo free' case.

I can implement vm_bo_alloc() too, but I think it seems like a pretty
natural way to use vm_bo_free().

Alice

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-08  8:26                 ` Alice Ryhl
@ 2025-09-08  8:47                   ` Danilo Krummrich
  2025-09-08 10:20                     ` Boris Brezillon
  2025-09-08  9:37                   ` Boris Brezillon
  1 sibling, 1 reply; 27+ messages in thread
From: Danilo Krummrich @ 2025-09-08  8:47 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Boris Brezillon, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Mon Sep 8, 2025 at 10:26 AM CEST, Alice Ryhl wrote:
> On Mon, Sep 08, 2025 at 09:11:40AM +0200, Boris Brezillon wrote:
>> Hi Alice,
>> 
>> On Sun, 7 Sep 2025 11:39:41 +0000
>> Alice Ryhl <aliceryhl@google.com> wrote:
>> 
>> > Yeah I guess we could have unlink remove the gpuva, but then allow the
>> > end-user to attach the gpuva to a list of gpuvas to kfree deferred. That
>> > way, the drm_gpuva_unlink() is not deferred but any resources it has can
>> > be.
>> 
>> This ^.
>> 
>> > 
>> > Of course, this approach also makes deferred gpuva cleanup somewhat
>> > orthogonal to this patch.
>> 
>> Well, yes and no, because if you go for gpuva deferred cleanup, you
>> don't really need the fancy kref_put() you have in this patch, it's
>> just a regular vm_bo_put() that's called in the deferred gpuva path on
>> the vm_bo attached to the gpuva being released.
>
> Ok, so what you suggest is that on gpuva_unlink() we remove the gpuva
> from the vm_bo's list, but then instead of putting the vm_bo's refcount,
> we add the gpuva to a list, and in the deferred cleanup codepath we
> iterate gpuvas and drop vm_bo refcounts *at that point*. Is that
> understood correctly?

It has to be a special unlink function though, since otherwise

	drm_gpuva_link();
	drm_gpuva_unlink();
	drm_gpuva_link();
	drm_gpuva_unlink();

leaks the VM_BO. Sounds a bit messy, but my concern is really about the below:

> That means we don't immediately remove the vm_bo from the gem.gpuva
> list, but the gpuva list in the vm_bo will be empty. I guess you already
> have to handle such vm_bos anyway since you can already have an empty
> vm_bo in between vm_bo_obtain() and the first call to gpuva_link().
>
> One disadvantage is that we might end up preparing or unevicting a GEM
> object that doesn't have any VAs left, which the current approach
> avoids.

Yeah, we really want to avoid that.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-08  8:26                 ` Alice Ryhl
  2025-09-08  8:47                   ` Danilo Krummrich
@ 2025-09-08  9:37                   ` Boris Brezillon
  1 sibling, 0 replies; 27+ messages in thread
From: Boris Brezillon @ 2025-09-08  9:37 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Danilo Krummrich, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Mon, 8 Sep 2025 08:26:13 +0000
Alice Ryhl <aliceryhl@google.com> wrote:

> On Mon, Sep 08, 2025 at 09:11:40AM +0200, Boris Brezillon wrote:
> > Hi Alice,
> > 
> > On Sun, 7 Sep 2025 11:39:41 +0000
> > Alice Ryhl <aliceryhl@google.com> wrote:
> >   
> > > Yeah I guess we could have unlink remove the gpuva, but then allow the
> > > end-user to attach the gpuva to a list of gpuvas to kfree deferred. That
> > > way, the drm_gpuva_unlink() is not deferred but any resources it has can
> > > be.  
> > 
> > This ^.
> >   
> > > 
> > > Of course, this approach also makes deferred gpuva cleanup somewhat
> > > orthogonal to this patch.  
> > 
> > Well, yes and no, because if you go for gpuva deferred cleanup, you
> > don't really need the fancy kref_put() you have in this patch, it's
> > just a regular vm_bo_put() that's called in the deferred gpuva path on
> > the vm_bo attached to the gpuva being released.  
> 
> Ok, so what you suggest is that on gpuva_unlink() we remove the gpuva
> from the vm_bo's list, but then instead of putting the vm_bo's refcount,
> we add the gpuva to a list, and in the deferred cleanup codepath we
> iterate gpuvas and drop vm_bo refcounts *at that point*. Is that
> understood correctly?

Yes, that's what I'm suggesting.

> 
> That means we don't immediately remove the vm_bo from the gem.gpuva
> list, but the gpuva list in the vm_bo will be empty. I guess you already
> have to handle such vm_bos anyway since you can already have an empty
> vm_bo in between vm_bo_obtain() and the first call to gpuva_link().
> 
> One disadvantage is that we might end up preparing or unevicting a GEM
> object that doesn't have any VAs left, which the current approach
> avoids.

Fair enough, though, as you said, the unnecessary ::prepare() already
exists in the "queue-map-operation" path, because the object is added
to the extobj list as soon as vm_bo_obtain() is called, not at _map()
time. This being said, I don't know if it really matters in practice,
because:

1. if buffer eviction happens too often, the system will already suffer
   from the constant swapout/swapin already, it's not a single buffer
   that's going to make a difference
2. hopefully the buffer will be gone before the next job is submitted
   most of the time
3. we can flush the gpuva destruction list before preparing/unevicting
   GEM objects, so we don't end up processing vm_bos that no longer
   exist. Actually, I think this step is good to have regardless
   of what we decide here, because the less elements we have to iterate
   the better, and in the submit path, we've already acquired all
   GEM locks to do that cleanup, so we're probably better off doing it
   right away.

> 
> > > One annoying part is that we don't have an gpuvm ops operation for
> > > freeing gpuva, and if we add one for this, it would *only* be used in
> > > this case as most drivers explicitly kfree gpuvas, which could be
> > > confusing for end-users.  
> > 
> > Also not sure ::vm_bo_free() was meant to be used like that. It was for
> > drivers that need to control the drm_gpuvm_bo allocation, not those
> > that rely on the default implementation (kmalloc). Given how things
> > are described in the the doc, it feels weird to have a ::vm_bo_free()
> > without ::vm_bo_alloc(). So, if we decide to go this way (which I'm
> > still not convinced we should, given ultimately we might want to defer
> > gpuvas cleanup), the ::vm_bo_free() doc should be extended to cover
> > this 'deferred vm_bo free' case.  
> 
> I can implement vm_bo_alloc() too, but I think it seems like a pretty
> natural way to use vm_bo_free().

Not necessarily saying we should have a vm_bo_alloc(), just saying it
feels weird as it is now because of the asymmetry and how things are
described in the doc.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-08  8:47                   ` Danilo Krummrich
@ 2025-09-08 10:20                     ` Boris Brezillon
  2025-09-08 11:11                       ` Danilo Krummrich
  0 siblings, 1 reply; 27+ messages in thread
From: Boris Brezillon @ 2025-09-08 10:20 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Alice Ryhl, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Mon, 08 Sep 2025 10:47:25 +0200
"Danilo Krummrich" <dakr@kernel.org> wrote:

> On Mon Sep 8, 2025 at 10:26 AM CEST, Alice Ryhl wrote:
> > On Mon, Sep 08, 2025 at 09:11:40AM +0200, Boris Brezillon wrote:  
> >> Hi Alice,
> >> 
> >> On Sun, 7 Sep 2025 11:39:41 +0000
> >> Alice Ryhl <aliceryhl@google.com> wrote:
> >>   
> >> > Yeah I guess we could have unlink remove the gpuva, but then allow the
> >> > end-user to attach the gpuva to a list of gpuvas to kfree deferred. That
> >> > way, the drm_gpuva_unlink() is not deferred but any resources it has can
> >> > be.  
> >> 
> >> This ^.
> >>   
> >> > 
> >> > Of course, this approach also makes deferred gpuva cleanup somewhat
> >> > orthogonal to this patch.  
> >> 
> >> Well, yes and no, because if you go for gpuva deferred cleanup, you
> >> don't really need the fancy kref_put() you have in this patch, it's
> >> just a regular vm_bo_put() that's called in the deferred gpuva path on
> >> the vm_bo attached to the gpuva being released.  
> >
> > Ok, so what you suggest is that on gpuva_unlink() we remove the gpuva
> > from the vm_bo's list, but then instead of putting the vm_bo's refcount,
> > we add the gpuva to a list, and in the deferred cleanup codepath we
> > iterate gpuvas and drop vm_bo refcounts *at that point*. Is that
> > understood correctly?  
> 
> It has to be a special unlink function though, since otherwise
> 
> 	drm_gpuva_link();
> 	drm_gpuva_unlink();
> 	drm_gpuva_link();
> 	drm_gpuva_unlink();
> 
> leaks the VM_BO.

I'm not following. Yes there's going to be a
drm_gpuva_unlink_defer_put() that skips the

        va->vm_bo = NULL;
        drm_gpuvm_bo_put(vm_bo);

and adds the gpuva to a list for deferred destruction. But I'm not
seeing where the leak is. Once the gpuva has been put in this list,
there should be no existing component referring to this object, and it's
going to be destroyed or recycled, but not re-used as-is.


> Sounds a bit messy, but my concern is really about the below:
> 
> > That means we don't immediately remove the vm_bo from the gem.gpuva
> > list, but the gpuva list in the vm_bo will be empty. I guess you already
> > have to handle such vm_bos anyway since you can already have an empty
> > vm_bo in between vm_bo_obtain() and the first call to gpuva_link().
> >
> > One disadvantage is that we might end up preparing or unevicting a GEM
> > object that doesn't have any VAs left, which the current approach
> > avoids.  
> 
> Yeah, we really want to avoid that.

I think I agree that we want to avoid it, but I'm not too sure about
the solution that involves playing with vm_bo::kref. I'm particularly
worried by the fact drivers can iterate the evict/extobj lists
directly, and can thus see objects scheduled for destruction. I know
there's a gpuvm_bo_is_dead() helper, and drivers should be aware of the
risks, but I don't feel comfortable about this.

And since we've mentioned the possibility of having to support
gpuva destruction deferral too, I'm wondering it wouldn't be cleaner
to just go for this approach from the start (gpuva owns a ref to a
vm_bo, which gets released when the gpuva object is released).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-08 10:20                     ` Boris Brezillon
@ 2025-09-08 11:11                       ` Danilo Krummrich
  2025-09-08 12:11                         ` Boris Brezillon
  0 siblings, 1 reply; 27+ messages in thread
From: Danilo Krummrich @ 2025-09-08 11:11 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Alice Ryhl, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Mon Sep 8, 2025 at 12:20 PM CEST, Boris Brezillon wrote:
> I'm not following. Yes there's going to be a
> drm_gpuva_unlink_defer_put() that skips the
>
>         va->vm_bo = NULL;
>         drm_gpuvm_bo_put(vm_bo);
>
> and adds the gpuva to a list for deferred destruction. But I'm not
> seeing where the leak is. Once the gpuva has been put in this list,
> there should be no existing component referring to this object, and it's
> going to be destroyed or recycled, but not re-used as-is.

I'm saying exactly what you say: "has to be a special unlink function" ->
drm_gpuva_unlink_defer_put(). :)

>> Yeah, we really want to avoid that.
>
> I think I agree that we want to avoid it, but I'm not too sure about
> the solution that involves playing with vm_bo::kref. I'm particularly
> worried by the fact drivers can iterate the evict/extobj lists
> directly, and can thus see objects scheduled for destruction. I know
> there's a gpuvm_bo_is_dead() helper, and drivers should be aware of the
> risks, but I don't feel comfortable about this.

No, drivers can't iterate the evict/extobj lists directly; or at least this is
not intended by GPUVM's API and if drivers do so, this is considered peeking
into GPUVM internals, so drivers are on their own anyways.

Iterators, such as for_each_vm_bo_in_list() are not exposed to drivers.

> And since we've mentioned the possibility of having to support
> gpuva destruction deferral too, I'm wondering it wouldn't be cleaner
> to just go for this approach from the start (gpuva owns a ref to a
> vm_bo, which gets released when the gpuva object is released).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-08 11:11                       ` Danilo Krummrich
@ 2025-09-08 12:11                         ` Boris Brezillon
  2025-09-08 12:20                           ` Danilo Krummrich
  0 siblings, 1 reply; 27+ messages in thread
From: Boris Brezillon @ 2025-09-08 12:11 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Alice Ryhl, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Mon, 08 Sep 2025 13:11:32 +0200
"Danilo Krummrich" <dakr@kernel.org> wrote:

> On Mon Sep 8, 2025 at 12:20 PM CEST, Boris Brezillon wrote:
> > I'm not following. Yes there's going to be a
> > drm_gpuva_unlink_defer_put() that skips the
> >
> >         va->vm_bo = NULL;
> >         drm_gpuvm_bo_put(vm_bo);
> >
> > and adds the gpuva to a list for deferred destruction. But I'm not
> > seeing where the leak is. Once the gpuva has been put in this list,
> > there should be no existing component referring to this object, and it's
> > going to be destroyed or recycled, but not re-used as-is.  
> 
> I'm saying exactly what you say: "has to be a special unlink function" ->
> drm_gpuva_unlink_defer_put(). :)

I don't see how calling drm_gpuva_unlink() instead of
drm_gpuva_unlink_defer_put() would leak the vm_bo though. I mean, it
would certainly be wrong because you'd be calling cleanup methods that
are expected to be called with the resv lock held from the
dma-signalling path, but that's a different issue, no? Anyway, if we're
going to allow gpuva cleanup/destruction deferral, we'll either need to
do that through a different function, or through some specialization of
drm_gpuva_unlink() that does things differently based on the
immediate/non-immediate mode (or some other flag).

> 
> >> Yeah, we really want to avoid that.  
> >
> > I think I agree that we want to avoid it, but I'm not too sure about
> > the solution that involves playing with vm_bo::kref. I'm particularly
> > worried by the fact drivers can iterate the evict/extobj lists
> > directly, and can thus see objects scheduled for destruction. I know
> > there's a gpuvm_bo_is_dead() helper, and drivers should be aware of the
> > risks, but I don't feel comfortable about this.  
> 
> No, drivers can't iterate the evict/extobj lists directly; or at least this is
> not intended by GPUVM's API and if drivers do so, this is considered peeking
> into GPUVM internals, so drivers are on their own anyways.
> 
> Iterators, such as for_each_vm_bo_in_list() are not exposed to drivers.

Okay, that's a good thing. I thought Xe was doing some funky stuff with
the list...

> 
> > And since we've mentioned the possibility of having to support
> > gpuva destruction deferral too, I'm wondering it wouldn't be cleaner
> > to just go for this approach from the start (gpuva owns a ref to a
> > vm_bo, which gets released when the gpuva object is released).  


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-08 12:11                         ` Boris Brezillon
@ 2025-09-08 12:20                           ` Danilo Krummrich
  2025-09-09 10:39                             ` Thomas Hellström
  0 siblings, 1 reply; 27+ messages in thread
From: Danilo Krummrich @ 2025-09-08 12:20 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Alice Ryhl, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On 9/8/25 2:11 PM, Boris Brezillon wrote:
> On Mon, 08 Sep 2025 13:11:32 +0200
> "Danilo Krummrich" <dakr@kernel.org> wrote:
>> I'm saying exactly what you say: "has to be a special unlink function" ->
>> drm_gpuva_unlink_defer_put(). :)
> 
> I don't see how calling drm_gpuva_unlink() instead of
> drm_gpuva_unlink_defer_put() would leak the vm_bo though.

Initially (i.e. a few mails back), it sounded to me as if you'd propose to drop
the drm_gpuva's vm_bo reference only when it is freed.

>> No, drivers can't iterate the evict/extobj lists directly; or at least this is
>> not intended by GPUVM's API and if drivers do so, this is considered peeking
>> into GPUVM internals, so drivers are on their own anyways.
>>
>> Iterators, such as for_each_vm_bo_in_list() are not exposed to drivers.
> 
> Okay, that's a good thing. I thought Xe was doing some funky stuff with
> the list...

Maybe, I don't know. If they do so, the should send patches adding the
corresponding iterators and provide a rationale why drivers need to access those
lists directly and why we can't provide an API that handles the overall
use-case, such as drm_gpuvm_prepare_objects(), etc.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-08 12:20                           ` Danilo Krummrich
@ 2025-09-09 10:39                             ` Thomas Hellström
  2025-09-09 10:47                               ` Danilo Krummrich
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Hellström @ 2025-09-09 10:39 UTC (permalink / raw)
  To: Danilo Krummrich, Boris Brezillon
  Cc: Alice Ryhl, Matthew Brost, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Steven Price,
	Daniel Almeida, Liviu Dudau, dri-devel, linux-kernel,
	rust-for-linux

Hi,

On 9/8/25 14:20, Danilo Krummrich wrote:
> On 9/8/25 2:11 PM, Boris Brezillon wrote:
>> On Mon, 08 Sep 2025 13:11:32 +0200
>> "Danilo Krummrich" <dakr@kernel.org> wrote:
>>> I'm saying exactly what you say: "has to be a special unlink function" ->
>>> drm_gpuva_unlink_defer_put(). :)
>> I don't see how calling drm_gpuva_unlink() instead of
>> drm_gpuva_unlink_defer_put() would leak the vm_bo though.
> Initially (i.e. a few mails back), it sounded to me as if you'd propose to drop
> the drm_gpuva's vm_bo reference only when it is freed.
>
>>> No, drivers can't iterate the evict/extobj lists directly; or at least this is
>>> not intended by GPUVM's API and if drivers do so, this is considered peeking
>>> into GPUVM internals, so drivers are on their own anyways.
>>>
>>> Iterators, such as for_each_vm_bo_in_list() are not exposed to drivers.
>> Okay, that's a good thing. I thought Xe was doing some funky stuff with
>> the list...
> Maybe, I don't know. If they do so, the should send patches adding the
> corresponding iterators and provide a rationale why drivers need to access those
> lists directly and why we can't provide an API that handles the overall
> use-case, such as drm_gpuvm_prepare_objects(), etc.

We're using the drm_gpuvm_*for_each* macros in drm_gpuvm.h, assuming 
from name and docs they are driver api.

Also the drm_gem_for_each_gpuvm_bo(), although this usage could easily 
be converted to a helper.

So I don't think we're abusing internals ATM. If so we should ofc fix that.

IMO if some iteration macros or members that are exposed in the 
driver-facing headers need to be private (where it's not totally 
obvious) they should be marked as such or moved to private headers.

Thanks,

Thomas




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-09 10:39                             ` Thomas Hellström
@ 2025-09-09 10:47                               ` Danilo Krummrich
  2025-09-09 11:10                                 ` Thomas Hellström
  0 siblings, 1 reply; 27+ messages in thread
From: Danilo Krummrich @ 2025-09-09 10:47 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Boris Brezillon, Alice Ryhl, Matthew Brost, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Steven Price, Daniel Almeida, Liviu Dudau, dri-devel,
	linux-kernel, rust-for-linux

On 9/9/25 12:39 PM, Thomas Hellström wrote:
> On 9/8/25 14:20, Danilo Krummrich wrote:
>> On 9/8/25 2:11 PM, Boris Brezillon wrote:
>>> On Mon, 08 Sep 2025 13:11:32 +0200
>>> "Danilo Krummrich" <dakr@kernel.org> wrote:
>>>> I'm saying exactly what you say: "has to be a special unlink function" ->
>>>> drm_gpuva_unlink_defer_put(). :)
>>> I don't see how calling drm_gpuva_unlink() instead of
>>> drm_gpuva_unlink_defer_put() would leak the vm_bo though.
>> Initially (i.e. a few mails back), it sounded to me as if you'd propose to drop
>> the drm_gpuva's vm_bo reference only when it is freed.
>>
>>>> No, drivers can't iterate the evict/extobj lists directly; or at least this is
>>>> not intended by GPUVM's API and if drivers do so, this is considered peeking
>>>> into GPUVM internals, so drivers are on their own anyways.
>>>>
>>>> Iterators, such as for_each_vm_bo_in_list() are not exposed to drivers.
>>> Okay, that's a good thing. I thought Xe was doing some funky stuff with
>>> the list...
>> Maybe, I don't know. If they do so, the should send patches adding the
>> corresponding iterators and provide a rationale why drivers need to access those
>> lists directly and why we can't provide an API that handles the overall
>> use-case, such as drm_gpuvm_prepare_objects(), etc.
> 
> We're using the drm_gpuvm_*for_each* macros in drm_gpuvm.h, assuming from name
> and docs they are driver api.
> 
> Also the drm_gem_for_each_gpuvm_bo(), although this usage could easily be
> converted to a helper.

We were talking about the extobj/evict lists, the ones you mention are fine of
course. :)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-09 10:47                               ` Danilo Krummrich
@ 2025-09-09 11:10                                 ` Thomas Hellström
  2025-09-09 11:24                                   ` Alice Ryhl
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Hellström @ 2025-09-09 11:10 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Boris Brezillon, Alice Ryhl, Matthew Brost, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Steven Price, Daniel Almeida, Liviu Dudau, dri-devel,
	linux-kernel, rust-for-linux

On Tue, 2025-09-09 at 12:47 +0200, Danilo Krummrich wrote:
> On 9/9/25 12:39 PM, Thomas Hellström wrote:
> > On 9/8/25 14:20, Danilo Krummrich wrote:
> > > On 9/8/25 2:11 PM, Boris Brezillon wrote:
> > > > On Mon, 08 Sep 2025 13:11:32 +0200
> > > > "Danilo Krummrich" <dakr@kernel.org> wrote:
> > > > > I'm saying exactly what you say: "has to be a special unlink
> > > > > function" ->
> > > > > drm_gpuva_unlink_defer_put(). :)
> > > > I don't see how calling drm_gpuva_unlink() instead of
> > > > drm_gpuva_unlink_defer_put() would leak the vm_bo though.
> > > Initially (i.e. a few mails back), it sounded to me as if you'd
> > > propose to drop
> > > the drm_gpuva's vm_bo reference only when it is freed.
> > > 
> > > > > No, drivers can't iterate the evict/extobj lists directly; or
> > > > > at least this is
> > > > > not intended by GPUVM's API and if drivers do so, this is
> > > > > considered peeking
> > > > > into GPUVM internals, so drivers are on their own anyways.
> > > > > 
> > > > > Iterators, such as for_each_vm_bo_in_list() are not exposed
> > > > > to drivers.
> > > > Okay, that's a good thing. I thought Xe was doing some funky
> > > > stuff with
> > > > the list...
> > > Maybe, I don't know. If they do so, the should send patches
> > > adding the
> > > corresponding iterators and provide a rationale why drivers need
> > > to access those
> > > lists directly and why we can't provide an API that handles the
> > > overall
> > > use-case, such as drm_gpuvm_prepare_objects(), etc.
> > 
> > We're using the drm_gpuvm_*for_each* macros in drm_gpuvm.h,
> > assuming from name
> > and docs they are driver api.
> > 
> > Also the drm_gem_for_each_gpuvm_bo(), although this usage could
> > easily be
> > converted to a helper.
> 
> We were talking about the extobj/evict lists, the ones you mention
> are fine of
> course. :)
> 

Hmm. Now on closer inspection it looks like we're checking for evict
list empty, It looks like rebinding after validation may in theory
evict some bos to system memory and then we'd rerun the validation step
if the evict list was not empty.

We could of course add a helper for that or if there are better
suggestions to handle that situation, that'd be fine as well.

Thanks,
Thomas


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-09 11:10                                 ` Thomas Hellström
@ 2025-09-09 11:24                                   ` Alice Ryhl
  2025-09-09 11:28                                     ` Danilo Krummrich
  0 siblings, 1 reply; 27+ messages in thread
From: Alice Ryhl @ 2025-09-09 11:24 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Danilo Krummrich, Boris Brezillon, Matthew Brost,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On Tue, Sep 9, 2025 at 1:11 PM Thomas Hellström
<thomas.hellstrom@linux.intel.com> wrote:
>
> On Tue, 2025-09-09 at 12:47 +0200, Danilo Krummrich wrote:
> > On 9/9/25 12:39 PM, Thomas Hellström wrote:
> > > On 9/8/25 14:20, Danilo Krummrich wrote:
> > > > On 9/8/25 2:11 PM, Boris Brezillon wrote:
> > > > > On Mon, 08 Sep 2025 13:11:32 +0200
> > > > > "Danilo Krummrich" <dakr@kernel.org> wrote:
> > > > > > No, drivers can't iterate the evict/extobj lists directly; or
> > > > > > at least this is
> > > > > > not intended by GPUVM's API and if drivers do so, this is
> > > > > > considered peeking
> > > > > > into GPUVM internals, so drivers are on their own anyways.
> > > > > >
> > > > > > Iterators, such as for_each_vm_bo_in_list() are not exposed
> > > > > > to drivers.
> > > > > Okay, that's a good thing. I thought Xe was doing some funky
> > > > > stuff with
> > > > > the list...
> > > > Maybe, I don't know. If they do so, the should send patches
> > > > adding the
> > > > corresponding iterators and provide a rationale why drivers need
> > > > to access those
> > > > lists directly and why we can't provide an API that handles the
> > > > overall
> > > > use-case, such as drm_gpuvm_prepare_objects(), etc.
> > >
> > > We're using the drm_gpuvm_*for_each* macros in drm_gpuvm.h,
> > > assuming from name
> > > and docs they are driver api.
> > >
> > > Also the drm_gem_for_each_gpuvm_bo(), although this usage could
> > > easily be
> > > converted to a helper.
> >
> > We were talking about the extobj/evict lists, the ones you mention
> > are fine of
> > course. :)
> >
>
> Hmm. Now on closer inspection it looks like we're checking for evict
> list empty, It looks like rebinding after validation may in theory
> evict some bos to system memory and then we'd rerun the validation step
> if the evict list was not empty.
>
> We could of course add a helper for that or if there are better
> suggestions to handle that situation, that'd be fine as well.

I don't think evict list empty means that there are no evicted GEMs.
It's possible for an extobj to be missing from the evict list in some
scenarios. That's why drm_gpuvm_prepare_objects_locked() checks
evicted on the extobj list to ensure that the evicted list is
up-to-date when you call into drm_gpuvm_validate_locked().

Alice

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-09 11:24                                   ` Alice Ryhl
@ 2025-09-09 11:28                                     ` Danilo Krummrich
  2025-09-09 11:46                                       ` Thomas Hellström
  0 siblings, 1 reply; 27+ messages in thread
From: Danilo Krummrich @ 2025-09-09 11:28 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Thomas Hellström, Boris Brezillon, Matthew Brost,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Steven Price, Daniel Almeida, Liviu Dudau,
	dri-devel, linux-kernel, rust-for-linux

On 9/9/25 1:24 PM, Alice Ryhl wrote:
> On Tue, Sep 9, 2025 at 1:11 PM Thomas Hellström
> <thomas.hellstrom@linux.intel.com> wrote:
>>
>> On Tue, 2025-09-09 at 12:47 +0200, Danilo Krummrich wrote:
>>> On 9/9/25 12:39 PM, Thomas Hellström wrote:
>>>> On 9/8/25 14:20, Danilo Krummrich wrote:
>>>>> On 9/8/25 2:11 PM, Boris Brezillon wrote:
>>>>>> On Mon, 08 Sep 2025 13:11:32 +0200
>>>>>> "Danilo Krummrich" <dakr@kernel.org> wrote:
>>>>>>> No, drivers can't iterate the evict/extobj lists directly; or
>>>>>>> at least this is
>>>>>>> not intended by GPUVM's API and if drivers do so, this is
>>>>>>> considered peeking
>>>>>>> into GPUVM internals, so drivers are on their own anyways.
>>>>>>>
>>>>>>> Iterators, such as for_each_vm_bo_in_list() are not exposed
>>>>>>> to drivers.
>>>>>> Okay, that's a good thing. I thought Xe was doing some funky
>>>>>> stuff with
>>>>>> the list...
>>>>> Maybe, I don't know. If they do so, the should send patches
>>>>> adding the
>>>>> corresponding iterators and provide a rationale why drivers need
>>>>> to access those
>>>>> lists directly and why we can't provide an API that handles the
>>>>> overall
>>>>> use-case, such as drm_gpuvm_prepare_objects(), etc.
>>>>
>>>> We're using the drm_gpuvm_*for_each* macros in drm_gpuvm.h,
>>>> assuming from name
>>>> and docs they are driver api.
>>>>
>>>> Also the drm_gem_for_each_gpuvm_bo(), although this usage could
>>>> easily be
>>>> converted to a helper.
>>>
>>> We were talking about the extobj/evict lists, the ones you mention
>>> are fine of
>>> course. :)
>>>
>>
>> Hmm. Now on closer inspection it looks like we're checking for evict
>> list empty, It looks like rebinding after validation may in theory
>> evict some bos to system memory and then we'd rerun the validation step
>> if the evict list was not empty.
>>
>> We could of course add a helper for that or if there are better
>> suggestions to handle that situation, that'd be fine as well.
> 
> I don't think evict list empty means that there are no evicted GEMs.
> It's possible for an extobj to be missing from the evict list in some
> scenarios. That's why drm_gpuvm_prepare_objects_locked() checks
> evicted on the extobj list to ensure that the evicted list is
> up-to-date when you call into drm_gpuvm_validate_locked().

Indeed, though I would expect that Xe considers that? It was Thomas who proposed
the logic you describe here back then IIRC. :)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
  2025-09-09 11:28                                     ` Danilo Krummrich
@ 2025-09-09 11:46                                       ` Thomas Hellström
  0 siblings, 0 replies; 27+ messages in thread
From: Thomas Hellström @ 2025-09-09 11:46 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl
  Cc: Boris Brezillon, Matthew Brost, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Steven Price,
	Daniel Almeida, Liviu Dudau, dri-devel, linux-kernel,
	rust-for-linux

On Tue, 2025-09-09 at 13:28 +0200, Danilo Krummrich wrote:
> On 9/9/25 1:24 PM, Alice Ryhl wrote:
> > On Tue, Sep 9, 2025 at 1:11 PM Thomas Hellström
> > <thomas.hellstrom@linux.intel.com> wrote:
> > > 
> > > On Tue, 2025-09-09 at 12:47 +0200, Danilo Krummrich wrote:
> > > > On 9/9/25 12:39 PM, Thomas Hellström wrote:
> > > > > On 9/8/25 14:20, Danilo Krummrich wrote:
> > > > > > On 9/8/25 2:11 PM, Boris Brezillon wrote:
> > > > > > > On Mon, 08 Sep 2025 13:11:32 +0200
> > > > > > > "Danilo Krummrich" <dakr@kernel.org> wrote:
> > > > > > > > No, drivers can't iterate the evict/extobj lists
> > > > > > > > directly; or
> > > > > > > > at least this is
> > > > > > > > not intended by GPUVM's API and if drivers do so, this
> > > > > > > > is
> > > > > > > > considered peeking
> > > > > > > > into GPUVM internals, so drivers are on their own
> > > > > > > > anyways.
> > > > > > > > 
> > > > > > > > Iterators, such as for_each_vm_bo_in_list() are not
> > > > > > > > exposed
> > > > > > > > to drivers.
> > > > > > > Okay, that's a good thing. I thought Xe was doing some
> > > > > > > funky
> > > > > > > stuff with
> > > > > > > the list...
> > > > > > Maybe, I don't know. If they do so, the should send patches
> > > > > > adding the
> > > > > > corresponding iterators and provide a rationale why drivers
> > > > > > need
> > > > > > to access those
> > > > > > lists directly and why we can't provide an API that handles
> > > > > > the
> > > > > > overall
> > > > > > use-case, such as drm_gpuvm_prepare_objects(), etc.
> > > > > 
> > > > > We're using the drm_gpuvm_*for_each* macros in drm_gpuvm.h,
> > > > > assuming from name
> > > > > and docs they are driver api.
> > > > > 
> > > > > Also the drm_gem_for_each_gpuvm_bo(), although this usage
> > > > > could
> > > > > easily be
> > > > > converted to a helper.
> > > > 
> > > > We were talking about the extobj/evict lists, the ones you
> > > > mention
> > > > are fine of
> > > > course. :)
> > > > 
> > > 
> > > Hmm. Now on closer inspection it looks like we're checking for
> > > evict
> > > list empty, It looks like rebinding after validation may in
> > > theory
> > > evict some bos to system memory and then we'd rerun the
> > > validation step
> > > if the evict list was not empty.
> > > 
> > > We could of course add a helper for that or if there are better
> > > suggestions to handle that situation, that'd be fine as well.
> > 
> > I don't think evict list empty means that there are no evicted
> > GEMs.
> > It's possible for an extobj to be missing from the evict list in
> > some
> > scenarios. That's why drm_gpuvm_prepare_objects_locked() checks
> > evicted on the extobj list to ensure that the evicted list is
> > up-to-date when you call into drm_gpuvm_validate_locked().
> 
> Indeed, though I would expect that Xe considers that? It was Thomas
> who proposed
> the logic you describe here back then IIRC. :)
> 

Yeah I don't think that eviction-while-validating could happen to
extobjs, but rather to local objects, if it happens at all anymore,
we've made a lot of changes in that area.

But moving forward both the extobj scenario and local object scenarios
will probably have to be considered in OOM situations, but then we'd of
course need to suggest suitable additions to drm_gpuvm to handle that.

/Thomas





^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2025-09-09 11:46 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-05 12:11 [PATCH 0/2] Defer vm_bo cleanup in GPUVM with DRM_GPUVM_IMMEDIATE_MODE Alice Ryhl
2025-09-05 12:11 ` [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup Alice Ryhl
2025-09-05 13:25   ` Boris Brezillon
2025-09-05 18:18     ` Alice Ryhl
2025-09-05 22:47       ` Danilo Krummrich
2025-09-07 11:15         ` Alice Ryhl
2025-09-07 11:28           ` Danilo Krummrich
2025-09-07 11:39             ` Alice Ryhl
2025-09-07 11:44               ` Danilo Krummrich
2025-09-08  7:11               ` Boris Brezillon
2025-09-08  8:26                 ` Alice Ryhl
2025-09-08  8:47                   ` Danilo Krummrich
2025-09-08 10:20                     ` Boris Brezillon
2025-09-08 11:11                       ` Danilo Krummrich
2025-09-08 12:11                         ` Boris Brezillon
2025-09-08 12:20                           ` Danilo Krummrich
2025-09-09 10:39                             ` Thomas Hellström
2025-09-09 10:47                               ` Danilo Krummrich
2025-09-09 11:10                                 ` Thomas Hellström
2025-09-09 11:24                                   ` Alice Ryhl
2025-09-09 11:28                                     ` Danilo Krummrich
2025-09-09 11:46                                       ` Thomas Hellström
2025-09-08  9:37                   ` Boris Brezillon
2025-09-08  7:22           ` Boris Brezillon
2025-09-05 12:11 ` [PATCH 2/2] panthor: use drm_gpuva_unlink_defer() Alice Ryhl
2025-09-05 12:52   ` Boris Brezillon
2025-09-05 13:01     ` Alice Ryhl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).