stable.vger.kernel.org archive mirror
* [PATCH 1/3] drm/xe: Attempt to bring bos back to VRAM after eviction
       [not found] <20250829113350.40959-1-thomas.hellstrom@linux.intel.com>
@ 2025-08-29 11:33 ` Thomas Hellström
  2025-08-29 11:33 ` [PATCH 2/3] drm/xe: Allow the pm notifier to continue on failure Thomas Hellström
  2025-08-29 11:33 ` [PATCH 3/3] drm/xe: Block exec and rebind worker while evicting for suspend / hibernate Thomas Hellström
  2 siblings, 0 replies; 7+ messages in thread
From: Thomas Hellström @ 2025-08-29 11:33 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Matthew Auld, stable,
	Rodrigo Vivi, Maarten Lankhorst

VRAM+TT bos that are evicted from VRAM to TT may remain in
TT even after a revalidation following eviction or suspend.

This manifests itself as applications becoming sluggish
after buffer objects get evicted or after a resume from
suspend or hibernation.

If the bo supports placement in both VRAM and TT, and
we are on DGFX, mark the TT placement as fallback. This means
that it is tried only after VRAM + eviction.

This flaw has probably been present since the xe module was
upstreamed, but the Fixes: commit below is used since backporting
on top of it is likely to be simple. For earlier kernel versions,
the fallback algorithm would need to be open-coded in the driver.
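
For readers unfamiliar with TTM_PL_FLAG_FALLBACK, the ordering it produces
can be loosely modeled in plain C as a two-pass scan: regular placements
are tried first, and placements flagged as fallback are only considered
on a second, post-eviction pass. This is a sketch under that assumption,
not the actual TTM code; select_placement() and the try() callback are
made-up stand-ins for TTM's placement loop and allocation attempt:

```c
#include <stdio.h>
#include <stdbool.h>

#define PL_FLAG_FALLBACK (1u << 0)  /* models TTM_PL_FLAG_FALLBACK */

struct place {
	const char *name;
	unsigned int flags;
};

/*
 * Two-pass selection: pass 0 tries only regular placements; pass 1
 * (standing in for the eviction pass) also tries the ones marked
 * FALLBACK. try() stands in for an allocation attempt.
 */
static const char *select_placement(const struct place *pl, int n,
				    bool (*try)(const char *))
{
	for (int pass = 0; pass < 2; pass++) {
		for (int i = 0; i < n; i++) {
			bool fallback = pl[i].flags & PL_FLAG_FALLBACK;

			/* Fallback placements only qualify on pass 1. */
			if (fallback != (pass == 1))
				continue;
			if (try(pl[i].name))
				return pl[i].name;
		}
	}
	return NULL;
}
```

With TT flagged as fallback, a VRAM+TT bo revalidates into VRAM whenever
VRAM space can be made available, and only lands in TT when it cannot,
which is the behaviour the commit message describes.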

v2:
- Remove check for dgfx. (Matthew Auld)
- Update the xe_dma_buf kunit test for the new strategy (CI)
- Allow dma-buf to pin in current placement (CI)
- Make xe_bo_validate() for pinned bos a NOP.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5995
Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.9+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/xe/tests/xe_bo.c      |  2 +-
 drivers/gpu/drm/xe/tests/xe_dma_buf.c | 10 +---------
 drivers/gpu/drm/xe/xe_bo.c            | 16 ++++++++++++----
 drivers/gpu/drm/xe/xe_bo.h            |  2 +-
 drivers/gpu/drm/xe/xe_dma_buf.c       |  2 +-
 5 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
index bb469096d072..7b40cc8be1c9 100644
--- a/drivers/gpu/drm/xe/tests/xe_bo.c
+++ b/drivers/gpu/drm/xe/tests/xe_bo.c
@@ -236,7 +236,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
 		}
 
 		xe_bo_lock(external, false);
-		err = xe_bo_pin_external(external);
+		err = xe_bo_pin_external(external, false);
 		xe_bo_unlock(external);
 		if (err) {
 			KUNIT_FAIL(test, "external bo pin err=%pe\n",
diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
index cde9530bef8c..3f30d2cef917 100644
--- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
@@ -85,15 +85,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
 		return;
 	}
 
-	/*
-	 * If on different devices, the exporter is kept in system  if
-	 * possible, saving a migration step as the transfer is just
-	 * likely as fast from system memory.
-	 */
-	if (params->mem_mask & XE_BO_FLAG_SYSTEM)
-		KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(exported, XE_PL_TT));
-	else
-		KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(exported, mem_type));
+	KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(exported, mem_type));
 
 	if (params->force_different_devices)
 		KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(imported, XE_PL_TT));
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 4faf15d5fa6d..870f43347281 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -188,6 +188,8 @@ static void try_add_system(struct xe_device *xe, struct xe_bo *bo,
 
 		bo->placements[*c] = (struct ttm_place) {
 			.mem_type = XE_PL_TT,
+			.flags = (bo_flags & XE_BO_FLAG_VRAM_MASK) ?
+			TTM_PL_FLAG_FALLBACK : 0,
 		};
 		*c += 1;
 	}
@@ -2322,6 +2324,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
 /**
  * xe_bo_pin_external - pin an external BO
  * @bo: buffer object to be pinned
+ * @in_place: Pin in current placement, don't attempt to migrate.
  *
  * Pin an external (not tied to a VM, can be exported via dma-buf / prime FD)
  * BO. Unique call compared to xe_bo_pin as this function has it own set of
@@ -2329,7 +2332,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
  *
  * Returns 0 for success, negative error code otherwise.
  */
-int xe_bo_pin_external(struct xe_bo *bo)
+int xe_bo_pin_external(struct xe_bo *bo, bool in_place)
 {
 	struct xe_device *xe = xe_bo_device(bo);
 	int err;
@@ -2338,9 +2341,11 @@ int xe_bo_pin_external(struct xe_bo *bo)
 	xe_assert(xe, xe_bo_is_user(bo));
 
 	if (!xe_bo_is_pinned(bo)) {
-		err = xe_bo_validate(bo, NULL, false);
-		if (err)
-			return err;
+		if (!in_place) {
+			err = xe_bo_validate(bo, NULL, false);
+			if (err)
+				return err;
+		}
 
 		spin_lock(&xe->pinned.lock);
 		list_add_tail(&bo->pinned_link, &xe->pinned.late.external);
@@ -2493,6 +2498,9 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
 	};
 	int ret;
 
+	if (xe_bo_is_pinned(bo))
+		return 0;
+
 	if (vm) {
 		lockdep_assert_held(&vm->lock);
 		xe_vm_assert_held(vm);
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 8cce413b5235..cfb1ec266a6d 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -200,7 +200,7 @@ static inline void xe_bo_unlock_vm_held(struct xe_bo *bo)
 	}
 }
 
-int xe_bo_pin_external(struct xe_bo *bo);
+int xe_bo_pin_external(struct xe_bo *bo, bool in_place);
 int xe_bo_pin(struct xe_bo *bo);
 void xe_bo_unpin_external(struct xe_bo *bo);
 void xe_bo_unpin(struct xe_bo *bo);
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 346f857f3837..af64baf872ef 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -72,7 +72,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
 		return ret;
 	}
 
-	ret = xe_bo_pin_external(bo);
+	ret = xe_bo_pin_external(bo, true);
 	xe_assert(xe, !ret);
 
 	return 0;
-- 
2.50.1



* [PATCH 2/3] drm/xe: Allow the pm notifier to continue on failure
       [not found] <20250829113350.40959-1-thomas.hellstrom@linux.intel.com>
  2025-08-29 11:33 ` [PATCH 1/3] drm/xe: Attempt to bring bos back to VRAM after eviction Thomas Hellström
@ 2025-08-29 11:33 ` Thomas Hellström
  2025-08-29 14:53   ` Rodrigo Vivi
  2025-08-29 15:50   ` Matthew Auld
  2025-08-29 11:33 ` [PATCH 3/3] drm/xe: Block exec and rebind worker while evicting for suspend / hibernate Thomas Hellström
  2 siblings, 2 replies; 7+ messages in thread
From: Thomas Hellström @ 2025-08-29 11:33 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Auld, Rodrigo Vivi, stable,
	Matthew Brost, Maarten Lankhorst

Its actions are opportunistic anyway and will be completed
on device suspend.

Also restrict the scope of the pm runtime reference to
the notifier callback itself to make it easier to
follow.
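
The reworked PM_SUSPEND_PREPARE flow described above can be sketched in
userspace C as follows. This is a model of this version of the patch
only (the review below questions the reference scoping); the counter and
the evict()/prepare() callbacks are made-up stand-ins for the runtime-pm
reference and for xe_bo_evict_all_user() /
xe_bo_notifier_prepare_all_pinned():

```c
#include <stdio.h>

/*
 * Both steps are best effort: errors are only logged, the runtime-pm
 * reference is dropped exactly once, and the notifier never reports
 * failure (NOTIFY_DONE is 0 in the kernel).
 */
static int suspend_prepare(int (*evict)(void), int (*prepare)(void),
			   int *pm_refcount)
{
	int err;

	(*pm_refcount)++;		/* xe_pm_runtime_get() */
	err = evict();
	if (err)
		fprintf(stderr, "Notifier evict user failed (%d)\n", err);
	err = prepare();
	if (err)
		fprintf(stderr, "Notifier prepare pin failed (%d)\n", err);
	(*pm_refcount)--;		/* xe_pm_runtime_put() */
	return 0;			/* always NOTIFY_DONE */
}
```

Unlike the previous code, an eviction failure no longer aborts the
sequence early or leaves an unbalanced reference path.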

Marking as a fix to simplify backporting of the fix
that follows in the series.

Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_pm.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index a2e85030b7f4..b57b46ad9f7c 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -308,28 +308,22 @@ static int xe_pm_notifier_callback(struct notifier_block *nb,
 	case PM_SUSPEND_PREPARE:
 		xe_pm_runtime_get(xe);
 		err = xe_bo_evict_all_user(xe);
-		if (err) {
+		if (err)
 			drm_dbg(&xe->drm, "Notifier evict user failed (%d)\n", err);
-			xe_pm_runtime_put(xe);
-			break;
-		}
 
 		err = xe_bo_notifier_prepare_all_pinned(xe);
-		if (err) {
+		if (err)
 			drm_dbg(&xe->drm, "Notifier prepare pin failed (%d)\n", err);
-			xe_pm_runtime_put(xe);
-		}
+		xe_pm_runtime_put(xe);
 		break;
 	case PM_POST_HIBERNATION:
 	case PM_POST_SUSPEND:
+		xe_pm_runtime_get(xe);
 		xe_bo_notifier_unprepare_all_pinned(xe);
 		xe_pm_runtime_put(xe);
 		break;
 	}
 
-	if (err)
-		return NOTIFY_BAD;
-
 	return NOTIFY_DONE;
 }
 
-- 
2.50.1



* [PATCH 3/3] drm/xe: Block exec and rebind worker while evicting for suspend / hibernate
       [not found] <20250829113350.40959-1-thomas.hellstrom@linux.intel.com>
  2025-08-29 11:33 ` [PATCH 1/3] drm/xe: Attempt to bring bos back to VRAM after eviction Thomas Hellström
  2025-08-29 11:33 ` [PATCH 2/3] drm/xe: Allow the pm notifier to continue on failure Thomas Hellström
@ 2025-08-29 11:33 ` Thomas Hellström
  2 siblings, 0 replies; 7+ messages in thread
From: Thomas Hellström @ 2025-08-29 11:33 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Auld, Rodrigo Vivi, stable,
	Matthew Brost, Maarten Lankhorst

When the xe pm_notifier evicts for suspend / hibernate, there might be
racing tasks trying to re-validate again. This can lead to suspend taking
excessive time or getting stuck in a live-lock. This behaviour becomes
much worse with the fix that actually makes re-validation bring back
bos to VRAM rather than letting them remain in TT.

Prevent that by having exec and the rebind worker wait for a completion
that is set to block by the pm_notifier before suspend and is signaled
by the pm_notifier after resume / wakeup.

It's probably still possible to craft malicious applications that block
suspending. More work is pending to fix that.
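
The completion-based gate can be sketched in userspace C with pthreads
standing in for the kernel's completion API. The gate_* helpers are
hypothetical names for this model; the real patch uses init_completion(),
reinit_completion(), complete_all(), wait_for_completion() and
try_wait_for_completion() on xe->pm_block:

```c
#include <pthread.h>
#include <stdbool.h>

struct gate {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool open;
};

/* init_completion() + complete_all(): open by default, as in xe_pm_init() */
static void gate_init(struct gate *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->cond, NULL);
	g->open = true;
}

/* reinit_completion(): close the gate before suspend / hibernate */
static void gate_close(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->open = false;
	pthread_mutex_unlock(&g->lock);
}

/* complete_all(): reopen the gate after resume, waking all waiters */
static void gate_open(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->open = true;
	pthread_cond_broadcast(&g->cond);
	pthread_mutex_unlock(&g->lock);
}

/* try_wait_for_completion(): non-blocking check, used by the rebind
 * worker to back off with -EAGAIN instead of validating */
static bool gate_try(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	bool open = g->open;
	pthread_mutex_unlock(&g->lock);
	return open;
}

/* wait_for_completion(): block until the gate reopens, as exec and the
 * rebind worker do (exec uses the interruptible variant) */
static void gate_wait(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	while (!g->open)
		pthread_cond_wait(&g->cond, &g->lock);
	pthread_mutex_unlock(&g->lock);
}
```

While the gate is closed, validating tasks can no longer race the
pm_notifier's eviction work back into VRAM.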

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288
Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h |  2 ++
 drivers/gpu/drm/xe/xe_exec.c         |  9 +++++++++
 drivers/gpu/drm/xe/xe_pm.c           |  4 ++++
 drivers/gpu/drm/xe/xe_vm.c           | 14 ++++++++++++++
 4 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 092004d14db2..6602bd678cbc 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -507,6 +507,8 @@ struct xe_device {
 
 	/** @pm_notifier: Our PM notifier to perform actions in response to various PM events. */
 	struct notifier_block pm_notifier;
+	/** @pm_block: Completion to block validating tasks on suspend / hibernate prepare */
+	struct completion pm_block;
 
 	/** @pmt: Support the PMT driver callback interface */
 	struct {
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 44364c042ad7..374c831e691b 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -237,6 +237,15 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		goto err_unlock_list;
 	}
 
+	/*
+	 * It's OK to block interruptible here with the vm lock held, since
+	 * on task freezing during suspend / hibernate, the call will
+	 * return -ERESTARTSYS and the IOCTL will be rerun.
+	 */
+	err = wait_for_completion_interruptible(&xe->pm_block);
+	if (err)
+		goto err_unlock_list;
+
 	vm_exec.vm = &vm->gpuvm;
 	vm_exec.flags = DRM_EXEC_INTERRUPTIBLE_WAIT;
 	if (xe_vm_in_lr_mode(vm)) {
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index b57b46ad9f7c..2d7b05d8a78b 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -306,6 +306,7 @@ static int xe_pm_notifier_callback(struct notifier_block *nb,
 	switch (action) {
 	case PM_HIBERNATION_PREPARE:
 	case PM_SUSPEND_PREPARE:
+		reinit_completion(&xe->pm_block);
 		xe_pm_runtime_get(xe);
 		err = xe_bo_evict_all_user(xe);
 		if (err)
@@ -318,6 +319,7 @@ static int xe_pm_notifier_callback(struct notifier_block *nb,
 		break;
 	case PM_POST_HIBERNATION:
 	case PM_POST_SUSPEND:
+		complete_all(&xe->pm_block);
 		xe_pm_runtime_get(xe);
 		xe_bo_notifier_unprepare_all_pinned(xe);
 		xe_pm_runtime_put(xe);
@@ -345,6 +347,8 @@ int xe_pm_init(struct xe_device *xe)
 	if (err)
 		return err;
 
+	init_completion(&xe->pm_block);
+	complete_all(&xe->pm_block);
 	/* For now suspend/resume is only allowed with GuC */
 	if (!xe_device_uc_enabled(xe))
 		return 0;
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 3ff3c67aa79d..edcdb0528f2a 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -394,6 +394,9 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
 		list_move_tail(&gpuva_to_vma(gpuva)->combined_links.rebind,
 			       &vm->rebind_list);
 
+	if (!try_wait_for_completion(&vm->xe->pm_block))
+		return -EAGAIN;
+
 	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false);
 	if (ret)
 		return ret;
@@ -494,6 +497,12 @@ static void preempt_rebind_work_func(struct work_struct *w)
 	xe_assert(vm->xe, xe_vm_in_preempt_fence_mode(vm));
 	trace_xe_vm_rebind_worker_enter(vm);
 
+	/*
+	 * This blocks the wq during suspend / hibernate.
+	 * Don't hold any locks.
+	 */
+retry_pm:
+	wait_for_completion(&vm->xe->pm_block);
 	down_write(&vm->lock);
 
 	if (xe_vm_is_closed_or_banned(vm)) {
@@ -503,6 +512,11 @@ static void preempt_rebind_work_func(struct work_struct *w)
 	}
 
 retry:
+	if (!try_wait_for_completion(&vm->xe->pm_block)) {
+		up_write(&vm->lock);
+		goto retry_pm;
+	}
+
 	if (xe_vm_userptr_check_repin(vm)) {
 		err = xe_vm_userptr_pin(vm);
 		if (err)
-- 
2.50.1



* Re: [PATCH 2/3] drm/xe: Allow the pm notifier to continue on failure
  2025-08-29 11:33 ` [PATCH 2/3] drm/xe: Allow the pm notifier to continue on failure Thomas Hellström
@ 2025-08-29 14:53   ` Rodrigo Vivi
  2025-08-29 15:50   ` Matthew Auld
  1 sibling, 0 replies; 7+ messages in thread
From: Rodrigo Vivi @ 2025-08-29 14:53 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Matthew Auld, stable, Matthew Brost, Maarten Lankhorst

On Fri, Aug 29, 2025 at 01:33:49PM +0200, Thomas Hellström wrote:
> Its actions are opportunistic anyway and will be completed
> on device suspend.
> 
> Also restrict the scope of the pm runtime reference to
> the notifier callback itself to make it easier to
> follow.
> 
> Marking as a fix to simplify backporting of the fix
> that follows in the series.
> 
> Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: <stable@vger.kernel.org> # v6.16+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_pm.c | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index a2e85030b7f4..b57b46ad9f7c 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -308,28 +308,22 @@ static int xe_pm_notifier_callback(struct notifier_block *nb,
>  	case PM_SUSPEND_PREPARE:
>  		xe_pm_runtime_get(xe);
>  		err = xe_bo_evict_all_user(xe);
> -		if (err) {
> +		if (err)
>  			drm_dbg(&xe->drm, "Notifier evict user failed (%d)\n", err);
> -			xe_pm_runtime_put(xe);
> -			break;
> -		}
>  
>  		err = xe_bo_notifier_prepare_all_pinned(xe);
> -		if (err) {
> +		if (err)
>  			drm_dbg(&xe->drm, "Notifier prepare pin failed (%d)\n", err);
> -			xe_pm_runtime_put(xe);
> -		}
> +		xe_pm_runtime_put(xe);
>  		break;
>  	case PM_POST_HIBERNATION:
>  	case PM_POST_SUSPEND:
> +		xe_pm_runtime_get(xe);
>  		xe_bo_notifier_unprepare_all_pinned(xe);
>  		xe_pm_runtime_put(xe);
>  		break;
>  	}
>  
> -	if (err)
> -		return NOTIFY_BAD;
> -
>  	return NOTIFY_DONE;
>  }
>  
> -- 
> 2.50.1
> 


* Re: [PATCH 2/3] drm/xe: Allow the pm notifier to continue on failure
  2025-08-29 11:33 ` [PATCH 2/3] drm/xe: Allow the pm notifier to continue on failure Thomas Hellström
  2025-08-29 14:53   ` Rodrigo Vivi
@ 2025-08-29 15:50   ` Matthew Auld
  2025-08-29 17:15     ` Rodrigo Vivi
  1 sibling, 1 reply; 7+ messages in thread
From: Matthew Auld @ 2025-08-29 15:50 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: Rodrigo Vivi, stable, Matthew Brost, Maarten Lankhorst

On 29/08/2025 12:33, Thomas Hellström wrote:
> Its actions are opportunistic anyway and will be completed
> on device suspend.
> 
> Also restrict the scope of the pm runtime reference to
> the notifier callback itself to make it easier to
> follow.
> 
> Marking as a fix to simplify backporting of the fix
> that follows in the series.
> 
> Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: <stable@vger.kernel.org> # v6.16+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/xe/xe_pm.c | 14 ++++----------
>   1 file changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index a2e85030b7f4..b57b46ad9f7c 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -308,28 +308,22 @@ static int xe_pm_notifier_callback(struct notifier_block *nb,
>   	case PM_SUSPEND_PREPARE:
>   		xe_pm_runtime_get(xe);
>   		err = xe_bo_evict_all_user(xe);
> -		if (err) {
> +		if (err)
>   			drm_dbg(&xe->drm, "Notifier evict user failed (%d)\n", err);
> -			xe_pm_runtime_put(xe);
> -			break;
> -		}
>   
>   		err = xe_bo_notifier_prepare_all_pinned(xe);
> -		if (err) {
> +		if (err)
>   			drm_dbg(&xe->drm, "Notifier prepare pin failed (%d)\n", err);
> -			xe_pm_runtime_put(xe);
> -		}
> +		xe_pm_runtime_put(xe);

IIRC I was worried that this ends up triggering runtime suspend at some 
later point and then something wakes it up again before the actual 
forced suspend triggers, which looks like it would undo all the 
prepare_all_pinned() work, so I opted for keeping it held over the 
entire sequence. Is that not a concern here?

>   		break;
>   	case PM_POST_HIBERNATION:
>   	case PM_POST_SUSPEND:
> +		xe_pm_runtime_get(xe);
>   		xe_bo_notifier_unprepare_all_pinned(xe);
>   		xe_pm_runtime_put(xe);
>   		break;
>   	}
>   
> -	if (err)
> -		return NOTIFY_BAD;
> -
>   	return NOTIFY_DONE;
>   }
>   



* Re: [PATCH 2/3] drm/xe: Allow the pm notifier to continue on failure
  2025-08-29 15:50   ` Matthew Auld
@ 2025-08-29 17:15     ` Rodrigo Vivi
  2025-08-29 20:56       ` Thomas Hellström
  0 siblings, 1 reply; 7+ messages in thread
From: Rodrigo Vivi @ 2025-08-29 17:15 UTC (permalink / raw)
  To: Matthew Auld
  Cc: Thomas Hellström, intel-xe, stable, Matthew Brost,
	Maarten Lankhorst

On Fri, Aug 29, 2025 at 04:50:01PM +0100, Matthew Auld wrote:
> On 29/08/2025 12:33, Thomas Hellström wrote:
> > Its actions are opportunistic anyway and will be completed
> > on device suspend.
> > 
> > Also restrict the scope of the pm runtime reference to
> > the notifier callback itself to make it easier to
> > follow.
> > 
> > Marking as a fix to simplify backporting of the fix
> > that follows in the series.
> > 
> > Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Cc: <stable@vger.kernel.org> # v6.16+
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_pm.c | 14 ++++----------
> >   1 file changed, 4 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index a2e85030b7f4..b57b46ad9f7c 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -308,28 +308,22 @@ static int xe_pm_notifier_callback(struct notifier_block *nb,
> >   	case PM_SUSPEND_PREPARE:
> >   		xe_pm_runtime_get(xe);
> >   		err = xe_bo_evict_all_user(xe);
> > -		if (err) {
> > +		if (err)
> >   			drm_dbg(&xe->drm, "Notifier evict user failed (%d)\n", err);
> > -			xe_pm_runtime_put(xe);
> > -			break;
> > -		}
> >   		err = xe_bo_notifier_prepare_all_pinned(xe);
> > -		if (err) {
> > +		if (err)
> >   			drm_dbg(&xe->drm, "Notifier prepare pin failed (%d)\n", err);
> > -			xe_pm_runtime_put(xe);
> > -		}
> > +		xe_pm_runtime_put(xe);
> 
> IIRC I was worried that this ends up triggering runtime suspend at some
> later point and then something wakes it up again before the actual forced
> suspend triggers, which looks like it would undo all the
> prepare_all_pinned() work, so I opted for keeping it held over the entire
> sequence. Is that not a concern here?

I was seeing this more like a umbrella to shut-up our inner callers warnings
since runtime pm references will be taken prior to pm state transitions
by base/power.

However on a quick look I couldn't connect the core code that takes the
runtime pm directly with this notify now. So, perhaps let's indeed play
safe and keep our own references here?!...

> 
> >   		break;
> >   	case PM_POST_HIBERNATION:
> >   	case PM_POST_SUSPEND:
> > +		xe_pm_runtime_get(xe);
> >   		xe_bo_notifier_unprepare_all_pinned(xe);
> >   		xe_pm_runtime_put(xe);
> >   		break;
> >   	}
> > -	if (err)
> > -		return NOTIFY_BAD;
> > -
> >   	return NOTIFY_DONE;
> >   }
> 


* Re: [PATCH 2/3] drm/xe: Allow the pm notifier to continue on failure
  2025-08-29 17:15     ` Rodrigo Vivi
@ 2025-08-29 20:56       ` Thomas Hellström
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Hellström @ 2025-08-29 20:56 UTC (permalink / raw)
  To: Rodrigo Vivi, Matthew Auld
  Cc: intel-xe, stable, Matthew Brost, Maarten Lankhorst

On Fri, 2025-08-29 at 13:15 -0400, Rodrigo Vivi wrote:
> On Fri, Aug 29, 2025 at 04:50:01PM +0100, Matthew Auld wrote:
> > On 29/08/2025 12:33, Thomas Hellström wrote:
> > > Its actions are opportunistic anyway and will be completed
> > > on device suspend.
> > > 
> > > Also restrict the scope of the pm runtime reference to
> > > the notifier callback itself to make it easier to
> > > follow.
> > > 
> > > Marking as a fix to simplify backporting of the fix
> > > that follows in the series.
> > > 
> > > Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
> > > Cc: Matthew Auld <matthew.auld@intel.com>
> > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > Cc: <stable@vger.kernel.org> # v6.16+
> > > Signed-off-by: Thomas Hellström
> > > <thomas.hellstrom@linux.intel.com>
> > > ---
> > >   drivers/gpu/drm/xe/xe_pm.c | 14 ++++----------
> > >   1 file changed, 4 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_pm.c
> > > b/drivers/gpu/drm/xe/xe_pm.c
> > > index a2e85030b7f4..b57b46ad9f7c 100644
> > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > @@ -308,28 +308,22 @@ static int xe_pm_notifier_callback(struct
> > > notifier_block *nb,
> > >   	case PM_SUSPEND_PREPARE:
> > >   		xe_pm_runtime_get(xe);
> > >   		err = xe_bo_evict_all_user(xe);
> > > -		if (err) {
> > > +		if (err)
> > >   			drm_dbg(&xe->drm, "Notifier evict user
> > > failed (%d)\n", err);
> > > -			xe_pm_runtime_put(xe);
> > > -			break;
> > > -		}
> > >   		err = xe_bo_notifier_prepare_all_pinned(xe);
> > > -		if (err) {
> > > +		if (err)
> > >   			drm_dbg(&xe->drm, "Notifier prepare pin
> > > failed (%d)\n", err);
> > > -			xe_pm_runtime_put(xe);
> > > -		}
> > > +		xe_pm_runtime_put(xe);
> > 
> > IIRC I was worried that this ends up triggering runtime suspend at
> > some
> > later point and then something wakes it up again before the actual
> > forced
> > suspend triggers, which looks like it would undo all the
> > prepare_all_pinned() work, so I opted for keeping it held over the
> > entire
> > sequence. Is that not a concern here?

Good point.

> 
> I was seeing this more like a umbrella to shut-up our inner callers
> warnings
> since runtime pm references will be taken prior to pm state
> transitions
> by base/power.
> 
> However on a quick look I couldn't connect the core code that takes
> the
> runtime pm directly with this notify now. So, perhaps let's indeed
> play
> safe and keep our own references here?!...

I'll keep the references, and perhaps add a comment about that so that
nobody tries this again.

Any concerns about ignoring errors?

/Thomas


> 
> > 
> > >   		break;
> > >   	case PM_POST_HIBERNATION:
> > >   	case PM_POST_SUSPEND:
> > > +		xe_pm_runtime_get(xe);
> > >   		xe_bo_notifier_unprepare_all_pinned(xe);
> > >   		xe_pm_runtime_put(xe);
> > >   		break;
> > >   	}
> > > -	if (err)
> > > -		return NOTIFY_BAD;
> > > -
> > >   	return NOTIFY_DONE;
> > >   }
> > 


