Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Auld <matthew.auld@intel.com>, intel-xe@lists.freedesktop.org
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>,
	stable@vger.kernel.org, Matthew Brost <matthew.brost@intel.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Subject: Re: [PATCH v3 3/3] drm/xe: Block exec and rebind worker while evicting for suspend / hibernate
Date: Thu, 04 Sep 2025 17:30:36 +0200	[thread overview]
Message-ID: <053d1e397ffc03736f4a1012216f9cdc0bb53f4e.camel@linux.intel.com> (raw)
In-Reply-To: <723f3c49-45e5-494e-b06a-977f37e5dfb9@intel.com>

On Thu, 2025-09-04 at 16:22 +0100, Matthew Auld wrote:
> On 04/09/2025 14:06, Thomas Hellström wrote:
> > When the xe pm_notifier evicts for suspend / hibernate, there might
> > be
> > racing tasks trying to re-validate again. This can lead to suspend
> > taking
> > excessive time or get stuck in a live-lock. This behaviour becomes
> > much worse with the fix that actually makes re-validation bring
> > back
> > bos to VRAM rather than letting them remain in TT.
> > 
> > Prevent that by having exec and the rebind worker waiting for a
> > completion
> > that is set to block by the pm_notifier before suspend and is
> > signaled
> > by the pm_notifier after resume / wakeup.
> > 
> > It's probably still possible to craft malicious applications that
> > block
> > suspending. More work is pending to fix that.
> > 
> > v3:
> > - Avoid wait_for_completion() in the kernel worker since it could
> >    potentially cause work item flushes from freezable processes to
> >    wait forever. Instead terminate the rebind workers if needed and
> >    re-launch at resume. (Matt Auld)
> > 
> > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288
> > Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Cc: <stable@vger.kernel.org> # v6.16+
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_device_types.h |  6 ++++
> >   drivers/gpu/drm/xe/xe_exec.c         |  9 ++++++
> >   drivers/gpu/drm/xe/xe_pm.c           | 20 ++++++++++++
> >   drivers/gpu/drm/xe/xe_vm.c           | 46
> > +++++++++++++++++++++++++++-
> >   drivers/gpu/drm/xe/xe_vm.h           |  2 ++
> >   drivers/gpu/drm/xe/xe_vm_types.h     |  5 +++
> >   6 files changed, 87 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> > b/drivers/gpu/drm/xe/xe_device_types.h
> > index 092004d14db2..1e780f8a2a8c 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -507,6 +507,12 @@ struct xe_device {
> >   
> >   	/** @pm_notifier: Our PM notifier to perform actions in
> > response to various PM events. */
> >   	struct notifier_block pm_notifier;
> > +	/** @pm_block: Completion to block validating tasks on
> > suspend / hibernate prepare */
> > +	struct completion pm_block;
> > +	/** @rebind_resume_list: List of wq items to kick on
> > resume. */
> > +	struct list_head rebind_resume_list;
> > +	/** @rebind_resume_lock: Lock to protect the
> > rebind_resume_list */
> > +	struct mutex rebind_resume_lock;
> >   
> >   	/** @pmt: Support the PMT driver callback interface */
> >   	struct {
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c
> > b/drivers/gpu/drm/xe/xe_exec.c
> > index 44364c042ad7..374c831e691b 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -237,6 +237,15 @@ int xe_exec_ioctl(struct drm_device *dev, void
> > *data, struct drm_file *file)
> >   		goto err_unlock_list;
> >   	}
> >   
> > +	/*
> > +	 * It's OK to block interruptible here with the vm lock
> > held, since
> > +	 * on task freezing during suspend / hibernate, the call
> > will
> > +	 * return -ERESTARTSYS and the IOCTL will be rerun.
> > +	 */
> > +	err = wait_for_completion_interruptible(&xe->pm_block);
> > +	if (err)
> > +		goto err_unlock_list;
> > +
> >   	vm_exec.vm = &vm->gpuvm;
> >   	vm_exec.flags = DRM_EXEC_INTERRUPTIBLE_WAIT;
> >   	if (xe_vm_in_lr_mode(vm)) {
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c
> > b/drivers/gpu/drm/xe/xe_pm.c
> > index bee9aacd82e7..6d59990ff6ba 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -297,6 +297,18 @@ static u32 vram_threshold_value(struct
> > xe_device *xe)
> >   	return DEFAULT_VRAM_THRESHOLD;
> >   }
> >   
> > +static void xe_pm_wake_preempt_workers(struct xe_device *xe)
> > +{
> > +	struct list_head *link, *next;
> > +
> > +	mutex_lock(&xe->rebind_resume_lock);
> > +	list_for_each_safe(link, next, &xe->rebind_resume_list) {
> > +		list_del_init(link);
> > +		xe_vm_resume_preempt_worker(link);
> > +	}
> > +	mutex_unlock(&xe->rebind_resume_lock);
> > +}
> > +
> >   static int xe_pm_notifier_callback(struct notifier_block *nb,
> >   				   unsigned long action, void
> > *data)
> >   {
> > @@ -306,6 +318,7 @@ static int xe_pm_notifier_callback(struct
> > notifier_block *nb,
> >   	switch (action) {
> >   	case PM_HIBERNATION_PREPARE:
> >   	case PM_SUSPEND_PREPARE:
> > +		reinit_completion(&xe->pm_block);
> >   		xe_pm_runtime_get(xe);
> >   		err = xe_bo_evict_all_user(xe);
> >   		if (err)
> > @@ -322,6 +335,8 @@ static int xe_pm_notifier_callback(struct
> > notifier_block *nb,
> >   		break;
> >   	case PM_POST_HIBERNATION:
> >   	case PM_POST_SUSPEND:
> > +		complete_all(&xe->pm_block);
> > +		xe_pm_wake_preempt_workers(xe);
> >   		xe_bo_notifier_unprepare_all_pinned(xe);
> >   		xe_pm_runtime_put(xe);
> >   		break;
> > @@ -348,6 +363,11 @@ int xe_pm_init(struct xe_device *xe)
> >   	if (err)
> >   		return err;
> >   
> > +	init_completion(&xe->pm_block);
> > +	complete_all(&xe->pm_block);
> > +	mutex_init(&xe->rebind_resume_lock);
> 
> err = drmm_mutex_init(&xe->rebind_resume_lock);

Sure.

> 
> ?
> 
> > +	INIT_LIST_HEAD(&xe->rebind_resume_list);
> > +
> >   	/* For now suspend/resume is only allowed with GuC */
> >   	if (!xe_device_uc_enabled(xe))
> >   		return 0;
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > b/drivers/gpu/drm/xe/xe_vm.c
> > index f55f96bb240a..97aad1d53a8c 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -394,6 +394,9 @@ static int xe_gpuvm_validate(struct
> > drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
> >   		list_move_tail(&gpuva_to_vma(gpuva)-
> > >combined_links.rebind,
> >   			       &vm->rebind_list);
> >   
> > +	if (!try_wait_for_completion(&vm->xe->pm_block))
> > +		return -EAGAIN;
> > +
> >   	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false);
> >   	if (ret)
> >   		return ret;
> > @@ -480,6 +483,37 @@ static int xe_preempt_work_begin(struct
> > drm_exec *exec, struct xe_vm *vm,
> >   	return xe_vm_validate_rebind(vm, exec, vm-
> > >preempt.num_exec_queues);
> >   }
> >   
> > +static bool vm_suspend_preempt_worker(struct xe_vm *vm)
> > +{
> > +	struct xe_device *xe = vm->xe;
> > +	bool ret = false;
> > +
> > +	mutex_lock(&xe->rebind_resume_lock);
> > +	if (!try_wait_for_completion(&vm->xe->pm_block)) {
> > +		ret = true;
> > +		list_move_tail(&vm->preempt.pm_activate_link, &xe-
> > >rebind_resume_list);
> > +	}
> > +	pr_info("Suspending %p\n", vm);
> > +	mutex_unlock(&xe->rebind_resume_lock);
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * xe_vm_resume_preempt_worker() - Resume the preempt worker.
> > + * @vm: The vm whose preempt worker to resume.
> > + *
> > + * Resume a preempt worker that was previously suspended by
> > + * vm_suspend_preempt_worker().
> > + */
> > +void xe_vm_resume_preempt_worker(struct list_head *link)
> 
> I guess should use vm arg here?

Yes I tried to avoid downcasting from within xe_pm.c but since we need
to include xe_vm.h anyway, I might as well do that.

Thanks for reviewing!

/Thomas




> 
> > +{
> > +	struct xe_vm *vm = container_of(link, typeof(*vm),
> > preempt.pm_activate_link);
> > +
> > +	pr_info("Resuming %p\n", vm);
> > +	queue_work(vm->xe->ordered_wq, &vm->preempt.rebind_work);
> > +}
> > +
> >   static void preempt_rebind_work_func(struct work_struct *w)
> >   {
> >   	struct xe_vm *vm = container_of(w, struct xe_vm,
> > preempt.rebind_work);
> > @@ -503,6 +537,11 @@ static void preempt_rebind_work_func(struct
> > work_struct *w)
> >   	}
> >   
> >   retry:
> > +	if (!try_wait_for_completion(&vm->xe->)) &&
> > vm_suspend_preempt_worker(vm)) {
> > +		up_write(&vm->lock);
> > +		return;
> > +	}
> > +
> >   	if (xe_vm_userptr_check_repin(vm)) {
> >   		err = xe_vm_userptr_pin(vm);
> >   		if (err)
> > @@ -1741,6 +1780,7 @@ struct xe_vm *xe_vm_create(struct xe_device
> > *xe, u32 flags, struct xe_file *xef)
> >   	if (flags & XE_VM_FLAG_LR_MODE) {
> >   		INIT_WORK(&vm->preempt.rebind_work,
> > preempt_rebind_work_func);
> >   		xe_pm_runtime_get_noresume(xe);
> > +		INIT_LIST_HEAD(&vm->preempt.pm_activate_link);
> >   	}
> >   
> >   	if (flags & XE_VM_FLAG_FAULT_MODE) {
> > @@ -1922,8 +1962,12 @@ void xe_vm_close_and_put(struct xe_vm *vm)
> >   	xe_assert(xe, !vm->preempt.num_exec_queues);
> >   
> >   	xe_vm_close(vm);
> > -	if (xe_vm_in_preempt_fence_mode(vm))
> > +	if (xe_vm_in_preempt_fence_mode(vm)) {
> > +		mutex_lock(&xe->rebind_resume_lock);
> > +		list_del_init(&vm->preempt.pm_activate_link);
> > +		mutex_unlock(&xe->rebind_resume_lock);
> >   		flush_work(&vm->preempt.rebind_work);
> > +	}
> >   	if (xe_vm_in_fault_mode(vm))
> >   		xe_svm_close(vm);
> >   
> > diff --git a/drivers/gpu/drm/xe/xe_vm.h
> > b/drivers/gpu/drm/xe/xe_vm.h
> > index b3e5bec0fa58..f2639794278b 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.h
> > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > @@ -281,6 +281,8 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct
> > xe_vm *vm, struct xe_bo *bo,
> >   				       struct xe_exec_queue *q,
> > u64 addr,
> >   				       enum xe_cache_level
> > cache_lvl);
> >   
> > +void xe_vm_resume_preempt_worker(struct list_head *link);
> > +
> >   /**
> >    * xe_vm_resv() - Return's the vm's reservation object
> >    * @vm: The vm
> > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> > b/drivers/gpu/drm/xe/xe_vm_types.h
> > index b5108d010786..e1a786db5f89 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > @@ -338,6 +338,11 @@ struct xe_vm {
> >   		 * BOs
> >   		 */
> >   		struct work_struct rebind_work;
> > +		/**
> > +		 * @preempt.pm_activate_link: Link to list of
> > rebind workers to be
> > +		 * kicked on resume.
> > +		 */
> > +		struct list_head pm_activate_link;
> >   	} preempt;
> >   
> >   	/** @um: unified memory state */
> 


  reply	other threads:[~2025-09-04 15:30 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-04 13:06 [PATCH v3 0/3] drm/xe: Fixes around eviction and suspend Thomas Hellström
2025-09-04 13:06 ` [PATCH v3 1/3] drm/xe: Attempt to bring bos back to VRAM after eviction Thomas Hellström
2025-09-04 13:06 ` [PATCH v3 2/3] drm/xe: Allow the pm notifier to continue on failure Thomas Hellström
2025-09-04 13:06 ` [PATCH v3 3/3] drm/xe: Block exec and rebind worker while evicting for suspend / hibernate Thomas Hellström
2025-09-04 15:22   ` Matthew Auld
2025-09-04 15:30     ` Thomas Hellström [this message]
2025-09-04 13:13 ` ✓ CI.KUnit: success for drm/xe: Fixes around eviction and suspend (rev3) Patchwork
2025-09-04 14:02 ` ✓ Xe.CI.BAT: " Patchwork
2025-09-05  2:52 ` ✗ Xe.CI.Full: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=053d1e397ffc03736f4a1012216f9cdc0bb53f4e.camel@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=matthew.auld@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=rodrigo.vivi@intel.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox