From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Auld <matthew.auld@intel.com>, intel-xe@lists.freedesktop.org
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>,
stable@vger.kernel.org, Matthew Brost <matthew.brost@intel.com>,
Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Subject: Re: [PATCH v3 3/3] drm/xe: Block exec and rebind worker while evicting for suspend / hibernate
Date: Thu, 04 Sep 2025 17:30:36 +0200 [thread overview]
Message-ID: <053d1e397ffc03736f4a1012216f9cdc0bb53f4e.camel@linux.intel.com> (raw)
In-Reply-To: <723f3c49-45e5-494e-b06a-977f37e5dfb9@intel.com>
On Thu, 2025-09-04 at 16:22 +0100, Matthew Auld wrote:
> On 04/09/2025 14:06, Thomas Hellström wrote:
> > When the xe pm_notifier evicts for suspend / hibernate, there might
> > be
> > racing tasks trying to re-validate again. This can lead to suspend
> > taking
> > excessive time or get stuck in a live-lock. This behaviour becomes
> > much worse with the fix that actually makes re-validation bring
> > back
> > bos to VRAM rather than letting them remain in TT.
> >
> > Prevent that by having exec and the rebind worker waiting for a
> > completion
> > that is set to block by the pm_notifier before suspend and is
> > signaled
> > by the pm_notifier after resume / wakeup.
> >
> > It's probably still possible to craft malicious applications that
> > block
> > suspending. More work is pending to fix that.
> >
> > v3:
> > - Avoid wait_for_completion() in the kernel worker since it could
> > potentially cause work item flushes from freezable processes to
> > wait forever. Instead terminate the rebind workers if needed and
> > re-launch at resume. (Matt Auld)
> >
> > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288
> > Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Cc: <stable@vger.kernel.org> # v6.16+
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_device_types.h | 6 ++++
> > drivers/gpu/drm/xe/xe_exec.c | 9 ++++++
> > drivers/gpu/drm/xe/xe_pm.c | 20 ++++++++++++
> > drivers/gpu/drm/xe/xe_vm.c | 46
> > +++++++++++++++++++++++++++-
> > drivers/gpu/drm/xe/xe_vm.h | 2 ++
> > drivers/gpu/drm/xe/xe_vm_types.h | 5 +++
> > 6 files changed, 87 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> > b/drivers/gpu/drm/xe/xe_device_types.h
> > index 092004d14db2..1e780f8a2a8c 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -507,6 +507,12 @@ struct xe_device {
> >
> > /** @pm_notifier: Our PM notifier to perform actions in
> > response to various PM events. */
> > struct notifier_block pm_notifier;
> > + /** @pm_block: Completion to block validating tasks on
> > suspend / hibernate prepare */
> > + struct completion pm_block;
> > + /** @rebind_resume_list: List of wq items to kick on
> > resume. */
> > + struct list_head rebind_resume_list;
> > + /** @rebind_resume_lock: Lock to protect the
> > rebind_resume_list */
> > + struct mutex rebind_resume_lock;
> >
> > /** @pmt: Support the PMT driver callback interface */
> > struct {
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c
> > b/drivers/gpu/drm/xe/xe_exec.c
> > index 44364c042ad7..374c831e691b 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -237,6 +237,15 @@ int xe_exec_ioctl(struct drm_device *dev, void
> > *data, struct drm_file *file)
> > goto err_unlock_list;
> > }
> >
> > + /*
> > + * It's OK to block interruptible here with the vm lock
> > held, since
> > + * on task freezing during suspend / hibernate, the call
> > will
> > + * return -ERESTARTSYS and the IOCTL will be rerun.
> > + */
> > + err = wait_for_completion_interruptible(&xe->pm_block);
> > + if (err)
> > + goto err_unlock_list;
> > +
> > vm_exec.vm = &vm->gpuvm;
> > vm_exec.flags = DRM_EXEC_INTERRUPTIBLE_WAIT;
> > if (xe_vm_in_lr_mode(vm)) {
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c
> > b/drivers/gpu/drm/xe/xe_pm.c
> > index bee9aacd82e7..6d59990ff6ba 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -297,6 +297,18 @@ static u32 vram_threshold_value(struct
> > xe_device *xe)
> > return DEFAULT_VRAM_THRESHOLD;
> > }
> >
> > +static void xe_pm_wake_preempt_workers(struct xe_device *xe)
> > +{
> > + struct list_head *link, *next;
> > +
> > + mutex_lock(&xe->rebind_resume_lock);
> > + list_for_each_safe(link, next, &xe->rebind_resume_list) {
> > + list_del_init(link);
> > + xe_vm_resume_preempt_worker(link);
> > + }
> > + mutex_unlock(&xe->rebind_resume_lock);
> > +}
> > +
> > static int xe_pm_notifier_callback(struct notifier_block *nb,
> > unsigned long action, void
> > *data)
> > {
> > @@ -306,6 +318,7 @@ static int xe_pm_notifier_callback(struct
> > notifier_block *nb,
> > switch (action) {
> > case PM_HIBERNATION_PREPARE:
> > case PM_SUSPEND_PREPARE:
> > + reinit_completion(&xe->pm_block);
> > xe_pm_runtime_get(xe);
> > err = xe_bo_evict_all_user(xe);
> > if (err)
> > @@ -322,6 +335,8 @@ static int xe_pm_notifier_callback(struct
> > notifier_block *nb,
> > break;
> > case PM_POST_HIBERNATION:
> > case PM_POST_SUSPEND:
> > + complete_all(&xe->pm_block);
> > + xe_pm_wake_preempt_workers(xe);
> > xe_bo_notifier_unprepare_all_pinned(xe);
> > xe_pm_runtime_put(xe);
> > break;
> > @@ -348,6 +363,11 @@ int xe_pm_init(struct xe_device *xe)
> > if (err)
> > return err;
> >
> > + init_completion(&xe->pm_block);
> > + complete_all(&xe->pm_block);
> > + mutex_init(&xe->rebind_resume_lock);
>
> err = drmm_mutex_init(&xe->rebind_resume_lock);
Sure.
>
> ?
>
> > + INIT_LIST_HEAD(&xe->rebind_resume_list);
> > +
> > /* For now suspend/resume is only allowed with GuC */
> > if (!xe_device_uc_enabled(xe))
> > return 0;
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > b/drivers/gpu/drm/xe/xe_vm.c
> > index f55f96bb240a..97aad1d53a8c 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -394,6 +394,9 @@ static int xe_gpuvm_validate(struct
> > drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
> > list_move_tail(&gpuva_to_vma(gpuva)-
> > >combined_links.rebind,
> > &vm->rebind_list);
> >
> > + if (!try_wait_for_completion(&vm->xe->pm_block))
> > + return -EAGAIN;
> > +
> > ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false);
> > if (ret)
> > return ret;
> > @@ -480,6 +483,37 @@ static int xe_preempt_work_begin(struct
> > drm_exec *exec, struct xe_vm *vm,
> > return xe_vm_validate_rebind(vm, exec, vm-
> > >preempt.num_exec_queues);
> > }
> >
> > +static bool vm_suspend_preempt_worker(struct xe_vm *vm)
> > +{
> > + struct xe_device *xe = vm->xe;
> > + bool ret = false;
> > +
> > + mutex_lock(&xe->rebind_resume_lock);
> > + if (!try_wait_for_completion(&vm->xe->pm_block)) {
> > + ret = true;
> > + list_move_tail(&vm->preempt.pm_activate_link, &xe-
> > >rebind_resume_list);
> > + }
> > + pr_info("Suspending %p\n", vm);
> > + mutex_unlock(&xe->rebind_resume_lock);
> > +
> > + return ret;
> > +}
> > +
> > +/**
> > + * xe_vm_resume_preempt_worker() - Resume the preempt worker.
> > + * @vm: The vm whose preempt worker to resume.
> > + *
> > + * Resume a preempt worker that was previously suspended by
> > + * vm_suspend_preempt_worker().
> > + */
> > +void xe_vm_resume_preempt_worker(struct list_head *link)
>
> I guess should use vm arg here?
Yes I tried to avoid downcasting from within xe_pm.c but since we need
to include xe_vm.h anyway, I might as well do that.
Thanks for reviewing!
/Thomas
>
> > +{
> > + struct xe_vm *vm = container_of(link, typeof(*vm),
> > preempt.pm_activate_link);
> > +
> > + pr_info("Resuming %p\n", vm);
> > + queue_work(vm->xe->ordered_wq, &vm->preempt.rebind_work);
> > +}
> > +
> > static void preempt_rebind_work_func(struct work_struct *w)
> > {
> > struct xe_vm *vm = container_of(w, struct xe_vm,
> > preempt.rebind_work);
> > @@ -503,6 +537,11 @@ static void preempt_rebind_work_func(struct
> > work_struct *w)
> > }
> >
> > retry:
> > + if (!try_wait_for_completion(&vm->xe->)) &&
> > vm_suspend_preempt_worker(vm)) {
> > + up_write(&vm->lock);
> > + return;
> > + }
> > +
> > if (xe_vm_userptr_check_repin(vm)) {
> > err = xe_vm_userptr_pin(vm);
> > if (err)
> > @@ -1741,6 +1780,7 @@ struct xe_vm *xe_vm_create(struct xe_device
> > *xe, u32 flags, struct xe_file *xef)
> > if (flags & XE_VM_FLAG_LR_MODE) {
> > INIT_WORK(&vm->preempt.rebind_work,
> > preempt_rebind_work_func);
> > xe_pm_runtime_get_noresume(xe);
> > + INIT_LIST_HEAD(&vm->preempt.pm_activate_link);
> > }
> >
> > if (flags & XE_VM_FLAG_FAULT_MODE) {
> > @@ -1922,8 +1962,12 @@ void xe_vm_close_and_put(struct xe_vm *vm)
> > xe_assert(xe, !vm->preempt.num_exec_queues);
> >
> > xe_vm_close(vm);
> > - if (xe_vm_in_preempt_fence_mode(vm))
> > + if (xe_vm_in_preempt_fence_mode(vm)) {
> > + mutex_lock(&xe->rebind_resume_lock);
> > + list_del_init(&vm->preempt.pm_activate_link);
> > + mutex_unlock(&xe->rebind_resume_lock);
> > flush_work(&vm->preempt.rebind_work);
> > + }
> > if (xe_vm_in_fault_mode(vm))
> > xe_svm_close(vm);
> >
> > diff --git a/drivers/gpu/drm/xe/xe_vm.h
> > b/drivers/gpu/drm/xe/xe_vm.h
> > index b3e5bec0fa58..f2639794278b 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.h
> > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > @@ -281,6 +281,8 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct
> > xe_vm *vm, struct xe_bo *bo,
> > struct xe_exec_queue *q,
> > u64 addr,
> > enum xe_cache_level
> > cache_lvl);
> >
> > +void xe_vm_resume_preempt_worker(struct list_head *link);
> > +
> > /**
> > * xe_vm_resv() - Return's the vm's reservation object
> > * @vm: The vm
> > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> > b/drivers/gpu/drm/xe/xe_vm_types.h
> > index b5108d010786..e1a786db5f89 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > @@ -338,6 +338,11 @@ struct xe_vm {
> > * BOs
> > */
> > struct work_struct rebind_work;
> > + /**
> > + * @preempt.pm_activate_link: Link to list of
> > rebind workers to be
> > + * kicked on resume.
> > + */
> > + struct list_head pm_activate_link;
> > } preempt;
> >
> > /** @um: unified memory state */
>
next prev parent reply other threads:[~2025-09-04 15:30 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-04 13:06 [PATCH v3 0/3] drm/xe: Fixes around eviction and suspend Thomas Hellström
2025-09-04 13:06 ` [PATCH v3 1/3] drm/xe: Attempt to bring bos back to VRAM after eviction Thomas Hellström
2025-09-04 13:06 ` [PATCH v3 2/3] drm/xe: Allow the pm notifier to continue on failure Thomas Hellström
2025-09-04 13:06 ` [PATCH v3 3/3] drm/xe: Block exec and rebind worker while evicting for suspend / hibernate Thomas Hellström
2025-09-04 15:22 ` Matthew Auld
2025-09-04 15:30 ` Thomas Hellström [this message]
2025-09-04 13:13 ` ✓ CI.KUnit: success for drm/xe: Fixes around eviction and suspend (rev3) Patchwork
2025-09-04 14:02 ` ✓ Xe.CI.BAT: " Patchwork
2025-09-05 2:52 ` ✗ Xe.CI.Full: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=053d1e397ffc03736f4a1012216f9cdc0bb53f4e.camel@linux.intel.com \
--to=thomas.hellstrom@linux.intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=maarten.lankhorst@linux.intel.com \
--cc=matthew.auld@intel.com \
--cc=matthew.brost@intel.com \
--cc=rodrigo.vivi@intel.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.