From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Auld <matthew.auld@intel.com>, intel-xe@lists.freedesktop.org
Cc: Matthew Brost <matthew.brost@intel.com>,
Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
Jani Nikula <jani.nikula@intel.com>,
Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Subject: Re: [PATCH 15/15] drm/xe: Convert pinned suspend eviction for exhaustive eviction
Date: Wed, 13 Aug 2025 14:30:40 +0200 [thread overview]
Message-ID: <bdf454972b01f15a6144336f623df20fcd66d83c.camel@linux.intel.com>
In-Reply-To: <bce1c846-bdf9-4bf8-ba92-1baa896e2f70@intel.com>
On Wed, 2025-08-13 at 13:13 +0100, Matthew Auld wrote:
> Hi,
>
> On 13/08/2025 11:51, Thomas Hellström wrote:
> > Pinned suspend eviction and preparation for eviction validate
> > system memory for eviction buffers. Do that under a
> > validation exclusive lock to avoid interfering with other
> > processes validating system graphics memory.
> >
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_bo.c | 205 +++++++++++++++++++------------------
> >   1 file changed, 108 insertions(+), 97 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > index 82bf158426ad..efb9c88b6aa7 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -1139,43 +1139,47 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
> >  int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
> >  {
> >  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> >  	struct xe_bo *backup;
> >  	int ret = 0;
> >
> > -	xe_bo_lock(bo, false);
> > +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, true) {
>
> Ah, this reminded me of
> https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288
>
> Could this help with that? If you could maybe keep the exclusive mode
> turned on for the entire prepare/unprepare stage, to ensure other
> execs/validates back off until we are done?
I have a fix in the pipe for that, but I reverted it just before
sending since the whole suspend / resume handling needs some interface
additions, and I'm also seeing some false lockdep errors.

Basically, xe_validation_guard is set up to block on an interruptible
struct completion if we're suspending. ATM we can only do this for
freezable tasks that allow interruptible waits; any other task would
block on the completion until freezing time and then never freeze. For
tasks that do allow interruptible waits, the wait is interrupted when
freezing starts, -ERESTARTSYS is returned, and the task is frozen in
the signal delivery code. This part I have verified to work.
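
To illustrate, a minimal sketch of that gating, assuming a hypothetical
suspend_gate completion in the validation device (the actual interface
additions aren't part of this series):

#include <linux/completion.h>

/* Hypothetical; the real state would live behind xe->val. */
struct xe_validation_device {
	/* Reinitialized on suspend, completed again on resume. */
	struct completion suspend_gate;
};

static int xe_validation_block_on_suspend(struct xe_validation_device *val)
{
	/*
	 * Only safe for freezable tasks doing interruptible waits:
	 * the freezer interrupts the wait, -ERESTARTSYS is returned,
	 * and the task freezes in the signal delivery code instead
	 * of sleeping here for the whole suspend cycle.
	 */
	return wait_for_completion_interruptible(&val->suspend_gate);
}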

This should take care of most validations during suspend, but not the
rebind worker. I figure we can handle that as well by using a freezable
workqueue: if the worker receives an -EINTR or -ERESTARTSYS it doesn't
error out but simply requeues itself, since newly added work items
aren't run until the wqs are thawed. This part I haven't verified,
though.
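
Roughly along these lines, as a sketch with made-up names (rebind_wq,
rebind_validate) rather than the actual rebind worker code:

#include <linux/workqueue.h>

/* Assumed created with alloc_workqueue("xe-rebind", WQ_FREEZABLE, 0). */
static struct workqueue_struct *rebind_wq;

/* Hypothetical validation step; may return -EINTR / -ERESTARTSYS. */
static int rebind_validate(struct work_struct *work);

static void rebind_work_func(struct work_struct *work)
{
	int err = rebind_validate(work);

	if (err == -EINTR || err == -ERESTARTSYS) {
		/*
		 * Interrupted for freezing: don't treat this as an
		 * error but requeue; on a freezable workqueue the
		 * requeued item won't run until the wq is thawed
		 * after resume.
		 */
		queue_work(rebind_wq, work);
	}
}
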
Finally, this assumes that the validations done from uninterruptible
contexts are too few to cause any problems during suspend.

/Thomas
>
> > +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> > +		drm_exec_retry_on_contention(&exec);
> > +		xe_assert(xe, !ret);
> > +		xe_assert(xe, !bo->backup_obj);
> >
> > -	xe_assert(xe, !bo->backup_obj);
> > +		/*
> > +		 * Since this is called from the PM notifier we might have raced with
> > +		 * someone unpinning this after we dropped the pinned list lock and
> > +		 * grabbing the above bo lock.
> > +		 */
> > +		if (!xe_bo_is_pinned(bo))
> > +			break;
> >
> > -	/*
> > -	 * Since this is called from the PM notifier we might have raced with
> > -	 * someone unpinning this after we dropped the pinned list lock and
> > -	 * grabbing the above bo lock.
> > -	 */
> > -	if (!xe_bo_is_pinned(bo))
> > -		goto out_unlock_bo;
> > +		if (!xe_bo_is_vram(bo))
> > +			break;
> >
> > -	if (!xe_bo_is_vram(bo))
> > -		goto out_unlock_bo;
> > +		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> > +			break;
> >
> > -	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> > -		goto out_unlock_bo;
> > +		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> > +					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> > +					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > +					   XE_BO_FLAG_PINNED, &exec);
> > +		if (IS_ERR(backup)) {
> > +			drm_exec_retry_on_contention(&exec);
> > +			ret = PTR_ERR(backup);
> > +			xe_validation_retry_on_oom(&ctx, &ret);
> > +			break;
> > +		}
> >
> > -	backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> > -				   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> > -				   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > -				   XE_BO_FLAG_PINNED, exec);
> > -	if (IS_ERR(backup)) {
> > -		ret = PTR_ERR(backup);
> > -		goto out_unlock_bo;
> > +		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> > +		ttm_bo_pin(&backup->ttm);
> > +		bo->backup_obj = backup;
> >  	}
> >
> > -	backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> > -	ttm_bo_pin(&backup->ttm);
> > -	bo->backup_obj = backup;
> > -
> > -out_unlock_bo:
> > -	xe_bo_unlock(bo);
> >  	return ret;
> >  }
> >
> > @@ -1215,99 +1219,106 @@ int xe_bo_notifier_unprepare_pinned(struct xe_bo *bo)
> >  int xe_bo_evict_pinned(struct xe_bo *bo)
> >  {
> >  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> >  	struct xe_bo *backup = bo->backup_obj;
> >  	bool backup_created = false;
> >  	bool unmap = false;
> >  	int ret = 0;
> >
> > -	xe_bo_lock(bo, false);
> > +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, true) {
> > +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> > +		drm_exec_retry_on_contention(&exec);
> > +		xe_assert(xe, !ret);
> >
> > -	if (WARN_ON(!bo->ttm.resource)) {
> > -		ret = -EINVAL;
> > -		goto out_unlock_bo;
> > -	}
> > +		if (WARN_ON(!bo->ttm.resource)) {
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> >
> > -	if (WARN_ON(!xe_bo_is_pinned(bo))) {
> > -		ret = -EINVAL;
> > -		goto out_unlock_bo;
> > -	}
> > +		if (WARN_ON(!xe_bo_is_pinned(bo))) {
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> >
> > -	if (!xe_bo_is_vram(bo))
> > -		goto out_unlock_bo;
> > +		if (!xe_bo_is_vram(bo))
> > +			break;
> >
> > -	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> > -		goto out_unlock_bo;
> > +		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> > +			break;
> >
> > -	if (!backup) {
> > -		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> > -					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> > -					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > -					   XE_BO_FLAG_PINNED, exec);
> > -		if (IS_ERR(backup)) {
> > -			ret = PTR_ERR(backup);
> > -			goto out_unlock_bo;
> > +		if (!backup) {
> > +			backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL,
> > +						   xe_bo_size(bo),
> > +						   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> > +						   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > +						   XE_BO_FLAG_PINNED, &exec);
> > +			if (IS_ERR(backup)) {
> > +				drm_exec_retry_on_contention(&exec);
> > +				ret = PTR_ERR(backup);
> > +				xe_validation_retry_on_oom(&ctx, &ret);
> > +				break;
> > +			}
> > +			backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> > +			backup_created = true;
> >  		}
> > -		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> > -		backup_created = true;
> > -	}
> >
> > -	if (xe_bo_is_user(bo) || (bo->flags & XE_BO_FLAG_PINNED_LATE_RESTORE)) {
> > -		struct xe_migrate *migrate;
> > -		struct dma_fence *fence;
> > -
> > -		if (bo->tile)
> > -			migrate = bo->tile->migrate;
> > -		else
> > -			migrate = mem_type_to_migrate(xe, bo->ttm.resource->mem_type);
> > +		if (xe_bo_is_user(bo) || (bo->flags & XE_BO_FLAG_PINNED_LATE_RESTORE)) {
> > +			struct xe_migrate *migrate;
> > +			struct dma_fence *fence;
> >
> > -		ret = dma_resv_reserve_fences(bo->ttm.base.resv, 1);
> > -		if (ret)
> > -			goto out_backup;
> > +			if (bo->tile)
> > +				migrate = bo->tile->migrate;
> > +			else
> > +				migrate = mem_type_to_migrate(xe, bo->ttm.resource->mem_type);
> >
> > -		ret = dma_resv_reserve_fences(backup->ttm.base.resv, 1);
> > -		if (ret)
> > -			goto out_backup;
> > +			ret = dma_resv_reserve_fences(bo->ttm.base.resv, 1);
> > +			if (ret)
> > +				goto out_backup;
> >
> > -		fence = xe_migrate_copy(migrate, bo, backup, bo->ttm.resource,
> > -					backup->ttm.resource, false);
> > -		if (IS_ERR(fence)) {
> > -			ret = PTR_ERR(fence);
> > -			goto out_backup;
> > -		}
> > +			ret = dma_resv_reserve_fences(backup->ttm.base.resv, 1);
> > +			if (ret)
> > +				goto out_backup;
> >
> > -		dma_resv_add_fence(bo->ttm.base.resv, fence,
> > -				   DMA_RESV_USAGE_KERNEL);
> > -		dma_resv_add_fence(backup->ttm.base.resv, fence,
> > -				   DMA_RESV_USAGE_KERNEL);
> > -		dma_fence_put(fence);
> > -	} else {
> > -		ret = xe_bo_vmap(backup);
> > -		if (ret)
> > -			goto out_backup;
> > +			fence = xe_migrate_copy(migrate, bo, backup, bo->ttm.resource,
> > +						backup->ttm.resource, false);
> > +			if (IS_ERR(fence)) {
> > +				ret = PTR_ERR(fence);
> > +				goto out_backup;
> > +			}
> >
> > -		if (iosys_map_is_null(&bo->vmap)) {
> > -			ret = xe_bo_vmap(bo);
> > +			dma_resv_add_fence(bo->ttm.base.resv, fence,
> > +					   DMA_RESV_USAGE_KERNEL);
> > +			dma_resv_add_fence(backup->ttm.base.resv, fence,
> > +					   DMA_RESV_USAGE_KERNEL);
> > +			dma_fence_put(fence);
> > +		} else {
> > +			ret = xe_bo_vmap(backup);
> >  			if (ret)
> >  				goto out_backup;
> > -			unmap = true;
> > -		}
> >
> > -		xe_map_memcpy_from(xe, backup->vmap.vaddr, &bo->vmap, 0,
> > -				   xe_bo_size(bo));
> > -	}
> > +			if (iosys_map_is_null(&bo->vmap)) {
> > +				ret = xe_bo_vmap(bo);
> > +				if (ret)
> > +					goto out_vunmap;
> > +				unmap = true;
> > +			}
> >
> > -	if (!bo->backup_obj)
> > -		bo->backup_obj = backup;
> > +			xe_map_memcpy_from(xe, backup->vmap.vaddr, &bo->vmap, 0,
> > +					   xe_bo_size(bo));
> > +		}
> >
> > +		if (!bo->backup_obj)
> > +			bo->backup_obj = backup;
> > +out_vunmap:
> > +		xe_bo_vunmap(backup);
> >  out_backup:
> > -	xe_bo_vunmap(backup);
> > -	if (ret && backup_created)
> > -		xe_bo_put(backup);
> > -out_unlock_bo:
> > -	if (unmap)
> > -		xe_bo_vunmap(bo);
> > -	xe_bo_unlock(bo);
> > +		if (ret && backup_created)
> > +			xe_bo_put(backup);
> > +		if (unmap)
> > +			xe_bo_vunmap(bo);
> > +	}
> > +
> >  	return ret;
> >  }
> >
>
Thread overview: 66+ messages in thread
2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
2025-08-13 10:51 ` [PATCH 01/15] drm/xe/vm: Don't use a pin the vm_resv during validation Thomas Hellström
2025-08-13 14:28 ` Matthew Brost
2025-08-13 14:33 ` Thomas Hellström
2025-08-13 15:17 ` Matthew Brost
2025-08-13 10:51 ` [PATCH 02/15] drm/xe/tests/xe_dma_buf: Set the drm_object::dma_buf member Thomas Hellström
2025-08-14 2:52 ` Matthew Brost
2025-08-13 10:51 ` [PATCH 03/15] drm/xe/vm: Clear the scratch_pt pointer on error Thomas Hellström
2025-08-13 14:45 ` Matthew Brost
2025-08-13 10:51 ` [PATCH 04/15] drm/xe: Pass down drm_exec context to validation Thomas Hellström
2025-08-13 16:42 ` Matthew Brost
2025-08-14 7:49 ` Thomas Hellström
2025-08-14 19:09 ` Matthew Brost
2025-08-22 7:40 ` Thomas Hellström
2025-08-13 10:51 ` [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec Thomas Hellström
2025-08-13 17:25 ` Matthew Brost
2025-08-15 15:04 ` Thomas Hellström
2025-08-14 2:33 ` Matthew Brost
2025-08-14 4:23 ` Matthew Brost
2025-08-15 15:23 ` Thomas Hellström
2025-08-15 19:01 ` Matthew Brost
2025-08-17 14:05 ` [05/15] " Simon Richter
2025-08-18 2:19 ` Matthew Brost
2025-08-18 5:24 ` Simon Richter
2025-08-18 9:19 ` Thomas Hellström
2025-08-13 10:51 ` [PATCH 06/15] drm/xe: Convert xe_bo_create_user() for exhaustive eviction Thomas Hellström
2025-08-14 2:23 ` Matthew Brost
2025-08-13 10:51 ` [PATCH 07/15] drm/xe: Convert SVM validation " Thomas Hellström
2025-08-13 15:32 ` Matthew Brost
2025-08-14 12:24 ` Thomas Hellström
2025-08-13 10:51 ` [PATCH 08/15] drm/xe: Convert existing drm_exec transactions " Thomas Hellström
2025-08-14 2:48 ` Matthew Brost
2025-08-13 10:51 ` [PATCH 09/15] drm/xe: Convert the CPU fault handler " Thomas Hellström
2025-08-13 22:06 ` Matthew Brost
2025-08-15 15:16 ` Thomas Hellström
2025-08-15 19:04 ` Matthew Brost
2025-08-18 9:11 ` Thomas Hellström
2025-08-13 10:51 ` [PATCH 10/15] drm/xe/display: Convert __xe_pin_fb_vma() Thomas Hellström
2025-08-14 2:35 ` Matthew Brost
2025-08-13 10:51 ` [PATCH 11/15] drm/xe: Convert xe_dma_buf.c for exhaustive eviction Thomas Hellström
2025-08-13 21:37 ` Matthew Brost
2025-08-15 15:05 ` Thomas Hellström
2025-08-14 20:37 ` Matthew Brost
2025-08-15 6:57 ` Thomas Hellström
2025-08-13 10:51 ` [PATCH 12/15] drm/xe: Rename ___xe_bo_create_locked() Thomas Hellström
2025-08-13 21:33 ` Matthew Brost
2025-08-13 10:51 ` [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction Thomas Hellström
2025-08-14 3:58 ` Matthew Brost
2025-08-15 15:25 ` Thomas Hellström
2025-08-14 4:05 ` Matthew Brost
2025-08-15 15:27 ` Thomas Hellström
2025-08-14 18:48 ` Matthew Brost
2025-08-15 9:37 ` Thomas Hellström
2025-08-13 10:51 ` [PATCH 14/15] drm/xe: Convert xe_bo_create_pin_map() " Thomas Hellström
2025-08-14 4:18 ` Matthew Brost
2025-08-14 13:14 ` Thomas Hellström
2025-08-14 18:39 ` Matthew Brost
2025-08-13 10:51 ` [PATCH 15/15] drm/xe: Convert pinned suspend eviction " Thomas Hellström
2025-08-13 12:13 ` Matthew Auld
2025-08-13 12:30 ` Thomas Hellström [this message]
2025-08-14 20:30 ` Matthew Brost
2025-08-15 15:29 ` Thomas Hellström
2025-08-13 11:54 ` ✗ CI.checkpatch: warning for Driver-managed " Patchwork
2025-08-13 11:55 ` ✓ CI.KUnit: success " Patchwork
2025-08-13 13:20 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-08-13 14:25 ` ✗ Xe.CI.Full: " Patchwork