Intel-XE Archive on lore.kernel.org
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Yadav, Arvind" <arvind.yadav@intel.com>, intel-xe@lists.freedesktop.org
Cc: matthew.brost@intel.com, himal.prasad.ghimiray@intel.com
Subject: Re: [RFC PATCH 3/9] drm/xe/madvise: Implement purgeable buffer object support
Date: Thu, 30 Oct 2025 09:17:07 +0100	[thread overview]
Message-ID: <29fdde6904e00bcaaefc9148c97d7cdd833c5bbc.camel@linux.intel.com> (raw)
In-Reply-To: <59ccc1f9-efb6-4584-93ca-7114a644a851@intel.com>

On Thu, 2025-10-30 at 12:33 +0530, Yadav, Arvind wrote:
> 
> On 29-10-2025 16:21, Thomas Hellström wrote:
> > On Wed, 2025-10-29 at 09:55 +0100, Thomas Hellström wrote:
> > > On Tue, 2025-10-28 at 17:54 +0530, Arvind Yadav wrote:
> > > > This allows userspace applications to provide memory usage
> > > > hints to the kernel for better memory management under pressure:
> > > > 
> > > > - WILLNEED: BO will be needed again; re-validate if purged
> > > > - DONTNEED: BO not currently needed; may be purged under memory pressure
> > > > 
> > > > When userspace marks a BO as DONTNEED, the kernel can reclaim
> > > > its memory during memory pressure. The BO transitions to the PURGED
> > > > state when reclaimed, and attempting to access purged buffers
> > > > triggers appropriate fault handling.
> > > > 
> > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> > > > Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> > > > ---
> > > >   drivers/gpu/drm/xe/xe_bo.c         | 75 +++++++++++++++++++++++++-----
> > > >   drivers/gpu/drm/xe/xe_vm_madvise.c | 67 ++++++++++++++++++++++++++
> > > >   2 files changed, 130 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > > index cbc3ee157218..3b3eb83658cc 100644
> > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > @@ -836,6 +836,60 @@ static int xe_bo_move_notify(struct xe_bo *bo,
> > > >   	return 0;
> > > >   }
> > > >   
> > > > +static int xe_bo_invalidate_tlb_before_purge(struct xe_bo *bo)
> > > In the future someone might want to reuse this function for
> > > invalidating somewhere else. Could we perhaps rename to
> > > xe_bo_invalidate_vmas() or something like that?
> Noted.
> > > 
> > > 
> > > > +{
> > > > +	struct drm_gpuvm_bo *vm_bo;
> > > > +	struct drm_gpuva *gpuva;
> > > > +	struct drm_gem_object *obj = &bo->ttm.base;
> > > > +	int ret;
> > > > +
> > > > +	/* BO must be locked before invalidating */
> > > > +	dma_resv_assert_held(bo->ttm.base.resv);
> > > > +
> > > > +	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
> > > > +		drm_gpuvm_bo_for_each_va(gpuva, vm_bo) {
> > > > +			struct xe_vma *vma = gpuva_to_vma(gpuva);
> > > > +
> > > > +			ret = xe_vm_invalidate_vma(vma);
> > > > +			if (ret)
> > > > +				return ret;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static void xe_bo_set_purged(struct xe_bo *bo)
> > > > +{
> > > > +	/* BO must be locked before modifying madv state */
> > > > +	dma_resv_assert_held(bo->ttm.base.resv);
> > > > +
> > > > +	atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_PURGED);
> > > > +}
> > > > +
> > > > +static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
> > > > +{
> > > > +	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
> > > > +	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
> > > > +
> > > > +	if (ttm_bo->ttm) {
> > > > +		struct ttm_placement place = {};
> > > > +		int ret = ttm_bo_validate(ttm_bo, &place, ctx);
> > > > +		int ret_inval;
> > > Christian from AMD once mentioned that instead of implicitly
> > > calling
> > > ttm_bo_validate() with an empty placement, we could send the null
> > > placement through the evict_flags callback. Would that work?
> > > 
> > > 
> > Actually it doesn't since we don't get to call move_notify.
> Agreed,
> > > 
> > > > +
> > > > +		drm_WARN_ON(&xe->drm, ret);
> > > > +		if (!ret && bo) {
> > > > +			if (atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_DONTNEED) {
> > > > +				/* Invalidate TLB before marking BO as purged */
> > > > +				ret_inval = xe_bo_invalidate_tlb_before_purge(bo);
> > > Since the page-table update and page-freeing are really intended to
> > > be an asynchronous operation, and the GPU bindings are intended to be
> > > invalidated in move_notify() / trigger_rebind() where we properly
> > > take care of special cases like faulting VMs etc., can we move the
> > > invalidation logic there?
> > > 
> > > Perhaps it is even possible to skip the synchronous page-table
> > > zeroing here in favour of a NULL rebind (when rebinding a purged BO
> > > we set up all-zero mappings, or whatever mappings are required given
> > > scratch-page mode etc.). Then the page-table clearing will be
> > > properly inserted in the asynchronous execution.
> > > 
> 
> My understanding is that you are suggesting two main changes:
> 
> 1. Asynchronous invalidation: I should remove the synchronous
> xe_bo_invalidate_tlb_before_purge() call and instead rely on the
> existing asynchronous invalidation path that is already triggered
> during eviction via xe_bo_move_notify() -> xe_bo_trigger_rebind().
> This will handle the TLB invalidation correctly and more efficiently.
> 
> 2. NULL rebind on access (when VM faulting mode is enabled): When a GPU
> operation tries to bind a buffer that is in the PURGED state, the
> driver should not return an error. Instead, it should perform a "NULL
> rebind" by mapping the buffer's VMA to a scratch page. This ensures the
> GPU reads safe, zeroed data instead of accessing invalid memory.

Yes. We want scratch PTEs if the VM is scratch-page-enabled, and the
"clear_pt" mode otherwise.

> 
> The BO would only be re-allocated and re-bound when userspace
> explicitly calls WILLNEED.
> I'll now look into implementing the NULL rebind logic within the VMA
> mapping path and will follow up with an updated patch.

IMO we should try to mimic the i915 behaviour here, since it's already
implemented in mesa: That is, if we call WILLNEED on an already purged
bo, we return an error.

/Thomas
> 
> ~Arvind
> > > > +				if (!ret_inval)
> > > > +					xe_bo_set_purged(bo);
> > > > +
> > > > +			}
> > > > +		}
> > > 
> > > 
> > > > +	}
> > > > +}
> > > > +
> > > >   static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> > > >   		      struct ttm_operation_ctx *ctx,
> > > >   		      struct ttm_resource *new_mem,
> > > > @@ -853,8 +907,14 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> > > >   	bool needs_clear;
> > > >   	bool handle_system_ccs = (!IS_DGFX(xe) && xe_bo_needs_ccs_pages(bo) &&
> > > >   				  ttm && ttm_tt_is_populated(ttm)) ? true : false;
> > > > +	int state = atomic_read(&bo->madv_purgeable);
> > > >   	int ret = 0;
> > > >   
> > > > +	if (evict && state == XE_MADV_PURGEABLE_DONTNEED) {
> > > > +		xe_ttm_bo_purge(ttm_bo, ctx);
> > > > +		return 0;
> > > > +	}
> > > > +
> > > >   	/* Bo creation path, moving to system or TT. */
> > > >   	if ((!old_mem && ttm) && !handle_system_ccs) {
> > > >   		if (new_mem->mem_type == XE_PL_TT)
> > > > @@ -1606,18 +1666,6 @@ static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
> > > >   	}
> > > >   }
> > > >   
> > > > -static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
> > > > -{
> > > > -	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
> > > > -
> > > > -	if (ttm_bo->ttm) {
> > > > -		struct ttm_placement place = {};
> > > > -		int ret = ttm_bo_validate(ttm_bo, &place, ctx);
> > > > -
> > > > -		drm_WARN_ON(&xe->drm, ret);
> > > > -	}
> > > > -}
> > > > -
> > > >   static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo)
> > > >   {
> > > >   	struct ttm_operation_ctx ctx = {
> > > > @@ -2472,6 +2520,9 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe,
> > > >   				       ttm_bo_type_device, flags, 0, true);
> > > >   	}
> > > >   
> > > > +	/* Initialize purge advisory state */
> > > > +	atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
> > > > +
> > > >   	return bo;
> > > >   }
> > > >   
> > > > diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> > > > index cad3cf627c3f..1f0356ea4403 100644
> > > > --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> > > > +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> > > > @@ -158,6 +158,54 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
> > > >   	}
> > > >   }
> > > >   
> > > > +/*
> > > > + * Handle purgeable buffer object advice for DONTNEED/WILLNEED/PURGED.
> > > > + */
> > > > +static void xe_vm_madvise_purgeable_bo(struct xe_device *xe, struct xe_vm *vm,
> > > > +				       struct xe_vma **vmas, int num_vmas,
> > > > +				       struct drm_xe_madvise *op, struct drm_exec *exec)
> > > > +{
> > > > +	xe_assert(vm->xe, op->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE);
> > > > +
> > > > +	for (int i = 0; i < num_vmas; i++) {
> > > > +		struct xe_bo *bo = xe_vma_bo(vmas[i]);
> > > > +		int state;
> > > > +		int ret;
> > > > +
> > > > +		if (!bo)
> > > > +			continue;
> > > > +
> > > > +		/* BO must be locked before modifying madv state */
> > > > +		dma_resv_assert_held(bo->ttm.base.resv);
> > > > +
> > > > +		switch (op->purge_state_val.val) {
> > > > +		case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
> > > > +			state = atomic_read(&bo->madv_purgeable);
> > > > +			if (state == XE_MADV_PURGEABLE_PURGED) {
> > > > +				ret = xe_bo_validate(bo, NULL, true, exec);
> > > > +				if (ret) {
> > > > +					drm_err(&vm->xe->drm,
> > > > +						"Failed to validate purged BO: %d\n", ret);
> > > > +					return;
> > > > +				}
> > > > +			}
> > > > +			atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
> > > > +			break;
> > > > +		case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
> > > > +			state = atomic_read(&bo->madv_purgeable);
> > > > +			if (state != XE_MADV_PURGEABLE_PURGED)
> > > > +				atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_DONTNEED);
> > > > +			break;
> > > > +		default:
> > > > +			drm_warn(&vm->xe->drm, "Invalid madvise value = %d\n",
> > > > +				 op->purge_state_val.val);
> > > > +			return;
> > > > +		}
> > > > +	}
> > > > +}
> > > > +
> > > >   typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
> > > >   			     struct xe_vma **vmas, int num_vmas,
> > > >   			     struct drm_xe_madvise *op);
> > > > @@ -283,6 +331,19 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
> > > >   			return false;
> > > >   		break;
> > > >   	}
> > > > +	case DRM_XE_VMA_ATTR_PURGEABLE_STATE:
> > > > +	{
> > > > +		u32 val = args->purge_state_val.val;
> > > > +
> > > > +		if (XE_IOCTL_DBG(xe, !((val == DRM_XE_VMA_PURGEABLE_STATE_WILLNEED) ||
> > > > +				       (val == DRM_XE_VMA_PURGEABLE_STATE_DONTNEED))))
> > > > +			return false;
> > > > +
> > > > +		if (XE_IOCTL_DBG(xe, args->purge_state_val.reserved))
> > > > +			return false;
> > > > +
> > > > +		break;
> > > > +	}
> > > >   	default:
> > > >   		if (XE_IOCTL_DBG(xe, 1))
> > > >   			return false;
> > > > @@ -402,6 +463,12 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
> > > >   					goto err_fini;
> > > >   			}
> > > >   		}
> > > > +		if (args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE) {
> > > > +			xe_vm_madvise_purgeable_bo(xe, vm, madvise_range.vmas,
> > > > +						   madvise_range.num_vmas, args, &exec);
> > > > +			goto err_fini;
> > > > +		}
> > > >   	}
> > > >   
> > > >   	if (madvise_range.has_svm_userptr_vmas) {


Thread overview: 22+ messages
2025-10-28 12:24 [RFC PATCH 0/9] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
2025-10-28 12:24 ` [RFC PATCH 1/9] drm/xe/uapi: Add UAPI " Arvind Yadav
2025-10-28 12:24 ` [RFC PATCH 2/9] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo Arvind Yadav
2025-10-28 12:24 ` [RFC PATCH 3/9] drm/xe/madvise: Implement purgeable buffer object support Arvind Yadav
2025-10-29  8:55   ` Thomas Hellström
2025-10-29 10:51     ` Thomas Hellström
2025-10-30  7:03       ` Yadav, Arvind
2025-10-30  8:17         ` Thomas Hellström [this message]
2025-11-06  9:58           ` Yadav, Arvind
2025-10-28 12:24 ` [RFC PATCH 4/9] drm/xe/bo: Prevent purging of shared buffer objects Arvind Yadav
2025-10-28 12:24 ` [RFC PATCH 5/9] drm/xe/bo: Handle CPU faults on purged " Arvind Yadav
2025-10-28 12:24 ` [RFC PATCH 6/9] drm/xe/bo: Prevent mmap of " Arvind Yadav
2025-10-28 12:24 ` [RFC PATCH 7/9] drm/xe/vm: Prevent binding " Arvind Yadav
2025-10-28 12:24 ` [RFC PATCH 8/9] drm/xe/uapi: Add UAPI for purgeable bo state to madvise query response Arvind Yadav
2025-10-28 12:24 ` [RFC PATCH 9/9] drm/xe: Add support for querying purgeable BO states Arvind Yadav
2025-10-28 12:37 ` [RFC PATCH 0/9] drm/xe/madvise: Add support for purgeable buffer objects Thomas Hellström
2025-10-28 13:02   ` Matthew Auld
2025-10-29  8:40     ` Yadav, Arvind
2025-10-28 13:23 ` ✗ CI.checkpatch: warning for " Patchwork
2025-10-28 13:24 ` ✓ CI.KUnit: success " Patchwork
2025-10-28 14:12 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-10-28 19:44 ` ✗ Xe.CI.Full: " Patchwork
