From: "Yadav, Arvind" <arvind.yadav@intel.com>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
intel-xe@lists.freedesktop.org
Cc: <matthew.brost@intel.com>, <himal.prasad.ghimiray@intel.com>
Subject: Re: [RFC PATCH 3/9] drm/xe/madvise: Implement purgeable buffer object support
Date: Thu, 30 Oct 2025 12:33:03 +0530
Message-ID: <59ccc1f9-efb6-4584-93ca-7114a644a851@intel.com>
In-Reply-To: <93be3dde56c24ce83c28a2dfe3bffeaf9a47b25d.camel@linux.intel.com>
On 29-10-2025 16:21, Thomas Hellström wrote:
> On Wed, 2025-10-29 at 09:55 +0100, Thomas Hellström wrote:
>> On Tue, 2025-10-28 at 17:54 +0530, Arvind Yadav wrote:
>>> This allows userspace applications to provide memory usage hints to
>>> the kernel for better memory management under pressure:
>>>
>>> - WILLNEED: BO will be needed again; re-validate it if purged
>>> - DONTNEED: BO is not currently needed and may be purged under
>>>   memory pressure
>>>
>>> When userspace marks a BO as DONTNEED, the kernel can reclaim its
>>> memory during memory pressure. The BO transitions to the PURGED
>>> state when reclaimed, and attempting to access a purged buffer
>>> triggers appropriate fault handling.
>>>
>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_bo.c         | 75 +++++++++++++++++++++++++-----
>>> drivers/gpu/drm/xe/xe_vm_madvise.c | 67 ++++++++++++++++++++++++++
>>> 2 files changed, 130 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>> index cbc3ee157218..3b3eb83658cc 100644
>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>> @@ -836,6 +836,60 @@ static int xe_bo_move_notify(struct xe_bo *bo,
>>> return 0;
>>> }
>>>
>>> +static int xe_bo_invalidate_tlb_before_purge(struct xe_bo *bo)
>> In the future someone might want to reuse this function for
>> invalidating somewhere else. Could we perhaps rename to
>> xe_bo_invalidate_vmas() or something like that?
Noted.
>>
>>
>>> +{
>>> + struct drm_gpuvm_bo *vm_bo;
>>> + struct drm_gpuva *gpuva;
>>> + struct drm_gem_object *obj = &bo->ttm.base;
>>> + int ret;
>>> +
>>> + /* BO must be locked before invalidating */
>>> + dma_resv_assert_held(bo->ttm.base.resv);
>>> +
>>> + drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>> + drm_gpuvm_bo_for_each_va(gpuva, vm_bo) {
>>> + struct xe_vma *vma = gpuva_to_vma(gpuva);
>>> +
>>> + ret = xe_vm_invalidate_vma(vma);
>>> + if (ret)
>>> + return ret;
>>> + }
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void xe_bo_set_purged(struct xe_bo *bo)
>>> +{
>>> + /* BO must be locked before modifying madv state */
>>> + dma_resv_assert_held(bo->ttm.base.resv);
>>> +
>>> + atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_PURGED);
>>> +}
>>> +
>>> +static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
>>> +{
>>> + struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
>>> + struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
>>> +
>>> + if (ttm_bo->ttm) {
>>> + struct ttm_placement place = {};
>>> + int ret = ttm_bo_validate(ttm_bo, &place, ctx);
>>> + int ret_inval;
>> Christian from AMD once mentioned that instead of implicitly calling
>> ttm_bo_validate() with an empty placement, we could send the null
>> placement through the evict_flags callback. Would that work?
>>
>>
> Actually it doesn't since we don't get to call move_notify.
Agreed.
>>
>>> +
>>> + drm_WARN_ON(&xe->drm, ret);
>>> + if (!ret && bo) {
>>> + if (atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_DONTNEED) {
>>> + /* Invalidate TLB before marking BO as purged */
>>> + ret_inval = xe_bo_invalidate_tlb_before_purge(bo);
>> Since the page-table update and page-freeing is really intended to be
>> an asynchronous operation, and the GPU bindings are intended to be
>> invalidated in move_notify() / trigger_rebind() where we properly
>> take
>> care of special cases like faulting VMs etc, can we move the
>> invalidation logic there?
>>
>> Perhaps it is even possible to skip the synchronous page-table
>> zeroing
>> here in favour of a NULL rebind (when rebinding a purged BO we set up
>> all zero mappings, or whatever mappings are required given scratch
>> page
>> mode etc.) Then the page-table clearing will be properly inserted in
>> the asynchronous execution.
>>
My understanding is that you are suggesting two main changes:
1. Asynchronous invalidation: remove the synchronous
xe_bo_invalidate_tlb_before_purge() call and instead rely on the
existing asynchronous invalidation path already triggered during
eviction via xe_bo_move_notify() -> xe_bo_trigger_rebind(). That path
handles the TLB invalidation correctly and more efficiently.
2. NULL rebind on access (when VM faulting mode is enabled): when a GPU
operation tries to bind a buffer that is in the PURGED state, the
driver should not return an error. Instead, it should perform a "NULL
rebind" by mapping the buffer's VMA to a scratch page, so the GPU reads
safe, zeroed data instead of accessing invalid memory. The BO would
only be re-allocated and re-bound when userspace explicitly calls
WILLNEED.
I'll now look into implementing the NULL rebind logic within the VMA
mapping path and will follow up with an updated patch.
~Arvind
>>> + if (!ret_inval)
>>> + xe_bo_set_purged(bo);
>>> +
>>> + }
>>> + }
>>
>>
>>> + }
>>> +}
>>> +
>>> static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>>> struct ttm_operation_ctx *ctx,
>>> struct ttm_resource *new_mem,
>>> @@ -853,8 +907,14 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>>> bool needs_clear;
>>> bool handle_system_ccs = (!IS_DGFX(xe) && xe_bo_needs_ccs_pages(bo) &&
>>> ttm && ttm_tt_is_populated(ttm)) ? true : false;
>>> + int state = atomic_read(&bo->madv_purgeable);
>>> int ret = 0;
>>>
>>> + if (evict && state == XE_MADV_PURGEABLE_DONTNEED) {
>>> + xe_ttm_bo_purge(ttm_bo, ctx);
>>> + return 0;
>>> + }
>>> +
>>> /* Bo creation path, moving to system or TT. */
>>> if ((!old_mem && ttm) && !handle_system_ccs) {
>>> if (new_mem->mem_type == XE_PL_TT)
>>> @@ -1606,18 +1666,6 @@ static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
>>> }
>>> }
>>>
>>> -static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
>>> -{
>>> - struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
>>> -
>>> - if (ttm_bo->ttm) {
>>> - struct ttm_placement place = {};
>>> - int ret = ttm_bo_validate(ttm_bo, &place, ctx);
>>> -
>>> - drm_WARN_ON(&xe->drm, ret);
>>> - }
>>> -}
>>> -
>>> static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo)
>>> {
>>> struct ttm_operation_ctx ctx = {
>>> @@ -2472,6 +2520,9 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe,
>>> ttm_bo_type_device, flags, 0, true);
>>> }
>>>
>>> + /* Initialize purge advisory state */
>>> + atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
>>> +
>>> return bo;
>>> }
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
>>> index cad3cf627c3f..1f0356ea4403 100644
>>> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
>>> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
>>> @@ -158,6 +158,54 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
>>> }
>>> }
>>>
>>> +/*
>>> + * Handle purgeable buffer object advice for DONTNEED/WILLNEED.
>>> + */
>>> +static void xe_vm_madvise_purgeable_bo(struct xe_device *xe, struct xe_vm *vm,
>>> + struct xe_vma **vmas, int num_vmas,
>>> + struct drm_xe_madvise *op, struct drm_exec *exec)
>>> +{
>>> +
>>> + xe_assert(vm->xe, op->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE);
>>> +
>>> + for (int i = 0; i < num_vmas; i++) {
>>> + struct xe_bo *bo = xe_vma_bo(vmas[i]);
>>> + int state;
>>> + int ret;
>>> +
>>> + if (!bo)
>>> + continue;
>>> +
>>> + /* BO must be locked before modifying madv state */
>>> + dma_resv_assert_held(bo->ttm.base.resv);
>>> +
>>> + switch (op->purge_state_val.val) {
>>> + case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
>>> + state = atomic_read(&bo->madv_purgeable);
>>> + if (state == XE_MADV_PURGEABLE_PURGED) {
>>> + ret = xe_bo_validate(bo, NULL, true, exec);
>>> + if (ret) {
>>> + drm_err(&vm->xe->drm,
>>> + "Failed to validate purged BO: %d\n", ret);
>>> + return;
>>> + }
>>> + }
>>> + atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
>>> + break;
>>> + case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
>>> + state = atomic_read(&bo->madv_purgeable);
>>> + if (state != XE_MADV_PURGEABLE_PURGED)
>>> + atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_DONTNEED);
>>> + break;
>>> + default:
>>> + drm_warn(&vm->xe->drm, "Invalid madvise value = %d\n",
>>> + op->purge_state_val.val);
>>> + return;
>>> + }
>>> + }
>>> +}
>>> +
>>> typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
>>> struct xe_vma **vmas, int num_vmas,
>>> struct drm_xe_madvise *op);
>>> @@ -283,6 +331,19 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
>>> return false;
>>> break;
>>> }
>>> + case DRM_XE_VMA_ATTR_PURGEABLE_STATE:
>>> + {
>>> + u32 val = args->purge_state_val.val;
>>> +
>>> + if (XE_IOCTL_DBG(xe, !((val == DRM_XE_VMA_PURGEABLE_STATE_WILLNEED) ||
>>> + (val == DRM_XE_VMA_PURGEABLE_STATE_DONTNEED))))
>>> + return false;
>>> +
>>> + if (XE_IOCTL_DBG(xe, args->purge_state_val.reserved))
>>> + return false;
>>> +
>>> + break;
>>> + }
>>> default:
>>> if (XE_IOCTL_DBG(xe, 1))
>>> return false;
>>> @@ -402,6 +463,12 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
>>> goto err_fini;
>>> }
>>> }
>>> + if (args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE) {
>>> + xe_vm_madvise_purgeable_bo(xe, vm, madvise_range.vmas,
>>> + madvise_range.num_vmas, args, &exec);
>>> + goto err_fini;
>>> + }
>>> }
>>>
>>> if (madvise_range.has_svm_userptr_vmas) {