public inbox for intel-xe@lists.freedesktop.org
From: "Yadav, Arvind" <arvind.yadav@intel.com>
To: "Souza, Jose" <jose.souza@intel.com>,
	"Brost, Matthew" <matthew.brost@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"Vivi,  Rodrigo" <rodrigo.vivi@intel.com>,
	"Mishra, Pallavi" <pallavi.mishra@intel.com>,
	"Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>,
	"thomas.hellstrom@linux.intel.com"
	<thomas.hellstrom@linux.intel.com>
Subject: Re: [PATCH v6 00/12] drm/xe/madvise: Add support for purgeable buffer objects
Date: Mon, 23 Mar 2026 12:07:38 +0530
Message-ID: <44bee1ad-9499-41d3-be9d-06534df4cd2c@intel.com>
In-Reply-To: <7291621eb5811fc9a227420257046bd29032e9b9.camel@intel.com>


On 04-03-2026 18:59, Souza, Jose wrote:
> On Tue, 2026-03-03 at 14:49 -0800, Matthew Brost wrote:
>> On Tue, Mar 03, 2026 at 03:05:59PM -0700, Souza, Jose wrote:
>>> On Tue, 2026-03-03 at 20:49 +0530, Arvind Yadav wrote:
>>>> This patch series introduces comprehensive support for purgeable
>>>> buffer objects in the Xe driver, enabling userspace to provide
>>>> memory usage hints for better memory management under system
>>>> pressure.
>>>>
>>>> Overview:
>>>>
>>>> Purgeable memory allows applications to mark buffer objects as
>>>> "not currently needed" (DONTNEED), making them eligible for kernel
>>>> reclamation during memory pressure. This helps prevent OOM
>>>> conditions and enables more efficient GPU memory utilization for
>>>> workloads with temporary or regeneratable data (caches,
>>>> intermediate results, decoded frames, etc.).
>>>>
>>>> Purgeable BO Lifecycle:
>>>> 1. WILLNEED (default): BO actively needed, kernel preserves
>>>>    backing store
>>>> 2. DONTNEED (user hint): BO contents discardable, eligible for
>>>>    purging
>>>> 3. PURGED (kernel action): Backing store reclaimed during memory
>>>>    pressure
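
For readers following the lifecycle above, here is a minimal sketch of
the three states as a transition check. The enum and function names are
hypothetical and are not the driver's own. It assumes that user hints
may toggle freely between WILLNEED and DONTNEED while the BO is not yet
purged, that only a DONTNEED BO may be purged, and that nothing leaves
PURGED.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not the driver's enum. */
enum purge_state {
	STATE_WILLNEED,	/* default: backing store preserved */
	STATE_DONTNEED,	/* user hint: contents may be discarded */
	STATE_PURGED,	/* kernel action: backing store reclaimed */
};

/*
 * A user hint can only request WILLNEED or DONTNEED, and it is
 * rejected once the BO has been purged ("once purged, always purged").
 */
static bool user_hint_allowed(enum purge_state cur, enum purge_state next)
{
	if (cur == STATE_PURGED)
		return false;

	return next == STATE_WILLNEED || next == STATE_DONTNEED;
}

/* Assumption: the kernel reclaims only BOs currently marked DONTNEED. */
static bool kernel_may_purge(enum purge_state cur)
{
	return cur == STATE_DONTNEED;
}
```

With this model, user_hint_allowed(STATE_PURGED, STATE_WILLNEED) is
false, which matches the rule that a purged BO must be destroyed and
recreated rather than revived.
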
>>>>
>>>> Key Design Principles:
>>>>    - i915 compatibility: "Once purged, always purged" semantics -
>>>> purged BOs
>>>>      remain permanently invalid and must be destroyed/recreated
>>>>    - Per-VMA state tracking: Each VMA tracks its own purgeable
>>>> state,
>>>> BO is
>>>>      only marked DONTNEED when ALL VMAs across ALL VMs agree
>>>> (Thomas
>>>> Hellström)
>>>>    - Safety first: Imported/exported dma-bufs blocked from
>>>> purgeable
>>>> state -
>>>>      no visibility into external device usage (Matt Roper)
>>>>    - Multiple protection layers: Validation in madvise, VM bind,
>>>> mmap,
>>>> CPU
>>>>      and GPU fault handlers. GPU page faults on DONTNEED BOs are
>>>> rejected in
>>>>      xe_pagefault_begin() to preserve the GPU PTE invalidation
>>>> done at
>>>> madvise
>>>>      time; without this the rebind path would re-map real pages
>>>> and
>>>> undo the
>>>>      PTE zap, preventing the shrinker from ever reclaiming the BO.
>>>>    - Correct GPU PTE zapping: madvise_purgeable() explicitly sets
>>>>      skip_invalidation per VMA (false for DONTNEED, true for
>>>> WILLNEED,
>>>> purged
>>>>      and dmabuf-shared BOs) so DONTNEED always triggers a GPU PTE
>>>> zap
>>>>      regardless of prior madvise state.
>>>>    - Scratch PTE support: Fault-mode VMs use scratch pages for
>>>> safe
>>>> zero reads
>>>>      on purged BO access.
>>>>    - TTM shrinker integration: Encapsulated helpers manage
>>>>      xe_ttm_tt->purgeable flag and shrinker page accounting
>>>>      (shrinkable vs purgeable buckets)
>>>
>>> I get Engine memory CAT errors when using this feature:
>>>
>>> [  240.301213] xe 0000:00:02.0: [drm] Tile0: GT0: Fault response:
>>> Unsuccessful -EINVAL
>>> [  240.301301] xe 0000:00:02.0: [drm] Tile0: GT0: Engine memory CAT
>>> error [18]: class=rcs, logical_mask: 0x1, guc_id=17
>>> [  240.302871] xe 0000:00:02.0: [drm] Tile0: GT0: Engine reset:
>>> engine_class=rcs, logical_mask: 0x1, guc_id=17, state=0x249
>>> [  240.302885] xe 0000:00:02.0: [drm] Tile0: GT0: Timedout job:
>>> seqno=4294967169, lrc_seqno=4294967169, guc_id=17, flags=0x0 in
>>> arb_map_buffer_ [3374]
>>> [  240.302892] xe 0000:00:02.0: [drm:xe_devcoredump [xe]] Multiple
>>> hangs are occurring, but only the first snapshot was taken
>>>
>>> Mesa creates VM with DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE, probably
>>> you don't have an IGT test with this scenario.
>>>
>>> @cc Rodrigo
>>>
>>> Other issue is not related to your patches but drm_xe_madvise only
>>> works with non-canonical addresses and some time ago was agreed
>>> that
>>> all the user-visible addresses would be in canonical format.
>>> Not sure if we can do anything at this point but letting you know.
>>>

CAT error with SCRATCH_PAGE VMs: Fixed in patch 3.

>> We actually might be able to fix it to accept canonical addresses,
>> what we
>> can't blindly do is make non-canonical addresses stop working...
>>
>> It might create a weird scenario for UMDs though if canonical
>> addresses work on some kernels but not others, but perhaps since this
>> is Mesa's first use of madvise we get this in as part of purgeable and
>> only NEO would have to deal with this scenario.
> Yes, this is the first madvise usage in Mesa.


Canonical addresses: Fixed in patch 12. xe_vm_madvise_ioctl() now strips
the sign extension via xe_device_uncanonicalize_addr() at the top of the
ioctl, so both canonical and non-canonical addresses are accepted.
Existing callers that pass non-canonical addresses are unaffected.
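
To illustrate what stripping the sign extension means, here is a small
userspace sketch. It is not the driver code: the helper names are
hypothetical stand-ins and the 48-bit address width is an assumption
(the real width is a per-device property).

```c
#include <stdint.h>

/* Assumption: 48-bit GPU virtual addresses; real hardware may differ. */
#define VA_BITS 48
#define VA_MASK ((UINT64_C(1) << VA_BITS) - 1)

/* Canonical form: bit (VA_BITS - 1) is copied into all upper bits. */
static uint64_t canonicalize_addr(uint64_t addr)
{
	addr &= VA_MASK;
	if (addr & (UINT64_C(1) << (VA_BITS - 1)))
		addr |= ~VA_MASK;

	return addr;
}

/* Non-canonical form: drop the upper bits, keep the low VA_BITS bits. */
static uint64_t uncanonicalize_addr(uint64_t addr)
{
	return addr & VA_MASK;
}
```

Because the mask is a no-op on an address that is already
non-canonical, applying it once on entry lets both forms resolve to the
same internal address, which is why older callers keep working.
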

Thanks,
Arvind

>> Matt
>>
>>>> v2 Changes:
>>>>    - Reordered patches: Moved shared BO helper before main
>>>> implementation for
>>>>      proper dependency order
>>>>    - Fixed reference counting in mmap offset validation (use
>>>> drm_gem_object_put)
>>>>    - Removed incorrect claims about madvise(WILLNEED) restoring
>>>> purged
>>>> BOs
>>>>    - Fixed error code documentation inconsistencies
>>>>    - Initialize purge_state_val fields to prevent kernel memory
>>>> leaks
>>>>    - Use xe_bo_trigger_rebind() for async TLB invalidation (Thomas
>>>> Hellström)
>>>>    - Add NULL rebind with scratch PTEs for fault mode (Thomas
>>>> Hellström)
>>>>    - Implement i915-compatible retained field logic (Thomas
>>>> Hellström)
>>>>    - Skip BO validation for purged BOs in page fault handler
>>>> (crash
>>>> fix)
>>>>    - Add scratch VM check in page fault path (non-scratch VMs fail
>>>> fault)
>>>>
>>>> v3 Changes (addressing Matt and Thomas Hellström feedback):
>>>>    - Per-VMA purgeable state tracking: Added
>>>>      xe_vma->purgeable_state field
>>>>    - Complete VMA check: xe_bo_all_vmas_dontneed() walks all VMAs
>>>> across all
>>>>      VMs to ensure unanimous DONTNEED before marking BO purgeable
>>>>    - VMA unbind recheck: Added
>>>> xe_bo_recheck_purgeable_on_vma_unbind()
>>>> to
>>>>      re-evaluate BO state when VMAs are destroyed
>>>>    - Block external dma-bufs: Added xe_bo_is_external_dmabuf()
>>>> check
>>>> using
>>>>      drm_gem_is_imported() and obj->dma_buf to prevent purging
>>>> imported/exported BOs
>>>>    - Consistent lockdep enforcement: Added xe_bo_assert_held() to
>>>> all
>>>> helpers
>>>>      that access madv_purgeable state
>>>>    - Simplified page table logic: Renamed is_null to
>>>> is_null_or_purged
>>>> in
>>>>      xe_pt_stage_bind_entry() - purged BOs treated identically to
>>>> null
>>>> VMAs
>>>>    - Removed unnecessary checks: Dropped redundant "&& bo" check
>>>> in
>>>> xe_ttm_bo_purge()
>>>>    - Xe-specific warnings: Changed drm_warn() to XE_WARN_ON() in
>>>> purge
>>>> path
>>>>    - Moved purge checks under locks: Purge state validation now
>>>> done
>>>> after
>>>>      acquiring dma-resv lock in vma_lock_and_validate() and
>>>> xe_pagefault_begin()
>>>>    - Race-free fault handling: Removed unlocked purge check from
>>>>      xe_pagefault_handle_vma(), moved to locked
>>>> xe_pagefault_begin()
>>>>    - Shrinker helper functions: Added
>>>> xe_bo_set_purgeable_shrinker()
>>>> and
>>>>      xe_bo_clear_purgeable_shrinker() to encapsulate TTM purgeable
>>>> flag updates
>>>>      and shrinker page accounting, improving code clarity and
>>>> maintainability
>>>>
>>>> v4 Changes (addressing Matt and Thomas Hellström feedback):
>>>>    - UAPI: Removed '__u64 reserved' field from purge_state_val
>>>> union
>>>> to fit
>>>>      16-byte size constraint (Matt)
>>>>    - Changed madv_purgeable from atomic_t to u32 across all
>>>> patches
>>>> (Matt)
>>>>    - CPU fault handling: Added purged check to fastpath
>>>> (xe_bo_cpu_fault_fastpath)
>>>>      to prevent hang when accessing existing mmap of purged BO
>>>>
>>>> v5 Changes (addressing Matt and Thomas Hellström feedback):
>>>>    - Add locking documentation to madv_purgeable field comment
>>>> (Matt)
>>>>    - Introduce xe_bo_set_purgeable_state() helper (void return) to
>>>> centralize
>>>>      madv_purgeable updates with xe_bo_assert_held() and state
>>>> transition
>>>>      validation using explicit enum checks (no transition out of
>>>> PURGED) (Matt)
>>>>    - Make xe_ttm_bo_purge() return int and propagate failures from
>>>>      xe_bo_move(); handle xe_bo_trigger_rebind() failures (e.g.
>>>> no_wait_gpu
>>>>      paths) rather than silently ignoring (Matt)
>>>>    - Replace drm_WARN_ON with xe_assert for better Xe-specific
>>>> assertions (Matt)
>>>>    - Hook purgeable handling into
>>>> madvise_funcs[DRM_XE_VMA_ATTR_PURGEABLE_STATE]
>>>>      instead of special-case path in xe_vm_madvise_ioctl() (Matt)
>>>>    - Track purgeable retained return via xe_madvise_details and
>>>> perform
>>>>      copy_to_user() from xe_madvise_details_fini() after locks are
>>>> dropped (Matt)
>>>>    - Set madvise_funcs[DRM_XE_VMA_ATTR_PURGEABLE_STATE] to NULL
>>>> with
>>>>      __maybe_unused on madvise_purgeable() to maintain
>>>> bisectability
>>>> until
>>>>      shrinker integration is complete in final patch (Matt)
>>>>    - Call xe_bo_recheck_purgeable_on_vma_unbind() from
>>>> xe_vma_destroy()
>>>>      right after drm_gpuva_unlink() where we already hold the BO
>>>> lock,
>>>>      drop the trylock-based late destroy path (Matt)
>>>>    - Move purgeable_state into xe_vma_mem_attr with the other
>>>> madvise
>>>>      attributes (Matt)
>>>>    - Drop READ_ONCE since the BO lock already protects us (Matt)
>>>>    - Keep returning false when there are no VMAs - otherwise we'd
>>>> mark
>>>>      BOs purgeable without any user hint (Matt)
>>>>    -  Use struct xe_vma_lock_and_validate_flags instead of
>>>> multiple
>>>> bool
>>>>      parameters to improve readability and prevent argument
>>>> transposition (Matt)
>>>>    - Fix LRU crash while running shrink test
>>>>    - Skip xe_bo_validate() for purged BOs in xe_gpuvm_validate()
>>>>    - Split ghost BO and zero-refcount handling in xe_bo_shrink()
>>>> (Thomas)
>>>>
>>>> v6 Changes (addressing Jose Souza, Thomas Hellström and Matt
>>>> Brost
>>>> feedback):
>>>>    - Document DONTNEED blocking behavior in uAPI: Clearly describe
>>>> which
>>>>      operations are blocked and with what error codes. (Thomas,
>>>> Matt)
>>>>    - Block VM_BIND to DONTNEED BOs: Return -EBUSY to prevent
>>>> creating
>>>> new
>>>>      VMAs to purgeable BOs (undefined behavior). (Thomas, Matt)
>>>>    - Block CPU faults to DONTNEED BOs: Return VM_FAULT_SIGBUS in
>>>> both
>>>> fastpath
>>>>      and slowpath to prevent undefined behavior. (Thomas, Matt)
>>>>    - Block new mmap() to DONTNEED/purged BOs: Return -EBUSY for
>>>> DONTNEED,
>>>>      -EINVAL for PURGED. (Thomas, Matt)
>>>>    - Block dma-buf export of DONTNEED/purged BOs: Return -EBUSY
>>>> for
>>>> DONTNEED,
>>>>      -EINVAL for PURGED. (Thomas, Matt)
>>>>    - Fix state transition bug: xe_bo_all_vmas_dontneed() now
>>>> returns
>>>> enum to
>>>>      distinguish NO_VMAS (preserve state) from WILLNEED (has
>>>> active
>>>> VMAs),
>>>>      preventing incorrect DONTNEED → WILLNEED flip on last VMA
>>>> unmap
>>>> (Matt)
>>>>    - Set skip_invalidation explicitly in madvise_purgeable() to
>>>> ensure
>>>>      DONTNEED always zaps GPU PTEs regardless of prior madvise
>>>> state.
>>>>    - Add DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT for
>>>> userspace
>>>>      feature detection. (Jose)
>>>>
>>>> Arvind Yadav (11):
>>>>    drm/xe/bo: Add purgeable bo state tracking and field madv to
>>>> xe_bo
>>>>    drm/xe/madvise: Implement purgeable buffer object support
>>>>    drm/xe/bo: Block CPU faults to purgeable buffer objects
>>>>    drm/xe/vm: Prevent binding of purged buffer objects
>>>>    drm/xe/madvise: Implement per-VMA purgeable state tracking
>>>>    drm/xe/madvise: Block imported and exported dma-bufs
>>>>    drm/xe/bo: Block mmap of DONTNEED/purged BOs
>>>>    drm/xe/dma_buf: Block export of DONTNEED/purged BOs
>>>>    drm/xe/bo: Add purgeable shrinker state helpers
>>>>    drm/xe/madvise: Enable purgeable buffer object IOCTL support
>>>>    drm/xe/bo: Skip zero-refcount BOs in shrinker
>>>>
>>>> Himal Prasad Ghimiray (1):
>>>>    drm/xe/uapi: Add UAPI support for purgeable buffer objects
>>>>
>>>>   drivers/gpu/drm/xe/xe_bo.c         | 223 +++++++++++++++++++++--
>>>>   drivers/gpu/drm/xe/xe_bo.h         |  60 ++++++
>>>>   drivers/gpu/drm/xe/xe_bo_types.h   |   6 +
>>>>   drivers/gpu/drm/xe/xe_dma_buf.c    |  21 +++
>>>>   drivers/gpu/drm/xe/xe_pagefault.c  |  19 ++
>>>>   drivers/gpu/drm/xe/xe_pt.c         |  40 +++-
>>>>   drivers/gpu/drm/xe/xe_query.c      |   2 +
>>>>   drivers/gpu/drm/xe/xe_svm.c        |   1 +
>>>>   drivers/gpu/drm/xe/xe_vm.c         | 100 ++++++++--
>>>>   drivers/gpu/drm/xe/xe_vm_madvise.c | 283
>>>> +++++++++++++++++++++++++++++
>>>>   drivers/gpu/drm/xe/xe_vm_madvise.h |   3 +
>>>>   drivers/gpu/drm/xe/xe_vm_types.h   |  11 ++
>>>>   include/uapi/drm/xe_drm.h          |  60 ++++++
>>>>   13 files changed, 793 insertions(+), 36 deletions(-)


Thread overview: 39+ messages
2026-03-03 15:19 [PATCH v6 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
2026-03-03 15:19 ` [PATCH v6 01/12] drm/xe/uapi: Add UAPI " Arvind Yadav
2026-03-03 15:53   ` Souza, Jose
2026-03-20  4:00     ` Yadav, Arvind
2026-03-10  8:31   ` Thomas Hellström
2026-03-03 15:19 ` [PATCH v6 02/12] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo Arvind Yadav
2026-03-03 15:19 ` [PATCH v6 03/12] drm/xe/madvise: Implement purgeable buffer object support Arvind Yadav
2026-03-10  8:41   ` Thomas Hellström
2026-03-03 15:20 ` [PATCH v6 04/12] drm/xe/bo: Block CPU faults to purgeable buffer objects Arvind Yadav
2026-03-05 15:26   ` Thomas Hellström
2026-03-03 15:20 ` [PATCH v6 05/12] drm/xe/vm: Prevent binding of purged " Arvind Yadav
2026-03-05 15:38   ` Thomas Hellström
2026-03-20  2:34     ` Yadav, Arvind
2026-03-03 15:20 ` [PATCH v6 06/12] drm/xe/madvise: Implement per-VMA purgeable state tracking Arvind Yadav
2026-03-10  9:57   ` Thomas Hellström
2026-03-23  6:47     ` Yadav, Arvind
2026-03-03 15:20 ` [PATCH v6 07/12] drm/xe/madvise: Block imported and exported dma-bufs Arvind Yadav
2026-03-03 15:20 ` [PATCH v6 08/12] drm/xe/bo: Block mmap of DONTNEED/purged BOs Arvind Yadav
2026-03-10 10:17   ` Thomas Hellström
2026-03-18 13:03     ` Yadav, Arvind
2026-03-03 15:20 ` [PATCH v6 09/12] drm/xe/dma_buf: Block export " Arvind Yadav
2026-03-10 10:19   ` Thomas Hellström
2026-03-18 13:02     ` Yadav, Arvind
2026-03-03 15:20 ` [PATCH v6 10/12] drm/xe/bo: Add purgeable shrinker state helpers Arvind Yadav
2026-03-10 10:01   ` Thomas Hellström
2026-03-18 12:15     ` Yadav, Arvind
2026-03-03 15:20 ` [PATCH v6 11/12] drm/xe/madvise: Enable purgeable buffer object IOCTL support Arvind Yadav
2026-03-10 10:23   ` Thomas Hellström
2026-03-03 15:20 ` [PATCH v6 12/12] drm/xe/bo: Skip zero-refcount BOs in shrinker Arvind Yadav
2026-03-05 15:49   ` Thomas Hellström
2026-03-17  5:59     ` Yadav, Arvind
2026-03-03 16:12 ` ✗ CI.checkpatch: warning for drm/xe/madvise: Add support for purgeable buffer objects (rev7) Patchwork
2026-03-03 16:14 ` ✓ CI.KUnit: success " Patchwork
2026-03-03 16:50 ` ✓ Xe.CI.BAT: " Patchwork
2026-03-03 22:05 ` [PATCH v6 00/12] drm/xe/madvise: Add support for purgeable buffer objects Souza, Jose
2026-03-03 22:49   ` Matthew Brost
2026-03-04 13:29     ` Souza, Jose
2026-03-23  6:37       ` Yadav, Arvind [this message]
2026-03-04  4:01 ` ✗ Xe.CI.FULL: failure for drm/xe/madvise: Add support for purgeable buffer objects (rev7) Patchwork
