* [PATCH v8 01/12] drm/xe/uapi: Add UAPI support for purgeable buffer objects
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 02/12] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo Arvind Yadav
` (10 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe
Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom,
José Roberto de Souza
From: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Extend the DRM_XE_MADVISE ioctl to support purgeable buffer object
management by adding the DRM_XE_VMA_ATTR_PURGEABLE_STATE attribute type.
This allows userspace applications to provide memory usage hints to
the kernel for better memory management under pressure:
- WILLNEED: Buffer is needed and should not be purged. If the BO was
previously purged, the retained field returns 0, indicating the backing
store was lost (once purged, always purged, matching i915 semantics).
- DONTNEED: Buffer is not currently needed and may be purged by the
kernel under memory pressure to free resources. Only applies to
non-shared BOs.
To prevent undefined behavior, the following operations are blocked
while a BO is in DONTNEED state:
- New mmap() operations return -EBUSY
- VM_BIND operations return -EBUSY
- New dma-buf exports return -EBUSY
- CPU page faults return SIGBUS
- GPU page faults fail with -EACCES
This ensures applications cannot use a BO while marked as DONTNEED,
preventing erratic behavior when the kernel purges the backing store.
The implementation includes a 'retained' output field (matching i915's
drm_i915_gem_madvise.retained) that indicates whether the BO's backing
store still exists (1) or has been purged (0).
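For illustration, a minimal userspace sketch of the intended flow
(a sketch against the existing madvise uAPI: DRM_IOCTL_XE_MADVISE and
the vm_id/start/range field names are assumed; fd, vm_id, start and
range are placeholders, and recreate_bo_contents() is a hypothetical
application helper):

	__u32 retained = 0;	/* must be zero-initialized, see below */
	struct drm_xe_madvise args = {
		.vm_id = vm_id,
		.start = start,
		.range = range,
		.type  = DRM_XE_VMA_ATTR_PURGEABLE_STATE,
		.purge_state_val = {
			.val          = DRM_XE_VMA_PURGEABLE_STATE_DONTNEED,
			.retained_ptr = (__u64)(uintptr_t)&retained,
		},
	};

	/* Hint that the BO contents may be discarded under pressure */
	if (ioctl(fd, DRM_IOCTL_XE_MADVISE, &args))
		return -errno;

	/* ... later, before reusing the BO ... */
	retained = 0;
	args.purge_state_val.val = DRM_XE_VMA_PURGEABLE_STATE_WILLNEED;
	if (ioctl(fd, DRM_IOCTL_XE_MADVISE, &args))
		return -errno;
	if (!retained)
		recreate_bo_contents();	/* backing store was purged */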
Add a DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT flag so userspace
can detect kernel support for purgeable buffer objects before
attempting to use the feature.
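A corresponding detection sketch, assuming the existing two-call
DRM_IOCTL_XE_DEVICE_QUERY / DRM_XE_DEVICE_QUERY_CONFIG pattern (error
handling omitted):

	struct drm_xe_device_query query = {
		.query = DRM_XE_DEVICE_QUERY_CONFIG,
	};
	struct drm_xe_query_config *config;

	ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query);	/* size only */
	config = malloc(query.size);
	query.data = (__u64)(uintptr_t)config;
	ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query);	/* fill data */

	if (config->info[DRM_XE_QUERY_CONFIG_FLAGS] &
	    DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT)
		/* purgeable madvise is supported */;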
v2:
- Add PURGED state for read-only status, change ioctl to DRM_IOWR,
add retained field for i915 compatibility
v3:
- UAPI rule should not be changed (Matthew Brost)
- Make 'retained' a userptr (Matthew Brost)
v4:
- The purge_state_val member must not be larger than the existing
union (16 bytes), so drop the '__u64 reserved' field (Matt)
v5:
- Update UAPI documentation to clarify retained must be initialized
to 0 (Thomas)
v6:
- Document DONTNEED BO access blocking behavior to prevent undefined
behavior and clarify uAPI contract (Thomas, Matt)
- Add query flag DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT for
feature detection. (Jose)
- Rename retained to retained_ptr. (Jose)
v7:
- Updated UAPI documentation as suggested to reflect 'updated' value
instead of 'return'. (Jose)
Cc: Matthew Brost <matthew.brost@intel.com>
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
include/uapi/drm/xe_drm.h | 69 +++++++++++++++++++++++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index f8b2afb20540..a59baf5add9a 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -429,6 +429,7 @@ struct drm_xe_query_config {
#define DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR (1 << 2)
#define DRM_XE_QUERY_CONFIG_FLAG_HAS_NO_COMPRESSION_HINT (1 << 3)
#define DRM_XE_QUERY_CONFIG_FLAG_HAS_DISABLE_STATE_CACHE_PERF_FIX (1 << 4)
+#define DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT (1 << 5)
#define DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT 2
#define DRM_XE_QUERY_CONFIG_VA_BITS 3
#define DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY 4
@@ -2083,6 +2084,7 @@ struct drm_xe_query_eu_stall {
* - DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC: Set preferred memory location.
* - DRM_XE_MEM_RANGE_ATTR_ATOMIC: Set atomic access policy.
* - DRM_XE_MEM_RANGE_ATTR_PAT: Set page attribute table index.
+ * - DRM_XE_VMA_ATTR_PURGEABLE_STATE: Set purgeable state for BOs.
*
* Example:
*
@@ -2115,6 +2117,7 @@ struct drm_xe_madvise {
#define DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC 0
#define DRM_XE_MEM_RANGE_ATTR_ATOMIC 1
#define DRM_XE_MEM_RANGE_ATTR_PAT 2
+#define DRM_XE_VMA_ATTR_PURGEABLE_STATE 3
/** @type: type of attribute */
__u32 type;
@@ -2205,6 +2208,72 @@ struct drm_xe_madvise {
/** @pat_index.reserved: Reserved */
__u64 reserved;
} pat_index;
+
+ /**
+ * @purge_state_val: Purgeable state configuration
+ *
+ * Used when @type == DRM_XE_VMA_ATTR_PURGEABLE_STATE.
+ *
+ * Configures the purgeable state of buffer objects in the specified
+ * virtual address range. This allows applications to hint to the kernel
+ * about a BO's usage patterns for better memory management.
+ *
+ * By default all VMAs are in WILLNEED state.
+ *
+ * Supported values for @purge_state_val.val:
+ * - DRM_XE_VMA_PURGEABLE_STATE_WILLNEED (0): Marks BO as needed.
+ * If the BO was previously purged, the kernel sets the __u32 at
+ * @retained_ptr to 0 (backing store lost) so the application knows
+ * it must recreate the BO.
+ *
+ * - DRM_XE_VMA_PURGEABLE_STATE_DONTNEED (1): Marks BO as not currently
+ * needed. Kernel may purge it under memory pressure to reclaim memory.
+ * Only applies to non-shared BOs. The kernel sets the __u32 at
+ * @retained_ptr to 1 if the backing store still exists (not yet purged),
+ * or 0 if it was already purged.
+ *
+ * Important: Once marked as DONTNEED, touching the BO's memory
+ * is undefined behavior. It may succeed temporarily (before the
+ * kernel purges the backing store) but will suddenly fail once
+ * the BO transitions to PURGED state.
+ *
+ * To transition back: use WILLNEED and check @retained_ptr —
+ * if 0, backing store was lost and the BO must be recreated.
+ *
+ * The following operations are blocked in DONTNEED state to
+ * prevent the BO from being re-mapped after madvise:
+ * - New mmap() calls: Fail with -EBUSY
+ * - VM_BIND operations: Fail with -EBUSY
+ * - New dma-buf exports: Fail with -EBUSY
+ * - CPU page faults (existing mmap): Fail with SIGBUS
+ * - GPU page faults (fault-mode VMs): Fail with -EACCES
+ */
+ struct {
+#define DRM_XE_VMA_PURGEABLE_STATE_WILLNEED 0
+#define DRM_XE_VMA_PURGEABLE_STATE_DONTNEED 1
+ /** @purge_state_val.val: value for DRM_XE_VMA_ATTR_PURGEABLE_STATE */
+ __u32 val;
+
+ /** @purge_state_val.pad: MBZ */
+ __u32 pad;
+ /**
+ * @purge_state_val.retained_ptr: Pointer to a __u32 output
+ * field for backing store status.
+ *
+ * Userspace must initialize the __u32 value at this address
+ * to 0 before the ioctl. Kernel writes a __u32 after the
+ * operation:
+ * - 1 if backing store exists (not purged)
+ * - 0 if backing store was purged
+ *
+ * If userspace fails to initialize it to 0, the ioctl returns -EINVAL.
+ * This ensures a safe default (0 = assume purged) if the kernel
+ * cannot write the result.
+ *
+ * Similar to i915's drm_i915_gem_madvise.retained field.
+ */
+ __u64 retained_ptr;
+ } purge_state_val;
};
/** @reserved: Reserved */
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread

* [PATCH v8 02/12] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 01/12] drm/xe/uapi: Add UAPI " Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 03/12] drm/xe/madvise: Implement purgeable buffer object support Arvind Yadav
` (9 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Add infrastructure for tracking the purgeable state of buffer objects.
Introduce enum xe_madv_purgeable_state with three states:
- XE_MADV_PURGEABLE_WILLNEED (0): BO is needed and should not be
purged. This is the default state for all BOs.
- XE_MADV_PURGEABLE_DONTNEED (1): BO is not currently needed and
can be purged by the kernel under memory pressure to reclaim
resources. Only non-shared BOs can be marked as DONTNEED.
- XE_MADV_PURGEABLE_PURGED (2): BO has been purged by the kernel.
Accessing a purged BO results in error. Follows i915 semantics
where once purged, the BO remains permanently invalid ("once
purged, always purged").
Add a madv_purgeable field to struct xe_bo to track the purgeable
state across concurrent access paths.
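For illustration, the intended access pattern for the new helpers (a
sketch; as elsewhere in the driver, callers take the BO's dma-resv lock,
e.g. via xe_bo_lock()):

	xe_bo_lock(bo, false);
	if (xe_bo_is_purged(bo)) {
		/* Backing store is gone; accesses must fail */
		err = -EACCES;
	} else if (xe_bo_madv_is_dontneed(bo)) {
		/* Eligible for purging; do not repopulate pages */
	}
	xe_bo_unlock(bo);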
v2:
- Add xe_bo_is_purged() helper, improve state documentation
v3:
- Add the kernel doc (Matthew Brost)
- Add the new helpers xe_bo_madv_is_dontneed (Matthew Brost)
v4:
- @madv_purgeable atomic_t → u32 change across all relevant
patches (Matt)
v5:
- Add locking documentation to madv_purgeable field comment (Matt)
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_bo.h | 56 ++++++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_bo_types.h | 6 ++++
2 files changed, 62 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 2cbac16f7db7..fb5541bdf602 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -87,6 +87,28 @@
#define XE_PCI_BARRIER_MMAP_OFFSET (0x50 << XE_PTE_SHIFT)
+/**
+ * enum xe_madv_purgeable_state - Buffer object purgeable state enumeration
+ *
+ * This enum defines the possible purgeable states for a buffer object,
+ * allowing userspace to provide memory usage hints to the kernel for
+ * better memory management under pressure.
+ *
+ * @XE_MADV_PURGEABLE_WILLNEED: The buffer object is needed and should not be purged.
+ * This is the default state.
+ * @XE_MADV_PURGEABLE_DONTNEED: The buffer object is not currently needed and can be
+ * purged by the kernel under memory pressure.
+ * @XE_MADV_PURGEABLE_PURGED: The buffer object has been purged by the kernel.
+ *
+ * Accessing a purged buffer will result in an error. Per i915 semantics,
+ * once purged, a BO remains permanently invalid and must be destroyed and recreated.
+ */
+enum xe_madv_purgeable_state {
+ XE_MADV_PURGEABLE_WILLNEED,
+ XE_MADV_PURGEABLE_DONTNEED,
+ XE_MADV_PURGEABLE_PURGED,
+};
+
struct sg_table;
struct xe_bo *xe_bo_alloc(void);
@@ -215,6 +237,40 @@ static inline bool xe_bo_is_protected(const struct xe_bo *bo)
return bo->pxp_key_instance;
}
+/**
+ * xe_bo_is_purged() - Check if buffer object has been purged
+ * @bo: The buffer object to check
+ *
+ * Checks if the buffer object's backing store has been discarded by the
+ * kernel due to memory pressure after being marked as purgeable (DONTNEED).
+ * Once purged, the BO cannot be restored and any attempt to use it will fail.
+ *
+ * Context: Caller must hold the BO's dma-resv lock
+ * Return: true if the BO has been purged, false otherwise
+ */
+static inline bool xe_bo_is_purged(struct xe_bo *bo)
+{
+ xe_bo_assert_held(bo);
+ return bo->madv_purgeable == XE_MADV_PURGEABLE_PURGED;
+}
+
+/**
+ * xe_bo_madv_is_dontneed() - Check if BO is marked as DONTNEED
+ * @bo: The buffer object to check
+ *
+ * Checks if userspace has marked this BO as DONTNEED (i.e., its contents
+ * are not currently needed and can be discarded under memory pressure).
+ * This is used internally to decide whether a BO is eligible for purging.
+ *
+ * Context: Caller must hold the BO's dma-resv lock
+ * Return: true if the BO is marked DONTNEED, false otherwise
+ */
+static inline bool xe_bo_madv_is_dontneed(struct xe_bo *bo)
+{
+ xe_bo_assert_held(bo);
+ return bo->madv_purgeable == XE_MADV_PURGEABLE_DONTNEED;
+}
+
static inline void xe_bo_unpin_map_no_vm(struct xe_bo *bo)
{
if (likely(bo)) {
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index d4fe3c8dca5b..ff8317bfc1ae 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -108,6 +108,12 @@ struct xe_bo {
* from default
*/
u64 min_align;
+
+ /**
+ * @madv_purgeable: userspace advice on BO purgeability, protected
+ * by the BO's dma-resv lock.
+ */
+ u32 madv_purgeable;
};
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread

* [PATCH v8 03/12] drm/xe/madvise: Implement purgeable buffer object support
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 01/12] drm/xe/uapi: Add UAPI " Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 02/12] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 8:19 ` Thomas Hellström
2026-03-26 5:51 ` [PATCH v8 04/12] drm/xe/bo: Block CPU faults to purgeable buffer objects Arvind Yadav
` (8 subsequent siblings)
11 siblings, 1 reply; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Add the core implementation for purgeable buffer objects, enabling memory
reclamation of user-designated DONTNEED buffers during eviction. This
allows userspace applications to provide memory usage hints to the kernel
for better memory management under pressure.
This patch implements the purge operation and state machine transitions:
Purgeable States (from xe_madv_purgeable_state):
- WILLNEED (0): BO should be retained, actively used
- DONTNEED (1): BO eligible for purging, not currently needed
- PURGED (2): BO backing store reclaimed, permanently invalid
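The resulting transition graph (PURGED is terminal):

	WILLNEED <--> DONTNEED --(evicted under memory pressure)--> PURGED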
Design Rationale:
- Async TLB invalidation via trigger_rebind (no blocking xe_vm_invalidate_vma)
- i915 compatibility: retained field, "once purged always purged" semantics
- Shared BO protection prevents multi-process memory corruption
- Scratch PTE reuse avoids new infrastructure, safe for fault mode
Note: The madvise_purgeable() function is implemented but not hooked into
the IOCTL handler (madvise_funcs[] entry is NULL) to maintain bisectability.
The feature will be enabled in the final patch when all supporting
infrastructure (shrinker, per-VMA tracking) is complete.
v2:
- Use xe_bo_trigger_rebind() for async TLB invalidation (Thomas Hellström)
- Add NULL rebind with scratch PTEs for fault mode (Thomas Hellström)
- Implement i915-compatible retained field logic (Thomas Hellström)
- Skip BO validation for purged BOs in page fault handler (crash fix)
- Add scratch VM check in page fault path (non-scratch VMs fail fault)
- Force clear_pt for non-scratch VMs to avoid phys addr 0 mapping (review fix)
- Add !is_purged check to resource cursor setup to prevent stale access
v3:
- Rebase as xe_gt_pagefault.c is gone upstream and replaced
with xe_pagefault.c (Matthew Brost)
- Xe specific warn on (Matthew Brost)
- Call helpers for madv_purgeable access (Matthew Brost)
- Remove bo NULL check (Matthew Brost)
- Use xe_bo_assert_held instead of dma assert (Matthew Brost)
- Move the xe_bo_is_purged check under the dma-resv lock (Matt)
- Drop is_purged from xe_pt_stage_bind_entry and just set is_null to true
for purged BOs; rename s/is_null/is_null_or_purged (Matt)
- UAPI rule should not be changed (Matthew Brost)
- Make 'retained' a userptr (Matthew Brost)
v4:
- @madv_purgeable atomic_t → u32 change across all relevant patches (Matt)
v5:
- Introduce xe_bo_set_purgeable_state() helper (void return) to centralize
madv_purgeable updates with xe_bo_assert_held() and state transition
validation using explicit enum checks (no transition out of PURGED) (Matt)
- Make xe_ttm_bo_purge() return int and propagate failures from
xe_bo_move(); handle xe_bo_trigger_rebind() failures (e.g. no_wait_gpu
paths) rather than silently ignoring (Matt)
- Replace drm_WARN_ON with xe_assert for better Xe-specific assertions (Matt)
- Hook purgeable handling into madvise_funcs[DRM_XE_VMA_ATTR_PURGEABLE_STATE]
instead of special-case path in xe_vm_madvise_ioctl() (Matt)
- Track purgeable retained return via xe_madvise_details and perform
copy_to_user() from xe_madvise_details_fini() after locks are dropped (Matt)
- Set madvise_funcs[DRM_XE_VMA_ATTR_PURGEABLE_STATE] to NULL with
__maybe_unused on madvise_purgeable() to maintain bisectability until
shrinker integration is complete in final patch (Matt)
- Use put_user() instead of copy_to_user() for single u32 retained value (Thomas)
- Return -EFAULT from ioctl if put_user() fails (Thomas)
- Validate userspace initialized retained to 0 before ioctl, ensuring safe
default (0 = "assume purged") if put_user() fails (Thomas)
- Refactor error handling: separate fallible put_user from infallible cleanup
- xe_madvise_purgeable_retained_to_user(): separate helper for fallible put_user
- Call put_user() after releasing all locks to avoid circular dependencies
- Use xe_bo_move_notify() instead of xe_bo_trigger_rebind() in xe_ttm_bo_purge()
for proper abstraction - handles vunmap, dma-buf notifications, and VRAM
userfault cleanup (Thomas)
- Fix LRU crash while running shrink test
- Skip xe_bo_validate() for purged BOs in xe_gpuvm_validate()
v6:
- xe_bo_move_notify() must be called *before* ttm_bo_validate(). (Thomas)
- Block GPU page faults (fault-mode VMs) for DONTNEED bo's (Thomas, Matt)
- Rename retained to retained_ptr. (Jose)
v7:
- Fix engine reset from EU overfetch in scratch VMs: xe_pagefault_begin()
and xe_pagefault_service() now return 0 instead of -EACCES/-EINVAL for
DONTNEED/purged BOs and missing VMAs so stale accesses hit scratch PTEs.
- Fix Engine memory CAT errors when Mesa uses DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE:
accept scratch VMs in xe_pagefault_asid_to_vm() via '|| xe_vm_has_scratch(vm)'.
- Skip validate/migrate/rebind for DONTNEED/purged BOs in xe_pagefault_begin()
using a bool *skip_rebind out-parameter. Scratch VMs ACK the fault and fall back
to scratch PTEs; non-scratch VMs return -EACCES.
v8:
- Remove skip_rebind out-parameter from xe_pagefault_begin(); always let
xe_vma_rebind() run so tile_present is updated and the GPU fault resolves.
Previously skip_rebind=true left tile_present=0, causing an infinite
refault loop on scratch VMs. (Thomas)
- Fixed spelling: corrected "madvice" → "madvise". (Thomas)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 107 ++++++++++++++++++++---
drivers/gpu/drm/xe/xe_bo.h | 2 +
drivers/gpu/drm/xe/xe_pagefault.c | 15 +++-
drivers/gpu/drm/xe/xe_pt.c | 40 +++++++--
drivers/gpu/drm/xe/xe_vm.c | 20 ++++-
drivers/gpu/drm/xe/xe_vm_madvise.c | 136 +++++++++++++++++++++++++++++
6 files changed, 298 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 22179b2df85c..b6055bb4c578 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -835,6 +835,84 @@ static int xe_bo_move_notify(struct xe_bo *bo,
return 0;
}
+/**
+ * xe_bo_set_purgeable_state() - Set BO purgeable state with validation
+ * @bo: Buffer object
+ * @new_state: New purgeable state
+ *
+ * Sets the purgeable state with lockdep assertions and validates state
+ * transitions. Once a BO is PURGED, it cannot transition to any other state.
+ * Invalid transitions are caught with xe_assert().
+ */
+void xe_bo_set_purgeable_state(struct xe_bo *bo,
+ enum xe_madv_purgeable_state new_state)
+{
+ struct xe_device *xe = xe_bo_device(bo);
+
+ xe_bo_assert_held(bo);
+
+ /* Validate state is one of the known values */
+ xe_assert(xe, new_state == XE_MADV_PURGEABLE_WILLNEED ||
+ new_state == XE_MADV_PURGEABLE_DONTNEED ||
+ new_state == XE_MADV_PURGEABLE_PURGED);
+
+ /* Once purged, always purged - cannot transition out */
+ xe_assert(xe, !(bo->madv_purgeable == XE_MADV_PURGEABLE_PURGED &&
+ new_state != XE_MADV_PURGEABLE_PURGED));
+
+ bo->madv_purgeable = new_state;
+}
+
+/**
+ * xe_ttm_bo_purge() - Purge buffer object backing store
+ * @ttm_bo: The TTM buffer object to purge
+ * @ctx: TTM operation context
+ *
+ * This function purges the backing store of a BO marked as DONTNEED and
+ * triggers rebind to invalidate stale GPU mappings. For fault-mode VMs,
+ * this zaps the PTEs. The next GPU access will trigger a page fault and
+ * perform NULL rebind (scratch pages or clear PTEs based on VM config).
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+static int xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
+{
+ struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
+ struct ttm_placement place = {};
+ int ret;
+
+ xe_bo_assert_held(bo);
+
+ if (!ttm_bo->ttm)
+ return 0;
+
+ if (!xe_bo_madv_is_dontneed(bo))
+ return 0;
+
+ /*
+ * Use the standard pre-move hook so we share the same cleanup/invalidate
+ * path as migrations: drop any CPU vmap and schedule the necessary GPU
+ * unbind/rebind work.
+ *
+ * This must be called before ttm_bo_validate() frees the pages.
+ * May fail in no-wait contexts (fault/shrinker) or if the BO is
+ * pinned. Keep state unchanged on failure so we don't end up "PURGED"
+ * with stale mappings.
+ */
+ ret = xe_bo_move_notify(bo, ctx);
+ if (ret)
+ return ret;
+
+ ret = ttm_bo_validate(ttm_bo, &place, ctx);
+ if (ret)
+ return ret;
+
+ /* Commit the state transition only once invalidation was queued */
+ xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_PURGED);
+
+ return 0;
+}
+
static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
struct ttm_operation_ctx *ctx,
struct ttm_resource *new_mem,
@@ -854,6 +932,20 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
ttm && ttm_tt_is_populated(ttm)) ? true : false;
int ret = 0;
+ /*
+ * Purge only non-shared BOs explicitly marked DONTNEED by userspace.
+ * The move_notify callback will handle invalidation asynchronously.
+ */
+ if (evict && xe_bo_madv_is_dontneed(bo)) {
+ ret = xe_ttm_bo_purge(ttm_bo, ctx);
+ if (ret)
+ return ret;
+
+ /* Free the unused eviction destination resource */
+ ttm_resource_free(ttm_bo, &new_mem);
+ return 0;
+ }
+
/* Bo creation path, moving to system or TT. */
if ((!old_mem && ttm) && !handle_system_ccs) {
if (new_mem->mem_type == XE_PL_TT)
@@ -1603,18 +1695,6 @@ static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
}
}
-static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
-{
- struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
-
- if (ttm_bo->ttm) {
- struct ttm_placement place = {};
- int ret = ttm_bo_validate(ttm_bo, &place, ctx);
-
- drm_WARN_ON(&xe->drm, ret);
- }
-}
-
static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo)
{
struct ttm_operation_ctx ctx = {
@@ -2195,6 +2275,9 @@ struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
#endif
INIT_LIST_HEAD(&bo->vram_userfault_link);
+ /* Initialize purge advisory state */
+ bo->madv_purgeable = XE_MADV_PURGEABLE_WILLNEED;
+
drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
if (resv) {
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index fb5541bdf602..653851d47aa6 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -271,6 +271,8 @@ static inline bool xe_bo_madv_is_dontneed(struct xe_bo *bo)
return bo->madv_purgeable == XE_MADV_PURGEABLE_DONTNEED;
}
+void xe_bo_set_purgeable_state(struct xe_bo *bo, enum xe_madv_purgeable_state new_state);
+
static inline void xe_bo_unpin_map_no_vm(struct xe_bo *bo)
{
if (likely(bo)) {
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index ea4857acf28d..2ac6e1edaa81 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -59,6 +59,19 @@ static int xe_pagefault_begin(struct drm_exec *exec, struct xe_vma *vma,
if (!bo)
return 0;
+ /*
+ * Skip validate/migrate for DONTNEED/purged BOs - repopulating
+ * their pages would prevent the shrinker from reclaiming them.
+ * For non-scratch VMs there is no safe fallback so fail the fault.
+ * For scratch VMs let xe_vma_rebind() run normally; it will install
+ * scratch PTEs so the GPU gets safe zero reads instead of faulting.
+ */
+ if (unlikely(xe_bo_madv_is_dontneed(bo) || xe_bo_is_purged(bo))) {
+ if (!xe_vm_has_scratch(vm))
+ return -EACCES;
+ return 0;
+ }
+
return need_vram_move ? xe_bo_migrate(bo, vram->placement, NULL, exec) :
xe_bo_validate(bo, vm, true, exec);
}
@@ -145,7 +158,7 @@ static struct xe_vm *xe_pagefault_asid_to_vm(struct xe_device *xe, u32 asid)
down_read(&xe->usm.lock);
vm = xa_load(&xe->usm.asid_to_vm, asid);
- if (vm && xe_vm_in_fault_mode(vm))
+ if (vm && (xe_vm_in_fault_mode(vm) || xe_vm_has_scratch(vm)))
xe_vm_get(vm);
else
vm = ERR_PTR(-EINVAL);
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 2d9ce2c4cb4f..08f40701f654 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -531,20 +531,26 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
/* Is this a leaf entry ?*/
if (level == 0 || xe_pt_hugepte_possible(addr, next, level, xe_walk)) {
struct xe_res_cursor *curs = xe_walk->curs;
- bool is_null = xe_vma_is_null(xe_walk->vma);
- bool is_vram = is_null ? false : xe_res_is_vram(curs);
+ struct xe_bo *bo = xe_vma_bo(xe_walk->vma);
+ bool is_null_or_purged = xe_vma_is_null(xe_walk->vma) ||
+ (bo && xe_bo_is_purged(bo));
+ bool is_vram = is_null_or_purged ? false : xe_res_is_vram(curs);
XE_WARN_ON(xe_walk->va_curs_start != addr);
if (xe_walk->clear_pt) {
pte = 0;
} else {
- pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
+ /*
+ * For purged BOs, treat like null VMAs - pass address 0.
+ * pte_encode_vma() will set the XE_PTE_NULL flag for scratch mapping.
+ */
+ pte = vm->pt_ops->pte_encode_vma(is_null_or_purged ? 0 :
xe_res_dma(curs) +
xe_walk->dma_offset,
xe_walk->vma,
pat_index, level);
- if (!is_null)
+ if (!is_null_or_purged)
pte |= is_vram ? xe_walk->default_vram_pte :
xe_walk->default_system_pte;
@@ -568,7 +574,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
if (unlikely(ret))
return ret;
- if (!is_null && !xe_walk->clear_pt)
+ if (!is_null_or_purged && !xe_walk->clear_pt)
xe_res_next(curs, next - addr);
xe_walk->va_curs_start = next;
xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << level);
@@ -721,6 +727,26 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
};
struct xe_pt *pt = vm->pt_root[tile->id];
int ret;
+ bool is_purged = false;
+
+ /*
+ * Check if BO is purged:
+ * - Scratch VMs: Use scratch PTEs (XE_PTE_NULL) for safe zero reads
+ * - Non-scratch VMs: Clear PTEs to zero (non-present) to avoid mapping to phys addr 0
+ *
+ * For non-scratch VMs, we force clear_pt=true so leaf PTEs become completely
+ * zero instead of creating a PRESENT mapping to physical address 0.
+ */
+ if (bo && xe_bo_is_purged(bo)) {
+ is_purged = true;
+
+ /*
+ * For non-scratch VMs, a NULL rebind should use zero PTEs
+ * (non-present), not a present PTE to phys 0.
+ */
+ if (!xe_vm_has_scratch(vm))
+ xe_walk.clear_pt = true;
+ }
if (range) {
/* Move this entire thing to xe_svm.c? */
@@ -756,11 +782,11 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
}
xe_walk.default_vram_pte |= XE_PPGTT_PTE_DM;
- xe_walk.dma_offset = bo ? vram_region_gpu_offset(bo->ttm.resource) : 0;
+ xe_walk.dma_offset = (bo && !is_purged) ? vram_region_gpu_offset(bo->ttm.resource) : 0;
if (!range)
xe_bo_assert_held(bo);
- if (!xe_vma_is_null(vma) && !range) {
+ if (!xe_vma_is_null(vma) && !range && !is_purged) {
if (xe_vma_is_userptr(vma))
xe_res_first_dma(to_userptr_vma(vma)->userptr.pages.dma_addr, 0,
xe_vma_size(vma), &curs);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 5572e12c2a7e..a0ade67d616e 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -326,6 +326,7 @@ void xe_vm_kill(struct xe_vm *vm, bool unlocked)
static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
{
struct xe_vm *vm = gpuvm_to_vm(vm_bo->vm);
+ struct xe_bo *bo = gem_to_xe_bo(vm_bo->obj);
struct drm_gpuva *gpuva;
int ret;
@@ -334,10 +335,16 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
list_move_tail(&gpuva_to_vma(gpuva)->combined_links.rebind,
&vm->rebind_list);
+ /* Skip re-populating purged BOs, rebind maps scratch pages. */
+ if (xe_bo_is_purged(bo)) {
+ vm_bo->evicted = false;
+ return 0;
+ }
+
if (!try_wait_for_completion(&vm->xe->pm_block))
return -EAGAIN;
- ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false, exec);
+ ret = xe_bo_validate(bo, vm, false, exec);
if (ret)
return ret;
@@ -1358,6 +1365,9 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
u16 pat_index, u32 pt_level)
{
+ struct xe_bo *bo = xe_vma_bo(vma);
+ struct xe_vm *vm = xe_vma_vm(vma);
+
pte |= XE_PAGE_PRESENT;
if (likely(!xe_vma_read_only(vma)))
@@ -1366,7 +1376,13 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
pte |= pte_encode_pat_index(pat_index, pt_level);
pte |= pte_encode_ps(pt_level);
- if (unlikely(xe_vma_is_null(vma)))
+ /*
+ * NULL PTEs redirect to scratch page (return zeros on read).
+ * Set for: 1) explicit null VMAs, 2) purged BOs on scratch VMs.
+ * Never set NULL flag without scratch page - causes undefined behavior.
+ */
+ if (unlikely(xe_vma_is_null(vma) ||
+ (bo && xe_bo_is_purged(bo) && xe_vm_has_scratch(vm))))
pte |= XE_PTE_NULL;
return pte;
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 869db304d96d..881de6cb6c11 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -26,6 +26,8 @@ struct xe_vmas_in_madvise_range {
/**
* struct xe_madvise_details - Argument to madvise_funcs
* @dpagemap: Reference-counted pointer to a struct drm_pagemap.
+ * @has_purged_bo: Track if any BO was purged (for purgeable state)
+ * @retained_ptr: User pointer for retained value (for purgeable state)
*
* The madvise IOCTL handler may, in addition to the user-space
* args, have additional info to pass into the madvise_func that
@@ -34,6 +36,8 @@ struct xe_vmas_in_madvise_range {
*/
struct xe_madvise_details {
struct drm_pagemap *dpagemap;
+ bool has_purged_bo;
+ u64 retained_ptr;
};
static int get_vmas(struct xe_vm *vm, struct xe_vmas_in_madvise_range *madvise_range)
@@ -180,6 +184,67 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
}
}
+/**
+ * madvise_purgeable - Handle purgeable buffer object advice
+ * @xe: XE device
+ * @vm: VM
+ * @vmas: Array of VMAs
+ * @num_vmas: Number of VMAs
+ * @op: Madvise operation
+ * @details: Madvise details for return values
+ *
+ * Handles DONTNEED/WILLNEED/PURGED states. Tracks if any BO was purged
+ * in details->has_purged_bo for later copy to userspace.
+ *
+ * Note: Marked __maybe_unused until hooked into madvise_funcs[] in the
+ * final patch to maintain bisectability. The NULL placeholder in the
+ * array ensures proper -EINVAL return for userspace until all supporting
+ * infrastructure (shrinker, per-VMA tracking) is complete.
+ */
+static void __maybe_unused madvise_purgeable(struct xe_device *xe,
+ struct xe_vm *vm,
+ struct xe_vma **vmas,
+ int num_vmas,
+ struct drm_xe_madvise *op,
+ struct xe_madvise_details *details)
+{
+ int i;
+
+ xe_assert(vm->xe, op->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE);
+
+ for (i = 0; i < num_vmas; i++) {
+ struct xe_bo *bo = xe_vma_bo(vmas[i]);
+
+ if (!bo)
+ continue;
+
+ /* BO must be locked before modifying madv state */
+ xe_bo_assert_held(bo);
+
+ /*
+ * Once purged, always purged. Cannot transition back to WILLNEED.
+ * This matches i915 semantics where purged BOs are permanently invalid.
+ */
+ if (xe_bo_is_purged(bo)) {
+ details->has_purged_bo = true;
+ continue;
+ }
+
+ switch (op->purge_state_val.val) {
+ case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
+ xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_WILLNEED);
+ break;
+ case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
+ xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_DONTNEED);
+ break;
+ default:
+ drm_warn(&vm->xe->drm, "Invalid madvise value = %d\n",
+ op->purge_state_val.val);
+ return;
+ }
+ }
+}
+
typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
struct xe_vma **vmas, int num_vmas,
struct drm_xe_madvise *op,
@@ -189,6 +254,12 @@ static const madvise_func madvise_funcs[] = {
[DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC] = madvise_preferred_mem_loc,
[DRM_XE_MEM_RANGE_ATTR_ATOMIC] = madvise_atomic,
[DRM_XE_MEM_RANGE_ATTR_PAT] = madvise_pat_index,
+ /*
+ * Purgeable support implemented but not enabled yet to maintain
+ * bisectability. Will be set to madvise_purgeable() in final patch
+ * when all infrastructure (shrinker, VMA tracking) is complete.
+ */
+ [DRM_XE_VMA_ATTR_PURGEABLE_STATE] = NULL,
};
static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
@@ -319,6 +390,19 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
return false;
break;
}
+ case DRM_XE_VMA_ATTR_PURGEABLE_STATE:
+ {
+ u32 val = args->purge_state_val.val;
+
+ if (XE_IOCTL_DBG(xe, !(val == DRM_XE_VMA_PURGEABLE_STATE_WILLNEED ||
+ val == DRM_XE_VMA_PURGEABLE_STATE_DONTNEED)))
+ return false;
+
+ if (XE_IOCTL_DBG(xe, args->purge_state_val.pad))
+ return false;
+
+ break;
+ }
default:
if (XE_IOCTL_DBG(xe, 1))
return false;
@@ -337,6 +421,12 @@ static int xe_madvise_details_init(struct xe_vm *vm, const struct drm_xe_madvise
memset(details, 0, sizeof(*details));
+ /* Store retained pointer for purgeable state */
+ if (args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE) {
+ details->retained_ptr = args->purge_state_val.retained_ptr;
+ return 0;
+ }
+
if (args->type == DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC) {
int fd = args->preferred_mem_loc.devmem_fd;
struct drm_pagemap *dpagemap;
@@ -365,6 +455,21 @@ static void xe_madvise_details_fini(struct xe_madvise_details *details)
drm_pagemap_put(details->dpagemap);
}
+static int xe_madvise_purgeable_retained_to_user(const struct xe_madvise_details *details)
+{
+ u32 retained;
+
+ if (!details->retained_ptr)
+ return 0;
+
+ retained = !details->has_purged_bo;
+
+ if (put_user(retained, (u32 __user *)u64_to_user_ptr(details->retained_ptr)))
+ return -EFAULT;
+
+ return 0;
+}
+
static bool check_bo_args_are_sane(struct xe_vm *vm, struct xe_vma **vmas,
int num_vmas, u32 atomic_val)
{
@@ -422,6 +527,7 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
struct xe_vm *vm;
struct drm_exec exec;
int err, attr_type;
+ bool do_retained;
vm = xe_vm_lookup(xef, args->vm_id);
if (XE_IOCTL_DBG(xe, !vm))
@@ -432,6 +538,25 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
goto put_vm;
}
+ /* Cache whether we need to write retained, and validate it's initialized to 0 */
+ do_retained = args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE &&
+ args->purge_state_val.retained_ptr;
+ if (do_retained) {
+ u32 retained;
+ u32 __user *retained_ptr;
+
+ retained_ptr = u64_to_user_ptr(args->purge_state_val.retained_ptr);
+ if (get_user(retained, retained_ptr)) {
+ err = -EFAULT;
+ goto put_vm;
+ }
+
+ if (XE_IOCTL_DBG(xe, retained != 0)) {
+ err = -EINVAL;
+ goto put_vm;
+ }
+ }
+
xe_svm_flush(vm);
err = down_write_killable(&vm->lock);
@@ -487,6 +612,13 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
}
attr_type = array_index_nospec(args->type, ARRAY_SIZE(madvise_funcs));
+
+ /* Ensure the madvise function exists for this type */
+ if (!madvise_funcs[attr_type]) {
+ err = -EINVAL;
+ goto err_fini;
+ }
+
madvise_funcs[attr_type](xe, vm, madvise_range.vmas, madvise_range.num_vmas, args,
&details);
@@ -505,6 +637,10 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
xe_madvise_details_fini(&details);
unlock_vm:
up_write(&vm->lock);
+
+ /* Write retained value to user after releasing all locks */
+ if (!err && do_retained)
+ err = xe_madvise_purgeable_retained_to_user(&details);
put_vm:
xe_vm_put(vm);
return err;
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread

* Re: [PATCH v8 03/12] drm/xe/madvise: Implement purgeable buffer object support
2026-03-26 5:51 ` [PATCH v8 03/12] drm/xe/madvise: Implement purgeable buffer object support Arvind Yadav
@ 2026-03-26 8:19 ` Thomas Hellström
0 siblings, 0 replies; 16+ messages in thread
From: Thomas Hellström @ 2026-03-26 8:19 UTC (permalink / raw)
To: Arvind Yadav, intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray
On Thu, 2026-03-26 at 11:21 +0530, Arvind Yadav wrote:
> This allows userspace applications to provide memory usage hints to
> the kernel for better memory management under pressure:
>
> Add the core implementation for purgeable buffer objects, enabling
> memory
> reclamation of user-designated DONTNEED buffers during eviction.
>
> This patch implements the purge operation and state machine
> transitions:
>
> Purgeable States (from xe_madv_purgeable_state):
> - WILLNEED (0): BO should be retained, actively used
> - DONTNEED (1): BO eligible for purging, not currently needed
> - PURGED (2): BO backing store reclaimed, permanently invalid
>
> Design Rationale:
> - Async TLB invalidation via trigger_rebind (no blocking
> xe_vm_invalidate_vma)
> - i915 compatibility: retained field, "once purged always purged"
> semantics
> - Shared BO protection prevents multi-process memory corruption
> - Scratch PTE reuse avoids new infrastructure, safe for fault mode
>
> Note: The madvise_purgeable() function is implemented but not hooked
> into
> the IOCTL handler (madvise_funcs[] entry is NULL) to maintain
> bisectability.
> The feature will be enabled in the final patch when all supporting
> infrastructure (shrinker, per-VMA tracking) is complete.
>
> v2:
> - Use xe_bo_trigger_rebind() for async TLB invalidation (Thomas
> Hellström)
> - Add NULL rebind with scratch PTEs for fault mode (Thomas
> Hellström)
> - Implement i915-compatible retained field logic (Thomas Hellström)
> - Skip BO validation for purged BOs in page fault handler (crash
> fix)
> - Add scratch VM check in page fault path (non-scratch VMs fail
> fault)
> - Force clear_pt for non-scratch VMs to avoid phys addr 0 mapping
> (review fix)
> - Add !is_purged check to resource cursor setup to prevent stale
> access
>
> v3:
> - Rebase as xe_gt_pagefault.c is gone upstream and replaced
> with xe_pagefault.c (Matthew Brost)
> - Xe specific warn on (Matthew Brost)
> - Call helpers for madv_purgeable access(Matthew Brost)
> - Remove bo NULL check(Matthew Brost)
> - Use xe_bo_assert_held instead of dma assert(Matthew Brost)
> - Move the xe_bo_is_purged check under the dma-resv lock( by Matt)
> - Drop is_purged from xe_pt_stage_bind_entry and just set is_null
> to true
> for purged BO rename s/is_null/is_null_or_purged (by Matt)
> - UAPI rule should not be changed.(Matthew Brost)
> - Make 'retained' a userptr (Matthew Brost)
>
> v4:
> - @madv_purgeable atomic_t → u32 change across all relevant patches
> (Matt)
>
> v5:
> - Introduce xe_bo_set_purgeable_state() helper (void return) to
> centralize
> madv_purgeable updates with xe_bo_assert_held() and state
> transition
> validation using explicit enum checks (no transition out of
> PURGED) (Matt)
> - Make xe_ttm_bo_purge() return int and propagate failures from
> xe_bo_move(); handle xe_bo_trigger_rebind() failures (e.g.
> no_wait_gpu
> paths) rather than silently ignoring (Matt)
> - Replace drm_WARN_ON with xe_assert for better Xe-specific
> assertions (Matt)
> - Hook purgeable handling into
> madvise_funcs[DRM_XE_VMA_ATTR_PURGEABLE_STATE]
> instead of special-case path in xe_vm_madvise_ioctl() (Matt)
> - Track purgeable retained return via xe_madvise_details and
> perform
> copy_to_user() from xe_madvise_details_fini() after locks are
> dropped (Matt)
> - Set madvise_funcs[DRM_XE_VMA_ATTR_PURGEABLE_STATE] to NULL with
> __maybe_unused on madvise_purgeable() to maintain bisectability
> until
> shrinker integration is complete in final patch (Matt)
> - Use put_user() instead of copy_to_user() for single u32 retained
> value (Thomas)
> - Return -EFAULT from ioctl if put_user() fails (Thomas)
> - Validate userspace initialized retained to 0 before ioctl,
> ensuring safe
> default (0 = "assume purged") if put_user() fails (Thomas)
> - Refactor error handling: separate fallible put_user from
> infallible cleanup
> - xe_madvise_purgeable_retained_to_user(): separate helper for
> fallible put_user
> - Call put_user() after releasing all locks to avoid circular
> dependencies
> - Use xe_bo_move_notify() instead of xe_bo_trigger_rebind() in
> xe_ttm_bo_purge()
> for proper abstraction - handles vunmap, dma-buf notifications,
> and VRAM
> userfault cleanup (Thomas)
> - Fix LRU crash while running shrink test
> - Skip xe_bo_validate() for purged BOs in xe_gpuvm_validate()
>
> v6:
> - xe_bo_move_notify() must be called *before* ttm_bo_validate().
> (Thomas)
> - Block GPU page faults (fault-mode VMs) for DONTNEED bo's (Thomas,
> Matt)
> - Rename retained to retained_ptr. (Jose)
>
> v7 Changes:
> - Fix engine reset from EU overfetch in scratch VMs:
> xe_pagefault_begin()
> and xe_pagefault_service() now return 0 instead of -EACCES/-
> EINVAL for
> DONTNEED/purged BOs and missing VMAs so stale accesses hit
> scratch PTEs.
> - Fix Engine memory CAT errors when Mesa uses
> DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE:
> accept scratch VMs in xe_pagefault_asid_to_vm() via '||
> xe_vm_has_scratch(vm).
> - Skip validate/migrate/rebind for DONTNEED/purged BOs in
> xe_pagefault_begin()
> using a bool *skip_rebind out-parameter. Scratch VMs ACK the
> fault and fall back
> to scratch PTEs; non-scratch VMs return -EACCES.
>
> v8:
> - Remove skip_rebind out-parameter from xe_pagefault_begin();
> always let
> xe_vma_rebind() run so tile_present is updated and the GPU fault
> resolves.
> Previously skip_rebind=true left tile_present=0, causing an
> infinite
> refault loop on scratch VMs. (Thomas)
Let's discuss later whether we need a follow-up patch for this. Perhaps
we should kill the VM in the case of a real fault and just have
prefaults nop.
For now
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> - Fixed spelling: corrected "madvice" → "madvise". (Thomas)
>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
> drivers/gpu/drm/xe/xe_bo.c | 107 ++++++++++++++++++++---
> drivers/gpu/drm/xe/xe_bo.h | 2 +
> drivers/gpu/drm/xe/xe_pagefault.c | 15 +++-
> drivers/gpu/drm/xe/xe_pt.c | 40 +++++++--
> drivers/gpu/drm/xe/xe_vm.c | 20 ++++-
> drivers/gpu/drm/xe/xe_vm_madvise.c | 136
> +++++++++++++++++++++++++++++
> 6 files changed, 298 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 22179b2df85c..b6055bb4c578 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -835,6 +835,84 @@ static int xe_bo_move_notify(struct xe_bo *bo,
> return 0;
> }
>
> +/**
> + * xe_bo_set_purgeable_state() - Set BO purgeable state with
> validation
> + * @bo: Buffer object
> + * @new_state: New purgeable state
> + *
> + * Sets the purgeable state with lockdep assertions and validates
> state
> + * transitions. Once a BO is PURGED, it cannot transition to any
> other state.
> + * Invalid transitions are caught with xe_assert().
> + */
> +void xe_bo_set_purgeable_state(struct xe_bo *bo,
> + enum xe_madv_purgeable_state
> new_state)
> +{
> + struct xe_device *xe = xe_bo_device(bo);
> +
> + xe_bo_assert_held(bo);
> +
> + /* Validate state is one of the known values */
> + xe_assert(xe, new_state == XE_MADV_PURGEABLE_WILLNEED ||
> + new_state == XE_MADV_PURGEABLE_DONTNEED ||
> + new_state == XE_MADV_PURGEABLE_PURGED);
> +
> + /* Once purged, always purged - cannot transition out */
> + xe_assert(xe, !(bo->madv_purgeable ==
> XE_MADV_PURGEABLE_PURGED &&
> + new_state != XE_MADV_PURGEABLE_PURGED));
> +
> + bo->madv_purgeable = new_state;
> +}
> +
> +/**
> + * xe_ttm_bo_purge() - Purge buffer object backing store
> + * @ttm_bo: The TTM buffer object to purge
> + * @ctx: TTM operation context
> + *
> + * This function purges the backing store of a BO marked as DONTNEED
> and
> + * triggers rebind to invalidate stale GPU mappings. For fault-mode
> VMs,
> + * this zaps the PTEs. The next GPU access will trigger a page fault
> and
> + * perform NULL rebind (scratch pages or clear PTEs based on VM
> config).
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +static int xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct
> ttm_operation_ctx *ctx)
> +{
> + struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
> + struct ttm_placement place = {};
> + int ret;
> +
> + xe_bo_assert_held(bo);
> +
> + if (!ttm_bo->ttm)
> + return 0;
> +
> + if (!xe_bo_madv_is_dontneed(bo))
> + return 0;
> +
> + /*
> + * Use the standard pre-move hook so we share the same
> cleanup/invalidate
> + * path as migrations: drop any CPU vmap and schedule the
> necessary GPU
> + * unbind/rebind work.
> + *
> + * This must be called before ttm_bo_validate() frees the
> pages.
> + * May fail in no-wait contexts (fault/shrinker) or if the
> BO is
> + * pinned. Keep state unchanged on failure so we don't end
> up "PURGED"
> + * with stale mappings.
> + */
> + ret = xe_bo_move_notify(bo, ctx);
> + if (ret)
> + return ret;
> +
> + ret = ttm_bo_validate(ttm_bo, &place, ctx);
> + if (ret)
> + return ret;
> +
> + /* Commit the state transition only once invalidation was
> queued */
> + xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_PURGED);
> +
> + return 0;
> +}
> +
> static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> struct ttm_operation_ctx *ctx,
> struct ttm_resource *new_mem,
> @@ -854,6 +932,20 @@ static int xe_bo_move(struct ttm_buffer_object
> *ttm_bo, bool evict,
> ttm && ttm_tt_is_populated(ttm)) ?
> true : false;
> int ret = 0;
>
> + /*
> + * Purge only non-shared BOs explicitly marked DONTNEED by
> userspace.
> + * The move_notify callback will handle invalidation
> asynchronously.
> + */
> + if (evict && xe_bo_madv_is_dontneed(bo)) {
> + ret = xe_ttm_bo_purge(ttm_bo, ctx);
> + if (ret)
> + return ret;
> +
> + /* Free the unused eviction destination resource */
> + ttm_resource_free(ttm_bo, &new_mem);
> + return 0;
> + }
> +
> /* Bo creation path, moving to system or TT. */
> if ((!old_mem && ttm) && !handle_system_ccs) {
> if (new_mem->mem_type == XE_PL_TT)
> @@ -1603,18 +1695,6 @@ static void xe_ttm_bo_delete_mem_notify(struct
> ttm_buffer_object *ttm_bo)
> }
> }
>
> -static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct
> ttm_operation_ctx *ctx)
> -{
> - struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
> -
> - if (ttm_bo->ttm) {
> - struct ttm_placement place = {};
> - int ret = ttm_bo_validate(ttm_bo, &place, ctx);
> -
> - drm_WARN_ON(&xe->drm, ret);
> - }
> -}
> -
> static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo)
> {
> struct ttm_operation_ctx ctx = {
> @@ -2195,6 +2275,9 @@ struct xe_bo *xe_bo_init_locked(struct
> xe_device *xe, struct xe_bo *bo,
> #endif
> INIT_LIST_HEAD(&bo->vram_userfault_link);
>
> + /* Initialize purge advisory state */
> + bo->madv_purgeable = XE_MADV_PURGEABLE_WILLNEED;
> +
> drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
>
> if (resv) {
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index fb5541bdf602..653851d47aa6 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -271,6 +271,8 @@ static inline bool xe_bo_madv_is_dontneed(struct
> xe_bo *bo)
> return bo->madv_purgeable == XE_MADV_PURGEABLE_DONTNEED;
> }
>
> +void xe_bo_set_purgeable_state(struct xe_bo *bo, enum
> xe_madv_purgeable_state new_state);
> +
> static inline void xe_bo_unpin_map_no_vm(struct xe_bo *bo)
> {
> if (likely(bo)) {
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c
> b/drivers/gpu/drm/xe/xe_pagefault.c
> index ea4857acf28d..2ac6e1edaa81 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -59,6 +59,19 @@ static int xe_pagefault_begin(struct drm_exec
> *exec, struct xe_vma *vma,
> if (!bo)
> return 0;
>
> + /*
> + * Skip validate/migrate for DONTNEED/purged BOs -
> repopulating
> + * their pages would prevent the shrinker from reclaiming
> them.
> + * For non-scratch VMs there is no safe fallback so fail the
> fault.
> + * For scratch VMs let xe_vma_rebind() run normally; it will
> install
> + * scratch PTEs so the GPU gets safe zero reads instead of
> faulting.
> + */
> + if (unlikely(xe_bo_madv_is_dontneed(bo) ||
> xe_bo_is_purged(bo))) {
> + if (!xe_vm_has_scratch(vm))
> + return -EACCES;
> + return 0;
> + }
> +
> return need_vram_move ? xe_bo_migrate(bo, vram->placement,
> NULL, exec) :
> xe_bo_validate(bo, vm, true, exec);
> }
> @@ -145,7 +158,7 @@ static struct xe_vm
> *xe_pagefault_asid_to_vm(struct xe_device *xe, u32 asid)
>
> down_read(&xe->usm.lock);
> vm = xa_load(&xe->usm.asid_to_vm, asid);
> - if (vm && xe_vm_in_fault_mode(vm))
> + if (vm && (xe_vm_in_fault_mode(vm) ||
> xe_vm_has_scratch(vm)))
> xe_vm_get(vm);
> else
> vm = ERR_PTR(-EINVAL);
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 2d9ce2c4cb4f..08f40701f654 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -531,20 +531,26 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent,
> pgoff_t offset,
> /* Is this a leaf entry ?*/
> if (level == 0 || xe_pt_hugepte_possible(addr, next, level,
> xe_walk)) {
> struct xe_res_cursor *curs = xe_walk->curs;
> - bool is_null = xe_vma_is_null(xe_walk->vma);
> - bool is_vram = is_null ? false :
> xe_res_is_vram(curs);
> + struct xe_bo *bo = xe_vma_bo(xe_walk->vma);
> + bool is_null_or_purged = xe_vma_is_null(xe_walk-
> >vma) ||
> + (bo &&
> xe_bo_is_purged(bo));
> + bool is_vram = is_null_or_purged ? false :
> xe_res_is_vram(curs);
>
> XE_WARN_ON(xe_walk->va_curs_start != addr);
>
> if (xe_walk->clear_pt) {
> pte = 0;
> } else {
> - pte = vm->pt_ops->pte_encode_vma(is_null ? 0
> :
> + /*
> + * For purged BOs, treat like null VMAs -
> pass address 0.
> + * The pte_encode_vma will set XE_PTE_NULL
> flag for scratch mapping.
> + */
> + pte = vm->pt_ops-
> >pte_encode_vma(is_null_or_purged ? 0 :
>
> xe_res_dma(curs) +
> xe_walk-
> >dma_offset,
> xe_walk-
> >vma,
> pat_index,
> level);
> - if (!is_null)
> + if (!is_null_or_purged)
> pte |= is_vram ? xe_walk-
> >default_vram_pte :
> xe_walk->default_system_pte;
>
> @@ -568,7 +574,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent,
> pgoff_t offset,
> if (unlikely(ret))
> return ret;
>
> - if (!is_null && !xe_walk->clear_pt)
> + if (!is_null_or_purged && !xe_walk->clear_pt)
> xe_res_next(curs, next - addr);
> xe_walk->va_curs_start = next;
> xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K <<
> level);
> @@ -721,6 +727,26 @@ xe_pt_stage_bind(struct xe_tile *tile, struct
> xe_vma *vma,
> };
> struct xe_pt *pt = vm->pt_root[tile->id];
> int ret;
> + bool is_purged = false;
> +
> + /*
> + * Check if BO is purged:
> + * - Scratch VMs: Use scratch PTEs (XE_PTE_NULL) for safe
> zero reads
> + * - Non-scratch VMs: Clear PTEs to zero (non-present) to
> avoid mapping to phys addr 0
> + *
> + * For non-scratch VMs, we force clear_pt=true so leaf PTEs
> become completely
> + * zero instead of creating a PRESENT mapping to physical
> address 0.
> + */
> + if (bo && xe_bo_is_purged(bo)) {
> + is_purged = true;
> +
> + /*
> + * For non-scratch VMs, a NULL rebind should use
> zero PTEs
> + * (non-present), not a present PTE to phys 0.
> + */
> + if (!xe_vm_has_scratch(vm))
> + xe_walk.clear_pt = true;
> + }
>
> if (range) {
> /* Move this entire thing to xe_svm.c? */
> @@ -756,11 +782,11 @@ xe_pt_stage_bind(struct xe_tile *tile, struct
> xe_vma *vma,
> }
>
> xe_walk.default_vram_pte |= XE_PPGTT_PTE_DM;
> - xe_walk.dma_offset = bo ? vram_region_gpu_offset(bo-
> >ttm.resource) : 0;
> + xe_walk.dma_offset = (bo && !is_purged) ?
> vram_region_gpu_offset(bo->ttm.resource) : 0;
> if (!range)
> xe_bo_assert_held(bo);
>
> - if (!xe_vma_is_null(vma) && !range) {
> + if (!xe_vma_is_null(vma) && !range && !is_purged) {
> if (xe_vma_is_userptr(vma))
> xe_res_first_dma(to_userptr_vma(vma)-
> >userptr.pages.dma_addr, 0,
> xe_vma_size(vma), &curs);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 5572e12c2a7e..a0ade67d616e 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -326,6 +326,7 @@ void xe_vm_kill(struct xe_vm *vm, bool unlocked)
> static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct
> drm_exec *exec)
> {
> struct xe_vm *vm = gpuvm_to_vm(vm_bo->vm);
> + struct xe_bo *bo = gem_to_xe_bo(vm_bo->obj);
> struct drm_gpuva *gpuva;
> int ret;
>
> @@ -334,10 +335,16 @@ static int xe_gpuvm_validate(struct
> drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
> list_move_tail(&gpuva_to_vma(gpuva)-
> >combined_links.rebind,
> &vm->rebind_list);
>
> + /* Skip re-populating purged BOs, rebind maps scratch pages.
> */
> + if (xe_bo_is_purged(bo)) {
> + vm_bo->evicted = false;
> + return 0;
> + }
> +
> if (!try_wait_for_completion(&vm->xe->pm_block))
> return -EAGAIN;
>
> - ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false,
> exec);
> + ret = xe_bo_validate(bo, vm, false, exec);
> if (ret)
> return ret;
>
> @@ -1358,6 +1365,9 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo,
> u64 bo_offset,
> static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
> u16 pat_index, u32 pt_level)
> {
> + struct xe_bo *bo = xe_vma_bo(vma);
> + struct xe_vm *vm = xe_vma_vm(vma);
> +
> pte |= XE_PAGE_PRESENT;
>
> if (likely(!xe_vma_read_only(vma)))
> @@ -1366,7 +1376,13 @@ static u64 xelp_pte_encode_vma(u64 pte, struct
> xe_vma *vma,
> pte |= pte_encode_pat_index(pat_index, pt_level);
> pte |= pte_encode_ps(pt_level);
>
> - if (unlikely(xe_vma_is_null(vma)))
> + /*
> + * NULL PTEs redirect to scratch page (return zeros on
> read).
> + * Set for: 1) explicit null VMAs, 2) purged BOs on scratch
> VMs.
> + * Never set NULL flag without scratch page - causes
> undefined behavior.
> + */
> + if (unlikely(xe_vma_is_null(vma) ||
> + (bo && xe_bo_is_purged(bo) &&
> xe_vm_has_scratch(vm))))
> pte |= XE_PTE_NULL;
>
> return pte;
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c
> b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index 869db304d96d..881de6cb6c11 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -26,6 +26,8 @@ struct xe_vmas_in_madvise_range {
> /**
> * struct xe_madvise_details - Argument to madvise_funcs
> * @dpagemap: Reference-counted pointer to a struct drm_pagemap.
> + * @has_purged_bo: Track if any BO was purged (for purgeable state)
> + * @retained_ptr: User pointer for retained value (for purgeable
> state)
> *
> * The madvise IOCTL handler may, in addition to the user-space
> * args, have additional info to pass into the madvise_func that
> @@ -34,6 +36,8 @@ struct xe_vmas_in_madvise_range {
> */
> struct xe_madvise_details {
> struct drm_pagemap *dpagemap;
> + bool has_purged_bo;
> + u64 retained_ptr;
> };
>
> static int get_vmas(struct xe_vm *vm, struct
> xe_vmas_in_madvise_range *madvise_range)
> @@ -180,6 +184,67 @@ static void madvise_pat_index(struct xe_device
> *xe, struct xe_vm *vm,
> }
> }
>
> +/**
> + * madvise_purgeable - Handle purgeable buffer object advice
> + * @xe: XE device
> + * @vm: VM
> + * @vmas: Array of VMAs
> + * @num_vmas: Number of VMAs
> + * @op: Madvise operation
> + * @details: Madvise details for return values
> + *
> + * Handles DONTNEED/WILLNEED/PURGED states. Tracks if any BO was
> purged
> + * in details->has_purged_bo for later copy to userspace.
> + *
> + * Note: Marked __maybe_unused until hooked into madvise_funcs[] in
> the
> + * final patch to maintain bisectability. The NULL placeholder in
> the
> + * array ensures proper -EINVAL return for userspace until all
> supporting
> + * infrastructure (shrinker, per-VMA tracking) is complete.
> + */
> +static void __maybe_unused madvise_purgeable(struct xe_device *xe,
> + struct xe_vm *vm,
> + struct xe_vma **vmas,
> + int num_vmas,
> + struct drm_xe_madvise
> *op,
> + struct
> xe_madvise_details *details)
> +{
> + int i;
> +
> + xe_assert(vm->xe, op->type ==
> DRM_XE_VMA_ATTR_PURGEABLE_STATE);
> +
> + for (i = 0; i < num_vmas; i++) {
> + struct xe_bo *bo = xe_vma_bo(vmas[i]);
> +
> + if (!bo)
> + continue;
> +
> + /* BO must be locked before modifying madv state */
> + xe_bo_assert_held(bo);
> +
> + /*
> + * Once purged, always purged. Cannot transition
> back to WILLNEED.
> + * This matches i915 semantics where purged BOs are
> permanently invalid.
> + */
> + if (xe_bo_is_purged(bo)) {
> + details->has_purged_bo = true;
> + continue;
> + }
> +
> + switch (op->purge_state_val.val) {
> + case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
> + xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_WILLNEED);
> + break;
> + case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
> + xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_DONTNEED);
> + break;
> + default:
> + drm_warn(&vm->xe->drm, "Invalid madvise value = %d\n",
> + op->purge_state_val.val);
> + return;
> + }
> + }
> +}
> +
> typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
> struct xe_vma **vmas, int num_vmas,
> struct drm_xe_madvise *op,
> @@ -189,6 +254,12 @@ static const madvise_func madvise_funcs[] = {
> [DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC] = madvise_preferred_mem_loc,
> [DRM_XE_MEM_RANGE_ATTR_ATOMIC] = madvise_atomic,
> [DRM_XE_MEM_RANGE_ATTR_PAT] = madvise_pat_index,
> + /*
> + * Purgeable support implemented but not enabled yet to maintain
> + * bisectability. Will be set to madvise_purgeable() in final patch
> + * when all infrastructure (shrinker, VMA tracking) is complete.
> + */
> + [DRM_XE_VMA_ATTR_PURGEABLE_STATE] = NULL,
> };
>
> static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
> @@ -319,6 +390,19 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
> return false;
> break;
> }
> + case DRM_XE_VMA_ATTR_PURGEABLE_STATE:
> + {
> + u32 val = args->purge_state_val.val;
> +
> + if (XE_IOCTL_DBG(xe, !(val == DRM_XE_VMA_PURGEABLE_STATE_WILLNEED ||
> + val == DRM_XE_VMA_PURGEABLE_STATE_DONTNEED)))
> + return false;
> +
> + if (XE_IOCTL_DBG(xe, args->purge_state_val.pad))
> + return false;
> +
> + break;
> + }
> default:
> if (XE_IOCTL_DBG(xe, 1))
> return false;
> @@ -337,6 +421,12 @@ static int xe_madvise_details_init(struct xe_vm *vm, const struct drm_xe_madvise
>
> memset(details, 0, sizeof(*details));
>
> + /* Store retained pointer for purgeable state */
> + if (args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE) {
> + details->retained_ptr = args->purge_state_val.retained_ptr;
> + return 0;
> + }
> +
> if (args->type == DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC) {
> int fd = args->preferred_mem_loc.devmem_fd;
> struct drm_pagemap *dpagemap;
> @@ -365,6 +455,21 @@ static void xe_madvise_details_fini(struct xe_madvise_details *details)
> drm_pagemap_put(details->dpagemap);
> }
>
> +static int xe_madvise_purgeable_retained_to_user(const struct xe_madvise_details *details)
> +{
> + u32 retained;
> +
> + if (!details->retained_ptr)
> + return 0;
> +
> + retained = !details->has_purged_bo;
> +
> + if (put_user(retained, (u32 __user *)u64_to_user_ptr(details->retained_ptr)))
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> static bool check_bo_args_are_sane(struct xe_vm *vm, struct xe_vma **vmas,
> int num_vmas, u32 atomic_val)
> {
> @@ -422,6 +527,7 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
> struct xe_vm *vm;
> struct drm_exec exec;
> int err, attr_type;
> + bool do_retained;
>
> vm = xe_vm_lookup(xef, args->vm_id);
> if (XE_IOCTL_DBG(xe, !vm))
> @@ -432,6 +538,25 @@ int xe_vm_madvise_ioctl(struct drm_device *dev,
> void *data, struct drm_file *fil
> goto put_vm;
> }
>
> + /* Cache whether we need to write retained, and validate it's initialized to 0 */
> + do_retained = args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE &&
> + args->purge_state_val.retained_ptr;
> + if (do_retained) {
> + u32 retained;
> + u32 __user *retained_ptr;
> +
> + retained_ptr = u64_to_user_ptr(args->purge_state_val.retained_ptr);
> + if (get_user(retained, retained_ptr)) {
> + err = -EFAULT;
> + goto put_vm;
> + }
> +
> + if (XE_IOCTL_DBG(xe, retained != 0)) {
> + err = -EINVAL;
> + goto put_vm;
> + }
> + }
> +
> xe_svm_flush(vm);
>
> err = down_write_killable(&vm->lock);
> @@ -487,6 +612,13 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
> }
>
> attr_type = array_index_nospec(args->type, ARRAY_SIZE(madvise_funcs));
> +
> + /* Ensure the madvise function exists for this type */
> + if (!madvise_funcs[attr_type]) {
> + err = -EINVAL;
> + goto err_fini;
> + }
> +
> madvise_funcs[attr_type](xe, vm, madvise_range.vmas, madvise_range.num_vmas, args,
> &details);
>
> @@ -505,6 +637,10 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
> xe_madvise_details_fini(&details);
> unlock_vm:
> up_write(&vm->lock);
> +
> + /* Write retained value to user after releasing all locks */
> + if (!err && do_retained)
> + err = xe_madvise_purgeable_retained_to_user(&details);
> put_vm:
> xe_vm_put(vm);
> return err;
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v8 04/12] drm/xe/bo: Block CPU faults to purgeable buffer objects
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (2 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 03/12] drm/xe/madvise: Implement purgeable buffer object support Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 05/12] drm/xe/vm: Prevent binding of purged " Arvind Yadav
` (7 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Block CPU page faults to buffer objects marked as purgeable (DONTNEED)
or already purged. Once a BO is marked DONTNEED, its contents can be
discarded by the kernel at any time, making access undefined behavior.
Return VM_FAULT_SIGBUS immediately to fail consistently instead of
allowing erratic behavior where access sometimes works (if not yet
purged) and sometimes fails (if purged).
For DONTNEED BOs:
- Block new CPU faults with SIGBUS to prevent undefined behavior.
- Existing CPU PTEs may still work until TLB flush, but new faults
fail immediately.
For PURGED BOs:
- Backing store has been reclaimed, making CPU access invalid.
- Without this check, accessing existing mmap mappings would trigger
xe_bo_fault_migrate() on freed backing store, causing kernel hangs
or crashes.
The purgeable check is added to both CPU fault paths:
- Fastpath (xe_bo_cpu_fault_fastpath): Returns VM_FAULT_SIGBUS immediately
under dma-resv lock, preventing attempts to migrate/validate
DONTNEED/purged pages.
- Slowpath (xe_bo_cpu_fault): Returns -EFAULT under drm_exec lock,
converted to VM_FAULT_SIGBUS.
This matches i915 semantics for purged buffer handling.
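A rough userspace sketch of the resulting contract (uAPI names from
patch 01 of this series; BO creation, VM bind and error handling are
elided, so this is illustrative only, not a definitive usage recipe):

	struct drm_xe_madvise args = {
		.vm_id = vm_id,
		.start = bind_addr,
		.range = bind_size,
		.type = DRM_XE_VMA_ATTR_PURGEABLE_STATE,
		.purge_state_val.val = DRM_XE_VMA_PURGEABLE_STATE_DONTNEED,
	};
	__u32 retained = 0;	/* uAPI requires pre-initialization to 0 */

	/* ptr was mmap()ed before the BO was marked DONTNEED */
	ioctl(fd, DRM_IOCTL_XE_MADVISE, &args);
	/* a *new* CPU fault on ptr now raises SIGBUS; an already
	 * populated PTE may keep working until the pages are purged */

	/* later: reclaim the buffer and learn whether it survived */
	args.purge_state_val.val = DRM_XE_VMA_PURGEABLE_STATE_WILLNEED;
	args.purge_state_val.retained_ptr = (__u64)(uintptr_t)&retained;
	ioctl(fd, DRM_IOCTL_XE_MADVISE, &args);
	if (!retained)
		;	/* backing store was purged - contents are lost */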
v2:
- Added xe_bo_is_purged(bo) instead of atomic_read.
- Avoids leaks and keeps drm_dev_exit() while returning.
v3:
- Move xe_bo_is_purged check under a dma-resv lock (Matthew Brost)
v4:
- Add purged check to fastpath (xe_bo_cpu_fault_fastpath) to prevent
hang when accessing existing mmap of purged BO.
v6:
- Block CPU faults to DONTNEED BOs with VM_FAULT_SIGBUS. (Thomas, Matt)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index b6055bb4c578..da18b43650e3 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1979,6 +1979,16 @@ static vm_fault_t xe_bo_cpu_fault_fastpath(struct vm_fault *vmf, struct xe_devic
if (!dma_resv_trylock(tbo->base.resv))
goto out_validation;
+ /*
+ * Reject CPU faults to purgeable BOs. DONTNEED BOs can be purged
+ * at any time, and purged BOs have no backing store. Either case
+ * is undefined behavior for CPU access.
+ */
+ if (xe_bo_madv_is_dontneed(bo) || xe_bo_is_purged(bo)) {
+ ret = VM_FAULT_SIGBUS;
+ goto out_unlock;
+ }
+
if (xe_ttm_bo_is_imported(tbo)) {
ret = VM_FAULT_SIGBUS;
drm_dbg(&xe->drm, "CPU trying to access an imported buffer object.\n");
@@ -2069,6 +2079,15 @@ static vm_fault_t xe_bo_cpu_fault(struct vm_fault *vmf)
if (err)
break;
+ /*
+ * Reject CPU faults to purgeable BOs. DONTNEED BOs can be
+ * purged at any time, and purged BOs have no backing store.
+ */
+ if (xe_bo_madv_is_dontneed(bo) || xe_bo_is_purged(bo)) {
+ err = -EFAULT;
+ break;
+ }
+
if (xe_ttm_bo_is_imported(tbo)) {
err = -EFAULT;
drm_dbg(&xe->drm, "CPU trying to access an imported buffer object.\n");
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 05/12] drm/xe/vm: Prevent binding of purged buffer objects
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (3 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 04/12] drm/xe/bo: Block CPU faults to purgeable buffer objects Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 06/12] drm/xe/madvise: Implement per-VMA purgeable state tracking Arvind Yadav
` (6 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Add purge checking to vma_lock_and_validate() to block new mapping
operations on purged BOs while allowing cleanup operations to proceed.
Purged BOs have their backing pages freed by the kernel. New
mapping operations (MAP, PREFETCH, REMAP) must be rejected with
-EINVAL to prevent GPU access to invalid memory. Cleanup
operations (UNMAP) must be allowed so applications can release
resources after detecting purge via the retained field.
REMAP operations require mixed handling - reject new prev/next
VMAs if the BO is purged, but allow the unmap portion to proceed
for cleanup.
The check_purged flag in struct xe_vma_lock_and_validate_flags
distinguishes between these cases: true for new mappings (must reject),
false for cleanup (allow).
v2:
- Clarify that purged BOs are permanently invalid (i915 semantics)
- Remove incorrect claim about madvise(WILLNEED) restoring purged BOs
v3:
- Move xe_bo_is_purged check under vma_lock_and_validate (Matt)
- Add check_purged parameter to distinguish new mappings from cleanup
- Allow UNMAP operations to prevent resource leaks
- Handle REMAP operation's dual nature (cleanup + new mappings)
v5:
- Replace three boolean parameters with struct xe_vma_lock_and_validate_flags
to improve readability and prevent argument transposition (Matt)
- Use u32 bitfields instead of bool members to match xe_bo_shrink_flags
pattern - more efficient packing and follows xe driver conventions (Thomas)
- Pass struct as const since flags are read-only (Matt)
v6:
- Block VM_BIND to DONTNEED BOs with -EBUSY (Thomas, Matt)
v7:
- Pass xe_vma_lock_and_validate_flags by value instead of by
pointer, consistent with xe driver style. (Thomas)
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_vm.c | 82 ++++++++++++++++++++++++++++++++------
1 file changed, 69 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index a0ade67d616e..9c1a82b64a43 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2918,8 +2918,22 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
}
}
+/**
+ * struct xe_vma_lock_and_validate_flags - Flags for vma_lock_and_validate()
+ * @res_evict: Allow evicting resources during validation
+ * @validate: Perform BO validation
+ * @request_decompress: Request BO decompression
+ * @check_purged: Reject operation if BO is purged
+ */
+struct xe_vma_lock_and_validate_flags {
+ u32 res_evict : 1;
+ u32 validate : 1;
+ u32 request_decompress : 1;
+ u32 check_purged : 1;
+};
+
static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
- bool res_evict, bool validate, bool request_decompress)
+ struct xe_vma_lock_and_validate_flags flags)
{
struct xe_bo *bo = xe_vma_bo(vma);
struct xe_vm *vm = xe_vma_vm(vma);
@@ -2928,15 +2942,24 @@ static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
if (bo) {
if (!bo->vm)
err = drm_exec_lock_obj(exec, &bo->ttm.base);
- if (!err && validate)
+
+ /* Reject new mappings to DONTNEED/purged BOs; allow cleanup operations */
+ if (!err && flags.check_purged) {
+ if (xe_bo_madv_is_dontneed(bo))
+ err = -EBUSY; /* BO marked purgeable */
+ else if (xe_bo_is_purged(bo))
+ err = -EINVAL; /* BO already purged */
+ }
+
+ if (!err && flags.validate)
err = xe_bo_validate(bo, vm,
xe_vm_allow_vm_eviction(vm) &&
- res_evict, exec);
+ flags.res_evict, exec);
if (err)
return err;
- if (request_decompress)
+ if (flags.request_decompress)
err = xe_bo_decompress(bo);
}
@@ -3030,10 +3053,13 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
case DRM_GPUVA_OP_MAP:
if (!op->map.invalidate_on_bind)
err = vma_lock_and_validate(exec, op->map.vma,
- res_evict,
- !xe_vm_in_fault_mode(vm) ||
- op->map.immediate,
- op->map.request_decompress);
+ (struct xe_vma_lock_and_validate_flags) {
+ .res_evict = res_evict,
+ .validate = !xe_vm_in_fault_mode(vm) ||
+ op->map.immediate,
+ .request_decompress = op->map.request_decompress,
+ .check_purged = true,
+ });
break;
case DRM_GPUVA_OP_REMAP:
err = check_ufence(gpuva_to_vma(op->base.remap.unmap->va));
@@ -3042,13 +3068,28 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
err = vma_lock_and_validate(exec,
gpuva_to_vma(op->base.remap.unmap->va),
- res_evict, false, false);
+ (struct xe_vma_lock_and_validate_flags) {
+ .res_evict = res_evict,
+ .validate = false,
+ .request_decompress = false,
+ .check_purged = false,
+ });
if (!err && op->remap.prev)
err = vma_lock_and_validate(exec, op->remap.prev,
- res_evict, true, false);
+ (struct xe_vma_lock_and_validate_flags) {
+ .res_evict = res_evict,
+ .validate = true,
+ .request_decompress = false,
+ .check_purged = true,
+ });
if (!err && op->remap.next)
err = vma_lock_and_validate(exec, op->remap.next,
- res_evict, true, false);
+ (struct xe_vma_lock_and_validate_flags) {
+ .res_evict = res_evict,
+ .validate = true,
+ .request_decompress = false,
+ .check_purged = true,
+ });
break;
case DRM_GPUVA_OP_UNMAP:
err = check_ufence(gpuva_to_vma(op->base.unmap.va));
@@ -3057,7 +3098,12 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
err = vma_lock_and_validate(exec,
gpuva_to_vma(op->base.unmap.va),
- res_evict, false, false);
+ (struct xe_vma_lock_and_validate_flags) {
+ .res_evict = res_evict,
+ .validate = false,
+ .request_decompress = false,
+ .check_purged = false,
+ });
break;
case DRM_GPUVA_OP_PREFETCH:
{
@@ -3070,9 +3116,19 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
region <= ARRAY_SIZE(region_to_mem_type));
}
+ /*
+ * Prefetch attempts to migrate BO's backing store without
+ * repopulating it first. Purged BOs have no backing store
+ * to migrate, so reject the operation.
+ */
err = vma_lock_and_validate(exec,
gpuva_to_vma(op->base.prefetch.va),
- res_evict, false, false);
+ (struct xe_vma_lock_and_validate_flags) {
+ .res_evict = res_evict,
+ .validate = false,
+ .request_decompress = false,
+ .check_purged = true,
+ });
if (!err && !xe_vma_has_no_bo(vma))
err = xe_bo_migrate(xe_vma_bo(vma),
region_to_mem_type[region],
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 06/12] drm/xe/madvise: Implement per-VMA purgeable state tracking
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (4 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 05/12] drm/xe/vm: Prevent binding of purged " Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 07/12] drm/xe/madvise: Block imported and exported dma-bufs Arvind Yadav
` (5 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Track purgeable state per-VMA instead of using a coarse shared
BO check. This prevents purging shared BOs until all VMAs across
all VMs are marked DONTNEED.
Add xe_bo_all_vmas_dontneed() to check all VMAs before marking
a BO purgeable. Add xe_bo_recompute_purgeable_state() to handle
state transitions when VMAs are destroyed - if all remaining
VMAs are DONTNEED the BO can become purgeable, and if no VMAs
remain the existing BO state is preserved.
The per-VMA purgeable_state field stores the madvise hint for
each mapping. Shared BOs can only be purged when all VMAs
unanimously indicate DONTNEED.
This prevents the bug where unmapping the last VMA would incorrectly flip
a DONTNEED BO back to WILLNEED. The enum-based state check preserves BO
state when no VMAs remain, only updating when VMAs provide explicit hints.
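As a sketch of the unanimity rule for a BO bound in two VMs (uAPI
names from patch 01; set_purgeable() is a hypothetical helper that
wraps DRM_IOCTL_XE_MADVISE with DRM_XE_VMA_ATTR_PURGEABLE_STATE):

	set_purgeable(fd, vm_a, addr_a, size,
		      DRM_XE_VMA_PURGEABLE_STATE_DONTNEED);
	/* BO stays WILLNEED: the VMA in vm_b has not given a hint */

	set_purgeable(fd, vm_b, addr_b, size,
		      DRM_XE_VMA_PURGEABLE_STATE_DONTNEED);
	/* all VMAs now agree; xe_bo_recompute_purgeable_state() flips
	 * the BO to DONTNEED and the shrinker may purge it */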
v3:
- This addresses Thomas Hellström's feedback: "loop over all vmas
attached to the bo and check that they all say WONTNEED. This will
also need a check at VMA unbinding"
v4:
- @madv_purgeable atomic_t → u32 change across all relevant
patches (Matt)
v5:
- Call xe_bo_recheck_purgeable_on_vma_unbind() from xe_vma_destroy()
right after drm_gpuva_unlink() where we already hold the BO lock,
drop the trylock-based late destroy path (Matt)
- Move purgeable_state into xe_vma_mem_attr with the other madvise
attributes (Matt)
- Drop READ_ONCE since the BO lock already protects us (Matt)
- Keep returning false when there are no VMAs - otherwise we'd mark
BOs purgeable without any user hint (Matt)
- Use xe_bo_set_purgeable_state() instead of direct initialization(Matt)
- use xe_assert instead of drm_warn (Thomas)
v6:
- Fix state transition bug: don't flip DONTNEED → WILLNEED when last
VMA unmapped (Matt)
- Change xe_bo_all_vmas_dontneed() from bool to enum to distinguish
"no VMAs" from "has WILLNEED VMA" (Matt)
- Preserve BO state on NO_VMAS instead of forcing WILLNEED.
- Set skip_invalidation explicitly in madvise_purgeable() to ensure
DONTNEED always zaps GPU PTEs regardless of prior madvise state.
v7:
- Don't zap PTEs at DONTNEED time -- pages are still alive.
The zap happens in xe_bo_move_notify() right before the shrinker
frees them.
- Simplify xe_bo_recompute_purgeable_state() by relying on the
intentional value alignment between xe_bo_vmas_purge_state and
xe_madv_purgeable_state enums. Add static_assert to enforce the
alignment. (Thomas)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_svm.c | 1 +
drivers/gpu/drm/xe/xe_vm.c | 9 +-
drivers/gpu/drm/xe/xe_vm_madvise.c | 136 +++++++++++++++++++++++++++--
drivers/gpu/drm/xe/xe_vm_madvise.h | 3 +
drivers/gpu/drm/xe/xe_vm_types.h | 11 +++
5 files changed, 153 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index a91c84487a67..062ef77e283f 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -322,6 +322,7 @@ static void xe_vma_set_default_attributes(struct xe_vma *vma)
.preferred_loc.migration_policy = DRM_XE_MIGRATE_ALL_PAGES,
.pat_index = vma->attr.default_pat_index,
.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
+ .purgeable_state = XE_MADV_PURGEABLE_WILLNEED,
};
xe_vma_mem_attr_copy(&vma->attr, &default_attr);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 9c1a82b64a43..07393540f34c 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -39,6 +39,7 @@
#include "xe_tile.h"
#include "xe_tlb_inval.h"
#include "xe_trace_bo.h"
+#include "xe_vm_madvise.h"
#include "xe_wa.h"
static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
@@ -1085,6 +1086,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
static void xe_vma_destroy_late(struct xe_vma *vma)
{
struct xe_vm *vm = xe_vma_vm(vma);
+ struct xe_bo *bo = xe_vma_bo(vma);
if (vma->ufence) {
xe_sync_ufence_put(vma->ufence);
@@ -1099,7 +1101,7 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
} else if (xe_vma_is_null(vma) || xe_vma_is_cpu_addr_mirror(vma)) {
xe_vm_put(vm);
} else {
- xe_bo_put(xe_vma_bo(vma));
+ xe_bo_put(bo);
}
xe_vma_free(vma);
@@ -1125,6 +1127,7 @@ static void vma_destroy_cb(struct dma_fence *fence,
static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
{
struct xe_vm *vm = xe_vma_vm(vma);
+ struct xe_bo *bo = xe_vma_bo(vma);
lockdep_assert_held_write(&vm->lock);
xe_assert(vm->xe, list_empty(&vma->combined_links.destroy));
@@ -1133,9 +1136,10 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
xe_assert(vm->xe, vma->gpuva.flags & XE_VMA_DESTROYED);
xe_userptr_destroy(to_userptr_vma(vma));
} else if (!xe_vma_is_null(vma) && !xe_vma_is_cpu_addr_mirror(vma)) {
- xe_bo_assert_held(xe_vma_bo(vma));
+ xe_bo_assert_held(bo);
drm_gpuva_unlink(&vma->gpuva);
+ xe_bo_recompute_purgeable_state(bo);
}
xe_vm_assert_held(vm);
@@ -2692,6 +2696,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
.default_pat_index = op->map.pat_index,
.pat_index = op->map.pat_index,
+ .purgeable_state = XE_MADV_PURGEABLE_WILLNEED,
};
flags |= op->map.vma_flags & XE_VMA_CREATE_MASK;
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 881de6cb6c11..ed1940da7739 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -13,6 +13,7 @@
#include "xe_pt.h"
#include "xe_svm.h"
#include "xe_tlb_inval.h"
+#include "xe_vm.h"
struct xe_vmas_in_madvise_range {
u64 addr;
@@ -184,6 +185,116 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
}
}
+/**
+ * enum xe_bo_vmas_purge_state - VMA purgeable state aggregation
+ *
+ * Distinguishes whether a BO's VMAs are all DONTNEED, have at least
+ * one WILLNEED, or have no VMAs at all.
+ *
+ * Enum values align with XE_MADV_PURGEABLE_* states for consistency.
+ */
+enum xe_bo_vmas_purge_state {
+ /** @XE_BO_VMAS_STATE_WILLNEED: At least one VMA is WILLNEED */
+ XE_BO_VMAS_STATE_WILLNEED = 0,
+ /** @XE_BO_VMAS_STATE_DONTNEED: All VMAs are DONTNEED */
+ XE_BO_VMAS_STATE_DONTNEED = 1,
+ /** @XE_BO_VMAS_STATE_NO_VMAS: BO has no VMAs */
+ XE_BO_VMAS_STATE_NO_VMAS = 2,
+};
+
+/*
+ * xe_bo_recompute_purgeable_state() casts between xe_bo_vmas_purge_state and
+ * xe_madv_purgeable_state. Enforce that WILLNEED=0 and DONTNEED=1 match across
+ * both enums so the single-line cast is always valid.
+ */
+static_assert(XE_BO_VMAS_STATE_WILLNEED == (int)XE_MADV_PURGEABLE_WILLNEED,
+ "VMA purge state WILLNEED must equal madv purgeable WILLNEED");
+static_assert(XE_BO_VMAS_STATE_DONTNEED == (int)XE_MADV_PURGEABLE_DONTNEED,
+ "VMA purge state DONTNEED must equal madv purgeable DONTNEED");
+
+/**
+ * xe_bo_all_vmas_dontneed() - Determine BO VMA purgeable state
+ * @bo: Buffer object
+ *
+ * Check all VMAs across all VMs to determine aggregate purgeable state.
+ * Shared BOs require unanimous DONTNEED state from all mappings.
+ *
+ * Caller must hold BO dma-resv lock.
+ *
+ * Return: XE_BO_VMAS_STATE_DONTNEED if all VMAs are DONTNEED,
+ * XE_BO_VMAS_STATE_WILLNEED if at least one VMA is not DONTNEED,
+ * XE_BO_VMAS_STATE_NO_VMAS if BO has no VMAs
+ */
+static enum xe_bo_vmas_purge_state xe_bo_all_vmas_dontneed(struct xe_bo *bo)
+{
+ struct drm_gpuvm_bo *vm_bo;
+ struct drm_gpuva *gpuva;
+ struct drm_gem_object *obj = &bo->ttm.base;
+ bool has_vmas = false;
+
+ xe_bo_assert_held(bo);
+
+ drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
+ drm_gpuvm_bo_for_each_va(gpuva, vm_bo) {
+ struct xe_vma *vma = gpuva_to_vma(gpuva);
+
+ has_vmas = true;
+
+ /* Any non-DONTNEED VMA prevents purging */
+ if (vma->attr.purgeable_state != XE_MADV_PURGEABLE_DONTNEED)
+ return XE_BO_VMAS_STATE_WILLNEED;
+ }
+ }
+
+ /*
+ * No VMAs => preserve existing BO purgeable state.
+ * Avoids incorrectly flipping DONTNEED -> WILLNEED when last VMA unmapped.
+ */
+ if (!has_vmas)
+ return XE_BO_VMAS_STATE_NO_VMAS;
+
+ return XE_BO_VMAS_STATE_DONTNEED;
+}
+
+/**
+ * xe_bo_recompute_purgeable_state() - Recompute BO purgeable state from VMAs
+ * @bo: Buffer object
+ *
+ * Walk all VMAs to determine if BO should be purgeable or not.
+ * Shared BOs require unanimous DONTNEED state from all mappings.
+ * If the BO has no VMAs the existing state is preserved.
+ *
+ * Locking: Caller must hold BO dma-resv lock. When iterating GPUVM lists,
+ * VM lock must also be held (write) to prevent concurrent VMA modifications.
+ * This is satisfied at both call sites:
+ * - xe_vma_destroy(): holds vm->lock write
+ * - madvise_purgeable(): holds vm->lock write (from madvise ioctl path)
+ *
+ * Return: nothing
+ */
+void xe_bo_recompute_purgeable_state(struct xe_bo *bo)
+{
+ enum xe_bo_vmas_purge_state vma_state;
+
+ if (!bo)
+ return;
+
+ xe_bo_assert_held(bo);
+
+ /*
+ * Once purged, always purged. Cannot transition back to WILLNEED.
+ * This matches i915 semantics where purged BOs are permanently invalid.
+ */
+ if (bo->madv_purgeable == XE_MADV_PURGEABLE_PURGED)
+ return;
+
+ vma_state = xe_bo_all_vmas_dontneed(bo);
+
+ if (vma_state != (enum xe_bo_vmas_purge_state)bo->madv_purgeable &&
+ vma_state != XE_BO_VMAS_STATE_NO_VMAS)
+ xe_bo_set_purgeable_state(bo, (enum xe_madv_purgeable_state)vma_state);
+}
+
/**
* madvise_purgeable - Handle purgeable buffer object advice
* @xe: XE device
@@ -215,8 +326,11 @@ static void __maybe_unused madvise_purgeable(struct xe_device *xe,
for (i = 0; i < num_vmas; i++) {
struct xe_bo *bo = xe_vma_bo(vmas[i]);
- if (!bo)
+ if (!bo) {
+ /* Purgeable state applies to BOs only, skip non-BO VMAs */
+ vmas[i]->skip_invalidation = true;
continue;
+ }
/* BO must be locked before modifying madv state */
xe_bo_assert_held(bo);
@@ -227,19 +341,31 @@ static void __maybe_unused madvise_purgeable(struct xe_device *xe,
*/
if (xe_bo_is_purged(bo)) {
details->has_purged_bo = true;
+ vmas[i]->skip_invalidation = true;
continue;
}
switch (op->purge_state_val.val) {
case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
- xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_WILLNEED);
+ vmas[i]->attr.purgeable_state = XE_MADV_PURGEABLE_WILLNEED;
+ vmas[i]->skip_invalidation = true;
+
+ xe_bo_recompute_purgeable_state(bo);
break;
case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
- xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_DONTNEED);
+ vmas[i]->attr.purgeable_state = XE_MADV_PURGEABLE_DONTNEED;
+ /*
+ * Don't zap PTEs at DONTNEED time -- pages are still
+ * alive. The zap happens in xe_bo_move_notify() right
+ * before the shrinker frees them.
+ */
+ vmas[i]->skip_invalidation = true;
+
+ xe_bo_recompute_purgeable_state(bo);
break;
default:
- drm_warn(&vm->xe->drm, "Invalid madvise value = %d\n",
- op->purge_state_val.val);
+ /* Should never hit - values validated in madvise_args_are_sane() */
+ xe_assert(vm->xe, 0);
return;
}
}
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.h b/drivers/gpu/drm/xe/xe_vm_madvise.h
index b0e1fc445f23..39acd2689ca0 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.h
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.h
@@ -8,8 +8,11 @@
struct drm_device;
struct drm_file;
+struct xe_bo;
int xe_vm_madvise_ioctl(struct drm_device *dev, void *data,
struct drm_file *file);
+void xe_bo_recompute_purgeable_state(struct xe_bo *bo);
+
#endif
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 69e80c94138a..033cfdd56c95 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -95,6 +95,17 @@ struct xe_vma_mem_attr {
* same as default_pat_index unless overwritten by madvise.
*/
u16 pat_index;
+
+ /**
+ * @purgeable_state: Purgeable hint for this VMA mapping
+ *
+ * Per-VMA purgeable state from madvise. Valid states are WILLNEED (0)
+ * or DONTNEED (1). Shared BOs require all VMAs to be DONTNEED before
+ * the BO can be purged. PURGED state exists only at BO level.
+ *
+ * Protected by BO dma-resv lock. Set via DRM_IOCTL_XE_MADVISE.
+ */
+ u32 purgeable_state;
};
struct xe_vma {
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 07/12] drm/xe/madvise: Block imported and exported dma-bufs
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (5 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 06/12] drm/xe/madvise: Implement per-VMA purgeable state tracking Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 08/12] drm/xe/bo: Block mmap of DONTNEED/purged BOs Arvind Yadav
` (4 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Prevent marking imported or exported dma-bufs as purgeable.
External devices may be accessing these buffers without our
knowledge, making purging unsafe.
Check drm_gem_is_imported() for buffers created by other
drivers and obj->dma_buf for buffers exported to other
drivers. Silently skip these BOs during madvise processing.
This follows drm_gem_shmem's purgeable implementation and
prevents data corruption from purging actively-used shared
buffers.
v3:
- Addresses review feedback from Matt Roper about handling
imported/exported BOs correctly in the purgeable BO
implementation.
v4:
- Check should be add to xe_vm_madvise_purgeable_bo.
v5:
- Rename xe_bo_is_external_dmabuf() to xe_bo_is_dmabuf_shared()
for clarity (Thomas)
- Update comments to clarify why both imports and exports
are unsafe to purge.
v6:
- No PTEs to zap for shared dma-bufs.
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_vm_madvise.c | 38 ++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index ed1940da7739..340e83764a76 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -185,6 +185,34 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
}
}
+
+/**
+ * xe_bo_is_dmabuf_shared() - Check if BO is shared via dma-buf
+ * @bo: Buffer object
+ *
+ * Prevent marking imported or exported dma-bufs as purgeable.
+ * For imported BOs, Xe doesn't own the backing store and cannot
+ * safely reclaim pages (exporter or other devices may still be
+ * using them). For exported BOs, external devices may have active
+ * mappings we cannot track.
+ *
+ * Return: true if BO is imported or exported, false otherwise
+ */
+static bool xe_bo_is_dmabuf_shared(struct xe_bo *bo)
+{
+ struct drm_gem_object *obj = &bo->ttm.base;
+
+ /* Imported: exporter owns backing store */
+ if (drm_gem_is_imported(obj))
+ return true;
+
+ /* Exported: external devices may be accessing */
+ if (obj->dma_buf)
+ return true;
+
+ return false;
+}
+
/**
* enum xe_bo_vmas_purge_state - VMA purgeable state aggregation
*
@@ -234,6 +262,10 @@ static enum xe_bo_vmas_purge_state xe_bo_all_vmas_dontneed(struct xe_bo *bo)
xe_bo_assert_held(bo);
+ /* Shared dma-bufs cannot be purgeable */
+ if (xe_bo_is_dmabuf_shared(bo))
+ return XE_BO_VMAS_STATE_WILLNEED;
+
drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
drm_gpuvm_bo_for_each_va(gpuva, vm_bo) {
struct xe_vma *vma = gpuva_to_vma(gpuva);
@@ -335,6 +367,12 @@ static void __maybe_unused madvise_purgeable(struct xe_device *xe,
/* BO must be locked before modifying madv state */
xe_bo_assert_held(bo);
+ /* Skip shared dma-bufs - no PTEs to zap */
+ if (xe_bo_is_dmabuf_shared(bo)) {
+ vmas[i]->skip_invalidation = true;
+ continue;
+ }
+
/*
* Once purged, always purged. Cannot transition back to WILLNEED.
* This matches i915 semantics where purged BOs are permanently invalid.
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 08/12] drm/xe/bo: Block mmap of DONTNEED/purged BOs
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (6 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 07/12] drm/xe/madvise: Block imported and exported dma-bufs Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 7:41 ` Matthew Brost
2026-03-26 5:51 ` [PATCH v8 09/12] drm/xe/dma_buf: Block export " Arvind Yadav
` (3 subsequent siblings)
11 siblings, 1 reply; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Don't allow new CPU mmaps to BOs marked DONTNEED or PURGED.
DONTNEED BOs can have their contents discarded at any time, making
CPU access undefined behavior. PURGED BOs have no backing store and
are permanently invalid.
Return -EBUSY for DONTNEED BOs (temporary purgeable state) and
-EINVAL for purged BOs (permanent, no backing store).
The GEM mmap path now checks the BO's purgeable state before
allowing userspace to establish a new CPU mapping. Userspace may
still obtain an mmap offset, but the mmap() call itself is
rejected, closing the window where a mapping could be set up for
a BO whose backing store may already be gone.
Existing mmaps (established before DONTNEED) may still work until
pages are purged, at which point CPU faults fail with SIGBUS.
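A minimal sketch of what userspace observes (assuming the existing
DRM_XE_GEM_MMAP_OFFSET interface; error handling elided):

	struct drm_xe_gem_mmap_offset mmo = { .handle = bo_handle };

	/* the offset lookup itself still succeeds */
	ioctl(fd, DRM_IOCTL_XE_GEM_MMAP_OFFSET, &mmo);

	/* ...but mapping a DONTNEED BO is refused at mmap() time */
	ptr = mmap(NULL, bo_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   fd, mmo.offset);
	/* ptr == MAP_FAILED, errno == EBUSY (EINVAL once purged) */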
v6:
- Split DONTNEED → -EBUSY and PURGED → -EINVAL for consistency
with the rest of the series (Thomas, Matt)
v7:
- Move purgeable check from xe_gem_mmap_offset_ioctl() into a new
xe_gem_object_mmap() callback that wraps drm_gem_ttm_mmap(). (Thomas)
- Use an interruptible lock. (Thomas)
v8:
- Check xe_bo_lock() return value and propagate error. (Thomas and
Matt)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index da18b43650e3..c8e3a3fd4880 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -2165,10 +2165,35 @@ static const struct vm_operations_struct xe_gem_vm_ops = {
.access = xe_bo_vm_access,
};
+static int xe_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
+{
+ struct xe_bo *bo = gem_to_xe_bo(obj);
+ int err = 0;
+
+ /*
+ * Reject mmap of purgeable BOs. DONTNEED BOs can be purged
+ * at any time, making CPU access undefined behavior. Purged BOs have
+ * no backing store and are permanently invalid.
+ */
+ err = xe_bo_lock(bo, true);
+ if (err)
+ return err;
+
+ if (xe_bo_madv_is_dontneed(bo))
+ err = -EBUSY;
+ else if (xe_bo_is_purged(bo))
+ err = -EINVAL;
+ xe_bo_unlock(bo);
+ if (err)
+ return err;
+
+ return drm_gem_ttm_mmap(obj, vma);
+}
+
static const struct drm_gem_object_funcs xe_gem_object_funcs = {
.free = xe_gem_object_free,
.close = xe_gem_object_close,
- .mmap = drm_gem_ttm_mmap,
+ .mmap = xe_gem_object_mmap,
.export = xe_gem_prime_export,
.vm_ops = &xe_gem_vm_ops,
};
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v8 08/12] drm/xe/bo: Block mmap of DONTNEED/purged BOs
2026-03-26 5:51 ` [PATCH v8 08/12] drm/xe/bo: Block mmap of DONTNEED/purged BOs Arvind Yadav
@ 2026-03-26 7:41 ` Matthew Brost
0 siblings, 0 replies; 16+ messages in thread
From: Matthew Brost @ 2026-03-26 7:41 UTC (permalink / raw)
To: Arvind Yadav; +Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom
On Thu, Mar 26, 2026 at 11:21:07AM +0530, Arvind Yadav wrote:
> Don't allow new CPU mmaps to BOs marked DONTNEED or PURGED.
> DONTNEED BOs can have their contents discarded at any time, making
> CPU access undefined behavior. PURGED BOs have no backing store and
> are permanently invalid.
>
> Return -EBUSY for DONTNEED BOs (temporary purgeable state) and
> -EINVAL for purged BOs (permanent, no backing store).
>
> The GEM mmap path now checks the BO's purgeable state before
> allowing userspace to establish a new CPU mapping. Userspace may
> still obtain an mmap offset, but the mmap() call itself is
> rejected, closing the window where a mapping could be set up for
> a BO whose backing store may already be gone.
>
> Existing mmaps (established before DONTNEED) may still work until
> pages are purged, at which point CPU faults fail with SIGBUS.
>
> v6:
> - Split DONTNEED → -EBUSY and PURGED → -EINVAL for consistency
> with the rest of the series (Thomas, Matt)
>
> v7:
> - Move purgeable check from xe_gem_mmap_offset_ioctl() into a new
> xe_gem_object_mmap() callback that wraps drm_gem_ttm_mmap(). (Thomas)
> - Use an interruptible lock. (Thomas)
>
> v8:
> - Check xe_bo_lock() return value and propagate error. (Thomas and
> Matt)
>
> Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
> drivers/gpu/drm/xe/xe_bo.c | 27 ++++++++++++++++++++++++++-
> 1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index da18b43650e3..c8e3a3fd4880 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -2165,10 +2165,35 @@ static const struct vm_operations_struct xe_gem_vm_ops = {
> .access = xe_bo_vm_access,
> };
>
> +static int xe_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
> +{
> + struct xe_bo *bo = gem_to_xe_bo(obj);
> + int err = 0;
> +
> + /*
> + * Reject mmap of purgeable BOs. DONTNEED BOs can be purged
> + * at any time, making CPU access undefined behavior. Purged BOs have
> + * no backing store and are permanently invalid.
> + */
> + err = xe_bo_lock(bo, true);
> + if (err)
> + return err;
> +
> + if (xe_bo_madv_is_dontneed(bo))
> + err = -EBUSY;
> + else if (xe_bo_is_purged(bo))
> + err = -EINVAL;
> + xe_bo_unlock(bo);
> + if (err)
> + return err;
> +
> + return drm_gem_ttm_mmap(obj, vma);
> +}
> +
> static const struct drm_gem_object_funcs xe_gem_object_funcs = {
> .free = xe_gem_object_free,
> .close = xe_gem_object_close,
> - .mmap = drm_gem_ttm_mmap,
> + .mmap = xe_gem_object_mmap,
> .export = xe_gem_prime_export,
> .vm_ops = &xe_gem_vm_ops,
> };
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v8 09/12] drm/xe/dma_buf: Block export of DONTNEED/purged BOs
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (7 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 08/12] drm/xe/bo: Block mmap of DONTNEED/purged BOs Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 7:42 ` Matthew Brost
2026-03-26 5:51 ` [PATCH v8 10/12] drm/xe/bo: Add purgeable shrinker state helpers Arvind Yadav
` (2 subsequent siblings)
11 siblings, 1 reply; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Don't allow exporting BOs marked DONTNEED or PURGED as dma-bufs.
DONTNEED BOs can have their contents discarded at any time, making
the exported dma-buf unusable for external devices. PURGED BOs have
no backing store and are permanently invalid.
Return -EBUSY for DONTNEED BOs (temporary purgeable state) and
-EINVAL for purged BOs (permanent, no backing store).
The export path now checks the BO's purgeable state before creating
the dma-buf, preventing external devices from accessing memory that
may be purged at any time.
v6:
- Split DONTNEED → -EBUSY and PURGED → -EINVAL for consistency
with the rest of the series (Thomas, Matt)
v7:
- Use Interruptible lock. (Thomas)
v8:
- Check xe_bo_lock() return value and propagate error. (Thomas)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_dma_buf.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index ea370cd373e9..7f9602b3363d 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -223,6 +223,26 @@ struct dma_buf *xe_gem_prime_export(struct drm_gem_object *obj, int flags)
if (bo->vm)
return ERR_PTR(-EPERM);
+ /*
+ * Reject exporting purgeable BOs. DONTNEED BOs can be purged
+ * at any time, making the exported dma-buf unusable. Purged BOs
+ * have no backing store and are permanently invalid.
+ */
+ ret = xe_bo_lock(bo, true);
+ if (ret)
+ return ERR_PTR(ret);
+
+ if (xe_bo_madv_is_dontneed(bo)) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
+ if (xe_bo_is_purged(bo)) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+ xe_bo_unlock(bo);
+
ret = ttm_bo_setup_export(&bo->ttm, &ctx);
if (ret)
return ERR_PTR(ret);
@@ -232,6 +252,10 @@ struct dma_buf *xe_gem_prime_export(struct drm_gem_object *obj, int flags)
buf->ops = &xe_dmabuf_ops;
return buf;
+
+out_unlock:
+ xe_bo_unlock(bo);
+ return ERR_PTR(ret);
}
static struct drm_gem_object *
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v8 09/12] drm/xe/dma_buf: Block export of DONTNEED/purged BOs
2026-03-26 5:51 ` [PATCH v8 09/12] drm/xe/dma_buf: Block export " Arvind Yadav
@ 2026-03-26 7:42 ` Matthew Brost
0 siblings, 0 replies; 16+ messages in thread
From: Matthew Brost @ 2026-03-26 7:42 UTC (permalink / raw)
To: Arvind Yadav; +Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom
On Thu, Mar 26, 2026 at 11:21:08AM +0530, Arvind Yadav wrote:
> Don't allow exporting BOs marked DONTNEED or PURGED as dma-bufs.
> DONTNEED BOs can have their contents discarded at any time, making
> the exported dma-buf unusable for external devices. PURGED BOs have
> no backing store and are permanently invalid.
>
> Return -EBUSY for DONTNEED BOs (temporary purgeable state) and
> -EINVAL for purged BOs (permanent, no backing store).
>
> The export path now checks the BO's purgeable state before creating
> the dma-buf, preventing external devices from accessing memory that
> may be purged at any time.
>
> v6:
> - Split DONTNEED → -EBUSY and PURGED → -EINVAL for consistency
> with the rest of the series (Thomas, Matt)
>
> v7:
> - Use Interruptible lock. (Thomas)
>
> v8:
> - Check xe_bo_lock() return value and propagate error. (Thomas)
>
> Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
> drivers/gpu/drm/xe/xe_dma_buf.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> index ea370cd373e9..7f9602b3363d 100644
> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> @@ -223,6 +223,26 @@ struct dma_buf *xe_gem_prime_export(struct drm_gem_object *obj, int flags)
> if (bo->vm)
> return ERR_PTR(-EPERM);
>
> + /*
> + * Reject exporting purgeable BOs. DONTNEED BOs can be purged
> + * at any time, making the exported dma-buf unusable. Purged BOs
> + * have no backing store and are permanently invalid.
> + */
> + ret = xe_bo_lock(bo, true);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + if (xe_bo_madv_is_dontneed(bo)) {
> + ret = -EBUSY;
> + goto out_unlock;
> + }
> +
> + if (xe_bo_is_purged(bo)) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> + xe_bo_unlock(bo);
> +
> ret = ttm_bo_setup_export(&bo->ttm, &ctx);
> if (ret)
> return ERR_PTR(ret);
> @@ -232,6 +252,10 @@ struct dma_buf *xe_gem_prime_export(struct drm_gem_object *obj, int flags)
> buf->ops = &xe_dmabuf_ops;
>
> return buf;
> +
> +out_unlock:
> + xe_bo_unlock(bo);
> + return ERR_PTR(ret);
> }
>
> static struct drm_gem_object *
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v8 10/12] drm/xe/bo: Add purgeable shrinker state helpers
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (8 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 09/12] drm/xe/dma_buf: Block export " Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 11/12] drm/xe/madvise: Enable purgeable buffer object IOCTL support Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 12/12] drm/xe/madvise: Accept canonical GPU addresses in xe_vm_madvise_ioctl Arvind Yadav
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Encapsulate TTM purgeable flag updates and shrinker page accounting
into helper functions to prevent desynchronization between the TTM
tt->purgeable flag and the shrinker's page bucket counters.
Without these helpers, direct manipulation of xe_ttm_tt->purgeable
risks forgetting to update the corresponding shrinker counters,
leading to incorrect memory pressure calculations.
Update purgeable BO state to PURGED after successful shrinker purge
for DONTNEED BOs.
v4:
- @madv_purgeable atomic_t → u32 change across all relevant
patches (Matt)
v5:
- Update purgeable BO state to PURGED after a successful shrinker
purge for DONTNEED BOs.
- Split ghost BO and zero-refcount handling in xe_bo_shrink() (Thomas)
v6:
- Create separate patch for 'Split ghost BO and zero-refcount
handling'. (Thomas)
v7:
- Merge xe_bo_set_purgeable_shrinker() and xe_bo_clear_purgeable_shrinker()
into a single static helper xe_bo_set_purgeable_shrinker(bo, new_state)
called automatically from xe_bo_set_purgeable_state(). Callers no longer
need to manage shrinker accounting separately. (Thomas)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 43 +++++++++++++++++++++++++++++++++++++-
1 file changed, 42 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index c8e3a3fd4880..0a3e66f9f18a 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -835,6 +835,42 @@ static int xe_bo_move_notify(struct xe_bo *bo,
return 0;
}
+/**
+ * xe_bo_set_purgeable_shrinker() - Update shrinker accounting for purgeable state
+ * @bo: Buffer object
+ * @new_state: New purgeable state being set
+ *
+ * Transfers pages between shrinkable and purgeable buckets when the BO
+ * purgeable state changes. Called automatically from xe_bo_set_purgeable_state().
+ */
+static void xe_bo_set_purgeable_shrinker(struct xe_bo *bo,
+ enum xe_madv_purgeable_state new_state)
+{
+ struct ttm_buffer_object *ttm_bo = &bo->ttm;
+ struct ttm_tt *tt = ttm_bo->ttm;
+ struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
+ struct xe_ttm_tt *xe_tt;
+ long tt_pages;
+
+ xe_bo_assert_held(bo);
+
+ if (!tt || !ttm_tt_is_populated(tt))
+ return;
+
+ xe_tt = container_of(tt, struct xe_ttm_tt, ttm);
+ tt_pages = tt->num_pages;
+
+ if (!xe_tt->purgeable && new_state == XE_MADV_PURGEABLE_DONTNEED) {
+ xe_tt->purgeable = true;
+ /* Transfer pages from shrinkable to purgeable count */
+ xe_shrinker_mod_pages(xe->mem.shrinker, -tt_pages, tt_pages);
+ } else if (xe_tt->purgeable && new_state == XE_MADV_PURGEABLE_WILLNEED) {
+ xe_tt->purgeable = false;
+ /* Transfer pages from purgeable to shrinkable count */
+ xe_shrinker_mod_pages(xe->mem.shrinker, tt_pages, -tt_pages);
+ }
+}
+
/**
* xe_bo_set_purgeable_state() - Set BO purgeable state with validation
* @bo: Buffer object
@@ -842,7 +878,8 @@ static int xe_bo_move_notify(struct xe_bo *bo,
*
* Sets the purgeable state with lockdep assertions and validates state
* transitions. Once a BO is PURGED, it cannot transition to any other state.
- * Invalid transitions are caught with xe_assert().
+ * Invalid transitions are caught with xe_assert(). Shrinker page accounting
+ * is updated automatically.
*/
void xe_bo_set_purgeable_state(struct xe_bo *bo,
enum xe_madv_purgeable_state new_state)
@@ -861,6 +898,7 @@ void xe_bo_set_purgeable_state(struct xe_bo *bo,
new_state != XE_MADV_PURGEABLE_PURGED));
bo->madv_purgeable = new_state;
+ xe_bo_set_purgeable_shrinker(bo, new_state);
}
/**
@@ -1243,6 +1281,9 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
lret = xe_bo_move_notify(xe_bo, ctx);
if (!lret)
lret = xe_bo_shrink_purge(ctx, bo, scanned);
+ if (lret > 0 && xe_bo_madv_is_dontneed(xe_bo))
+ xe_bo_set_purgeable_state(xe_bo,
+ XE_MADV_PURGEABLE_PURGED);
goto out_unref;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 11/12] drm/xe/madvise: Enable purgeable buffer object IOCTL support
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (9 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 10/12] drm/xe/bo: Add purgeable shrinker state helpers Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 12/12] drm/xe/madvise: Accept canonical GPU addresses in xe_vm_madvise_ioctl Arvind Yadav
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Hook the madvise_purgeable() handler into the madvise IOCTL now that all
supporting infrastructure is complete:
- Core purge implementation (patch 3)
- BO state tracking and helpers (patches 1-2)
- Per-VMA purgeable state tracking (patch 6)
- Shrinker integration for memory reclamation (patch 10)
This final patch enables userspace to use the DRM_XE_VMA_ATTR_PURGEABLE_STATE
madvise type to mark buffers as WILLNEED/DONTNEED and receive the retained
status indicating whether buffers were purged.
The feature was kept disabled in earlier patches to maintain bisectability
and ensure all components are in place before exposing to userspace.
Userspace can detect kernel support for purgeable BOs by checking the
DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT flag in the query_config
response.
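A minimal detection sketch using the usual two-call query pattern
(error handling elided):

	struct drm_xe_device_query query = {
		.query = DRM_XE_DEVICE_QUERY_CONFIG,
	};
	struct drm_xe_query_config *config;

	ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query);	/* fills query.size */
	config = malloc(query.size);
	query.data = (__u64)(uintptr_t)config;
	ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query);

	if (config->info[DRM_XE_QUERY_CONFIG_FLAGS] &
	    DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT)
		;	/* DRM_XE_VMA_ATTR_PURGEABLE_STATE is available */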
v6:
- Add DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT for userspace
feature detection. (Jose)
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_query.c | 2 ++
drivers/gpu/drm/xe/xe_vm_madvise.c | 22 +++++-----------------
2 files changed, 7 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c
index 4852fdcb4b95..d84d6a422c45 100644
--- a/drivers/gpu/drm/xe/xe_query.c
+++ b/drivers/gpu/drm/xe/xe_query.c
@@ -342,6 +342,8 @@ static int query_config(struct xe_device *xe, struct drm_xe_device_query *query)
DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY;
config->info[DRM_XE_QUERY_CONFIG_FLAGS] |=
DRM_XE_QUERY_CONFIG_FLAG_HAS_DISABLE_STATE_CACHE_PERF_FIX;
+ config->info[DRM_XE_QUERY_CONFIG_FLAGS] |=
+ DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT;
config->info[DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT] =
xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : SZ_4K;
config->info[DRM_XE_QUERY_CONFIG_VA_BITS] = xe->info.va_bits;
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 340e83764a76..4a19da5e86d4 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -338,18 +338,11 @@ void xe_bo_recompute_purgeable_state(struct xe_bo *bo)
*
* Handles DONTNEED/WILLNEED/PURGED states. Tracks if any BO was purged
* in details->has_purged_bo for later copy to userspace.
- *
- * Note: Marked __maybe_unused until hooked into madvise_funcs[] in the
- * final patch to maintain bisectability. The NULL placeholder in the
- * array ensures proper -EINVAL return for userspace until all supporting
- * infrastructure (shrinker, per-VMA tracking) is complete.
*/
-static void __maybe_unused madvise_purgeable(struct xe_device *xe,
- struct xe_vm *vm,
- struct xe_vma **vmas,
- int num_vmas,
- struct drm_xe_madvise *op,
- struct xe_madvise_details *details)
+static void madvise_purgeable(struct xe_device *xe, struct xe_vm *vm,
+ struct xe_vma **vmas, int num_vmas,
+ struct drm_xe_madvise *op,
+ struct xe_madvise_details *details)
{
int i;
@@ -418,12 +411,7 @@ static const madvise_func madvise_funcs[] = {
[DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC] = madvise_preferred_mem_loc,
[DRM_XE_MEM_RANGE_ATTR_ATOMIC] = madvise_atomic,
[DRM_XE_MEM_RANGE_ATTR_PAT] = madvise_pat_index,
- /*
- * Purgeable support implemented but not enabled yet to maintain
- * bisectability. Will be set to madvise_purgeable() in final patch
- * when all infrastructure (shrinker, VMA tracking) is complete.
- */
- [DRM_XE_VMA_ATTR_PURGEABLE_STATE] = NULL,
+ [DRM_XE_VMA_ATTR_PURGEABLE_STATE] = madvise_purgeable,
};
static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 12/12] drm/xe/madvise: Accept canonical GPU addresses in xe_vm_madvise_ioctl
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (10 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 11/12] drm/xe/madvise: Enable purgeable buffer object IOCTL support Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Userspace passes canonical (sign-extended) GPU addresses where bits 63:48
mirror bit 47. The internal GPUVM uses non-canonical form (upper bits
zeroed), so passing raw canonical addresses into GPUVM lookups causes
mismatches for addresses above 128TiB.
Strip the sign extension with xe_device_uncanonicalize_addr() at the
top of xe_vm_madvise_ioctl(). Non-canonical addresses are unaffected.
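For illustration, the conversion amounts to masking off the
sign-extended upper bits (the driver uses the existing
xe_device_uncanonicalize_addr() helper rather than open-coding this):

	/* illustrative sketch, not the driver implementation */
	static u64 uncanonicalize(u64 addr, unsigned int va_bits)
	{
		return addr & (BIT_ULL(va_bits) - 1);
	}

With 48 VA bits, canonical 0xffff900000000000 becomes
0x0000900000000000, while an already non-canonical address such as
0x0000700000000000 passes through unchanged.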
Fixes: ada7486c5668 ("drm/xe: Implement madvise ioctl for xe")
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_vm_madvise.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 4a19da5e86d4..2d03676ee595 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -673,8 +673,15 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
struct xe_device *xe = to_xe_device(dev);
struct xe_file *xef = to_xe_file(file);
struct drm_xe_madvise *args = data;
- struct xe_vmas_in_madvise_range madvise_range = {.addr = args->start,
- .range = args->range, };
+ struct xe_vmas_in_madvise_range madvise_range = {
+ /*
+ * Userspace may pass canonical (sign-extended) addresses.
+ * Strip the sign extension to get the internal non-canonical
+ * form used by the GPUVM, matching xe_vm_bind_ioctl() behavior.
+ */
+ .addr = xe_device_uncanonicalize_addr(xe, args->start),
+ .range = args->range,
+ };
struct xe_madvise_details details;
struct xe_vm *vm;
struct drm_exec exec;
@@ -724,7 +731,7 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
if (err)
goto unlock_vm;
- err = xe_vm_alloc_madvise_vma(vm, args->start, args->range);
+ err = xe_vm_alloc_madvise_vma(vm, madvise_range.addr, args->range);
if (err)
goto madv_fini;
@@ -774,7 +781,8 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
madvise_funcs[attr_type](xe, vm, madvise_range.vmas, madvise_range.num_vmas, args,
&details);
- err = xe_vm_invalidate_madvise_range(vm, args->start, args->start + args->range);
+ err = xe_vm_invalidate_madvise_range(vm, madvise_range.addr,
+ madvise_range.addr + args->range);
if (madvise_range.has_svm_userptr_vmas)
xe_svm_notifier_unlock(vm);
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread