* [PATCH v8 01/12] drm/xe/uapi: Add UAPI support for purgeable buffer objects
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 02/12] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo Arvind Yadav
` (10 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe
Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom,
José Roberto de Souza
From: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Extend the DRM_XE_MADVISE ioctl to support purgeable buffer object
management by adding the DRM_XE_VMA_ATTR_PURGEABLE_STATE attribute type.
This allows userspace applications to provide memory usage hints to
the kernel for better memory management under pressure:
- WILLNEED: Buffer is needed and should not be purged. If the BO was
previously purged, the retained field returns 0, indicating the backing
store was lost (once purged, always purged, matching i915 semantics).
- DONTNEED: Buffer is not currently needed and may be purged by the
kernel under memory pressure to free resources. Only applies to
non-shared BOs.
To prevent undefined behavior, the following operations are blocked
while a BO is in DONTNEED state:
- New mmap() operations return -EBUSY
- VM_BIND operations return -EBUSY
- New dma-buf exports return -EBUSY
- CPU page faults return SIGBUS
- GPU page faults fail with -EACCES
This ensures applications cannot use a BO while marked as DONTNEED,
preventing erratic behavior when the kernel purges the backing store.
The implementation includes a 'retained' output field (matching i915's
drm_i915_gem_madvise.retained) that indicates whether the BO's backing
store still exists (1) or has been purged (0).
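For illustration, a minimal userspace sketch of the intended flow
(a sketch against the existing madvise uAPI: DRM_IOCTL_XE_MADVISE and
the vm_id/start/range field names are assumed; fd, vm_id, start and
range are placeholders, and recreate_bo_contents() is a hypothetical
application helper):

	__u32 retained = 0;	/* must be zero-initialized, see below */
	struct drm_xe_madvise args = {
		.vm_id = vm_id,
		.start = start,
		.range = range,
		.type  = DRM_XE_VMA_ATTR_PURGEABLE_STATE,
		.purge_state_val = {
			.val          = DRM_XE_VMA_PURGEABLE_STATE_DONTNEED,
			.retained_ptr = (__u64)(uintptr_t)&retained,
		},
	};

	/* Hint that the BO contents may be discarded under pressure */
	if (ioctl(fd, DRM_IOCTL_XE_MADVISE, &args))
		return -errno;

	/* ... later, before reusing the BO ... */
	retained = 0;
	args.purge_state_val.val = DRM_XE_VMA_PURGEABLE_STATE_WILLNEED;
	if (ioctl(fd, DRM_IOCTL_XE_MADVISE, &args))
		return -errno;
	if (!retained)
		recreate_bo_contents();	/* backing store was purged */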
Add a DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT flag so userspace
can detect kernel support for purgeable buffer objects before
attempting to use the feature.
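A corresponding detection sketch, assuming the existing two-call
DRM_IOCTL_XE_DEVICE_QUERY / DRM_XE_DEVICE_QUERY_CONFIG pattern (error
handling omitted):

	struct drm_xe_device_query query = {
		.query = DRM_XE_DEVICE_QUERY_CONFIG,
	};
	struct drm_xe_query_config *config;

	ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query);	/* size only */
	config = malloc(query.size);
	query.data = (__u64)(uintptr_t)config;
	ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query);	/* fill data */

	if (config->info[DRM_XE_QUERY_CONFIG_FLAGS] &
	    DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT)
		/* purgeable madvise is supported */;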
v2:
- Add PURGED state for read-only status, change ioctl to DRM_IOWR,
add retained field for i915 compatibility
v3:
- UAPI rule should not be changed (Matthew Brost)
- Make 'retained' a userptr (Matthew Brost)
v4:
- The purge_state_val member must not be larger than the existing
union (16 bytes), so drop the '__u64 reserved' field (Matt)
v5:
- Update UAPI documentation to clarify retained must be initialized
to 0 (Thomas)
v6:
- Document DONTNEED BO access blocking behavior to prevent undefined
behavior and clarify uAPI contract (Thomas, Matt)
- Add query flag DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT for
feature detection. (Jose)
- Rename retained to retained_ptr. (Jose)
v7:
- Updated UAPI documentation as suggested to reflect 'updated' value
instead of 'return'. (Jose)
Cc: Matthew Brost <matthew.brost@intel.com>
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
include/uapi/drm/xe_drm.h | 69 +++++++++++++++++++++++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index f8b2afb20540..a59baf5add9a 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -429,6 +429,7 @@ struct drm_xe_query_config {
#define DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR (1 << 2)
#define DRM_XE_QUERY_CONFIG_FLAG_HAS_NO_COMPRESSION_HINT (1 << 3)
#define DRM_XE_QUERY_CONFIG_FLAG_HAS_DISABLE_STATE_CACHE_PERF_FIX (1 << 4)
+#define DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT (1 << 5)
#define DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT 2
#define DRM_XE_QUERY_CONFIG_VA_BITS 3
#define DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY 4
@@ -2083,6 +2084,7 @@ struct drm_xe_query_eu_stall {
* - DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC: Set preferred memory location.
* - DRM_XE_MEM_RANGE_ATTR_ATOMIC: Set atomic access policy.
* - DRM_XE_MEM_RANGE_ATTR_PAT: Set page attribute table index.
+ * - DRM_XE_VMA_ATTR_PURGEABLE_STATE: Set purgeable state for BOs.
*
* Example:
*
@@ -2115,6 +2117,7 @@ struct drm_xe_madvise {
#define DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC 0
#define DRM_XE_MEM_RANGE_ATTR_ATOMIC 1
#define DRM_XE_MEM_RANGE_ATTR_PAT 2
+#define DRM_XE_VMA_ATTR_PURGEABLE_STATE 3
/** @type: type of attribute */
__u32 type;
@@ -2205,6 +2208,72 @@ struct drm_xe_madvise {
/** @pat_index.reserved: Reserved */
__u64 reserved;
} pat_index;
+
+ /**
+ * @purge_state_val: Purgeable state configuration
+ *
+ * Used when @type == DRM_XE_VMA_ATTR_PURGEABLE_STATE.
+ *
+ * Configures the purgeable state of buffer objects in the specified
+ * virtual address range. This allows applications to hint to the kernel
+ * about a BO's usage patterns for better memory management.
+ *
+ * By default all VMAs are in WILLNEED state.
+ *
+ * Supported values for @purge_state_val.val:
+ * - DRM_XE_VMA_PURGEABLE_STATE_WILLNEED (0): Marks BO as needed.
+ * If the BO was previously purged, the kernel sets the __u32 at
+ * @retained_ptr to 0 (backing store lost) so the application knows
+ * it must recreate the BO.
+ *
+ * - DRM_XE_VMA_PURGEABLE_STATE_DONTNEED (1): Marks BO as not currently
+ * needed. Kernel may purge it under memory pressure to reclaim memory.
+ * Only applies to non-shared BOs. The kernel sets the __u32 at
+ * @retained_ptr to 1 if the backing store still exists (not yet purged),
+ * or 0 if it was already purged.
+ *
+ * Important: Once marked as DONTNEED, touching the BO's memory
+ * is undefined behavior. It may succeed temporarily (before the
+ * kernel purges the backing store) but will suddenly fail once
+ * the BO transitions to PURGED state.
+ *
+ * To transition back: use WILLNEED and check @retained_ptr —
+ * if 0, backing store was lost and the BO must be recreated.
+ *
+ * The following operations are blocked in DONTNEED state to
+ * prevent the BO from being re-mapped after madvise:
+ * - New mmap() calls: Fail with -EBUSY
+ * - VM_BIND operations: Fail with -EBUSY
+ * - New dma-buf exports: Fail with -EBUSY
+ * - CPU page faults (existing mmap): Fail with SIGBUS
+ * - GPU page faults (fault-mode VMs): Fail with -EACCES
+ */
+ struct {
+#define DRM_XE_VMA_PURGEABLE_STATE_WILLNEED 0
+#define DRM_XE_VMA_PURGEABLE_STATE_DONTNEED 1
+ /** @purge_state_val.val: value for DRM_XE_VMA_ATTR_PURGEABLE_STATE */
+ __u32 val;
+
+ /** @purge_state_val.pad: MBZ */
+ __u32 pad;
+ /**
+ * @purge_state_val.retained_ptr: Pointer to a __u32 output
+ * field for backing store status.
+ *
+ * Userspace must initialize the __u32 value at this address
+ * to 0 before the ioctl. Kernel writes a __u32 after the
+ * operation:
+ * - 1 if backing store exists (not purged)
+ * - 0 if backing store was purged
+ *
+ * If userspace fails to initialize it to 0, the ioctl returns -EINVAL.
+ * This ensures a safe default (0 = assume purged) if the kernel
+ * cannot write the result.
+ *
+ * Similar to i915's drm_i915_gem_madvise.retained field.
+ */
+ __u64 retained_ptr;
+ } purge_state_val;
};
/** @reserved: Reserved */
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread

* [PATCH v8 02/12] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 01/12] drm/xe/uapi: Add UAPI " Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 03/12] drm/xe/madvise: Implement purgeable buffer object support Arvind Yadav
` (9 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Add infrastructure for tracking the purgeable state of buffer objects.
Introduce enum xe_madv_purgeable_state with three states:
- XE_MADV_PURGEABLE_WILLNEED (0): BO is needed and should not be
purged. This is the default state for all BOs.
- XE_MADV_PURGEABLE_DONTNEED (1): BO is not currently needed and
can be purged by the kernel under memory pressure to reclaim
resources. Only non-shared BOs can be marked as DONTNEED.
- XE_MADV_PURGEABLE_PURGED (2): BO has been purged by the kernel.
Accessing a purged BO results in error. Follows i915 semantics
where once purged, the BO remains permanently invalid ("once
purged, always purged").
Add a madv_purgeable field to struct xe_bo to track the purgeable
state across concurrent access paths.
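For illustration, the intended access pattern for the new helpers (a
sketch; as elsewhere in the driver, callers take the BO's dma-resv lock,
e.g. via xe_bo_lock()):

	xe_bo_lock(bo, false);
	if (xe_bo_is_purged(bo)) {
		/* Backing store is gone; accesses must fail */
		err = -EACCES;
	} else if (xe_bo_madv_is_dontneed(bo)) {
		/* Eligible for purging; do not repopulate pages */
	}
	xe_bo_unlock(bo);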
v2:
- Add xe_bo_is_purged() helper, improve state documentation
v3:
- Add the kernel doc (Matthew Brost)
- Add the new helpers xe_bo_madv_is_dontneed (Matthew Brost)
v4:
- @madv_purgeable atomic_t → u32 change across all relevant
patches (Matt)
v5:
- Add locking documentation to madv_purgeable field comment (Matt)
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_bo.h | 56 ++++++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_bo_types.h | 6 ++++
2 files changed, 62 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 2cbac16f7db7..fb5541bdf602 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -87,6 +87,28 @@
#define XE_PCI_BARRIER_MMAP_OFFSET (0x50 << XE_PTE_SHIFT)
+/**
+ * enum xe_madv_purgeable_state - Buffer object purgeable state enumeration
+ *
+ * This enum defines the possible purgeable states for a buffer object,
+ * allowing userspace to provide memory usage hints to the kernel for
+ * better memory management under pressure.
+ *
+ * @XE_MADV_PURGEABLE_WILLNEED: The buffer object is needed and should not be purged.
+ * This is the default state.
+ * @XE_MADV_PURGEABLE_DONTNEED: The buffer object is not currently needed and can be
+ * purged by the kernel under memory pressure.
+ * @XE_MADV_PURGEABLE_PURGED: The buffer object has been purged by the kernel.
+ *
+ * Accessing a purged buffer will result in an error. Per i915 semantics,
+ * once purged, a BO remains permanently invalid and must be destroyed and recreated.
+ */
+enum xe_madv_purgeable_state {
+ XE_MADV_PURGEABLE_WILLNEED,
+ XE_MADV_PURGEABLE_DONTNEED,
+ XE_MADV_PURGEABLE_PURGED,
+};
+
struct sg_table;
struct xe_bo *xe_bo_alloc(void);
@@ -215,6 +237,40 @@ static inline bool xe_bo_is_protected(const struct xe_bo *bo)
return bo->pxp_key_instance;
}
+/**
+ * xe_bo_is_purged() - Check if buffer object has been purged
+ * @bo: The buffer object to check
+ *
+ * Checks if the buffer object's backing store has been discarded by the
+ * kernel due to memory pressure after being marked as purgeable (DONTNEED).
+ * Once purged, the BO cannot be restored and any attempt to use it will fail.
+ *
+ * Context: Caller must hold the BO's dma-resv lock
+ * Return: true if the BO has been purged, false otherwise
+ */
+static inline bool xe_bo_is_purged(struct xe_bo *bo)
+{
+ xe_bo_assert_held(bo);
+ return bo->madv_purgeable == XE_MADV_PURGEABLE_PURGED;
+}
+
+/**
+ * xe_bo_madv_is_dontneed() - Check if BO is marked as DONTNEED
+ * @bo: The buffer object to check
+ *
+ * Checks if userspace has marked this BO as DONTNEED (i.e., its contents
+ * are not currently needed and can be discarded under memory pressure).
+ * This is used internally to decide whether a BO is eligible for purging.
+ *
+ * Context: Caller must hold the BO's dma-resv lock
+ * Return: true if the BO is marked DONTNEED, false otherwise
+ */
+static inline bool xe_bo_madv_is_dontneed(struct xe_bo *bo)
+{
+ xe_bo_assert_held(bo);
+ return bo->madv_purgeable == XE_MADV_PURGEABLE_DONTNEED;
+}
+
static inline void xe_bo_unpin_map_no_vm(struct xe_bo *bo)
{
if (likely(bo)) {
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index d4fe3c8dca5b..ff8317bfc1ae 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -108,6 +108,12 @@ struct xe_bo {
* from default
*/
u64 min_align;
+
+ /**
+ * @madv_purgeable: userspace advice on BO purgeability, protected
+ * by the BO's dma-resv lock.
+ */
+ u32 madv_purgeable;
};
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread

* [PATCH v8 03/12] drm/xe/madvise: Implement purgeable buffer object support
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 01/12] drm/xe/uapi: Add UAPI " Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 02/12] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 8:19 ` Thomas Hellström
2026-03-26 5:51 ` [PATCH v8 04/12] drm/xe/bo: Block CPU faults to purgeable buffer objects Arvind Yadav
` (8 subsequent siblings)
11 siblings, 1 reply; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Add the core implementation for purgeable buffer objects, enabling memory
reclamation of user-designated DONTNEED buffers during eviction. This
allows userspace applications to provide memory usage hints to the kernel
for better memory management under pressure.
This patch implements the purge operation and state machine transitions:
Purgeable States (from xe_madv_purgeable_state):
- WILLNEED (0): BO should be retained, actively used
- DONTNEED (1): BO eligible for purging, not currently needed
- PURGED (2): BO backing store reclaimed, permanently invalid
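The resulting transition graph (PURGED is terminal):

	WILLNEED <--> DONTNEED --(evicted under memory pressure)--> PURGED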
Design Rationale:
- Async TLB invalidation via trigger_rebind (no blocking xe_vm_invalidate_vma)
- i915 compatibility: retained field, "once purged always purged" semantics
- Shared BO protection prevents multi-process memory corruption
- Scratch PTE reuse avoids new infrastructure, safe for fault mode
Note: The madvise_purgeable() function is implemented but not hooked into
the IOCTL handler (madvise_funcs[] entry is NULL) to maintain bisectability.
The feature will be enabled in the final patch when all supporting
infrastructure (shrinker, per-VMA tracking) is complete.
v2:
- Use xe_bo_trigger_rebind() for async TLB invalidation (Thomas Hellström)
- Add NULL rebind with scratch PTEs for fault mode (Thomas Hellström)
- Implement i915-compatible retained field logic (Thomas Hellström)
- Skip BO validation for purged BOs in page fault handler (crash fix)
- Add scratch VM check in page fault path (non-scratch VMs fail fault)
- Force clear_pt for non-scratch VMs to avoid phys addr 0 mapping (review fix)
- Add !is_purged check to resource cursor setup to prevent stale access
v3:
- Rebase as xe_gt_pagefault.c is gone upstream and replaced
with xe_pagefault.c (Matthew Brost)
- Xe specific warn on (Matthew Brost)
- Call helpers for madv_purgeable access (Matthew Brost)
- Remove bo NULL check (Matthew Brost)
- Use xe_bo_assert_held instead of dma assert (Matthew Brost)
- Move the xe_bo_is_purged check under the dma-resv lock (Matt)
- Drop is_purged from xe_pt_stage_bind_entry and just set is_null to true
for purged BOs; rename s/is_null/is_null_or_purged (Matt)
- UAPI rule should not be changed (Matthew Brost)
- Make 'retained' a userptr (Matthew Brost)
v4:
- @madv_purgeable atomic_t → u32 change across all relevant patches (Matt)
v5:
- Introduce xe_bo_set_purgeable_state() helper (void return) to centralize
madv_purgeable updates with xe_bo_assert_held() and state transition
validation using explicit enum checks (no transition out of PURGED) (Matt)
- Make xe_ttm_bo_purge() return int and propagate failures from
xe_bo_move(); handle xe_bo_trigger_rebind() failures (e.g. no_wait_gpu
paths) rather than silently ignoring (Matt)
- Replace drm_WARN_ON with xe_assert for better Xe-specific assertions (Matt)
- Hook purgeable handling into madvise_funcs[DRM_XE_VMA_ATTR_PURGEABLE_STATE]
instead of special-case path in xe_vm_madvise_ioctl() (Matt)
- Track purgeable retained return via xe_madvise_details and perform
copy_to_user() from xe_madvise_details_fini() after locks are dropped (Matt)
- Set madvise_funcs[DRM_XE_VMA_ATTR_PURGEABLE_STATE] to NULL with
__maybe_unused on madvise_purgeable() to maintain bisectability until
shrinker integration is complete in final patch (Matt)
- Use put_user() instead of copy_to_user() for single u32 retained value (Thomas)
- Return -EFAULT from ioctl if put_user() fails (Thomas)
- Validate userspace initialized retained to 0 before ioctl, ensuring safe
default (0 = "assume purged") if put_user() fails (Thomas)
- Refactor error handling: separate fallible put_user from infallible cleanup
- xe_madvise_purgeable_retained_to_user(): separate helper for fallible put_user
- Call put_user() after releasing all locks to avoid circular dependencies
- Use xe_bo_move_notify() instead of xe_bo_trigger_rebind() in xe_ttm_bo_purge()
for proper abstraction - handles vunmap, dma-buf notifications, and VRAM
userfault cleanup (Thomas)
- Fix LRU crash while running shrink test
- Skip xe_bo_validate() for purged BOs in xe_gpuvm_validate()
v6:
- xe_bo_move_notify() must be called *before* ttm_bo_validate(). (Thomas)
- Block GPU page faults (fault-mode VMs) for DONTNEED bo's (Thomas, Matt)
- Rename retained to retained_ptr. (Jose)
v7:
- Fix engine reset from EU overfetch in scratch VMs: xe_pagefault_begin()
and xe_pagefault_service() now return 0 instead of -EACCES/-EINVAL for
DONTNEED/purged BOs and missing VMAs so stale accesses hit scratch PTEs.
- Fix Engine memory CAT errors when Mesa uses DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE:
accept scratch VMs in xe_pagefault_asid_to_vm() via '|| xe_vm_has_scratch(vm)'.
- Skip validate/migrate/rebind for DONTNEED/purged BOs in xe_pagefault_begin()
using a bool *skip_rebind out-parameter. Scratch VMs ACK the fault and fall back
to scratch PTEs; non-scratch VMs return -EACCES.
v8:
- Remove skip_rebind out-parameter from xe_pagefault_begin(); always let
xe_vma_rebind() run so tile_present is updated and the GPU fault resolves.
Previously skip_rebind=true left tile_present=0, causing an infinite
refault loop on scratch VMs. (Thomas)
- Fixed spelling: corrected "madvice" → "madvise". (Thomas)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 107 ++++++++++++++++++++---
drivers/gpu/drm/xe/xe_bo.h | 2 +
drivers/gpu/drm/xe/xe_pagefault.c | 15 +++-
drivers/gpu/drm/xe/xe_pt.c | 40 +++++++--
drivers/gpu/drm/xe/xe_vm.c | 20 ++++-
drivers/gpu/drm/xe/xe_vm_madvise.c | 136 +++++++++++++++++++++++++++++
6 files changed, 298 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 22179b2df85c..b6055bb4c578 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -835,6 +835,84 @@ static int xe_bo_move_notify(struct xe_bo *bo,
return 0;
}
+/**
+ * xe_bo_set_purgeable_state() - Set BO purgeable state with validation
+ * @bo: Buffer object
+ * @new_state: New purgeable state
+ *
+ * Sets the purgeable state with lockdep assertions and validates state
+ * transitions. Once a BO is PURGED, it cannot transition to any other state.
+ * Invalid transitions are caught with xe_assert().
+ */
+void xe_bo_set_purgeable_state(struct xe_bo *bo,
+ enum xe_madv_purgeable_state new_state)
+{
+ struct xe_device *xe = xe_bo_device(bo);
+
+ xe_bo_assert_held(bo);
+
+ /* Validate state is one of the known values */
+ xe_assert(xe, new_state == XE_MADV_PURGEABLE_WILLNEED ||
+ new_state == XE_MADV_PURGEABLE_DONTNEED ||
+ new_state == XE_MADV_PURGEABLE_PURGED);
+
+ /* Once purged, always purged - cannot transition out */
+ xe_assert(xe, !(bo->madv_purgeable == XE_MADV_PURGEABLE_PURGED &&
+ new_state != XE_MADV_PURGEABLE_PURGED));
+
+ bo->madv_purgeable = new_state;
+}
+
+/**
+ * xe_ttm_bo_purge() - Purge buffer object backing store
+ * @ttm_bo: The TTM buffer object to purge
+ * @ctx: TTM operation context
+ *
+ * This function purges the backing store of a BO marked as DONTNEED and
+ * triggers rebind to invalidate stale GPU mappings. For fault-mode VMs,
+ * this zaps the PTEs. The next GPU access will trigger a page fault and
+ * perform NULL rebind (scratch pages or clear PTEs based on VM config).
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+static int xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
+{
+ struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
+ struct ttm_placement place = {};
+ int ret;
+
+ xe_bo_assert_held(bo);
+
+ if (!ttm_bo->ttm)
+ return 0;
+
+ if (!xe_bo_madv_is_dontneed(bo))
+ return 0;
+
+ /*
+ * Use the standard pre-move hook so we share the same cleanup/invalidate
+ * path as migrations: drop any CPU vmap and schedule the necessary GPU
+ * unbind/rebind work.
+ *
+ * This must be called before ttm_bo_validate() frees the pages.
+ * May fail in no-wait contexts (fault/shrinker) or if the BO is
+ * pinned. Keep state unchanged on failure so we don't end up "PURGED"
+ * with stale mappings.
+ */
+ ret = xe_bo_move_notify(bo, ctx);
+ if (ret)
+ return ret;
+
+ ret = ttm_bo_validate(ttm_bo, &place, ctx);
+ if (ret)
+ return ret;
+
+ /* Commit the state transition only once invalidation was queued */
+ xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_PURGED);
+
+ return 0;
+}
+
static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
struct ttm_operation_ctx *ctx,
struct ttm_resource *new_mem,
@@ -854,6 +932,20 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
ttm && ttm_tt_is_populated(ttm)) ? true : false;
int ret = 0;
+ /*
+ * Purge only non-shared BOs explicitly marked DONTNEED by userspace.
+ * The move_notify callback will handle invalidation asynchronously.
+ */
+ if (evict && xe_bo_madv_is_dontneed(bo)) {
+ ret = xe_ttm_bo_purge(ttm_bo, ctx);
+ if (ret)
+ return ret;
+
+ /* Free the unused eviction destination resource */
+ ttm_resource_free(ttm_bo, &new_mem);
+ return 0;
+ }
+
/* Bo creation path, moving to system or TT. */
if ((!old_mem && ttm) && !handle_system_ccs) {
if (new_mem->mem_type == XE_PL_TT)
@@ -1603,18 +1695,6 @@ static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
}
}
-static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
-{
- struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
-
- if (ttm_bo->ttm) {
- struct ttm_placement place = {};
- int ret = ttm_bo_validate(ttm_bo, &place, ctx);
-
- drm_WARN_ON(&xe->drm, ret);
- }
-}
-
static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo)
{
struct ttm_operation_ctx ctx = {
@@ -2195,6 +2275,9 @@ struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
#endif
INIT_LIST_HEAD(&bo->vram_userfault_link);
+ /* Initialize purge advisory state */
+ bo->madv_purgeable = XE_MADV_PURGEABLE_WILLNEED;
+
drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
if (resv) {
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index fb5541bdf602..653851d47aa6 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -271,6 +271,8 @@ static inline bool xe_bo_madv_is_dontneed(struct xe_bo *bo)
return bo->madv_purgeable == XE_MADV_PURGEABLE_DONTNEED;
}
+void xe_bo_set_purgeable_state(struct xe_bo *bo, enum xe_madv_purgeable_state new_state);
+
static inline void xe_bo_unpin_map_no_vm(struct xe_bo *bo)
{
if (likely(bo)) {
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index ea4857acf28d..2ac6e1edaa81 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -59,6 +59,19 @@ static int xe_pagefault_begin(struct drm_exec *exec, struct xe_vma *vma,
if (!bo)
return 0;
+ /*
+ * Skip validate/migrate for DONTNEED/purged BOs - repopulating
+ * their pages would prevent the shrinker from reclaiming them.
+ * For non-scratch VMs there is no safe fallback so fail the fault.
+ * For scratch VMs let xe_vma_rebind() run normally; it will install
+ * scratch PTEs so the GPU gets safe zero reads instead of faulting.
+ */
+ if (unlikely(xe_bo_madv_is_dontneed(bo) || xe_bo_is_purged(bo))) {
+ if (!xe_vm_has_scratch(vm))
+ return -EACCES;
+ return 0;
+ }
+
return need_vram_move ? xe_bo_migrate(bo, vram->placement, NULL, exec) :
xe_bo_validate(bo, vm, true, exec);
}
@@ -145,7 +158,7 @@ static struct xe_vm *xe_pagefault_asid_to_vm(struct xe_device *xe, u32 asid)
down_read(&xe->usm.lock);
vm = xa_load(&xe->usm.asid_to_vm, asid);
- if (vm && xe_vm_in_fault_mode(vm))
+ if (vm && (xe_vm_in_fault_mode(vm) || xe_vm_has_scratch(vm)))
xe_vm_get(vm);
else
vm = ERR_PTR(-EINVAL);
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 2d9ce2c4cb4f..08f40701f654 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -531,20 +531,26 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
/* Is this a leaf entry ?*/
if (level == 0 || xe_pt_hugepte_possible(addr, next, level, xe_walk)) {
struct xe_res_cursor *curs = xe_walk->curs;
- bool is_null = xe_vma_is_null(xe_walk->vma);
- bool is_vram = is_null ? false : xe_res_is_vram(curs);
+ struct xe_bo *bo = xe_vma_bo(xe_walk->vma);
+ bool is_null_or_purged = xe_vma_is_null(xe_walk->vma) ||
+ (bo && xe_bo_is_purged(bo));
+ bool is_vram = is_null_or_purged ? false : xe_res_is_vram(curs);
XE_WARN_ON(xe_walk->va_curs_start != addr);
if (xe_walk->clear_pt) {
pte = 0;
} else {
- pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
+ /*
+ * For purged BOs, treat like null VMAs - pass address 0.
+ * pte_encode_vma() will set the XE_PTE_NULL flag for scratch mapping.
+ */
+ pte = vm->pt_ops->pte_encode_vma(is_null_or_purged ? 0 :
xe_res_dma(curs) +
xe_walk->dma_offset,
xe_walk->vma,
pat_index, level);
- if (!is_null)
+ if (!is_null_or_purged)
pte |= is_vram ? xe_walk->default_vram_pte :
xe_walk->default_system_pte;
@@ -568,7 +574,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
if (unlikely(ret))
return ret;
- if (!is_null && !xe_walk->clear_pt)
+ if (!is_null_or_purged && !xe_walk->clear_pt)
xe_res_next(curs, next - addr);
xe_walk->va_curs_start = next;
xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << level);
@@ -721,6 +727,26 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
};
struct xe_pt *pt = vm->pt_root[tile->id];
int ret;
+ bool is_purged = false;
+
+ /*
+ * Check if BO is purged:
+ * - Scratch VMs: Use scratch PTEs (XE_PTE_NULL) for safe zero reads
+ * - Non-scratch VMs: Clear PTEs to zero (non-present) to avoid mapping to phys addr 0
+ *
+ * For non-scratch VMs, we force clear_pt=true so leaf PTEs become completely
+ * zero instead of creating a PRESENT mapping to physical address 0.
+ */
+ if (bo && xe_bo_is_purged(bo)) {
+ is_purged = true;
+
+ /*
+ * For non-scratch VMs, a NULL rebind should use zero PTEs
+ * (non-present), not a present PTE to phys 0.
+ */
+ if (!xe_vm_has_scratch(vm))
+ xe_walk.clear_pt = true;
+ }
if (range) {
/* Move this entire thing to xe_svm.c? */
@@ -756,11 +782,11 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
}
xe_walk.default_vram_pte |= XE_PPGTT_PTE_DM;
- xe_walk.dma_offset = bo ? vram_region_gpu_offset(bo->ttm.resource) : 0;
+ xe_walk.dma_offset = (bo && !is_purged) ? vram_region_gpu_offset(bo->ttm.resource) : 0;
if (!range)
xe_bo_assert_held(bo);
- if (!xe_vma_is_null(vma) && !range) {
+ if (!xe_vma_is_null(vma) && !range && !is_purged) {
if (xe_vma_is_userptr(vma))
xe_res_first_dma(to_userptr_vma(vma)->userptr.pages.dma_addr, 0,
xe_vma_size(vma), &curs);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 5572e12c2a7e..a0ade67d616e 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -326,6 +326,7 @@ void xe_vm_kill(struct xe_vm *vm, bool unlocked)
static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
{
struct xe_vm *vm = gpuvm_to_vm(vm_bo->vm);
+ struct xe_bo *bo = gem_to_xe_bo(vm_bo->obj);
struct drm_gpuva *gpuva;
int ret;
@@ -334,10 +335,16 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
list_move_tail(&gpuva_to_vma(gpuva)->combined_links.rebind,
&vm->rebind_list);
+ /* Skip re-populating purged BOs, rebind maps scratch pages. */
+ if (xe_bo_is_purged(bo)) {
+ vm_bo->evicted = false;
+ return 0;
+ }
+
if (!try_wait_for_completion(&vm->xe->pm_block))
return -EAGAIN;
- ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false, exec);
+ ret = xe_bo_validate(bo, vm, false, exec);
if (ret)
return ret;
@@ -1358,6 +1365,9 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
u16 pat_index, u32 pt_level)
{
+ struct xe_bo *bo = xe_vma_bo(vma);
+ struct xe_vm *vm = xe_vma_vm(vma);
+
pte |= XE_PAGE_PRESENT;
if (likely(!xe_vma_read_only(vma)))
@@ -1366,7 +1376,13 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
pte |= pte_encode_pat_index(pat_index, pt_level);
pte |= pte_encode_ps(pt_level);
- if (unlikely(xe_vma_is_null(vma)))
+ /*
+ * NULL PTEs redirect to scratch page (return zeros on read).
+ * Set for: 1) explicit null VMAs, 2) purged BOs on scratch VMs.
+ * Never set NULL flag without scratch page - causes undefined behavior.
+ */
+ if (unlikely(xe_vma_is_null(vma) ||
+ (bo && xe_bo_is_purged(bo) && xe_vm_has_scratch(vm))))
pte |= XE_PTE_NULL;
return pte;
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 869db304d96d..881de6cb6c11 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -26,6 +26,8 @@ struct xe_vmas_in_madvise_range {
/**
* struct xe_madvise_details - Argument to madvise_funcs
* @dpagemap: Reference-counted pointer to a struct drm_pagemap.
+ * @has_purged_bo: Track if any BO was purged (for purgeable state)
+ * @retained_ptr: User pointer for retained value (for purgeable state)
*
* The madvise IOCTL handler may, in addition to the user-space
* args, have additional info to pass into the madvise_func that
@@ -34,6 +36,8 @@ struct xe_vmas_in_madvise_range {
*/
struct xe_madvise_details {
struct drm_pagemap *dpagemap;
+ bool has_purged_bo;
+ u64 retained_ptr;
};
static int get_vmas(struct xe_vm *vm, struct xe_vmas_in_madvise_range *madvise_range)
@@ -180,6 +184,67 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
}
}
+/**
+ * madvise_purgeable - Handle purgeable buffer object advice
+ * @xe: XE device
+ * @vm: VM
+ * @vmas: Array of VMAs
+ * @num_vmas: Number of VMAs
+ * @op: Madvise operation
+ * @details: Madvise details for return values
+ *
+ * Handles DONTNEED/WILLNEED/PURGED states. Tracks if any BO was purged
+ * in details->has_purged_bo for later copy to userspace.
+ *
+ * Note: Marked __maybe_unused until hooked into madvise_funcs[] in the
+ * final patch to maintain bisectability. The NULL placeholder in the
+ * array ensures proper -EINVAL return for userspace until all supporting
+ * infrastructure (shrinker, per-VMA tracking) is complete.
+ */
+static void __maybe_unused madvise_purgeable(struct xe_device *xe,
+ struct xe_vm *vm,
+ struct xe_vma **vmas,
+ int num_vmas,
+ struct drm_xe_madvise *op,
+ struct xe_madvise_details *details)
+{
+ int i;
+
+ xe_assert(vm->xe, op->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE);
+
+ for (i = 0; i < num_vmas; i++) {
+ struct xe_bo *bo = xe_vma_bo(vmas[i]);
+
+ if (!bo)
+ continue;
+
+ /* BO must be locked before modifying madv state */
+ xe_bo_assert_held(bo);
+
+ /*
+ * Once purged, always purged. Cannot transition back to WILLNEED.
+ * This matches i915 semantics where purged BOs are permanently invalid.
+ */
+ if (xe_bo_is_purged(bo)) {
+ details->has_purged_bo = true;
+ continue;
+ }
+
+ switch (op->purge_state_val.val) {
+ case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
+ xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_WILLNEED);
+ break;
+ case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
+ xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_DONTNEED);
+ break;
+ default:
+ drm_warn(&vm->xe->drm, "Invalid madvise value = %d\n",
+ op->purge_state_val.val);
+ return;
+ }
+ }
+}
+
typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
struct xe_vma **vmas, int num_vmas,
struct drm_xe_madvise *op,
@@ -189,6 +254,12 @@ static const madvise_func madvise_funcs[] = {
[DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC] = madvise_preferred_mem_loc,
[DRM_XE_MEM_RANGE_ATTR_ATOMIC] = madvise_atomic,
[DRM_XE_MEM_RANGE_ATTR_PAT] = madvise_pat_index,
+ /*
+ * Purgeable support implemented but not enabled yet to maintain
+ * bisectability. Will be set to madvise_purgeable() in final patch
+ * when all infrastructure (shrinker, VMA tracking) is complete.
+ */
+ [DRM_XE_VMA_ATTR_PURGEABLE_STATE] = NULL,
};
static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
@@ -319,6 +390,19 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
return false;
break;
}
+ case DRM_XE_VMA_ATTR_PURGEABLE_STATE:
+ {
+ u32 val = args->purge_state_val.val;
+
+ if (XE_IOCTL_DBG(xe, !(val == DRM_XE_VMA_PURGEABLE_STATE_WILLNEED ||
+ val == DRM_XE_VMA_PURGEABLE_STATE_DONTNEED)))
+ return false;
+
+ if (XE_IOCTL_DBG(xe, args->purge_state_val.pad))
+ return false;
+
+ break;
+ }
default:
if (XE_IOCTL_DBG(xe, 1))
return false;
@@ -337,6 +421,12 @@ static int xe_madvise_details_init(struct xe_vm *vm, const struct drm_xe_madvise
memset(details, 0, sizeof(*details));
+ /* Store retained pointer for purgeable state */
+ if (args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE) {
+ details->retained_ptr = args->purge_state_val.retained_ptr;
+ return 0;
+ }
+
if (args->type == DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC) {
int fd = args->preferred_mem_loc.devmem_fd;
struct drm_pagemap *dpagemap;
@@ -365,6 +455,21 @@ static void xe_madvise_details_fini(struct xe_madvise_details *details)
drm_pagemap_put(details->dpagemap);
}
+static int xe_madvise_purgeable_retained_to_user(const struct xe_madvise_details *details)
+{
+ u32 retained;
+
+ if (!details->retained_ptr)
+ return 0;
+
+ retained = !details->has_purged_bo;
+
+ if (put_user(retained, (u32 __user *)u64_to_user_ptr(details->retained_ptr)))
+ return -EFAULT;
+
+ return 0;
+}
+
static bool check_bo_args_are_sane(struct xe_vm *vm, struct xe_vma **vmas,
int num_vmas, u32 atomic_val)
{
@@ -422,6 +527,7 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
struct xe_vm *vm;
struct drm_exec exec;
int err, attr_type;
+ bool do_retained;
vm = xe_vm_lookup(xef, args->vm_id);
if (XE_IOCTL_DBG(xe, !vm))
@@ -432,6 +538,25 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
goto put_vm;
}
+ /* Cache whether we need to write retained, and validate it's initialized to 0 */
+ do_retained = args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE &&
+ args->purge_state_val.retained_ptr;
+ if (do_retained) {
+ u32 retained;
+ u32 __user *retained_ptr;
+
+ retained_ptr = u64_to_user_ptr(args->purge_state_val.retained_ptr);
+ if (get_user(retained, retained_ptr)) {
+ err = -EFAULT;
+ goto put_vm;
+ }
+
+ if (XE_IOCTL_DBG(xe, retained != 0)) {
+ err = -EINVAL;
+ goto put_vm;
+ }
+ }
+
xe_svm_flush(vm);
err = down_write_killable(&vm->lock);
@@ -487,6 +612,13 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
}
attr_type = array_index_nospec(args->type, ARRAY_SIZE(madvise_funcs));
+
+ /* Ensure the madvise function exists for this type */
+ if (!madvise_funcs[attr_type]) {
+ err = -EINVAL;
+ goto err_fini;
+ }
+
madvise_funcs[attr_type](xe, vm, madvise_range.vmas, madvise_range.num_vmas, args,
&details);
@@ -505,6 +637,10 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
xe_madvise_details_fini(&details);
unlock_vm:
up_write(&vm->lock);
+
+ /* Write retained value to user after releasing all locks */
+ if (!err && do_retained)
+ err = xe_madvise_purgeable_retained_to_user(&details);
put_vm:
xe_vm_put(vm);
return err;
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread

* Re: [PATCH v8 03/12] drm/xe/madvise: Implement purgeable buffer object support
2026-03-26 5:51 ` [PATCH v8 03/12] drm/xe/madvise: Implement purgeable buffer object support Arvind Yadav
@ 2026-03-26 8:19 ` Thomas Hellström
0 siblings, 0 replies; 16+ messages in thread
From: Thomas Hellström @ 2026-03-26 8:19 UTC (permalink / raw)
To: Arvind Yadav, intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray
On Thu, 2026-03-26 at 11:21 +0530, Arvind Yadav wrote:
> This allows userspace applications to provide memory usage hints to
> the kernel for better memory management under pressure:
>
> Add the core implementation for purgeable buffer objects, enabling
> memory
> reclamation of user-designated DONTNEED buffers during eviction.
>
> This patch implements the purge operation and state machine
> transitions:
>
> Purgeable States (from xe_madv_purgeable_state):
> - WILLNEED (0): BO should be retained, actively used
> - DONTNEED (1): BO eligible for purging, not currently needed
> - PURGED (2): BO backing store reclaimed, permanently invalid
>
> Design Rationale:
> - Async TLB invalidation via trigger_rebind (no blocking
> xe_vm_invalidate_vma)
> - i915 compatibility: retained field, "once purged always purged"
> semantics
> - Shared BO protection prevents multi-process memory corruption
> - Scratch PTE reuse avoids new infrastructure, safe for fault mode
>
> Note: The madvise_purgeable() function is implemented but not hooked
> into
> the IOCTL handler (madvise_funcs[] entry is NULL) to maintain
> bisectability.
> The feature will be enabled in the final patch when all supporting
> infrastructure (shrinker, per-VMA tracking) is complete.
>
> v2:
> - Use xe_bo_trigger_rebind() for async TLB invalidation (Thomas
> Hellström)
> - Add NULL rebind with scratch PTEs for fault mode (Thomas
> Hellström)
> - Implement i915-compatible retained field logic (Thomas Hellström)
> - Skip BO validation for purged BOs in page fault handler (crash
> fix)
> - Add scratch VM check in page fault path (non-scratch VMs fail
> fault)
> - Force clear_pt for non-scratch VMs to avoid phys addr 0 mapping
> (review fix)
> - Add !is_purged check to resource cursor setup to prevent stale
> access
>
> v3:
> - Rebase as xe_gt_pagefault.c is gone upstream and replaced
> with xe_pagefault.c (Matthew Brost)
> - Xe specific warn on (Matthew Brost)
> - Call helpers for madv_purgeable access(Matthew Brost)
> - Remove bo NULL check(Matthew Brost)
> - Use xe_bo_assert_held instead of dma assert(Matthew Brost)
> - Move the xe_bo_is_purged check under the dma-resv lock( by Matt)
> - Drop is_purged from xe_pt_stage_bind_entry and just set is_null
> to true
> for purged BO rename s/is_null/is_null_or_purged (by Matt)
> - UAPI rule should not be changed.(Matthew Brost)
> - Make 'retained' a userptr (Matthew Brost)
>
> v4:
> - @madv_purgeable atomic_t → u32 change across all relevant patches
> (Matt)
>
> v5:
> - Introduce xe_bo_set_purgeable_state() helper (void return) to
> centralize
> madv_purgeable updates with xe_bo_assert_held() and state
> transition
> validation using explicit enum checks (no transition out of
> PURGED) (Matt)
> - Make xe_ttm_bo_purge() return int and propagate failures from
> xe_bo_move(); handle xe_bo_trigger_rebind() failures (e.g.
> no_wait_gpu
> paths) rather than silently ignoring (Matt)
> - Replace drm_WARN_ON with xe_assert for better Xe-specific
> assertions (Matt)
> - Hook purgeable handling into
> madvise_funcs[DRM_XE_VMA_ATTR_PURGEABLE_STATE]
> instead of special-case path in xe_vm_madvise_ioctl() (Matt)
> - Track purgeable retained return via xe_madvise_details and
> perform
> copy_to_user() from xe_madvise_details_fini() after locks are
> dropped (Matt)
> - Set madvise_funcs[DRM_XE_VMA_ATTR_PURGEABLE_STATE] to NULL with
> __maybe_unused on madvise_purgeable() to maintain bisectability
> until
> shrinker integration is complete in final patch (Matt)
> - Use put_user() instead of copy_to_user() for single u32 retained
> value (Thomas)
> - Return -EFAULT from ioctl if put_user() fails (Thomas)
> - Validate userspace initialized retained to 0 before ioctl,
> ensuring safe
> default (0 = "assume purged") if put_user() fails (Thomas)
> - Refactor error handling: separate fallible put_user from
> infallible cleanup
> - xe_madvise_purgeable_retained_to_user(): separate helper for
> fallible put_user
> - Call put_user() after releasing all locks to avoid circular
> dependencies
> - Use xe_bo_move_notify() instead of xe_bo_trigger_rebind() in
> xe_ttm_bo_purge()
> for proper abstraction - handles vunmap, dma-buf notifications,
> and VRAM
> userfault cleanup (Thomas)
> - Fix LRU crash while running shrink test
> - Skip xe_bo_validate() for purged BOs in xe_gpuvm_validate()
>
> v6:
> - xe_bo_move_notify() must be called *before* ttm_bo_validate().
> (Thomas)
> - Block GPU page faults (fault-mode VMs) for DONTNEED bo's (Thomas,
> Matt)
> - Rename retained to retained_ptr. (Jose)
>
> v7 Changes:
> - Fix engine reset from EU overfetch in scratch VMs:
> xe_pagefault_begin()
> and xe_pagefault_service() now return 0 instead of -EACCES/-
> EINVAL for
> DONTNEED/purged BOs and missing VMAs so stale accesses hit
> scratch PTEs.
> - Fix Engine memory CAT errors when Mesa uses
> DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE:
> accept scratch VMs in xe_pagefault_asid_to_vm() via '||
> xe_vm_has_scratch(vm).
> - Skip validate/migrate/rebind for DONTNEED/purged BOs in
> xe_pagefault_begin()
> using a bool *skip_rebind out-parameter. Scratch VMs ACK the
> fault and fall back
> to scratch PTEs; non-scratch VMs return -EACCES.
>
> v8:
> - Remove skip_rebind out-parameter from xe_pagefault_begin();
> always let
> xe_vma_rebind() run so tile_present is updated and the GPU fault
> resolves.
> Previously skip_rebind=true left tile_present=0, causing an
> infinite
> refault loop on scratch VMs. (Thomas)
Let's discuss later whether we need a follow-up patch for this. Perhaps
we should kill the VM in the case of a real fault and just have
prefaults nop.
For now
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> - Fixed spelling: corrected "madvice" → "madvise". (Thomas)
>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
> drivers/gpu/drm/xe/xe_bo.c | 107 ++++++++++++++++++++---
> drivers/gpu/drm/xe/xe_bo.h | 2 +
> drivers/gpu/drm/xe/xe_pagefault.c | 15 +++-
> drivers/gpu/drm/xe/xe_pt.c | 40 +++++++--
> drivers/gpu/drm/xe/xe_vm.c | 20 ++++-
> drivers/gpu/drm/xe/xe_vm_madvise.c | 136
> +++++++++++++++++++++++++++++
> 6 files changed, 298 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 22179b2df85c..b6055bb4c578 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -835,6 +835,84 @@ static int xe_bo_move_notify(struct xe_bo *bo,
> return 0;
> }
>
> +/**
> + * xe_bo_set_purgeable_state() - Set BO purgeable state with
> validation
> + * @bo: Buffer object
> + * @new_state: New purgeable state
> + *
> + * Sets the purgeable state with lockdep assertions and validates
> state
> + * transitions. Once a BO is PURGED, it cannot transition to any
> other state.
> + * Invalid transitions are caught with xe_assert().
> + */
> +void xe_bo_set_purgeable_state(struct xe_bo *bo,
> + enum xe_madv_purgeable_state
> new_state)
> +{
> + struct xe_device *xe = xe_bo_device(bo);
> +
> + xe_bo_assert_held(bo);
> +
> + /* Validate state is one of the known values */
> + xe_assert(xe, new_state == XE_MADV_PURGEABLE_WILLNEED ||
> + new_state == XE_MADV_PURGEABLE_DONTNEED ||
> + new_state == XE_MADV_PURGEABLE_PURGED);
> +
> + /* Once purged, always purged - cannot transition out */
> + xe_assert(xe, !(bo->madv_purgeable ==
> XE_MADV_PURGEABLE_PURGED &&
> + new_state != XE_MADV_PURGEABLE_PURGED));
> +
> + bo->madv_purgeable = new_state;
> +}
> +
> +/**
> + * xe_ttm_bo_purge() - Purge buffer object backing store
> + * @ttm_bo: The TTM buffer object to purge
> + * @ctx: TTM operation context
> + *
> + * This function purges the backing store of a BO marked as DONTNEED
> and
> + * triggers rebind to invalidate stale GPU mappings. For fault-mode
> VMs,
> + * this zaps the PTEs. The next GPU access will trigger a page fault
> and
> + * perform NULL rebind (scratch pages or clear PTEs based on VM
> config).
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +static int xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct
> ttm_operation_ctx *ctx)
> +{
> + struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
> + struct ttm_placement place = {};
> + int ret;
> +
> + xe_bo_assert_held(bo);
> +
> + if (!ttm_bo->ttm)
> + return 0;
> +
> + if (!xe_bo_madv_is_dontneed(bo))
> + return 0;
> +
> + /*
> + * Use the standard pre-move hook so we share the same
> cleanup/invalidate
> + * path as migrations: drop any CPU vmap and schedule the
> necessary GPU
> + * unbind/rebind work.
> + *
> + * This must be called before ttm_bo_validate() frees the
> pages.
> + * May fail in no-wait contexts (fault/shrinker) or if the
> BO is
> + * pinned. Keep state unchanged on failure so we don't end
> up "PURGED"
> + * with stale mappings.
> + */
> + ret = xe_bo_move_notify(bo, ctx);
> + if (ret)
> + return ret;
> +
> + ret = ttm_bo_validate(ttm_bo, &place, ctx);
> + if (ret)
> + return ret;
> +
> + /* Commit the state transition only once invalidation was
> queued */
> + xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_PURGED);
> +
> + return 0;
> +}
> +
> static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> struct ttm_operation_ctx *ctx,
> struct ttm_resource *new_mem,
> @@ -854,6 +932,20 @@ static int xe_bo_move(struct ttm_buffer_object
> *ttm_bo, bool evict,
> ttm && ttm_tt_is_populated(ttm)) ?
> true : false;
> int ret = 0;
>
> + /*
> + * Purge only non-shared BOs explicitly marked DONTNEED by
> userspace.
> + * The move_notify callback will handle invalidation
> asynchronously.
> + */
> + if (evict && xe_bo_madv_is_dontneed(bo)) {
> + ret = xe_ttm_bo_purge(ttm_bo, ctx);
> + if (ret)
> + return ret;
> +
> + /* Free the unused eviction destination resource */
> + ttm_resource_free(ttm_bo, &new_mem);
> + return 0;
> + }
> +
> /* Bo creation path, moving to system or TT. */
> if ((!old_mem && ttm) && !handle_system_ccs) {
> if (new_mem->mem_type == XE_PL_TT)
> @@ -1603,18 +1695,6 @@ static void xe_ttm_bo_delete_mem_notify(struct
> ttm_buffer_object *ttm_bo)
> }
> }
>
> -static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct
> ttm_operation_ctx *ctx)
> -{
> - struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
> -
> - if (ttm_bo->ttm) {
> - struct ttm_placement place = {};
> - int ret = ttm_bo_validate(ttm_bo, &place, ctx);
> -
> - drm_WARN_ON(&xe->drm, ret);
> - }
> -}
> -
> static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo)
> {
> struct ttm_operation_ctx ctx = {
> @@ -2195,6 +2275,9 @@ struct xe_bo *xe_bo_init_locked(struct
> xe_device *xe, struct xe_bo *bo,
> #endif
> INIT_LIST_HEAD(&bo->vram_userfault_link);
>
> + /* Initialize purge advisory state */
> + bo->madv_purgeable = XE_MADV_PURGEABLE_WILLNEED;
> +
> drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
>
> if (resv) {
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index fb5541bdf602..653851d47aa6 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -271,6 +271,8 @@ static inline bool xe_bo_madv_is_dontneed(struct
> xe_bo *bo)
> return bo->madv_purgeable == XE_MADV_PURGEABLE_DONTNEED;
> }
>
> +void xe_bo_set_purgeable_state(struct xe_bo *bo, enum
> xe_madv_purgeable_state new_state);
> +
> static inline void xe_bo_unpin_map_no_vm(struct xe_bo *bo)
> {
> if (likely(bo)) {
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c
> b/drivers/gpu/drm/xe/xe_pagefault.c
> index ea4857acf28d..2ac6e1edaa81 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -59,6 +59,19 @@ static int xe_pagefault_begin(struct drm_exec
> *exec, struct xe_vma *vma,
> if (!bo)
> return 0;
>
> + /*
> + * Skip validate/migrate for DONTNEED/purged BOs -
> repopulating
> + * their pages would prevent the shrinker from reclaiming
> them.
> + * For non-scratch VMs there is no safe fallback so fail the
> fault.
> + * For scratch VMs let xe_vma_rebind() run normally; it will
> install
> + * scratch PTEs so the GPU gets safe zero reads instead of
> faulting.
> + */
> + if (unlikely(xe_bo_madv_is_dontneed(bo) ||
> xe_bo_is_purged(bo))) {
> + if (!xe_vm_has_scratch(vm))
> + return -EACCES;
> + return 0;
> + }
> +
> return need_vram_move ? xe_bo_migrate(bo, vram->placement,
> NULL, exec) :
> xe_bo_validate(bo, vm, true, exec);
> }
> @@ -145,7 +158,7 @@ static struct xe_vm
> *xe_pagefault_asid_to_vm(struct xe_device *xe, u32 asid)
>
> down_read(&xe->usm.lock);
> vm = xa_load(&xe->usm.asid_to_vm, asid);
> - if (vm && xe_vm_in_fault_mode(vm))
> + if (vm && (xe_vm_in_fault_mode(vm) ||
> xe_vm_has_scratch(vm)))
> xe_vm_get(vm);
> else
> vm = ERR_PTR(-EINVAL);
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 2d9ce2c4cb4f..08f40701f654 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -531,20 +531,26 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent,
> pgoff_t offset,
> /* Is this a leaf entry ?*/
> if (level == 0 || xe_pt_hugepte_possible(addr, next, level,
> xe_walk)) {
> struct xe_res_cursor *curs = xe_walk->curs;
> - bool is_null = xe_vma_is_null(xe_walk->vma);
> - bool is_vram = is_null ? false :
> xe_res_is_vram(curs);
> + struct xe_bo *bo = xe_vma_bo(xe_walk->vma);
> + bool is_null_or_purged = xe_vma_is_null(xe_walk-
> >vma) ||
> + (bo &&
> xe_bo_is_purged(bo));
> + bool is_vram = is_null_or_purged ? false :
> xe_res_is_vram(curs);
>
> XE_WARN_ON(xe_walk->va_curs_start != addr);
>
> if (xe_walk->clear_pt) {
> pte = 0;
> } else {
> - pte = vm->pt_ops->pte_encode_vma(is_null ? 0
> :
> + /*
> + * For purged BOs, treat like null VMAs -
> pass address 0.
> + * The pte_encode_vma will set XE_PTE_NULL
> flag for scratch mapping.
> + */
> + pte = vm->pt_ops-
> >pte_encode_vma(is_null_or_purged ? 0 :
>
> xe_res_dma(curs) +
> xe_walk-
> >dma_offset,
> xe_walk-
> >vma,
> pat_index,
> level);
> - if (!is_null)
> + if (!is_null_or_purged)
> pte |= is_vram ? xe_walk-
> >default_vram_pte :
> xe_walk->default_system_pte;
>
> @@ -568,7 +574,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent,
> pgoff_t offset,
> if (unlikely(ret))
> return ret;
>
> - if (!is_null && !xe_walk->clear_pt)
> + if (!is_null_or_purged && !xe_walk->clear_pt)
> xe_res_next(curs, next - addr);
> xe_walk->va_curs_start = next;
> xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K <<
> level);
> @@ -721,6 +727,26 @@ xe_pt_stage_bind(struct xe_tile *tile, struct
> xe_vma *vma,
> };
> struct xe_pt *pt = vm->pt_root[tile->id];
> int ret;
> + bool is_purged = false;
> +
> + /*
> + * Check if BO is purged:
> + * - Scratch VMs: Use scratch PTEs (XE_PTE_NULL) for safe
> zero reads
> + * - Non-scratch VMs: Clear PTEs to zero (non-present) to
> avoid mapping to phys addr 0
> + *
> + * For non-scratch VMs, we force clear_pt=true so leaf PTEs
> become completely
> + * zero instead of creating a PRESENT mapping to physical
> address 0.
> + */
> + if (bo && xe_bo_is_purged(bo)) {
> + is_purged = true;
> +
> + /*
> + * For non-scratch VMs, a NULL rebind should use
> zero PTEs
> + * (non-present), not a present PTE to phys 0.
> + */
> + if (!xe_vm_has_scratch(vm))
> + xe_walk.clear_pt = true;
> + }
>
> if (range) {
> /* Move this entire thing to xe_svm.c? */
> @@ -756,11 +782,11 @@ xe_pt_stage_bind(struct xe_tile *tile, struct
> xe_vma *vma,
> }
>
> xe_walk.default_vram_pte |= XE_PPGTT_PTE_DM;
> - xe_walk.dma_offset = bo ? vram_region_gpu_offset(bo-
> >ttm.resource) : 0;
> + xe_walk.dma_offset = (bo && !is_purged) ?
> vram_region_gpu_offset(bo->ttm.resource) : 0;
> if (!range)
> xe_bo_assert_held(bo);
>
> - if (!xe_vma_is_null(vma) && !range) {
> + if (!xe_vma_is_null(vma) && !range && !is_purged) {
> if (xe_vma_is_userptr(vma))
> xe_res_first_dma(to_userptr_vma(vma)-
> >userptr.pages.dma_addr, 0,
> xe_vma_size(vma), &curs);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 5572e12c2a7e..a0ade67d616e 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -326,6 +326,7 @@ void xe_vm_kill(struct xe_vm *vm, bool unlocked)
> static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct
> drm_exec *exec)
> {
> struct xe_vm *vm = gpuvm_to_vm(vm_bo->vm);
> + struct xe_bo *bo = gem_to_xe_bo(vm_bo->obj);
> struct drm_gpuva *gpuva;
> int ret;
>
> @@ -334,10 +335,16 @@ static int xe_gpuvm_validate(struct
> drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
> list_move_tail(&gpuva_to_vma(gpuva)-
> >combined_links.rebind,
> &vm->rebind_list);
>
> + /* Skip re-populating purged BOs, rebind maps scratch pages.
> */
> + if (xe_bo_is_purged(bo)) {
> + vm_bo->evicted = false;
> + return 0;
> + }
> +
> if (!try_wait_for_completion(&vm->xe->pm_block))
> return -EAGAIN;
>
> - ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false,
> exec);
> + ret = xe_bo_validate(bo, vm, false, exec);
> if (ret)
> return ret;
>
> @@ -1358,6 +1365,9 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo,
> u64 bo_offset,
> static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
> u16 pat_index, u32 pt_level)
> {
> + struct xe_bo *bo = xe_vma_bo(vma);
> + struct xe_vm *vm = xe_vma_vm(vma);
> +
> pte |= XE_PAGE_PRESENT;
>
> if (likely(!xe_vma_read_only(vma)))
> @@ -1366,7 +1376,13 @@ static u64 xelp_pte_encode_vma(u64 pte, struct
> xe_vma *vma,
> pte |= pte_encode_pat_index(pat_index, pt_level);
> pte |= pte_encode_ps(pt_level);
>
> - if (unlikely(xe_vma_is_null(vma)))
> + /*
> + * NULL PTEs redirect to scratch page (return zeros on
> read).
> + * Set for: 1) explicit null VMAs, 2) purged BOs on scratch
> VMs.
> + * Never set NULL flag without scratch page - causes
> undefined behavior.
> + */
> + if (unlikely(xe_vma_is_null(vma) ||
> + (bo && xe_bo_is_purged(bo) &&
> xe_vm_has_scratch(vm))))
> pte |= XE_PTE_NULL;
>
> return pte;
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c
> b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index 869db304d96d..881de6cb6c11 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -26,6 +26,8 @@ struct xe_vmas_in_madvise_range {
> /**
> * struct xe_madvise_details - Argument to madvise_funcs
> * @dpagemap: Reference-counted pointer to a struct drm_pagemap.
> + * @has_purged_bo: Track if any BO was purged (for purgeable state)
> + * @retained_ptr: User pointer for retained value (for purgeable
> state)
> *
> * The madvise IOCTL handler may, in addition to the user-space
> * args, have additional info to pass into the madvise_func that
> @@ -34,6 +36,8 @@ struct xe_vmas_in_madvise_range {
> */
> struct xe_madvise_details {
> struct drm_pagemap *dpagemap;
> + bool has_purged_bo;
> + u64 retained_ptr;
> };
>
> static int get_vmas(struct xe_vm *vm, struct
> xe_vmas_in_madvise_range *madvise_range)
> @@ -180,6 +184,67 @@ static void madvise_pat_index(struct xe_device
> *xe, struct xe_vm *vm,
> }
> }
>
> +/**
> + * madvise_purgeable - Handle purgeable buffer object advice
> + * @xe: XE device
> + * @vm: VM
> + * @vmas: Array of VMAs
> + * @num_vmas: Number of VMAs
> + * @op: Madvise operation
> + * @details: Madvise details for return values
> + *
> + * Handles DONTNEED/WILLNEED/PURGED states. Tracks if any BO was
> purged
> + * in details->has_purged_bo for later copy to userspace.
> + *
> + * Note: Marked __maybe_unused until hooked into madvise_funcs[] in
> the
> + * final patch to maintain bisectability. The NULL placeholder in
> the
> + * array ensures proper -EINVAL return for userspace until all
> supporting
> + * infrastructure (shrinker, per-VMA tracking) is complete.
> + */
> +static void __maybe_unused madvise_purgeable(struct xe_device *xe,
> + struct xe_vm *vm,
> + struct xe_vma **vmas,
> + int num_vmas,
> + struct drm_xe_madvise
> *op,
> + struct
> xe_madvise_details *details)
> +{
> + int i;
> +
> + xe_assert(vm->xe, op->type ==
> DRM_XE_VMA_ATTR_PURGEABLE_STATE);
> +
> + for (i = 0; i < num_vmas; i++) {
> + struct xe_bo *bo = xe_vma_bo(vmas[i]);
> +
> + if (!bo)
> + continue;
> +
> + /* BO must be locked before modifying madv state */
> + xe_bo_assert_held(bo);
> +
> + /*
> + * Once purged, always purged. Cannot transition
> back to WILLNEED.
> + * This matches i915 semantics where purged BOs are
> permanently invalid.
> + */
> + if (xe_bo_is_purged(bo)) {
> + details->has_purged_bo = true;
> + continue;
> + }
> +
> + switch (op->purge_state_val.val) {
> + case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
> + xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_WILLNEED);
> + break;
> + case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
> + xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_DONTNEED);
> + break;
> + default:
> + drm_warn(&vm->xe->drm, "Invalid madvise value = %d\n",
> + op->purge_state_val.val);
> + return;
> + }
> + }
> +}
> +
> typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
> struct xe_vma **vmas, int num_vmas,
> struct drm_xe_madvise *op,
> @@ -189,6 +254,12 @@ static const madvise_func madvise_funcs[] = {
> [DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC] = madvise_preferred_mem_loc,
> [DRM_XE_MEM_RANGE_ATTR_ATOMIC] = madvise_atomic,
> [DRM_XE_MEM_RANGE_ATTR_PAT] = madvise_pat_index,
> + /*
> + * Purgeable support implemented but not enabled yet to maintain
> + * bisectability. Will be set to madvise_purgeable() in final patch
> + * when all infrastructure (shrinker, VMA tracking) is complete.
> + */
> + [DRM_XE_VMA_ATTR_PURGEABLE_STATE] = NULL,
> };
>
> static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
> @@ -319,6 +390,19 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
> return false;
> break;
> }
> + case DRM_XE_VMA_ATTR_PURGEABLE_STATE:
> + {
> + u32 val = args->purge_state_val.val;
> +
> + if (XE_IOCTL_DBG(xe, !(val == DRM_XE_VMA_PURGEABLE_STATE_WILLNEED ||
> + val == DRM_XE_VMA_PURGEABLE_STATE_DONTNEED)))
> + return false;
> +
> + if (XE_IOCTL_DBG(xe, args->purge_state_val.pad))
> + return false;
> +
> + break;
> + }
> default:
> if (XE_IOCTL_DBG(xe, 1))
> return false;
> @@ -337,6 +421,12 @@ static int xe_madvise_details_init(struct xe_vm *vm, const struct drm_xe_madvise
>
> memset(details, 0, sizeof(*details));
>
> + /* Store retained pointer for purgeable state */
> + if (args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE) {
> + details->retained_ptr = args->purge_state_val.retained_ptr;
> + return 0;
> + }
> +
> if (args->type == DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC) {
> int fd = args->preferred_mem_loc.devmem_fd;
> struct drm_pagemap *dpagemap;
> @@ -365,6 +455,21 @@ static void xe_madvise_details_fini(struct xe_madvise_details *details)
> drm_pagemap_put(details->dpagemap);
> }
>
> +static int xe_madvise_purgeable_retained_to_user(const struct xe_madvise_details *details)
> +{
> + u32 retained;
> +
> + if (!details->retained_ptr)
> + return 0;
> +
> + retained = !details->has_purged_bo;
> +
> + if (put_user(retained, (u32 __user *)u64_to_user_ptr(details->retained_ptr)))
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> static bool check_bo_args_are_sane(struct xe_vm *vm, struct xe_vma **vmas,
> int num_vmas, u32 atomic_val)
> {
> @@ -422,6 +527,7 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
> struct xe_vm *vm;
> struct drm_exec exec;
> int err, attr_type;
> + bool do_retained;
>
> vm = xe_vm_lookup(xef, args->vm_id);
> if (XE_IOCTL_DBG(xe, !vm))
> @@ -432,6 +538,25 @@ int xe_vm_madvise_ioctl(struct drm_device *dev,
> void *data, struct drm_file *fil
> goto put_vm;
> }
>
> + /* Cache whether we need to write retained, and validate it's initialized to 0 */
> + do_retained = args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE &&
> + args->purge_state_val.retained_ptr;
> + if (do_retained) {
> + u32 retained;
> + u32 __user *retained_ptr;
> +
> + retained_ptr = u64_to_user_ptr(args->purge_state_val.retained_ptr);
> + if (get_user(retained, retained_ptr)) {
> + err = -EFAULT;
> + goto put_vm;
> + }
> +
> + if (XE_IOCTL_DBG(xe, retained != 0)) {
> + err = -EINVAL;
> + goto put_vm;
> + }
> + }
> +
> xe_svm_flush(vm);
>
> err = down_write_killable(&vm->lock);
> @@ -487,6 +612,13 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
> }
>
> attr_type = array_index_nospec(args->type, ARRAY_SIZE(madvise_funcs));
> +
> + /* Ensure the madvise function exists for this type */
> + if (!madvise_funcs[attr_type]) {
> + err = -EINVAL;
> + goto err_fini;
> + }
> +
> madvise_funcs[attr_type](xe, vm, madvise_range.vmas, madvise_range.num_vmas, args,
> &details);
>
> @@ -505,6 +637,10 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
> xe_madvise_details_fini(&details);
> unlock_vm:
> up_write(&vm->lock);
> +
> + /* Write retained value to user after releasing all locks */
> + if (!err && do_retained)
> + err = xe_madvise_purgeable_retained_to_user(&details);
> put_vm:
> xe_vm_put(vm);
> return err;
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v8 04/12] drm/xe/bo: Block CPU faults to purgeable buffer objects
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (2 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 03/12] drm/xe/madvise: Implement purgeable buffer object support Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 05/12] drm/xe/vm: Prevent binding of purged " Arvind Yadav
` (7 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Block CPU page faults to buffer objects marked as purgeable (DONTNEED)
or already purged. Once a BO is marked DONTNEED, its contents can be
discarded by the kernel at any time, making access undefined behavior.
Return VM_FAULT_SIGBUS immediately to fail consistently instead of
allowing erratic behavior where access sometimes works (if not yet
purged) and sometimes fails (if purged).
For DONTNEED BOs:
- Block new CPU faults with SIGBUS to prevent undefined behavior.
- Existing CPU PTEs may still work until TLB flush, but new faults
fail immediately.
For PURGED BOs:
- Backing store has been reclaimed, making CPU access invalid.
- Without this check, accessing existing mmap mappings would trigger
xe_bo_fault_migrate() on freed backing store, causing kernel hangs
or crashes.
The purgeable check is added to both CPU fault paths:
- Fastpath (xe_bo_cpu_fault_fastpath): Returns VM_FAULT_SIGBUS immediately
under dma-resv lock, preventing attempts to migrate/validate
DONTNEED/purged pages.
- Slowpath (xe_bo_cpu_fault): Returns -EFAULT under drm_exec lock,
converted to VM_FAULT_SIGBUS.
This matches i915 semantics for purged buffer handling.
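A rough userspace sketch of the resulting contract (uAPI names from
patch 01 of this series; BO creation, VM bind and error handling are
elided, so this is illustrative only, not a definitive usage recipe):

	struct drm_xe_madvise args = {
		.vm_id = vm_id,
		.start = bind_addr,
		.range = bind_size,
		.type = DRM_XE_VMA_ATTR_PURGEABLE_STATE,
		.purge_state_val.val = DRM_XE_VMA_PURGEABLE_STATE_DONTNEED,
	};
	__u32 retained = 0;	/* uAPI requires pre-initialization to 0 */

	/* ptr was mmap()ed before the BO was marked DONTNEED */
	ioctl(fd, DRM_IOCTL_XE_MADVISE, &args);
	/* a *new* CPU fault on ptr now raises SIGBUS; an already
	 * populated PTE may keep working until the pages are purged */

	/* later: reclaim the buffer and learn whether it survived */
	args.purge_state_val.val = DRM_XE_VMA_PURGEABLE_STATE_WILLNEED;
	args.purge_state_val.retained_ptr = (__u64)(uintptr_t)&retained;
	ioctl(fd, DRM_IOCTL_XE_MADVISE, &args);
	if (!retained)
		;	/* backing store was purged - contents are lost */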
v2:
- Added xe_bo_is_purged(bo) instead of atomic_read.
- Avoids leaks and keeps drm_dev_exit() while returning.
v3:
- Move xe_bo_is_purged check under a dma-resv lock (Matthew Brost)
v4:
- Add purged check to fastpath (xe_bo_cpu_fault_fastpath) to prevent
hang when accessing existing mmap of purged BO.
v6:
- Block CPU faults to DONTNEED BOs with VM_FAULT_SIGBUS. (Thomas, Matt)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index b6055bb4c578..da18b43650e3 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1979,6 +1979,16 @@ static vm_fault_t xe_bo_cpu_fault_fastpath(struct vm_fault *vmf, struct xe_devic
if (!dma_resv_trylock(tbo->base.resv))
goto out_validation;
+ /*
+ * Reject CPU faults to purgeable BOs. DONTNEED BOs can be purged
+ * at any time, and purged BOs have no backing store. Either case
+ * is undefined behavior for CPU access.
+ */
+ if (xe_bo_madv_is_dontneed(bo) || xe_bo_is_purged(bo)) {
+ ret = VM_FAULT_SIGBUS;
+ goto out_unlock;
+ }
+
if (xe_ttm_bo_is_imported(tbo)) {
ret = VM_FAULT_SIGBUS;
drm_dbg(&xe->drm, "CPU trying to access an imported buffer object.\n");
@@ -2069,6 +2079,15 @@ static vm_fault_t xe_bo_cpu_fault(struct vm_fault *vmf)
if (err)
break;
+ /*
+ * Reject CPU faults to purgeable BOs. DONTNEED BOs can be
+ * purged at any time, and purged BOs have no backing store.
+ */
+ if (xe_bo_madv_is_dontneed(bo) || xe_bo_is_purged(bo)) {
+ err = -EFAULT;
+ break;
+ }
+
if (xe_ttm_bo_is_imported(tbo)) {
err = -EFAULT;
drm_dbg(&xe->drm, "CPU trying to access an imported buffer object.\n");
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 05/12] drm/xe/vm: Prevent binding of purged buffer objects
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (3 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 04/12] drm/xe/bo: Block CPU faults to purgeable buffer objects Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 06/12] drm/xe/madvise: Implement per-VMA purgeable state tracking Arvind Yadav
` (6 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Add purge checking to vma_lock_and_validate() to block new mapping
operations on purged BOs while allowing cleanup operations to proceed.
Purged BOs have their backing pages freed by the kernel. New
mapping operations (MAP, PREFETCH, REMAP) must be rejected with
-EINVAL to prevent GPU access to invalid memory. Cleanup
operations (UNMAP) must be allowed so applications can release
resources after detecting purge via the retained field.
REMAP operations require mixed handling - reject new prev/next
VMAs if the BO is purged, but allow the unmap portion to proceed
for cleanup.
The check_purged flag in struct xe_vma_lock_and_validate_flags
distinguishes between these cases: true for new mappings (must reject),
false for cleanup (allow).
v2:
- Clarify that purged BOs are permanently invalid (i915 semantics)
- Remove incorrect claim about madvise(WILLNEED) restoring purged BOs
v3:
- Move xe_bo_is_purged check under vma_lock_and_validate (Matt)
- Add check_purged parameter to distinguish new mappings from cleanup
- Allow UNMAP operations to prevent resource leaks
- Handle REMAP operation's dual nature (cleanup + new mappings)
v5:
- Replace three boolean parameters with struct xe_vma_lock_and_validate_flags
to improve readability and prevent argument transposition (Matt)
- Use u32 bitfields instead of bool members to match xe_bo_shrink_flags
pattern - more efficient packing and follows xe driver conventions (Thomas)
- Pass struct as const since flags are read-only (Matt)
v6:
- Block VM_BIND to DONTNEED BOs with -EBUSY (Thomas, Matt)
v7:
- Pass xe_vma_lock_and_validate_flags by value instead of by
pointer, consistent with xe driver style. (Thomas)
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_vm.c | 82 ++++++++++++++++++++++++++++++++------
1 file changed, 69 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index a0ade67d616e..9c1a82b64a43 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2918,8 +2918,22 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
}
}
+/**
+ * struct xe_vma_lock_and_validate_flags - Flags for vma_lock_and_validate()
+ * @res_evict: Allow evicting resources during validation
+ * @validate: Perform BO validation
+ * @request_decompress: Request BO decompression
+ * @check_purged: Reject operation if BO is purged
+ */
+struct xe_vma_lock_and_validate_flags {
+ u32 res_evict : 1;
+ u32 validate : 1;
+ u32 request_decompress : 1;
+ u32 check_purged : 1;
+};
+
static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
- bool res_evict, bool validate, bool request_decompress)
+ struct xe_vma_lock_and_validate_flags flags)
{
struct xe_bo *bo = xe_vma_bo(vma);
struct xe_vm *vm = xe_vma_vm(vma);
@@ -2928,15 +2942,24 @@ static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
if (bo) {
if (!bo->vm)
err = drm_exec_lock_obj(exec, &bo->ttm.base);
- if (!err && validate)
+
+ /* Reject new mappings to DONTNEED/purged BOs; allow cleanup operations */
+ if (!err && flags.check_purged) {
+ if (xe_bo_madv_is_dontneed(bo))
+ err = -EBUSY; /* BO marked purgeable */
+ else if (xe_bo_is_purged(bo))
+ err = -EINVAL; /* BO already purged */
+ }
+
+ if (!err && flags.validate)
err = xe_bo_validate(bo, vm,
xe_vm_allow_vm_eviction(vm) &&
- res_evict, exec);
+ flags.res_evict, exec);
if (err)
return err;
- if (request_decompress)
+ if (flags.request_decompress)
err = xe_bo_decompress(bo);
}
@@ -3030,10 +3053,13 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
case DRM_GPUVA_OP_MAP:
if (!op->map.invalidate_on_bind)
err = vma_lock_and_validate(exec, op->map.vma,
- res_evict,
- !xe_vm_in_fault_mode(vm) ||
- op->map.immediate,
- op->map.request_decompress);
+ (struct xe_vma_lock_and_validate_flags) {
+ .res_evict = res_evict,
+ .validate = !xe_vm_in_fault_mode(vm) ||
+ op->map.immediate,
+ .request_decompress = op->map.request_decompress,
+ .check_purged = true,
+ });
break;
case DRM_GPUVA_OP_REMAP:
err = check_ufence(gpuva_to_vma(op->base.remap.unmap->va));
@@ -3042,13 +3068,28 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
err = vma_lock_and_validate(exec,
gpuva_to_vma(op->base.remap.unmap->va),
- res_evict, false, false);
+ (struct xe_vma_lock_and_validate_flags) {
+ .res_evict = res_evict,
+ .validate = false,
+ .request_decompress = false,
+ .check_purged = false,
+ });
if (!err && op->remap.prev)
err = vma_lock_and_validate(exec, op->remap.prev,
- res_evict, true, false);
+ (struct xe_vma_lock_and_validate_flags) {
+ .res_evict = res_evict,
+ .validate = true,
+ .request_decompress = false,
+ .check_purged = true,
+ });
if (!err && op->remap.next)
err = vma_lock_and_validate(exec, op->remap.next,
- res_evict, true, false);
+ (struct xe_vma_lock_and_validate_flags) {
+ .res_evict = res_evict,
+ .validate = true,
+ .request_decompress = false,
+ .check_purged = true,
+ });
break;
case DRM_GPUVA_OP_UNMAP:
err = check_ufence(gpuva_to_vma(op->base.unmap.va));
@@ -3057,7 +3098,12 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
err = vma_lock_and_validate(exec,
gpuva_to_vma(op->base.unmap.va),
- res_evict, false, false);
+ (struct xe_vma_lock_and_validate_flags) {
+ .res_evict = res_evict,
+ .validate = false,
+ .request_decompress = false,
+ .check_purged = false,
+ });
break;
case DRM_GPUVA_OP_PREFETCH:
{
@@ -3070,9 +3116,19 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
region <= ARRAY_SIZE(region_to_mem_type));
}
+ /*
+ * Prefetch attempts to migrate BO's backing store without
+ * repopulating it first. Purged BOs have no backing store
+ * to migrate, so reject the operation.
+ */
err = vma_lock_and_validate(exec,
gpuva_to_vma(op->base.prefetch.va),
- res_evict, false, false);
+ (struct xe_vma_lock_and_validate_flags) {
+ .res_evict = res_evict,
+ .validate = false,
+ .request_decompress = false,
+ .check_purged = true,
+ });
if (!err && !xe_vma_has_no_bo(vma))
err = xe_bo_migrate(xe_vma_bo(vma),
region_to_mem_type[region],
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 06/12] drm/xe/madvise: Implement per-VMA purgeable state tracking
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (4 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 05/12] drm/xe/vm: Prevent binding of purged " Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 07/12] drm/xe/madvise: Block imported and exported dma-bufs Arvind Yadav
` (5 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Track purgeable state per-VMA instead of using a coarse shared
BO check. This prevents purging shared BOs until all VMAs across
all VMs are marked DONTNEED.
Add xe_bo_all_vmas_dontneed() to check all VMAs before marking
a BO purgeable. Add xe_bo_recompute_purgeable_state() to handle
state transitions when VMAs are destroyed - if all remaining
VMAs are DONTNEED the BO can become purgeable, and if no VMAs
remain the existing BO state is preserved.
The per-VMA purgeable_state field stores the madvise hint for
each mapping. Shared BOs can only be purged when all VMAs
unanimously indicate DONTNEED.
This prevents the bug where unmapping the last VMA would incorrectly flip
a DONTNEED BO back to WILLNEED. The enum-based state check preserves BO
state when no VMAs remain, only updating when VMAs provide explicit hints.
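As a sketch of the unanimity rule for a BO bound in two VMs (uAPI
names from patch 01; set_purgeable() is a hypothetical helper that
wraps DRM_IOCTL_XE_MADVISE with DRM_XE_VMA_ATTR_PURGEABLE_STATE):

	set_purgeable(fd, vm_a, addr_a, size,
		      DRM_XE_VMA_PURGEABLE_STATE_DONTNEED);
	/* BO stays WILLNEED: the VMA in vm_b has not given a hint */

	set_purgeable(fd, vm_b, addr_b, size,
		      DRM_XE_VMA_PURGEABLE_STATE_DONTNEED);
	/* all VMAs now agree; xe_bo_recompute_purgeable_state() flips
	 * the BO to DONTNEED and the shrinker may purge it */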
v3:
- This addresses Thomas Hellström's feedback: "loop over all vmas
attached to the bo and check that they all say WONTNEED. This will
also need a check at VMA unbinding"
v4:
- @madv_purgeable atomic_t → u32 change across all relevant
patches (Matt)
v5:
- Call xe_bo_recheck_purgeable_on_vma_unbind() from xe_vma_destroy()
right after drm_gpuva_unlink() where we already hold the BO lock,
drop the trylock-based late destroy path (Matt)
- Move purgeable_state into xe_vma_mem_attr with the other madvise
attributes (Matt)
- Drop READ_ONCE since the BO lock already protects us (Matt)
- Keep returning false when there are no VMAs - otherwise we'd mark
BOs purgeable without any user hint (Matt)
- Use xe_bo_set_purgeable_state() instead of direct initialization(Matt)
- use xe_assert instead of drm_warn (Thomas)
v6:
- Fix state transition bug: don't flip DONTNEED → WILLNEED when last
VMA unmapped (Matt)
- Change xe_bo_all_vmas_dontneed() from bool to enum to distinguish
"no VMAs" from "has WILLNEED VMA" (Matt)
- Preserve BO state on NO_VMAS instead of forcing WILLNEED.
- Set skip_invalidation explicitly in madvise_purgeable() to ensure
DONTNEED always zaps GPU PTEs regardless of prior madvise state.
v7:
- Don't zap PTEs at DONTNEED time -- pages are still alive.
The zap happens in xe_bo_move_notify() right before the shrinker
frees them.
- Simplify xe_bo_recompute_purgeable_state() by relying on the
intentional value alignment between xe_bo_vmas_purge_state and
xe_madv_purgeable_state enums. Add static_assert to enforce the
alignment. (Thomas)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_svm.c | 1 +
drivers/gpu/drm/xe/xe_vm.c | 9 +-
drivers/gpu/drm/xe/xe_vm_madvise.c | 136 +++++++++++++++++++++++++++--
drivers/gpu/drm/xe/xe_vm_madvise.h | 3 +
drivers/gpu/drm/xe/xe_vm_types.h | 11 +++
5 files changed, 153 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index a91c84487a67..062ef77e283f 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -322,6 +322,7 @@ static void xe_vma_set_default_attributes(struct xe_vma *vma)
.preferred_loc.migration_policy = DRM_XE_MIGRATE_ALL_PAGES,
.pat_index = vma->attr.default_pat_index,
.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
+ .purgeable_state = XE_MADV_PURGEABLE_WILLNEED,
};
xe_vma_mem_attr_copy(&vma->attr, &default_attr);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 9c1a82b64a43..07393540f34c 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -39,6 +39,7 @@
#include "xe_tile.h"
#include "xe_tlb_inval.h"
#include "xe_trace_bo.h"
+#include "xe_vm_madvise.h"
#include "xe_wa.h"
static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
@@ -1085,6 +1086,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
static void xe_vma_destroy_late(struct xe_vma *vma)
{
struct xe_vm *vm = xe_vma_vm(vma);
+ struct xe_bo *bo = xe_vma_bo(vma);
if (vma->ufence) {
xe_sync_ufence_put(vma->ufence);
@@ -1099,7 +1101,7 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
} else if (xe_vma_is_null(vma) || xe_vma_is_cpu_addr_mirror(vma)) {
xe_vm_put(vm);
} else {
- xe_bo_put(xe_vma_bo(vma));
+ xe_bo_put(bo);
}
xe_vma_free(vma);
@@ -1125,6 +1127,7 @@ static void vma_destroy_cb(struct dma_fence *fence,
static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
{
struct xe_vm *vm = xe_vma_vm(vma);
+ struct xe_bo *bo = xe_vma_bo(vma);
lockdep_assert_held_write(&vm->lock);
xe_assert(vm->xe, list_empty(&vma->combined_links.destroy));
@@ -1133,9 +1136,10 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
xe_assert(vm->xe, vma->gpuva.flags & XE_VMA_DESTROYED);
xe_userptr_destroy(to_userptr_vma(vma));
} else if (!xe_vma_is_null(vma) && !xe_vma_is_cpu_addr_mirror(vma)) {
- xe_bo_assert_held(xe_vma_bo(vma));
+ xe_bo_assert_held(bo);
drm_gpuva_unlink(&vma->gpuva);
+ xe_bo_recompute_purgeable_state(bo);
}
xe_vm_assert_held(vm);
@@ -2692,6 +2696,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
.default_pat_index = op->map.pat_index,
.pat_index = op->map.pat_index,
+ .purgeable_state = XE_MADV_PURGEABLE_WILLNEED,
};
flags |= op->map.vma_flags & XE_VMA_CREATE_MASK;
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 881de6cb6c11..ed1940da7739 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -13,6 +13,7 @@
#include "xe_pt.h"
#include "xe_svm.h"
#include "xe_tlb_inval.h"
+#include "xe_vm.h"
struct xe_vmas_in_madvise_range {
u64 addr;
@@ -184,6 +185,116 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
}
}
+/**
+ * enum xe_bo_vmas_purge_state - VMA purgeable state aggregation
+ *
+ * Distinguishes whether a BO's VMAs are all DONTNEED, have at least
+ * one WILLNEED, or have no VMAs at all.
+ *
+ * Enum values align with XE_MADV_PURGEABLE_* states for consistency.
+ */
+enum xe_bo_vmas_purge_state {
+ /** @XE_BO_VMAS_STATE_WILLNEED: At least one VMA is WILLNEED */
+ XE_BO_VMAS_STATE_WILLNEED = 0,
+ /** @XE_BO_VMAS_STATE_DONTNEED: All VMAs are DONTNEED */
+ XE_BO_VMAS_STATE_DONTNEED = 1,
+ /** @XE_BO_VMAS_STATE_NO_VMAS: BO has no VMAs */
+ XE_BO_VMAS_STATE_NO_VMAS = 2,
+};
+
+/*
+ * xe_bo_recompute_purgeable_state() casts between xe_bo_vmas_purge_state and
+ * xe_madv_purgeable_state. Enforce that WILLNEED=0 and DONTNEED=1 match across
+ * both enums so the single-line cast is always valid.
+ */
+static_assert(XE_BO_VMAS_STATE_WILLNEED == (int)XE_MADV_PURGEABLE_WILLNEED,
+ "VMA purge state WILLNEED must equal madv purgeable WILLNEED");
+static_assert(XE_BO_VMAS_STATE_DONTNEED == (int)XE_MADV_PURGEABLE_DONTNEED,
+ "VMA purge state DONTNEED must equal madv purgeable DONTNEED");
+
+/**
+ * xe_bo_all_vmas_dontneed() - Determine BO VMA purgeable state
+ * @bo: Buffer object
+ *
+ * Check all VMAs across all VMs to determine aggregate purgeable state.
+ * Shared BOs require unanimous DONTNEED state from all mappings.
+ *
+ * Caller must hold BO dma-resv lock.
+ *
+ * Return: XE_BO_VMAS_STATE_DONTNEED if all VMAs are DONTNEED,
+ * XE_BO_VMAS_STATE_WILLNEED if at least one VMA is not DONTNEED,
+ * XE_BO_VMAS_STATE_NO_VMAS if BO has no VMAs
+ */
+static enum xe_bo_vmas_purge_state xe_bo_all_vmas_dontneed(struct xe_bo *bo)
+{
+ struct drm_gpuvm_bo *vm_bo;
+ struct drm_gpuva *gpuva;
+ struct drm_gem_object *obj = &bo->ttm.base;
+ bool has_vmas = false;
+
+ xe_bo_assert_held(bo);
+
+ drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
+ drm_gpuvm_bo_for_each_va(gpuva, vm_bo) {
+ struct xe_vma *vma = gpuva_to_vma(gpuva);
+
+ has_vmas = true;
+
+ /* Any non-DONTNEED VMA prevents purging */
+ if (vma->attr.purgeable_state != XE_MADV_PURGEABLE_DONTNEED)
+ return XE_BO_VMAS_STATE_WILLNEED;
+ }
+ }
+
+ /*
+ * No VMAs => preserve existing BO purgeable state.
+ * Avoids incorrectly flipping DONTNEED -> WILLNEED when last VMA unmapped.
+ */
+ if (!has_vmas)
+ return XE_BO_VMAS_STATE_NO_VMAS;
+
+ return XE_BO_VMAS_STATE_DONTNEED;
+}
+
+/**
+ * xe_bo_recompute_purgeable_state() - Recompute BO purgeable state from VMAs
+ * @bo: Buffer object
+ *
+ * Walk all VMAs to determine if BO should be purgeable or not.
+ * Shared BOs require unanimous DONTNEED state from all mappings.
+ * If the BO has no VMAs the existing state is preserved.
+ *
+ * Locking: Caller must hold BO dma-resv lock. When iterating GPUVM lists,
+ * VM lock must also be held (write) to prevent concurrent VMA modifications.
+ * This is satisfied at both call sites:
+ * - xe_vma_destroy(): holds vm->lock write
+ * - madvise_purgeable(): holds vm->lock write (from madvise ioctl path)
+ *
+ * Return: nothing
+ */
+void xe_bo_recompute_purgeable_state(struct xe_bo *bo)
+{
+ enum xe_bo_vmas_purge_state vma_state;
+
+ if (!bo)
+ return;
+
+ xe_bo_assert_held(bo);
+
+ /*
+ * Once purged, always purged. Cannot transition back to WILLNEED.
+ * This matches i915 semantics where purged BOs are permanently invalid.
+ */
+ if (bo->madv_purgeable == XE_MADV_PURGEABLE_PURGED)
+ return;
+
+ vma_state = xe_bo_all_vmas_dontneed(bo);
+
+ if (vma_state != (enum xe_bo_vmas_purge_state)bo->madv_purgeable &&
+ vma_state != XE_BO_VMAS_STATE_NO_VMAS)
+ xe_bo_set_purgeable_state(bo, (enum xe_madv_purgeable_state)vma_state);
+}
+
/**
* madvise_purgeable - Handle purgeable buffer object advice
* @xe: XE device
@@ -215,8 +326,11 @@ static void __maybe_unused madvise_purgeable(struct xe_device *xe,
for (i = 0; i < num_vmas; i++) {
struct xe_bo *bo = xe_vma_bo(vmas[i]);
- if (!bo)
+ if (!bo) {
+ /* Purgeable state applies to BOs only, skip non-BO VMAs */
+ vmas[i]->skip_invalidation = true;
continue;
+ }
/* BO must be locked before modifying madv state */
xe_bo_assert_held(bo);
@@ -227,19 +341,31 @@ static void __maybe_unused madvise_purgeable(struct xe_device *xe,
*/
if (xe_bo_is_purged(bo)) {
details->has_purged_bo = true;
+ vmas[i]->skip_invalidation = true;
continue;
}
switch (op->purge_state_val.val) {
case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
- xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_WILLNEED);
+ vmas[i]->attr.purgeable_state = XE_MADV_PURGEABLE_WILLNEED;
+ vmas[i]->skip_invalidation = true;
+
+ xe_bo_recompute_purgeable_state(bo);
break;
case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
- xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_DONTNEED);
+ vmas[i]->attr.purgeable_state = XE_MADV_PURGEABLE_DONTNEED;
+ /*
+ * Don't zap PTEs at DONTNEED time -- pages are still
+ * alive. The zap happens in xe_bo_move_notify() right
+ * before the shrinker frees them.
+ */
+ vmas[i]->skip_invalidation = true;
+
+ xe_bo_recompute_purgeable_state(bo);
break;
default:
- drm_warn(&vm->xe->drm, "Invalid madvise value = %d\n",
- op->purge_state_val.val);
+ /* Should never hit - values validated in madvise_args_are_sane() */
+ xe_assert(vm->xe, 0);
return;
}
}
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.h b/drivers/gpu/drm/xe/xe_vm_madvise.h
index b0e1fc445f23..39acd2689ca0 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.h
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.h
@@ -8,8 +8,11 @@
struct drm_device;
struct drm_file;
+struct xe_bo;
int xe_vm_madvise_ioctl(struct drm_device *dev, void *data,
struct drm_file *file);
+void xe_bo_recompute_purgeable_state(struct xe_bo *bo);
+
#endif
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 69e80c94138a..033cfdd56c95 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -95,6 +95,17 @@ struct xe_vma_mem_attr {
* same as default_pat_index unless overwritten by madvise.
*/
u16 pat_index;
+
+ /**
+ * @purgeable_state: Purgeable hint for this VMA mapping
+ *
+ * Per-VMA purgeable state from madvise. Valid states are WILLNEED (0)
+ * or DONTNEED (1). Shared BOs require all VMAs to be DONTNEED before
+ * the BO can be purged. PURGED state exists only at BO level.
+ *
+ * Protected by BO dma-resv lock. Set via DRM_IOCTL_XE_MADVISE.
+ */
+ u32 purgeable_state;
};
struct xe_vma {
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 07/12] drm/xe/madvise: Block imported and exported dma-bufs
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (5 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 06/12] drm/xe/madvise: Implement per-VMA purgeable state tracking Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 08/12] drm/xe/bo: Block mmap of DONTNEED/purged BOs Arvind Yadav
` (4 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Prevent marking imported or exported dma-bufs as purgeable.
External devices may be accessing these buffers without our
knowledge, making purging unsafe.
Check drm_gem_is_imported() for buffers created by other
drivers and obj->dma_buf for buffers exported to other
drivers. Silently skip these BOs during madvise processing.
This follows drm_gem_shmem's purgeable implementation and
prevents data corruption from purging actively-used shared
buffers.
v3:
- Addresses review feedback from Matt Roper about handling
imported/exported BOs correctly in the purgeable BO
implementation.
v4:
- Check should be add to xe_vm_madvise_purgeable_bo.
v5:
- Rename xe_bo_is_external_dmabuf() to xe_bo_is_dmabuf_shared()
for clarity (Thomas)
- Update comments to clarify why both imports and exports
are unsafe to purge.
v6:
- No PTEs to zap for shared dma-bufs.
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_vm_madvise.c | 38 ++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index ed1940da7739..340e83764a76 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -185,6 +185,34 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
}
}
+
+/**
+ * xe_bo_is_dmabuf_shared() - Check if BO is shared via dma-buf
+ * @bo: Buffer object
+ *
+ * Prevent marking imported or exported dma-bufs as purgeable.
+ * For imported BOs, Xe doesn't own the backing store and cannot
+ * safely reclaim pages (exporter or other devices may still be
+ * using them). For exported BOs, external devices may have active
+ * mappings we cannot track.
+ *
+ * Return: true if BO is imported or exported, false otherwise
+ */
+static bool xe_bo_is_dmabuf_shared(struct xe_bo *bo)
+{
+ struct drm_gem_object *obj = &bo->ttm.base;
+
+ /* Imported: exporter owns backing store */
+ if (drm_gem_is_imported(obj))
+ return true;
+
+ /* Exported: external devices may be accessing */
+ if (obj->dma_buf)
+ return true;
+
+ return false;
+}
+
/**
* enum xe_bo_vmas_purge_state - VMA purgeable state aggregation
*
@@ -234,6 +262,10 @@ static enum xe_bo_vmas_purge_state xe_bo_all_vmas_dontneed(struct xe_bo *bo)
xe_bo_assert_held(bo);
+ /* Shared dma-bufs cannot be purgeable */
+ if (xe_bo_is_dmabuf_shared(bo))
+ return XE_BO_VMAS_STATE_WILLNEED;
+
drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
drm_gpuvm_bo_for_each_va(gpuva, vm_bo) {
struct xe_vma *vma = gpuva_to_vma(gpuva);
@@ -335,6 +367,12 @@ static void __maybe_unused madvise_purgeable(struct xe_device *xe,
/* BO must be locked before modifying madv state */
xe_bo_assert_held(bo);
+ /* Skip shared dma-bufs - no PTEs to zap */
+ if (xe_bo_is_dmabuf_shared(bo)) {
+ vmas[i]->skip_invalidation = true;
+ continue;
+ }
+
/*
* Once purged, always purged. Cannot transition back to WILLNEED.
* This matches i915 semantics where purged BOs are permanently invalid.
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 08/12] drm/xe/bo: Block mmap of DONTNEED/purged BOs
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (6 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 07/12] drm/xe/madvise: Block imported and exported dma-bufs Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 7:41 ` Matthew Brost
2026-03-26 5:51 ` [PATCH v8 09/12] drm/xe/dma_buf: Block export " Arvind Yadav
` (3 subsequent siblings)
11 siblings, 1 reply; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Don't allow new CPU mmaps to BOs marked DONTNEED or PURGED.
DONTNEED BOs can have their contents discarded at any time, making
CPU access undefined behavior. PURGED BOs have no backing store and
are permanently invalid.
Return -EBUSY for DONTNEED BOs (temporary purgeable state) and
-EINVAL for purged BOs (permanent, no backing store).
The GEM mmap path now checks the BO's purgeable state before
allowing userspace to establish a new CPU mapping. Userspace may
still obtain an mmap offset, but the mmap() call itself is
rejected, closing the window where a mapping could be set up for
a BO whose backing store may already be gone.
Existing mmaps (established before DONTNEED) may still work until
pages are purged, at which point CPU faults fail with SIGBUS.
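A minimal sketch of what userspace observes (assuming the existing
DRM_XE_GEM_MMAP_OFFSET interface; error handling elided):

	struct drm_xe_gem_mmap_offset mmo = { .handle = bo_handle };

	/* the offset lookup itself still succeeds */
	ioctl(fd, DRM_IOCTL_XE_GEM_MMAP_OFFSET, &mmo);

	/* ...but mapping a DONTNEED BO is refused at mmap() time */
	ptr = mmap(NULL, bo_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   fd, mmo.offset);
	/* ptr == MAP_FAILED, errno == EBUSY (EINVAL once purged) */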
v6:
- Split DONTNEED → -EBUSY and PURGED → -EINVAL for consistency
with the rest of the series (Thomas, Matt)
v7:
- Move purgeable check from xe_gem_mmap_offset_ioctl() into a new
xe_gem_object_mmap() callback that wraps drm_gem_ttm_mmap(). (Thomas)
- Use an interruptible lock. (Thomas)
v8:
- Check xe_bo_lock() return value and propagate error. (Thomas and
Matt)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index da18b43650e3..c8e3a3fd4880 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -2165,10 +2165,35 @@ static const struct vm_operations_struct xe_gem_vm_ops = {
.access = xe_bo_vm_access,
};
+static int xe_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
+{
+ struct xe_bo *bo = gem_to_xe_bo(obj);
+ int err = 0;
+
+ /*
+ * Reject mmap of purgeable BOs. DONTNEED BOs can be purged
+ * at any time, making CPU access undefined behavior. Purged BOs have
+ * no backing store and are permanently invalid.
+ */
+ err = xe_bo_lock(bo, true);
+ if (err)
+ return err;
+
+ if (xe_bo_madv_is_dontneed(bo))
+ err = -EBUSY;
+ else if (xe_bo_is_purged(bo))
+ err = -EINVAL;
+ xe_bo_unlock(bo);
+ if (err)
+ return err;
+
+ return drm_gem_ttm_mmap(obj, vma);
+}
+
static const struct drm_gem_object_funcs xe_gem_object_funcs = {
.free = xe_gem_object_free,
.close = xe_gem_object_close,
- .mmap = drm_gem_ttm_mmap,
+ .mmap = xe_gem_object_mmap,
.export = xe_gem_prime_export,
.vm_ops = &xe_gem_vm_ops,
};
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v8 08/12] drm/xe/bo: Block mmap of DONTNEED/purged BOs
2026-03-26 5:51 ` [PATCH v8 08/12] drm/xe/bo: Block mmap of DONTNEED/purged BOs Arvind Yadav
@ 2026-03-26 7:41 ` Matthew Brost
0 siblings, 0 replies; 16+ messages in thread
From: Matthew Brost @ 2026-03-26 7:41 UTC (permalink / raw)
To: Arvind Yadav; +Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom
On Thu, Mar 26, 2026 at 11:21:07AM +0530, Arvind Yadav wrote:
> Don't allow new CPU mmaps to BOs marked DONTNEED or PURGED.
> DONTNEED BOs can have their contents discarded at any time, making
> CPU access undefined behavior. PURGED BOs have no backing store and
> are permanently invalid.
>
> Return -EBUSY for DONTNEED BOs (temporary purgeable state) and
> -EINVAL for purged BOs (permanent, no backing store).
>
> The GEM mmap path now checks the BO's purgeable state before
> allowing userspace to establish a new CPU mapping. Userspace may
> still obtain an mmap offset, but the mmap() call itself is
> rejected, closing the window where a mapping could be set up for
> a BO whose backing store may already be gone.
>
> Existing mmaps (established before DONTNEED) may still work until
> pages are purged, at which point CPU faults fail with SIGBUS.
>
> v6:
> - Split DONTNEED → -EBUSY and PURGED → -EINVAL for consistency
> with the rest of the series (Thomas, Matt)
>
> v7:
> - Move purgeable check from xe_gem_mmap_offset_ioctl() into a new
> xe_gem_object_mmap() callback that wraps drm_gem_ttm_mmap(). (Thomas)
> - Use an interruptible lock. (Thomas)
>
> v8:
> - Check xe_bo_lock() return value and propagate error. (Thomas and
> Matt)
>
> Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
> drivers/gpu/drm/xe/xe_bo.c | 27 ++++++++++++++++++++++++++-
> 1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index da18b43650e3..c8e3a3fd4880 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -2165,10 +2165,35 @@ static const struct vm_operations_struct xe_gem_vm_ops = {
> .access = xe_bo_vm_access,
> };
>
> +static int xe_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
> +{
> + struct xe_bo *bo = gem_to_xe_bo(obj);
> + int err = 0;
> +
> + /*
> + * Reject mmap of purgeable BOs. DONTNEED BOs can be purged
> + * at any time, making CPU access undefined behavior. Purged BOs have
> + * no backing store and are permanently invalid.
> + */
> + err = xe_bo_lock(bo, true);
> + if (err)
> + return err;
> +
> + if (xe_bo_madv_is_dontneed(bo))
> + err = -EBUSY;
> + else if (xe_bo_is_purged(bo))
> + err = -EINVAL;
> + xe_bo_unlock(bo);
> + if (err)
> + return err;
> +
> + return drm_gem_ttm_mmap(obj, vma);
> +}
> +
> static const struct drm_gem_object_funcs xe_gem_object_funcs = {
> .free = xe_gem_object_free,
> .close = xe_gem_object_close,
> - .mmap = drm_gem_ttm_mmap,
> + .mmap = xe_gem_object_mmap,
> .export = xe_gem_prime_export,
> .vm_ops = &xe_gem_vm_ops,
> };
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v8 09/12] drm/xe/dma_buf: Block export of DONTNEED/purged BOs
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (7 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 08/12] drm/xe/bo: Block mmap of DONTNEED/purged BOs Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 7:42 ` Matthew Brost
2026-03-26 5:51 ` [PATCH v8 10/12] drm/xe/bo: Add purgeable shrinker state helpers Arvind Yadav
` (2 subsequent siblings)
11 siblings, 1 reply; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Don't allow exporting BOs marked DONTNEED or PURGED as dma-bufs.
DONTNEED BOs can have their contents discarded at any time, making
the exported dma-buf unusable for external devices. PURGED BOs have
no backing store and are permanently invalid.
Return -EBUSY for DONTNEED BOs (temporary purgeable state) and
-EINVAL for purged BOs (permanent, no backing store).
The export path now checks the BO's purgeable state before creating
the dma-buf, preventing external devices from accessing memory that
may be purged at any time.
v6:
- Split DONTNEED → -EBUSY and PURGED → -EINVAL for consistency
with the rest of the series (Thomas, Matt)
v7:
- Use Interruptible lock. (Thomas)
v8:
- Check xe_bo_lock() return value and propagate error. (Thomas)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_dma_buf.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index ea370cd373e9..7f9602b3363d 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -223,6 +223,26 @@ struct dma_buf *xe_gem_prime_export(struct drm_gem_object *obj, int flags)
if (bo->vm)
return ERR_PTR(-EPERM);
+ /*
+ * Reject exporting purgeable BOs. DONTNEED BOs can be purged
+ * at any time, making the exported dma-buf unusable. Purged BOs
+ * have no backing store and are permanently invalid.
+ */
+ ret = xe_bo_lock(bo, true);
+ if (ret)
+ return ERR_PTR(ret);
+
+ if (xe_bo_madv_is_dontneed(bo)) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
+ if (xe_bo_is_purged(bo)) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+ xe_bo_unlock(bo);
+
ret = ttm_bo_setup_export(&bo->ttm, &ctx);
if (ret)
return ERR_PTR(ret);
@@ -232,6 +252,10 @@ struct dma_buf *xe_gem_prime_export(struct drm_gem_object *obj, int flags)
buf->ops = &xe_dmabuf_ops;
return buf;
+
+out_unlock:
+ xe_bo_unlock(bo);
+ return ERR_PTR(ret);
}
static struct drm_gem_object *
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v8 09/12] drm/xe/dma_buf: Block export of DONTNEED/purged BOs
2026-03-26 5:51 ` [PATCH v8 09/12] drm/xe/dma_buf: Block export " Arvind Yadav
@ 2026-03-26 7:42 ` Matthew Brost
0 siblings, 0 replies; 16+ messages in thread
From: Matthew Brost @ 2026-03-26 7:42 UTC (permalink / raw)
To: Arvind Yadav; +Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom
On Thu, Mar 26, 2026 at 11:21:08AM +0530, Arvind Yadav wrote:
> Don't allow exporting BOs marked DONTNEED or PURGED as dma-bufs.
> DONTNEED BOs can have their contents discarded at any time, making
> the exported dma-buf unusable for external devices. PURGED BOs have
> no backing store and are permanently invalid.
>
> Return -EBUSY for DONTNEED BOs (temporary purgeable state) and
> -EINVAL for purged BOs (permanent, no backing store).
>
> The export path now checks the BO's purgeable state before creating
> the dma-buf, preventing external devices from accessing memory that
> may be purged at any time.
>
> v6:
> - Split DONTNEED → -EBUSY and PURGED → -EINVAL for consistency
> with the rest of the series (Thomas, Matt)
>
> v7:
> - Use Interruptible lock. (Thomas)
>
> v8:
> - Check xe_bo_lock() return value and propagate error. (Thomas)
>
> Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
> drivers/gpu/drm/xe/xe_dma_buf.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> index ea370cd373e9..7f9602b3363d 100644
> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> @@ -223,6 +223,26 @@ struct dma_buf *xe_gem_prime_export(struct drm_gem_object *obj, int flags)
> if (bo->vm)
> return ERR_PTR(-EPERM);
>
> + /*
> + * Reject exporting purgeable BOs. DONTNEED BOs can be purged
> + * at any time, making the exported dma-buf unusable. Purged BOs
> + * have no backing store and are permanently invalid.
> + */
> + ret = xe_bo_lock(bo, true);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + if (xe_bo_madv_is_dontneed(bo)) {
> + ret = -EBUSY;
> + goto out_unlock;
> + }
> +
> + if (xe_bo_is_purged(bo)) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> + xe_bo_unlock(bo);
> +
> ret = ttm_bo_setup_export(&bo->ttm, &ctx);
> if (ret)
> return ERR_PTR(ret);
> @@ -232,6 +252,10 @@ struct dma_buf *xe_gem_prime_export(struct drm_gem_object *obj, int flags)
> buf->ops = &xe_dmabuf_ops;
>
> return buf;
> +
> +out_unlock:
> + xe_bo_unlock(bo);
> + return ERR_PTR(ret);
> }
>
> static struct drm_gem_object *
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v8 10/12] drm/xe/bo: Add purgeable shrinker state helpers
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (8 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 09/12] drm/xe/dma_buf: Block export " Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 11/12] drm/xe/madvise: Enable purgeable buffer object IOCTL support Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 12/12] drm/xe/madvise: Accept canonical GPU addresses in xe_vm_madvise_ioctl Arvind Yadav
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Encapsulate TTM purgeable flag updates and shrinker page accounting
into helper functions to prevent desynchronization between the TTM
tt->purgeable flag and the shrinker's page bucket counters.
Without these helpers, direct manipulation of xe_ttm_tt->purgeable
risks forgetting to update the corresponding shrinker counters,
leading to incorrect memory pressure calculations.
Update purgeable BO state to PURGED after successful shrinker purge
for DONTNEED BOs.
v4:
- @madv_purgeable atomic_t → u32 change across all relevant
patches (Matt)
v5:
- Update purgeable BO state to PURGED after a successful shrinker
purge for DONTNEED BOs.
- Split ghost BO and zero-refcount handling in xe_bo_shrink() (Thomas)
v6:
- Create separate patch for 'Split ghost BO and zero-refcount
handling'. (Thomas)
v7:
- Merge xe_bo_set_purgeable_shrinker() and xe_bo_clear_purgeable_shrinker()
into a single static helper xe_bo_set_purgeable_shrinker(bo, new_state)
called automatically from xe_bo_set_purgeable_state(). Callers no longer
need to manage shrinker accounting separately. (Thomas)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 43 +++++++++++++++++++++++++++++++++++++-
1 file changed, 42 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index c8e3a3fd4880..0a3e66f9f18a 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -835,6 +835,42 @@ static int xe_bo_move_notify(struct xe_bo *bo,
return 0;
}
+/**
+ * xe_bo_set_purgeable_shrinker() - Update shrinker accounting for purgeable state
+ * @bo: Buffer object
+ * @new_state: New purgeable state being set
+ *
+ * Transfers pages between shrinkable and purgeable buckets when the BO
+ * purgeable state changes. Called automatically from xe_bo_set_purgeable_state().
+ */
+static void xe_bo_set_purgeable_shrinker(struct xe_bo *bo,
+ enum xe_madv_purgeable_state new_state)
+{
+ struct ttm_buffer_object *ttm_bo = &bo->ttm;
+ struct ttm_tt *tt = ttm_bo->ttm;
+ struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
+ struct xe_ttm_tt *xe_tt;
+ long tt_pages;
+
+ xe_bo_assert_held(bo);
+
+ if (!tt || !ttm_tt_is_populated(tt))
+ return;
+
+ xe_tt = container_of(tt, struct xe_ttm_tt, ttm);
+ tt_pages = tt->num_pages;
+
+ if (!xe_tt->purgeable && new_state == XE_MADV_PURGEABLE_DONTNEED) {
+ xe_tt->purgeable = true;
+ /* Transfer pages from shrinkable to purgeable count */
+ xe_shrinker_mod_pages(xe->mem.shrinker, -tt_pages, tt_pages);
+ } else if (xe_tt->purgeable && new_state == XE_MADV_PURGEABLE_WILLNEED) {
+ xe_tt->purgeable = false;
+ /* Transfer pages from purgeable to shrinkable count */
+ xe_shrinker_mod_pages(xe->mem.shrinker, tt_pages, -tt_pages);
+ }
+}
+
/**
* xe_bo_set_purgeable_state() - Set BO purgeable state with validation
* @bo: Buffer object
@@ -842,7 +878,8 @@ static int xe_bo_move_notify(struct xe_bo *bo,
*
* Sets the purgeable state with lockdep assertions and validates state
* transitions. Once a BO is PURGED, it cannot transition to any other state.
- * Invalid transitions are caught with xe_assert().
+ * Invalid transitions are caught with xe_assert(). Shrinker page accounting
+ * is updated automatically.
*/
void xe_bo_set_purgeable_state(struct xe_bo *bo,
enum xe_madv_purgeable_state new_state)
@@ -861,6 +898,7 @@ void xe_bo_set_purgeable_state(struct xe_bo *bo,
new_state != XE_MADV_PURGEABLE_PURGED));
bo->madv_purgeable = new_state;
+ xe_bo_set_purgeable_shrinker(bo, new_state);
}
/**
@@ -1243,6 +1281,9 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
lret = xe_bo_move_notify(xe_bo, ctx);
if (!lret)
lret = xe_bo_shrink_purge(ctx, bo, scanned);
+ if (lret > 0 && xe_bo_madv_is_dontneed(xe_bo))
+ xe_bo_set_purgeable_state(xe_bo,
+ XE_MADV_PURGEABLE_PURGED);
goto out_unref;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 11/12] drm/xe/madvise: Enable purgeable buffer object IOCTL support
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (9 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 10/12] drm/xe/bo: Add purgeable shrinker state helpers Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
2026-03-26 5:51 ` [PATCH v8 12/12] drm/xe/madvise: Accept canonical GPU addresses in xe_vm_madvise_ioctl Arvind Yadav
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Hook the madvise_purgeable() handler into the madvise IOCTL now that all
supporting infrastructure is complete:
- Core purge implementation (patch 3)
- BO state tracking and helpers (patches 1-2)
- Per-VMA purgeable state tracking (patch 6)
- Shrinker integration for memory reclamation (patch 10)
This final patch enables userspace to use the DRM_XE_VMA_ATTR_PURGEABLE_STATE
madvise type to mark buffers as WILLNEED/DONTNEED and receive the retained
status indicating whether buffers were purged.
The feature was kept disabled in earlier patches to maintain bisectability
and ensure all components are in place before exposing to userspace.
Userspace can detect kernel support for purgeable BOs by checking the
DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT flag in the query_config
response.
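A minimal detection sketch using the usual two-call query pattern
(error handling elided):

	struct drm_xe_device_query query = {
		.query = DRM_XE_DEVICE_QUERY_CONFIG,
	};
	struct drm_xe_query_config *config;

	ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query);	/* fills query.size */
	config = malloc(query.size);
	query.data = (__u64)(uintptr_t)config;
	ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query);

	if (config->info[DRM_XE_QUERY_CONFIG_FLAGS] &
	    DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT)
		;	/* DRM_XE_VMA_ATTR_PURGEABLE_STATE is available */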
v6:
- Add DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT for userspace
feature detection. (Jose)
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_query.c | 2 ++
drivers/gpu/drm/xe/xe_vm_madvise.c | 22 +++++-----------------
2 files changed, 7 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c
index 4852fdcb4b95..d84d6a422c45 100644
--- a/drivers/gpu/drm/xe/xe_query.c
+++ b/drivers/gpu/drm/xe/xe_query.c
@@ -342,6 +342,8 @@ static int query_config(struct xe_device *xe, struct drm_xe_device_query *query)
DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY;
config->info[DRM_XE_QUERY_CONFIG_FLAGS] |=
DRM_XE_QUERY_CONFIG_FLAG_HAS_DISABLE_STATE_CACHE_PERF_FIX;
+ config->info[DRM_XE_QUERY_CONFIG_FLAGS] |=
+ DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT;
config->info[DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT] =
xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : SZ_4K;
config->info[DRM_XE_QUERY_CONFIG_VA_BITS] = xe->info.va_bits;
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 340e83764a76..4a19da5e86d4 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -338,18 +338,11 @@ void xe_bo_recompute_purgeable_state(struct xe_bo *bo)
*
* Handles DONTNEED/WILLNEED/PURGED states. Tracks if any BO was purged
* in details->has_purged_bo for later copy to userspace.
- *
- * Note: Marked __maybe_unused until hooked into madvise_funcs[] in the
- * final patch to maintain bisectability. The NULL placeholder in the
- * array ensures proper -EINVAL return for userspace until all supporting
- * infrastructure (shrinker, per-VMA tracking) is complete.
*/
-static void __maybe_unused madvise_purgeable(struct xe_device *xe,
- struct xe_vm *vm,
- struct xe_vma **vmas,
- int num_vmas,
- struct drm_xe_madvise *op,
- struct xe_madvise_details *details)
+static void madvise_purgeable(struct xe_device *xe, struct xe_vm *vm,
+ struct xe_vma **vmas, int num_vmas,
+ struct drm_xe_madvise *op,
+ struct xe_madvise_details *details)
{
int i;
@@ -418,12 +411,7 @@ static const madvise_func madvise_funcs[] = {
[DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC] = madvise_preferred_mem_loc,
[DRM_XE_MEM_RANGE_ATTR_ATOMIC] = madvise_atomic,
[DRM_XE_MEM_RANGE_ATTR_PAT] = madvise_pat_index,
- /*
- * Purgeable support implemented but not enabled yet to maintain
- * bisectability. Will be set to madvise_purgeable() in final patch
- * when all infrastructure (shrinker, VMA tracking) is complete.
- */
- [DRM_XE_VMA_ATTR_PURGEABLE_STATE] = NULL,
+ [DRM_XE_VMA_ATTR_PURGEABLE_STATE] = madvise_purgeable,
};
static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 12/12] drm/xe/madvise: Accept canonical GPU addresses in xe_vm_madvise_ioctl
2026-03-26 5:50 [PATCH v8 00/12] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
` (10 preceding siblings ...)
2026-03-26 5:51 ` [PATCH v8 11/12] drm/xe/madvise: Enable purgeable buffer object IOCTL support Arvind Yadav
@ 2026-03-26 5:51 ` Arvind Yadav
11 siblings, 0 replies; 16+ messages in thread
From: Arvind Yadav @ 2026-03-26 5:51 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom
Userspace passes canonical (sign-extended) GPU addresses where bits 63:48
mirror bit 47. The internal GPUVM uses non-canonical form (upper bits
zeroed), so passing raw canonical addresses into GPUVM lookups causes
mismatches for addresses above 128TiB.
Strip the sign extension with xe_device_uncanonicalize_addr() at the
top of xe_vm_madvise_ioctl(). Non-canonical addresses are unaffected.
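For illustration, the conversion amounts to masking off the
sign-extended upper bits (the driver uses the existing
xe_device_uncanonicalize_addr() helper rather than open-coding this):

	/* illustrative sketch, not the driver implementation */
	static u64 uncanonicalize(u64 addr, unsigned int va_bits)
	{
		return addr & (BIT_ULL(va_bits) - 1);
	}

With 48 VA bits, canonical 0xffff900000000000 becomes
0x0000900000000000, while an already non-canonical address such as
0x0000700000000000 passes through unchanged.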
Fixes: ada7486c5668 ("drm/xe: Implement madvise ioctl for xe")
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
drivers/gpu/drm/xe/xe_vm_madvise.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 4a19da5e86d4..2d03676ee595 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -673,8 +673,15 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
struct xe_device *xe = to_xe_device(dev);
struct xe_file *xef = to_xe_file(file);
struct drm_xe_madvise *args = data;
- struct xe_vmas_in_madvise_range madvise_range = {.addr = args->start,
- .range = args->range, };
+ struct xe_vmas_in_madvise_range madvise_range = {
+ /*
+ * Userspace may pass canonical (sign-extended) addresses.
+ * Strip the sign extension to get the internal non-canonical
+ * form used by the GPUVM, matching xe_vm_bind_ioctl() behavior.
+ */
+ .addr = xe_device_uncanonicalize_addr(xe, args->start),
+ .range = args->range,
+ };
struct xe_madvise_details details;
struct xe_vm *vm;
struct drm_exec exec;
@@ -724,7 +731,7 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
if (err)
goto unlock_vm;
- err = xe_vm_alloc_madvise_vma(vm, args->start, args->range);
+ err = xe_vm_alloc_madvise_vma(vm, madvise_range.addr, args->range);
if (err)
goto madv_fini;
@@ -774,7 +781,8 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
madvise_funcs[attr_type](xe, vm, madvise_range.vmas, madvise_range.num_vmas, args,
&details);
- err = xe_vm_invalidate_madvise_range(vm, args->start, args->start + args->range);
+ err = xe_vm_invalidate_madvise_range(vm, madvise_range.addr,
+ madvise_range.addr + args->range);
if (madvise_range.has_svm_userptr_vmas)
xe_svm_notifier_unlock(vm);
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread