Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects
@ 2025-12-01  5:50 Arvind Yadav
  2025-12-01  5:50 ` [RFC v2 1/9] drm/xe/uapi: Add UAPI " Arvind Yadav
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Arvind Yadav @ 2025-12-01  5:50 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom,
	pallavi.mishra

This patch series introduces comprehensive support for purgeable buffer objects
in the Xe driver, enabling userspace to provide memory usage hints for better
memory management under system pressure.

Overview:

Purgeable memory allows applications to mark buffer objects as "not currently
needed" (DONTNEED), making them eligible for kernel reclamation during memory
pressure. This helps prevent OOM conditions and enables more efficient GPU
memory utilization for workloads with temporary or regenerable data (caches,
intermediate results, decoded frames, etc.).

Purgeable BO Lifecycle:
1. WILLNEED (default): BO actively needed, kernel preserves backing store
2. DONTNEED (user hint): BO contents discardable, eligible for purging
3. PURGED (kernel action): Backing store reclaimed during memory pressure
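
As a rough userspace sketch of this lifecycle (the vm_id/start/range field
names are assumed from the existing madvise uAPI and may differ; fd setup
and error handling are elided):

	struct drm_xe_madvise madv = {
		.vm_id = vm_id,		/* assumed: VM holding the mapping */
		.start = gpu_va,	/* assumed: start of the bound range */
		.range = size,		/* assumed: length of the range */
		.type = DRM_XE_VMA_ATTR_PURGEABLE_STATE,
		.purge_state_val.val = DRM_XE_VMA_PURGEABLE_STATE_DONTNEED,
	};

	/* Hint that the contents are discardable under memory pressure. */
	ioctl(fd, DRM_IOCTL_XE_MADVISE, &madv);

	/* Before reuse, mark WILLNEED and check whether the data survived. */
	madv.purge_state_val.val = DRM_XE_VMA_PURGEABLE_STATE_WILLNEED;
	if (!ioctl(fd, DRM_IOCTL_XE_MADVISE, &madv) &&
	    !madv.purge_state_val.retained) {
		/* Purged: the BO is permanently invalid, recreate it. */
	}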

Key Design Principles:
  - i915 compatibility: "Once purged, always purged" semantics - purged BOs
    remain permanently invalid and must be destroyed/recreated
  - Safety first: Only non-shared BOs can be marked DONTNEED to prevent
    multi-process data corruption
  - Multiple protection layers: Validation in madvise, VM bind, mmap, and
    fault handlers
  - Async TLB invalidation: Uses xe_bo_trigger_rebind() for non-blocking
    GPU mapping invalidation
  - Scratch PTE support: Fault-mode VMs use scratch pages for safe zero reads
    on purged BO access

Error Handling:
  - CPU access (mmap): Returns VM_FAULT_SIGBUS (SIGBUS signal to process)
  - GPU access (non-scratch VM): Page fault fails with -EACCES, GPU context reset
  - GPU access (scratch VM): Page fault succeeds, rebinds with scratch PTEs
  - VM_BIND operations: MAP/PREFETCH rejected with -EINVAL
  - Mmap offset ioctl: Rejected with -EINVAL for early error detection

v2 Changes:
  - Reordered patches: Moved shared BO helper before main implementation for
    proper dependency order
  - Fixed reference counting in mmap offset validation (use drm_gem_object_put)
  - Removed incorrect claims about madvise(WILLNEED) restoring purged BOs
  - Fixed error code documentation inconsistencies
  - Initialize purge_state_val fields to avoid leaking kernel data to userspace
  - Use xe_bo_trigger_rebind() for async TLB invalidation (Thomas Hellström)
  - Add NULL rebind with scratch PTEs for fault mode (Thomas Hellström)
  - Implement i915-compatible retained field logic (Thomas Hellström)
  - Skip BO validation for purged BOs in page fault handler (crash fix)
  - Add scratch VM check in page fault path (non-scratch VMs fail fault)

Arvind Yadav (7):
  drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo
  drm/xe/bo: Prevent purging of shared buffer objects
  drm/xe/madvise: Implement purgeable buffer object support
  drm/xe/bo: Handle CPU faults on purged buffer objects
  drm/xe/bo: Prevent mmap of purged buffer objects
  drm/xe/vm: Prevent binding of purged buffer objects
  drm/xe: Add support for querying purgeable BO states

Himal Prasad Ghimiray (2):
  drm/xe/uapi: Add UAPI support for purgeable buffer objects
  drm/xe/uapi: Add UAPI for purgeable bo state to madvise query response

 drivers/gpu/drm/xe/xe_bo.c           | 92 ++++++++++++++++++++++++----
 drivers/gpu/drm/xe/xe_bo.h           | 57 +++++++++++++++++
 drivers/gpu/drm/xe/xe_bo_types.h     |  3 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c | 19 ++++++
 drivers/gpu/drm/xe/xe_pt.c           | 13 +++-
 drivers/gpu/drm/xe/xe_vm.c           | 12 ++++
 drivers/gpu/drm/xe/xe_vm_madvise.c   | 76 +++++++++++++++++++++++
 include/uapi/drm/xe_drm.h            | 50 ++++++++++++++-
 8 files changed, 308 insertions(+), 14 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC v2 1/9] drm/xe/uapi: Add UAPI support for purgeable buffer objects
  2025-12-01  5:50 [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
@ 2025-12-01  5:50 ` Arvind Yadav
  2025-12-01 23:00   ` Matthew Brost
  2025-12-01  5:50 ` [RFC v2 2/9] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo Arvind Yadav
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Arvind Yadav @ 2025-12-01  5:50 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom,
	pallavi.mishra

From: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

Extend the DRM_XE_MADVISE ioctl to support purgeable buffer object
management by adding DRM_XE_VMA_ATTR_PURGEABLE_STATE attribute type.

This allows userspace applications to provide memory usage hints to
the kernel for better memory management under pressure:

- WILLNEED: Buffer is needed and should not be purged. If the BO was
  previously purged, retained field returns 0 indicating backing store
  was lost (once purged, always purged semantics matching i915).

- DONTNEED: Buffer is not currently needed and may be purged by the
  kernel under memory pressure to free resources. Only applies to
  non-shared BOs.

The implementation includes a 'retained' output field (matching i915's
drm_i915_gem_madvise.retained) that indicates whether the BO's backing
store still exists (1) or has been purged (0).
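
A minimal sketch of the retained semantics (madv is a struct drm_xe_madvise
already set up for the target range, with the fields added by this patch):

	madv.purge_state_val.val = DRM_XE_VMA_PURGEABLE_STATE_WILLNEED;
	if (!ioctl(fd, DRM_IOCTL_XE_MADVISE, &madv) &&
	    !madv.purge_state_val.retained) {
		/* Backing store was purged; destroy and recreate the BO. */
	}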

v2: Add PURGED state for read-only status, change ioctl to DRM_IOWR,
    add retained field for i915 compatibility

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
 include/uapi/drm/xe_drm.h | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 47853659a705..02d63938d16f 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -121,7 +121,7 @@ extern "C" {
 #define DRM_IOCTL_XE_EXEC			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC, struct drm_xe_exec)
 #define DRM_IOCTL_XE_WAIT_USER_FENCE		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
 #define DRM_IOCTL_XE_OBSERVATION		DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
-#define DRM_IOCTL_XE_MADVISE			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)
+#define DRM_IOCTL_XE_MADVISE			DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)
 #define DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS	DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_QUERY_MEM_RANGE_ATTRS, struct drm_xe_vm_query_mem_range_attr)
 
 /**
@@ -2051,6 +2051,7 @@ struct drm_xe_madvise {
 #define DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC	0
 #define DRM_XE_MEM_RANGE_ATTR_ATOMIC		1
 #define DRM_XE_MEM_RANGE_ATTR_PAT		2
+#define DRM_XE_VMA_ATTR_PURGEABLE_STATE		3
 	/** @type: type of attribute */
 	__u32 type;
 
@@ -2129,6 +2130,40 @@ struct drm_xe_madvise {
 			/** @pat_index.reserved: Reserved */
 			__u64 reserved;
 		} pat_index;
+
+		/**
+		 * @purge_state_val: Purgeable state configuration
+		 *
+		 * Used when @type == DRM_XE_VMA_ATTR_PURGEABLE_STATE.
+		 *
+		 * Configures the purgeable state of buffer objects in the specified
+		 * virtual address range. This allows applications to hint to the kernel
+		 * about bo's usage patterns for better memory management.
+		 *
+		 * Supported values for @purge_state_val.val:
+		 *  - DRM_XE_VMA_PURGEABLE_STATE_WILLNEED (0): Marks BO as needed.
+		 *    If BO was purged, returns retained=0 (backing store lost).
+		 *
+		 *  - DRM_XE_VMA_PURGEABLE_STATE_DONTNEED (1): Hints that BO is not
+		 *    currently needed. Kernel may purge it under memory pressure.
+		 *    Only applies to non-shared BOs. Returns retained=1 if not purged.
+		 */
+		struct {
+#define DRM_XE_VMA_PURGEABLE_STATE_WILLNEED	0
+#define DRM_XE_VMA_PURGEABLE_STATE_DONTNEED	1
+			/** @purge_state_val.val: value for DRM_XE_VMA_ATTR_PURGEABLE_STATE */
+			__u32 val;
+			/**
+			 * @purge_state_val.retained: Whether the backing store still exists.
+			 *
+			 * Output field indicating if the BO's backing store is retained.
+			 * Set to 1 if backing store exists, 0 if it has been purged.
+			 * Similar to i915's drm_i915_gem_madvise.retained field.
+			 */
+			__u32 retained;
+			/** @purge_state_val.reserved: Reserved */
+			__u64 reserved;
+		} purge_state_val;
 	};
 
 	/** @reserved: Reserved */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC v2 2/9] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo
  2025-12-01  5:50 [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
  2025-12-01  5:50 ` [RFC v2 1/9] drm/xe/uapi: Add UAPI " Arvind Yadav
@ 2025-12-01  5:50 ` Arvind Yadav
  2025-12-01 23:02   ` Matthew Brost
  2025-12-02 18:52   ` Matthew Brost
  2025-12-01  5:50 ` [RFC v2 3/9] drm/xe/bo: Prevent purging of shared buffer objects Arvind Yadav
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Arvind Yadav @ 2025-12-01  5:50 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom,
	pallavi.mishra

Add infrastructure for tracking the purgeable state of buffer objects.

Introduce enum xe_madv_purgeable_state with three states:
   - XE_MADV_PURGEABLE_WILLNEED (0): BO is needed and should not be
     purged. This is the default state for all BOs.

   - XE_MADV_PURGEABLE_DONTNEED (1): BO is not currently needed and
     can be purged by the kernel under memory pressure to reclaim
     resources. Only non-shared BOs can be marked as DONTNEED.

   - XE_MADV_PURGEABLE_PURGED (2): BO has been purged by the kernel.
     Accessing a purged BO results in error. Follows i915 semantics
     where once purged, the BO remains permanently invalid ("once
     purged, always purged").

Add an atomic_t madv_purgeable field to struct xe_bo to track the
purgeable state across concurrent access paths.
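
A minimal sketch of the intended access pattern (mirroring the helpers
added later in this series):

	/* Writer side: state changes happen with the BO's reservation held. */
	dma_resv_assert_held(bo->ttm.base.resv);
	atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_DONTNEED);

	/* Reader side (e.g. fault handlers): a plain atomic_read() suffices. */
	if (atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_PURGED)
		return -EACCES;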

v2: Add xe_bo_is_purged() helper, improve state documentation

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.h       | 27 +++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_bo_types.h |  3 +++
 2 files changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 911d5b90461a..b0a31c77e612 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -85,6 +85,28 @@
 
 #define XE_PCI_BARRIER_MMAP_OFFSET	(0x50 << XE_PTE_SHIFT)
 
+/**
+ * enum xe_madv_purgeable_state - Buffer object purgeable state enumeration
+ *
+ * This enum defines the possible purgeable states for a buffer object,
+ * allowing userspace to provide memory usage hints to the kernel for
+ * better memory management under pressure.
+ *
+ * @XE_MADV_PURGEABLE_WILLNEED: The buffer object is needed and should not be purged.
+ * This is the default state.
+ * @XE_MADV_PURGEABLE_DONTNEED: The buffer object is not currently needed and can be
+ * purged by the kernel under memory pressure.
+ * @XE_MADV_PURGEABLE_PURGED: The buffer object has been purged by the kernel.
+ *
+ * Accessing a purged buffer will result in an error. Per i915 semantics,
+ * once purged, a BO remains permanently invalid and must be destroyed and recreated.
+ */
+enum xe_madv_purgeable_state {
+	XE_MADV_PURGEABLE_WILLNEED,
+	XE_MADV_PURGEABLE_DONTNEED,
+	XE_MADV_PURGEABLE_PURGED,
+};
+
 struct sg_table;
 
 struct xe_bo *xe_bo_alloc(void);
@@ -213,6 +235,11 @@ static inline bool xe_bo_is_protected(const struct xe_bo *bo)
 	return bo->pxp_key_instance;
 }
 
+static inline bool xe_bo_is_purged(struct xe_bo *bo)
+{
+	return atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_PURGED;
+}
+
 static inline void xe_bo_unpin_map_no_vm(struct xe_bo *bo)
 {
 	if (likely(bo)) {
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index d4fe3c8dca5b..57b4dc7012e2 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -108,6 +108,9 @@ struct xe_bo {
 	 * from default
 	 */
 	u64 min_align;
+
+	/** @madv_purgeable: user space advise on BO purgeability */
+	atomic_t madv_purgeable;
 };
 
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC v2 3/9] drm/xe/bo: Prevent purging of shared buffer objects
  2025-12-01  5:50 [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
  2025-12-01  5:50 ` [RFC v2 1/9] drm/xe/uapi: Add UAPI " Arvind Yadav
  2025-12-01  5:50 ` [RFC v2 2/9] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo Arvind Yadav
@ 2025-12-01  5:50 ` Arvind Yadav
  2025-12-01 23:10   ` Matthew Brost
  2025-12-01  5:50 ` [RFC v2 4/9] drm/xe/madvise: Implement purgeable buffer object support Arvind Yadav
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Arvind Yadav @ 2025-12-01  5:50 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom,
	pallavi.mishra

Introduce the `xe_bo_is_shared_locked()` inline helper to determine if a
buffer object is shared across multiple clients or drivers. A buffer is
considered shared if it is exported via dma-buf, imported, or has a
handle count greater than one.

This check is critical for safely implementing purgeable memory. Purging
a buffer that is shared would lead to data corruption for other clients
that still hold a reference to it.

The kernel cannot safely determine when all clients are done with a
shared buffer, so shared BOs must never be marked DONTNEED or purged.

The new helper is used in two key locations:
1.  In `xe_vm_madvise_purgeable_bo()`, to prevent userspace from
    successfully marking a shared buffer as `DONTNEED`. This is the
    primary safeguard against incorrect usage.

2.  In `xe_bo_move()`, as a final safety check before the kernel
    initiates a purge during eviction. This ensures that even if a
    shared buffer were somehow marked `DONTNEED`, it would not be
    purged.
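
For example (a hedged userspace sketch; fd and bo_handle are placeholders),
exporting a BO makes it ineligible for purging:

	int prime_fd;

	/* Exporting via dma-buf makes the BO shared ... */
	drmPrimeHandleToFD(fd, bo_handle, DRM_CLOEXEC, &prime_fd);

	/*
	 * ... so a later DONTNEED hint on this BO is skipped by
	 * xe_vm_madvise_purgeable_bo() and the BO is never purged.
	 */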

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index b0a31c77e612..97edb38bf1ed 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -478,4 +478,34 @@ static inline bool xe_bo_is_mem_type(struct xe_bo *bo, u32 mem_type)
 	xe_bo_assert_held(bo);
 	return bo->ttm.resource->mem_type == mem_type;
 }
+
+/**
+ * xe_bo_is_shared_locked - Check if a buffer object is shared
+ * @bo: The buffer object to check
+ *
+ * Determines if a buffer object is considered shared, which includes:
+ * - Exported via dma-buf (obj->dma_buf is set)
+ * - Imported from another driver (obj->import_attach is set)
+ * - Referenced by multiple clients (handle_count > 1)
+ *
+ * This check is used to prevent data loss on shared content by avoiding
+ * certain operations like purging on buffers that other processes or
+ * drivers might still be using.
+ *
+ * Return: true if the buffer object is shared, false otherwise.
+ */
+static inline bool xe_bo_is_shared_locked(const struct xe_bo *bo)
+{
+	const struct drm_gem_object *obj = &bo->ttm.base;
+
+	dma_resv_assert_held(obj->resv);
+
+	if (obj->dma_buf || obj->import_attach)
+		return true;
+
+	if (obj->handle_count > 1)
+		return true;
+
+	return false;
+}
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC v2 4/9] drm/xe/madvise: Implement purgeable buffer object support
  2025-12-01  5:50 [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
                   ` (2 preceding siblings ...)
  2025-12-01  5:50 ` [RFC v2 3/9] drm/xe/bo: Prevent purging of shared buffer objects Arvind Yadav
@ 2025-12-01  5:50 ` Arvind Yadav
  2025-12-02  1:46   ` Matthew Brost
  2025-12-02 21:39   ` Matthew Brost
  2025-12-01  5:50 ` [RFC v2 5/9] drm/xe/bo: Handle CPU faults on purged buffer objects Arvind Yadav
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Arvind Yadav @ 2025-12-01  5:50 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom,
	pallavi.mishra

Add the core implementation for purgeable buffer objects, enabling memory
reclamation of user-designated DONTNEED buffers during eviction. This allows
userspace to provide memory usage hints to the kernel for better memory
management under pressure.

This patch implements the purge operation and state machine transitions:

Purgeable States (from xe_madv_purgeable_state):
 - WILLNEED (0): BO should be retained, actively used
 - DONTNEED (1): BO eligible for purging, not currently needed
 - PURGED (2): BO backing store reclaimed, permanently invalid
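
A sketch of the resulting transition rules (hypothetical helper, for
illustration only; the series open-codes these checks):

	static bool xe_madv_transition_valid(enum xe_madv_purgeable_state old_state,
					     enum xe_madv_purgeable_state new_state)
	{
		/* Once purged, always purged: no way out of PURGED. */
		if (old_state == XE_MADV_PURGEABLE_PURGED)
			return false;

		/* Only the kernel moves a DONTNEED BO to PURGED, at eviction. */
		if (new_state == XE_MADV_PURGEABLE_PURGED)
			return old_state == XE_MADV_PURGEABLE_DONTNEED;

		/* Userspace may toggle WILLNEED <-> DONTNEED freely. */
		return true;
	}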

Design Rationale:
  - Async TLB invalidation via trigger_rebind (no blocking xe_vm_invalidate_vma)
  - i915 compatibility: retained field, "once purged always purged" semantics
  - Shared BO protection prevents multi-process memory corruption
  - Scratch PTE reuse avoids new infrastructure, safe for fault mode

v2:
  - Use xe_bo_trigger_rebind() for async TLB invalidation (Thomas Hellström)
  - Add NULL rebind with scratch PTEs for fault mode (Thomas Hellström)
  - Implement i915-compatible retained field logic (Thomas Hellström)
  - Skip BO validation for purged BOs in page fault handler (crash fix)
  - Add scratch VM check in page fault path (non-scratch VMs fail fault)
  - Force clear_pt for non-scratch VMs to avoid phys addr 0 mapping (review fix)
  - Add !is_purged check to resource cursor setup to prevent stale access

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c           | 72 ++++++++++++++++++++++-----
 drivers/gpu/drm/xe/xe_gt_pagefault.c | 19 ++++++++
 drivers/gpu/drm/xe/xe_pt.c           | 36 ++++++++++++--
 drivers/gpu/drm/xe/xe_vm.c           | 11 ++++-
 drivers/gpu/drm/xe/xe_vm_madvise.c   | 73 ++++++++++++++++++++++++++++
 5 files changed, 193 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index cbc3ee157218..f0b3f7a13114 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -836,6 +836,53 @@ static int xe_bo_move_notify(struct xe_bo *bo,
 	return 0;
 }
 
+static void xe_bo_set_purged(struct xe_bo *bo)
+{
+	/* BO must be locked before modifying madv state */
+	dma_resv_assert_held(bo->ttm.base.resv);
+
+	atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_PURGED);
+}
+
+/**
+ * xe_ttm_bo_purge() - Purge buffer object backing store
+ * @ttm_bo: The TTM buffer object to purge
+ * @ctx: TTM operation context
+ *
+ * This function purges the backing store of a BO marked as DONTNEED and
+ * triggers rebind to invalidate stale GPU mappings. For fault-mode VMs,
+ * this zaps the PTEs. The next GPU access will trigger a page fault and
+ * perform NULL rebind (scratch pages or clear PTEs based on VM config).
+ */
+static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
+{
+	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
+	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
+
+	if (ttm_bo->ttm) {
+		struct ttm_placement place = {};
+		int ret = ttm_bo_validate(ttm_bo, &place, ctx);
+
+		drm_WARN_ON(&xe->drm, ret);
+		if (!ret && bo) {
+			if (atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_DONTNEED) {
+				xe_bo_set_purged(bo);
+
+				/*
+				 * Trigger rebind to invalidate stale GPU mappings.
+				 * - Non-fault mode: Marks VMAs for rebind
+				 * - Fault mode: Zaps PTEs (sets to 0), next access triggers fault
+				 *   and NULL rebind with scratch/clear PTEs per VM config
+				 */
+				ret = xe_bo_trigger_rebind(xe, bo, ctx);
+				if (ret)
+					drm_warn(&xe->drm,
+						 "Failed to invalidate purged BO: %d\n", ret);
+			}
+		}
+	}
+}
+
 static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 		      struct ttm_operation_ctx *ctx,
 		      struct ttm_resource *new_mem,
@@ -853,8 +900,18 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 	bool needs_clear;
 	bool handle_system_ccs = (!IS_DGFX(xe) && xe_bo_needs_ccs_pages(bo) &&
 				  ttm && ttm_tt_is_populated(ttm)) ? true : false;
+	int state = atomic_read(&bo->madv_purgeable);
 	int ret = 0;
 
+	/*
+	 * Purge only non-shared BOs explicitly marked DONTNEED by userspace.
+	 * The move_notify callback will handle invalidation asynchronously.
+	 */
+	if (evict && state == XE_MADV_PURGEABLE_DONTNEED && !xe_bo_is_shared_locked(bo)) {
+		xe_ttm_bo_purge(ttm_bo, ctx);
+		return 0;
+	}
+
 	/* Bo creation path, moving to system or TT. */
 	if ((!old_mem && ttm) && !handle_system_ccs) {
 		if (new_mem->mem_type == XE_PL_TT)
@@ -1606,18 +1663,6 @@ static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
 	}
 }
 
-static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
-{
-	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
-
-	if (ttm_bo->ttm) {
-		struct ttm_placement place = {};
-		int ret = ttm_bo_validate(ttm_bo, &place, ctx);
-
-		drm_WARN_ON(&xe->drm, ret);
-	}
-}
-
 static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo)
 {
 	struct ttm_operation_ctx ctx = {
@@ -2202,6 +2247,9 @@ struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
 #endif
 	INIT_LIST_HEAD(&bo->vram_userfault_link);
 
+	/* Initialize purge advisory state */
+	atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
+
 	drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
 
 	if (resv) {
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index a054d6010ae0..8c7e5dcb627b 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -87,6 +87,13 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
 	if (!bo)
 		return 0;
 
+	/*
+	 * Skip validation/migration for purged BOs - they have no backing pages.
+	 * Rebind will use scratch PTEs instead.
+	 */
+	if (xe_bo_is_purged(bo))
+		return 0;
+
 	return need_vram_move ? xe_bo_migrate(bo, vram->placement, NULL, exec) :
 		xe_bo_validate(bo, vm, true, exec);
 }
@@ -100,9 +107,21 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
 	struct drm_exec exec;
 	struct dma_fence *fence;
 	int err, needs_vram;
+	struct xe_bo *bo;
 
 	lockdep_assert_held_write(&vm->lock);
 
+	/*
+	 * Check if BO is purged. For purged BOs:
+	 * - Scratch VMs: Allow rebind with scratch PTEs (safe zero reads)
+	 * - Non-scratch VMs: FAIL the page fault (no scratch page available)
+	 */
+	bo = xe_vma_bo(vma);
+	if (bo && xe_bo_is_purged(bo)) {
+		if (!xe_vm_has_scratch(vm))
+			return -EACCES;
+	}
+
 	needs_vram = xe_vma_need_vram_for_atomic(vm->xe, vma, atomic);
 	if (needs_vram < 0 || (needs_vram && xe_vma_is_userptr(vma)))
 		return needs_vram < 0 ? needs_vram : -EACCES;
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index d22fd1ccc0ba..062f64b16a58 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -533,20 +533,26 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 	/* Is this a leaf entry ?*/
 	if (level == 0 || xe_pt_hugepte_possible(addr, next, level, xe_walk)) {
 		struct xe_res_cursor *curs = xe_walk->curs;
+		struct xe_bo *bo = xe_vma_bo(xe_walk->vma);
 		bool is_null = xe_vma_is_null(xe_walk->vma);
-		bool is_vram = is_null ? false : xe_res_is_vram(curs);
+		bool is_purged = bo && xe_bo_is_purged(bo);
+		bool is_vram = (is_null || is_purged) ? false : xe_res_is_vram(curs);
 
 		XE_WARN_ON(xe_walk->va_curs_start != addr);
 
 		if (xe_walk->clear_pt) {
 			pte = 0;
 		} else {
-			pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
+			/*
+			 * For purged BOs, treat like null VMAs - pass address 0.
+			 * The pte_encode_vma will set XE_PTE_NULL flag for scratch mapping.
+			 */
+			pte = vm->pt_ops->pte_encode_vma((is_null || is_purged) ? 0 :
 							 xe_res_dma(curs) +
 							 xe_walk->dma_offset,
 							 xe_walk->vma,
 							 pat_index, level);
-			if (!is_null)
+			if (!is_null && !is_purged)
 				pte |= is_vram ? xe_walk->default_vram_pte :
 					xe_walk->default_system_pte;
 
@@ -570,7 +576,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 		if (unlikely(ret))
 			return ret;
 
-		if (!is_null && !xe_walk->clear_pt)
+		if (!is_null && !is_purged && !xe_walk->clear_pt)
 			xe_res_next(curs, next - addr);
 		xe_walk->va_curs_start = next;
 		xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << level);
@@ -723,6 +729,26 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
 	};
 	struct xe_pt *pt = vm->pt_root[tile->id];
 	int ret;
+	bool is_purged = false;
+
+	/*
+	 * Check if BO is purged:
+	 * - Scratch VMs: Use scratch PTEs (XE_PTE_NULL) for safe zero reads
+	 * - Non-scratch VMs: Clear PTEs to zero (non-present) to avoid mapping to phys addr 0
+	 *
+	 * For non-scratch VMs, we force clear_pt=true so leaf PTEs become completely
+	 * zero instead of creating a PRESENT mapping to physical address 0.
+	 */
+	if (bo && xe_bo_is_purged(bo)) {
+		is_purged = true;
+
+		/*
+		 * For non-scratch VMs, a NULL rebind should use zero PTEs
+		 * (non-present), not a present PTE to phys 0.
+		 */
+		if (!xe_vm_has_scratch(vm))
+			xe_walk.clear_pt = true;
+	}
 
 	if (range) {
 		/* Move this entire thing to xe_svm.c? */
@@ -762,7 +788,7 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
 	if (!range)
 		xe_bo_assert_held(bo);
 
-	if (!xe_vma_is_null(vma) && !range) {
+	if (!xe_vma_is_null(vma) && !range && !is_purged) {
 		if (xe_vma_is_userptr(vma))
 			xe_res_first_dma(to_userptr_vma(vma)->userptr.pages.dma_addr, 0,
 					 xe_vma_size(vma), &curs);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 10d77666a425..d03e69524369 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1336,6 +1336,9 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
 static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
 			       u16 pat_index, u32 pt_level)
 {
+	struct xe_bo *bo = xe_vma_bo(vma);
+	struct xe_vm *vm = xe_vma_vm(vma);
+
 	pte |= XE_PAGE_PRESENT;
 
 	if (likely(!xe_vma_read_only(vma)))
@@ -1344,7 +1347,13 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
 	pte |= pte_encode_pat_index(pat_index, pt_level);
 	pte |= pte_encode_ps(pt_level);
 
-	if (unlikely(xe_vma_is_null(vma)))
+	/*
+	 * NULL PTEs redirect to scratch page (return zeros on read).
+	 * Set for: 1) explicit null VMAs, 2) purged BOs on scratch VMs.
+	 * Never set NULL flag without scratch page - causes undefined behavior.
+	 */
+	if (unlikely(xe_vma_is_null(vma) ||
+		     (bo && xe_bo_is_purged(bo) && xe_vm_has_scratch(vm))))
 		pte |= XE_PTE_NULL;
 
 	return pte;
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index cad3cf627c3f..3ba851e0b870 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -158,6 +158,60 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
 	}
 }
 
+/*
+ * Handle purgeable buffer object advice for DONTNEED/WILLNEED/PURGED.
+ * Updates op->purge_state_val.retained to indicate if backing store
+ * exists (matches i915's retained).
+ */
+static void xe_vm_madvise_purgeable_bo(struct xe_device *xe, struct xe_vm *vm,
+				       struct xe_vma **vmas, int num_vmas,
+				       struct drm_xe_madvise *op)
+{
+	bool has_purged_bo = false;
+	int i;
+
+	xe_assert(vm->xe, op->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE);
+
+	for (i = 0; i < num_vmas; i++) {
+		struct xe_bo *bo = xe_vma_bo(vmas[i]);
+
+		if (!bo)
+			continue;
+
+		/* BO must be locked before modifying madv state */
+		dma_resv_assert_held(bo->ttm.base.resv);
+
+		/*
+		 * Once purged, always purged. Cannot transition back to WILLNEED.
+		 * This matches i915 semantics where purged BOs are permanently invalid.
+		 */
+		if (xe_bo_is_purged(bo)) {
+			has_purged_bo = true;
+			continue;
+		}
+
+		switch (op->purge_state_val.val) {
+		case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
+			atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
+			break;
+		case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
+			if (!xe_bo_is_shared_locked(bo))
+				atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_DONTNEED);
+			break;
+		default:
+			drm_warn(&vm->xe->drm, "Invalid madvice value = %d\n",
+				 op->purge_state_val.val);
+			return;
+		}
+	}
+
+	/*
+	 * Set retained flag to indicate if backing store still exists.
+	 * Matches i915: retained = 1 if not purged, 0 if purged.
+	 */
+	op->purge_state_val.retained = !has_purged_bo;
+}
+
 typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
 			     struct xe_vma **vmas, int num_vmas,
 			     struct drm_xe_madvise *op);
@@ -283,6 +337,19 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
 			return false;
 		break;
 	}
+	case DRM_XE_VMA_ATTR_PURGEABLE_STATE:
+	{
+		u32 val = args->purge_state_val.val;
+
+		if (XE_IOCTL_DBG(xe, !((val == DRM_XE_VMA_PURGEABLE_STATE_WILLNEED) ||
+				       (val == DRM_XE_VMA_PURGEABLE_STATE_DONTNEED))))
+			return false;
+
+		if (XE_IOCTL_DBG(xe, args->purge_state_val.reserved))
+			return false;
+
+		break;
+	}
 	default:
 		if (XE_IOCTL_DBG(xe, 1))
 			return false;
@@ -402,6 +469,12 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
 					goto err_fini;
 			}
 		}
+		if (args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE) {
+			xe_vm_madvise_purgeable_bo(xe, vm, madvise_range.vmas,
+						   madvise_range.num_vmas, args);
+			goto err_fini;
+
+		}
 	}
 
 	if (madvise_range.has_svm_userptr_vmas) {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC v2 5/9] drm/xe/bo: Handle CPU faults on purged buffer objects
  2025-12-01  5:50 [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
                   ` (3 preceding siblings ...)
  2025-12-01  5:50 ` [RFC v2 4/9] drm/xe/madvise: Implement purgeable buffer object support Arvind Yadav
@ 2025-12-01  5:50 ` Arvind Yadav
  2025-12-02 18:42   ` Matthew Brost
  2025-12-01  5:50 ` [RFC v2 6/9] drm/xe/bo: Prevent mmap of " Arvind Yadav
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Arvind Yadav @ 2025-12-01  5:50 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom,
	pallavi.mishra

Modify the CPU page fault handler, `xe_bo_cpu_fault()`, to correctly
handle access to buffer objects that have been purged.

When a buffer object is in the `XE_MADV_PURGEABLE_PURGED` state, its backing
store has been reclaimed by the kernel. If the CPU attempts to access
this memory, it is an error that should be reported to the application.
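
From userspace this surfaces as a SIGBUS on the faulting access; a minimal
sketch of catching it (needs <signal.h> and <setjmp.h>; map is the mmap()ed
BO pointer and recreate_bo() a hypothetical recovery path):

	static sigjmp_buf purged_env;

	static void on_sigbus(int sig)
	{
		siglongjmp(purged_env, 1);
	}

	/* In the access path: */
	signal(SIGBUS, on_sigbus);
	if (sigsetjmp(purged_env, 1) == 0)
		((volatile char *)map)[0];	/* SIGBUS here if purged */
	else
		recreate_bo();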

v2:
  - Use the xe_bo_is_purged(bo) helper instead of a raw atomic_read.
  - Return via the out label so drm_dev_exit() is still called, avoiding a
    device-reference leak.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index f0b3f7a13114..7f5bcf114ed4 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1992,6 +1992,16 @@ static vm_fault_t xe_bo_cpu_fault(struct vm_fault *vmf)
 	if (!drm_dev_enter(&xe->drm, &idx))
 		return ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
 
+	/*
+	 * BO content is gone. Signal the user process.
+	 * Once purged, BO remains permanently invalid (i915 semantics).
+	 * Application must destroy and recreate the BO.
+	 */
+	if (xe_bo_is_purged(bo)) {
+		ret = VM_FAULT_SIGBUS;
+		goto out;
+	}
+
 	ret = xe_bo_cpu_fault_fastpath(vmf, xe, bo, needs_rpm);
 	if (ret != VM_FAULT_RETRY)
 		goto out;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC v2 6/9] drm/xe/bo: Prevent mmap of purged buffer objects
  2025-12-01  5:50 [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
                   ` (4 preceding siblings ...)
  2025-12-01  5:50 ` [RFC v2 5/9] drm/xe/bo: Handle CPU faults on purged buffer objects Arvind Yadav
@ 2025-12-01  5:50 ` Arvind Yadav
  2025-12-02 18:54   ` Matthew Brost
  2025-12-01  5:50 ` [RFC v2 7/9] drm/xe/vm: Prevent binding " Arvind Yadav
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Arvind Yadav @ 2025-12-01  5:50 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom,
	pallavi.mishra

Fail DRM_IOCTL_XE_GEM_MMAP_OFFSET with -EINVAL when called on purged
buffer objects to provide early error detection instead of allowing
deferred SIGBUS on memory access.

Problem:
  The mmap offset ioctl (DRM_IOCTL_XE_GEM_MMAP_OFFSET) returns a file
  offset that userspace can pass to mmap() to map GPU memory into its
  address space. For purged BOs, the backing store has been freed, but
  the VMA node offset remains valid. Without this check:

  1. Userspace successfully gets mmap offset for purged BO
  2. mmap() succeeds (VMA is created but has no backing pages)
  3. Any memory access triggers CPU page fault
  4. xe_bo_cpu_fault() detects purged state and returns VM_FAULT_SIGBUS
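
With the check, userspace sees the failure up front; a hedged sketch (the
handle field name is assumed from the existing uAPI):

	struct drm_xe_gem_mmap_offset mmo = {
		.handle = bo_handle,	/* assumed field name */
	};

	if (ioctl(fd, DRM_IOCTL_XE_GEM_MMAP_OFFSET, &mmo) &&
	    errno == EINVAL) {
		/* BO was purged: recreate it instead of mmap()ing. */
	}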

v2:
  - Fix reference counting: use drm_gem_object_put() instead of xe_bo_put()
    to properly balance drm_gem_object_lookup() (review feedback).
  - Added xe_bo_is_purged(bo) instead of atomic_read.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 7f5bcf114ed4..dbbfb58ac657 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -3346,6 +3346,7 @@ int xe_gem_mmap_offset_ioctl(struct drm_device *dev, void *data,
 	struct xe_device *xe = to_xe_device(dev);
 	struct drm_xe_gem_mmap_offset *args = data;
 	struct drm_gem_object *gem_obj;
+	struct xe_bo *bo;
 
 	if (XE_IOCTL_DBG(xe, args->extensions) ||
 	    XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
@@ -3375,6 +3376,16 @@ int xe_gem_mmap_offset_ioctl(struct drm_device *dev, void *data,
 	if (XE_IOCTL_DBG(xe, !gem_obj))
 		return -ENOENT;
 
+	bo = gem_to_xe_bo(gem_obj);
+
+	/*
+	 * Reject mmap offset requests for purged BOs.
+	 */
+	if (xe_bo_is_purged(bo)) {
+		drm_gem_object_put(gem_obj);
+		return -EINVAL;
+	}
+
 	/* The mmap offset was set up at BO allocation time. */
 	args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC v2 7/9] drm/xe/vm: Prevent binding of purged buffer objects
  2025-12-01  5:50 [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
                   ` (5 preceding siblings ...)
  2025-12-01  5:50 ` [RFC v2 6/9] drm/xe/bo: Prevent mmap of " Arvind Yadav
@ 2025-12-01  5:50 ` Arvind Yadav
  2025-12-02 18:57   ` Matthew Brost
  2025-12-01  5:50 ` [RFC v2 8/9] drm/xe/uapi: Add UAPI for purgeable bo state to madvise query response Arvind Yadav
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Arvind Yadav @ 2025-12-01  5:50 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom,
	pallavi.mishra

Add validation in xe_vm_bind_ioctl_validate_bo() to reject MAP and
PREFETCH operations on purged buffer objects with -EINVAL.

Problem:
When a BO is purged (XE_MADV_PURGEABLE_PURGED state), its backing pages
have been freed by the kernel. Without this check, VM_BIND operations
would proceed:

 1. DRM_XE_VM_BIND_OP_MAP: Attempts to create GPU mappings to freed memory
    - xe_vma_ops_alloc() creates VMA pointing to invalid BO
    - Page tables populated with stale/invalid addresses
    - GPU access leads to undefined behavior or hangs

 2. DRM_XE_VM_BIND_OP_PREFETCH: Attempts to migrate non-existent pages
    - Triggers BO validation in TTM
    - ttm_bo_validate() fails or crashes (no backing store)
    - Wasted work for permanently invalid BO

With this check:
  - MAP/PREFETCH immediately fail with -EINVAL at ioctl boundary
  - Clear error message at syscall (better UX than deferred GPU hang)
  - Prevents creation of invalid GPU page table entries
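
The userspace-visible effect, as a sketch (bind struct setup elided; only
the error path is shown):

	/* bind describes a DRM_XE_VM_BIND_OP_MAP of the purged BO */
	if (ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind) && errno == EINVAL) {
		/* Purged BOs cannot be (re)bound: destroy and recreate. */
	}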

v2:
  - Clarify that purged BOs are permanently invalid (i915 semantics)
  - Remove incorrect claim about madvise(WILLNEED) restoring purged BOs

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index d03e69524369..cc946bff9607 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3482,6 +3482,13 @@ static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo,
 		return -EINVAL;
 	}
 
+	/* Purged BOs are permanently invalid; reject new MAP/PREFETCH. */
+	if (XE_IOCTL_DBG(xe,
+			 xe_bo_is_purged(bo) &&
+			 (op == DRM_XE_VM_BIND_OP_MAP ||
+			  op == DRM_XE_VM_BIND_OP_PREFETCH)))
+		return -EINVAL;
+
 	/*
 	 * Some platforms require 64k VM_BIND alignment,
 	 * specifically those with XE_VRAM_FLAGS_NEED64K.
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC v2 8/9] drm/xe/uapi: Add UAPI for purgeable bo state to madvise query response
  2025-12-01  5:50 [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
                   ` (6 preceding siblings ...)
  2025-12-01  5:50 ` [RFC v2 7/9] drm/xe/vm: Prevent binding " Arvind Yadav
@ 2025-12-01  5:50 ` Arvind Yadav
  2025-12-02 19:01   ` Matthew Brost
  2025-12-01  5:50 ` [RFC v2 9/9] drm/xe: Add support for querying purgeable BO states Arvind Yadav
  2025-12-02 18:36 ` [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Souza, Jose
  9 siblings, 1 reply; 35+ messages in thread
From: Arvind Yadav @ 2025-12-01  5:50 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom,
	pallavi.mishra

From: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

Complete the purgeable buffer object UAPI by adding the response
structure to drm_xe_mem_range_attr for querying the current purgeable
state of buffer objects within a memory range.

This allows userspace to determine the current state of BOs:
- DRM_XE_VMA_PURGEABLE_STATE_WILLNEED (0): BO actively needed, has backing store
- DRM_XE_VMA_PURGEABLE_STATE_DONTNEED (1): BO eligible for purging, still has backing
- DRM_XE_VMA_PURGEABLE_STATE_PURGED (2): BO purged, backing store freed (read-only)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
 include/uapi/drm/xe_drm.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 02d63938d16f..8f289a2849ff 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -2147,10 +2147,14 @@ struct drm_xe_madvise {
 		 *  - DRM_XE_VMA_PURGEABLE_STATE_DONTNEED (1): Hints that BO is not
 		 *    currently needed. Kernel may purge it under memory pressure.
 		 *    Only applies to non-shared BOs. Returns retained=1 if not purged.
+		 *
+		 *  - DRM_XE_VMA_PURGEABLE_STATE_PURGED: Read-only state indicating
+		 *    the BO purge state.
 		 */
 		struct {
 #define DRM_XE_VMA_PURGEABLE_STATE_WILLNEED	0
 #define DRM_XE_VMA_PURGEABLE_STATE_DONTNEED	1
+#define DRM_XE_VMA_PURGEABLE_STATE_PURGED	2
 			/** @purge_state_val.val: value for DRM_XE_VMA_ATTR_PURGEABLE_STATE */
 			__u32 val;
 			/**
@@ -2224,6 +2228,15 @@ struct drm_xe_mem_range_attr {
 		__u32 reserved;
 	} pat_index;
 
+	/** @purge_state_val: Purgeable state configuration */
+	struct {
+		/** @purge_state_val.val: value for DRM_XE_VMA_ATTR_PURGEABLE_STATE */
+		__u32 val;
+
+		/** @purge_state_val.reserved: Reserved */
+		__u32 reserved;
+	} purge_state_val;
+
 	/** @reserved: Reserved */
 	__u64 reserved[2];
 };
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC v2 9/9] drm/xe: Add support for querying purgeable BO states
  2025-12-01  5:50 [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
                   ` (7 preceding siblings ...)
  2025-12-01  5:50 ` [RFC v2 8/9] drm/xe/uapi: Add UAPI for purgeable bo state to madvise query response Arvind Yadav
@ 2025-12-01  5:50 ` Arvind Yadav
  2025-12-02 18:36 ` [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Souza, Jose
  9 siblings, 0 replies; 35+ messages in thread
From: Arvind Yadav @ 2025-12-01  5:50 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.brost, himal.prasad.ghimiray, thomas.hellstrom,
	pallavi.mishra

Add support for querying purgeable buffer object states through the
DRM_XE_VM_QUERY_MEM_RANGE_ATTRS ioctl. This allows userspace to determine
the current purgeable state of a BO:
  - WILLNEED (0): BO actively needed, has backing store
  - DONTNEED (1): BO eligible for purging, still valid
  - PURGED (2): BO purged by kernel, backing store freed
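
A hedged sketch of consuming the query result (attrs is the
drm_xe_mem_range_attr array filled by DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS;
recreate_bo() is a hypothetical recovery path):

	switch (attrs[i].purge_state_val.val) {
	case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
	case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
		break;			/* backing store still present */
	case DRM_XE_VMA_PURGEABLE_STATE_PURGED:
		recreate_bo();		/* contents are gone for good */
		break;
	}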

v2:
  - Initialize purge_state_val for non-BO VMAs to avoid leaking kernel data.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index cc946bff9607..fd0550d901c6 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2023,6 +2023,7 @@ static int get_mem_attrs(struct xe_vm *vm, u32 *num_vmas, u64 start,
 
 	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, start, end) {
 		struct xe_vma *vma = gpuva_to_vma(gpuva);
+		struct xe_bo *bo;
 
 		if (i == *num_vmas)
 			return -ENOSPC;
@@ -2035,6 +2036,12 @@ static int get_mem_attrs(struct xe_vm *vm, u32 *num_vmas, u64 start,
 		attrs[i].preferred_mem_loc.migration_policy =
 		vma->attr.preferred_loc.migration_policy;
 
+		bo = xe_vma_bo(vma);
+		if (bo)
+			attrs[i].purge_state_val.val = atomic_read(&bo->madv_purgeable);
+		else /* Non-BO VMAs (userptr, null) have no purgeable state */
+			attrs[i].purge_state_val.val = DRM_XE_VMA_PURGEABLE_STATE_WILLNEED;
+
 		i++;
 	}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC v2 1/9] drm/xe/uapi: Add UAPI support for purgeable buffer objects
  2025-12-01  5:50 ` [RFC v2 1/9] drm/xe/uapi: Add UAPI " Arvind Yadav
@ 2025-12-01 23:00   ` Matthew Brost
  2025-12-02  2:55     ` Yadav, Arvind
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Brost @ 2025-12-01 23:00 UTC (permalink / raw)
  To: Arvind Yadav
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra

On Mon, Dec 01, 2025 at 11:20:11AM +0530, Arvind Yadav wrote:
> From: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> 
> Extend the DRM_XE_MADVISE ioctl to support purgeable buffer object
> management by adding DRM_XE_VMA_ATTR_PURGEABLE_STATE attribute type.
> 
> This allows userspace applications to provide memory usage hints to
> the kernel for better memory management under pressure:
> 
> - WILLNEED: Buffer is needed and should not be purged. If the BO was
>   previously purged, retained field returns 0 indicating backing store
>   was lost (once purged, always purged semantics matching i915).
> 
> - DONTNEED: Buffer is not currently needed and may be purged by the
>   kernel under memory pressure to free resources. Only applies to
>   non-shared BOs.
> 
> The implementation includes a 'retained' output field (matching i915's
> drm_i915_gem_madvise.retained) that indicates whether the BO's backing
> store still exists (1) or has been purged (0).
> 
> v2: Add PURGED state for read-only status, change ioctl to DRM_IOWR,
>     add retained field for i915 compatibility
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
>  include/uapi/drm/xe_drm.h | 37 ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 47853659a705..02d63938d16f 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -121,7 +121,7 @@ extern "C" {
>  #define DRM_IOCTL_XE_EXEC			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC, struct drm_xe_exec)
>  #define DRM_IOCTL_XE_WAIT_USER_FENCE		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
>  #define DRM_IOCTL_XE_OBSERVATION		DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
> -#define DRM_IOCTL_XE_MADVISE			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)
> +#define DRM_IOCTL_XE_MADVISE			DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)

I'm not sure if this is something we are allowed to change per Linux
uAPI rules. I'd check with our maintainers (Thomas, Rodrigo) on this
one.

>  #define DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS	DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_QUERY_MEM_RANGE_ATTRS, struct drm_xe_vm_query_mem_range_attr)
>  
>  /**
> @@ -2051,6 +2051,7 @@ struct drm_xe_madvise {
>  #define DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC	0
>  #define DRM_XE_MEM_RANGE_ATTR_ATOMIC		1
>  #define DRM_XE_MEM_RANGE_ATTR_PAT		2
> +#define DRM_XE_VMA_ATTR_PURGEABLE_STATE		3
>  	/** @type: type of attribute */
>  	__u32 type;
>  
> @@ -2129,6 +2130,40 @@ struct drm_xe_madvise {
>  			/** @pat_index.reserved: Reserved */
>  			__u64 reserved;
>  		} pat_index;
> +
> +		/**
> +		 * @purge_state_val: Purgeable state configuration
> +		 *
> +		 * Used when @type == DRM_XE_VMA_ATTR_PURGEABLE_STATE.
> +		 *
> +		 * Configures the purgeable state of buffer objects in the specified
> +		 * virtual address range. This allows applications to hint to the kernel
> +		 * about bo's usage patterns for better memory management.
> +		 *
> +		 * Supported values for @purge_state_val.val:
> +		 *  - DRM_XE_VMA_PURGEABLE_STATE_WILLNEED (0): Marks BO as needed.
> +		 *    If BO was purged, returns retained=0 (backing store lost).
> +		 *
> +		 *  - DRM_XE_VMA_PURGEABLE_STATE_DONTNEED (1): Hints that BO is not
> +		 *    currently needed. Kernel may purge it under memory pressure.
> +		 *    Only applies to non-shared BOs. Returns retained=1 if not purged.
> +		 */
> +		struct {
> +#define DRM_XE_VMA_PURGEABLE_STATE_WILLNEED	0
> +#define DRM_XE_VMA_PURGEABLE_STATE_DONTNEED	1
> +			/** @purge_state_val.val: value for DRM_XE_VMA_ATTR_PURGEABLE_STATE */
> +			__u32 val;
> +			/**
> +			 * @purge_state_val.retained: Whether the backing store still exists.
> +			 *
> +			 * Output field indicating if the BO's backing store is retained.
> +			 * Set to 1 if backing store exists, 0 if it has been purged.
> +			 * Similar to i915's drm_i915_gem_madvise.retained field.
> +			 */
> +			__u32 retained;

If we can't change the IOCTL to DRM_IOWR, then we could hack around this
restriction by making 'retained' a userptr which the madvise IOCTL
explicitly copies back into, rather than relying on the DRM IOCTL core to
implement the copy_to_user.

Matt

> +			/** @purge_state_val.reserved: Reserved */
> +			__u64 reserved;
> +		} purge_state_val;
>  	};
>  
>  	/** @reserved: Reserved */
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 2/9] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo
  2025-12-01  5:50 ` [RFC v2 2/9] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo Arvind Yadav
@ 2025-12-01 23:02   ` Matthew Brost
  2025-12-02  2:56     ` Yadav, Arvind
  2025-12-02 18:52   ` Matthew Brost
  1 sibling, 1 reply; 35+ messages in thread
From: Matthew Brost @ 2025-12-01 23:02 UTC (permalink / raw)
  To: Arvind Yadav
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra

On Mon, Dec 01, 2025 at 11:20:12AM +0530, Arvind Yadav wrote:
> Add infrastructure for tracking the purgeable state of buffer objects.
> 
> Introduce enum xe_madv_purgeable_state with three states:
>    - XE_MADV_PURGEABLE_WILLNEED (0): BO is needed and should not be
>      purged. This is the default state for all BOs.
> 
>    - XE_MADV_PURGEABLE_DONTNEED (1): BO is not currently needed and
>      can be purged by the kernel under memory pressure to reclaim
>      resources. Only non-shared BOs can be marked as DONTNEED.
> 
>    - XE_MADV_PURGEABLE_PURGED (2): BO has been purged by the kernel.
>      Accessing a purged BO results in error. Follows i915 semantics
>      where once purged, the BO remains permanently invalid ("once
>      purged, always purged").
> 
> Add an atomic_t madv_purgeable field to struct xe_bo to track the
> purgeable state across concurrent access paths.
> 
> v2: Add xe_bo_is_purged() helper, improve state documentation
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.h       | 27 +++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_bo_types.h |  3 +++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index 911d5b90461a..b0a31c77e612 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -85,6 +85,28 @@
>  
>  #define XE_PCI_BARRIER_MMAP_OFFSET	(0x50 << XE_PTE_SHIFT)
>  
> +/**
> + * enum xe_madv_purgeable_state - Buffer object purgeable state enumeration
> + *
> + * This enum defines the possible purgeable states for a buffer object,
> + * allowing userspace to provide memory usage hints to the kernel for
> + * better memory management under pressure.
> + *
> + * @XE_MADV_PURGEABLE_WILLNEED: The buffer object is needed and should not be purged.
> + * This is the default state.
> + * @XE_MADV_PURGEABLE_DONTNEED: The buffer object is not currently needed and can be
> + * purged by the kernel under memory pressure.
> + * @XE_MADV_PURGEABLE_PURGED: The buffer object has been purged by the kernel.
> + *
> + * Accessing a purged buffer will result in an error. Per i915 semantics,
> + * once purged, a BO remains permanently invalid and must be destroyed and recreated.
> + */
> +enum xe_madv_purgeable_state {
> +	XE_MADV_PURGEABLE_WILLNEED,
> +	XE_MADV_PURGEABLE_DONTNEED,
> +	XE_MADV_PURGEABLE_PURGED,
> +};
> +
>  struct sg_table;
>  
>  struct xe_bo *xe_bo_alloc(void);
> @@ -213,6 +235,11 @@ static inline bool xe_bo_is_protected(const struct xe_bo *bo)
>  	return bo->pxp_key_instance;
>  }
>  

Kernel doc.

Matt

> +static inline bool xe_bo_is_purged(struct xe_bo *bo)
> +{
> +	return atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_PURGED;
> +}
> +
>  static inline void xe_bo_unpin_map_no_vm(struct xe_bo *bo)
>  {
>  	if (likely(bo)) {
> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> index d4fe3c8dca5b..57b4dc7012e2 100644
> --- a/drivers/gpu/drm/xe/xe_bo_types.h
> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> @@ -108,6 +108,9 @@ struct xe_bo {
>  	 * from default
>  	 */
>  	u64 min_align;
> +
> +	/** @madv_purgeable: user space advise on BO purgeability */
> +	atomic_t madv_purgeable;
>  };
>  
>  #endif
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 3/9] drm/xe/bo: Prevent purging of shared buffer objects
  2025-12-01  5:50 ` [RFC v2 3/9] drm/xe/bo: Prevent purging of shared buffer objects Arvind Yadav
@ 2025-12-01 23:10   ` Matthew Brost
  2025-12-02  3:42     ` Yadav, Arvind
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Brost @ 2025-12-01 23:10 UTC (permalink / raw)
  To: Arvind Yadav
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra

On Mon, Dec 01, 2025 at 11:20:13AM +0530, Arvind Yadav wrote:
> Introduce the `xe_bo_is_shared_locked()` inline helper to determine if a
> buffer object is shared across multiple clients or drivers. A buffer is
> considered shared if it is exported via dma-buf, imported, or has a
> handle count greater than one.
> 
> This check is critical for safely implementing purgeable memory. Purging
> a buffer that is shared would lead to data corruption for other clients
> that still hold a reference to it.
> 
> The kernel cannot safely determine when all clients are done with a
> shared buffer, so shared BOs must never be marked DONTNEED or purged.
> 
> The new helper is used in two key locations:
> 1.  In `xe_vm_madvise_purgeable_bo()`, to prevent userspace from
>     successfully marking a shared buffer as `DONTNEED`. This is the
>     primary safeguard against incorrect usage.
> 
> 2.  In `xe_bo_move()`, as a final safety check before the kernel
>     initiates a purge during eviction. This ensures that even if a
>     shared buffer were somehow marked `DONTNEED`, it would not be
>     purged.
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.h | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index b0a31c77e612..97edb38bf1ed 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -478,4 +478,34 @@ static inline bool xe_bo_is_mem_type(struct xe_bo *bo, u32 mem_type)
>  	xe_bo_assert_held(bo);
>  	return bo->ttm.resource->mem_type == mem_type;
>  }
> +
> +/**
> + * xe_bo_is_shared_locked - Check if a buffer object is shared
> + * @bo: The buffer object to check
> + *
> + * Determines if a buffer object is considered shared, which includes:
> + * - Exported via dma-buf (obj->dma_buf is set)
> + * - Imported from another driver (obj->import_attach is set)
> + * - Referenced by multiple clients (handle_count > 1)
> + *
> + * This check is used to prevent data loss on shared content by avoiding
> + * certain operations like purging on buffers that other processes or
> + * drivers might still be using.
> + *
> + * Return: true if the buffer object is shared, false otherwise.
> + */
> +static inline bool xe_bo_is_shared_locked(const struct xe_bo *bo)
> +{
> +	const struct drm_gem_object *obj = &bo->ttm.base;
> +

It seems like everything below here should be a new drm gem helper.

> +	dma_resv_assert_held(obj->resv);
> +
> +	if (obj->dma_buf || obj->import_attach)
> +		return true;
> +
> +	if (obj->handle_count > 1)

So this covers the case when we prime fd to handle but resolve to a
BO (i.e., we don't do a dma-buf attach, rather just take a reference on
the BO as it comes from the same device)? I just want to make sure I'm
understanding this part correctly. If so, maybe throw a comment in here
or update the function's kernel doc with a better explanation.

Matt 

> +		return true;
> +
> +	return false;
> +}
>  #endif
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 4/9] drm/xe/madvise: Implement purgeable buffer object support
  2025-12-01  5:50 ` [RFC v2 4/9] drm/xe/madvise: Implement purgeable buffer object support Arvind Yadav
@ 2025-12-02  1:46   ` Matthew Brost
  2025-12-02  4:01     ` Yadav, Arvind
  2025-12-02 21:39   ` Matthew Brost
  1 sibling, 1 reply; 35+ messages in thread
From: Matthew Brost @ 2025-12-02  1:46 UTC (permalink / raw)
  To: Arvind Yadav
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra

On Mon, Dec 01, 2025 at 11:20:14AM +0530, Arvind Yadav wrote:
> This allows userspace applications to provide memory usage hints to
> the kernel for better memory management under pressure.
> 
> Add the core implementation for purgeable buffer objects, enabling memory
> reclamation of user-designated DONTNEED buffers during eviction.
> 
> This patch implements the purge operation and state machine transitions:
> 
> Purgeable States (from xe_madv_purgeable_state):
>  - WILLNEED (0): BO should be retained, actively used
>  - DONTNEED (1): BO eligible for purging, not currently needed

Quick comment - should we use TTM priority levels so that WILLNEED is a
higher priority (less likely to be evicted) than DONTNEED (more likely
to be evicted)?

Expect more comments; this is just a quick thought.

Matt

>  - PURGED (2): BO backing store reclaimed, permanently invalid
> 
> Design Rationale:
>   - Async TLB invalidation via trigger_rebind (no blocking xe_vm_invalidate_vma)
>   - i915 compatibility: retained field, "once purged always purged" semantics
>   - Shared BO protection prevents multi-process memory corruption
>   - Scratch PTE reuse avoids new infrastructure, safe for fault mode
> 
> v2:
>   - Use xe_bo_trigger_rebind() for async TLB invalidation (Thomas Hellström)
>   - Add NULL rebind with scratch PTEs for fault mode (Thomas Hellström)
>   - Implement i915-compatible retained field logic (Thomas Hellström)
>   - Skip BO validation for purged BOs in page fault handler (crash fix)
>   - Add scratch VM check in page fault path (non-scratch VMs fail fault)
>   - Force clear_pt for non-scratch VMs to avoid phys addr 0 mapping (review fix)
>   - Add !is_purged check to resource cursor setup to prevent stale access
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c           | 72 ++++++++++++++++++++++-----
>  drivers/gpu/drm/xe/xe_gt_pagefault.c | 19 ++++++++
>  drivers/gpu/drm/xe/xe_pt.c           | 36 ++++++++++++--
>  drivers/gpu/drm/xe/xe_vm.c           | 11 ++++-
>  drivers/gpu/drm/xe/xe_vm_madvise.c   | 73 ++++++++++++++++++++++++++++
>  5 files changed, 193 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index cbc3ee157218..f0b3f7a13114 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -836,6 +836,53 @@ static int xe_bo_move_notify(struct xe_bo *bo,
>  	return 0;
>  }
>  
> +static void xe_bo_set_purged(struct xe_bo *bo)
> +{
> +	/* BO must be locked before modifying madv state */
> +	dma_resv_assert_held(bo->ttm.base.resv);
> +
> +	atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_PURGED);
> +}
> +
> +/**
> + * xe_ttm_bo_purge() - Purge buffer object backing store
> + * @ttm_bo: The TTM buffer object to purge
> + * @ctx: TTM operation context
> + *
> + * This function purges the backing store of a BO marked as DONTNEED and
> + * triggers rebind to invalidate stale GPU mappings. For fault-mode VMs,
> + * this zaps the PTEs. The next GPU access will trigger a page fault and
> + * perform NULL rebind (scratch pages or clear PTEs based on VM config).
> + */
> +static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
> +{
> +	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
> +	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
> +
> +	if (ttm_bo->ttm) {
> +		struct ttm_placement place = {};
> +		int ret = ttm_bo_validate(ttm_bo, &place, ctx);
> +
> +		drm_WARN_ON(&xe->drm, ret);
> +		if (!ret && bo) {
> +			if (atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_DONTNEED) {
> +				xe_bo_set_purged(bo);
> +
> +				/*
> +				 * Trigger rebind to invalidate stale GPU mappings.
> +				 * - Non-fault mode: Marks VMAs for rebind
> +				 * - Fault mode: Zaps PTEs (sets to 0), next access triggers fault
> +				 *   and NULL rebind with scratch/clear PTEs per VM config
> +				 */
> +				ret = xe_bo_trigger_rebind(xe, bo, ctx);
> +				if (ret)
> +					drm_warn(&xe->drm,
> +						 "Failed to invalidate purged BO: %d\n", ret);
> +			}
> +		}
> +	}
> +}
> +
>  static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>  		      struct ttm_operation_ctx *ctx,
>  		      struct ttm_resource *new_mem,
> @@ -853,8 +900,18 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>  	bool needs_clear;
>  	bool handle_system_ccs = (!IS_DGFX(xe) && xe_bo_needs_ccs_pages(bo) &&
>  				  ttm && ttm_tt_is_populated(ttm)) ? true : false;
> +	int state = atomic_read(&bo->madv_purgeable);
>  	int ret = 0;
>  
> +	/*
> +	 * Purge only non-shared BOs explicitly marked DONTNEED by userspace.
> +	 * The move_notify callback will handle invalidation asynchronously.
> +	 */
> +	if (evict && state == XE_MADV_PURGEABLE_DONTNEED && !xe_bo_is_shared_locked(bo)) {
> +		xe_ttm_bo_purge(ttm_bo, ctx);
> +		return 0;
> +	}
> +
>  	/* Bo creation path, moving to system or TT. */
>  	if ((!old_mem && ttm) && !handle_system_ccs) {
>  		if (new_mem->mem_type == XE_PL_TT)
> @@ -1606,18 +1663,6 @@ static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
>  	}
>  }
>  
> -static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
> -{
> -	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
> -
> -	if (ttm_bo->ttm) {
> -		struct ttm_placement place = {};
> -		int ret = ttm_bo_validate(ttm_bo, &place, ctx);
> -
> -		drm_WARN_ON(&xe->drm, ret);
> -	}
> -}
> -
>  static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo)
>  {
>  	struct ttm_operation_ctx ctx = {
> @@ -2202,6 +2247,9 @@ struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
>  #endif
>  	INIT_LIST_HEAD(&bo->vram_userfault_link);
>  
> +	/* Initialize purge advisory state */
> +	atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
> +
>  	drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
>  
>  	if (resv) {
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index a054d6010ae0..8c7e5dcb627b 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -87,6 +87,13 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
>  	if (!bo)
>  		return 0;
>  
> +	/*
> +	 * Skip validation/migration for purged BOs - they have no backing pages.
> +	 * Rebind will use scratch PTEs instead.
> +	 */
> +	if (xe_bo_is_purged(bo))
> +		return 0;
> +
>  	return need_vram_move ? xe_bo_migrate(bo, vram->placement, NULL, exec) :
>  		xe_bo_validate(bo, vm, true, exec);
>  }
> @@ -100,9 +107,21 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
>  	struct drm_exec exec;
>  	struct dma_fence *fence;
>  	int err, needs_vram;
> +	struct xe_bo *bo;
>  
>  	lockdep_assert_held_write(&vm->lock);
>  
> +	/*
> +	 * Check if BO is purged. For purged BOs:
> +	 * - Scratch VMs: Allow rebind with scratch PTEs (safe zero reads)
> +	 * - Non-scratch VMs: FAIL the page fault (no scratch page available)
> +	 */
> +	bo = xe_vma_bo(vma);
> +	if (bo && xe_bo_is_purged(bo)) {
> +		if (!xe_vm_has_scratch(vm))
> +			return -EACCES;
> +	}
> +
>  	needs_vram = xe_vma_need_vram_for_atomic(vm->xe, vma, atomic);
>  	if (needs_vram < 0 || (needs_vram && xe_vma_is_userptr(vma)))
>  		return needs_vram < 0 ? needs_vram : -EACCES;
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index d22fd1ccc0ba..062f64b16a58 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -533,20 +533,26 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>  	/* Is this a leaf entry ?*/
>  	if (level == 0 || xe_pt_hugepte_possible(addr, next, level, xe_walk)) {
>  		struct xe_res_cursor *curs = xe_walk->curs;
> +		struct xe_bo *bo = xe_vma_bo(xe_walk->vma);
>  		bool is_null = xe_vma_is_null(xe_walk->vma);
> -		bool is_vram = is_null ? false : xe_res_is_vram(curs);
> +		bool is_purged = bo && xe_bo_is_purged(bo);
> +		bool is_vram = (is_null || is_purged) ? false : xe_res_is_vram(curs);
>  
>  		XE_WARN_ON(xe_walk->va_curs_start != addr);
>  
>  		if (xe_walk->clear_pt) {
>  			pte = 0;
>  		} else {
> -			pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
> +			/*
> +			 * For purged BOs, treat like null VMAs - pass address 0.
> +			 * The pte_encode_vma will set XE_PTE_NULL flag for scratch mapping.
> +			 */
> +			pte = vm->pt_ops->pte_encode_vma((is_null || is_purged) ? 0 :
>  							 xe_res_dma(curs) +
>  							 xe_walk->dma_offset,
>  							 xe_walk->vma,
>  							 pat_index, level);
> -			if (!is_null)
> +			if (!is_null && !is_purged)
>  				pte |= is_vram ? xe_walk->default_vram_pte :
>  					xe_walk->default_system_pte;
>  
> @@ -570,7 +576,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>  		if (unlikely(ret))
>  			return ret;
>  
> -		if (!is_null && !xe_walk->clear_pt)
> +		if (!is_null && !is_purged && !xe_walk->clear_pt)
>  			xe_res_next(curs, next - addr);
>  		xe_walk->va_curs_start = next;
>  		xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << level);
> @@ -723,6 +729,26 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>  	};
>  	struct xe_pt *pt = vm->pt_root[tile->id];
>  	int ret;
> +	bool is_purged = false;
> +
> +	/*
> +	 * Check if BO is purged:
> +	 * - Scratch VMs: Use scratch PTEs (XE_PTE_NULL) for safe zero reads
> +	 * - Non-scratch VMs: Clear PTEs to zero (non-present) to avoid mapping to phys addr 0
> +	 *
> +	 * For non-scratch VMs, we force clear_pt=true so leaf PTEs become completely
> +	 * zero instead of creating a PRESENT mapping to physical address 0.
> +	 */
> +	if (bo && xe_bo_is_purged(bo)) {
> +		is_purged = true;
> +
> +		/*
> +		 * For non-scratch VMs, a NULL rebind should use zero PTEs
> +		 * (non-present), not a present PTE to phys 0.
> +		 */
> +		if (!xe_vm_has_scratch(vm))
> +			xe_walk.clear_pt = true;
> +	}
>  
>  	if (range) {
>  		/* Move this entire thing to xe_svm.c? */
> @@ -762,7 +788,7 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>  	if (!range)
>  		xe_bo_assert_held(bo);
>  
> -	if (!xe_vma_is_null(vma) && !range) {
> +	if (!xe_vma_is_null(vma) && !range && !is_purged) {
>  		if (xe_vma_is_userptr(vma))
>  			xe_res_first_dma(to_userptr_vma(vma)->userptr.pages.dma_addr, 0,
>  					 xe_vma_size(vma), &curs);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 10d77666a425..d03e69524369 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1336,6 +1336,9 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>  static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>  			       u16 pat_index, u32 pt_level)
>  {
> +	struct xe_bo *bo = xe_vma_bo(vma);
> +	struct xe_vm *vm = xe_vma_vm(vma);
> +
>  	pte |= XE_PAGE_PRESENT;
>  
>  	if (likely(!xe_vma_read_only(vma)))
> @@ -1344,7 +1347,13 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>  	pte |= pte_encode_pat_index(pat_index, pt_level);
>  	pte |= pte_encode_ps(pt_level);
>  
> -	if (unlikely(xe_vma_is_null(vma)))
> +	/*
> +	 * NULL PTEs redirect to scratch page (return zeros on read).
> +	 * Set for: 1) explicit null VMAs, 2) purged BOs on scratch VMs.
> +	 * Never set NULL flag without scratch page - causes undefined behavior.
> +	 */
> +	if (unlikely(xe_vma_is_null(vma) ||
> +		     (bo && xe_bo_is_purged(bo) && xe_vm_has_scratch(vm))))
>  		pte |= XE_PTE_NULL;
>  
>  	return pte;
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index cad3cf627c3f..3ba851e0b870 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -158,6 +158,60 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
>  	}
>  }
>  
> +/*
> + * Handle purgeable buffer object advice for DONTNEED/WILLNEED/PURGED.
> + * Updates op->purge_state_val.retained to indicate if backing store
> + * exists (matches i915's retained).
> + */
> +static void xe_vm_madvise_purgeable_bo(struct xe_device *xe, struct xe_vm *vm,
> +				       struct xe_vma **vmas, int num_vmas,
> +				       struct drm_xe_madvise *op)
> +{
> +	bool has_purged_bo = false;
> +	int i;
> +
> +	xe_assert(vm->xe, op->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE);
> +
> +	for (i = 0; i < num_vmas; i++) {
> +		struct xe_bo *bo = xe_vma_bo(vmas[i]);
> +
> +		if (!bo)
> +			continue;
> +
> +		/* BO must be locked before modifying madv state */
> +		dma_resv_assert_held(bo->ttm.base.resv);
> +
> +		/*
> +		 * Once purged, always purged. Cannot transition back to WILLNEED.
> +		 * This matches i915 semantics where purged BOs are permanently invalid.
> +		 */
> +		if (xe_bo_is_purged(bo)) {
> +			has_purged_bo = true;
> +			continue;
> +		}
> +
> +		switch (op->purge_state_val.val) {
> +		case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
> +			atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
> +			break;
> +		case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
> +			if (!xe_bo_is_shared_locked(bo))
> +				atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_DONTNEED);
> +			break;
> +		default:
> +			drm_warn(&vm->xe->drm, "Invalid madvise value = %d\n",
> +				 op->purge_state_val.val);
> +			return;
> +		}
> +	}
> +
> +	/*
> +	 * Set retained flag to indicate if backing store still exists.
> +	 * Matches i915: retained = 1 if not purged, 0 if purged.
> +	 */
> +	op->purge_state_val.retained = !has_purged_bo;
> +}
> +
>  typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
>  			     struct xe_vma **vmas, int num_vmas,
>  			     struct drm_xe_madvise *op);
> @@ -283,6 +337,19 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
>  			return false;
>  		break;
>  	}
> +	case DRM_XE_VMA_ATTR_PURGEABLE_STATE:
> +	{
> +		u32 val = args->purge_state_val.val;
> +
> +		if (XE_IOCTL_DBG(xe, !((val == DRM_XE_VMA_PURGEABLE_STATE_WILLNEED) ||
> +				       (val == DRM_XE_VMA_PURGEABLE_STATE_DONTNEED))))
> +			return false;
> +
> +		if (XE_IOCTL_DBG(xe, args->purge_state_val.reserved))
> +			return false;
> +
> +		break;
> +	}
>  	default:
>  		if (XE_IOCTL_DBG(xe, 1))
>  			return false;
> @@ -402,6 +469,12 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
>  					goto err_fini;
>  			}
>  		}
> +		if (args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE) {
> +			xe_vm_madvise_purgeable_bo(xe, vm, madvise_range.vmas,
> +						   madvise_range.num_vmas, args);
> +			goto err_fini;
> +
> +		}
>  	}
>  
>  	if (madvise_range.has_svm_userptr_vmas) {
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 1/9] drm/xe/uapi: Add UAPI support for purgeable buffer objects
  2025-12-01 23:00   ` Matthew Brost
@ 2025-12-02  2:55     ` Yadav, Arvind
  0 siblings, 0 replies; 35+ messages in thread
From: Yadav, Arvind @ 2025-12-02  2:55 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra


On 02-12-2025 04:30, Matthew Brost wrote:
> On Mon, Dec 01, 2025 at 11:20:11AM +0530, Arvind Yadav wrote:
>> From: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>
>> Extend the DRM_XE_MADVISE ioctl to support purgeable buffer object
>> management by adding DRM_XE_VMA_ATTR_PURGEABLE_STATE attribute type.
>>
>> This allows userspace applications to provide memory usage hints to
>> the kernel for better memory management under pressure:
>>
>> - WILLNEED: Buffer is needed and should not be purged. If the BO was
>>    previously purged, retained field returns 0 indicating backing store
>>    was lost (once purged, always purged semantics matching i915).
>>
>> - DONTNEED: Buffer is not currently needed and may be purged by the
>>    kernel under memory pressure to free resources. Only applies to
>>    non-shared BOs.
>>
>> The implementation includes a 'retained' output field (matching i915's
>> drm_i915_gem_madvise.retained) that indicates whether the BO's backing
>> store still exists (1) or has been purged (0).
>>
>> v2: Add PURGED state for read-only status, change ioctl to DRM_IOWR,
>>      add retained field for i915 compatibility
>>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
>> ---
>>   include/uapi/drm/xe_drm.h | 37 ++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 36 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 47853659a705..02d63938d16f 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -121,7 +121,7 @@ extern "C" {
>>   #define DRM_IOCTL_XE_EXEC			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC, struct drm_xe_exec)
>>   #define DRM_IOCTL_XE_WAIT_USER_FENCE		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
>>   #define DRM_IOCTL_XE_OBSERVATION		DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
>> -#define DRM_IOCTL_XE_MADVISE			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)
>> +#define DRM_IOCTL_XE_MADVISE			DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)
> I'm not sure if this is something we are allowed to change per Linux
> uAPI rules. I'd check with our maintainers (Thomas, Rodrigo) on this
> one.
>
>>   #define DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS	DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_QUERY_MEM_RANGE_ATTRS, struct drm_xe_vm_query_mem_range_attr)
>>   
>>   /**
>> @@ -2051,6 +2051,7 @@ struct drm_xe_madvise {
>>   #define DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC	0
>>   #define DRM_XE_MEM_RANGE_ATTR_ATOMIC		1
>>   #define DRM_XE_MEM_RANGE_ATTR_PAT		2
>> +#define DRM_XE_VMA_ATTR_PURGEABLE_STATE		3
>>   	/** @type: type of attribute */
>>   	__u32 type;
>>   
>> @@ -2129,6 +2130,40 @@ struct drm_xe_madvise {
>>   			/** @pat_index.reserved: Reserved */
>>   			__u64 reserved;
>>   		} pat_index;
>> +
>> +		/**
>> +		 * @purge_state_val: Purgeable state configuration
>> +		 *
>> +		 * Used when @type == DRM_XE_VMA_ATTR_PURGEABLE_STATE.
>> +		 *
>> +		 * Configures the purgeable state of buffer objects in the specified
>> +		 * virtual address range. This allows applications to hint to the kernel
>> +		 * about a BO's usage patterns for better memory management.
>> +		 *
>> +		 * Supported values for @purge_state_val.val:
>> +		 *  - DRM_XE_VMA_PURGEABLE_STATE_WILLNEED (0): Marks BO as needed.
>> +		 *    If BO was purged, returns retained=0 (backing store lost).
>> +		 *
>> +		 *  - DRM_XE_VMA_PURGEABLE_STATE_DONTNEED (1): Hints that BO is not
>> +		 *    currently needed. Kernel may purge it under memory pressure.
>> +		 *    Only applies to non-shared BOs. Returns retained=1 if not purged.
>> +		 */
>> +		struct {
>> +#define DRM_XE_VMA_PURGEABLE_STATE_WILLNEED	0
>> +#define DRM_XE_VMA_PURGEABLE_STATE_DONTNEED	1
>> +			/** @purge_state_val.val: value for DRM_XE_VMA_ATTR_PURGEABLE_STATE */
>> +			__u32 val;
>> +			/**
>> +			 * @purge_state_val.retained: Whether the backing store still exists.
>> +			 *
>> +			 * Output field indicating if the BO's backing store is retained.
>> +			 * Set to 1 if backing store exists, 0 if it has been purged.
>> +			 * Similar to i915's drm_i915_gem_madvise.retained field.
>> +			 */
>> +			__u32 retained;
> If we can't change the IOCTL to DRM_IOWR, then we could hack around this
> restriction by making 'retained' a userptr which the madvise IOCTL
> explicitly copies back into rather than relying on the DRM IOCTL core to
> implement the copy_to_user.
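
A rough sketch of that fallback, assuming 'retained' becomes a __u64
userspace address instead of a __u32 output field (names illustrative,
not part of the posted series):

	struct {
		__u32 val;
		__u32 pad;
		/* Illustrative: userspace address the ioctl writes 'retained' to */
		__u64 retained_ptr;
	} purge_state_val;

and on the kernel side, once the madvise operation has run:

	u32 retained = !has_purged_bo;

	if (copy_to_user(u64_to_user_ptr(args->purge_state_val.retained_ptr),
			 &retained, sizeof(retained)))
		return -EFAULT;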

Noted. I will make the changes as per your suggestion.


~Arvind
>
> Matt
>
>> +			/** @purge_state_val.reserved: Reserved */
>> +			__u64 reserved;
>> +		} purge_state_val;
>>   	};
>>   
>>   	/** @reserved: Reserved */
>> -- 
>> 2.43.0
>>
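
For reference, a minimal userspace sketch of the flow this uAPI enables,
assuming the v2 interface above lands as posted (including the DRM_IOWR
change); the vm_id/start/range fields follow the existing drm_xe_madvise
layout but exact names should be treated as illustrative, and error
handling is elided:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/xe_drm.h>

/* Hint that the BOs backing [addr, addr + size) are discardable. */
static int mark_dontneed(int fd, uint32_t vm_id, uint64_t addr, uint64_t size)
{
	struct drm_xe_madvise args;

	memset(&args, 0, sizeof(args));
	args.vm_id = vm_id;
	args.start = addr;
	args.range = size;
	args.type = DRM_XE_VMA_ATTR_PURGEABLE_STATE;
	args.purge_state_val.val = DRM_XE_VMA_PURGEABLE_STATE_DONTNEED;

	return ioctl(fd, DRM_IOCTL_XE_MADVISE, &args);
}

/*
 * Before reuse: mark WILLNEED and test whether the backing store
 * survived. Returns 1 if retained, 0 if purged (destroy and recreate
 * the BO), -1 on ioctl failure.
 */
static int reclaim_willneed(int fd, uint32_t vm_id, uint64_t addr, uint64_t size)
{
	struct drm_xe_madvise args;

	memset(&args, 0, sizeof(args));
	args.vm_id = vm_id;
	args.start = addr;
	args.range = size;
	args.type = DRM_XE_VMA_ATTR_PURGEABLE_STATE;
	args.purge_state_val.val = DRM_XE_VMA_PURGEABLE_STATE_WILLNEED;

	if (ioctl(fd, DRM_IOCTL_XE_MADVISE, &args))
		return -1;

	return args.purge_state_val.retained;
}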

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 2/9] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo
  2025-12-01 23:02   ` Matthew Brost
@ 2025-12-02  2:56     ` Yadav, Arvind
  0 siblings, 0 replies; 35+ messages in thread
From: Yadav, Arvind @ 2025-12-02  2:56 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra


On 02-12-2025 04:32, Matthew Brost wrote:
> On Mon, Dec 01, 2025 at 11:20:12AM +0530, Arvind Yadav wrote:
>> Add infrastructure for tracking purgeable state of buffer objects.
>> This includes:
>>
>> Introduce enum xe_madv_purgeable_state with three states:
>>     - XE_MADV_PURGEABLE_WILLNEED (0): BO is needed and should not be
>>       purged. This is the default state for all BOs.
>>
>>     - XE_MADV_PURGEABLE_DONTNEED (1): BO is not currently needed and
>>       can be purged by the kernel under memory pressure to reclaim
>>       resources. Only non-shared BOs can be marked as DONTNEED.
>>
>>     - XE_MADV_PURGEABLE_PURGED (2): BO has been purged by the kernel.
>>       Accessing a purged BO results in error. Follows i915 semantics
>>       where once purged, the BO remains permanently invalid ("once
>>       purged, always purged").
>>
>> Add atomic_t madv field to struct xe_bo for state tracking
>>    of purgeable state across concurrent access paths
>>
>> v2: Add xe_bo_is_purged() helper, improve state documentation
>>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_bo.h       | 27 +++++++++++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_bo_types.h |  3 +++
>>   2 files changed, 30 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
>> index 911d5b90461a..b0a31c77e612 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.h
>> +++ b/drivers/gpu/drm/xe/xe_bo.h
>> @@ -85,6 +85,28 @@
>>   
>>   #define XE_PCI_BARRIER_MMAP_OFFSET	(0x50 << XE_PTE_SHIFT)
>>   
>> +/**
>> + * enum xe_madv_purgeable_state - Buffer object purgeable state enumeration
>> + *
>> + * This enum defines the possible purgeable states for a buffer object,
>> + * allowing userspace to provide memory usage hints to the kernel for
>> + * better memory management under pressure.
>> + *
>> + * @XE_MADV_PURGEABLE_WILLNEED: The buffer object is needed and should not be purged.
>> + * This is the default state.
>> + * @XE_MADV_PURGEABLE_DONTNEED: The buffer object is not currently needed and can be
>> + * purged by the kernel under memory pressure.
>> + * @XE_MADV_PURGEABLE_PURGED: The buffer object has been purged by the kernel.
>> + *
>> + * Accessing a purged buffer will result in an error. Per i915 semantics,
>> + * once purged, a BO remains permanently invalid and must be destroyed and recreated.
>> + */
>> +enum xe_madv_purgeable_state {
>> +	XE_MADV_PURGEABLE_WILLNEED,
>> +	XE_MADV_PURGEABLE_DONTNEED,
>> +	XE_MADV_PURGEABLE_PURGED,
>> +};
>> +
>>   struct sg_table;
>>   
>>   struct xe_bo *xe_bo_alloc(void);
>> @@ -213,6 +235,11 @@ static inline bool xe_bo_is_protected(const struct xe_bo *bo)
>>   	return bo->pxp_key_instance;
>>   }
>>   
> Kernel doc.

Noted.

~Arvind

>
> Matt
>
>> +static inline bool xe_bo_is_purged(struct xe_bo *bo)
>> +{
>> +	return atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_PURGED;
>> +}
>> +
>>   static inline void xe_bo_unpin_map_no_vm(struct xe_bo *bo)
>>   {
>>   	if (likely(bo)) {
>> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
>> index d4fe3c8dca5b..57b4dc7012e2 100644
>> --- a/drivers/gpu/drm/xe/xe_bo_types.h
>> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
>> @@ -108,6 +108,9 @@ struct xe_bo {
>>   	 * from default
>>   	 */
>>   	u64 min_align;
>> +
>> +	/** @madv_purgeable: user space advise on BO purgeability */
>> +	atomic_t madv_purgeable;
>>   };
>>   
>>   #endif
>> -- 
>> 2.43.0
>>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 3/9] drm/xe/bo: Prevent purging of shared buffer objects
  2025-12-01 23:10   ` Matthew Brost
@ 2025-12-02  3:42     ` Yadav, Arvind
  2025-12-02  9:42       ` Thomas Hellström
  0 siblings, 1 reply; 35+ messages in thread
From: Yadav, Arvind @ 2025-12-02  3:42 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra


On 02-12-2025 04:40, Matthew Brost wrote:
> On Mon, Dec 01, 2025 at 11:20:13AM +0530, Arvind Yadav wrote:
>> Introduce the `xe_bo_is_shared_locked()` inline helper to determine if a
>> buffer object is shared across multiple clients or drivers. A buffer is
>> considered shared if it is exported via dma-buf, imported, or has a
>> handle count greater than one.
>>
>> This check is critical for safely implementing purgeable memory. Purging
>> a buffer that is shared would lead to data corruption for other clients
>> that still hold a reference to it.
>>
>> The kernel cannot safely determine when all clients are done with a
>> shared buffer, so shared BOs must never be marked DONTNEED or purged.
>>
>> The new helper is used in two key locations:
>> 1.  In `xe_vm_madvise_purgeable_bo()`, to prevent userspace from
>>      successfully marking a shared buffer as `DONTNEED`. This is the
>>      primary safeguard against incorrect usage.
>>
>> 2.  In `xe_bo_move()`, as a final safety check before the kernel
>>      initiates a purge during eviction. This ensures that even if a
>>      shared buffer were somehow marked `DONTNEED`, it would not be
>>      purged.
>>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_bo.h | 30 ++++++++++++++++++++++++++++++
>>   1 file changed, 30 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
>> index b0a31c77e612..97edb38bf1ed 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.h
>> +++ b/drivers/gpu/drm/xe/xe_bo.h
>> @@ -478,4 +478,34 @@ static inline bool xe_bo_is_mem_type(struct xe_bo *bo, u32 mem_type)
>>   	xe_bo_assert_held(bo);
>>   	return bo->ttm.resource->mem_type == mem_type;
>>   }
>> +
>> +/**
>> + * xe_bo_is_shared_locked - Check if a buffer object is shared
>> + * @bo: The buffer object to check
>> + *
>> + * Determines if a buffer object is considered shared, which includes:
>> + * - Exported via dma-buf (obj->dma_buf is set)
>> + * - Imported from another driver (obj->import_attach is set)
>> + * - Referenced by multiple clients (handle_count > 1)
>> + *
>> + * This check is used to prevent data loss on shared content by avoiding
>> + * certain operations like purging on buffers that other processes or
>> + * drivers might still be using.
>> + *
>> + * Return: true if the buffer object is shared, false otherwise.
>> + */
>> +static inline bool xe_bo_is_shared_locked(const struct xe_bo *bo)
>> +{
>> +	const struct drm_gem_object *obj = &bo->ttm.base;
>> +
> It seems like everything below here should be a new drm gem helper.
There is a DRM helper 'drm_gem_object_is_shared_for_memory_stats()', but
it's specifically scoped for fdinfo memory accounting and doesn't check
import_attach.
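
A minimal sketch of what such a helper could look like in drm_gem.h; the
name and kernel doc are illustrative, not an existing API:

/**
 * drm_gem_object_is_shared - check if a GEM object is shared
 * @obj: The GEM object to check
 *
 * Sketch only: shared means exported or imported via dma-buf, or
 * reachable through more than one userspace handle.
 */
static inline bool drm_gem_object_is_shared(const struct drm_gem_object *obj)
{
	dma_resv_assert_held(obj->resv);

	return obj->dma_buf || obj->import_attach || obj->handle_count > 1;
}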
>> +	dma_resv_assert_held(obj->resv);
>> +
>> +	if (obj->dma_buf || obj->import_attach)
>> +		return true;
>> +
>> +	if (obj->handle_count > 1)
> So this covers the case when we prime fd to handle but we resolve to a
> BO (i.e., we don't do a dma-buf attach, rather just take a reference on
> the BO as the BO is from the same device)? I just want to make sure I'm
> understanding this part correctly. If so, maybe throw a comment in here
> or update the function's kernel doc a bit with a better explanation.
Yes, that's correct! The handle_count > 1 check covers exactly that
scenario: when we do prime fd-to-handle and both processes are using the
same xe device, we don't do a dma-buf attach. Instead, we just increment
the reference count and handle_count on the same xe_bo.
I will update the function's kernel doc with comments, e.g. along the
lines of the sketch below.
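
One way the promised comment could read (wording illustrative):

	/*
	 * handle_count > 1 also covers same-device PRIME sharing: an
	 * fd-to-handle import on the same xe device resolves to the
	 * existing BO, so no dma-buf attach is created; it only takes an
	 * extra reference and bumps handle_count.
	 */
	if (obj->handle_count > 1)
		return true;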

~Arvind
> Matt
>
>> +		return true;
>> +
>> +	return false;
>> +}
>>   #endif
>> -- 
>> 2.43.0
>>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 4/9] drm/xe/madvise: Implement purgeable buffer object support
  2025-12-02  1:46   ` Matthew Brost
@ 2025-12-02  4:01     ` Yadav, Arvind
  0 siblings, 0 replies; 35+ messages in thread
From: Yadav, Arvind @ 2025-12-02  4:01 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra



On 02-12-2025 07:16, Matthew Brost wrote:
> On Mon, Dec 01, 2025 at 11:20:14AM +0530, Arvind Yadav wrote:
>> This allows userspace applications to provide memory usage hints to
>> the kernel for better memory management under pressure.
>>
>> Add the core implementation for purgeable buffer objects, enabling memory
>> reclamation of user-designated DONTNEED buffers during eviction.
>>
>> This patch implements the purge operation and state machine transitions:
>>
>> Purgeable States (from xe_madv_purgeable_state):
>>   - WILLNEED (0): BO should be retained, actively used
>>   - DONTNEED (1): BO eligible for purging, not currently needed
> Quick comment - should we use TTM priority levels so that WILLNEED is a
> higher priority (less likely to be evicted) than DONTNEED (more likely
> to be evicted)?
>
> Expect more comments; this is just a quick thought.


Yes, we should leverage TTM priority levels for better eviction ordering.
TTM keeps separate LRU lists per priority (man->lru[TTM_MAX_BO_PRIORITY]),
and eviction walks those lists starting from the lowest priority. The plan
(see the sketch below):
  1. Set DONTNEED BOs to priority 0 (evicted first, before normal BOs)
  2. Keep WILLNEED BOs at priority XE_BO_PRIORITY_NORMAL (normal eviction
     order)
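
A minimal sketch of that plan, assuming current TTM (per-priority LRU
lists, bo->priority, ttm_bo_move_to_lru_tail() under the LRU lock); the
helper name and XE_BO_PRIORITY_NORMAL are illustrative:

static void xe_bo_update_lru_priority(struct xe_bo *bo, int madv)
{
	struct ttm_buffer_object *ttm_bo = &bo->ttm;

	spin_lock(&ttm_bo->bdev->lru_lock);
	/* Priority 0 LRU lists are scanned first on eviction. */
	ttm_bo->priority = (madv == XE_MADV_PURGEABLE_DONTNEED) ?
		0 : XE_BO_PRIORITY_NORMAL;
	ttm_bo_move_to_lru_tail(ttm_bo);
	spin_unlock(&ttm_bo->bdev->lru_lock);
}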

~Arvind

> Matt
>
>>   - PURGED (2): BO backing store reclaimed, permanently invalid
>>
>> Design Rationale:
>>    - Async TLB invalidation via trigger_rebind (no blocking xe_vm_invalidate_vma)
>>    - i915 compatibility: retained field, "once purged always purged" semantics
>>    - Shared BO protection prevents multi-process memory corruption
>>    - Scratch PTE reuse avoids new infrastructure, safe for fault mode
>>
>> v2:
>>    - Use xe_bo_trigger_rebind() for async TLB invalidation (Thomas Hellström)
>>    - Add NULL rebind with scratch PTEs for fault mode (Thomas Hellström)
>>    - Implement i915-compatible retained field logic (Thomas Hellström)
>>    - Skip BO validation for purged BOs in page fault handler (crash fix)
>>    - Add scratch VM check in page fault path (non-scratch VMs fail fault)
>>    - Force clear_pt for non-scratch VMs to avoid phys addr 0 mapping (review fix)
>>    - Add !is_purged check to resource cursor setup to prevent stale access
>>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_bo.c           | 72 ++++++++++++++++++++++-----
>>   drivers/gpu/drm/xe/xe_gt_pagefault.c | 19 ++++++++
>>   drivers/gpu/drm/xe/xe_pt.c           | 36 ++++++++++++--
>>   drivers/gpu/drm/xe/xe_vm.c           | 11 ++++-
>>   drivers/gpu/drm/xe/xe_vm_madvise.c   | 73 ++++++++++++++++++++++++++++
>>   5 files changed, 193 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>> index cbc3ee157218..f0b3f7a13114 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.c
>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>> @@ -836,6 +836,53 @@ static int xe_bo_move_notify(struct xe_bo *bo,
>>   	return 0;
>>   }
>>   
>> +static void xe_bo_set_purged(struct xe_bo *bo)
>> +{
>> +	/* BO must be locked before modifying madv state */
>> +	dma_resv_assert_held(bo->ttm.base.resv);
>> +
>> +	atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_PURGED);
>> +}
>> +
>> +/**
>> + * xe_ttm_bo_purge() - Purge buffer object backing store
>> + * @ttm_bo: The TTM buffer object to purge
>> + * @ctx: TTM operation context
>> + *
>> + * This function purges the backing store of a BO marked as DONTNEED and
>> + * triggers rebind to invalidate stale GPU mappings. For fault-mode VMs,
>> + * this zaps the PTEs. The next GPU access will trigger a page fault and
>> + * perform NULL rebind (scratch pages or clear PTEs based on VM config).
>> + */
>> +static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
>> +{
>> +	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
>> +	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
>> +
>> +	if (ttm_bo->ttm) {
>> +		struct ttm_placement place = {};
>> +		int ret = ttm_bo_validate(ttm_bo, &place, ctx);
>> +
>> +		drm_WARN_ON(&xe->drm, ret);
>> +		if (!ret && bo) {
>> +			if (atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_DONTNEED) {
>> +				xe_bo_set_purged(bo);
>> +
>> +				/*
>> +				 * Trigger rebind to invalidate stale GPU mappings.
>> +				 * - Non-fault mode: Marks VMAs for rebind
>> +				 * - Fault mode: Zaps PTEs (sets to 0), next access triggers fault
>> +				 *   and NULL rebind with scratch/clear PTEs per VM config
>> +				 */
>> +				ret = xe_bo_trigger_rebind(xe, bo, ctx);
>> +				if (ret)
>> +					drm_warn(&xe->drm,
>> +						 "Failed to invalidate purged BO: %d\n", ret);
>> +			}
>> +		}
>> +	}
>> +}
>> +
>>   static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>>   		      struct ttm_operation_ctx *ctx,
>>   		      struct ttm_resource *new_mem,
>> @@ -853,8 +900,18 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>>   	bool needs_clear;
>>   	bool handle_system_ccs = (!IS_DGFX(xe) && xe_bo_needs_ccs_pages(bo) &&
>>   				  ttm && ttm_tt_is_populated(ttm)) ? true : false;
>> +	int state = atomic_read(&bo->madv_purgeable);
>>   	int ret = 0;
>>   
>> +	/*
>> +	 * Purge only non-shared BOs explicitly marked DONTNEED by userspace.
>> +	 * The move_notify callback will handle invalidation asynchronously.
>> +	 */
>> +	if (evict && state == XE_MADV_PURGEABLE_DONTNEED && !xe_bo_is_shared_locked(bo)) {
>> +		xe_ttm_bo_purge(ttm_bo, ctx);
>> +		return 0;
>> +	}
>> +
>>   	/* Bo creation path, moving to system or TT. */
>>   	if ((!old_mem && ttm) && !handle_system_ccs) {
>>   		if (new_mem->mem_type == XE_PL_TT)
>> @@ -1606,18 +1663,6 @@ static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
>>   	}
>>   }
>>   
>> -static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
>> -{
>> -	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
>> -
>> -	if (ttm_bo->ttm) {
>> -		struct ttm_placement place = {};
>> -		int ret = ttm_bo_validate(ttm_bo, &place, ctx);
>> -
>> -		drm_WARN_ON(&xe->drm, ret);
>> -	}
>> -}
>> -
>>   static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo)
>>   {
>>   	struct ttm_operation_ctx ctx = {
>> @@ -2202,6 +2247,9 @@ struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
>>   #endif
>>   	INIT_LIST_HEAD(&bo->vram_userfault_link);
>>   
>> +	/* Initialize purge advisory state */
>> +	atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
>> +
>>   	drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
>>   
>>   	if (resv) {
>> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
>> index a054d6010ae0..8c7e5dcb627b 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
>> @@ -87,6 +87,13 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
>>   	if (!bo)
>>   		return 0;
>>   
>> +	/*
>> +	 * Skip validation/migration for purged BOs - they have no backing pages.
>> +	 * Rebind will use scratch PTEs instead.
>> +	 */
>> +	if (xe_bo_is_purged(bo))
>> +		return 0;
>> +
>>   	return need_vram_move ? xe_bo_migrate(bo, vram->placement, NULL, exec) :
>>   		xe_bo_validate(bo, vm, true, exec);
>>   }
>> @@ -100,9 +107,21 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
>>   	struct drm_exec exec;
>>   	struct dma_fence *fence;
>>   	int err, needs_vram;
>> +	struct xe_bo *bo;
>>   
>>   	lockdep_assert_held_write(&vm->lock);
>>   
>> +	/*
>> +	 * Check if BO is purged. For purged BOs:
>> +	 * - Scratch VMs: Allow rebind with scratch PTEs (safe zero reads)
>> +	 * - Non-scratch VMs: FAIL the page fault (no scratch page available)
>> +	 */
>> +	bo = xe_vma_bo(vma);
>> +	if (bo && xe_bo_is_purged(bo)) {
>> +		if (!xe_vm_has_scratch(vm))
>> +			return -EACCES;
>> +	}
>> +
>>   	needs_vram = xe_vma_need_vram_for_atomic(vm->xe, vma, atomic);
>>   	if (needs_vram < 0 || (needs_vram && xe_vma_is_userptr(vma)))
>>   		return needs_vram < 0 ? needs_vram : -EACCES;
>> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
>> index d22fd1ccc0ba..062f64b16a58 100644
>> --- a/drivers/gpu/drm/xe/xe_pt.c
>> +++ b/drivers/gpu/drm/xe/xe_pt.c
>> @@ -533,20 +533,26 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>>   	/* Is this a leaf entry ?*/
>>   	if (level == 0 || xe_pt_hugepte_possible(addr, next, level, xe_walk)) {
>>   		struct xe_res_cursor *curs = xe_walk->curs;
>> +		struct xe_bo *bo = xe_vma_bo(xe_walk->vma);
>>   		bool is_null = xe_vma_is_null(xe_walk->vma);
>> -		bool is_vram = is_null ? false : xe_res_is_vram(curs);
>> +		bool is_purged = bo && xe_bo_is_purged(bo);
>> +		bool is_vram = (is_null || is_purged) ? false : xe_res_is_vram(curs);
>>   
>>   		XE_WARN_ON(xe_walk->va_curs_start != addr);
>>   
>>   		if (xe_walk->clear_pt) {
>>   			pte = 0;
>>   		} else {
>> -			pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
>> +			/*
>> +			 * For purged BOs, treat like null VMAs - pass address 0.
>> +			 * The pte_encode_vma will set XE_PTE_NULL flag for scratch mapping.
>> +			 */
>> +			pte = vm->pt_ops->pte_encode_vma((is_null || is_purged) ? 0 :
>>   							 xe_res_dma(curs) +
>>   							 xe_walk->dma_offset,
>>   							 xe_walk->vma,
>>   							 pat_index, level);
>> -			if (!is_null)
>> +			if (!is_null && !is_purged)
>>   				pte |= is_vram ? xe_walk->default_vram_pte :
>>   					xe_walk->default_system_pte;
>>   
>> @@ -570,7 +576,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>>   		if (unlikely(ret))
>>   			return ret;
>>   
>> -		if (!is_null && !xe_walk->clear_pt)
>> +		if (!is_null && !is_purged && !xe_walk->clear_pt)
>>   			xe_res_next(curs, next - addr);
>>   		xe_walk->va_curs_start = next;
>>   		xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << level);
>> @@ -723,6 +729,26 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>>   	};
>>   	struct xe_pt *pt = vm->pt_root[tile->id];
>>   	int ret;
>> +	bool is_purged = false;
>> +
>> +	/*
>> +	 * Check if BO is purged:
>> +	 * - Scratch VMs: Use scratch PTEs (XE_PTE_NULL) for safe zero reads
>> +	 * - Non-scratch VMs: Clear PTEs to zero (non-present) to avoid mapping to phys addr 0
>> +	 *
>> +	 * For non-scratch VMs, we force clear_pt=true so leaf PTEs become completely
>> +	 * zero instead of creating a PRESENT mapping to physical address 0.
>> +	 */
>> +	if (bo && xe_bo_is_purged(bo)) {
>> +		is_purged = true;
>> +
>> +		/*
>> +		 * For non-scratch VMs, a NULL rebind should use zero PTEs
>> +		 * (non-present), not a present PTE to phys 0.
>> +		 */
>> +		if (!xe_vm_has_scratch(vm))
>> +			xe_walk.clear_pt = true;
>> +	}
>>   
>>   	if (range) {
>>   		/* Move this entire thing to xe_svm.c? */
>> @@ -762,7 +788,7 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>>   	if (!range)
>>   		xe_bo_assert_held(bo);
>>   
>> -	if (!xe_vma_is_null(vma) && !range) {
>> +	if (!xe_vma_is_null(vma) && !range && !is_purged) {
>>   		if (xe_vma_is_userptr(vma))
>>   			xe_res_first_dma(to_userptr_vma(vma)->userptr.pages.dma_addr, 0,
>>   					 xe_vma_size(vma), &curs);
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index 10d77666a425..d03e69524369 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -1336,6 +1336,9 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>>   static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>>   			       u16 pat_index, u32 pt_level)
>>   {
>> +	struct xe_bo *bo = xe_vma_bo(vma);
>> +	struct xe_vm *vm = xe_vma_vm(vma);
>> +
>>   	pte |= XE_PAGE_PRESENT;
>>   
>>   	if (likely(!xe_vma_read_only(vma)))
>> @@ -1344,7 +1347,13 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>>   	pte |= pte_encode_pat_index(pat_index, pt_level);
>>   	pte |= pte_encode_ps(pt_level);
>>   
>> -	if (unlikely(xe_vma_is_null(vma)))
>> +	/*
>> +	 * NULL PTEs redirect to scratch page (return zeros on read).
>> +	 * Set for: 1) explicit null VMAs, 2) purged BOs on scratch VMs.
>> +	 * Never set NULL flag without scratch page - causes undefined behavior.
>> +	 */
>> +	if (unlikely(xe_vma_is_null(vma) ||
>> +		     (bo && xe_bo_is_purged(bo) && xe_vm_has_scratch(vm))))
>>   		pte |= XE_PTE_NULL;
>>   
>>   	return pte;
>> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
>> index cad3cf627c3f..3ba851e0b870 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
>> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
>> @@ -158,6 +158,60 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
>>   	}
>>   }
>>   
>> +/*
>> + * Handle purgeable buffer object advice for DONTNEED/WILLNEED/PURGED.
>> + * Updates op->purge_state_val.retained to indicate if backing store
>> + * exists (matches i915's retained).
>> + */
>> +static void xe_vm_madvise_purgeable_bo(struct xe_device *xe, struct xe_vm *vm,
>> +				       struct xe_vma **vmas, int num_vmas,
>> +				       struct drm_xe_madvise *op)
>> +{
>> +	bool has_purged_bo = false;
>> +	int i;
>> +
>> +	xe_assert(vm->xe, op->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE);
>> +
>> +	for (i = 0; i < num_vmas; i++) {
>> +		struct xe_bo *bo = xe_vma_bo(vmas[i]);
>> +
>> +		if (!bo)
>> +			continue;
>> +
>> +		/* BO must be locked before modifying madv state */
>> +		dma_resv_assert_held(bo->ttm.base.resv);
>> +
>> +		/*
>> +		 * Once purged, always purged. Cannot transition back to WILLNEED.
>> +		 * This matches i915 semantics where purged BOs are permanently invalid.
>> +		 */
>> +		if (xe_bo_is_purged(bo)) {
>> +			has_purged_bo = true;
>> +			continue;
>> +		}
>> +
>> +		switch (op->purge_state_val.val) {
>> +		case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
>> +			atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
>> +			break;
>> +		case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
>> +			if (!xe_bo_is_shared_locked(bo))
>> +				atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_DONTNEED);
>> +			break;
>> +		default:
>> +			drm_warn(&vm->xe->drm, "Invalid madvise value = %d\n",
>> +				 op->purge_state_val.val);
>> +			return;
>> +		}
>> +	}
>> +
>> +	/*
>> +	 * Set retained flag to indicate if backing store still exists.
>> +	 * Matches i915: retained = 1 if not purged, 0 if purged.
>> +	 */
>> +	op->purge_state_val.retained = !has_purged_bo;
>> +}
>> +
>>   typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
>>   			     struct xe_vma **vmas, int num_vmas,
>>   			     struct drm_xe_madvise *op);
>> @@ -283,6 +337,19 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
>>   			return false;
>>   		break;
>>   	}
>> +	case DRM_XE_VMA_ATTR_PURGEABLE_STATE:
>> +	{
>> +		u32 val = args->purge_state_val.val;
>> +
>> +		if (XE_IOCTL_DBG(xe, !((val == DRM_XE_VMA_PURGEABLE_STATE_WILLNEED) ||
>> +				       (val == DRM_XE_VMA_PURGEABLE_STATE_DONTNEED))))
>> +			return false;
>> +
>> +		if (XE_IOCTL_DBG(xe, args->purge_state_val.reserved))
>> +			return false;
>> +
>> +		break;
>> +	}
>>   	default:
>>   		if (XE_IOCTL_DBG(xe, 1))
>>   			return false;
>> @@ -402,6 +469,12 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
>>   					goto err_fini;
>>   			}
>>   		}
>> +		if (args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE) {
>> +			xe_vm_madvise_purgeable_bo(xe, vm, madvise_range.vmas,
>> +						   madvise_range.num_vmas, args);
>> +			goto err_fini;
>> +
>> +		}
>>   	}
>>   
>>   	if (madvise_range.has_svm_userptr_vmas) {
>> -- 
>> 2.43.0
>>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 3/9] drm/xe/bo: Prevent purging of shared buffer objects
  2025-12-02  3:42     ` Yadav, Arvind
@ 2025-12-02  9:42       ` Thomas Hellström
  2025-12-02 15:17         ` Matthew Brost
  0 siblings, 1 reply; 35+ messages in thread
From: Thomas Hellström @ 2025-12-02  9:42 UTC (permalink / raw)
  To: Yadav, Arvind, Matthew Brost
  Cc: intel-xe, himal.prasad.ghimiray, pallavi.mishra

On Tue, 2025-12-02 at 09:12 +0530, Yadav, Arvind wrote:
> 
> On 02-12-2025 04:40, Matthew Brost wrote:
> > On Mon, Dec 01, 2025 at 11:20:13AM +0530, Arvind Yadav wrote:
> > > Introduce the `xe_bo_is_shared_locked()` inline helper to determine
> > > if a buffer object is shared across multiple clients or drivers. A
> > > buffer is considered shared if it is exported via dma-buf, imported,
> > > or has a handle count greater than one.
> > > 
> > > This check is critical for safely implementing purgeable memory.
> > > Purging a buffer that is shared would lead to data corruption for
> > > other clients that still hold a reference to it.
> > > 
> > > The kernel cannot safely determine when all clients are done with a
> > > shared buffer, so shared BOs must never be marked DONTNEED or purged.
> > > 
> > > The new helper is used in two key locations:
> > > 1.  In `xe_vm_madvise_purgeable_bo()`, to prevent userspace from
> > >      successfully marking a shared buffer as `DONTNEED`. This is the
> > >      primary safeguard against incorrect usage.
> > > 
> > > 2.  In `xe_bo_move()`, as a final safety check before the kernel
> > >      initiates a purge during eviction. This ensures that even if a
> > >      shared buffer were somehow marked `DONTNEED`, it would not be
> > >      purged.
> > > 
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> > > Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> > > ---
> > >   drivers/gpu/drm/xe/xe_bo.h | 30 ++++++++++++++++++++++++++++++
> > >   1 file changed, 30 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > > index b0a31c77e612..97edb38bf1ed 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.h
> > > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > > @@ -478,4 +478,34 @@ static inline bool xe_bo_is_mem_type(struct xe_bo *bo, u32 mem_type)
> > >   	xe_bo_assert_held(bo);
> > >   	return bo->ttm.resource->mem_type == mem_type;
> > >   }
> > > +
> > > +/**
> > > + * xe_bo_is_shared_locked - Check if a buffer object is shared
> > > + * @bo: The buffer object to check
> > > + *
> > > + * Determines if a buffer object is considered shared, which includes:
> > > + * - Exported via dma-buf (obj->dma_buf is set)
> > > + * - Imported from another driver (obj->import_attach is set)
> > > + * - Referenced by multiple clients (handle_count > 1)
> > > + *
> > > + * This check is used to prevent data loss on shared content by
> > > + * avoiding certain operations like purging on buffers that other
> > > + * processes or drivers might still be using.
> > > + *
> > > + * Return: true if the buffer object is shared, false otherwise.
> > > + */
> > > +static inline bool xe_bo_is_shared_locked(const struct xe_bo *bo)
> > > +{
> > > +	const struct drm_gem_object *obj = &bo->ttm.base;
> > > +
> > It seems like everything below here should be a new drm gem helper.
> There is a DRM helper 'drm_gem_object_is_shared_for_memory_stats()', but
> it's specifically scoped for fdinfo memory accounting and doesn't check
> import_attach.
> > > +	dma_resv_assert_held(obj->resv);
> > > +
> > > +	if (obj->dma_buf || obj->import_attach)
> > > +		return true;
> > > +
> > > +	if (obj->handle_count > 1)
> > So this covers the case when we prime fd to handle but we resolve to a
> > BO (i.e., we don't do a dma-buf attach, rather just take a reference on
> > the BO as the BO is from the same device)? I just want to make sure I'm
> > understanding this part correctly. If so, maybe throw a comment in here
> > or update the function's kernel doc a bit with a better explanation.
> Yes, that's correct! The handle_count > 1 check covers exactly that
> scenario: when we do prime fd-to-handle and both processes are using the
> same xe device, we don't do a dma-buf attach. Instead, we just increment
> the reference count and handle_count on the same xe_bo.
> I will update the function's kernel doc with comments.

Arvind, Matt

I think the correct way to check purgeability support for shared buffers
is to loop over all vmas attached to the bo and check that they all say
DONTNEED. If they don't, the bo is not purgeable. This will also need a
check at VMA unbinding.
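
A hypothetical sketch of that check, reusing the drm_gpuvm iterators the
driver already uses when walking a BO's mappings; the per-VMA DONTNEED
attribute and helper name are illustrative:

static bool xe_bo_all_vmas_dontneed(struct xe_bo *bo)
{
	struct drm_gem_object *obj = &bo->ttm.base;
	struct drm_gpuvm_bo *vm_bo;
	struct drm_gpuva *gpuva;

	dma_resv_assert_held(obj->resv);

	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
		drm_gpuvm_bo_for_each_va(gpuva, vm_bo) {
			/* Illustrative per-VMA madvise attribute */
			if (!xe_vma_madv_dontneed(gpuva_to_vma(gpuva)))
				return false;
		}
	}

	return true;
}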

/Thomas






> 
> ~Arvind
> > Matt
> > 
> > > +		return true;
> > > +
> > > +	return false;
> > > +}
> > >   #endif
> > > -- 
> > > 2.43.0
> > > 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 3/9] drm/xe/bo: Prevent purging of shared buffer objects
  2025-12-02  9:42       ` Thomas Hellström
@ 2025-12-02 15:17         ` Matthew Brost
  2025-12-02 18:22           ` Yadav, Arvind
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Brost @ 2025-12-02 15:17 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Yadav, Arvind, intel-xe, himal.prasad.ghimiray, pallavi.mishra

On Tue, Dec 02, 2025 at 10:42:07AM +0100, Thomas Hellström wrote:
> On Tue, 2025-12-02 at 09:12 +0530, Yadav, Arvind wrote:
> > 
> > On 02-12-2025 04:40, Matthew Brost wrote:
> > > On Mon, Dec 01, 2025 at 11:20:13AM +0530, Arvind Yadav wrote:
> > > > Introduce the `xe_bo_is_shared_locked()` inline helper to determine
> > > > if a buffer object is shared across multiple clients or drivers. A
> > > > buffer is considered shared if it is exported via dma-buf, imported,
> > > > or has a handle count greater than one.
> > > > 
> > > > This check is critical for safely implementing purgeable memory.
> > > > Purging a buffer that is shared would lead to data corruption for
> > > > other clients that still hold a reference to it.
> > > > 
> > > > The kernel cannot safely determine when all clients are done with a
> > > > shared buffer, so shared BOs must never be marked DONTNEED or purged.
> > > > 
> > > > The new helper is used in two key locations:
> > > > 1.  In `xe_vm_madvise_purgeable_bo()`, to prevent userspace from
> > > >      successfully marking a shared buffer as `DONTNEED`. This is the
> > > >      primary safeguard against incorrect usage.
> > > > 
> > > > 2.  In `xe_bo_move()`, as a final safety check before the kernel
> > > >      initiates a purge during eviction. This ensures that even if a
> > > >      shared buffer were somehow marked `DONTNEED`, it would not be
> > > >      purged.
> > > > 
> > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> > > > Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> > > > ---
> > > >   drivers/gpu/drm/xe/xe_bo.h | 30 ++++++++++++++++++++++++++++++
> > > >   1 file changed, 30 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > > > index b0a31c77e612..97edb38bf1ed 100644
> > > > --- a/drivers/gpu/drm/xe/xe_bo.h
> > > > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > > > @@ -478,4 +478,34 @@ static inline bool xe_bo_is_mem_type(struct xe_bo *bo, u32 mem_type)
> > > >   	xe_bo_assert_held(bo);
> > > >   	return bo->ttm.resource->mem_type == mem_type;
> > > >   }
> > > > +
> > > > +/**
> > > > + * xe_bo_is_shared_locked - Check if a buffer object is shared
> > > > + * @bo: The buffer object to check
> > > > + *
> > > > + * Determines if a buffer object is considered shared, which includes:
> > > > + * - Exported via dma-buf (obj->dma_buf is set)
> > > > + * - Imported from another driver (obj->import_attach is set)
> > > > + * - Referenced by multiple clients (handle_count > 1)
> > > > + *
> > > > + * This check is used to prevent data loss on shared content by
> > > > + * avoiding certain operations like purging on buffers that other
> > > > + * processes or drivers might still be using.
> > > > + *
> > > > + * Return: true if the buffer object is shared, false otherwise.
> > > > + */
> > > > +static inline bool xe_bo_is_shared_locked(const struct xe_bo *bo)
> > > > +{
> > > > +	const struct drm_gem_object *obj = &bo->ttm.base;
> > > > +
> > > It seems like everything below here should be a new drm gem helper.
> > There is a DRM helper 'drm_gem_object_is_shared_for_memory_stats()', but
> > it's specifically scoped for fdinfo memory accounting and doesn't check
> > import_attach.
> > > > +	dma_resv_assert_held(obj->resv);
> > > > +
> > > > +	if (obj->dma_buf || obj->import_attach)
> > > > +		return true;
> > > > +
> > > > +	if (obj->handle_count > 1)
> > > So this covers the case when we prime fd to handle but we resolve to a
> > > BO (i.e., we don't do a dma-buf attach, rather just take a reference on
> > > the BO as the BO is from the same device)? I just want to make sure I'm
> > > understanding this part correctly. If so, maybe throw a comment in here
> > > or update the function's kernel doc a bit with a better explanation.
> > Yes, that's correct! The handle_count > 1 check covers exactly that
> > scenario: when we do prime fd-to-handle and both processes are using the
> > same xe device, we don't do a dma-buf attach. Instead, we just increment
> > the reference count and handle_count on the same xe_bo.
> > I will update the function's kernel doc with comments.
> 
> Arvind, Matt
> 
> I think the correct way to check purgeability support for shared buffers
> is to loop over all vmas attached to the bo and check that they all say
> DONTNEED. If they don't, the bo is not purgeable. This will also need a
> check at VMA unbinding.
> 

I think this makes sense. I haven't fully gotten through this series
yet, but will consider this during the code reviews. I probably should
apply this series in full before providing feedback.

Matt 

> /Thomas
> 
> 
> 
> 
> 
> 
> > 
> > ~Arvind
> > > Matt
> > > 
> > > > +		return true;
> > > > +
> > > > +	return false;
> > > > +}
> > > >   #endif
> > > > -- 
> > > > 2.43.0
> > > > 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 3/9] drm/xe/bo: Prevent purging of shared buffer objects
  2025-12-02 15:17         ` Matthew Brost
@ 2025-12-02 18:22           ` Yadav, Arvind
  2025-12-02 18:35             ` Matthew Brost
  0 siblings, 1 reply; 35+ messages in thread
From: Yadav, Arvind @ 2025-12-02 18:22 UTC (permalink / raw)
  To: Matthew Brost, Thomas Hellström
  Cc: intel-xe, himal.prasad.ghimiray, pallavi.mishra


On 02-12-2025 20:47, Matthew Brost wrote:
> On Tue, Dec 02, 2025 at 10:42:07AM +0100, Thomas Hellström wrote:
>> On Tue, 2025-12-02 at 09:12 +0530, Yadav, Arvind wrote:
>>> On 02-12-2025 04:40, Matthew Brost wrote:
>>>> On Mon, Dec 01, 2025 at 11:20:13AM +0530, Arvind Yadav wrote:
>>>>> Introduce the `xe_bo_is_shared_locked()` inline helper to determine
>>>>> if a buffer object is shared across multiple clients or drivers. A
>>>>> buffer is considered shared if it is exported via dma-buf, imported,
>>>>> or has a handle count greater than one.
>>>>>
>>>>> This check is critical for safely implementing purgeable memory.
>>>>> Purging a buffer that is shared would lead to data corruption for
>>>>> other clients that still hold a reference to it.
>>>>>
>>>>> The kernel cannot safely determine when all clients are done with a
>>>>> shared buffer, so shared BOs must never be marked DONTNEED or purged.
>>>>>
>>>>> The new helper is used in two key locations:
>>>>> 1.  In `xe_vm_madvise_purgeable_bo()`, to prevent userspace from
>>>>>       successfully marking a shared buffer as `DONTNEED`. This is the
>>>>>       primary safeguard against incorrect usage.
>>>>>
>>>>> 2.  In `xe_bo_move()`, as a final safety check before the kernel
>>>>>       initiates a purge during eviction. This ensures that even if a
>>>>>       shared buffer were somehow marked `DONTNEED`, it would not be
>>>>>       purged.
>>>>>
>>>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>>>> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/xe/xe_bo.h | 30 ++++++++++++++++++++++++++++++
>>>>>    1 file changed, 30 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_bo.h
>>>>> b/drivers/gpu/drm/xe/xe_bo.h
>>>>> index b0a31c77e612..97edb38bf1ed 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_bo.h
>>>>> +++ b/drivers/gpu/drm/xe/xe_bo.h
>>>>> @@ -478,4 +478,34 @@ static inline bool xe_bo_is_mem_type(struct xe_bo *bo, u32 mem_type)
>>>>>    	xe_bo_assert_held(bo);
>>>>>    	return bo->ttm.resource->mem_type == mem_type;
>>>>>    }
>>>>> +
>>>>> +/**
>>>>> + * xe_bo_is_shared_locked - Check if a buffer object is shared
>>>>> + * @bo: The buffer object to check
>>>>> + *
>>>>> + * Determines if a buffer object is considered shared, which includes:
>>>>> + * - Exported via dma-buf (obj->dma_buf is set)
>>>>> + * - Imported from another driver (obj->import_attach is set)
>>>>> + * - Referenced by multiple clients (handle_count > 1)
>>>>> + *
>>>>> + * This check is used to prevent data loss on shared content by avoiding
>>>>> + * certain operations like purging on buffers that other processes or
>>>>> + * drivers might still be using.
>>>>> + *
>>>>> + * Return: true if the buffer object is shared, false otherwise.
>>>>> + */
>>>>> +static inline bool xe_bo_is_shared_locked(const struct xe_bo *bo)
>>>>> +{
>>>>> +	const struct drm_gem_object *obj = &bo->ttm.base;
>>>>> +
>>>> It seems like everything below here should be a new drm gem helper.
>>> There is a DRM helper 'drm_gem_object_is_shared_for_memory_stats()',
>>> but it's specifically scoped for fdinfo memory accounting and doesn't
>>> check import_attach.
>>>>> +	dma_resv_assert_held(obj->resv);
>>>>> +
>>>>> +	if (obj->dma_buf || obj->import_attach)
>>>>> +		return true;
>>>>> +
>>>>> +	if (obj->handle_count > 1)
>>>> So this covers the case when we prime fd to handle but we resolve to
>>>> a BO (i.e., we don't do a dma-buf attach, rather just take a reference
>>>> on the BO as the BO is from the same device)? I just want to make sure
>>>> I'm understanding this part correctly. If so, maybe throw a comment in
>>>> here or update the function's kernel doc a bit with a better
>>>> explanation.
>>> Yes, that's correct! The handle_count > 1 check covers exactly that
>>> scenario: when we do prime fd-to-handle but both processes are using the
>>> same xe device, we don't do a dma-buf attach. Instead, we just increment
>>> the reference count and handle_count on the same xe_bo.
>>> I will add and update the function with comments.
>> Arvind, Matt
>>
>> I think the correct way to check purgeability support for shared buffers
>> is to loop over all VMAs attached to the BO and check that they all say
>> DONTNEED. If they don't, the BO is not purgeable. This will also need a
>> check at VMA unbinding.
>>
> I think this makes sense. I haven't fully gotten through this series
> yet, but will consider this during the code reviews. I probably should
> apply this series in full before providing feedback.

Thomas, Matt

Agreed. I have reworked the purgeability logic exactly as you suggested.
Instead of relying on a “shared BO” helper, the driver now keeps a
per-VMA purgeable_state, and a BO becomes DONTNEED only when all VMAs
attached to it report DONTNEED.
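
For reference, a minimal sketch of that all-VMAs check. The per-VMA
purgeable_state field and the helper name are illustrative only (they
are what this rework would add, not the final patch); the drm_gpuvm
iterators and gpuva_to_vma() are the existing upstream ones:

static bool xe_bo_all_vmas_dontneed(struct xe_bo *bo)
{
	struct drm_gem_object *obj = &bo->ttm.base;
	struct drm_gpuvm_bo *vm_bo;
	struct drm_gpuva *gpuva;

	/* List walk relies on the BO's dma-resv being held */
	xe_bo_assert_held(bo);

	/*
	 * Imported/exported dma-bufs can never be purged: an external
	 * device may still be using the pages.
	 */
	if (obj->dma_buf || obj->import_attach)
		return false;

	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
		drm_gpuvm_bo_for_each_va(gpuva, vm_bo) {
			struct xe_vma *vma = gpuva_to_vma(gpuva);

			if (vma->purgeable_state !=
			    DRM_XE_VMA_PURGEABLE_STATE_DONTNEED)
				return false;
		}
	}

	/* Every mapping agrees, so the BO may flip to DONTNEED. */
	return true;
}

Per Thomas's point, VMA unbinding would need to re-run this (or clear
the BO state) so a stale DONTNEED does not linger after mappings go
away.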

~Arvind

> Matt
>
>> /Thomas
>>
>>
>>
>>
>>
>>
>>> ~Arvind
>>>> Matt
>>>>
>>>>> +		return true;
>>>>> +
>>>>> +	return false;
>>>>> +}
>>>>>    #endif
>>>>> -- 
>>>>> 2.43.0
>>>>>


* Re: [RFC v2 3/9] drm/xe/bo: Prevent purging of shared buffer objects
  2025-12-02 18:22           ` Yadav, Arvind
@ 2025-12-02 18:35             ` Matthew Brost
  0 siblings, 0 replies; 35+ messages in thread
From: Matthew Brost @ 2025-12-02 18:35 UTC (permalink / raw)
  To: Yadav, Arvind
  Cc: Thomas Hellström, intel-xe, himal.prasad.ghimiray,
	pallavi.mishra

On Tue, Dec 02, 2025 at 11:52:13PM +0530, Yadav, Arvind wrote:
> 
> On 02-12-2025 20:47, Matthew Brost wrote:
> > On Tue, Dec 02, 2025 at 10:42:07AM +0100, Thomas Hellström wrote:
> > > On Tue, 2025-12-02 at 09:12 +0530, Yadav, Arvind wrote:
> > > > On 02-12-2025 04:40, Matthew Brost wrote:
> > > > > On Mon, Dec 01, 2025 at 11:20:13AM +0530, Arvind Yadav wrote:
> > > > > > Introduce the `xe_bo_is_shared_locked()` inline helper to
> > > > > > determine if a buffer object is shared across multiple clients
> > > > > > or drivers. A buffer is considered shared if it is exported via
> > > > > > dma-buf, imported, or has a handle count greater than one.
> > > > > > 
> > > > > > This check is critical for safely implementing purgeable memory.
> > > > > > Purging a buffer that is shared would lead to data corruption
> > > > > > for other clients that still hold a reference to it.
> > > > > > 
> > > > > > The kernel cannot safely determine when all clients are done
> > > > > > with a shared buffer, so shared BOs must never be marked
> > > > > > DONTNEED or purged.
> > > > > > 
> > > > > > The new helper is used in two key locations:
> > > > > > 1.  In `xe_vm_madvise_purgeable_bo()`, to prevent userspace from
> > > > > >       successfully marking a shared buffer as `DONTNEED`. This
> > > > > >       is the primary safeguard against incorrect usage.
> > > > > > 
> > > > > > 2.  In `xe_bo_move()`, as a final safety check before the
> > > > > >       kernel initiates a purge during eviction. This ensures
> > > > > >       that even if a shared buffer were somehow marked
> > > > > >       `DONTNEED`, it would not be purged.
> > > > > > 
> > > > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > > > Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> > > > > > Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> > > > > > ---
> > > > > >    drivers/gpu/drm/xe/xe_bo.h | 30 ++++++++++++++++++++++++++++++
> > > > > >    1 file changed, 30 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_bo.h
> > > > > > b/drivers/gpu/drm/xe/xe_bo.h
> > > > > > index b0a31c77e612..97edb38bf1ed 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_bo.h
> > > > > > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > > > > > @@ -478,4 +478,34 @@ static inline bool xe_bo_is_mem_type(struct xe_bo *bo, u32 mem_type)
> > > > > >    	xe_bo_assert_held(bo);
> > > > > >    	return bo->ttm.resource->mem_type == mem_type;
> > > > > >    }
> > > > > > +
> > > > > > +/**
> > > > > > + * xe_bo_is_shared_locked - Check if a buffer object is shared
> > > > > > + * @bo: The buffer object to check
> > > > > > + *
> > > > > > + * Determines if a buffer object is considered shared, which includes:
> > > > > > + * - Exported via dma-buf (obj->dma_buf is set)
> > > > > > + * - Imported from another driver (obj->import_attach is set)
> > > > > > + * - Referenced by multiple clients (handle_count > 1)
> > > > > > + *
> > > > > > + * This check is used to prevent data loss on shared content by avoiding
> > > > > > + * certain operations like purging on buffers that other processes or
> > > > > > + * drivers might still be using.
> > > > > > + *
> > > > > > + * Return: true if the buffer object is shared, false otherwise.
> > > > > > + */
> > > > > > +static inline bool xe_bo_is_shared_locked(const struct xe_bo *bo)
> > > > > > +{
> > > > > > +	const struct drm_gem_object *obj = &bo->ttm.base;
> > > > > > +
> > > > > It seems like everything below here should be a new drm gem helper.
> > > > There is a DRM helper 'drm_gem_object_is_shared_for_memory_stats()',
> > > > but it's specifically scoped for fdinfo memory accounting and doesn't
> > > > check import_attach.
> > > > > > +	dma_resv_assert_held(obj->resv);
> > > > > > +
> > > > > > +	if (obj->dma_buf || obj->import_attach)
> > > > > > +		return true;
> > > > > > +
> > > > > > +	if (obj->handle_count > 1)
> > > > > So this covers the case when we prime fd to handle but we resolve
> > > > > to a BO (i.e., we don't do a dma-buf attach, rather just take a
> > > > > reference on the BO as the BO is from the same device)? I just
> > > > > want to make sure I'm understanding this part correctly. If so,
> > > > > maybe throw a comment in here or update the function's kernel doc
> > > > > a bit with a better explanation.
> > > > Yes, that's correct! The handle_count > 1 check covers exactly that
> > > > scenario: when we do prime fd-to-handle but both processes are using
> > > > the same xe device, we don't do a dma-buf attach. Instead, we just
> > > > increment the reference count and handle_count on the same xe_bo.
> > > > I will add and update the function with comments.
> > > Arvind, Matt
> > > 
> > > I think the correct way to check purgeability support for shared buffers
> > > is to loop over all VMAs attached to the BO and check that they all say
> > > DONTNEED. If they don't, the BO is not purgeable. This will also need a
> > > check at VMA unbinding.
> > > 
> > I think this makes sense. I haven't fully gotten through this series
> > yet, but will consider this during the code reviews. I probably should
> > apply this series in full before providing feedback.
> 
> Thomas, Matt
> 
> Agreed. I have reworked the purgeability logic exactly as you suggested.
> Instead of relying on a “shared BO” helper, the driver now keeps a per-VMA
> purgeable_state, and a BO becomes DONTNEED only when all VMAs attached to
> it report DONTNEED.
> 

+1, let me get through the rest of the series today and see if I spot
anything low hanging that should be fixed in the next rev.

Matt

> ~Arvind
> 
> > Matt
> > 
> > > /Thomas
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > > ~Arvind
> > > > > Matt
> > > > > 
> > > > > > +		return true;
> > > > > > +
> > > > > > +	return false;
> > > > > > +}
> > > > > >    #endif
> > > > > > -- 
> > > > > > 2.43.0
> > > > > > 


* Re: [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects
  2025-12-01  5:50 [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
                   ` (8 preceding siblings ...)
  2025-12-01  5:50 ` [RFC v2 9/9] drm/xe: Add support for querying purgeable BO states Arvind Yadav
@ 2025-12-02 18:36 ` Souza, Jose
  9 siblings, 0 replies; 35+ messages in thread
From: Souza, Jose @ 2025-12-02 18:36 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Yadav,  Arvind
  Cc: Brost, Matthew, Mishra, Pallavi, Ghimiray, Himal Prasad,
	thomas.hellstrom@linux.intel.com

On Mon, 2025-12-01 at 11:20 +0530, Arvind Yadav wrote:
> This patch series introduces comprehensive support for purgeable
> buffer objects
> in the Xe driver, enabling userspace to provide memory usage hints
> for better
> memory management under system pressure.
> 
> Overview:
> 
> Purgeable memory allows applications to mark buffer objects as "not
> currently
> needed" (DONTNEED), making them eligible for kernel reclamation
> during memory
> pressure. This helps prevent OOM conditions and enables more
> efficient GPU
> memory utilization for workloads with temporary or regeneratable data
> (caches,
> intermediate results, decoded frames, etc.).
> 
> Purgeable BO Lifecycle:
> 1. WILLNEED (default): BO actively needed, kernel preserves backing
> store
> 2. DONTNEED (user hint): BO contents discardable, eligible for
> purging
> 3. PURGED (kernel action): Backing store reclaimed during memory
> pressure
> 
> Key Design Principles:
>   - i915 compatibility: "Once purged, always purged" semantics -
> purged BOs
>      remain permanently invalid and must be destroyed/recreated
>   - Safety first: Only non-shared BOs can be marked DONTNEED to
> prevent
>     multi-process data corruption
>   - Multiple protection layers: Validation in madvise, VM bind, mmap,
> and
>     fault handlers
>   - Async TLB invalidation: Uses xe_bo_trigger_rebind() for non-
> blocking
>     GPU mapping invalidation
>   - Scratch PTE support: Fault-mode VMs use scratch pages for safe
> zero reads
>     on purged BO access
> 
> Error Handling:
>   - CPU access (mmap): Returns VM_FAULT_SIGBUS (SIGBUS signal to
> process)
>   - GPU access (non-scratch VM): Page fault fails with -EACCES, GPU
> context reset
>   - GPU access (scratch VM): Page fault succeeds, rebinds with
> scratch PTEs
>   - VM_BIND operations: MAP/PREFETCH rejected with -EINVAL
>   - Mmap offset ioctl: Rejected with -EINVAL for early error
> detection

uAPI and overall this lgtm, please let me know when you think this is
ready to be implemented on the Mesa side.

> 
> v2 Changes:
>   - Reordered patches: Moved shared BO helper before main
> implementation for
>     proper dependency order
>   - Fixed reference counting in mmap offset validation (use
> drm_gem_object_put)
>   - Removed incorrect claims about madvise(WILLNEED) restoring purged
> BOs
>   - Fixed error code documentation inconsistencies
>   - Initialize purge_state_val fields to prevent kernel memory leaks
>   - Use xe_bo_trigger_rebind() for async TLB invalidation (Thomas
> Hellström)
>   - Add NULL rebind with scratch PTEs for fault mode (Thomas
> Hellström)
>   - Implement i915-compatible retained field logic (Thomas Hellström)
>   - Skip BO validation for purged BOs in page fault handler (crash
> fix)
>   - Add scratch VM check in page fault path (non-scratch VMs fail
> fault) 
>  
> 
> Arvind Yadav (7):
>   drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo
>   drm/xe/bo: Prevent purging of shared buffer objects
>   drm/xe/madvise: Implement purgeable buffer object support
>   drm/xe/bo: Handle CPU faults on purged buffer objects
>   drm/xe/bo: Prevent mmap of purged buffer objects
>   drm/xe/vm: Prevent binding of purged buffer objects
>   drm/xe: Add support for querying purgeable BO states
> 
> Himal Prasad Ghimiray (2):
>   drm/xe/uapi: Add UAPI support for purgeable buffer objects
>   drm/xe/uapi: Add UAPI for purgeable bo state to madvise query
> response
> 
>  drivers/gpu/drm/xe/xe_bo.c           | 92 ++++++++++++++++++++++++----
>  drivers/gpu/drm/xe/xe_bo.h           | 57 +++++++++++++++++
>  drivers/gpu/drm/xe/xe_bo_types.h     |  3 +
>  drivers/gpu/drm/xe/xe_gt_pagefault.c | 19 ++++++
>  drivers/gpu/drm/xe/xe_pt.c           | 13 +++-
>  drivers/gpu/drm/xe/xe_vm.c           | 12 ++++
>  drivers/gpu/drm/xe/xe_vm_madvise.c   | 76 +++++++++++++++++++++++
>  include/uapi/drm/xe_drm.h            | 50 ++++++++++++++-
>  8 files changed, 308 insertions(+), 14 deletions(-)


* Re: [RFC v2 5/9] drm/xe/bo: Handle CPU faults on purged buffer objects
  2025-12-01  5:50 ` [RFC v2 5/9] drm/xe/bo: Handle CPU faults on purged buffer objects Arvind Yadav
@ 2025-12-02 18:42   ` Matthew Brost
  2025-12-02 18:48     ` Matthew Brost
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Brost @ 2025-12-02 18:42 UTC (permalink / raw)
  To: Arvind Yadav
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra

On Mon, Dec 01, 2025 at 11:20:15AM +0530, Arvind Yadav wrote:
> Modify the CPU page fault handler, `xe_bo_cpu_fault()`, to correctly
> handle access to buffer objects that have been purged.
> 
> When a buffer object is in the `XE_MADV_PURGED` state, its backing
> store has been reclaimed by the kernel. If the CPU attempts to access
> this memory, it is an error that should be reported to the application.
> 
> v2:
>   - Added xe_bo_is_purged(bo) instead of atomic_read.
>   - Avoids leaks and keeps drm_dev_exit() while returning.
> 
> Cc: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index f0b3f7a13114..7f5bcf114ed4 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1992,6 +1992,16 @@ static vm_fault_t xe_bo_cpu_fault(struct vm_fault *vmf)
>  	if (!drm_dev_enter(&xe->drm, &idx))
>  		return ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
>  
> +	/*
> +	 * BO content is gone. Signal the user process.
> +	 * Once purged, BO remains permanently invalid (i915 semantics).
> +	 * Application must destroy and recreate the BO.
> +	 */
> +	if (xe_bo_is_purged(bo)) {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out;
> +	}
> +
>  	ret = xe_bo_cpu_fault_fastpath(vmf, xe, bo, needs_rpm);
>  	if (ret != VM_FAULT_RETRY)
>  		goto out;
> -- 
> 2.43.0
> 


* Re: [RFC v2 5/9] drm/xe/bo: Handle CPU faults on purged buffer objects
  2025-12-02 18:42   ` Matthew Brost
@ 2025-12-02 18:48     ` Matthew Brost
  2025-12-03  7:25       ` Yadav, Arvind
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Brost @ 2025-12-02 18:48 UTC (permalink / raw)
  To: Arvind Yadav
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra

On Tue, Dec 02, 2025 at 10:42:39AM -0800, Matthew Brost wrote:
> On Mon, Dec 01, 2025 at 11:20:15AM +0530, Arvind Yadav wrote:
> > Modify the CPU page fault handler, `xe_bo_cpu_fault()`, to correctly
> > handle access to buffer objects that have been purged.
> > 
> > When a buffer object is in the `XE_MADV_PURGED` state, its backing
> > store has been reclaimed by the kernel. If the CPU attempts to access
> > this memory, it is an error that should be reported to the application.
> > 
> > v2:
> >   - Added xe_bo_is_purged(bo) instead of atomic_read.
> >   - Avoids leaks and keeps drm_dev_exit() while returning.
> > 
> > Cc: Matthew Brost <matthew.brost@intel.com>
> 
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> 

Ah, actually I think I made a mistake here.

> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> > Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_bo.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > index f0b3f7a13114..7f5bcf114ed4 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -1992,6 +1992,16 @@ static vm_fault_t xe_bo_cpu_fault(struct vm_fault *vmf)
> >  	if (!drm_dev_enter(&xe->drm, &idx))
> >  		return ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
> >  
> > +	/*
> > +	 * BO content is gone. Signal the user process.
> > +	 * Once purged, BO remains permanently invalid (i915 semantics).
> > +	 * Application must destroy and recreate the BO.
> > +	 */
> > +	if (xe_bo_is_purged(bo)) {

Doesn't this need to be done under the BO's dma-resv lock to avoid a race?
Consider the case where, after this check, TTM evicts this BO, changing
the state to purged. Now we grab the BO's dma-resv lock and try to get
pages on a purged BO. Seems like an issue.

Also with that, xe_bo_is_purged likely should have a lockdep annotation
asserting the BO's dma-resv lock is held.
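
Roughly this shape, as an untested sketch (the helper name is made up),
with the state only tested once the dma-resv lock is held so TTM cannot
flip it between the check and getting pages:

static vm_fault_t xe_bo_fault_check_purged(struct xe_bo *bo)
{
	vm_fault_t ret;

	/* ttm_bo_vm_reserve() style: trylock, let the fault retry on contention */
	if (!dma_resv_trylock(bo->ttm.base.resv))
		return VM_FAULT_RETRY;

	ret = xe_bo_is_purged(bo) ? VM_FAULT_SIGBUS : 0;

	dma_resv_unlock(bo->ttm.base.resv);

	return ret;
}

In the real handler the check would of course live inside the section
that already holds the lock rather than taking it again.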

Matt

> > +		ret = VM_FAULT_SIGBUS;
> > +		goto out;
> > +	}
> > +
> >  	ret = xe_bo_cpu_fault_fastpath(vmf, xe, bo, needs_rpm);
> >  	if (ret != VM_FAULT_RETRY)
> >  		goto out;
> > -- 
> > 2.43.0
> > 


* Re: [RFC v2 2/9] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo
  2025-12-01  5:50 ` [RFC v2 2/9] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo Arvind Yadav
  2025-12-01 23:02   ` Matthew Brost
@ 2025-12-02 18:52   ` Matthew Brost
  1 sibling, 0 replies; 35+ messages in thread
From: Matthew Brost @ 2025-12-02 18:52 UTC (permalink / raw)
  To: Arvind Yadav
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra

On Mon, Dec 01, 2025 at 11:20:12AM +0530, Arvind Yadav wrote:
> Add infrastructure for tracking purgeable state of buffer objects.
> This includes:
> 
> Introduce enum xe_madv_purgeable_state with three states:
>    - XE_MADV_PURGEABLE_WILLNEED (0): BO is needed and should not be
>      purged. This is the default state for all BOs.
> 
>    - XE_MADV_PURGEABLE_DONTNEED (1): BO is not currently needed and
>      can be purged by the kernel under memory pressure to reclaim
>      resources. Only non-shared BOs can be marked as DONTNEED.
> 
>    - XE_MADV_PURGEABLE_PURGED (2): BO has been purged by the kernel.
>      Accessing a purged BO results in error. Follows i915 semantics
>      where once purged, the BO remains permanently invalid ("once
>      purged, always purged").
> 
> Add atomic_t madv field to struct xe_bo for state tracking
>   of purgeable state across concurrent access paths
> 
> v2: Add xe_bo_is_purged() helper, improve state documentation
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.h       | 27 +++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_bo_types.h |  3 +++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index 911d5b90461a..b0a31c77e612 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -85,6 +85,28 @@
>  
>  #define XE_PCI_BARRIER_MMAP_OFFSET	(0x50 << XE_PTE_SHIFT)
>  
> +/**
> + * enum xe_madv_purgeable_state - Buffer object purgeable state enumeration
> + *
> + * This enum defines the possible purgeable states for a buffer object,
> + * allowing userspace to provide memory usage hints to the kernel for
> + * better memory management under pressure.
> + *
> + * @XE_MADV_PURGEABLE_WILLNEED: The buffer object is needed and should not be purged.
> + * This is the default state.
> + * @XE_MADV_PURGEABLE_DONTNEED: The buffer object is not currently needed and can be
> + * purged by the kernel under memory pressure.
> + * @XE_MADV_PURGEABLE_PURGED: The buffer object has been purged by the kernel.
> + *
> + * Accessing a purged buffer will result in an error. Per i915 semantics,
> + * once purged, a BO remains permanently invalid and must be destroyed and recreated.
> + */
> +enum xe_madv_purgeable_state {
> +	XE_MADV_PURGEABLE_WILLNEED,
> +	XE_MADV_PURGEABLE_DONTNEED,
> +	XE_MADV_PURGEABLE_PURGED,
> +};
> +
>  struct sg_table;
>  
>  struct xe_bo *xe_bo_alloc(void);
> @@ -213,6 +235,11 @@ static inline bool xe_bo_is_protected(const struct xe_bo *bo)
>  	return bo->pxp_key_instance;
>  }
>  
> +static inline bool xe_bo_is_purged(struct xe_bo *bo)
> +{

I think you want to assert the BO's dma-resv lock is held here. I
suggest this in patch #5.

> +	return atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_PURGED;
> +}
> +
>  static inline void xe_bo_unpin_map_no_vm(struct xe_bo *bo)
>  {
>  	if (likely(bo)) {
> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> index d4fe3c8dca5b..57b4dc7012e2 100644
> --- a/drivers/gpu/drm/xe/xe_bo_types.h
> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> @@ -108,6 +108,9 @@ struct xe_bo {
>  	 * from default
>  	 */
>  	u64 min_align;
> +
> +	/** @madv_purgeable: user space advise on BO purgeability */
> +	atomic_t madv_purgeable;

Does this really need to be atomic? From a grep on the final result, all
critical accesses (ones where functionality would break) are done under
(or should be done under) the BO's dma-resv lock, right? The madvise
read of the attribute is likely the exception where it is ok to just
read this value without the dma-resv lock.

So with that, I'd suggest dropping the atomic and also adding a helper to
set this value that asserts the BO's dma-resv lock is held. Use a
WRITE_ONCE which would pair with a READ_ONCE in the madvise attribute
query.
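
Something along these lines, sketched under the assumption madv_purgeable
becomes a plain int (helper names are illustrative):

static inline void xe_bo_set_madv_purgeable(struct xe_bo *bo, int state)
{
	/* All state changes happen under the BO's dma-resv lock */
	xe_bo_assert_held(bo);

	WRITE_ONCE(bo->madv_purgeable, state);
}

static inline int xe_bo_madv_purgeable_peek(struct xe_bo *bo)
{
	/* Lockless, for the madvise attribute query only */
	return READ_ONCE(bo->madv_purgeable);
}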

Matt

>  };
>  
>  #endif
> -- 
> 2.43.0
> 


* Re: [RFC v2 6/9] drm/xe/bo: Prevent mmap of purged buffer objects
  2025-12-01  5:50 ` [RFC v2 6/9] drm/xe/bo: Prevent mmap of " Arvind Yadav
@ 2025-12-02 18:54   ` Matthew Brost
  0 siblings, 0 replies; 35+ messages in thread
From: Matthew Brost @ 2025-12-02 18:54 UTC (permalink / raw)
  To: Arvind Yadav
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra

On Mon, Dec 01, 2025 at 11:20:16AM +0530, Arvind Yadav wrote:
> Fail DRM_IOCTL_XE_GEM_MMAP_OFFSET with -EINVAL when called on purged
> buffer objects to provide early error detection instead of allowing
> deferred SIGBUS on memory access.
> 
> Problem:
>   The mmap offset ioctl (DRM_IOCTL_XE_GEM_MMAP_OFFSET) returns a file
>   offset that userspace can pass to mmap() to map GPU memory into its
>   address space. For purged BOs, the backing store has been freed, but
>   the VMA node offset remains valid. Without this check:
> 
>   1. Userspace successfully gets mmap offset for purged BO
>   2. mmap() succeeds (VMA is created but has no backing pages)
>   3. Any memory access triggers CPU page fault
>   4. xe_bo_cpu_fault() detects purged state and returns VM_FAULT_SIGBUS
> 
> v2:
>   - Fix reference counting: use drm_gem_object_put() instead of xe_bo_put()
>     to properly balance drm_gem_object_lookup() (review feedback).
>   - Added xe_bo_is_purged(bo) instead of atomic_read.
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 7f5bcf114ed4..dbbfb58ac657 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -3346,6 +3346,7 @@ int xe_gem_mmap_offset_ioctl(struct drm_device *dev, void *data,
>  	struct xe_device *xe = to_xe_device(dev);
>  	struct drm_xe_gem_mmap_offset *args = data;
>  	struct drm_gem_object *gem_obj;
> +	struct xe_bo *bo;
>  
>  	if (XE_IOCTL_DBG(xe, args->extensions) ||
>  	    XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> @@ -3375,6 +3376,16 @@ int xe_gem_mmap_offset_ioctl(struct drm_device *dev, void *data,
>  	if (XE_IOCTL_DBG(xe, !gem_obj))
>  		return -ENOENT;
>  
> +	bo = gem_to_xe_bo(gem_obj);
> +
> +	/*
> +	 * Reject mmap offset requests for purged BOs.
> +	 */
> +	if (xe_bo_is_purged(bo)) {

I don't think this is needed. A subsequent CPU page fault would just
fail with SIGBUS. As discussed in patch 2/5 we'd have the BO's dma-resv
lock there too making this check race free.

Matt

> +		drm_gem_object_put(gem_obj);
> +		return -EINVAL;
> +	}
> +
>  	/* The mmap offset was set up at BO allocation time. */
>  	args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
>  
> -- 
> 2.43.0
> 


* Re: [RFC v2 7/9] drm/xe/vm: Prevent binding of purged buffer objects
  2025-12-01  5:50 ` [RFC v2 7/9] drm/xe/vm: Prevent binding " Arvind Yadav
@ 2025-12-02 18:57   ` Matthew Brost
  2025-12-03 11:24     ` Yadav, Arvind
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Brost @ 2025-12-02 18:57 UTC (permalink / raw)
  To: Arvind Yadav
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra

On Mon, Dec 01, 2025 at 11:20:17AM +0530, Arvind Yadav wrote:
> Add validation in xe_vm_bind_ioctl_validate_bo() to reject MAP and
> PREFETCH operations on purged buffer objects with -EINVAL.
> 
> Problem:
> When a BO is purged (XE_MADV_PURGEABLE_PURGED state), its backing pages
> have been freed by the kernel. Without this check, VM_BIND operations
> would proceed:
> 
>  1. DRM_XE_VM_BIND_OP_MAP: Attempts to create GPU mappings to freed memory
>     - xe_vma_ops_alloc() creates VMA pointing to invalid BO
>     - Page tables populated with stale/invalid addresses
>     - GPU access leads to undefined behavior or hangs
> 
>  2. DRM_XE_VM_BIND_OP_PREFETCH: Attempts to migrate non-existent pages
>     - Triggers BO validation in TTM
>     - ttm_bo_validate() fails or crashes (no backing store)
>     - Wasted work for permanently invalid BO
> 
> With this check:
>   - MAP/PREFETCH immediately fail with -EINVAL at ioctl boundary
>   - Clear error message at syscall (better UX than deferred GPU hang)
>   - Prevents creation of invalid GPU page table entries
> 
> v2:
>   - Clarify that purged BOs are permanently invalid (i915 semantics)
>   - Remove incorrect claim about madvise(WILLNEED) restoring purged BOs
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_vm.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index d03e69524369..cc946bff9607 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -3482,6 +3482,13 @@ static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo,
>  		return -EINVAL;
>  	}
>  
> +	/* Purged BOs are permanently invalid; reject new MAP/PREFETCH. */
> +	if (XE_IOCTL_DBG(xe,
> +			 xe_bo_is_purged(bo) &&
> +			 (op == DRM_XE_VM_BIND_OP_MAP ||
> +			  op == DRM_XE_VM_BIND_OP_PREFETCH)))
> +		return -EINVAL;

I'd move this check later in the bind pipeline, once we have the BO's
dma-resv lock, to make this race free. I think vma_lock_and_validate is
likely the correct function, after drm_exec_lock_obj but before
xe_bo_validate. Or you could just make xe_bo_validate fail if the object
is purged.
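
E.g. something like this hypothetical helper, called from
vma_lock_and_validate() after drm_exec_lock_obj() and before
xe_bo_validate() (sketch only, not a tested patch):

static int xe_vma_check_bo_not_purged(struct xe_vma *vma)
{
	struct xe_bo *bo = xe_vma_bo(vma);

	if (!bo)
		return 0;

	/* dma-resv held via drm_exec_lock_obj(), so this is race free */
	xe_bo_assert_held(bo);

	return xe_bo_is_purged(bo) ? -EINVAL : 0;
}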

Matt 

> +
>  	/*
>  	 * Some platforms require 64k VM_BIND alignment,
>  	 * specifically those with XE_VRAM_FLAGS_NEED64K.
> -- 
> 2.43.0
> 


* Re: [RFC v2 8/9] drm/xe/uapi: Add UAPI for purgeable bo state to madvise query response
  2025-12-01  5:50 ` [RFC v2 8/9] drm/xe/uapi: Add UAPI for purgeable bo state to madvise query response Arvind Yadav
@ 2025-12-02 19:01   ` Matthew Brost
  2025-12-03  3:54     ` Yadav, Arvind
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Brost @ 2025-12-02 19:01 UTC (permalink / raw)
  To: Arvind Yadav
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra

On Mon, Dec 01, 2025 at 11:20:18AM +0530, Arvind Yadav wrote:
> From: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> 
> Complete the purgeable buffer object UAPI by adding the response
> structure to drm_xe_mem_range_attr for querying current purgeable
> state of buffer objects within a memory range.
> 
> This allows userspace to determine the current state of BOs:
> - DRM_XE_VMA_PURGEABLE_STATE_WILLNEED (0): BO actively needed, has backing store
> - DRM_XE_VMA_PURGEABLE_STATE_DONTNEED (1): BO eligible for purging, still has backing
> - DRM_XE_VMA_PURGEABLE_STATE_PURGED (2): BO purged, backing store freed (read-only)
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
>  include/uapi/drm/xe_drm.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 02d63938d16f..8f289a2849ff 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -2147,10 +2147,14 @@ struct drm_xe_madvise {
>  		 *  - DRM_XE_VMA_PURGEABLE_STATE_DONTNEED (1): Hints that BO is not
>  		 *    currently needed. Kernel may purge it under memory pressure.
>  		 *    Only applies to non-shared BOs. Returns retained=1 if not purged.
> +		 *
> +		 *  - DRM_XE_VMA_PURGEABLE_STATE_PURGED: Read-only state indicating
> +		 *    the BO purge state.

This is a tricky one. Since the value of the purgeable state can change
immediately after it is populated, is there any value in reporting this to
user space? My guess is no. So with that, I'd probably remove this uAPI.

Matt

>  		 */
>  		struct {
>  #define DRM_XE_VMA_PURGEABLE_STATE_WILLNEED	0
>  #define DRM_XE_VMA_PURGEABLE_STATE_DONTNEED	1
> +#define DRM_XE_VMA_PURGEABLE_STATE_PURGED	2
>  			/** @purge_state_val.val: value for DRM_XE_VMA_ATTR_PURGEABLE_STATE */
>  			__u32 val;
>  			/**
> @@ -2224,6 +2228,15 @@ struct drm_xe_mem_range_attr {
>  		__u32 reserved;
>  	} pat_index;
>  
> +	/** @purge_state_val: Purgeable state configuration */
> +	struct {
> +		/** @purge_state_val.val: value for DRM_XE_VMA_ATTR_PURGEABLE_STATE */
> +		__u32 val;
> +
> +		/** @purge_state_val.reserved: Reserved */
> +		__u32 reserved;
> +	} purge_state_val;
> +
>  	/** @reserved: Reserved */
>  	__u64 reserved[2];
>  };
> -- 
> 2.43.0
> 


* Re: [RFC v2 4/9] drm/xe/madvise: Implement purgeable buffer object support
  2025-12-01  5:50 ` [RFC v2 4/9] drm/xe/madvise: Implement purgeable buffer object support Arvind Yadav
  2025-12-02  1:46   ` Matthew Brost
@ 2025-12-02 21:39   ` Matthew Brost
  2025-12-03 14:01     ` Yadav, Arvind
  1 sibling, 1 reply; 35+ messages in thread
From: Matthew Brost @ 2025-12-02 21:39 UTC (permalink / raw)
  To: Arvind Yadav
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra

On Mon, Dec 01, 2025 at 11:20:14AM +0530, Arvind Yadav wrote:
> This allows userspace applications to provide memory usage hints to
> the kernel for better memory management under pressure:
> 
> Add the core implementation for purgeable buffer objects, enabling memory
> reclamation of user-designated DONTNEED buffers during eviction.
> 
> This patch implements the purge operation and state machine transitions:
> 
> Purgeable States (from xe_madv_purgeable_state):
>  - WILLNEED (0): BO should be retained, actively used
>  - DONTNEED (1): BO eligible for purging, not currently needed
>  - PURGED (2): BO backing store reclaimed, permanently invalid
> 
> Design Rationale:
>   - Async TLB invalidation via trigger_rebind (no blocking xe_vm_invalidate_vma)
>   - i915 compatibility: retained field, "once purged always purged" semantics
>   - Shared BO protection prevents multi-process memory corruption
>   - Scratch PTE reuse avoids new infrastructure, safe for fault mode
> 
> v2:
>   - Use xe_bo_trigger_rebind() for async TLB invalidation (Thomas Hellström)
>   - Add NULL rebind with scratch PTEs for fault mode (Thomas Hellström)
>   - Implement i915-compatible retained field logic (Thomas Hellström)
>   - Skip BO validation for purged BOs in page fault handler (crash fix)
>   - Add scratch VM check in page fault path (non-scratch VMs fail fault)
>   - Force clear_pt for non-scratch VMs to avoid phys addr 0 mapping (review fix)
>   - Add !is_purged check to resource cursor setup to prevent stale access
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c           | 72 ++++++++++++++++++++++-----
>  drivers/gpu/drm/xe/xe_gt_pagefault.c | 19 ++++++++
>  drivers/gpu/drm/xe/xe_pt.c           | 36 ++++++++++++--
>  drivers/gpu/drm/xe/xe_vm.c           | 11 ++++-
>  drivers/gpu/drm/xe/xe_vm_madvise.c   | 73 ++++++++++++++++++++++++++++
>  5 files changed, 193 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index cbc3ee157218..f0b3f7a13114 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -836,6 +836,53 @@ static int xe_bo_move_notify(struct xe_bo *bo,
>  	return 0;
>  }
>  
> +static void xe_bo_set_purged(struct xe_bo *bo)
> +{
> +	/* BO must be locked before modifying madv state */
> +	dma_resv_assert_held(bo->ttm.base.resv);
> +

+1 to assert, but I think xe_bo_assert_held can be used.

> +	atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_PURGED);
> +}
> +
> +/**
> + * xe_ttm_bo_purge() - Purge buffer object backing store
> + * @ttm_bo: The TTM buffer object to purge
> + * @ctx: TTM operation context
> + *
> + * This function purges the backing store of a BO marked as DONTNEED and
> + * triggers rebind to invalidate stale GPU mappings. For fault-mode VMs,
> + * this zaps the PTEs. The next GPU access will trigger a page fault and
> + * perform NULL rebind (scratch pages or clear PTEs based on VM config).
> + */
> +static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
> +{
> +	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
> +	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
> +
> +	if (ttm_bo->ttm) {
> +		struct ttm_placement place = {};
> +		int ret = ttm_bo_validate(ttm_bo, &place, ctx);

Do we need to wait for the eviction to complete? e.g., wait on DMA_RESV_USAGE_KERNEL slots?

> +
> +		drm_WARN_ON(&xe->drm, ret);

Likely an Xe-specific warn-on function or assert.

> +		if (!ret && bo) {

When would BO be NULL? I don't think ever.

> +			if (atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_DONTNEED) {

Can we have helpers for madv_purgeable access with an assert?

> +				xe_bo_set_purged(bo);
> +
> +				/*
> +				 * Trigger rebind to invalidate stale GPU mappings.
> +				 * - Non-fault mode: Marks VMAs for rebind
> +				 * - Fault mode: Zaps PTEs (sets to 0), next access triggers fault
> +				 *   and NULL rebind with scratch/clear PTEs per VM config
> +				 */
> +				ret = xe_bo_trigger_rebind(xe, bo, ctx);
> +				if (ret)
> +					drm_warn(&xe->drm,
> +						 "Failed to invalidate purged BO: %d\n", ret);

An Xe-specific warn-on or assert?

Also, maybe this function should just return an error if something fails,
and xe_bo_move would return that failure.

> +			}
> +		}
> +	}
> +}
> +
>  static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>  		      struct ttm_operation_ctx *ctx,
>  		      struct ttm_resource *new_mem,
> @@ -853,8 +900,18 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>  	bool needs_clear;
>  	bool handle_system_ccs = (!IS_DGFX(xe) && xe_bo_needs_ccs_pages(bo) &&
>  				  ttm && ttm_tt_is_populated(ttm)) ? true : false;
> +	int state = atomic_read(&bo->madv_purgeable);
>  	int ret = 0;
>  
> +	/*
> +	 * Purge only non-shared BOs explicitly marked DONTNEED by userspace.
> +	 * The move_notify callback will handle invalidation asynchronously.
> +	 */
> +	if (evict && state == XE_MADV_PURGEABLE_DONTNEED && !xe_bo_is_shared_locked(bo)) {
> +		xe_ttm_bo_purge(ttm_bo, ctx);
> +		return 0;
> +	}
> +
>  	/* Bo creation path, moving to system or TT. */
>  	if ((!old_mem && ttm) && !handle_system_ccs) {
>  		if (new_mem->mem_type == XE_PL_TT)
> @@ -1606,18 +1663,6 @@ static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
>  	}
>  }
>  
> -static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
> -{
> -	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
> -
> -	if (ttm_bo->ttm) {
> -		struct ttm_placement place = {};
> -		int ret = ttm_bo_validate(ttm_bo, &place, ctx);
> -
> -		drm_WARN_ON(&xe->drm, ret);
> -	}
> -}
> -
>  static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo)
>  {
>  	struct ttm_operation_ctx ctx = {
> @@ -2202,6 +2247,9 @@ struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
>  #endif
>  	INIT_LIST_HEAD(&bo->vram_userfault_link);
>  
> +	/* Initialize purge advisory state */
> +	atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
> +
>  	drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
>  
>  	if (resv) {
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index a054d6010ae0..8c7e5dcb627b 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -87,6 +87,13 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
>  	if (!bo)
>  		return 0;
>  
> +	/*
> +	 * Skip validation/migration for purged BOs - they have no backing pages.
> +	 * Rebind will use scratch PTEs instead.
> +	 */
> +	if (xe_bo_is_purged(bo))
> +		return 0;
> +

This needs a rebase as xe_gt_pagefault.c is gone upstream and replaced
with xe_pagefault.c.

>  	return need_vram_move ? xe_bo_migrate(bo, vram->placement, NULL, exec) :
>  		xe_bo_validate(bo, vm, true, exec);
>  }
> @@ -100,9 +107,21 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
>  	struct drm_exec exec;
>  	struct dma_fence *fence;
>  	int err, needs_vram;
> +	struct xe_bo *bo;
>  
>  	lockdep_assert_held_write(&vm->lock);
>  
> +	/*
> +	 * Check if BO is purged. For purged BOs:
> +	 * - Scratch VMs: Allow rebind with scratch PTEs (safe zero reads)
> +	 * - Non-scratch VMs: FAIL the page fault (no scratch page available)
> +	 */
> +	bo = xe_vma_bo(vma);
> +	if (bo && xe_bo_is_purged(bo)) {
> +		if (!xe_vm_has_scratch(vm))
> +			return -EACCES;
> +	}

As discussed in other patches, move the xe_bo_is_purged check under the
dma-resv lock.

> +
>  	needs_vram = xe_vma_need_vram_for_atomic(vm->xe, vma, atomic);
>  	if (needs_vram < 0 || (needs_vram && xe_vma_is_userptr(vma)))
>  		return needs_vram < 0 ? needs_vram : -EACCES;
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index d22fd1ccc0ba..062f64b16a58 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -533,20 +533,26 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>  	/* Is this a leaf entry ?*/
>  	if (level == 0 || xe_pt_hugepte_possible(addr, next, level, xe_walk)) {
>  		struct xe_res_cursor *curs = xe_walk->curs;
> +		struct xe_bo *bo = xe_vma_bo(xe_walk->vma);
>  		bool is_null = xe_vma_is_null(xe_walk->vma);
> -		bool is_vram = is_null ? false : xe_res_is_vram(curs);
> +		bool is_purged = bo && xe_bo_is_purged(bo);

Can we drop is_purged and just set is_null to true for purged BOs?

Or rename s/is_null/is_null_or_purged for clarity?

> +		bool is_vram = (is_null || is_purged) ? false : xe_res_is_vram(curs);
>  
>  		XE_WARN_ON(xe_walk->va_curs_start != addr);
>  
>  		if (xe_walk->clear_pt) {
>  			pte = 0;
>  		} else {
> -			pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
> +			/*
> +			 * For purged BOs, treat like null VMAs - pass address 0.
> +			 * The pte_encode_vma will set XE_PTE_NULL flag for scratch mapping.
> +			 */
> +			pte = vm->pt_ops->pte_encode_vma((is_null || is_purged) ? 0 :
>  							 xe_res_dma(curs) +
>  							 xe_walk->dma_offset,
>  							 xe_walk->vma,
>  							 pat_index, level);
> -			if (!is_null)
> +			if (!is_null && !is_purged)
>  				pte |= is_vram ? xe_walk->default_vram_pte :
>  					xe_walk->default_system_pte;
>  
> @@ -570,7 +576,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>  		if (unlikely(ret))
>  			return ret;
>  
> -		if (!is_null && !xe_walk->clear_pt)
> +		if (!is_null && !is_purged && !xe_walk->clear_pt)
>  			xe_res_next(curs, next - addr);
>  		xe_walk->va_curs_start = next;
>  		xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << level);
> @@ -723,6 +729,26 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>  	};
>  	struct xe_pt *pt = vm->pt_root[tile->id];
>  	int ret;
> +	bool is_purged = false;
> +
> +	/*
> +	 * Check if BO is purged:
> +	 * - Scratch VMs: Use scratch PTEs (XE_PTE_NULL) for safe zero reads
> +	 * - Non-scratch VMs: Clear PTEs to zero (non-present) to avoid mapping to phys addr 0
> +	 *
> +	 * For non-scratch VMs, we force clear_pt=true so leaf PTEs become completely
> +	 * zero instead of creating a PRESENT mapping to physical address 0.
> +	 */
> +	if (bo && xe_bo_is_purged(bo)) {
> +		is_purged = true;
> +
> +		/*
> +		 * For non-scratch VMs, a NULL rebind should use zero PTEs
> +		 * (non-present), not a present PTE to phys 0.
> +		 */
> +		if (!xe_vm_has_scratch(vm))
> +			xe_walk.clear_pt = true;

So the idea is purged BOs will fault if the VMA is accessed on
non-scratch VMs?

> +	}
>  
>  	if (range) {
>  		/* Move this entire thing to xe_svm.c? */
> @@ -762,7 +788,7 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>  	if (!range)
>  		xe_bo_assert_held(bo);
>  
> -	if (!xe_vma_is_null(vma) && !range) {
> +	if (!xe_vma_is_null(vma) && !range && !is_purged) {
>  		if (xe_vma_is_userptr(vma))
>  			xe_res_first_dma(to_userptr_vma(vma)->userptr.pages.dma_addr, 0,
>  					 xe_vma_size(vma), &curs);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 10d77666a425..d03e69524369 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1336,6 +1336,9 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>  static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>  			       u16 pat_index, u32 pt_level)
>  {
> +	struct xe_bo *bo = xe_vma_bo(vma);
> +	struct xe_vm *vm = xe_vma_vm(vma);
> +
>  	pte |= XE_PAGE_PRESENT;
>  
>  	if (likely(!xe_vma_read_only(vma)))
> @@ -1344,7 +1347,13 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>  	pte |= pte_encode_pat_index(pat_index, pt_level);
>  	pte |= pte_encode_ps(pt_level);
>  
> -	if (unlikely(xe_vma_is_null(vma)))
> +	/*
> +	 * NULL PTEs redirect to scratch page (return zeros on read).
> +	 * Set for: 1) explicit null VMAs, 2) purged BOs on scratch VMs.
> +	 * Never set NULL flag without scratch page - causes undefined behavior.
> +	 */
> +	if (unlikely(xe_vma_is_null(vma) ||
> +		     (bo && xe_bo_is_purged(bo) && xe_vm_has_scratch(vm))))
>  		pte |= XE_PTE_NULL;
>  
>  	return pte;
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index cad3cf627c3f..3ba851e0b870 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -158,6 +158,60 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
>  	}
>  }
>  
> +/*
> + * Handle purgeable buffer object advice for DONTNEED/WILLNEED/PURGED.
> + * Updates op->purge_state_val.retained to indicate if backing store
> + * exists (matches i915's retained).
> + */
> +static void xe_vm_madvise_purgeable_bo(struct xe_device *xe, struct xe_vm *vm,
> +				       struct xe_vma **vmas, int num_vmas,
> +				       struct drm_xe_madvise *op)
> +{
> +	bool has_purged_bo = false;
> +	int i;
> +
> +	xe_assert(vm->xe, op->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE);
> +
> +	for (i = 0; i < num_vmas; i++) {
> +		struct xe_bo *bo = xe_vma_bo(vmas[i]);
> +
> +		if (!bo)
> +			continue;
> +
> +		/* BO must be locked before modifying madv state */
> +		dma_resv_assert_held(bo->ttm.base.resv);

xe_bo_assert_held

> +
> +		/*
> +		 * Once purged, always purged. Cannot transition back to WILLNEED.
> +		 * This matches i915 semantics where purged BOs are permanently invalid.
> +		 */
> +		if (xe_bo_is_purged(bo)) {
> +			has_purged_bo = true;
> +			continue;
> +		}
> +
> +		switch (op->purge_state_val.val) {
> +		case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
> +			atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
> +			break;
> +		case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
> +			if (!xe_bo_is_shared_locked(bo))

I think we need a complete VMA check here per Thomas's feedback (e.g.,
all VMAs attached to the BO must be in
DRM_XE_VMA_PURGEABLE_STATE_DONTNEED to flip the BO state). I think if the
BO is an imported or exported dma-buf we can never flip the state, as we
don't know what an external device is doing with it.

Matt

> +				atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_DONTNEED);
> +			break;
> +		default:
> +			drm_warn(&vm->xe->drm, "Invalid madvice value = %d\n",
> +				 op->purge_state_val.val);
> +			return;
> +		}
> +	}
> +
> +	/*
> +	 * Set retained flag to indicate if backing store still exists.
> +	 * Matches i915: retained = 1 if not purged, 0 if purged.
> +	 */
> +	op->purge_state_val.retained = !has_purged_bo;
> +}
> +
>  typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
>  			     struct xe_vma **vmas, int num_vmas,
>  			     struct drm_xe_madvise *op);
> @@ -283,6 +337,19 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
>  			return false;
>  		break;
>  	}
> +	case DRM_XE_VMA_ATTR_PURGEABLE_STATE:
> +	{
> +		u32 val = args->purge_state_val.val;
> +
> +		if (XE_IOCTL_DBG(xe, !((val == DRM_XE_VMA_PURGEABLE_STATE_WILLNEED) ||
> +				       (val == DRM_XE_VMA_PURGEABLE_STATE_DONTNEED))))
> +			return false;
> +
> +		if (XE_IOCTL_DBG(xe, args->purge_state_val.reserved))
> +			return false;
> +
> +		break;
> +	}
>  	default:
>  		if (XE_IOCTL_DBG(xe, 1))
>  			return false;
> @@ -402,6 +469,12 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
>  					goto err_fini;
>  			}
>  		}
> +		if (args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE) {
> +			xe_vm_madvise_purgeable_bo(xe, vm, madvise_range.vmas,
> +						   madvise_range.num_vmas, args);
> +			goto err_fini;
> +
> +		}
>  	}
>  
>  	if (madvise_range.has_svm_userptr_vmas) {
> -- 
> 2.43.0
> 


* Re: [RFC v2 8/9] drm/xe/uapi: Add UAPI for purgeable bo state to madvise query response
  2025-12-02 19:01   ` Matthew Brost
@ 2025-12-03  3:54     ` Yadav, Arvind
  0 siblings, 0 replies; 35+ messages in thread
From: Yadav, Arvind @ 2025-12-03  3:54 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra


On 03-12-2025 00:31, Matthew Brost wrote:
> On Mon, Dec 01, 2025 at 11:20:18AM +0530, Arvind Yadav wrote:
>> From: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>
>> Complete the purgeable buffer object UAPI by adding the response
>> structure to drm_xe_mem_range_attr for querying current purgeable
>> state of buffer objects within a memory range.
>>
>> This allows userspace to determine the current state of BOs:
>> - DRM_XE_VMA_PURGEABLE_STATE_WILLNEED (0): BO actively needed, has backing store
>> - DRM_XE_VMA_PURGEABLE_STATE_DONTNEED (1): BO eligible for purging, still has backing
>> - DRM_XE_VMA_PURGEABLE_STATE_PURGED (2): BO purged, backing store freed (read-only)
>>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
>> ---
>>   include/uapi/drm/xe_drm.h | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 02d63938d16f..8f289a2849ff 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -2147,10 +2147,14 @@ struct drm_xe_madvise {
>>   		 *  - DRM_XE_VMA_PURGEABLE_STATE_DONTNEED (1): Hints that BO is not
>>   		 *    currently needed. Kernel may purge it under memory pressure.
>>   		 *    Only applies to non-shared BOs. Returns retained=1 if not purged.
>> +		 *
>> +		 *  - DRM_XE_VMA_PURGEABLE_STATE_PURGED: Read-only state indicating
>> +		 *    the BO purge state.
> This is a tricky one. Since the value of the purgeable state can change
> immediately after it is populated, is there any value in reporting this to
> user space? My guess is no. So with that, I'd probably remove this uAPI.


Noted. I will drop this patch along with the following patch.

~Arvind

>
> Matt
>
>>   		 */
>>   		struct {
>>   #define DRM_XE_VMA_PURGEABLE_STATE_WILLNEED	0
>>   #define DRM_XE_VMA_PURGEABLE_STATE_DONTNEED	1
>> +#define DRM_XE_VMA_PURGEABLE_STATE_PURGED	2
>>   			/** @purge_state_val.val: value for DRM_XE_VMA_ATTR_PURGEABLE_STATE */
>>   			__u32 val;
>>   			/**
>> @@ -2224,6 +2228,15 @@ struct drm_xe_mem_range_attr {
>>   		__u32 reserved;
>>   	} pat_index;
>>   
>> +	/** @purge_state_val: Purgeable state configuration */
>> +	struct {
>> +		/** @purge_state_val.val: value for DRM_XE_VMA_ATTR_PURGEABLE_STATE */
>> +		__u32 val;
>> +
>> +		/** @purge_state_val.reserved: Reserved */
>> +		__u32 reserved;
>> +	} purge_state_val;
>> +
>>   	/** @reserved: Reserved */
>>   	__u64 reserved[2];
>>   };
>> -- 
>> 2.43.0
>>


* Re: [RFC v2 5/9] drm/xe/bo: Handle CPU faults on purged buffer objects
  2025-12-02 18:48     ` Matthew Brost
@ 2025-12-03  7:25       ` Yadav, Arvind
  2025-12-03 16:24         ` Matthew Brost
  0 siblings, 1 reply; 35+ messages in thread
From: Yadav, Arvind @ 2025-12-03  7:25 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra


On 03-12-2025 00:18, Matthew Brost wrote:
> On Tue, Dec 02, 2025 at 10:42:39AM -0800, Matthew Brost wrote:
>> On Mon, Dec 01, 2025 at 11:20:15AM +0530, Arvind Yadav wrote:
>>> Modify the CPU page fault handler, `xe_bo_cpu_fault()`, to correctly
>>> handle access to buffer objects that have been purged.
>>>
>>> When a buffer object is in the `XE_MADV_PURGED` state, its backing
>>> store has been reclaimed by the kernel. If the CPU attempts to access
>>> this memory, it is an error that should be reported to the application.
>>>
>>> v2:
>>>    - Added xe_bo_is_purged(bo) instead of atomic_read.
>>>    - Avoids leaks and keeps drm_dev_exit() while returning.
>>>
>>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
>>
> Ah, actually I think I made a mistake here.
>
>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/xe_bo.c | 10 ++++++++++
>>>   1 file changed, 10 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>> index f0b3f7a13114..7f5bcf114ed4 100644
>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>> @@ -1992,6 +1992,16 @@ static vm_fault_t xe_bo_cpu_fault(struct vm_fault *vmf)
>>>   	if (!drm_dev_enter(&xe->drm, &idx))
>>>   		return ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
>>>   
>>> +	/*
>>> +	 * BO content is gone. Signal the user process.
>>> +	 * Once purged, BO remains permanently invalid (i915 semantics).
>>> +	 * Application must destroy and recreate the BO.
>>> +	 */
>>> +	if (xe_bo_is_purged(bo)) {
> Doesn't this need to be done under the BO's dma-resv lock to avoid a race?
> Consider the case where, after this check, TTM evicts this BO, changing
> the state to purged. Now we grab the BO's dma-resv lock and try to get
> pages on a purged BO. Seems like an issue.

Thanks for catching these issues!

>
> Also with that, xe_bo_is_purged likely should have a lockdep annotation
> asserting the BO's dma-resv lock is held.
I initially added xe_bo_assert_held() to xe_bo_is_purged(), but it
causes crashes because many callers don't hold the lock.
For example, in xe_pagefault.c (this early check), no lock is held. I’ll
recheck the call sites and update accordingly.
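
One option would be to split the helper, roughly like this (names are
hypothetical): an asserting variant for paths that hold the lock, plus
an explicitly "unlocked" peek for the early page fault check, whose
result is then re-checked under the dma-resv lock:

static inline bool xe_bo_is_purged(struct xe_bo *bo)
{
	xe_bo_assert_held(bo);

	return atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_PURGED;
}

static inline bool xe_bo_is_purged_unlocked(struct xe_bo *bo)
{
	/* Advisory only: the state can change until dma-resv is taken */
	return atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_PURGED;
}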

~Arvind
>
> Matt
>
>>> +		ret = VM_FAULT_SIGBUS;
>>> +		goto out;
>>> +	}
>>> +
>>>   	ret = xe_bo_cpu_fault_fastpath(vmf, xe, bo, needs_rpm);
>>>   	if (ret != VM_FAULT_RETRY)
>>>   		goto out;
>>> -- 
>>> 2.43.0
>>>


* Re: [RFC v2 7/9] drm/xe/vm: Prevent binding of purged buffer objects
  2025-12-02 18:57   ` Matthew Brost
@ 2025-12-03 11:24     ` Yadav, Arvind
  0 siblings, 0 replies; 35+ messages in thread
From: Yadav, Arvind @ 2025-12-03 11:24 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra


On 03-12-2025 00:27, Matthew Brost wrote:
> On Mon, Dec 01, 2025 at 11:20:17AM +0530, Arvind Yadav wrote:
>> Add validation in xe_vm_bind_ioctl_validate_bo() to reject MAP and
>> PREFETCH operations on purged buffer objects with -EINVAL.
>>
>> Problem:
>> When a BO is purged (XE_MADV_PURGEABLE_PURGED state), its backing pages
>> have been freed by the kernel. Without this check, VM_BIND operations
>> would proceed:
>>
>>   1. DRM_XE_VM_BIND_OP_MAP: Attempts to create GPU mappings to freed memory
>>      - xe_vma_ops_alloc() creates VMA pointing to invalid BO
>>      - Page tables populated with stale/invalid addresses
>>      - GPU access leads to undefined behavior or hangs
>>
>>   2. DRM_XE_VM_BIND_OP_PREFETCH: Attempts to migrate non-existent pages
>>      - Triggers BO validation in TTM
>>      - ttm_bo_validate() fails or crashes (no backing store)
>>      - Wasted work for permanently invalid BO
>>
>> With this check:
>>    - MAP/PREFETCH immediately fail with -EINVAL at ioctl boundary
>>    - Clear error message at syscall (better UX than deferred GPU hang)
>>    - Prevents creation of invalid GPU page table entries
>>
>> v2:
>>    - Clarify that purged BOs are permanently invalid (i915 semantics)
>>    - Remove incorrect claim about madvise(WILLNEED) restoring purged BOs
>>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_vm.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index d03e69524369..cc946bff9607 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -3482,6 +3482,13 @@ static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo,
>>   		return -EINVAL;
>>   	}
>>   
>> +	/* Purged BOs are permanently invalid; reject new MAP/PREFETCH. */
>> +	if (XE_IOCTL_DBG(xe,
>> +			 xe_bo_is_purged(bo) &&
>> +			 (op == DRM_XE_VM_BIND_OP_MAP ||
>> +			  op == DRM_XE_VM_BIND_OP_PREFETCH)))
>> +		return -EINVAL;
> I'd move this check to later in the bind pipeline once we have the
> BO's dma-resv lock to make this race-free. I think vma_lock_and_validate is
> likely the correct function after drm_exec_lock_obj but before
> xe_bo_validate. Or you could just make xe_bo_validate fail if the object
> is purged.


Noted. I will move this check under vma_lock_and_validate.
Moving the purge check to xe_bo_validate() would break GPU page fault
recovery for purged BO-backed VMAs (preventing scratch PTE rebinds).
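
A rough sketch of the relocated check, assuming it lands in
vma_lock_and_validate after the exec lock (helper names per this series):

/* Sketch only: reject purged BOs once the dma-resv lock is held. */
int err = drm_exec_lock_obj(exec, &bo->ttm.base);

if (err)
        return err;

if (xe_bo_is_purged(bo))
        return -EINVAL; /* permanently invalid; destroy and recreate */

return xe_bo_validate(bo, vm, true, exec);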


~Arvind


>
> Matt
>
>> +
>>   	/*
>>   	 * Some platforms require 64k VM_BIND alignment,
>>   	 * specifically those with XE_VRAM_FLAGS_NEED64K.
>> -- 
>> 2.43.0
>>


* Re: [RFC v2 4/9] drm/xe/madvise: Implement purgeable buffer object support
  2025-12-02 21:39   ` Matthew Brost
@ 2025-12-03 14:01     ` Yadav, Arvind
  0 siblings, 0 replies; 35+ messages in thread
From: Yadav, Arvind @ 2025-12-03 14:01 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra


On 03-12-2025 03:09, Matthew Brost wrote:
> On Mon, Dec 01, 2025 at 11:20:14AM +0530, Arvind Yadav wrote:
>> This allows userspace applications to provide memory usage hints to
>> the kernel for better memory management under pressure.
>>
>> Add the core implementation for purgeable buffer objects, enabling memory
>> reclamation of user-designated DONTNEED buffers during eviction.
>>
>> This patch implements the purge operation and state machine transitions:
>>
>> Purgeable States (from xe_madv_purgeable_state):
>>   - WILLNEED (0): BO should be retained, actively used
>>   - DONTNEED (1): BO eligible for purging, not currently needed
>>   - PURGED (2): BO backing store reclaimed, permanently invalid
>>
>> Design Rationale:
>>    - Async TLB invalidation via trigger_rebind (no blocking xe_vm_invalidate_vma)
>>    - i915 compatibility: retained field, "once purged always purged" semantics
>>    - Shared BO protection prevents multi-process memory corruption
>>    - Scratch PTE reuse avoids new infrastructure, safe for fault mode
>>
>> v2:
>>    - Use xe_bo_trigger_rebind() for async TLB invalidation (Thomas Hellström)
>>    - Add NULL rebind with scratch PTEs for fault mode (Thomas Hellström)
>>    - Implement i915-compatible retained field logic (Thomas Hellström)
>>    - Skip BO validation for purged BOs in page fault handler (crash fix)
>>    - Add scratch VM check in page fault path (non-scratch VMs fail fault)
>>    - Force clear_pt for non-scratch VMs to avoid phys addr 0 mapping (review fix)
>>    - Add !is_purged check to resource cursor setup to prevent stale access
>>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_bo.c           | 72 ++++++++++++++++++++++-----
>>   drivers/gpu/drm/xe/xe_gt_pagefault.c | 19 ++++++++
>>   drivers/gpu/drm/xe/xe_pt.c           | 36 ++++++++++++--
>>   drivers/gpu/drm/xe/xe_vm.c           | 11 ++++-
>>   drivers/gpu/drm/xe/xe_vm_madvise.c   | 73 ++++++++++++++++++++++++++++
>>   5 files changed, 193 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>> index cbc3ee157218..f0b3f7a13114 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.c
>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>> @@ -836,6 +836,53 @@ static int xe_bo_move_notify(struct xe_bo *bo,
>>   	return 0;
>>   }
>>   
>> +static void xe_bo_set_purged(struct xe_bo *bo)
>> +{
>> +	/* BO must be locked before modifying madv state */
>> +	dma_resv_assert_held(bo->ttm.base.resv);
>> +
> +1 to assert, but I think xe_bo_assert_held can be used.


Noted.

>
>> +	atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_PURGED);
>> +}
>> +
>> +/**
>> + * xe_ttm_bo_purge() - Purge buffer object backing store
>> + * @ttm_bo: The TTM buffer object to purge
>> + * @ctx: TTM operation context
>> + *
>> + * This function purges the backing store of a BO marked as DONTNEED and
>> + * triggers rebind to invalidate stale GPU mappings. For fault-mode VMs,
>> + * this zaps the PTEs. The next GPU access will trigger a page fault and
>> + * perform NULL rebind (scratch pages or clear PTEs based on VM config).
>> + */
>> +static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
>> +{
>> +	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
>> +	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
>> +
>> +	if (ttm_bo->ttm) {
>> +		struct ttm_placement place = {};
>> +		int ret = ttm_bo_validate(ttm_bo, &place, ctx);
> Do we need to wait for the eviction to complete? e.g., wait on DMA_RESV_USAGE_KERNEL slots?


xe_ttm_bo_purge is existing code, not written by me. I just relocated 
the function to this section.

>> +
>> +		drm_WARN_ON(&xe->drm, ret);
> Likely an Xe-specific WARN_ON function or assert.
Same here: xe_ttm_bo_purge is existing code, not written by me. I just
relocated the function to this section.
>
>> +		if (!ret && bo) {
> When would BO be NULL? I don't think ever.


Noted.

>
>> +			if (atomic_read(&bo->madv_purgeable) == XE_MADV_PURGEABLE_DONTNEED) {
> Can we add helpers for madv_purgeable access with an assert?


Noted.

>
>> +				xe_bo_set_purged(bo);
>> +
>> +				/*
>> +				 * Trigger rebind to invalidate stale GPU mappings.
>> +				 * - Non-fault mode: Marks VMAs for rebind
>> +				 * - Fault mode: Zaps PTEs (sets to 0), next access triggers fault
>> +				 *   and NULL rebind with scratch/clear PTEs per VM config
>> +				 */
>> +				ret = xe_bo_trigger_rebind(xe, bo, ctx);
>> +				if (ret)
>> +					drm_warn(&xe->drm,
>> +						 "Failed to invalidate purged BO: %d\n", ret);
> Xe-specific WARN_ON or assert?


Noted.

>
> Also, maybe this function should just return an error if something
> fails, and xe_bo_move would return that failure.


Same here: xe_ttm_bo_purge is existing code, not written by me. I just
relocated the function to this section.

>
>> +			}
>> +		}
>> +	}
>> +}
>> +
>>   static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>>   		      struct ttm_operation_ctx *ctx,
>>   		      struct ttm_resource *new_mem,
>> @@ -853,8 +900,18 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>>   	bool needs_clear;
>>   	bool handle_system_ccs = (!IS_DGFX(xe) && xe_bo_needs_ccs_pages(bo) &&
>>   				  ttm && ttm_tt_is_populated(ttm)) ? true : false;
>> +	int state = atomic_read(&bo->madv_purgeable);
>>   	int ret = 0;
>>   
>> +	/*
>> +	 * Purge only non-shared BOs explicitly marked DONTNEED by userspace.
>> +	 * The move_notify callback will handle invalidation asynchronously.
>> +	 */
>> +	if (evict && state == XE_MADV_PURGEABLE_DONTNEED && !xe_bo_is_shared_locked(bo)) {
>> +		xe_ttm_bo_purge(ttm_bo, ctx);
>> +		return 0;
>> +	}
>> +
>>   	/* Bo creation path, moving to system or TT. */
>>   	if ((!old_mem && ttm) && !handle_system_ccs) {
>>   		if (new_mem->mem_type == XE_PL_TT)
>> @@ -1606,18 +1663,6 @@ static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
>>   	}
>>   }
>>   
>> -static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
>> -{
>> -	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
>> -
>> -	if (ttm_bo->ttm) {
>> -		struct ttm_placement place = {};
>> -		int ret = ttm_bo_validate(ttm_bo, &place, ctx);
>> -
>> -		drm_WARN_ON(&xe->drm, ret);
>> -	}
>> -}
>> -
>>   static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo)
>>   {
>>   	struct ttm_operation_ctx ctx = {
>> @@ -2202,6 +2247,9 @@ struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
>>   #endif
>>   	INIT_LIST_HEAD(&bo->vram_userfault_link);
>>   
>> +	/* Initialize purge advisory state */
>> +	atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
>> +
>>   	drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
>>   
>>   	if (resv) {
>> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
>> index a054d6010ae0..8c7e5dcb627b 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
>> @@ -87,6 +87,13 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
>>   	if (!bo)
>>   		return 0;
>>   
>> +	/*
>> +	 * Skip validation/migration for purged BOs - they have no backing pages.
>> +	 * Rebind will use scratch PTEs instead.
>> +	 */
>> +	if (xe_bo_is_purged(bo))
>> +		return 0;
>> +
> This needs a rebase as xe_gt_pagefault.c is gone upstream and replaced
> with xe_pagefault.c.


Noted.

>
>>   	return need_vram_move ? xe_bo_migrate(bo, vram->placement, NULL, exec) :
>>   		xe_bo_validate(bo, vm, true, exec);
>>   }
>> @@ -100,9 +107,21 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
>>   	struct drm_exec exec;
>>   	struct dma_fence *fence;
>>   	int err, needs_vram;
>> +	struct xe_bo *bo;
>>   
>>   	lockdep_assert_held_write(&vm->lock);
>>   
>> +	/*
>> +	 * Check if BO is purged. For purged BOs:
>> +	 * - Scratch VMs: Allow rebind with scratch PTEs (safe zero reads)
>> +	 * - Non-scratch VMs: FAIL the page fault (no scratch page available)
>> +	 */
>> +	bo = xe_vma_bo(vma);
>> +	if (bo && xe_bo_is_purged(bo)) {
>> +		if (!xe_vm_has_scratch(vm))
>> +			return -EACCES;
>> +	}
> As discussed in other patches, move the xe_bo_is_purged check under the
> dma-resv lock.


Noted.

>
>> +
>>   	needs_vram = xe_vma_need_vram_for_atomic(vm->xe, vma, atomic);
>>   	if (needs_vram < 0 || (needs_vram && xe_vma_is_userptr(vma)))
>>   		return needs_vram < 0 ? needs_vram : -EACCES;
>> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
>> index d22fd1ccc0ba..062f64b16a58 100644
>> --- a/drivers/gpu/drm/xe/xe_pt.c
>> +++ b/drivers/gpu/drm/xe/xe_pt.c
>> @@ -533,20 +533,26 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>>   	/* Is this a leaf entry ?*/
>>   	if (level == 0 || xe_pt_hugepte_possible(addr, next, level, xe_walk)) {
>>   		struct xe_res_cursor *curs = xe_walk->curs;
>> +		struct xe_bo *bo = xe_vma_bo(xe_walk->vma);
>>   		bool is_null = xe_vma_is_null(xe_walk->vma);
>> -		bool is_vram = is_null ? false : xe_res_is_vram(curs);
>> +		bool is_purged = bo && xe_bo_is_purged(bo);
> Can we drop is_purged and just set is_null to true for purged BOs?
>
> Or rename s/is_null/is_null_or_purged for clarity?


Agreed, I will do the changes as suggested (rough sketch below).
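
A rough sketch of that rename, reusing the locals already in
xe_pt_stage_bind_entry:

/* Sketch only: fold the purged case into the null-VMA path. */
bool is_null_or_purged = xe_vma_is_null(xe_walk->vma) ||
                         (bo && xe_bo_is_purged(bo));
bool is_vram = is_null_or_purged ? false : xe_res_is_vram(curs);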

>
>> +		bool is_vram = (is_null || is_purged) ? false : xe_res_is_vram(curs);
>>   
>>   		XE_WARN_ON(xe_walk->va_curs_start != addr);
>>   
>>   		if (xe_walk->clear_pt) {
>>   			pte = 0;
>>   		} else {
>> -			pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
>> +			/*
>> +			 * For purged BOs, treat like null VMAs - pass address 0.
>> +			 * The pte_encode_vma will set XE_PTE_NULL flag for scratch mapping.
>> +			 */
>> +			pte = vm->pt_ops->pte_encode_vma((is_null || is_purged) ? 0 :
>>   							 xe_res_dma(curs) +
>>   							 xe_walk->dma_offset,
>>   							 xe_walk->vma,
>>   							 pat_index, level);
>> -			if (!is_null)
>> +			if (!is_null && !is_purged)
>>   				pte |= is_vram ? xe_walk->default_vram_pte :
>>   					xe_walk->default_system_pte;
>>   
>> @@ -570,7 +576,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>>   		if (unlikely(ret))
>>   			return ret;
>>   
>> -		if (!is_null && !xe_walk->clear_pt)
>> +		if (!is_null && !is_purged && !xe_walk->clear_pt)
>>   			xe_res_next(curs, next - addr);
>>   		xe_walk->va_curs_start = next;
>>   		xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << level);
>> @@ -723,6 +729,26 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>>   	};
>>   	struct xe_pt *pt = vm->pt_root[tile->id];
>>   	int ret;
>> +	bool is_purged = false;
>> +
>> +	/*
>> +	 * Check if BO is purged:
>> +	 * - Scratch VMs: Use scratch PTEs (XE_PTE_NULL) for safe zero reads
>> +	 * - Non-scratch VMs: Clear PTEs to zero (non-present) to avoid mapping to phys addr 0
>> +	 *
>> +	 * For non-scratch VMs, we force clear_pt=true so leaf PTEs become completely
>> +	 * zero instead of creating a PRESENT mapping to physical address 0.
>> +	 */
>> +	if (bo && xe_bo_is_purged(bo)) {
>> +		is_purged = true;
>> +
>> +		/*
>> +		 * For non-scratch VMs, a NULL rebind should use zero PTEs
>> +		 * (non-present), not a present PTE to phys 0.
>> +		 */
>> +		if (!xe_vm_has_scratch(vm))
>> +			xe_walk.clear_pt = true;
> So the idea is purged BOs will fault if the VMA is accessed on
> non-scratch VMs?


Yes, exactly. The behavior differs based on VM type (sketched below):

- Scratch VMs: Purged BOs use scratch PTEs (XE_PTE_NULL flag set by
  pte_encode_vma when addr=0). GPU reads return zeros, no fault.
- Non-scratch VMs: Purged BOs get completely zero PTEs (non-present).
  Any GPU access will fault, which is the expected behavior for
  non-scratch VMs - they don't have scratch pages to fall back to.
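
Expressed as a sketch, using the call shape from the hunk above
(vma, pat_index and level come from the surrounding bind walk):

/* Sketch only: leaf PTE selection for a purged BO. */
if (xe_vm_has_scratch(vm))
        /* Present NULL PTE: GPU reads return zeros, no fault. */
        pte = vm->pt_ops->pte_encode_vma(0, vma, pat_index, level);
else
        /* Completely zero, non-present PTE: any GPU access faults. */
        pte = 0;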

>
>> +	}
>>   
>>   	if (range) {
>>   		/* Move this entire thing to xe_svm.c? */
>> @@ -762,7 +788,7 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>>   	if (!range)
>>   		xe_bo_assert_held(bo);
>>   
>> -	if (!xe_vma_is_null(vma) && !range) {
>> +	if (!xe_vma_is_null(vma) && !range && !is_purged) {
>>   		if (xe_vma_is_userptr(vma))
>>   			xe_res_first_dma(to_userptr_vma(vma)->userptr.pages.dma_addr, 0,
>>   					 xe_vma_size(vma), &curs);
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index 10d77666a425..d03e69524369 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -1336,6 +1336,9 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>>   static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>>   			       u16 pat_index, u32 pt_level)
>>   {
>> +	struct xe_bo *bo = xe_vma_bo(vma);
>> +	struct xe_vm *vm = xe_vma_vm(vma);
>> +
>>   	pte |= XE_PAGE_PRESENT;
>>   
>>   	if (likely(!xe_vma_read_only(vma)))
>> @@ -1344,7 +1347,13 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>>   	pte |= pte_encode_pat_index(pat_index, pt_level);
>>   	pte |= pte_encode_ps(pt_level);
>>   
>> -	if (unlikely(xe_vma_is_null(vma)))
>> +	/*
>> +	 * NULL PTEs redirect to scratch page (return zeros on read).
>> +	 * Set for: 1) explicit null VMAs, 2) purged BOs on scratch VMs.
>> +	 * Never set NULL flag without scratch page - causes undefined behavior.
>> +	 */
>> +	if (unlikely(xe_vma_is_null(vma) ||
>> +		     (bo && xe_bo_is_purged(bo) && xe_vm_has_scratch(vm))))
>>   		pte |= XE_PTE_NULL;
>>   
>>   	return pte;
>> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
>> index cad3cf627c3f..3ba851e0b870 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
>> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
>> @@ -158,6 +158,60 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
>>   	}
>>   }
>>   
>> +/*
>> + * Handle purgeable buffer object advice for DONTNEED/WILLNEED/PURGED.
>> + * Updates op->purge_state_val.retained to indicate if backing store
>> + * exists (matches i915's retained).
>> + */
>> +static void xe_vm_madvise_purgeable_bo(struct xe_device *xe, struct xe_vm *vm,
>> +				       struct xe_vma **vmas, int num_vmas,
>> +				       struct drm_xe_madvise *op)
>> +{
>> +	bool has_purged_bo = false;
>> +	int i;
>> +
>> +	xe_assert(vm->xe, op->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE);
>> +
>> +	for (i = 0; i < num_vmas; i++) {
>> +		struct xe_bo *bo = xe_vma_bo(vmas[i]);
>> +
>> +		if (!bo)
>> +			continue;
>> +
>> +		/* BO must be locked before modifying madv state */
>> +		dma_resv_assert_held(bo->ttm.base.resv);
> xe_bo_assert_held


Noted.

>
>> +
>> +		/*
>> +		 * Once purged, always purged. Cannot transition back to WILLNEED.
>> +		 * This matches i915 semantics where purged BOs are permanently invalid.
>> +		 */
>> +		if (xe_bo_is_purged(bo)) {
>> +			has_purged_bo = true;
>> +			continue;
>> +		}
>> +
>> +		switch (op->purge_state_val.val) {
>> +		case DRM_XE_VMA_PURGEABLE_STATE_WILLNEED:
>> +			atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_WILLNEED);
>> +			break;
>> +		case DRM_XE_VMA_PURGEABLE_STATE_DONTNEED:
>> +			if (!xe_bo_is_shared_locked(bo))
> I think we need a complete VMA check here per Thomas's feedback (e.g.,
> all VMAs attached to the BO must be in
> DRM_XE_VMA_PURGEABLE_STATE_DONTNEED to flip the BO state). I think if
> the BO is an imported or exported dma-buf we can never flip the state,
> as we don't know what an external device is doing with it.


I will add the VMA check as per your suggestion.

Good catch on the dma-buf case; we are indeed missing the
imported/exported dma-buf check. I'll add a check to reject DONTNEED
for (rough sketch below):
- Imported BOs (bo->ttm.base.import_attach != NULL)
- Exported BOs (bo->ttm.base.dma_buf != NULL &&
  bo->ttm.base.dma_buf->file != NULL)
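
Something like this hypothetical helper (the GEM fields are real, the
helper itself is only a sketch):

/* Sketch only: refuse DONTNEED on BOs visible to other devices/processes. */
static bool xe_bo_can_mark_dontneed(struct xe_bo *bo)
{
        if (bo->ttm.base.import_attach)
                return false;   /* imported dma-buf */
        if (bo->ttm.base.dma_buf)
                return false;   /* exported dma-buf */
        return true;
}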

~Arvind

>
> Matt
>
>> +				atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_DONTNEED);
>> +			break;
>> +		default:
>> +			drm_warn(&vm->xe->drm, "Invalid madvice value = %d\n",
>> +				 op->purge_state_val.val);
>> +			return;
>> +		}
>> +	}
>> +
>> +	/*
>> +	 * Set retained flag to indicate if backing store still exists.
>> +	 * Matches i915: retained = 1 if not purged, 0 if purged.
>> +	 */
>> +	op->purge_state_val.retained = !has_purged_bo;
>> +}
>> +
>>   typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
>>   			     struct xe_vma **vmas, int num_vmas,
>>   			     struct drm_xe_madvise *op);
>> @@ -283,6 +337,19 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
>>   			return false;
>>   		break;
>>   	}
>> +	case DRM_XE_VMA_ATTR_PURGEABLE_STATE:
>> +	{
>> +		u32 val = args->purge_state_val.val;
>> +
>> +		if (XE_IOCTL_DBG(xe, !((val == DRM_XE_VMA_PURGEABLE_STATE_WILLNEED) ||
>> +				       (val == DRM_XE_VMA_PURGEABLE_STATE_DONTNEED))))
>> +			return false;
>> +
>> +		if (XE_IOCTL_DBG(xe, args->purge_state_val.reserved))
>> +			return false;
>> +
>> +		break;
>> +	}
>>   	default:
>>   		if (XE_IOCTL_DBG(xe, 1))
>>   			return false;
>> @@ -402,6 +469,12 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
>>   					goto err_fini;
>>   			}
>>   		}
>> +		if (args->type == DRM_XE_VMA_ATTR_PURGEABLE_STATE) {
>> +			xe_vm_madvise_purgeable_bo(xe, vm, madvise_range.vmas,
>> +						   madvise_range.num_vmas, args);
>> +			goto err_fini;
>> +
>> +		}
>>   	}
>>   
>>   	if (madvise_range.has_svm_userptr_vmas) {
>> -- 
>> 2.43.0
>>


* Re: [RFC v2 5/9] drm/xe/bo: Handle CPU faults on purged buffer objects
  2025-12-03  7:25       ` Yadav, Arvind
@ 2025-12-03 16:24         ` Matthew Brost
  0 siblings, 0 replies; 35+ messages in thread
From: Matthew Brost @ 2025-12-03 16:24 UTC (permalink / raw)
  To: Yadav, Arvind
  Cc: intel-xe, himal.prasad.ghimiray, thomas.hellstrom, pallavi.mishra

On Wed, Dec 03, 2025 at 12:55:52PM +0530, Yadav, Arvind wrote:
> 
> On 03-12-2025 00:18, Matthew Brost wrote:
> > On Tue, Dec 02, 2025 at 10:42:39AM -0800, Matthew Brost wrote:
> > > On Mon, Dec 01, 2025 at 11:20:15AM +0530, Arvind Yadav wrote:
> > > > Modify the CPU page fault handler, `xe_bo_cpu_fault()`, to correctly
> > > > handle access to buffer objects that have been purged.
> > > > 
> > > > When a buffer object is in the `XE_MADV_PURGED` state, its backing
> > > > store has been reclaimed by the kernel. If the CPU attempts to access
> > > > this memory, it is an error that should be reported to the application.
> > > > 
> > > > v2:
> > > >    - Added xe_bo_is_purged(bo) instead of atomic_read.
> > > >    - Avoids leaks and keeps drm_dev_exit() while returning.
> > > > 
> > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> > > 
> > Ah, actually I think I made a mistake here.
> > 
> > > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> > > > Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> > > > ---
> > > >   drivers/gpu/drm/xe/xe_bo.c | 10 ++++++++++
> > > >   1 file changed, 10 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > > index f0b3f7a13114..7f5bcf114ed4 100644
> > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > @@ -1992,6 +1992,16 @@ static vm_fault_t xe_bo_cpu_fault(struct vm_fault *vmf)
> > > >   	if (!drm_dev_enter(&xe->drm, &idx))
> > > >   		return ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
> > > > +	/*
> > > > +	 * BO content is gone. Signal the user process.
> > > > +	 * Once purged, BO remains permanently invalid (i915 semantics).
> > > > +	 * Application must destroy and recreate the BO.
> > > > +	 */
> > > > +	if (xe_bo_is_purged(bo)) {
> > Doesn't this need to be done under the BO's dma-resv lock to avoid a race?
> > Consider the case where, after this check, TTM evicts this BO, changing
> > the state to purged. Now we grab the BO's dma-resv lock and try to get
> > pages on a purged BO. Seems like an issue.
> 
> Thanks for catching these issues!
> 
> > 
> > Also with that, xe_bo_is_purged likely should have lockdep annotation
> > asserting the BO's dma-resv lock is held.
> I initially added xe_bo_assert_held() to xe_bo_is_purged(), but it causes
> crashes because many callers don't hold the lock.
> For example, in xe_pagefault.c (this early check), no lock is held. I’ll
> recheck the callers and update accordingly.
> 

I've touched on this in other patches - sorry, my reviews sometimes come
in scattered bursts as I look at the code - but I think the point is all
purging state changes / critical checks should be done under the BO's
dma-resv lock to avoid races. Sure, user space shouldn't be touching a BO
in DONTNEED state, but it could, and if purging races with a check outside
the lock it seems like bad things could happen in the kernel.
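
For instance, a minimal sketch of a race-free state flip (helper names
assumed from this series):

/* Sketch only: flip purge state with the BO's dma-resv lock held. */
int err = xe_bo_lock(bo, true);

if (!err) {
        if (!xe_bo_is_purged(bo))       /* once purged, always purged */
                atomic_set(&bo->madv_purgeable, XE_MADV_PURGEABLE_DONTNEED);
        xe_bo_unlock(bo);
}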

Matt

> ~Arvind
> > 
> > Matt
> > 
> > > > +		ret = VM_FAULT_SIGBUS;
> > > > +		goto out;
> > > > +	}
> > > > +
> > > >   	ret = xe_bo_cpu_fault_fastpath(vmf, xe, bo, needs_rpm);
> > > >   	if (ret != VM_FAULT_RETRY)
> > > >   		goto out;
> > > > -- 
> > > > 2.43.0
> > > > 


end of thread [~2025-12-03 16:25 UTC]

Thread overview: 35+ messages
2025-12-01  5:50 [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Arvind Yadav
2025-12-01  5:50 ` [RFC v2 1/9] drm/xe/uapi: Add UAPI " Arvind Yadav
2025-12-01 23:00   ` Matthew Brost
2025-12-02  2:55     ` Yadav, Arvind
2025-12-01  5:50 ` [RFC v2 2/9] drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo Arvind Yadav
2025-12-01 23:02   ` Matthew Brost
2025-12-02  2:56     ` Yadav, Arvind
2025-12-02 18:52   ` Matthew Brost
2025-12-01  5:50 ` [RFC v2 3/9] drm/xe/bo: Prevent purging of shared buffer objects Arvind Yadav
2025-12-01 23:10   ` Matthew Brost
2025-12-02  3:42     ` Yadav, Arvind
2025-12-02  9:42       ` Thomas Hellström
2025-12-02 15:17         ` Matthew Brost
2025-12-02 18:22           ` Yadav, Arvind
2025-12-02 18:35             ` Matthew Brost
2025-12-01  5:50 ` [RFC v2 4/9] drm/xe/madvise: Implement purgeable buffer object support Arvind Yadav
2025-12-02  1:46   ` Matthew Brost
2025-12-02  4:01     ` Yadav, Arvind
2025-12-02 21:39   ` Matthew Brost
2025-12-03 14:01     ` Yadav, Arvind
2025-12-01  5:50 ` [RFC v2 5/9] drm/xe/bo: Handle CPU faults on purged buffer objects Arvind Yadav
2025-12-02 18:42   ` Matthew Brost
2025-12-02 18:48     ` Matthew Brost
2025-12-03  7:25       ` Yadav, Arvind
2025-12-03 16:24         ` Matthew Brost
2025-12-01  5:50 ` [RFC v2 6/9] drm/xe/bo: Prevent mmap of " Arvind Yadav
2025-12-02 18:54   ` Matthew Brost
2025-12-01  5:50 ` [RFC v2 7/9] drm/xe/vm: Prevent binding " Arvind Yadav
2025-12-02 18:57   ` Matthew Brost
2025-12-03 11:24     ` Yadav, Arvind
2025-12-01  5:50 ` [RFC v2 8/9] drm/xe/uapi: Add UAPI for purgeable bo state to madvise query response Arvind Yadav
2025-12-02 19:01   ` Matthew Brost
2025-12-03  3:54     ` Yadav, Arvind
2025-12-01  5:50 ` [RFC v2 9/9] drm/xe: Add support for querying purgeable BO states Arvind Yadav
2025-12-02 18:36 ` [RFC v2 0/9] drm/xe/madvise: Add support for purgeable buffer objects Souza, Jose
