* [PATCH v5 00/23] MADVISE FOR XE
@ 2025-07-22 13:35 Himal Prasad Ghimiray
  2025-07-22 13:35 ` [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops Himal Prasad Ghimiray
                   ` (22 more replies)
  0 siblings, 23 replies; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

Provides a user API to assign attributes like pat_index, atomic
operation type, and preferred location for SVM ranges.
The Kernel Mode Driver (KMD) may split existing VMAs to cover input
ranges, assign user-provided attributes, and invalidate existing PTEs so
that the next page fault/prefetch can use the new attributes.

-v5:
Restore attributes to default after free from userspace
Add defragment worker to merge cpu mirror vma with default attributes
Avoid using VMA in uapi
address review comments

-v4:
fix atomic policies
fix attribute copy
address review comments

Himal Prasad Ghimiray (23):
  Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops
  drm/xe/uapi: Add madvise interface
  drm/xe/vm: Add attributes struct as member of vma
  drm/xe/vma: Move pat_index to vma attributes
  drm/xe/vma: Modify new_vma to accept struct xe_vma_mem_attr as
    parameter
  drm/gpusvm: Make drm_gpusvm_for_each_* macros public
  drm/xe/svm: Split system allocator vma incase of madvise call
  drm/xe: Allow CPU address mirror VMA unbind with gpu bindings for
    madvise
  drm/xe/svm: Add xe_svm_ranges_zap_ptes_in_range() for PTE zapping
  drm/xe: Implement madvise ioctl for xe
  drm/xe/svm : Add svm ranges migration policy on atomic access
  drm/xe/madvise: Update migration policy based on preferred location
  drm/xe/svm: Support DRM_XE_SVM_ATTR_PAT memory attribute
  drm/xe/uapi: Add flag for consulting madvise hints on svm prefetch
  drm/xe/svm: Consult madvise preferred location in prefetch
  drm/xe/bo: Add attributes field to xe_bo
  drm/xe/bo: Update atomic_access attribute on madvise
  drm/xe/madvise: Skip vma invalidation if mem attr are unchanged
  drm/xe/vm: Add helper to check for default VMA memory attributes
  drm/xe: Reset VMA attributes to default in SVM garbage collector
  drm/xe/vm: Add a delayed worker to merge fragmented vmas
  drm/xe: Enable madvise ioctl for xe
  drm/xe/uapi: Add UAPI for querying VMA count and memory attributes

 drivers/gpu/drm/drm_gpusvm.c           | 122 ++----
 drivers/gpu/drm/drm_gpuvm.c            |  93 ++++-
 drivers/gpu/drm/nouveau/nouveau_uvmm.c |   1 +
 drivers/gpu/drm/xe/Makefile            |   1 +
 drivers/gpu/drm/xe/xe_bo.c             |  29 +-
 drivers/gpu/drm/xe/xe_bo_types.h       |   8 +
 drivers/gpu/drm/xe/xe_device.c         |   4 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c   |   2 +-
 drivers/gpu/drm/xe/xe_pt.c             |  39 +-
 drivers/gpu/drm/xe/xe_svm.c            | 154 +++++++-
 drivers/gpu/drm/xe/xe_svm.h            |  22 ++
 drivers/gpu/drm/xe/xe_tile.h           |  18 +
 drivers/gpu/drm/xe/xe_vm.c             | 518 ++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_vm.h             |  10 +-
 drivers/gpu/drm/xe/xe_vm_madvise.c     | 431 ++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm_madvise.h     |  15 +
 drivers/gpu/drm/xe/xe_vm_types.h       |  65 +++-
 include/drm/drm_gpusvm.h               |  70 ++++
 include/drm/drm_gpuvm.h                |  25 +-
 include/uapi/drm/xe_drm.h              | 273 +++++++++++++
 20 files changed, 1743 insertions(+), 157 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_vm_madvise.c
 create mode 100644 drivers/gpu/drm/xe/xe_vm_madvise.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-22 13:38   ` Danilo Krummrich
  2025-07-27 21:18   ` Matthew Brost
  2025-07-22 13:35 ` [PATCH v5 02/23] drm/xe/uapi: Add madvise interface Himal Prasad Ghimiray
                   ` (21 subsequent siblings)
  22 siblings, 2 replies; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe
  Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray,
	Danilo Krummrich, Boris Brezillon, dri-devel

- DRM_GPUVM_SM_MAP_NOT_MADVISE: Default sm_map operations for the input
  range.

- DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
  drm_gpuvm_sm_map_ops_create to iterate over GPUVMAs in the
  user-provided range and split the existing non-GEM object VMA if the
  start or end of the input range lies within it. The operations can
  create up to 2 REMAPs and 2 MAPs. The Xe driver uses this to assign
  attributes to GPUVMAs within the user-defined range. Unlike the
  default mode, operations created with this flag never include UNMAPs
  or merges, and the resulting list may contain no operations at all.
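
To illustrate the split described above, here is a minimal user-space
sketch. It is a simplified model and not the drm_gpuvm code: struct range,
struct split_result and madvise_split() are hypothetical names made up for
this example. It only shows how one existing VMA that straddles the request
would be cut into an untouched left piece, the overlap that receives the new
attributes, and an untouched right piece.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a GPU VMA: [addr, addr + range). */
struct range {
	uint64_t addr;
	uint64_t range;
};

/* Result of splitting one existing VMA against a madvise request. */
struct split_result {
	bool has_prev;		/* piece left of the request, attributes kept */
	bool has_next;		/* piece right of the request, attributes kept */
	struct range prev;
	struct range next;
	struct range mid;	/* overlap that would get the new attributes */
};

/*
 * Split @va against the request [req_addr, req_addr + req_range).
 * Returns false when the two do not overlap, in which case @out is
 * left untouched.
 */
static bool madvise_split(struct range va, uint64_t req_addr,
			  uint64_t req_range, struct split_result *out)
{
	uint64_t va_end = va.addr + va.range;
	uint64_t req_end = req_addr + req_range;
	uint64_t lo = va.addr > req_addr ? va.addr : req_addr;
	uint64_t hi = va_end < req_end ? va_end : req_end;

	assert(out);

	if (lo >= hi)
		return false;

	out->has_prev = va.addr < req_addr;
	out->has_next = va_end > req_end;
	out->prev = (struct range){ va.addr,
				    out->has_prev ? req_addr - va.addr : 0 };
	out->next = (struct range){ req_end,
				    out->has_next ? va_end - req_end : 0 };
	out->mid = (struct range){ lo, hi - lo };
	return true;
}
```

In this model a request can cut at most two existing VMAs (one at each end),
which is consistent with the "up to 2 REMAPs" limit stated above.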

v2
- use drm_gpuvm_sm_map_ops_create with flags instead of defining new
  ops_create (Danilo)
- Add doc (Danilo)

v3
- Fix doc
- Fix unmapping check

v4
- Fix mapping for non madvise ops

Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Boris Brezillon <bbrezillon@kernel.org>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 drivers/gpu/drm/drm_gpuvm.c            | 93 ++++++++++++++++++++------
 drivers/gpu/drm/nouveau/nouveau_uvmm.c |  1 +
 drivers/gpu/drm/xe/xe_vm.c             |  1 +
 include/drm/drm_gpuvm.h                | 25 ++++++-
 4 files changed, 98 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index e89b932e987c..c7779588ea38 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -2103,10 +2103,13 @@ static int
 __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
 		   const struct drm_gpuvm_ops *ops, void *priv,
 		   u64 req_addr, u64 req_range,
+		   enum drm_gpuvm_sm_map_ops_flags flags,
 		   struct drm_gem_object *req_obj, u64 req_offset)
 {
 	struct drm_gpuva *va, *next;
 	u64 req_end = req_addr + req_range;
+	bool is_madvise_ops = (flags == DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE);
+	bool needs_map = !is_madvise_ops;
 	int ret;
 
 	if (unlikely(!drm_gpuvm_range_valid(gpuvm, req_addr, req_range)))
@@ -2119,26 +2122,35 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
 		u64 range = va->va.range;
 		u64 end = addr + range;
 		bool merge = !!va->gem.obj;
+		bool skip_madvise_ops = is_madvise_ops && merge;
 
+		needs_map = !is_madvise_ops;
 		if (addr == req_addr) {
 			merge &= obj == req_obj &&
 				 offset == req_offset;
 
 			if (end == req_end) {
-				ret = op_unmap_cb(ops, priv, va, merge);
-				if (ret)
-					return ret;
+				if (!is_madvise_ops) {
+					ret = op_unmap_cb(ops, priv, va, merge);
+					if (ret)
+						return ret;
+				}
 				break;
 			}
 
 			if (end < req_end) {
-				ret = op_unmap_cb(ops, priv, va, merge);
-				if (ret)
-					return ret;
+				if (!is_madvise_ops) {
+					ret = op_unmap_cb(ops, priv, va, merge);
+					if (ret)
+						return ret;
+				}
 				continue;
 			}
 
 			if (end > req_end) {
+				if (skip_madvise_ops)
+					break;
+
 				struct drm_gpuva_op_map n = {
 					.va.addr = req_end,
 					.va.range = range - req_range,
@@ -2153,6 +2165,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
 				ret = op_remap_cb(ops, priv, NULL, &n, &u);
 				if (ret)
 					return ret;
+
+				if (is_madvise_ops)
+					needs_map = true;
 				break;
 			}
 		} else if (addr < req_addr) {
@@ -2170,20 +2185,42 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
 			u.keep = merge;
 
 			if (end == req_end) {
+				if (skip_madvise_ops)
+					break;
+
 				ret = op_remap_cb(ops, priv, &p, NULL, &u);
 				if (ret)
 					return ret;
+
+				if (is_madvise_ops)
+					needs_map = true;
+
 				break;
 			}
 
 			if (end < req_end) {
+				if (skip_madvise_ops)
+					continue;
+
 				ret = op_remap_cb(ops, priv, &p, NULL, &u);
 				if (ret)
 					return ret;
+
+				if (is_madvise_ops) {
+					ret = op_map_cb(ops, priv, req_addr,
+							min(end - req_addr, req_end - end),
+							NULL, req_offset);
+					if (ret)
+						return ret;
+				}
+
 				continue;
 			}
 
 			if (end > req_end) {
+				if (skip_madvise_ops)
+					break;
+
 				struct drm_gpuva_op_map n = {
 					.va.addr = req_end,
 					.va.range = end - req_end,
@@ -2195,6 +2232,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
 				ret = op_remap_cb(ops, priv, &p, &n, &u);
 				if (ret)
 					return ret;
+
+				if (is_madvise_ops)
+					needs_map = true;
 				break;
 			}
 		} else if (addr > req_addr) {
@@ -2203,20 +2243,29 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
 					   (addr - req_addr);
 
 			if (end == req_end) {
-				ret = op_unmap_cb(ops, priv, va, merge);
-				if (ret)
-					return ret;
+				if (!is_madvise_ops) {
+					ret = op_unmap_cb(ops, priv, va, merge);
+					if (ret)
+						return ret;
+				}
+
 				break;
 			}
 
 			if (end < req_end) {
-				ret = op_unmap_cb(ops, priv, va, merge);
-				if (ret)
-					return ret;
+				if (!is_madvise_ops) {
+					ret = op_unmap_cb(ops, priv, va, merge);
+					if (ret)
+						return ret;
+				}
+
 				continue;
 			}
 
 			if (end > req_end) {
+				if (skip_madvise_ops)
+					break;
+
 				struct drm_gpuva_op_map n = {
 					.va.addr = req_end,
 					.va.range = end - req_end,
@@ -2231,14 +2280,16 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
 				ret = op_remap_cb(ops, priv, NULL, &n, &u);
 				if (ret)
 					return ret;
+
+				if (is_madvise_ops)
+					return op_map_cb(ops, priv, addr,
+							(req_end - addr), NULL, req_offset);
 				break;
 			}
 		}
 	}
-
-	return op_map_cb(ops, priv,
-			 req_addr, req_range,
-			 req_obj, req_offset);
+	return needs_map ? op_map_cb(ops, priv, req_addr,
+			   req_range, req_obj, req_offset) : 0;
 }
 
 static int
@@ -2337,15 +2388,15 @@ drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
 		 struct drm_gem_object *req_obj, u64 req_offset)
 {
 	const struct drm_gpuvm_ops *ops = gpuvm->ops;
+	enum drm_gpuvm_sm_map_ops_flags flags = DRM_GPUVM_SM_MAP_NOT_MADVISE;
 
 	if (unlikely(!(ops && ops->sm_step_map &&
 		       ops->sm_step_remap &&
 		       ops->sm_step_unmap)))
 		return -EINVAL;
 
-	return __drm_gpuvm_sm_map(gpuvm, ops, priv,
-				  req_addr, req_range,
-				  req_obj, req_offset);
+	return __drm_gpuvm_sm_map(gpuvm, ops, priv, req_addr, req_range,
+				  flags, req_obj, req_offset);
 }
 EXPORT_SYMBOL_GPL(drm_gpuvm_sm_map);
 
@@ -2487,6 +2538,7 @@ static const struct drm_gpuvm_ops gpuvm_list_ops = {
  * @gpuvm: the &drm_gpuvm representing the GPU VA space
  * @req_addr: the start address of the new mapping
  * @req_range: the range of the new mapping
+ * @flags: flags selecting the default or the madvise split behaviour
  * @req_obj: the &drm_gem_object to map
  * @req_offset: the offset within the &drm_gem_object
  *
@@ -2517,6 +2569,7 @@ static const struct drm_gpuvm_ops gpuvm_list_ops = {
 struct drm_gpuva_ops *
 drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm,
 			    u64 req_addr, u64 req_range,
+			    enum drm_gpuvm_sm_map_ops_flags flags,
 			    struct drm_gem_object *req_obj, u64 req_offset)
 {
 	struct drm_gpuva_ops *ops;
@@ -2536,7 +2589,7 @@ drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm,
 	args.ops = ops;
 
 	ret = __drm_gpuvm_sm_map(gpuvm, &gpuvm_list_ops, &args,
-				 req_addr, req_range,
+				 req_addr, req_range, flags,
 				 req_obj, req_offset);
 	if (ret)
 		goto err_free_ops;
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
index 48f105239f42..26e13fcdbdb8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -1303,6 +1303,7 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job,
 			op->ops = drm_gpuvm_sm_map_ops_create(&uvmm->base,
 							      op->va.addr,
 							      op->va.range,
+							      DRM_GPUVM_SM_MAP_NOT_MADVISE,
 							      op->gem.obj,
 							      op->gem.offset);
 			if (IS_ERR(op->ops)) {
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 2035604121e6..b2ed99551b6e 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2318,6 +2318,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_vma_ops *vops,
 	case DRM_XE_VM_BIND_OP_MAP:
 	case DRM_XE_VM_BIND_OP_MAP_USERPTR:
 		ops = drm_gpuvm_sm_map_ops_create(&vm->gpuvm, addr, range,
+						  DRM_GPUVM_SM_MAP_NOT_MADVISE,
 						  obj, bo_offset_or_userptr);
 		break;
 	case DRM_XE_VM_BIND_OP_UNMAP:
diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
index 2a9629377633..c589b886a4fd 100644
--- a/include/drm/drm_gpuvm.h
+++ b/include/drm/drm_gpuvm.h
@@ -211,6 +211,27 @@ enum drm_gpuvm_flags {
 	DRM_GPUVM_USERBITS = BIT(1),
 };
 
+/**
+ * enum drm_gpuvm_sm_map_ops_flags - flags for drm_gpuvm split/merge ops
+ */
+enum drm_gpuvm_sm_map_ops_flags {
+	/**
+	 * @DRM_GPUVM_SM_MAP_NOT_MADVISE: DEFAULT sm_map ops
+	 */
+	DRM_GPUVM_SM_MAP_NOT_MADVISE = 0,
+
+	/**
+	 * @DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
+	 * drm_gpuvm_sm_map_ops_create to iterate over GPUVMAs in the
+	 * user-provided range and split the existing non-GEM object VMA if the
+	 * start or end of the input range lies within it. The operations can
+	 * create up to 2 REMAPs and 2 MAPs. Unlike the default mode, operations
+	 * created with this flag never include UNMAPs or merges, and the
+	 * resulting list may contain no operations at all.
+	 */
+	DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE = BIT(0),
+};
+
 /**
  * struct drm_gpuvm - DRM GPU VA Manager
  *
@@ -1059,8 +1080,8 @@ struct drm_gpuva_ops {
 #define drm_gpuva_next_op(op) list_next_entry(op, entry)
 
 struct drm_gpuva_ops *
-drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm,
-			    u64 addr, u64 range,
+drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm, u64 addr, u64 range,
+			    enum drm_gpuvm_sm_map_ops_flags flags,
 			    struct drm_gem_object *obj, u64 offset);
 struct drm_gpuva_ops *
 drm_gpuvm_sm_unmap_ops_create(struct drm_gpuvm *gpuvm,
-- 
2.34.1



* [PATCH v5 02/23] drm/xe/uapi: Add madvise interface
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
  2025-07-22 13:35 ` [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-29  3:29   ` Matthew Brost
  2025-07-22 13:35 ` [PATCH v5 03/23] drm/xe/vm: Add attributes struct as member of vma Himal Prasad Ghimiray
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

Introduce a madvise ioctl in the xe uapi. It lets userspace give the
driver hints about the expected memory usage and the PTE update policy
for a GPU virtual address range, so that the driver can manage memory
more efficiently.
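
As an illustration of how userspace might fill the new argument struct,
here is a minimal sketch. It assumes the layout proposed in this patch and
mirrors it locally with stdint types instead of including xe_drm.h;
xe_madvise_init_atomic() is a hypothetical helper, not part of the uapi.
The sketch stops before the ioctl() call itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the proposed uapi; values copied from this patch. */
#define DRM_XE_MEM_RANGE_ATTR_ATOMIC	1
#define DRM_XE_ATOMIC_DEVICE		1

struct drm_xe_madvise {
	uint64_t extensions;
	uint64_t start;
	uint64_t range;
	uint32_t vm_id;
	uint32_t type;
	union {
		struct {
			uint32_t devmem_fd;
			uint16_t migration_policy;
			uint16_t pad;
			uint64_t reserved;
		} preferred_mem_loc;
		struct {
			uint32_t val;
			uint32_t pad;
			uint64_t reserved;
		} atomic;
		struct {
			uint32_t val;
			uint32_t pad;
			uint64_t reserved;
		} pat_index;
	};
	uint64_t reserved[2];
};

/*
 * Hypothetical helper: prepare an atomic-policy request. Zeroing the
 * whole struct first keeps every pad and reserved field at zero.
 */
static void xe_madvise_init_atomic(struct drm_xe_madvise *args,
				   uint32_t vm_id, uint64_t start,
				   uint64_t range, uint32_t atomic_val)
{
	assert(args);
	memset(args, 0, sizeof(*args));
	args->vm_id = vm_id;
	args->start = start;
	args->range = range;
	args->type = DRM_XE_MEM_RANGE_ATTR_ATOMIC;
	args->atomic.val = atomic_val;
}
```

The union member is selected by @type, so only the atomic member is written
here; the filled struct would then be passed to DRM_IOCTL_XE_MADVISE.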

v2 (Matthew/Thomas)
- Drop num_ops support
- Drop purgeable support
- Add kernel-docs
- IOWR/IOW

v3 (Matthew/Thomas)
- Reorder attributes
- use __u16 for migration_policy
- use __u64 for reserved in unions
- Avoid usage of vma

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 include/uapi/drm/xe_drm.h | 131 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)

diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index e2426413488f..51dcf63684b0 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -81,6 +81,7 @@ extern "C" {
  *  - &DRM_IOCTL_XE_EXEC
  *  - &DRM_IOCTL_XE_WAIT_USER_FENCE
  *  - &DRM_IOCTL_XE_OBSERVATION
+ *  - &DRM_IOCTL_XE_MADVISE
  */
 
 /*
@@ -102,6 +103,7 @@ extern "C" {
 #define DRM_XE_EXEC			0x09
 #define DRM_XE_WAIT_USER_FENCE		0x0a
 #define DRM_XE_OBSERVATION		0x0b
+#define DRM_XE_MADVISE			0x0c
 
 /* Must be kept compact -- no holes */
 
@@ -117,6 +119,7 @@ extern "C" {
 #define DRM_IOCTL_XE_EXEC			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC, struct drm_xe_exec)
 #define DRM_IOCTL_XE_WAIT_USER_FENCE		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
 #define DRM_IOCTL_XE_OBSERVATION		DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
+#define DRM_IOCTL_XE_MADVISE			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)
 
 /**
  * DOC: Xe IOCTL Extensions
@@ -1974,6 +1977,134 @@ struct drm_xe_query_eu_stall {
 	__u64 sampling_rates[];
 };
 
+/**
+ * struct drm_xe_madvise - Input of &DRM_IOCTL_XE_MADVISE
+ *
+ * This structure is used to set memory attributes for a virtual address range
+ * in a VM. The type of attribute is specified by @type, and the corresponding
+ * union member is used to provide additional parameters for @type.
+ *
+ * Supported attribute types:
+ * - DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC: Set preferred memory location.
+ * - DRM_XE_MEM_RANGE_ATTR_ATOMIC: Set atomic access policy.
+ * - DRM_XE_MEM_RANGE_ATTR_PAT: Set page attribute table index.
+ *
+ * Example:
+ *
+ * .. code-block:: C
+ *
+ * struct drm_xe_madvise madvise = {
+ *          .vm_id = vm_id,
+ *          .start = 0x100000,
+ *          .range = 0x2000,
+ *          .type = DRM_XE_MEM_RANGE_ATTR_ATOMIC,
+ *          .atomic.val = DRM_XE_ATOMIC_DEVICE,
+ *          .atomic.pad = 0,
+ *         };
+ *
+ * ioctl(fd, DRM_IOCTL_XE_MADVISE, &madvise);
+ *
+ */
+struct drm_xe_madvise {
+	/** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+
+	/** @start: start of the virtual address range */
+	__u64 start;
+
+	/** @range: size of the virtual address range */
+	__u64 range;
+
+	/** @vm_id: vm_id of the virtual range */
+	__u32 vm_id;
+
+#define DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC	0
+#define DRM_XE_MEM_RANGE_ATTR_ATOMIC		1
+#define DRM_XE_MEM_RANGE_ATTR_PAT		2
+	/** @type: type of attribute */
+	__u32 type;
+
+	union {
+		/**
+		 * @preferred_mem_loc: preferred memory location
+		 *
+		 * Used when @type == DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC
+		 *
+		 * Supported values for @preferred_mem_loc.devmem_fd:
+		 * - DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE: set vram of faulting tile as preferred loc
+		 * - DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM: set smem as preferred loc
+		 *
+		 * Supported values for @preferred_mem_loc.migration_policy:
+		 * - DRM_XE_MIGRATE_ALL_PAGES
+		 * - DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES
+		 */
+		struct {
+#define DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE	0
+#define DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM	-1
+			/** @preferred_mem_loc.devmem_fd: fd for preferred loc */
+			__u32 devmem_fd;
+
+#define DRM_XE_MIGRATE_ALL_PAGES		0
+#define DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES	1
+			/** @preferred_mem_loc.migration_policy: Page migration policy */
+			__u16 migration_policy;
+
+			/** @preferred_mem_loc.pad: MBZ */
+			__u16 pad;
+
+			/** @preferred_mem_loc.reserved: Reserved */
+			__u64 reserved;
+		} preferred_mem_loc;
+
+		/**
+		 * @atomic: Atomic access policy
+		 *
+		 * Used when @type == DRM_XE_MEM_RANGE_ATTR_ATOMIC.
+		 *
+		 * Supported values for @atomic.val:
+		 * - DRM_XE_ATOMIC_UNDEFINED: Undefined or default behaviour
+		 *   Support both GPU and CPU atomic operations for system allocator
+		 *   Support GPU atomic operations for normal(bo) allocator
+		 * - DRM_XE_ATOMIC_DEVICE: Support GPU atomic operations
+		 * - DRM_XE_ATOMIC_GLOBAL: Support both GPU and CPU atomic operations
+		 * - DRM_XE_ATOMIC_CPU: Support CPU atomic
+		 */
+		struct {
+#define DRM_XE_ATOMIC_UNDEFINED	0
+#define DRM_XE_ATOMIC_DEVICE	1
+#define DRM_XE_ATOMIC_GLOBAL	2
+#define DRM_XE_ATOMIC_CPU	3
+			/** @atomic.val: value of atomic operation */
+			__u32 val;
+
+			/** @atomic.pad: MBZ */
+			__u32 pad;
+
+			/** @atomic.reserved: Reserved */
+			__u64 reserved;
+		} atomic;
+
+		/**
+		 * @pat_index: Page attribute table index
+		 *
+		 * Used when @type == DRM_XE_MEM_RANGE_ATTR_PAT.
+		 */
+		struct {
+			/** @pat_index.val: PAT index value */
+			__u32 val;
+
+			/** @pat_index.pad: MBZ */
+			__u32 pad;
+
+			/** @pat_index.reserved: Reserved */
+			__u64 reserved;
+		} pat_index;
+	};
+
+	/** @reserved: Reserved */
+	__u64 reserved[2];
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.34.1



* [PATCH v5 03/23] drm/xe/vm: Add attributes struct as member of vma
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
  2025-07-22 13:35 ` [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops Himal Prasad Ghimiray
  2025-07-22 13:35 ` [PATCH v5 02/23] drm/xe/uapi: Add madvise interface Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-22 13:35 ` [PATCH v5 04/23] drm/xe/vma: Move pat_index to vma attributes Himal Prasad Ghimiray
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

The attributes of an xe_vma determine the migration policy and the
encoding of the page table entries (PTEs) for that vma, that is, how its
pages are migrated and how its addresses are translated. madvise will
use them to set the behavior of the vma.

v2 (Matthew Brost)
- Add docs

v3 (Matthew Brost)
- Add uapi references
- 80 characters line wrap

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_vm_types.h | 33 ++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index bed6088e1bb3..5777b0e0c6a9 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -77,6 +77,33 @@ struct xe_userptr {
 #endif
 };
 
+/**
+ * struct xe_vma_mem_attr - memory attributes associated with vma
+ */
+struct xe_vma_mem_attr {
+	/** @preferred_loc: preferred memory location */
+	struct {
+		/** @preferred_loc.migration_policy: Page migration policy */
+		u32 migration_policy;
+
+		/**
+		 * @preferred_loc.devmem_fd: used for determining pagemap_fd
+		 * requested by user. DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM and
+		 * DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE mean system memory or
+		 * closest device memory respectively.
+		 */
+		u32 devmem_fd;
+	} preferred_loc;
+
+	/**
+	 * @atomic_access: The atomic access type for the vma
+	 * See %DRM_XE_ATOMIC_UNDEFINED, %DRM_XE_ATOMIC_DEVICE,
+	 * %DRM_XE_ATOMIC_GLOBAL, and %DRM_XE_ATOMIC_CPU for possible
+	 * values. These are defined in uapi/drm/xe_drm.h.
+	 */
+	u32 atomic_access;
+};
+
 struct xe_vma {
 	/** @gpuva: Base GPUVA object */
 	struct drm_gpuva gpuva;
@@ -135,6 +162,12 @@ struct xe_vma {
 	 * Needs to be signalled before UNMAP can be processed.
 	 */
 	struct xe_user_fence *ufence;
+
+	/**
+	 * @attr: The attributes of vma which determine the migration policy
+	 * and encoding of the PTEs for this vma.
+	 */
+	struct xe_vma_mem_attr attr;
 };
 
 /**
-- 
2.34.1



* [PATCH v5 04/23] drm/xe/vma: Move pat_index to vma attributes
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (2 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 03/23] drm/xe/vm: Add attributes struct as member of vma Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-22 13:35 ` [PATCH v5 05/23] drm/xe/vma: Modify new_vma to accept struct xe_vma_mem_attr as parameter Himal Prasad Ghimiray
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

The PAT index determines how PTEs are encoded and can be modified by
madvise. Therefore, it is now part of the vma attributes.

Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c       |  2 +-
 drivers/gpu/drm/xe/xe_vm.c       |  6 +++---
 drivers/gpu/drm/xe/xe_vm_types.h | 10 +++++-----
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index c8e63bd23300..1bf0cf81513c 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -518,7 +518,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 {
 	struct xe_pt_stage_bind_walk *xe_walk =
 		container_of(walk, typeof(*xe_walk), base);
-	u16 pat_index = xe_walk->vma->pat_index;
+	u16 pat_index = xe_walk->vma->attr.pat_index;
 	struct xe_pt *xe_parent = container_of(parent, typeof(*xe_parent), base);
 	struct xe_vm *vm = xe_walk->vm;
 	struct xe_pt *xe_child;
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index b2ed99551b6e..696f3e87bb73 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1223,7 +1223,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 	if (vm->xe->info.has_atomic_enable_pte_bit)
 		vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
 
-	vma->pat_index = pat_index;
+	vma->attr.pat_index = pat_index;
 
 	if (bo) {
 		struct drm_gpuvm_bo *vm_bo;
@@ -2673,7 +2673,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 
 			if (op->base.remap.prev) {
 				vma = new_vma(vm, op->base.remap.prev,
-					      old->pat_index, flags);
+					      old->attr.pat_index, flags);
 				if (IS_ERR(vma))
 					return PTR_ERR(vma);
 
@@ -2703,7 +2703,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 
 			if (op->base.remap.next) {
 				vma = new_vma(vm, op->base.remap.next,
-					      old->pat_index, flags);
+					      old->attr.pat_index, flags);
 				if (IS_ERR(vma))
 					return PTR_ERR(vma);
 
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 5777b0e0c6a9..c30f404a00e3 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -102,6 +102,11 @@ struct xe_vma_mem_attr {
 	 * values. These are defined in uapi/drm/xe_drm.h.
 	 */
 	u32 atomic_access;
+
+	/**
+	 * @pat_index: The pat index to use when encoding the PTEs for this vma.
+	 */
+	u16 pat_index;
 };
 
 struct xe_vma {
@@ -152,11 +157,6 @@ struct xe_vma {
 	/** @tile_staged: bind is staged for this VMA */
 	u8 tile_staged;
 
-	/**
-	 * @pat_index: The pat index to use when encoding the PTEs for this vma.
-	 */
-	u16 pat_index;
-
 	/**
 	 * @ufence: The user fence that was provided with MAP.
 	 * Needs to be signalled before UNMAP can be processed.
-- 
2.34.1



* [PATCH v5 05/23] drm/xe/vma: Modify new_vma to accept struct xe_vma_mem_attr as parameter
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (3 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 04/23] drm/xe/vma: Move pat_index to vma attributes Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-22 13:35 ` [PATCH v5 06/23] drm/gpusvm: Make drm_gpusvm_for_each_* macros public Himal Prasad Ghimiray
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

This change simplifies the logic by ensuring that remapped previous or
next VMAs are created with the same memory attributes as the original VMA.
By passing struct xe_vma_mem_attr as a parameter, we maintain consistency
in memory attributes.
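
The inheritance described above can be sketched in plain C. This is a
reduced model under stated assumptions: struct xe_vma_mem_attr mirrors the
layout from patches 03 and 04 with stdint types, the default values follow
the default_attr initializer in this patch, and struct toy_vma,
toy_default_attr() and toy_vma_create() are hypothetical stand-ins for the
driver's xe_vma and its constructors.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the uapi patch (02/23) in this series. */
#define DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE	0
#define DRM_XE_MIGRATE_ALL_PAGES		0
#define DRM_XE_ATOMIC_UNDEFINED			0
#define DRM_XE_ATOMIC_DEVICE			1

/* Mirrors struct xe_vma_mem_attr as of patch 04/23. */
struct xe_vma_mem_attr {
	struct {
		uint32_t migration_policy;
		uint32_t devmem_fd;
	} preferred_loc;
	uint32_t atomic_access;
	uint16_t pat_index;
};

/* Hypothetical, reduced VMA: an inclusive address range plus attributes. */
struct toy_vma {
	uint64_t start;
	uint64_t end;
	struct xe_vma_mem_attr attr;
};

/* Defaults for a fresh MAP; only the PAT index comes from the caller. */
static struct xe_vma_mem_attr toy_default_attr(uint16_t pat_index)
{
	struct xe_vma_mem_attr attr = {
		.preferred_loc = {
			.devmem_fd = DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE,
			.migration_policy = DRM_XE_MIGRATE_ALL_PAGES,
		},
		.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
		.pat_index = pat_index,
	};

	return attr;
}

/* One struct assignment copies every attribute, as in "*dst = *src". */
static struct toy_vma toy_vma_create(uint64_t start, uint64_t end,
				     const struct xe_vma_mem_attr *attr)
{
	struct toy_vma vma = { .start = start, .end = end };

	assert(attr);
	vma.attr = *attr;
	return vma;
}
```

Passing &old->attr when creating the prev and next pieces means a field
added to the struct later is inherited without touching the remap path.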

-v2
 *dst = *src (Matthew Brost)

-v3 (Matthew Brost)
 Drop unnecessary helper
 pass attr ptr as input to new_vma and vma_create

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 696f3e87bb73..480cf75340ce 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1168,7 +1168,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 				    struct xe_bo *bo,
 				    u64 bo_offset_or_userptr,
 				    u64 start, u64 end,
-				    u16 pat_index, unsigned int flags)
+				    struct xe_vma_mem_attr *attr,
+				    unsigned int flags)
 {
 	struct xe_vma *vma;
 	struct xe_tile *tile;
@@ -1223,7 +1224,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 	if (vm->xe->info.has_atomic_enable_pte_bit)
 		vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
 
-	vma->attr.pat_index = pat_index;
+	vma->attr = *attr;
 
 	if (bo) {
 		struct drm_gpuvm_bo *vm_bo;
@@ -2444,7 +2445,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_vma_ops *vops,
 ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_create, ERRNO);
 
 static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
-			      u16 pat_index, unsigned int flags)
+			      struct xe_vma_mem_attr *attr, unsigned int flags)
 {
 	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
 	struct drm_exec exec;
@@ -2473,7 +2474,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 	}
 	vma = xe_vma_create(vm, bo, op->gem.offset,
 			    op->va.addr, op->va.addr +
-			    op->va.range - 1, pat_index, flags);
+			    op->va.range - 1, attr, flags);
 	if (IS_ERR(vma))
 		goto err_unlock;
 
@@ -2616,6 +2617,15 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 		switch (op->base.op) {
 		case DRM_GPUVA_OP_MAP:
 		{
+			struct xe_vma_mem_attr default_attr = {
+				.preferred_loc = {
+					.devmem_fd = DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE,
+					.migration_policy = DRM_XE_MIGRATE_ALL_PAGES,
+				},
+				.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
+				.pat_index = op->map.pat_index,
+			};
+
 			flags |= op->map.read_only ?
 				VMA_CREATE_FLAG_READ_ONLY : 0;
 			flags |= op->map.is_null ?
@@ -2625,7 +2635,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 			flags |= op->map.is_cpu_addr_mirror ?
 				VMA_CREATE_FLAG_IS_SYSTEM_ALLOCATOR : 0;
 
-			vma = new_vma(vm, &op->base.map, op->map.pat_index,
+			vma = new_vma(vm, &op->base.map, &default_attr,
 				      flags);
 			if (IS_ERR(vma))
 				return PTR_ERR(vma);
@@ -2673,7 +2683,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 
 			if (op->base.remap.prev) {
 				vma = new_vma(vm, op->base.remap.prev,
-					      old->attr.pat_index, flags);
+					      &old->attr, flags);
 				if (IS_ERR(vma))
 					return PTR_ERR(vma);
 
@@ -2703,7 +2713,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 
 			if (op->base.remap.next) {
 				vma = new_vma(vm, op->base.remap.next,
-					      old->attr.pat_index, flags);
+					      &old->attr, flags);
 				if (IS_ERR(vma))
 					return PTR_ERR(vma);
 
-- 
2.34.1



* [PATCH v5 06/23] drm/gpusvm: Make drm_gpusvm_for_each_* macros public
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (4 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 05/23] drm/xe/vma: Modify new_vma to accept struct xe_vma_mem_attr as parameter Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-22 13:35 ` [PATCH v5 07/23] drm/xe/svm: Split system allocator vma incase of madvise call Himal Prasad Ghimiray
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

The drm_gpusvm_for_each_notifier, drm_gpusvm_for_each_notifier_safe and
drm_gpusvm_for_each_range_safe macros locate the notifiers and ranges
that fall within a user-specified address range. Move them to the public
header so that drivers can use them in their own implementations.
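
For readers unfamiliar with the _safe iteration pattern, here is a minimal
userspace sketch. It is not the drm_gpusvm code: struct ival, ival_find(),
ival_next() and ival_for_each_safe() are hypothetical stand-ins built on a
sorted linked list instead of an interval tree. It only shows why the next
pointer is cached before the loop body runs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a notifier or range: half-open [start, end). */
struct ival {
	unsigned long start, end;
	struct ival *next;	/* list is sorted by start, no overlaps */
};

/* First interval overlapping [start, end), or NULL if there is none. */
static struct ival *ival_find(struct ival *head, unsigned long start,
			      unsigned long end)
{
	for (; head; head = head->next)
		if (head->start < end && head->end > start)
			return head;
	return NULL;
}

/* NULL-tolerant successor, so the macro can call it unconditionally. */
static struct ival *ival_next(struct ival *it)
{
	return it ? it->next : NULL;
}

/* Same shape as the _safe macros: next__ is cached before the body runs. */
#define ival_for_each_safe(it__, next__, head__, start__, end__)	\
	for ((it__) = ival_find((head__), (start__), (end__)),		\
	     (next__) = ival_next(it__);				\
	     (it__) && ((it__)->start < (end__));			\
	     (it__) = (next__), (next__) = ival_next(it__))

/* Unlink and free every interval overlapping [start, end); return the count. */
static int ival_remove_overlapping(struct ival **head, unsigned long start,
				   unsigned long end)
{
	struct ival *it, *next, **link;
	int removed = 0;

	ival_for_each_safe(it, next, *head, start, end) {
		/* Find the pointer that references 'it' and bypass it. */
		for (link = head; *link != it; link = &(*link)->next)
			;
		*link = it->next;
		free(it);	/* safe: the loop advances through 'next' */
		removed++;
	}
	return removed;
}
```

Without the cached next pointer, the loop increment would dereference the
entry that the body has just freed.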

v2 (Matthew Brost)
- drop inline __drm_gpusvm_range_find
- s/notifier_iter_first/drm_gpusvm_notifier_find/

Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/drm_gpusvm.c | 122 +++++++----------------------------
 include/drm/drm_gpusvm.h     |  70 ++++++++++++++++++++
 2 files changed, 95 insertions(+), 97 deletions(-)

diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c
index 5bb4c77db2c3..647b49ff2da5 100644
--- a/drivers/gpu/drm/drm_gpusvm.c
+++ b/drivers/gpu/drm/drm_gpusvm.c
@@ -271,107 +271,50 @@ npages_in_range(unsigned long start, unsigned long end)
 }
 
 /**
- * drm_gpusvm_range_find() - Find GPU SVM range from GPU SVM notifier
- * @notifier: Pointer to the GPU SVM notifier structure.
- * @start: Start address of the range
- * @end: End address of the range
+ * drm_gpusvm_notifier_find() - Find GPU SVM notifier from GPU SVM
+ * @gpusvm: Pointer to the GPU SVM structure.
+ * @start: Start address of the notifier
+ * @end: End address of the notifier
  *
- * Return: A pointer to the drm_gpusvm_range if found or NULL
+ * Return: A pointer to the drm_gpusvm_notifier if found or NULL
  */
-struct drm_gpusvm_range *
-drm_gpusvm_range_find(struct drm_gpusvm_notifier *notifier, unsigned long start,
-		      unsigned long end)
+struct drm_gpusvm_notifier *
+drm_gpusvm_notifier_find(struct drm_gpusvm *gpusvm, unsigned long start,
+			 unsigned long end)
 {
 	struct interval_tree_node *itree;
 
-	itree = interval_tree_iter_first(&notifier->root, start, end - 1);
+	itree = interval_tree_iter_first(&gpusvm->root, start, end - 1);
 
 	if (itree)
-		return container_of(itree, struct drm_gpusvm_range, itree);
+		return container_of(itree, struct drm_gpusvm_notifier, itree);
 	else
 		return NULL;
 }
-EXPORT_SYMBOL_GPL(drm_gpusvm_range_find);
+EXPORT_SYMBOL_GPL(drm_gpusvm_notifier_find);
 
 /**
- * drm_gpusvm_for_each_range_safe() - Safely iterate over GPU SVM ranges in a notifier
- * @range__: Iterator variable for the ranges
- * @next__: Iterator variable for the ranges temporay storage
- * @notifier__: Pointer to the GPU SVM notifier
- * @start__: Start address of the range
- * @end__: End address of the range
- *
- * This macro is used to iterate over GPU SVM ranges in a notifier while
- * removing ranges from it.
- */
-#define drm_gpusvm_for_each_range_safe(range__, next__, notifier__, start__, end__)	\
-	for ((range__) = drm_gpusvm_range_find((notifier__), (start__), (end__)),	\
-	     (next__) = __drm_gpusvm_range_next(range__);				\
-	     (range__) && (drm_gpusvm_range_start(range__) < (end__));			\
-	     (range__) = (next__), (next__) = __drm_gpusvm_range_next(range__))
-
-/**
- * __drm_gpusvm_notifier_next() - get the next drm_gpusvm_notifier in the list
- * @notifier: a pointer to the current drm_gpusvm_notifier
+ * drm_gpusvm_range_find() - Find GPU SVM range from GPU SVM notifier
+ * @notifier: Pointer to the GPU SVM notifier structure.
+ * @start: Start address of the range
+ * @end: End address of the range
  *
- * Return: A pointer to the next drm_gpusvm_notifier if available, or NULL if
- *         the current notifier is the last one or if the input notifier is
- *         NULL.
+ * Return: A pointer to the drm_gpusvm_range if found or NULL
  */
-static struct drm_gpusvm_notifier *
-__drm_gpusvm_notifier_next(struct drm_gpusvm_notifier *notifier)
-{
-	if (notifier && !list_is_last(&notifier->entry,
-				      &notifier->gpusvm->notifier_list))
-		return list_next_entry(notifier, entry);
-
-	return NULL;
-}
-
-static struct drm_gpusvm_notifier *
-notifier_iter_first(struct rb_root_cached *root, unsigned long start,
-		    unsigned long last)
+struct drm_gpusvm_range *
+drm_gpusvm_range_find(struct drm_gpusvm_notifier *notifier, unsigned long start,
+		      unsigned long end)
 {
 	struct interval_tree_node *itree;
 
-	itree = interval_tree_iter_first(root, start, last);
+	itree = interval_tree_iter_first(&notifier->root, start, end - 1);
 
 	if (itree)
-		return container_of(itree, struct drm_gpusvm_notifier, itree);
+		return container_of(itree, struct drm_gpusvm_range, itree);
 	else
 		return NULL;
 }
-
-/**
- * drm_gpusvm_for_each_notifier() - Iterate over GPU SVM notifiers in a gpusvm
- * @notifier__: Iterator variable for the notifiers
- * @notifier__: Pointer to the GPU SVM notifier
- * @start__: Start address of the notifier
- * @end__: End address of the notifier
- *
- * This macro is used to iterate over GPU SVM notifiers in a gpusvm.
- */
-#define drm_gpusvm_for_each_notifier(notifier__, gpusvm__, start__, end__)		\
-	for ((notifier__) = notifier_iter_first(&(gpusvm__)->root, (start__), (end__) - 1);	\
-	     (notifier__) && (drm_gpusvm_notifier_start(notifier__) < (end__));		\
-	     (notifier__) = __drm_gpusvm_notifier_next(notifier__))
-
-/**
- * drm_gpusvm_for_each_notifier_safe() - Safely iterate over GPU SVM notifiers in a gpusvm
- * @notifier__: Iterator variable for the notifiers
- * @next__: Iterator variable for the notifiers temporay storage
- * @notifier__: Pointer to the GPU SVM notifier
- * @start__: Start address of the notifier
- * @end__: End address of the notifier
- *
- * This macro is used to iterate over GPU SVM notifiers in a gpusvm while
- * removing notifiers from it.
- */
-#define drm_gpusvm_for_each_notifier_safe(notifier__, next__, gpusvm__, start__, end__)	\
-	for ((notifier__) = notifier_iter_first(&(gpusvm__)->root, (start__), (end__) - 1),	\
-	     (next__) = __drm_gpusvm_notifier_next(notifier__);				\
-	     (notifier__) && (drm_gpusvm_notifier_start(notifier__) < (end__));		\
-	     (notifier__) = (next__), (next__) = __drm_gpusvm_notifier_next(notifier__))
+EXPORT_SYMBOL_GPL(drm_gpusvm_range_find);
 
 /**
  * drm_gpusvm_notifier_invalidate() - Invalidate a GPU SVM notifier.
@@ -472,22 +415,6 @@ int drm_gpusvm_init(struct drm_gpusvm *gpusvm,
 }
 EXPORT_SYMBOL_GPL(drm_gpusvm_init);
 
-/**
- * drm_gpusvm_notifier_find() - Find GPU SVM notifier
- * @gpusvm: Pointer to the GPU SVM structure
- * @fault_addr: Fault address
- *
- * This function finds the GPU SVM notifier associated with the fault address.
- *
- * Return: Pointer to the GPU SVM notifier on success, NULL otherwise.
- */
-static struct drm_gpusvm_notifier *
-drm_gpusvm_notifier_find(struct drm_gpusvm *gpusvm,
-			 unsigned long fault_addr)
-{
-	return notifier_iter_first(&gpusvm->root, fault_addr, fault_addr + 1);
-}
-
 /**
  * to_drm_gpusvm_notifier() - retrieve the container struct for a given rbtree node
  * @node: a pointer to the rbtree node embedded within a drm_gpusvm_notifier struct
@@ -943,7 +870,7 @@ drm_gpusvm_range_find_or_insert(struct drm_gpusvm *gpusvm,
 	if (!mmget_not_zero(mm))
 		return ERR_PTR(-EFAULT);
 
-	notifier = drm_gpusvm_notifier_find(gpusvm, fault_addr);
+	notifier = drm_gpusvm_notifier_find(gpusvm, fault_addr, fault_addr + 1);
 	if (!notifier) {
 		notifier = drm_gpusvm_notifier_alloc(gpusvm, fault_addr);
 		if (IS_ERR(notifier)) {
@@ -1107,7 +1034,8 @@ void drm_gpusvm_range_remove(struct drm_gpusvm *gpusvm,
 	drm_gpusvm_driver_lock_held(gpusvm);
 
 	notifier = drm_gpusvm_notifier_find(gpusvm,
-					    drm_gpusvm_range_start(range));
+					    drm_gpusvm_range_start(range),
+					    drm_gpusvm_range_start(range) + 1);
 	if (WARN_ON_ONCE(!notifier))
 		return;
 
diff --git a/include/drm/drm_gpusvm.h b/include/drm/drm_gpusvm.h
index 4aedc5423aff..142fc2af1716 100644
--- a/include/drm/drm_gpusvm.h
+++ b/include/drm/drm_gpusvm.h
@@ -282,6 +282,10 @@ void drm_gpusvm_range_unmap_pages(struct drm_gpusvm *gpusvm,
 bool drm_gpusvm_has_mapping(struct drm_gpusvm *gpusvm, unsigned long start,
 			    unsigned long end);
 
+struct drm_gpusvm_notifier *
+drm_gpusvm_notifier_find(struct drm_gpusvm *gpusvm, unsigned long start,
+			 unsigned long end);
+
 struct drm_gpusvm_range *
 drm_gpusvm_range_find(struct drm_gpusvm_notifier *notifier, unsigned long start,
 		      unsigned long end);
@@ -434,4 +438,70 @@ __drm_gpusvm_range_next(struct drm_gpusvm_range *range)
 	     (range__) && (drm_gpusvm_range_start(range__) < (end__));	\
 	     (range__) = __drm_gpusvm_range_next(range__))
 
+/**
+ * drm_gpusvm_for_each_range_safe() - Safely iterate over GPU SVM ranges in a notifier
+ * @range__: Iterator variable for the ranges
+ * @next__: Iterator variable for the ranges temporary storage
+ * @notifier__: Pointer to the GPU SVM notifier
+ * @start__: Start address of the range
+ * @end__: End address of the range
+ *
+ * This macro is used to iterate over GPU SVM ranges in a notifier while
+ * removing ranges from it.
+ */
+#define drm_gpusvm_for_each_range_safe(range__, next__, notifier__, start__, end__)	\
+	for ((range__) = drm_gpusvm_range_find((notifier__), (start__), (end__)),	\
+	     (next__) = __drm_gpusvm_range_next(range__);				\
+	     (range__) && (drm_gpusvm_range_start(range__) < (end__));			\
+	     (range__) = (next__), (next__) = __drm_gpusvm_range_next(range__))
+
+/**
+ * __drm_gpusvm_notifier_next() - get the next drm_gpusvm_notifier in the list
+ * @notifier: a pointer to the current drm_gpusvm_notifier
+ *
+ * Return: A pointer to the next drm_gpusvm_notifier if available, or NULL if
+ *         the current notifier is the last one or if the input notifier is
+ *         NULL.
+ */
+static inline struct drm_gpusvm_notifier *
+__drm_gpusvm_notifier_next(struct drm_gpusvm_notifier *notifier)
+{
+	if (notifier && !list_is_last(&notifier->entry,
+				      &notifier->gpusvm->notifier_list))
+		return list_next_entry(notifier, entry);
+
+	return NULL;
+}
+
+/**
+ * drm_gpusvm_for_each_notifier() - Iterate over GPU SVM notifiers in a gpusvm
+ * @notifier__: Iterator variable for the notifiers
+ * @gpusvm__: Pointer to the GPU SVM structure
+ * @start__: Start address of the notifier
+ * @end__: End address of the notifier
+ *
+ * This macro is used to iterate over GPU SVM notifiers in a gpusvm.
+ */
+#define drm_gpusvm_for_each_notifier(notifier__, gpusvm__, start__, end__)		\
+	for ((notifier__) = drm_gpusvm_notifier_find((gpusvm__), (start__), (end__));	\
+	     (notifier__) && (drm_gpusvm_notifier_start(notifier__) < (end__));		\
+	     (notifier__) = __drm_gpusvm_notifier_next(notifier__))
+
+/**
+ * drm_gpusvm_for_each_notifier_safe() - Safely iterate over GPU SVM notifiers in a gpusvm
+ * @notifier__: Iterator variable for the notifiers
+ * @next__: Iterator variable for the notifiers temporary storage
+ * @gpusvm__: Pointer to the GPU SVM structure
+ * @start__: Start address of the notifier
+ * @end__: End address of the notifier
+ *
+ * This macro is used to iterate over GPU SVM notifiers in a gpusvm while
+ * removing notifiers from it.
+ */
+#define drm_gpusvm_for_each_notifier_safe(notifier__, next__, gpusvm__, start__, end__)	\
+	for ((notifier__) = drm_gpusvm_notifier_find((gpusvm__), (start__), (end__)),	\
+	     (next__) = __drm_gpusvm_notifier_next(notifier__);				\
+	     (notifier__) && (drm_gpusvm_notifier_start(notifier__) < (end__));		\
+	     (notifier__) = (next__), (next__) = __drm_gpusvm_notifier_next(notifier__))
+
 #endif /* __DRM_GPUSVM_H__ */
-- 
2.34.1



* [PATCH v5 07/23] drm/xe/svm: Split system allocator vma incase of madvise call
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (5 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 06/23] drm/gpusvm: Make drm_gpusvm_for_each_* macros public Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-22 13:35 ` [PATCH v5 08/23] drm/xe: Allow CPU address mirror VMA unbind with gpu bindings for madvise Himal Prasad Ghimiray
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

If the start or end of the input address range lies within a system
allocator VMA, split that VMA to create new VMAs matching the input range.
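
The boundary arithmetic behind such a split can be sketched in plain
userspace C. This is an illustration under simplifying assumptions, not the
drm_gpuvm implementation: struct mem_attr, struct piece and split_vma() are
hypothetical names, and only a single VMA is considered. Each resulting
piece inherits the attributes of the VMA it was cut from.

```c
#include <stdint.h>

/* Hypothetical, simplified attribute set; field names are illustrative. */
struct mem_attr {
	int preferred_loc;
	int atomic_access;
	uint16_t pat_index;
};

/* Half-open address interval [start, end) together with its attributes. */
struct piece {
	uint64_t start, end;
	struct mem_attr attr;
};

/*
 * Split the VMA [vma_start, vma_end) at the boundaries of the request
 * [req_start, req_end). Writes up to three pieces to out[] in address
 * order, each inheriting the old attributes, and returns the number of
 * pieces. Returns 0 if the request does not overlap the VMA.
 */
static int split_vma(uint64_t vma_start, uint64_t vma_end,
		     const struct mem_attr *old_attr,
		     uint64_t req_start, uint64_t req_end,
		     struct piece out[3])
{
	uint64_t lo = req_start > vma_start ? req_start : vma_start;
	uint64_t hi = req_end < vma_end ? req_end : vma_end;
	int n = 0;

	if (lo >= hi)
		return 0;

	if (vma_start < lo)	/* part of the VMA before the request */
		out[n++] = (struct piece){ .start = vma_start, .end = lo,
					   .attr = *old_attr };

	/* part of the VMA covered by the request */
	out[n++] = (struct piece){ .start = lo, .end = hi, .attr = *old_attr };

	if (hi < vma_end)	/* part of the VMA after the request */
		out[n++] = (struct piece){ .start = hi, .end = vma_end,
					   .attr = *old_attr };

	return n;
}
```

A request that lands strictly inside a VMA yields three pieces, a request
that shares one boundary yields two, and a request that covers the whole
VMA leaves it as a single piece.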

v2 (Matthew Brost)
- Add lockdep_assert_write for vm->lock
- Remove unnecessary page aligned checks
- Add kernel-doc and comments
- Remove unnecessary unwind_ops and return

v3
- Fix copying of attributes

v4
- Nit fixes

v5
- Squash identifier for madvise in xe_vma_ops to this patch

Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c       | 107 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h       |   2 +
 drivers/gpu/drm/xe/xe_vm_types.h |   1 +
 3 files changed, 110 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 480cf75340ce..a56384325f4d 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -4172,3 +4172,110 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
 	}
 	kvfree(snap);
 }
+
+/**
+ * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
+ * @vm: Pointer to the xe_vm structure
+ * @start: Starting input address
+ * @range: Size of the input range
+ *
+ * This function splits existing vma to create new vma for user provided input range
+ *
+ *  Return: 0 if success
+ */
+int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
+{
+	struct xe_vma_ops vops;
+	struct drm_gpuva_ops *ops = NULL;
+	struct drm_gpuva_op *__op;
+	bool is_cpu_addr_mirror = false;
+	bool remap_op = false;
+	struct xe_vma_mem_attr tmp_attr;
+	int err;
+
+	vm_dbg(&vm->xe->drm, "MADVISE IN: addr=0x%016llx, size=0x%016llx", start, range);
+
+	lockdep_assert_held_write(&vm->lock);
+
+	vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range);
+	ops = drm_gpuvm_sm_map_ops_create(&vm->gpuvm, start, range,
+					  DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE,
+					  NULL, start);
+	if (IS_ERR(ops))
+		return PTR_ERR(ops);
+
+	if (list_empty(&ops->list)) {
+		err = 0;
+		goto free_ops;
+	}
+
+	drm_gpuva_for_each_op(__op, ops) {
+		struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
+
+		if (__op->op == DRM_GPUVA_OP_REMAP) {
+			xe_assert(vm->xe, !remap_op);
+			remap_op = true;
+
+			if (xe_vma_is_cpu_addr_mirror(gpuva_to_vma(op->base.remap.unmap->va)))
+				is_cpu_addr_mirror = true;
+			else
+				is_cpu_addr_mirror = false;
+		}
+
+		if (__op->op == DRM_GPUVA_OP_MAP) {
+			xe_assert(vm->xe, remap_op);
+			remap_op = false;
+
+			/* In case of madvise ops DRM_GPUVA_OP_MAP is always after
+			 * DRM_GPUVA_OP_REMAP, so ensure we assign op->map.is_cpu_addr_mirror true
+			 * if REMAP is for xe_vma_is_cpu_addr_mirror vma
+			 */
+			op->map.is_cpu_addr_mirror = is_cpu_addr_mirror;
+		}
+
+		print_op(vm->xe, __op);
+	}
+
+	xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
+	vops.flags |= XE_VMA_OPS_FLAG_MADVISE;
+	err = vm_bind_ioctl_ops_parse(vm, ops, &vops);
+	if (err)
+		goto unwind_ops;
+
+	xe_vm_lock(vm, false);
+
+	drm_gpuva_for_each_op(__op, ops) {
+		struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
+		struct xe_vma *vma;
+
+		if (__op->op == DRM_GPUVA_OP_UNMAP) {
+			/* There should be no unmap */
+			XE_WARN_ON("UNEXPECTED UNMAP");
+			xe_vma_destroy(gpuva_to_vma(op->base.unmap.va), NULL);
+		} else if (__op->op == DRM_GPUVA_OP_REMAP) {
+			vma = gpuva_to_vma(op->base.remap.unmap->va);
+			/* Store attributes for REMAP UNMAPPED VMA, so they can be assigned
+			 * to newly MAP created vma.
+			 */
+			tmp_attr = vma->attr;
+			xe_vma_destroy(gpuva_to_vma(op->base.remap.unmap->va), NULL);
+		} else if (__op->op == DRM_GPUVA_OP_MAP) {
+			vma = op->map.vma;
+			/* In case of madvise call, MAP will always follow REMAP.
+			 * Therefore tmp_attr will always have sane values, making it safe to
+			 * copy them to new vma.
+			 */
+			vma->attr = tmp_attr;
+		}
+	}
+
+	xe_vm_unlock(vm);
+	drm_gpuva_ops_free(&vm->gpuvm, ops);
+	return 0;
+
+unwind_ops:
+	vm_bind_ioctl_ops_unwind(vm, &ops, 1);
+free_ops:
+	drm_gpuva_ops_free(&vm->gpuvm, ops);
+	return err;
+}
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 3475a118f666..0d6b08cc4163 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -171,6 +171,8 @@ static inline bool xe_vma_is_userptr(struct xe_vma *vma)
 
 struct xe_vma *xe_vm_find_vma_by_addr(struct xe_vm *vm, u64 page_addr);
 
+int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
+
 /**
  * to_userptr_vma() - Return a pointer to an embedding userptr vma
  * @vma: Pointer to the embedded struct xe_vma
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index c30f404a00e3..cd94d8b5819d 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -495,6 +495,7 @@ struct xe_vma_ops {
 	struct xe_vm_pgtable_update_ops pt_update_ops[XE_MAX_TILES_PER_DEVICE];
 	/** @flag: signify the properties within xe_vma_ops*/
 #define XE_VMA_OPS_FLAG_HAS_SVM_PREFETCH BIT(0)
+#define XE_VMA_OPS_FLAG_MADVISE          BIT(1)
 	u32 flags;
 #ifdef TEST_VM_OPS_ERROR
 	/** @inject_error: inject error to test error handling */
-- 
2.34.1



* [PATCH v5 08/23] drm/xe: Allow CPU address mirror VMA unbind with gpu bindings for madvise
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (6 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 07/23] drm/xe/svm: Split system allocator vma incase of madvise call Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-29  3:40   ` Matthew Brost
  2025-07-22 13:35 ` [PATCH v5 09/23] drm/xe/svm: Add xe_svm_ranges_zap_ptes_in_range() for PTE zapping Himal Prasad Ghimiray
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

In the case of the MADVISE ioctl, if the start or end address falls
within a VMA and existing SVM ranges are present, remove the existing
SVM mappings. Then continue with ops_parse, which creates the new VMAs
through a REMAP that unmaps the old one.
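
The test that decides which SVM ranges get removed can be modelled in
isolation. This is a sketch of the comparison only, assuming the caller
passes ranges that already overlap the interval;
range_is_partially_covered() is a hypothetical name that does not appear
in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * An SVM range [range_start, range_end) that overlaps the interval
 * [start, end) is torn down when the interval does not cover it
 * completely, that is, when either end of the range sticks out.
 * The caller is assumed to pass only ranges that overlap the interval.
 */
static bool range_is_partially_covered(uint64_t start, uint64_t end,
				       uint64_t range_start, uint64_t range_end)
{
	return start > range_start || end < range_end;
}
```

A range that lies entirely inside the interval is left alone, since it
does not straddle either boundary.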

v2 (Matthew Brost)
- Use vops flag to call unmapping of ranges in vm_bind_ioctl_ops_parse
- Rename the function

v3
- Fix doc

Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c | 28 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_svm.h |  7 +++++++
 drivers/gpu/drm/xe/xe_vm.c  |  8 ++++++--
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index a7ff5975873f..ce8a71b80811 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -933,6 +933,34 @@ bool xe_svm_has_mapping(struct xe_vm *vm, u64 start, u64 end)
 	return drm_gpusvm_has_mapping(&vm->svm.gpusvm, start, end);
 }
 
+/**
+ * xe_svm_unmap_address_range - UNMAP SVM mappings and ranges
+ * @vm: The VM
+ * @start: start addr
+ * @end: end addr
+ *
+ * This function UNMAPS svm ranges if start or end address are inside them.
+ */
+void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end)
+{
+	struct drm_gpusvm_notifier *notifier, *next;
+
+	lockdep_assert_held_write(&vm->lock);
+
+	drm_gpusvm_for_each_notifier_safe(notifier, next, &vm->svm.gpusvm, start, end) {
+		struct drm_gpusvm_range *range, *__next;
+
+		drm_gpusvm_for_each_range_safe(range, __next, notifier, start, end) {
+			if (start > drm_gpusvm_range_start(range) ||
+			    end < drm_gpusvm_range_end(range)) {
+				if (IS_DGFX(vm->xe) && xe_svm_range_in_vram(to_xe_range(range)))
+					drm_gpusvm_range_evict(&vm->svm.gpusvm, range);
+				__xe_svm_garbage_collector(vm, to_xe_range(range));
+			}
+		}
+	}
+}
+
 /**
  * xe_svm_bo_evict() - SVM evict BO to system memory
  * @bo: BO to evict
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index da9a69ea0bb1..754d56b4d255 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -90,6 +90,8 @@ bool xe_svm_range_validate(struct xe_vm *vm,
 
 u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 addr, u64 end,  struct xe_vma *vma);
 
+void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end);
+
 /**
  * xe_svm_range_has_dma_mapping() - SVM range has DMA mapping
  * @range: SVM range
@@ -303,6 +305,11 @@ u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 addr, u64 end, struct xe_vma *vm
 	return ULONG_MAX;
 }
 
+static inline
+void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end)
+{
+}
+
 #define xe_svm_assert_in_notifier(...) do {} while (0)
 #define xe_svm_range_has_dma_mapping(...) false
 
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index a56384325f4d..7f3d0ad04b3f 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2663,8 +2663,12 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 				end = op->base.remap.next->va.addr;
 
 			if (xe_vma_is_cpu_addr_mirror(old) &&
-			    xe_svm_has_mapping(vm, start, end))
-				return -EBUSY;
+			    xe_svm_has_mapping(vm, start, end)) {
+				if (vops->flags & XE_VMA_OPS_FLAG_MADVISE)
+					xe_svm_unmap_address_range(vm, start, end);
+				else
+					return -EBUSY;
+			}
 
 			op->remap.start = xe_vma_start(old);
 			op->remap.range = xe_vma_size(old);
-- 
2.34.1



* [PATCH v5 09/23] drm/xe/svm: Add xe_svm_ranges_zap_ptes_in_range() for PTE zapping
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (7 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 08/23] drm/xe: Allow CPU address mirror VMA unbind with gpu bindings for madvise Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-29  3:42   ` Matthew Brost
  2025-07-22 13:35 ` [PATCH v5 10/23] drm/xe: Implement madvise ioctl for xe Himal Prasad Ghimiray
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

Introduce xe_svm_ranges_zap_ptes_in_range(), a function to zap page table
entries (PTEs) for all SVM ranges within a user-specified address range.
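
The bookkeeping involved can be sketched in userspace C. This is a model
under stated assumptions, not driver code: struct svm_range_state,
zap_ptes(), zap_ranges() and MAX_TILES are hypothetical, and the stand-in
zap only reproduces the tile_present & ~tile_invalidated check. It shows
how the returned tile mask is accumulated while each range records which
tiles were invalidated.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TILES 2	/* illustrative value, not the driver's constant */

/* Hypothetical per-range state with the two bitmasks used by the check. */
struct svm_range_state {
	uint8_t tile_present;		/* tiles that have PTEs for this range */
	uint8_t tile_invalidated;	/* tiles whose PTEs were already zapped */
};

/* Stand-in for the zap: reports success only if the tile has live PTEs. */
static int zap_ptes(const struct svm_range_state *r, unsigned int tile)
{
	uint8_t live = r->tile_present & (uint8_t)~r->tile_invalidated;

	return (live >> tile) & 1;
}

/* Zap every range on every tile; return the tiles that need a TLB flush. */
static uint8_t zap_ranges(struct svm_range_state *ranges, size_t n)
{
	uint8_t tile_mask = 0;

	for (size_t i = 0; i < n; i++) {
		for (unsigned int id = 0; id < MAX_TILES; id++) {
			if (zap_ptes(&ranges[i], id)) {
				tile_mask |= (uint8_t)(1u << id);
				/* Remember that this tile no longer maps the range. */
				ranges[i].tile_invalidated |= (uint8_t)(1u << id);
			}
		}
	}
	return tile_mask;
}
```

A second pass over the same ranges returns an empty mask, because every
tile that was zapped is already marked as invalidated.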

v2 (Matthew Brost)
- Lock should be called even for tlb_invalidation

v3 (Matthew Brost)
- Update comment
- s/notifier->itree.start/drm_gpusvm_notifier_start
- s/notifier->itree.last + 1/drm_gpusvm_notifier_end
- use WRITE_ONCE

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c  | 14 ++++++++++-
 drivers/gpu/drm/xe/xe_svm.c | 50 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_svm.h |  8 ++++++
 3 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 1bf0cf81513c..b499006df2cf 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -950,7 +950,19 @@ bool xe_pt_zap_ptes_range(struct xe_tile *tile, struct xe_vm *vm,
 	struct xe_pt *pt = vm->pt_root[tile->id];
 	u8 pt_mask = (range->tile_present & ~range->tile_invalidated);
 
-	xe_svm_assert_in_notifier(vm);
+	/*
+	 * Locking rules:
+	 *
+	 * - notifier_lock (write): full protection against page table changes
+	 *   and MMU notifier invalidations.
+	 *
+	 * - notifier_lock (read) + vm_lock (write): combined protection against
+	 *   invalidations and concurrent page table modifications. (e.g., madvise)
+	 *
+	 */
+	lockdep_assert(lockdep_is_held_type(&vm->svm.gpusvm.notifier_lock, 0) ||
+		       (lockdep_is_held_type(&vm->svm.gpusvm.notifier_lock, 1) &&
+		       lockdep_is_held_type(&vm->lock, 0)));
 
 	if (!(pt_mask & BIT(tile->id)))
 		return false;
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index ce8a71b80811..c093dc453e32 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -1025,6 +1025,56 @@ int xe_svm_range_get_pages(struct xe_vm *vm, struct xe_svm_range *range,
 	return err;
 }
 
+/**
+ * xe_svm_ranges_zap_ptes_in_range - clear ptes of svm ranges in input range
+ * @vm: Pointer to the xe_vm structure
+ * @start: Start of the input range
+ * @end: End of the input range
+ *
+ * This function removes the page table entries (PTEs) associated
+ * with the svm ranges within the given input start and end
+ *
+ * Return: tile_mask for which gt's need to be tlb invalidated.
+ */
+u8 xe_svm_ranges_zap_ptes_in_range(struct xe_vm *vm, u64 start, u64 end)
+{
+	struct drm_gpusvm_notifier *notifier;
+	struct xe_svm_range *range;
+	u64 adj_start, adj_end;
+	struct xe_tile *tile;
+	u8 tile_mask = 0;
+	u8 id;
+
+	lockdep_assert(lockdep_is_held_type(&vm->svm.gpusvm.notifier_lock, 1) &&
+		       lockdep_is_held_type(&vm->lock, 0));
+
+	drm_gpusvm_for_each_notifier(notifier, &vm->svm.gpusvm, start, end) {
+		struct drm_gpusvm_range *r = NULL;
+
+		adj_start = max(start, drm_gpusvm_notifier_start(notifier));
+		adj_end = min(end, drm_gpusvm_notifier_end(notifier));
+		drm_gpusvm_for_each_range(r, notifier, adj_start, adj_end) {
+			range = to_xe_range(r);
+			for_each_tile(tile, vm->xe, id) {
+				if (xe_pt_zap_ptes_range(tile, vm, range)) {
+					tile_mask |= BIT(id);
+					/*
+					 * WRITE_ONCE pairs with READ_ONCE in
+					 * xe_vm_has_valid_gpu_mapping().
+					 * Must not fail after setting
+					 * tile_invalidated and before
+					 * TLB invalidation.
+					 */
+					WRITE_ONCE(range->tile_invalidated,
+						   range->tile_invalidated | BIT(id));
+				}
+			}
+		}
+	}
+
+	return tile_mask;
+}
+
 #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 754d56b4d255..b0da0e85f0b8 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -92,6 +92,8 @@ u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 addr, u64 end,  struct xe_vma *v
 
 void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end);
 
+u8 xe_svm_ranges_zap_ptes_in_range(struct xe_vm *vm, u64 start, u64 end);
+
 /**
  * xe_svm_range_has_dma_mapping() - SVM range has DMA mapping
  * @range: SVM range
@@ -310,6 +312,12 @@ void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end)
 {
 }
 
+static inline
+u8 xe_svm_ranges_zap_ptes_in_range(struct xe_vm *vm, u64 start, u64 end)
+{
+	return 0;
+}
+
 #define xe_svm_assert_in_notifier(...) do {} while (0)
 #define xe_svm_range_has_dma_mapping(...) false
 
-- 
2.34.1



* [PATCH v5 10/23] drm/xe: Implement madvise ioctl for xe
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (8 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 09/23] drm/xe/svm: Add xe_svm_ranges_zap_ptes_in_range() for PTE zapping Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-29  3:52   ` Matthew Brost
  2025-07-22 13:35 ` [PATCH v5 11/23] drm/xe/svm : Add svm ranges migration policy on atomic access Himal Prasad Ghimiray
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe
  Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray,
	Shuicheng Lin

This driver-specific ioctl enables UMDs to control the memory attributes
for GPU VMAs within a specified input range. If the start or end
addresses fall within an existing VMA, the VMA is split accordingly. The
attributes of the VMA are modified as provided by the users. The old
mappings of the VMAs are invalidated, and TLB invalidation is performed
if necessary.
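
A table of per-attribute handlers, indexed by the attribute type from the
request, is one way to structure such an ioctl. The sketch below is a
userspace model with hypothetical names (enum attr_type, struct
madvise_req, struct vma_attr, apply_madvise()); none of it is the xe uapi.
It shows a dispatch that rejects unknown types before indexing the table.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute types and request layout; not the xe uapi. */
enum attr_type { ATTR_PREFERRED_LOC, ATTR_ATOMIC, ATTR_PAT };

struct madvise_req {
	unsigned int type;	/* one of enum attr_type */
	int value;		/* new value for the selected attribute */
};

struct vma_attr {
	int preferred_loc;
	int atomic_access;
	int pat_index;
};

typedef void (*attr_func)(struct vma_attr *vmas, size_t n,
			  const struct madvise_req *req);

static void set_preferred_loc(struct vma_attr *vmas, size_t n,
			      const struct madvise_req *req)
{
	for (size_t i = 0; i < n; i++)
		vmas[i].preferred_loc = req->value;
}

static void set_atomic(struct vma_attr *vmas, size_t n,
		       const struct madvise_req *req)
{
	for (size_t i = 0; i < n; i++)
		vmas[i].atomic_access = req->value;
}

static void set_pat(struct vma_attr *vmas, size_t n,
		    const struct madvise_req *req)
{
	for (size_t i = 0; i < n; i++)
		vmas[i].pat_index = req->value;
}

/* Designated initializers keep each handler next to its attribute type. */
static const attr_func attr_funcs[] = {
	[ATTR_PREFERRED_LOC] = set_preferred_loc,
	[ATTR_ATOMIC] = set_atomic,
	[ATTR_PAT] = set_pat,
};

/* Validate the type, then apply the matching handler to all VMAs. */
static int apply_madvise(struct vma_attr *vmas, size_t n,
			 const struct madvise_req *req)
{
	size_t count = sizeof(attr_funcs) / sizeof(attr_funcs[0]);

	if (req->type >= count || !attr_funcs[req->type])
		return -EINVAL;

	attr_funcs[req->type](vmas, n, req);
	return 0;
}
```

Checking the index and the table slot before the call keeps an
out-of-range or unpopulated type from reaching an invalid function
pointer.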

v2 (Matthew Brost)
- xe_vm_in_fault_mode can't be enabled by Mesa, hence allow ioctl in non
fault mode too
- fix tlb invalidation skip for same ranges in multiple op
- use helper for tlb invalidation
- use xe_svm_notifier_lock/unlock helper
- s/lockdep_assert_held/lockdep_assert_held_write
- Add kernel-doc

v3 (Matthew Brost)
- make vfunc fail safe
- Add sanitizing input args before vfunc

v4 (Matthew Brost/Shuicheng)
- Make locks interruptible
- Error handling fixes
- vm_put fixes

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 drivers/gpu/drm/xe/Makefile        |   1 +
 drivers/gpu/drm/xe/xe_vm_madvise.c | 306 +++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm_madvise.h |  15 ++
 3 files changed, 322 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_vm_madvise.c
 create mode 100644 drivers/gpu/drm/xe/xe_vm_madvise.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 83a36c47a2f9..fa52866bb72c 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -125,6 +125,7 @@ xe-y += xe_bb.o \
 	xe_uc.o \
 	xe_uc_fw.o \
 	xe_vm.o \
+	xe_vm_madvise.o \
 	xe_vram.o \
 	xe_vram_freq.o \
 	xe_vsec.o \
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
new file mode 100644
index 000000000000..f64728120d7c
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -0,0 +1,306 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#include "xe_vm_madvise.h"
+
+#include <linux/nospec.h>
+#include <drm/xe_drm.h>
+
+#include "xe_bo.h"
+#include "xe_pt.h"
+#include "xe_svm.h"
+
+struct xe_vmas_in_madvise_range {
+	u64 addr;
+	u64 range;
+	struct xe_vma **vmas;
+	int num_vmas;
+	bool has_svm_vmas;
+	bool has_bo_vmas;
+	bool has_userptr_vmas;
+};
+
+static int get_vmas(struct xe_vm *vm, struct xe_vmas_in_madvise_range *madvise_range)
+{
+	u64 addr = madvise_range->addr;
+	u64 range = madvise_range->range;
+
+	struct xe_vma  **__vmas;
+	struct drm_gpuva *gpuva;
+	int max_vmas = 8;
+
+	lockdep_assert_held(&vm->lock);
+
+	madvise_range->num_vmas = 0;
+	madvise_range->vmas = kmalloc_array(max_vmas, sizeof(*madvise_range->vmas), GFP_KERNEL);
+	if (!madvise_range->vmas)
+		return -ENOMEM;
+
+	vm_dbg(&vm->xe->drm, "VMA's in range: start=0x%016llx, end=0x%016llx", addr, addr + range);
+
+	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, addr, addr + range) {
+		struct xe_vma *vma = gpuva_to_vma(gpuva);
+
+		if (xe_vma_bo(vma))
+			madvise_range->has_bo_vmas = true;
+		else if (xe_vma_is_cpu_addr_mirror(vma))
+			madvise_range->has_svm_vmas = true;
+		else if (xe_vma_is_userptr(vma))
+			madvise_range->has_userptr_vmas = true;
+
+		if (madvise_range->num_vmas == max_vmas) {
+			max_vmas <<= 1;
+			__vmas = krealloc(madvise_range->vmas,
+					  max_vmas * sizeof(*madvise_range->vmas),
+					  GFP_KERNEL);
+			if (!__vmas) {
+				kfree(madvise_range->vmas);
+				return -ENOMEM;
+			}
+			madvise_range->vmas = __vmas;
+		}
+
+		madvise_range->vmas[madvise_range->num_vmas] = vma;
+		(madvise_range->num_vmas)++;
+	}
+
+	if (!madvise_range->num_vmas)
+		kfree(madvise_range->vmas);
+
+	vm_dbg(&vm->xe->drm, "madvise_range-num_vmas = %d\n", madvise_range->num_vmas);
+
+	return 0;
+}
+
+static void madvise_preferred_mem_loc(struct xe_device *xe, struct xe_vm *vm,
+				      struct xe_vma **vmas, int num_vmas,
+				      struct drm_xe_madvise *op)
+{
+	/* Implementation pending */
+}
+
+static void madvise_atomic(struct xe_device *xe, struct xe_vm *vm,
+			   struct xe_vma **vmas, int num_vmas,
+			   struct drm_xe_madvise *op)
+{
+	/* Implementation pending */
+}
+
+static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
+			      struct xe_vma **vmas, int num_vmas,
+			      struct drm_xe_madvise *op)
+{
+	/* Implementation pending */
+}
+
+typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
+			     struct xe_vma **vmas, int num_vmas,
+			     struct drm_xe_madvise *op);
+
+static const madvise_func madvise_funcs[] = {
+	[DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC] = madvise_preferred_mem_loc,
+	[DRM_XE_MEM_RANGE_ATTR_ATOMIC] = madvise_atomic,
+	[DRM_XE_MEM_RANGE_ATTR_PAT] = madvise_pat_index,
+};
+
+static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
+{
+	struct drm_gpuva *gpuva;
+	struct xe_tile *tile;
+	u8 id, tile_mask;
+
+	lockdep_assert_held_write(&vm->lock);
+
+	/* Wait for pending binds */
+	if (dma_resv_wait_timeout(xe_vm_resv(vm), DMA_RESV_USAGE_BOOKKEEP,
+				  false, MAX_SCHEDULE_TIMEOUT) <= 0)
+		XE_WARN_ON(1);
+
+	tile_mask = xe_svm_ranges_zap_ptes_in_range(vm, start, end);
+
+	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, start, end) {
+		struct xe_vma *vma = gpuva_to_vma(gpuva);
+
+		if (xe_vma_is_cpu_addr_mirror(vma))
+			continue;
+
+		for_each_tile(tile, vm->xe, id) {
+			if (xe_pt_zap_ptes(tile, vma)) {
+				tile_mask |= BIT(id);
+
+				/*
+				 * WRITE_ONCE pairs with READ_ONCE
+				 * in xe_vm_has_valid_gpu_mapping()
+				 */
+				WRITE_ONCE(vma->tile_invalidated,
+					   vma->tile_invalidated | BIT(id));
+			}
+		}
+	}
+
+	return tile_mask;
+}
+
+static int xe_vm_invalidate_madvise_range(struct xe_vm *vm, u64 start, u64 end)
+{
+	u8 tile_mask = xe_zap_ptes_in_madvise_range(vm, start, end);
+
+	if (!tile_mask)
+		return 0;
+
+	xe_device_wmb(vm->xe);
+
+	return xe_vm_range_tilemask_tlb_invalidation(vm, start, end, tile_mask);
+}
+
+static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madvise *args)
+{
+	if (XE_IOCTL_DBG(xe, !args))
+		return false;
+
+	if (XE_IOCTL_DBG(xe, !IS_ALIGNED(args->start, SZ_4K)))
+		return false;
+
+	if (XE_IOCTL_DBG(xe, !IS_ALIGNED(args->range, SZ_4K)))
+		return false;
+
+	if (XE_IOCTL_DBG(xe, args->range < SZ_4K))
+		return false;
+
+	switch (args->type) {
+	case DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC:
+		if (XE_IOCTL_DBG(xe, args->preferred_mem_loc.migration_policy >
+				     DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES))
+			return false;
+
+		if (XE_IOCTL_DBG(xe, args->preferred_mem_loc.pad))
+			return false;
+
+		if (XE_IOCTL_DBG(xe, args->atomic.reserved))
+			return false;
+		break;
+	case DRM_XE_MEM_RANGE_ATTR_ATOMIC:
+		if (XE_IOCTL_DBG(xe, args->atomic.val > DRM_XE_ATOMIC_CPU))
+			return false;
+
+		if (XE_IOCTL_DBG(xe, args->atomic.pad))
+			return false;
+
+		if (XE_IOCTL_DBG(xe, args->atomic.reserved))
+			return false;
+
+		break;
+	case DRM_XE_MEM_RANGE_ATTR_PAT:
+		/*TODO: Add valid pat check */
+		break;
+	default:
+		if (XE_IOCTL_DBG(xe, 1))
+			return false;
+	}
+
+	if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
+		return false;
+
+	return true;
+}
+
+/**
+ * xe_vm_madvise_ioctl - Handle MADVISE ioctl for a VM
+ * @dev: DRM device pointer
+ * @data: Pointer to ioctl data (drm_xe_madvise*)
+ * @file: DRM file pointer
+ *
+ * Handles the MADVISE ioctl to apply memory advice to the VMAs within the
+ * input range.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
+{
+	struct xe_device *xe = to_xe_device(dev);
+	struct xe_file *xef = to_xe_file(file);
+	struct drm_xe_madvise *args = data;
+	struct xe_vmas_in_madvise_range madvise_range = {.addr = args->start,
+							 .range =  args->range, };
+	struct xe_vm *vm;
+	struct drm_exec exec;
+	int err, attr_type;
+
+	vm = xe_vm_lookup(xef, args->vm_id);
+	if (XE_IOCTL_DBG(xe, !vm))
+		return -EINVAL;
+
+	if (!madvise_args_are_sane(vm->xe, args)) {
+		err = -EINVAL;
+		goto put_vm;
+	}
+
+	err = down_write_killable(&vm->lock);
+	if (err)
+		goto put_vm;
+
+	if (XE_IOCTL_DBG(xe, xe_vm_is_closed_or_banned(vm))) {
+		err = -ENOENT;
+		goto unlock_vm;
+	}
+
+	err = xe_vm_alloc_madvise_vma(vm, args->start, args->range);
+	if (err)
+		goto unlock_vm;
+
+	err = get_vmas(vm, &madvise_range);
+	if (err || !madvise_range.num_vmas)
+		goto unlock_vm;
+
+	if (madvise_range.has_bo_vmas) {
+		drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES | DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
+		drm_exec_until_all_locked(&exec) {
+			for (int i = 0; i < madvise_range.num_vmas; i++) {
+				struct xe_bo *bo = xe_vma_bo(madvise_range.vmas[i]);
+
+				if (!bo)
+					continue;
+				err = drm_exec_lock_obj(&exec, &bo->ttm.base);
+				drm_exec_retry_on_contention(&exec);
+				if (err)
+					goto err_fini;
+			}
+		}
+	}
+
+	if (madvise_range.has_userptr_vmas) {
+		err = down_read_interruptible(&vm->userptr.notifier_lock);
+		if (err)
+			goto err_fini;
+	}
+
+	if (madvise_range.has_svm_vmas) {
+		err = down_read_interruptible(&vm->svm.gpusvm.notifier_lock);
+		if (err)
+			goto unlock_userptr;
+	}
+
+	attr_type = array_index_nospec(args->type, ARRAY_SIZE(madvise_funcs));
+	madvise_funcs[attr_type](xe, vm, madvise_range.vmas, madvise_range.num_vmas, args);
+
+	err = xe_vm_invalidate_madvise_range(vm, args->start, args->start + args->range);
+
+	if (madvise_range.has_svm_vmas)
+		xe_svm_notifier_unlock(vm);
+
+unlock_userptr:
+	if (madvise_range.has_userptr_vmas)
+		up_read(&vm->userptr.notifier_lock);
+err_fini:
+	if (madvise_range.has_bo_vmas)
+		drm_exec_fini(&exec);
+	kfree(madvise_range.vmas);
+	madvise_range.vmas = NULL;
+unlock_vm:
+	up_write(&vm->lock);
+put_vm:
+	xe_vm_put(vm);
+	return err;
+}
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.h b/drivers/gpu/drm/xe/xe_vm_madvise.h
new file mode 100644
index 000000000000..b0e1fc445f23
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#ifndef _XE_VM_MADVISE_H_
+#define _XE_VM_MADVISE_H_
+
+struct drm_device;
+struct drm_file;
+
+int xe_vm_madvise_ioctl(struct drm_device *dev, void *data,
+			struct drm_file *file);
+
+#endif
-- 
2.34.1



* [PATCH v5 11/23] drm/xe/svm : Add svm ranges migration policy on atomic access
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (9 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 10/23] drm/xe: Implement madvise ioctl for xe Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-29  4:04   ` Matthew Brost
  2025-07-22 13:35 ` [PATCH v5 12/23] drm/xe/madvise: Update migration policy based on preferred location Himal Prasad Ghimiray
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

If the platform does not support atomic access on system memory, and the
ranges are in system memory, but the user requires atomic accesses on
the VMA, then migrate the ranges to VRAM. Apply this policy for prefetch
operations as well.
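
The policy described above can be sketched as a small standalone decision function. This is only an illustration under stated assumptions: the enum values and `struct device_caps` are hypothetical stand-ins for the DRM_XE_ATOMIC_* uapi values and the driver's device info, and `need_vram_for_atomic()` is a simplified model of the `xe_vma_need_vram_for_atomic()` helper added by this patch, not a copy of it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the DRM_XE_ATOMIC_* uapi values. */
enum atomic_access {
	ATOMIC_UNDEFINED,
	ATOMIC_DEVICE,
	ATOMIC_GLOBAL,
	ATOMIC_CPU,
};

/* Hypothetical device description; not the driver's struct xe_device. */
struct device_caps {
	bool is_dgfx;			/* discrete GPU with its own VRAM */
	bool device_atomics_on_smem;	/* GPU atomics work on system memory */
};

/*
 * Return true when ranges must live in VRAM for GPU atomics to work.
 * fault_is_atomic is true when an atomic access triggered the page fault.
 */
static bool need_vram_for_atomic(const struct device_caps *caps,
				 enum atomic_access access,
				 bool fault_is_atomic)
{
	/* Integrated parts have no VRAM to migrate to. */
	if (!caps->is_dgfx)
		return false;

	switch (access) {
	case ATOMIC_DEVICE:
		/* Only migrate if the GPU cannot do atomics on system memory. */
		return !caps->device_atomics_on_smem;
	case ATOMIC_CPU:
		/* The user asked for CPU atomics; keep pages where they are. */
		return false;
	case ATOMIC_GLOBAL:
		return true;
	case ATOMIC_UNDEFINED:
	default:
		/* No advice given: fall back to what the fault tells us. */
		return fault_is_atomic;
	}
}
```

With a discrete device that lacks device atomics on system memory, `need_vram_for_atomic(&caps, ATOMIC_DEVICE, false)` returns true, while `ATOMIC_CPU` never requests migration.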

v2
- Drop unnecessary vm_dbg

v3 (Matthew Brost)
- fix atomic policy
- prefetch shouldn't have any impact of atomic
- bo can be accessed from vma, avoid duplicate parameter

v4 (Matthew Brost)
- Remove TODO comment
- Fix comment
- Don't allow GPU atomic ops when the user sets the atomic attr to CPU

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c         | 23 +++++++++--------
 drivers/gpu/drm/xe/xe_svm.c        |  2 +-
 drivers/gpu/drm/xe/xe_vm.c         | 40 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h         |  2 ++
 drivers/gpu/drm/xe/xe_vm_madvise.c |  9 ++++++-
 5 files changed, 64 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index b499006df2cf..96d0ffe8154e 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -640,28 +640,31 @@ static const struct xe_pt_walk_ops xe_pt_stage_bind_ops = {
  *    - In all other cases device atomics will be disabled with AE=0 until an application
  *      request differently using a ioctl like madvise.
  */
-static bool xe_atomic_for_vram(struct xe_vm *vm)
+static bool xe_atomic_for_vram(struct xe_vm *vm, struct xe_vma *vma)
 {
+	if (vma->attr.atomic_access == DRM_XE_ATOMIC_CPU)
+		return false;
+
 	return true;
 }
 
-static bool xe_atomic_for_system(struct xe_vm *vm, struct xe_bo *bo)
+static bool xe_atomic_for_system(struct xe_vm *vm, struct xe_vma *vma)
 {
 	struct xe_device *xe = vm->xe;
+	struct xe_bo *bo = xe_vma_bo(vma);
 
-	if (!xe->info.has_device_atomics_on_smem)
+	if (!xe->info.has_device_atomics_on_smem ||
+	    vma->attr.atomic_access == DRM_XE_ATOMIC_CPU)
 		return false;
 
+	if (vma->attr.atomic_access == DRM_XE_ATOMIC_DEVICE)
+		return true;
+
 	/*
 	 * If a SMEM+LMEM allocation is backed by SMEM, a device
 	 * atomics will cause a gpu page fault and which then
 	 * gets migrated to LMEM, bind such allocations with
 	 * device atomics enabled.
-	 *
-	 * TODO: Revisit this. Perhaps add something like a
-	 * fault_on_atomics_in_system UAPI flag.
-	 * Note that this also prohibits GPU atomics in LR mode for
-	 * userptr and system memory on DGFX.
 	 */
 	return (!IS_DGFX(xe) || (!xe_vm_in_lr_mode(vm) ||
 				 (bo && xe_bo_has_single_placement(bo))));
@@ -744,8 +747,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
 		goto walk_pt;
 
 	if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) {
-		xe_walk.default_vram_pte = xe_atomic_for_vram(vm) ? XE_USM_PPGTT_PTE_AE : 0;
-		xe_walk.default_system_pte = xe_atomic_for_system(vm, bo) ?
+		xe_walk.default_vram_pte = xe_atomic_for_vram(vm, vma) ? XE_USM_PPGTT_PTE_AE : 0;
+		xe_walk.default_system_pte = xe_atomic_for_system(vm, vma) ?
 			XE_USM_PPGTT_PTE_AE : 0;
 	}
 
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index c093dc453e32..49d3405aacb9 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -813,7 +813,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP),
 		.check_pages_threshold = IS_DGFX(vm->xe) &&
 			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ? SZ_64K : 0,
-		.devmem_only = atomic && IS_DGFX(vm->xe) &&
+		.devmem_only = xe_vma_need_vram_for_atomic(vm->xe, vma, atomic) &&
 			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP),
 		.timeslice_ms = atomic && IS_DGFX(vm->xe) &&
 			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ?
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 7f3d0ad04b3f..be51fcf322ec 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -4177,6 +4177,46 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
 	kvfree(snap);
 }
 
+/**
+ * xe_vma_need_vram_for_atomic - Check if VMA needs VRAM migration for atomic operations
+ * @xe: Pointer to the XE device structure
+ * @vma: Pointer to the virtual memory area (VMA) structure
+ * @is_atomic: True when called from the pagefault path for an atomic access
+ *
+ * This function determines whether the given VMA needs to be migrated to
+ * VRAM in order to perform atomic GPU operations.
+ *
+ * Return: true if migration to VRAM is required, false otherwise.
+ */
+bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool is_atomic)
+{
+	if (!IS_DGFX(xe))
+		return false;
+
+	/*
+	 * NOTE: The checks implemented here are platform-specific. For
+	 * instance, on a device supporting CXL atomics, these would ideally
+	 * work universally without additional handling.
+	 */
+	switch (vma->attr.atomic_access) {
+	case DRM_XE_ATOMIC_DEVICE:
+		return !xe->info.has_device_atomics_on_smem;
+
+	case DRM_XE_ATOMIC_CPU:
+		XE_WARN_ON(is_atomic);
+		return false;
+
+	case DRM_XE_ATOMIC_UNDEFINED:
+		return is_atomic;
+
+	case DRM_XE_ATOMIC_GLOBAL:
+		return true;
+
+	default:
+		return is_atomic;
+	}
+}
+
 /**
  * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
  * @vm: Pointer to the xe_vm structure
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 0d6b08cc4163..d5bc09ae640c 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -171,6 +171,8 @@ static inline bool xe_vma_is_userptr(struct xe_vma *vma)
 
 struct xe_vma *xe_vm_find_vma_by_addr(struct xe_vm *vm, u64 page_addr);
 
+bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool is_atomic);
+
 int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index f64728120d7c..62dc5cec8950 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -85,7 +85,14 @@ static void madvise_atomic(struct xe_device *xe, struct xe_vm *vm,
 			   struct xe_vma **vmas, int num_vmas,
 			   struct drm_xe_madvise *op)
 {
-	/* Implementation pending */
+	int i;
+
+	xe_assert(vm->xe, op->type == DRM_XE_MEM_RANGE_ATTR_ATOMIC);
+	xe_assert(vm->xe, op->atomic.val <= DRM_XE_ATOMIC_CPU);
+
+	for (i = 0; i < num_vmas; i++)
+		vmas[i]->attr.atomic_access = op->atomic.val;
+	/*TODO: handle bo backed vmas */
 }
 
 static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
-- 
2.34.1



* [PATCH v5 12/23] drm/xe/madvise: Update migration policy based on preferred location
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (10 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 11/23] drm/xe/svm : Add svm ranges migration policy on atomic access Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-29  4:07   ` Matthew Brost
  2025-07-22 13:35 ` [PATCH v5 13/23] drm/xe/svm: Support DRM_XE_SVM_ATTR_PAT memory attribute Himal Prasad Ghimiray
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

When the user sets a valid devmem_fd as the preferred location, a GPU
fault triggers migration to the tile of the device associated with that
devmem_fd.

If the user sets an invalid devmem_fd, the preferred location is the
current placement (smem) only.
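
As a rough illustration of the resolution rule, the sketch below maps a devmem_fd to a placement. The macro values (0 for the default device, -1 for system memory) and the `enum placement` type are assumptions made for this example; the patch itself returns a drm_pagemap pointer or NULL rather than an enum, and leaves other fds as a TODO for multi-device support.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for the DRM_XE_PREFERRED_LOC_* names used in the patch. */
#define PREFERRED_LOC_DEFAULT_DEVICE	0
#define PREFERRED_LOC_DEFAULT_SYSTEM	(-1)

/* Hypothetical result type; the driver returns a pagemap pointer instead. */
enum placement {
	PLACE_SMEM,
	PLACE_FAULTING_TILE_VRAM,
};

/* Mirrors the sanity check: anything below the system sentinel is rejected. */
static bool preferred_fd_is_sane(uint32_t devmem_fd)
{
	return (int32_t)devmem_fd >= PREFERRED_LOC_DEFAULT_SYSTEM;
}

static enum placement resolve_preferred(uint32_t devmem_fd, bool is_dgfx)
{
	int32_t fd = (int32_t)devmem_fd;

	if (fd == PREFERRED_LOC_DEFAULT_SYSTEM)
		return PLACE_SMEM;

	/* Default device means the VRAM of the faulting tile, if there is any. */
	if (fd == PREFERRED_LOC_DEFAULT_DEVICE)
		return is_dgfx ? PLACE_FAULTING_TILE_VRAM : PLACE_SMEM;

	/* Any other fd: multi-device is not handled yet, so stay in smem. */
	return PLACE_SMEM;
}
```

The fd is stored as an unsigned 32-bit value in the uapi struct, so the cast to a signed type is what makes the -1 sentinel comparable.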

v2(Matthew Brost)
- Default should be faulting tile
- remove devmem_fd used as region

v3 (Matthew Brost)
- Add migration_policy
- Fix return condition
- fix migrate condition

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c        | 40 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_svm.h        |  8 ++++++
 drivers/gpu/drm/xe/xe_vm_madvise.c | 21 +++++++++++++++-
 3 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 49d3405aacb9..ba1233d0d5a2 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -790,6 +790,37 @@ bool xe_svm_range_needs_migrate_to_vram(struct xe_svm_range *range, struct xe_vm
 	return true;
 }
 
+/**
+ * xe_vma_resolve_pagemap - Resolve the appropriate DRM pagemap for a VMA
+ * @vma: Pointer to the xe_vma structure containing memory attributes
+ * @tile: Pointer to the xe_tile structure used as fallback for VRAM mapping
+ *
+ * This function determines the correct DRM pagemap to use for a given VMA.
+ * It first checks whether a valid devmem_fd is set in the VMA's preferred
+ * location. If the devmem_fd is negative, it returns NULL, indicating that
+ * no pagemap is available and smem is to be used as the preferred location.
+ * If the devmem_fd equals the default faulting GT identifier, it returns
+ * the VRAM pagemap associated with the tile (or NULL on non-DGFX devices).
+ *
+ * Future support for multi-device configurations may use drm_pagemap_from_fd()
+ * to resolve pagemaps from arbitrary file descriptors.
+ *
+ * Return: A pointer to the resolved drm_pagemap, or NULL if none is applicable.
+ */
+struct drm_pagemap *xe_vma_resolve_pagemap(struct xe_vma *vma, struct xe_tile *tile)
+{
+	s32 fd = (s32)vma->attr.preferred_loc.devmem_fd;
+
+	if (fd == DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM)
+		return NULL;
+
+	if (fd == DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE)
+		return IS_DGFX(tile_to_xe(tile)) ? xe_tile_local_pagemap(tile) : NULL;
+
+	/* TODO: Support multi-device with drm_pagemap_from_fd(fd) */
+	return NULL;
+}
+
 /**
  * xe_svm_handle_pagefault() - SVM handle page fault
  * @vm: The VM.
@@ -821,6 +852,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 	};
 	struct xe_svm_range *range;
 	struct dma_fence *fence;
+	struct drm_pagemap *dpagemap;
 	struct xe_tile *tile = gt_to_tile(gt);
 	int migrate_try_count = ctx.devmem_only ? 3 : 1;
 	ktime_t end = 0;
@@ -850,8 +882,14 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 
 	range_debug(range, "PAGE FAULT");
 
+	dpagemap = xe_vma_resolve_pagemap(vma, tile);
 	if (--migrate_try_count >= 0 &&
-	    xe_svm_range_needs_migrate_to_vram(range, vma, IS_DGFX(vm->xe))) {
+	    xe_svm_range_needs_migrate_to_vram(range, vma, !!dpagemap || ctx.devmem_only)) {
+		/* TODO : For multi-device dpagemap will be used to find the
+		 * remote tile and remote device. Will need to modify
+		 * xe_svm_alloc_vram to use dpagemap for future multi-device
+		 * support.
+		 */
 		err = xe_svm_alloc_vram(tile, range, &ctx);
 		ctx.timeslice_ms <<= 1;	/* Double timeslice if we have to retry */
 		if (err) {
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index b0da0e85f0b8..494823afaa98 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -94,6 +94,8 @@ void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end);
 
 u8 xe_svm_ranges_zap_ptes_in_range(struct xe_vm *vm, u64 start, u64 end);
 
+struct drm_pagemap *xe_vma_resolve_pagemap(struct xe_vma *vma, struct xe_tile *tile);
+
 /**
  * xe_svm_range_has_dma_mapping() - SVM range has DMA mapping
  * @range: SVM range
@@ -318,6 +320,12 @@ u8 xe_svm_ranges_zap_ptes_in_range(struct xe_vm *vm, u64 start, u64 end)
 	return 0;
 }
 
+static inline
+struct drm_pagemap *xe_vma_resolve_pagemap(struct xe_vma *vma, struct xe_tile *tile)
+{
+	return NULL;
+}
+
 #define xe_svm_assert_in_notifier(...) do {} while (0)
 #define xe_svm_range_has_dma_mapping(...) false
 
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 62dc5cec8950..17959257ee1d 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -78,7 +78,19 @@ static void madvise_preferred_mem_loc(struct xe_device *xe, struct xe_vm *vm,
 				      struct xe_vma **vmas, int num_vmas,
 				      struct drm_xe_madvise *op)
 {
-	/* Implementation pending */
+	int i;
+
+	xe_assert(vm->xe, op->type == DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC);
+
+	for (i = 0; i < num_vmas; i++) {
+		vmas[i]->attr.preferred_loc.devmem_fd = op->preferred_mem_loc.devmem_fd;
+
+		/* Until multi-device support is added, migration_policy
+		 * has no effect and can be ignored.
+		 */
+		vmas[i]->attr.preferred_loc.migration_policy =
+						op->preferred_mem_loc.migration_policy;
+	}
 }
 
 static void madvise_atomic(struct xe_device *xe, struct xe_vm *vm,
@@ -178,6 +190,12 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
 
 	switch (args->type) {
 	case DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC:
+	{
+		s32 fd = (s32)args->preferred_mem_loc.devmem_fd;
+
+		if (XE_IOCTL_DBG(xe, fd < DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM))
+			return false;
+
 		if (XE_IOCTL_DBG(xe, args->preferred_mem_loc.migration_policy >
 				     DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES))
 			return false;
@@ -188,6 +206,7 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
 		if (XE_IOCTL_DBG(xe, args->atomic.reserved))
 			return false;
 		break;
+	}
 	case DRM_XE_MEM_RANGE_ATTR_ATOMIC:
 		if (XE_IOCTL_DBG(xe, args->atomic.val > DRM_XE_ATOMIC_CPU))
 			return false;
-- 
2.34.1



* [PATCH v5 13/23] drm/xe/svm: Support DRM_XE_SVM_ATTR_PAT memory attribute
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (11 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 12/23] drm/xe/madvise: Update migration policy based on preferred location Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-23 16:55   ` Ghimiray, Himal Prasad
  2025-07-22 13:35 ` [PATCH v5 14/23] drm/xe/uapi: Add flag for consulting madvise hints on svm prefetch Himal Prasad Ghimiray
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

This attribute sets the pat_index for the VMA range used by SVM; the
pat_index is used to determine coherency.
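
The sanity check on the PAT attribute can be pictured with the standalone sketch below. The table contents, the `enum coh_mode` values and the argument types are hypothetical; in the driver the lookup goes through xe_pat_index_get_coh_mode() against the platform's real PAT table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical coherency modes; 0 marks an invalid or reserved entry. */
enum coh_mode {
	COH_INVALID = 0,
	COH_NONE = 1,
	COH_AT_LEAST_1WAY = 2,
};

/* Made-up three-entry PAT table, for illustration only. */
static const enum coh_mode pat_table[] = {
	COH_NONE,
	COH_AT_LEAST_1WAY,
	COH_INVALID,		/* reserved entry */
};

static enum coh_mode pat_index_get_coh_mode(uint32_t pat_index)
{
	if (pat_index >= sizeof(pat_table) / sizeof(pat_table[0]))
		return COH_INVALID;
	return pat_table[pat_index];
}

/* Accept the request only for a known coherency mode and zeroed MBZ fields. */
static bool pat_args_are_sane(uint32_t pat_index, uint32_t pad, uint64_t reserved)
{
	enum coh_mode mode = pat_index_get_coh_mode(pat_index);

	if (mode == COH_INVALID)
		return false;
	if (mode > COH_AT_LEAST_1WAY)
		return false;
	if (pad || reserved)
		return false;
	return true;
}
```

Here `pat_args_are_sane(1, 0, 0)` is accepted, while an out-of-range index or a non-zero pad is rejected before any VMA is touched.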

v2 (Matthew Brost)
- Pat index sanity check

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm_madvise.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 17959257ee1d..1dc4d19a5f2a 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -9,6 +9,7 @@
 #include <drm/xe_drm.h>
 
 #include "xe_bo.h"
+#include "xe_pat.h"
 #include "xe_pt.h"
 #include "xe_svm.h"
 
@@ -111,7 +112,13 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
 			      struct xe_vma **vmas, int num_vmas,
 			      struct drm_xe_madvise *op)
 {
-	/* Implementation pending */
+	int i;
+
+	xe_assert(vm->xe, op->type == DRM_XE_MEM_RANGE_ATTR_PAT);
+
+	for (i = 0; i < num_vmas; i++)
+		vmas[i]->attr.pat_index = op->pat_index.val;
+
 }
 
 typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
@@ -219,8 +226,22 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
 
 		break;
 	case DRM_XE_MEM_RANGE_ATTR_PAT:
-		/*TODO: Add valid pat check */
+	{
+		u16 coh_mode = xe_pat_index_get_coh_mode(xe, args->pat_index.val);
+
+		if (XE_IOCTL_DBG(xe, !coh_mode))
+			return false;
+
+		if (XE_WARN_ON(coh_mode > XE_COH_AT_LEAST_1WAY))
+			return false;
+
+		if (XE_IOCTL_DBG(xe, args->pat_index.pad))
+			return false;
+
+		if (XE_IOCTL_DBG(xe, args->pat_index.reserved))
+			return false;
 		break;
+	}
 	default:
 		if (XE_IOCTL_DBG(xe, 1))
 			return false;
-- 
2.34.1



* [PATCH v5 14/23] drm/xe/uapi: Add flag for consulting madvise hints on svm prefetch
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (12 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 13/23] drm/xe/svm: Support DRM_XE_SVM_ATTR_PAT memory attribute Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-22 13:35 ` [PATCH v5 15/23] drm/xe/svm: Consult madvise preferred location in prefetch Himal Prasad Ghimiray
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

Introduce the flag DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC, which makes a
prefetch target the memory region advised by madvise.
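
Because the flag is a -1 sentinel stored in an unsigned region field, argument checking has to test for it before treating the value as a bit index. A minimal sketch of that ordering follows; the function name, the parenthesised macro and the explicit `region >= 32` guard are assumptions added for this example (the patch defines the flag as plain -1 and checks BIT(region) against the device's region mask).

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as the flag in the patch; parenthesised here for safety. */
#define CONSULT_MEM_ADVISE_PREF_LOC	(-1)

/*
 * Validate a prefetch region: either the "consult madvise" sentinel or
 * the index of a memory region present in mem_region_mask.
 */
static bool prefetch_region_is_valid(uint32_t region, uint32_t mem_region_mask)
{
	/* Compared as unsigned, the sentinel is 0xFFFFFFFF. */
	if (region == (uint32_t)CONSULT_MEM_ADVISE_PREF_LOC)
		return true;

	/* Guard added for this sketch: never shift by 32 or more. */
	if (region >= 32)
		return false;

	return (mem_region_mask >> region) & 1u;
}
```

Testing the sentinel first matters: using 0xFFFFFFFF as a shift count would be undefined behaviour in C.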

v2 (Matthew Brost)
- Add kernel-doc

v3 (Matthew Brost)
- Fix kernel-doc

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 include/uapi/drm/xe_drm.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 51dcf63684b0..8f1d48664424 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1006,6 +1006,10 @@ struct drm_xe_vm_destroy {
  *    valid on VMs with DRM_XE_VM_CREATE_FLAG_FAULT_MODE set. The CPU address
  *    mirror flag are only valid for DRM_XE_VM_BIND_OP_MAP operations, the BO
  *    handle MBZ, and the BO offset MBZ.
+ *
+ * The @prefetch_mem_region_instance for %DRM_XE_VM_BIND_OP_PREFETCH can also be:
+ *  - %DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC, which ensures prefetching occurs in
+ *    the memory region advised by madvise.
  */
 struct drm_xe_vm_bind_op {
 	/** @extensions: Pointer to the first extension struct, if any */
@@ -1111,6 +1115,7 @@ struct drm_xe_vm_bind_op {
 	/** @flags: Bind flags */
 	__u32 flags;
 
+#define DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC	-1
 	/**
 	 * @prefetch_mem_region_instance: Memory region to prefetch VMA to.
 	 * It is a region instance, not a mask.
-- 
2.34.1



* [PATCH v5 15/23] drm/xe/svm: Consult madvise preferred location in prefetch
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (13 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 14/23] drm/xe/uapi: Add flag for consulting madvise hints on svm prefetch Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-22 13:35 ` [PATCH v5 16/23] drm/xe/bo: Add attributes field to xe_bo Himal Prasad Ghimiray
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

When the prefetch region is DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC, prefetch
SVM ranges to the preferred location provided by madvise.
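
The target selection can be summarised with the sketch below. It assumes that region 0 means system memory and that region N > 0 maps to the VRAM of tile N - 1, which is how this example models the region_to_mem_type lookup; `TARGET_SMEM`, `prefetch_target_tile()` and the `preferred_tile` parameter are hypothetical names, with `preferred_tile` standing in for the tile derived from the madvise preferred location.

```c
#include <stdint.h>

#define CONSULT_MEM_ADVISE_PREF_LOC	(-1)	/* sentinel value from the uapi patch */
#define TARGET_SMEM			(-1)	/* hypothetical: "migrate to system memory" */

/*
 * Pick the tile whose VRAM a prefetch should target, or TARGET_SMEM.
 * preferred_tile is the tile resolved from the madvise preferred location
 * (TARGET_SMEM when that location is system memory).
 */
static int prefetch_target_tile(uint32_t region, int preferred_tile)
{
	/* Sentinel: defer to whatever madvise recorded for this VMA. */
	if (region == (uint32_t)CONSULT_MEM_ADVISE_PREF_LOC)
		return preferred_tile;

	/* Region 0 is system memory. */
	if (region == 0)
		return TARGET_SMEM;

	/* Assumed layout: region N maps to the VRAM of tile N - 1. */
	return (int)region - 1;
}
```

An explicit region still wins over the advice; only the sentinel causes the madvise preferred location to be consulted.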

v2 (Matthew Brost)
- Fix region, devmem_fd usages
- consult madvise is applicable for other vma's too.

v3
- Fix atomic handling

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.h  |  1 -
 drivers/gpu/drm/xe/xe_tile.h | 18 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.c   | 26 ++++++++++++++++++--------
 3 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 494823afaa98..7cf83f59ea48 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -325,7 +325,6 @@ struct drm_pagemap *xe_vma_resolve_pagemap(struct xe_vma *vma, struct xe_tile *t
 {
 	return NULL;
 }
-
 #define xe_svm_assert_in_notifier(...) do {} while (0)
 #define xe_svm_range_has_dma_mapping(...) false
 
diff --git a/drivers/gpu/drm/xe/xe_tile.h b/drivers/gpu/drm/xe/xe_tile.h
index 066a3d0cea79..fb657f5364f3 100644
--- a/drivers/gpu/drm/xe/xe_tile.h
+++ b/drivers/gpu/drm/xe/xe_tile.h
@@ -10,6 +10,24 @@
 
 struct xe_tile;
 
+#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
+/**
+ * xe_tile_from_dpagemap - Find xe_tile from drm_pagemap
+ * @dpagemap: pointer to struct drm_pagemap
+ *
+ * Return: Pointer to xe_tile
+ */
+static inline struct xe_tile *xe_tile_from_dpagemap(struct drm_pagemap *dpagemap)
+{
+	return container_of(dpagemap, struct xe_tile, mem.vram.dpagemap);
+}
+
+#else
+static inline struct xe_tile *xe_tile_from_dpagemap(struct drm_pagemap *dpagemap)
+{
+	return NULL;
+}
+#endif
 int xe_tile_init_early(struct xe_tile *tile, struct xe_device *xe, u8 id);
 int xe_tile_init_noalloc(struct xe_tile *tile);
 int xe_tile_init(struct xe_tile *tile);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index be51fcf322ec..2226b1eb46f1 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -38,6 +38,7 @@
 #include "xe_res_cursor.h"
 #include "xe_svm.h"
 #include "xe_sync.h"
+#include "xe_tile.h"
 #include "xe_trace_bo.h"
 #include "xe_wa.h"
 #include "xe_hmm.h"
@@ -2907,15 +2908,24 @@ static int prefetch_ranges(struct xe_vm *vm, struct xe_vma_op *op)
 	int err = 0;
 
 	struct xe_svm_range *svm_range;
+	struct drm_pagemap *dpagemap;
 	struct drm_gpusvm_ctx ctx = {};
-	struct xe_tile *tile;
+	struct xe_tile *tile = NULL;
 	unsigned long i;
 	u32 region;
 
 	if (!xe_vma_is_cpu_addr_mirror(vma))
 		return 0;
 
-	region = op->prefetch_range.region;
+	if (op->prefetch_range.region == DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC) {
+		dpagemap = xe_vma_resolve_pagemap(vma, xe_device_get_root_tile(vm->xe));
+		if (dpagemap)
+			tile = xe_tile_from_dpagemap(dpagemap);
+	} else {
+		region = op->prefetch_range.region;
+		if (region)
+			tile = &vm->xe->tiles[region_to_mem_type[region] - XE_PL_VRAM0];
+	}
 
 	ctx.read_only = xe_vma_read_only(vma);
 	ctx.devmem_possible = devmem_possible;
@@ -2923,11 +2933,10 @@ static int prefetch_ranges(struct xe_vm *vm, struct xe_vma_op *op)
 
 	/* TODO: Threading the migration */
 	xa_for_each(&op->prefetch_range.range, i, svm_range) {
-		if (!region)
+		if (!tile)
 			xe_svm_range_migrate_to_smem(vm, svm_range);
 
-		if (xe_svm_range_needs_migrate_to_vram(svm_range, vma, region)) {
-			tile = &vm->xe->tiles[region_to_mem_type[region] - XE_PL_VRAM0];
+		if (xe_svm_range_needs_migrate_to_vram(svm_range, vma, !!tile)) {
 			err = xe_svm_alloc_vram(tile, svm_range, &ctx);
 			if (err) {
 				drm_dbg(&vm->xe->drm, "VRAM allocation failed, retry from userspace, asid=%u, gpusvm=%p, errno=%pe\n",
@@ -2995,7 +3004,8 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
 		else
 			region = op->prefetch.region;
 
-		xe_assert(vm->xe, region <= ARRAY_SIZE(region_to_mem_type));
+		xe_assert(vm->xe, region == DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC ||
+			  region <= ARRAY_SIZE(region_to_mem_type));
 
 		err = vma_lock_and_validate(exec,
 					    gpuva_to_vma(op->base.prefetch.va),
@@ -3413,8 +3423,8 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe, struct xe_vm *vm,
 				 op == DRM_XE_VM_BIND_OP_PREFETCH) ||
 		    XE_IOCTL_DBG(xe, prefetch_region &&
 				 op != DRM_XE_VM_BIND_OP_PREFETCH) ||
-		    XE_IOCTL_DBG(xe, !(BIT(prefetch_region) &
-				       xe->info.mem_region_mask)) ||
+		    XE_IOCTL_DBG(xe, (prefetch_region != DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC &&
+				       !(BIT(prefetch_region) & xe->info.mem_region_mask))) ||
 		    XE_IOCTL_DBG(xe, obj &&
 				 op == DRM_XE_VM_BIND_OP_UNMAP)) {
 			err = -EINVAL;
-- 
2.34.1



* [PATCH v5 16/23] drm/xe/bo: Add attributes field to xe_bo
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (14 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 15/23] drm/xe/svm: Consult madvise preferred location in prefetch Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-22 13:35 ` [PATCH v5 17/23] drm/xe/bo: Update atomic_access attribute on madvise Himal Prasad Ghimiray
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

A single BO can be linked to multiple VMAs, making VMA attributes
insufficient for determining the placement and PTE update attributes
of the BO. To address this, an attributes field has been added to the
BO.

Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_bo_types.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index ff560d82496f..ad6bc01386d4 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -60,6 +60,14 @@ struct xe_bo {
 	 */
 	struct list_head client_link;
 #endif
+	/** @attr: User controlled attributes for bo */
+	struct {
+		/**
+		 * @atomic_access: type of atomic access bo needs
+		 * protected by bo dma-resv lock
+		 */
+		u32 atomic_access;
+	} attr;
 	/**
 	 * @pxp_key_instance: PXP key instance this BO was created against. A
 	 * 0 in this variable indicates that the BO does not use PXP encryption.
-- 
2.34.1



* [PATCH v5 17/23] drm/xe/bo: Update atomic_access attribute on madvise
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (15 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 16/23] drm/xe/bo: Add attributes field to xe_bo Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-29  4:18   ` Matthew Brost
  2025-07-22 13:35 ` [PATCH v5 18/23] drm/xe/madvise: Skip vma invalidation if mem attr are unchanged Himal Prasad Ghimiray
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

Update the BO's atomic_access attribute based on user-provided input and
use it to decide whether to migrate the BO to smem during a CPU fault.
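
For illustration only, a minimal standalone sketch of the two decisions
described above: which atomic_access value applies to a VMA, and whether a
CPU fault should migrate the BO to smem. The sketch_* types and the enum
are stand-ins assumed here (the numeric values are not taken from the
uapi header); the driver's own logic is in should_migrate_to_smem() and
xe_vma_need_vram_for_atomic() in the diff that follows.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the DRM_XE_ATOMIC_* values; numeric values are assumed. */
enum sketch_atomic {
	SKETCH_ATOMIC_UNDEFINED,
	SKETCH_ATOMIC_DEVICE,
	SKETCH_ATOMIC_GLOBAL,
	SKETCH_ATOMIC_CPU,
};

/* Hypothetical, reduced versions of xe_bo and xe_vma. */
struct sketch_bo {
	enum sketch_atomic atomic_access;
};

struct sketch_bo_vma {
	struct sketch_bo *bo;	/* NULL when the VMA has no backing BO */
	enum sketch_atomic atomic_access;
};

/*
 * A BO may be mapped by several VMAs, so a BO-backed VMA takes the
 * attribute from the BO; otherwise the VMA's own attribute applies.
 */
static enum sketch_atomic
sketch_effective_atomic_access(const struct sketch_bo_vma *vma)
{
	return vma->bo ? vma->bo->atomic_access : vma->atomic_access;
}

/* GLOBAL and CPU atomics need the BO in system memory on a CPU fault. */
static bool sketch_should_migrate_to_smem(const struct sketch_bo *bo)
{
	return bo->atomic_access == SKETCH_ATOMIC_GLOBAL ||
	       bo->atomic_access == SKETCH_ATOMIC_CPU;
}
```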

v2 (Matthew Brost)
- Avoid cpu unmapping if bo is already in smem
- check atomics on smem too for ioctl
- Add comments

v3
- Avoid migration in prefetch

v4 (Matthew Brost)
- make sanity check function bool
- add assert for smem placement
- fix doc
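
The sanity check mentioned in the changelog can be summarized by the
following standalone sketch. It restates the three placement conditions
from check_bo_args_are_sane() as a single predicate. The flag bits, the
enum and the function name are hypothetical stand-ins, not the driver's
definitions, and the conditions are platform-specific as noted in the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placement bits standing in for XE_BO_FLAG_*. */
#define SKETCH_BO_FLAG_SYSTEM	(1u << 0)
#define SKETCH_BO_FLAG_VRAM0	(1u << 1)
#define SKETCH_BO_FLAG_VRAM1	(1u << 2)

/* Stand-in for the requested DRM_XE_ATOMIC_* value; values are assumed. */
enum sketch_atomic_req {
	SKETCH_REQ_UNDEFINED,
	SKETCH_REQ_DEVICE,
	SKETCH_REQ_GLOBAL,
	SKETCH_REQ_CPU,
};

/* Return true when the BO's allowed placements can honour the request. */
static bool sketch_bo_placement_ok(uint32_t bo_flags,
				   enum sketch_atomic_req req,
				   bool device_atomics_on_smem)
{
	bool has_smem = (bo_flags & SKETCH_BO_FLAG_SYSTEM) != 0;
	bool has_vram = (bo_flags & (SKETCH_BO_FLAG_VRAM0 |
				     SKETCH_BO_FLAG_VRAM1)) != 0;

	switch (req) {
	case SKETCH_REQ_CPU:
		/* CPU atomics need a system memory placement. */
		return has_smem;
	case SKETCH_REQ_DEVICE:
		/* Device atomics need VRAM, or smem if the device allows it. */
		return has_vram || (has_smem && device_atomics_on_smem);
	case SKETCH_REQ_GLOBAL:
		/* Global atomics need both smem and VRAM placements. */
		return has_smem && has_vram;
	default:
		return true;
	}
}
```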

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c           | 29 +++++++++++--
 drivers/gpu/drm/xe/xe_gt_pagefault.c |  2 +-
 drivers/gpu/drm/xe/xe_vm.c           |  5 ++-
 drivers/gpu/drm/xe/xe_vm_madvise.c   | 62 +++++++++++++++++++++++++++-
 4 files changed, 91 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 4e0355d0f406..f133fc54664e 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1685,6 +1685,18 @@ static void xe_gem_object_close(struct drm_gem_object *obj,
 	}
 }
 
+static bool should_migrate_to_smem(struct xe_bo *bo)
+{
+	/*
+	 * NOTE: The following atomic checks are platform-specific. For example,
+	 * if a device supports CXL atomics, these may not be necessary or
+	 * may behave differently.
+	 */
+
+	return bo->attr.atomic_access == DRM_XE_ATOMIC_GLOBAL ||
+	       bo->attr.atomic_access == DRM_XE_ATOMIC_CPU;
+}
+
 static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
 {
 	struct ttm_buffer_object *tbo = vmf->vma->vm_private_data;
@@ -1693,7 +1705,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
 	struct xe_bo *bo = ttm_to_xe_bo(tbo);
 	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
 	vm_fault_t ret;
-	int idx;
+	int idx, r = 0;
 
 	if (needs_rpm)
 		xe_pm_runtime_get(xe);
@@ -1705,8 +1717,19 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
 	if (drm_dev_enter(ddev, &idx)) {
 		trace_xe_bo_cpu_fault(bo);
 
-		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
-					       TTM_BO_VM_NUM_PREFAULT);
+		if (should_migrate_to_smem(bo)) {
+			xe_assert(xe, bo->flags & XE_BO_FLAG_SYSTEM);
+
+			r = xe_bo_migrate(bo, XE_PL_TT);
+			if (r == -EBUSY || r == -ERESTARTSYS || r == -EINTR)
+				ret = VM_FAULT_NOPAGE;
+			else if (r)
+				ret = VM_FAULT_SIGBUS;
+		}
+		if (!ret)
+			ret = ttm_bo_vm_fault_reserved(vmf,
+						       vmf->vma->vm_page_prot,
+						       TTM_BO_VM_NUM_PREFAULT);
 		drm_dev_exit(idx);
 	} else {
 		ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 5a75d56d8558..c1cb69c6ada8 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -84,7 +84,7 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
 	if (err)
 		return err;
 
-	if (atomic && IS_DGFX(vm->xe)) {
+	if (xe_vma_need_vram_for_atomic(vm->xe, vma, atomic)) {
 		if (xe_vma_is_userptr(vma)) {
 			err = -EACCES;
 			return err;
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 2226b1eb46f1..5dc7cd7769f8 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -4200,6 +4200,9 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
  */
 bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool is_atomic)
 {
+	u32 atomic_access = xe_vma_bo(vma) ? xe_vma_bo(vma)->attr.atomic_access :
+					     vma->attr.atomic_access;
+
 	if (!IS_DGFX(xe))
 		return false;
 
@@ -4208,7 +4211,7 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
 	 * instance, on a device supporting CXL atomics, these would ideally
 	 * work universally without additional handling.
 	 */
-	switch (vma->attr.atomic_access) {
+	switch (atomic_access) {
 	case DRM_XE_ATOMIC_DEVICE:
 		return !xe->info.has_device_atomics_on_smem;
 
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 1dc4d19a5f2a..727833780b4b 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -98,14 +98,28 @@ static void madvise_atomic(struct xe_device *xe, struct xe_vm *vm,
 			   struct xe_vma **vmas, int num_vmas,
 			   struct drm_xe_madvise *op)
 {
+	struct xe_bo *bo;
 	int i;
 
 	xe_assert(vm->xe, op->type == DRM_XE_MEM_RANGE_ATTR_ATOMIC);
 	xe_assert(vm->xe, op->atomic.val <= DRM_XE_ATOMIC_CPU);
 
-	for (i = 0; i < num_vmas; i++)
+	for (i = 0; i < num_vmas; i++) {
 		vmas[i]->attr.atomic_access = op->atomic.val;
-	/*TODO: handle bo backed vmas */
+
+		bo = xe_vma_bo(vmas[i]);
+		if (!bo)
+			continue;
+
+		xe_bo_assert_held(bo);
+		bo->attr.atomic_access = op->atomic.val;
+
+		/* Invalidate cpu page table, so bo can migrate to smem in next access */
+		if (xe_bo_is_vram(bo) &&
+		    (bo->attr.atomic_access == DRM_XE_ATOMIC_CPU ||
+		     bo->attr.atomic_access == DRM_XE_ATOMIC_GLOBAL))
+			ttm_bo_unmap_virtual(&bo->ttm);
+	}
 }
 
 static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
@@ -253,6 +267,41 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
 	return true;
 }
 
+static bool check_bo_args_are_sane(struct xe_vm *vm, struct xe_vma **vmas,
+				   int num_vmas, u32 atomic_val)
+{
+	struct xe_device *xe = vm->xe;
+	struct xe_bo *bo;
+	int i;
+
+	for (i = 0; i < num_vmas; i++) {
+		bo = xe_vma_bo(vmas[i]);
+		if (!bo)
+			continue;
+		/*
+		 * NOTE: The following atomic checks are platform-specific. For example,
+		 * if a device supports CXL atomics, these may not be necessary or
+		 * may behave differently.
+		 */
+		if (XE_IOCTL_DBG(xe, atomic_val == DRM_XE_ATOMIC_CPU &&
+				 !(bo->flags & XE_BO_FLAG_SYSTEM)))
+			return false;
+
+		if (XE_IOCTL_DBG(xe, atomic_val == DRM_XE_ATOMIC_DEVICE &&
+				 !(bo->flags & XE_BO_FLAG_VRAM0) &&
+				 !(bo->flags & XE_BO_FLAG_VRAM1) &&
+				 !(bo->flags & XE_BO_FLAG_SYSTEM &&
+				   xe->info.has_device_atomics_on_smem)))
+			return false;
+
+		if (XE_IOCTL_DBG(xe, atomic_val == DRM_XE_ATOMIC_GLOBAL &&
+				 (!(bo->flags & XE_BO_FLAG_SYSTEM) ||
+				  (!(bo->flags & XE_BO_FLAG_VRAM0) &&
+				   !(bo->flags & XE_BO_FLAG_VRAM1)))))
+			return false;
+	}
+	return true;
+}
 /**
  * xe_vm_madvise_ioctl - Handle MADVise ioctl for a VM
  * @dev: DRM device pointer
@@ -302,6 +351,15 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
 		goto unlock_vm;
 
 	if (madvise_range.has_bo_vmas) {
+		if (args->type == DRM_XE_MEM_RANGE_ATTR_ATOMIC) {
+			if (!check_bo_args_are_sane(vm, madvise_range.vmas,
+						    madvise_range.num_vmas,
+						    args->atomic.val)) {
+				err = -EINVAL;
+				goto unlock_vm;
+			}
+		}
+
 		drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES | DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
 		drm_exec_until_all_locked(&exec) {
 			for (int i = 0; i < madvise_range.num_vmas; i++) {
-- 
2.34.1



* [PATCH v5 18/23] drm/xe/madvise: Skip vma invalidation if mem attr are unchanged
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (16 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 17/23] drm/xe/bo: Update atomic_access attribute on madvise Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-29  4:19   ` Matthew Brost
  2025-07-22 13:35 ` [PATCH v5 19/23] drm/xe/vm: Add helper to check for default VMA memory attributes Himal Prasad Ghimiray
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

If a VMA within the madvise input range already has the same memory
attribute as the one requested by the user, skip PTE zapping for that
VMA to avoid unnecessary invalidation.
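
A minimal standalone sketch of the idea, for illustration only. The
sketch_attr_vma type and both functions are hypothetical and reduce a VMA
to a single attribute. The attribute update records whether anything
changed, and the later zap pass skips VMAs that were left unchanged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced VMA carrying one attribute. */
struct sketch_attr_vma {
	uint16_t pat_index;
	bool skip_invalidation;
	bool zapped;		/* set when the zap pass touched this VMA */
};

/* Apply the new value and remember which VMAs did not change. */
static void sketch_madvise_pat_index(struct sketch_attr_vma *vmas, size_t n,
				     uint16_t new_pat)
{
	for (size_t i = 0; i < n; i++) {
		if (vmas[i].pat_index == new_pat) {
			vmas[i].skip_invalidation = true;
		} else {
			vmas[i].skip_invalidation = false;
			vmas[i].pat_index = new_pat;
		}
	}
}

/* Zap only VMAs whose attribute changed; return how many were zapped. */
static size_t sketch_zap_ptes(struct sketch_attr_vma *vmas, size_t n)
{
	size_t zapped = 0;

	for (size_t i = 0; i < n; i++) {
		if (vmas[i].skip_invalidation)
			continue;

		vmas[i].zapped = true;
		zapped++;
	}

	return zapped;
}
```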

v2 (Matthew Brost)
- fix skip_invalidation for new attributes
- s/u32/bool
- Remove unnecessary assignment for kzalloc'ed

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 drivers/gpu/drm/xe/xe_vm_madvise.c | 52 +++++++++++++++++++++---------
 drivers/gpu/drm/xe/xe_vm_types.h   |  6 ++++
 2 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 727833780b4b..fbb6aa8a7a5e 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -84,13 +84,19 @@ static void madvise_preferred_mem_loc(struct xe_device *xe, struct xe_vm *vm,
 	xe_assert(vm->xe, op->type == DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC);
 
 	for (i = 0; i < num_vmas; i++) {
-		vmas[i]->attr.preferred_loc.devmem_fd = op->preferred_mem_loc.devmem_fd;
-
-		/* Till multi-device support is not added migration_policy
-		 * is of no use and can be ignored.
-		 */
-		vmas[i]->attr.preferred_loc.migration_policy =
+		if (vmas[i]->attr.preferred_loc.devmem_fd == op->preferred_mem_loc.devmem_fd &&
+		    vmas[i]->attr.preferred_loc.migration_policy ==
+		    op->preferred_mem_loc.migration_policy) {
+			vmas[i]->skip_invalidation = true;
+		} else {
+			vmas[i]->skip_invalidation = false;
+			vmas[i]->attr.preferred_loc.devmem_fd = op->preferred_mem_loc.devmem_fd;
+			/* Until multi-device support is added, migration_policy
+			 * is of no use and can be ignored.
+			 */
+			vmas[i]->attr.preferred_loc.migration_policy =
 						op->preferred_mem_loc.migration_policy;
+		}
 	}
 }
 
@@ -105,7 +111,12 @@ static void madvise_atomic(struct xe_device *xe, struct xe_vm *vm,
 	xe_assert(vm->xe, op->atomic.val <= DRM_XE_ATOMIC_CPU);
 
 	for (i = 0; i < num_vmas; i++) {
-		vmas[i]->attr.atomic_access = op->atomic.val;
+		if (vmas[i]->attr.atomic_access == op->atomic.val) {
+			vmas[i]->skip_invalidation = true;
+		} else {
+			vmas[i]->skip_invalidation = false;
+			vmas[i]->attr.atomic_access = op->atomic.val;
+		}
 
 		bo = xe_vma_bo(vmas[i]);
 		if (!bo)
@@ -130,9 +141,14 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
 
 	xe_assert(vm->xe, op->type == DRM_XE_MEM_RANGE_ATTR_PAT);
 
-	for (i = 0; i < num_vmas; i++)
-		vmas[i]->attr.pat_index = op->pat_index.val;
-
+	for (i = 0; i < num_vmas; i++) {
+		if (vmas[i]->attr.pat_index == op->pat_index.val) {
+			vmas[i]->skip_invalidation = true;
+		} else {
+			vmas[i]->skip_invalidation = false;
+			vmas[i]->attr.pat_index = op->pat_index.val;
+		}
+	}
 }
 
 typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
@@ -158,17 +174,20 @@ static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
 				  false, MAX_SCHEDULE_TIMEOUT) <= 0)
 		XE_WARN_ON(1);
 
-	tile_mask = xe_svm_ranges_zap_ptes_in_range(vm, start, end);
-
 	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, start, end) {
 		struct xe_vma *vma = gpuva_to_vma(gpuva);
 
-		if (xe_vma_is_cpu_addr_mirror(vma))
+		if (vma->skip_invalidation)
 			continue;
 
-		for_each_tile(tile, vm->xe, id) {
-			if (xe_pt_zap_ptes(tile, vma)) {
-				tile_mask |= BIT(id);
+		if (xe_vma_is_cpu_addr_mirror(vma)) {
+			tile_mask |= xe_svm_ranges_zap_ptes_in_range(vm,
+								      xe_vma_start(vma),
+								      xe_vma_end(vma));
+		} else {
+			for_each_tile(tile, vm->xe, id) {
+				if (xe_pt_zap_ptes(tile, vma)) {
+					tile_mask |= BIT(id);
 
 				/*
 				 * WRITE_ONCE pairs with READ_ONCE
@@ -176,6 +195,7 @@ static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
 				 */
 				WRITE_ONCE(vma->tile_invalidated,
 					   vma->tile_invalidated | BIT(id));
+				}
 			}
 		}
 	}
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index cd94d8b5819d..81d92d886578 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -157,6 +157,12 @@ struct xe_vma {
 	/** @tile_staged: bind is staged for this VMA */
 	u8 tile_staged;
 
+	/**
+	 * @skip_invalidation: Used in madvise to avoid invalidation
+	 * if mem attributes don't change
+	 */
+	bool skip_invalidation;
+
 	/**
 	 * @ufence: The user fence that was provided with MAP.
 	 * Needs to be signalled before UNMAP can be processed.
-- 
2.34.1



* [PATCH v5 19/23] drm/xe/vm: Add helper to check for default VMA memory attributes
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (17 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 18/23] drm/xe/madvise: Skip vma invalidation if mem attr are unchanged Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-29  4:33   ` Matthew Brost
  2025-07-22 13:35 ` [PATCH v5 20/23] drm/xe: Reset VMA attributes to default in SVM garbage collector Himal Prasad Ghimiray
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

Introduce a new helper function `xe_vma_has_default_mem_attrs()` to
determine whether a VMA's memory attributes are set to their default
values. This includes checks for atomic access, PAT index, and preferred
location.

Also, add a new field `default_pat_index` to `struct xe_vma_mem_attr`
to track the initial PAT index set during the first bind. This helps
distinguish between default and user-modified pat index, such as those
changed via madvise.
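
The role of default_pat_index can be seen in the following standalone
sketch, offered for illustration only. The struct layout, field types and
the SKETCH_* constants are assumptions standing in for struct
xe_vma_mem_attr and the uapi defaults. Because the PAT index chosen at
bind time differs per VMA, "default" is tested by comparing against the
recorded value rather than against a fixed constant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder defaults; the real values come from the uapi header. */
#define SKETCH_PREFERRED_LOC_DEFAULT_DEVICE	0
#define SKETCH_MIGRATE_ALL_PAGES		0
#define SKETCH_ATOMIC_UNDEFINED			0

/* Hypothetical, reduced version of struct xe_vma_mem_attr. */
struct sketch_mem_attr {
	struct {
		int32_t devmem_fd;
		uint32_t migration_policy;
	} preferred_loc;
	uint32_t atomic_access;
	uint16_t default_pat_index;	/* PAT index recorded at first bind */
	uint16_t pat_index;		/* current PAT index */
};

/* True only if every attribute still holds its default value. */
static bool sketch_has_default_mem_attrs(const struct sketch_mem_attr *attr)
{
	return attr->atomic_access == SKETCH_ATOMIC_UNDEFINED &&
	       attr->pat_index == attr->default_pat_index &&
	       attr->preferred_loc.devmem_fd ==
			SKETCH_PREFERRED_LOC_DEFAULT_DEVICE &&
	       attr->preferred_loc.migration_policy ==
			SKETCH_MIGRATE_ALL_PAGES;
}
```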

Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c       | 24 ++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h       |  2 ++
 drivers/gpu/drm/xe/xe_vm_types.h |  6 ++++++
 3 files changed, 32 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 5dc7cd7769f8..d3f08bf9a3ee 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2592,6 +2592,29 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
 	return err;
 }
 
+/**
+ * xe_vma_has_default_mem_attrs - Check if a VMA has default memory attributes
+ * @vma: Pointer to the xe_vma structure to check
+ *
+ * This function determines whether the given VMA (Virtual Memory Area)
+ * has its memory attributes set to their default values. Specifically,
+ * it checks the following conditions:
+ *
+ * - `atomic_access` is `DRM_XE_ATOMIC_UNDEFINED`
+ * - `pat_index` is equal to `default_pat_index`
+ * - `preferred_loc.devmem_fd` is `DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE`
+ * - `preferred_loc.migration_policy` is `DRM_XE_MIGRATE_ALL_PAGES`
+ *
+ * Return: true if all attributes are at their default values, false otherwise.
+ */
+bool xe_vma_has_default_mem_attrs(struct xe_vma *vma)
+{
+	return (vma->attr.atomic_access == DRM_XE_ATOMIC_UNDEFINED &&
+		vma->attr.pat_index ==  vma->attr.default_pat_index &&
+		vma->attr.preferred_loc.devmem_fd == DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE &&
+		vma->attr.preferred_loc.migration_policy == DRM_XE_MIGRATE_ALL_PAGES);
+}
+
 static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 				   struct xe_vma_ops *vops)
 {
@@ -2624,6 +2647,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 					.migration_policy = DRM_XE_MIGRATE_ALL_PAGES,
 				},
 				.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
+				.default_pat_index = op->map.pat_index,
 				.pat_index = op->map.pat_index,
 			};
 
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index d5bc09ae640c..a4db843de540 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -66,6 +66,8 @@ static inline bool xe_vm_is_closed_or_banned(struct xe_vm *vm)
 struct xe_vma *
 xe_vm_find_overlapping_vma(struct xe_vm *vm, u64 start, u64 range);
 
+bool xe_vma_has_default_mem_attrs(struct xe_vma *vma);
+
 /**
  * xe_vm_has_scratch() - Whether the vm is configured for scratch PTEs
  * @vm: The vm
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 81d92d886578..351242c92c12 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -103,8 +103,14 @@ struct xe_vma_mem_attr {
 	 */
 	u32 atomic_access;
 
+	/**
+	 * @default_pat_index: The pat index for VMA set during first bind by user.
+	 */
+	u16 default_pat_index;
+
 	/**
 	 * @pat_index: The pat index to use when encoding the PTEs for this vma.
+	 * same as default_pat_index unless overwritten by madvise.
 	 */
 	u16 pat_index;
 };
-- 
2.34.1



* [PATCH v5 20/23] drm/xe: Reset VMA attributes to default in SVM garbage collector
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (18 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 19/23] drm/xe/vm: Add helper to check for default VMA memory attributes Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-24 21:50   ` Matthew Brost
  2025-07-29  5:41   ` Matthew Brost
  2025-07-22 13:35 ` [PATCH v5 21/23] drm/xe/vm: Add a delayed worker to merge fragmented vmas Himal Prasad Ghimiray
                   ` (2 subsequent siblings)
  22 siblings, 2 replies; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

Restore default memory attributes for VMAs during garbage collection
if they were modified by madvise. Reuse the existing VMA if it exactly
covers the freed range; otherwise, allocate a new mirror VMA for the range.
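
A minimal standalone sketch of the per-range decision, for illustration
only. The enum and the function are hypothetical. In the diff that
follows, the in-place case resets vma->attr and the split case calls
xe_vm_alloc_cpu_addr_mirror_vma().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of handling one garbage-collected range. */
enum sketch_gc_action {
	SKETCH_GC_NOTHING,		/* attributes already default */
	SKETCH_GC_RESET_IN_PLACE,	/* VMA matches the range exactly */
	SKETCH_GC_SPLIT_NEW_VMA,	/* range covers part of the VMA */
};

/* Decide how to restore default attributes for [range_start, range_end). */
static enum sketch_gc_action
sketch_gc_decide(uint64_t vma_start, uint64_t vma_end,
		 bool vma_has_default_attrs,
		 uint64_t range_start, uint64_t range_end)
{
	if (vma_has_default_attrs)
		return SKETCH_GC_NOTHING;

	if (vma_start == range_start && vma_end == range_end)
		return SKETCH_GC_RESET_IN_PLACE;

	return SKETCH_GC_SPLIT_NEW_VMA;
}
```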

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c |  34 +++++++++
 drivers/gpu/drm/xe/xe_vm.c  | 140 +++++++++++++++++++++++++-----------
 drivers/gpu/drm/xe/xe_vm.h  |   2 +
 3 files changed, 135 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index ba1233d0d5a2..79709dc066b9 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -255,7 +255,18 @@ static int __xe_svm_garbage_collector(struct xe_vm *vm,
 static int xe_svm_garbage_collector(struct xe_vm *vm)
 {
 	struct xe_svm_range *range;
+	struct xe_vma *vma;
+	u64 range_start;
+	u64 range_size;
+	u64 range_end;
 	int err;
+	struct xe_vma_mem_attr default_attr = {
+		.preferred_loc = {
+			.devmem_fd = DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE,
+			.migration_policy = DRM_XE_MIGRATE_ALL_PAGES,
+		},
+		.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
+	};
 
 	lockdep_assert_held_write(&vm->lock);
 
@@ -270,6 +281,12 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
 		if (!range)
 			break;
 
+		range_start = xe_svm_range_start(range);
+		range_size = xe_svm_range_size(range);
+		range_end = xe_svm_range_end(range);
+
+		vma = xe_vm_find_vma_by_addr(vm, xe_svm_range_start(range));
+
 		list_del(&range->garbage_collector_link);
 		spin_unlock(&vm->svm.garbage_collector.lock);
 
@@ -282,7 +299,24 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
 			return err;
 		}
 
+		if (!xe_vma_has_default_mem_attrs(vma)) {
+			vm_dbg(&vm->xe->drm, "Existing VMA start=0x%016llx, vma_end=0x%016llx",
+			       xe_vma_start(vma), xe_vma_end(vma));
+
+			if (xe_vma_start(vma) == range_start && xe_vma_end(vma) == range_end) {
+				default_attr.pat_index = vma->attr.default_pat_index;
+				default_attr.default_pat_index  = vma->attr.default_pat_index;
+				vma->attr = default_attr;
+			} else {
+				vm_dbg(&vm->xe->drm, "Split VMA start=0x%016llx, vma_end=0x%016llx",
+				       range_start, range_end);
+				err = xe_vm_alloc_cpu_addr_mirror_vma(vm, range_start, range_size);
+				if (err)
+					return err;
+			}
+		}
 		spin_lock(&vm->svm.garbage_collector.lock);
+
 	}
 	spin_unlock(&vm->svm.garbage_collector.lock);
 
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index d3f08bf9a3ee..003c8209f8bd 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -4254,34 +4254,24 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
 	}
 }
 
-/**
- * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
- * @vm: Pointer to the xe_vm structure
- * @start: Starting input address
- * @range: Size of the input range
- *
- * This function splits existing vma to create new vma for user provided input range
- *
- *  Return: 0 if success
- */
-int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
+static int xe_vm_alloc_vma(struct xe_vm *vm,
+			   u64 start, u64 range,
+			   enum drm_gpuvm_sm_map_ops_flags flags)
 {
 	struct xe_vma_ops vops;
 	struct drm_gpuva_ops *ops = NULL;
 	struct drm_gpuva_op *__op;
 	bool is_cpu_addr_mirror = false;
 	bool remap_op = false;
+	bool is_madvise = flags == DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE;
 	struct xe_vma_mem_attr tmp_attr;
+	u16 default_pat;
 	int err;
 
-	vm_dbg(&vm->xe->drm, "MADVISE IN: addr=0x%016llx, size=0x%016llx", start, range);
-
 	lockdep_assert_held_write(&vm->lock);
 
-	vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range);
 	ops = drm_gpuvm_sm_map_ops_create(&vm->gpuvm, start, range,
-					  DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE,
-					  NULL, start);
+					  flags, NULL, start);
 	if (IS_ERR(ops))
 		return PTR_ERR(ops);
 
@@ -4292,33 +4282,56 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
 
 	drm_gpuva_for_each_op(__op, ops) {
 		struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
+		struct xe_vma *vma = NULL;
 
-		if (__op->op == DRM_GPUVA_OP_REMAP) {
-			xe_assert(vm->xe, !remap_op);
-			remap_op = true;
+		if (!is_madvise) {
+			if (__op->op == DRM_GPUVA_OP_UNMAP) {
+				vma = gpuva_to_vma(op->base.unmap.va);
+				XE_WARN_ON(!xe_vma_has_default_mem_attrs(vma));
+				default_pat = vma->attr.default_pat_index;
+			}
 
-			if (xe_vma_is_cpu_addr_mirror(gpuva_to_vma(op->base.remap.unmap->va)))
-				is_cpu_addr_mirror = true;
-			else
-				is_cpu_addr_mirror = false;
-		}
+			if (__op->op == DRM_GPUVA_OP_REMAP) {
+				vma = gpuva_to_vma(op->base.remap.unmap->va);
+				default_pat = vma->attr.default_pat_index;
+			}
 
-		if (__op->op == DRM_GPUVA_OP_MAP) {
-			xe_assert(vm->xe, remap_op);
-			remap_op = false;
+			if (__op->op == DRM_GPUVA_OP_MAP) {
+				op->map.is_cpu_addr_mirror = true;
+				op->map.pat_index = default_pat;
+			}
+		} else {
+			if (__op->op == DRM_GPUVA_OP_REMAP) {
+				vma = gpuva_to_vma(op->base.remap.unmap->va);
+				xe_assert(vm->xe, !remap_op);
+				remap_op = true;
 
-			/* In case of madvise ops DRM_GPUVA_OP_MAP is always after
-			 * DRM_GPUVA_OP_REMAP, so ensure we assign op->map.is_cpu_addr_mirror true
-			 * if REMAP is for xe_vma_is_cpu_addr_mirror vma
-			 */
-			op->map.is_cpu_addr_mirror = is_cpu_addr_mirror;
-		}
+				if (xe_vma_is_cpu_addr_mirror(vma))
+					is_cpu_addr_mirror = true;
+				else
+					is_cpu_addr_mirror = false;
+			}
 
+			if (__op->op == DRM_GPUVA_OP_MAP) {
+				xe_assert(vm->xe, remap_op);
+				remap_op = false;
+				/*
+				 * In case of madvise ops DRM_GPUVA_OP_MAP is
+				 * always after DRM_GPUVA_OP_REMAP, so ensure
+				 * we assign op->map.is_cpu_addr_mirror true
+				 * if REMAP is for xe_vma_is_cpu_addr_mirror vma
+				 */
+				op->map.is_cpu_addr_mirror = is_cpu_addr_mirror;
+			}
+		}
 		print_op(vm->xe, __op);
 	}
 
 	xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
-	vops.flags |= XE_VMA_OPS_FLAG_MADVISE;
+
+	if (is_madvise)
+		vops.flags |= XE_VMA_OPS_FLAG_MADVISE;
+
 	err = vm_bind_ioctl_ops_parse(vm, ops, &vops);
 	if (err)
 		goto unwind_ops;
@@ -4330,15 +4343,20 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
 		struct xe_vma *vma;
 
 		if (__op->op == DRM_GPUVA_OP_UNMAP) {
-			/* There should be no unmap */
-			XE_WARN_ON("UNEXPECTED UNMAP");
-			xe_vma_destroy(gpuva_to_vma(op->base.unmap.va), NULL);
+			vma = gpuva_to_vma(op->base.unmap.va);
+			/* There should be no unmap for madvise */
+			if (is_madvise)
+				XE_WARN_ON("UNEXPECTED UNMAP");
+
+			xe_vma_destroy(vma, NULL);
 		} else if (__op->op == DRM_GPUVA_OP_REMAP) {
 			vma = gpuva_to_vma(op->base.remap.unmap->va);
-			/* Store attributes for REMAP UNMAPPED VMA, so they can be assigned
-			 * to newly MAP created vma.
+			/* In case of madvise ops, store the attributes of the REMAP
+			 * unmapped VMA so they can be assigned to the newly created MAP vma.
 			 */
-			tmp_attr = vma->attr;
+			if (is_madvise)
+				tmp_attr = vma->attr;
+
 			xe_vma_destroy(gpuva_to_vma(op->base.remap.unmap->va), NULL);
 		} else if (__op->op == DRM_GPUVA_OP_MAP) {
 			vma = op->map.vma;
@@ -4346,7 +4364,8 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
 			 * Therefore temp_attr will always have sane values, making it safe to
 			 * copy them to new vma.
 			 */
-			vma->attr = tmp_attr;
+			if (is_madvise)
+				vma->attr = tmp_attr;
 		}
 	}
 
@@ -4360,3 +4379,42 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
 	drm_gpuva_ops_free(&vm->gpuvm, ops);
 	return err;
 }
+
+/**
+ * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
+ * @vm: Pointer to the xe_vm structure
+ * @start: Starting input address
+ * @range: Size of the input range
+ *
+ * This function splits existing vma to create new vma for user provided input range
+ *
+ * Return: 0 if success
+ */
+int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
+{
+	lockdep_assert_held_write(&vm->lock);
+
+	vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range);
+
+	return xe_vm_alloc_vma(vm, start, range, DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE);
+}
+
+/**
+ * xe_vm_alloc_cpu_addr_mirror_vma - Allocate CPU addr mirror vma
+ * @vm: Pointer to the xe_vm structure
+ * @start: Starting input address
+ * @range: Size of the input range
+ *
+ * This function splits/merges existing vma to create new vma for user provided input range
+ *
+ * Return: 0 if success
+ */
+int xe_vm_alloc_cpu_addr_mirror_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
+{
+	lockdep_assert_held_write(&vm->lock);
+
+	vm_dbg(&vm->xe->drm, "CPU_ADDR_MIRROR_VMA_OPS_CREATE: addr=0x%016llx, size=0x%016llx",
+	       start, range);
+
+	return xe_vm_alloc_vma(vm, start, range, DRM_GPUVM_SM_MAP_NOT_MADVISE);
+}
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index a4db843de540..f7b9ad83685a 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -177,6 +177,8 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
 
 int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
 
+int xe_vm_alloc_cpu_addr_mirror_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
+
 /**
  * to_userptr_vma() - Return a pointer to an embedding userptr vma
  * @vma: Pointer to the embedded struct xe_vma
-- 
2.34.1



* [PATCH v5 21/23] drm/xe/vm: Add a delayed worker to merge fragmented vmas
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (19 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 20/23] drm/xe: Reset VMA attributes to default in SVM garbage collector Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-29  4:39   ` Matthew Brost
  2025-07-22 13:35 ` [PATCH v5 22/23] drm/xe: Enable madvise ioctl for xe Himal Prasad Ghimiray
  2025-07-22 13:35 ` [PATCH v5 23/23] drm/xe/uapi: Add UAPI for querying VMA count and memory attributes Himal Prasad Ghimiray
  22 siblings, 1 reply; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

During the initial mirror bind, initialize and start the delayed work item
responsible for merging adjacent CPU address mirror VMAs with default
memory attributes. Starting the worker sets the merge_active flag and
schedules the work to run after a delay, allowing batching of VMA updates.
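
The scan performed by the worker can be sketched in isolation as follows.
This is an illustration under assumptions, not the driver code: the
sketch_* types are hypothetical, the input is assumed to be sorted by
address, "mergeable" stands for "CPU address mirror VMA with default
attributes", and a fixed-size output array replaces the krealloc-grown
array used in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced VMA: address span plus a mergeable marker. */
struct sketch_mirror_vma {
	uint64_t start;
	uint64_t end;
	bool mergeable;
};

struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Record every run of two or more contiguous mergeable VMAs.
 * Returns the number of runs written to out (at most max_out).
 */
static size_t sketch_find_merge_runs(const struct sketch_mirror_vma *vmas,
				     size_t n, struct sketch_range *out,
				     size_t max_out)
{
	size_t count = 0, run_len = 0;
	uint64_t run_start = 0, run_end = 0;

	/* One extra iteration (i == n) flushes the final run. */
	for (size_t i = 0; i <= n; i++) {
		bool extends = i < n && vmas[i].mergeable && run_len > 0 &&
			       vmas[i].start == run_end;

		if (extends) {
			run_end = vmas[i].end;
			run_len++;
			continue;
		}

		/* The current run ends here; keep it only if it spans 2+ VMAs. */
		if (run_len > 1 && count < max_out)
			out[count++] = (struct sketch_range){ run_start, run_end };

		if (i < n && vmas[i].mergeable) {
			run_start = vmas[i].start;
			run_end = vmas[i].end;
			run_len = 1;
		} else {
			run_len = 0;
		}
	}

	return count;
}
```

A single mergeable VMA with no contiguous mergeable neighbour produces no
run, matching the merge_len > 1 checks in the worker.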

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c       | 126 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm_types.h |  15 ++++
 2 files changed, 141 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 003c8209f8bd..bee849167c0d 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1160,6 +1160,127 @@ static void xe_vma_free(struct xe_vma *vma)
 		kfree(vma);
 }
 
+struct va_range {
+	u64 start;
+	u64 end;
+};
+
+static void add_merged_range(struct va_range **ranges, int *count, int *capacity,
+			     u64 start, u64 end)
+{
+	const int array_size  = 8;
+	struct va_range *new_ranges;
+	int new_capacity;
+
+	if (*count == *capacity) {
+		new_capacity = *capacity ? *capacity * 2 : array_size;
+		new_ranges = krealloc(*ranges, new_capacity * sizeof(**ranges), GFP_KERNEL);
+		if (!new_ranges)
+			return;
+
+		*ranges = new_ranges;
+		*capacity = new_capacity;
+	}
+	(*ranges)[(*count)++] = (struct va_range){ .start = start, .end = end };
+}
+
+static void xe_vm_vmas_merge_worker(struct work_struct *work)
+{
+	struct xe_vm *vm = container_of(to_delayed_work(work), struct xe_vm, merge_vmas_work);
+	struct drm_gpuva *gpuva, *next = NULL;
+	struct va_range *merged_ranges = NULL;
+	int merge_count = 0, merge_capacity = 0;
+	bool in_merge = false;
+	u64 merge_start = 0, merge_end = 0;
+	int merge_len = 0;
+
+	if (!vm->merge_active)
+		return;
+
+	down_write(&vm->lock);
+
+	drm_gpuvm_for_each_va_safe(gpuva, next, &vm->gpuvm) {
+		struct xe_vma *vma = gpuva_to_vma(gpuva);
+
+		if (!xe_vma_is_cpu_addr_mirror(vma) || !xe_vma_has_default_mem_attrs(vma)) {
+			if (in_merge && merge_len > 1)
+				add_merged_range(&merged_ranges, &merge_count, &merge_capacity,
+						 merge_start, merge_end);
+
+			in_merge = false;
+			merge_len = 0;
+			continue;
+		}
+
+		if (!in_merge) {
+			merge_start = xe_vma_start(vma);
+			merge_end = xe_vma_end(vma);
+			in_merge = true;
+			merge_len = 1;
+		} else if (xe_vma_start(vma) == merge_end && xe_vma_has_default_mem_attrs(vma)) {
+			merge_end = xe_vma_end(vma);
+			merge_len++;
+		} else {
+			if (merge_len > 1)
+				add_merged_range(&merged_ranges, &merge_count, &merge_capacity,
+						 merge_start, merge_end);
+			merge_start = xe_vma_start(vma);
+			merge_end = xe_vma_end(vma);
+			merge_len = 1;
+		}
+	}
+
+	if (in_merge && merge_len > 1) {
+		add_merged_range(&merged_ranges, &merge_count, &merge_capacity,
+				 merge_start, merge_end);
+	}
+
+	for (int i = 0; i < merge_count; i++) {
+		vm_dbg(&vm->xe->drm, "Merged VA range %d: start=0x%016llx, end=0x%016llx\n",
+		       i, merged_ranges[i].start, merged_ranges[i].end);
+
+		if (xe_vm_alloc_cpu_addr_mirror_vma(vm, merged_ranges[i].start,
+						    merged_ranges[i].end - merged_ranges[i].start))
+			break;
+	}
+
+	up_write(&vm->lock);
+	kfree(merged_ranges);
+	schedule_delayed_work(&vm->merge_vmas_work, msecs_to_jiffies(5000));
+}
+
+/*
+ * xe_vm_start_vmas_merge - Initialize and schedule VMA merge work
+ * @vm: Pointer to the xe_vm structure
+ *
+ * Initializes the delayed work item responsible for merging adjacent
+ * CPU address mirror VMAs with default memory attributes. This function
+ * sets the merge_active flag and schedules the work to run after a delay,
+ * allowing batching of VMA updates.
+ */
+static void xe_vm_start_vmas_merge(struct xe_vm *vm)
+{
+	if (vm->merge_active)
+		return;
+
+	vm->merge_active = true;
+	INIT_DELAYED_WORK(&vm->merge_vmas_work, xe_vm_vmas_merge_worker);
+	schedule_delayed_work(&vm->merge_vmas_work, msecs_to_jiffies(5000));
+}
+
+/*
+ * xe_vm_stop_vmas_merge - Cancel scheduled VMA merge work
+ * @vm: Pointer to the xe_vm structure
+ */
+static void xe_vm_stop_vmas_merge(struct xe_vm *vm)
+{
+	if (!vm->merge_active)
+		return;
+
+	vm->merge_active = false;
+	cancel_delayed_work_sync(&vm->merge_vmas_work);
+}
+
 #define VMA_CREATE_FLAG_READ_ONLY		BIT(0)
 #define VMA_CREATE_FLAG_IS_NULL			BIT(1)
 #define VMA_CREATE_FLAG_DUMPABLE		BIT(2)
@@ -1269,6 +1390,9 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 		xe_vm_get(vm);
 	}
 
+	if (xe_vma_is_cpu_addr_mirror(vma))
+		xe_vm_start_vmas_merge(vm);
+
 	return vma;
 }
 
@@ -1982,6 +2106,8 @@ static void vm_destroy_work_func(struct work_struct *w)
 	/* xe_vm_close_and_put was not called? */
 	xe_assert(xe, !vm->size);
 
+	xe_vm_stop_vmas_merge(vm);
+
 	if (xe_vm_in_preempt_fence_mode(vm))
 		flush_work(&vm->preempt.rebind_work);
 
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 351242c92c12..c4f3542eb464 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -374,6 +374,21 @@ struct xe_vm {
 	bool batch_invalidate_tlb;
 	/** @xef: XE file handle for tracking this VM's drm client */
 	struct xe_file *xef;
+
+	/**
+	 * @merge_vmas_work: Delayed work item used to merge CPU address mirror VMAs.
+	 * This work is scheduled to scan the GPU virtual memory space and
+	 * identify adjacent CPU address mirror VMAs that have default memory
+	 * attributes. When such VMAs are found, they are merged into a single
+	 * larger VMA to reduce fragmentation. The merging process is triggered
+	 * asynchronously via a delayed workqueue to avoid blocking critical paths
+	 * and to batch updates when possible.
+	 */
+	struct delayed_work merge_vmas_work;
+
+	/** @merge_active: True if merge_vmas_work has been initialized */
+	bool merge_active;
+
 };
 
 /** struct xe_vma_op_map - VMA map operation */
-- 
2.34.1
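
The scan performed by xe_vm_vmas_merge_worker() can be illustrated outside
the kernel. The following is a minimal sketch, not the driver code: it
assumes an address-sorted array in place of the GPUVM tree, and struct
sketch_vma, struct sketch_range, the mergeable flag and
collect_merge_ranges() are all hypothetical names introduced here. The
mergeable flag stands in for "CPU address mirror VMA with default memory
attributes".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a VMA: [start, end) plus a flag. */
struct sketch_vma {
	uint64_t start;
	uint64_t end;
	bool mergeable;	/* stand-in for "CPU mirror VMA with default attrs" */
};

/* Hypothetical output record: one merged [start, end) range. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Walk address-sorted VMAs and record every run of two or more adjacent
 * mergeable VMAs. Returns the number of ranges written to out; runs found
 * after out is full are dropped.
 */
static size_t collect_merge_ranges(const struct sketch_vma *vmas, size_t n,
				   struct sketch_range *out, size_t max)
{
	size_t count = 0, run_len = 0;
	uint64_t run_start = 0, run_end = 0;

	assert(out || !max);

	/* i == n is a sentinel iteration that flushes the final run. */
	for (size_t i = 0; i <= n; i++) {
		bool extends = i < n && vmas[i].mergeable && run_len &&
			       vmas[i].start == run_end;

		if (extends) {
			run_end = vmas[i].end;
			run_len++;
			continue;
		}

		/* The current run is over: keep it only if it spans >1 VMA. */
		if (run_len > 1 && count < max)
			out[count++] = (struct sketch_range){ run_start, run_end };

		if (i < n && vmas[i].mergeable) {
			/* Start a new run at this VMA. */
			run_start = vmas[i].start;
			run_end = vmas[i].end;
			run_len = 1;
		} else {
			run_len = 0;
		}
	}

	return count;
}
```

Single-VMA runs are skipped because replacing one VMA with an identical one
gains nothing. A hole between two mergeable VMAs ends the run in the same
way a non-mergeable VMA does.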



* [PATCH v5 22/23] drm/xe: Enable madvise ioctl for xe
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (20 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 21/23] drm/xe/vm: Add a delayed worker to merge fragmented vmas Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-29  4:34   ` Matthew Brost
  2025-07-22 13:35 ` [PATCH v5 23/23] drm/xe/uapi: Add UAPI for querying VMA count and memory attributes Himal Prasad Ghimiray
  22 siblings, 1 reply; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray

Register the madvise ioctl, which lets userspace set memory attributes on
a user-provided address range.

Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 6dc84e4ed281..b02c4ae0fdbf 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -63,6 +63,7 @@
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_ttm_sys_mgr.h"
 #include "xe_vm.h"
+#include "xe_vm_madvise.h"
 #include "xe_vram.h"
 #include "xe_vsec.h"
 #include "xe_wait_user_fence.h"
@@ -200,6 +201,7 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(XE_WAIT_USER_FENCE, xe_wait_user_fence_ioctl,
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(XE_OBSERVATION, xe_observation_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(XE_MADVISE, xe_vm_madvise_ioctl, DRM_RENDER_ALLOW),
 };
 
 static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-- 
2.34.1



* [PATCH v5 23/23] drm/xe/uapi: Add UAPI for querying VMA count and memory attributes
  2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
                   ` (21 preceding siblings ...)
  2025-07-22 13:35 ` [PATCH v5 22/23] drm/xe: Enable madvise ioctl for xe Himal Prasad Ghimiray
@ 2025-07-22 13:35 ` Himal Prasad Ghimiray
  2025-07-29  5:37   ` Matthew Brost
  22 siblings, 1 reply; 55+ messages in thread
From: Himal Prasad Ghimiray @ 2025-07-22 13:35 UTC (permalink / raw)
  To: intel-xe
  Cc: Matthew Brost, Thomas Hellström, Himal Prasad Ghimiray,
	Shuicheng Lin

Introduce the DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS ioctl to allow
userspace to query the memory attributes of the memory ranges within a
user-specified virtual address range.

Userspace first calls the ioctl with num_mem_ranges = 0,
sizeof_mem_range_attr = 0 and vector_of_mem_attr = NULL to retrieve
the number of memory ranges (vmas) and the size of each memory range
attribute. It then allocates a buffer of num_mem_ranges *
sizeof_mem_range_attr bytes and calls the ioctl again to fill the buffer
with memory range attributes.

This two-step interface allows userspace to first query the required
buffer size, then retrieve detailed attributes efficiently.
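
A minimal userspace sketch of the two-step flow, including the retry that
is needed when the number of ranges changes between the two calls. It does
not talk to a real device: mock_query() is a hypothetical stand-in for
ioctl(fd, DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS, ...), and struct
mock_query_args and struct mock_attr are simplified assumptions that mirror
only the num_mem_ranges, sizeof_mem_range_attr and vector_of_mem_attr
fields.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the uapi structs in this patch. */
struct mock_attr {
	uint64_t start;
	uint64_t end;
};

struct mock_query_args {
	uint32_t num_mem_ranges;
	uint64_t sizeof_mem_range_attr;
	uint64_t vector_of_mem_attr;
};

static uint32_t mock_nr_ranges = 3;	/* ranges the mock "kernel" tracks */
static int mock_grow_once = 1;		/* forces one -ENOSPC for the demo */

/* Hypothetical stand-in for the ioctl: returns 0 or a negative errno. */
static int mock_query(struct mock_query_args *args)
{
	struct mock_attr *out =
		(struct mock_attr *)(uintptr_t)args->vector_of_mem_attr;

	if (args->num_mem_ranges == 0) {
		/* Size query: report the count and per-entry size only. */
		if (out || args->sizeof_mem_range_attr)
			return -EINVAL;
		args->num_mem_ranges = mock_nr_ranges;
		args->sizeof_mem_range_attr = sizeof(struct mock_attr);
		return 0;
	}
	if (!out || !args->sizeof_mem_range_attr)
		return -EINVAL;

	if (mock_grow_once) {
		/* Simulate a range being added between the two calls. */
		mock_grow_once = 0;
		mock_nr_ranges++;
	}
	if (mock_nr_ranges > args->num_mem_ranges)
		return -ENOSPC;

	for (uint32_t i = 0; i < mock_nr_ranges; i++) {
		out[i].start = 0x100000 + (uint64_t)i * 0x1000;
		out[i].end = out[i].start + 0x1000;
	}
	args->num_mem_ranges = mock_nr_ranges;
	return 0;
}

/* Two-step query with retry; the caller frees *out on success. */
static int query_ranges(struct mock_attr **out, uint32_t *count)
{
	assert(out && count);

	for (;;) {
		struct mock_query_args args = { 0 };
		void *buf;
		int err;

		err = mock_query(&args);	/* step 1: count and size */
		if (err)
			return err;
		if (!args.num_mem_ranges) {
			*out = NULL;
			*count = 0;
			return 0;
		}

		buf = calloc(args.num_mem_ranges, args.sizeof_mem_range_attr);
		if (!buf)
			return -ENOMEM;

		args.vector_of_mem_attr = (uint64_t)(uintptr_t)buf;
		err = mock_query(&args);	/* step 2: fill the buffer */
		if (!err) {
			*out = buf;
			*count = args.num_mem_ranges;
			return 0;
		}

		free(buf);
		if (err != -ENOSPC)
			return err;
		/* -ENOSPC: the ranges changed in between, so start over. */
	}
}
```

Restarting from the size query on -ENOSPC keeps the buffer size and the
range count consistent with each other; real code would probably also want
to bound the number of retries.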

v2 (Matthew Brost)
- Use same ioctl to overload functionality

v3
- Add kernel-doc

v4
- Make uapi future proof by passing struct size (Matthew Brost)
- make lock interruptible (Matthew Brost)
- set reserved bits to zero (Matthew Brost)
- s/__copy_to_user/copy_to_user (Matthew Brost)
- Avoid using VMA term in uapi (Thomas)
- xe_vm_put(vm) is missing (Shuicheng)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c |   2 +
 drivers/gpu/drm/xe/xe_vm.c     | 101 ++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h     |   2 +-
 include/uapi/drm/xe_drm.h      | 137 +++++++++++++++++++++++++++++++++
 4 files changed, 241 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index b02c4ae0fdbf..1e77570db531 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -202,6 +202,8 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(XE_OBSERVATION, xe_observation_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(XE_MADVISE, xe_vm_madvise_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(XE_VM_QUERY_MEM_RANGE_ATTRS, xe_vm_query_vmas_attrs_ioctl,
+			  DRM_RENDER_ALLOW),
 };
 
 static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index bee849167c0d..e54ab4dce8df 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2297,6 +2297,107 @@ int xe_vm_destroy_ioctl(struct drm_device *dev, void *data,
 	return err;
 }
 
+static int xe_vm_query_vmas(struct xe_vm *vm, u64 start, u64 end)
+{
+	struct drm_gpuva *gpuva;
+	u32 num_vmas = 0;
+
+	lockdep_assert_held(&vm->lock);
+	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, start, end)
+		num_vmas++;
+
+	return num_vmas;
+}
+
+static int get_mem_attrs(struct xe_vm *vm, u32 *num_vmas, u64 start,
+			 u64 end, struct drm_xe_mem_range_attr *attrs)
+{
+	struct drm_gpuva *gpuva;
+	int i = 0;
+
+	lockdep_assert_held(&vm->lock);
+
+	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, start, end) {
+		struct xe_vma *vma = gpuva_to_vma(gpuva);
+
+		if (i == *num_vmas)
+			return -ENOSPC;
+
+		attrs[i].start = xe_vma_start(vma);
+		attrs[i].end = xe_vma_end(vma);
+		attrs[i].atomic.val = vma->attr.atomic_access;
+		attrs[i].pat_index.val = vma->attr.pat_index;
+		attrs[i].preferred_mem_loc.devmem_fd = vma->attr.preferred_loc.devmem_fd;
+		attrs[i].preferred_mem_loc.migration_policy =
+			vma->attr.preferred_loc.migration_policy;
+
+		i++;
+	}
+
+	/* Report the number of entries actually filled */
+	*num_vmas = i;
+	return 0;
+}
+
+int xe_vm_query_vmas_attrs_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
+{
+	struct xe_device *xe = to_xe_device(dev);
+	struct xe_file *xef = to_xe_file(file);
+	struct drm_xe_mem_range_attr *mem_attrs;
+	struct drm_xe_vm_query_mem_range_attr *args = data;
+	u64 __user *attrs_user = u64_to_user_ptr(args->vector_of_mem_attr);
+	struct xe_vm *vm;
+	int err = 0;
+
+	if (XE_IOCTL_DBG(xe,
+			 ((args->num_mem_ranges == 0 &&
+			  (attrs_user || args->sizeof_mem_range_attr != 0)) ||
+			 (args->num_mem_ranges > 0 &&
+			  (!attrs_user || args->sizeof_mem_range_attr == 0)))))
+		return -EINVAL;
+
+	vm = xe_vm_lookup(xef, args->vm_id);
+	if (XE_IOCTL_DBG(xe, !vm))
+		return -EINVAL;
+
+	err = down_read_interruptible(&vm->lock);
+	if (err)
+		goto put_vm;
+
+	attrs_user = u64_to_user_ptr(args->vector_of_mem_attr);
+
+	if (args->num_mem_ranges == 0 && !attrs_user) {
+		args->num_mem_ranges = xe_vm_query_vmas(vm, args->start, args->start + args->range);
+		args->sizeof_mem_range_attr = sizeof(struct drm_xe_mem_range_attr);
+		goto unlock_vm;
+	}
+
+	mem_attrs = kvmalloc_array(args->num_mem_ranges, args->sizeof_mem_range_attr,
+				   GFP_KERNEL | __GFP_ACCOUNT |
+				   __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+	if (!mem_attrs) {
+		err = args->num_mem_ranges > 1 ? -ENOBUFS : -ENOMEM;
+		goto unlock_vm;
+	}
+
+	memset(mem_attrs, 0, args->num_mem_ranges * args->sizeof_mem_range_attr);
+	err = get_mem_attrs(vm, &args->num_mem_ranges, args->start,
+			    args->start + args->range, mem_attrs);
+	if (err)
+		goto free_mem_attrs;
+
+	if (copy_to_user(attrs_user, mem_attrs,
+			 args->sizeof_mem_range_attr * args->num_mem_ranges))
+		err = -EFAULT;
+free_mem_attrs:
+	kvfree(mem_attrs);
+unlock_vm:
+	up_read(&vm->lock);
+put_vm:
+	xe_vm_put(vm);
+	return err;
+}
+
 static bool vma_matches(struct xe_vma *vma, u64 page_addr)
 {
 	if (page_addr > xe_vma_end(vma) - 1 ||
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index f7b9ad83685a..6f25d6820991 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -199,7 +199,7 @@ int xe_vm_destroy_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file);
 int xe_vm_bind_ioctl(struct drm_device *dev, void *data,
 		     struct drm_file *file);
-
+int xe_vm_query_vmas_attrs_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
 void xe_vm_close_and_put(struct xe_vm *vm);
 
 static inline bool xe_vm_in_fault_mode(struct xe_vm *vm)
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 8f1d48664424..ee328bcb8bfa 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -82,6 +82,7 @@ extern "C" {
  *  - &DRM_IOCTL_XE_WAIT_USER_FENCE
  *  - &DRM_IOCTL_XE_OBSERVATION
  *  - &DRM_IOCTL_XE_MADVISE
+ *  - &DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS
  */
 
 /*
@@ -104,6 +105,7 @@ extern "C" {
 #define DRM_XE_WAIT_USER_FENCE		0x0a
 #define DRM_XE_OBSERVATION		0x0b
 #define DRM_XE_MADVISE			0x0c
+#define DRM_XE_VM_QUERY_MEM_RANGE_ATTRS	0x0d
 
 /* Must be kept compact -- no holes */
 
@@ -120,6 +122,7 @@ extern "C" {
 #define DRM_IOCTL_XE_WAIT_USER_FENCE		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
 #define DRM_IOCTL_XE_OBSERVATION		DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
 #define DRM_IOCTL_XE_MADVISE			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)
+#define DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS	DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_QUERY_MEM_RANGE_ATTRS, struct drm_xe_vm_query_mem_range_attr)
 
 /**
  * DOC: Xe IOCTL Extensions
@@ -2110,6 +2113,140 @@ struct drm_xe_madvise {
 	__u64 reserved[2];
 };
 
+/**
+ * struct drm_xe_mem_range_attr - Output of &DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS
+ *
+ * This structure is provided by userspace and filled by KMD in response to the
+ * DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS ioctl. It describes the memory attributes
+ * of one memory range within a user specified address range in a VM.
+ *
+ * The structure includes information such as atomic access policy,
+ * page attribute table (PAT) index, and preferred memory location.
+ * Userspace allocates an array of these structures and passes a pointer to the
+ * ioctl to retrieve attributes for each memory range.
+ *
+ * @extensions: Pointer to the first extension struct, if any
+ * @start: Start address of the memory range
+ * @end: End address of the virtual memory range
+ *
+ */
+struct drm_xe_mem_range_attr {
+	 /** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+
+	/** @start: start of the memory range */
+	__u64 start;
+
+	/** @end: end of the memory range */
+	__u64 end;
+
+	/** @preferred_mem_loc: preferred memory location */
+	struct {
+		/** @preferred_mem_loc.devmem_fd: fd for preferred loc */
+		__u32 devmem_fd;
+
+		/** @preferred_mem_loc.migration_policy: Page migration policy */
+		__u32 migration_policy;
+	} preferred_mem_loc;
+
+	struct {
+		/** @atomic.val: atomic attribute */
+		__u32 val;
+
+		/** @atomic.reserved: Reserved */
+		__u32 reserved;
+	} atomic;
+
+	struct {
+		/** @pat_index.val: PAT index */
+		__u32 val;
+
+		/** @pat_index.reserved: Reserved */
+		__u32 reserved;
+	} pat_index;
+
+	/** @reserved: Reserved */
+	__u64 reserved[2];
+};
+
+/**
+ * struct drm_xe_vm_query_mem_range_attr - Input of &DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS
+ *
+ * This structure is used to query memory attributes of memory regions
+ * within a user specified address range in a VM. It provides detailed
+ * information about each memory range, including atomic access policy,
+ * page attribute table (PAT) index, and preferred memory location.
+ *
+ * Userspace first calls the ioctl with @num_mem_ranges = 0,
+ * @sizeof_mem_range_attr = 0 and @vector_of_mem_attr = NULL to retrieve
+ * the number of memory regions and size of each memory range attribute.
+ * Then, it allocates a buffer of that size and calls the ioctl again to fill
+ * the buffer with memory range attributes.
+ *
+ * If the second call fails with -ENOSPC, the memory ranges changed between
+ * the first call and the second. Retry the ioctl with @num_mem_ranges = 0,
+ * @sizeof_mem_range_attr = 0 and @vector_of_mem_attr = NULL, followed by
+ * the second ioctl call.
+ *
+ * Example:
+ *
+ * .. code-block:: C
+ *    struct drm_xe_vm_query_mem_range_attr query = {
+ *         .vm_id = vm_id,
+ *         .start = 0x100000,
+ *         .range = 0x2000,
+ *     };
+ *
+ *    // First ioctl call to get num of mem regions and sizeof each attribute
+ *    ioctl(fd, DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS, &query);
+ *
+ *    // Allocate buffer for the memory region attributes
+ *    void *ptr = malloc(query.num_mem_ranges * query.sizeof_mem_range_attr);
+ *
+ *    query.vector_of_mem_attr = (uintptr_t)ptr;
+ *
+ *    // Second ioctl call to actually fill the memory attributes
+ *    ioctl(fd, DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS, &query);
+ *
+ *    // Iterate over the returned memory region attributes
+ *    for (unsigned int i = 0; i < query.num_mem_ranges; ++i) {
+ *       // Step by the kernel-reported entry size; ptr itself is left
+ *       // untouched so that it can still be passed to free()
+ *       struct drm_xe_mem_range_attr *attr = (struct drm_xe_mem_range_attr *)
+ *               ((char *)ptr + i * query.sizeof_mem_range_attr);
+ *
+ *       // Do something with attr
+ *     }
+ *
+ *    free(ptr);
+ */
+struct drm_xe_vm_query_mem_range_attr {
+	/** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+
+	/** @vm_id: vm_id of the virtual range */
+	__u32 vm_id;
+
+	/** @num_mem_ranges: number of mem_ranges in range */
+	__u32 num_mem_ranges;
+
+	/** @start: start of the virtual address range */
+	__u64 start;
+
+	/** @range: size of the virtual address range */
+	__u64 range;
+
+	/** @sizeof_mem_range_attr: size of struct drm_xe_mem_range_attr */
+	__u64 sizeof_mem_range_attr;
+
+	/** @vector_of_mem_attr: userptr to array of struct drm_xe_mem_range_attr */
+	__u64 vector_of_mem_attr;
+
+	/** @reserved: Reserved */
+	__u64 reserved[2];
+
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.34.1



* Re: [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops
  2025-07-22 13:35 ` [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops Himal Prasad Ghimiray
@ 2025-07-22 13:38   ` Danilo Krummrich
  2025-07-24  0:43     ` Matthew Brost
  2025-07-24 10:02     ` Ghimiray, Himal Prasad
  2025-07-27 21:18   ` Matthew Brost
  1 sibling, 2 replies; 55+ messages in thread
From: Danilo Krummrich @ 2025-07-22 13:38 UTC (permalink / raw)
  To: Himal Prasad Ghimiray
  Cc: intel-xe, Matthew Brost, Thomas Hellström, Danilo Krummrich,
	Boris Brezillon, Caterina Shablia, dri-devel

(Cc: Caterina)

On Tue Jul 22, 2025 at 3:35 PM CEST, Himal Prasad Ghimiray wrote:
> - DRM_GPUVM_SM_MAP_NOT_MADVISE: Default sm_map operations for the input
>   range.
>
> - DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
>   drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
> user-provided range and split the existing non-GEM object VMA if the
> start or end of the input range lies within it. The operations can
> create up to 2 REMAPS and 2 MAPs. The purpose of this operation is to be
> used by the Xe driver to assign attributes to GPUVMA's within the
> user-defined range. Unlike drm_gpuvm_sm_map_ops_flags in default mode,
> the operation with this flag will never have UNMAPs and
> merges, and can be without any final operations.
>
> v2
> - use drm_gpuvm_sm_map_ops_create with flags instead of defining new
>   ops_create (Danilo)
> - Add doc (Danilo)
>
> v3
> - Fix doc
> - Fix unmapping check
>
> v4
> - Fix mapping for non madvise ops
>
> Cc: Danilo Krummrich <dakr@redhat.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Boris Brezillon <bbrezillon@kernel.org>
> Cc: <dri-devel@lists.freedesktop.org>
> Signed-off-by: Himal Prasad Ghimiray<himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/drm_gpuvm.c            | 93 ++++++++++++++++++++------
>  drivers/gpu/drm/nouveau/nouveau_uvmm.c |  1 +
>  drivers/gpu/drm/xe/xe_vm.c             |  1 +

What about the other drivers using GPUVM, aren't they affected by the changes?

>  include/drm/drm_gpuvm.h                | 25 ++++++-
>  4 files changed, 98 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index e89b932e987c..c7779588ea38 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -2103,10 +2103,13 @@ static int
>  __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>  		   const struct drm_gpuvm_ops *ops, void *priv,
>  		   u64 req_addr, u64 req_range,
> +		   enum drm_gpuvm_sm_map_ops_flags flags,

Please coordinate with Boris and Caterina here. They're adding a new request
structure, struct drm_gpuvm_map_req.

I think we can define it as

	struct drm_gpuvm_map_req {
		struct drm_gpuva_op_map map;
		struct drm_gpuvm_sm_map_ops_flags flags;
	}

eventually.

Please also coordinate on the changes in __drm_gpuvm_sm_map() below regarding
Caterina's series [1], it looks like they're conflicting.

[1] https://lore.kernel.org/all/20250707170442.1437009-1-caterina.shablia@collabora.com/

> +/**
> + * enum drm_gpuvm_sm_map_ops_flags - flags for drm_gpuvm split/merge ops
> + */
> +enum drm_gpuvm_sm_map_ops_flags {
> +	/**
> +	 * @DRM_GPUVM_SM_MAP_NOT_MADVISE: DEFAULT sm_map ops
> +	 */
> +	DRM_GPUVM_SM_MAP_NOT_MADVISE = 0,

Why would we name this "NOT_MADVISE"? What if we add more flags for other
purposes?

> +	/**
> +	 * @DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
> +	 * drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
> +	 * user-provided range and split the existing non-GEM object VMA if the
> +	 * start or end of the input range lies within it. The operations can
> +	 * create up to 2 REMAPS and 2 MAPs. Unlike drm_gpuvm_sm_map_ops_flags
> +	 * in default mode, the operation with this flag will never have UNMAPs and
> +	 * merges, and can be without any final operations.
> +	 */
> +	DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE = BIT(0),
> +};


* Re: [PATCH v5 13/23] drm/xe/svm: Support DRM_XE_SVM_ATTR_PAT memory attribute
  2025-07-22 13:35 ` [PATCH v5 13/23] drm/xe/svm: Support DRM_XE_SVM_ATTR_PAT memory attribute Himal Prasad Ghimiray
@ 2025-07-23 16:55   ` Ghimiray, Himal Prasad
  0 siblings, 0 replies; 55+ messages in thread
From: Ghimiray, Himal Prasad @ 2025-07-23 16:55 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, Thomas Hellström

Need change in commit
s/DRM_XE_SVM_ATTR_PAT/DRM_XE_SVM_MEM_RANGE_ATTR_PAT
will fix in next version.

On 22-07-2025 19:05, Himal Prasad Ghimiray wrote:
> This attributes sets the pat_index for the svm used vma range, which is
> utilized to ascertain the coherence.
> 
> v2 (Matthew Brost)
> - Pat index sanity check
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_vm_madvise.c | 25 +++++++++++++++++++++++--
>   1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index 17959257ee1d..1dc4d19a5f2a 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -9,6 +9,7 @@
>   #include <drm/xe_drm.h>
>   
>   #include "xe_bo.h"
> +#include "xe_pat.h"
>   #include "xe_pt.h"
>   #include "xe_svm.h"
>   
> @@ -111,7 +112,13 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
>   			      struct xe_vma **vmas, int num_vmas,
>   			      struct drm_xe_madvise *op)
>   {
> -	/* Implementation pending */
> +	int i;
> +
> +	xe_assert(vm->xe, op->type == DRM_XE_MEM_RANGE_ATTR_PAT);
> +
> +	for (i = 0; i < num_vmas; i++)
> +		vmas[i]->attr.pat_index = op->pat_index.val;
> +
>   }
>   
>   typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
> @@ -219,8 +226,22 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
>   
>   		break;
>   	case DRM_XE_MEM_RANGE_ATTR_PAT:
> -		/*TODO: Add valid pat check */
> +	{
> +		u16 coh_mode = xe_pat_index_get_coh_mode(xe, args->pat_index.val);
> +
> +		if (XE_IOCTL_DBG(xe, !coh_mode))
> +			return false;
> +
> +		if (XE_WARN_ON(coh_mode > XE_COH_AT_LEAST_1WAY))
> +			return false;
> +
> +		if (XE_IOCTL_DBG(xe, args->pat_index.pad))
> +			return false;
> +
> +		if (XE_IOCTL_DBG(xe, args->pat_index.reserved))
> +			return false;
>   		break;
> +	}
>   	default:
>   		if (XE_IOCTL_DBG(xe, 1))
>   			return false;



* Re: [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops
  2025-07-22 13:38   ` Danilo Krummrich
@ 2025-07-24  0:43     ` Matthew Brost
  2025-07-24 10:05       ` Ghimiray, Himal Prasad
  2025-07-24 10:32       ` Caterina Shablia
  2025-07-24 10:02     ` Ghimiray, Himal Prasad
  1 sibling, 2 replies; 55+ messages in thread
From: Matthew Brost @ 2025-07-24  0:43 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Himal Prasad Ghimiray, intel-xe, Thomas Hellström,
	Danilo Krummrich, Boris Brezillon, Caterina Shablia, dri-devel

On Tue, Jul 22, 2025 at 03:38:14PM +0200, Danilo Krummrich wrote:
> (Cc: Caterina)
> 
> On Tue Jul 22, 2025 at 3:35 PM CEST, Himal Prasad Ghimiray wrote:
> > - DRM_GPUVM_SM_MAP_NOT_MADVISE: Default sm_map operations for the input
> >   range.
> >
> > - DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
> >   drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
> > user-provided range and split the existing non-GEM object VMA if the
> > start or end of the input range lies within it. The operations can
> > create up to 2 REMAPS and 2 MAPs. The purpose of this operation is to be
> > used by the Xe driver to assign attributes to GPUVMA's within the
> > user-defined range. Unlike drm_gpuvm_sm_map_ops_flags in default mode,
> > the operation with this flag will never have UNMAPs and
> > merges, and can be without any final operations.
> >
> > v2
> > - use drm_gpuvm_sm_map_ops_create with flags instead of defining new
> >   ops_create (Danilo)
> > - Add doc (Danilo)
> >
> > v3
> > - Fix doc
> > - Fix unmapping check
> >
> > v4
> > - Fix mapping for non madvise ops
> >
> > Cc: Danilo Krummrich <dakr@redhat.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: Boris Brezillon <bbrezillon@kernel.org>
> > Cc: <dri-devel@lists.freedesktop.org>
> > Signed-off-by: Himal Prasad Ghimiray<himal.prasad.ghimiray@intel.com>
> > ---
> >  drivers/gpu/drm/drm_gpuvm.c            | 93 ++++++++++++++++++++------
> >  drivers/gpu/drm/nouveau/nouveau_uvmm.c |  1 +
> >  drivers/gpu/drm/xe/xe_vm.c             |  1 +
> 
> What about the other drivers using GPUVM, aren't they affected by the changes?
> 

Yes, this would seemingly break the build for other users. If the baseline
includes the patch below that I suggest pulling in, this is a moot point
though.

> >  include/drm/drm_gpuvm.h                | 25 ++++++-
> >  4 files changed, 98 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > index e89b932e987c..c7779588ea38 100644
> > --- a/drivers/gpu/drm/drm_gpuvm.c
> > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > @@ -2103,10 +2103,13 @@ static int
> >  __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> >  		   const struct drm_gpuvm_ops *ops, void *priv,
> >  		   u64 req_addr, u64 req_range,
> > +		   enum drm_gpuvm_sm_map_ops_flags flags,
> 
> Please coordinate with Boris and Caterina here. They're adding a new request
> structure, struct drm_gpuvm_map_req.
> 
> I think we can define it as
> 
> 	struct drm_gpuvm_map_req {
> 		struct drm_gpuva_op_map map;
> 		struct drm_gpuvm_sm_map_ops_flags flags;
> 	}

+1, I see the patch [2] and the suggested change to drm_gpuva_op_map
[3]. Both patch and your suggestion look good to me.

Perhaps we should try to accelerate landing [2] ahead of either series,
as overall it just looks like a good cleanup which can be merged ASAP.

Himal - I'd rebase on top of [2], with Danilo's suggestion in [3], if this
hasn't landed by your next rev.

[2] https://lore.kernel.org/all/20250707170442.1437009-4-caterina.shablia@collabora.com/
[3] https://lore.kernel.org/all/DB61N61AKIJ3.FG7GUJBG386P@kernel.org/

> 
> eventually.
> 
> Please also coordinate on the changes in __drm_gpuvm_sm_map() below regarding
> Caterina's series [1], it looks like they're conflicting.
> 

It looks pretty minor actually. I'm not sure it really matters who wins
this race, but yes, it's always good to coordinate.

> [1] https://lore.kernel.org/all/20250707170442.1437009-1-caterina.shablia@collabora.com/
> 
> > +/**
> > + * enum drm_gpuvm_sm_map_ops_flags - flags for drm_gpuvm split/merge ops
> > + */
> > +enum drm_gpuvm_sm_map_ops_flags {
> > +	/**
> > +	 * @DRM_GPUVM_SM_MAP_NOT_MADVISE: DEFAULT sm_map ops
> > +	 */
> > +	DRM_GPUVM_SM_MAP_NOT_MADVISE = 0,
> 
> Why would we name this "NOT_MADVISE"? What if we add more flags for other
> purposes?
> 

How about...

s/DRM_GPUVM_SM_MAP_NOT_MADVISE/DRM_GPUVM_SM_MAP_OPS_FLAG_NONE/

> > +	/**
> > +	 * @DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
> > +	 * drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
> > +	 * user-provided range and split the existing non-GEM object VMA if the
> > +	 * start or end of the input range lies within it. The operations can
> > +	 * create up to 2 REMAPS and 2 MAPs. Unlike drm_gpuvm_sm_map_ops_flags
> > +	 * in default mode, the operation with this flag will never have UNMAPs and
> > +	 * merges, and can be without any final operations.
> > +	 */
> > +	DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE = BIT(0),

Then normalize this one...

s/DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE/DRM_GPUVM_SM_MAP_OPS_FLAG_SPLIT_MADVISE/

Matt

> > +};


* Re: [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops
  2025-07-22 13:38   ` Danilo Krummrich
  2025-07-24  0:43     ` Matthew Brost
@ 2025-07-24 10:02     ` Ghimiray, Himal Prasad
  1 sibling, 0 replies; 55+ messages in thread
From: Ghimiray, Himal Prasad @ 2025-07-24 10:02 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: intel-xe, Matthew Brost, Thomas Hellström, Danilo Krummrich,
	Boris Brezillon, Caterina Shablia, dri-devel



On 22-07-2025 19:08, Danilo Krummrich wrote:
> (Cc: Caterina)
> 
> On Tue Jul 22, 2025 at 3:35 PM CEST, Himal Prasad Ghimiray wrote:
>> - DRM_GPUVM_SM_MAP_NOT_MADVISE: Default sm_map operations for the input
>>    range.
>>
>> - DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
>>    drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
>> user-provided range and split the existing non-GEM object VMA if the
>> start or end of the input range lies within it. The operations can
>> create up to 2 REMAPS and 2 MAPs. The purpose of this operation is to be
>> used by the Xe driver to assign attributes to GPUVMA's within the
>> user-defined range. Unlike drm_gpuvm_sm_map_ops_flags in default mode,
>> the operation with this flag will never have UNMAPs and
>> merges, and can be without any final operations.
>>
>> v2
>> - use drm_gpuvm_sm_map_ops_create with flags instead of defining new
>>    ops_create (Danilo)
>> - Add doc (Danilo)
>>
>> v3
>> - Fix doc
>> - Fix unmapping check
>>
>> v4
>> - Fix mapping for non madvise ops
>>
>> Cc: Danilo Krummrich <dakr@redhat.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Boris Brezillon <bbrezillon@kernel.org>
>> Cc: <dri-devel@lists.freedesktop.org>
>> Signed-off-by: Himal Prasad Ghimiray<himal.prasad.ghimiray@intel.com>
>> ---
>>   drivers/gpu/drm/drm_gpuvm.c            | 93 ++++++++++++++++++++------
>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c |  1 +
>>   drivers/gpu/drm/xe/xe_vm.c             |  1 +
> 
> What about the other drivers using GPUVM, aren't they affected by the changes?

Apart from xe, nouveau_uvmm.c is the only user of the
drm_gpuvm_sm_map_ops_create API, and the patch takes care of nouveau_uvmm.c.


> 
>>   include/drm/drm_gpuvm.h                | 25 ++++++-
>>   4 files changed, 98 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>> index e89b932e987c..c7779588ea38 100644
>> --- a/drivers/gpu/drm/drm_gpuvm.c
>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>> @@ -2103,10 +2103,13 @@ static int
>>   __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>>   		   const struct drm_gpuvm_ops *ops, void *priv,
>>   		   u64 req_addr, u64 req_range,
>> +		   enum drm_gpuvm_sm_map_ops_flags flags,
> 
> Please coordinate with Boris and Caterina here. They're adding a new request
> structure, struct drm_gpuvm_map_req.
> 
> I think we can define it as
> 
> 	struct drm_gpuvm_map_req {
> 		struct drm_gpuva_op_map map;
> 		struct drm_gpuvm_sm_map_ops_flags flags;
> 	}
> 
> eventually.

Sure will check this.

> 
> Please also coordinate on the changes in __drm_gpuvm_sm_map() below regarding
> Caterina's series [1], it looks like they're conflicting.

Will give it a look.

> 
> [1] https://lore.kernel.org/all/20250707170442.1437009-1-caterina.shablia@collabora.com/
> 
>> +/**
>> + * enum drm_gpuvm_sm_map_ops_flags - flags for drm_gpuvm split/merge ops
>> + */
>> +enum drm_gpuvm_sm_map_ops_flags {
>> +	/**
>> +	 * @DRM_GPUVM_SM_MAP_NOT_MADVISE: DEFAULT sm_map ops
>> +	 */
>> +	DRM_GPUVM_SM_MAP_NOT_MADVISE = 0,
> 
> Why would we name this "NOT_MADVISE"? What if we add more flags for other
> purposes?

How about something like DRM_GPUVM_SM_MAP_DEFAULT?

> 
>> +	/**
>> +	 * @DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
>> +	 * drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
>> +	 * user-provided range and split the existing non-GEM object VMA if the
>> +	 * start or end of the input range lies within it. The operations can
>> +	 * create up to 2 REMAPS and 2 MAPs. Unlike drm_gpuvm_sm_map_ops_flags
>> +	 * in default mode, the operation with this flag will never have UNMAPs and
>> +	 * merges, and can be without any final operations.
>> +	 */
>> +	DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE = BIT(0),
>> +};


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops
  2025-07-24  0:43     ` Matthew Brost
@ 2025-07-24 10:05       ` Ghimiray, Himal Prasad
  2025-07-24 10:32       ` Caterina Shablia
  1 sibling, 0 replies; 55+ messages in thread
From: Ghimiray, Himal Prasad @ 2025-07-24 10:05 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich
  Cc: intel-xe, Thomas Hellström, Danilo Krummrich,
	Boris Brezillon, Caterina Shablia, dri-devel



On 24-07-2025 06:13, Matthew Brost wrote:
> On Tue, Jul 22, 2025 at 03:38:14PM +0200, Danilo Krummrich wrote:
>> (Cc: Caterina)
>>
>> On Tue Jul 22, 2025 at 3:35 PM CEST, Himal Prasad Ghimiray wrote:
>>> - DRM_GPUVM_SM_MAP_NOT_MADVISE: Default sm_map operations for the input
>>>    range.
>>>
>>> - DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
>>>    drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
>>> user-provided range and split the existing non-GEM object VMA if the
>>> start or end of the input range lies within it. The operations can
>>> create up to 2 REMAPS and 2 MAPs. The purpose of this operation is to be
>>> used by the Xe driver to assign attributes to GPUVMA's within the
>>> user-defined range. Unlike drm_gpuvm_sm_map_ops_flags in default mode,
>>> the operation with this flag will never have UNMAPs and
>>> merges, and can be without any final operations.
>>>
>>> v2
>>> - use drm_gpuvm_sm_map_ops_create with flags instead of defining new
>>>    ops_create (Danilo)
>>> - Add doc (Danilo)
>>>
>>> v3
>>> - Fix doc
>>> - Fix unmapping check
>>>
>>> v4
>>> - Fix mapping for non madvise ops
>>>
>>> Cc: Danilo Krummrich <dakr@redhat.com>
>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>> Cc: Boris Brezillon <bbrezillon@kernel.org>
>>> Cc: <dri-devel@lists.freedesktop.org>
>>> Signed-off-by: Himal Prasad Ghimiray<himal.prasad.ghimiray@intel.com>
>>> ---
>>>   drivers/gpu/drm/drm_gpuvm.c            | 93 ++++++++++++++++++++------
>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c |  1 +
>>>   drivers/gpu/drm/xe/xe_vm.c             |  1 +
>>
>> What about the other drivers using GPUVM, aren't they affected by the changes?
>>
> 
> Yes, this seemingly would break the build for other users. If the baseline
> includes the patch below that I suggest pulling in, this is a moot point
> though.
> 
>>>   include/drm/drm_gpuvm.h                | 25 ++++++-
>>>   4 files changed, 98 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>>> index e89b932e987c..c7779588ea38 100644
>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>> @@ -2103,10 +2103,13 @@ static int
>>>   __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>>>   		   const struct drm_gpuvm_ops *ops, void *priv,
>>>   		   u64 req_addr, u64 req_range,
>>> +		   enum drm_gpuvm_sm_map_ops_flags flags,
>>
>> Please coordinate with Boris and Caterina here. They're adding a new request
>> structure, struct drm_gpuvm_map_req.
>>
>> I think we can define it as
>>
>> 	struct drm_gpuvm_map_req {
>> 		struct drm_gpuva_op_map map;
>> 		struct drm_gpuvm_sm_map_ops_flags flags;
>> 	}
> 
> +1, I see the patch [2] and the suggested change to drm_gpuva_op_map
> [3]. Both the patch and your suggestion look good to me.
> 
> Perhaps we try to accelerate [2] landing ahead of either series, as
> overall it just looks like a good cleanup which can be merged ASAP.
> 
> Himal - I'd rebase on top of [2], with Danilo's suggestion in [3], if this
> hasn't landed by your next rev.
> 
> [2] https://lore.kernel.org/all/20250707170442.1437009-4-caterina.shablia@collabora.com/
> [3] https://lore.kernel.org/all/DB61N61AKIJ3.FG7GUJBG386P@kernel.org/
> 

Sure, will take care of this.

>>
>> eventually.
>>
>> Please also coordinate on the changes in __drm_gpuvm_sm_map() below regarding
>> Caterina's series [1], it looks like they're conflicting.
>>
> 
> It looks pretty minor actually. I'm not sure it really matters who wins
> this race, but yes, always good to coordinate.
> 
>> [1] https://lore.kernel.org/all/20250707170442.1437009-1-caterina.shablia@collabora.com/
>>
>>> +/**
>>> + * enum drm_gpuvm_sm_map_ops_flags - flags for drm_gpuvm split/merge ops
>>> + */
>>> +enum drm_gpuvm_sm_map_ops_flags {
>>> +	/**
>>> +	 * @DRM_GPUVM_SM_MAP_NOT_MADVISE: DEFAULT sm_map ops
>>> +	 */
>>> +	DRM_GPUVM_SM_MAP_NOT_MADVISE = 0,
>>
>> Why would we name this "NOT_MADVISE"? What if we add more flags for other
>> purposes?
>>
> 
> How about...
> 
> s/DRM_GPUVM_SM_MAP_NOT_MADVISE/DRM_GPUVM_SM_MAP_OPS_FLAG_NONE/

I was thinking DRM_GPUVM_SM_MAP_DEFAULT, but
DRM_GPUVM_SM_MAP_OPS_FLAG_NONE looks better. Will update it in the next rev.

> 
>>> +	/**
>>> +	 * @DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
>>> +	 * drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
>>> +	 * user-provided range and split the existing non-GEM object VMA if the
>>> +	 * start or end of the input range lies within it. The operations can
>>> +	 * create up to 2 REMAPS and 2 MAPs. Unlike drm_gpuvm_sm_map_ops_flags
>>> +	 * in default mode, the operation with this flag will never have UNMAPs and
>>> +	 * merges, and can be without any final operations.
>>> +	 */
>>> +	DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE = BIT(0),
> 
> Then normalize this one...
> 
> s/DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE/DRM_GPUVM_SM_MAP_OPS_FLAG_SPLIT_MADVISE/

Sure
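
For reference, a minimal sketch of how the enum might look after both
renames. The two flag names come from the suggestions above; the userspace
BIT() stand-in and the sm_map_flags_is_split_madvise() helper are
illustrative assumptions and are not part of the posted patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's BIT() macro (assumption for this sketch). */
#define BIT(n) (1UL << (n))

/* Flags for drm_gpuvm split/merge ops, using the names proposed in this thread. */
enum drm_gpuvm_sm_map_ops_flags {
	/* Default sm_map behaviour: no flag set. */
	DRM_GPUVM_SM_MAP_OPS_FLAG_NONE = 0,
	/* Split non-GEM-object VMAs at the range boundaries for madvise. */
	DRM_GPUVM_SM_MAP_OPS_FLAG_SPLIT_MADVISE = BIT(0),
};

/* The "no flags" value has to stay zero so it can be OR-ed with future flags. */
static_assert(DRM_GPUVM_SM_MAP_OPS_FLAG_NONE == 0, "default must be zero");

/*
 * Hypothetical helper: test the bit instead of comparing for equality, so
 * the check keeps working if more flags are added later.
 */
static inline bool sm_map_flags_is_split_madvise(unsigned int flags)
{
	return flags & DRM_GPUVM_SM_MAP_OPS_FLAG_SPLIT_MADVISE;
}
```

The posted patch compares with "flags == ...", which would stop matching as
soon as a second bit is set; a bit test like the one above is one way to
avoid that, if more flags are expected.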

> 
> Matt
> 
>>> +};


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops
  2025-07-24  0:43     ` Matthew Brost
  2025-07-24 10:05       ` Ghimiray, Himal Prasad
@ 2025-07-24 10:32       ` Caterina Shablia
  2025-07-28 10:20         ` Ghimiray, Himal Prasad
  1 sibling, 1 reply; 55+ messages in thread
From: Caterina Shablia @ 2025-07-24 10:32 UTC (permalink / raw)
  To: Danilo Krummrich, Matthew Brost
  Cc: Himal Prasad Ghimiray, intel-xe, Thomas Hellström,
	Danilo Krummrich, Boris Brezillon, dri-devel

On Thursday, 24 July 2025 2:43:56 (Central European Summer Time),
Matthew Brost wrote:
> On Tue, Jul 22, 2025 at 03:38:14PM +0200, Danilo Krummrich wrote:
> > (Cc: Caterina)
> > 
> > On Tue Jul 22, 2025 at 3:35 PM CEST, Himal Prasad Ghimiray wrote:
> > > - DRM_GPUVM_SM_MAP_NOT_MADVISE: Default sm_map operations for the input
> > >   range.
> > > 
> > > - DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
> > >   drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
> > > user-provided range and split the existing non-GEM object VMA if the
> > > start or end of the input range lies within it. The operations can
> > > create up to 2 REMAPS and 2 MAPs. The purpose of this operation is to be
> > > used by the Xe driver to assign attributes to GPUVMA's within the
> > > user-defined range. Unlike drm_gpuvm_sm_map_ops_flags in default mode,
> > > the operation with this flag will never have UNMAPs and
> > > merges, and can be without any final operations.
> > > 
> > > v2
> > > - use drm_gpuvm_sm_map_ops_create with flags instead of defining new
> > >   ops_create (Danilo)
> > > - Add doc (Danilo)
> > > 
> > > v3
> > > - Fix doc
> > > - Fix unmapping check
> > > 
> > > v4
> > > - Fix mapping for non madvise ops
> > > 
> > > Cc: Danilo Krummrich <dakr@redhat.com>
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Cc: Boris Brezillon <bbrezillon@kernel.org>
> > > Cc: <dri-devel@lists.freedesktop.org>
> > > Signed-off-by: Himal Prasad Ghimiray<himal.prasad.ghimiray@intel.com>
> > > ---
> > >  drivers/gpu/drm/drm_gpuvm.c            | 93 ++++++++++++++++++++------
> > >  drivers/gpu/drm/nouveau/nouveau_uvmm.c |  1 +
> > >  drivers/gpu/drm/xe/xe_vm.c             |  1 +
> > 
> > What about the other drivers using GPUVM, aren't they affected by the
> > changes?
> Yes, this seemingly would break the build for other users. If the baseline
> includes the patch below that I suggest pulling in, this is a moot point
> though.
> 
> > >  include/drm/drm_gpuvm.h                | 25 ++++++-
> > >  4 files changed, 98 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > > index e89b932e987c..c7779588ea38 100644
> > > --- a/drivers/gpu/drm/drm_gpuvm.c
> > > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > > @@ -2103,10 +2103,13 @@ static int
> > >  __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > >  		   const struct drm_gpuvm_ops *ops, void *priv,
> > >  		   u64 req_addr, u64 req_range,
> > > +		   enum drm_gpuvm_sm_map_ops_flags flags,
> > 
> > Please coordinate with Boris and Caterina here. They're adding a new
> > request structure, struct drm_gpuvm_map_req.
> > 
> > I think we can define it as
> > 
> > 	struct drm_gpuvm_map_req {
> > 		struct drm_gpuva_op_map map;
> > 		struct drm_gpuvm_sm_map_ops_flags flags;
> > 	}
> 
> +1, I see the patch [2] and the suggested change to drm_gpuva_op_map
> [3]. Both the patch and your suggestion look good to me.
> 
> Perhaps we try to accelerate [2] landing ahead of either series, as
> overall it just looks like a good cleanup which can be merged ASAP.

I'm not sure my patchset will be in a mergeable state any time soon -- I've
discovered some issues with split/merge of repeated mappings while writing the
doc, so it will be a while before I submit it again. [2] itself is in good
shape; absolutely feel free to submit that as part of your series.

> 
> Himal - I'd rebase on top of [2], with Danilo's suggestion in [3], if this
> hasn't landed by your next rev.
> 
> [2] https://lore.kernel.org/all/20250707170442.1437009-4-caterina.shablia@collabora.com/
> [3] https://lore.kernel.org/all/DB61N61AKIJ3.FG7GUJBG386P@kernel.org/
> > eventually.
> > 
> > Please also coordinate on the changes in __drm_gpuvm_sm_map() below
> > regarding Caterina's series [1], it looks like they're conflicting.
> 
> It looks pretty minor actually. I'm not sure it really matters who wins
> this race, but yes, always good to coordinate.
> 
> > [1] https://lore.kernel.org/all/20250707170442.1437009-1-caterina.shablia@collabora.com/
> > 
> > > +/**
> > > + * enum drm_gpuvm_sm_map_ops_flags - flags for drm_gpuvm split/merge ops
> > > + */
> > > +enum drm_gpuvm_sm_map_ops_flags {
> > > +	/**
> > > +	 * @DRM_GPUVM_SM_MAP_NOT_MADVISE: DEFAULT sm_map ops
> > > +	 */
> > > +	DRM_GPUVM_SM_MAP_NOT_MADVISE = 0,
> > 
> > Why would we name this "NOT_MADVISE"? What if we add more flags for other
> > purposes?
> 
> How about...
> 
> s/DRM_GPUVM_SM_MAP_NOT_MADVISE/DRM_GPUVM_SM_MAP_OPS_FLAG_NONE/
> 
> > > +	/**
> > > +	 * @DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
> > > +	 * drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
> > > +	 * user-provided range and split the existing non-GEM object VMA if the
> > > +	 * start or end of the input range lies within it. The operations can
> > > +	 * create up to 2 REMAPS and 2 MAPs. Unlike drm_gpuvm_sm_map_ops_flags
> > > +	 * in default mode, the operation with this flag will never have UNMAPs and
> > > +	 * merges, and can be without any final operations.
> > > +	 */
> > > +	DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE = BIT(0),
> 
> Then normalize this one...
> 
> s/DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE/DRM_GPUVM_SM_MAP_OPS_FLAG_SPLIT_MADVISE/
> 
> Matt
> 
> > > +};





^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 20/23] drm/xe: Reset VMA attributes to default in SVM garbage collector
  2025-07-22 13:35 ` [PATCH v5 20/23] drm/xe: Reset VMA attributes to default in SVM garbage collector Himal Prasad Ghimiray
@ 2025-07-24 21:50   ` Matthew Brost
  2025-07-29  5:27     ` Matthew Brost
  2025-07-30  6:09     ` Ghimiray, Himal Prasad
  2025-07-29  5:41   ` Matthew Brost
  1 sibling, 2 replies; 55+ messages in thread
From: Matthew Brost @ 2025-07-24 21:50 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström

On Tue, Jul 22, 2025 at 07:05:23PM +0530, Himal Prasad Ghimiray wrote:
> Restore default memory attributes for VMAs during garbage collection
> if they were modified by madvise. Reuse existing VMA if fully overlapping;
> otherwise, allocate a new mirror VMA.
> 
> Suggested-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_svm.c |  34 +++++++++
>  drivers/gpu/drm/xe/xe_vm.c  | 140 +++++++++++++++++++++++++-----------
>  drivers/gpu/drm/xe/xe_vm.h  |   2 +
>  3 files changed, 135 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index ba1233d0d5a2..79709dc066b9 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -255,7 +255,18 @@ static int __xe_svm_garbage_collector(struct xe_vm *vm,
>  static int xe_svm_garbage_collector(struct xe_vm *vm)
>  {
>  	struct xe_svm_range *range;
> +	struct xe_vma *vma;
> +	u64 range_start;
> +	u64 range_size;
> +	u64 range_end;
>  	int err;
> +	struct xe_vma_mem_attr default_attr = {
> +		.preferred_loc = {
> +			.devmem_fd = DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE,
> +			.migration_policy = DRM_XE_MIGRATE_ALL_PAGES,
> +		},
> +		.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
> +	};
>  
>  	lockdep_assert_held_write(&vm->lock);
>  
> @@ -270,6 +281,12 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
>  		if (!range)
>  			break;
>  
> +		range_start = xe_svm_range_start(range);
> +		range_size = xe_svm_range_size(range);
> +		range_end = xe_svm_range_end(range);
> +
> +		vma = xe_vm_find_vma_by_addr(vm, xe_svm_range_start(range));
> +

I'd find the VMA outside of the svm.garbage_collector.lock.

>  		list_del(&range->garbage_collector_link);
>  		spin_unlock(&vm->svm.garbage_collector.lock);
>  
> @@ -282,7 +299,24 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
>  			return err;
>  		}
>  
> +		if (!xe_vma_has_default_mem_attrs(vma)) {

It seems possible the VMA could be NULL in error cases. I'd check for
NULL and error out.

Also, could this code be moved to a helper? Keeping it internal to SVM seems
OK; in that case xe_vm_find_vma_by_addr could also be in the helper.

> +			vm_dbg(&vm->xe->drm, "Existing VMA start=0x%016llx, vma_end=0x%016llx",
> +			       xe_vma_start(vma), xe_vma_end(vma));
> +
> +			if (xe_vma_start(vma) == range_start && xe_vma_end(vma) == range_end) {
> +				default_attr.pat_index = vma->attr.default_pat_index;
> +				default_attr.default_pat_index  = vma->attr.default_pat_index;
> +				vma->attr = default_attr;
> +			} else {
> +				vm_dbg(&vm->xe->drm, "Split VMA start=0x%016llx, vma_end=0x%016llx",
> +				       range_start, range_end);
> +				err = xe_vm_alloc_cpu_addr_mirror_vma(vm, range_start, range_size);
> +				if (err)

On error, I'd print a message and kill the VM, as it shouldn't be
possible to fail aside from a memory allocation failure, and we can't
cope with errors given this can be inside a worker.
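
Roughly the shape I have in mind, as a standalone userspace model rather
than real Xe code. Every model_* name below is a made-up stand-in (the split
stub takes the place of xe_vm_alloc_cpu_addr_mirror_vma from the patch), and
locking is left out; the helper is meant to run after the
garbage_collector.lock has been dropped:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the Xe types; all names here are hypothetical. */
struct model_vma {
	uint64_t start, end;	/* covers [start, end) */
	bool default_attrs;	/* true if madvise never changed the attributes */
};

struct model_vm {
	struct model_vma *vmas;
	size_t nr_vmas;
	bool split_fails;	/* simulate an allocation failure in the split path */
	bool killed;		/* set when the VM is declared dead */
};

/* Return the VMA containing addr, or NULL if there is none. */
static struct model_vma *model_find_vma(struct model_vm *vm, uint64_t addr)
{
	for (size_t i = 0; i < vm->nr_vmas; i++)
		if (addr >= vm->vmas[i].start && addr < vm->vmas[i].end)
			return &vm->vmas[i];
	return NULL;
}

/* Stub for the split path; only models success or -ENOMEM. */
static int model_split(struct model_vm *vm, uint64_t start, uint64_t size)
{
	(void)start;
	(void)size;
	return vm->split_fails ? -ENOMEM : 0;
}

/* Helper: look up the VMA, check for NULL, reset or split, kill VM on error. */
static int model_range_set_default_attr(struct model_vm *vm,
					uint64_t range_start, uint64_t range_end)
{
	struct model_vma *vma = model_find_vma(vm, range_start);
	int err;

	if (!vma)
		return -EINVAL;		/* lookup failed: error out early */

	if (vma->default_attrs)
		return 0;		/* nothing to restore */

	if (vma->start == range_start && vma->end == range_end) {
		vma->default_attrs = true;	/* full overlap: reuse the VMA */
		return 0;
	}

	err = model_split(vm, range_start, range_end - range_start);
	if (err)
		vm->killed = true;	/* a worker cannot recover from this */
	return err;
}
```

With that, the loop in xe_svm_garbage_collector() only needs one call per
range plus the error return.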

I'll circle back to the rest of the patch a bit later.

Matt

> +					return err;
> +			}
> +		}
>  		spin_lock(&vm->svm.garbage_collector.lock);
> +
>  	}
>  	spin_unlock(&vm->svm.garbage_collector.lock);
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index d3f08bf9a3ee..003c8209f8bd 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -4254,34 +4254,24 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
>  	}
>  }
>  
> -/**
> - * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
> - * @vm: Pointer to the xe_vm structure
> - * @start: Starting input address
> - * @range: Size of the input range
> - *
> - * This function splits existing vma to create new vma for user provided input range
> - *
> - *  Return: 0 if success
> - */
> -int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
> +static int xe_vm_alloc_vma(struct xe_vm *vm,
> +			   u64 start, u64 range,
> +			   enum drm_gpuvm_sm_map_ops_flags flags)
>  {
>  	struct xe_vma_ops vops;
>  	struct drm_gpuva_ops *ops = NULL;
>  	struct drm_gpuva_op *__op;
>  	bool is_cpu_addr_mirror = false;
>  	bool remap_op = false;
> +	bool is_madvise = flags == DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE;
>  	struct xe_vma_mem_attr tmp_attr;
> +	u16 default_pat;
>  	int err;
>  
> -	vm_dbg(&vm->xe->drm, "MADVISE IN: addr=0x%016llx, size=0x%016llx", start, range);
> -
>  	lockdep_assert_held_write(&vm->lock);
>  
> -	vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range);
>  	ops = drm_gpuvm_sm_map_ops_create(&vm->gpuvm, start, range,
> -					  DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE,
> -					  NULL, start);
> +					  flags, NULL, start);
>  	if (IS_ERR(ops))
>  		return PTR_ERR(ops);
>  
> @@ -4292,33 +4282,56 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>  
>  	drm_gpuva_for_each_op(__op, ops) {
>  		struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
> +		struct xe_vma *vma = NULL;
>  
> -		if (__op->op == DRM_GPUVA_OP_REMAP) {
> -			xe_assert(vm->xe, !remap_op);
> -			remap_op = true;
> +		if (!is_madvise) {
> +			if (__op->op == DRM_GPUVA_OP_UNMAP) {
> +				vma = gpuva_to_vma(op->base.unmap.va);
> +				XE_WARN_ON(!xe_vma_has_default_mem_attrs(vma));
> +				default_pat = vma->attr.default_pat_index;
> +			}
>  
> -			if (xe_vma_is_cpu_addr_mirror(gpuva_to_vma(op->base.remap.unmap->va)))
> -				is_cpu_addr_mirror = true;
> -			else
> -				is_cpu_addr_mirror = false;
> -		}
> +			if (__op->op == DRM_GPUVA_OP_REMAP) {
> +				vma = gpuva_to_vma(op->base.remap.unmap->va);
> +				default_pat = vma->attr.default_pat_index;
> +			}
>  
> -		if (__op->op == DRM_GPUVA_OP_MAP) {
> -			xe_assert(vm->xe, remap_op);
> -			remap_op = false;
> +			if (__op->op == DRM_GPUVA_OP_MAP) {
> +				op->map.is_cpu_addr_mirror = true;
> +				op->map.pat_index = default_pat;
> +			}
> +		} else {
> +			if (__op->op == DRM_GPUVA_OP_REMAP) {
> +				vma = gpuva_to_vma(op->base.remap.unmap->va);
> +				xe_assert(vm->xe, !remap_op);
> +				remap_op = true;
>  
> -			/* In case of madvise ops DRM_GPUVA_OP_MAP is always after
> -			 * DRM_GPUVA_OP_REMAP, so ensure we assign op->map.is_cpu_addr_mirror true
> -			 * if REMAP is for xe_vma_is_cpu_addr_mirror vma
> -			 */
> -			op->map.is_cpu_addr_mirror = is_cpu_addr_mirror;
> -		}
> +				if (xe_vma_is_cpu_addr_mirror(vma))
> +					is_cpu_addr_mirror = true;
> +				else
> +					is_cpu_addr_mirror = false;
> +			}
>  
> +			if (__op->op == DRM_GPUVA_OP_MAP) {
> +				xe_assert(vm->xe, remap_op);
> +				remap_op = false;
> +				/*
> +				 * In case of madvise ops DRM_GPUVA_OP_MAP is
> +				 * always after DRM_GPUVA_OP_REMAP, so ensure
> +				 * we assign op->map.is_cpu_addr_mirror true
> +				 * if REMAP is for xe_vma_is_cpu_addr_mirror vma
> +				 */
> +				op->map.is_cpu_addr_mirror = is_cpu_addr_mirror;
> +			}
> +		}
>  		print_op(vm->xe, __op);
>  	}
>  
>  	xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
> -	vops.flags |= XE_VMA_OPS_FLAG_MADVISE;
> +
> +	if (is_madvise)
> +		vops.flags |= XE_VMA_OPS_FLAG_MADVISE;
> +
>  	err = vm_bind_ioctl_ops_parse(vm, ops, &vops);
>  	if (err)
>  		goto unwind_ops;
> @@ -4330,15 +4343,20 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>  		struct xe_vma *vma;
>  
>  		if (__op->op == DRM_GPUVA_OP_UNMAP) {
> -			/* There should be no unmap */
> -			XE_WARN_ON("UNEXPECTED UNMAP");
> -			xe_vma_destroy(gpuva_to_vma(op->base.unmap.va), NULL);
> +			vma = gpuva_to_vma(op->base.unmap.va);
> +			/* There should be no unmap for madvise */
> +			if (is_madvise)
> +				XE_WARN_ON("UNEXPECTED UNMAP");
> +
> +			xe_vma_destroy(vma, NULL);
>  		} else if (__op->op == DRM_GPUVA_OP_REMAP) {
>  			vma = gpuva_to_vma(op->base.remap.unmap->va);
> -			/* Store attributes for REMAP UNMAPPED VMA, so they can be assigned
> -			 * to newly MAP created vma.
> +			/* In case of madvise ops Store attributes for REMAP UNMAPPED
> +			 * VMA, so they can be assigned to newly MAP created vma.
>  			 */
> -			tmp_attr = vma->attr;
> +			if (is_madvise)
> +				tmp_attr = vma->attr;
> +
>  			xe_vma_destroy(gpuva_to_vma(op->base.remap.unmap->va), NULL);
>  		} else if (__op->op == DRM_GPUVA_OP_MAP) {
>  			vma = op->map.vma;
> @@ -4346,7 +4364,8 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>  			 * Therefore temp_attr will always have sane values, making it safe to
>  			 * copy them to new vma.
>  			 */
> -			vma->attr = tmp_attr;
> +			if (is_madvise)
> +				vma->attr = tmp_attr;
>  		}
>  	}
>  
> @@ -4360,3 +4379,42 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>  	drm_gpuva_ops_free(&vm->gpuvm, ops);
>  	return err;
>  }
> +
> +/**
> + * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
> + * @vm: Pointer to the xe_vm structure
> + * @start: Starting input address
> + * @range: Size of the input range
> + *
> + * This function splits existing vma to create new vma for user provided input range
> + *
> + * Return: 0 if success
> + */
> +int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
> +{
> +	lockdep_assert_held_write(&vm->lock);
> +
> +	vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range);
> +
> +	return xe_vm_alloc_vma(vm, start, range, DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE);
> +}
> +
> +/**
> + * xe_vm_alloc_cpu_addr_mirror_vma - Allocate CPU addr mirror vma
> + * @vm: Pointer to the xe_vm structure
> + * @start: Starting input address
> + * @range: Size of the input range
> + *
> + * This function splits/merges existing vma to create new vma for user provided input range
> + *
> + * Return: 0 if success
> + */
> +int xe_vm_alloc_cpu_addr_mirror_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
> +{
> +	lockdep_assert_held_write(&vm->lock);
> +
> +	vm_dbg(&vm->xe->drm, "CPU_ADDR_MIRROR_VMA_OPS_CREATE: addr=0x%016llx, size=0x%016llx",
> +	       start, range);
> +
> +	return xe_vm_alloc_vma(vm, start, range, DRM_GPUVM_SM_MAP_NOT_MADVISE);
> +}
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index a4db843de540..f7b9ad83685a 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -177,6 +177,8 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
>  
>  int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
>  
> +int xe_vm_alloc_cpu_addr_mirror_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
> +
>  /**
>   * to_userptr_vma() - Return a pointer to an embedding userptr vma
>   * @vma: Pointer to the embedded struct xe_vma
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops
  2025-07-22 13:35 ` [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops Himal Prasad Ghimiray
  2025-07-22 13:38   ` Danilo Krummrich
@ 2025-07-27 21:18   ` Matthew Brost
  2025-07-28  6:16     ` Ghimiray, Himal Prasad
  1 sibling, 1 reply; 55+ messages in thread
From: Matthew Brost @ 2025-07-27 21:18 UTC (permalink / raw)
  To: Himal Prasad Ghimiray
  Cc: intel-xe, Thomas Hellström, Danilo Krummrich,
	Boris Brezillon, dri-devel

On Tue, Jul 22, 2025 at 07:05:04PM +0530, Himal Prasad Ghimiray wrote:
> - DRM_GPUVM_SM_MAP_NOT_MADVISE: Default sm_map operations for the input
>   range.
> 
> - DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
>   drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
> user-provided range and split the existing non-GEM object VMA if the
> start or end of the input range lies within it. The operations can
> create up to 2 REMAPS and 2 MAPs. The purpose of this operation is to be
> used by the Xe driver to assign attributes to GPUVMA's within the
> user-defined range. Unlike drm_gpuvm_sm_map_ops_flags in default mode,
> the operation with this flag will never have UNMAPs and
> merges, and can be without any final operations.
> 
> v2
> - use drm_gpuvm_sm_map_ops_create with flags instead of defining new
>   ops_create (Danilo)
> - Add doc (Danilo)
> 
> v3
> - Fix doc
> - Fix unmapping check
> 
> v4
> - Fix mapping for non madvise ops
> 
> Cc: Danilo Krummrich <dakr@redhat.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Boris Brezillon <bbrezillon@kernel.org>
> Cc: <dri-devel@lists.freedesktop.org>
> Signed-off-by: Himal Prasad Ghimiray<himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/drm_gpuvm.c            | 93 ++++++++++++++++++++------
>  drivers/gpu/drm/nouveau/nouveau_uvmm.c |  1 +
>  drivers/gpu/drm/xe/xe_vm.c             |  1 +
>  include/drm/drm_gpuvm.h                | 25 ++++++-
>  4 files changed, 98 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index e89b932e987c..c7779588ea38 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -2103,10 +2103,13 @@ static int
>  __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>  		   const struct drm_gpuvm_ops *ops, void *priv,
>  		   u64 req_addr, u64 req_range,
> +		   enum drm_gpuvm_sm_map_ops_flags flags,
>  		   struct drm_gem_object *req_obj, u64 req_offset)
>  {
>  	struct drm_gpuva *va, *next;
>  	u64 req_end = req_addr + req_range;
> +	bool is_madvise_ops = (flags == DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE);
> +	bool needs_map = !is_madvise_ops;
>  	int ret;
>  
>  	if (unlikely(!drm_gpuvm_range_valid(gpuvm, req_addr, req_range)))
> @@ -2119,26 +2122,35 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>  		u64 range = va->va.range;
>  		u64 end = addr + range;
>  		bool merge = !!va->gem.obj;
> +		bool skip_madvise_ops = is_madvise_ops && merge;
>  
> +		needs_map = !is_madvise_ops;
>  		if (addr == req_addr) {
>  			merge &= obj == req_obj &&
>  				 offset == req_offset;
>  
>  			if (end == req_end) {
> -				ret = op_unmap_cb(ops, priv, va, merge);
> -				if (ret)
> -					return ret;
> +				if (!is_madvise_ops) {
> +					ret = op_unmap_cb(ops, priv, va, merge);
> +					if (ret)
> +						return ret;
> +				}
>  				break;
>  			}
>  
>  			if (end < req_end) {
> -				ret = op_unmap_cb(ops, priv, va, merge);
> -				if (ret)
> -					return ret;
> +				if (!is_madvise_ops) {
> +					ret = op_unmap_cb(ops, priv, va, merge);
> +					if (ret)
> +						return ret;
> +				}
>  				continue;
>  			}
>  
>  			if (end > req_end) {
> +				if (skip_madvise_ops)
> +					break;
> +
>  				struct drm_gpuva_op_map n = {
>  					.va.addr = req_end,
>  					.va.range = range - req_range,
> @@ -2153,6 +2165,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>  				ret = op_remap_cb(ops, priv, NULL, &n, &u);
>  				if (ret)
>  					return ret;
> +
> +				if (is_madvise_ops)
> +					needs_map = true;
>  				break;
>  			}
>  		} else if (addr < req_addr) {
> @@ -2170,20 +2185,42 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>  			u.keep = merge;
>  
>  			if (end == req_end) {
> +				if (skip_madvise_ops)
> +					break;
> +
>  				ret = op_remap_cb(ops, priv, &p, NULL, &u);
>  				if (ret)
>  					return ret;
> +
> +				if (is_madvise_ops)
> +					needs_map = true;
> +
>  				break;
>  			}
>  
>  			if (end < req_end) {
> +				if (skip_madvise_ops)
> +					continue;
> +
>  				ret = op_remap_cb(ops, priv, &p, NULL, &u);
>  				if (ret)
>  					return ret;
> +
> +				if (is_madvise_ops) {
> +					ret = op_map_cb(ops, priv, req_addr,
> +							min(end - req_addr, req_end - end),

This doesn't look right.

This is creating a new MAP operation to replace what the REMAP operation
unmapped but didn't remap. In Xe debug speak, this is where we are:

REMAP:UNMAP
REMAP:PREV
MAP <-- This is the calculation we are doing.

We want the 'MAP' size here to be:

'REMAP:UNMAP.end - REMAP:PREV.end'

Which is 'end - req_addr'. So delete the min statement here and replace it
with 'end - req_addr'.
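
As a quick sanity check of the arithmetic, here is a standalone userspace
sketch. The two helper names are made up for illustration and do not exist
in drm_gpuvm.c; the variables mirror the ones in __drm_gpuvm_sm_map() for
the 'addr < req_addr' and 'end < req_end' branch:

```c
#include <assert.h>
#include <stdint.h>

/* Size computed by the posted patch: min(end - req_addr, req_end - end). */
static uint64_t map_size_posted(uint64_t req_addr, uint64_t req_end, uint64_t end)
{
	uint64_t a = end - req_addr;
	uint64_t b = req_end - end;

	return a < b ? a : b;
}

/* Size suggested above: REMAP:UNMAP.end - REMAP:PREV.end, i.e. end - req_addr. */
static uint64_t map_size_suggested(uint64_t addr, uint64_t req_addr,
				   uint64_t req_end, uint64_t end)
{
	/* Preconditions of this branch: addr < req_addr < end < req_end. */
	assert(addr < req_addr && req_addr < end && end < req_end);

	return end - req_addr;
}
```

Take an existing VMA at [0x0000, 0x3000) and a request for [0x1000, 0x3800).
REMAP:PREV keeps [0x0000, 0x1000), so the MAP has to cover [0x1000, 0x3000),
which is 0x2000 bytes. The min() version returns 0x800 for this case, which
leaves part of the old VMA without a mapping.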

Matt

> +							NULL, req_offset);
> +					if (ret)
> +						return ret;
> +				}
> +
>  				continue;
>  			}
>  
>  			if (end > req_end) {
> +				if (skip_madvise_ops)
> +					break;
> +
>  				struct drm_gpuva_op_map n = {
>  					.va.addr = req_end,
>  					.va.range = end - req_end,
> @@ -2195,6 +2232,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>  				ret = op_remap_cb(ops, priv, &p, &n, &u);
>  				if (ret)
>  					return ret;
> +
> +				if (is_madvise_ops)
> +					needs_map = true;
>  				break;
>  			}
>  		} else if (addr > req_addr) {
> @@ -2203,20 +2243,29 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>  					   (addr - req_addr);
>  
>  			if (end == req_end) {
> -				ret = op_unmap_cb(ops, priv, va, merge);
> -				if (ret)
> -					return ret;
> +				if (!is_madvise_ops) {
> +					ret = op_unmap_cb(ops, priv, va, merge);
> +					if (ret)
> +						return ret;
> +				}
> +
>  				break;
>  			}
>  
>  			if (end < req_end) {
> -				ret = op_unmap_cb(ops, priv, va, merge);
> -				if (ret)
> -					return ret;
> +				if (!is_madvise_ops) {
> +					ret = op_unmap_cb(ops, priv, va, merge);
> +					if (ret)
> +						return ret;
> +				}
> +
>  				continue;
>  			}
>  
>  			if (end > req_end) {
> +				if (skip_madvise_ops)
> +					break;
> +
>  				struct drm_gpuva_op_map n = {
>  					.va.addr = req_end,
>  					.va.range = end - req_end,
> @@ -2231,14 +2280,16 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>  				ret = op_remap_cb(ops, priv, NULL, &n, &u);
>  				if (ret)
>  					return ret;
> +
> +				if (is_madvise_ops)
> +					return op_map_cb(ops, priv, addr,
> +							(req_end - addr), NULL, req_offset);
>  				break;
>  			}
>  		}
>  	}
> -
> -	return op_map_cb(ops, priv,
> -			 req_addr, req_range,
> -			 req_obj, req_offset);
> +	return needs_map ? op_map_cb(ops, priv, req_addr,
> +			   req_range, req_obj, req_offset) : 0;
>  }
>  
>  static int
> @@ -2337,15 +2388,15 @@ drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>  		 struct drm_gem_object *req_obj, u64 req_offset)
>  {
>  	const struct drm_gpuvm_ops *ops = gpuvm->ops;
> +	enum drm_gpuvm_sm_map_ops_flags flags = DRM_GPUVM_SM_MAP_NOT_MADVISE;
>  
>  	if (unlikely(!(ops && ops->sm_step_map &&
>  		       ops->sm_step_remap &&
>  		       ops->sm_step_unmap)))
>  		return -EINVAL;
>  
> -	return __drm_gpuvm_sm_map(gpuvm, ops, priv,
> -				  req_addr, req_range,
> -				  req_obj, req_offset);
> +	return __drm_gpuvm_sm_map(gpuvm, ops, priv, req_addr, req_range,
> +				  flags, req_obj, req_offset);
>  }
>  EXPORT_SYMBOL_GPL(drm_gpuvm_sm_map);
>  
> @@ -2487,6 +2538,7 @@ static const struct drm_gpuvm_ops gpuvm_list_ops = {
>   * @gpuvm: the &drm_gpuvm representing the GPU VA space
>   * @req_addr: the start address of the new mapping
>   * @req_range: the range of the new mapping
> + * @drm_gpuvm_sm_map_ops_flag: ops flag determining madvise or not
>   * @req_obj: the &drm_gem_object to map
>   * @req_offset: the offset within the &drm_gem_object
>   *
> @@ -2517,6 +2569,7 @@ static const struct drm_gpuvm_ops gpuvm_list_ops = {
>  struct drm_gpuva_ops *
>  drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm,
>  			    u64 req_addr, u64 req_range,
> +			    enum drm_gpuvm_sm_map_ops_flags flags,
>  			    struct drm_gem_object *req_obj, u64 req_offset)
>  {
>  	struct drm_gpuva_ops *ops;
> @@ -2536,7 +2589,7 @@ drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm,
>  	args.ops = ops;
>  
>  	ret = __drm_gpuvm_sm_map(gpuvm, &gpuvm_list_ops, &args,
> -				 req_addr, req_range,
> +				 req_addr, req_range, flags,
>  				 req_obj, req_offset);
>  	if (ret)
>  		goto err_free_ops;
> diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> index 48f105239f42..26e13fcdbdb8 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> @@ -1303,6 +1303,7 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job,
>  			op->ops = drm_gpuvm_sm_map_ops_create(&uvmm->base,
>  							      op->va.addr,
>  							      op->va.range,
> +							      DRM_GPUVM_SM_MAP_NOT_MADVISE,
>  							      op->gem.obj,
>  							      op->gem.offset);
>  			if (IS_ERR(op->ops)) {
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 2035604121e6..b2ed99551b6e 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -2318,6 +2318,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_vma_ops *vops,
>  	case DRM_XE_VM_BIND_OP_MAP:
>  	case DRM_XE_VM_BIND_OP_MAP_USERPTR:
>  		ops = drm_gpuvm_sm_map_ops_create(&vm->gpuvm, addr, range,
> +						  DRM_GPUVM_SM_MAP_NOT_MADVISE,
>  						  obj, bo_offset_or_userptr);
>  		break;
>  	case DRM_XE_VM_BIND_OP_UNMAP:
> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> index 2a9629377633..c589b886a4fd 100644
> --- a/include/drm/drm_gpuvm.h
> +++ b/include/drm/drm_gpuvm.h
> @@ -211,6 +211,27 @@ enum drm_gpuvm_flags {
>  	DRM_GPUVM_USERBITS = BIT(1),
>  };
>  
> +/**
> + * enum drm_gpuvm_sm_map_ops_flags - flags for drm_gpuvm split/merge ops
> + */
> +enum drm_gpuvm_sm_map_ops_flags {
> +	/**
> +	 * @DRM_GPUVM_SM_MAP_NOT_MADVISE: DEFAULT sm_map ops
> +	 */
> +	DRM_GPUVM_SM_MAP_NOT_MADVISE = 0,
> +
> +	/**
> +	 * @DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
> +	 * drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
> +	 * user-provided range and split the existing non-GEM object VMA if the
> +	 * start or end of the input range lies within it. The operations can
> +	 * create up to 2 REMAPS and 2 MAPs. Unlike drm_gpuvm_sm_map_ops_flags
> +	 * in default mode, the operation with this flag will never have UNMAPs and
> +	 * merges, and can be without any final operations.
> +	 */
> +	DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE = BIT(0),
> +};
> +
>  /**
>   * struct drm_gpuvm - DRM GPU VA Manager
>   *
> @@ -1059,8 +1080,8 @@ struct drm_gpuva_ops {
>  #define drm_gpuva_next_op(op) list_next_entry(op, entry)
>  
>  struct drm_gpuva_ops *
> -drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm,
> -			    u64 addr, u64 range,
> +drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm, u64 addr, u64 range,
> +			    enum drm_gpuvm_sm_map_ops_flags flags,
>  			    struct drm_gem_object *obj, u64 offset);
>  struct drm_gpuva_ops *
>  drm_gpuvm_sm_unmap_ops_create(struct drm_gpuvm *gpuvm,
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops
  2025-07-27 21:18   ` Matthew Brost
@ 2025-07-28  6:16     ` Ghimiray, Himal Prasad
  0 siblings, 0 replies; 55+ messages in thread
From: Ghimiray, Himal Prasad @ 2025-07-28  6:16 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Thomas Hellström, Danilo Krummrich,
	Boris Brezillon, dri-devel



On 28-07-2025 02:48, Matthew Brost wrote:
> On Tue, Jul 22, 2025 at 07:05:04PM +0530, Himal Prasad Ghimiray wrote:
>> - DRM_GPUVM_SM_MAP_NOT_MADVISE: Default sm_map operations for the input
>>    range.
>>
>> - DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
>>    drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
>> user-provided range and split the existing non-GEM object VMA if the
>> start or end of the input range lies within it. The operations can
>> create up to 2 REMAPS and 2 MAPs. The purpose of this operation is to be
>> used by the Xe driver to assign attributes to GPUVMA's within the
>> user-defined range. Unlike drm_gpuvm_sm_map_ops_flags in default mode,
>> the operation with this flag will never have UNMAPs and
>> merges, and can be without any final operations.
>>
>> v2
>> - use drm_gpuvm_sm_map_ops_create with flags instead of defining new
>>    ops_create (Danilo)
>> - Add doc (Danilo)
>>
>> v3
>> - Fix doc
>> - Fix unmapping check
>>
>> v4
>> - Fix mapping for non madvise ops
>>
>> Cc: Danilo Krummrich <dakr@redhat.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Boris Brezillon <bbrezillon@kernel.org>
>> Cc: <dri-devel@lists.freedesktop.org>
>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> ---
>>   drivers/gpu/drm/drm_gpuvm.c            | 93 ++++++++++++++++++++------
>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c |  1 +
>>   drivers/gpu/drm/xe/xe_vm.c             |  1 +
>>   include/drm/drm_gpuvm.h                | 25 ++++++-
>>   4 files changed, 98 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>> index e89b932e987c..c7779588ea38 100644
>> --- a/drivers/gpu/drm/drm_gpuvm.c
>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>> @@ -2103,10 +2103,13 @@ static int
>>   __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>>   		   const struct drm_gpuvm_ops *ops, void *priv,
>>   		   u64 req_addr, u64 req_range,
>> +		   enum drm_gpuvm_sm_map_ops_flags flags,
>>   		   struct drm_gem_object *req_obj, u64 req_offset)
>>   {
>>   	struct drm_gpuva *va, *next;
>>   	u64 req_end = req_addr + req_range;
>> +	bool is_madvise_ops = (flags == DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE);
>> +	bool needs_map = !is_madvise_ops;
>>   	int ret;
>>   
>>   	if (unlikely(!drm_gpuvm_range_valid(gpuvm, req_addr, req_range)))
>> @@ -2119,26 +2122,35 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>>   		u64 range = va->va.range;
>>   		u64 end = addr + range;
>>   		bool merge = !!va->gem.obj;
>> +		bool skip_madvise_ops = is_madvise_ops && merge;
>>   
>> +		needs_map = !is_madvise_ops;
>>   		if (addr == req_addr) {
>>   			merge &= obj == req_obj &&
>>   				 offset == req_offset;
>>   
>>   			if (end == req_end) {
>> -				ret = op_unmap_cb(ops, priv, va, merge);
>> -				if (ret)
>> -					return ret;
>> +				if (!is_madvise_ops) {
>> +					ret = op_unmap_cb(ops, priv, va, merge);
>> +					if (ret)
>> +						return ret;
>> +				}
>>   				break;
>>   			}
>>   
>>   			if (end < req_end) {
>> -				ret = op_unmap_cb(ops, priv, va, merge);
>> -				if (ret)
>> -					return ret;
>> +				if (!is_madvise_ops) {
>> +					ret = op_unmap_cb(ops, priv, va, merge);
>> +					if (ret)
>> +						return ret;
>> +				}
>>   				continue;
>>   			}
>>   
>>   			if (end > req_end) {
>> +				if (skip_madvise_ops)
>> +					break;
>> +
>>   				struct drm_gpuva_op_map n = {
>>   					.va.addr = req_end,
>>   					.va.range = range - req_range,
>> @@ -2153,6 +2165,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>>   				ret = op_remap_cb(ops, priv, NULL, &n, &u);
>>   				if (ret)
>>   					return ret;
>> +
>> +				if (is_madvise_ops)
>> +					needs_map = true;
>>   				break;
>>   			}
>>   		} else if (addr < req_addr) {
>> @@ -2170,20 +2185,42 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>>   			u.keep = merge;
>>   
>>   			if (end == req_end) {
>> +				if (skip_madvise_ops)
>> +					break;
>> +
>>   				ret = op_remap_cb(ops, priv, &p, NULL, &u);
>>   				if (ret)
>>   					return ret;
>> +
>> +				if (is_madvise_ops)
>> +					needs_map = true;
>> +
>>   				break;
>>   			}
>>   
>>   			if (end < req_end) {
>> +				if (skip_madvise_ops)
>> +					continue;
>> +
>>   				ret = op_remap_cb(ops, priv, &p, NULL, &u);
>>   				if (ret)
>>   					return ret;
>> +
>> +				if (is_madvise_ops) {
>> +					ret = op_map_cb(ops, priv, req_addr,
>> +							min(end - req_addr, req_end - end),
> 
> This doesn't look right.
> 
> This is creating a new MAP operation to replace what the REMAP operation
> unmapped but didn't remap. In Xe debug speak, this is where we are:
> 
> REMAP:UNMAP
> REMAP:PREV
> MAP <-- This is the calculation we are doing.
> 
> We want the 'MAP' size here to be:
> 
> 'REMAP:UNMAP.end - REMAP:PREV.end'
> 
> Which is 'end - req_addr'. So delete the min statement here and replace
> with 'end - req_addr'.
> 
> Matt

True, will fix this.
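
To make the arithmetic concrete, here is a minimal standalone sketch
(helper names are hypothetical and not part of drm_gpuvm) comparing the
v5 size with the 'end - req_addr' size, assuming an existing VA
[addr, end) that starts before the request and ends inside it:

```c
#include <assert.h>
#include <stdint.h>

/* Size as computed in v5: min(end - req_addr, req_end - end). */
static uint64_t map_size_v5(uint64_t req_addr, uint64_t req_end, uint64_t end)
{
	uint64_t a, b;

	/* Case under discussion: the VA ends inside the requested range. */
	assert(req_addr < end && end < req_end);

	a = end - req_addr;
	b = req_end - end;
	return a < b ? a : b;
}

/* Size suggested in review: REMAP:UNMAP.end - REMAP:PREV.end. */
static uint64_t map_size_reviewed(uint64_t req_addr, uint64_t end)
{
	/* REMAP:PREV ends at req_addr, REMAP:UNMAP ends at end. */
	assert(req_addr < end);

	return end - req_addr;
}
```

With req_addr = 0x1000, req_end = 0x5000 and end = 0x4000, the v5
expression gives 0x1000 while the hole left after REMAP:PREV is 0x3000,
so [0x2000, 0x4000) would be left without a mapping.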

> 
>> +							NULL, req_offset);
>> +					if (ret)
>> +						return ret;
>> +				}
>> +
>>   				continue;
>>   			}
>>   
>>   			if (end > req_end) {
>> +				if (skip_madvise_ops)
>> +					break;
>> +
>>   				struct drm_gpuva_op_map n = {
>>   					.va.addr = req_end,
>>   					.va.range = end - req_end,
>> @@ -2195,6 +2232,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>>   				ret = op_remap_cb(ops, priv, &p, &n, &u);
>>   				if (ret)
>>   					return ret;
>> +
>> +				if (is_madvise_ops)
>> +					needs_map = true;
>>   				break;
>>   			}
>>   		} else if (addr > req_addr) {
>> @@ -2203,20 +2243,29 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>>   					   (addr - req_addr);
>>   
>>   			if (end == req_end) {
>> -				ret = op_unmap_cb(ops, priv, va, merge);
>> -				if (ret)
>> -					return ret;
>> +				if (!is_madvise_ops) {
>> +					ret = op_unmap_cb(ops, priv, va, merge);
>> +					if (ret)
>> +						return ret;
>> +				}
>> +
>>   				break;
>>   			}
>>   
>>   			if (end < req_end) {
>> -				ret = op_unmap_cb(ops, priv, va, merge);
>> -				if (ret)
>> -					return ret;
>> +				if (!is_madvise_ops) {
>> +					ret = op_unmap_cb(ops, priv, va, merge);
>> +					if (ret)
>> +						return ret;
>> +				}
>> +
>>   				continue;
>>   			}
>>   
>>   			if (end > req_end) {
>> +				if (skip_madvise_ops)
>> +					break;
>> +
>>   				struct drm_gpuva_op_map n = {
>>   					.va.addr = req_end,
>>   					.va.range = end - req_end,
>> @@ -2231,14 +2280,16 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>>   				ret = op_remap_cb(ops, priv, NULL, &n, &u);
>>   				if (ret)
>>   					return ret;
>> +
>> +				if (is_madvise_ops)
>> +					return op_map_cb(ops, priv, addr,
>> +							(req_end - addr), NULL, req_offset);
>>   				break;
>>   			}
>>   		}
>>   	}
>> -
>> -	return op_map_cb(ops, priv,
>> -			 req_addr, req_range,
>> -			 req_obj, req_offset);
>> +	return needs_map ? op_map_cb(ops, priv, req_addr,
>> +			   req_range, req_obj, req_offset) : 0;
>>   }
>>   
>>   static int
>> @@ -2337,15 +2388,15 @@ drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>>   		 struct drm_gem_object *req_obj, u64 req_offset)
>>   {
>>   	const struct drm_gpuvm_ops *ops = gpuvm->ops;
>> +	enum drm_gpuvm_sm_map_ops_flags flags = DRM_GPUVM_SM_MAP_NOT_MADVISE;
>>   
>>   	if (unlikely(!(ops && ops->sm_step_map &&
>>   		       ops->sm_step_remap &&
>>   		       ops->sm_step_unmap)))
>>   		return -EINVAL;
>>   
>> -	return __drm_gpuvm_sm_map(gpuvm, ops, priv,
>> -				  req_addr, req_range,
>> -				  req_obj, req_offset);
>> +	return __drm_gpuvm_sm_map(gpuvm, ops, priv, req_addr, req_range,
>> +				  flags, req_obj, req_offset);
>>   }
>>   EXPORT_SYMBOL_GPL(drm_gpuvm_sm_map);
>>   
>> @@ -2487,6 +2538,7 @@ static const struct drm_gpuvm_ops gpuvm_list_ops = {
>>    * @gpuvm: the &drm_gpuvm representing the GPU VA space
>>    * @req_addr: the start address of the new mapping
>>    * @req_range: the range of the new mapping
>> + * @drm_gpuvm_sm_map_ops_flag: ops flag determining madvise or not
>>    * @req_obj: the &drm_gem_object to map
>>    * @req_offset: the offset within the &drm_gem_object
>>    *
>> @@ -2517,6 +2569,7 @@ static const struct drm_gpuvm_ops gpuvm_list_ops = {
>>   struct drm_gpuva_ops *
>>   drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm,
>>   			    u64 req_addr, u64 req_range,
>> +			    enum drm_gpuvm_sm_map_ops_flags flags,
>>   			    struct drm_gem_object *req_obj, u64 req_offset)
>>   {
>>   	struct drm_gpuva_ops *ops;
>> @@ -2536,7 +2589,7 @@ drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm,
>>   	args.ops = ops;
>>   
>>   	ret = __drm_gpuvm_sm_map(gpuvm, &gpuvm_list_ops, &args,
>> -				 req_addr, req_range,
>> +				 req_addr, req_range, flags,
>>   				 req_obj, req_offset);
>>   	if (ret)
>>   		goto err_free_ops;
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
>> index 48f105239f42..26e13fcdbdb8 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
>> @@ -1303,6 +1303,7 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job,
>>   			op->ops = drm_gpuvm_sm_map_ops_create(&uvmm->base,
>>   							      op->va.addr,
>>   							      op->va.range,
>> +							      DRM_GPUVM_SM_MAP_NOT_MADVISE,
>>   							      op->gem.obj,
>>   							      op->gem.offset);
>>   			if (IS_ERR(op->ops)) {
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index 2035604121e6..b2ed99551b6e 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -2318,6 +2318,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_vma_ops *vops,
>>   	case DRM_XE_VM_BIND_OP_MAP:
>>   	case DRM_XE_VM_BIND_OP_MAP_USERPTR:
>>   		ops = drm_gpuvm_sm_map_ops_create(&vm->gpuvm, addr, range,
>> +						  DRM_GPUVM_SM_MAP_NOT_MADVISE,
>>   						  obj, bo_offset_or_userptr);
>>   		break;
>>   	case DRM_XE_VM_BIND_OP_UNMAP:
>> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
>> index 2a9629377633..c589b886a4fd 100644
>> --- a/include/drm/drm_gpuvm.h
>> +++ b/include/drm/drm_gpuvm.h
>> @@ -211,6 +211,27 @@ enum drm_gpuvm_flags {
>>   	DRM_GPUVM_USERBITS = BIT(1),
>>   };
>>   
>> +/**
>> + * enum drm_gpuvm_sm_map_ops_flags - flags for drm_gpuvm split/merge ops
>> + */
>> +enum drm_gpuvm_sm_map_ops_flags {
>> +	/**
>> +	 * @DRM_GPUVM_SM_MAP_NOT_MADVISE: DEFAULT sm_map ops
>> +	 */
>> +	DRM_GPUVM_SM_MAP_NOT_MADVISE = 0,
>> +
>> +	/**
>> +	 * @DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
>> +	 * drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
>> +	 * user-provided range and split the existing non-GEM object VMA if the
>> +	 * start or end of the input range lies within it. The operations can
>> +	 * create up to 2 REMAPS and 2 MAPs. Unlike drm_gpuvm_sm_map_ops_flags
>> +	 * in default mode, the operation with this flag will never have UNMAPs and
>> +	 * merges, and can be without any final operations.
>> +	 */
>> +	DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE = BIT(0),
>> +};
>> +
>>   /**
>>    * struct drm_gpuvm - DRM GPU VA Manager
>>    *
>> @@ -1059,8 +1080,8 @@ struct drm_gpuva_ops {
>>   #define drm_gpuva_next_op(op) list_next_entry(op, entry)
>>   
>>   struct drm_gpuva_ops *
>> -drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm,
>> -			    u64 addr, u64 range,
>> +drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm, u64 addr, u64 range,
>> +			    enum drm_gpuvm_sm_map_ops_flags flags,
>>   			    struct drm_gem_object *obj, u64 offset);
>>   struct drm_gpuva_ops *
>>   drm_gpuvm_sm_unmap_ops_create(struct drm_gpuvm *gpuvm,
>> -- 
>> 2.34.1
>>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops
  2025-07-24 10:32       ` Caterina Shablia
@ 2025-07-28 10:20         ` Ghimiray, Himal Prasad
  0 siblings, 0 replies; 55+ messages in thread
From: Ghimiray, Himal Prasad @ 2025-07-28 10:20 UTC (permalink / raw)
  To: Caterina Shablia, Danilo Krummrich, Matthew Brost
  Cc: intel-xe, Thomas Hellström, Danilo Krummrich,
	Boris Brezillon, dri-devel



On 24-07-2025 16:02, Caterina Shablia wrote:
> On Thursday, 24 July 2025 2:43:56 (Central European Summer Time),
> Matthew Brost wrote:
>> On Tue, Jul 22, 2025 at 03:38:14PM +0200, Danilo Krummrich wrote:
>>> (Cc: Caterina)
>>>
>>> On Tue Jul 22, 2025 at 3:35 PM CEST, Himal Prasad Ghimiray wrote:
>>>> - DRM_GPUVM_SM_MAP_NOT_MADVISE: Default sm_map operations for the input
>>>>
>>>>    range.
>>>>
>>>> - DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
>>>>
>>>>    drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
>>>>
>>>> user-provided range and split the existing non-GEM object VMA if the
>>>> start or end of the input range lies within it. The operations can
>>>> create up to 2 REMAPS and 2 MAPs. The purpose of this operation is to be
>>>> used by the Xe driver to assign attributes to GPUVMA's within the
>>>> user-defined range. Unlike drm_gpuvm_sm_map_ops_flags in default mode,
>>>> the operation with this flag will never have UNMAPs and
>>>> merges, and can be without any final operations.
>>>>
>>>> v2
>>>> - use drm_gpuvm_sm_map_ops_create with flags instead of defining new
>>>>
>>>>    ops_create (Danilo)
>>>>
>>>> - Add doc (Danilo)
>>>>
>>>> v3
>>>> - Fix doc
>>>> - Fix unmapping check
>>>>
>>>> v4
>>>> - Fix mapping for non madvise ops
>>>>
>>>> Cc: Danilo Krummrich <dakr@redhat.com>
>>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>>> Cc: Boris Brezillon <bbrezillon@kernel.org>
>>>> Cc: <dri-devel@lists.freedesktop.org>
>>>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>>> ---
>>>>
>>>>   drivers/gpu/drm/drm_gpuvm.c            | 93 ++++++++++++++++++++------
>>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c |  1 +
>>>>   drivers/gpu/drm/xe/xe_vm.c             |  1 +
>>>
>>> What about the other drivers using GPUVM, aren't they affected by the
>>> changes?
>> Yes, this would seemingly break the build for other users. If the baseline
>> includes the patch below that I suggest pulling in, this is a moot point
>> though.
>>
>>>>   include/drm/drm_gpuvm.h                | 25 ++++++-
>>>>   4 files changed, 98 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>>>> index e89b932e987c..c7779588ea38 100644
>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>> @@ -2103,10 +2103,13 @@ static int
>>>>
>>>>   __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>>>>   
>>>>   		   const struct drm_gpuvm_ops *ops, void *priv,
>>>>   		   u64 req_addr, u64 req_range,
>>>>
>>>> +		   enum drm_gpuvm_sm_map_ops_flags flags,
>>>
>>> Please coordinate with Boris and Caterina here. They're adding a new
>>> request structure, struct drm_gpuvm_map_req.
>>>
>>> I think we can define it as
>>>
>>> 	struct drm_gpuvm_map_req {
>>> 		struct drm_gpuva_op_map map;
>>> 		enum drm_gpuvm_sm_map_ops_flags flags;
>>> 	};
>>
>> +1, I see the patch [2] and the suggested change to drm_gpuva_op_map
>> [3]. Both patch and your suggestion look good to me.
>>
>> Perhaps we try to accelerate [2] landing ahead of either series, as overall
>> it just looks like a good cleanup which can be merged asap.
> I'm not sure my patchset would be in a mergeable state any time soon -- I've
> discovered some issues with split/merge of repeated mappings while writing the
> doc, so it will be a while before I submit that again. [2] itself is
> in good shape, so absolutely feel free to submit that as part of your series.

Thanks for the confirmation. Will update next rev accordingly.
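
For reference, a rough userspace sketch of how a request struct carrying
the flags could look. All names here are stand-ins for illustration (the
real struct drm_gpuva_op_map has more fields, and the final flag names
are still under discussion in this thread). It also tests the flag with
a bitwise AND, on the assumption that more flags may be added later:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-in for the kernel's struct drm_gpuva_op_map. */
struct op_map_sketch {
	uint64_t addr;
	uint64_t range;
};

/* Stand-in flags, modeled on the renames proposed in this thread. */
enum sm_map_ops_flags_sketch {
	SM_MAP_OPS_FLAG_NONE = 0,
	SM_MAP_OPS_FLAG_SPLIT_MADVISE = 1u << 0,
};

/* Stand-in for the suggested struct drm_gpuvm_map_req. */
struct map_req_sketch {
	struct op_map_sketch map;
	enum sm_map_ops_flags_sketch flags;
};

/* True if the madvise split behaviour is requested, whatever else is set. */
static bool req_is_madvise(const struct map_req_sketch *req)
{
	assert(req);

	return (req->flags & SM_MAP_OPS_FLAG_SPLIT_MADVISE) != 0;
}
```

An equality check on the whole flags word would stop matching as soon
as a second bit is set alongside the madvise one.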

>>
>> Himal - I'd rebase on top of [2], with Danilo's suggestion in [3], if this
>> hasn't landed by your next rev.
>>
>> [2]
>> https://lore.kernel.org/all/20250707170442.1437009-4-caterina.shablia@collabora.com/
>> [3]
>> https://lore.kernel.org/all/DB61N61AKIJ3.FG7GUJBG386P@kernel.org/
>>> eventually.
>>>
>>> Please also coordinate on the changes in __drm_gpuvm_sm_map() below
>>> regarding Caterina's series [1], it looks like they're conflicting.
>>
>> It looks pretty minor actually. I'm not sure it really matters who wins
>> this race, but yes, always good to coordinate.
>>
>>> [1]
>>> https://lore.kernel.org/all/20250707170442.1437009-1-caterina.shablia@col
>>> labora.com/>
>>>> +/**
>>>> + * enum drm_gpuvm_sm_map_ops_flags - flags for drm_gpuvm split/merge
>>>> ops
>>>> + */
>>>> +enum drm_gpuvm_sm_map_ops_flags {
>>>> +	/**
>>>> +	 * @DRM_GPUVM_SM_MAP_NOT_MADVISE: DEFAULT sm_map ops
>>>> +	 */
>>>> +	DRM_GPUVM_SM_MAP_NOT_MADVISE = 0,
>>>
>>> Why would we name this "NOT_MADVISE"? What if we add more flags for other
>>> purposes?
>>
>> How about...
>>
>> s/DRM_GPUVM_SM_MAP_NOT_MADVISE/DRM_GPUVM_SM_MAP_OPS_FLAG_NONE/
>>
>>>> +	/**
>>>> +	 * @DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE: This flag is used by
>>>> +	 * drm_gpuvm_sm_map_ops_create to iterate over GPUVMA's in the
>>>> +	 * user-provided range and split the existing non-GEM object VMA if the
>>>> +	 * start or end of the input range lies within it. The operations can
>>>> +	 * create up to 2 REMAPS and 2 MAPs. Unlike drm_gpuvm_sm_map_ops_flags
>>>> +	 * in default mode, the operation with this flag will never have UNMAPs and
>>>> +	 * merges, and can be without any final operations.
>>>> +	 */
>>>> +	DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE = BIT(0),
>>
>> Then normalize this one...
>>
>> s/DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE/DRM_GPUVM_SM_MAP_OPS_FLAG_SPLIT_MADVISE/
>>
>> Matt
>>
>>>> +};
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 02/23] drm/xe/uapi: Add madvise interface
  2025-07-22 13:35 ` [PATCH v5 02/23] drm/xe/uapi: Add madvise interface Himal Prasad Ghimiray
@ 2025-07-29  3:29   ` Matthew Brost
  0 siblings, 0 replies; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  3:29 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström

On Tue, Jul 22, 2025 at 07:05:05PM +0530, Himal Prasad Ghimiray wrote:
> This commit introduces a new madvise interface to support
> driver-specific ioctl operations. The madvise interface allows for more
> efficient memory management by providing hints to the driver about the
> expected memory usage and pte update policy for gpuvma.
> 
> v2 (Matthew/Thomas)
> - Drop num_ops support
> - Drop purgeable support
> - Add kernel-docs
> - IOWR/IOW
> 
> v3 (Matthew/Thomas)
> - Reorder attributes
> - use __u16 for migration_policy
> - use __u64 for reserved in unions
> - Avoid usage of vma
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  include/uapi/drm/xe_drm.h | 131 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 131 insertions(+)
> 
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index e2426413488f..51dcf63684b0 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -81,6 +81,7 @@ extern "C" {
>   *  - &DRM_IOCTL_XE_EXEC
>   *  - &DRM_IOCTL_XE_WAIT_USER_FENCE
>   *  - &DRM_IOCTL_XE_OBSERVATION
> + *  - &DRM_IOCTL_XE_MADVISE
>   */
>  
>  /*
> @@ -102,6 +103,7 @@ extern "C" {
>  #define DRM_XE_EXEC			0x09
>  #define DRM_XE_WAIT_USER_FENCE		0x0a
>  #define DRM_XE_OBSERVATION		0x0b
> +#define DRM_XE_MADVISE			0x0c
>  
>  /* Must be kept compact -- no holes */
>  
> @@ -117,6 +119,7 @@ extern "C" {
>  #define DRM_IOCTL_XE_EXEC			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC, struct drm_xe_exec)
>  #define DRM_IOCTL_XE_WAIT_USER_FENCE		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
>  #define DRM_IOCTL_XE_OBSERVATION		DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
> +#define DRM_IOCTL_XE_MADVISE			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)
>  
>  /**
>   * DOC: Xe IOCTL Extensions
> @@ -1974,6 +1977,134 @@ struct drm_xe_query_eu_stall {
>  	__u64 sampling_rates[];
>  };
>  
> +/**
> + * struct drm_xe_madvise - Input of &DRM_IOCTL_XE_MADVISE
> + *
> + * This structure is used to set memory attributes for a virtual address range
> + * in a VM. The type of attribute is specified by @type, and the corresponding
> + * union member is used to provide additional parameters for @type.
> + *
> + * Supported attribute types:
> + * - DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC: Set preferred memory location.
> + * - DRM_XE_MEM_RANGE_ATTR_ATOMIC: Set atomic access policy.
> + * - DRM_XE_MEM_RANGE_ATTR_PAT: Set page attribute table index.
> + *
> + * Example:
> + *
> + * .. code-block:: C
> + *
> + * struct drm_xe_madvise madvise = {
> + *          .vm_id = vm_id,
> + *          .start = 0x100000,
> + *          .range = 0x2000,
> + *          .type = DRM_XE_MEM_RANGE_ATTR_ATOMIC,
> + *          .atomic_val = DRM_XE_ATOMIC_DEVICE,
> + *          .pad = 0,

Nit, you don't need to show init of zero fields.

Nit aside, uAPI looks good to me.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
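
As an illustration of the nit, a minimal userspace sketch that mirrors
the layout proposed in this patch with stdint types and initializes only
the non-zero fields. make_atomic_madvise() is a hypothetical helper, not
part of the uAPI. Note that the kernel-doc example spells the field as
.atomic_val, whereas with the struct as proposed the designator would be
.atomic.val:

```c
#include <assert.h>
#include <stdint.h>

#define DRM_XE_MEM_RANGE_ATTR_ATOMIC	1
#define DRM_XE_ATOMIC_DEVICE		1

/* Userspace copy of the layout from this patch, for illustration only. */
struct drm_xe_madvise {
	uint64_t extensions;
	uint64_t start;
	uint64_t range;
	uint32_t vm_id;
	uint32_t type;
	union {
		struct {
			uint32_t devmem_fd;
			uint16_t migration_policy;
			uint16_t pad;
			uint64_t reserved;
		} preferred_mem_loc;
		struct {
			uint32_t val;
			uint32_t pad;
			uint64_t reserved;
		} atomic;
		struct {
			uint32_t val;
			uint32_t pad;
			uint64_t reserved;
		} pat_index;
	};
	uint64_t reserved[2];
};

/* 32 bytes of header, a 16-byte union, 16 bytes reserved. */
static_assert(sizeof(struct drm_xe_madvise) == 64, "unexpected layout");

/* Hypothetical helper: members not named in the initializer are zeroed. */
static struct drm_xe_madvise make_atomic_madvise(uint32_t vm_id,
						 uint64_t start, uint64_t range)
{
	struct drm_xe_madvise madvise = {
		.vm_id = vm_id,
		.start = start,
		.range = range,
		.type = DRM_XE_MEM_RANGE_ATTR_ATOMIC,
		.atomic.val = DRM_XE_ATOMIC_DEVICE,
	};

	return madvise;
}
```

The result would then be passed to ioctl(fd, DRM_IOCTL_XE_MADVISE, &madvise)
as in the kernel-doc example; pad and reserved need no explicit zeroing.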

> + *         };
> + *
> + * ioctl(fd, DRM_IOCTL_XE_MADVISE, &madvise);
> + *
> + */
> +struct drm_xe_madvise {
> +	/** @extensions: Pointer to the first extension struct, if any */
> +	__u64 extensions;
> +
> +	/** @start: start of the virtual address range */
> +	__u64 start;
> +
> +	/** @range: size of the virtual address range */
> +	__u64 range;
> +
> +	/** @vm_id: vm_id of the virtual range */
> +	__u32 vm_id;
> +
> +#define DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC	0
> +#define DRM_XE_MEM_RANGE_ATTR_ATOMIC		1
> +#define DRM_XE_MEM_RANGE_ATTR_PAT		2
> +	/** @type: type of attribute */
> +	__u32 type;
> +
> +	union {
> +		/**
> +		 * @preferred_mem_loc: preferred memory location
> +		 *
> +		 * Used when @type == DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC
> +		 *
> +		 * Supported values for @preferred_mem_loc.devmem_fd:
> +		 * - DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE: set vram of faulting tile as preferred loc
> +		 * - DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM: set smem as preferred loc
> +		 *
> +		 * Supported values for @preferred_mem_loc.migration_policy:
> +		 * - DRM_XE_MIGRATE_ALL_PAGES
> +		 * - DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES
> +		 */
> +		struct {
> +#define DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE	0
> +#define DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM	-1
> +			/** @preferred_mem_loc.devmem_fd: fd for preferred loc */
> +			__u32 devmem_fd;
> +
> +#define DRM_XE_MIGRATE_ALL_PAGES		0
> +#define DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES	1
> +			/** @preferred_mem_loc.migration_policy: Page migration policy */
> +			__u16 migration_policy;
> +
> +			/** @preferred_mem_loc.pad : MBZ */
> +			__u16 pad;
> +
> +			/** @preferred_mem_loc.reserved : Reserved */
> +			__u64 reserved;
> +		} preferred_mem_loc;
> +
> +		/**
> +		 * @atomic: Atomic access policy
> +		 *
> +		 * Used when @type == DRM_XE_MEM_RANGE_ATTR_ATOMIC.
> +		 *
> +		 * Supported values for @atomic.val:
> +		 * - DRM_XE_ATOMIC_UNDEFINED: Undefined or default behaviour
> +		 *   Support both GPU and CPU atomic operations for system allocator
> +		 *   Support GPU atomic operations for normal(bo) allocator
> +		 * - DRM_XE_ATOMIC_DEVICE: Support GPU atomic operations
> +		 * - DRM_XE_ATOMIC_GLOBAL: Support both GPU and CPU atomic operations
> +		 * - DRM_XE_ATOMIC_CPU: Support CPU atomic
> +		 */
> +		struct {
> +#define DRM_XE_ATOMIC_UNDEFINED	0
> +#define DRM_XE_ATOMIC_DEVICE	1
> +#define DRM_XE_ATOMIC_GLOBAL	2
> +#define DRM_XE_ATOMIC_CPU	3
> +			/** @atomic.val: value of atomic operation */
> +			__u32 val;
> +
> +			/** @atomic.pad: MBZ */
> +			__u32 pad;
> +
> +			/** @atomic.reserved: Reserved */
> +			__u64 reserved;
> +		} atomic;
> +
> +		/**
> +		 * @pat_index: Page attribute table index
> +		 *
> +		 * Used when @type == DRM_XE_MEM_RANGE_ATTR_PAT.
> +		 */
> +		struct {
> +			/** @pat_index.val: PAT index value */
> +			__u32 val;
> +
> +			/** @pat_index.pad: MBZ */
> +			__u32 pad;
> +
> +			/** @pat_index.reserved: Reserved */
> +			__u64 reserved;
> +		} pat_index;
> +	};
> +
> +	/** @reserved: Reserved */
> +	__u64 reserved[2];
> +};
> +
>  #if defined(__cplusplus)
>  }
>  #endif
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 08/23] drm/xe: Allow CPU address mirror VMA unbind with gpu bindings for madvise
  2025-07-22 13:35 ` [PATCH v5 08/23] drm/xe: Allow CPU address mirror VMA unbind with gpu bindings for madvise Himal Prasad Ghimiray
@ 2025-07-29  3:40   ` Matthew Brost
  2025-07-29  7:42     ` Ghimiray, Himal Prasad
  0 siblings, 1 reply; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  3:40 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström

On Tue, Jul 22, 2025 at 07:05:11PM +0530, Himal Prasad Ghimiray wrote:
> In the case of the MADVISE ioctl, if the start or end addresses fall
> within a VMA and existing SVM ranges are present, remove the existing
> SVM mappings. Then, continue with ops_parse to create new VMAs by REMAP
> unmapping of old one.
> 
> v2 (Matthew Brost)
> - Use vops flag to call unmapping of ranges in vm_bind_ioctl_ops_parse
> - Rename the function
> 
> v3
> - Fix doc
> 
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_svm.c | 28 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_svm.h |  7 +++++++
>  drivers/gpu/drm/xe/xe_vm.c  |  8 ++++++--
>  3 files changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index a7ff5975873f..ce8a71b80811 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -933,6 +933,34 @@ bool xe_svm_has_mapping(struct xe_vm *vm, u64 start, u64 end)
>  	return drm_gpusvm_has_mapping(&vm->svm.gpusvm, start, end);
>  }
>  
> +/**
> + * xe_svm_unmap_address_range - UNMAP SVM mappings and ranges
> + * @vm: The VM
> + * @start: start addr
> + * @end: end addr
> + *
> + * This function UNMAPS svm ranges if start or end address are inside them.
> + */
> +void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end)
> +{
> +	struct drm_gpusvm_notifier *notifier, *next;
> +
> +	lockdep_assert_held_write(&vm->lock);
> +
> +	drm_gpusvm_for_each_notifier_safe(notifier, next, &vm->svm.gpusvm, start, end) {
> +		struct drm_gpusvm_range *range, *__next;
> +
> +		drm_gpusvm_for_each_range_safe(range, __next, notifier, start, end) {
> +			if (start > drm_gpusvm_range_start(range) ||
> +			    end < drm_gpusvm_range_end(range)) {
> +				if (IS_DGFX(vm->xe) && xe_svm_range_in_vram(to_xe_range(range)))
> +					drm_gpusvm_range_evict(&vm->svm.gpusvm, range);
> +				__xe_svm_garbage_collector(vm, to_xe_range(range));

There is a corner case here: the range could already be on the garbage
collector list.

I think to fix it you have to do this:

drm_gpusvm_range_get(range);
__xe_svm_garbage_collector(vm, to_xe_range(range));
if (!list_empty(&to_xe_range(range)->garbage_collector_link)) {
	spin_lock(&vm->svm.garbage_collector.list_lock);
	list_del(&to_xe_range(range)->garbage_collector_link);	
	spin_unlock(&vm->svm.garbage_collector.list_lock);
}
drm_gpusvm_range_put(range);

A little convoluted as it is only safe to check if the range is in the
garbage collector list after it has been removed from the notifier,
hence the need for extra ref counting here.

Also, I believe this code path will need a dedicated IGT to test it.

Roughly...

buf = aligned_alloc(SZ_2M, SZ_2M);
fault_in_buf_on_gpu();
madvise(buf, SZ_1M, some attribute);
fault_in_buf_on_gpu();	/* Ideally showing different behavior between 2 chunks */
read_buf_back_via_cpu();

Matt
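
The reference-counting argument above can be modelled with a toy
userspace sketch. None of these names are Xe or GPU SVM APIs, and it
assumes (as one reading of the comment above, not verified against the
driver) that the collector may drop the last reference to the range:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an SVM range. */
struct toy_range {
	int refcount;		/* starts at 1: the notifier's reference */
	bool in_notifier;
	bool on_gc_list;
	bool freed;		/* set instead of calling free(), so it can be inspected */
};

static void toy_get(struct toy_range *r)
{
	r->refcount++;
}

static void toy_put(struct toy_range *r)
{
	assert(r->refcount > 0);

	if (--r->refcount == 0)
		r->freed = true;
}

/* Models the collector: unlink from the notifier, drop its reference. */
static void toy_collect(struct toy_range *r)
{
	r->in_notifier = false;
	toy_put(r);
}

/* Returns true if the range was still alive when its list link was checked. */
static bool toy_unmap(struct toy_range *r, bool hold_ref)
{
	bool alive;

	if (hold_ref)
		toy_get(r);

	toy_collect(r);

	alive = !r->freed;
	if (alive && r->on_gc_list)
		r->on_gc_list = false;	/* list_del() under the list lock in the snippet above */

	if (hold_ref)
		toy_put(r);

	return alive;
}
```

In this model, without the extra get/put pair the range is already gone
by the time its list link would be inspected; with the pair, the check
and removal happen while the range is still alive.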

> +			}
> +		}
> +	}
> +}
> +
>  /**
>   * xe_svm_bo_evict() - SVM evict BO to system memory
>   * @bo: BO to evict
> diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
> index da9a69ea0bb1..754d56b4d255 100644
> --- a/drivers/gpu/drm/xe/xe_svm.h
> +++ b/drivers/gpu/drm/xe/xe_svm.h
> @@ -90,6 +90,8 @@ bool xe_svm_range_validate(struct xe_vm *vm,
>  
>  u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 addr, u64 end,  struct xe_vma *vma);
>  
> +void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end);
> +
>  /**
>   * xe_svm_range_has_dma_mapping() - SVM range has DMA mapping
>   * @range: SVM range
> @@ -303,6 +305,11 @@ u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 addr, u64 end, struct xe_vma *vm
>  	return ULONG_MAX;
>  }
>  
> +static inline
> +void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end)
> +{
> +}
> +
>  #define xe_svm_assert_in_notifier(...) do {} while (0)
>  #define xe_svm_range_has_dma_mapping(...) false
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index a56384325f4d..7f3d0ad04b3f 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -2663,8 +2663,12 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
>  				end = op->base.remap.next->va.addr;
>  
>  			if (xe_vma_is_cpu_addr_mirror(old) &&
> -			    xe_svm_has_mapping(vm, start, end))
> -				return -EBUSY;
> +			    xe_svm_has_mapping(vm, start, end)) {
> +				if (vops->flags & XE_VMA_OPS_FLAG_MADVISE)
> +					xe_svm_unmap_address_range(vm, start, end);
> +				else
> +					return -EBUSY;
> +			}
>  
>  			op->remap.start = xe_vma_start(old);
>  			op->remap.range = xe_vma_size(old);
> -- 
> 2.34.1
> 

* Re: [PATCH v5 09/23] drm/xe/svm: Add xe_svm_ranges_zap_ptes_in_range() for PTE zapping
  2025-07-22 13:35 ` [PATCH v5 09/23] drm/xe/svm: Add xe_svm_ranges_zap_ptes_in_range() for PTE zapping Himal Prasad Ghimiray
@ 2025-07-29  3:42   ` Matthew Brost
  0 siblings, 0 replies; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  3:42 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström

On Tue, Jul 22, 2025 at 07:05:12PM +0530, Himal Prasad Ghimiray wrote:
> Introduce xe_svm_ranges_zap_ptes_in_range(), a function to zap page table
> entries (PTEs) for all SVM ranges within a user-specified address range.
> 
> -v2 (Matthew Brost)
> Lock should be called even for tlb_invalidation
> 
> v3(Matthew Brost)
> - Update comment
> - s/notifier->itree.start/drm_gpusvm_notifier_start
> - s/notifier->itree.last + 1/drm_gpusvm_notifier_end
> - use WRITE_ONCE
> 
> Cc: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_pt.c  | 14 ++++++++++-
>  drivers/gpu/drm/xe/xe_svm.c | 50 +++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_svm.h |  8 ++++++
>  3 files changed, 71 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 1bf0cf81513c..b499006df2cf 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -950,7 +950,19 @@ bool xe_pt_zap_ptes_range(struct xe_tile *tile, struct xe_vm *vm,
>  	struct xe_pt *pt = vm->pt_root[tile->id];
>  	u8 pt_mask = (range->tile_present & ~range->tile_invalidated);
>  
> -	xe_svm_assert_in_notifier(vm);
> +	/*
> +	 * Locking rules:
> +	 *
> +	 * - notifier_lock (write): full protection against page table changes
> +	 *   and MMU notifier invalidations.
> +	 *
> +	 * - notifier_lock (read) + vm_lock (write): combined protection against
> +	 *   invalidations and concurrent page table modifications. (e.g., madvise)
> +	 *
> +	 */
> +	lockdep_assert(lockdep_is_held_type(&vm->svm.gpusvm.notifier_lock, 0) ||
> +		       (lockdep_is_held_type(&vm->svm.gpusvm.notifier_lock, 1) &&
> +		       lockdep_is_held_type(&vm->lock, 0)));
>  
>  	if (!(pt_mask & BIT(tile->id)))
>  		return false;
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index ce8a71b80811..c093dc453e32 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -1025,6 +1025,56 @@ int xe_svm_range_get_pages(struct xe_vm *vm, struct xe_svm_range *range,
>  	return err;
>  }
>  
> +/**
> + * xe_svm_ranges_zap_ptes_in_range - clear ptes of svm ranges in input range
> + * @vm: Pointer to the xe_vm structure
> + * @start: Start of the input range
> + * @end: End of the input range
> + *
> + * This function removes the page table entries (PTEs) associated
> + * with the svm ranges within the given input start and end
> + *
> + * Return: tile_mask for which gt's need to be tlb invalidated.
> + */
> +u8 xe_svm_ranges_zap_ptes_in_range(struct xe_vm *vm, u64 start, u64 end)
> +{
> +	struct drm_gpusvm_notifier *notifier;
> +	struct xe_svm_range *range;
> +	u64 adj_start, adj_end;
> +	struct xe_tile *tile;
> +	u8 tile_mask = 0;
> +	u8 id;
> +
> +	lockdep_assert(lockdep_is_held_type(&vm->svm.gpusvm.notifier_lock, 1) &&
> +		       lockdep_is_held_type(&vm->lock, 0));
> +
> +	drm_gpusvm_for_each_notifier(notifier, &vm->svm.gpusvm, start, end) {
> +		struct drm_gpusvm_range *r = NULL;
> +
> +		adj_start = max(start, drm_gpusvm_notifier_start(notifier));
> +		adj_end = min(end, drm_gpusvm_notifier_end(notifier));
> +		drm_gpusvm_for_each_range(r, notifier, adj_start, adj_end) {
> +			range = to_xe_range(r);
> +			for_each_tile(tile, vm->xe, id) {
> +				if (xe_pt_zap_ptes_range(tile, vm, range)) {
> +					tile_mask |= BIT(id);
> +					/*
> +					 * WRITE_ONCE pairs with READ_ONCE in
> +					 * xe_vm_has_valid_gpu_mapping().
> +					 * Must not fail after setting
> +					 * tile_invalidated and before
> +					 * TLB invalidation.
> +					 */
> +					WRITE_ONCE(range->tile_invalidated,
> +						   range->tile_invalidated | BIT(id));
> +				}
> +			}
> +		}
> +	}
> +
> +	return tile_mask;
> +}
> +
>  #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
>  
>  /**
> diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
> index 754d56b4d255..b0da0e85f0b8 100644
> --- a/drivers/gpu/drm/xe/xe_svm.h
> +++ b/drivers/gpu/drm/xe/xe_svm.h
> @@ -92,6 +92,8 @@ u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 addr, u64 end,  struct xe_vma *v
>  
>  void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end);
>  
> +u8 xe_svm_ranges_zap_ptes_in_range(struct xe_vm *vm, u64 start, u64 end);
> +
>  /**
>   * xe_svm_range_has_dma_mapping() - SVM range has DMA mapping
>   * @range: SVM range
> @@ -310,6 +312,12 @@ void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end)
>  {
>  }
>  
> +static inline
> +u8 xe_svm_ranges_zap_ptes_in_range(struct xe_vm *vm, u64 start, u64 end)
> +{
> +	return 0;
> +}
> +
>  #define xe_svm_assert_in_notifier(...) do {} while (0)
>  #define xe_svm_range_has_dma_mapping(...) false
>  
> -- 
> 2.34.1
> 

* Re: [PATCH v5 10/23] drm/xe: Implement madvise ioctl for xe
  2025-07-22 13:35 ` [PATCH v5 10/23] drm/xe: Implement madvise ioctl for xe Himal Prasad Ghimiray
@ 2025-07-29  3:52   ` Matthew Brost
  2025-07-29  4:23     ` Matthew Brost
  0 siblings, 1 reply; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  3:52 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström, Shuicheng Lin

On Tue, Jul 22, 2025 at 07:05:13PM +0530, Himal Prasad Ghimiray wrote:
> This driver-specific ioctl enables UMDs to control the memory attributes
> for GPU VMAs within a specified input range. If the start or end
> addresses fall within an existing VMA, the VMA is split accordingly. The
> attributes of the VMA are modified as provided by the users. The old
> mappings of the VMAs are invalidated, and TLB invalidation is performed
> if necessary.
> 
> v2(Matthew Brost)
> - xe_vm_in_fault_mode can't be enabled by Mesa, hence allow ioctl in non
> fault mode too
> - fix tlb invalidation skip for same ranges in multiple op
> - use helper for tlb invalidation
> - use xe_svm_notifier_lock/unlock helper
> - s/lockdep_assert_held/lockdep_assert_held_write
> - Add kernel-doc
> 
> v3(Matthew Brost)
> - make vfunc fail safe
> - Add sanitizing input args before vfunc
> 
> v4(Matthew Brost/Shuicheng)
> - Make locks interruptible
> - Error handling fixes
> - vm_put fixes
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Shuicheng Lin <shuicheng.lin@intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/Makefile        |   1 +
>  drivers/gpu/drm/xe/xe_vm_madvise.c | 306 +++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_vm_madvise.h |  15 ++
>  3 files changed, 322 insertions(+)
>  create mode 100644 drivers/gpu/drm/xe/xe_vm_madvise.c
>  create mode 100644 drivers/gpu/drm/xe/xe_vm_madvise.h
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 83a36c47a2f9..fa52866bb72c 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -125,6 +125,7 @@ xe-y += xe_bb.o \
>  	xe_uc.o \
>  	xe_uc_fw.o \
>  	xe_vm.o \
> +	xe_vm_madvise.o \
>  	xe_vram.o \
>  	xe_vram_freq.o \
>  	xe_vsec.o \
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> new file mode 100644
> index 000000000000..f64728120d7c
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include "xe_vm_madvise.h"
> +
> +#include <linux/nospec.h>
> +#include <drm/xe_drm.h>
> +
> +#include "xe_bo.h"
> +#include "xe_pt.h"
> +#include "xe_svm.h"
> +
> +struct xe_vmas_in_madvise_range {
> +	u64 addr;
> +	u64 range;
> +	struct xe_vma **vmas;
> +	int num_vmas;
> +	bool has_svm_vmas;
> +	bool has_bo_vmas;
> +	bool has_userptr_vmas;
> +};
> +
> +static int get_vmas(struct xe_vm *vm, struct xe_vmas_in_madvise_range *madvise_range)
> +{
> +	u64 addr = madvise_range->addr;
> +	u64 range = madvise_range->range;
> +
> +	struct xe_vma  **__vmas;
> +	struct drm_gpuva *gpuva;
> +	int max_vmas = 8;
> +
> +	lockdep_assert_held(&vm->lock);
> +
> +	madvise_range->num_vmas = 0;
> +	madvise_range->vmas = kmalloc_array(max_vmas, sizeof(*madvise_range->vmas), GFP_KERNEL);
> +	if (!madvise_range->vmas)
> +		return -ENOMEM;
> +
> +	vm_dbg(&vm->xe->drm, "VMA's in range: start=0x%016llx, end=0x%016llx", addr, addr + range);
> +
> +	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, addr, addr + range) {
> +		struct xe_vma *vma = gpuva_to_vma(gpuva);
> +
> +		if (xe_vma_bo(vma))
> +			madvise_range->has_bo_vmas = true;
> +		else if (xe_vma_is_cpu_addr_mirror(vma))
> +			madvise_range->has_svm_vmas = true;
> +		else if (xe_vma_is_userptr(vma))
> +			madvise_range->has_userptr_vmas = true;
> +
> +		if (madvise_range->num_vmas == max_vmas) {
> +			max_vmas <<= 1;
> +			__vmas = krealloc(madvise_range->vmas,
> +					  max_vmas * sizeof(*madvise_range->vmas),
> +					  GFP_KERNEL);
> +			if (!__vmas) {
> +				kfree(madvise_range->vmas);
> +				return -ENOMEM;
> +			}
> +			madvise_range->vmas = __vmas;
> +		}
> +
> +		madvise_range->vmas[madvise_range->num_vmas] = vma;
> +		(madvise_range->num_vmas)++;
> +	}
> +
> +	if (!madvise_range->num_vmas)
> +		kfree(madvise_range->vmas);
> +
> +	vm_dbg(&vm->xe->drm, "madvise_range-num_vmas = %d\n", madvise_range->num_vmas);
> +
> +	return 0;
> +}
> +
> +static void madvise_preferred_mem_loc(struct xe_device *xe, struct xe_vm *vm,
> +				      struct xe_vma **vmas, int num_vmas,
> +				      struct drm_xe_madvise *op)
> +{
> +	/* Implementation pending */
> +}
> +
> +static void madvise_atomic(struct xe_device *xe, struct xe_vm *vm,
> +			   struct xe_vma **vmas, int num_vmas,
> +			   struct drm_xe_madvise *op)
> +{
> +	/* Implementation pending */
> +}
> +
> +static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
> +			      struct xe_vma **vmas, int num_vmas,
> +			      struct drm_xe_madvise *op)
> +{
> +	/* Implementation pending */
> +}
> +
> +typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
> +			     struct xe_vma **vmas, int num_vmas,
> +			     struct drm_xe_madvise *op);
> +
> +static const madvise_func madvise_funcs[] = {
> +	[DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC] = madvise_preferred_mem_loc,
> +	[DRM_XE_MEM_RANGE_ATTR_ATOMIC] = madvise_atomic,
> +	[DRM_XE_MEM_RANGE_ATTR_PAT] = madvise_pat_index,
> +};
> +
> +static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
> +{
> +	struct drm_gpuva *gpuva;
> +	struct xe_tile *tile;
> +	u8 id, tile_mask;
> +
> +	lockdep_assert_held_write(&vm->lock);
> +
> +	/* Wait for pending binds */
> +	if (dma_resv_wait_timeout(xe_vm_resv(vm), DMA_RESV_USAGE_BOOKKEEP,
> +				  false, MAX_SCHEDULE_TIMEOUT) <= 0)
> +		XE_WARN_ON(1);
> +
> +	tile_mask = xe_svm_ranges_zap_ptes_in_range(vm, start, end);
> +
> +	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, start, end) {
> +		struct xe_vma *vma = gpuva_to_vma(gpuva);
> +
> +		if (xe_vma_is_cpu_addr_mirror(vma))

I think this should be:

xe_vma_is_cpu_addr_mirror(vma) || xe_vma_is_null(vma)

There is no need to invalidate NULL VMAs, as those mappings are not
modified by madvise (i.e., you can call madvise on NULL mappings, but it
doesn't actually do anything).

> +			continue;
> +
> +		for_each_tile(tile, vm->xe, id) {
> +			if (xe_pt_zap_ptes(tile, vma)) {
> +				tile_mask |= BIT(id);
> +
> +				/*
> +				 * WRITE_ONCE pairs with READ_ONCE
> +				 * in xe_vm_has_valid_gpu_mapping()
> +				 */
> +				WRITE_ONCE(vma->tile_invalidated,
> +					   vma->tile_invalidated | BIT(id));
> +			}
> +		}
> +	}
> +
> +	return tile_mask;
> +}
> +
> +static int xe_vm_invalidate_madvise_range(struct xe_vm *vm, u64 start, u64 end)
> +{
> +	u8 tile_mask = xe_zap_ptes_in_madvise_range(vm, start, end);
> +
> +	if (!tile_mask)
> +		return 0;
> +
> +	xe_device_wmb(vm->xe);
> +
> +	return xe_vm_range_tilemask_tlb_invalidation(vm, start, end, tile_mask);
> +}
> +
> +static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madvise *args)
> +{
> +	if (XE_IOCTL_DBG(xe, !args))
> +		return false;
> +
> +	if (XE_IOCTL_DBG(xe, !IS_ALIGNED(args->start, SZ_4K)))
> +		return false;
> +
> +	if (XE_IOCTL_DBG(xe, !IS_ALIGNED(args->range, SZ_4K)))
> +		return false;
> +
> +	if (XE_IOCTL_DBG(xe, args->range < SZ_4K))
> +		return false;
> +
> +	switch (args->type) {
> +	case DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC:
> +		if (XE_IOCTL_DBG(xe, args->preferred_mem_loc.migration_policy >
> +				     DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES))
> +			return false;
> +
> +		if (XE_IOCTL_DBG(xe, args->preferred_mem_loc.pad))
> +			return false;
> +
> +		if (XE_IOCTL_DBG(xe, args->atomic.reserved))
> +			return false;
> +		break;
> +	case DRM_XE_MEM_RANGE_ATTR_ATOMIC:
> +		if (XE_IOCTL_DBG(xe, args->atomic.val > DRM_XE_ATOMIC_CPU))
> +			return false;
> +
> +		if (XE_IOCTL_DBG(xe, args->atomic.pad))
> +			return false;
> +
> +		if (XE_IOCTL_DBG(xe, args->atomic.reserved))
> +			return false;
> +
> +		break;
> +	case DRM_XE_MEM_RANGE_ATTR_PAT:
> +		/*TODO: Add valid pat check */
> +		break;
> +	default:
> +		if (XE_IOCTL_DBG(xe, 1))
> +			return false;
> +	}
> +
> +	if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> +		return false;
> +
> +	return true;
> +}
> +
> +/**
> + * xe_vm_madvise_ioctl - Handle MADVise ioctl for a VM
> + * @dev: DRM device pointer
> + * @data: Pointer to ioctl data (drm_xe_madvise*)
> + * @file: DRM file pointer
> + *
> + * Handles the MADVISE ioctl to provide memory advice for vma's within
> + * input range.
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> +{
> +	struct xe_device *xe = to_xe_device(dev);
> +	struct xe_file *xef = to_xe_file(file);
> +	struct drm_xe_madvise *args = data;
> +	struct xe_vmas_in_madvise_range madvise_range = {.addr = args->start,
> +							 .range =  args->range, };
> +	struct xe_vm *vm;
> +	struct drm_exec exec;
> +	int err, attr_type;
> +
> +	vm = xe_vm_lookup(xef, args->vm_id);
> +	if (XE_IOCTL_DBG(xe, !vm))
> +		return -EINVAL;
> +
> +	if (!madvise_args_are_sane(vm->xe, args)) {
> +		err = -EINVAL;
> +		goto put_vm;
> +	}
> +

Since this code can modify the ranges during a VMA split, I think you
need to ensure that all previously queued unmaps have completed.

So call xe_svm_flush(vm) before taking any locks.
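
The ordering can be sketched with a small single-threaded userspace
model. Everything here is hypothetical (model_vm, model_flush,
model_madvise), and it assumes the deferred unmap work needs the same
lock the ioctl takes, which is why the flush has to come first:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy VM state; these names are hypothetical and not kernel API. */
struct model_vm {
	int queued_unmaps;	/* deferred unmaps not yet processed */
	bool locked;		/* models the VM lock being held */
};

/*
 * Drains the deferred unmaps. Assumed to take the VM lock itself, so the
 * caller must not already hold it.
 */
static void model_flush(struct model_vm *vm)
{
	assert(!vm->locked);
	vm->locked = true;
	vm->queued_unmaps = 0;
	vm->locked = false;
}

/* Returns 0 on success, -1 if a split would race with a queued unmap. */
static int model_split_ranges(struct model_vm *vm)
{
	assert(vm->locked);
	return vm->queued_unmaps ? -1 : 0;
}

static int model_madvise(struct model_vm *vm)
{
	int err;

	model_flush(vm);	/* before any lock is taken */
	vm->locked = true;
	err = model_split_ranges(vm);
	vm->locked = false;
	return err;
}
```

In this model, moving the flush after the lock is taken trips the assert
in model_flush(), and dropping it makes the split step fail.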

Looks good otherwise.

Matt

> +	err = down_write_killable(&vm->lock);
> +	if (err)
> +		goto put_vm;
> +
> +	if (XE_IOCTL_DBG(xe, xe_vm_is_closed_or_banned(vm))) {
> +		err = -ENOENT;
> +		goto unlock_vm;
> +	}
> +
> +	err = xe_vm_alloc_madvise_vma(vm, args->start, args->range);
> +	if (err)
> +		goto unlock_vm;
> +
> +	err = get_vmas(vm, &madvise_range);
> +	if (err || !madvise_range.num_vmas)
> +		goto unlock_vm;
> +
> +	if (madvise_range.has_bo_vmas) {
> +		drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES | DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> +		drm_exec_until_all_locked(&exec) {
> +			for (int i = 0; i < madvise_range.num_vmas; i++) {
> +				struct xe_bo *bo = xe_vma_bo(madvise_range.vmas[i]);
> +
> +				if (!bo)
> +					continue;
> +				err = drm_exec_lock_obj(&exec, &bo->ttm.base);
> +				drm_exec_retry_on_contention(&exec);
> +				if (err)
> +					goto err_fini;
> +			}
> +		}
> +	}
> +
> +	if (madvise_range.has_userptr_vmas) {
> +		err = down_read_interruptible(&vm->userptr.notifier_lock);
> +		if (err)
> +			goto err_fini;
> +	}
> +
> +	if (madvise_range.has_svm_vmas) {
> +		err = down_read_interruptible(&vm->svm.gpusvm.notifier_lock);
> +		if (err)
> +			goto unlock_userptr;
> +	}
> +
> +	attr_type = array_index_nospec(args->type, ARRAY_SIZE(madvise_funcs));
> +	madvise_funcs[attr_type](xe, vm, madvise_range.vmas, madvise_range.num_vmas, args);
> +
> +	err = xe_vm_invalidate_madvise_range(vm, args->start, args->start + args->range);
> +
> +	if (madvise_range.has_svm_vmas)
> +		xe_svm_notifier_unlock(vm);
> +
> +unlock_userptr:
> +	if (madvise_range.has_userptr_vmas)
> +		up_read(&vm->userptr.notifier_lock);
> +err_fini:
> +	if (madvise_range.has_bo_vmas)
> +		drm_exec_fini(&exec);
> +	kfree(madvise_range.vmas);
> +	madvise_range.vmas = NULL;
> +unlock_vm:
> +	up_write(&vm->lock);
> +put_vm:
> +	xe_vm_put(vm);
> +	return err;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.h b/drivers/gpu/drm/xe/xe_vm_madvise.h
> new file mode 100644
> index 000000000000..b0e1fc445f23
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_VM_MADVISE_H_
> +#define _XE_VM_MADVISE_H_
> +
> +struct drm_device;
> +struct drm_file;
> +
> +int xe_vm_madvise_ioctl(struct drm_device *dev, void *data,
> +			struct drm_file *file);
> +
> +#endif
> -- 
> 2.34.1
> 

* Re: [PATCH v5 11/23] drm/xe/svm : Add svm ranges migration policy on atomic access
  2025-07-22 13:35 ` [PATCH v5 11/23] drm/xe/svm : Add svm ranges migration policy on atomic access Himal Prasad Ghimiray
@ 2025-07-29  4:04   ` Matthew Brost
  2025-07-30  4:59     ` Ghimiray, Himal Prasad
  0 siblings, 1 reply; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  4:04 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström

On Tue, Jul 22, 2025 at 07:05:14PM +0530, Himal Prasad Ghimiray wrote:
> If the platform does not support atomic access on system memory, and the
> ranges are in system memory, but the user requires atomic accesses on
> the VMA, then migrate the ranges to VRAM. Apply this policy for prefetch
> operations as well.
> 
> v2
> - Drop unnecessary vm_dbg
> 
> v3 (Matthew Brost)
> - fix atomic policy
> - prefetch shouldn't have any impact of atomic
> - bo can be accessed from vma, avoid duplicate parameter
> 
> v4 (Matthew Brost)
> - Remove TODO comment
> - Fix comment
> - Don't allow GPU atomic ops when the user sets the atomic attr to CPU
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_pt.c         | 23 +++++++++--------
>  drivers/gpu/drm/xe/xe_svm.c        |  2 +-
>  drivers/gpu/drm/xe/xe_vm.c         | 40 ++++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_vm.h         |  2 ++
>  drivers/gpu/drm/xe/xe_vm_madvise.c |  9 ++++++-
>  5 files changed, 64 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index b499006df2cf..96d0ffe8154e 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -640,28 +640,31 @@ static const struct xe_pt_walk_ops xe_pt_stage_bind_ops = {
>   *    - In all other cases device atomics will be disabled with AE=0 until an application
>   *      request differently using a ioctl like madvise.
>   */
> -static bool xe_atomic_for_vram(struct xe_vm *vm)
> +static bool xe_atomic_for_vram(struct xe_vm *vm, struct xe_vma *vma)
>  {
> +	if (vma->attr.atomic_access == DRM_XE_ATOMIC_CPU)
> +		return false;
> +
>  	return true;
>  }
>  
> -static bool xe_atomic_for_system(struct xe_vm *vm, struct xe_bo *bo)
> +static bool xe_atomic_for_system(struct xe_vm *vm, struct xe_vma *vma)
>  {
>  	struct xe_device *xe = vm->xe;
> +	struct xe_bo *bo = xe_vma_bo(vma);
>  
> -	if (!xe->info.has_device_atomics_on_smem)
> +	if (!xe->info.has_device_atomics_on_smem ||
> +	    vma->attr.atomic_access == DRM_XE_ATOMIC_CPU)
>  		return false;
>  
> +	if (vma->attr.atomic_access == DRM_XE_ATOMIC_DEVICE)
> +		return true;
> +
>  	/*
>  	 * If a SMEM+LMEM allocation is backed by SMEM, a device
>  	 * atomics will cause a gpu page fault and which then
>  	 * gets migrated to LMEM, bind such allocations with
>  	 * device atomics enabled.
> -	 *
> -	 * TODO: Revisit this. Perhaps add something like a
> -	 * fault_on_atomics_in_system UAPI flag.
> -	 * Note that this also prohibits GPU atomics in LR mode for
> -	 * userptr and system memory on DGFX.
>  	 */
>  	return (!IS_DGFX(xe) || (!xe_vm_in_lr_mode(vm) ||
>  				 (bo && xe_bo_has_single_placement(bo))));
> @@ -744,8 +747,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>  		goto walk_pt;
>  
>  	if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) {
> -		xe_walk.default_vram_pte = xe_atomic_for_vram(vm) ? XE_USM_PPGTT_PTE_AE : 0;
> -		xe_walk.default_system_pte = xe_atomic_for_system(vm, bo) ?
> +		xe_walk.default_vram_pte = xe_atomic_for_vram(vm, vma) ? XE_USM_PPGTT_PTE_AE : 0;
> +		xe_walk.default_system_pte = xe_atomic_for_system(vm, vma) ?
>  			XE_USM_PPGTT_PTE_AE : 0;
>  	}
>  
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index c093dc453e32..49d3405aacb9 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -813,7 +813,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>  			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP),
>  		.check_pages_threshold = IS_DGFX(vm->xe) &&
>  			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ? SZ_64K : 0,
> -		.devmem_only = atomic && IS_DGFX(vm->xe) &&
> +		.devmem_only = xe_vma_need_vram_for_atomic(vm->xe, vma, atomic) &&
>  			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP),
>  		.timeslice_ms = atomic && IS_DGFX(vm->xe) &&
>  			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ?
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 7f3d0ad04b3f..be51fcf322ec 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -4177,6 +4177,46 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
>  	kvfree(snap);
>  }
>  
> +/**
> + * xe_vma_need_vram_for_atomic - Check if VMA needs VRAM migration for atomic operations
> + * @xe: Pointer to the XE device structure
> + * @vma: Pointer to the virtual memory area (VMA) structure
> + * @is_atomic: In pagefault path and atomic operation
> + *
> + * This function determines whether the given VMA needs to be migrated to
> + * VRAM in order to do atomic GPU operation.
> + *
> + * Return: true if migration to VRAM is required, false otherwise.
> + */
> +bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool is_atomic)
> +{
> +	if (!IS_DGFX(xe))
> +		return false;
> +
> +	/*
> +	 * NOTE: The checks implemented here are platform-specific. For
> +	 * instance, on a device supporting CXL atomics, these would ideally
> +	 * work universally without additional handling.
> +	 */
> +	switch (vma->attr.atomic_access) {
> +	case DRM_XE_ATOMIC_DEVICE:
> +		return !xe->info.has_device_atomics_on_smem;

I think this should be is_atomic && !xe->info.has_device_atomics_on_smem;

We really only want strict migration if the fault is an atomic one.

> +
> +	case DRM_XE_ATOMIC_CPU:
> +		XE_WARN_ON(is_atomic);

I think we should NACK the fault if an atomic access occurs while
DRM_XE_ATOMIC_CPU is set, both for SVM ranges and BOs.

> +		return false;
> +
> +	case DRM_XE_ATOMIC_UNDEFINED:
> +		return is_atomic;
> +
> +	case DRM_XE_ATOMIC_GLOBAL:
> +		return true;

As above, I think this should return is_atomic, so that strict
migration is only enforced on atomic faults.
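
Folding the three suggestions above into one decision table gives roughly
the following. This is a userspace sketch only: the enum names and values
are hypothetical stand-ins for the uapi ones, model_dev is not a driver
structure, and checking the CPU case before the DGFX case is an
assumption of the sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the uapi atomic attribute values. */
enum model_atomic_access {
	MODEL_ATOMIC_UNDEFINED,
	MODEL_ATOMIC_DEVICE,
	MODEL_ATOMIC_GLOBAL,
	MODEL_ATOMIC_CPU,
};

enum model_fault_action {
	MODEL_FAULT_NO_MIGRATE,		/* no strict migration needed */
	MODEL_FAULT_MIGRATE_VRAM,	/* strict migration to VRAM */
	MODEL_FAULT_NACK,		/* reject the fault */
};

struct model_dev {
	bool is_dgfx;
	bool has_device_atomics_on_smem;
};

static enum model_fault_action
model_atomic_fault_action(const struct model_dev *dev,
			  enum model_atomic_access access, bool is_atomic)
{
	assert(dev != NULL);

	/* CPU-only atomics: a GPU atomic fault is rejected, not migrated. */
	if (access == MODEL_ATOMIC_CPU)
		return is_atomic ? MODEL_FAULT_NACK : MODEL_FAULT_NO_MIGRATE;

	/* Strict migration only applies to atomic faults on discrete parts. */
	if (!dev->is_dgfx || !is_atomic)
		return MODEL_FAULT_NO_MIGRATE;

	switch (access) {
	case MODEL_ATOMIC_DEVICE:
		return dev->has_device_atomics_on_smem ?
			MODEL_FAULT_NO_MIGRATE : MODEL_FAULT_MIGRATE_VRAM;
	case MODEL_ATOMIC_GLOBAL:
	case MODEL_ATOMIC_UNDEFINED:
	default:
		return MODEL_FAULT_MIGRATE_VRAM;
	}
}
```

The point of the table is that a non-atomic fault never forces VRAM,
whatever attribute the user has set.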

Matt

> +
> +	default:
> +		return is_atomic;
> +	}
> +}
> +
>  /**
>   * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
>   * @vm: Pointer to the xe_vm structure
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 0d6b08cc4163..d5bc09ae640c 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -171,6 +171,8 @@ static inline bool xe_vma_is_userptr(struct xe_vma *vma)
>  
>  struct xe_vma *xe_vm_find_vma_by_addr(struct xe_vm *vm, u64 page_addr);
>  
> +bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool is_atomic);
> +
>  int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
>  
>  /**
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index f64728120d7c..62dc5cec8950 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -85,7 +85,14 @@ static void madvise_atomic(struct xe_device *xe, struct xe_vm *vm,
>  			   struct xe_vma **vmas, int num_vmas,
>  			   struct drm_xe_madvise *op)
>  {
> -	/* Implementation pending */
> +	int i;
> +
> +	xe_assert(vm->xe, op->type == DRM_XE_MEM_RANGE_ATTR_ATOMIC);
> +	xe_assert(vm->xe, op->atomic.val <= DRM_XE_ATOMIC_CPU);
> +
> +	for (i = 0; i < num_vmas; i++)
> +		vmas[i]->attr.atomic_access = op->atomic.val;
> +	/*TODO: handle bo backed vmas */
>  }
>  
>  static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
> -- 
> 2.34.1
> 

* Re: [PATCH v5 12/23] drm/xe/madvise: Update migration policy based on preferred location
  2025-07-22 13:35 ` [PATCH v5 12/23] drm/xe/madvise: Update migration policy based on preferred location Himal Prasad Ghimiray
@ 2025-07-29  4:07   ` Matthew Brost
  0 siblings, 0 replies; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  4:07 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström

On Tue, Jul 22, 2025 at 07:05:15PM +0530, Himal Prasad Ghimiray wrote:
> When the user sets a valid devmem_fd as the preferred location, a GPU fault
> triggers migration to the tile of the device associated with that devmem_fd.
> 
> If the user sets an invalid devmem_fd, the preferred location is the current
> placement (smem) only.
> 
> v2(Matthew Brost)
> - Default should be faulting tile
> - remove devmem_fd used as region
> 
> v3 (Matthew Brost)
> - Add migration_policy
> - Fix return condition
> - fix migrate condition
> 
> Cc: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_svm.c        | 40 +++++++++++++++++++++++++++++-
>  drivers/gpu/drm/xe/xe_svm.h        |  8 ++++++
>  drivers/gpu/drm/xe/xe_vm_madvise.c | 21 +++++++++++++++-
>  3 files changed, 67 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index 49d3405aacb9..ba1233d0d5a2 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -790,6 +790,37 @@ bool xe_svm_range_needs_migrate_to_vram(struct xe_svm_range *range, struct xe_vm
>  	return true;
>  }
>  
> +/**
> + * xe_vma_resolve_pagemap - Resolve the appropriate DRM pagemap for a VMA
> + * @vma: Pointer to the xe_vma structure containing memory attributes
> + * @tile: Pointer to the xe_tile structure used as fallback for VRAM mapping
> + *
> + * This function determines the correct DRM pagemap to use for a given VMA.
> + * It first checks if a valid devmem_fd is provided in the VMA's preferred
> + * location. If the devmem_fd is negative, it returns NULL, indicating no
> + * pagemap is available and smem to be used as preferred location.
> + * If the devmem_fd is equal to the default faulting
> + * GT identifier, it returns the VRAM pagemap associated with the tile.
> + *
> + * Future support for multi-device configurations may use drm_pagemap_from_fd()
> + * to resolve pagemaps from arbitrary file descriptors.
> + *
> + * Return: A pointer to the resolved drm_pagemap, or NULL if none is applicable.
> + */
> +struct drm_pagemap *xe_vma_resolve_pagemap(struct xe_vma *vma, struct xe_tile *tile)
> +{
> +	s32 fd = (s32)vma->attr.preferred_loc.devmem_fd;
> +
> +	if (fd == DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM)
> +		return NULL;
> +
> +	if (fd == DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE)
> +		return IS_DGFX(tile_to_xe(tile)) ? xe_tile_local_pagemap(tile) : NULL;
> +
> +	/* TODO: Support multi-device with drm_pagemap_from_fd(fd) */
> +	return NULL;
> +}
> +
>  /**
>   * xe_svm_handle_pagefault() - SVM handle page fault
>   * @vm: The VM.
> @@ -821,6 +852,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>  	};
>  	struct xe_svm_range *range;
>  	struct dma_fence *fence;
> +	struct drm_pagemap *dpagemap;
>  	struct xe_tile *tile = gt_to_tile(gt);
>  	int migrate_try_count = ctx.devmem_only ? 3 : 1;
>  	ktime_t end = 0;
> @@ -850,8 +882,14 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>  
>  	range_debug(range, "PAGE FAULT");
>  
> +	dpagemap = xe_vma_resolve_pagemap(vma, tile);
>  	if (--migrate_try_count >= 0 &&
> -	    xe_svm_range_needs_migrate_to_vram(range, vma, IS_DGFX(vm->xe))) {
> +	    xe_svm_range_needs_migrate_to_vram(range, vma, !!dpagemap || ctx.devmem_only)) {
> +		/* TODO : For multi-device dpagemap will be used to find the
> +		 * remote tile and remote device. Will need to modify
> +		 * xe_svm_alloc_vram to use dpagemap for future multi-device
> +		 * support.
> +		 */
>  		err = xe_svm_alloc_vram(tile, range, &ctx);
>  		ctx.timeslice_ms <<= 1;	/* Double timeslice if we have to retry */
>  		if (err) {
> diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
> index b0da0e85f0b8..494823afaa98 100644
> --- a/drivers/gpu/drm/xe/xe_svm.h
> +++ b/drivers/gpu/drm/xe/xe_svm.h
> @@ -94,6 +94,8 @@ void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end);
>  
>  u8 xe_svm_ranges_zap_ptes_in_range(struct xe_vm *vm, u64 start, u64 end);
>  
> +struct drm_pagemap *xe_vma_resolve_pagemap(struct xe_vma *vma, struct xe_tile *tile);
> +
>  /**
>   * xe_svm_range_has_dma_mapping() - SVM range has DMA mapping
>   * @range: SVM range
> @@ -318,6 +320,12 @@ u8 xe_svm_ranges_zap_ptes_in_range(struct xe_vm *vm, u64 start, u64 end)
>  	return 0;
>  }
>  
> +static inline
> +struct drm_pagemap *xe_vma_resolve_pagemap(struct xe_vma *vma, struct xe_tile *tile)
> +{
> +	return NULL;
> +}
> +
>  #define xe_svm_assert_in_notifier(...) do {} while (0)
>  #define xe_svm_range_has_dma_mapping(...) false
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index 62dc5cec8950..17959257ee1d 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -78,7 +78,19 @@ static void madvise_preferred_mem_loc(struct xe_device *xe, struct xe_vm *vm,
>  				      struct xe_vma **vmas, int num_vmas,
>  				      struct drm_xe_madvise *op)
>  {
> -	/* Implementation pending */
> +	int i;
> +
> +	xe_assert(vm->xe, op->type == DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC);
> +
> +	for (i = 0; i < num_vmas; i++) {
> +		vmas[i]->attr.preferred_loc.devmem_fd = op->preferred_mem_loc.devmem_fd;
> +
> +		/* Till multi-device support is not added migration_policy
> +		 * is of no use and can be ignored.
> +		 */
> +		vmas[i]->attr.preferred_loc.migration_policy =
> +						op->preferred_mem_loc.migration_policy;
> +	}
>  }
>  
>  static void madvise_atomic(struct xe_device *xe, struct xe_vm *vm,
> @@ -178,6 +190,12 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
>  
>  	switch (args->type) {
>  	case DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC:
> +	{
> +		s32 fd = (s32)args->preferred_mem_loc.devmem_fd;
> +
> +		if (XE_IOCTL_DBG(xe, fd < DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM))
> +			return false;
> +
>  		if (XE_IOCTL_DBG(xe, args->preferred_mem_loc.migration_policy >
>  				     DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES))
>  			return false;
> @@ -188,6 +206,7 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
>  		if (XE_IOCTL_DBG(xe, args->atomic.reserved))
>  			return false;
>  		break;
> +	}
>  	case DRM_XE_MEM_RANGE_ATTR_ATOMIC:
>  		if (XE_IOCTL_DBG(xe, args->atomic.val > DRM_XE_ATOMIC_CPU))
>  			return false;
> -- 
> 2.34.1
> 


* Re: [PATCH v5 17/23] drm/xe/bo: Update atomic_access attribute on madvise
  2025-07-22 13:35 ` [PATCH v5 17/23] drm/xe/bo: Update atomic_access attribute on madvise Himal Prasad Ghimiray
@ 2025-07-29  4:18   ` Matthew Brost
  0 siblings, 0 replies; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  4:18 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström

On Tue, Jul 22, 2025 at 07:05:20PM +0530, Himal Prasad Ghimiray wrote:
> Update the bo_atomic_access based on user-provided input and determine
> the migration to smem during a CPU fault
> 
> v2 (Matthew Brost)
> - Avoid cpu unmapping if bo is already in smem
> - check atomics on smem too for ioctl
> - Add comments
> 
> v3
> - Avoid migration in prefetch
> 
> v4 (Matthew Brost)
> - make sanity check function bool
> - add assert for smem placement
> - fix doc
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c           | 29 +++++++++++--
>  drivers/gpu/drm/xe/xe_gt_pagefault.c |  2 +-
>  drivers/gpu/drm/xe/xe_vm.c           |  5 ++-
>  drivers/gpu/drm/xe/xe_vm_madvise.c   | 62 +++++++++++++++++++++++++++-
>  4 files changed, 91 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 4e0355d0f406..f133fc54664e 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1685,6 +1685,18 @@ static void xe_gem_object_close(struct drm_gem_object *obj,
>  	}
>  }
>  
> +static bool should_migrate_to_smem(struct xe_bo *bo)
> +{
> +	/*
> +	 * NOTE: The following atomic checks are platform-specific. For example,
> +	 * if a device supports CXL atomics, these may not be necessary or
> +	 * may behave differently.
> +	 */
> +
> +	return bo->attr.atomic_access == DRM_XE_ATOMIC_GLOBAL ||
> +	       bo->attr.atomic_access == DRM_XE_ATOMIC_CPU;
> +}
> +
>  static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
>  {
>  	struct ttm_buffer_object *tbo = vmf->vma->vm_private_data;
> @@ -1693,7 +1705,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
>  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
>  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
>  	vm_fault_t ret;
> -	int idx;
> +	int idx, r = 0;
>  
>  	if (needs_rpm)
>  		xe_pm_runtime_get(xe);
> @@ -1705,8 +1717,19 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
>  	if (drm_dev_enter(ddev, &idx)) {
>  		trace_xe_bo_cpu_fault(bo);
>  
> -		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> -					       TTM_BO_VM_NUM_PREFAULT);
> +		if (should_migrate_to_smem(bo)) {
> +			xe_assert(xe, bo->flags & XE_BO_FLAG_SYSTEM);
> +
> +			r = xe_bo_migrate(bo, XE_PL_TT);
> +			if (r == -EBUSY || r == -ERESTARTSYS || r == -EINTR)
> +				ret = VM_FAULT_NOPAGE;
> +			else if (r)
> +				ret = VM_FAULT_SIGBUS;
> +		}
> +		if (!ret)
> +			ret = ttm_bo_vm_fault_reserved(vmf,
> +						       vmf->vma->vm_page_prot,
> +						       TTM_BO_VM_NUM_PREFAULT);
>  		drm_dev_exit(idx);
>  	} else {
>  		ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index 5a75d56d8558..c1cb69c6ada8 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -84,7 +84,7 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
>  	if (err)
>  		return err;
>  
> -	if (atomic && IS_DGFX(vm->xe)) {
> +	if (xe_vma_need_vram_for_atomic(vm->xe, vma, atomic)) {

Same as patch #11: if this is an atomic fault and the attribute is
DRM_XE_ATOMIC_CPU, we nack the fault.

So I think the helper should be defined so that it also tells us when to nack.
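
To illustrate the kind of helper I mean, here is a rough standalone sketch
(userspace C, not driver code). The enum names, their values, and
classify_fault() are all hypothetical; how UNDEFINED and GLOBAL are handled
on discrete parts, and whether integrated parts should also nack CPU-only
ranges, are assumptions on my side rather than something this series defines.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the uAPI atomic modes; values are illustrative. */
enum atomic_mode {
	ATOMIC_UNDEFINED,
	ATOMIC_DEVICE,
	ATOMIC_GLOBAL,
	ATOMIC_CPU,
};

/* Hypothetical three-way answer instead of a plain bool. */
enum fault_action {
	FAULT_OK_IN_PLACE,	/* service the fault where the memory lives */
	FAULT_NEEDS_VRAM,	/* migrate to VRAM before servicing */
	FAULT_NACK,		/* reject the fault */
};

static enum fault_action classify_fault(bool is_dgfx, bool device_atomics_on_smem,
					enum atomic_mode mode, bool is_atomic)
{
	/* Mirrors the range assert in madvise_atomic(). */
	assert(mode <= ATOMIC_CPU);

	if (!is_atomic)
		return FAULT_OK_IN_PLACE;

	/* A GPU atomic on memory restricted to CPU atomics cannot be fixed up. */
	if (mode == ATOMIC_CPU)
		return FAULT_NACK;

	/* Integrated parts never need VRAM for atomics. */
	if (!is_dgfx)
		return FAULT_OK_IN_PLACE;

	if (mode == ATOMIC_DEVICE && device_atomics_on_smem)
		return FAULT_OK_IN_PLACE;

	/* Assumption: UNDEFINED and GLOBAL keep the old "atomic needs VRAM" rule. */
	return FAULT_NEEDS_VRAM;
}
```

With a single helper like this, the fault handler and the SVM path could share
one place that decides between migrate, service in place, and nack.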

>  		if (xe_vma_is_userptr(vma)) {
>  			err = -EACCES;

I think DRM_XE_ATOMIC_DEVICE now works for userptr too if
xe->info.has_device_atomics_on_smem is true.

>  			return err;
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 2226b1eb46f1..5dc7cd7769f8 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -4200,6 +4200,9 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
>   */
>  bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool is_atomic)
>  {
> +	u32 atomic_access = xe_vma_bo(vma) ? xe_vma_bo(vma)->attr.atomic_access :
> +					     vma->attr.atomic_access;
> +
>  	if (!IS_DGFX(xe))
>  		return false;
>  
> @@ -4208,7 +4211,7 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
>  	 * instance, on a device supporting CXL atomics, these would ideally
>  	 * work universally without additional handling.
>  	 */
> -	switch (vma->attr.atomic_access) {
> +	switch (atomic_access) {
>  	case DRM_XE_ATOMIC_DEVICE:
>  		return !xe->info.has_device_atomics_on_smem;
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index 1dc4d19a5f2a..727833780b4b 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -98,14 +98,28 @@ static void madvise_atomic(struct xe_device *xe, struct xe_vm *vm,
>  			   struct xe_vma **vmas, int num_vmas,
>  			   struct drm_xe_madvise *op)
>  {
> +	struct xe_bo *bo;
>  	int i;
>  
>  	xe_assert(vm->xe, op->type == DRM_XE_MEM_RANGE_ATTR_ATOMIC);
>  	xe_assert(vm->xe, op->atomic.val <= DRM_XE_ATOMIC_CPU);
>  
> -	for (i = 0; i < num_vmas; i++)
> +	for (i = 0; i < num_vmas; i++) {
>  		vmas[i]->attr.atomic_access = op->atomic.val;
> -	/*TODO: handle bo backed vmas */
> +
> +		bo = xe_vma_bo(vmas[i]);
> +		if (!bo)
> +			continue;
> +
> +		xe_bo_assert_held(bo);
> +		bo->attr.atomic_access = op->atomic.val;
> +
> +		/* Invalidate cpu page table, so bo can migrate to smem in next access */
> +		if (xe_bo_is_vram(bo) &&
> +		    (bo->attr.atomic_access == DRM_XE_ATOMIC_CPU ||
> +		     bo->attr.atomic_access == DRM_XE_ATOMIC_GLOBAL))
> +			ttm_bo_unmap_virtual(&bo->ttm);
> +	}
>  }
>  
>  static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
> @@ -253,6 +267,41 @@ static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madv
>  	return true;
>  }
>  
> +static bool check_bo_args_are_sane(struct xe_vm *vm, struct xe_vma **vmas,
> +				   int num_vmas, u32 atomic_val)
> +{
> +	struct xe_device *xe = vm->xe;
> +	struct xe_bo *bo;
> +	int i;
> +
> +	for (i = 0; i < num_vmas; i++) {
> +		bo = xe_vma_bo(vmas[i]);
> +		if (!bo)
> +			continue;

I think for userptr VMAs we should reject any DRM_XE_ATOMIC_GLOBAL, as
that won't work given that we can't migrate userptrs.

Likewise DRM_XE_ATOMIC_DEVICE should be rejected for userptr if
xe->info.has_device_atomics_on_smem is clear.
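
A rough standalone sketch of the two userptr rules above (userspace C, not
driver code). The enum and userptr_atomic_mode_is_sane() are hypothetical
names, and accepting UNDEFINED and CPU for userptr is my assumption; only the
GLOBAL and DEVICE cases come from the comments above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the uAPI atomic modes; values are illustrative. */
enum atomic_mode {
	ATOMIC_UNDEFINED,
	ATOMIC_DEVICE,
	ATOMIC_GLOBAL,
	ATOMIC_CPU,
};

/* Returns true if the requested atomic mode can be honoured for a userptr VMA. */
static bool userptr_atomic_mode_is_sane(enum atomic_mode mode,
					bool device_atomics_on_smem)
{
	assert(mode <= ATOMIC_CPU);

	switch (mode) {
	case ATOMIC_GLOBAL:
		/* GLOBAL relies on migration, and userptrs cannot migrate. */
		return false;
	case ATOMIC_DEVICE:
		/* Userptr pages stay in system memory, so the HW must support it. */
		return device_atomics_on_smem;
	default:
		/* Assumption: UNDEFINED and CPU need no placement change. */
		return true;
	}
}
```

In the ioctl this would sit next to the BO check, applied to VMAs for which
xe_vma_is_userptr() is true.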

Matt

> +		/*
> +		 * NOTE: The following atomic checks are platform-specific. For example,
> +		 * if a device supports CXL atomics, these may not be necessary or
> +		 * may behave differently.
> +		 */
> +		if (XE_IOCTL_DBG(xe, atomic_val == DRM_XE_ATOMIC_CPU &&
> +				 !(bo->flags & XE_BO_FLAG_SYSTEM)))
> +			return false;
> +
> +		if (XE_IOCTL_DBG(xe, atomic_val == DRM_XE_ATOMIC_DEVICE &&
> +				 !(bo->flags & XE_BO_FLAG_VRAM0) &&
> +				 !(bo->flags & XE_BO_FLAG_VRAM1) &&
> +				 !(bo->flags & XE_BO_FLAG_SYSTEM &&
> +				   xe->info.has_device_atomics_on_smem)))
> +			return false;
> +
> +		if (XE_IOCTL_DBG(xe, atomic_val == DRM_XE_ATOMIC_GLOBAL &&
> +				 (!(bo->flags & XE_BO_FLAG_SYSTEM) ||
> +				  (!(bo->flags & XE_BO_FLAG_VRAM0) &&
> +				   !(bo->flags & XE_BO_FLAG_VRAM1)))))
> +			return false;
> +	}
> +	return true;
> +}
>  /**
>   * xe_vm_madvise_ioctl - Handle MADVise ioctl for a VM
>   * @dev: DRM device pointer
> @@ -302,6 +351,15 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
>  		goto unlock_vm;
>  
>  	if (madvise_range.has_bo_vmas) {
> +		if (args->type == DRM_XE_MEM_RANGE_ATTR_ATOMIC) {
> +			if (!check_bo_args_are_sane(vm, madvise_range.vmas,
> +						    madvise_range.num_vmas,
> +						    args->atomic.val)) {
> +				err = -EINVAL;
> +				goto unlock_vm;
> +			}
> +		}
> +
>  		drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES | DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
>  		drm_exec_until_all_locked(&exec) {
>  			for (int i = 0; i < madvise_range.num_vmas; i++) {
> -- 
> 2.34.1
> 


* Re: [PATCH v5 18/23] drm/xe/madvise: Skip vma invalidation if mem attr are unchanged
  2025-07-22 13:35 ` [PATCH v5 18/23] drm/xe/madvise: Skip vma invalidation if mem attr are unchanged Himal Prasad Ghimiray
@ 2025-07-29  4:19   ` Matthew Brost
  0 siblings, 0 replies; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  4:19 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström

On Tue, Jul 22, 2025 at 07:05:21PM +0530, Himal Prasad Ghimiray wrote:
> If a VMA within the madvise input range already has the same memory
> attribute as the one requested by the user, skip PTE zapping for that
> VMA to avoid unnecessary invalidation.
> 
> v2 (Matthew Brost)
> - fix skip_invalidation for new attributes
> - s/u32/bool
> - Remove unnecessary assignment  for kzalloc'ed
> 
> Suggested-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_vm_madvise.c | 52 +++++++++++++++++++++---------
>  drivers/gpu/drm/xe/xe_vm_types.h   |  6 ++++
>  2 files changed, 42 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index 727833780b4b..fbb6aa8a7a5e 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -84,13 +84,19 @@ static void madvise_preferred_mem_loc(struct xe_device *xe, struct xe_vm *vm,
>  	xe_assert(vm->xe, op->type == DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC);
>  
>  	for (i = 0; i < num_vmas; i++) {
> -		vmas[i]->attr.preferred_loc.devmem_fd = op->preferred_mem_loc.devmem_fd;
> -
> -		/* Till multi-device support is not added migration_policy
> -		 * is of no use and can be ignored.
> -		 */
> -		vmas[i]->attr.preferred_loc.migration_policy =
> +		if (vmas[i]->attr.preferred_loc.devmem_fd == op->preferred_mem_loc.devmem_fd &&
> +		    vmas[i]->attr.preferred_loc.migration_policy ==
> +		    op->preferred_mem_loc.migration_policy) {
> +			vmas[i]->skip_invalidation = true;
> +		} else {
> +			vmas[i]->skip_invalidation = false;
> +			vmas[i]->attr.preferred_loc.devmem_fd = op->preferred_mem_loc.devmem_fd;
> +			/* Till multi-device support is not added migration_policy
> +			 * is of no use and can be ignored.
> +			 */
> +			vmas[i]->attr.preferred_loc.migration_policy =
>  						op->preferred_mem_loc.migration_policy;
> +		}
>  	}
>  }
>  
> @@ -105,7 +111,12 @@ static void madvise_atomic(struct xe_device *xe, struct xe_vm *vm,
>  	xe_assert(vm->xe, op->atomic.val <= DRM_XE_ATOMIC_CPU);
>  
>  	for (i = 0; i < num_vmas; i++) {
> -		vmas[i]->attr.atomic_access = op->atomic.val;
> +		if (vmas[i]->attr.atomic_access == op->atomic.val) {
> +			vmas[i]->skip_invalidation = true;
> +		} else {
> +			vmas[i]->skip_invalidation = false;
> +			vmas[i]->attr.atomic_access = op->atomic.val;
> +		}
>  
>  		bo = xe_vma_bo(vmas[i]);
>  		if (!bo)
> @@ -130,9 +141,14 @@ static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
>  
>  	xe_assert(vm->xe, op->type == DRM_XE_MEM_RANGE_ATTR_PAT);
>  
> -	for (i = 0; i < num_vmas; i++)
> -		vmas[i]->attr.pat_index = op->pat_index.val;
> -
> +	for (i = 0; i < num_vmas; i++) {
> +		if (vmas[i]->attr.pat_index == op->pat_index.val) {
> +			vmas[i]->skip_invalidation = true;
> +		} else {
> +			vmas[i]->skip_invalidation = false;
> +			vmas[i]->attr.pat_index = op->pat_index.val;
> +		}
> +	}
>  }
>  
>  typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
> @@ -158,17 +174,20 @@ static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
>  				  false, MAX_SCHEDULE_TIMEOUT) <= 0)
>  		XE_WARN_ON(1);
>  
> -	tile_mask = xe_svm_ranges_zap_ptes_in_range(vm, start, end);
> -
>  	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, start, end) {
>  		struct xe_vma *vma = gpuva_to_vma(gpuva);
>  
> -		if (xe_vma_is_cpu_addr_mirror(vma))
> +		if (vma->skip_invalidation)
>  			continue;
>  
> -		for_each_tile(tile, vm->xe, id) {
> -			if (xe_pt_zap_ptes(tile, vma)) {
> -				tile_mask |= BIT(id);
> +		if (xe_vma_is_cpu_addr_mirror(vma)) {
> +			tile_mask |= xe_svm_ranges_zap_ptes_in_range(vm,
> +								      xe_vma_start(vma),
> +								      xe_vma_end(vma));
> +		} else {
> +			for_each_tile(tile, vm->xe, id) {
> +				if (xe_pt_zap_ptes(tile, vma)) {
> +					tile_mask |= BIT(id);
>  
>  				/*
>  				 * WRITE_ONCE pairs with READ_ONCE
> @@ -176,6 +195,7 @@ static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
>  				 */
>  				WRITE_ONCE(vma->tile_invalidated,
>  					   vma->tile_invalidated | BIT(id));
> +				}
>  			}
>  		}
>  	}
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index cd94d8b5819d..81d92d886578 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -157,6 +157,12 @@ struct xe_vma {
>  	/** @tile_staged: bind is staged for this VMA */
>  	u8 tile_staged;
>  
> +	/**
> +	 * @skip_invalidation: Used in madvise to avoid invalidation
> +	 * if mem attributes doesn't change
> +	 */
> +	bool skip_invalidation;
> +
>  	/**
>  	 * @ufence: The user fence that was provided with MAP.
>  	 * Needs to be signalled before UNMAP can be processed.
> -- 
> 2.34.1
> 


* Re: [PATCH v5 10/23] drm/xe: Implement madvise ioctl for xe
  2025-07-29  3:52   ` Matthew Brost
@ 2025-07-29  4:23     ` Matthew Brost
  2025-07-29  9:43       ` Ghimiray, Himal Prasad
  0 siblings, 1 reply; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  4:23 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström, Shuicheng Lin

On Mon, Jul 28, 2025 at 08:52:26PM -0700, Matthew Brost wrote:
> On Tue, Jul 22, 2025 at 07:05:13PM +0530, Himal Prasad Ghimiray wrote:
> > This driver-specific ioctl enables UMDs to control the memory attributes
> > for GPU VMAs within a specified input range. If the start or end
> > addresses fall within an existing VMA, the VMA is split accordingly. The
> > attributes of the VMA are modified as provided by the users. The old
> > mappings of the VMAs are invalidated, and TLB invalidation is performed
> > if necessary.
> > 
> > v2(Matthew brost)
> > - xe_vm_in_fault_mode can't be enabled by Mesa, hence allow ioctl in non
> > fault mode too
> > - fix tlb invalidation skip for same ranges in multiple op
> > - use helper for tlb invalidation
> > - use xe_svm_notifier_lock/unlock helper
> > - s/lockdep_assert_held/lockdep_assert_held_write
> > - Add kernel-doc
> > 
> > v3(Matthew Brost)
> > - make vfunc fail safe
> > - Add sanitizing input args before vfunc
> > 
> > v4(Matthew Brost/Shuicheng)
> > > - Make locks interruptible
> > - Error handling fixes
> > - vm_put fixes
> > 
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: Shuicheng Lin <shuicheng.lin@intel.com>
> > Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> > ---
> >  drivers/gpu/drm/xe/Makefile        |   1 +
> >  drivers/gpu/drm/xe/xe_vm_madvise.c | 306 +++++++++++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_vm_madvise.h |  15 ++
> >  3 files changed, 322 insertions(+)
> >  create mode 100644 drivers/gpu/drm/xe/xe_vm_madvise.c
> >  create mode 100644 drivers/gpu/drm/xe/xe_vm_madvise.h
> > 
> > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > index 83a36c47a2f9..fa52866bb72c 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -125,6 +125,7 @@ xe-y += xe_bb.o \
> >  	xe_uc.o \
> >  	xe_uc_fw.o \
> >  	xe_vm.o \
> > +	xe_vm_madvise.o \
> >  	xe_vram.o \
> >  	xe_vram_freq.o \
> >  	xe_vsec.o \
> > diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> > new file mode 100644
> > index 000000000000..f64728120d7c
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> > @@ -0,0 +1,306 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#include "xe_vm_madvise.h"
> > +
> > +#include <linux/nospec.h>
> > +#include <drm/xe_drm.h>
> > +
> > +#include "xe_bo.h"
> > +#include "xe_pt.h"
> > +#include "xe_svm.h"
> > +
> > +struct xe_vmas_in_madvise_range {
> > +	u64 addr;
> > +	u64 range;
> > +	struct xe_vma **vmas;
> > +	int num_vmas;
> > +	bool has_svm_vmas;
> > +	bool has_bo_vmas;
> > +	bool has_userptr_vmas;
> > +};
> > +
> > +static int get_vmas(struct xe_vm *vm, struct xe_vmas_in_madvise_range *madvise_range)
> > +{
> > +	u64 addr = madvise_range->addr;
> > +	u64 range = madvise_range->range;
> > +
> > +	struct xe_vma  **__vmas;
> > +	struct drm_gpuva *gpuva;
> > +	int max_vmas = 8;
> > +
> > +	lockdep_assert_held(&vm->lock);
> > +
> > +	madvise_range->num_vmas = 0;
> > +	madvise_range->vmas = kmalloc_array(max_vmas, sizeof(*madvise_range->vmas), GFP_KERNEL);
> > +	if (!madvise_range->vmas)
> > +		return -ENOMEM;
> > +
> > +	vm_dbg(&vm->xe->drm, "VMA's in range: start=0x%016llx, end=0x%016llx", addr, addr + range);
> > +
> > +	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, addr, addr + range) {
> > +		struct xe_vma *vma = gpuva_to_vma(gpuva);
> > +
> > +		if (xe_vma_bo(vma))
> > +			madvise_range->has_bo_vmas = true;
> > +		else if (xe_vma_is_cpu_addr_mirror(vma))
> > +			madvise_range->has_svm_vmas = true;
> > +		else if (xe_vma_is_userptr(vma))
> > +			madvise_range->has_userptr_vmas = true;
> > +
> > +		if (madvise_range->num_vmas == max_vmas) {
> > +			max_vmas <<= 1;
> > +			__vmas = krealloc(madvise_range->vmas,
> > +					  max_vmas * sizeof(*madvise_range->vmas),
> > +					  GFP_KERNEL);
> > +			if (!__vmas) {
> > +				kfree(madvise_range->vmas);
> > +				return -ENOMEM;
> > +			}
> > +			madvise_range->vmas = __vmas;
> > +		}
> > +
> > +		madvise_range->vmas[madvise_range->num_vmas] = vma;
> > +		(madvise_range->num_vmas)++;
> > +	}
> > +
> > +	if (!madvise_range->num_vmas)
> > +		kfree(madvise_range->vmas);
> > +
> > +	vm_dbg(&vm->xe->drm, "madvise_range-num_vmas = %d\n", madvise_range->num_vmas);
> > +
> > +	return 0;
> > +}
> > +
> > +static void madvise_preferred_mem_loc(struct xe_device *xe, struct xe_vm *vm,
> > +				      struct xe_vma **vmas, int num_vmas,
> > +				      struct drm_xe_madvise *op)
> > +{
> > +	/* Implementation pending */
> > +}
> > +
> > +static void madvise_atomic(struct xe_device *xe, struct xe_vm *vm,
> > +			   struct xe_vma **vmas, int num_vmas,
> > +			   struct drm_xe_madvise *op)
> > +{
> > +	/* Implementation pending */
> > +}
> > +
> > +static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
> > +			      struct xe_vma **vmas, int num_vmas,
> > +			      struct drm_xe_madvise *op)
> > +{
> > +	/* Implementation pending */
> > +}
> > +
> > +typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
> > +			     struct xe_vma **vmas, int num_vmas,
> > +			     struct drm_xe_madvise *op);
> > +
> > +static const madvise_func madvise_funcs[] = {
> > +	[DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC] = madvise_preferred_mem_loc,
> > +	[DRM_XE_MEM_RANGE_ATTR_ATOMIC] = madvise_atomic,
> > +	[DRM_XE_MEM_RANGE_ATTR_PAT] = madvise_pat_index,
> > +};
> > +
> > +static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
> > +{
> > +	struct drm_gpuva *gpuva;
> > +	struct xe_tile *tile;
> > +	u8 id, tile_mask;
> > +
> > +	lockdep_assert_held_write(&vm->lock);
> > +
> > +	/* Wait for pending binds */
> > +	if (dma_resv_wait_timeout(xe_vm_resv(vm), DMA_RESV_USAGE_BOOKKEEP,
> > +				  false, MAX_SCHEDULE_TIMEOUT) <= 0)
> > +		XE_WARN_ON(1);
> > +
> > +	tile_mask = xe_svm_ranges_zap_ptes_in_range(vm, start, end);
> > +
> > +	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, start, end) {
> > +		struct xe_vma *vma = gpuva_to_vma(gpuva);
> > +
> > +		if (xe_vma_is_cpu_addr_mirror(vma))
> 
> I think:
> 
> xe_vma_is_cpu_addr_mirror(vma) || xe_vma_is_null(vma)
> 
> No need to invalidate NULL VMA's as those mappings are not modified by
> madvise (i.e., you can call madvise on NULL mappings but it doesn't
> actually do anything).
> 
> > +			continue;
> > +
> > +		for_each_tile(tile, vm->xe, id) {
> > +			if (xe_pt_zap_ptes(tile, vma)) {
> > +				tile_mask |= BIT(id);
> > +
> > +				/*
> > +				 * WRITE_ONCE pairs with READ_ONCE
> > +				 * in xe_vm_has_valid_gpu_mapping()
> > +				 */
> > +				WRITE_ONCE(vma->tile_invalidated,
> > +					   vma->tile_invalidated | BIT(id));
> > +			}
> > +		}
> > +	}
> > +
> > +	return tile_mask;
> > +}
> > +
> > +static int xe_vm_invalidate_madvise_range(struct xe_vm *vm, u64 start, u64 end)
> > +{
> > +	u8 tile_mask = xe_zap_ptes_in_madvise_range(vm, start, end);
> > +
> > +	if (!tile_mask)
> > +		return 0;
> > +
> > +	xe_device_wmb(vm->xe);
> > +
> > +	return xe_vm_range_tilemask_tlb_invalidation(vm, start, end, tile_mask);
> > +}
> > +
> > +static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madvise *args)
> > +{
> > +	if (XE_IOCTL_DBG(xe, !args))
> > +		return false;
> > +
> > +	if (XE_IOCTL_DBG(xe, !IS_ALIGNED(args->start, SZ_4K)))
> > +		return false;
> > +
> > +	if (XE_IOCTL_DBG(xe, !IS_ALIGNED(args->range, SZ_4K)))
> > +		return false;
> > +
> > +	if (XE_IOCTL_DBG(xe, args->range < SZ_4K))
> > +		return false;
> > +
> > +	switch (args->type) {
> > +	case DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC:
> > +		if (XE_IOCTL_DBG(xe, args->preferred_mem_loc.migration_policy >
> > +				     DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES))
> > +			return false;
> > +
> > +		if (XE_IOCTL_DBG(xe, args->preferred_mem_loc.pad))
> > +			return false;
> > +
> > +		if (XE_IOCTL_DBG(xe, args->atomic.reserved))
> > +			return false;
> > +		break;
> > +	case DRM_XE_MEM_RANGE_ATTR_ATOMIC:
> > +		if (XE_IOCTL_DBG(xe, args->atomic.val > DRM_XE_ATOMIC_CPU))
> > +			return false;
> > +
> > +		if (XE_IOCTL_DBG(xe, args->atomic.pad))
> > +			return false;
> > +
> > +		if (XE_IOCTL_DBG(xe, args->atomic.reserved))
> > +			return false;
> > +
> > +		break;
> > +	case DRM_XE_MEM_RANGE_ATTR_PAT:
> > +		/*TODO: Add valid pat check */
> > +		break;
> > +	default:
> > +		if (XE_IOCTL_DBG(xe, 1))
> > +			return false;
> > +	}
> > +
> > +	if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> > +/**
> > + * xe_vm_madvise_ioctl - Handle MADVise ioctl for a VM
> > + * @dev: DRM device pointer
> > + * @data: Pointer to ioctl data (drm_xe_madvise*)
> > + * @file: DRM file pointer
> > + *
> > + * Handles the MADVISE ioctl to provide memory advice for vma's within
> > + * input range.
> > + *
> > + * Return: 0 on success or a negative error code on failure.
> > + */
> > +int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > +{
> > +	struct xe_device *xe = to_xe_device(dev);
> > +	struct xe_file *xef = to_xe_file(file);
> > +	struct drm_xe_madvise *args = data;
> > +	struct xe_vmas_in_madvise_range madvise_range = {.addr = args->start,
> > +							 .range =  args->range, };
> > +	struct xe_vm *vm;
> > +	struct drm_exec exec;
> > +	int err, attr_type;
> > +
> > +	vm = xe_vm_lookup(xef, args->vm_id);
> > +	if (XE_IOCTL_DBG(xe, !vm))
> > +		return -EINVAL;
> > +
> > +	if (!madvise_args_are_sane(vm->xe, args)) {
> > +		err = -EINVAL;
> > +		goto put_vm;
> > +	}
> > +
> 
> I think as this code can modify the ranges during a VMA split, you will
> need to ensure all queued unmaps prior to this are complete.
> 

The explanation is wrong, but the flush is still needed. You need a flush
of the garbage collector because it can modify VMAs and we need that view
to be current. Feel free to add this in this patch or patch 20.
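
To show the ordering I have in mind, here is a toy userspace model (not
driver code). struct mock_vm, mock_svm_flush() and mock_madvise() are
hypothetical stand-ins; the only thing the sketch is meant to convey is that
the flush runs before any lock is taken, so the VMA lookup that follows sees
no pending garbage-collector work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a VM with deferred garbage-collector work. */
struct mock_vm {
	int pending_gc;	/* ranges still queued for the garbage collector */
	bool locked;	/* stands in for vm->lock being held */
};

/* Stand-in for the flush: drains the queue, must run with no locks held. */
static void mock_svm_flush(struct mock_vm *vm)
{
	assert(!vm->locked);
	vm->pending_gc = 0;
}

/* Returns 0 if the VMA view was current when it was sampled, -1 otherwise. */
static int mock_madvise(struct mock_vm *vm)
{
	int stale;

	/* Flush first, before taking any locks. */
	mock_svm_flush(vm);

	vm->locked = true;
	/* The VMA split and lookup would happen here. */
	stale = vm->pending_gc;
	vm->locked = false;

	return stale ? -1 : 0;
}
```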

Matt

> So call xe_svm_flush(vm) prior to taking any locks.
> 
> Looks good otherwise.
> 
> Matt
> 
> > +	err = down_write_killable(&vm->lock);
> > +	if (err)
> > +		goto put_vm;
> > +
> > +	if (XE_IOCTL_DBG(xe, xe_vm_is_closed_or_banned(vm))) {
> > +		err = -ENOENT;
> > +		goto unlock_vm;
> > +	}
> > +
> > +	err = xe_vm_alloc_madvise_vma(vm, args->start, args->range);
> > +	if (err)
> > +		goto unlock_vm;
> > +
> > +	err = get_vmas(vm, &madvise_range);
> > +	if (err || !madvise_range.num_vmas)
> > +		goto unlock_vm;
> > +
> > +	if (madvise_range.has_bo_vmas) {
> > +		drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES | DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> > +		drm_exec_until_all_locked(&exec) {
> > +			for (int i = 0; i < madvise_range.num_vmas; i++) {
> > +				struct xe_bo *bo = xe_vma_bo(madvise_range.vmas[i]);
> > +
> > +				if (!bo)
> > +					continue;
> > +				err = drm_exec_lock_obj(&exec, &bo->ttm.base);
> > +				drm_exec_retry_on_contention(&exec);
> > +				if (err)
> > +					goto err_fini;
> > +			}
> > +		}
> > +	}
> > +
> > +	if (madvise_range.has_userptr_vmas) {
> > +		err = down_read_interruptible(&vm->userptr.notifier_lock);
> > +		if (err)
> > +			goto err_fini;
> > +	}
> > +
> > +	if (madvise_range.has_svm_vmas) {
> > +		err = down_read_interruptible(&vm->svm.gpusvm.notifier_lock);
> > +		if (err)
> > +			goto unlock_userptr;
> > +	}
> > +
> > +	attr_type = array_index_nospec(args->type, ARRAY_SIZE(madvise_funcs));
> > +	madvise_funcs[attr_type](xe, vm, madvise_range.vmas, madvise_range.num_vmas, args);
> > +
> > +	err = xe_vm_invalidate_madvise_range(vm, args->start, args->start + args->range);
> > +
> > +	if (madvise_range.has_svm_vmas)
> > +		xe_svm_notifier_unlock(vm);
> > +
> > +unlock_userptr:
> > +	if (madvise_range.has_userptr_vmas)
> > +		up_read(&vm->userptr.notifier_lock);
> > +err_fini:
> > +	if (madvise_range.has_bo_vmas)
> > +		drm_exec_fini(&exec);
> > +	kfree(madvise_range.vmas);
> > +	madvise_range.vmas = NULL;
> > +unlock_vm:
> > +	up_write(&vm->lock);
> > +put_vm:
> > +	xe_vm_put(vm);
> > +	return err;
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.h b/drivers/gpu/drm/xe/xe_vm_madvise.h
> > new file mode 100644
> > index 000000000000..b0e1fc445f23
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_vm_madvise.h
> > @@ -0,0 +1,15 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#ifndef _XE_VM_MADVISE_H_
> > +#define _XE_VM_MADVISE_H_
> > +
> > +struct drm_device;
> > +struct drm_file;
> > +
> > +int xe_vm_madvise_ioctl(struct drm_device *dev, void *data,
> > +			struct drm_file *file);
> > +
> > +#endif
> > -- 
> > 2.34.1
> > 


* Re: [PATCH v5 19/23] drm/xe/vm: Add helper to check for default VMA memory attributes
  2025-07-22 13:35 ` [PATCH v5 19/23] drm/xe/vm: Add helper to check for default VMA memory attributes Himal Prasad Ghimiray
@ 2025-07-29  4:33   ` Matthew Brost
  0 siblings, 0 replies; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  4:33 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström

On Tue, Jul 22, 2025 at 07:05:22PM +0530, Himal Prasad Ghimiray wrote:
> Introduce a new helper function `xe_vma_has_default_mem_attrs()` to
> determine whether a VMA's memory attributes are set to their default
> values. This includes checks for atomic access, PAT index, and preferred
> location.
> 
> Also, add a new field `default_pat_index` to `struct xe_vma_mem_attr`
> to track the initial PAT index set during the first bind. This helps
> distinguish between default and user-modified pat index, such as those
> changed via madvise.
> 
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

Ah, I see how you restore the default pat_index here. Different than how
I was thinking but it makes sense.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_vm.c       | 24 ++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_vm.h       |  2 ++
>  drivers/gpu/drm/xe/xe_vm_types.h |  6 ++++++
>  3 files changed, 32 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 5dc7cd7769f8..d3f08bf9a3ee 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -2592,6 +2592,29 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
>  	return err;
>  }
>  
> +/**
> + * xe_vma_has_default_mem_attrs - Check if a VMA has default memory attributes
> + * @vma: Pointer to the xe_vma structure to check
> + *
> + * This function determines whether the given VMA (Virtual Memory Area)
> + * has its memory attributes set to their default values. Specifically,
> + * it checks the following conditions:
> + *
> + * - `atomic_access` is `DRM_XE_VMA_ATOMIC_UNDEFINED`
> + * - `pat_index` is equal to `default_pat_index`
> + * - `preferred_loc.devmem_fd` is `DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE`
> + * - `preferred_loc.migration_policy` is `DRM_XE_MIGRATE_ALL_PAGES`
> + *
> + * Return: true if all attributes are at their default values, false otherwise.
> + */
> +bool xe_vma_has_default_mem_attrs(struct xe_vma *vma)
> +{
> +	return (vma->attr.atomic_access == DRM_XE_ATOMIC_UNDEFINED &&
> +		vma->attr.pat_index ==  vma->attr.default_pat_index &&
> +		vma->attr.preferred_loc.devmem_fd == DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE &&
> +		vma->attr.preferred_loc.migration_policy == DRM_XE_MIGRATE_ALL_PAGES);
> +}
> +
>  static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
>  				   struct xe_vma_ops *vops)
>  {
> @@ -2624,6 +2647,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
>  					.migration_policy = DRM_XE_MIGRATE_ALL_PAGES,
>  				},
>  				.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
> +				.default_pat_index = op->map.pat_index,
>  				.pat_index = op->map.pat_index,
>  			};
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index d5bc09ae640c..a4db843de540 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -66,6 +66,8 @@ static inline bool xe_vm_is_closed_or_banned(struct xe_vm *vm)
>  struct xe_vma *
>  xe_vm_find_overlapping_vma(struct xe_vm *vm, u64 start, u64 range);
>  
> +bool xe_vma_has_default_mem_attrs(struct xe_vma *vma);
> +
>  /**
>   * xe_vm_has_scratch() - Whether the vm is configured for scratch PTEs
>   * @vm: The vm
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index 81d92d886578..351242c92c12 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -103,8 +103,14 @@ struct xe_vma_mem_attr {
>  	 */
>  	u32 atomic_access;
>  
> +	/**
> +	 * @default_pat_index: The pat index for VMA set during first bind by user.
> +	 */
> +	u16 default_pat_index;
> +
>  	/**
>  	 * @pat_index: The pat index to use when encoding the PTEs for this vma.
> +	 * same as default_pat_index unless overwritten by madvise.
>  	 */
>  	u16 pat_index;
>  };
> -- 
> 2.34.1
> 

* Re: [PATCH v5 22/23] drm/xe: Enable madvise ioctl for xe
  2025-07-22 13:35 ` [PATCH v5 22/23] drm/xe: Enable madvise ioctl for xe Himal Prasad Ghimiray
@ 2025-07-29  4:34   ` Matthew Brost
  0 siblings, 0 replies; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  4:34 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström

On Tue, Jul 22, 2025 at 07:05:25PM +0530, Himal Prasad Ghimiray wrote:
> Ioctl enables setting up of memory attributes in user provided range.
> 
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_device.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 6dc84e4ed281..b02c4ae0fdbf 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -63,6 +63,7 @@
>  #include "xe_ttm_stolen_mgr.h"
>  #include "xe_ttm_sys_mgr.h"
>  #include "xe_vm.h"
> +#include "xe_vm_madvise.h"
>  #include "xe_vram.h"
>  #include "xe_vsec.h"
>  #include "xe_wait_user_fence.h"
> @@ -200,6 +201,7 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
>  	DRM_IOCTL_DEF_DRV(XE_WAIT_USER_FENCE, xe_wait_user_fence_ioctl,
>  			  DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(XE_OBSERVATION, xe_observation_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(XE_MADVISE, xe_vm_madvise_ioctl, DRM_RENDER_ALLOW),
>  };
>  
>  static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> -- 
> 2.34.1
> 

* Re: [PATCH v5 21/23] drm/xe/vm: Add a delayed worker to merge fragmented vmas
  2025-07-22 13:35 ` [PATCH v5 21/23] drm/xe/vm: Add a delayed worker to merge fragmented vmas Himal Prasad Ghimiray
@ 2025-07-29  4:39   ` Matthew Brost
  2025-07-30 11:08     ` Ghimiray, Himal Prasad
  0 siblings, 1 reply; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  4:39 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström

On Tue, Jul 22, 2025 at 07:05:24PM +0530, Himal Prasad Ghimiray wrote:
> During the initial mirror bind, initialize and start the delayed work item
> responsible for merging adjacent CPU address mirror VMAs with default
> memory attributes. This function sets the merge_active flag and schedules
> the work to run after a delay, allowing batching of VMA updates.
> 

I think we will need some way to defragment, but it might need more
thought. The trade-off between defragmenting on every insertion of a
mirror VMA (binding a BO back to mirror) and on every unmap that restores
the defaults vs. a periodic worker needs to be carefully considered.

The trade-off is more time up front (plus perhaps some additional
complexity) vs. a periodic worker which blocks out all memory transactions.

Since this doesn't affect any functionality, perhaps table this for now
and we run it by Thomas to formulate a plan / solution.

Matt
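
For reference, the scan that the quoted worker performs can be reduced to a single pass over sorted ranges that records every run of two or more contiguous, mergeable entries. The following is a minimal userspace sketch of that idea only, not the driver code: `struct range_info`, `struct va_span` and `collect_merge_spans()` are hypothetical names, and `mergeable` stands in for "CPU address mirror VMA with default attributes".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VMA: a half-open range [start, end). */
struct range_info {
	uint64_t start;
	uint64_t end;
	bool mergeable;	/* assumed: mirror VMA with default attributes */
};

/* A span that covers two or more contiguous mergeable ranges. */
struct va_span {
	uint64_t start;
	uint64_t end;
};

/*
 * Walk an address-sorted array once and record each run of at least two
 * contiguous mergeable ranges. Returns the number of spans written to
 * out, which never exceeds out_cap.
 */
static size_t collect_merge_spans(const struct range_info *r, size_t n,
				  struct va_span *out, size_t out_cap)
{
	size_t count = 0, run_len = 0;
	uint64_t run_start = 0, run_end = 0;

	/* The extra iteration (i == n) flushes a run that ends at the tail. */
	for (size_t i = 0; i <= n; i++) {
		bool extends = i < n && r[i].mergeable && run_len > 0 &&
			       r[i].start == run_end;

		if (extends) {
			run_end = r[i].end;
			run_len++;
			continue;
		}

		/* The current run ends here; keep it only if it merges something. */
		if (run_len > 1 && count < out_cap)
			out[count++] = (struct va_span){ run_start, run_end };

		if (i < n && r[i].mergeable) {
			run_start = r[i].start;
			run_end = r[i].end;
			run_len = 1;
		} else {
			run_len = 0;
		}
	}

	return count;
}
```

The same pass could in principle run at insertion or unmap time on just the neighbours of the affected range, which is the "more time up front" side of the trade-off discussed above.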

> Suggested-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_vm.c       | 126 +++++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_vm_types.h |  15 ++++
>  2 files changed, 141 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 003c8209f8bd..bee849167c0d 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1160,6 +1160,127 @@ static void xe_vma_free(struct xe_vma *vma)
>  		kfree(vma);
>  }
>  
> +struct va_range {
> +	u64 start;
> +	u64 end;
> +};
> +
> +static void add_merged_range(struct va_range **ranges, int *count, int *capacity,
> +			     u64 start, u64 end)
> +{
> +	const int array_size  = 8;
> +	struct va_range *new_ranges;
> +	int new_capacity;
> +
> +	if (*count == *capacity) {
> +		new_capacity = *capacity ? *capacity * 2 : array_size;
> +		new_ranges = krealloc(*ranges, new_capacity * sizeof(**ranges), GFP_KERNEL);
> +		if (!new_ranges)
> +			return;
> +
> +		*ranges = new_ranges;
> +		*capacity = new_capacity;
> +	}
> +	(*ranges)[(*count)++] = (struct va_range){ .start = start, .end = end };
> +}
> +
> +static void xe_vm_vmas_merge_worker(struct work_struct *work)
> +{
> +	struct xe_vm *vm = container_of(to_delayed_work(work), struct xe_vm, merge_vmas_work);
> +	struct drm_gpuva *gpuva, *next = NULL;
> +	struct va_range *merged_ranges = NULL;
> +	int merge_count = 0, merge_capacity = 0;
> +	bool in_merge = false;
> +	u64 merge_start = 0, merge_end = 0;
> +	int merge_len = 0;
> +
> +	if (!vm->merge_active)
> +		return;
> +
> +	down_write(&vm->lock);
> +
> +	drm_gpuvm_for_each_va_safe(gpuva, next, &vm->gpuvm) {
> +		struct xe_vma *vma = gpuva_to_vma(gpuva);
> +
> +		if (!xe_vma_is_cpu_addr_mirror(vma) || !xe_vma_has_default_mem_attrs(vma)) {
> +			if (in_merge && merge_len > 1)
> +				add_merged_range(&merged_ranges, &merge_count, &merge_capacity,
> +						 merge_start, merge_end);
> +
> +			in_merge = false;
> +			merge_len = 0;
> +			continue;
> +		}
> +
> +		if (!in_merge) {
> +			merge_start = xe_vma_start(vma);
> +			merge_end = xe_vma_end(vma);
> +			in_merge = true;
> +			merge_len = 1;
> +		} else if (xe_vma_start(vma) == merge_end && xe_vma_has_default_mem_attrs(vma)) {
> +			merge_end = xe_vma_end(vma);
> +			merge_len++;
> +		} else {
> +			if (merge_len > 1)
> +				add_merged_range(&merged_ranges, &merge_count, &merge_capacity,
> +						 merge_start, merge_end);
> +			merge_start = xe_vma_start(vma);
> +			merge_end = xe_vma_end(vma);
> +			merge_len = 1;
> +		}
> +	}
> +
> +	if (in_merge && merge_len > 1) {
> +		add_merged_range(&merged_ranges, &merge_count, &merge_capacity,
> +				 merge_start, merge_end);
> +	}
> +
> +	for (int i = 0; i < merge_count; i++) {
> +		vm_dbg(&vm->xe->drm, "Merged VA range %d: start=0x%016llx, end=0x%016llx\n",
> +		       i, merged_ranges[i].start, merged_ranges[i].end);
> +
> +		if (xe_vm_alloc_cpu_addr_mirror_vma(vm, merged_ranges[i].start,
> +						    merged_ranges[i].end - merged_ranges[i].start))
> +			break;
> +	}
> +
> +	up_write(&vm->lock);
> +	kfree(merged_ranges);
> +	schedule_delayed_work(&vm->merge_vmas_work, msecs_to_jiffies(5000));
> +}
> +
> +/*
> + * xe_vm_start_vmas_merge - Initialize and schedule VMA merge work
> + * @vm: Pointer to the xe_vm structure
> + *
> + * Initializes the delayed work item responsible for merging adjacent
> + * CPU address mirror VMAs with default memory attributes. This function
> + * sets the merge_active flag and schedules the work to run after a delay,
> + * allowing batching of VMA updates.
> + */
> +static void xe_vm_start_vmas_merge(struct xe_vm *vm)
> +{
> +	if (vm->merge_active)
> +		return;
> +
> +	vm->merge_active = true;
> +	INIT_DELAYED_WORK(&vm->merge_vmas_work, xe_vm_vmas_merge_worker);
> +	schedule_delayed_work(&vm->merge_vmas_work, msecs_to_jiffies(5000));
> +}
> +
> +/*
> + * xe_vm_stop_vmas_merge - Cancel scheduled VMA merge work
> + * @vm: Pointer to the xe_vm structure
> + */
> +static void xe_vm_stop_vmas_merge(struct xe_vm *vm)
> +{
> +	if (!vm->merge_active)
> +		return;
> +
> +	vm->merge_active = false;
> +	cancel_delayed_work_sync(&vm->merge_vmas_work);
> +}
> +
>  #define VMA_CREATE_FLAG_READ_ONLY		BIT(0)
>  #define VMA_CREATE_FLAG_IS_NULL			BIT(1)
>  #define VMA_CREATE_FLAG_DUMPABLE		BIT(2)
> @@ -1269,6 +1390,9 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>  		xe_vm_get(vm);
>  	}
>  
> +	if (xe_vma_is_cpu_addr_mirror(vma))
> +		xe_vm_start_vmas_merge(vm);
> +
>  	return vma;
>  }
>  
> @@ -1982,6 +2106,8 @@ static void vm_destroy_work_func(struct work_struct *w)
>  	/* xe_vm_close_and_put was not called? */
>  	xe_assert(xe, !vm->size);
>  
> +	xe_vm_stop_vmas_merge(vm);
> +
>  	if (xe_vm_in_preempt_fence_mode(vm))
>  		flush_work(&vm->preempt.rebind_work);
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index 351242c92c12..c4f3542eb464 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -374,6 +374,21 @@ struct xe_vm {
>  	bool batch_invalidate_tlb;
>  	/** @xef: XE file handle for tracking this VM's drm client */
>  	struct xe_file *xef;
> +
> +	/**
> +	 * @merge_vmas_work: Delayed work item used to merge CPU address mirror VMAs.
> +	 * This work is scheduled to scan the GPU virtual memory space and
> +	 * identify adjacent CPU address mirror VMAs that have default memory
> +	 * attributes. When such VMAs are found, they are merged into a single
> +	 * larger VMA to reduce fragmentation. The merging process is triggered
> +	 * asynchronously via a delayed workqueue to avoid blocking critical paths
> +	 * and to batch updates when possible.
> +	 */
> +	struct delayed_work merge_vmas_work;
> +
> +	/** @merge_active: True if merge_vmas_work has been initialized */
> +	bool merge_active;
> +
>  };
>  
>  /** struct xe_vma_op_map - VMA map operation */
> -- 
> 2.34.1
> 

* Re: [PATCH v5 20/23] drm/xe: Reset VMA attributes to default in SVM garbage collector
  2025-07-24 21:50   ` Matthew Brost
@ 2025-07-29  5:27     ` Matthew Brost
  2025-07-30  6:09     ` Ghimiray, Himal Prasad
  1 sibling, 0 replies; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  5:27 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström

On Thu, Jul 24, 2025 at 02:50:47PM -0700, Matthew Brost wrote:
> On Tue, Jul 22, 2025 at 07:05:23PM +0530, Himal Prasad Ghimiray wrote:
> > Restore default memory attributes for VMAs during garbage collection
> > if they were modified by madvise. Reuse existing VMA if fully overlapping;
> > otherwise, allocate a new mirror VMA.
> > 
> > Suggested-by: Matthew Brost <matthew.brost@intel.com>
> > Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_svm.c |  34 +++++++++
> >  drivers/gpu/drm/xe/xe_vm.c  | 140 +++++++++++++++++++++++++-----------
> >  drivers/gpu/drm/xe/xe_vm.h  |   2 +
> >  3 files changed, 135 insertions(+), 41 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> > index ba1233d0d5a2..79709dc066b9 100644
> > --- a/drivers/gpu/drm/xe/xe_svm.c
> > +++ b/drivers/gpu/drm/xe/xe_svm.c
> > @@ -255,7 +255,18 @@ static int __xe_svm_garbage_collector(struct xe_vm *vm,
> >  static int xe_svm_garbage_collector(struct xe_vm *vm)
> >  {
> >  	struct xe_svm_range *range;
> > +	struct xe_vma *vma;
> > +	u64 range_start;
> > +	u64 range_size;
> > +	u64 range_end;
> >  	int err;
> > +	struct xe_vma_mem_attr default_attr = {
> > +		.preferred_loc = {
> > +			.devmem_fd = DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE,
> > +			.migration_policy = DRM_XE_MIGRATE_ALL_PAGES,
> > +		},
> > +		.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
> > +	};
> >  
> >  	lockdep_assert_held_write(&vm->lock);
> >  
> > @@ -270,6 +281,12 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
> >  		if (!range)
> >  			break;
> >  
> > +		range_start = xe_svm_range_start(range);
> > +		range_size = xe_svm_range_size(range);
> > +		range_end = xe_svm_range_end(range);
> > +
> > +		vma = xe_vm_find_vma_by_addr(vm, xe_svm_range_start(range));
> > +
> 
> I'd find the VMA outside of the svm.garbage_collector.lock.
> 
> >  		list_del(&range->garbage_collector_link);
> >  		spin_unlock(&vm->svm.garbage_collector.lock);
> >  
> > @@ -282,7 +299,24 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
> >  			return err;
> >  		}
> >  
> > +		if (!xe_vma_has_default_mem_attrs(vma)) {
> 
> It seems possible the VMA could be NULL in error cases. I'd check for
> NULL and error out.
> 
> > Also could this code be moved to a helper? Keeping it internal to SVM
> > seems ok; in that case xe_vm_find_vma_by_addr could also be in the helper.
> 
> > +			vm_dbg(&vm->xe->drm, "Existing VMA start=0x%016llx, vma_end=0x%016llx",
> > +			       xe_vma_start(vma), xe_vma_end(vma));
> > +
> > +			if (xe_vma_start(vma) == range_start && xe_vma_end(vma) == range_end) {
> > +				default_attr.pat_index = vma->attr.default_pat_index;
> > +				default_attr.default_pat_index  = vma->attr.default_pat_index;
> > +				vma->attr = default_attr;
> > +			} else {
> > +				vm_dbg(&vm->xe->drm, "Split VMA start=0x%016llx, vma_end=0x%016llx",
> > +				       range_start, range_end);
> > +				err = xe_vm_alloc_cpu_addr_mirror_vma(vm, range_start, range_size);
> > +				if (err)
> 
> > On error, I'd print a message and kill the VM, as it shouldn't be
> > possible to fail aside from a memory allocation failure and we can't
> > cope with errors given this can be inside a worker.
> 
> I'll circle back to the rest of the patch a bit later.
> 

I think the rest of the patch makes sense.

Matt

> Matt
> 
> > +					return err;
> > +			}
> > +		}
> >  		spin_lock(&vm->svm.garbage_collector.lock);
> > +
> >  	}
> >  	spin_unlock(&vm->svm.garbage_collector.lock);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index d3f08bf9a3ee..003c8209f8bd 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -4254,34 +4254,24 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
> >  	}
> >  }
> >  
> > -/**
> > - * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
> > - * @vm: Pointer to the xe_vm structure
> > - * @start: Starting input address
> > - * @range: Size of the input range
> > - *
> > - * This function splits existing vma to create new vma for user provided input range
> > - *
> > - *  Return: 0 if success
> > - */
> > -int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
> > +static int xe_vm_alloc_vma(struct xe_vm *vm,
> > +			   u64 start, u64 range,
> > +			   enum drm_gpuvm_sm_map_ops_flags flags)
> >  {
> >  	struct xe_vma_ops vops;
> >  	struct drm_gpuva_ops *ops = NULL;
> >  	struct drm_gpuva_op *__op;
> >  	bool is_cpu_addr_mirror = false;
> >  	bool remap_op = false;
> > +	bool is_madvise = flags == DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE;
> >  	struct xe_vma_mem_attr tmp_attr;
> > +	u16 default_pat;
> >  	int err;
> >  
> > -	vm_dbg(&vm->xe->drm, "MADVISE IN: addr=0x%016llx, size=0x%016llx", start, range);
> > -
> >  	lockdep_assert_held_write(&vm->lock);
> >  
> > -	vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range);
> >  	ops = drm_gpuvm_sm_map_ops_create(&vm->gpuvm, start, range,
> > -					  DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE,
> > -					  NULL, start);
> > +					  flags, NULL, start);
> >  	if (IS_ERR(ops))
> >  		return PTR_ERR(ops);
> >  
> > @@ -4292,33 +4282,56 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
> >  
> >  	drm_gpuva_for_each_op(__op, ops) {
> >  		struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
> > +		struct xe_vma *vma = NULL;
> >  
> > -		if (__op->op == DRM_GPUVA_OP_REMAP) {
> > -			xe_assert(vm->xe, !remap_op);
> > -			remap_op = true;
> > +		if (!is_madvise) {
> > +			if (__op->op == DRM_GPUVA_OP_UNMAP) {
> > +				vma = gpuva_to_vma(op->base.unmap.va);
> > +				XE_WARN_ON(!xe_vma_has_default_mem_attrs(vma));
> > +				default_pat = vma->attr.default_pat_index;
> > +			}
> >  
> > -			if (xe_vma_is_cpu_addr_mirror(gpuva_to_vma(op->base.remap.unmap->va)))
> > -				is_cpu_addr_mirror = true;
> > -			else
> > -				is_cpu_addr_mirror = false;
> > -		}
> > +			if (__op->op == DRM_GPUVA_OP_REMAP) {
> > +				vma = gpuva_to_vma(op->base.remap.unmap->va);
> > +				default_pat = vma->attr.default_pat_index;
> > +			}
> >  
> > -		if (__op->op == DRM_GPUVA_OP_MAP) {
> > -			xe_assert(vm->xe, remap_op);
> > -			remap_op = false;
> > +			if (__op->op == DRM_GPUVA_OP_MAP) {
> > +				op->map.is_cpu_addr_mirror = true;
> > +				op->map.pat_index = default_pat;
> > +			}
> > +		} else {
> > +			if (__op->op == DRM_GPUVA_OP_REMAP) {
> > +				vma = gpuva_to_vma(op->base.remap.unmap->va);
> > +				xe_assert(vm->xe, !remap_op);
> > +				remap_op = true;
> >  
> > -			/* In case of madvise ops DRM_GPUVA_OP_MAP is always after
> > -			 * DRM_GPUVA_OP_REMAP, so ensure we assign op->map.is_cpu_addr_mirror true
> > -			 * if REMAP is for xe_vma_is_cpu_addr_mirror vma
> > -			 */
> > -			op->map.is_cpu_addr_mirror = is_cpu_addr_mirror;
> > -		}
> > +				if (xe_vma_is_cpu_addr_mirror(vma))
> > +					is_cpu_addr_mirror = true;
> > +				else
> > +					is_cpu_addr_mirror = false;
> > +			}
> >  
> > +			if (__op->op == DRM_GPUVA_OP_MAP) {
> > +				xe_assert(vm->xe, remap_op);
> > +				remap_op = false;
> > +				/*
> > +				 * In case of madvise ops DRM_GPUVA_OP_MAP is
> > +				 * always after DRM_GPUVA_OP_REMAP, so ensure
> > +				 * we assign op->map.is_cpu_addr_mirror true
> > +				 * if REMAP is for xe_vma_is_cpu_addr_mirror vma
> > +				 */
> > +				op->map.is_cpu_addr_mirror = is_cpu_addr_mirror;
> > +			}
> > +		}
> >  		print_op(vm->xe, __op);
> >  	}
> >  
> >  	xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
> > -	vops.flags |= XE_VMA_OPS_FLAG_MADVISE;
> > +
> > +	if (is_madvise)
> > +		vops.flags |= XE_VMA_OPS_FLAG_MADVISE;
> > +
> >  	err = vm_bind_ioctl_ops_parse(vm, ops, &vops);
> >  	if (err)
> >  		goto unwind_ops;
> > @@ -4330,15 +4343,20 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
> >  		struct xe_vma *vma;
> >  
> >  		if (__op->op == DRM_GPUVA_OP_UNMAP) {
> > -			/* There should be no unmap */
> > -			XE_WARN_ON("UNEXPECTED UNMAP");
> > -			xe_vma_destroy(gpuva_to_vma(op->base.unmap.va), NULL);
> > +			vma = gpuva_to_vma(op->base.unmap.va);
> > +			/* There should be no unmap for madvise */
> > +			if (is_madvise)
> > +				XE_WARN_ON("UNEXPECTED UNMAP");
> > +
> > +			xe_vma_destroy(vma, NULL);
> >  		} else if (__op->op == DRM_GPUVA_OP_REMAP) {
> >  			vma = gpuva_to_vma(op->base.remap.unmap->va);
> > -			/* Store attributes for REMAP UNMAPPED VMA, so they can be assigned
> > -			 * to newly MAP created vma.
> > +			/* In case of madvise ops Store attributes for REMAP UNMAPPED
> > +			 * VMA, so they can be assigned to newly MAP created vma.
> >  			 */
> > -			tmp_attr = vma->attr;
> > +			if (is_madvise)
> > +				tmp_attr = vma->attr;
> > +
> >  			xe_vma_destroy(gpuva_to_vma(op->base.remap.unmap->va), NULL);
> >  		} else if (__op->op == DRM_GPUVA_OP_MAP) {
> >  			vma = op->map.vma;
> > @@ -4346,7 +4364,8 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
> >  			 * Therefore temp_attr will always have sane values, making it safe to
> >  			 * copy them to new vma.
> >  			 */
> > -			vma->attr = tmp_attr;
> > +			if (is_madvise)
> > +				vma->attr = tmp_attr;
> >  		}
> >  	}
> >  
> > @@ -4360,3 +4379,42 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
> >  	drm_gpuva_ops_free(&vm->gpuvm, ops);
> >  	return err;
> >  }
> > +
> > +/**
> > + * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
> > + * @vm: Pointer to the xe_vm structure
> > + * @start: Starting input address
> > + * @range: Size of the input range
> > + *
> > + * This function splits existing vma to create new vma for user provided input range
> > + *
> > + * Return: 0 if success
> > + */
> > +int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
> > +{
> > +	lockdep_assert_held_write(&vm->lock);
> > +
> > +	vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range);
> > +
> > +	return xe_vm_alloc_vma(vm, start, range, DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE);
> > +}
> > +
> > +/**
> > + * xe_vm_alloc_cpu_addr_mirror_vma - Allocate CPU addr mirror vma
> > + * @vm: Pointer to the xe_vm structure
> > + * @start: Starting input address
> > + * @range: Size of the input range
> > + *
> > + * This function splits/merges existing vma to create new vma for user provided input range
> > + *
> > + * Return: 0 if success
> > + */
> > +int xe_vm_alloc_cpu_addr_mirror_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
> > +{
> > +	lockdep_assert_held_write(&vm->lock);
> > +
> > +	vm_dbg(&vm->xe->drm, "CPU_ADDR_MIRROR_VMA_OPS_CREATE: addr=0x%016llx, size=0x%016llx",
> > +	       start, range);
> > +
> > +	return xe_vm_alloc_vma(vm, start, range, DRM_GPUVM_SM_MAP_NOT_MADVISE);
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> > index a4db843de540..f7b9ad83685a 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.h
> > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > @@ -177,6 +177,8 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
> >  
> >  int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
> >  
> > +int xe_vm_alloc_cpu_addr_mirror_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
> > +
> >  /**
> >   * to_userptr_vma() - Return a pointer to an embedding userptr vma
> >   * @vma: Pointer to the embedded struct xe_vma
> > -- 
> > 2.34.1
> > 

* Re: [PATCH v5 23/23] drm/xe/uapi: Add UAPI for querying VMA count and memory attributes
  2025-07-22 13:35 ` [PATCH v5 23/23] drm/xe/uapi: Add UAPI for querying VMA count and memory attributes Himal Prasad Ghimiray
@ 2025-07-29  5:37   ` Matthew Brost
  0 siblings, 0 replies; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  5:37 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström, Shuicheng Lin

On Tue, Jul 22, 2025 at 07:05:26PM +0530, Himal Prasad Ghimiray wrote:
> Introduce the DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS ioctl to allow
> userspace to query memory attributes of VMAs within a user specified
> virtual address range.
> 
> Userspace first calls the ioctl with num_mem_ranges = 0,
> sizeof_mem_range_attr = 0 and vector_of_mem_attr = NULL to retrieve
> the number of memory ranges (vmas) and size of each memory range attribute.
> Then, it allocates a buffer of that size and calls the ioctl again to fill
> the buffer with memory range attributes.
> 
> This two-step interface allows userspace to first query the required
> buffer size, then retrieve detailed attributes efficiently.
> 
> v2 (Matthew Brost)
> - Use same ioctl to overload functionality
> 
> v3
> - Add kernel-doc
> 
> v4
> - Make uapi future proof by passing struct size (Matthew Brost)
> - make lock interruptible (Matthew Brost)
> - set reserved bits to zero (Matthew Brost)
> - s/__copy_to_user/copy_to_user (Matthew Brost)
> - Avoid using VMA term in uapi (Thomas)
> - xe_vm_put(vm) is missing (Shuicheng)
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Shuicheng Lin <shuicheng.lin@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device.c |   2 +
>  drivers/gpu/drm/xe/xe_vm.c     | 101 ++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_vm.h     |   2 +-
>  include/uapi/drm/xe_drm.h      | 137 +++++++++++++++++++++++++++++++++
>  4 files changed, 241 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index b02c4ae0fdbf..1e77570db531 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -202,6 +202,8 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
>  			  DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(XE_OBSERVATION, xe_observation_ioctl, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(XE_MADVISE, xe_vm_madvise_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(XE_VM_QUERY_MEM_RANGE_ATTRS, xe_vm_query_vmas_attrs_ioctl,
> +			  DRM_RENDER_ALLOW),
>  };
>  
>  static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index bee849167c0d..e54ab4dce8df 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -2297,6 +2297,107 @@ int xe_vm_destroy_ioctl(struct drm_device *dev, void *data,
>  	return err;
>  }
>  
> +static int xe_vm_query_vmas(struct xe_vm *vm, u64 start, u64 end)
> +{
> +	struct drm_gpuva *gpuva;
> +	u32 num_vmas = 0;
> +
> +	lockdep_assert_held(&vm->lock);
> +	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, start, end)
> +		num_vmas++;
> +
> +	return num_vmas;
> +}
> +
> +static int get_mem_attrs(struct xe_vm *vm, u32 *num_vmas, u64 start,
> +			 u64 end, struct drm_xe_mem_range_attr *attrs)
> +{
> +	struct drm_gpuva *gpuva;
> +	int i = 0;
> +
> +	lockdep_assert_held(&vm->lock);
> +
> +	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, start, end) {
> +		struct xe_vma *vma = gpuva_to_vma(gpuva);
> +
> +		if (i == *num_vmas)
> +			return -ENOSPC;
> +
> +		attrs[i].start = xe_vma_start(vma);
> +		attrs[i].end = xe_vma_end(vma);
> +		attrs[i].atomic.val = vma->attr.atomic_access;
> +		attrs[i].pat_index.val = vma->attr.pat_index;
> +		attrs[i].preferred_mem_loc.devmem_fd = vma->attr.preferred_loc.devmem_fd;
> +		attrs[i].preferred_mem_loc.migration_policy =
> +		vma->attr.preferred_loc.migration_policy;
> +
> +		i++;
> +	}
> +
> +	if (i <  (*num_vmas - 1))
> +		*num_vmas = i;

Shouldn't you just set this without a condition?
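
To spell out the concern: as quoted, the guard `i < (*num_vmas - 1)` skips the update when exactly one fewer range is found than the caller allowed for, so the reported count would stay one too high. Because the loop already returns -ENOSPC once `i == *num_vmas`, `i` cannot exceed the capacity when the loop finishes, and an unconditional store looks safe. A minimal standalone sketch of that shape follows; `fill_entries()` and its parameters are hypothetical stand-ins, not the driver's code, and `-1` stands in for -ENOSPC.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for get_mem_attrs(): copy at most *num entries
 * from src to dst and report how many were actually written.
 */
static int fill_entries(const uint64_t *src, size_t src_len,
			uint64_t *dst, uint32_t *num)
{
	uint32_t i = 0;

	for (size_t k = 0; k < src_len; k++) {
		if (i == *num)
			return -1;	/* stands in for -ENOSPC */
		dst[i++] = src[k];
	}

	/* Unconditional: correct whether i is below or equal to *num. */
	*num = i;
	return 0;
}
```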

> +	return 0;
> +}
> +
> +int xe_vm_query_vmas_attrs_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> +{
> +	struct xe_device *xe = to_xe_device(dev);
> +	struct xe_file *xef = to_xe_file(file);
> +	struct drm_xe_mem_range_attr *mem_attrs;
> +	struct drm_xe_vm_query_mem_range_attr *args = data;
> +	u64 __user *attrs_user = u64_to_user_ptr(args->vector_of_mem_attr);
> +	struct xe_vm *vm;
> +	int err = 0;
> +
> +	if (XE_IOCTL_DBG(xe,
> +			 ((args->num_mem_ranges == 0 &&
> +			  (attrs_user || args->sizeof_mem_range_attr != 0)) ||
> +			 (args->num_mem_ranges > 0 &&
> +			  (!attrs_user || args->sizeof_mem_range_attr == 0)))))

sizeof_mem_range_attr != sizeof(struct drm_xe_mem_range_attr)

Looks good aside from these few nits.

Matt
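
One reading of the size comment above: the quoted ioctl allocates `num_mem_ranges * sizeof_mem_range_attr` bytes but fills the buffer with a stride of `sizeof(struct drm_xe_mem_range_attr)`, so a non-zero size smaller than the struct could let the fill step run past the allocation. The sketch below shows the stricter check in isolation, under stated assumptions: `struct mem_range_attr` is a hypothetical 64-byte stand-in for the uapi struct, and `struct query_args` / `query_args_valid()` are illustrative names that carry only the three fields involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct drm_xe_mem_range_attr; only its size matters. */
struct mem_range_attr {
	uint64_t words[8];
};

/* Illustrative subset of the query arguments. */
struct query_args {
	uint32_t num_mem_ranges;
	uint64_t sizeof_mem_range_attr;
	uint64_t vector_of_mem_attr;	/* user pointer carried as an integer */
};

/* Returns true if the argument combination is acceptable. */
static bool query_args_valid(const struct query_args *a)
{
	/* Size query: neither a buffer nor an element size may be set. */
	if (a->num_mem_ranges == 0)
		return a->vector_of_mem_attr == 0 &&
		       a->sizeof_mem_range_attr == 0;

	/* Fill request: a buffer is required and the element size must match. */
	return a->vector_of_mem_attr != 0 &&
	       a->sizeof_mem_range_attr == sizeof(struct mem_range_attr);
}
```

Requiring an exact match is the simplest option; accepting larger sizes from newer userspace would need a per-entry copy with the caller's stride instead of a single bulk copy.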

> +		return -EINVAL;
> +
> +	vm = xe_vm_lookup(xef, args->vm_id);
> +	if (XE_IOCTL_DBG(xe, !vm))
> +		return -EINVAL;
> +
> +	err = down_read_interruptible(&vm->lock);
> +	if (err)
> +		goto put_vm;
> +
> +	attrs_user = u64_to_user_ptr(args->vector_of_mem_attr);
> +
> +	if (args->num_mem_ranges == 0 && !attrs_user) {
> +		args->num_mem_ranges = xe_vm_query_vmas(vm, args->start, args->start + args->range);
> +		args->sizeof_mem_range_attr = sizeof(struct drm_xe_mem_range_attr);
> +		goto unlock_vm;
> +	}
> +
> +	mem_attrs = kvmalloc_array(args->num_mem_ranges, args->sizeof_mem_range_attr,
> +				   GFP_KERNEL | __GFP_ACCOUNT |
> +				   __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> +	if (!mem_attrs) {
> +		err = args->num_mem_ranges > 1 ? -ENOBUFS : -ENOMEM;
> +		goto unlock_vm;
> +	}
> +
> +	memset(mem_attrs, 0, args->num_mem_ranges * args->sizeof_mem_range_attr);
> +	err = get_mem_attrs(vm, &args->num_mem_ranges, args->start,
> +			    args->start + args->range, mem_attrs);
> +	if (err)
> +		goto free_mem_attrs;
> +
> +	err = copy_to_user(attrs_user, mem_attrs,
> +			   args->sizeof_mem_range_attr * args->num_mem_ranges);
> +
> +free_mem_attrs:
> +	kvfree(mem_attrs);
> +unlock_vm:
> +	up_read(&vm->lock);
> +put_vm:
> +	xe_vm_put(vm);
> +	return err;
> +}
> +
>  static bool vma_matches(struct xe_vma *vma, u64 page_addr)
>  {
>  	if (page_addr > xe_vma_end(vma) - 1 ||
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index f7b9ad83685a..6f25d6820991 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -199,7 +199,7 @@ int xe_vm_destroy_ioctl(struct drm_device *dev, void *data,
>  			struct drm_file *file);
>  int xe_vm_bind_ioctl(struct drm_device *dev, void *data,
>  		     struct drm_file *file);
> -
> +int xe_vm_query_vmas_attrs_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
>  void xe_vm_close_and_put(struct xe_vm *vm);
>  
>  static inline bool xe_vm_in_fault_mode(struct xe_vm *vm)
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 8f1d48664424..ee328bcb8bfa 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -82,6 +82,7 @@ extern "C" {
>   *  - &DRM_IOCTL_XE_WAIT_USER_FENCE
>   *  - &DRM_IOCTL_XE_OBSERVATION
>   *  - &DRM_IOCTL_XE_MADVISE
> + *  - &DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS
>   */
>  
>  /*
> @@ -104,6 +105,7 @@ extern "C" {
>  #define DRM_XE_WAIT_USER_FENCE		0x0a
>  #define DRM_XE_OBSERVATION		0x0b
>  #define DRM_XE_MADVISE			0x0c
> +#define DRM_XE_VM_QUERY_MEM_RANGE_ATTRS	0x0d
>  
>  /* Must be kept compact -- no holes */
>  
> @@ -120,6 +122,7 @@ extern "C" {
>  #define DRM_IOCTL_XE_WAIT_USER_FENCE		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
>  #define DRM_IOCTL_XE_OBSERVATION		DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
>  #define DRM_IOCTL_XE_MADVISE			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)
> +#define DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS	DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_QUERY_MEM_RANGE_ATTRS, struct drm_xe_vm_query_mem_range_attr)
>  
>  /**
>   * DOC: Xe IOCTL Extensions
> @@ -2110,6 +2113,140 @@ struct drm_xe_madvise {
>  	__u64 reserved[2];
>  };
>  
> +/**
> + * struct drm_xe_mem_range_attr - Output of &DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS
> + *
> + * This structure is provided by userspace and filled by KMD in response to the
> + * DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS ioctl. It describes memory attributes of
> + * a memory range within a user specified address range in a VM.
> + *
> + * The structure includes information such as atomic access policy,
> + * page attribute table (PAT) index, and preferred memory location.
> + * Userspace allocates an array of these structures and passes a pointer to the
> + * ioctl to retrieve attributes for each memory range.
> + *
> + * @extensions: Pointer to the first extension struct, if any
> + * @start: Start address of the memory range
> + * @end: End address of the virtual memory range
> + *
> + */
> +struct drm_xe_mem_range_attr {
> +	 /** @extensions: Pointer to the first extension struct, if any */
> +	__u64 extensions;
> +
> +	/** @start: start of the memory range */
> +	__u64 start;
> +
> +	/** @end: end of the memory range */
> +	__u64 end;
> +
> +	/** @preferred_mem_loc: preferred memory location */
> +	struct {
> +		/** @preferred_mem_loc.devmem_fd: fd for preferred loc */
> +		__u32 devmem_fd;
> +
> +		/** @preferred_mem_loc.migration_policy: Page migration policy */
> +		__u32 migration_policy;
> +	} preferred_mem_loc;
> +
> +	struct {
> +		/** @atomic.val: atomic attribute */
> +		__u32 val;
> +
> +		/** @atomic.reserved: Reserved */
> +		__u32 reserved;
> +	} atomic;
> +
> +	struct {
> +		/** @pat_index.val: PAT index */
> +		__u32 val;
> +
> +		/** @pat_index.reserved: Reserved */
> +		__u32 reserved;
> +	} pat_index;
> +
> +	/** @reserved: Reserved */
> +	__u64 reserved[2];
> +};
> +
> +/**
> + * struct drm_xe_vm_query_mem_range_attr - Input of &DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS
> + *
> + * This structure is used to query memory attributes of memory regions
> + * within a user specified address range in a VM. It provides detailed
> + * information about each memory range, including atomic access policy,
> + * page attribute table (PAT) index, and preferred memory location.
> + *
> + * Userspace first calls the ioctl with @num_mem_ranges = 0,
> + * @sizeof_mem_range_attr = 0 and @vector_of_mem_attr = NULL to retrieve
> + * the number of memory regions and size of each memory range attribute.
> + * Then, it allocates a buffer of that size and calls the ioctl again to fill
> + * the buffer with memory range attributes.
> + *
> + * If the second call fails with -ENOSPC, the memory ranges changed between
> + * the first call and the second; retry the ioctl with @num_mem_ranges = 0,
> + * @sizeof_mem_range_attr = 0 and @vector_of_mem_attr = NULL, followed by
> + * the second ioctl call.
> + *
> + * Example:
> + *
> + * .. code-block:: C
> + *    struct drm_xe_vm_query_mem_range_attr query = {
> + *         .vm_id = vm_id,
> + *         .start = 0x100000,
> + *         .range = 0x2000,
> + *     };
> + *
> + *    // First ioctl call to get num of mem regions and sizeof each attribute
> + *    ioctl(fd, DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS, &query);
> + *
> + *    // Allocate buffer for the memory region attributes
> + *    void *ptr = malloc(query.num_mem_ranges * query.sizeof_mem_range_attr);
> + *
> + *    query.vector_of_mem_attr = (uintptr_t)ptr;
> + *
> + *    // Second ioctl call to actually fill the memory attributes
> + *    ioctl(fd, DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS, &query);
> + *
> + *    // Iterate over the returned memory region attributes
> + *    for (unsigned int i = 0; i < query.num_mem_ranges; ++i) {
> + *       // Index from the base so ptr stays valid for free()
> + *       struct drm_xe_mem_range_attr *attr = (struct drm_xe_mem_range_attr *)
> + *               ((char *)ptr + i * query.sizeof_mem_range_attr);
> + *
> + *       // Do something with attr
> + *     }
> + *
> + *    free(ptr);
> + */
> +struct drm_xe_vm_query_mem_range_attr {
> +	/** @extensions: Pointer to the first extension struct, if any */
> +	__u64 extensions;
> +
> +	/** @vm_id: vm_id of the virtual range */
> +	__u32 vm_id;
> +
> +	/** @num_mem_ranges: number of mem_ranges in range */
> +	__u32 num_mem_ranges;
> +
> +	/** @start: start of the virtual address range */
> +	__u64 start;
> +
> +	/** @range: size of the virtual address range */
> +	__u64 range;
> +
> +	/** @sizeof_mem_range_attr: size of struct drm_xe_mem_range_attr */
> +	__u64 sizeof_mem_range_attr;
> +
> +	/** @vector_of_mem_attr: userptr to array of struct drm_xe_mem_range_attr */
> +	__u64 vector_of_mem_attr;
> +
> +	/** @reserved: Reserved */
> +	__u64 reserved[2];
> +
> +};
> +
>  #if defined(__cplusplus)
>  }
>  #endif
> -- 
> 2.34.1
> 
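
The two-call protocol and the -ENOSPC retry described in the kernel-doc above can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: `struct query`, `struct range_attr`, `mock_query_ioctl()` and `query_all()` are hypothetical stand-ins, not the uAPI from the patch, and the mock simply pretends that one extra range appears between the size query and the fill.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct drm_xe_vm_query_mem_range_attr. */
struct query {
	uint32_t num_mem_ranges;
	uint64_t sizeof_mem_range_attr;
	void *vector_of_mem_attr;
};

/* Hypothetical per-range record; the real layout comes from the uAPI header. */
struct range_attr {
	uint64_t start, end;
};

static uint32_t mock_ranges = 3;	/* ranges the mock "kernel" tracks */
static int mock_grow_once = 1;		/* add one range after the first size query */

/* Mock ioctl: reports sizes for a zeroed query, otherwise fills or fails. */
static int mock_query_ioctl(struct query *q)
{
	if (!q->num_mem_ranges && !q->sizeof_mem_range_attr &&
	    !q->vector_of_mem_attr) {
		q->num_mem_ranges = mock_ranges;
		q->sizeof_mem_range_attr = sizeof(struct range_attr);
		if (mock_grow_once) {	/* simulate a concurrent VMA split */
			mock_grow_once = 0;
			mock_ranges++;
		}
		return 0;
	}

	if (q->num_mem_ranges < mock_ranges)
		return -ENOSPC;		/* buffer sized for a stale count */

	struct range_attr *a = q->vector_of_mem_attr;

	q->num_mem_ranges = mock_ranges;
	for (uint32_t i = 0; i < mock_ranges; i++) {
		a[i].start = (uint64_t)i * 0x1000;
		a[i].end = a[i].start + 0x1000;
	}
	return 0;
}

/* Size query, allocate, fill; restart from the size query on -ENOSPC. */
static int query_all(struct range_attr **out, uint32_t *count)
{
	for (;;) {
		struct query q = { 0 };
		int err = mock_query_ioctl(&q);

		if (err)
			return err;

		void *buf = calloc(q.num_mem_ranges, q.sizeof_mem_range_attr);

		if (!buf)
			return -ENOMEM;

		q.vector_of_mem_attr = buf;
		err = mock_query_ioctl(&q);
		if (!err) {
			*out = buf;
			*count = q.num_mem_ranges;
			return 0;
		}

		free(buf);
		if (err != -ENOSPC)
			return err;
	}
}
```

The caller keeps the base pointer returned in `*out` and frees that, rather than a pointer that was advanced while iterating.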

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 20/23] drm/xe: Reset VMA attributes to default in SVM garbage collector
  2025-07-22 13:35 ` [PATCH v5 20/23] drm/xe: Reset VMA attributes to default in SVM garbage collector Himal Prasad Ghimiray
  2025-07-24 21:50   ` Matthew Brost
@ 2025-07-29  5:41   ` Matthew Brost
  2025-07-30  6:06     ` Ghimiray, Himal Prasad
  1 sibling, 1 reply; 55+ messages in thread
From: Matthew Brost @ 2025-07-29  5:41 UTC (permalink / raw)
  To: Himal Prasad Ghimiray; +Cc: intel-xe, Thomas Hellström

On Tue, Jul 22, 2025 at 07:05:23PM +0530, Himal Prasad Ghimiray wrote:
> Restore default memory attributes for VMAs during garbage collection
> if they were modified by madvise. Reuse existing VMA if fully overlapping;
> otherwise, allocate a new mirror VMA.
> 
> Suggested-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_svm.c |  34 +++++++++
>  drivers/gpu/drm/xe/xe_vm.c  | 140 +++++++++++++++++++++++++-----------
>  drivers/gpu/drm/xe/xe_vm.h  |   2 +
>  3 files changed, 135 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index ba1233d0d5a2..79709dc066b9 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -255,7 +255,18 @@ static int __xe_svm_garbage_collector(struct xe_vm *vm,
>  static int xe_svm_garbage_collector(struct xe_vm *vm)
>  {
>  	struct xe_svm_range *range;
> +	struct xe_vma *vma;
> +	u64 range_start;
> +	u64 range_size;
> +	u64 range_end;
>  	int err;
> +	struct xe_vma_mem_attr default_attr = {
> +		.preferred_loc = {
> +			.devmem_fd = DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE,
> +			.migration_policy = DRM_XE_MIGRATE_ALL_PAGES,
> +		},
> +		.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
> +	};
>  
>  	lockdep_assert_held_write(&vm->lock);
>  
> @@ -270,6 +281,12 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
>  		if (!range)
>  			break;
>  
> +		range_start = xe_svm_range_start(range);
> +		range_size = xe_svm_range_size(range);
> +		range_end = xe_svm_range_end(range);
> +
> +		vma = xe_vm_find_vma_by_addr(vm, xe_svm_range_start(range));
> +
>  		list_del(&range->garbage_collector_link);
>  		spin_unlock(&vm->svm.garbage_collector.lock);
>  
> @@ -282,7 +299,24 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
>  			return err;
>  		}
>  
> +		if (!xe_vma_has_default_mem_attrs(vma)) {
> +			vm_dbg(&vm->xe->drm, "Existing VMA start=0x%016llx, vma_end=0x%016llx",
> +			       xe_vma_start(vma), xe_vma_end(vma));
> +
> +			if (xe_vma_start(vma) == range_start && xe_vma_end(vma) == range_end) {
> +				default_attr.pat_index = vma->attr.default_pat_index;
> +				default_attr.default_pat_index  = vma->attr.default_pat_index;
> +				vma->attr = default_attr;
> +			} else {
> +				vm_dbg(&vm->xe->drm, "Split VMA start=0x%016llx, vma_end=0x%016llx",
> +				       range_start, range_end);
> +				err = xe_vm_alloc_cpu_addr_mirror_vma(vm, range_start, range_size);

I missed this corner case: if this is called from the fault handler, the
VMA it originally looked up could be gone. If you modify the VMAs, you
need to signal the fault handler to look up the VMA again.
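
One possible way to do that signalling, sketched in plain C, is a generation counter that is bumped whenever the VMA layout changes. Everything here is an assumption for illustration: `struct vm`, `struct vma`, `vma_gen`, `find_vma()`, `garbage_collect()` and `handle_fault()` are simplified, hypothetical stand-ins, not the xe structures or functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for xe_vma and xe_vm. */
struct vma {
	uint64_t start, end;
};

struct vm {
	struct vma vmas[4];
	size_t num_vmas;
	unsigned int vma_gen;	/* bumped whenever the VMA layout changes */
};

static struct vma *find_vma(struct vm *vm, uint64_t addr)
{
	for (size_t i = 0; i < vm->num_vmas; i++)
		if (addr >= vm->vmas[i].start && addr < vm->vmas[i].end)
			return &vm->vmas[i];
	return NULL;
}

/* Mock garbage collector: splits the VMA containing @split at @split. */
static void garbage_collect(struct vm *vm, uint64_t split)
{
	struct vma *v = find_vma(vm, split);

	if (!v || v->start == split || vm->num_vmas == 4)
		return;		/* nothing to split, layout unchanged */

	vm->vmas[vm->num_vmas].start = split;
	vm->vmas[vm->num_vmas].end = v->end;
	vm->num_vmas++;
	v->end = split;
	vm->vma_gen++;		/* tell callers their lookups may be stale */
}

/* Fault path: look up, run the collector, redo the lookup if needed. */
static struct vma *handle_fault(struct vm *vm, uint64_t addr, uint64_t split)
{
	struct vma *vma = find_vma(vm, addr);
	unsigned int gen = vm->vma_gen;

	garbage_collect(vm, split);
	if (gen != vm->vma_gen)
		vma = find_vma(vm, addr);	/* old pointer may be stale */

	return vma;
}
```

A return value from the collector would work just as well; the counter only makes the "layout changed" condition explicit.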

Matt

> +				if (err)
> +					return err;
> +			}
> +		}
>  		spin_lock(&vm->svm.garbage_collector.lock);
> +
>  	}
>  	spin_unlock(&vm->svm.garbage_collector.lock);
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index d3f08bf9a3ee..003c8209f8bd 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -4254,34 +4254,24 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
>  	}
>  }
>  
> -/**
> - * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
> - * @vm: Pointer to the xe_vm structure
> - * @start: Starting input address
> - * @range: Size of the input range
> - *
> - * This function splits existing vma to create new vma for user provided input range
> - *
> - *  Return: 0 if success
> - */
> -int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
> +static int xe_vm_alloc_vma(struct xe_vm *vm,
> +			   u64 start, u64 range,
> +			   enum drm_gpuvm_sm_map_ops_flags flags)
>  {
>  	struct xe_vma_ops vops;
>  	struct drm_gpuva_ops *ops = NULL;
>  	struct drm_gpuva_op *__op;
>  	bool is_cpu_addr_mirror = false;
>  	bool remap_op = false;
> +	bool is_madvise = flags == DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE;
>  	struct xe_vma_mem_attr tmp_attr;
> +	u16 default_pat;
>  	int err;
>  
> -	vm_dbg(&vm->xe->drm, "MADVISE IN: addr=0x%016llx, size=0x%016llx", start, range);
> -
>  	lockdep_assert_held_write(&vm->lock);
>  
> -	vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range);
>  	ops = drm_gpuvm_sm_map_ops_create(&vm->gpuvm, start, range,
> -					  DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE,
> -					  NULL, start);
> +					  flags, NULL, start);
>  	if (IS_ERR(ops))
>  		return PTR_ERR(ops);
>  
> @@ -4292,33 +4282,56 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>  
>  	drm_gpuva_for_each_op(__op, ops) {
>  		struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
> +		struct xe_vma *vma = NULL;
>  
> -		if (__op->op == DRM_GPUVA_OP_REMAP) {
> -			xe_assert(vm->xe, !remap_op);
> -			remap_op = true;
> +		if (!is_madvise) {
> +			if (__op->op == DRM_GPUVA_OP_UNMAP) {
> +				vma = gpuva_to_vma(op->base.unmap.va);
> +				XE_WARN_ON(!xe_vma_has_default_mem_attrs(vma));
> +				default_pat = vma->attr.default_pat_index;
> +			}
>  
> -			if (xe_vma_is_cpu_addr_mirror(gpuva_to_vma(op->base.remap.unmap->va)))
> -				is_cpu_addr_mirror = true;
> -			else
> -				is_cpu_addr_mirror = false;
> -		}
> +			if (__op->op == DRM_GPUVA_OP_REMAP) {
> +				vma = gpuva_to_vma(op->base.remap.unmap->va);
> +				default_pat = vma->attr.default_pat_index;
> +			}
>  
> -		if (__op->op == DRM_GPUVA_OP_MAP) {
> -			xe_assert(vm->xe, remap_op);
> -			remap_op = false;
> +			if (__op->op == DRM_GPUVA_OP_MAP) {
> +				op->map.is_cpu_addr_mirror = true;
> +				op->map.pat_index = default_pat;
> +			}
> +		} else {
> +			if (__op->op == DRM_GPUVA_OP_REMAP) {
> +				vma = gpuva_to_vma(op->base.remap.unmap->va);
> +				xe_assert(vm->xe, !remap_op);
> +				remap_op = true;
>  
> -			/* In case of madvise ops DRM_GPUVA_OP_MAP is always after
> -			 * DRM_GPUVA_OP_REMAP, so ensure we assign op->map.is_cpu_addr_mirror true
> -			 * if REMAP is for xe_vma_is_cpu_addr_mirror vma
> -			 */
> -			op->map.is_cpu_addr_mirror = is_cpu_addr_mirror;
> -		}
> +				if (xe_vma_is_cpu_addr_mirror(vma))
> +					is_cpu_addr_mirror = true;
> +				else
> +					is_cpu_addr_mirror = false;
> +			}
>  
> +			if (__op->op == DRM_GPUVA_OP_MAP) {
> +				xe_assert(vm->xe, remap_op);
> +				remap_op = false;
> +				/*
> +				 * In case of madvise ops DRM_GPUVA_OP_MAP is
> +				 * always after DRM_GPUVA_OP_REMAP, so ensure
> +				 * we assign op->map.is_cpu_addr_mirror true
> +				 * if REMAP is for xe_vma_is_cpu_addr_mirror vma
> +				 */
> +				op->map.is_cpu_addr_mirror = is_cpu_addr_mirror;
> +			}
> +		}
>  		print_op(vm->xe, __op);
>  	}
>  
>  	xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
> -	vops.flags |= XE_VMA_OPS_FLAG_MADVISE;
> +
> +	if (is_madvise)
> +		vops.flags |= XE_VMA_OPS_FLAG_MADVISE;
> +
>  	err = vm_bind_ioctl_ops_parse(vm, ops, &vops);
>  	if (err)
>  		goto unwind_ops;
> @@ -4330,15 +4343,20 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>  		struct xe_vma *vma;
>  
>  		if (__op->op == DRM_GPUVA_OP_UNMAP) {
> -			/* There should be no unmap */
> -			XE_WARN_ON("UNEXPECTED UNMAP");
> -			xe_vma_destroy(gpuva_to_vma(op->base.unmap.va), NULL);
> +			vma = gpuva_to_vma(op->base.unmap.va);
> +			/* There should be no unmap for madvise */
> +			if (is_madvise)
> +				XE_WARN_ON("UNEXPECTED UNMAP");
> +
> +			xe_vma_destroy(vma, NULL);
>  		} else if (__op->op == DRM_GPUVA_OP_REMAP) {
>  			vma = gpuva_to_vma(op->base.remap.unmap->va);
> -			/* Store attributes for REMAP UNMAPPED VMA, so they can be assigned
> -			 * to newly MAP created vma.
> +			/* In case of madvise ops Store attributes for REMAP UNMAPPED
> +			 * VMA, so they can be assigned to newly MAP created vma.
>  			 */
> -			tmp_attr = vma->attr;
> +			if (is_madvise)
> +				tmp_attr = vma->attr;
> +
>  			xe_vma_destroy(gpuva_to_vma(op->base.remap.unmap->va), NULL);
>  		} else if (__op->op == DRM_GPUVA_OP_MAP) {
>  			vma = op->map.vma;
> @@ -4346,7 +4364,8 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>  			 * Therefore temp_attr will always have sane values, making it safe to
>  			 * copy them to new vma.
>  			 */
> -			vma->attr = tmp_attr;
> +			if (is_madvise)
> +				vma->attr = tmp_attr;
>  		}
>  	}
>  
> @@ -4360,3 +4379,42 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>  	drm_gpuva_ops_free(&vm->gpuvm, ops);
>  	return err;
>  }
> +
> +/**
> + * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
> + * @vm: Pointer to the xe_vm structure
> + * @start: Starting input address
> + * @range: Size of the input range
> + *
> + * This function splits existing vma to create new vma for user provided input range
> + *
> + * Return: 0 if success
> + */
> +int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
> +{
> +	lockdep_assert_held_write(&vm->lock);
> +
> +	vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range);
> +
> +	return xe_vm_alloc_vma(vm, start, range, DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE);
> +}
> +
> +/**
> + * xe_vm_alloc_cpu_addr_mirror_vma - Allocate CPU addr mirror vma
> + * @vm: Pointer to the xe_vm structure
> + * @start: Starting input address
> + * @range: Size of the input range
> + *
> + * This function splits/merges existing vma to create new vma for user provided input range
> + *
> + * Return: 0 if success
> + */
> +int xe_vm_alloc_cpu_addr_mirror_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
> +{
> +	lockdep_assert_held_write(&vm->lock);
> +
> +	vm_dbg(&vm->xe->drm, "CPU_ADDR_MIRROR_VMA_OPS_CREATE: addr=0x%016llx, size=0x%016llx",
> +	       start, range);
> +
> +	return xe_vm_alloc_vma(vm, start, range, DRM_GPUVM_SM_MAP_NOT_MADVISE);
> +}
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index a4db843de540..f7b9ad83685a 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -177,6 +177,8 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
>  
>  int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
>  
> +int xe_vm_alloc_cpu_addr_mirror_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
> +
>  /**
>   * to_userptr_vma() - Return a pointer to an embedding userptr vma
>   * @vma: Pointer to the embedded struct xe_vma
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 08/23] drm/xe: Allow CPU address mirror VMA unbind with gpu bindings for madvise
  2025-07-29  3:40   ` Matthew Brost
@ 2025-07-29  7:42     ` Ghimiray, Himal Prasad
  0 siblings, 0 replies; 55+ messages in thread
From: Ghimiray, Himal Prasad @ 2025-07-29  7:42 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Thomas Hellström



On 29-07-2025 09:10, Matthew Brost wrote:
> On Tue, Jul 22, 2025 at 07:05:11PM +0530, Himal Prasad Ghimiray wrote:
>> In the case of the MADVISE ioctl, if the start or end addresses fall
>> within a VMA and existing SVM ranges are present, remove the existing
>> SVM mappings. Then, continue with ops_parse to create new VMAs by REMAP
>> unmapping of old one.
>>
>> v2 (Matthew Brost)
>> - Use vops flag to call unmapping of ranges in vm_bind_ioctl_ops_parse
>> - Rename the function
>>
>> v3
>> - Fix doc
>>
>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_svm.c | 28 ++++++++++++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_svm.h |  7 +++++++
>>   drivers/gpu/drm/xe/xe_vm.c  |  8 ++++++--
>>   3 files changed, 41 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
>> index a7ff5975873f..ce8a71b80811 100644
>> --- a/drivers/gpu/drm/xe/xe_svm.c
>> +++ b/drivers/gpu/drm/xe/xe_svm.c
>> @@ -933,6 +933,34 @@ bool xe_svm_has_mapping(struct xe_vm *vm, u64 start, u64 end)
>>   	return drm_gpusvm_has_mapping(&vm->svm.gpusvm, start, end);
>>   }
>>   
>> +/**
>> + * xe_svm_unmap_address_range - UNMAP SVM mappings and ranges
>> + * @vm: The VM
>> + * @start: start addr
>> + * @end: end addr
>> + *
>> + * This function UNMAPS svm ranges if start or end address are inside them.
>> + */
>> +void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end)
>> +{
>> +	struct drm_gpusvm_notifier *notifier, *next;
>> +
>> +	lockdep_assert_held_write(&vm->lock);
>> +
>> +	drm_gpusvm_for_each_notifier_safe(notifier, next, &vm->svm.gpusvm, start, end) {
>> +		struct drm_gpusvm_range *range, *__next;
>> +
>> +		drm_gpusvm_for_each_range_safe(range, __next, notifier, start, end) {
>> +			if (start > drm_gpusvm_range_start(range) ||
>> +			    end < drm_gpusvm_range_end(range)) {
>> +				if (IS_DGFX(vm->xe) && xe_svm_range_in_vram(to_xe_range(range)))
>> +					drm_gpusvm_range_evict(&vm->svm.gpusvm, range);
>> +				__xe_svm_garbage_collector(vm, to_xe_range(range));
> 
> There is a corner here - the range could be in the garbage collector
> list...
> 
> I think to fix you have to do this:
> 
> drm_gpusvm_range_get(range);
> __xe_svm_garbage_collector(vm, to_xe_range(range));
> if (!list_empty(&to_xe_range(range)->garbage_collector_link)) {
> 	spin_lock(&vm->svm.garbage_collector.lock);
> 	list_del(&to_xe_range(range)->garbage_collector_link);
> 	spin_unlock(&vm->svm.garbage_collector.lock);
> }
> drm_gpusvm_range_put(range);
> 
> A little convoluted as it is only safe to check if the range is in the
> garbage collector list after it has been removed from the notifier,
> hence the need for extra ref counting here.

Makes sense, will update in next version.
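
For reference, the "hold an extra reference, remove, then check list membership" ordering suggested above can be sketched outside the kernel like this. It is a toy model under stated assumptions: the list helpers, `struct range`, `remove_range()` and `unmap_range()` are hypothetical, locking is omitted, and the unlink re-initialises the link so that the emptiness check is meaningful.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list {
	struct list *prev, *next;
};

static void list_init(struct list *l)
{
	l->prev = l->next = l;
}

static bool list_is_empty(const struct list *l)
{
	return l->next == l;
}

static void list_add_tail(struct list *n, struct list *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and re-initialise so list_is_empty() is true afterwards. */
static void list_del_init(struct list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical stand-in for an SVM range. */
struct range {
	int refcount;
	bool freed;		/* test hook instead of really freeing */
	struct list gc_link;
};

static void range_get(struct range *r)
{
	r->refcount++;
}

static void range_put(struct range *r)
{
	if (--r->refcount == 0)
		r->freed = true;
}

/* Stand-in for the removal step: drops the owner's reference. */
static void remove_range(struct range *r)
{
	range_put(r);
}

static void unmap_range(struct range *r)
{
	range_get(r);				/* keep r valid across removal */
	remove_range(r);
	if (!list_is_empty(&r->gc_link))	/* still queued for collection? */
		list_del_init(&r->gc_link);	/* locking omitted in this sketch */
	range_put(r);				/* may release r now */
}
```

Without the extra reference, the membership check would run on memory that `remove_range()` may already have released.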
>
> Also, I believe this will need an IGT written specifically to test this
> code path.

It's part of the plan and I am already testing it; the only difference is
that instead of the first fault I am doing a prefetch, which populates the
2 MiB range.

> 
> Roughly...
> 
> buf = aligned_alloc(SZ_2M, SZ_2M);
> fault_in_buf_on_gpu();
> madvise(buf, SZ_1M, some attribute);
> fault_in_buf_on_gpu();	/* Ideally showing different behavior between 2 chunks */
> read_buf_back_via_cpu();

Thanks.

> 
> Matt
> 
>> +			}
>> +		}
>> +	}
>> +}
>> +
>>   /**
>>    * xe_svm_bo_evict() - SVM evict BO to system memory
>>    * @bo: BO to evict
>> diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
>> index da9a69ea0bb1..754d56b4d255 100644
>> --- a/drivers/gpu/drm/xe/xe_svm.h
>> +++ b/drivers/gpu/drm/xe/xe_svm.h
>> @@ -90,6 +90,8 @@ bool xe_svm_range_validate(struct xe_vm *vm,
>>   
>>   u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 addr, u64 end,  struct xe_vma *vma);
>>   
>> +void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end);
>> +
>>   /**
>>    * xe_svm_range_has_dma_mapping() - SVM range has DMA mapping
>>    * @range: SVM range
>> @@ -303,6 +305,11 @@ u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 addr, u64 end, struct xe_vma *vm
>>   	return ULONG_MAX;
>>   }
>>   
>> +static inline
>> +void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end)
>> +{
>> +}
>> +
>>   #define xe_svm_assert_in_notifier(...) do {} while (0)
>>   #define xe_svm_range_has_dma_mapping(...) false
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index a56384325f4d..7f3d0ad04b3f 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -2663,8 +2663,12 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
>>   				end = op->base.remap.next->va.addr;
>>   
>>   			if (xe_vma_is_cpu_addr_mirror(old) &&
>> -			    xe_svm_has_mapping(vm, start, end))
>> -				return -EBUSY;
>> +			    xe_svm_has_mapping(vm, start, end)) {
>> +				if (vops->flags & XE_VMA_OPS_FLAG_MADVISE)
>> +					xe_svm_unmap_address_range(vm, start, end);
>> +				else
>> +					return -EBUSY;
>> +			}
>>   
>>   			op->remap.start = xe_vma_start(old);
>>   			op->remap.range = xe_vma_size(old);
>> -- 
>> 2.34.1
>>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 10/23] drm/xe: Implement madvise ioctl for xe
  2025-07-29  4:23     ` Matthew Brost
@ 2025-07-29  9:43       ` Ghimiray, Himal Prasad
  0 siblings, 0 replies; 55+ messages in thread
From: Ghimiray, Himal Prasad @ 2025-07-29  9:43 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Thomas Hellström, Shuicheng Lin



On 29-07-2025 09:53, Matthew Brost wrote:
> On Mon, Jul 28, 2025 at 08:52:26PM -0700, Matthew Brost wrote:
>> On Tue, Jul 22, 2025 at 07:05:13PM +0530, Himal Prasad Ghimiray wrote:
>>> This driver-specific ioctl enables UMDs to control the memory attributes
>>> for GPU VMAs within a specified input range. If the start or end
>>> addresses fall within an existing VMA, the VMA is split accordingly. The
>>> attributes of the VMA are modified as provided by the users. The old
>>> mappings of the VMAs are invalidated, and TLB invalidation is performed
>>> if necessary.
>>>
>>> v2(Matthew Brost)
>>> - xe_vm_in_fault_mode can't be enabled by Mesa, hence allow ioctl in non
>>> fault mode too
>>> - fix tlb invalidation skip for same ranges in multiple op
>>> - use helper for tlb invalidation
>>> - use xe_svm_notifier_lock/unlock helper
>>> - s/lockdep_assert_held/lockdep_assert_held_write
>>> - Add kernel-doc
>>>
>>> v3(Matthew Brost)
>>> - make vfunc fail safe
>>> - Add sanitizing input args before vfunc
>>>
>>> v4(Matthew Brost/Shuicheng)
>>> - Make locks interruptible
>>> - Error handling fixes
>>> - vm_put fixes
>>>
>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>> Cc: Shuicheng Lin <shuicheng.lin@intel.com>
>>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/Makefile        |   1 +
>>>   drivers/gpu/drm/xe/xe_vm_madvise.c | 306 +++++++++++++++++++++++++++++
>>>   drivers/gpu/drm/xe/xe_vm_madvise.h |  15 ++
>>>   3 files changed, 322 insertions(+)
>>>   create mode 100644 drivers/gpu/drm/xe/xe_vm_madvise.c
>>>   create mode 100644 drivers/gpu/drm/xe/xe_vm_madvise.h
>>>
>>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>>> index 83a36c47a2f9..fa52866bb72c 100644
>>> --- a/drivers/gpu/drm/xe/Makefile
>>> +++ b/drivers/gpu/drm/xe/Makefile
>>> @@ -125,6 +125,7 @@ xe-y += xe_bb.o \
>>>   	xe_uc.o \
>>>   	xe_uc_fw.o \
>>>   	xe_vm.o \
>>> +	xe_vm_madvise.o \
>>>   	xe_vram.o \
>>>   	xe_vram_freq.o \
>>>   	xe_vsec.o \
>>> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
>>> new file mode 100644
>>> index 000000000000..f64728120d7c
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
>>> @@ -0,0 +1,306 @@
>>> +// SPDX-License-Identifier: MIT
>>> +/*
>>> + * Copyright © 2025 Intel Corporation
>>> + */
>>> +
>>> +#include "xe_vm_madvise.h"
>>> +
>>> +#include <linux/nospec.h>
>>> +#include <drm/xe_drm.h>
>>> +
>>> +#include "xe_bo.h"
>>> +#include "xe_pt.h"
>>> +#include "xe_svm.h"
>>> +
>>> +struct xe_vmas_in_madvise_range {
>>> +	u64 addr;
>>> +	u64 range;
>>> +	struct xe_vma **vmas;
>>> +	int num_vmas;
>>> +	bool has_svm_vmas;
>>> +	bool has_bo_vmas;
>>> +	bool has_userptr_vmas;
>>> +};
>>> +
>>> +static int get_vmas(struct xe_vm *vm, struct xe_vmas_in_madvise_range *madvise_range)
>>> +{
>>> +	u64 addr = madvise_range->addr;
>>> +	u64 range = madvise_range->range;
>>> +
>>> +	struct xe_vma  **__vmas;
>>> +	struct drm_gpuva *gpuva;
>>> +	int max_vmas = 8;
>>> +
>>> +	lockdep_assert_held(&vm->lock);
>>> +
>>> +	madvise_range->num_vmas = 0;
>>> +	madvise_range->vmas = kmalloc_array(max_vmas, sizeof(*madvise_range->vmas), GFP_KERNEL);
>>> +	if (!madvise_range->vmas)
>>> +		return -ENOMEM;
>>> +
>>> +	vm_dbg(&vm->xe->drm, "VMA's in range: start=0x%016llx, end=0x%016llx", addr, addr + range);
>>> +
>>> +	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, addr, addr + range) {
>>> +		struct xe_vma *vma = gpuva_to_vma(gpuva);
>>> +
>>> +		if (xe_vma_bo(vma))
>>> +			madvise_range->has_bo_vmas = true;
>>> +		else if (xe_vma_is_cpu_addr_mirror(vma))
>>> +			madvise_range->has_svm_vmas = true;
>>> +		else if (xe_vma_is_userptr(vma))
>>> +			madvise_range->has_userptr_vmas = true;
>>> +
>>> +		if (madvise_range->num_vmas == max_vmas) {
>>> +			max_vmas <<= 1;
>>> +			__vmas = krealloc(madvise_range->vmas,
>>> +					  max_vmas * sizeof(*madvise_range->vmas),
>>> +					  GFP_KERNEL);
>>> +			if (!__vmas) {
>>> +				kfree(madvise_range->vmas);
>>> +				return -ENOMEM;
>>> +			}
>>> +			madvise_range->vmas = __vmas;
>>> +		}
>>> +
>>> +		madvise_range->vmas[madvise_range->num_vmas] = vma;
>>> +		(madvise_range->num_vmas)++;
>>> +	}
>>> +
>>> +	if (!madvise_range->num_vmas)
>>> +		kfree(madvise_range->vmas);
>>> +
>>> +	vm_dbg(&vm->xe->drm, "madvise_range-num_vmas = %d\n", madvise_range->num_vmas);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static void madvise_preferred_mem_loc(struct xe_device *xe, struct xe_vm *vm,
>>> +				      struct xe_vma **vmas, int num_vmas,
>>> +				      struct drm_xe_madvise *op)
>>> +{
>>> +	/* Implementation pending */
>>> +}
>>> +
>>> +static void madvise_atomic(struct xe_device *xe, struct xe_vm *vm,
>>> +			   struct xe_vma **vmas, int num_vmas,
>>> +			   struct drm_xe_madvise *op)
>>> +{
>>> +	/* Implementation pending */
>>> +}
>>> +
>>> +static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
>>> +			      struct xe_vma **vmas, int num_vmas,
>>> +			      struct drm_xe_madvise *op)
>>> +{
>>> +	/* Implementation pending */
>>> +}
>>> +
>>> +typedef void (*madvise_func)(struct xe_device *xe, struct xe_vm *vm,
>>> +			     struct xe_vma **vmas, int num_vmas,
>>> +			     struct drm_xe_madvise *op);
>>> +
>>> +static const madvise_func madvise_funcs[] = {
>>> +	[DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC] = madvise_preferred_mem_loc,
>>> +	[DRM_XE_MEM_RANGE_ATTR_ATOMIC] = madvise_atomic,
>>> +	[DRM_XE_MEM_RANGE_ATTR_PAT] = madvise_pat_index,
>>> +};
>>> +
>>> +static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
>>> +{
>>> +	struct drm_gpuva *gpuva;
>>> +	struct xe_tile *tile;
>>> +	u8 id, tile_mask;
>>> +
>>> +	lockdep_assert_held_write(&vm->lock);
>>> +
>>> +	/* Wait for pending binds */
>>> +	if (dma_resv_wait_timeout(xe_vm_resv(vm), DMA_RESV_USAGE_BOOKKEEP,
>>> +				  false, MAX_SCHEDULE_TIMEOUT) <= 0)
>>> +		XE_WARN_ON(1);
>>> +
>>> +	tile_mask = xe_svm_ranges_zap_ptes_in_range(vm, start, end);
>>> +
>>> +	drm_gpuvm_for_each_va_range(gpuva, &vm->gpuvm, start, end) {
>>> +		struct xe_vma *vma = gpuva_to_vma(gpuva);
>>> +
>>> +		if (xe_vma_is_cpu_addr_mirror(vma))
>>
>> I think:
>>
>> xe_vma_is_cpu_addr_mirror(vma) || xe_vma_is_null(vma)
>>
>> No need to invalidate NULL VMA's as those mappings are not modified by
>> madvise (i.e., you can call madvise on NULL mappings but it doesn't
>> actually do anything).
>>
>>> +			continue;
>>> +
>>> +		for_each_tile(tile, vm->xe, id) {
>>> +			if (xe_pt_zap_ptes(tile, vma)) {
>>> +				tile_mask |= BIT(id);
>>> +
>>> +				/*
>>> +				 * WRITE_ONCE pairs with READ_ONCE
>>> +				 * in xe_vm_has_valid_gpu_mapping()
>>> +				 */
>>> +				WRITE_ONCE(vma->tile_invalidated,
>>> +					   vma->tile_invalidated | BIT(id));
>>> +			}
>>> +		}
>>> +	}
>>> +
>>> +	return tile_mask;
>>> +}
>>> +
>>> +static int xe_vm_invalidate_madvise_range(struct xe_vm *vm, u64 start, u64 end)
>>> +{
>>> +	u8 tile_mask = xe_zap_ptes_in_madvise_range(vm, start, end);
>>> +
>>> +	if (!tile_mask)
>>> +		return 0;
>>> +
>>> +	xe_device_wmb(vm->xe);
>>> +
>>> +	return xe_vm_range_tilemask_tlb_invalidation(vm, start, end, tile_mask);
>>> +}
>>> +
>>> +static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madvise *args)
>>> +{
>>> +	if (XE_IOCTL_DBG(xe, !args))
>>> +		return false;
>>> +
>>> +	if (XE_IOCTL_DBG(xe, !IS_ALIGNED(args->start, SZ_4K)))
>>> +		return false;
>>> +
>>> +	if (XE_IOCTL_DBG(xe, !IS_ALIGNED(args->range, SZ_4K)))
>>> +		return false;
>>> +
>>> +	if (XE_IOCTL_DBG(xe, args->range < SZ_4K))
>>> +		return false;
>>> +
>>> +	switch (args->type) {
>>> +	case DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC:
>>> +		if (XE_IOCTL_DBG(xe, args->preferred_mem_loc.migration_policy >
>>> +				     DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES))
>>> +			return false;
>>> +
>>> +		if (XE_IOCTL_DBG(xe, args->preferred_mem_loc.pad))
>>> +			return false;
>>> +
>>> +		if (XE_IOCTL_DBG(xe, args->atomic.reserved))
>>> +			return false;
>>> +		break;
>>> +	case DRM_XE_MEM_RANGE_ATTR_ATOMIC:
>>> +		if (XE_IOCTL_DBG(xe, args->atomic.val > DRM_XE_ATOMIC_CPU))
>>> +			return false;
>>> +
>>> +		if (XE_IOCTL_DBG(xe, args->atomic.pad))
>>> +			return false;
>>> +
>>> +		if (XE_IOCTL_DBG(xe, args->atomic.reserved))
>>> +			return false;
>>> +
>>> +		break;
>>> +	case DRM_XE_MEM_RANGE_ATTR_PAT:
>>> +		/*TODO: Add valid pat check */
>>> +		break;
>>> +	default:
>>> +		if (XE_IOCTL_DBG(xe, 1))
>>> +			return false;
>>> +	}
>>> +
>>> +	if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
>>> +		return false;
>>> +
>>> +	return true;
>>> +}
>>> +
>>> +/**
>>> + * xe_vm_madvise_ioctl - Handle MADVise ioctl for a VM
>>> + * @dev: DRM device pointer
>>> + * @data: Pointer to ioctl data (drm_xe_madvise*)
>>> + * @file: DRM file pointer
>>> + *
>>> + * Handles the MADVISE ioctl to provide memory advice for vma's within
>>> + * input range.
>>> + *
>>> + * Return: 0 on success or a negative error code on failure.
>>> + */
>>> +int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>>> +{
>>> +	struct xe_device *xe = to_xe_device(dev);
>>> +	struct xe_file *xef = to_xe_file(file);
>>> +	struct drm_xe_madvise *args = data;
>>> +	struct xe_vmas_in_madvise_range madvise_range = {.addr = args->start,
>>> +							 .range =  args->range, };
>>> +	struct xe_vm *vm;
>>> +	struct drm_exec exec;
>>> +	int err, attr_type;
>>> +
>>> +	vm = xe_vm_lookup(xef, args->vm_id);
>>> +	if (XE_IOCTL_DBG(xe, !vm))
>>> +		return -EINVAL;
>>> +
>>> +	if (!madvise_args_are_sane(vm->xe, args)) {
>>> +		err = -EINVAL;
>>> +		goto put_vm;
>>> +	}
>>> +
>>
>> I think as this code can modify the ranges during a VMA split, you will
>> need to ensure all queued unmaps prior to this are complete.
>>
> 
> The explanation is wrong, but it is still needed. You need a flush of the
> garbage collector because it can modify VMAs and we need that view to be
> current. Feel free to add this in this patch or in patch 20.

Will add to this patch.

Thanks

> 
> Matt
> 
>> So call xe_svm_flush(vm) prior to taking any locks.
>>
>> Looks good otherwise.
>>
>> Matt
>>
>>> +	err = down_write_killable(&vm->lock);
>>> +	if (err)
>>> +		goto put_vm;
>>> +
>>> +	if (XE_IOCTL_DBG(xe, xe_vm_is_closed_or_banned(vm))) {
>>> +		err = -ENOENT;
>>> +		goto unlock_vm;
>>> +	}
>>> +
>>> +	err = xe_vm_alloc_madvise_vma(vm, args->start, args->range);
>>> +	if (err)
>>> +		goto unlock_vm;
>>> +
>>> +	err = get_vmas(vm, &madvise_range);
>>> +	if (err || !madvise_range.num_vmas)
>>> +		goto unlock_vm;
>>> +
>>> +	if (madvise_range.has_bo_vmas) {
>>> +		drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES | DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
>>> +		drm_exec_until_all_locked(&exec) {
>>> +			for (int i = 0; i < madvise_range.num_vmas; i++) {
>>> +				struct xe_bo *bo = xe_vma_bo(madvise_range.vmas[i]);
>>> +
>>> +				if (!bo)
>>> +					continue;
>>> +				err = drm_exec_lock_obj(&exec, &bo->ttm.base);
>>> +				drm_exec_retry_on_contention(&exec);
>>> +				if (err)
>>> +					goto err_fini;
>>> +			}
>>> +		}
>>> +	}
>>> +
>>> +	if (madvise_range.has_userptr_vmas) {
>>> +		err = down_read_interruptible(&vm->userptr.notifier_lock);
>>> +		if (err)
>>> +			goto err_fini;
>>> +	}
>>> +
>>> +	if (madvise_range.has_svm_vmas) {
>>> +		err = down_read_interruptible(&vm->svm.gpusvm.notifier_lock);
>>> +		if (err)
>>> +			goto unlock_userptr;
>>> +	}
>>> +
>>> +	attr_type = array_index_nospec(args->type, ARRAY_SIZE(madvise_funcs));
>>> +	madvise_funcs[attr_type](xe, vm, madvise_range.vmas, madvise_range.num_vmas, args);
>>> +
>>> +	err = xe_vm_invalidate_madvise_range(vm, args->start, args->start + args->range);
>>> +
>>> +	if (madvise_range.has_svm_vmas)
>>> +		xe_svm_notifier_unlock(vm);
>>> +
>>> +unlock_userptr:
>>> +	if (madvise_range.has_userptr_vmas)
>>> +		up_read(&vm->userptr.notifier_lock);
>>> +err_fini:
>>> +	if (madvise_range.has_bo_vmas)
>>> +		drm_exec_fini(&exec);
>>> +	kfree(madvise_range.vmas);
>>> +	madvise_range.vmas = NULL;
>>> +unlock_vm:
>>> +	up_write(&vm->lock);
>>> +put_vm:
>>> +	xe_vm_put(vm);
>>> +	return err;
>>> +}
>>> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.h b/drivers/gpu/drm/xe/xe_vm_madvise.h
>>> new file mode 100644
>>> index 000000000000..b0e1fc445f23
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.h
>>> @@ -0,0 +1,15 @@
>>> +/* SPDX-License-Identifier: MIT */
>>> +/*
>>> + * Copyright © 2025 Intel Corporation
>>> + */
>>> +
>>> +#ifndef _XE_VM_MADVISE_H_
>>> +#define _XE_VM_MADVISE_H_
>>> +
>>> +struct drm_device;
>>> +struct drm_file;
>>> +
>>> +int xe_vm_madvise_ioctl(struct drm_device *dev, void *data,
>>> +			struct drm_file *file);
>>> +
>>> +#endif
>>> -- 
>>> 2.34.1
>>>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 11/23] drm/xe/svm : Add svm ranges migration policy on atomic access
  2025-07-29  4:04   ` Matthew Brost
@ 2025-07-30  4:59     ` Ghimiray, Himal Prasad
  0 siblings, 0 replies; 55+ messages in thread
From: Ghimiray, Himal Prasad @ 2025-07-30  4:59 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Thomas Hellström



On 29-07-2025 09:34, Matthew Brost wrote:
> On Tue, Jul 22, 2025 at 07:05:14PM +0530, Himal Prasad Ghimiray wrote:
>> If the platform does not support atomic access on system memory, and the
>> ranges are in system memory, but the user requires atomic accesses on
>> the VMA, then migrate the ranges to VRAM. Apply this policy for prefetch
>> operations as well.
>>
>> v2
>> - Drop unnecessary vm_dbg
>>
>> v3 (Matthew Brost)
>> - fix atomic policy
>> - prefetch shouldn't have any impact of atomic
>> - bo can be accessed from vma, avoid duplicate parameter
>>
>> v4 (Matthew Brost)
>> - Remove TODO comment
>> - Fix comment
>> - Dont allow gpu atomic ops when user is setting atomic attr as CPU
>>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_pt.c         | 23 +++++++++--------
>>   drivers/gpu/drm/xe/xe_svm.c        |  2 +-
>>   drivers/gpu/drm/xe/xe_vm.c         | 40 ++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_vm.h         |  2 ++
>>   drivers/gpu/drm/xe/xe_vm_madvise.c |  9 ++++++-
>>   5 files changed, 64 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
>> index b499006df2cf..96d0ffe8154e 100644
>> --- a/drivers/gpu/drm/xe/xe_pt.c
>> +++ b/drivers/gpu/drm/xe/xe_pt.c
>> @@ -640,28 +640,31 @@ static const struct xe_pt_walk_ops xe_pt_stage_bind_ops = {
>>    *    - In all other cases device atomics will be disabled with AE=0 until an application
>>    *      request differently using a ioctl like madvise.
>>    */
>> -static bool xe_atomic_for_vram(struct xe_vm *vm)
>> +static bool xe_atomic_for_vram(struct xe_vm *vm, struct xe_vma *vma)
>>   {
>> +	if (vma->attr.atomic_access == DRM_XE_ATOMIC_CPU)
>> +		return false;
>> +
>>   	return true;
>>   }
>>   
>> -static bool xe_atomic_for_system(struct xe_vm *vm, struct xe_bo *bo)
>> +static bool xe_atomic_for_system(struct xe_vm *vm, struct xe_vma *vma)
>>   {
>>   	struct xe_device *xe = vm->xe;
>> +	struct xe_bo *bo = xe_vma_bo(vma);
>>   
>> -	if (!xe->info.has_device_atomics_on_smem)
>> +	if (!xe->info.has_device_atomics_on_smem ||
>> +	    vma->attr.atomic_access == DRM_XE_ATOMIC_CPU)
>>   		return false;
>>   
>> +	if (vma->attr.atomic_access == DRM_XE_ATOMIC_DEVICE)
>> +		return true;
>> +
>>   	/*
>>   	 * If a SMEM+LMEM allocation is backed by SMEM, a device
>>   	 * atomics will cause a gpu page fault and which then
>>   	 * gets migrated to LMEM, bind such allocations with
>>   	 * device atomics enabled.
>> -	 *
>> -	 * TODO: Revisit this. Perhaps add something like a
>> -	 * fault_on_atomics_in_system UAPI flag.
>> -	 * Note that this also prohibits GPU atomics in LR mode for
>> -	 * userptr and system memory on DGFX.
>>   	 */
>>   	return (!IS_DGFX(xe) || (!xe_vm_in_lr_mode(vm) ||
>>   				 (bo && xe_bo_has_single_placement(bo))));
>> @@ -744,8 +747,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>>   		goto walk_pt;
>>   
>>   	if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) {
>> -		xe_walk.default_vram_pte = xe_atomic_for_vram(vm) ? XE_USM_PPGTT_PTE_AE : 0;
>> -		xe_walk.default_system_pte = xe_atomic_for_system(vm, bo) ?
>> +		xe_walk.default_vram_pte = xe_atomic_for_vram(vm, vma) ? XE_USM_PPGTT_PTE_AE : 0;
>> +		xe_walk.default_system_pte = xe_atomic_for_system(vm, vma) ?
>>   			XE_USM_PPGTT_PTE_AE : 0;
>>   	}
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
>> index c093dc453e32..49d3405aacb9 100644
>> --- a/drivers/gpu/drm/xe/xe_svm.c
>> +++ b/drivers/gpu/drm/xe/xe_svm.c
>> @@ -813,7 +813,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>>   			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP),
>>   		.check_pages_threshold = IS_DGFX(vm->xe) &&
>>   			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ? SZ_64K : 0,
>> -		.devmem_only = atomic && IS_DGFX(vm->xe) &&
>> +		.devmem_only = xe_vma_need_vram_for_atomic(vm->xe, vma, atomic) &&
>>   			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP),
>>   		.timeslice_ms = atomic && IS_DGFX(vm->xe) &&
>>   			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ?
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index 7f3d0ad04b3f..be51fcf322ec 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -4177,6 +4177,46 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
>>   	kvfree(snap);
>>   }
>>   
>> +/**
>> + * xe_vma_need_vram_for_atomic - Check if VMA needs VRAM migration for atomic operations
>> + * @xe: Pointer to the XE device structure
>> + * @vma: Pointer to the virtual memory area (VMA) structure
>> + * @is_atomic: In pagefault path and atomic operation
>> + *
>> + * This function determines whether the given VMA needs to be migrated to
>> + * VRAM in order to do atomic GPU operation.
>> + *
>> + * Return: true if migration to VRAM is required, false otherwise.
>> + */
>> +bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool is_atomic)
>> +{
>> +	if (!IS_DGFX(xe))
>> +		return false;
>> +
>> +	/*
>> +	 * NOTE: The checks implemented here are platform-specific. For
>> +	 * instance, on a device supporting CXL atomics, these would ideally
>> +	 * work universally without additional handling.
>> +	 */
>> +	switch (vma->attr.atomic_access) {
>> +	case DRM_XE_ATOMIC_DEVICE:
>> +		return !xe->info.has_device_atomics_on_smem;
> 
> I think this is is_atomic && !xe->info.has_device_atomics_on_smem;
> 
> We really only want strict migration if the fault is an atomic one.
> 
>> +
>> +	case DRM_XE_ATOMIC_CPU:
>> +		XE_WARN_ON(is_atomic);
> 
> I think we should nack the fault if an atomic occurs and
> DRM_XE_ATOMIC_CPU is set - both for SVMs and BOs.
> 
>> +		return false;
>> +
>> +	case DRM_XE_ATOMIC_UNDEFINED:
>> +		return is_atomic;
>> +
>> +	case DRM_XE_ATOMIC_GLOBAL:
>> +		return true;
> 
> As above, I think this is is_atomic, to only implement strict
> migration on atomic faults.

Agreed with all of the above comments. Will fix in the next revision.

> 
> Matt
> 
>> +
>> +	default:
>> +		return is_atomic;
>> +	}
>> +}
>> +
>>   /**
>>    * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
>>    * @vm: Pointer to the xe_vm structure
>> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
>> index 0d6b08cc4163..d5bc09ae640c 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.h
>> +++ b/drivers/gpu/drm/xe/xe_vm.h
>> @@ -171,6 +171,8 @@ static inline bool xe_vma_is_userptr(struct xe_vma *vma)
>>   
>>   struct xe_vma *xe_vm_find_vma_by_addr(struct xe_vm *vm, u64 page_addr);
>>   
>> +bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool is_atomic);
>> +
>>   int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
>>   
>>   /**
>> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
>> index f64728120d7c..62dc5cec8950 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
>> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
>> @@ -85,7 +85,14 @@ static void madvise_atomic(struct xe_device *xe, struct xe_vm *vm,
>>   			   struct xe_vma **vmas, int num_vmas,
>>   			   struct drm_xe_madvise *op)
>>   {
>> -	/* Implementation pending */
>> +	int i;
>> +
>> +	xe_assert(vm->xe, op->type == DRM_XE_MEM_RANGE_ATTR_ATOMIC);
>> +	xe_assert(vm->xe, op->atomic.val <= DRM_XE_ATOMIC_CPU);
>> +
>> +	for (i = 0; i < num_vmas; i++)
>> +		vmas[i]->attr.atomic_access = op->atomic.val;
>> +	/*TODO: handle bo backed vmas */
>>   }
>>   
>>   static void madvise_pat_index(struct xe_device *xe, struct xe_vm *vm,
>> -- 
>> 2.34.1
>>
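
Note: the review comments above can be read as the following policy. This is
a minimal, standalone sketch, not the driver code. The enum and struct names
are hypothetical stand-ins for the DRM_XE_ATOMIC_* values and xe->info fields
in the patch. Nacking the fault in the CPU case, as requested in the review,
is assumed to happen in the caller and is not shown.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the DRM_XE_ATOMIC_* attribute values. */
enum atomic_access {
	ATOMIC_UNDEFINED,
	ATOMIC_DEVICE,
	ATOMIC_GLOBAL,
	ATOMIC_CPU,
};

/* Hypothetical stand-in for the relevant xe->info fields. */
struct dev_info {
	bool is_dgfx;
	bool has_device_atomics_on_smem;
};

/*
 * Return true if the range has to be migrated to VRAM before the access
 * can proceed. With the review feedback applied, strict migration is only
 * requested when the fault itself is an atomic one.
 */
static bool need_vram_for_atomic(const struct dev_info *dev,
				 enum atomic_access access, bool is_atomic)
{
	if (!dev->is_dgfx)
		return false;

	switch (access) {
	case ATOMIC_DEVICE:
		return is_atomic && !dev->has_device_atomics_on_smem;
	case ATOMIC_CPU:
		/* No migration; the caller is expected to nack the fault. */
		return false;
	case ATOMIC_GLOBAL:
	case ATOMIC_UNDEFINED:
	default:
		return is_atomic;
	}
}
```

With this shape, a non-atomic fault never forces migration, whatever
attribute is set on the VMA.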



* Re: [PATCH v5 20/23] drm/xe: Reset VMA attributes to default in SVM garbage collector
  2025-07-29  5:41   ` Matthew Brost
@ 2025-07-30  6:06     ` Ghimiray, Himal Prasad
  0 siblings, 0 replies; 55+ messages in thread
From: Ghimiray, Himal Prasad @ 2025-07-30  6:06 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Thomas Hellström



On 29-07-2025 11:11, Matthew Brost wrote:
> On Tue, Jul 22, 2025 at 07:05:23PM +0530, Himal Prasad Ghimiray wrote:
>> Restore default memory attributes for VMAs during garbage collection
>> if they were modified by madvise. Reuse existing VMA if fully overlapping;
>> otherwise, allocate a new mirror VMA.
>>
>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_svm.c |  34 +++++++++
>>   drivers/gpu/drm/xe/xe_vm.c  | 140 +++++++++++++++++++++++++-----------
>>   drivers/gpu/drm/xe/xe_vm.h  |   2 +
>>   3 files changed, 135 insertions(+), 41 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
>> index ba1233d0d5a2..79709dc066b9 100644
>> --- a/drivers/gpu/drm/xe/xe_svm.c
>> +++ b/drivers/gpu/drm/xe/xe_svm.c
>> @@ -255,7 +255,18 @@ static int __xe_svm_garbage_collector(struct xe_vm *vm,
>>   static int xe_svm_garbage_collector(struct xe_vm *vm)
>>   {
>>   	struct xe_svm_range *range;
>> +	struct xe_vma *vma;
>> +	u64 range_start;
>> +	u64 range_size;
>> +	u64 range_end;
>>   	int err;
>> +	struct xe_vma_mem_attr default_attr = {
>> +		.preferred_loc = {
>> +			.devmem_fd = DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE,
>> +			.migration_policy = DRM_XE_MIGRATE_ALL_PAGES,
>> +		},
>> +		.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
>> +	};
>>   
>>   	lockdep_assert_held_write(&vm->lock);
>>   
>> @@ -270,6 +281,12 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
>>   		if (!range)
>>   			break;
>>   
>> +		range_start = xe_svm_range_start(range);
>> +		range_size = xe_svm_range_size(range);
>> +		range_end = xe_svm_range_end(range);
>> +
>> +		vma = xe_vm_find_vma_by_addr(vm, xe_svm_range_start(range));
>> +
>>   		list_del(&range->garbage_collector_link);
>>   		spin_unlock(&vm->svm.garbage_collector.lock);
>>   
>> @@ -282,7 +299,24 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
>>   			return err;
>>   		}
>>   
>> +		if (!xe_vma_has_default_mem_attrs(vma)) {
>> +			vm_dbg(&vm->xe->drm, "Existing VMA start=0x%016llx, vma_end=0x%016llx",
>> +			       xe_vma_start(vma), xe_vma_end(vma));
>> +
>> +			if (xe_vma_start(vma) == range_start && xe_vma_end(vma) == range_end) {
>> +				default_attr.pat_index = vma->attr.default_pat_index;
>> +				default_attr.default_pat_index  = vma->attr.default_pat_index;
>> +				vma->attr = default_attr;
>> +			} else {
>> +				vm_dbg(&vm->xe->drm, "Split VMA start=0x%016llx, vma_end=0x%016llx",
>> +				       range_start, range_end);
>> +				err = xe_vm_alloc_cpu_addr_mirror_vma(vm, range_start, range_size);
> 
> I missed this corner case: if this is called from the fault handler, the
> VMA it looked up originally could be gone. If you modify the VMAs, you need
> to signal the fault handler to look up the VMA again.

Makes complete sense. Will update in next version.

> 
> Matt
> 
>> +				if (err)
>> +					return err;
>> +			}
>> +		}
>>   		spin_lock(&vm->svm.garbage_collector.lock);
>> +
>>   	}
>>   	spin_unlock(&vm->svm.garbage_collector.lock);
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index d3f08bf9a3ee..003c8209f8bd 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -4254,34 +4254,24 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
>>   	}
>>   }
>>   
>> -/**
>> - * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
>> - * @vm: Pointer to the xe_vm structure
>> - * @start: Starting input address
>> - * @range: Size of the input range
>> - *
>> - * This function splits existing vma to create new vma for user provided input range
>> - *
>> - *  Return: 0 if success
>> - */
>> -int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>> +static int xe_vm_alloc_vma(struct xe_vm *vm,
>> +			   u64 start, u64 range,
>> +			   enum drm_gpuvm_sm_map_ops_flags flags)
>>   {
>>   	struct xe_vma_ops vops;
>>   	struct drm_gpuva_ops *ops = NULL;
>>   	struct drm_gpuva_op *__op;
>>   	bool is_cpu_addr_mirror = false;
>>   	bool remap_op = false;
>> +	bool is_madvise = flags == DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE;
>>   	struct xe_vma_mem_attr tmp_attr;
>> +	u16 default_pat;
>>   	int err;
>>   
>> -	vm_dbg(&vm->xe->drm, "MADVISE IN: addr=0x%016llx, size=0x%016llx", start, range);
>> -
>>   	lockdep_assert_held_write(&vm->lock);
>>   
>> -	vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range);
>>   	ops = drm_gpuvm_sm_map_ops_create(&vm->gpuvm, start, range,
>> -					  DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE,
>> -					  NULL, start);
>> +					  flags, NULL, start);
>>   	if (IS_ERR(ops))
>>   		return PTR_ERR(ops);
>>   
>> @@ -4292,33 +4282,56 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>>   
>>   	drm_gpuva_for_each_op(__op, ops) {
>>   		struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
>> +		struct xe_vma *vma = NULL;
>>   
>> -		if (__op->op == DRM_GPUVA_OP_REMAP) {
>> -			xe_assert(vm->xe, !remap_op);
>> -			remap_op = true;
>> +		if (!is_madvise) {
>> +			if (__op->op == DRM_GPUVA_OP_UNMAP) {
>> +				vma = gpuva_to_vma(op->base.unmap.va);
>> +				XE_WARN_ON(!xe_vma_has_default_mem_attrs(vma));
>> +				default_pat = vma->attr.default_pat_index;
>> +			}
>>   
>> -			if (xe_vma_is_cpu_addr_mirror(gpuva_to_vma(op->base.remap.unmap->va)))
>> -				is_cpu_addr_mirror = true;
>> -			else
>> -				is_cpu_addr_mirror = false;
>> -		}
>> +			if (__op->op == DRM_GPUVA_OP_REMAP) {
>> +				vma = gpuva_to_vma(op->base.remap.unmap->va);
>> +				default_pat = vma->attr.default_pat_index;
>> +			}
>>   
>> -		if (__op->op == DRM_GPUVA_OP_MAP) {
>> -			xe_assert(vm->xe, remap_op);
>> -			remap_op = false;
>> +			if (__op->op == DRM_GPUVA_OP_MAP) {
>> +				op->map.is_cpu_addr_mirror = true;
>> +				op->map.pat_index = default_pat;
>> +			}
>> +		} else {
>> +			if (__op->op == DRM_GPUVA_OP_REMAP) {
>> +				vma = gpuva_to_vma(op->base.remap.unmap->va);
>> +				xe_assert(vm->xe, !remap_op);
>> +				remap_op = true;
>>   
>> -			/* In case of madvise ops DRM_GPUVA_OP_MAP is always after
>> -			 * DRM_GPUVA_OP_REMAP, so ensure we assign op->map.is_cpu_addr_mirror true
>> -			 * if REMAP is for xe_vma_is_cpu_addr_mirror vma
>> -			 */
>> -			op->map.is_cpu_addr_mirror = is_cpu_addr_mirror;
>> -		}
>> +				if (xe_vma_is_cpu_addr_mirror(vma))
>> +					is_cpu_addr_mirror = true;
>> +				else
>> +					is_cpu_addr_mirror = false;
>> +			}
>>   
>> +			if (__op->op == DRM_GPUVA_OP_MAP) {
>> +				xe_assert(vm->xe, remap_op);
>> +				remap_op = false;
>> +				/*
>> +				 * In case of madvise ops DRM_GPUVA_OP_MAP is
>> +				 * always after DRM_GPUVA_OP_REMAP, so ensure
>> +				 * we assign op->map.is_cpu_addr_mirror true
>> +				 * if REMAP is for xe_vma_is_cpu_addr_mirror vma
>> +				 */
>> +				op->map.is_cpu_addr_mirror = is_cpu_addr_mirror;
>> +			}
>> +		}
>>   		print_op(vm->xe, __op);
>>   	}
>>   
>>   	xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
>> -	vops.flags |= XE_VMA_OPS_FLAG_MADVISE;
>> +
>> +	if (is_madvise)
>> +		vops.flags |= XE_VMA_OPS_FLAG_MADVISE;
>> +
>>   	err = vm_bind_ioctl_ops_parse(vm, ops, &vops);
>>   	if (err)
>>   		goto unwind_ops;
>> @@ -4330,15 +4343,20 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>>   		struct xe_vma *vma;
>>   
>>   		if (__op->op == DRM_GPUVA_OP_UNMAP) {
>> -			/* There should be no unmap */
>> -			XE_WARN_ON("UNEXPECTED UNMAP");
>> -			xe_vma_destroy(gpuva_to_vma(op->base.unmap.va), NULL);
>> +			vma = gpuva_to_vma(op->base.unmap.va);
>> +			/* There should be no unmap for madvise */
>> +			if (is_madvise)
>> +				XE_WARN_ON("UNEXPECTED UNMAP");
>> +
>> +			xe_vma_destroy(vma, NULL);
>>   		} else if (__op->op == DRM_GPUVA_OP_REMAP) {
>>   			vma = gpuva_to_vma(op->base.remap.unmap->va);
>> -			/* Store attributes for REMAP UNMAPPED VMA, so they can be assigned
>> -			 * to newly MAP created vma.
>> +			/* In case of madvise ops Store attributes for REMAP UNMAPPED
>> +			 * VMA, so they can be assigned to newly MAP created vma.
>>   			 */
>> -			tmp_attr = vma->attr;
>> +			if (is_madvise)
>> +				tmp_attr = vma->attr;
>> +
>>   			xe_vma_destroy(gpuva_to_vma(op->base.remap.unmap->va), NULL);
>>   		} else if (__op->op == DRM_GPUVA_OP_MAP) {
>>   			vma = op->map.vma;
>> @@ -4346,7 +4364,8 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>>   			 * Therefore temp_attr will always have sane values, making it safe to
>>   			 * copy them to new vma.
>>   			 */
>> -			vma->attr = tmp_attr;
>> +			if (is_madvise)
>> +				vma->attr = tmp_attr;
>>   		}
>>   	}
>>   
>> @@ -4360,3 +4379,42 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>>   	drm_gpuva_ops_free(&vm->gpuvm, ops);
>>   	return err;
>>   }
>> +
>> +/**
>> + * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
>> + * @vm: Pointer to the xe_vm structure
>> + * @start: Starting input address
>> + * @range: Size of the input range
>> + *
>> + * This function splits existing vma to create new vma for user provided input range
>> + *
>> + * Return: 0 if success
>> + */
>> +int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>> +{
>> +	lockdep_assert_held_write(&vm->lock);
>> +
>> +	vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range);
>> +
>> +	return xe_vm_alloc_vma(vm, start, range, DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE);
>> +}
>> +
>> +/**
>> + * xe_vm_alloc_cpu_addr_mirror_vma - Allocate CPU addr mirror vma
>> + * @vm: Pointer to the xe_vm structure
>> + * @start: Starting input address
>> + * @range: Size of the input range
>> + *
>> + * This function splits/merges existing vma to create new vma for user provided input range
>> + *
>> + * Return: 0 if success
>> + */
>> +int xe_vm_alloc_cpu_addr_mirror_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>> +{
>> +	lockdep_assert_held_write(&vm->lock);
>> +
>> +	vm_dbg(&vm->xe->drm, "CPU_ADDR_MIRROR_VMA_OPS_CREATE: addr=0x%016llx, size=0x%016llx",
>> +	       start, range);
>> +
>> +	return xe_vm_alloc_vma(vm, start, range, DRM_GPUVM_SM_MAP_NOT_MADVISE);
>> +}
>> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
>> index a4db843de540..f7b9ad83685a 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.h
>> +++ b/drivers/gpu/drm/xe/xe_vm.h
>> @@ -177,6 +177,8 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
>>   
>>   int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
>>   
>> +int xe_vm_alloc_cpu_addr_mirror_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
>> +
>>   /**
>>    * to_userptr_vma() - Return a pointer to an embedding userptr vma
>>    * @vma: Pointer to the embedded struct xe_vma
>> -- 
>> 2.34.1
>>
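
Note: one possible way to "signal the fault handler to look up the VMA
again" is for the garbage collector to return -EAGAIN whenever it changed the
VMA layout, and for the fault path to repeat its lookup. The sketch below
only illustrates that retry pattern on a toy VMA table. All types and
function names are hypothetical, and the next revision of the series may
solve this differently.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VMAS 4

/* Toy VMA covering [start, end). */
struct vma {
	uint64_t start;
	uint64_t end;
};

/* Toy VM: a small VMA table plus a flag that makes the GC split a VMA. */
struct vm {
	struct vma vmas[MAX_VMAS];
	size_t num_vmas;
	bool gc_pending_split;
};

static struct vma *find_vma(struct vm *vm, uint64_t addr)
{
	for (size_t i = 0; i < vm->num_vmas; i++)
		if (addr >= vm->vmas[i].start && addr < vm->vmas[i].end)
			return &vm->vmas[i];
	return NULL;
}

/* Splits the first VMA in half if asked to; -EAGAIN means "layout changed". */
static int garbage_collect(struct vm *vm)
{
	struct vma *old;
	uint64_t mid;

	if (!vm->gc_pending_split || vm->num_vmas == 0 ||
	    vm->num_vmas >= MAX_VMAS)
		return 0;

	vm->gc_pending_split = false;
	old = &vm->vmas[0];
	mid = old->start + (old->end - old->start) / 2;
	vm->vmas[vm->num_vmas].start = mid;
	vm->vmas[vm->num_vmas].end = old->end;
	vm->num_vmas++;
	old->end = mid;
	return -EAGAIN;
}

/* Fault path: any VMA looked up before a layout change is treated as stale. */
static struct vma *handle_fault(struct vm *vm, uint64_t addr)
{
	struct vma *vma;
	int err;

	do {
		vma = find_vma(vm, addr);
		if (!vma)
			return NULL;
		err = garbage_collect(vm);
	} while (err == -EAGAIN);

	return err ? NULL : vma;
}
```

The point of the loop is that the pointer obtained before the collector ran
is never used once the collector reports a change.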



* Re: [PATCH v5 20/23] drm/xe: Reset VMA attributes to default in SVM garbage collector
  2025-07-24 21:50   ` Matthew Brost
  2025-07-29  5:27     ` Matthew Brost
@ 2025-07-30  6:09     ` Ghimiray, Himal Prasad
  1 sibling, 0 replies; 55+ messages in thread
From: Ghimiray, Himal Prasad @ 2025-07-30  6:09 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Thomas Hellström



On 25-07-2025 03:20, Matthew Brost wrote:
> On Tue, Jul 22, 2025 at 07:05:23PM +0530, Himal Prasad Ghimiray wrote:
>> Restore default memory attributes for VMAs during garbage collection
>> if they were modified by madvise. Reuse existing VMA if fully overlapping;
>> otherwise, allocate a new mirror VMA.
>>
>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_svm.c |  34 +++++++++
>>   drivers/gpu/drm/xe/xe_vm.c  | 140 +++++++++++++++++++++++++-----------
>>   drivers/gpu/drm/xe/xe_vm.h  |   2 +
>>   3 files changed, 135 insertions(+), 41 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
>> index ba1233d0d5a2..79709dc066b9 100644
>> --- a/drivers/gpu/drm/xe/xe_svm.c
>> +++ b/drivers/gpu/drm/xe/xe_svm.c
>> @@ -255,7 +255,18 @@ static int __xe_svm_garbage_collector(struct xe_vm *vm,
>>   static int xe_svm_garbage_collector(struct xe_vm *vm)
>>   {
>>   	struct xe_svm_range *range;
>> +	struct xe_vma *vma;
>> +	u64 range_start;
>> +	u64 range_size;
>> +	u64 range_end;
>>   	int err;
>> +	struct xe_vma_mem_attr default_attr = {
>> +		.preferred_loc = {
>> +			.devmem_fd = DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE,
>> +			.migration_policy = DRM_XE_MIGRATE_ALL_PAGES,
>> +		},
>> +		.atomic_access = DRM_XE_ATOMIC_UNDEFINED,
>> +	};
>>   
>>   	lockdep_assert_held_write(&vm->lock);
>>   
>> @@ -270,6 +281,12 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
>>   		if (!range)
>>   			break;
>>   
>> +		range_start = xe_svm_range_start(range);
>> +		range_size = xe_svm_range_size(range);
>> +		range_end = xe_svm_range_end(range);
>> +
>> +		vma = xe_vm_find_vma_by_addr(vm, xe_svm_range_start(range));
>> +
> 
> I'd find the VMA outside of the svm.garbage_collector.lock.

Sure.

> 
>>   		list_del(&range->garbage_collector_link);
>>   		spin_unlock(&vm->svm.garbage_collector.lock);
>>   
>> @@ -282,7 +299,24 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
>>   			return err;
>>   		}
>>   
>> +		if (!xe_vma_has_default_mem_attrs(vma)) {
> 
> It seems possible the VMA could be NULL in error cases. I'd check for
> NULL and error out.
> 
> Also, could this code be moved to a helper? Internal to SVM seems ok; in
> that case xe_vm_find_vma_by_addr could also be in the helper.

Sure

> 
>> +			vm_dbg(&vm->xe->drm, "Existing VMA start=0x%016llx, vma_end=0x%016llx",
>> +			       xe_vma_start(vma), xe_vma_end(vma));
>> +
>> +			if (xe_vma_start(vma) == range_start && xe_vma_end(vma) == range_end) {
>> +				default_attr.pat_index = vma->attr.default_pat_index;
>> +				default_attr.default_pat_index  = vma->attr.default_pat_index;
>> +				vma->attr = default_attr;
>> +			} else {
>> +				vm_dbg(&vm->xe->drm, "Split VMA start=0x%016llx, vma_end=0x%016llx",
>> +				       range_start, range_end);
>> +				err = xe_vm_alloc_cpu_addr_mirror_vma(vm, range_start, range_size);
>> +				if (err)
> 
> On error, I'd print a message and kill the VM, as it shouldn't be
> possible to fail aside from a memory allocation failure, and we can't
> cope with errors given this can be inside a worker.

Sure

> 
> I'll circle back to the rest of the patch a bit later.
> 
> Matt
> 
>> +					return err;
>> +			}
>> +		}
>>   		spin_lock(&vm->svm.garbage_collector.lock);
>> +
>>   	}
>>   	spin_unlock(&vm->svm.garbage_collector.lock);
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index d3f08bf9a3ee..003c8209f8bd 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -4254,34 +4254,24 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
>>   	}
>>   }
>>   
>> -/**
>> - * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
>> - * @vm: Pointer to the xe_vm structure
>> - * @start: Starting input address
>> - * @range: Size of the input range
>> - *
>> - * This function splits existing vma to create new vma for user provided input range
>> - *
>> - *  Return: 0 if success
>> - */
>> -int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>> +static int xe_vm_alloc_vma(struct xe_vm *vm,
>> +			   u64 start, u64 range,
>> +			   enum drm_gpuvm_sm_map_ops_flags flags)
>>   {
>>   	struct xe_vma_ops vops;
>>   	struct drm_gpuva_ops *ops = NULL;
>>   	struct drm_gpuva_op *__op;
>>   	bool is_cpu_addr_mirror = false;
>>   	bool remap_op = false;
>> +	bool is_madvise = flags == DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE;
>>   	struct xe_vma_mem_attr tmp_attr;
>> +	u16 default_pat;
>>   	int err;
>>   
>> -	vm_dbg(&vm->xe->drm, "MADVISE IN: addr=0x%016llx, size=0x%016llx", start, range);
>> -
>>   	lockdep_assert_held_write(&vm->lock);
>>   
>> -	vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range);
>>   	ops = drm_gpuvm_sm_map_ops_create(&vm->gpuvm, start, range,
>> -					  DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE,
>> -					  NULL, start);
>> +					  flags, NULL, start);
>>   	if (IS_ERR(ops))
>>   		return PTR_ERR(ops);
>>   
>> @@ -4292,33 +4282,56 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>>   
>>   	drm_gpuva_for_each_op(__op, ops) {
>>   		struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
>> +		struct xe_vma *vma = NULL;
>>   
>> -		if (__op->op == DRM_GPUVA_OP_REMAP) {
>> -			xe_assert(vm->xe, !remap_op);
>> -			remap_op = true;
>> +		if (!is_madvise) {
>> +			if (__op->op == DRM_GPUVA_OP_UNMAP) {
>> +				vma = gpuva_to_vma(op->base.unmap.va);
>> +				XE_WARN_ON(!xe_vma_has_default_mem_attrs(vma));
>> +				default_pat = vma->attr.default_pat_index;
>> +			}
>>   
>> -			if (xe_vma_is_cpu_addr_mirror(gpuva_to_vma(op->base.remap.unmap->va)))
>> -				is_cpu_addr_mirror = true;
>> -			else
>> -				is_cpu_addr_mirror = false;
>> -		}
>> +			if (__op->op == DRM_GPUVA_OP_REMAP) {
>> +				vma = gpuva_to_vma(op->base.remap.unmap->va);
>> +				default_pat = vma->attr.default_pat_index;
>> +			}
>>   
>> -		if (__op->op == DRM_GPUVA_OP_MAP) {
>> -			xe_assert(vm->xe, remap_op);
>> -			remap_op = false;
>> +			if (__op->op == DRM_GPUVA_OP_MAP) {
>> +				op->map.is_cpu_addr_mirror = true;
>> +				op->map.pat_index = default_pat;
>> +			}
>> +		} else {
>> +			if (__op->op == DRM_GPUVA_OP_REMAP) {
>> +				vma = gpuva_to_vma(op->base.remap.unmap->va);
>> +				xe_assert(vm->xe, !remap_op);
>> +				remap_op = true;
>>   
>> -			/* In case of madvise ops DRM_GPUVA_OP_MAP is always after
>> -			 * DRM_GPUVA_OP_REMAP, so ensure we assign op->map.is_cpu_addr_mirror true
>> -			 * if REMAP is for xe_vma_is_cpu_addr_mirror vma
>> -			 */
>> -			op->map.is_cpu_addr_mirror = is_cpu_addr_mirror;
>> -		}
>> +				if (xe_vma_is_cpu_addr_mirror(vma))
>> +					is_cpu_addr_mirror = true;
>> +				else
>> +					is_cpu_addr_mirror = false;
>> +			}
>>   
>> +			if (__op->op == DRM_GPUVA_OP_MAP) {
>> +				xe_assert(vm->xe, remap_op);
>> +				remap_op = false;
>> +				/*
>> +				 * In case of madvise ops DRM_GPUVA_OP_MAP is
>> +				 * always after DRM_GPUVA_OP_REMAP, so ensure
>> +				 * we assign op->map.is_cpu_addr_mirror true
>> +				 * if REMAP is for xe_vma_is_cpu_addr_mirror vma
>> +				 */
>> +				op->map.is_cpu_addr_mirror = is_cpu_addr_mirror;
>> +			}
>> +		}
>>   		print_op(vm->xe, __op);
>>   	}
>>   
>>   	xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
>> -	vops.flags |= XE_VMA_OPS_FLAG_MADVISE;
>> +
>> +	if (is_madvise)
>> +		vops.flags |= XE_VMA_OPS_FLAG_MADVISE;
>> +
>>   	err = vm_bind_ioctl_ops_parse(vm, ops, &vops);
>>   	if (err)
>>   		goto unwind_ops;
>> @@ -4330,15 +4343,20 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>>   		struct xe_vma *vma;
>>   
>>   		if (__op->op == DRM_GPUVA_OP_UNMAP) {
>> -			/* There should be no unmap */
>> -			XE_WARN_ON("UNEXPECTED UNMAP");
>> -			xe_vma_destroy(gpuva_to_vma(op->base.unmap.va), NULL);
>> +			vma = gpuva_to_vma(op->base.unmap.va);
>> +			/* There should be no unmap for madvise */
>> +			if (is_madvise)
>> +				XE_WARN_ON("UNEXPECTED UNMAP");
>> +
>> +			xe_vma_destroy(vma, NULL);
>>   		} else if (__op->op == DRM_GPUVA_OP_REMAP) {
>>   			vma = gpuva_to_vma(op->base.remap.unmap->va);
>> -			/* Store attributes for REMAP UNMAPPED VMA, so they can be assigned
>> -			 * to newly MAP created vma.
>> +			/* In case of madvise ops Store attributes for REMAP UNMAPPED
>> +			 * VMA, so they can be assigned to newly MAP created vma.
>>   			 */
>> -			tmp_attr = vma->attr;
>> +			if (is_madvise)
>> +				tmp_attr = vma->attr;
>> +
>>   			xe_vma_destroy(gpuva_to_vma(op->base.remap.unmap->va), NULL);
>>   		} else if (__op->op == DRM_GPUVA_OP_MAP) {
>>   			vma = op->map.vma;
>> @@ -4346,7 +4364,8 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>>   			 * Therefore temp_attr will always have sane values, making it safe to
>>   			 * copy them to new vma.
>>   			 */
>> -			vma->attr = tmp_attr;
>> +			if (is_madvise)
>> +				vma->attr = tmp_attr;
>>   		}
>>   	}
>>   
>> @@ -4360,3 +4379,42 @@ int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>>   	drm_gpuva_ops_free(&vm->gpuvm, ops);
>>   	return err;
>>   }
>> +
>> +/**
>> + * xe_vm_alloc_madvise_vma - Allocate VMA's with madvise ops
>> + * @vm: Pointer to the xe_vm structure
>> + * @start: Starting input address
>> + * @range: Size of the input range
>> + *
>> + * This function splits existing vma to create new vma for user provided input range
>> + *
>> + * Return: 0 if success
>> + */
>> +int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>> +{
>> +	lockdep_assert_held_write(&vm->lock);
>> +
>> +	vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range);
>> +
>> +	return xe_vm_alloc_vma(vm, start, range, DRM_GPUVM_SKIP_GEM_OBJ_VA_SPLIT_MADVISE);
>> +}
>> +
>> +/**
>> + * xe_vm_alloc_cpu_addr_mirror_vma - Allocate CPU addr mirror vma
>> + * @vm: Pointer to the xe_vm structure
>> + * @start: Starting input address
>> + * @range: Size of the input range
>> + *
>> + * This function splits/merges existing vma to create new vma for user provided input range
>> + *
>> + * Return: 0 if success
>> + */
>> +int xe_vm_alloc_cpu_addr_mirror_vma(struct xe_vm *vm, uint64_t start, uint64_t range)
>> +{
>> +	lockdep_assert_held_write(&vm->lock);
>> +
>> +	vm_dbg(&vm->xe->drm, "CPU_ADDR_MIRROR_VMA_OPS_CREATE: addr=0x%016llx, size=0x%016llx",
>> +	       start, range);
>> +
>> +	return xe_vm_alloc_vma(vm, start, range, DRM_GPUVM_SM_MAP_NOT_MADVISE);
>> +}
>> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
>> index a4db843de540..f7b9ad83685a 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.h
>> +++ b/drivers/gpu/drm/xe/xe_vm.h
>> @@ -177,6 +177,8 @@ bool xe_vma_need_vram_for_atomic(struct xe_device *xe, struct xe_vma *vma, bool
>>   
>>   int xe_vm_alloc_madvise_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
>>   
>> +int xe_vm_alloc_cpu_addr_mirror_vma(struct xe_vm *vm, uint64_t addr, uint64_t size);
>> +
>>   /**
>>    * to_userptr_vma() - Return a pointer to an embedding userptr vma
>>    * @vma: Pointer to the embedded struct xe_vma
>> -- 
>> 2.34.1
>>
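
Note: the helper requested in the review (NULL check, then either reset the
attributes in place or ask for a split) could be shaped roughly as below.
This is a standalone sketch with hypothetical types and names. The real
helper would call xe_vm_find_vma_by_addr() and
xe_vm_alloc_cpu_addr_mirror_vma(), which are replaced here by a plain pointer
argument and a returned action code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of the per-VMA memory attributes. */
struct mem_attr {
	uint16_t pat_index;
	uint16_t default_pat_index;
	int atomic_access;	/* 0 stands for the "undefined" default */
};

struct vma {
	uint64_t start;
	uint64_t end;
	struct mem_attr attr;
};

enum reset_action {
	RESET_NONE,		/* attributes were already default */
	RESET_IN_PLACE,		/* VMA matched the range, attributes reset */
	RESET_NEEDS_SPLIT,	/* caller must carve out a new mirror VMA */
};

static bool has_default_attrs(const struct vma *vma)
{
	return vma->attr.atomic_access == 0 &&
	       vma->attr.pat_index == vma->attr.default_pat_index;
}

/* Returns 0 on success, -EINVAL if the lookup produced no VMA. */
static int reset_range_attrs(struct vma *vma, uint64_t range_start,
			     uint64_t range_end, enum reset_action *action)
{
	*action = RESET_NONE;

	if (!vma)
		return -EINVAL;

	if (has_default_attrs(vma))
		return 0;

	if (vma->start == range_start && vma->end == range_end) {
		vma->attr.pat_index = vma->attr.default_pat_index;
		vma->attr.atomic_access = 0;
		*action = RESET_IN_PLACE;
		return 0;
	}

	*action = RESET_NEEDS_SPLIT;
	return 0;
}
```

Returning an action rather than splitting inside the helper keeps the sketch
free of allocation. In the driver, a failure of the split itself would be
the point where the review suggests printing a message and killing the VM.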



* Re: [PATCH v5 21/23] drm/xe/vm: Add a delayed worker to merge fragmented vmas
  2025-07-29  4:39   ` Matthew Brost
@ 2025-07-30 11:08     ` Ghimiray, Himal Prasad
  0 siblings, 0 replies; 55+ messages in thread
From: Ghimiray, Himal Prasad @ 2025-07-30 11:08 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Thomas Hellström



On 29-07-2025 10:09, Matthew Brost wrote:
> On Tue, Jul 22, 2025 at 07:05:24PM +0530, Himal Prasad Ghimiray wrote:
>> During initial mirror bind initialize and start the delayed work item
>> responsible for merging adjacent CPU address mirror VMAs with default
>> memory attributes. This function sets the merge_active flag and schedules
>> the work to run after a delay, allowing batching of VMA updates.
>>
> 
> I think we will need some way to defragment, but it might need more
> thought. The trade-off between defragmenting on every insertion of a
> mirror VMA (binding a BO back to mirror) and on every unmap restoring the
> defaults vs. a periodic worker needs to be carefully considered.
> 
> The trade-off is more time up front (plus perhaps some additional
> complexity) vs. a periodic worker which blocks out all memory transactions.
> 
> Since this doesn't affect any functionality, perhaps table this for now and
> run it by Thomas to formulate a plan / solution.

Sure, let's discuss and conclude before settling on an approach. I will
drop this patch from the next version and post it separately in the future
once it is finalized.

Thanks
> 
> Matt
> 
>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_vm.c       | 126 +++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_vm_types.h |  15 ++++
>>   2 files changed, 141 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index 003c8209f8bd..bee849167c0d 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -1160,6 +1160,127 @@ static void xe_vma_free(struct xe_vma *vma)
>>   		kfree(vma);
>>   }
>>   
>> +struct va_range {
>> +	u64 start;
>> +	u64 end;
>> +};
>> +
>> +static void add_merged_range(struct va_range **ranges, int *count, int *capacity,
>> +			     u64 start, u64 end)
>> +{
>> +	const int array_size  = 8;
>> +	struct va_range *new_ranges;
>> +	int new_capacity;
>> +
>> +	if (*count == *capacity) {
>> +		new_capacity = *capacity ? *capacity * 2 : array_size;
>> +		new_ranges = krealloc(*ranges, new_capacity * sizeof(**ranges), GFP_KERNEL);
>> +		if (!new_ranges)
>> +			return;
>> +
>> +		*ranges = new_ranges;
>> +		*capacity = new_capacity;
>> +	}
>> +	(*ranges)[(*count)++] = (struct va_range){ .start = start, .end = end };
>> +}
>> +
>> +static void xe_vm_vmas_merge_worker(struct work_struct *work)
>> +{
>> +	struct xe_vm *vm = container_of(to_delayed_work(work), struct xe_vm, merge_vmas_work);
>> +	struct drm_gpuva *gpuva, *next = NULL;
>> +	struct va_range *merged_ranges = NULL;
>> +	int merge_count = 0, merge_capacity = 0;
>> +	bool in_merge = false;
>> +	u64 merge_start = 0, merge_end = 0;
>> +	int merge_len = 0;
>> +
>> +	if (!vm->merge_active)
>> +		return;
>> +
>> +	down_write(&vm->lock);
>> +
>> +	drm_gpuvm_for_each_va_safe(gpuva, next, &vm->gpuvm) {
>> +		struct xe_vma *vma = gpuva_to_vma(gpuva);
>> +
>> +		if (!xe_vma_is_cpu_addr_mirror(vma) || !xe_vma_has_default_mem_attrs(vma)) {
>> +			if (in_merge && merge_len > 1)
>> +				add_merged_range(&merged_ranges, &merge_count, &merge_capacity,
>> +						 merge_start, merge_end);
>> +
>> +			in_merge = false;
>> +			merge_len = 0;
>> +			continue;
>> +		}
>> +
>> +		if (!in_merge) {
>> +			merge_start = xe_vma_start(vma);
>> +			merge_end = xe_vma_end(vma);
>> +			in_merge = true;
>> +			merge_len = 1;
>> +		} else if (xe_vma_start(vma) == merge_end && xe_vma_has_default_mem_attrs(vma)) {
>> +			merge_end = xe_vma_end(vma);
>> +			merge_len++;
>> +		} else {
>> +			if (merge_len > 1)
>> +				add_merged_range(&merged_ranges, &merge_count, &merge_capacity,
>> +						 merge_start, merge_end);
>> +			merge_start = xe_vma_start(vma);
>> +			merge_end = xe_vma_end(vma);
>> +			merge_len = 1;
>> +		}
>> +	}
>> +
>> +	if (in_merge && merge_len > 1) {
>> +		add_merged_range(&merged_ranges, &merge_count, &merge_capacity,
>> +				 merge_start, merge_end);
>> +	}
>> +
>> +	for (int i = 0; i < merge_count; i++) {
>> +		vm_dbg(&vm->xe->drm, "Merged VA range %d: start=0x%016llx, end=0x%016llx\n",
>> +		       i, merged_ranges[i].start, merged_ranges[i].end);
>> +
>> +		if (xe_vm_alloc_cpu_addr_mirror_vma(vm, merged_ranges[i].start,
>> +						    merged_ranges[i].end - merged_ranges[i].start))
>> +			break;
>> +	}
>> +
>> +	up_write(&vm->lock);
>> +	kfree(merged_ranges);
>> +	schedule_delayed_work(&vm->merge_vmas_work, msecs_to_jiffies(5000));
>> +}
>> +
>> +/*
>> + * xe_vm_start_vmas_merge - Initialize and schedule VMA merge work
>> + * @vm: Pointer to the xe_vm structure
>> + *
>> + * Initializes the delayed work item responsible for merging adjacent
>> + * CPU address mirror VMAs with default memory attributes. This function
>> + * sets the merge_active flag and schedules the work to run after a delay,
>> + * allowing batching of VMA updates.
>> + */
>> +static void xe_vm_start_vmas_merge(struct xe_vm *vm)
>> +{
>> +	if (vm->merge_active)
>> +		return;
>> +
>> +	vm->merge_active = true;
>> +	INIT_DELAYED_WORK(&vm->merge_vmas_work, xe_vm_vmas_merge_worker);
>> +	schedule_delayed_work(&vm->merge_vmas_work, msecs_to_jiffies(5000));
>> +}
>> +
>> +/*
>> + * xe_vm_stop_vmas_merge - Cancel scheduled VMA merge work
>> + * @vm: Pointer to the xe_vm structure
>> + */
>> +static void xe_vm_stop_vmas_merge(struct xe_vm *vm)
>> +{
>> +	if (!vm->merge_active)
>> +		return;
>> +
>> +	vm->merge_active = false;
>> +	cancel_delayed_work_sync(&vm->merge_vmas_work);
>> +}
>> +
>>   #define VMA_CREATE_FLAG_READ_ONLY		BIT(0)
>>   #define VMA_CREATE_FLAG_IS_NULL			BIT(1)
>>   #define VMA_CREATE_FLAG_DUMPABLE		BIT(2)
>> @@ -1269,6 +1390,9 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>>   		xe_vm_get(vm);
>>   	}
>>   
>> +	if (xe_vma_is_cpu_addr_mirror(vma))
>> +		xe_vm_start_vmas_merge(vm);
>> +
>>   	return vma;
>>   }
>>   
>> @@ -1982,6 +2106,8 @@ static void vm_destroy_work_func(struct work_struct *w)
>>   	/* xe_vm_close_and_put was not called? */
>>   	xe_assert(xe, !vm->size);
>>   
>> +	xe_vm_stop_vmas_merge(vm);
>> +
>>   	if (xe_vm_in_preempt_fence_mode(vm))
>>   		flush_work(&vm->preempt.rebind_work);
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
>> index 351242c92c12..c4f3542eb464 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>> @@ -374,6 +374,21 @@ struct xe_vm {
>>   	bool batch_invalidate_tlb;
>>   	/** @xef: XE file handle for tracking this VM's drm client */
>>   	struct xe_file *xef;
>> +
>> +	/**
>> +	 * @merge_vmas_work: Delayed work item used to merge CPU address mirror VMAs.
>> +	 * This work is scheduled to scan the GPU virtual memory space and
>> +	 * identify adjacent CPU address mirror VMAs that have default memory
>> +	 * attributes. When such VMAs are found, they are merged into a single
>> +	 * larger VMA to reduce fragmentation. The merging process is triggered
>> +	 * asynchronously via a delayed workqueue to avoid blocking critical paths
>> +	 * and to batch updates when possible.
>> +	 */
>> +	struct delayed_work merge_vmas_work;
>> +
>> +	/** @merge_active: True if merge_vmas_work has been initialized */
>> +	bool merge_active;
>> +
>>   };
>>   
>>   /** struct xe_vma_op_map - VMA map operation */
>> -- 
>> 2.34.1
>>
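
The scan performed by the worker in the quoted patch can be sketched in
userspace C. This is a minimal sketch under stated assumptions, not the
driver code: struct mock_vma, push_range() and collect_merge_ranges() are
hypothetical names, the two boolean fields stand in for
xe_vma_is_cpu_addr_mirror() and xe_vma_has_default_mem_attrs(), and the
input array is assumed to be sorted by start address. It only collects the
ranges that could be merged; locking and the actual VMA replacement are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a VMA: a half-open range [start, end) plus flags. */
struct mock_vma {
	uint64_t start;
	uint64_t end;
	bool cpu_addr_mirror;	/* stands in for xe_vma_is_cpu_addr_mirror() */
	bool default_attrs;	/* stands in for xe_vma_has_default_mem_attrs() */
};

struct va_range {
	uint64_t start;
	uint64_t end;
};

/* Append a range, doubling capacity when full. Returns false on allocation failure. */
static bool push_range(struct va_range **ranges, size_t *count, size_t *capacity,
		       uint64_t start, uint64_t end)
{
	if (*count == *capacity) {
		size_t new_capacity = *capacity ? *capacity * 2 : 8;
		struct va_range *tmp = realloc(*ranges, new_capacity * sizeof(*tmp));

		if (!tmp)
			return false;
		*ranges = tmp;
		*capacity = new_capacity;
	}
	(*ranges)[(*count)++] = (struct va_range){ .start = start, .end = end };
	return true;
}

/*
 * Scan VMAs sorted by start address and record every run of two or more
 * contiguous, mergeable VMAs. Returns the number of runs, or -1 on
 * allocation failure. The caller frees *out.
 */
static int collect_merge_ranges(const struct mock_vma *vmas, size_t n,
				struct va_range **out)
{
	size_t count = 0, capacity = 0, run_len = 0;
	uint64_t run_start = 0, run_end = 0;

	assert(out != NULL);
	*out = NULL;
	for (size_t i = 0; i <= n; i++) {
		/* Index n acts as a sentinel that flushes the last run. */
		bool mergeable = i < n && vmas[i].cpu_addr_mirror && vmas[i].default_attrs;

		if (mergeable && run_len && vmas[i].start == run_end) {
			run_end = vmas[i].end;
			run_len++;
			continue;
		}
		/* The current run ends here; keep it only if it spans more than one VMA. */
		if (run_len > 1 && !push_range(out, &count, &capacity, run_start, run_end)) {
			free(*out);
			*out = NULL;
			return -1;
		}
		run_len = mergeable ? 1 : 0;
		if (mergeable) {
			run_start = vmas[i].start;
			run_end = vmas[i].end;
		}
	}
	return (int)count;
}
```

Unlike add_merged_range() in the patch, which returns silently when
krealloc() fails, the sketch reports allocation failure to the caller.
That is a choice made for the sketch, not something discussed in the
thread.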



end of thread, other threads:[~2025-07-30 11:08 UTC | newest]

Thread overview: 55+ messages
2025-07-22 13:35 [PATCH v5 00/23] MADVISE FOR XE Himal Prasad Ghimiray
2025-07-22 13:35 ` [PATCH v5 01/23] Introduce drm_gpuvm_sm_map_ops_flags enums for sm_map_ops Himal Prasad Ghimiray
2025-07-22 13:38   ` Danilo Krummrich
2025-07-24  0:43     ` Matthew Brost
2025-07-24 10:05       ` Ghimiray, Himal Prasad
2025-07-24 10:32       ` Caterina Shablia
2025-07-28 10:20         ` Ghimiray, Himal Prasad
2025-07-24 10:02     ` Ghimiray, Himal Prasad
2025-07-27 21:18   ` Matthew Brost
2025-07-28  6:16     ` Ghimiray, Himal Prasad
2025-07-22 13:35 ` [PATCH v5 02/23] drm/xe/uapi: Add madvise interface Himal Prasad Ghimiray
2025-07-29  3:29   ` Matthew Brost
2025-07-22 13:35 ` [PATCH v5 03/23] drm/xe/vm: Add attributes struct as member of vma Himal Prasad Ghimiray
2025-07-22 13:35 ` [PATCH v5 04/23] drm/xe/vma: Move pat_index to vma attributes Himal Prasad Ghimiray
2025-07-22 13:35 ` [PATCH v5 05/23] drm/xe/vma: Modify new_vma to accept struct xe_vma_mem_attr as parameter Himal Prasad Ghimiray
2025-07-22 13:35 ` [PATCH v5 06/23] drm/gpusvm: Make drm_gpusvm_for_each_* macros public Himal Prasad Ghimiray
2025-07-22 13:35 ` [PATCH v5 07/23] drm/xe/svm: Split system allocator vma incase of madvise call Himal Prasad Ghimiray
2025-07-22 13:35 ` [PATCH v5 08/23] drm/xe: Allow CPU address mirror VMA unbind with gpu bindings for madvise Himal Prasad Ghimiray
2025-07-29  3:40   ` Matthew Brost
2025-07-29  7:42     ` Ghimiray, Himal Prasad
2025-07-22 13:35 ` [PATCH v5 09/23] drm/xe/svm: Add xe_svm_ranges_zap_ptes_in_range() for PTE zapping Himal Prasad Ghimiray
2025-07-29  3:42   ` Matthew Brost
2025-07-22 13:35 ` [PATCH v5 10/23] drm/xe: Implement madvise ioctl for xe Himal Prasad Ghimiray
2025-07-29  3:52   ` Matthew Brost
2025-07-29  4:23     ` Matthew Brost
2025-07-29  9:43       ` Ghimiray, Himal Prasad
2025-07-22 13:35 ` [PATCH v5 11/23] drm/xe/svm : Add svm ranges migration policy on atomic access Himal Prasad Ghimiray
2025-07-29  4:04   ` Matthew Brost
2025-07-30  4:59     ` Ghimiray, Himal Prasad
2025-07-22 13:35 ` [PATCH v5 12/23] drm/xe/madvise: Update migration policy based on preferred location Himal Prasad Ghimiray
2025-07-29  4:07   ` Matthew Brost
2025-07-22 13:35 ` [PATCH v5 13/23] drm/xe/svm: Support DRM_XE_SVM_ATTR_PAT memory attribute Himal Prasad Ghimiray
2025-07-23 16:55   ` Ghimiray, Himal Prasad
2025-07-22 13:35 ` [PATCH v5 14/23] drm/xe/uapi: Add flag for consulting madvise hints on svm prefetch Himal Prasad Ghimiray
2025-07-22 13:35 ` [PATCH v5 15/23] drm/xe/svm: Consult madvise preferred location in prefetch Himal Prasad Ghimiray
2025-07-22 13:35 ` [PATCH v5 16/23] drm/xe/bo: Add attributes field to xe_bo Himal Prasad Ghimiray
2025-07-22 13:35 ` [PATCH v5 17/23] drm/xe/bo: Update atomic_access attribute on madvise Himal Prasad Ghimiray
2025-07-29  4:18   ` Matthew Brost
2025-07-22 13:35 ` [PATCH v5 18/23] drm/xe/madvise: Skip vma invalidation if mem attr are unchanged Himal Prasad Ghimiray
2025-07-29  4:19   ` Matthew Brost
2025-07-22 13:35 ` [PATCH v5 19/23] drm/xe/vm: Add helper to check for default VMA memory attributes Himal Prasad Ghimiray
2025-07-29  4:33   ` Matthew Brost
2025-07-22 13:35 ` [PATCH v5 20/23] drm/xe: Reset VMA attributes to default in SVM garbage collector Himal Prasad Ghimiray
2025-07-24 21:50   ` Matthew Brost
2025-07-29  5:27     ` Matthew Brost
2025-07-30  6:09     ` Ghimiray, Himal Prasad
2025-07-29  5:41   ` Matthew Brost
2025-07-30  6:06     ` Ghimiray, Himal Prasad
2025-07-22 13:35 ` [PATCH v5 21/23] drm/xe/vm: Add a delayed worker to merge fragmented vmas Himal Prasad Ghimiray
2025-07-29  4:39   ` Matthew Brost
2025-07-30 11:08     ` Ghimiray, Himal Prasad
2025-07-22 13:35 ` [PATCH v5 22/23] drm/xe: Enable madvise ioctl for xe Himal Prasad Ghimiray
2025-07-29  4:34   ` Matthew Brost
2025-07-22 13:35 ` [PATCH v5 23/23] drm/xe/uapi: Add UAPI for querying VMA count and memory attributes Himal Prasad Ghimiray
2025-07-29  5:37   ` Matthew Brost
