* [PATCH 1/3] drm/xe: Add a function to zap page table by address range
@ 2025-01-28 22:21 Oak Zeng
2025-01-28 22:21 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng
` (7 more replies)
0 siblings, 8 replies; 20+ messages in thread
From: Oak Zeng @ 2025-01-28 22:21 UTC (permalink / raw)
To: intel-xe; +Cc: joonas.lahtinen, Thomas.Hellstrom
Add a function xe_pt_zap_range(). This is similar to xe_pt_zap_ptes()
but is used when we don't have a vma to work with, such as when zapping
a range mapped to the scratch page, where no vma exists.
Signed-off-by: Oak Zeng <oak.zeng@intel.com>
---
drivers/gpu/drm/xe/xe_pt.c | 28 ++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_pt.h | 2 ++
2 files changed, 30 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 1ddcc7e79a93..2363260da6a6 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -792,6 +792,34 @@ static const struct xe_pt_walk_ops xe_pt_zap_ptes_ops = {
.pt_entry = xe_pt_zap_ptes_entry,
};
+/**
+ * xe_pt_zap_range() - Zap (zero) gpu ptes of a virtual address range
+ * @tile: The tile we're zapping for.
+ * @vm: The vm we're zapping for.
+ * @start: Start of the virtual address range, inclusive.
+ * @end: End of the virtual address range, exclusive.
+ *
+ * This is similar to xe_pt_zap_ptes() but it's used when we don't have a
+ * vma to work with. This is used for example when we're clearing the scratch
+ * page mapping during vm_bind.
+ *
+ */
+void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 start, u64 end)
+{
+ struct xe_pt_zap_ptes_walk xe_walk = {
+ .base = {
+ .ops = &xe_pt_zap_ptes_ops,
+ .shifts = xe_normal_pt_shifts,
+ .max_level = XE_PT_HIGHEST_LEVEL,
+ },
+ .tile = tile,
+ };
+ struct xe_pt *pt = vm->pt_root[tile->id];
+
+ (void)xe_pt_walk_shared(&pt->base, pt->level, start,
+ end, &xe_walk.base);
+}
+
/**
* xe_pt_zap_ptes() - Zap (zero) gpu ptes of an address range
* @tile: The tile we're zapping for.
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 9ab386431cad..b166b324f455 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -43,4 +43,6 @@ void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops);
bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
+void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 start, u64 end);
+
#endif
--
2.26.3
* [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng
@ 2025-01-28 22:21 ` Oak Zeng
2025-01-28 23:05 ` Cavitt, Jonathan
2025-01-28 23:19 ` Matthew Brost
2025-01-28 22:21 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng
` (6 subsequent siblings)
7 siblings, 2 replies; 20+ messages in thread
From: Oak Zeng @ 2025-01-28 22:21 UTC (permalink / raw)
To: intel-xe; +Cc: joonas.lahtinen, Thomas.Hellstrom
When a vm runs under fault mode with scratch page enabled, we need to
clear the scratch page mapping of the vm_bind address range before the
vm_bind. Under fault mode, we depend on recoverable page faults to
establish mappings in the page table. If the scratch page mapping is
not cleared, GPU access of the address won't cause a page fault because
it always hits the existing scratch page mapping.
When vm_bind is called with the IMMEDIATE flag, there is no need for
clearing, as an immediate bind overwrites the scratch page mapping.
So far only xe2 and xe3 products are allowed to enable scratch page
under fault mode. On other platforms we don't allow scratch page under
fault mode, so there is no need for such clearing.
Signed-off-by: Oak Zeng <oak.zeng@intel.com>
---
drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 690330352d4c..196d347c6ac0 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -38,6 +38,7 @@
#include "xe_trace_bo.h"
#include "xe_wa.h"
#include "xe_hmm.h"
+#include "i915_drv.h"
static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
{
@@ -2917,6 +2918,34 @@ static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo,
return 0;
}
+static bool __xe_vm_needs_clear_scratch_pages(struct xe_device *xe,
+ struct xe_vm *vm, u32 bind_flags)
+{
+ if (!xe_vm_in_fault_mode(vm))
+ return false;
+
+ if (!xe_vm_has_scratch(vm))
+ return false;
+
+ if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE)
+ return false;
+
+ if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe)))
+ return false;
+
+ return true;
+}
+
+static void __xe_vm_clear_scratch_pages(struct xe_device *xe, struct xe_vm *vm,
+ u64 start, u64 end)
+{
+ struct xe_tile *tile;
+ u8 id;
+
+ for_each_tile(tile, xe, id)
+ xe_pt_zap_range(tile, vm, start, end);
+}
+
int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
{
struct xe_device *xe = to_xe_device(dev);
@@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
u32 prefetch_region = bind_ops[i].prefetch_mem_region_instance;
u16 pat_index = bind_ops[i].pat_index;
+ if (__xe_vm_needs_clear_scratch_pages(xe, vm, flags))
+ __xe_vm_clear_scratch_pages(xe, vm, addr, addr + range);
+
ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
addr, range, op, flags,
prefetch_region, pat_index);
--
2.26.3
* [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform
2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng
2025-01-28 22:21 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng
@ 2025-01-28 22:21 ` Oak Zeng
2025-01-28 23:05 ` Cavitt, Jonathan
2025-01-29 8:52 ` Thomas Hellström
2025-01-28 23:04 ` [PATCH 1/3] drm/xe: Add a function to zap page table by address range Cavitt, Jonathan
` (5 subsequent siblings)
7 siblings, 2 replies; 20+ messages in thread
From: Oak Zeng @ 2025-01-28 22:21 UTC (permalink / raw)
To: intel-xe; +Cc: joonas.lahtinen, Thomas.Hellstrom
Normally scratch page is not allowed when a vm operates under page
fault mode, i.e., in the existing code, DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutually exclusive. The reason
is that fault mode relies on recoverable page faults to work, while a
scratch page can mute recoverable page faults.
On xe2 and xe3, HW prefetching can cause page fault interrupts. Due to
the page fault interrupt overhead (i.e., the GuC and KMD need to be
involved to fix the page fault), HW prefetching can be slowed by many
orders of magnitude.
Fix this problem by allowing scratch page under fault mode for xe2 and
xe3. With a scratch page in place, HW prefetching always hits the
scratch page instead of causing an interrupt.
Signed-off-by: Oak Zeng <oak.zeng@intel.com>
---
drivers/gpu/drm/xe/xe_vm.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 196d347c6ac0..3346f88f284a 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1732,6 +1732,11 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
if (XE_IOCTL_DBG(xe, args->extensions))
return -EINVAL;
+ if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
+ args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE &&
+ !(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe))))
+ return -EINVAL;
+
if (XE_WA(xe_root_mmio_gt(xe), 14016763929))
args->flags |= DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE;
@@ -1745,10 +1750,6 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
if (XE_IOCTL_DBG(xe, args->flags & ~ALL_DRM_XE_VM_CREATE_FLAGS))
return -EINVAL;
- if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
- args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
- return -EINVAL;
-
if (XE_IOCTL_DBG(xe, !(args->flags & DRM_XE_VM_CREATE_FLAG_LR_MODE) &&
args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
return -EINVAL;
--
2.26.3
* RE: [PATCH 1/3] drm/xe: Add a function to zap page table by address range
2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng
2025-01-28 22:21 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng
2025-01-28 22:21 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng
@ 2025-01-28 23:04 ` Cavitt, Jonathan
2025-01-28 23:12 ` ✓ CI.Patch_applied: success for series starting with [1/3] " Patchwork
` (4 subsequent siblings)
7 siblings, 0 replies; 20+ messages in thread
From: Cavitt, Jonathan @ 2025-01-28 23:04 UTC (permalink / raw)
To: Zeng, Oak, intel-xe@lists.freedesktop.org
Cc: joonas.lahtinen@linux.intel.com, Thomas.Hellstrom@linux.intel.com,
Cavitt, Jonathan
-----Original Message-----
From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Oak Zeng
Sent: Tuesday, January 28, 2025 2:22 PM
To: intel-xe@lists.freedesktop.org
Cc: joonas.lahtinen@linux.intel.com; Thomas.Hellstrom@linux.intel.com
Subject: [PATCH 1/3] drm/xe: Add a function to zap page table by address range
>
> Add a function xe_pt_zap_range. This is similar to xe_pt_zap_ptes
> but is used when we don't have a vma to work with, such as zap
> a range mapped to scratch page where we don't have vma.
>
> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> ---
> drivers/gpu/drm/xe/xe_pt.c | 28 ++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_pt.h | 2 ++
> 2 files changed, 30 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 1ddcc7e79a93..2363260da6a6 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -792,6 +792,34 @@ static const struct xe_pt_walk_ops xe_pt_zap_ptes_ops = {
> .pt_entry = xe_pt_zap_ptes_entry,
> };
>
> +/**
> + * xe_pt_zap_range() - Zap (zero) gpu ptes of an virtual address range
> + * @tile: The tile we're zapping for.
> + * @vm: The vm we're zapping for.
> + * @start: Start of the virtual address range, inclusive.
> + * @end: End of the virtual address range, exclusive.
> + *
> + * This is similar to xe_pt_zap_ptes() but it's used when we don't have a
> + * vma to work with. This is used for example when we're clearing the scratch
> + * page mapping during vm_bind.
> + *
NIT:
The asterisk here can be removed.
> + */
> +void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 start, u64 end)
It looks like the function xe_pt_zap_ptes returns whether the walk needs to be invalidated.
Will that also be needed here?
Assuming not:
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
-Jonathan Cavitt
> +{
> + struct xe_pt_zap_ptes_walk xe_walk = {
> + .base = {
> + .ops = &xe_pt_zap_ptes_ops,
> + .shifts = xe_normal_pt_shifts,
> + .max_level = XE_PT_HIGHEST_LEVEL,
> + },
> + .tile = tile,
> + };
> + struct xe_pt *pt = vm->pt_root[tile->id];
> +
> + (void)xe_pt_walk_shared(&pt->base, pt->level, start,
> + end, &xe_walk.base);
> +}
> +
> /**
> * xe_pt_zap_ptes() - Zap (zero) gpu ptes of an address range
> * @tile: The tile we're zapping for.
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> index 9ab386431cad..b166b324f455 100644
> --- a/drivers/gpu/drm/xe/xe_pt.h
> +++ b/drivers/gpu/drm/xe/xe_pt.h
> @@ -43,4 +43,6 @@ void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops);
>
> bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
>
> +void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 start, u64 end);
> +
> #endif
> --
> 2.26.3
>
>
* RE: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
2025-01-28 22:21 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng
@ 2025-01-28 23:05 ` Cavitt, Jonathan
2025-01-28 23:19 ` Matthew Brost
1 sibling, 0 replies; 20+ messages in thread
From: Cavitt, Jonathan @ 2025-01-28 23:05 UTC (permalink / raw)
To: Zeng, Oak, intel-xe@lists.freedesktop.org
Cc: joonas.lahtinen@linux.intel.com, Thomas.Hellstrom@linux.intel.com,
Cavitt, Jonathan
-----Original Message-----
From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Oak Zeng
Sent: Tuesday, January 28, 2025 2:22 PM
To: intel-xe@lists.freedesktop.org
Cc: joonas.lahtinen@linux.intel.com; Thomas.Hellstrom@linux.intel.com
Subject: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
>
> When a vm runs under fault mode, if scratch page is enabled, we need
> to clear the scratch page mapping before vm_bind for the vm_bind
> address range. Under fault mode, we depend on recoverable page fault
> to establish mapping in page table. If scratch page is not cleared,
> GPU access of address won't cause page fault because it always hits
> the existing scratch page mapping.
>
> When vm_bind with IMMEDIATE flag, there is no need of clearing as
> immediate bind can overwrite the scratch page mapping.
>
> So far only is xe2 and xe3 products are allowed to enable scratch page
> under fault mode. On other platform we don't allow scratch page under
> fault mode, so no need of such clearing.
>
> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> ---
> drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 690330352d4c..196d347c6ac0 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -38,6 +38,7 @@
> #include "xe_trace_bo.h"
> #include "xe_wa.h"
> #include "xe_hmm.h"
> +#include "i915_drv.h"
>
> static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
> {
> @@ -2917,6 +2918,34 @@ static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo,
> return 0;
> }
>
> +static bool __xe_vm_needs_clear_scratch_pages(struct xe_device *xe,
> + struct xe_vm *vm, u32 bind_flags)
> +{
> + if (!xe_vm_in_fault_mode(vm))
> + return false;
> +
> + if (!xe_vm_has_scratch(vm))
> + return false;
> +
> + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE)
> + return false;
> +
> + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe)))
We should probably break this platform check out into its own function or macro, especially
since it's being mirrored in the next patch during xe_vm_create_ioctl. It'd also be easier to
extend that way if future platforms also need this functionality.
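For illustration, the shared check could be something like this (the
helper name is made up for the sketch, not existing code):

static inline bool xe_vm_scratch_fault_mode_supported(struct xe_device *xe)
{
	/* xe2/xe3 platforms where scratch page and fault mode may coexist */
	return IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe);
}

Both __xe_vm_needs_clear_scratch_pages() and the xe_vm_create_ioctl()
check could then call it instead of open coding the platform list.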
But other than that:
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
-Jonathan Cavitt
> + return false;
> +
> + return true;
> +}
> +
> +static void __xe_vm_clear_scratch_pages(struct xe_device *xe, struct xe_vm *vm,
> + u64 start, u64 end)
> +{
> + struct xe_tile *tile;
> + u8 id;
> +
> + for_each_tile(tile, xe, id)
> + xe_pt_zap_range(tile, vm, start, end);
> +}
> +
> int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> {
> struct xe_device *xe = to_xe_device(dev);
> @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> u32 prefetch_region = bind_ops[i].prefetch_mem_region_instance;
> u16 pat_index = bind_ops[i].pat_index;
>
> + if (__xe_vm_needs_clear_scratch_pages(xe, vm, flags))
> + __xe_vm_clear_scratch_pages(xe, vm, addr, addr + range);
> +
> ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
> addr, range, op, flags,
> prefetch_region, pat_index);
> --
> 2.26.3
>
>
* RE: [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform
2025-01-28 22:21 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng
@ 2025-01-28 23:05 ` Cavitt, Jonathan
2025-01-29 8:52 ` Thomas Hellström
1 sibling, 0 replies; 20+ messages in thread
From: Cavitt, Jonathan @ 2025-01-28 23:05 UTC (permalink / raw)
To: Zeng, Oak, intel-xe@lists.freedesktop.org
Cc: joonas.lahtinen@linux.intel.com, Thomas.Hellstrom@linux.intel.com,
Cavitt, Jonathan
-----Original Message-----
From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Oak Zeng
Sent: Tuesday, January 28, 2025 2:22 PM
To: intel-xe@lists.freedesktop.org
Cc: joonas.lahtinen@linux.intel.com; Thomas.Hellstrom@linux.intel.com
Subject: [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform
>
> Normally scratch page is not allowed when a vm is operate under page
> fault mode, i.e., in the existing codes, DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
> and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason
> is fault mode relies on recoverable page to work, while scratch page
> can mute recoverable page fault.
>
> On xe2 and xe3, HW prefetching can cause page fault interrupt. Due to
> page fault interrupt overhead (i.e., need Guc and KMD involved to fix
> the page fault), HW prefetching can be slowed by many orders of magnitude.
>
> Fix this problem by allowing scratch page under fault mode for xe2 and xe3.
> With scratch page in place, HW prefetching could always hit scratch page
> instead of causing interrupt.
>
> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
LGTM, though see my comment for patch 2 regarding the added platform checks.
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
-Jonathan Cavitt
> ---
> drivers/gpu/drm/xe/xe_vm.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 196d347c6ac0..3346f88f284a 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1732,6 +1732,11 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
> if (XE_IOCTL_DBG(xe, args->extensions))
> return -EINVAL;
>
> + if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
> + args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE &&
> + !(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe))))
> + return -EINVAL;
> +
> if (XE_WA(xe_root_mmio_gt(xe), 14016763929))
> args->flags |= DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE;
>
> @@ -1745,10 +1750,6 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
> if (XE_IOCTL_DBG(xe, args->flags & ~ALL_DRM_XE_VM_CREATE_FLAGS))
> return -EINVAL;
>
> - if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
> - args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
> - return -EINVAL;
> -
> if (XE_IOCTL_DBG(xe, !(args->flags & DRM_XE_VM_CREATE_FLAG_LR_MODE) &&
> args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
> return -EINVAL;
> --
> 2.26.3
>
>
* ✓ CI.Patch_applied: success for series starting with [1/3] drm/xe: Add a function to zap page table by address range
2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng
` (2 preceding siblings ...)
2025-01-28 23:04 ` [PATCH 1/3] drm/xe: Add a function to zap page table by address range Cavitt, Jonathan
@ 2025-01-28 23:12 ` Patchwork
2025-01-28 23:12 ` ✓ CI.checkpatch: " Patchwork
` (3 subsequent siblings)
7 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2025-01-28 23:12 UTC (permalink / raw)
To: Oak Zeng; +Cc: intel-xe
== Series Details ==
Series: series starting with [1/3] drm/xe: Add a function to zap page table by address range
URL : https://patchwork.freedesktop.org/series/144072/
State : success
== Summary ==
=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: 8f3ba847175b drm-tip: 2025y-01m-28d-22h-47m-54s UTC integration manifest
=== git am output follows ===
Applying: drm/xe: Add a function to zap page table by address range
Applying: drm/xe: Clear scratch page before vm_bind
Applying: drm/xe: Allow scratch page under fault mode for certain platform
* ✓ CI.checkpatch: success for series starting with [1/3] drm/xe: Add a function to zap page table by address range
2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng
` (3 preceding siblings ...)
2025-01-28 23:12 ` ✓ CI.Patch_applied: success for series starting with [1/3] " Patchwork
@ 2025-01-28 23:12 ` Patchwork
2025-01-28 23:13 ` ✗ CI.KUnit: failure " Patchwork
` (2 subsequent siblings)
7 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2025-01-28 23:12 UTC (permalink / raw)
To: Oak Zeng; +Cc: intel-xe
== Series Details ==
Series: series starting with [1/3] drm/xe: Add a function to zap page table by address range
URL : https://patchwork.freedesktop.org/series/144072/
State : success
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
30ab6715fc09baee6cc14cb3c89ad8858688d474
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit cb0f1cdb1d31379f15c7b5e96e66a8c4d48691ee
Author: Oak Zeng <oak.zeng@intel.com>
Date: Tue Jan 28 17:21:45 2025 -0500
drm/xe: Allow scratch page under fault mode for certain platform
Normally scratch page is not allowed when a vm is operate under page
fault mode, i.e., in the existing codes, DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason
is fault mode relies on recoverable page to work, while scratch page
can mute recoverable page fault.
On xe2 and xe3, HW prefetching can cause page fault interrupt. Due to
page fault interrupt overhead (i.e., need Guc and KMD involved to fix
the page fault), HW prefetching can be slowed by many orders of magnitude.
Fix this problem by allowing scratch page under fault mode for xe2 and xe3.
With scratch page in place, HW prefetching could always hit scratch page
instead of causing interrupt.
Signed-off-by: Oak Zeng <oak.zeng@intel.com>
+ /mt/dim checkpatch 8f3ba847175bfd495cbfe5b51d3cdfd8d32d24db drm-intel
fc3c14dbb3cb drm/xe: Add a function to zap page table by address range
108d849b4e71 drm/xe: Clear scratch page before vm_bind
cb0f1cdb1d31 drm/xe: Allow scratch page under fault mode for certain platform
* ✗ CI.KUnit: failure for series starting with [1/3] drm/xe: Add a function to zap page table by address range
2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng
` (4 preceding siblings ...)
2025-01-28 23:12 ` ✓ CI.checkpatch: " Patchwork
@ 2025-01-28 23:13 ` Patchwork
2025-01-28 23:38 ` [PATCH 1/3] " Matthew Brost
2025-01-29 8:48 ` Thomas Hellström
7 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2025-01-28 23:13 UTC (permalink / raw)
To: Oak Zeng; +Cc: intel-xe
== Series Details ==
Series: series starting with [1/3] drm/xe: Add a function to zap page table by address range
URL : https://patchwork.freedesktop.org/series/144072/
State : failure
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
156 | u64 ioread64_lo_hi(const void __iomem *addr)
| ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
163 | u64 ioread64_hi_lo(const void __iomem *addr)
| ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
170 | u64 ioread64be_lo_hi(const void __iomem *addr)
| ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
178 | u64 ioread64be_hi_lo(const void __iomem *addr)
| ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_vm.c:41:10: fatal error: i915_drv.h: No such file or directory
41 | #include "i915_drv.h"
| ^~~~~~~~~~~~
compilation terminated.
make[7]: *** [../scripts/Makefile.build:194: drivers/gpu/drm/xe/xe_vm.o] Error 1
make[7]: *** Waiting for unfinished jobs....
make[6]: *** [../scripts/Makefile.build:440: drivers/gpu/drm/xe] Error 2
make[5]: *** [../scripts/Makefile.build:440: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:440: drivers/gpu] Error 2
make[3]: *** [../scripts/Makefile.build:440: drivers] Error 2
make[2]: *** [/kernel/Makefile:1989: .] Error 2
make[1]: *** [/kernel/Makefile:251: __sub-make] Error 2
make: *** [Makefile:251: __sub-make] Error 2
[23:12:39] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[23:12:43] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
2025-01-28 22:21 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng
2025-01-28 23:05 ` Cavitt, Jonathan
@ 2025-01-28 23:19 ` Matthew Brost
2025-01-29 5:04 ` Matthew Brost
2025-01-29 21:01 ` Zeng, Oak
1 sibling, 2 replies; 20+ messages in thread
From: Matthew Brost @ 2025-01-28 23:19 UTC (permalink / raw)
To: Oak Zeng; +Cc: intel-xe, joonas.lahtinen, Thomas.Hellstrom
On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote:
> When a vm runs under fault mode, if scratch page is enabled, we need
> to clear the scratch page mapping before vm_bind for the vm_bind
> address range. Under fault mode, we depend on recoverable page fault
> to establish mapping in page table. If scratch page is not cleared,
> GPU access of address won't cause page fault because it always hits
> the existing scratch page mapping.
>
> When vm_bind with IMMEDIATE flag, there is no need of clearing as
> immediate bind can overwrite the scratch page mapping.
>
> So far only is xe2 and xe3 products are allowed to enable scratch page
> under fault mode. On other platform we don't allow scratch page under
> fault mode, so no need of such clearing.
>
> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> ---
> drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 690330352d4c..196d347c6ac0 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -38,6 +38,7 @@
> #include "xe_trace_bo.h"
> #include "xe_wa.h"
> #include "xe_hmm.h"
> +#include "i915_drv.h"
>
> static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
> {
> @@ -2917,6 +2918,34 @@ static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo,
> return 0;
> }
>
> +static bool __xe_vm_needs_clear_scratch_pages(struct xe_device *xe,
> + struct xe_vm *vm, u32 bind_flags)
> +{
> + if (!xe_vm_in_fault_mode(vm))
> + return false;
> +
> + if (!xe_vm_has_scratch(vm))
> + return false;
> +
> + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE)
> + return false;
> +
> + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe)))
> + return false;
> +
> + return true;
> +}
> +
> +static void __xe_vm_clear_scratch_pages(struct xe_device *xe, struct xe_vm *vm,
> + u64 start, u64 end)
> +{
> + struct xe_tile *tile;
> + u8 id;
> +
> + for_each_tile(tile, xe, id)
> + xe_pt_zap_range(tile, vm, start, end);
> +}
> +
> int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> {
> struct xe_device *xe = to_xe_device(dev);
> @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> u32 prefetch_region = bind_ops[i].prefetch_mem_region_instance;
> u16 pat_index = bind_ops[i].pat_index;
>
> + if (__xe_vm_needs_clear_scratch_pages(xe, vm, flags))
> + __xe_vm_clear_scratch_pages(xe, vm, addr, addr + range);
A few things...
- I believe this is only needed for bind user operations or internal MAP
GPU VMA operations.
- I believe a TLB invalidation will be required.
- I don't think calling zap PTEs range works here, given how the scratch
tables are set up (i.e., new PTEs need to be created pointing to an
invalid state).
- This series appears to be untested based on the points above.
Therefore, instead of this series, I believe you will need to fully
update the bind pipeline to process MAP GPU VMA operations here.
So roughly...
- Maybe include a bit in xe_vma_op_map that specifies "invalidate on
bind," set in vm_bind_ioctl_ops_create, since this will need to be
wired throughout the bind pipeline (see the sketch after this list).
- Don't validate backing memory in this case.
- Ensure that xe_vma_ops_incr_pt_update_ops is called in this case for
MAP operations, forcing entry into the xe_pt.c backend.
- Update xe_pt_stage_bind_walk with a variable that indicates clearing
the PTE. Instead of calling pte_encode_vma in xe_pt_stage_bind_entry,
set this variable for PT bind operations derived from MAP operations
that meet the "invalidate on bind" condition.
- Ensure needs_invalidation is set in struct xe_vm_pgtable_update_ops if
a MAP operation is included that meets the "invalidate on bind"
condition.
- Set the VMA tile_invalidated in addition to tile_present for MAP
operations that meet the "invalidate on bind" condition.
I might be missing some implementation details mentioned above, but this
should provide you with some direction.
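As a rough sketch of the first bullet (the invalidate_on_bind name and
placement are illustrative here, not existing code):

/* In xe_vm_types.h, extend the MAP op (existing members elided): */
struct xe_vma_op_map {
	/** @invalidate_on_bind: bind PTEs in an invalidated state */
	bool invalidate_on_bind;
};

/* In vm_bind_ioctl_ops_create(), reusing the check from this patch: */
op->map.invalidate_on_bind =
	__xe_vm_needs_clear_scratch_pages(xe, vm, flags);

The xe_pt.c backend would then key off this bit at each of the points
listed above.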
Lastly, and perhaps most importantly, please test this using an IGT and
include the results in the next post.
Matt
> +
> ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
> addr, range, op, flags,
> prefetch_region, pat_index);
> --
> 2.26.3
>
* Re: [PATCH 1/3] drm/xe: Add a function to zap page table by address range
2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng
` (5 preceding siblings ...)
2025-01-28 23:13 ` ✗ CI.KUnit: failure " Patchwork
@ 2025-01-28 23:38 ` Matthew Brost
2025-01-29 8:48 ` Thomas Hellström
7 siblings, 0 replies; 20+ messages in thread
From: Matthew Brost @ 2025-01-28 23:38 UTC (permalink / raw)
To: Oak Zeng; +Cc: intel-xe, joonas.lahtinen, Thomas.Hellstrom
On Tue, Jan 28, 2025 at 05:21:43PM -0500, Oak Zeng wrote:
> Add a function xe_pt_zap_range. This is similar to xe_pt_zap_ptes
> but is used when we don't have a vma to work with, such as zap
> a range mapped to scratch page where we don't have vma.
>
See my reply to the following patch. This won't work and isn't needed.
Let me explain why this won't work.
When we set up scratch PTEs on a VM, we create a shell PTE structure
at the largest page size (1 GB), with a 1 GB entry pointing to a
scratch page.
With 48 bits of address space, this is 1 level down to 1 GB pages;
with 57 bits, this is 2 levels down to 1 GB pages.
Now, let's say we perform a 4K bind from 0x0 to 0x1000 with immediate
clear in fault mode.
How can we invalidate just the 4K range of memory? We can't. We need to
allocate new memory for page tables so that 0x0000–0x1000 points to an
invalidated PTE, while 0x1000–1 GB points to scratch. This is why, in
my reply to the patch, I indicated that the entire bind pipeline needs
to be updated to properly invalidate the PTEs.
I hope this explanation, along with my reply to the next patch, makes
sense.
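To make the table math concrete, here is a small self-contained
userspace model (not driver code) of carving that 4K hole out of a
single 1 GB scratch entry:

#include <stdio.h>

/* 4K pages, 512 entries per table => 2M and 1G huge entries. */
#define SHIFT_4K	12
#define ENTRIES		512ULL

int main(void)
{
	unsigned long long start = 0x0, end = 0x1000;	/* the 4K bind */
	unsigned long long inval = (end - start) >> SHIFT_4K;

	/*
	 * The 1 GB scratch entry must be replaced by a new directory of
	 * 512 x 2M entries (511 still -> scratch), and the covering 2M
	 * entry by a new table of 512 x 4K entries, of which 'inval'
	 * are encoded invalid and the rest still point at scratch.
	 */
	printf("new page-table pages needed: 2\n");
	printf("invalid 4K PTEs: %llu, scratch 4K PTEs left in that 2M: %llu\n",
	       inval, ENTRIES - inval);
	return 0;
}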
Matt
> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> ---
> drivers/gpu/drm/xe/xe_pt.c | 28 ++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_pt.h | 2 ++
> 2 files changed, 30 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 1ddcc7e79a93..2363260da6a6 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -792,6 +792,34 @@ static const struct xe_pt_walk_ops xe_pt_zap_ptes_ops = {
> .pt_entry = xe_pt_zap_ptes_entry,
> };
>
> +/**
> + * xe_pt_zap_range() - Zap (zero) gpu ptes of an virtual address range
> + * @tile: The tile we're zapping for.
> + * @vm: The vm we're zapping for.
> + * @start: Start of the virtual address range, inclusive.
> + * @end: End of the virtual address range, exclusive.
> + *
> + * This is similar to xe_pt_zap_ptes() but it's used when we don't have a
> + * vma to work with. This is used for example when we're clearing the scratch
> + * page mapping during vm_bind.
> + *
> + */
> +void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 start, u64 end)
> +{
> + struct xe_pt_zap_ptes_walk xe_walk = {
> + .base = {
> + .ops = &xe_pt_zap_ptes_ops,
> + .shifts = xe_normal_pt_shifts,
> + .max_level = XE_PT_HIGHEST_LEVEL,
> + },
> + .tile = tile,
> + };
> + struct xe_pt *pt = vm->pt_root[tile->id];
> +
> + (void)xe_pt_walk_shared(&pt->base, pt->level, start,
> + end, &xe_walk.base);
> +}
> +
> /**
> * xe_pt_zap_ptes() - Zap (zero) gpu ptes of an address range
> * @tile: The tile we're zapping for.
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> index 9ab386431cad..b166b324f455 100644
> --- a/drivers/gpu/drm/xe/xe_pt.h
> +++ b/drivers/gpu/drm/xe/xe_pt.h
> @@ -43,4 +43,6 @@ void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops);
>
> bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
>
> +void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 start, u64 end);
> +
> #endif
> --
> 2.26.3
>
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
2025-01-28 23:19 ` Matthew Brost
@ 2025-01-29 5:04 ` Matthew Brost
2025-01-29 8:21 ` Thomas Hellström
2025-01-29 21:01 ` Zeng, Oak
1 sibling, 1 reply; 20+ messages in thread
From: Matthew Brost @ 2025-01-29 5:04 UTC (permalink / raw)
To: Oak Zeng; +Cc: intel-xe, joonas.lahtinen, Thomas.Hellstrom
On Tue, Jan 28, 2025 at 03:19:13PM -0800, Matthew Brost wrote:
> On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote:
> > When a vm runs under fault mode, if scratch page is enabled, we need
> > to clear the scratch page mapping before vm_bind for the vm_bind
> > address range. Under fault mode, we depend on recoverable page fault
> > to establish mapping in page table. If scratch page is not cleared,
> > GPU access of address won't cause page fault because it always hits
> > the existing scratch page mapping.
> >
> > When vm_bind with IMMEDIATE flag, there is no need of clearing as
> > immediate bind can overwrite the scratch page mapping.
> >
> > So far only is xe2 and xe3 products are allowed to enable scratch page
> > under fault mode. On other platform we don't allow scratch page under
> > fault mode, so no need of such clearing.
> >
> > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++++++
> > 1 file changed, 32 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index 690330352d4c..196d347c6ac0 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -38,6 +38,7 @@
> > #include "xe_trace_bo.h"
> > #include "xe_wa.h"
> > #include "xe_hmm.h"
> > +#include "i915_drv.h"
> >
> > static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
> > {
> > @@ -2917,6 +2918,34 @@ static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo,
> > return 0;
> > }
> >
> > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_device *xe,
> > + struct xe_vm *vm, u32 bind_flags)
> > +{
> > + if (!xe_vm_in_fault_mode(vm))
> > + return false;
> > +
> > + if (!xe_vm_has_scratch(vm))
> > + return false;
> > +
> > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE)
> > + return false;
> > +
> > + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe)))
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +static void __xe_vm_clear_scratch_pages(struct xe_device *xe, struct xe_vm *vm,
> > + u64 start, u64 end)
> > +{
> > + struct xe_tile *tile;
> > + u8 id;
> > +
> > + for_each_tile(tile, xe, id)
> > + xe_pt_zap_range(tile, vm, start, end);
> > +}
> > +
> > int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > {
> > struct xe_device *xe = to_xe_device(dev);
> > @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > u32 prefetch_region = bind_ops[i].prefetch_mem_region_instance;
> > u16 pat_index = bind_ops[i].pat_index;
> >
> > + if (__xe_vm_needs_clear_scratch_pages(xe, vm, flags))
> > + __xe_vm_clear_scratch_pages(xe, vm, addr, addr + range);
>
> A few things...
>
> - I believe this is only needed for bind user operations or internal MAP
> GPU VMA operations.
> - I believe a TLB invalidation will be required.
> - I don't think calling zap PTEs range works here, given how the scratch
> tables are set up (i.e., new PTEs need to be created pointing to an
> invalid state).
> - This series appears to be untested based on the points above.
>
> Therefore, instead of this series, I believe you will need to fully
> update the bind pipeline to process MAP GPU VMA operations here.
>
> So roughly...
>
> - Maybe include a bit in xe_vma_op_map that specifies "invalidate on
> bind," set in vm_bind_ioctl_ops_create, since this will need to be
> wired throughout the bind pipeline.
> - Don't validate backing memory in this case.
> - Ensure that xe_vma_ops_incr_pt_update_ops is called in this case for
> MAP operations, forcing entry into the xe_pt.c backend.
> - Update xe_pt_stage_bind_walk with a variable that indicates clearing
> the PTE. Instead of calling pte_encode_vma in xe_pt_stage_bind_entry,
> set this variable for PT bind operations derived from MAP operations
> that meet the "invalidate on bind" condition.
Ugh, typos / nonsense here. Let me try again.
"Update xe_pt_stage_bind_walk with a variable that indicates clearing
the PTE, set this for MAP ops that meet the 'invalidate on bind'
condition. When this variable is set, instead of calling pte_encode_vma
in xe_pt_stage_bind_entry, call a function which encodes an invalid
PTE."
Matt
> - Ensure needs_invalidation is set in struct xe_vm_pgtable_update_ops if
> a MAP operation is included that meets the "invalidate on bind"
> condition.
> - Set the VMA tile_invalidated in addition to tile_present for MAP
> operations that meet the "invalidate on bind" condition.
>
> I might be missing some implementation details mentioned above, but this
> should provide you with some direction.
>
> Lastly, and perhaps most importantly, please test this using an IGT and
> include the results in the next post.
>
> Matt
>
> > +
> > ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
> > addr, range, op, flags,
> > prefetch_region, pat_index);
> > --
> > 2.26.3
> >
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
2025-01-29 5:04 ` Matthew Brost
@ 2025-01-29 8:21 ` Thomas Hellström
0 siblings, 0 replies; 20+ messages in thread
From: Thomas Hellström @ 2025-01-29 8:21 UTC (permalink / raw)
To: Matthew Brost, Oak Zeng; +Cc: intel-xe, joonas.lahtinen
On Tue, 2025-01-28 at 21:04 -0800, Matthew Brost wrote:
> On Tue, Jan 28, 2025 at 03:19:13PM -0800, Matthew Brost wrote:
> > On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote:
> > > When a vm runs under fault mode, if scratch page is enabled, we
> > > need
> > > to clear the scratch page mapping before vm_bind for the vm_bind
> > > address range. Under fault mode, we depend on recoverable page
> > > fault
> > > to establish mapping in page table. If scratch page is not
> > > cleared,
> > > GPU access of address won't cause page fault because it always
> > > hits
> > > the existing scratch page mapping.
> > >
> > > When vm_bind with IMMEDIATE flag, there is no need of clearing as
> > > immediate bind can overwrite the scratch page mapping.
> > >
> > > So far only is xe2 and xe3 products are allowed to enable scratch
> > > page
> > > under fault mode. On other platform we don't allow scratch page
> > > under
> > > fault mode, so no need of such clearing.
> > >
> > > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++++++
> > > 1 file changed, 32 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > > b/drivers/gpu/drm/xe/xe_vm.c
> > > index 690330352d4c..196d347c6ac0 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -38,6 +38,7 @@
> > > #include "xe_trace_bo.h"
> > > #include "xe_wa.h"
> > > #include "xe_hmm.h"
> > > +#include "i915_drv.h"
> > >
> > > static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
> > > {
> > > @@ -2917,6 +2918,34 @@ static int
> > > xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo
> > > *bo,
> > > return 0;
> > > }
> > >
> > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_device
> > > *xe,
> > > + struct xe_vm *vm,
> > > u32 bind_flags)
> > > +{
> > > + if (!xe_vm_in_fault_mode(vm))
> > > + return false;
> > > +
> > > + if (!xe_vm_has_scratch(vm))
> > > + return false;
> > > +
> > > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE)
> > > + return false;
> > > +
> > > + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) ||
> > > IS_PANTHERLAKE(xe)))
> > > + return false;
> > > +
> > > + return true;
> > > +}
> > > +
> > > +static void __xe_vm_clear_scratch_pages(struct xe_device *xe,
> > > struct xe_vm *vm,
> > > + u64 start, u64 end)
> > > +{
> > > + struct xe_tile *tile;
> > > + u8 id;
> > > +
> > > + for_each_tile(tile, xe, id)
> > > + xe_pt_zap_range(tile, vm, start, end);
> > > +}
> > > +
> > > int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct
> > > drm_file *file)
> > > {
> > > struct xe_device *xe = to_xe_device(dev);
> > > @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device
> > > *dev, void *data, struct drm_file *file)
> > > u32 prefetch_region =
> > > bind_ops[i].prefetch_mem_region_instance;
> > > u16 pat_index = bind_ops[i].pat_index;
> > >
> > > + if (__xe_vm_needs_clear_scratch_pages(xe, vm,
> > > flags))
> > > + __xe_vm_clear_scratch_pages(xe, vm,
> > > addr, addr + range);
> >
> > A few things...
> >
> > - I believe this is only needed for bind user operations or
> > internal MAP
> > GPU VMA operations.
> > - I believe a TLB invalidation will be required.
> > - I don't think calling zap PTEs range works here, given how the
> > scratch
> > tables are set up (i.e., new PTEs need to be created pointing to
> > an
> > invalid state).
> > - This series appears to be untested based on the points above.
> >
> > Therefore, instead of this series, I believe you will need to fully
> > update the bind pipeline to process MAP GPU VMA operations here.
> >
> > So roughly...
> >
> > - Maybe include a bit in xe_vma_op_map that specifies "invalidate
> > on
> > bind," set in vm_bind_ioctl_ops_create, since this will need to
> > be
> > wired throughout the bind pipeline.
> > - Don't validate backing memory in this case.
> > - Ensure that xe_vma_ops_incr_pt_update_ops is called in this case
> > for
> > MAP operations, forcing entry into the xe_pt.c backend.
> > - Update xe_pt_stage_bind_walk with a variable that indicates
> > clearing
> > the PTE. Instead of calling pte_encode_vma in
> > xe_pt_stage_bind_entry,
> > set this variable for PT bind operations derived from MAP
> > operations
> > that meet the "invalidate on bind" condition.
>
> Ugh, typos / nonsense here. Let me try again.
>
> "Update xe_pt_stage_bind_walk with a variable that indicates clearing
> the PTE, set this for MAP ops that meet the 'invalidate on bind'
> condition. When this variable is set, instead of calling
> pte_encode_vma
> in xe_pt_stage_bind_entry, call a function which encodes an invalid
> PTE."
>
> Matt
I agree with Matt here. This needs to look just like a regular bind,
with the exception that PTEs are cleared instead of pointing towards
valid pages.
/Thomas
* Re: [PATCH 1/3] drm/xe: Add a function to zap page table by address range
2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng
` (6 preceding siblings ...)
2025-01-28 23:38 ` [PATCH 1/3] " Matthew Brost
@ 2025-01-29 8:48 ` Thomas Hellström
7 siblings, 0 replies; 20+ messages in thread
From: Thomas Hellström @ 2025-01-29 8:48 UTC (permalink / raw)
To: Oak Zeng, intel-xe; +Cc: joonas.lahtinen
On Tue, 2025-01-28 at 17:21 -0500, Oak Zeng wrote:
> Add a function xe_pt_zap_range. This is similar to xe_pt_zap_ptes
> but is used when we don't have a vma to work with, such as zap
> a range mapped to scratch page where we don't have vma.
>
> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
As I think Matt also mentions, the zap functionality works only when it
exactly matches a pre-bound range (like in this case a vma) and there
is something cleaning up the zapped ptes afterwards. Typically that's a
rebind or an unbind.
Thanks,
Thomas
> ---
> drivers/gpu/drm/xe/xe_pt.c | 28 ++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_pt.h | 2 ++
> 2 files changed, 30 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 1ddcc7e79a93..2363260da6a6 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -792,6 +792,34 @@ static const struct xe_pt_walk_ops
> xe_pt_zap_ptes_ops = {
> .pt_entry = xe_pt_zap_ptes_entry,
> };
>
> +/**
> + * xe_pt_zap_range() - Zap (zero) gpu ptes of an virtual address
> range
> + * @tile: The tile we're zapping for.
> + * @vm: The vm we're zapping for.
> + * @start: Start of the virtual address range, inclusive.
> + * @end: End of the virtual address range, exclusive.
> + *
> + * This is similar to xe_pt_zap_ptes() but it's used when we don't
> have a
> + * vma to work with. This is used for example when we're clearing
> the scratch
> + * page mapping during vm_bind.
> + *
> + */
> +void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64
> start, u64 end)
> +{
> + struct xe_pt_zap_ptes_walk xe_walk = {
> + .base = {
> + .ops = &xe_pt_zap_ptes_ops,
> + .shifts = xe_normal_pt_shifts,
> + .max_level = XE_PT_HIGHEST_LEVEL,
> + },
> + .tile = tile,
> + };
> + struct xe_pt *pt = vm->pt_root[tile->id];
> +
> + (void)xe_pt_walk_shared(&pt->base, pt->level, start,
> + end, &xe_walk.base);
> +}
> +
> /**
> * xe_pt_zap_ptes() - Zap (zero) gpu ptes of an address range
> * @tile: The tile we're zapping for.
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> index 9ab386431cad..b166b324f455 100644
> --- a/drivers/gpu/drm/xe/xe_pt.h
> +++ b/drivers/gpu/drm/xe/xe_pt.h
> @@ -43,4 +43,6 @@ void xe_pt_update_ops_abort(struct xe_tile *tile,
> struct xe_vma_ops *vops);
>
> bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
>
> +void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64
> start, u64 end);
> +
> #endif
* Re: [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform
2025-01-28 22:21 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng
2025-01-28 23:05 ` Cavitt, Jonathan
@ 2025-01-29 8:52 ` Thomas Hellström
2025-01-29 16:41 ` Matthew Brost
1 sibling, 1 reply; 20+ messages in thread
From: Thomas Hellström @ 2025-01-29 8:52 UTC (permalink / raw)
To: Oak Zeng, intel-xe; +Cc: joonas.lahtinen
On Tue, 2025-01-28 at 17:21 -0500, Oak Zeng wrote:
> Normally scratch page is not allowed when a vm is operate under page
> fault mode, i.e., in the existing codes,
> DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
> and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason
> is fault mode relies on recoverable page to work, while scratch page
> can mute recoverable page fault.
>
> On xe2 and xe3, HW prefetching can cause page fault interrupt. Due to
> page fault interrupt overhead (i.e., need Guc and KMD involved to fix
> the page fault), HW prefetching can be slowed by many orders of
> magnitude.
The problem on xe2 and xe3 as I understand it is not the overhead but
the fact that OOB prefetching causes pagefaults and we can't resolve
those.
>
> Fix this problem by allowing scratch page under fault mode for xe2
> and xe3.
> With scratch page in place, HW prefetching could always hit scratch
> page
> instead of causing interrupt.
>
> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> ---
> drivers/gpu/drm/xe/xe_vm.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 196d347c6ac0..3346f88f284a 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1732,6 +1732,11 @@ int xe_vm_create_ioctl(struct drm_device *dev,
> void *data,
> if (XE_IOCTL_DBG(xe, args->extensions))
> return -EINVAL;
>
> + if (XE_IOCTL_DBG(xe, args->flags &
> DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
> + args->flags &
> DRM_XE_VM_CREATE_FLAG_FAULT_MODE &&
> + !(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) ||
> IS_PANTHERLAKE(xe))))
> + return -EINVAL;
> +
Could we perhaps add a bit in the xe->info structure?
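A minimal sketch of that idea (the flag name is hypothetical):

/* struct xe_device_info gains a capability bit, set at probe time: */
u8 has_scratch_page_in_fault_mode:1;

/* The ioctl check above then becomes: */
if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
		 args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE &&
		 !xe->info.has_scratch_page_in_fault_mode))
	return -EINVAL;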
/Thomas
> if (XE_WA(xe_root_mmio_gt(xe), 14016763929))
> args->flags |= DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE;
>
> @@ -1745,10 +1750,6 @@ int xe_vm_create_ioctl(struct drm_device *dev,
> void *data,
> if (XE_IOCTL_DBG(xe, args->flags &
> ~ALL_DRM_XE_VM_CREATE_FLAGS))
> return -EINVAL;
>
> - if (XE_IOCTL_DBG(xe, args->flags &
> DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
> - args->flags &
> DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
> - return -EINVAL;
> -
> if (XE_IOCTL_DBG(xe, !(args->flags &
> DRM_XE_VM_CREATE_FLAG_LR_MODE) &&
> args->flags &
> DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
> return -EINVAL;
* Re: [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform
2025-01-29 8:52 ` Thomas Hellström
@ 2025-01-29 16:41 ` Matthew Brost
0 siblings, 0 replies; 20+ messages in thread
From: Matthew Brost @ 2025-01-29 16:41 UTC (permalink / raw)
To: Thomas Hellström; +Cc: Oak Zeng, intel-xe, joonas.lahtinen
On Wed, Jan 29, 2025 at 09:52:54AM +0100, Thomas Hellström wrote:
> On Tue, 2025-01-28 at 17:21 -0500, Oak Zeng wrote:
> > Normally scratch page is not allowed when a vm is operate under page
> > fault mode, i.e., in the existing codes,
> > DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
> > and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason
> > is fault mode relies on recoverable page to work, while scratch page
> > can mute recoverable page fault.
> >
> > On xe2 and xe3, HW prefetching can cause page fault interrupt. Due to
> > page fault interrupt overhead (i.e., need Guc and KMD involved to fix
> > the page fault), HW prefetching can be slowed by many orders of
> > magnitude.
>
> The problem on xe2 and xe3 as I understand it is not the overhead but
> the fact that OOB prefetching causes pagefaults and we can't resolve
> those.
>
> >
> > Fix this problem by allowing scratch page under fault mode for xe2
> > and xe3.
> > With scratch page in place, HW prefetching could always hit scratch
> > page
> > instead of causing interrupt.
> >
> > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_vm.c | 9 +++++----
> > 1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index 196d347c6ac0..3346f88f284a 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -1732,6 +1732,11 @@ int xe_vm_create_ioctl(struct drm_device *dev,
> > void *data,
> > if (XE_IOCTL_DBG(xe, args->extensions))
> > return -EINVAL;
> >
> > + if (XE_IOCTL_DBG(xe, args->flags &
> > DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
> > + args->flags &
> > DRM_XE_VM_CREATE_FLAG_FAULT_MODE &&
> > + !(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) ||
> > IS_PANTHERLAKE(xe))))
> > + return -EINVAL;
> > +
>
> Could we perhaps add a bit in the xe->info structure?
>
+1. This seems like a good idea.
Matt
> /Thomas
>
>
> > if (XE_WA(xe_root_mmio_gt(xe), 14016763929))
> > args->flags |= DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE;
> >
> > @@ -1745,10 +1750,6 @@ int xe_vm_create_ioctl(struct drm_device *dev,
> > void *data,
> > if (XE_IOCTL_DBG(xe, args->flags &
> > ~ALL_DRM_XE_VM_CREATE_FLAGS))
> > return -EINVAL;
> >
> > - if (XE_IOCTL_DBG(xe, args->flags &
> > DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
> > - args->flags &
> > DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
> > - return -EINVAL;
> > -
> > if (XE_IOCTL_DBG(xe, !(args->flags &
> > DRM_XE_VM_CREATE_FLAG_LR_MODE) &&
> > args->flags &
> > DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
> > return -EINVAL;
>
* RE: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
2025-01-28 23:19 ` Matthew Brost
2025-01-29 5:04 ` Matthew Brost
@ 2025-01-29 21:01 ` Zeng, Oak
2025-01-30 1:36 ` Matthew Brost
1 sibling, 1 reply; 20+ messages in thread
From: Zeng, Oak @ 2025-01-29 21:01 UTC (permalink / raw)
To: Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, joonas.lahtinen@linux.intel.com,
Thomas.Hellstrom@linux.intel.com
> -----Original Message-----
> From: Brost, Matthew <matthew.brost@intel.com>
> Sent: January 28, 2025 6:19 PM
> To: Zeng, Oak <oak.zeng@intel.com>
> Cc: intel-xe@lists.freedesktop.org; joonas.lahtinen@linux.intel.com;
> Thomas.Hellstrom@linux.intel.com
> Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
>
> On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote:
> > When a vm runs under fault mode, if scratch page is enabled, we
> need
> > to clear the scratch page mapping before vm_bind for the vm_bind
> > address range. Under fault mode, we depend on recoverable page
> fault
> > to establish mapping in page table. If scratch page is not cleared,
> > GPU access of address won't cause page fault because it always hits
> > the existing scratch page mapping.
> >
> > When vm_bind with IMMEDIATE flag, there is no need of clearing as
> > immediate bind can overwrite the scratch page mapping.
> >
> > So far only is xe2 and xe3 products are allowed to enable scratch
> page
> > under fault mode. On other platform we don't allow scratch page
> under
> > fault mode, so no need of such clearing.
> >
> > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_vm.c | 32
> ++++++++++++++++++++++++++++++++
> > 1 file changed, 32 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> b/drivers/gpu/drm/xe/xe_vm.c
> > index 690330352d4c..196d347c6ac0 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -38,6 +38,7 @@
> > #include "xe_trace_bo.h"
> > #include "xe_wa.h"
> > #include "xe_hmm.h"
> > +#include "i915_drv.h"
> >
> > static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
> > {
> > @@ -2917,6 +2918,34 @@ static int
> xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo
> *bo,
> > return 0;
> > }
> >
> > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_device
> *xe,
> > + struct xe_vm *vm, u32
> bind_flags)
> > +{
> > + if (!xe_vm_in_fault_mode(vm))
> > + return false;
> > +
> > + if (!xe_vm_has_scratch(vm))
> > + return false;
> > +
> > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE)
> > + return false;
> > +
> > + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) ||
> IS_PANTHERLAKE(xe)))
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +static void __xe_vm_clear_scratch_pages(struct xe_device *xe,
> struct xe_vm *vm,
> > + u64 start, u64 end)
> > +{
> > + struct xe_tile *tile;
> > + u8 id;
> > +
> > + for_each_tile(tile, xe, id)
> > + xe_pt_zap_range(tile, vm, start, end);
> > +}
> > +
> > int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct
> drm_file *file)
> > {
> > struct xe_device *xe = to_xe_device(dev);
> > @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device
> *dev, void *data, struct drm_file *file)
> > u32 prefetch_region =
> bind_ops[i].prefetch_mem_region_instance;
> > u16 pat_index = bind_ops[i].pat_index;
> >
> > + if (__xe_vm_needs_clear_scratch_pages(xe, vm,
> flags))
> > + __xe_vm_clear_scratch_pages(xe, vm, addr,
> addr + range);
>
> A few things...
>
> - I believe this is only needed for bind user operations or internal MAP
> GPU VMA operations.
Did you mean this is only needed for user bind, but not for internal map?
> - I believe a TLB invalidation will be required.
My understanding is that a NULL pte won't be cached in the TLB, so a TLB invalidation is not needed.
> - I don't think calling zap PTEs range works here, given how the scratch
> tables are set up (i.e., new PTEs need to be created pointing to an
> invalid state).
You are right.
I didn't realize that using xe_pt_walk_shared to walk and zap PTEs has a
limitation: for the virtual address range we want to zap, all the page
tables have to already exist. This interface doesn't create new page
tables. Even though xe_pt_walk_shared takes a range parameter (addr, end),
the range can't be arbitrary.
Today only [xe_vma_start, xe_vma_end) is used as the xe_pt_walk_shared
walking range, and an arbitrary range won't work, as you pointed out. To me
this is a small interface design issue. If you agree, I can re-parameterize
xe_pt_walk_shared to take a VMA instead of addr/end, so people won't make
the same mistake in the future.
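Something like this, as a purely hypothetical sketch (assuming the existing
xe_vma_start()/xe_vma_end() helpers, and not meant as final code):

/* Hypothetical sketch: bound the shared walk by a VMA so callers can't
 * pass a range whose page tables might not exist yet.
 */
static int xe_pt_walk_shared_vma(struct xe_pt *pt, struct xe_vma *vma,
				 struct xe_pt_walk *walk)
{
	/* Page tables covering a bound VMA's range are known to exist. */
	return xe_pt_walk_shared(&pt->base, pt->level, xe_vma_start(vma),
				 xe_vma_end(vma), walk);
}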
Anyway, I will follow the direction you give below to rework this series.
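For my own notes, my reading of the first bullets below is roughly this
(sketch only; the invalidate_on_bind bit and the exact call sites are my
guesses at your proposal, not final code):

/* 1) Hypothetical new bit in the MAP op (xe_vm_types.h): */
struct xe_vma_op_map {
	/* ... existing fields ... */
	bool invalidate_on_bind:1;	/* hypothetical: clear PTEs on bind */
};

/* 2) Set it in vm_bind_ioctl_ops_create() and force the op through the
 *    xe_pt.c backend without validating backing memory:
 *
 *	op->map.invalidate_on_bind =
 *		__xe_vm_needs_clear_scratch_pages(xe, vm, flags);
 *	if (op->map.invalidate_on_bind)
 *		xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
 *
 * 3) In xe_pt_stage_bind_entry(), emit cleared PTEs instead of calling
 *    pte_encode_vma(), and set needs_invalidation in
 *    xe_vm_pgtable_update_ops so a TLB invalidation is issued.
 */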
Oak
> - This series appears to be untested based on the points above.
>
> Therefore, instead of this series, I believe you will need to fully
> update the bind pipeline to process MAP GPU VMA operations here.
>
> So roughly...
>
> - Maybe include a bit in xe_vma_op_map that specifies "invalidate on
>   bind," set in vm_bind_ioctl_ops_create, since this will need to be
>   wired throughout the bind pipeline.
> - Don't validate backing memory in this case.
> - Ensure that xe_vma_ops_incr_pt_update_ops is called in this case for
>   MAP operations, forcing entry into the xe_pt.c backend.
> - Update xe_pt_stage_bind_walk with a variable that indicates clearing
>   the PTE. Instead of calling pte_encode_vma in xe_pt_stage_bind_entry,
>   set this variable for PT bind operations derived from MAP operations
>   that meet the "invalidate on bind" condition.
> - Ensure needs_invalidation is set in struct xe_vm_pgtable_update_ops if
>   a MAP operation is included that meets the "invalidate on bind"
>   condition.
> - Set the VMA tile_invalidated in addition to tile_present for MAP
>   operations that meet the "invalidate on bind" condition.
>
> I might be missing some implementation details mentioned above, but this
> should provide you with some direction.
>
> Lastly, and perhaps most importantly, please test this using an IGT and
> include the results in the next post.
>
> Matt
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
2025-01-29 21:01 ` Zeng, Oak
@ 2025-01-30 1:36 ` Matthew Brost
2025-01-30 8:36 ` Thomas Hellström
0 siblings, 1 reply; 20+ messages in thread
From: Matthew Brost @ 2025-01-30 1:36 UTC (permalink / raw)
To: Zeng, Oak
Cc: intel-xe@lists.freedesktop.org, joonas.lahtinen@linux.intel.com,
Thomas.Hellstrom@linux.intel.com
On Wed, Jan 29, 2025 at 02:01:38PM -0700, Zeng, Oak wrote:
>
>
> > On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote:
> > > [snip - quoted patch]
> > >
> > > +		if (__xe_vm_needs_clear_scratch_pages(xe, vm, flags))
> > > +			__xe_vm_clear_scratch_pages(xe, vm, addr, addr + range);
> >
> > A few things...
> >
> > - I believe this is only needed for bind user operations or internal MAP
> >   GPU VMA operations.
>
> Did you mean this is only needed for user bind, but not for internal map?
>
Needed for user bind and also for internal MAP. Once you update the bind
pipeline, you will only need to care about MAP ops.
> > - I believe a TLB invalidation will be required.
>
> My understanding is that NULL PTEs won't be cached in the TLB, so TLB invalidation is not needed.
>
I would think NULL PTEs are cached, but I'm not certain either way.
> > - I don't think calling zap PTEs range works here, given how the scratch
> > tables are set up (i.e., new PTEs need to be created pointing to an
> > invalid state).
>
> You are right.
>
> I didn't realize that using xe_pt_walk_shared to walk and zap PTEs has a
> limitation: for the virtual address range we want to zap, all the page
> tables have to already exist. This interface doesn't create new page
> tables. Even though xe_pt_walk_shared takes a range parameter (addr, end),
> the range can't be arbitrary.
>
> Today only [xe_vma_start, xe_vma_end) is used as the xe_pt_walk_shared
> walking range, and an arbitrary range won't work, as you pointed out. To me
> this is a small interface design issue. If you agree, I can re-parameterize
> xe_pt_walk_shared to take a VMA instead of addr/end, so people won't make
> the same mistake in the future.
>
Ah, no. SVM will use ranges so I think (addr, end) are the right
parameters for internal PT functions.
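E.g., an SVM invalidation path would only have a mirrored CPU address range
in hand, no VMA. Roughly (sketch only; the function name is illustrative,
the body just mirrors __xe_vm_clear_scratch_pages from your patch):

/* Sketch of an SVM-style caller of the range interface: zap an arbitrary
 * mirrored range across all tiles, then invalidate.
 */
static void svm_range_invalidate_sketch(struct xe_vm *vm, u64 start, u64 end)
{
	struct xe_tile *tile;
	u8 id;

	for_each_tile(tile, vm->xe, id)
		xe_pt_zap_range(tile, vm, start, end);

	/* ...followed by a ranged GT TLB invalidation. */
}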
> Anyway, I will follow the direction you give below to rework this series.
>
+1
Matt
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
2025-01-30 1:36 ` Matthew Brost
@ 2025-01-30 8:36 ` Thomas Hellström
2025-01-30 15:18 ` Zeng, Oak
0 siblings, 1 reply; 20+ messages in thread
From: Thomas Hellström @ 2025-01-30 8:36 UTC (permalink / raw)
To: Matthew Brost, Zeng, Oak
Cc: intel-xe@lists.freedesktop.org, joonas.lahtinen@linux.intel.com
On Wed, 2025-01-29 at 17:36 -0800, Matthew Brost wrote:
> On Wed, Jan 29, 2025 at 02:01:38PM -0700, Zeng, Oak wrote:
> >
> > > On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote:
> > > > [snip - quoted patch]
> > >
> > > A few things...
> > >
> > > - I believe this is only needed for bind user operations or internal MAP
> > >   GPU VMA operations.
> >
> > Did you mean this is only needed for user bind, but not for internal map?
> >
>
> Needed for user bind and also for internal MAP. Once you update the bind
> pipeline, you will only need to care about MAP ops.
>
> > > - I believe a TLB invalidation will be required.
> >
> > My understanding is that NULL PTEs won't be cached in the TLB, so TLB
> > invalidation is not needed.
> >
>
> I would think NULL PTEs are cached, but I'm not certain either way.
Note that a NULL / scratch mapping for larger pages may point to a
physical page.
/Thomas
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
2025-01-30 8:36 ` Thomas Hellström
@ 2025-01-30 15:18 ` Zeng, Oak
0 siblings, 0 replies; 20+ messages in thread
From: Zeng, Oak @ 2025-01-30 15:18 UTC (permalink / raw)
To: Thomas Hellström, Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, joonas.lahtinen@linux.intel.com
> -----Original Message-----
> From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Sent: January 30, 2025 3:37 AM
> To: Brost, Matthew <matthew.brost@intel.com>; Zeng, Oak
> <oak.zeng@intel.com>
> Cc: intel-xe@lists.freedesktop.org; joonas.lahtinen@linux.intel.com
> Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
>
> On Wed, 2025-01-29 at 17:36 -0800, Matthew Brost wrote:
> > On Wed, Jan 29, 2025 at 02:01:38PM -0700, Zeng, Oak wrote:
> > >
> > > > On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote:
> > > > > [snip - quoted patch]
> > > >
> > > >
> > > > A few things...
> > > >
> > > > - I believe this is only needed for bind user operations or
> > > > internal MAP
> > > > GPU VMA operations.
> > >
> > > Did you mean this is only needed for user bind, but not for
> > > internal map?
> > >
> >
> > Needed for user bind and also for internal MAP. With updating the
> > bind
> > pipeline you will only care MAP ops.
> >
> > > > - I believe a TLB invalidation will be required.
> > >
> > > My understanding is, NULL pte won't be cached in TLB, so TLB
> > > invalidation is not needed.
> > >
> >
> > I would think NULL ptes are cached but not certain either way.
>
> Note that a NULL / scratch mapping for larger pages may point to a
> physical page.
Thanks for pointing this out. Looking at the page table definitions, some
higher-level page table entries indeed don't support a NULL encoding.
So yes, TLB invalidation is required.
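So the clear step will need a follow-up along these lines (sketch only; I'm
assuming the xe_gt_tlb_invalidation_range() and fence helpers here and will
confirm the exact usage when I rework the series through the bind pipeline):

/* Sketch: after zapping/clearing a range on a tile, issue a ranged TLB
 * invalidation and wait for it. Helper names/signatures to be confirmed.
 */
static void invalidate_cleared_range_sketch(struct xe_tile *tile,
					    struct xe_vm *vm,
					    u64 start, u64 end)
{
	struct xe_gt_tlb_invalidation_fence fence;

	xe_gt_tlb_invalidation_fence_init(tile->primary_gt, &fence, true);
	xe_gt_tlb_invalidation_range(tile->primary_gt, &fence, start, end,
				     vm->usm.asid);
	xe_gt_tlb_invalidation_fence_wait(&fence);
}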
Oak
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-01-30 15:19 UTC | newest]
Thread overview: 20+ messages
-- links below jump to the message on this page --
2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng
2025-01-28 22:21 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng
2025-01-28 23:05 ` Cavitt, Jonathan
2025-01-28 23:19 ` Matthew Brost
2025-01-29 5:04 ` Matthew Brost
2025-01-29 8:21 ` Thomas Hellström
2025-01-29 21:01 ` Zeng, Oak
2025-01-30 1:36 ` Matthew Brost
2025-01-30 8:36 ` Thomas Hellström
2025-01-30 15:18 ` Zeng, Oak
2025-01-28 22:21 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng
2025-01-28 23:05 ` Cavitt, Jonathan
2025-01-29 8:52 ` Thomas Hellström
2025-01-29 16:41 ` Matthew Brost
2025-01-28 23:04 ` [PATCH 1/3] drm/xe: Add a function to zap page table by address range Cavitt, Jonathan
2025-01-28 23:12 ` ✓ CI.Patch_applied: success for series starting with [1/3] " Patchwork
2025-01-28 23:12 ` ✓ CI.checkpatch: " Patchwork
2025-01-28 23:13 ` ✗ CI.KUnit: failure " Patchwork
2025-01-28 23:38 ` [PATCH 1/3] " Matthew Brost
2025-01-29 8:48 ` Thomas Hellström