* [PATCH 1/3] drm/xe: Add a function to zap page table by address range
@ 2025-01-28 22:21 Oak Zeng
2025-01-28 22:21 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng
` (7 more replies)
0 siblings, 8 replies; 32+ messages in thread
From: Oak Zeng @ 2025-01-28 22:21 UTC (permalink / raw)
To: intel-xe; +Cc: joonas.lahtinen, Thomas.Hellstrom
Add a function xe_pt_zap_range(). It is similar to xe_pt_zap_ptes()
but is used when we don't have a vma to work with, for example when
zapping a range that is mapped to the scratch page and thus has no vma.
Signed-off-by: Oak Zeng <oak.zeng@intel.com>
---
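For context, the new helper is intended to be called per tile over an
arbitrary virtual address range, roughly as sketched below. This is only an
illustration: the helper name is made up, and the per-tile loop mirrors what
patch 2 of this series adds; it is not part of this patch.

static void clear_range_all_tiles(struct xe_device *xe, struct xe_vm *vm,
				  u64 start, u64 end)
{
	struct xe_tile *tile;
	u8 id;

	/* Zero any existing PTEs (e.g. scratch mappings) covering [start, end) */
	for_each_tile(tile, xe, id)
		xe_pt_zap_range(tile, vm, start, end);
}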
drivers/gpu/drm/xe/xe_pt.c | 28 ++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_pt.h | 2 ++
2 files changed, 30 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 1ddcc7e79a93..2363260da6a6 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -792,6 +792,34 @@ static const struct xe_pt_walk_ops xe_pt_zap_ptes_ops = {
.pt_entry = xe_pt_zap_ptes_entry,
};
+/**
+ * xe_pt_zap_range() - Zap (zero) gpu ptes of a virtual address range
+ * @tile: The tile we're zapping for.
+ * @vm: The vm we're zapping for.
+ * @start: Start of the virtual address range, inclusive.
+ * @end: End of the virtual address range, exclusive.
+ *
+ * This is similar to xe_pt_zap_ptes() but is used when we don't have a
+ * vma to work with, for example when clearing the scratch page mapping
+ * during vm_bind.
+ *
+ */
+void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 start, u64 end)
+{
+ struct xe_pt_zap_ptes_walk xe_walk = {
+ .base = {
+ .ops = &xe_pt_zap_ptes_ops,
+ .shifts = xe_normal_pt_shifts,
+ .max_level = XE_PT_HIGHEST_LEVEL,
+ },
+ .tile = tile,
+ };
+ struct xe_pt *pt = vm->pt_root[tile->id];
+
+ (void)xe_pt_walk_shared(&pt->base, pt->level, start,
+ end, &xe_walk.base);
+}
+
/**
* xe_pt_zap_ptes() - Zap (zero) gpu ptes of an address range
* @tile: The tile we're zapping for.
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 9ab386431cad..b166b324f455 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -43,4 +43,6 @@ void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops);
bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
+void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 start, u64 end);
+
#endif
--
2.26.3
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng @ 2025-01-28 22:21 ` Oak Zeng 2025-01-28 23:05 ` Cavitt, Jonathan 2025-01-28 23:19 ` Matthew Brost 2025-01-28 22:21 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng ` (6 subsequent siblings) 7 siblings, 2 replies; 32+ messages in thread From: Oak Zeng @ 2025-01-28 22:21 UTC (permalink / raw) To: intel-xe; +Cc: joonas.lahtinen, Thomas.Hellstrom When a vm runs under fault mode, if scratch page is enabled, we need to clear the scratch page mapping before vm_bind for the vm_bind address range. Under fault mode, we depend on recoverable page fault to establish mapping in page table. If scratch page is not cleared, GPU access of address won't cause page fault because it always hits the existing scratch page mapping. When vm_bind with IMMEDIATE flag, there is no need of clearing as immediate bind can overwrite the scratch page mapping. So far only is xe2 and xe3 products are allowed to enable scratch page under fault mode. On other platform we don't allow scratch page under fault mode, so no need of such clearing. Signed-off-by: Oak Zeng <oak.zeng@intel.com> --- drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 690330352d4c..196d347c6ac0 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -38,6 +38,7 @@ #include "xe_trace_bo.h" #include "xe_wa.h" #include "xe_hmm.h" +#include "i915_drv.h" static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm) { @@ -2917,6 +2918,34 @@ static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo, return 0; } +static bool __xe_vm_needs_clear_scratch_pages(struct xe_device *xe, + struct xe_vm *vm, u32 bind_flags) +{ + if (!xe_vm_in_fault_mode(vm)) + return false; + + if (!xe_vm_has_scratch(vm)) + return false; + + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) + return false; + + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe))) + return false; + + return true; +} + +static void __xe_vm_clear_scratch_pages(struct xe_device *xe, struct xe_vm *vm, + u64 start, u64 end) +{ + struct xe_tile *tile; + u8 id; + + for_each_tile(tile, xe, id) + xe_pt_zap_range(tile, vm, start, end); +} + int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { struct xe_device *xe = to_xe_device(dev); @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) u32 prefetch_region = bind_ops[i].prefetch_mem_region_instance; u16 pat_index = bind_ops[i].pat_index; + if (__xe_vm_needs_clear_scratch_pages(xe, vm, flags)) + __xe_vm_clear_scratch_pages(xe, vm, addr, addr + range); + ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset, addr, range, op, flags, prefetch_region, pat_index); -- 2.26.3 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* RE: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-01-28 22:21 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng @ 2025-01-28 23:05 ` Cavitt, Jonathan 2025-01-28 23:19 ` Matthew Brost 1 sibling, 0 replies; 32+ messages in thread From: Cavitt, Jonathan @ 2025-01-28 23:05 UTC (permalink / raw) To: Zeng, Oak, intel-xe@lists.freedesktop.org Cc: joonas.lahtinen@linux.intel.com, Thomas.Hellstrom@linux.intel.com, Cavitt, Jonathan -----Original Message----- From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Oak Zeng Sent: Tuesday, January 28, 2025 2:22 PM To: intel-xe@lists.freedesktop.org Cc: joonas.lahtinen@linux.intel.com; Thomas.Hellstrom@linux.intel.com Subject: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind > > When a vm runs under fault mode, if scratch page is enabled, we need > to clear the scratch page mapping before vm_bind for the vm_bind > address range. Under fault mode, we depend on recoverable page fault > to establish mapping in page table. If scratch page is not cleared, > GPU access of address won't cause page fault because it always hits > the existing scratch page mapping. > > When vm_bind with IMMEDIATE flag, there is no need of clearing as > immediate bind can overwrite the scratch page mapping. > > So far only is xe2 and xe3 products are allowed to enable scratch page > under fault mode. On other platform we don't allow scratch page under > fault mode, so no need of such clearing. > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > --- > drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++++++ > 1 file changed, 32 insertions(+) > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c > index 690330352d4c..196d347c6ac0 100644 > --- a/drivers/gpu/drm/xe/xe_vm.c > +++ b/drivers/gpu/drm/xe/xe_vm.c > @@ -38,6 +38,7 @@ > #include "xe_trace_bo.h" > #include "xe_wa.h" > #include "xe_hmm.h" > +#include "i915_drv.h" > > static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm) > { > @@ -2917,6 +2918,34 @@ static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo, > return 0; > } > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_device *xe, > + struct xe_vm *vm, u32 bind_flags) > +{ > + if (!xe_vm_in_fault_mode(vm)) > + return false; > + > + if (!xe_vm_has_scratch(vm)) > + return false; > + > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > + return false; > + > + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe))) We should probably break this platform check out into its own function or macro, especially since it's being mirrored in the next patch during xe_vm_create_ioctl. It'd also be easier to extend that way if future platforms also need this functionality. 
But other than that: Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> -Jonathan Cavitt > + return false; > + > + return true; > +} > + > +static void __xe_vm_clear_scratch_pages(struct xe_device *xe, struct xe_vm *vm, > + u64 start, u64 end) > +{ > + struct xe_tile *tile; > + u8 id; > + > + for_each_tile(tile, xe, id) > + xe_pt_zap_range(tile, vm, start, end); > +} > + > int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) > { > struct xe_device *xe = to_xe_device(dev); > @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) > u32 prefetch_region = bind_ops[i].prefetch_mem_region_instance; > u16 pat_index = bind_ops[i].pat_index; > > + if (__xe_vm_needs_clear_scratch_pages(xe, vm, flags)) > + __xe_vm_clear_scratch_pages(xe, vm, addr, addr + range); > + > ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset, > addr, range, op, flags, > prefetch_region, pat_index); > -- > 2.26.3 > > ^ permalink raw reply [flat|nested] 32+ messages in thread
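For illustration, the shared helper Jonathan suggests could look roughly like
the sketch below; the function name is hypothetical, while the platform macros
are the ones this series already uses.

/* Hypothetical helper centralizing the platform check used in patches 2 and 3 */
static bool xe_vm_scratch_fault_mode_supported(struct xe_device *xe)
{
	return IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe);
}

Both __xe_vm_needs_clear_scratch_pages() and the xe_vm_create_ioctl() check in
patch 3 could then call this one helper, and future platforms would only need
to be added in one place.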
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-01-28 22:21 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng 2025-01-28 23:05 ` Cavitt, Jonathan @ 2025-01-28 23:19 ` Matthew Brost 2025-01-29 5:04 ` Matthew Brost 2025-01-29 21:01 ` Zeng, Oak 1 sibling, 2 replies; 32+ messages in thread From: Matthew Brost @ 2025-01-28 23:19 UTC (permalink / raw) To: Oak Zeng; +Cc: intel-xe, joonas.lahtinen, Thomas.Hellstrom On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote: > When a vm runs under fault mode, if scratch page is enabled, we need > to clear the scratch page mapping before vm_bind for the vm_bind > address range. Under fault mode, we depend on recoverable page fault > to establish mapping in page table. If scratch page is not cleared, > GPU access of address won't cause page fault because it always hits > the existing scratch page mapping. > > When vm_bind with IMMEDIATE flag, there is no need of clearing as > immediate bind can overwrite the scratch page mapping. > > So far only is xe2 and xe3 products are allowed to enable scratch page > under fault mode. On other platform we don't allow scratch page under > fault mode, so no need of such clearing. > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > --- > drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++++++ > 1 file changed, 32 insertions(+) > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c > index 690330352d4c..196d347c6ac0 100644 > --- a/drivers/gpu/drm/xe/xe_vm.c > +++ b/drivers/gpu/drm/xe/xe_vm.c > @@ -38,6 +38,7 @@ > #include "xe_trace_bo.h" > #include "xe_wa.h" > #include "xe_hmm.h" > +#include "i915_drv.h" > > static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm) > { > @@ -2917,6 +2918,34 @@ static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo, > return 0; > } > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_device *xe, > + struct xe_vm *vm, u32 bind_flags) > +{ > + if (!xe_vm_in_fault_mode(vm)) > + return false; > + > + if (!xe_vm_has_scratch(vm)) > + return false; > + > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > + return false; > + > + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe))) > + return false; > + > + return true; > +} > + > +static void __xe_vm_clear_scratch_pages(struct xe_device *xe, struct xe_vm *vm, > + u64 start, u64 end) > +{ > + struct xe_tile *tile; > + u8 id; > + > + for_each_tile(tile, xe, id) > + xe_pt_zap_range(tile, vm, start, end); > +} > + > int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) > { > struct xe_device *xe = to_xe_device(dev); > @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) > u32 prefetch_region = bind_ops[i].prefetch_mem_region_instance; > u16 pat_index = bind_ops[i].pat_index; > > + if (__xe_vm_needs_clear_scratch_pages(xe, vm, flags)) > + __xe_vm_clear_scratch_pages(xe, vm, addr, addr + range); A few things... - I believe this is only needed for bind user operations or internal MAP GPU VMA operations. - I believe a TLB invalidation will be required. - I don't think calling zap PTEs range works here, given how the scratch tables are set up (i.e., new PTEs need to be created pointing to an invalid state). - This series appears to be untested based on the points above. Therefore, instead of this series, I believe you will need to fully update the bind pipeline to process MAP GPU VMA operations here. So roughly... 
- Maybe include a bit in xe_vma_op_map that specifies "invalidate on bind," set in vm_bind_ioctl_ops_create, since this will need to be wired throughout the bind pipeline. - Don't validate backing memory in this case. - Ensure that xe_vma_ops_incr_pt_update_ops is called in this case for MAP operations, forcing entry into the xe_pt.c backend. - Update xe_pt_stage_bind_walk with a variable that indicates clearing the PTE. Instead of calling pte_encode_vma in xe_pt_stage_bind_entry, set this variable for PT bind operations derived from MAP operations that meet the "invalidate on bind" condition. - Ensure needs_invalidation is set in struct xe_vm_pgtable_update_ops if a MAP operation is included that meets the "invalidate on bind" condition. - Set the VMA tile_invalidated in addition to tile_present for MAP operations that meet the "invalidate on bind" condition. I might be missing some implementation details mentioned above, but this should provide you with some direction. Lastly, and perhaps most importantly, please test this using an IGT and include the results in the next post. Matt > + > ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset, > addr, range, op, flags, > prefetch_region, pat_index); > -- > 2.26.3 > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-01-28 23:19 ` Matthew Brost @ 2025-01-29 5:04 ` Matthew Brost 2025-01-29 8:21 ` Thomas Hellström 2025-01-29 21:01 ` Zeng, Oak 1 sibling, 1 reply; 32+ messages in thread From: Matthew Brost @ 2025-01-29 5:04 UTC (permalink / raw) To: Oak Zeng; +Cc: intel-xe, joonas.lahtinen, Thomas.Hellstrom On Tue, Jan 28, 2025 at 03:19:13PM -0800, Matthew Brost wrote: > On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote: > > When a vm runs under fault mode, if scratch page is enabled, we need > > to clear the scratch page mapping before vm_bind for the vm_bind > > address range. Under fault mode, we depend on recoverable page fault > > to establish mapping in page table. If scratch page is not cleared, > > GPU access of address won't cause page fault because it always hits > > the existing scratch page mapping. > > > > When vm_bind with IMMEDIATE flag, there is no need of clearing as > > immediate bind can overwrite the scratch page mapping. > > > > So far only is xe2 and xe3 products are allowed to enable scratch page > > under fault mode. On other platform we don't allow scratch page under > > fault mode, so no need of such clearing. > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > --- > > drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++++++ > > 1 file changed, 32 insertions(+) > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c > > index 690330352d4c..196d347c6ac0 100644 > > --- a/drivers/gpu/drm/xe/xe_vm.c > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > @@ -38,6 +38,7 @@ > > #include "xe_trace_bo.h" > > #include "xe_wa.h" > > #include "xe_hmm.h" > > +#include "i915_drv.h" > > > > static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm) > > { > > @@ -2917,6 +2918,34 @@ static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo, > > return 0; > > } > > > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_device *xe, > > + struct xe_vm *vm, u32 bind_flags) > > +{ > > + if (!xe_vm_in_fault_mode(vm)) > > + return false; > > + > > + if (!xe_vm_has_scratch(vm)) > > + return false; > > + > > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > > + return false; > > + > > + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe))) > > + return false; > > + > > + return true; > > +} > > + > > +static void __xe_vm_clear_scratch_pages(struct xe_device *xe, struct xe_vm *vm, > > + u64 start, u64 end) > > +{ > > + struct xe_tile *tile; > > + u8 id; > > + > > + for_each_tile(tile, xe, id) > > + xe_pt_zap_range(tile, vm, start, end); > > +} > > + > > int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) > > { > > struct xe_device *xe = to_xe_device(dev); > > @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) > > u32 prefetch_region = bind_ops[i].prefetch_mem_region_instance; > > u16 pat_index = bind_ops[i].pat_index; > > > > + if (__xe_vm_needs_clear_scratch_pages(xe, vm, flags)) > > + __xe_vm_clear_scratch_pages(xe, vm, addr, addr + range); > > A few things... > > - I believe this is only needed for bind user operations or internal MAP > GPU VMA operations. > - I believe a TLB invalidation will be required. > - I don't think calling zap PTEs range works here, given how the scratch > tables are set up (i.e., new PTEs need to be created pointing to an > invalid state). > - This series appears to be untested based on the points above. 
> > Therefore, instead of this series, I believe you will need to fully > update the bind pipeline to process MAP GPU VMA operations here. > > So roughly... > > - Maybe include a bit in xe_vma_op_map that specifies "invalidate on > bind," set in vm_bind_ioctl_ops_create, since this will need to be > wired throughout the bind pipeline. > - Don't validate backing memory in this case. > - Ensure that xe_vma_ops_incr_pt_update_ops is called in this case for > MAP operations, forcing entry into the xe_pt.c backend. > - Update xe_pt_stage_bind_walk with a variable that indicates clearing > the PTE. Instead of calling pte_encode_vma in xe_pt_stage_bind_entry, > set this variable for PT bind operations derived from MAP operations > that meet the "invalidate on bind" condition. Ugh, typos / nonsense here. Let me try again. "Update xe_pt_stage_bind_walk with a variable that indicates clearing the PTE, set this for MAP ops that meet the 'invalidate on bind' condition. When this variable is set, instead of calling pte_encode_vma in xe_pt_stage_bind_entry, call a function which encodes an invalid PTE." Matt > - Ensure needs_invalidation is set in struct xe_vm_pgtable_update_ops if > a MAP operation is included that meets the "invalidate on bind" > condition. > - Set the VMA tile_invalidated in addition to tile_present for MAP > operations that meet the "invalidate on bind" condition. > > I might be missing some implementation details mentioned above, but this > should provide you with some direction. > > Lastly, and perhaps most importantly, please test this using an IGT and > include the results in the next post. > > Matt > > > + > > ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset, > > addr, range, op, flags, > > prefetch_region, pat_index); > > -- > > 2.26.3 > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-01-29 5:04 ` Matthew Brost @ 2025-01-29 8:21 ` Thomas Hellström 0 siblings, 0 replies; 32+ messages in thread From: Thomas Hellström @ 2025-01-29 8:21 UTC (permalink / raw) To: Matthew Brost, Oak Zeng; +Cc: intel-xe, joonas.lahtinen On Tue, 2025-01-28 at 21:04 -0800, Matthew Brost wrote: > On Tue, Jan 28, 2025 at 03:19:13PM -0800, Matthew Brost wrote: > > On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote: > > > When a vm runs under fault mode, if scratch page is enabled, we > > > need > > > to clear the scratch page mapping before vm_bind for the vm_bind > > > address range. Under fault mode, we depend on recoverable page > > > fault > > > to establish mapping in page table. If scratch page is not > > > cleared, > > > GPU access of address won't cause page fault because it always > > > hits > > > the existing scratch page mapping. > > > > > > When vm_bind with IMMEDIATE flag, there is no need of clearing as > > > immediate bind can overwrite the scratch page mapping. > > > > > > So far only is xe2 and xe3 products are allowed to enable scratch > > > page > > > under fault mode. On other platform we don't allow scratch page > > > under > > > fault mode, so no need of such clearing. > > > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > > --- > > > drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++++++ > > > 1 file changed, 32 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c > > > b/drivers/gpu/drm/xe/xe_vm.c > > > index 690330352d4c..196d347c6ac0 100644 > > > --- a/drivers/gpu/drm/xe/xe_vm.c > > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > > @@ -38,6 +38,7 @@ > > > #include "xe_trace_bo.h" > > > #include "xe_wa.h" > > > #include "xe_hmm.h" > > > +#include "i915_drv.h" > > > > > > static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm) > > > { > > > @@ -2917,6 +2918,34 @@ static int > > > xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo > > > *bo, > > > return 0; > > > } > > > > > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_device > > > *xe, > > > + struct xe_vm *vm, > > > u32 bind_flags) > > > +{ > > > + if (!xe_vm_in_fault_mode(vm)) > > > + return false; > > > + > > > + if (!xe_vm_has_scratch(vm)) > > > + return false; > > > + > > > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > > > + return false; > > > + > > > + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || > > > IS_PANTHERLAKE(xe))) > > > + return false; > > > + > > > + return true; > > > +} > > > + > > > +static void __xe_vm_clear_scratch_pages(struct xe_device *xe, > > > struct xe_vm *vm, > > > + u64 start, u64 end) > > > +{ > > > + struct xe_tile *tile; > > > + u8 id; > > > + > > > + for_each_tile(tile, xe, id) > > > + xe_pt_zap_range(tile, vm, start, end); > > > +} > > > + > > > int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct > > > drm_file *file) > > > { > > > struct xe_device *xe = to_xe_device(dev); > > > @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device > > > *dev, void *data, struct drm_file *file) > > > u32 prefetch_region = > > > bind_ops[i].prefetch_mem_region_instance; > > > u16 pat_index = bind_ops[i].pat_index; > > > > > > + if (__xe_vm_needs_clear_scratch_pages(xe, vm, > > > flags)) > > > + __xe_vm_clear_scratch_pages(xe, vm, > > > addr, addr + range); > > > > A few things... > > > > - I believe this is only needed for bind user operations or > > internal MAP > > GPU VMA operations. > > - I believe a TLB invalidation will be required. 
> > - I don't think calling zap PTEs range works here, given how the > > scratch > > tables are set up (i.e., new PTEs need to be created pointing to > > an > > invalid state). > > - This series appears to be untested based on the points above. > > > > Therefore, instead of this series, I believe you will need to fully > > update the bind pipeline to process MAP GPU VMA operations here. > > > > So roughly... > > > > - Maybe include a bit in xe_vma_op_map that specifies "invalidate > > on > > bind," set in vm_bind_ioctl_ops_create, since this will need to > > be > > wired throughout the bind pipeline. > > - Don't validate backing memory in this case. > > - Ensure that xe_vma_ops_incr_pt_update_ops is called in this case > > for > > MAP operations, forcing entry into the xe_pt.c backend. > > - Update xe_pt_stage_bind_walk with a variable that indicates > > clearing > > the PTE. Instead of calling pte_encode_vma in > > xe_pt_stage_bind_entry, > > set this variable for PT bind operations derived from MAP > > operations > > that meet the "invalidate on bind" condition. > > Ugh, typos / nonsense here. Let me try again. > > "Update xe_pt_stage_bind_walk with a variable that indicates clearing > the PTE, set this for MAP ops that meet the 'invalidate on bind' > condition. When this variable is set, instead of calling > pte_encode_vma > in xe_pt_stage_bind_entry, call a function which encodes an invalid > PTE." > > Matt I agree with Matt here. This needs to look just like a regular bind, with the exception that PTEs are cleared instead of pointing towards valid pages. /Thomas ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-01-28 23:19 ` Matthew Brost 2025-01-29 5:04 ` Matthew Brost @ 2025-01-29 21:01 ` Zeng, Oak 2025-01-30 1:36 ` Matthew Brost 1 sibling, 1 reply; 32+ messages in thread From: Zeng, Oak @ 2025-01-29 21:01 UTC (permalink / raw) To: Brost, Matthew Cc: intel-xe@lists.freedesktop.org, joonas.lahtinen@linux.intel.com, Thomas.Hellstrom@linux.intel.com > -----Original Message----- > From: Brost, Matthew <matthew.brost@intel.com> > Sent: January 28, 2025 6:19 PM > To: Zeng, Oak <oak.zeng@intel.com> > Cc: intel-xe@lists.freedesktop.org; joonas.lahtinen@linux.intel.com; > Thomas.Hellstrom@linux.intel.com > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind > > On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote: > > When a vm runs under fault mode, if scratch page is enabled, we > need > > to clear the scratch page mapping before vm_bind for the vm_bind > > address range. Under fault mode, we depend on recoverable page > fault > > to establish mapping in page table. If scratch page is not cleared, > > GPU access of address won't cause page fault because it always hits > > the existing scratch page mapping. > > > > When vm_bind with IMMEDIATE flag, there is no need of clearing as > > immediate bind can overwrite the scratch page mapping. > > > > So far only is xe2 and xe3 products are allowed to enable scratch > page > > under fault mode. On other platform we don't allow scratch page > under > > fault mode, so no need of such clearing. > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > --- > > drivers/gpu/drm/xe/xe_vm.c | 32 > ++++++++++++++++++++++++++++++++ > > 1 file changed, 32 insertions(+) > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c > b/drivers/gpu/drm/xe/xe_vm.c > > index 690330352d4c..196d347c6ac0 100644 > > --- a/drivers/gpu/drm/xe/xe_vm.c > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > @@ -38,6 +38,7 @@ > > #include "xe_trace_bo.h" > > #include "xe_wa.h" > > #include "xe_hmm.h" > > +#include "i915_drv.h" > > > > static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm) > > { > > @@ -2917,6 +2918,34 @@ static int > xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo > *bo, > > return 0; > > } > > > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_device > *xe, > > + struct xe_vm *vm, u32 > bind_flags) > > +{ > > + if (!xe_vm_in_fault_mode(vm)) > > + return false; > > + > > + if (!xe_vm_has_scratch(vm)) > > + return false; > > + > > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > > + return false; > > + > > + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || > IS_PANTHERLAKE(xe))) > > + return false; > > + > > + return true; > > +} > > + > > +static void __xe_vm_clear_scratch_pages(struct xe_device *xe, > struct xe_vm *vm, > > + u64 start, u64 end) > > +{ > > + struct xe_tile *tile; > > + u8 id; > > + > > + for_each_tile(tile, xe, id) > > + xe_pt_zap_range(tile, vm, start, end); > > +} > > + > > int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct > drm_file *file) > > { > > struct xe_device *xe = to_xe_device(dev); > > @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device > *dev, void *data, struct drm_file *file) > > u32 prefetch_region = > bind_ops[i].prefetch_mem_region_instance; > > u16 pat_index = bind_ops[i].pat_index; > > > > + if (__xe_vm_needs_clear_scratch_pages(xe, vm, > flags)) > > + __xe_vm_clear_scratch_pages(xe, vm, addr, > addr + range); > > A few things... 
> > - I believe this is only needed for bind user operations or internal MAP > GPU VMA operations. Did you mean this is only needed for user bind, but not for internal map? > - I believe a TLB invalidation will be required. My understanding is, NULL pte won't be cached in TLB, so TLB invalidation is not needed. > - I don't think calling zap PTEs range works here, given how the scratch > tables are set up (i.e., new PTEs need to be created pointing to an > invalid state). You are right. I didn't realize that using xe_pt_walk_shared to walk and zap PTEs has a limitation: For the virtual address range we want to zap, all the page tables has to be already exist. This interface doesn't create new page tables. Even though xe_pt_walk_shared takes a range parameter (addr, end), range parameter can't be arbitrary. Today only [xe_vma_start, xe_vma_end) is used to specify the xe_pt_walk_shared Walking range. Arbitrary range won't work as you pointed out. To me this is a small Interface design issue. If you agree, I can re-parameterize xe_pt_walk_shared to take VMA vs addr/end. This way people won't make the same mistake in the future. Anyway, I will follow the direction you give below to rework this series. Oak > - This series appears to be untested based on the points above. > > Therefore, instead of this series, I believe you will need to fully > update the bind pipeline to process MAP GPU VMA operations here. > > So roughly... > > - Maybe include a bit in xe_vma_op_map that specifies "invalidate on > bind," set in vm_bind_ioctl_ops_create, since this will need to be > wired throughout the bind pipeline. > - Don't validate backing memory in this case. > - Ensure that xe_vma_ops_incr_pt_update_ops is called in this case > for > MAP operations, forcing entry into the xe_pt.c backend. > - Update xe_pt_stage_bind_walk with a variable that indicates > clearing > the PTE. Instead of calling pte_encode_vma in > xe_pt_stage_bind_entry, > set this variable for PT bind operations derived from MAP operations > that meet the "invalidate on bind" condition. > - Ensure needs_invalidation is set in struct > xe_vm_pgtable_update_ops if > a MAP operation is included that meets the "invalidate on bind" > condition. > - Set the VMA tile_invalidated in addition to tile_present for MAP > operations that meet the "invalidate on bind" condition. > > I might be missing some implementation details mentioned above, > but this > should provide you with some direction. > > Lastly, and perhaps most importantly, please test this using an IGT > and > include the results in the next post. > > Matt > > > + > > ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], > obj_offset, > > addr, range, op, flags, > > prefetch_region, > pat_index); > > -- > > 2.26.3 > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-01-29 21:01 ` Zeng, Oak @ 2025-01-30 1:36 ` Matthew Brost 2025-01-30 8:36 ` Thomas Hellström 0 siblings, 1 reply; 32+ messages in thread From: Matthew Brost @ 2025-01-30 1:36 UTC (permalink / raw) To: Zeng, Oak Cc: intel-xe@lists.freedesktop.org, joonas.lahtinen@linux.intel.com, Thomas.Hellstrom@linux.intel.com On Wed, Jan 29, 2025 at 02:01:38PM -0700, Zeng, Oak wrote: > > > > -----Original Message----- > > From: Brost, Matthew <matthew.brost@intel.com> > > Sent: January 28, 2025 6:19 PM > > To: Zeng, Oak <oak.zeng@intel.com> > > Cc: intel-xe@lists.freedesktop.org; joonas.lahtinen@linux.intel.com; > > Thomas.Hellstrom@linux.intel.com > > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind > > > > On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote: > > > When a vm runs under fault mode, if scratch page is enabled, we > > need > > > to clear the scratch page mapping before vm_bind for the vm_bind > > > address range. Under fault mode, we depend on recoverable page > > fault > > > to establish mapping in page table. If scratch page is not cleared, > > > GPU access of address won't cause page fault because it always hits > > > the existing scratch page mapping. > > > > > > When vm_bind with IMMEDIATE flag, there is no need of clearing as > > > immediate bind can overwrite the scratch page mapping. > > > > > > So far only is xe2 and xe3 products are allowed to enable scratch > > page > > > under fault mode. On other platform we don't allow scratch page > > under > > > fault mode, so no need of such clearing. > > > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > > --- > > > drivers/gpu/drm/xe/xe_vm.c | 32 > > ++++++++++++++++++++++++++++++++ > > > 1 file changed, 32 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c > > b/drivers/gpu/drm/xe/xe_vm.c > > > index 690330352d4c..196d347c6ac0 100644 > > > --- a/drivers/gpu/drm/xe/xe_vm.c > > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > > @@ -38,6 +38,7 @@ > > > #include "xe_trace_bo.h" > > > #include "xe_wa.h" > > > #include "xe_hmm.h" > > > +#include "i915_drv.h" > > > > > > static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm) > > > { > > > @@ -2917,6 +2918,34 @@ static int > > xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo > > *bo, > > > return 0; > > > } > > > > > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_device > > *xe, > > > + struct xe_vm *vm, u32 > > bind_flags) > > > +{ > > > + if (!xe_vm_in_fault_mode(vm)) > > > + return false; > > > + > > > + if (!xe_vm_has_scratch(vm)) > > > + return false; > > > + > > > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > > > + return false; > > > + > > > + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || > > IS_PANTHERLAKE(xe))) > > > + return false; > > > + > > > + return true; > > > +} > > > + > > > +static void __xe_vm_clear_scratch_pages(struct xe_device *xe, > > struct xe_vm *vm, > > > + u64 start, u64 end) > > > +{ > > > + struct xe_tile *tile; > > > + u8 id; > > > + > > > + for_each_tile(tile, xe, id) > > > + xe_pt_zap_range(tile, vm, start, end); > > > +} > > > + > > > int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct > > drm_file *file) > > > { > > > struct xe_device *xe = to_xe_device(dev); > > > @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device > > *dev, void *data, struct drm_file *file) > > > u32 prefetch_region = > > bind_ops[i].prefetch_mem_region_instance; > > > u16 pat_index = bind_ops[i].pat_index; > > > > > > + if 
(__xe_vm_needs_clear_scratch_pages(xe, vm, > > flags)) > > > + __xe_vm_clear_scratch_pages(xe, vm, addr, > > addr + range); > > > > A few things... > > > > - I believe this is only needed for bind user operations or internal MAP > > GPU VMA operations. > > Did you mean this is only needed for user bind, but not for internal map? > Needed for user bind and also for internal MAP. With updating the bind pipeline you will only care MAP ops. > > - I believe a TLB invalidation will be required. > > My understanding is, NULL pte won't be cached in TLB, so TLB invalidation is not needed. > I would think NULL ptes are cached but not certain either way. > > - I don't think calling zap PTEs range works here, given how the scratch > > tables are set up (i.e., new PTEs need to be created pointing to an > > invalid state). > > You are right. > > I didn't realize that using xe_pt_walk_shared to walk and zap PTEs has a limitation: > For the virtual address range we want to zap, all the page tables has to be already > exist. This interface doesn't create new page tables. Even though xe_pt_walk_shared > takes a range parameter (addr, end), range parameter can't be arbitrary. > > Today only [xe_vma_start, xe_vma_end) is used to specify the xe_pt_walk_shared > Walking range. Arbitrary range won't work as you pointed out. To me this is a small > Interface design issue. If you agree, I can re-parameterize xe_pt_walk_shared to take > VMA vs addr/end. This way people won't make the same mistake in the future. > Ah, no. SVM will use ranges so I think (addr, end) are the right parameters for internal PT functions. > Anyway, I will follow the direction you give below to rework this series. > +1 Matt > Oak > > > - This series appears to be untested based on the points above. > > > > Therefore, instead of this series, I believe you will need to fully > > update the bind pipeline to process MAP GPU VMA operations here. > > > > So roughly... > > > > - Maybe include a bit in xe_vma_op_map that specifies "invalidate on > > bind," set in vm_bind_ioctl_ops_create, since this will need to be > > wired throughout the bind pipeline. > > - Don't validate backing memory in this case. > > - Ensure that xe_vma_ops_incr_pt_update_ops is called in this case > > for > > MAP operations, forcing entry into the xe_pt.c backend. > > - Update xe_pt_stage_bind_walk with a variable that indicates > > clearing > > the PTE. Instead of calling pte_encode_vma in > > xe_pt_stage_bind_entry, > > set this variable for PT bind operations derived from MAP operations > > that meet the "invalidate on bind" condition. > > - Ensure needs_invalidation is set in struct > > xe_vm_pgtable_update_ops if > > a MAP operation is included that meets the "invalidate on bind" > > condition. > > - Set the VMA tile_invalidated in addition to tile_present for MAP > > operations that meet the "invalidate on bind" condition. > > > > I might be missing some implementation details mentioned above, > > but this > > should provide you with some direction. > > > > Lastly, and perhaps most importantly, please test this using an IGT > > and > > include the results in the next post. > > > > Matt > > > > > + > > > ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], > > obj_offset, > > > addr, range, op, flags, > > > prefetch_region, > > pat_index); > > > -- > > > 2.26.3 > > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-01-30 1:36 ` Matthew Brost @ 2025-01-30 8:36 ` Thomas Hellström 2025-01-30 15:18 ` Zeng, Oak 0 siblings, 1 reply; 32+ messages in thread From: Thomas Hellström @ 2025-01-30 8:36 UTC (permalink / raw) To: Matthew Brost, Zeng, Oak Cc: intel-xe@lists.freedesktop.org, joonas.lahtinen@linux.intel.com On Wed, 2025-01-29 at 17:36 -0800, Matthew Brost wrote: > On Wed, Jan 29, 2025 at 02:01:38PM -0700, Zeng, Oak wrote: > > > > > > > -----Original Message----- > > > From: Brost, Matthew <matthew.brost@intel.com> > > > Sent: January 28, 2025 6:19 PM > > > To: Zeng, Oak <oak.zeng@intel.com> > > > Cc: intel-xe@lists.freedesktop.org; > > > joonas.lahtinen@linux.intel.com; > > > Thomas.Hellstrom@linux.intel.com > > > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before > > > vm_bind > > > > > > On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote: > > > > When a vm runs under fault mode, if scratch page is enabled, we > > > need > > > > to clear the scratch page mapping before vm_bind for the > > > > vm_bind > > > > address range. Under fault mode, we depend on recoverable page > > > fault > > > > to establish mapping in page table. If scratch page is not > > > > cleared, > > > > GPU access of address won't cause page fault because it always > > > > hits > > > > the existing scratch page mapping. > > > > > > > > When vm_bind with IMMEDIATE flag, there is no need of clearing > > > > as > > > > immediate bind can overwrite the scratch page mapping. > > > > > > > > So far only is xe2 and xe3 products are allowed to enable > > > > scratch > > > page > > > > under fault mode. On other platform we don't allow scratch page > > > under > > > > fault mode, so no need of such clearing. > > > > > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > > > --- > > > > drivers/gpu/drm/xe/xe_vm.c | 32 > > > ++++++++++++++++++++++++++++++++ > > > > 1 file changed, 32 insertions(+) > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c > > > b/drivers/gpu/drm/xe/xe_vm.c > > > > index 690330352d4c..196d347c6ac0 100644 > > > > --- a/drivers/gpu/drm/xe/xe_vm.c > > > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > > > @@ -38,6 +38,7 @@ > > > > #include "xe_trace_bo.h" > > > > #include "xe_wa.h" > > > > #include "xe_hmm.h" > > > > +#include "i915_drv.h" > > > > > > > > static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm) > > > > { > > > > @@ -2917,6 +2918,34 @@ static int > > > xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo > > > *bo, > > > > return 0; > > > > } > > > > > > > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_device > > > *xe, > > > > + struct xe_vm > > > > *vm, u32 > > > bind_flags) > > > > +{ > > > > + if (!xe_vm_in_fault_mode(vm)) > > > > + return false; > > > > + > > > > + if (!xe_vm_has_scratch(vm)) > > > > + return false; > > > > + > > > > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > > > > + return false; > > > > + > > > > + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || > > > IS_PANTHERLAKE(xe))) > > > > + return false; > > > > + > > > > + return true; > > > > +} > > > > + > > > > +static void __xe_vm_clear_scratch_pages(struct xe_device *xe, > > > struct xe_vm *vm, > > > > + u64 start, u64 end) > > > > +{ > > > > + struct xe_tile *tile; > > > > + u8 id; > > > > + > > > > + for_each_tile(tile, xe, id) > > > > + xe_pt_zap_range(tile, vm, start, end); > > > > +} > > > > + > > > > int xe_vm_bind_ioctl(struct drm_device *dev, void *data, > > > > struct > > > drm_file *file) > > > > { > > > > struct 
xe_device *xe = to_xe_device(dev); > > > > @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct drm_device > > > *dev, void *data, struct drm_file *file) > > > > u32 prefetch_region = > > > bind_ops[i].prefetch_mem_region_instance; > > > > u16 pat_index = bind_ops[i].pat_index; > > > > > > > > + if (__xe_vm_needs_clear_scratch_pages(xe, vm, > > > flags)) > > > > + __xe_vm_clear_scratch_pages(xe, vm, > > > > addr, > > > addr + range); > > > > > > A few things... > > > > > > - I believe this is only needed for bind user operations or > > > internal MAP > > > GPU VMA operations. > > > > Did you mean this is only needed for user bind, but not for > > internal map? > > > > Needed for user bind and also for internal MAP. With updating the > bind > pipeline you will only care MAP ops. > > > > - I believe a TLB invalidation will be required. > > > > My understanding is, NULL pte won't be cached in TLB, so TLB > > invalidation is not needed. > > > > I would think NULL ptes are cached but not certain either way. Note that a NULL / scratch mapping for larger pages may point to a physical page. /Thomas > > > > - I don't think calling zap PTEs range works here, given how the > > > scratch > > > tables are set up (i.e., new PTEs need to be created pointing > > > to an > > > invalid state). > > > > You are right. > > > > I didn't realize that using xe_pt_walk_shared to walk and zap PTEs > > has a limitation: > > For the virtual address range we want to zap, all the page tables > > has to be already > > exist. This interface doesn't create new page tables. Even though > > xe_pt_walk_shared > > takes a range parameter (addr, end), range parameter can't be > > arbitrary. > > > > Today only [xe_vma_start, xe_vma_end) is used to specify the > > xe_pt_walk_shared > > Walking range. Arbitrary range won't work as you pointed out. To me > > this is a small > > Interface design issue. If you agree, I can re-parameterize > > xe_pt_walk_shared to take > > VMA vs addr/end. This way people won't make the same mistake in the > > future. > > > > Ah, no. SVM will use ranges so I think (addr, end) are the right > parameters for internal PT functions. > > > Anyway, I will follow the direction you give below to rework this > > series. > > > > +1 > > Matt > > > Oak > > > > > - This series appears to be untested based on the points above. > > > > > > Therefore, instead of this series, I believe you will need to > > > fully > > > update the bind pipeline to process MAP GPU VMA operations here. > > > > > > So roughly... > > > > > > - Maybe include a bit in xe_vma_op_map that specifies "invalidate > > > on > > > bind," set in vm_bind_ioctl_ops_create, since this will need to > > > be > > > wired throughout the bind pipeline. > > > - Don't validate backing memory in this case. > > > - Ensure that xe_vma_ops_incr_pt_update_ops is called in this > > > case > > > for > > > MAP operations, forcing entry into the xe_pt.c backend. > > > - Update xe_pt_stage_bind_walk with a variable that indicates > > > clearing > > > the PTE. Instead of calling pte_encode_vma in > > > xe_pt_stage_bind_entry, > > > set this variable for PT bind operations derived from MAP > > > operations > > > that meet the "invalidate on bind" condition. > > > - Ensure needs_invalidation is set in struct > > > xe_vm_pgtable_update_ops if > > > a MAP operation is included that meets the "invalidate on bind" > > > condition. > > > - Set the VMA tile_invalidated in addition to tile_present for > > > MAP > > > operations that meet the "invalidate on bind" condition. 
> > > > > > I might be missing some implementation details mentioned above, > > > but this > > > should provide you with some direction. > > > > > > Lastly, and perhaps most importantly, please test this using an > > > IGT > > > and > > > include the results in the next post. > > > > > > Matt > > > > > > > + > > > > ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], > > > obj_offset, > > > > addr, range, > > > > op, flags, > > > > > > > > prefetch_region, > > > pat_index); > > > > -- > > > > 2.26.3 > > > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-01-30 8:36 ` Thomas Hellström @ 2025-01-30 15:18 ` Zeng, Oak 0 siblings, 0 replies; 32+ messages in thread From: Zeng, Oak @ 2025-01-30 15:18 UTC (permalink / raw) To: Thomas Hellström, Brost, Matthew Cc: intel-xe@lists.freedesktop.org, joonas.lahtinen@linux.intel.com > -----Original Message----- > From: Thomas Hellström <thomas.hellstrom@linux.intel.com> > Sent: January 30, 2025 3:37 AM > To: Brost, Matthew <matthew.brost@intel.com>; Zeng, Oak > <oak.zeng@intel.com> > Cc: intel-xe@lists.freedesktop.org; joonas.lahtinen@linux.intel.com > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind > > On Wed, 2025-01-29 at 17:36 -0800, Matthew Brost wrote: > > On Wed, Jan 29, 2025 at 02:01:38PM -0700, Zeng, Oak wrote: > > > > > > > > > > -----Original Message----- > > > > From: Brost, Matthew <matthew.brost@intel.com> > > > > Sent: January 28, 2025 6:19 PM > > > > To: Zeng, Oak <oak.zeng@intel.com> > > > > Cc: intel-xe@lists.freedesktop.org; > > > > joonas.lahtinen@linux.intel.com; > > > > Thomas.Hellstrom@linux.intel.com > > > > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before > > > > vm_bind > > > > > > > > On Tue, Jan 28, 2025 at 05:21:44PM -0500, Oak Zeng wrote: > > > > > When a vm runs under fault mode, if scratch page is enabled, > we > > > > need > > > > > to clear the scratch page mapping before vm_bind for the > > > > > vm_bind > > > > > address range. Under fault mode, we depend on recoverable > page > > > > fault > > > > > to establish mapping in page table. If scratch page is not > > > > > cleared, > > > > > GPU access of address won't cause page fault because it > always > > > > > hits > > > > > the existing scratch page mapping. > > > > > > > > > > When vm_bind with IMMEDIATE flag, there is no need of > clearing > > > > > as > > > > > immediate bind can overwrite the scratch page mapping. > > > > > > > > > > So far only is xe2 and xe3 products are allowed to enable > > > > > scratch > > > > page > > > > > under fault mode. On other platform we don't allow scratch > page > > > > under > > > > > fault mode, so no need of such clearing. 
> > > > > > > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > > > > --- > > > > > drivers/gpu/drm/xe/xe_vm.c | 32 > > > > ++++++++++++++++++++++++++++++++ > > > > > 1 file changed, 32 insertions(+) > > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c > > > > b/drivers/gpu/drm/xe/xe_vm.c > > > > > index 690330352d4c..196d347c6ac0 100644 > > > > > --- a/drivers/gpu/drm/xe/xe_vm.c > > > > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > > > > @@ -38,6 +38,7 @@ > > > > > #include "xe_trace_bo.h" > > > > > #include "xe_wa.h" > > > > > #include "xe_hmm.h" > > > > > +#include "i915_drv.h" > > > > > > > > > > static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm) > > > > > { > > > > > @@ -2917,6 +2918,34 @@ static int > > > > xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct > xe_bo > > > > *bo, > > > > > return 0; > > > > > } > > > > > > > > > > +static bool __xe_vm_needs_clear_scratch_pages(struct > xe_device > > > > *xe, > > > > > + struct xe_vm > > > > > *vm, u32 > > > > bind_flags) > > > > > +{ > > > > > + if (!xe_vm_in_fault_mode(vm)) > > > > > + return false; > > > > > + > > > > > + if (!xe_vm_has_scratch(vm)) > > > > > + return false; > > > > > + > > > > > + if (bind_flags & > DRM_XE_VM_BIND_FLAG_IMMEDIATE) > > > > > + return false; > > > > > + > > > > > + if (!(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || > > > > IS_PANTHERLAKE(xe))) > > > > > + return false; > > > > > + > > > > > + return true; > > > > > +} > > > > > + > > > > > +static void __xe_vm_clear_scratch_pages(struct xe_device > *xe, > > > > struct xe_vm *vm, > > > > > + u64 start, u64 end) > > > > > +{ > > > > > + struct xe_tile *tile; > > > > > + u8 id; > > > > > + > > > > > + for_each_tile(tile, xe, id) > > > > > + xe_pt_zap_range(tile, vm, start, end); > > > > > +} > > > > > + > > > > > int xe_vm_bind_ioctl(struct drm_device *dev, void *data, > > > > > struct > > > > drm_file *file) > > > > > { > > > > > struct xe_device *xe = to_xe_device(dev); > > > > > @@ -3062,6 +3091,9 @@ int xe_vm_bind_ioctl(struct > drm_device > > > > *dev, void *data, struct drm_file *file) > > > > > u32 prefetch_region = > > > > bind_ops[i].prefetch_mem_region_instance; > > > > > u16 pat_index = bind_ops[i].pat_index; > > > > > > > > > > + if (__xe_vm_needs_clear_scratch_pages(xe, > vm, > > > > flags)) > > > > > + __xe_vm_clear_scratch_pages(xe, > vm, > > > > > addr, > > > > addr + range); > > > > > > > > A few things... > > > > > > > > - I believe this is only needed for bind user operations or > > > > internal MAP > > > > GPU VMA operations. > > > > > > Did you mean this is only needed for user bind, but not for > > > internal map? > > > > > > > Needed for user bind and also for internal MAP. With updating the > > bind > > pipeline you will only care MAP ops. > > > > > > - I believe a TLB invalidation will be required. > > > > > > My understanding is, NULL pte won't be cached in TLB, so TLB > > > invalidation is not needed. > > > > > > > I would think NULL ptes are cached but not certain either way. > > Note that a NULL / scratch mapping for larger pages may point to a > physical page. Thanks for pointing this out. Looking at page table definition, some high Level page table indeed doesn't support NULL entry. So yes TLB invalidation is required. Oak > > /Thomas > > > > > > > > - I don't think calling zap PTEs range works here, given how the > > > > scratch > > > > tables are set up (i.e., new PTEs need to be created pointing > > > > to an > > > > invalid state). > > > > > > You are right. 
> > > > > > I didn't realize that using xe_pt_walk_shared to walk and zap PTEs > > > has a limitation: > > > For the virtual address range we want to zap, all the page tables > > > has to be already > > > exist. This interface doesn't create new page tables. Even though > > > xe_pt_walk_shared > > > takes a range parameter (addr, end), range parameter can't be > > > arbitrary. > > > > > > Today only [xe_vma_start, xe_vma_end) is used to specify the > > > xe_pt_walk_shared > > > Walking range. Arbitrary range won't work as you pointed out. To > me > > > this is a small > > > Interface design issue. If you agree, I can re-parameterize > > > xe_pt_walk_shared to take > > > VMA vs addr/end. This way people won't make the same mistake > in the > > > future. > > > > > > > Ah, no. SVM will use ranges so I think (addr, end) are the right > > parameters for internal PT functions. > > > > > Anyway, I will follow the direction you give below to rework this > > > series. > > > > > > > +1 > > > > Matt > > > > > Oak > > > > > > > - This series appears to be untested based on the points above. > > > > > > > > Therefore, instead of this series, I believe you will need to > > > > fully > > > > update the bind pipeline to process MAP GPU VMA operations > here. > > > > > > > > So roughly... > > > > > > > > - Maybe include a bit in xe_vma_op_map that specifies > "invalidate > > > > on > > > > bind," set in vm_bind_ioctl_ops_create, since this will need to > > > > be > > > > wired throughout the bind pipeline. > > > > - Don't validate backing memory in this case. > > > > - Ensure that xe_vma_ops_incr_pt_update_ops is called in this > > > > case > > > > for > > > > MAP operations, forcing entry into the xe_pt.c backend. > > > > - Update xe_pt_stage_bind_walk with a variable that indicates > > > > clearing > > > > the PTE. Instead of calling pte_encode_vma in > > > > xe_pt_stage_bind_entry, > > > > set this variable for PT bind operations derived from MAP > > > > operations > > > > that meet the "invalidate on bind" condition. > > > > - Ensure needs_invalidation is set in struct > > > > xe_vm_pgtable_update_ops if > > > > a MAP operation is included that meets the "invalidate on bind" > > > > condition. > > > > - Set the VMA tile_invalidated in addition to tile_present for > > > > MAP > > > > operations that meet the "invalidate on bind" condition. > > > > > > > > I might be missing some implementation details mentioned > above, > > > > but this > > > > should provide you with some direction. > > > > > > > > Lastly, and perhaps most importantly, please test this using an > > > > IGT > > > > and > > > > include the results in the next post. > > > > > > > > Matt > > > > > > > > > + > > > > > ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], > > > > obj_offset, > > > > > addr, range, > > > > > op, flags, > > > > > > > > > > prefetch_region, > > > > pat_index); > > > > > -- > > > > > 2.26.3 > > > > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform 2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng 2025-01-28 22:21 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng @ 2025-01-28 22:21 ` Oak Zeng 2025-01-28 23:05 ` Cavitt, Jonathan 2025-01-29 8:52 ` Thomas Hellström 2025-01-28 23:04 ` [PATCH 1/3] drm/xe: Add a function to zap page table by address range Cavitt, Jonathan ` (5 subsequent siblings) 7 siblings, 2 replies; 32+ messages in thread From: Oak Zeng @ 2025-01-28 22:21 UTC (permalink / raw) To: intel-xe; +Cc: joonas.lahtinen, Thomas.Hellstrom Normally scratch page is not allowed when a vm is operate under page fault mode, i.e., in the existing codes, DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason is fault mode relies on recoverable page to work, while scratch page can mute recoverable page fault. On xe2 and xe3, HW prefetching can cause page fault interrupt. Due to page fault interrupt overhead (i.e., need Guc and KMD involved to fix the page fault), HW prefetching can be slowed by many orders of magnitude. Fix this problem by allowing scratch page under fault mode for xe2 and xe3. With scratch page in place, HW prefetching could always hit scratch page instead of causing interrupt. Signed-off-by: Oak Zeng <oak.zeng@intel.com> --- drivers/gpu/drm/xe/xe_vm.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 196d347c6ac0..3346f88f284a 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -1732,6 +1732,11 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data, if (XE_IOCTL_DBG(xe, args->extensions)) return -EINVAL; + if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE && + args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE && + !(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe)))) + return -EINVAL; + if (XE_WA(xe_root_mmio_gt(xe), 14016763929)) args->flags |= DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE; @@ -1745,10 +1750,6 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data, if (XE_IOCTL_DBG(xe, args->flags & ~ALL_DRM_XE_VM_CREATE_FLAGS)) return -EINVAL; - if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE && - args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE)) - return -EINVAL; - if (XE_IOCTL_DBG(xe, !(args->flags & DRM_XE_VM_CREATE_FLAG_LR_MODE) && args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE)) return -EINVAL; -- 2.26.3 ^ permalink raw reply related [flat|nested] 32+ messages in thread
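For illustration, with patch 3 applied userspace could create a fault-mode VM
that keeps scratch pages enabled, roughly as below. This is only a sketch: the
uAPI struct and ioctl names are taken from the xe uAPI as I understand it, and
the header path, helper name and error handling are assumptions.

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/xe_drm.h>

/* Sketch: create a long-running, fault-mode VM with scratch pages enabled. */
static int create_fault_mode_vm_with_scratch(int fd, uint32_t *vm_id)
{
	struct drm_xe_vm_create create = {
		.flags = DRM_XE_VM_CREATE_FLAG_LR_MODE |
			 DRM_XE_VM_CREATE_FLAG_FAULT_MODE |
			 DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE,
	};

	/* Rejected with -EINVAL before this patch; allowed on xe2/xe3 now. */
	if (ioctl(fd, DRM_IOCTL_XE_VM_CREATE, &create))
		return -errno;

	*vm_id = create.vm_id;
	return 0;
}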
* RE: [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform 2025-01-28 22:21 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng @ 2025-01-28 23:05 ` Cavitt, Jonathan 2025-01-29 8:52 ` Thomas Hellström 1 sibling, 0 replies; 32+ messages in thread From: Cavitt, Jonathan @ 2025-01-28 23:05 UTC (permalink / raw) To: Zeng, Oak, intel-xe@lists.freedesktop.org Cc: joonas.lahtinen@linux.intel.com, Thomas.Hellstrom@linux.intel.com, Cavitt, Jonathan -----Original Message----- From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Oak Zeng Sent: Tuesday, January 28, 2025 2:22 PM To: intel-xe@lists.freedesktop.org Cc: joonas.lahtinen@linux.intel.com; Thomas.Hellstrom@linux.intel.com Subject: [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform > > Normally scratch page is not allowed when a vm is operate under page > fault mode, i.e., in the existing codes, DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE > and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason > is fault mode relies on recoverable page to work, while scratch page > can mute recoverable page fault. > > On xe2 and xe3, HW prefetching can cause page fault interrupt. Due to > page fault interrupt overhead (i.e., need Guc and KMD involved to fix > the page fault), HW prefetching can be slowed by many orders of magnitude. > > Fix this problem by allowing scratch page under fault mode for xe2 and xe3. > With scratch page in place, HW prefetching could always hit scratch page > instead of causing interrupt. > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> LGTM, though see my comment for patch 2 regarding the added platform checks. Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> -Jonathan Cavitt > --- > drivers/gpu/drm/xe/xe_vm.c | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c > index 196d347c6ac0..3346f88f284a 100644 > --- a/drivers/gpu/drm/xe/xe_vm.c > +++ b/drivers/gpu/drm/xe/xe_vm.c > @@ -1732,6 +1732,11 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data, > if (XE_IOCTL_DBG(xe, args->extensions)) > return -EINVAL; > > + if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE && > + args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE && > + !(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || IS_PANTHERLAKE(xe)))) > + return -EINVAL; > + > if (XE_WA(xe_root_mmio_gt(xe), 14016763929)) > args->flags |= DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE; > > @@ -1745,10 +1750,6 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data, > if (XE_IOCTL_DBG(xe, args->flags & ~ALL_DRM_XE_VM_CREATE_FLAGS)) > return -EINVAL; > > - if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE && > - args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE)) > - return -EINVAL; > - > if (XE_IOCTL_DBG(xe, !(args->flags & DRM_XE_VM_CREATE_FLAG_LR_MODE) && > args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE)) > return -EINVAL; > -- > 2.26.3 > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform 2025-01-28 22:21 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng 2025-01-28 23:05 ` Cavitt, Jonathan @ 2025-01-29 8:52 ` Thomas Hellström 2025-01-29 16:41 ` Matthew Brost 1 sibling, 1 reply; 32+ messages in thread From: Thomas Hellström @ 2025-01-29 8:52 UTC (permalink / raw) To: Oak Zeng, intel-xe; +Cc: joonas.lahtinen On Tue, 2025-01-28 at 17:21 -0500, Oak Zeng wrote: > Normally scratch page is not allowed when a vm is operate under page > fault mode, i.e., in the existing codes, > DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE > and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason > is fault mode relies on recoverable page to work, while scratch page > can mute recoverable page fault. > > On xe2 and xe3, HW prefetching can cause page fault interrupt. Due to > page fault interrupt overhead (i.e., need Guc and KMD involved to fix > the page fault), HW prefetching can be slowed by many orders of > magnitude. The problem on xe2 and xe3 as I understand it is not the overhead but the fact that OOB prefetching causes pagefaults and we can't resolve those. > > Fix this problem by allowing scratch page under fault mode for xe2 > and xe3. > With scratch page in place, HW prefetching could always hit scratch > page > instead of causing interrupt. > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > --- > drivers/gpu/drm/xe/xe_vm.c | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c > index 196d347c6ac0..3346f88f284a 100644 > --- a/drivers/gpu/drm/xe/xe_vm.c > +++ b/drivers/gpu/drm/xe/xe_vm.c > @@ -1732,6 +1732,11 @@ int xe_vm_create_ioctl(struct drm_device *dev, > void *data, > if (XE_IOCTL_DBG(xe, args->extensions)) > return -EINVAL; > > + if (XE_IOCTL_DBG(xe, args->flags & > DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE && > + args->flags & > DRM_XE_VM_CREATE_FLAG_FAULT_MODE && > + !(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || > IS_PANTHERLAKE(xe)))) > + return -EINVAL; > + Could we perhaps add a bit in the xe->info structure? /Thomas > if (XE_WA(xe_root_mmio_gt(xe), 14016763929)) > args->flags |= DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE; > > @@ -1745,10 +1750,6 @@ int xe_vm_create_ioctl(struct drm_device *dev, > void *data, > if (XE_IOCTL_DBG(xe, args->flags & > ~ALL_DRM_XE_VM_CREATE_FLAGS)) > return -EINVAL; > > - if (XE_IOCTL_DBG(xe, args->flags & > DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE && > - args->flags & > DRM_XE_VM_CREATE_FLAG_FAULT_MODE)) > - return -EINVAL; > - > if (XE_IOCTL_DBG(xe, !(args->flags & > DRM_XE_VM_CREATE_FLAG_LR_MODE) && > args->flags & > DRM_XE_VM_CREATE_FLAG_FAULT_MODE)) > return -EINVAL; ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform 2025-01-29 8:52 ` Thomas Hellström @ 2025-01-29 16:41 ` Matthew Brost 0 siblings, 0 replies; 32+ messages in thread From: Matthew Brost @ 2025-01-29 16:41 UTC (permalink / raw) To: Thomas Hellström; +Cc: Oak Zeng, intel-xe, joonas.lahtinen On Wed, Jan 29, 2025 at 09:52:54AM +0100, Thomas Hellström wrote: > On Tue, 2025-01-28 at 17:21 -0500, Oak Zeng wrote: > > Normally scratch page is not allowed when a vm is operate under page > > fault mode, i.e., in the existing codes, > > DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE > > and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason > > is fault mode relies on recoverable page to work, while scratch page > > can mute recoverable page fault. > > > > On xe2 and xe3, HW prefetching can cause page fault interrupt. Due to > > page fault interrupt overhead (i.e., need Guc and KMD involved to fix > > the page fault), HW prefetching can be slowed by many orders of > > magnitude. > > The problem on xe2 and xe3 as I understand it is not the overhead but > the fact that OOB prefetching causes pagefaults and we can't resolve > those. > > > > > Fix this problem by allowing scratch page under fault mode for xe2 > > and xe3. > > With scratch page in place, HW prefetching could always hit scratch > > page > > instead of causing interrupt. > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > --- > > drivers/gpu/drm/xe/xe_vm.c | 9 +++++---- > > 1 file changed, 5 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c > > index 196d347c6ac0..3346f88f284a 100644 > > --- a/drivers/gpu/drm/xe/xe_vm.c > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > @@ -1732,6 +1732,11 @@ int xe_vm_create_ioctl(struct drm_device *dev, > > void *data, > > if (XE_IOCTL_DBG(xe, args->extensions)) > > return -EINVAL; > > > > + if (XE_IOCTL_DBG(xe, args->flags & > > DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE && > > + args->flags & > > DRM_XE_VM_CREATE_FLAG_FAULT_MODE && > > + !(IS_LUNARLAKE(xe) || IS_BATTLEMAGE(xe) || > > IS_PANTHERLAKE(xe)))) > > + return -EINVAL; > > + > > Could we perhaps add a bit in the xe->info structure? > +1. This seems like a good idea. Matt > /Thomas > > > > if (XE_WA(xe_root_mmio_gt(xe), 14016763929)) > > args->flags |= DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE; > > > > @@ -1745,10 +1750,6 @@ int xe_vm_create_ioctl(struct drm_device *dev, > > void *data, > > if (XE_IOCTL_DBG(xe, args->flags & > > ~ALL_DRM_XE_VM_CREATE_FLAGS)) > > return -EINVAL; > > > > - if (XE_IOCTL_DBG(xe, args->flags & > > DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE && > > - args->flags & > > DRM_XE_VM_CREATE_FLAG_FAULT_MODE)) > > - return -EINVAL; > > - > > if (XE_IOCTL_DBG(xe, !(args->flags & > > DRM_XE_VM_CREATE_FLAG_LR_MODE) && > > args->flags & > > DRM_XE_VM_CREATE_FLAG_FAULT_MODE)) > > return -EINVAL; > ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH 1/3] drm/xe: Add a function to zap page table by address range 2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng 2025-01-28 22:21 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng 2025-01-28 22:21 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng @ 2025-01-28 23:04 ` Cavitt, Jonathan 2025-01-28 23:12 ` ✓ CI.Patch_applied: success for series starting with [1/3] " Patchwork ` (4 subsequent siblings) 7 siblings, 0 replies; 32+ messages in thread From: Cavitt, Jonathan @ 2025-01-28 23:04 UTC (permalink / raw) To: Zeng, Oak, intel-xe@lists.freedesktop.org Cc: joonas.lahtinen@linux.intel.com, Thomas.Hellstrom@linux.intel.com, Cavitt, Jonathan -----Original Message----- From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Oak Zeng Sent: Tuesday, January 28, 2025 2:22 PM To: intel-xe@lists.freedesktop.org Cc: joonas.lahtinen@linux.intel.com; Thomas.Hellstrom@linux.intel.com Subject: [PATCH 1/3] drm/xe: Add a function to zap page table by address range > > Add a function xe_pt_zap_range. This is similar to xe_pt_zap_ptes > but is used when we don't have a vma to work with, such as zap > a range mapped to scratch page where we don't have vma. > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > --- > drivers/gpu/drm/xe/xe_pt.c | 28 ++++++++++++++++++++++++++++ > drivers/gpu/drm/xe/xe_pt.h | 2 ++ > 2 files changed, 30 insertions(+) > > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c > index 1ddcc7e79a93..2363260da6a6 100644 > --- a/drivers/gpu/drm/xe/xe_pt.c > +++ b/drivers/gpu/drm/xe/xe_pt.c > @@ -792,6 +792,34 @@ static const struct xe_pt_walk_ops xe_pt_zap_ptes_ops = { > .pt_entry = xe_pt_zap_ptes_entry, > }; > > +/** > + * xe_pt_zap_range() - Zap (zero) gpu ptes of an virtual address range > + * @tile: The tile we're zapping for. > + * @vm: The vm we're zapping for. > + * @start: Start of the virtual address range, inclusive. > + * @end: End of the virtual address range, exclusive. > + * > + * This is similar to xe_pt_zap_ptes() but it's used when we don't have a > + * vma to work with. This is used for example when we're clearing the scratch > + * page mapping during vm_bind. > + * NIT: The asterisk here can be removed. > + */ > +void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 start, u64 end) It looks like the function xe_pt_zap_ptes returns if the walk needs to be invalidated. Will that also be needed here? Assuming not: Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> -Jonathan Cavitt > +{ > + struct xe_pt_zap_ptes_walk xe_walk = { > + .base = { > + .ops = &xe_pt_zap_ptes_ops, > + .shifts = xe_normal_pt_shifts, > + .max_level = XE_PT_HIGHEST_LEVEL, > + }, > + .tile = tile, > + }; > + struct xe_pt *pt = vm->pt_root[tile->id]; > + > + (void)xe_pt_walk_shared(&pt->base, pt->level, start, > + end, &xe_walk.base); > +} > + > /** > * xe_pt_zap_ptes() - Zap (zero) gpu ptes of an address range > * @tile: The tile we're zapping for. 
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h > index 9ab386431cad..b166b324f455 100644 > --- a/drivers/gpu/drm/xe/xe_pt.h > +++ b/drivers/gpu/drm/xe/xe_pt.h > @@ -43,4 +43,6 @@ void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops); > > bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma); > > +void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 start, u64 end); > + > #endif > -- > 2.26.3 > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* ✓ CI.Patch_applied: success for series starting with [1/3] drm/xe: Add a function to zap page table by address range 2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng ` (2 preceding siblings ...) 2025-01-28 23:04 ` [PATCH 1/3] drm/xe: Add a function to zap page table by address range Cavitt, Jonathan @ 2025-01-28 23:12 ` Patchwork 2025-01-28 23:12 ` ✓ CI.checkpatch: " Patchwork ` (3 subsequent siblings) 7 siblings, 0 replies; 32+ messages in thread From: Patchwork @ 2025-01-28 23:12 UTC (permalink / raw) To: Oak Zeng; +Cc: intel-xe == Series Details == Series: series starting with [1/3] drm/xe: Add a function to zap page table by address range URL : https://patchwork.freedesktop.org/series/144072/ State : success == Summary == === Applying kernel patches on branch 'drm-tip' with base: === Base commit: 8f3ba847175b drm-tip: 2025y-01m-28d-22h-47m-54s UTC integration manifest === git am output follows === Applying: drm/xe: Add a function to zap page table by address range Applying: drm/xe: Clear scratch page before vm_bind Applying: drm/xe: Allow scratch page under fault mode for certain platform ^ permalink raw reply [flat|nested] 32+ messages in thread
* ✓ CI.checkpatch: success for series starting with [1/3] drm/xe: Add a function to zap page table by address range 2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng ` (3 preceding siblings ...) 2025-01-28 23:12 ` ✓ CI.Patch_applied: success for series starting with [1/3] " Patchwork @ 2025-01-28 23:12 ` Patchwork 2025-01-28 23:13 ` ✗ CI.KUnit: failure " Patchwork ` (2 subsequent siblings) 7 siblings, 0 replies; 32+ messages in thread From: Patchwork @ 2025-01-28 23:12 UTC (permalink / raw) To: Oak Zeng; +Cc: intel-xe == Series Details == Series: series starting with [1/3] drm/xe: Add a function to zap page table by address range URL : https://patchwork.freedesktop.org/series/144072/ State : success == Summary == + KERNEL=/kernel + git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt Cloning into 'mt'... warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/ + git -C mt rev-list -n1 origin/master 30ab6715fc09baee6cc14cb3c89ad8858688d474 + cd /kernel + git config --global --add safe.directory /kernel + git log -n1 commit cb0f1cdb1d31379f15c7b5e96e66a8c4d48691ee Author: Oak Zeng <oak.zeng@intel.com> Date: Tue Jan 28 17:21:45 2025 -0500 drm/xe: Allow scratch page under fault mode for certain platform Normally scratch page is not allowed when a vm is operate under page fault mode, i.e., in the existing codes, DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason is fault mode relies on recoverable page to work, while scratch page can mute recoverable page fault. On xe2 and xe3, HW prefetching can cause page fault interrupt. Due to page fault interrupt overhead (i.e., need Guc and KMD involved to fix the page fault), HW prefetching can be slowed by many orders of magnitude. Fix this problem by allowing scratch page under fault mode for xe2 and xe3. With scratch page in place, HW prefetching could always hit scratch page instead of causing interrupt. Signed-off-by: Oak Zeng <oak.zeng@intel.com> + /mt/dim checkpatch 8f3ba847175bfd495cbfe5b51d3cdfd8d32d24db drm-intel fc3c14dbb3cb drm/xe: Add a function to zap page table by address range 108d849b4e71 drm/xe: Clear scratch page before vm_bind cb0f1cdb1d31 drm/xe: Allow scratch page under fault mode for certain platform ^ permalink raw reply [flat|nested] 32+ messages in thread
* ✗ CI.KUnit: failure for series starting with [1/3] drm/xe: Add a function to zap page table by address range 2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng ` (4 preceding siblings ...) 2025-01-28 23:12 ` ✓ CI.checkpatch: " Patchwork @ 2025-01-28 23:13 ` Patchwork 2025-01-28 23:38 ` [PATCH 1/3] " Matthew Brost 2025-01-29 8:48 ` Thomas Hellström 7 siblings, 0 replies; 32+ messages in thread From: Patchwork @ 2025-01-28 23:13 UTC (permalink / raw) To: Oak Zeng; +Cc: intel-xe == Series Details == Series: series starting with [1/3] drm/xe: Add a function to zap page table by address range URL : https://patchwork.freedesktop.org/series/144072/ State : failure == Summary == + trap cleanup EXIT + /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig ERROR:root:../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes] 156 | u64 ioread64_lo_hi(const void __iomem *addr) | ^~~~~~~~~~~~~~ ../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes] 163 | u64 ioread64_hi_lo(const void __iomem *addr) | ^~~~~~~~~~~~~~ ../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes] 170 | u64 ioread64be_lo_hi(const void __iomem *addr) | ^~~~~~~~~~~~~~~~ ../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes] 178 | u64 ioread64be_hi_lo(const void __iomem *addr) | ^~~~~~~~~~~~~~~~ ../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes] 264 | void iowrite64_lo_hi(u64 val, void __iomem *addr) | ^~~~~~~~~~~~~~~ ../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes] 272 | void iowrite64_hi_lo(u64 val, void __iomem *addr) | ^~~~~~~~~~~~~~~ ../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes] 280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr) | ^~~~~~~~~~~~~~~~~ ../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes] 288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr) | ^~~~~~~~~~~~~~~~~ ../drivers/gpu/drm/xe/xe_vm.c:41:10: fatal error: i915_drv.h: No such file or directory 41 | #include "i915_drv.h" | ^~~~~~~~~~~~ compilation terminated. make[7]: *** [../scripts/Makefile.build:194: drivers/gpu/drm/xe/xe_vm.o] Error 1 make[7]: *** Waiting for unfinished jobs.... make[6]: *** [../scripts/Makefile.build:440: drivers/gpu/drm/xe] Error 2 make[5]: *** [../scripts/Makefile.build:440: drivers/gpu/drm] Error 2 make[4]: *** [../scripts/Makefile.build:440: drivers/gpu] Error 2 make[3]: *** [../scripts/Makefile.build:440: drivers] Error 2 make[2]: *** [/kernel/Makefile:1989: .] Error 2 make[1]: *** [/kernel/Makefile:251: __sub-make] Error 2 make: *** [Makefile:251: __sub-make] Error 2 [23:12:39] Configuring KUnit Kernel ... Generating .config ... Populating config with: $ make ARCH=um O=.kunit olddefconfig [23:12:43] Building KUnit Kernel ... Populating config with: $ make ARCH=um O=.kunit olddefconfig Building with: $ make all compile_commands.json ARCH=um O=.kunit --jobs=48 + cleanup ++ stat -c %u:%g /kernel + chown -R 1003:1003 /kernel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/3] drm/xe: Add a function to zap page table by address range
  2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng
                   ` (5 preceding siblings ...)
  2025-01-28 23:13 ` ✗ CI.KUnit: failure " Patchwork
@ 2025-01-28 23:38 ` Matthew Brost
  2025-01-29  8:48 ` Thomas Hellström
  7 siblings, 0 replies; 32+ messages in thread
From: Matthew Brost @ 2025-01-28 23:38 UTC (permalink / raw)
  To: Oak Zeng; +Cc: intel-xe, joonas.lahtinen, Thomas.Hellstrom

On Tue, Jan 28, 2025 at 05:21:43PM -0500, Oak Zeng wrote:
> Add a function xe_pt_zap_range. This is similar to xe_pt_zap_ptes
> but is used when we don't have a vma to work with, such as zap
> a range mapped to scratch page where we don't have vma.
>

See my reply to the following patch. This won't work and isn't needed.

Let me explain why this won't work.

When we set up scratch PTEs on a VM, we create a shell PTE structure down to
the largest page size (1 GB), with a 1 GB entry pointing to a scratch page.
With 48 bits of address space, this is 1 level to 1 GB pages, with 57 bits
this is 2 levels to 1 GB pages.

Now, let's say we perform a 4K bind from 0x0 to 0x1000 with immediate clear
in fault mode. How can we invalidate just the 4K range of memory? We can't.
We need to allocate new memory for page tables so that 0x0000–0x1000 points
to an invalidated PTE, while 0x1000–1 GB points to scratch.

This is why, in my reply to the patch, I indicated that the entire bind
pipeline needs to be updated to properly invalidate the PTEs.

I hope this explanation, along with my reply to the next patch, makes sense.

Matt

> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_pt.c | 28 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_pt.h | 2 ++
>  2 files changed, 30 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 1ddcc7e79a93..2363260da6a6 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -792,6 +792,34 @@ static const struct xe_pt_walk_ops xe_pt_zap_ptes_ops = {
>  	.pt_entry = xe_pt_zap_ptes_entry,
>  };
>
> +/**
> + * xe_pt_zap_range() - Zap (zero) gpu ptes of an virtual address range
> + * @tile: The tile we're zapping for.
> + * @vm: The vm we're zapping for.
> + * @start: Start of the virtual address range, inclusive.
> + * @end: End of the virtual address range, exclusive.
> + *
> + * This is similar to xe_pt_zap_ptes() but it's used when we don't have a
> + * vma to work with. This is used for example when we're clearing the scratch
> + * page mapping during vm_bind.
> + *
> + */
> +void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 start, u64 end)
> +{
> +	struct xe_pt_zap_ptes_walk xe_walk = {
> +		.base = {
> +			.ops = &xe_pt_zap_ptes_ops,
> +			.shifts = xe_normal_pt_shifts,
> +			.max_level = XE_PT_HIGHEST_LEVEL,
> +		},
> +		.tile = tile,
> +	};
> +	struct xe_pt *pt = vm->pt_root[tile->id];
> +
> +	(void)xe_pt_walk_shared(&pt->base, pt->level, start,
> +				end, &xe_walk.base);
> +}
> +
>  /**
>   * xe_pt_zap_ptes() - Zap (zero) gpu ptes of an address range
>   * @tile: The tile we're zapping for.
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> index 9ab386431cad..b166b324f455 100644
> --- a/drivers/gpu/drm/xe/xe_pt.h
> +++ b/drivers/gpu/drm/xe/xe_pt.h
> @@ -43,4 +43,6 @@ void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops);
>
>  bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
>
> +void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 start, u64 end);
> +
>  #endif
> --
> 2.26.3
>

^ permalink raw reply	[flat|nested] 32+ messages in thread
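Matt's point is easier to see with a small worked example. The snippet below is purely illustrative and is not xe code; the shift values assume the usual 4-level page-table layout for 48-bit virtual addresses.

/* Illustrative only: why a 4 KiB clear inside a scratch mapping cannot
 * be a simple in-place PTE zap. */
#define PT_SHIFT(level)	(12 + 9 * (level))	/* 4 KiB, 2 MiB, 1 GiB, ... */
#define PT_SIZE(level)	(1ull << PT_SHIFT(level))

/* An in-place zap can only zero entries that already exist, so the
 * range must cover whole entries at the level that is populated. */
static int zap_in_place_possible(unsigned long long start,
				 unsigned long long end,
				 unsigned int populated_level)
{
	unsigned long long entry = PT_SIZE(populated_level);

	return !(start & (entry - 1)) && !(end & (entry - 1));
}

/* Scratch setup populates level 2 (1 GiB entries), so for the 4K bind
 * in the example above:
 *
 *	zap_in_place_possible(0x0, 0x1000, 2) == 0
 *
 * Clearing just [0x0, 0x1000) therefore needs a fresh level-1 table
 * (512 x 2 MiB entries) and level-0 table (512 x 4 KiB entries), with
 * everything outside the range re-pointed at scratch -- a bind-style
 * page-table update, which is what v2 of patch 2/3 below does instead
 * of a standalone zap helper. */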
* Re: [PATCH 1/3] drm/xe: Add a function to zap page table by address range 2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng ` (6 preceding siblings ...) 2025-01-28 23:38 ` [PATCH 1/3] " Matthew Brost @ 2025-01-29 8:48 ` Thomas Hellström 7 siblings, 0 replies; 32+ messages in thread From: Thomas Hellström @ 2025-01-29 8:48 UTC (permalink / raw) To: Oak Zeng, intel-xe; +Cc: joonas.lahtinen On Tue, 2025-01-28 at 17:21 -0500, Oak Zeng wrote: > Add a function xe_pt_zap_range. This is similar to xe_pt_zap_ptes > but is used when we don't have a vma to work with, such as zap > a range mapped to scratch page where we don't have vma. > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> As I think also Matt mentions, The zap functionality works only when it exactly matches a pre-bound range (like in this case a vma) and there is something cleaning up the zapped ptes afterwards. Typically that's a rebind or an unbind. Thanks, Thomas > --- > drivers/gpu/drm/xe/xe_pt.c | 28 ++++++++++++++++++++++++++++ > drivers/gpu/drm/xe/xe_pt.h | 2 ++ > 2 files changed, 30 insertions(+) > > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c > index 1ddcc7e79a93..2363260da6a6 100644 > --- a/drivers/gpu/drm/xe/xe_pt.c > +++ b/drivers/gpu/drm/xe/xe_pt.c > @@ -792,6 +792,34 @@ static const struct xe_pt_walk_ops > xe_pt_zap_ptes_ops = { > .pt_entry = xe_pt_zap_ptes_entry, > }; > > +/** > + * xe_pt_zap_range() - Zap (zero) gpu ptes of an virtual address > range > + * @tile: The tile we're zapping for. > + * @vm: The vm we're zapping for. > + * @start: Start of the virtual address range, inclusive. > + * @end: End of the virtual address range, exclusive. > + * > + * This is similar to xe_pt_zap_ptes() but it's used when we don't > have a > + * vma to work with. This is used for example when we're clearing > the scratch > + * page mapping during vm_bind. > + * > + */ > +void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 > start, u64 end) > +{ > + struct xe_pt_zap_ptes_walk xe_walk = { > + .base = { > + .ops = &xe_pt_zap_ptes_ops, > + .shifts = xe_normal_pt_shifts, > + .max_level = XE_PT_HIGHEST_LEVEL, > + }, > + .tile = tile, > + }; > + struct xe_pt *pt = vm->pt_root[tile->id]; > + > + (void)xe_pt_walk_shared(&pt->base, pt->level, start, > + end, &xe_walk.base); > +} > + > /** > * xe_pt_zap_ptes() - Zap (zero) gpu ptes of an address range > * @tile: The tile we're zapping for. > diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h > index 9ab386431cad..b166b324f455 100644 > --- a/drivers/gpu/drm/xe/xe_pt.h > +++ b/drivers/gpu/drm/xe/xe_pt.h > @@ -43,4 +43,6 @@ void xe_pt_update_ops_abort(struct xe_tile *tile, > struct xe_vma_ops *vops); > > bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma); > > +void xe_pt_zap_range(struct xe_tile *tile, struct xe_vm *vm, u64 > start, u64 end); > + > #endif ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 1/3] drm/xe: Introduced needs_scratch bit in device descriptor
@ 2025-02-04 18:45 Oak Zeng
  2025-02-04 18:45 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng
  0 siblings, 1 reply; 32+ messages in thread
From: Oak Zeng @ 2025-02-04 18:45 UTC (permalink / raw)
  To: intel-xe; +Cc: Thomas.Hellstrom, matthew.brost, jonathan.cavitt

On some platforms, a scratch page is needed for out-of-bound prefetch
to work. Introduce a bit in the device descriptor to specify whether a
device needs a scratch page to work.

v2: introduce a needs_scratch bit in device info (Thomas, Jonathan)

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h | 3 +++
 drivers/gpu/drm/xe/xe_pci.c          | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index c0e886bac183..1197ab31e528 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -45,6 +45,7 @@ struct xe_pxp;
 #define GRAPHICS_VERx100(xe) ((xe)->info.graphics_verx100)
 #define MEDIA_VERx100(xe) ((xe)->info.media_verx100)
 #define IS_DGFX(xe) ((xe)->info.is_dgfx)
+#define NEEDS_SCRATCH(xe) ((xe)->info.needs_scratch)

 #define XE_VRAM_FLAGS_NEED64K BIT(0)

@@ -318,6 +319,8 @@ struct xe_device {
 		u8 has_usm:1;
 		/** @info.is_dgfx: is discrete device */
 		u8 is_dgfx:1;
+		/** @info.needs_scratch: needs scratch page for oob prefetch to work */
+		u8 needs_scratch:1;
 		/**
 		 * @info.probe_display: Probe display hardware. If set to
 		 * false, the driver will behave as if there is no display
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index e8ef7d6b4db8..09736674a77e 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -67,6 +67,7 @@ struct xe_device_desc {
 	u8 has_llc:1;
 	u8 has_pxp:1;
 	u8 has_sriov:1;
+	u8 needs_scratch:1;
 	u8 skip_guc_pc:1;
 	u8 skip_mtcfg:1;
 	u8 skip_pcode:1;
@@ -353,6 +354,7 @@ static const struct xe_device_desc lnl_desc = {
 	.dma_mask_size = 46,
 	.has_display = true,
 	.has_pxp = true,
+	.needs_scratch = true,
 };

 static const struct xe_device_desc bmg_desc = {
@@ -361,6 +363,7 @@ static const struct xe_device_desc bmg_desc = {
 	.dma_mask_size = 46,
 	.has_display = true,
 	.has_heci_cscfi = 1,
+	.needs_scratch = true,
 };

 static const struct xe_device_desc ptl_desc = {
@@ -368,6 +371,7 @@ static const struct xe_device_desc ptl_desc = {
 	.dma_mask_size = 46,
 	.has_display = true,
 	.require_force_probe = true,
+	.needs_scratch = true,
 };

 #undef PLATFORM
@@ -643,6 +647,7 @@ static int xe_info_init_early(struct xe_device *xe,
 	xe->info.skip_guc_pc = desc->skip_guc_pc;
 	xe->info.skip_mtcfg = desc->skip_mtcfg;
 	xe->info.skip_pcode = desc->skip_pcode;
+	xe->info.needs_scratch = desc->needs_scratch;

 	xe->info.probe_display = IS_ENABLED(CONFIG_DRM_XE_DISPLAY) &&
 				 xe_modparam.probe_display &&
--
2.26.3

^ permalink raw reply related	[flat|nested] 32+ messages in thread
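For context, the v1 check reached for IS_LUNARLAKE()/IS_BATTLEMAGE()/IS_PANTHERLAKE(), which come from i915_drv.h, the include that broke the KUnit build earlier in the thread. With the bit introduced above, the check can stay entirely inside xe. The v2 of patch 3/3 is not quoted in this part of the thread, so the fragment below is only a sketch of how the uapi check might consume the new bit, not the actual patch:

	/* In xe_vm_create_ioctl(), sketch only: allow scratch pages
	 * together with fault mode solely on devices that need them. */
	if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
			 args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE &&
			 !NEEDS_SCRATCH(xe)))
		return -EINVAL;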
* [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-02-04 18:45 [PATCH 1/3] drm/xe: Introduced needs_scratch bit in device descriptor Oak Zeng @ 2025-02-04 18:45 ` Oak Zeng 2025-02-05 14:35 ` Thomas Hellström 2025-02-06 10:34 ` Matthew Brost 0 siblings, 2 replies; 32+ messages in thread From: Oak Zeng @ 2025-02-04 18:45 UTC (permalink / raw) To: intel-xe; +Cc: Thomas.Hellstrom, matthew.brost, jonathan.cavitt When a vm runs under fault mode, if scratch page is enabled, we need to clear the scratch page mapping before vm_bind for the vm_bind address range. Under fault mode, we depend on recoverable page fault to establish mapping in page table. If scratch page is not cleared, GPU access of address won't cause page fault because it always hits the existing scratch page mapping. When vm_bind with IMMEDIATE flag, there is no need of clearing as immediate bind can overwrite the scratch page mapping. So far only is xe2 and xe3 products are allowed to enable scratch page under fault mode. On other platform we don't allow scratch page under fault mode, so no need of such clearing. v2: Rework vm_bind pipeline to clear scratch page mapping. This is similar to a map operation, with the exception that PTEs are cleared instead of pointing to valid physical pages. (Matt, Thomas) TLB invalidation is needed after clear scratch page mapping as larger scratch page mapping could be backed by physical page and cached in TLB. (Matt, Thomas) Signed-off-by: Oak Zeng <oak.zeng@intel.com> --- drivers/gpu/drm/xe/xe_pt.c | 66 ++++++++++++++++++++++---------- drivers/gpu/drm/xe/xe_pt_types.h | 2 + drivers/gpu/drm/xe/xe_vm.c | 29 ++++++++++++-- drivers/gpu/drm/xe/xe_vm_types.h | 2 + 4 files changed, 75 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c index 1ddcc7e79a93..3fd0ae2dbe7d 100644 --- a/drivers/gpu/drm/xe/xe_pt.c +++ b/drivers/gpu/drm/xe/xe_pt.c @@ -268,6 +268,8 @@ struct xe_pt_stage_bind_walk { * granularity. */ bool needs_64K; + /* @clear_pt: clear page table entries during the bind walk */ + bool clear_pt; /** * @vma: VMA being mapped */ @@ -497,21 +499,25 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset, XE_WARN_ON(xe_walk->va_curs_start != addr); - pte = vm->pt_ops->pte_encode_vma(is_null ? 0 : - xe_res_dma(curs) + xe_walk->dma_offset, - xe_walk->vma, pat_index, level); - pte |= xe_walk->default_pte; + if (xe_walk->clear_pt) { + pte = 0; + } else { + pte = vm->pt_ops->pte_encode_vma(is_null ? 0 : + xe_res_dma(curs) + xe_walk->dma_offset, + xe_walk->vma, pat_index, level); + pte |= xe_walk->default_pte; - /* - * Set the XE_PTE_PS64 hint if possible, otherwise if - * this device *requires* 64K PTE size for VRAM, fail. - */ - if (level == 0 && !xe_parent->is_compact) { - if (xe_pt_is_pte_ps64K(addr, next, xe_walk)) { - xe_walk->vma->gpuva.flags |= XE_VMA_PTE_64K; - pte |= XE_PTE_PS64; - } else if (XE_WARN_ON(xe_walk->needs_64K)) { - return -EINVAL; + /* + * Set the XE_PTE_PS64 hint if possible, otherwise if + * this device *requires* 64K PTE size for VRAM, fail. 
+ */ + if (level == 0 && !xe_parent->is_compact) { + if (xe_pt_is_pte_ps64K(addr, next, xe_walk)) { + xe_walk->vma->gpuva.flags |= XE_VMA_PTE_64K; + pte |= XE_PTE_PS64; + } else if (XE_WARN_ON(xe_walk->needs_64K)) { + return -EINVAL; + } } } @@ -519,7 +525,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset, if (unlikely(ret)) return ret; - if (!is_null) + if (!is_null && !xe_walk->clear_pt) xe_res_next(curs, next - addr); xe_walk->va_curs_start = next; xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << level); @@ -589,6 +595,7 @@ static const struct xe_pt_walk_ops xe_pt_stage_bind_ops = { * @vma: The vma indicating the address range. * @entries: Storage for the update entries used for connecting the tree to * the main tree at commit time. + * @clear_pt: Clear the page table entries. * @num_entries: On output contains the number of @entries used. * * This function builds a disconnected page-table tree for a given address @@ -602,7 +609,8 @@ static const struct xe_pt_walk_ops xe_pt_stage_bind_ops = { */ static int xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, - struct xe_vm_pgtable_update *entries, u32 *num_entries) + struct xe_vm_pgtable_update *entries, + bool clear_pt, u32 *num_entries) { struct xe_device *xe = tile_to_xe(tile); struct xe_bo *bo = xe_vma_bo(vma); @@ -622,10 +630,19 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, .vma = vma, .wupd.entries = entries, .needs_64K = (xe_vma_vm(vma)->flags & XE_VM_FLAG_64K) && is_devmem, + .clear_pt = clear_pt, }; struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id]; int ret; + if (clear_pt) { + ret = xe_pt_walk_range(&pt->base, pt->level, xe_vma_start(vma), + xe_vma_end(vma), &xe_walk.base); + + *num_entries = xe_walk.wupd.num_used_entries; + return ret; + } + /** * Default atomic expectations for different allocation scenarios are as follows: * @@ -981,12 +998,14 @@ static void xe_pt_free_bind(struct xe_vm_pgtable_update *entries, static int xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma, - struct xe_vm_pgtable_update *entries, u32 *num_entries) + struct xe_vm_pgtable_update *entries, + bool invalidate_on_bind, u32 *num_entries) { int err; *num_entries = 0; - err = xe_pt_stage_bind(tile, vma, entries, num_entries); + err = xe_pt_stage_bind(tile, vma, entries, invalidate_on_bind, + num_entries); if (!err) xe_tile_assert(tile, *num_entries); @@ -1661,6 +1680,7 @@ static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile, return err; err = xe_pt_prepare_bind(tile, vma, pt_op->entries, + pt_update_ops->invalidate_on_bind, &pt_op->num_entries); if (!err) { xe_tile_assert(tile, pt_op->num_entries <= @@ -1685,7 +1705,7 @@ static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile, * it needs to be done here. 
*/ if ((!pt_op->rebind && xe_vm_has_scratch(vm) && - xe_vm_in_preempt_fence_mode(vm))) + xe_vm_in_preempt_fence_mode(vm)) || pt_update_ops->invalidate_on_bind) pt_update_ops->needs_invalidation = true; else if (pt_op->rebind && !xe_vm_in_lr_mode(vm)) /* We bump also if batch_invalidate_tlb is true */ @@ -1759,9 +1779,13 @@ static int op_prepare(struct xe_vm *vm, switch (op->base.op) { case DRM_GPUVA_OP_MAP: - if (!op->map.immediate && xe_vm_in_fault_mode(vm)) + if (!op->map.immediate && xe_vm_in_fault_mode(vm) && + !op->map.invalidate_on_bind) break; + if (op->map.invalidate_on_bind) + pt_update_ops->invalidate_on_bind = true; + err = bind_op_prepare(vm, tile, pt_update_ops, op->map.vma); pt_update_ops->wait_vm_kernel = true; break; @@ -1871,6 +1895,8 @@ static void bind_op_commit(struct xe_vm *vm, struct xe_tile *tile, } vma->tile_present |= BIT(tile->id); vma->tile_staged &= ~BIT(tile->id); + if (pt_update_ops->invalidate_on_bind) + vma->tile_invalidated |= BIT(tile->id); if (xe_vma_is_userptr(vma)) { lockdep_assert_held_read(&vm->userptr.notifier_lock); to_userptr_vma(vma)->userptr.initial_bind = true; diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h index 384cc04de719..3d0aa2a5102e 100644 --- a/drivers/gpu/drm/xe/xe_pt_types.h +++ b/drivers/gpu/drm/xe/xe_pt_types.h @@ -108,6 +108,8 @@ struct xe_vm_pgtable_update_ops { bool needs_userptr_lock; /** @needs_invalidation: Needs invalidation */ bool needs_invalidation; + /** @invalidate_on_bind: Invalidate the range before bind */ + bool invalidate_on_bind; /** * @wait_vm_bookkeep: PT operations need to wait until VM is idle * (bookkeep dma-resv slots are idle) and stage all future VM activity diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index d664f2e418b2..813d893d9b63 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -1921,6 +1921,23 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op) } #endif +static bool __xe_vm_needs_clear_scratch_pages(struct xe_vm *vm, u32 bind_flags) +{ + if (!xe_vm_in_fault_mode(vm)) + return false; + + if (!NEEDS_SCRATCH(vm->xe)) + return false; + + if (!xe_vm_has_scratch(vm)) + return false; + + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) + return false; + + return true; +} + /* * Create operations list from IOCTL arguments, setup operations fields so parse * and commit steps are decoupled from IOCTL arguments. This step can fail. 
@@ -1991,6 +2008,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo, op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL; op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE; op->map.pat_index = pat_index; + op->map.invalidate_on_bind = + __xe_vm_needs_clear_scratch_pages(vm, flags); } else if (__op->op == DRM_GPUVA_OP_PREFETCH) { op->prefetch.region = prefetch_region; } @@ -2188,7 +2207,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops, return PTR_ERR(vma); op->map.vma = vma; - if (op->map.immediate || !xe_vm_in_fault_mode(vm)) + if (op->map.immediate || !xe_vm_in_fault_mode(vm) || + op->map.invalidate_on_bind) xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask); break; @@ -2416,9 +2436,10 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm, switch (op->base.op) { case DRM_GPUVA_OP_MAP: - err = vma_lock_and_validate(exec, op->map.vma, - !xe_vm_in_fault_mode(vm) || - op->map.immediate); + if (!op->map.invalidate_on_bind) + err = vma_lock_and_validate(exec, op->map.vma, + !xe_vm_in_fault_mode(vm) || + op->map.immediate); break; case DRM_GPUVA_OP_REMAP: err = check_ufence(gpuva_to_vma(op->base.remap.unmap->va)); diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h index 52467b9b5348..dace04f4ea5e 100644 --- a/drivers/gpu/drm/xe/xe_vm_types.h +++ b/drivers/gpu/drm/xe/xe_vm_types.h @@ -297,6 +297,8 @@ struct xe_vma_op_map { bool is_null; /** @dumpable: whether BO is dumped on GPU hang */ bool dumpable; + /** @invalidate: invalidate the VMA before bind */ + bool invalidate_on_bind; /** @pat_index: The pat index to use for this operation. */ u16 pat_index; }; -- 2.26.3 ^ permalink raw reply related [flat|nested] 32+ messages in thread
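From the userspace side, the case this patch changes is a plain (non-IMMEDIATE) map on a faulting VM that has scratch enabled. The sketch below is written from memory of the xe uAPI, and vm_id, bo_handle and pat_index are assumed to come from earlier setup, so treat the field names as assumptions rather than a verified example.

	/* Sketch only: bind without DRM_XE_VM_BIND_FLAG_IMMEDIATE.  With
	 * this patch the covered range has its scratch PTEs cleared (and
	 * the TLB invalidated) at bind time, so the first GPU access
	 * takes a recoverable fault instead of silently hitting scratch. */
	struct drm_xe_vm_bind bind = {
		.vm_id = vm_id,			/* fault-mode VM with scratch */
		.num_binds = 1,
		.bind = {
			.obj = bo_handle,
			.obj_offset = 0,
			.range = 0x10000,
			.addr = 0x1a00000,
			.op = DRM_XE_VM_BIND_OP_MAP,
			.flags = 0,		/* no IMMEDIATE flag */
			.pat_index = pat_index,
		},
	};

	ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind);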
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-02-04 18:45 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng @ 2025-02-05 14:35 ` Thomas Hellström 2025-02-06 1:52 ` Zeng, Oak 2025-02-06 10:34 ` Matthew Brost 1 sibling, 1 reply; 32+ messages in thread From: Thomas Hellström @ 2025-02-05 14:35 UTC (permalink / raw) To: Oak Zeng, intel-xe; +Cc: matthew.brost, jonathan.cavitt On Tue, 2025-02-04 at 13:45 -0500, Oak Zeng wrote: > When a vm runs under fault mode, if scratch page is enabled, we need > to clear the scratch page mapping before vm_bind for the vm_bind > address range. Under fault mode, we depend on recoverable page fault > to establish mapping in page table. If scratch page is not cleared, > GPU access of address won't cause page fault because it always hits > the existing scratch page mapping. I think we need to ephasize that we clear AT vm_bind time, not before, both in the commit description and the patch title. > > When vm_bind with IMMEDIATE flag, there is no need of clearing as > immediate bind can overwrite the scratch page mapping. > > So far only is xe2 and xe3 products are allowed to enable scratch > page > under fault mode. On other platform we don't allow scratch page under > fault mode, so no need of such clearing. > > v2: Rework vm_bind pipeline to clear scratch page mapping. This is > similar > to a map operation, with the exception that PTEs are cleared instead > of > pointing to valid physical pages. (Matt, Thomas) > > TLB invalidation is needed after clear scratch page mapping as larger > scratch page mapping could be backed by physical page and cached in > TLB. (Matt, Thomas) > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > --- > drivers/gpu/drm/xe/xe_pt.c | 66 ++++++++++++++++++++++-------- > -- > drivers/gpu/drm/xe/xe_pt_types.h | 2 + > drivers/gpu/drm/xe/xe_vm.c | 29 ++++++++++++-- > drivers/gpu/drm/xe/xe_vm_types.h | 2 + > 4 files changed, 75 insertions(+), 24 deletions(-) > > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c > index 1ddcc7e79a93..3fd0ae2dbe7d 100644 > --- a/drivers/gpu/drm/xe/xe_pt.c > +++ b/drivers/gpu/drm/xe/xe_pt.c > @@ -268,6 +268,8 @@ struct xe_pt_stage_bind_walk { > * granularity. > */ > bool needs_64K; > + /* @clear_pt: clear page table entries during the bind walk > */ > + bool clear_pt; > /** > * @vma: VMA being mapped > */ > @@ -497,21 +499,25 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, > pgoff_t offset, > > XE_WARN_ON(xe_walk->va_curs_start != addr); > > - pte = vm->pt_ops->pte_encode_vma(is_null ? 0 : > - xe_res_dma(curs) + > xe_walk->dma_offset, > - xe_walk->vma, > pat_index, level); > - pte |= xe_walk->default_pte; > + if (xe_walk->clear_pt) { > + pte = 0; > + } else { > + pte = vm->pt_ops->pte_encode_vma(is_null ? 0 > : > + xe_res_dma(curs) + xe_walk- > >dma_offset, > + xe_walk->vma, pat_index, > level); > + pte |= xe_walk->default_pte; > > - /* > - * Set the XE_PTE_PS64 hint if possible, otherwise > if > - * this device *requires* 64K PTE size for VRAM, > fail. > - */ > - if (level == 0 && !xe_parent->is_compact) { > - if (xe_pt_is_pte_ps64K(addr, next, xe_walk)) > { > - xe_walk->vma->gpuva.flags |= > XE_VMA_PTE_64K; > - pte |= XE_PTE_PS64; > - } else if (XE_WARN_ON(xe_walk->needs_64K)) { > - return -EINVAL; > + /* > + * Set the XE_PTE_PS64 hint if possible, > otherwise if > + * this device *requires* 64K PTE size for > VRAM, fail. 
> + */ > + if (level == 0 && !xe_parent->is_compact) { > + if (xe_pt_is_pte_ps64K(addr, next, > xe_walk)) { > + xe_walk->vma->gpuva.flags |= > XE_VMA_PTE_64K; > + pte |= XE_PTE_PS64; > + } else if (XE_WARN_ON(xe_walk- > >needs_64K)) { > + return -EINVAL; > + } > } > } > I think there might be some things missing here. The bind should look exactly as a is_null bind, with the exception that the pte should be 0, so might be able to benefit from is_null, for example xe_pg_hugepte_possible() short-circuit the dma-address check. With clear_pt we don't want to use the XE_PTE_PS64K, though, like you identified above. Will get back with a look at the invalidation. > @@ -519,7 +525,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, > pgoff_t offset, > if (unlikely(ret)) > return ret; > > - if (!is_null) > + if (!is_null && !xe_walk->clear_pt) > xe_res_next(curs, next - addr); > xe_walk->va_curs_start = next; > xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << > level); > @@ -589,6 +595,7 @@ static const struct xe_pt_walk_ops > xe_pt_stage_bind_ops = { > * @vma: The vma indicating the address range. > * @entries: Storage for the update entries used for connecting the > tree to > * the main tree at commit time. > + * @clear_pt: Clear the page table entries. > * @num_entries: On output contains the number of @entries used. > * > * This function builds a disconnected page-table tree for a given > address > @@ -602,7 +609,8 @@ static const struct xe_pt_walk_ops > xe_pt_stage_bind_ops = { > */ > static int > xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, > - struct xe_vm_pgtable_update *entries, u32 > *num_entries) > + struct xe_vm_pgtable_update *entries, > + bool clear_pt, u32 *num_entries) > { > struct xe_device *xe = tile_to_xe(tile); > struct xe_bo *bo = xe_vma_bo(vma); > @@ -622,10 +630,19 @@ xe_pt_stage_bind(struct xe_tile *tile, struct > xe_vma *vma, > .vma = vma, > .wupd.entries = entries, > .needs_64K = (xe_vma_vm(vma)->flags & > XE_VM_FLAG_64K) && is_devmem, > + .clear_pt = clear_pt, > }; > struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id]; > int ret; > > + if (clear_pt) { > + ret = xe_pt_walk_range(&pt->base, pt->level, > xe_vma_start(vma), > + xe_vma_end(vma), > &xe_walk.base); > + > + *num_entries = xe_walk.wupd.num_used_entries; > + return ret; > + } > + > /** > * Default atomic expectations for different allocation > scenarios are as follows: > * > @@ -981,12 +998,14 @@ static void xe_pt_free_bind(struct > xe_vm_pgtable_update *entries, > > static int > xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma, > - struct xe_vm_pgtable_update *entries, u32 > *num_entries) > + struct xe_vm_pgtable_update *entries, > + bool invalidate_on_bind, u32 *num_entries) > { > int err; > > *num_entries = 0; > - err = xe_pt_stage_bind(tile, vma, entries, num_entries); > + err = xe_pt_stage_bind(tile, vma, entries, > invalidate_on_bind, > + num_entries); > if (!err) > xe_tile_assert(tile, *num_entries); > > @@ -1661,6 +1680,7 @@ static int bind_op_prepare(struct xe_vm *vm, > struct xe_tile *tile, > return err; > > err = xe_pt_prepare_bind(tile, vma, pt_op->entries, > + pt_update_ops->invalidate_on_bind, > &pt_op->num_entries); > if (!err) { > xe_tile_assert(tile, pt_op->num_entries <= > @@ -1685,7 +1705,7 @@ static int bind_op_prepare(struct xe_vm *vm, > struct xe_tile *tile, > * it needs to be done here. 
> */ > if ((!pt_op->rebind && xe_vm_has_scratch(vm) && > - xe_vm_in_preempt_fence_mode(vm))) > + xe_vm_in_preempt_fence_mode(vm)) || > pt_update_ops->invalidate_on_bind) > pt_update_ops->needs_invalidation = true; > else if (pt_op->rebind && !xe_vm_in_lr_mode(vm)) > /* We bump also if batch_invalidate_tlb is > true */ > @@ -1759,9 +1779,13 @@ static int op_prepare(struct xe_vm *vm, > > switch (op->base.op) { > case DRM_GPUVA_OP_MAP: > - if (!op->map.immediate && xe_vm_in_fault_mode(vm)) > + if (!op->map.immediate && xe_vm_in_fault_mode(vm) && > + !op->map.invalidate_on_bind) > break; > > + if (op->map.invalidate_on_bind) > + pt_update_ops->invalidate_on_bind = true; > + > err = bind_op_prepare(vm, tile, pt_update_ops, op- > >map.vma); > pt_update_ops->wait_vm_kernel = true; > break; > @@ -1871,6 +1895,8 @@ static void bind_op_commit(struct xe_vm *vm, > struct xe_tile *tile, > } > vma->tile_present |= BIT(tile->id); > vma->tile_staged &= ~BIT(tile->id); > + if (pt_update_ops->invalidate_on_bind) > + vma->tile_invalidated |= BIT(tile->id); > if (xe_vma_is_userptr(vma)) { > lockdep_assert_held_read(&vm- > >userptr.notifier_lock); > to_userptr_vma(vma)->userptr.initial_bind = true; > diff --git a/drivers/gpu/drm/xe/xe_pt_types.h > b/drivers/gpu/drm/xe/xe_pt_types.h > index 384cc04de719..3d0aa2a5102e 100644 > --- a/drivers/gpu/drm/xe/xe_pt_types.h > +++ b/drivers/gpu/drm/xe/xe_pt_types.h > @@ -108,6 +108,8 @@ struct xe_vm_pgtable_update_ops { > bool needs_userptr_lock; > /** @needs_invalidation: Needs invalidation */ > bool needs_invalidation; > + /** @invalidate_on_bind: Invalidate the range before bind */ > + bool invalidate_on_bind; > /** > * @wait_vm_bookkeep: PT operations need to wait until VM is > idle > * (bookkeep dma-resv slots are idle) and stage all future > VM activity > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c > index d664f2e418b2..813d893d9b63 100644 > --- a/drivers/gpu/drm/xe/xe_vm.c > +++ b/drivers/gpu/drm/xe/xe_vm.c > @@ -1921,6 +1921,23 @@ static void print_op(struct xe_device *xe, > struct drm_gpuva_op *op) > } > #endif > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_vm *vm, u32 > bind_flags) > +{ > + if (!xe_vm_in_fault_mode(vm)) > + return false; > + > + if (!NEEDS_SCRATCH(vm->xe)) > + return false; > + > + if (!xe_vm_has_scratch(vm)) > + return false; > + > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > + return false; > + > + return true; > +} > + > /* > * Create operations list from IOCTL arguments, setup operations > fields so parse > * and commit steps are decoupled from IOCTL arguments. This step > can fail. 
> @@ -1991,6 +2008,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, > struct xe_bo *bo, > op->map.is_null = flags & > DRM_XE_VM_BIND_FLAG_NULL; > op->map.dumpable = flags & > DRM_XE_VM_BIND_FLAG_DUMPABLE; > op->map.pat_index = pat_index; > + op->map.invalidate_on_bind = > + __xe_vm_needs_clear_scratch_pages(vm > , flags); > } else if (__op->op == DRM_GPUVA_OP_PREFETCH) { > op->prefetch.region = prefetch_region; > } > @@ -2188,7 +2207,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm > *vm, struct drm_gpuva_ops *ops, > return PTR_ERR(vma); > > op->map.vma = vma; > - if (op->map.immediate || > !xe_vm_in_fault_mode(vm)) > + if (op->map.immediate || > !xe_vm_in_fault_mode(vm) || > + op->map.invalidate_on_bind) > xe_vma_ops_incr_pt_update_ops(vops, > op- > >tile_mask); > break; > @@ -2416,9 +2436,10 @@ static int op_lock_and_prep(struct drm_exec > *exec, struct xe_vm *vm, > > switch (op->base.op) { > case DRM_GPUVA_OP_MAP: > - err = vma_lock_and_validate(exec, op->map.vma, > - !xe_vm_in_fault_mode(vm) > || > - op->map.immediate); > + if (!op->map.invalidate_on_bind) > + err = vma_lock_and_validate(exec, op- > >map.vma, > + > !xe_vm_in_fault_mode(vm) || > + op- > >map.immediate); > break; > case DRM_GPUVA_OP_REMAP: > err = check_ufence(gpuva_to_vma(op- > >base.remap.unmap->va)); > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h > b/drivers/gpu/drm/xe/xe_vm_types.h > index 52467b9b5348..dace04f4ea5e 100644 > --- a/drivers/gpu/drm/xe/xe_vm_types.h > +++ b/drivers/gpu/drm/xe/xe_vm_types.h > @@ -297,6 +297,8 @@ struct xe_vma_op_map { > bool is_null; > /** @dumpable: whether BO is dumped on GPU hang */ > bool dumpable; > + /** @invalidate: invalidate the VMA before bind */ > + bool invalidate_on_bind; > /** @pat_index: The pat index to use for this operation. */ > u16 pat_index; > }; ^ permalink raw reply [flat|nested] 32+ messages in thread
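Thomas's suggestion above, letting a clearing bind take the same short-cuts as an is_null bind in the huge-PTE check, could look roughly like the fragment below. This is a sketch with a placeholder helper name; only xe_vma_is_null(), IS_ALIGNED() and the new clear_pt field are taken from the thread, and it is not the follow-up patch itself.

/* Sketch only: simplified stand-in for the huge-PTE check in the bind
 * walk.  The early-out is the point: cleared PTEs, like null ones,
 * have no backing pages, so there is no DMA address whose alignment
 * or contiguity needs validating. */
static bool hugepte_possible(u64 addr, u64 next, unsigned int level,
			     struct xe_pt_stage_bind_walk *xe_walk)
{
	u64 size = 1ull << (12 + 9 * level);	/* 4 KiB / 2 MiB / 1 GiB */

	/* The VA range must cover exactly one whole entry at @level. */
	if (next - addr != size || !IS_ALIGNED(addr, size))
		return false;

	/* Null and cleared entries carry no backing memory. */
	if (xe_vma_is_null(xe_walk->vma) || xe_walk->clear_pt)
		return true;

	/* Placeholder for the real resource-cursor checks: the DMA
	 * address must be huge-PTE aligned and the segment contiguous. */
	return dma_range_aligned_and_contiguous(xe_walk, addr, next, size);
}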
* RE: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-02-05 14:35 ` Thomas Hellström @ 2025-02-06 1:52 ` Zeng, Oak 2025-02-06 12:51 ` Thomas Hellström 0 siblings, 1 reply; 32+ messages in thread From: Zeng, Oak @ 2025-02-06 1:52 UTC (permalink / raw) To: Thomas Hellström, intel-xe@lists.freedesktop.org Cc: Brost, Matthew, Cavitt, Jonathan > -----Original Message----- > From: Thomas Hellström <thomas.hellstrom@linux.intel.com> > Sent: February 5, 2025 9:36 AM > To: Zeng, Oak <oak.zeng@intel.com>; intel-xe@lists.freedesktop.org > Cc: Brost, Matthew <matthew.brost@intel.com>; Cavitt, Jonathan > <jonathan.cavitt@intel.com> > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind > > On Tue, 2025-02-04 at 13:45 -0500, Oak Zeng wrote: > > When a vm runs under fault mode, if scratch page is enabled, we > need > > to clear the scratch page mapping before vm_bind for the vm_bind > > address range. Under fault mode, we depend on recoverable page > fault > > to establish mapping in page table. If scratch page is not cleared, > > GPU access of address won't cause page fault because it always hits > > the existing scratch page mapping. > > I think we need to ephasize that we clear AT vm_bind time, not > before, > both in the commit description and the patch title. Will fix. > > > > > When vm_bind with IMMEDIATE flag, there is no need of clearing as > > immediate bind can overwrite the scratch page mapping. > > > > So far only is xe2 and xe3 products are allowed to enable scratch > > page > > under fault mode. On other platform we don't allow scratch page > under > > fault mode, so no need of such clearing. > > > > v2: Rework vm_bind pipeline to clear scratch page mapping. This is > > similar > > to a map operation, with the exception that PTEs are cleared > instead > > of > > pointing to valid physical pages. (Matt, Thomas) > > > > TLB invalidation is needed after clear scratch page mapping as larger > > scratch page mapping could be backed by physical page and cached > in > > TLB. (Matt, Thomas) > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > --- > > drivers/gpu/drm/xe/xe_pt.c | 66 ++++++++++++++++++++++-- > ------ > > -- > > drivers/gpu/drm/xe/xe_pt_types.h | 2 + > > drivers/gpu/drm/xe/xe_vm.c | 29 ++++++++++++-- > > drivers/gpu/drm/xe/xe_vm_types.h | 2 + > > 4 files changed, 75 insertions(+), 24 deletions(-) > > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c > b/drivers/gpu/drm/xe/xe_pt.c > > index 1ddcc7e79a93..3fd0ae2dbe7d 100644 > > --- a/drivers/gpu/drm/xe/xe_pt.c > > +++ b/drivers/gpu/drm/xe/xe_pt.c > > @@ -268,6 +268,8 @@ struct xe_pt_stage_bind_walk { > > * granularity. > > */ > > bool needs_64K; > > + /* @clear_pt: clear page table entries during the bind walk > > */ > > + bool clear_pt; > > /** > > * @vma: VMA being mapped > > */ > > @@ -497,21 +499,25 @@ xe_pt_stage_bind_entry(struct xe_ptw > *parent, > > pgoff_t offset, > > > > XE_WARN_ON(xe_walk->va_curs_start != addr); > > > > - pte = vm->pt_ops->pte_encode_vma(is_null ? 0 : > > - xe_res_dma(curs) + > > xe_walk->dma_offset, > > - xe_walk->vma, > > pat_index, level); > > - pte |= xe_walk->default_pte; > > + if (xe_walk->clear_pt) { > > + pte = 0; > > + } else { > > + pte = vm->pt_ops->pte_encode_vma(is_null ? > 0 > > : > > + xe_res_dma(curs) + xe_walk- > > >dma_offset, > > + xe_walk->vma, pat_index, > > level); > > + pte |= xe_walk->default_pte; > > > > - /* > > - * Set the XE_PTE_PS64 hint if possible, otherwise > > if > > - * this device *requires* 64K PTE size for VRAM, > > fail. 
> > - */ > > - if (level == 0 && !xe_parent->is_compact) { > > - if (xe_pt_is_pte_ps64K(addr, next, xe_walk)) > > { > > - xe_walk->vma->gpuva.flags |= > > XE_VMA_PTE_64K; > > - pte |= XE_PTE_PS64; > > - } else if (XE_WARN_ON(xe_walk- > >needs_64K)) { > > - return -EINVAL; > > + /* > > + * Set the XE_PTE_PS64 hint if possible, > > otherwise if > > + * this device *requires* 64K PTE size for > > VRAM, fail. > > + */ > > + if (level == 0 && !xe_parent->is_compact) { > > + if (xe_pt_is_pte_ps64K(addr, next, > > xe_walk)) { > > + xe_walk->vma->gpuva.flags > |= > > XE_VMA_PTE_64K; > > + pte |= XE_PTE_PS64; > > + } else if (XE_WARN_ON(xe_walk- > > >needs_64K)) { > > + return -EINVAL; > > + } > > } > > } > > > > I think there might be some things missing here. > The bind should look exactly as a is_null bind, with the exception that > the pte should be 0, so might be able to benefit from is_null, for > example xe_pg_hugepte_possible() short-circuit the dma-address > check. You are right. Above code has a problem for huge pte cases: it wouldn't detect A huge pte for clear_pt case. I will fix it. I will not make is_null be true for clear_pt case, but will make similar logic to Let clear_pt benefit from is_null logic. > With clear_pt we don't want to use the XE_PTE_PS64K, though, like > you > identified above. > > Will get back with a look at the invalidation. I don't quite get here. Please do let me know if you find anything. Oak > > > > @@ -519,7 +525,7 @@ xe_pt_stage_bind_entry(struct xe_ptw > *parent, > > pgoff_t offset, > > if (unlikely(ret)) > > return ret; > > > > - if (!is_null) > > + if (!is_null && !xe_walk->clear_pt) > > xe_res_next(curs, next - addr); > > xe_walk->va_curs_start = next; > > xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << > > level); > > @@ -589,6 +595,7 @@ static const struct xe_pt_walk_ops > > xe_pt_stage_bind_ops = { > > * @vma: The vma indicating the address range. > > * @entries: Storage for the update entries used for connecting the > > tree to > > * the main tree at commit time. > > + * @clear_pt: Clear the page table entries. > > * @num_entries: On output contains the number of @entries > used. 
> > * > > * This function builds a disconnected page-table tree for a given > > address > > @@ -602,7 +609,8 @@ static const struct xe_pt_walk_ops > > xe_pt_stage_bind_ops = { > > */ > > static int > > xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, > > - struct xe_vm_pgtable_update *entries, u32 > > *num_entries) > > + struct xe_vm_pgtable_update *entries, > > + bool clear_pt, u32 *num_entries) > > { > > struct xe_device *xe = tile_to_xe(tile); > > struct xe_bo *bo = xe_vma_bo(vma); > > @@ -622,10 +630,19 @@ xe_pt_stage_bind(struct xe_tile *tile, > struct > > xe_vma *vma, > > .vma = vma, > > .wupd.entries = entries, > > .needs_64K = (xe_vma_vm(vma)->flags & > > XE_VM_FLAG_64K) && is_devmem, > > + .clear_pt = clear_pt, > > }; > > struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id]; > > int ret; > > > > + if (clear_pt) { > > + ret = xe_pt_walk_range(&pt->base, pt->level, > > xe_vma_start(vma), > > + xe_vma_end(vma), > > &xe_walk.base); > > + > > + *num_entries = xe_walk.wupd.num_used_entries; > > + return ret; > > + } > > + > > /** > > * Default atomic expectations for different allocation > > scenarios are as follows: > > * > > @@ -981,12 +998,14 @@ static void xe_pt_free_bind(struct > > xe_vm_pgtable_update *entries, > > > > static int > > xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma, > > - struct xe_vm_pgtable_update *entries, u32 > > *num_entries) > > + struct xe_vm_pgtable_update *entries, > > + bool invalidate_on_bind, u32 *num_entries) > > { > > int err; > > > > *num_entries = 0; > > - err = xe_pt_stage_bind(tile, vma, entries, num_entries); > > + err = xe_pt_stage_bind(tile, vma, entries, > > invalidate_on_bind, > > + num_entries); > > if (!err) > > xe_tile_assert(tile, *num_entries); > > > > @@ -1661,6 +1680,7 @@ static int bind_op_prepare(struct xe_vm > *vm, > > struct xe_tile *tile, > > return err; > > > > err = xe_pt_prepare_bind(tile, vma, pt_op->entries, > > + pt_update_ops->invalidate_on_bind, > > &pt_op->num_entries); > > if (!err) { > > xe_tile_assert(tile, pt_op->num_entries <= > > @@ -1685,7 +1705,7 @@ static int bind_op_prepare(struct xe_vm > *vm, > > struct xe_tile *tile, > > * it needs to be done here. 
> > */ > > if ((!pt_op->rebind && xe_vm_has_scratch(vm) && > > - xe_vm_in_preempt_fence_mode(vm))) > > + xe_vm_in_preempt_fence_mode(vm)) || > > pt_update_ops->invalidate_on_bind) > > pt_update_ops->needs_invalidation = true; > > else if (pt_op->rebind && !xe_vm_in_lr_mode(vm)) > > /* We bump also if batch_invalidate_tlb is > > true */ > > @@ -1759,9 +1779,13 @@ static int op_prepare(struct xe_vm *vm, > > > > switch (op->base.op) { > > case DRM_GPUVA_OP_MAP: > > - if (!op->map.immediate && > xe_vm_in_fault_mode(vm)) > > + if (!op->map.immediate && > xe_vm_in_fault_mode(vm) && > > + !op->map.invalidate_on_bind) > > break; > > > > + if (op->map.invalidate_on_bind) > > + pt_update_ops->invalidate_on_bind = true; > > + > > err = bind_op_prepare(vm, tile, pt_update_ops, op- > > >map.vma); > > pt_update_ops->wait_vm_kernel = true; > > break; > > @@ -1871,6 +1895,8 @@ static void bind_op_commit(struct xe_vm > *vm, > > struct xe_tile *tile, > > } > > vma->tile_present |= BIT(tile->id); > > vma->tile_staged &= ~BIT(tile->id); > > + if (pt_update_ops->invalidate_on_bind) > > + vma->tile_invalidated |= BIT(tile->id); > > if (xe_vma_is_userptr(vma)) { > > lockdep_assert_held_read(&vm- > > >userptr.notifier_lock); > > to_userptr_vma(vma)->userptr.initial_bind = true; > > diff --git a/drivers/gpu/drm/xe/xe_pt_types.h > > b/drivers/gpu/drm/xe/xe_pt_types.h > > index 384cc04de719..3d0aa2a5102e 100644 > > --- a/drivers/gpu/drm/xe/xe_pt_types.h > > +++ b/drivers/gpu/drm/xe/xe_pt_types.h > > @@ -108,6 +108,8 @@ struct xe_vm_pgtable_update_ops { > > bool needs_userptr_lock; > > /** @needs_invalidation: Needs invalidation */ > > bool needs_invalidation; > > + /** @invalidate_on_bind: Invalidate the range before bind */ > > + bool invalidate_on_bind; > > /** > > * @wait_vm_bookkeep: PT operations need to wait until VM > is > > idle > > * (bookkeep dma-resv slots are idle) and stage all future > > VM activity > > diff --git a/drivers/gpu/drm/xe/xe_vm.c > b/drivers/gpu/drm/xe/xe_vm.c > > index d664f2e418b2..813d893d9b63 100644 > > --- a/drivers/gpu/drm/xe/xe_vm.c > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > @@ -1921,6 +1921,23 @@ static void print_op(struct xe_device *xe, > > struct drm_gpuva_op *op) > > } > > #endif > > > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_vm > *vm, u32 > > bind_flags) > > +{ > > + if (!xe_vm_in_fault_mode(vm)) > > + return false; > > + > > + if (!NEEDS_SCRATCH(vm->xe)) > > + return false; > > + > > + if (!xe_vm_has_scratch(vm)) > > + return false; > > + > > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > > + return false; > > + > > + return true; > > +} > > + > > /* > > * Create operations list from IOCTL arguments, setup operations > > fields so parse > > * and commit steps are decoupled from IOCTL arguments. This > step > > can fail. 
> > @@ -1991,6 +2008,8 @@ vm_bind_ioctl_ops_create(struct xe_vm > *vm, > > struct xe_bo *bo, > > op->map.is_null = flags & > > DRM_XE_VM_BIND_FLAG_NULL; > > op->map.dumpable = flags & > > DRM_XE_VM_BIND_FLAG_DUMPABLE; > > op->map.pat_index = pat_index; > > + op->map.invalidate_on_bind = > > + > __xe_vm_needs_clear_scratch_pages(vm > > , flags); > > } else if (__op->op == DRM_GPUVA_OP_PREFETCH) { > > op->prefetch.region = prefetch_region; > > } > > @@ -2188,7 +2207,8 @@ static int vm_bind_ioctl_ops_parse(struct > xe_vm > > *vm, struct drm_gpuva_ops *ops, > > return PTR_ERR(vma); > > > > op->map.vma = vma; > > - if (op->map.immediate || > > !xe_vm_in_fault_mode(vm)) > > + if (op->map.immediate || > > !xe_vm_in_fault_mode(vm) || > > + op->map.invalidate_on_bind) > > > xe_vma_ops_incr_pt_update_ops(vops, > > op- > > >tile_mask); > > break; > > @@ -2416,9 +2436,10 @@ static int op_lock_and_prep(struct > drm_exec > > *exec, struct xe_vm *vm, > > > > switch (op->base.op) { > > case DRM_GPUVA_OP_MAP: > > - err = vma_lock_and_validate(exec, op->map.vma, > > - !xe_vm_in_fault_mode(vm) > > || > > - op->map.immediate); > > + if (!op->map.invalidate_on_bind) > > + err = vma_lock_and_validate(exec, op- > > >map.vma, > > + > > !xe_vm_in_fault_mode(vm) || > > + op- > > >map.immediate); > > break; > > case DRM_GPUVA_OP_REMAP: > > err = check_ufence(gpuva_to_vma(op- > > >base.remap.unmap->va)); > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h > > b/drivers/gpu/drm/xe/xe_vm_types.h > > index 52467b9b5348..dace04f4ea5e 100644 > > --- a/drivers/gpu/drm/xe/xe_vm_types.h > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h > > @@ -297,6 +297,8 @@ struct xe_vma_op_map { > > bool is_null; > > /** @dumpable: whether BO is dumped on GPU hang */ > > bool dumpable; > > + /** @invalidate: invalidate the VMA before bind */ > > + bool invalidate_on_bind; > > /** @pat_index: The pat index to use for this operation. */ > > u16 pat_index; > > }; ^ permalink raw reply [flat|nested] 32+ messages in thread
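To make the behaviour being fixed easier to follow alongside the v2 plumbing quoted above, here is the scenario in pseudocode. The call names below are placeholders, not the xe uAPI:

	vm = vm_create(FAULT_MODE | SCRATCH_PAGE);  /* fault-driven binds, scratch enabled */
	gpu_access(vm, addr);                       /* hits a scratch PTE: reads zeroes, no fault */
	vm_bind(vm, bo, addr, range);               /* no IMMEDIATE flag: the mapping is deferred and
	                                             * expected to be created on a recoverable fault */
	gpu_access(vm, addr);                       /* without a clearing bind this still hits the
	                                             * scratch PTE (possibly cached in the TLB), so no
	                                             * fault is raised and the BO is never mapped */

With op->map.invalidate_on_bind set, the vm_bind above is staged like a normal map operation but with zeroed PTEs plus a TLB invalidation, so the second access faults and the fault handler can establish the real mapping.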
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-02-06 1:52 ` Zeng, Oak @ 2025-02-06 12:51 ` Thomas Hellström 2025-02-06 18:56 ` Zeng, Oak 0 siblings, 1 reply; 32+ messages in thread From: Thomas Hellström @ 2025-02-06 12:51 UTC (permalink / raw) To: Zeng, Oak, intel-xe@lists.freedesktop.org Cc: Brost, Matthew, Cavitt, Jonathan On Thu, 2025-02-06 at 01:52 +0000, Zeng, Oak wrote: > > > > -----Original Message----- > > From: Thomas Hellström <thomas.hellstrom@linux.intel.com> > > Sent: February 5, 2025 9:36 AM > > To: Zeng, Oak <oak.zeng@intel.com>; intel-xe@lists.freedesktop.org > > Cc: Brost, Matthew <matthew.brost@intel.com>; Cavitt, Jonathan > > <jonathan.cavitt@intel.com> > > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind > > > > On Tue, 2025-02-04 at 13:45 -0500, Oak Zeng wrote: > > > When a vm runs under fault mode, if scratch page is enabled, we > > need > > > to clear the scratch page mapping before vm_bind for the vm_bind > > > address range. Under fault mode, we depend on recoverable page > > fault > > > to establish mapping in page table. If scratch page is not > > > cleared, > > > GPU access of address won't cause page fault because it always > > > hits > > > the existing scratch page mapping. > > > > I think we need to ephasize that we clear AT vm_bind time, not > > before, > > both in the commit description and the patch title. > > Will fix. > > > > > > > > > When vm_bind with IMMEDIATE flag, there is no need of clearing as > > > immediate bind can overwrite the scratch page mapping. > > > > > > So far only is xe2 and xe3 products are allowed to enable scratch > > > page > > > under fault mode. On other platform we don't allow scratch page > > under > > > fault mode, so no need of such clearing. > > > > > > v2: Rework vm_bind pipeline to clear scratch page mapping. This > > > is > > > similar > > > to a map operation, with the exception that PTEs are cleared > > instead > > > of > > > pointing to valid physical pages. (Matt, Thomas) > > > > > > TLB invalidation is needed after clear scratch page mapping as > > > larger > > > scratch page mapping could be backed by physical page and cached > > in > > > TLB. (Matt, Thomas) > > > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > > --- > > > drivers/gpu/drm/xe/xe_pt.c | 66 ++++++++++++++++++++++-- > > ------ > > > -- > > > drivers/gpu/drm/xe/xe_pt_types.h | 2 + > > > drivers/gpu/drm/xe/xe_vm.c | 29 ++++++++++++-- > > > drivers/gpu/drm/xe/xe_vm_types.h | 2 + > > > 4 files changed, 75 insertions(+), 24 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c > > b/drivers/gpu/drm/xe/xe_pt.c > > > index 1ddcc7e79a93..3fd0ae2dbe7d 100644 > > > --- a/drivers/gpu/drm/xe/xe_pt.c > > > +++ b/drivers/gpu/drm/xe/xe_pt.c > > > @@ -268,6 +268,8 @@ struct xe_pt_stage_bind_walk { > > > * granularity. > > > */ > > > bool needs_64K; > > > + /* @clear_pt: clear page table entries during the bind > > > walk > > > */ > > > + bool clear_pt; > > > /** > > > * @vma: VMA being mapped > > > */ > > > @@ -497,21 +499,25 @@ xe_pt_stage_bind_entry(struct xe_ptw > > *parent, > > > pgoff_t offset, > > > > > > XE_WARN_ON(xe_walk->va_curs_start != addr); > > > > > > - pte = vm->pt_ops->pte_encode_vma(is_null ? 0 : > > > - > > > xe_res_dma(curs) + > > > xe_walk->dma_offset, > > > - xe_walk->vma, > > > pat_index, level); > > > - pte |= xe_walk->default_pte; > > > + if (xe_walk->clear_pt) { > > > + pte = 0; > > > + } else { > > > + pte = vm->pt_ops->pte_encode_vma(is_null > > > ? 
> > 0 > > > : > > > + xe_res_dma(curs) + > > > xe_walk- > > > > dma_offset, > > > + xe_walk->vma, pat_index, > > > level); > > > + pte |= xe_walk->default_pte; > > > > > > - /* > > > - * Set the XE_PTE_PS64 hint if possible, > > > otherwise > > > if > > > - * this device *requires* 64K PTE size for VRAM, > > > fail. > > > - */ > > > - if (level == 0 && !xe_parent->is_compact) { > > > - if (xe_pt_is_pte_ps64K(addr, next, > > > xe_walk)) > > > { > > > - xe_walk->vma->gpuva.flags |= > > > XE_VMA_PTE_64K; > > > - pte |= XE_PTE_PS64; > > > - } else if (XE_WARN_ON(xe_walk- > > > needs_64K)) { > > > - return -EINVAL; > > > + /* > > > + * Set the XE_PTE_PS64 hint if possible, > > > otherwise if > > > + * this device *requires* 64K PTE size > > > for > > > VRAM, fail. > > > + */ > > > + if (level == 0 && !xe_parent- > > > >is_compact) { > > > + if (xe_pt_is_pte_ps64K(addr, > > > next, > > > xe_walk)) { > > > + xe_walk->vma- > > > >gpuva.flags > > > = > > > XE_VMA_PTE_64K; > > > + pte |= XE_PTE_PS64; > > > + } else if (XE_WARN_ON(xe_walk- > > > > needs_64K)) { > > > + return -EINVAL; > > > + } > > > } > > > } > > > > > > > I think there might be some things missing here. > > The bind should look exactly as a is_null bind, with the exception > > that > > the pte should be 0, so might be able to benefit from is_null, for > > example xe_pg_hugepte_possible() short-circuit the dma-address > > check. > > You are right. Above code has a problem for huge pte cases: it > wouldn't detect > A huge pte for clear_pt case. I will fix it. > > I will not make is_null be true for clear_pt case, but will make > similar logic to > Let clear_pt benefit from is_null logic. > > > With clear_pt we don't want to use the XE_PTE_PS64K, though, like > > you > > identified above. > > > > Will get back with a look at the invalidation. > > I don't quite get here. Please do let me know if you find anything. Yes, I meant I hadn't had time to look at this part yet. For binding with scratch, it looks like *all* binds need TLB invalidation, not just clearing binds, So in the below code-path, can't we just replace the xe_vm_in_preempt_fence_mode() with xe_vm_in_lr_mode()? /* * If rebind, we have to invalidate TLB on !LR vms to invalidate * cached PTEs point to freed memory. On LR vms this is done * automatically when the context is re-enabled by the rebind worker, * or in fault mode it was invalidated on PTE zapping. * * If !rebind, and scratch enabled VMs, there is a chance the scratch * PTE is already cached in the TLB so it needs to be invalidated. * On !LR VMs this is done in the ring ops preceding a batch, but on * non-faulting LR, in particular on user-space batch buffer chaining, * it needs to be done here. */ if ((!pt_op->rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) /Thomas > > Oak > > > > > > > @@ -519,7 +525,7 @@ xe_pt_stage_bind_entry(struct xe_ptw > > *parent, > > > pgoff_t offset, > > > if (unlikely(ret)) > > > return ret; > > > > > > - if (!is_null) > > > + if (!is_null && !xe_walk->clear_pt) > > > xe_res_next(curs, next - addr); > > > xe_walk->va_curs_start = next; > > > xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << > > > level); > > > @@ -589,6 +595,7 @@ static const struct xe_pt_walk_ops > > > xe_pt_stage_bind_ops = { > > > * @vma: The vma indicating the address range. > > > * @entries: Storage for the update entries used for connecting > > > the > > > tree to > > > * the main tree at commit time. > > > + * @clear_pt: Clear the page table entries. 
> > > * @num_entries: On output contains the number of @entries > > used. > > > * > > > * This function builds a disconnected page-table tree for a > > > given > > > address > > > @@ -602,7 +609,8 @@ static const struct xe_pt_walk_ops > > > xe_pt_stage_bind_ops = { > > > */ > > > static int > > > xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, > > > - struct xe_vm_pgtable_update *entries, u32 > > > *num_entries) > > > + struct xe_vm_pgtable_update *entries, > > > + bool clear_pt, u32 *num_entries) > > > { > > > struct xe_device *xe = tile_to_xe(tile); > > > struct xe_bo *bo = xe_vma_bo(vma); > > > @@ -622,10 +630,19 @@ xe_pt_stage_bind(struct xe_tile *tile, > > struct > > > xe_vma *vma, > > > .vma = vma, > > > .wupd.entries = entries, > > > .needs_64K = (xe_vma_vm(vma)->flags & > > > XE_VM_FLAG_64K) && is_devmem, > > > + .clear_pt = clear_pt, > > > }; > > > struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id]; > > > int ret; > > > > > > + if (clear_pt) { > > > + ret = xe_pt_walk_range(&pt->base, pt->level, > > > xe_vma_start(vma), > > > + xe_vma_end(vma), > > > &xe_walk.base); > > > + > > > + *num_entries = xe_walk.wupd.num_used_entries; > > > + return ret; > > > + } > > > + > > > /** > > > * Default atomic expectations for different allocation > > > scenarios are as follows: > > > * > > > @@ -981,12 +998,14 @@ static void xe_pt_free_bind(struct > > > xe_vm_pgtable_update *entries, > > > > > > static int > > > xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma, > > > - struct xe_vm_pgtable_update *entries, u32 > > > *num_entries) > > > + struct xe_vm_pgtable_update *entries, > > > + bool invalidate_on_bind, u32 *num_entries) > > > { > > > int err; > > > > > > *num_entries = 0; > > > - err = xe_pt_stage_bind(tile, vma, entries, num_entries); > > > + err = xe_pt_stage_bind(tile, vma, entries, > > > invalidate_on_bind, > > > + num_entries); > > > if (!err) > > > xe_tile_assert(tile, *num_entries); > > > > > > @@ -1661,6 +1680,7 @@ static int bind_op_prepare(struct xe_vm > > *vm, > > > struct xe_tile *tile, > > > return err; > > > > > > err = xe_pt_prepare_bind(tile, vma, pt_op->entries, > > > + pt_update_ops- > > > >invalidate_on_bind, > > > &pt_op->num_entries); > > > if (!err) { > > > xe_tile_assert(tile, pt_op->num_entries <= > > > @@ -1685,7 +1705,7 @@ static int bind_op_prepare(struct xe_vm > > *vm, > > > struct xe_tile *tile, > > > * it needs to be done here. 
> > > */ > > > if ((!pt_op->rebind && xe_vm_has_scratch(vm) && > > > - xe_vm_in_preempt_fence_mode(vm))) > > > + xe_vm_in_preempt_fence_mode(vm)) || > > > pt_update_ops->invalidate_on_bind) > > > pt_update_ops->needs_invalidation = > > > true; > > > else if (pt_op->rebind && !xe_vm_in_lr_mode(vm)) > > > /* We bump also if batch_invalidate_tlb > > > is > > > true */ > > > @@ -1759,9 +1779,13 @@ static int op_prepare(struct xe_vm *vm, > > > > > > switch (op->base.op) { > > > case DRM_GPUVA_OP_MAP: > > > - if (!op->map.immediate && > > xe_vm_in_fault_mode(vm)) > > > + if (!op->map.immediate && > > xe_vm_in_fault_mode(vm) && > > > + !op->map.invalidate_on_bind) > > > break; > > > > > > + if (op->map.invalidate_on_bind) > > > + pt_update_ops->invalidate_on_bind = > > > true; > > > + > > > err = bind_op_prepare(vm, tile, pt_update_ops, > > > op- > > > > map.vma); > > > pt_update_ops->wait_vm_kernel = true; > > > break; > > > @@ -1871,6 +1895,8 @@ static void bind_op_commit(struct xe_vm > > *vm, > > > struct xe_tile *tile, > > > } > > > vma->tile_present |= BIT(tile->id); > > > vma->tile_staged &= ~BIT(tile->id); > > > + if (pt_update_ops->invalidate_on_bind) > > > + vma->tile_invalidated |= BIT(tile->id); > > > if (xe_vma_is_userptr(vma)) { > > > lockdep_assert_held_read(&vm- > > > > userptr.notifier_lock); > > > to_userptr_vma(vma)->userptr.initial_bind = > > > true; > > > diff --git a/drivers/gpu/drm/xe/xe_pt_types.h > > > b/drivers/gpu/drm/xe/xe_pt_types.h > > > index 384cc04de719..3d0aa2a5102e 100644 > > > --- a/drivers/gpu/drm/xe/xe_pt_types.h > > > +++ b/drivers/gpu/drm/xe/xe_pt_types.h > > > @@ -108,6 +108,8 @@ struct xe_vm_pgtable_update_ops { > > > bool needs_userptr_lock; > > > /** @needs_invalidation: Needs invalidation */ > > > bool needs_invalidation; > > > + /** @invalidate_on_bind: Invalidate the range before > > > bind */ > > > + bool invalidate_on_bind; > > > /** > > > * @wait_vm_bookkeep: PT operations need to wait until > > > VM > > is > > > idle > > > * (bookkeep dma-resv slots are idle) and stage all > > > future > > > VM activity > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c > > b/drivers/gpu/drm/xe/xe_vm.c > > > index d664f2e418b2..813d893d9b63 100644 > > > --- a/drivers/gpu/drm/xe/xe_vm.c > > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > > @@ -1921,6 +1921,23 @@ static void print_op(struct xe_device *xe, > > > struct drm_gpuva_op *op) > > > } > > > #endif > > > > > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_vm > > *vm, u32 > > > bind_flags) > > > +{ > > > + if (!xe_vm_in_fault_mode(vm)) > > > + return false; > > > + > > > + if (!NEEDS_SCRATCH(vm->xe)) > > > + return false; > > > + > > > + if (!xe_vm_has_scratch(vm)) > > > + return false; > > > + > > > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > > > + return false; > > > + > > > + return true; > > > +} > > > + > > > /* > > > * Create operations list from IOCTL arguments, setup operations > > > fields so parse > > > * and commit steps are decoupled from IOCTL arguments. This > > step > > > can fail. 
> > > @@ -1991,6 +2008,8 @@ vm_bind_ioctl_ops_create(struct xe_vm > > *vm, > > > struct xe_bo *bo, > > > op->map.is_null = flags & > > > DRM_XE_VM_BIND_FLAG_NULL; > > > op->map.dumpable = flags & > > > DRM_XE_VM_BIND_FLAG_DUMPABLE; > > > op->map.pat_index = pat_index; > > > + op->map.invalidate_on_bind = > > > + > > __xe_vm_needs_clear_scratch_pages(vm > > > , flags); > > > } else if (__op->op == DRM_GPUVA_OP_PREFETCH) { > > > op->prefetch.region = prefetch_region; > > > } > > > @@ -2188,7 +2207,8 @@ static int vm_bind_ioctl_ops_parse(struct > > xe_vm > > > *vm, struct drm_gpuva_ops *ops, > > > return PTR_ERR(vma); > > > > > > op->map.vma = vma; > > > - if (op->map.immediate || > > > !xe_vm_in_fault_mode(vm)) > > > + if (op->map.immediate || > > > !xe_vm_in_fault_mode(vm) || > > > + op->map.invalidate_on_bind) > > > > > xe_vma_ops_incr_pt_update_ops(vops, > > > > > > op- > > > > tile_mask); > > > break; > > > @@ -2416,9 +2436,10 @@ static int op_lock_and_prep(struct > > drm_exec > > > *exec, struct xe_vm *vm, > > > > > > switch (op->base.op) { > > > case DRM_GPUVA_OP_MAP: > > > - err = vma_lock_and_validate(exec, op->map.vma, > > > - > > > !xe_vm_in_fault_mode(vm) > > > > > > > > - op->map.immediate); > > > + if (!op->map.invalidate_on_bind) > > > + err = vma_lock_and_validate(exec, op- > > > > map.vma, > > > + > > > !xe_vm_in_fault_mode(vm) || > > > + op- > > > > map.immediate); > > > break; > > > case DRM_GPUVA_OP_REMAP: > > > err = check_ufence(gpuva_to_vma(op- > > > > base.remap.unmap->va)); > > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h > > > b/drivers/gpu/drm/xe/xe_vm_types.h > > > index 52467b9b5348..dace04f4ea5e 100644 > > > --- a/drivers/gpu/drm/xe/xe_vm_types.h > > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h > > > @@ -297,6 +297,8 @@ struct xe_vma_op_map { > > > bool is_null; > > > /** @dumpable: whether BO is dumped on GPU hang */ > > > bool dumpable; > > > + /** @invalidate: invalidate the VMA before bind */ > > > + bool invalidate_on_bind; > > > /** @pat_index: The pat index to use for this operation. > > > */ > > > u16 pat_index; > > > }; > ^ permalink raw reply [flat|nested] 32+ messages in thread
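Concretely, the change Thomas suggests above would amount to something like this in bind_op_prepare() (an illustrative sketch based on the hunk quoted in this thread, not a final patch; whether this is the right scope is what the follow-ups below discuss):

	if ((!pt_op->rebind && xe_vm_has_scratch(vm) &&
	     xe_vm_in_lr_mode(vm)) ||
	    pt_update_ops->invalidate_on_bind)
		pt_update_ops->needs_invalidation = true;

That is, any non-rebind bind into a scratch-enabled long-running VM, faulting or not, requests a TLB invalidation, in addition to the scratch-clearing binds themselves.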
* RE: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-02-06 12:51 ` Thomas Hellström @ 2025-02-06 18:56 ` Zeng, Oak 2025-02-06 20:11 ` Thomas Hellström 0 siblings, 1 reply; 32+ messages in thread From: Zeng, Oak @ 2025-02-06 18:56 UTC (permalink / raw) To: Thomas Hellström, intel-xe@lists.freedesktop.org Cc: Brost, Matthew, Cavitt, Jonathan Hi Thomas! > -----Original Message----- > From: Thomas Hellström <thomas.hellstrom@linux.intel.com> > Sent: February 6, 2025 7:52 AM > To: Zeng, Oak <oak.zeng@intel.com>; intel-xe@lists.freedesktop.org > Cc: Brost, Matthew <matthew.brost@intel.com>; Cavitt, Jonathan > <jonathan.cavitt@intel.com> > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind > > On Thu, 2025-02-06 at 01:52 +0000, Zeng, Oak wrote: > > > > > > > -----Original Message----- > > > From: Thomas Hellström <thomas.hellstrom@linux.intel.com> > > > Sent: February 5, 2025 9:36 AM > > > To: Zeng, Oak <oak.zeng@intel.com>; intel- > xe@lists.freedesktop.org > > > Cc: Brost, Matthew <matthew.brost@intel.com>; Cavitt, Jonathan > > > <jonathan.cavitt@intel.com> > > > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before > vm_bind > > > > > > On Tue, 2025-02-04 at 13:45 -0500, Oak Zeng wrote: > > > > When a vm runs under fault mode, if scratch page is enabled, > we > > > need > > > > to clear the scratch page mapping before vm_bind for the > vm_bind > > > > address range. Under fault mode, we depend on recoverable > page > > > fault > > > > to establish mapping in page table. If scratch page is not > > > > cleared, > > > > GPU access of address won't cause page fault because it always > > > > hits > > > > the existing scratch page mapping. > > > > > > I think we need to ephasize that we clear AT vm_bind time, not > > > before, > > > both in the commit description and the patch title. > > > > Will fix. > > > > > > > > > > > > > When vm_bind with IMMEDIATE flag, there is no need of > clearing as > > > > immediate bind can overwrite the scratch page mapping. > > > > > > > > So far only is xe2 and xe3 products are allowed to enable scratch > > > > page > > > > under fault mode. On other platform we don't allow scratch > page > > > under > > > > fault mode, so no need of such clearing. > > > > > > > > v2: Rework vm_bind pipeline to clear scratch page mapping. This > > > > is > > > > similar > > > > to a map operation, with the exception that PTEs are cleared > > > instead > > > > of > > > > pointing to valid physical pages. (Matt, Thomas) > > > > > > > > TLB invalidation is needed after clear scratch page mapping as > > > > larger > > > > scratch page mapping could be backed by physical page and > cached > > > in > > > > TLB. (Matt, Thomas) > > > > > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > > > --- > > > > drivers/gpu/drm/xe/xe_pt.c | 66 > ++++++++++++++++++++++-- > > > ------ > > > > -- > > > > drivers/gpu/drm/xe/xe_pt_types.h | 2 + > > > > drivers/gpu/drm/xe/xe_vm.c | 29 ++++++++++++-- > > > > drivers/gpu/drm/xe/xe_vm_types.h | 2 + > > > > 4 files changed, 75 insertions(+), 24 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c > > > b/drivers/gpu/drm/xe/xe_pt.c > > > > index 1ddcc7e79a93..3fd0ae2dbe7d 100644 > > > > --- a/drivers/gpu/drm/xe/xe_pt.c > > > > +++ b/drivers/gpu/drm/xe/xe_pt.c > > > > @@ -268,6 +268,8 @@ struct xe_pt_stage_bind_walk { > > > > * granularity. 
> > > > */ > > > > bool needs_64K; > > > > + /* @clear_pt: clear page table entries during the bind > > > > walk > > > > */ > > > > + bool clear_pt; > > > > /** > > > > * @vma: VMA being mapped > > > > */ > > > > @@ -497,21 +499,25 @@ xe_pt_stage_bind_entry(struct > xe_ptw > > > *parent, > > > > pgoff_t offset, > > > > > > > > XE_WARN_ON(xe_walk->va_curs_start != addr); > > > > > > > > - pte = vm->pt_ops->pte_encode_vma(is_null ? 0 : > > > > - > > > > xe_res_dma(curs) + > > > > xe_walk->dma_offset, > > > > - xe_walk->vma, > > > > pat_index, level); > > > > - pte |= xe_walk->default_pte; > > > > + if (xe_walk->clear_pt) { > > > > + pte = 0; > > > > + } else { > > > > + pte = vm->pt_ops->pte_encode_vma(is_null > > > > ? > > > 0 > > > > : > > > > + xe_res_dma(curs) + > > > > xe_walk- > > > > > dma_offset, > > > > + xe_walk->vma, pat_index, > > > > level); > > > > + pte |= xe_walk->default_pte; > > > > > > > > - /* > > > > - * Set the XE_PTE_PS64 hint if possible, > > > > otherwise > > > > if > > > > - * this device *requires* 64K PTE size for VRAM, > > > > fail. > > > > - */ > > > > - if (level == 0 && !xe_parent->is_compact) { > > > > - if (xe_pt_is_pte_ps64K(addr, next, > > > > xe_walk)) > > > > { > > > > - xe_walk->vma->gpuva.flags |= > > > > XE_VMA_PTE_64K; > > > > - pte |= XE_PTE_PS64; > > > > - } else if (XE_WARN_ON(xe_walk- > > > > needs_64K)) { > > > > - return -EINVAL; > > > > + /* > > > > + * Set the XE_PTE_PS64 hint if possible, > > > > otherwise if > > > > + * this device *requires* 64K PTE size > > > > for > > > > VRAM, fail. > > > > + */ > > > > + if (level == 0 && !xe_parent- > > > > >is_compact) { > > > > + if (xe_pt_is_pte_ps64K(addr, > > > > next, > > > > xe_walk)) { > > > > + xe_walk->vma- > > > > >gpuva.flags > > > > = > > > > XE_VMA_PTE_64K; > > > > + pte |= XE_PTE_PS64; > > > > + } else if (XE_WARN_ON(xe_walk- > > > > > needs_64K)) { > > > > + return -EINVAL; > > > > + } > > > > } > > > > } > > > > > > > > > > I think there might be some things missing here. > > > The bind should look exactly as a is_null bind, with the exception > > > that > > > the pte should be 0, so might be able to benefit from is_null, for > > > example xe_pg_hugepte_possible() short-circuit the dma- > address > > > check. > > > > You are right. Above code has a problem for huge pte cases: it > > wouldn't detect > > A huge pte for clear_pt case. I will fix it. > > > > I will not make is_null be true for clear_pt case, but will make > > similar logic to > > Let clear_pt benefit from is_null logic. > > > > > With clear_pt we don't want to use the XE_PTE_PS64K, though, > like > > > you > > > identified above. > > > > > > Will get back with a look at the invalidation. > > > > I don't quite get here. Please do let me know if you find anything. > > Yes, I meant I hadn't had time to look at this part yet. > > For binding with scratch, it looks like *all* binds need TLB > invalidation, not just clearing binds, So in the below code-path, can't > we just replace the xe_vm_in_preempt_fence_mode() with > xe_vm_in_lr_mode()? I think you are correct. For long run vm, regardless it is in fault mode or not, we need to invalidate TLB scratch page caching during bind. Original codes only did TLB inv on non- Fault mode. On fault mode + LR, this runs into two sub-cases: 1. immediate bind 2. bind triggered by page fault: note this can actually runs into the !rebind case When the fault address is first time accessed by gpu. 
But the TLB cache for Scratch page mapping should have been invalidated by the "invalidate_on_bind" Condition. So this leaves case #1 problematic. It would be better for others to comment also, as in the comment below, it explicitly Called out "non-faulting LR". My understanding is, people didn't think of case #1 above. Since it is an existing issue, I will make it a separate patch for better tracking purpose. Oak > > > /* > * If rebind, we have to invalidate TLB on !LR vms to > invalidate > * cached PTEs point to freed memory. On LR vms this > is done > * automatically when the context is re-enabled by > the > rebind worker, > * or in fault mode it was invalidated on PTE zapping. > * > * If !rebind, and scratch enabled VMs, there is a > chance the scratch > * PTE is already cached in the TLB so it needs to be > invalidated. > * On !LR VMs this is done in the ring ops preceding a > batch, but on > * non-faulting LR, in particular on user-space batch > buffer chaining, > * it needs to be done here. > */ > if ((!pt_op->rebind && xe_vm_has_scratch(vm) && > xe_vm_in_preempt_fence_mode(vm))) > > /Thomas > > > > > > > > > > > > > > > > > > > > > > > > Oak > > > > > > > > > > @@ -519,7 +525,7 @@ xe_pt_stage_bind_entry(struct xe_ptw > > > *parent, > > > > pgoff_t offset, > > > > if (unlikely(ret)) > > > > return ret; > > > > > > > > - if (!is_null) > > > > + if (!is_null && !xe_walk->clear_pt) > > > > xe_res_next(curs, next - addr); > > > > xe_walk->va_curs_start = next; > > > > xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << > > > > level); > > > > @@ -589,6 +595,7 @@ static const struct xe_pt_walk_ops > > > > xe_pt_stage_bind_ops = { > > > > * @vma: The vma indicating the address range. > > > > * @entries: Storage for the update entries used for connecting > > > > the > > > > tree to > > > > * the main tree at commit time. > > > > + * @clear_pt: Clear the page table entries. > > > > * @num_entries: On output contains the number of @entries > > > used. 
> > > > * > > > > * This function builds a disconnected page-table tree for a > > > > given > > > > address > > > > @@ -602,7 +609,8 @@ static const struct xe_pt_walk_ops > > > > xe_pt_stage_bind_ops = { > > > > */ > > > > static int > > > > xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, > > > > - struct xe_vm_pgtable_update *entries, u32 > > > > *num_entries) > > > > + struct xe_vm_pgtable_update *entries, > > > > + bool clear_pt, u32 *num_entries) > > > > { > > > > struct xe_device *xe = tile_to_xe(tile); > > > > struct xe_bo *bo = xe_vma_bo(vma); > > > > @@ -622,10 +630,19 @@ xe_pt_stage_bind(struct xe_tile *tile, > > > struct > > > > xe_vma *vma, > > > > .vma = vma, > > > > .wupd.entries = entries, > > > > .needs_64K = (xe_vma_vm(vma)->flags & > > > > XE_VM_FLAG_64K) && is_devmem, > > > > + .clear_pt = clear_pt, > > > > }; > > > > struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id]; > > > > int ret; > > > > > > > > + if (clear_pt) { > > > > + ret = xe_pt_walk_range(&pt->base, pt->level, > > > > xe_vma_start(vma), > > > > + xe_vma_end(vma), > > > > &xe_walk.base); > > > > + > > > > + *num_entries = xe_walk.wupd.num_used_entries; > > > > + return ret; > > > > + } > > > > + > > > > /** > > > > * Default atomic expectations for different allocation > > > > scenarios are as follows: > > > > * > > > > @@ -981,12 +998,14 @@ static void xe_pt_free_bind(struct > > > > xe_vm_pgtable_update *entries, > > > > > > > > static int > > > > xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma, > > > > - struct xe_vm_pgtable_update *entries, u32 > > > > *num_entries) > > > > + struct xe_vm_pgtable_update *entries, > > > > + bool invalidate_on_bind, u32 *num_entries) > > > > { > > > > int err; > > > > > > > > *num_entries = 0; > > > > - err = xe_pt_stage_bind(tile, vma, entries, num_entries); > > > > + err = xe_pt_stage_bind(tile, vma, entries, > > > > invalidate_on_bind, > > > > + num_entries); > > > > if (!err) > > > > xe_tile_assert(tile, *num_entries); > > > > > > > > @@ -1661,6 +1680,7 @@ static int bind_op_prepare(struct > xe_vm > > > *vm, > > > > struct xe_tile *tile, > > > > return err; > > > > > > > > err = xe_pt_prepare_bind(tile, vma, pt_op->entries, > > > > + pt_update_ops- > > > > >invalidate_on_bind, > > > > &pt_op->num_entries); > > > > if (!err) { > > > > xe_tile_assert(tile, pt_op->num_entries <= > > > > @@ -1685,7 +1705,7 @@ static int bind_op_prepare(struct > xe_vm > > > *vm, > > > > struct xe_tile *tile, > > > > * it needs to be done here. 
> > > > */ > > > > if ((!pt_op->rebind && xe_vm_has_scratch(vm) && > > > > - xe_vm_in_preempt_fence_mode(vm))) > > > > + xe_vm_in_preempt_fence_mode(vm)) || > > > > pt_update_ops->invalidate_on_bind) > > > > pt_update_ops->needs_invalidation = > > > > true; > > > > else if (pt_op->rebind && !xe_vm_in_lr_mode(vm)) > > > > /* We bump also if batch_invalidate_tlb > > > > is > > > > true */ > > > > @@ -1759,9 +1779,13 @@ static int op_prepare(struct xe_vm > *vm, > > > > > > > > switch (op->base.op) { > > > > case DRM_GPUVA_OP_MAP: > > > > - if (!op->map.immediate && > > > xe_vm_in_fault_mode(vm)) > > > > + if (!op->map.immediate && > > > xe_vm_in_fault_mode(vm) && > > > > + !op->map.invalidate_on_bind) > > > > break; > > > > > > > > + if (op->map.invalidate_on_bind) > > > > + pt_update_ops->invalidate_on_bind = > > > > true; > > > > + > > > > err = bind_op_prepare(vm, tile, pt_update_ops, > > > > op- > > > > > map.vma); > > > > pt_update_ops->wait_vm_kernel = true; > > > > break; > > > > @@ -1871,6 +1895,8 @@ static void bind_op_commit(struct > xe_vm > > > *vm, > > > > struct xe_tile *tile, > > > > } > > > > vma->tile_present |= BIT(tile->id); > > > > vma->tile_staged &= ~BIT(tile->id); > > > > + if (pt_update_ops->invalidate_on_bind) > > > > + vma->tile_invalidated |= BIT(tile->id); > > > > if (xe_vma_is_userptr(vma)) { > > > > lockdep_assert_held_read(&vm- > > > > > userptr.notifier_lock); > > > > to_userptr_vma(vma)->userptr.initial_bind = > > > > true; > > > > diff --git a/drivers/gpu/drm/xe/xe_pt_types.h > > > > b/drivers/gpu/drm/xe/xe_pt_types.h > > > > index 384cc04de719..3d0aa2a5102e 100644 > > > > --- a/drivers/gpu/drm/xe/xe_pt_types.h > > > > +++ b/drivers/gpu/drm/xe/xe_pt_types.h > > > > @@ -108,6 +108,8 @@ struct xe_vm_pgtable_update_ops { > > > > bool needs_userptr_lock; > > > > /** @needs_invalidation: Needs invalidation */ > > > > bool needs_invalidation; > > > > + /** @invalidate_on_bind: Invalidate the range before > > > > bind */ > > > > + bool invalidate_on_bind; > > > > /** > > > > * @wait_vm_bookkeep: PT operations need to wait until > > > > VM > > > is > > > > idle > > > > * (bookkeep dma-resv slots are idle) and stage all > > > > future > > > > VM activity > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c > > > b/drivers/gpu/drm/xe/xe_vm.c > > > > index d664f2e418b2..813d893d9b63 100644 > > > > --- a/drivers/gpu/drm/xe/xe_vm.c > > > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > > > @@ -1921,6 +1921,23 @@ static void print_op(struct xe_device > *xe, > > > > struct drm_gpuva_op *op) > > > > } > > > > #endif > > > > > > > > +static bool __xe_vm_needs_clear_scratch_pages(struct > xe_vm > > > *vm, u32 > > > > bind_flags) > > > > +{ > > > > + if (!xe_vm_in_fault_mode(vm)) > > > > + return false; > > > > + > > > > + if (!NEEDS_SCRATCH(vm->xe)) > > > > + return false; > > > > + > > > > + if (!xe_vm_has_scratch(vm)) > > > > + return false; > > > > + > > > > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > > > > + return false; > > > > + > > > > + return true; > > > > +} > > > > + > > > > /* > > > > * Create operations list from IOCTL arguments, setup > operations > > > > fields so parse > > > > * and commit steps are decoupled from IOCTL arguments. This > > > step > > > > can fail. 
> > > > @@ -1991,6 +2008,8 @@ vm_bind_ioctl_ops_create(struct > xe_vm > > > *vm, > > > > struct xe_bo *bo, > > > > op->map.is_null = flags & > > > > DRM_XE_VM_BIND_FLAG_NULL; > > > > op->map.dumpable = flags & > > > > DRM_XE_VM_BIND_FLAG_DUMPABLE; > > > > op->map.pat_index = pat_index; > > > > + op->map.invalidate_on_bind = > > > > + > > > __xe_vm_needs_clear_scratch_pages(vm > > > > , flags); > > > > } else if (__op->op == DRM_GPUVA_OP_PREFETCH) { > > > > op->prefetch.region = prefetch_region; > > > > } > > > > @@ -2188,7 +2207,8 @@ static int > vm_bind_ioctl_ops_parse(struct > > > xe_vm > > > > *vm, struct drm_gpuva_ops *ops, > > > > return PTR_ERR(vma); > > > > > > > > op->map.vma = vma; > > > > - if (op->map.immediate || > > > > !xe_vm_in_fault_mode(vm)) > > > > + if (op->map.immediate || > > > > !xe_vm_in_fault_mode(vm) || > > > > + op->map.invalidate_on_bind) > > > > > > > xe_vma_ops_incr_pt_update_ops(vops, > > > > > > > > op- > > > > > tile_mask); > > > > break; > > > > @@ -2416,9 +2436,10 @@ static int op_lock_and_prep(struct > > > drm_exec > > > > *exec, struct xe_vm *vm, > > > > > > > > switch (op->base.op) { > > > > case DRM_GPUVA_OP_MAP: > > > > - err = vma_lock_and_validate(exec, op->map.vma, > > > > - > > > > !xe_vm_in_fault_mode(vm) > > > > > > > > > > - op->map.immediate); > > > > + if (!op->map.invalidate_on_bind) > > > > + err = vma_lock_and_validate(exec, op- > > > > > map.vma, > > > > + > > > > !xe_vm_in_fault_mode(vm) || > > > > + op- > > > > > map.immediate); > > > > break; > > > > case DRM_GPUVA_OP_REMAP: > > > > err = check_ufence(gpuva_to_vma(op- > > > > > base.remap.unmap->va)); > > > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h > > > > b/drivers/gpu/drm/xe/xe_vm_types.h > > > > index 52467b9b5348..dace04f4ea5e 100644 > > > > --- a/drivers/gpu/drm/xe/xe_vm_types.h > > > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h > > > > @@ -297,6 +297,8 @@ struct xe_vma_op_map { > > > > bool is_null; > > > > /** @dumpable: whether BO is dumped on GPU hang */ > > > > bool dumpable; > > > > + /** @invalidate: invalidate the VMA before bind */ > > > > + bool invalidate_on_bind; > > > > /** @pat_index: The pat index to use for this operation. > > > > */ > > > > u16 pat_index; > > > > }; > > ^ permalink raw reply [flat|nested] 32+ messages in thread
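A rough summary of the cases walked through above, for a scratch-enabled VM (this is one reading of the thread, not a statement of the final design):

	Situation                                 Scratch TLB entry invalidated by
	----------------------------------------  ------------------------------------------------
	!LR VM, any bind                          ring ops preceding each batch
	non-faulting LR VM, !rebind bind          bind_op_prepare() setting needs_invalidation
	fault-mode LR VM, IMMEDIATE bind          nothing yet (the problematic case #1)
	fault-mode LR VM, fault-driven bind       the earlier scratch-clearing bind via
	                                          invalidate_on_bind (case #2)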
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-02-06 18:56 ` Zeng, Oak @ 2025-02-06 20:11 ` Thomas Hellström 2025-02-06 21:05 ` Zeng, Oak 0 siblings, 1 reply; 32+ messages in thread From: Thomas Hellström @ 2025-02-06 20:11 UTC (permalink / raw) To: Zeng, Oak, intel-xe@lists.freedesktop.org Cc: Brost, Matthew, Cavitt, Jonathan Hi, Oak, On Thu, 2025-02-06 at 18:56 +0000, Zeng, Oak wrote: > Hi Thomas! > > > -----Original Message----- > > From: Thomas Hellström <thomas.hellstrom@linux.intel.com> > > Sent: February 6, 2025 7:52 AM > > To: Zeng, Oak <oak.zeng@intel.com>; intel-xe@lists.freedesktop.org > > Cc: Brost, Matthew <matthew.brost@intel.com>; Cavitt, Jonathan > > <jonathan.cavitt@intel.com> > > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind > > > > On Thu, 2025-02-06 at 01:52 +0000, Zeng, Oak wrote: > > > > > > > > > > -----Original Message----- > > > > From: Thomas Hellström <thomas.hellstrom@linux.intel.com> > > > > Sent: February 5, 2025 9:36 AM > > > > To: Zeng, Oak <oak.zeng@intel.com>; intel- > > xe@lists.freedesktop.org > > > > Cc: Brost, Matthew <matthew.brost@intel.com>; Cavitt, Jonathan > > > > <jonathan.cavitt@intel.com> > > > > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before > > vm_bind > > > > > > > > On Tue, 2025-02-04 at 13:45 -0500, Oak Zeng wrote: > > > > > When a vm runs under fault mode, if scratch page is enabled, > > we > > > > need > > > > > to clear the scratch page mapping before vm_bind for the > > vm_bind > > > > > address range. Under fault mode, we depend on recoverable > > page > > > > fault > > > > > to establish mapping in page table. If scratch page is not > > > > > cleared, > > > > > GPU access of address won't cause page fault because it > > > > > always > > > > > hits > > > > > the existing scratch page mapping. > > > > > > > > I think we need to ephasize that we clear AT vm_bind time, not > > > > before, > > > > both in the commit description and the patch title. > > > > > > Will fix. > > > > > > > > > > > > > > > > > When vm_bind with IMMEDIATE flag, there is no need of > > clearing as > > > > > immediate bind can overwrite the scratch page mapping. > > > > > > > > > > So far only is xe2 and xe3 products are allowed to enable > > > > > scratch > > > > > page > > > > > under fault mode. On other platform we don't allow scratch > > page > > > > under > > > > > fault mode, so no need of such clearing. > > > > > > > > > > v2: Rework vm_bind pipeline to clear scratch page mapping. > > > > > This > > > > > is > > > > > similar > > > > > to a map operation, with the exception that PTEs are cleared > > > > instead > > > > > of > > > > > pointing to valid physical pages. (Matt, Thomas) > > > > > > > > > > TLB invalidation is needed after clear scratch page mapping > > > > > as > > > > > larger > > > > > scratch page mapping could be backed by physical page and > > cached > > > > in > > > > > TLB. 
(Matt, Thomas) > > > > > > > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > > > > --- > > > > > drivers/gpu/drm/xe/xe_pt.c | 66 > > ++++++++++++++++++++++-- > > > > ------ > > > > > -- > > > > > drivers/gpu/drm/xe/xe_pt_types.h | 2 + > > > > > drivers/gpu/drm/xe/xe_vm.c | 29 ++++++++++++-- > > > > > drivers/gpu/drm/xe/xe_vm_types.h | 2 + > > > > > 4 files changed, 75 insertions(+), 24 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c > > > > b/drivers/gpu/drm/xe/xe_pt.c > > > > > index 1ddcc7e79a93..3fd0ae2dbe7d 100644 > > > > > --- a/drivers/gpu/drm/xe/xe_pt.c > > > > > +++ b/drivers/gpu/drm/xe/xe_pt.c > > > > > @@ -268,6 +268,8 @@ struct xe_pt_stage_bind_walk { > > > > > * granularity. > > > > > */ > > > > > bool needs_64K; > > > > > + /* @clear_pt: clear page table entries during the > > > > > bind > > > > > walk > > > > > */ > > > > > + bool clear_pt; > > > > > /** > > > > > * @vma: VMA being mapped > > > > > */ > > > > > @@ -497,21 +499,25 @@ xe_pt_stage_bind_entry(struct > > xe_ptw > > > > *parent, > > > > > pgoff_t offset, > > > > > > > > > > XE_WARN_ON(xe_walk->va_curs_start != addr); > > > > > > > > > > - pte = vm->pt_ops->pte_encode_vma(is_null ? 0 > > > > > : > > > > > - > > > > > xe_res_dma(curs) + > > > > > xe_walk->dma_offset, > > > > > - xe_walk- > > > > > >vma, > > > > > pat_index, level); > > > > > - pte |= xe_walk->default_pte; > > > > > + if (xe_walk->clear_pt) { > > > > > + pte = 0; > > > > > + } else { > > > > > + pte = vm->pt_ops- > > > > > >pte_encode_vma(is_null > > > > > ? > > > > 0 > > > > > : > > > > > + xe_res_dma(curs) + > > > > > xe_walk- > > > > > > dma_offset, > > > > > + xe_walk->vma, > > > > > pat_index, > > > > > level); > > > > > + pte |= xe_walk->default_pte; > > > > > > > > > > - /* > > > > > - * Set the XE_PTE_PS64 hint if possible, > > > > > otherwise > > > > > if > > > > > - * this device *requires* 64K PTE size for > > > > > VRAM, > > > > > fail. > > > > > - */ > > > > > - if (level == 0 && !xe_parent->is_compact) { > > > > > - if (xe_pt_is_pte_ps64K(addr, next, > > > > > xe_walk)) > > > > > { > > > > > - xe_walk->vma->gpuva.flags |= > > > > > XE_VMA_PTE_64K; > > > > > - pte |= XE_PTE_PS64; > > > > > - } else if (XE_WARN_ON(xe_walk- > > > > > needs_64K)) { > > > > > - return -EINVAL; > > > > > + /* > > > > > + * Set the XE_PTE_PS64 hint if > > > > > possible, > > > > > otherwise if > > > > > + * this device *requires* 64K PTE > > > > > size > > > > > for > > > > > VRAM, fail. > > > > > + */ > > > > > + if (level == 0 && !xe_parent- > > > > > > is_compact) { > > > > > + if (xe_pt_is_pte_ps64K(addr, > > > > > next, > > > > > xe_walk)) { > > > > > + xe_walk->vma- > > > > > > gpuva.flags > > > > > = > > > > > XE_VMA_PTE_64K; > > > > > + pte |= XE_PTE_PS64; > > > > > + } else if > > > > > (XE_WARN_ON(xe_walk- > > > > > > needs_64K)) { > > > > > + return -EINVAL; > > > > > + } > > > > > } > > > > > } > > > > > > > > > > > > > I think there might be some things missing here. > > > > The bind should look exactly as a is_null bind, with the > > > > exception > > > > that > > > > the pte should be 0, so might be able to benefit from is_null, > > > > for > > > > example xe_pg_hugepte_possible() short-circuit the dma- > > address > > > > check. > > > > > > You are right. Above code has a problem for huge pte cases: it > > > wouldn't detect > > > A huge pte for clear_pt case. I will fix it. 
> > > > > > I will not make is_null be true for clear_pt case, but will make > > > similar logic to > > > Let clear_pt benefit from is_null logic. > > > > > > > With clear_pt we don't want to use the XE_PTE_PS64K, though, > > like > > > > you > > > > identified above. > > > > > > > > Will get back with a look at the invalidation. > > > > > > I don't quite get here. Please do let me know if you find > > > anything. > > > > Yes, I meant I hadn't had time to look at this part yet. > > > > For binding with scratch, it looks like *all* binds need TLB > > invalidation, not just clearing binds, So in the below code-path, > > can't > > we just replace the xe_vm_in_preempt_fence_mode() with > > xe_vm_in_lr_mode()? > > I think you are correct. > > For long run vm, regardless it is in fault mode or not, we need to > invalidate > TLB scratch page caching during bind. Original codes only did TLB inv > on non- > Fault mode. On fault mode + LR, this runs into two sub-cases: > 1. immediate bind > 2. bind triggered by page fault: note this can actually runs into the > !rebind case > When the fault address is first time accessed by gpu. But the TLB > cache for > Scratch page mapping should have been invalidated by the > "invalidate_on_bind" > Condition. > > So this leaves case #1 problematic. > > It would be better for others to comment also, as in the comment > below, it explicitly > Called out "non-faulting LR". My understanding is, people didn't > think of case #1 above. > > Since it is an existing issue, I will make it a separate patch for > better tracking purpose. Hmm. What is the existing issue, I don't quite follow? /Thomas > > Oak > > > > > > > /* > > * If rebind, we have to invalidate TLB on !LR vms > > to > > invalidate > > * cached PTEs point to freed memory. On LR vms > > this > > is done > > * automatically when the context is re-enabled by > > the > > rebind worker, > > * or in fault mode it was invalidated on PTE > > zapping. > > * > > * If !rebind, and scratch enabled VMs, there is a > > chance the scratch > > * PTE is already cached in the TLB so it needs to > > be > > invalidated. > > * On !LR VMs this is done in the ring ops > > preceding a > > batch, but on > > * non-faulting LR, in particular on user-space > > batch > > buffer chaining, > > * it needs to be done here. > > */ > > if ((!pt_op->rebind && xe_vm_has_scratch(vm) && > > xe_vm_in_preempt_fence_mode(vm))) > > > > /Thomas > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Oak > > > > > > > > > > > > > @@ -519,7 +525,7 @@ xe_pt_stage_bind_entry(struct xe_ptw > > > > *parent, > > > > > pgoff_t offset, > > > > > if (unlikely(ret)) > > > > > return ret; > > > > > > > > > > - if (!is_null) > > > > > + if (!is_null && !xe_walk->clear_pt) > > > > > xe_res_next(curs, next - addr); > > > > > xe_walk->va_curs_start = next; > > > > > xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K > > > > > << > > > > > level); > > > > > @@ -589,6 +595,7 @@ static const struct xe_pt_walk_ops > > > > > xe_pt_stage_bind_ops = { > > > > > * @vma: The vma indicating the address range. > > > > > * @entries: Storage for the update entries used for > > > > > connecting > > > > > the > > > > > tree to > > > > > * the main tree at commit time. > > > > > + * @clear_pt: Clear the page table entries. > > > > > * @num_entries: On output contains the number of @entries > > > > used. 
> > > > > * > > > > > * This function builds a disconnected page-table tree for a > > > > > given > > > > > address > > > > > @@ -602,7 +609,8 @@ static const struct xe_pt_walk_ops > > > > > xe_pt_stage_bind_ops = { > > > > > */ > > > > > static int > > > > > xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, > > > > > - struct xe_vm_pgtable_update *entries, u32 > > > > > *num_entries) > > > > > + struct xe_vm_pgtable_update *entries, > > > > > + bool clear_pt, u32 *num_entries) > > > > > { > > > > > struct xe_device *xe = tile_to_xe(tile); > > > > > struct xe_bo *bo = xe_vma_bo(vma); > > > > > @@ -622,10 +630,19 @@ xe_pt_stage_bind(struct xe_tile *tile, > > > > struct > > > > > xe_vma *vma, > > > > > .vma = vma, > > > > > .wupd.entries = entries, > > > > > .needs_64K = (xe_vma_vm(vma)->flags & > > > > > XE_VM_FLAG_64K) && is_devmem, > > > > > + .clear_pt = clear_pt, > > > > > }; > > > > > struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile- > > > > > >id]; > > > > > int ret; > > > > > > > > > > + if (clear_pt) { > > > > > + ret = xe_pt_walk_range(&pt->base, pt->level, > > > > > xe_vma_start(vma), > > > > > + xe_vma_end(vma), > > > > > &xe_walk.base); > > > > > + > > > > > + *num_entries = > > > > > xe_walk.wupd.num_used_entries; > > > > > + return ret; > > > > > + } > > > > > + > > > > > /** > > > > > * Default atomic expectations for different > > > > > allocation > > > > > scenarios are as follows: > > > > > * > > > > > @@ -981,12 +998,14 @@ static void xe_pt_free_bind(struct > > > > > xe_vm_pgtable_update *entries, > > > > > > > > > > static int > > > > > xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma, > > > > > - struct xe_vm_pgtable_update *entries, u32 > > > > > *num_entries) > > > > > + struct xe_vm_pgtable_update *entries, > > > > > + bool invalidate_on_bind, u32 > > > > > *num_entries) > > > > > { > > > > > int err; > > > > > > > > > > *num_entries = 0; > > > > > - err = xe_pt_stage_bind(tile, vma, entries, > > > > > num_entries); > > > > > + err = xe_pt_stage_bind(tile, vma, entries, > > > > > invalidate_on_bind, > > > > > + num_entries); > > > > > if (!err) > > > > > xe_tile_assert(tile, *num_entries); > > > > > > > > > > @@ -1661,6 +1680,7 @@ static int bind_op_prepare(struct > > xe_vm > > > > *vm, > > > > > struct xe_tile *tile, > > > > > return err; > > > > > > > > > > err = xe_pt_prepare_bind(tile, vma, pt_op->entries, > > > > > + pt_update_ops- > > > > > > invalidate_on_bind, > > > > > &pt_op->num_entries); > > > > > if (!err) { > > > > > xe_tile_assert(tile, pt_op->num_entries <= > > > > > @@ -1685,7 +1705,7 @@ static int bind_op_prepare(struct > > xe_vm > > > > *vm, > > > > > struct xe_tile *tile, > > > > > * it needs to be done here. 
> > > > > */ > > > > > if ((!pt_op->rebind && xe_vm_has_scratch(vm) > > > > > && > > > > > - xe_vm_in_preempt_fence_mode(vm))) > > > > > + xe_vm_in_preempt_fence_mode(vm)) || > > > > > pt_update_ops->invalidate_on_bind) > > > > > pt_update_ops->needs_invalidation = > > > > > true; > > > > > else if (pt_op->rebind && > > > > > !xe_vm_in_lr_mode(vm)) > > > > > /* We bump also if > > > > > batch_invalidate_tlb > > > > > is > > > > > true */ > > > > > @@ -1759,9 +1779,13 @@ static int op_prepare(struct xe_vm > > *vm, > > > > > > > > > > switch (op->base.op) { > > > > > case DRM_GPUVA_OP_MAP: > > > > > - if (!op->map.immediate && > > > > xe_vm_in_fault_mode(vm)) > > > > > + if (!op->map.immediate && > > > > xe_vm_in_fault_mode(vm) && > > > > > + !op->map.invalidate_on_bind) > > > > > break; > > > > > > > > > > + if (op->map.invalidate_on_bind) > > > > > + pt_update_ops->invalidate_on_bind = > > > > > true; > > > > > + > > > > > err = bind_op_prepare(vm, tile, > > > > > pt_update_ops, > > > > > op- > > > > > > map.vma); > > > > > pt_update_ops->wait_vm_kernel = true; > > > > > break; > > > > > @@ -1871,6 +1895,8 @@ static void bind_op_commit(struct > > xe_vm > > > > *vm, > > > > > struct xe_tile *tile, > > > > > } > > > > > vma->tile_present |= BIT(tile->id); > > > > > vma->tile_staged &= ~BIT(tile->id); > > > > > + if (pt_update_ops->invalidate_on_bind) > > > > > + vma->tile_invalidated |= BIT(tile->id); > > > > > if (xe_vma_is_userptr(vma)) { > > > > > lockdep_assert_held_read(&vm- > > > > > > userptr.notifier_lock); > > > > > to_userptr_vma(vma)->userptr.initial_bind = > > > > > true; > > > > > diff --git a/drivers/gpu/drm/xe/xe_pt_types.h > > > > > b/drivers/gpu/drm/xe/xe_pt_types.h > > > > > index 384cc04de719..3d0aa2a5102e 100644 > > > > > --- a/drivers/gpu/drm/xe/xe_pt_types.h > > > > > +++ b/drivers/gpu/drm/xe/xe_pt_types.h > > > > > @@ -108,6 +108,8 @@ struct xe_vm_pgtable_update_ops { > > > > > bool needs_userptr_lock; > > > > > /** @needs_invalidation: Needs invalidation */ > > > > > bool needs_invalidation; > > > > > + /** @invalidate_on_bind: Invalidate the range before > > > > > bind */ > > > > > + bool invalidate_on_bind; > > > > > /** > > > > > * @wait_vm_bookkeep: PT operations need to wait > > > > > until > > > > > VM > > > > is > > > > > idle > > > > > * (bookkeep dma-resv slots are idle) and stage all > > > > > future > > > > > VM activity > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c > > > > b/drivers/gpu/drm/xe/xe_vm.c > > > > > index d664f2e418b2..813d893d9b63 100644 > > > > > --- a/drivers/gpu/drm/xe/xe_vm.c > > > > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > > > > @@ -1921,6 +1921,23 @@ static void print_op(struct xe_device > > *xe, > > > > > struct drm_gpuva_op *op) > > > > > } > > > > > #endif > > > > > > > > > > +static bool __xe_vm_needs_clear_scratch_pages(struct > > xe_vm > > > > *vm, u32 > > > > > bind_flags) > > > > > +{ > > > > > + if (!xe_vm_in_fault_mode(vm)) > > > > > + return false; > > > > > + > > > > > + if (!NEEDS_SCRATCH(vm->xe)) > > > > > + return false; > > > > > + > > > > > + if (!xe_vm_has_scratch(vm)) > > > > > + return false; > > > > > + > > > > > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > > > > > + return false; > > > > > + > > > > > + return true; > > > > > +} > > > > > + > > > > > /* > > > > > * Create operations list from IOCTL arguments, setup > > operations > > > > > fields so parse > > > > > * and commit steps are decoupled from IOCTL arguments. This > > > > step > > > > > can fail. 
> > > > > @@ -1991,6 +2008,8 @@ vm_bind_ioctl_ops_create(struct > > xe_vm > > > > *vm, > > > > > struct xe_bo *bo, > > > > > op->map.is_null = flags & > > > > > DRM_XE_VM_BIND_FLAG_NULL; > > > > > op->map.dumpable = flags & > > > > > DRM_XE_VM_BIND_FLAG_DUMPABLE; > > > > > op->map.pat_index = pat_index; > > > > > + op->map.invalidate_on_bind = > > > > > + > > > > __xe_vm_needs_clear_scratch_pages(vm > > > > > , flags); > > > > > } else if (__op->op == > > > > > DRM_GPUVA_OP_PREFETCH) { > > > > > op->prefetch.region = > > > > > prefetch_region; > > > > > } > > > > > @@ -2188,7 +2207,8 @@ static int > > vm_bind_ioctl_ops_parse(struct > > > > xe_vm > > > > > *vm, struct drm_gpuva_ops *ops, > > > > > return PTR_ERR(vma); > > > > > > > > > > op->map.vma = vma; > > > > > - if (op->map.immediate || > > > > > !xe_vm_in_fault_mode(vm)) > > > > > + if (op->map.immediate || > > > > > !xe_vm_in_fault_mode(vm) || > > > > > + op->map.invalidate_on_bind) > > > > > > > > > xe_vma_ops_incr_pt_update_ops(vops, > > > > > > > > > > op- > > > > > > tile_mask); > > > > > break; > > > > > @@ -2416,9 +2436,10 @@ static int op_lock_and_prep(struct > > > > drm_exec > > > > > *exec, struct xe_vm *vm, > > > > > > > > > > switch (op->base.op) { > > > > > case DRM_GPUVA_OP_MAP: > > > > > - err = vma_lock_and_validate(exec, op- > > > > > >map.vma, > > > > > - > > > > > !xe_vm_in_fault_mode(vm) > > > > > > > > > > > > - op- > > > > > >map.immediate); > > > > > + if (!op->map.invalidate_on_bind) > > > > > + err = vma_lock_and_validate(exec, > > > > > op- > > > > > > map.vma, > > > > > + > > > > > !xe_vm_in_fault_mode(vm) || > > > > > + op- > > > > > > map.immediate); > > > > > break; > > > > > case DRM_GPUVA_OP_REMAP: > > > > > err = check_ufence(gpuva_to_vma(op- > > > > > > base.remap.unmap->va)); > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h > > > > > b/drivers/gpu/drm/xe/xe_vm_types.h > > > > > index 52467b9b5348..dace04f4ea5e 100644 > > > > > --- a/drivers/gpu/drm/xe/xe_vm_types.h > > > > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h > > > > > @@ -297,6 +297,8 @@ struct xe_vma_op_map { > > > > > bool is_null; > > > > > /** @dumpable: whether BO is dumped on GPU hang */ > > > > > bool dumpable; > > > > > + /** @invalidate: invalidate the VMA before bind */ > > > > > + bool invalidate_on_bind; > > > > > /** @pat_index: The pat index to use for this > > > > > operation. > > > > > */ > > > > > u16 pat_index; > > > > > }; > > > > ^ permalink raw reply [flat|nested] 32+ messages in thread
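On the huge-PTE point quoted in this message (a clearing bind should be able to take the same shortcut as a NULL bind, since there is no dma address whose contiguity needs checking), a minimal sketch of the idea follows. The helper below is hypothetical and simplified; it is not the driver's actual check.

	/* Hypothetical, simplified stand-in for the huge-PTE check discussed
	 * above: NULL binds and scratch-clearing binds have no backing memory,
	 * so only virtual-address alignment and span matter; a normal bind must
	 * additionally cover a suitably aligned, contiguous dma range.
	 */
	static bool sketch_hugepte_possible(u64 addr, u64 next, unsigned int level,
					    bool is_null, bool clear_pt,
					    bool dma_range_ok)
	{
		u64 pte_size = 0x1000ull << (9 * level);	/* 4K base pages, 9 bits per level */

		if (!IS_ALIGNED(addr, pte_size) || next - addr < pte_size)
			return false;

		if (is_null || clear_pt)
			return true;

		return dma_range_ok;
	}

The XE_PTE_PS64 / 64K-PTE hint is a separate question: as pointed out in the quoted exchange, a clearing bind should not set it.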
* RE: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-02-06 20:11 ` Thomas Hellström @ 2025-02-06 21:05 ` Zeng, Oak 0 siblings, 0 replies; 32+ messages in thread From: Zeng, Oak @ 2025-02-06 21:05 UTC (permalink / raw) To: Thomas Hellström, intel-xe@lists.freedesktop.org Cc: Brost, Matthew, Cavitt, Jonathan > -----Original Message----- > From: Thomas Hellström <thomas.hellstrom@linux.intel.com> > Sent: February 6, 2025 3:12 PM > To: Zeng, Oak <oak.zeng@intel.com>; intel-xe@lists.freedesktop.org > Cc: Brost, Matthew <matthew.brost@intel.com>; Cavitt, Jonathan > <jonathan.cavitt@intel.com> > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind > > Hi, Oak, > > On Thu, 2025-02-06 at 18:56 +0000, Zeng, Oak wrote: > > Hi Thomas! > > > > > -----Original Message----- > > > From: Thomas Hellström <thomas.hellstrom@linux.intel.com> > > > Sent: February 6, 2025 7:52 AM > > > To: Zeng, Oak <oak.zeng@intel.com>; intel- > xe@lists.freedesktop.org > > > Cc: Brost, Matthew <matthew.brost@intel.com>; Cavitt, Jonathan > > > <jonathan.cavitt@intel.com> > > > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before > vm_bind > > > > > > On Thu, 2025-02-06 at 01:52 +0000, Zeng, Oak wrote: > > > > > > > > > > > > > -----Original Message----- > > > > > From: Thomas Hellström <thomas.hellstrom@linux.intel.com> > > > > > Sent: February 5, 2025 9:36 AM > > > > > To: Zeng, Oak <oak.zeng@intel.com>; intel- > > > xe@lists.freedesktop.org > > > > > Cc: Brost, Matthew <matthew.brost@intel.com>; Cavitt, > Jonathan > > > > > <jonathan.cavitt@intel.com> > > > > > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before > > > vm_bind > > > > > > > > > > On Tue, 2025-02-04 at 13:45 -0500, Oak Zeng wrote: > > > > > > When a vm runs under fault mode, if scratch page is enabled, > > > we > > > > > need > > > > > > to clear the scratch page mapping before vm_bind for the > > > vm_bind > > > > > > address range. Under fault mode, we depend on > recoverable > > > page > > > > > fault > > > > > > to establish mapping in page table. If scratch page is not > > > > > > cleared, > > > > > > GPU access of address won't cause page fault because it > > > > > > always > > > > > > hits > > > > > > the existing scratch page mapping. > > > > > > > > > > I think we need to ephasize that we clear AT vm_bind time, > not > > > > > before, > > > > > both in the commit description and the patch title. > > > > > > > > Will fix. > > > > > > > > > > > > > > > > > > > > > When vm_bind with IMMEDIATE flag, there is no need of > > > clearing as > > > > > > immediate bind can overwrite the scratch page mapping. > > > > > > > > > > > > So far only is xe2 and xe3 products are allowed to enable > > > > > > scratch > > > > > > page > > > > > > under fault mode. On other platform we don't allow scratch > > > page > > > > > under > > > > > > fault mode, so no need of such clearing. > > > > > > > > > > > > v2: Rework vm_bind pipeline to clear scratch page mapping. > > > > > > This > > > > > > is > > > > > > similar > > > > > > to a map operation, with the exception that PTEs are cleared > > > > > instead > > > > > > of > > > > > > pointing to valid physical pages. (Matt, Thomas) > > > > > > > > > > > > TLB invalidation is needed after clear scratch page mapping > > > > > > as > > > > > > larger > > > > > > scratch page mapping could be backed by physical page and > > > cached > > > > > in > > > > > > TLB. 
(Matt, Thomas) > > > > > > > > > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > > > > > --- > > > > > > drivers/gpu/drm/xe/xe_pt.c | 66 > > > ++++++++++++++++++++++-- > > > > > ------ > > > > > > -- > > > > > > drivers/gpu/drm/xe/xe_pt_types.h | 2 + > > > > > > drivers/gpu/drm/xe/xe_vm.c | 29 ++++++++++++-- > > > > > > drivers/gpu/drm/xe/xe_vm_types.h | 2 + > > > > > > 4 files changed, 75 insertions(+), 24 deletions(-) > > > > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c > > > > > b/drivers/gpu/drm/xe/xe_pt.c > > > > > > index 1ddcc7e79a93..3fd0ae2dbe7d 100644 > > > > > > --- a/drivers/gpu/drm/xe/xe_pt.c > > > > > > +++ b/drivers/gpu/drm/xe/xe_pt.c > > > > > > @@ -268,6 +268,8 @@ struct xe_pt_stage_bind_walk { > > > > > > * granularity. > > > > > > */ > > > > > > bool needs_64K; > > > > > > + /* @clear_pt: clear page table entries during the > > > > > > bind > > > > > > walk > > > > > > */ > > > > > > + bool clear_pt; > > > > > > /** > > > > > > * @vma: VMA being mapped > > > > > > */ > > > > > > @@ -497,21 +499,25 @@ xe_pt_stage_bind_entry(struct > > > xe_ptw > > > > > *parent, > > > > > > pgoff_t offset, > > > > > > > > > > > > XE_WARN_ON(xe_walk->va_curs_start != > addr); > > > > > > > > > > > > - pte = vm->pt_ops->pte_encode_vma(is_null ? > 0 > > > > > > : > > > > > > - > > > > > > xe_res_dma(curs) + > > > > > > xe_walk->dma_offset, > > > > > > - xe_walk- > > > > > > >vma, > > > > > > pat_index, level); > > > > > > - pte |= xe_walk->default_pte; > > > > > > + if (xe_walk->clear_pt) { > > > > > > + pte = 0; > > > > > > + } else { > > > > > > + pte = vm->pt_ops- > > > > > > >pte_encode_vma(is_null > > > > > > ? > > > > > 0 > > > > > > : > > > > > > + xe_res_dma(curs) + > > > > > > xe_walk- > > > > > > > dma_offset, > > > > > > + xe_walk->vma, > > > > > > pat_index, > > > > > > level); > > > > > > + pte |= xe_walk->default_pte; > > > > > > > > > > > > - /* > > > > > > - * Set the XE_PTE_PS64 hint if possible, > > > > > > otherwise > > > > > > if > > > > > > - * this device *requires* 64K PTE size for > > > > > > VRAM, > > > > > > fail. > > > > > > - */ > > > > > > - if (level == 0 && !xe_parent->is_compact) { > > > > > > - if (xe_pt_is_pte_ps64K(addr, next, > > > > > > xe_walk)) > > > > > > { > > > > > > - xe_walk->vma->gpuva.flags > |= > > > > > > XE_VMA_PTE_64K; > > > > > > - pte |= XE_PTE_PS64; > > > > > > - } else if (XE_WARN_ON(xe_walk- > > > > > > needs_64K)) { > > > > > > - return -EINVAL; > > > > > > + /* > > > > > > + * Set the XE_PTE_PS64 hint if > > > > > > possible, > > > > > > otherwise if > > > > > > + * this device *requires* 64K PTE > > > > > > size > > > > > > for > > > > > > VRAM, fail. > > > > > > + */ > > > > > > + if (level == 0 && !xe_parent- > > > > > > > is_compact) { > > > > > > + if (xe_pt_is_pte_ps64K(addr, > > > > > > next, > > > > > > xe_walk)) { > > > > > > + xe_walk->vma- > > > > > > > gpuva.flags > > > > > > = > > > > > > XE_VMA_PTE_64K; > > > > > > + pte |= XE_PTE_PS64; > > > > > > + } else if > > > > > > (XE_WARN_ON(xe_walk- > > > > > > > needs_64K)) { > > > > > > + return -EINVAL; > > > > > > + } > > > > > > } > > > > > > } > > > > > > > > > > > > > > > > I think there might be some things missing here. > > > > > The bind should look exactly as a is_null bind, with the > > > > > exception > > > > > that > > > > > the pte should be 0, so might be able to benefit from is_null, > > > > > for > > > > > example xe_pg_hugepte_possible() short-circuit the dma- > > > address > > > > > check. > > > > > > > > You are right. 
Above code has a problem for huge pte cases: it > > > > wouldn't detect > > > > A huge pte for clear_pt case. I will fix it. > > > > > > > > I will not make is_null be true for clear_pt case, but will make > > > > similar logic to > > > > Let clear_pt benefit from is_null logic. > > > > > > > > > With clear_pt we don't want to use the XE_PTE_PS64K, though, > > > like > > > > > you > > > > > identified above. > > > > > > > > > > Will get back with a look at the invalidation. > > > > > > > > I don't quite get here. Please do let me know if you find > > > > anything. > > > > > > Yes, I meant I hadn't had time to look at this part yet. > > > > > > For binding with scratch, it looks like *all* binds need TLB > > > invalidation, not just clearing binds, So in the below code-path, > > > can't > > > we just replace the xe_vm_in_preempt_fence_mode() with > > > xe_vm_in_lr_mode()? > > > > I think you are correct. > > > > For long run vm, regardless it is in fault mode or not, we need to > > invalidate > > TLB scratch page caching during bind. Original codes only did TLB inv > > on non- > > Fault mode. On fault mode + LR, this runs into two sub-cases: > > 1. immediate bind > > 2. bind triggered by page fault: note this can actually runs into the > > !rebind case > > When the fault address is first time accessed by gpu. But the TLB > > cache for > > Scratch page mapping should have been invalidated by the > > "invalidate_on_bind" > > Condition. > > > > So this leaves case #1 problematic. > > > > It would be better for others to comment also, as in the comment > > below, it explicitly > > Called out "non-faulting LR". My understanding is, people didn't > > think of case #1 above. > > > > Since it is an existing issue, I will make it a separate patch for > > better tracking purpose. > > Hmm. What is the existing issue, I don't quite follow? I just realized it is not an existing issue, as previously we didn't allow Scratch on fault mode. I will still fix in the current patch. Oak > > /Thomas > > > > > Oak > > > > > > > > > > > /* > > > * If rebind, we have to invalidate TLB on !LR vms > > > to > > > invalidate > > > * cached PTEs point to freed memory. On LR vms > > > this > > > is done > > > * automatically when the context is re-enabled by > > > the > > > rebind worker, > > > * or in fault mode it was invalidated on PTE > > > zapping. > > > * > > > * If !rebind, and scratch enabled VMs, there is a > > > chance the scratch > > > * PTE is already cached in the TLB so it needs to > > > be > > > invalidated. > > > * On !LR VMs this is done in the ring ops > > > preceding a > > > batch, but on > > > * non-faulting LR, in particular on user-space > > > batch > > > buffer chaining, > > > * it needs to be done here. 
> > > */ > > > if ((!pt_op->rebind && xe_vm_has_scratch(vm) && > > > xe_vm_in_preempt_fence_mode(vm))) > > > > > > /Thomas > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Oak > > > > > > > > > > > > > > > > @@ -519,7 +525,7 @@ xe_pt_stage_bind_entry(struct > xe_ptw > > > > > *parent, > > > > > > pgoff_t offset, > > > > > > if (unlikely(ret)) > > > > > > return ret; > > > > > > > > > > > > - if (!is_null) > > > > > > + if (!is_null && !xe_walk->clear_pt) > > > > > > xe_res_next(curs, next - addr); > > > > > > xe_walk->va_curs_start = next; > > > > > > xe_walk->vma->gpuva.flags |= > (XE_VMA_PTE_4K > > > > > > << > > > > > > level); > > > > > > @@ -589,6 +595,7 @@ static const struct xe_pt_walk_ops > > > > > > xe_pt_stage_bind_ops = { > > > > > > * @vma: The vma indicating the address range. > > > > > > * @entries: Storage for the update entries used for > > > > > > connecting > > > > > > the > > > > > > tree to > > > > > > * the main tree at commit time. > > > > > > + * @clear_pt: Clear the page table entries. > > > > > > * @num_entries: On output contains the number of > @entries > > > > > used. > > > > > > * > > > > > > * This function builds a disconnected page-table tree for a > > > > > > given > > > > > > address > > > > > > @@ -602,7 +609,8 @@ static const struct xe_pt_walk_ops > > > > > > xe_pt_stage_bind_ops = { > > > > > > */ > > > > > > static int > > > > > > xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, > > > > > > - struct xe_vm_pgtable_update *entries, u32 > > > > > > *num_entries) > > > > > > + struct xe_vm_pgtable_update *entries, > > > > > > + bool clear_pt, u32 *num_entries) > > > > > > { > > > > > > struct xe_device *xe = tile_to_xe(tile); > > > > > > struct xe_bo *bo = xe_vma_bo(vma); > > > > > > @@ -622,10 +630,19 @@ xe_pt_stage_bind(struct xe_tile > *tile, > > > > > struct > > > > > > xe_vma *vma, > > > > > > .vma = vma, > > > > > > .wupd.entries = entries, > > > > > > .needs_64K = (xe_vma_vm(vma)->flags & > > > > > > XE_VM_FLAG_64K) && is_devmem, > > > > > > + .clear_pt = clear_pt, > > > > > > }; > > > > > > struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile- > > > > > > >id]; > > > > > > int ret; > > > > > > > > > > > > + if (clear_pt) { > > > > > > + ret = xe_pt_walk_range(&pt->base, pt->level, > > > > > > xe_vma_start(vma), > > > > > > + xe_vma_end(vma), > > > > > > &xe_walk.base); > > > > > > + > > > > > > + *num_entries = > > > > > > xe_walk.wupd.num_used_entries; > > > > > > + return ret; > > > > > > + } > > > > > > + > > > > > > /** > > > > > > * Default atomic expectations for different > > > > > > allocation > > > > > > scenarios are as follows: > > > > > > * > > > > > > @@ -981,12 +998,14 @@ static void xe_pt_free_bind(struct > > > > > > xe_vm_pgtable_update *entries, > > > > > > > > > > > > static int > > > > > > xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma > *vma, > > > > > > - struct xe_vm_pgtable_update *entries, u32 > > > > > > *num_entries) > > > > > > + struct xe_vm_pgtable_update *entries, > > > > > > + bool invalidate_on_bind, u32 > > > > > > *num_entries) > > > > > > { > > > > > > int err; > > > > > > > > > > > > *num_entries = 0; > > > > > > - err = xe_pt_stage_bind(tile, vma, entries, > > > > > > num_entries); > > > > > > + err = xe_pt_stage_bind(tile, vma, entries, > > > > > > invalidate_on_bind, > > > > > > + num_entries); > > > > > > if (!err) > > > > > > xe_tile_assert(tile, *num_entries); > > > > > > > > > > > > @@ -1661,6 +1680,7 @@ 
static int bind_op_prepare(struct > > > xe_vm > > > > > *vm, > > > > > > struct xe_tile *tile, > > > > > > return err; > > > > > > > > > > > > err = xe_pt_prepare_bind(tile, vma, pt_op->entries, > > > > > > + pt_update_ops- > > > > > > > invalidate_on_bind, > > > > > > &pt_op->num_entries); > > > > > > if (!err) { > > > > > > xe_tile_assert(tile, pt_op->num_entries <= > > > > > > @@ -1685,7 +1705,7 @@ static int bind_op_prepare(struct > > > xe_vm > > > > > *vm, > > > > > > struct xe_tile *tile, > > > > > > * it needs to be done here. > > > > > > */ > > > > > > if ((!pt_op->rebind && > xe_vm_has_scratch(vm) > > > > > > && > > > > > > - xe_vm_in_preempt_fence_mode(vm))) > > > > > > + xe_vm_in_preempt_fence_mode(vm)) || > > > > > > pt_update_ops->invalidate_on_bind) > > > > > > pt_update_ops->needs_invalidation > = > > > > > > true; > > > > > > else if (pt_op->rebind && > > > > > > !xe_vm_in_lr_mode(vm)) > > > > > > /* We bump also if > > > > > > batch_invalidate_tlb > > > > > > is > > > > > > true */ > > > > > > @@ -1759,9 +1779,13 @@ static int op_prepare(struct > xe_vm > > > *vm, > > > > > > > > > > > > switch (op->base.op) { > > > > > > case DRM_GPUVA_OP_MAP: > > > > > > - if (!op->map.immediate && > > > > > xe_vm_in_fault_mode(vm)) > > > > > > + if (!op->map.immediate && > > > > > xe_vm_in_fault_mode(vm) && > > > > > > + !op->map.invalidate_on_bind) > > > > > > break; > > > > > > > > > > > > + if (op->map.invalidate_on_bind) > > > > > > + pt_update_ops->invalidate_on_bind > = > > > > > > true; > > > > > > + > > > > > > err = bind_op_prepare(vm, tile, > > > > > > pt_update_ops, > > > > > > op- > > > > > > > map.vma); > > > > > > pt_update_ops->wait_vm_kernel = true; > > > > > > break; > > > > > > @@ -1871,6 +1895,8 @@ static void bind_op_commit(struct > > > xe_vm > > > > > *vm, > > > > > > struct xe_tile *tile, > > > > > > } > > > > > > vma->tile_present |= BIT(tile->id); > > > > > > vma->tile_staged &= ~BIT(tile->id); > > > > > > + if (pt_update_ops->invalidate_on_bind) > > > > > > + vma->tile_invalidated |= BIT(tile->id); > > > > > > if (xe_vma_is_userptr(vma)) { > > > > > > lockdep_assert_held_read(&vm- > > > > > > > userptr.notifier_lock); > > > > > > to_userptr_vma(vma)->userptr.initial_bind = > > > > > > true; > > > > > > diff --git a/drivers/gpu/drm/xe/xe_pt_types.h > > > > > > b/drivers/gpu/drm/xe/xe_pt_types.h > > > > > > index 384cc04de719..3d0aa2a5102e 100644 > > > > > > --- a/drivers/gpu/drm/xe/xe_pt_types.h > > > > > > +++ b/drivers/gpu/drm/xe/xe_pt_types.h > > > > > > @@ -108,6 +108,8 @@ struct xe_vm_pgtable_update_ops { > > > > > > bool needs_userptr_lock; > > > > > > /** @needs_invalidation: Needs invalidation */ > > > > > > bool needs_invalidation; > > > > > > + /** @invalidate_on_bind: Invalidate the range before > > > > > > bind */ > > > > > > + bool invalidate_on_bind; > > > > > > /** > > > > > > * @wait_vm_bookkeep: PT operations need to wait > > > > > > until > > > > > > VM > > > > > is > > > > > > idle > > > > > > * (bookkeep dma-resv slots are idle) and stage all > > > > > > future > > > > > > VM activity > > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c > > > > > b/drivers/gpu/drm/xe/xe_vm.c > > > > > > index d664f2e418b2..813d893d9b63 100644 > > > > > > --- a/drivers/gpu/drm/xe/xe_vm.c > > > > > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > > > > > @@ -1921,6 +1921,23 @@ static void print_op(struct > xe_device > > > *xe, > > > > > > struct drm_gpuva_op *op) > > > > > > } > > > > > > #endif > > > > > > > > > > > > +static bool __xe_vm_needs_clear_scratch_pages(struct > 
> > xe_vm > > > > > *vm, u32 > > > > > > bind_flags) > > > > > > +{ > > > > > > + if (!xe_vm_in_fault_mode(vm)) > > > > > > + return false; > > > > > > + > > > > > > + if (!NEEDS_SCRATCH(vm->xe)) > > > > > > + return false; > > > > > > + > > > > > > + if (!xe_vm_has_scratch(vm)) > > > > > > + return false; > > > > > > + > > > > > > + if (bind_flags & > DRM_XE_VM_BIND_FLAG_IMMEDIATE) > > > > > > + return false; > > > > > > + > > > > > > + return true; > > > > > > +} > > > > > > + > > > > > > /* > > > > > > * Create operations list from IOCTL arguments, setup > > > operations > > > > > > fields so parse > > > > > > * and commit steps are decoupled from IOCTL arguments. > This > > > > > step > > > > > > can fail. > > > > > > @@ -1991,6 +2008,8 @@ vm_bind_ioctl_ops_create(struct > > > xe_vm > > > > > *vm, > > > > > > struct xe_bo *bo, > > > > > > op->map.is_null = flags & > > > > > > DRM_XE_VM_BIND_FLAG_NULL; > > > > > > op->map.dumpable = flags & > > > > > > DRM_XE_VM_BIND_FLAG_DUMPABLE; > > > > > > op->map.pat_index = pat_index; > > > > > > + op->map.invalidate_on_bind = > > > > > > + > > > > > __xe_vm_needs_clear_scratch_pages(vm > > > > > > , flags); > > > > > > } else if (__op->op == > > > > > > DRM_GPUVA_OP_PREFETCH) { > > > > > > op->prefetch.region = > > > > > > prefetch_region; > > > > > > } > > > > > > @@ -2188,7 +2207,8 @@ static int > > > vm_bind_ioctl_ops_parse(struct > > > > > xe_vm > > > > > > *vm, struct drm_gpuva_ops *ops, > > > > > > return PTR_ERR(vma); > > > > > > > > > > > > op->map.vma = vma; > > > > > > - if (op->map.immediate || > > > > > > !xe_vm_in_fault_mode(vm)) > > > > > > + if (op->map.immediate || > > > > > > !xe_vm_in_fault_mode(vm) || > > > > > > + op->map.invalidate_on_bind) > > > > > > > > > > > xe_vma_ops_incr_pt_update_ops(vops, > > > > > > > > > > > > op- > > > > > > > tile_mask); > > > > > > break; > > > > > > @@ -2416,9 +2436,10 @@ static int > op_lock_and_prep(struct > > > > > drm_exec > > > > > > *exec, struct xe_vm *vm, > > > > > > > > > > > > switch (op->base.op) { > > > > > > case DRM_GPUVA_OP_MAP: > > > > > > - err = vma_lock_and_validate(exec, op- > > > > > > >map.vma, > > > > > > - > > > > > > !xe_vm_in_fault_mode(vm) > > > > > > > > > > > > > > - op- > > > > > > >map.immediate); > > > > > > + if (!op->map.invalidate_on_bind) > > > > > > + err = vma_lock_and_validate(exec, > > > > > > op- > > > > > > > map.vma, > > > > > > + > > > > > > !xe_vm_in_fault_mode(vm) || > > > > > > + op- > > > > > > > map.immediate); > > > > > > break; > > > > > > case DRM_GPUVA_OP_REMAP: > > > > > > err = check_ufence(gpuva_to_vma(op- > > > > > > > base.remap.unmap->va)); > > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h > > > > > > b/drivers/gpu/drm/xe/xe_vm_types.h > > > > > > index 52467b9b5348..dace04f4ea5e 100644 > > > > > > --- a/drivers/gpu/drm/xe/xe_vm_types.h > > > > > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h > > > > > > @@ -297,6 +297,8 @@ struct xe_vma_op_map { > > > > > > bool is_null; > > > > > > /** @dumpable: whether BO is dumped on GPU > hang */ > > > > > > bool dumpable; > > > > > > + /** @invalidate: invalidate the VMA before bind */ > > > > > > + bool invalidate_on_bind; > > > > > > /** @pat_index: The pat index to use for this > > > > > > operation. > > > > > > */ > > > > > > u16 pat_index; > > > > > > }; > > > > > > ^ permalink raw reply [flat|nested] 32+ messages in thread
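A minimal sketch of the huge-PTE fix agreed above, i.e. letting clear_pt benefit from the existing is_null logic without actually setting is_null. The helper name and placement below are assumptions, not posted code; the idea is that the huge-PTE possibility check skips the dma-address validation for a clearing bind the same way it already does for a NULL VMA, while the XE_PTE_PS64 hint stays excluded for clear_pt as noted in the exchange:

	/*
	 * Sketch only: a clearing bind has no backing store to validate,
	 * so treat it like a NULL binding when deciding whether a huge
	 * PTE is possible. Helper name is hypothetical.
	 */
	static bool xe_pt_bind_has_no_backing(const struct xe_pt_stage_bind_walk *xe_walk)
	{
		return xe_vma_is_null(xe_walk->vma) || xe_walk->clear_pt;
	}

xe_pt_hugepte_possible() would then test this predicate where it currently tests xe_vma_is_null() alone, so huge PTEs are still detected (and zeroed with a single entry) on the clear_pt walk.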
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-02-04 18:45 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng 2025-02-05 14:35 ` Thomas Hellström @ 2025-02-06 10:34 ` Matthew Brost 2025-02-06 10:43 ` Thomas Hellström 2025-02-06 15:16 ` Zeng, Oak 1 sibling, 2 replies; 32+ messages in thread From: Matthew Brost @ 2025-02-06 10:34 UTC (permalink / raw) To: Oak Zeng; +Cc: intel-xe, Thomas.Hellstrom, jonathan.cavitt On Tue, Feb 04, 2025 at 01:45:57PM -0500, Oak Zeng wrote: > When a vm runs under fault mode, if scratch page is enabled, we need > to clear the scratch page mapping before vm_bind for the vm_bind > address range. Under fault mode, we depend on recoverable page fault > to establish mapping in page table. If scratch page is not cleared, > GPU access of address won't cause page fault because it always hits > the existing scratch page mapping. > > When vm_bind with IMMEDIATE flag, there is no need of clearing as > immediate bind can overwrite the scratch page mapping. > > So far only is xe2 and xe3 products are allowed to enable scratch page > under fault mode. On other platform we don't allow scratch page under > fault mode, so no need of such clearing. > > v2: Rework vm_bind pipeline to clear scratch page mapping. This is similar > to a map operation, with the exception that PTEs are cleared instead of > pointing to valid physical pages. (Matt, Thomas) > > TLB invalidation is needed after clear scratch page mapping as larger > scratch page mapping could be backed by physical page and cached in > TLB. (Matt, Thomas) > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> Given the complexity of the VM bind path, I think we need an IGT posted with this series before merging as I suggested in v1. Without it, it will be fairly difficult to ensure correctness by reviews only. Matt > --- > drivers/gpu/drm/xe/xe_pt.c | 66 ++++++++++++++++++++++---------- > drivers/gpu/drm/xe/xe_pt_types.h | 2 + > drivers/gpu/drm/xe/xe_vm.c | 29 ++++++++++++-- > drivers/gpu/drm/xe/xe_vm_types.h | 2 + > 4 files changed, 75 insertions(+), 24 deletions(-) > > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c > index 1ddcc7e79a93..3fd0ae2dbe7d 100644 > --- a/drivers/gpu/drm/xe/xe_pt.c > +++ b/drivers/gpu/drm/xe/xe_pt.c > @@ -268,6 +268,8 @@ struct xe_pt_stage_bind_walk { > * granularity. > */ > bool needs_64K; > + /* @clear_pt: clear page table entries during the bind walk */ > + bool clear_pt; > /** > * @vma: VMA being mapped > */ > @@ -497,21 +499,25 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset, > > XE_WARN_ON(xe_walk->va_curs_start != addr); > > - pte = vm->pt_ops->pte_encode_vma(is_null ? 0 : > - xe_res_dma(curs) + xe_walk->dma_offset, > - xe_walk->vma, pat_index, level); > - pte |= xe_walk->default_pte; > + if (xe_walk->clear_pt) { > + pte = 0; > + } else { > + pte = vm->pt_ops->pte_encode_vma(is_null ? 0 : > + xe_res_dma(curs) + xe_walk->dma_offset, > + xe_walk->vma, pat_index, level); > + pte |= xe_walk->default_pte; > > - /* > - * Set the XE_PTE_PS64 hint if possible, otherwise if > - * this device *requires* 64K PTE size for VRAM, fail. > - */ > - if (level == 0 && !xe_parent->is_compact) { > - if (xe_pt_is_pte_ps64K(addr, next, xe_walk)) { > - xe_walk->vma->gpuva.flags |= XE_VMA_PTE_64K; > - pte |= XE_PTE_PS64; > - } else if (XE_WARN_ON(xe_walk->needs_64K)) { > - return -EINVAL; > + /* > + * Set the XE_PTE_PS64 hint if possible, otherwise if > + * this device *requires* 64K PTE size for VRAM, fail. 
> + */ > + if (level == 0 && !xe_parent->is_compact) { > + if (xe_pt_is_pte_ps64K(addr, next, xe_walk)) { > + xe_walk->vma->gpuva.flags |= XE_VMA_PTE_64K; > + pte |= XE_PTE_PS64; > + } else if (XE_WARN_ON(xe_walk->needs_64K)) { > + return -EINVAL; > + } > } > } > > @@ -519,7 +525,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset, > if (unlikely(ret)) > return ret; > > - if (!is_null) > + if (!is_null && !xe_walk->clear_pt) > xe_res_next(curs, next - addr); > xe_walk->va_curs_start = next; > xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << level); > @@ -589,6 +595,7 @@ static const struct xe_pt_walk_ops xe_pt_stage_bind_ops = { > * @vma: The vma indicating the address range. > * @entries: Storage for the update entries used for connecting the tree to > * the main tree at commit time. > + * @clear_pt: Clear the page table entries. > * @num_entries: On output contains the number of @entries used. > * > * This function builds a disconnected page-table tree for a given address > @@ -602,7 +609,8 @@ static const struct xe_pt_walk_ops xe_pt_stage_bind_ops = { > */ > static int > xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, > - struct xe_vm_pgtable_update *entries, u32 *num_entries) > + struct xe_vm_pgtable_update *entries, > + bool clear_pt, u32 *num_entries) > { > struct xe_device *xe = tile_to_xe(tile); > struct xe_bo *bo = xe_vma_bo(vma); > @@ -622,10 +630,19 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, > .vma = vma, > .wupd.entries = entries, > .needs_64K = (xe_vma_vm(vma)->flags & XE_VM_FLAG_64K) && is_devmem, > + .clear_pt = clear_pt, > }; > struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id]; > int ret; > > + if (clear_pt) { > + ret = xe_pt_walk_range(&pt->base, pt->level, xe_vma_start(vma), > + xe_vma_end(vma), &xe_walk.base); > + > + *num_entries = xe_walk.wupd.num_used_entries; > + return ret; > + } > + > /** > * Default atomic expectations for different allocation scenarios are as follows: > * > @@ -981,12 +998,14 @@ static void xe_pt_free_bind(struct xe_vm_pgtable_update *entries, > > static int > xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma, > - struct xe_vm_pgtable_update *entries, u32 *num_entries) > + struct xe_vm_pgtable_update *entries, > + bool invalidate_on_bind, u32 *num_entries) > { > int err; > > *num_entries = 0; > - err = xe_pt_stage_bind(tile, vma, entries, num_entries); > + err = xe_pt_stage_bind(tile, vma, entries, invalidate_on_bind, > + num_entries); > if (!err) > xe_tile_assert(tile, *num_entries); > > @@ -1661,6 +1680,7 @@ static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile, > return err; > > err = xe_pt_prepare_bind(tile, vma, pt_op->entries, > + pt_update_ops->invalidate_on_bind, > &pt_op->num_entries); > if (!err) { > xe_tile_assert(tile, pt_op->num_entries <= > @@ -1685,7 +1705,7 @@ static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile, > * it needs to be done here. 
> */ > if ((!pt_op->rebind && xe_vm_has_scratch(vm) && > - xe_vm_in_preempt_fence_mode(vm))) > + xe_vm_in_preempt_fence_mode(vm)) || pt_update_ops->invalidate_on_bind) > pt_update_ops->needs_invalidation = true; > else if (pt_op->rebind && !xe_vm_in_lr_mode(vm)) > /* We bump also if batch_invalidate_tlb is true */ > @@ -1759,9 +1779,13 @@ static int op_prepare(struct xe_vm *vm, > > switch (op->base.op) { > case DRM_GPUVA_OP_MAP: > - if (!op->map.immediate && xe_vm_in_fault_mode(vm)) > + if (!op->map.immediate && xe_vm_in_fault_mode(vm) && > + !op->map.invalidate_on_bind) > break; > > + if (op->map.invalidate_on_bind) > + pt_update_ops->invalidate_on_bind = true; > + > err = bind_op_prepare(vm, tile, pt_update_ops, op->map.vma); > pt_update_ops->wait_vm_kernel = true; > break; > @@ -1871,6 +1895,8 @@ static void bind_op_commit(struct xe_vm *vm, struct xe_tile *tile, > } > vma->tile_present |= BIT(tile->id); > vma->tile_staged &= ~BIT(tile->id); > + if (pt_update_ops->invalidate_on_bind) > + vma->tile_invalidated |= BIT(tile->id); > if (xe_vma_is_userptr(vma)) { > lockdep_assert_held_read(&vm->userptr.notifier_lock); > to_userptr_vma(vma)->userptr.initial_bind = true; > diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h > index 384cc04de719..3d0aa2a5102e 100644 > --- a/drivers/gpu/drm/xe/xe_pt_types.h > +++ b/drivers/gpu/drm/xe/xe_pt_types.h > @@ -108,6 +108,8 @@ struct xe_vm_pgtable_update_ops { > bool needs_userptr_lock; > /** @needs_invalidation: Needs invalidation */ > bool needs_invalidation; > + /** @invalidate_on_bind: Invalidate the range before bind */ > + bool invalidate_on_bind; > /** > * @wait_vm_bookkeep: PT operations need to wait until VM is idle > * (bookkeep dma-resv slots are idle) and stage all future VM activity > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c > index d664f2e418b2..813d893d9b63 100644 > --- a/drivers/gpu/drm/xe/xe_vm.c > +++ b/drivers/gpu/drm/xe/xe_vm.c > @@ -1921,6 +1921,23 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op) > } > #endif > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_vm *vm, u32 bind_flags) > +{ > + if (!xe_vm_in_fault_mode(vm)) > + return false; > + > + if (!NEEDS_SCRATCH(vm->xe)) > + return false; > + > + if (!xe_vm_has_scratch(vm)) > + return false; > + > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > + return false; > + > + return true; > +} > + > /* > * Create operations list from IOCTL arguments, setup operations fields so parse > * and commit steps are decoupled from IOCTL arguments. This step can fail. 
> @@ -1991,6 +2008,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo, > op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL; > op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE; > op->map.pat_index = pat_index; > + op->map.invalidate_on_bind = > + __xe_vm_needs_clear_scratch_pages(vm, flags); > } else if (__op->op == DRM_GPUVA_OP_PREFETCH) { > op->prefetch.region = prefetch_region; > } > @@ -2188,7 +2207,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops, > return PTR_ERR(vma); > > op->map.vma = vma; > - if (op->map.immediate || !xe_vm_in_fault_mode(vm)) > + if (op->map.immediate || !xe_vm_in_fault_mode(vm) || > + op->map.invalidate_on_bind) > xe_vma_ops_incr_pt_update_ops(vops, > op->tile_mask); > break; > @@ -2416,9 +2436,10 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm, > > switch (op->base.op) { > case DRM_GPUVA_OP_MAP: > - err = vma_lock_and_validate(exec, op->map.vma, > - !xe_vm_in_fault_mode(vm) || > - op->map.immediate); > + if (!op->map.invalidate_on_bind) > + err = vma_lock_and_validate(exec, op->map.vma, > + !xe_vm_in_fault_mode(vm) || > + op->map.immediate); > break; > case DRM_GPUVA_OP_REMAP: > err = check_ufence(gpuva_to_vma(op->base.remap.unmap->va)); > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h > index 52467b9b5348..dace04f4ea5e 100644 > --- a/drivers/gpu/drm/xe/xe_vm_types.h > +++ b/drivers/gpu/drm/xe/xe_vm_types.h > @@ -297,6 +297,8 @@ struct xe_vma_op_map { > bool is_null; > /** @dumpable: whether BO is dumped on GPU hang */ > bool dumpable; > + /** @invalidate: invalidate the VMA before bind */ > + bool invalidate_on_bind; > /** @pat_index: The pat index to use for this operation. */ > u16 pat_index; > }; > -- > 2.26.3 > ^ permalink raw reply [flat|nested] 32+ messages in thread
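For reference, the direction Thomas and Oak converge on earlier in the thread -- widening the !rebind scratch-invalidation case from preempt-fence VMs to all long-running VMs -- would make the bind_op_prepare() condition quoted above look roughly like the following. This is a sketch of the agreed change, not the v2 hunk:

	/*
	 * Sketch: with scratch enabled, a stale scratch PTE may be
	 * cached in the TLB for any long-running VM, faulting or not,
	 * so key on xe_vm_in_lr_mode() rather than
	 * xe_vm_in_preempt_fence_mode() for the !rebind case.
	 */
	if ((!pt_op->rebind && xe_vm_has_scratch(vm) &&
	     xe_vm_in_lr_mode(vm)) ||
	    pt_update_ops->invalidate_on_bind)
		pt_update_ops->needs_invalidation = true;

Oak indicates above that this will be folded into the current patch rather than split out as a separate fix.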
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-02-06 10:34 ` Matthew Brost @ 2025-02-06 10:43 ` Thomas Hellström 2025-02-10 17:08 ` Matthew Brost 2025-02-06 15:16 ` Zeng, Oak 1 sibling, 1 reply; 32+ messages in thread From: Thomas Hellström @ 2025-02-06 10:43 UTC (permalink / raw) To: Matthew Brost, Oak Zeng; +Cc: intel-xe, jonathan.cavitt Hi, Matt, On Thu, 2025-02-06 at 02:34 -0800, Matthew Brost wrote: > On Tue, Feb 04, 2025 at 01:45:57PM -0500, Oak Zeng wrote: > > When a vm runs under fault mode, if scratch page is enabled, we > > need > > to clear the scratch page mapping before vm_bind for the vm_bind > > address range. Under fault mode, we depend on recoverable page > > fault > > to establish mapping in page table. If scratch page is not cleared, > > GPU access of address won't cause page fault because it always hits > > the existing scratch page mapping. > > > > When vm_bind with IMMEDIATE flag, there is no need of clearing as > > immediate bind can overwrite the scratch page mapping. > > > > So far only is xe2 and xe3 products are allowed to enable scratch > > page > > under fault mode. On other platform we don't allow scratch page > > under > > fault mode, so no need of such clearing. > > > > v2: Rework vm_bind pipeline to clear scratch page mapping. This is > > similar > > to a map operation, with the exception that PTEs are cleared > > instead of > > pointing to valid physical pages. (Matt, Thomas) > > > > TLB invalidation is needed after clear scratch page mapping as > > larger > > scratch page mapping could be backed by physical page and cached in > > TLB. (Matt, Thomas) > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > Given the complexity of the VM bind path, I think we need an IGT > posted > with this series before merging as I suggested in v1. Without it, it > will be fairly difficult to ensure correctness by reviews only. There is an igt posted that exercises xe_exec_fault_mode + scratch pages. Although you might have had something more elaborate in mind? /Thomas > > Matt > > > --- > > drivers/gpu/drm/xe/xe_pt.c | 66 ++++++++++++++++++++++------ > > ---- > > drivers/gpu/drm/xe/xe_pt_types.h | 2 + > > drivers/gpu/drm/xe/xe_vm.c | 29 ++++++++++++-- > > drivers/gpu/drm/xe/xe_vm_types.h | 2 + > > 4 files changed, 75 insertions(+), 24 deletions(-) > > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c > > b/drivers/gpu/drm/xe/xe_pt.c > > index 1ddcc7e79a93..3fd0ae2dbe7d 100644 > > --- a/drivers/gpu/drm/xe/xe_pt.c > > +++ b/drivers/gpu/drm/xe/xe_pt.c > > @@ -268,6 +268,8 @@ struct xe_pt_stage_bind_walk { > > * granularity. > > */ > > bool needs_64K; > > + /* @clear_pt: clear page table entries during the bind > > walk */ > > + bool clear_pt; > > /** > > * @vma: VMA being mapped > > */ > > @@ -497,21 +499,25 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, > > pgoff_t offset, > > > > XE_WARN_ON(xe_walk->va_curs_start != addr); > > > > - pte = vm->pt_ops->pte_encode_vma(is_null ? 0 : > > - xe_res_dma(curs) > > + xe_walk->dma_offset, > > - xe_walk->vma, > > pat_index, level); > > - pte |= xe_walk->default_pte; > > + if (xe_walk->clear_pt) { > > + pte = 0; > > + } else { > > + pte = vm->pt_ops->pte_encode_vma(is_null ? > > 0 : > > + xe_res_dma(curs) + > > xe_walk->dma_offset, > > + xe_walk->vma, pat_index, > > level); > > + pte |= xe_walk->default_pte; > > > > - /* > > - * Set the XE_PTE_PS64 hint if possible, otherwise > > if > > - * this device *requires* 64K PTE size for VRAM, > > fail. 
> > - */ > > - if (level == 0 && !xe_parent->is_compact) { > > - if (xe_pt_is_pte_ps64K(addr, next, > > xe_walk)) { > > - xe_walk->vma->gpuva.flags |= > > XE_VMA_PTE_64K; > > - pte |= XE_PTE_PS64; > > - } else if (XE_WARN_ON(xe_walk->needs_64K)) > > { > > - return -EINVAL; > > + /* > > + * Set the XE_PTE_PS64 hint if possible, > > otherwise if > > + * this device *requires* 64K PTE size for > > VRAM, fail. > > + */ > > + if (level == 0 && !xe_parent->is_compact) > > { > > + if (xe_pt_is_pte_ps64K(addr, next, > > xe_walk)) { > > + xe_walk->vma->gpuva.flags > > |= XE_VMA_PTE_64K; > > + pte |= XE_PTE_PS64; > > + } else if (XE_WARN_ON(xe_walk- > > >needs_64K)) { > > + return -EINVAL; > > + } > > } > > } > > > > @@ -519,7 +525,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, > > pgoff_t offset, > > if (unlikely(ret)) > > return ret; > > > > - if (!is_null) > > + if (!is_null && !xe_walk->clear_pt) > > xe_res_next(curs, next - addr); > > xe_walk->va_curs_start = next; > > xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << > > level); > > @@ -589,6 +595,7 @@ static const struct xe_pt_walk_ops > > xe_pt_stage_bind_ops = { > > * @vma: The vma indicating the address range. > > * @entries: Storage for the update entries used for connecting > > the tree to > > * the main tree at commit time. > > + * @clear_pt: Clear the page table entries. > > * @num_entries: On output contains the number of @entries used. > > * > > * This function builds a disconnected page-table tree for a given > > address > > @@ -602,7 +609,8 @@ static const struct xe_pt_walk_ops > > xe_pt_stage_bind_ops = { > > */ > > static int > > xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, > > - struct xe_vm_pgtable_update *entries, u32 > > *num_entries) > > + struct xe_vm_pgtable_update *entries, > > + bool clear_pt, u32 *num_entries) > > { > > struct xe_device *xe = tile_to_xe(tile); > > struct xe_bo *bo = xe_vma_bo(vma); > > @@ -622,10 +630,19 @@ xe_pt_stage_bind(struct xe_tile *tile, struct > > xe_vma *vma, > > .vma = vma, > > .wupd.entries = entries, > > .needs_64K = (xe_vma_vm(vma)->flags & > > XE_VM_FLAG_64K) && is_devmem, > > + .clear_pt = clear_pt, > > }; > > struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id]; > > int ret; > > > > + if (clear_pt) { > > + ret = xe_pt_walk_range(&pt->base, pt->level, > > xe_vma_start(vma), > > + xe_vma_end(vma), > > &xe_walk.base); > > + > > + *num_entries = xe_walk.wupd.num_used_entries; > > + return ret; > > + } > > + > > /** > > * Default atomic expectations for different allocation > > scenarios are as follows: > > * > > @@ -981,12 +998,14 @@ static void xe_pt_free_bind(struct > > xe_vm_pgtable_update *entries, > > > > static int > > xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma, > > - struct xe_vm_pgtable_update *entries, u32 > > *num_entries) > > + struct xe_vm_pgtable_update *entries, > > + bool invalidate_on_bind, u32 *num_entries) > > { > > int err; > > > > *num_entries = 0; > > - err = xe_pt_stage_bind(tile, vma, entries, num_entries); > > + err = xe_pt_stage_bind(tile, vma, entries, > > invalidate_on_bind, > > + num_entries); > > if (!err) > > xe_tile_assert(tile, *num_entries); > > > > @@ -1661,6 +1680,7 @@ static int bind_op_prepare(struct xe_vm *vm, > > struct xe_tile *tile, > > return err; > > > > err = xe_pt_prepare_bind(tile, vma, pt_op->entries, > > + pt_update_ops- > > >invalidate_on_bind, > > &pt_op->num_entries); > > if (!err) { > > xe_tile_assert(tile, pt_op->num_entries <= > > @@ -1685,7 +1705,7 @@ static int bind_op_prepare(struct xe_vm *vm, > > 
struct xe_tile *tile, > > * it needs to be done here. > > */ > > if ((!pt_op->rebind && xe_vm_has_scratch(vm) && > > - xe_vm_in_preempt_fence_mode(vm))) > > + xe_vm_in_preempt_fence_mode(vm)) || > > pt_update_ops->invalidate_on_bind) > > pt_update_ops->needs_invalidation = true; > > else if (pt_op->rebind && !xe_vm_in_lr_mode(vm)) > > /* We bump also if batch_invalidate_tlb is > > true */ > > @@ -1759,9 +1779,13 @@ static int op_prepare(struct xe_vm *vm, > > > > switch (op->base.op) { > > case DRM_GPUVA_OP_MAP: > > - if (!op->map.immediate && xe_vm_in_fault_mode(vm)) > > + if (!op->map.immediate && xe_vm_in_fault_mode(vm) > > && > > + !op->map.invalidate_on_bind) > > break; > > > > + if (op->map.invalidate_on_bind) > > + pt_update_ops->invalidate_on_bind = true; > > + > > err = bind_op_prepare(vm, tile, pt_update_ops, op- > > >map.vma); > > pt_update_ops->wait_vm_kernel = true; > > break; > > @@ -1871,6 +1895,8 @@ static void bind_op_commit(struct xe_vm *vm, > > struct xe_tile *tile, > > } > > vma->tile_present |= BIT(tile->id); > > vma->tile_staged &= ~BIT(tile->id); > > + if (pt_update_ops->invalidate_on_bind) > > + vma->tile_invalidated |= BIT(tile->id); > > if (xe_vma_is_userptr(vma)) { > > lockdep_assert_held_read(&vm- > > >userptr.notifier_lock); > > to_userptr_vma(vma)->userptr.initial_bind = true; > > diff --git a/drivers/gpu/drm/xe/xe_pt_types.h > > b/drivers/gpu/drm/xe/xe_pt_types.h > > index 384cc04de719..3d0aa2a5102e 100644 > > --- a/drivers/gpu/drm/xe/xe_pt_types.h > > +++ b/drivers/gpu/drm/xe/xe_pt_types.h > > @@ -108,6 +108,8 @@ struct xe_vm_pgtable_update_ops { > > bool needs_userptr_lock; > > /** @needs_invalidation: Needs invalidation */ > > bool needs_invalidation; > > + /** @invalidate_on_bind: Invalidate the range before bind > > */ > > + bool invalidate_on_bind; > > /** > > * @wait_vm_bookkeep: PT operations need to wait until VM > > is idle > > * (bookkeep dma-resv slots are idle) and stage all future > > VM activity > > diff --git a/drivers/gpu/drm/xe/xe_vm.c > > b/drivers/gpu/drm/xe/xe_vm.c > > index d664f2e418b2..813d893d9b63 100644 > > --- a/drivers/gpu/drm/xe/xe_vm.c > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > @@ -1921,6 +1921,23 @@ static void print_op(struct xe_device *xe, > > struct drm_gpuva_op *op) > > } > > #endif > > > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_vm *vm, > > u32 bind_flags) > > +{ > > + if (!xe_vm_in_fault_mode(vm)) > > + return false; > > + > > + if (!NEEDS_SCRATCH(vm->xe)) > > + return false; > > + > > + if (!xe_vm_has_scratch(vm)) > > + return false; > > + > > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > > + return false; > > + > > + return true; > > +} > > + > > /* > > * Create operations list from IOCTL arguments, setup operations > > fields so parse > > * and commit steps are decoupled from IOCTL arguments. This step > > can fail. 
> > @@ -1991,6 +2008,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, > > struct xe_bo *bo, > > op->map.is_null = flags & > > DRM_XE_VM_BIND_FLAG_NULL; > > op->map.dumpable = flags & > > DRM_XE_VM_BIND_FLAG_DUMPABLE; > > op->map.pat_index = pat_index; > > + op->map.invalidate_on_bind = > > + __xe_vm_needs_clear_scratch_pages( > > vm, flags); > > } else if (__op->op == DRM_GPUVA_OP_PREFETCH) { > > op->prefetch.region = prefetch_region; > > } > > @@ -2188,7 +2207,8 @@ static int vm_bind_ioctl_ops_parse(struct > > xe_vm *vm, struct drm_gpuva_ops *ops, > > return PTR_ERR(vma); > > > > op->map.vma = vma; > > - if (op->map.immediate || > > !xe_vm_in_fault_mode(vm)) > > + if (op->map.immediate || > > !xe_vm_in_fault_mode(vm) || > > + op->map.invalidate_on_bind) > > xe_vma_ops_incr_pt_update_ops(vops > > , > > op- > > >tile_mask); > > break; > > @@ -2416,9 +2436,10 @@ static int op_lock_and_prep(struct drm_exec > > *exec, struct xe_vm *vm, > > > > switch (op->base.op) { > > case DRM_GPUVA_OP_MAP: > > - err = vma_lock_and_validate(exec, op->map.vma, > > - > > !xe_vm_in_fault_mode(vm) || > > - op->map.immediate); > > + if (!op->map.invalidate_on_bind) > > + err = vma_lock_and_validate(exec, op- > > >map.vma, > > + > > !xe_vm_in_fault_mode(vm) || > > + op- > > >map.immediate); > > break; > > case DRM_GPUVA_OP_REMAP: > > err = check_ufence(gpuva_to_vma(op- > > >base.remap.unmap->va)); > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h > > b/drivers/gpu/drm/xe/xe_vm_types.h > > index 52467b9b5348..dace04f4ea5e 100644 > > --- a/drivers/gpu/drm/xe/xe_vm_types.h > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h > > @@ -297,6 +297,8 @@ struct xe_vma_op_map { > > bool is_null; > > /** @dumpable: whether BO is dumped on GPU hang */ > > bool dumpable; > > + /** @invalidate: invalidate the VMA before bind */ > > + bool invalidate_on_bind; > > /** @pat_index: The pat index to use for this operation. > > */ > > u16 pat_index; > > }; > > -- > > 2.26.3 > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-02-06 10:43 ` Thomas Hellström @ 2025-02-10 17:08 ` Matthew Brost 0 siblings, 0 replies; 32+ messages in thread From: Matthew Brost @ 2025-02-10 17:08 UTC (permalink / raw) To: Thomas Hellström; +Cc: Oak Zeng, intel-xe, jonathan.cavitt On Thu, Feb 06, 2025 at 11:43:10AM +0100, Thomas Hellström wrote: > Hi, Matt, > > On Thu, 2025-02-06 at 02:34 -0800, Matthew Brost wrote: > > On Tue, Feb 04, 2025 at 01:45:57PM -0500, Oak Zeng wrote: > > > When a vm runs under fault mode, if scratch page is enabled, we > > > need > > > to clear the scratch page mapping before vm_bind for the vm_bind > > > address range. Under fault mode, we depend on recoverable page > > > fault > > > to establish mapping in page table. If scratch page is not cleared, > > > GPU access of address won't cause page fault because it always hits > > > the existing scratch page mapping. > > > > > > When vm_bind with IMMEDIATE flag, there is no need of clearing as > > > immediate bind can overwrite the scratch page mapping. > > > > > > So far only is xe2 and xe3 products are allowed to enable scratch > > > page > > > under fault mode. On other platform we don't allow scratch page > > > under > > > fault mode, so no need of such clearing. > > > > > > v2: Rework vm_bind pipeline to clear scratch page mapping. This is > > > similar > > > to a map operation, with the exception that PTEs are cleared > > > instead of > > > pointing to valid physical pages. (Matt, Thomas) > > > > > > TLB invalidation is needed after clear scratch page mapping as > > > larger > > > scratch page mapping could be backed by physical page and cached in > > > TLB. (Matt, Thomas) > > > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > > > Given the complexity of the VM bind path, I think we need an IGT > > posted > > with this series before merging as I suggested in v1. Without it, it > > will be fairly difficult to ensure correctness by reviews only. > > There is an igt posted that exercises xe_exec_fault_mode + scratch > pages. Although you might have had something more elaborate in mind? > Ah, missed this as generally look at cover letter to find associated IGTs. A quick look at IGT and thinking we need to do slightly better. Off the top of my head something like: - Run a batch which does a DW write to scratch address - Bind scratch address /w IMMEDIATE cleared - Run a batch which does a DW write to newly bound address, verfiy result Matt > /Thomas > > > > > > Matt > > > > > --- > > > drivers/gpu/drm/xe/xe_pt.c | 66 ++++++++++++++++++++++------ > > > ---- > > > drivers/gpu/drm/xe/xe_pt_types.h | 2 + > > > drivers/gpu/drm/xe/xe_vm.c | 29 ++++++++++++-- > > > drivers/gpu/drm/xe/xe_vm_types.h | 2 + > > > 4 files changed, 75 insertions(+), 24 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c > > > b/drivers/gpu/drm/xe/xe_pt.c > > > index 1ddcc7e79a93..3fd0ae2dbe7d 100644 > > > --- a/drivers/gpu/drm/xe/xe_pt.c > > > +++ b/drivers/gpu/drm/xe/xe_pt.c > > > @@ -268,6 +268,8 @@ struct xe_pt_stage_bind_walk { > > > * granularity. > > > */ > > > bool needs_64K; > > > + /* @clear_pt: clear page table entries during the bind > > > walk */ > > > + bool clear_pt; > > > /** > > > * @vma: VMA being mapped > > > */ > > > @@ -497,21 +499,25 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, > > > pgoff_t offset, > > > > > > XE_WARN_ON(xe_walk->va_curs_start != addr); > > > > > > - pte = vm->pt_ops->pte_encode_vma(is_null ? 
0 : > > > - xe_res_dma(curs) > > > + xe_walk->dma_offset, > > > - xe_walk->vma, > > > pat_index, level); > > > - pte |= xe_walk->default_pte; > > > + if (xe_walk->clear_pt) { > > > + pte = 0; > > > + } else { > > > + pte = vm->pt_ops->pte_encode_vma(is_null ? > > > 0 : > > > + xe_res_dma(curs) + > > > xe_walk->dma_offset, > > > + xe_walk->vma, pat_index, > > > level); > > > + pte |= xe_walk->default_pte; > > > > > > - /* > > > - * Set the XE_PTE_PS64 hint if possible, otherwise > > > if > > > - * this device *requires* 64K PTE size for VRAM, > > > fail. > > > - */ > > > - if (level == 0 && !xe_parent->is_compact) { > > > - if (xe_pt_is_pte_ps64K(addr, next, > > > xe_walk)) { > > > - xe_walk->vma->gpuva.flags |= > > > XE_VMA_PTE_64K; > > > - pte |= XE_PTE_PS64; > > > - } else if (XE_WARN_ON(xe_walk->needs_64K)) > > > { > > > - return -EINVAL; > > > + /* > > > + * Set the XE_PTE_PS64 hint if possible, > > > otherwise if > > > + * this device *requires* 64K PTE size for > > > VRAM, fail. > > > + */ > > > + if (level == 0 && !xe_parent->is_compact) > > > { > > > + if (xe_pt_is_pte_ps64K(addr, next, > > > xe_walk)) { > > > + xe_walk->vma->gpuva.flags > > > |= XE_VMA_PTE_64K; > > > + pte |= XE_PTE_PS64; > > > + } else if (XE_WARN_ON(xe_walk- > > > >needs_64K)) { > > > + return -EINVAL; > > > + } > > > } > > > } > > > > > > @@ -519,7 +525,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, > > > pgoff_t offset, > > > if (unlikely(ret)) > > > return ret; > > > > > > - if (!is_null) > > > + if (!is_null && !xe_walk->clear_pt) > > > xe_res_next(curs, next - addr); > > > xe_walk->va_curs_start = next; > > > xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << > > > level); > > > @@ -589,6 +595,7 @@ static const struct xe_pt_walk_ops > > > xe_pt_stage_bind_ops = { > > > * @vma: The vma indicating the address range. > > > * @entries: Storage for the update entries used for connecting > > > the tree to > > > * the main tree at commit time. > > > + * @clear_pt: Clear the page table entries. > > > * @num_entries: On output contains the number of @entries used. 
> > > * > > > * This function builds a disconnected page-table tree for a given > > > address > > > @@ -602,7 +609,8 @@ static const struct xe_pt_walk_ops > > > xe_pt_stage_bind_ops = { > > > */ > > > static int > > > xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, > > > - struct xe_vm_pgtable_update *entries, u32 > > > *num_entries) > > > + struct xe_vm_pgtable_update *entries, > > > + bool clear_pt, u32 *num_entries) > > > { > > > struct xe_device *xe = tile_to_xe(tile); > > > struct xe_bo *bo = xe_vma_bo(vma); > > > @@ -622,10 +630,19 @@ xe_pt_stage_bind(struct xe_tile *tile, struct > > > xe_vma *vma, > > > .vma = vma, > > > .wupd.entries = entries, > > > .needs_64K = (xe_vma_vm(vma)->flags & > > > XE_VM_FLAG_64K) && is_devmem, > > > + .clear_pt = clear_pt, > > > }; > > > struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id]; > > > int ret; > > > > > > + if (clear_pt) { > > > + ret = xe_pt_walk_range(&pt->base, pt->level, > > > xe_vma_start(vma), > > > + xe_vma_end(vma), > > > &xe_walk.base); > > > + > > > + *num_entries = xe_walk.wupd.num_used_entries; > > > + return ret; > > > + } > > > + > > > /** > > > * Default atomic expectations for different allocation > > > scenarios are as follows: > > > * > > > @@ -981,12 +998,14 @@ static void xe_pt_free_bind(struct > > > xe_vm_pgtable_update *entries, > > > > > > static int > > > xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma, > > > - struct xe_vm_pgtable_update *entries, u32 > > > *num_entries) > > > + struct xe_vm_pgtable_update *entries, > > > + bool invalidate_on_bind, u32 *num_entries) > > > { > > > int err; > > > > > > *num_entries = 0; > > > - err = xe_pt_stage_bind(tile, vma, entries, num_entries); > > > + err = xe_pt_stage_bind(tile, vma, entries, > > > invalidate_on_bind, > > > + num_entries); > > > if (!err) > > > xe_tile_assert(tile, *num_entries); > > > > > > @@ -1661,6 +1680,7 @@ static int bind_op_prepare(struct xe_vm *vm, > > > struct xe_tile *tile, > > > return err; > > > > > > err = xe_pt_prepare_bind(tile, vma, pt_op->entries, > > > + pt_update_ops- > > > >invalidate_on_bind, > > > &pt_op->num_entries); > > > if (!err) { > > > xe_tile_assert(tile, pt_op->num_entries <= > > > @@ -1685,7 +1705,7 @@ static int bind_op_prepare(struct xe_vm *vm, > > > struct xe_tile *tile, > > > * it needs to be done here. 
> > > */ > > > if ((!pt_op->rebind && xe_vm_has_scratch(vm) && > > > - xe_vm_in_preempt_fence_mode(vm))) > > > + xe_vm_in_preempt_fence_mode(vm)) || > > > pt_update_ops->invalidate_on_bind) > > > pt_update_ops->needs_invalidation = true; > > > else if (pt_op->rebind && !xe_vm_in_lr_mode(vm)) > > > /* We bump also if batch_invalidate_tlb is > > > true */ > > > @@ -1759,9 +1779,13 @@ static int op_prepare(struct xe_vm *vm, > > > > > > switch (op->base.op) { > > > case DRM_GPUVA_OP_MAP: > > > - if (!op->map.immediate && xe_vm_in_fault_mode(vm)) > > > + if (!op->map.immediate && xe_vm_in_fault_mode(vm) > > > && > > > + !op->map.invalidate_on_bind) > > > break; > > > > > > + if (op->map.invalidate_on_bind) > > > + pt_update_ops->invalidate_on_bind = true; > > > + > > > err = bind_op_prepare(vm, tile, pt_update_ops, op- > > > >map.vma); > > > pt_update_ops->wait_vm_kernel = true; > > > break; > > > @@ -1871,6 +1895,8 @@ static void bind_op_commit(struct xe_vm *vm, > > > struct xe_tile *tile, > > > } > > > vma->tile_present |= BIT(tile->id); > > > vma->tile_staged &= ~BIT(tile->id); > > > + if (pt_update_ops->invalidate_on_bind) > > > + vma->tile_invalidated |= BIT(tile->id); > > > if (xe_vma_is_userptr(vma)) { > > > lockdep_assert_held_read(&vm- > > > >userptr.notifier_lock); > > > to_userptr_vma(vma)->userptr.initial_bind = true; > > > diff --git a/drivers/gpu/drm/xe/xe_pt_types.h > > > b/drivers/gpu/drm/xe/xe_pt_types.h > > > index 384cc04de719..3d0aa2a5102e 100644 > > > --- a/drivers/gpu/drm/xe/xe_pt_types.h > > > +++ b/drivers/gpu/drm/xe/xe_pt_types.h > > > @@ -108,6 +108,8 @@ struct xe_vm_pgtable_update_ops { > > > bool needs_userptr_lock; > > > /** @needs_invalidation: Needs invalidation */ > > > bool needs_invalidation; > > > + /** @invalidate_on_bind: Invalidate the range before bind > > > */ > > > + bool invalidate_on_bind; > > > /** > > > * @wait_vm_bookkeep: PT operations need to wait until VM > > > is idle > > > * (bookkeep dma-resv slots are idle) and stage all future > > > VM activity > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c > > > b/drivers/gpu/drm/xe/xe_vm.c > > > index d664f2e418b2..813d893d9b63 100644 > > > --- a/drivers/gpu/drm/xe/xe_vm.c > > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > > @@ -1921,6 +1921,23 @@ static void print_op(struct xe_device *xe, > > > struct drm_gpuva_op *op) > > > } > > > #endif > > > > > > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_vm *vm, > > > u32 bind_flags) > > > +{ > > > + if (!xe_vm_in_fault_mode(vm)) > > > + return false; > > > + > > > + if (!NEEDS_SCRATCH(vm->xe)) > > > + return false; > > > + > > > + if (!xe_vm_has_scratch(vm)) > > > + return false; > > > + > > > + if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE) > > > + return false; > > > + > > > + return true; > > > +} > > > + > > > /* > > > * Create operations list from IOCTL arguments, setup operations > > > fields so parse > > > * and commit steps are decoupled from IOCTL arguments. This step > > > can fail. 
> > > @@ -1991,6 +2008,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, > > > struct xe_bo *bo, > > > op->map.is_null = flags & > > > DRM_XE_VM_BIND_FLAG_NULL; > > > op->map.dumpable = flags & > > > DRM_XE_VM_BIND_FLAG_DUMPABLE; > > > op->map.pat_index = pat_index; > > > + op->map.invalidate_on_bind = > > > + __xe_vm_needs_clear_scratch_pages( > > > vm, flags); > > > } else if (__op->op == DRM_GPUVA_OP_PREFETCH) { > > > op->prefetch.region = prefetch_region; > > > } > > > @@ -2188,7 +2207,8 @@ static int vm_bind_ioctl_ops_parse(struct > > > xe_vm *vm, struct drm_gpuva_ops *ops, > > > return PTR_ERR(vma); > > > > > > op->map.vma = vma; > > > - if (op->map.immediate || > > > !xe_vm_in_fault_mode(vm)) > > > + if (op->map.immediate || > > > !xe_vm_in_fault_mode(vm) || > > > + op->map.invalidate_on_bind) > > > xe_vma_ops_incr_pt_update_ops(vops > > > , > > > op- > > > >tile_mask); > > > break; > > > @@ -2416,9 +2436,10 @@ static int op_lock_and_prep(struct drm_exec > > > *exec, struct xe_vm *vm, > > > > > > switch (op->base.op) { > > > case DRM_GPUVA_OP_MAP: > > > - err = vma_lock_and_validate(exec, op->map.vma, > > > - > > > !xe_vm_in_fault_mode(vm) || > > > - op->map.immediate); > > > + if (!op->map.invalidate_on_bind) > > > + err = vma_lock_and_validate(exec, op- > > > >map.vma, > > > + > > > !xe_vm_in_fault_mode(vm) || > > > + op- > > > >map.immediate); > > > break; > > > case DRM_GPUVA_OP_REMAP: > > > err = check_ufence(gpuva_to_vma(op- > > > >base.remap.unmap->va)); > > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h > > > b/drivers/gpu/drm/xe/xe_vm_types.h > > > index 52467b9b5348..dace04f4ea5e 100644 > > > --- a/drivers/gpu/drm/xe/xe_vm_types.h > > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h > > > @@ -297,6 +297,8 @@ struct xe_vma_op_map { > > > bool is_null; > > > /** @dumpable: whether BO is dumped on GPU hang */ > > > bool dumpable; > > > + /** @invalidate: invalidate the VMA before bind */ > > > + bool invalidate_on_bind; > > > /** @pat_index: The pat index to use for this operation. > > > */ > > > u16 pat_index; > > > }; > > > -- > > > 2.26.3 > > > > ^ permalink raw reply [flat|nested] 32+ messages in thread
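Spelled out, the test flow Matt suggests above would be something like the sketch below. The helpers are placeholders, not real igt/xe library calls, and the VM-creation step assumes the fault-mode-plus-scratch combination that patch 3 of this series enables:

	/*
	 * Pseudocode sketch of the suggested IGT. create_faulting_scratch_vm(),
	 * submit_dw_write(), bind_bo_at() and read_back() are hypothetical
	 * helpers standing in for the usual igt/xe library plumbing.
	 */
	static void test_scratch_cleared_on_bind(int fd)
	{
		uint32_t vm = create_faulting_scratch_vm(fd);	/* fault mode + scratch */
		uint64_t addr = 0x1a0000;

		/* 1) DW write to the unbound address: lands in the scratch
		 *    mapping and gets the scratch PTE cached in the TLB.
		 */
		submit_dw_write(fd, vm, addr, 0xdeadbeef);

		/* 2) Bind a BO at the same address with IMMEDIATE cleared, so
		 *    the bind has to zap the scratch mapping and invalidate.
		 */
		bind_bo_at(fd, vm, addr, false /* !DRM_XE_VM_BIND_FLAG_IMMEDIATE */);

		/* 3) DW write to the now-bound address: must fault in the real
		 *    backing pages instead of hitting the stale scratch PTE.
		 */
		submit_dw_write(fd, vm, addr, 0xc0ffee);

		/* 4) Verify the value reached the bound BO. */
		igt_assert(read_back(fd, vm, addr) == 0xc0ffee);
	}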
* RE: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind 2025-02-06 10:34 ` Matthew Brost 2025-02-06 10:43 ` Thomas Hellström @ 2025-02-06 15:16 ` Zeng, Oak 2025-02-10 17:09 ` Matthew Brost 1 sibling, 1 reply; 32+ messages in thread From: Zeng, Oak @ 2025-02-06 15:16 UTC (permalink / raw) To: Brost, Matthew Cc: intel-xe@lists.freedesktop.org, Thomas.Hellstrom@linux.intel.com, Cavitt, Jonathan > -----Original Message----- > From: Brost, Matthew <matthew.brost@intel.com> > Sent: February 6, 2025 5:35 AM > To: Zeng, Oak <oak.zeng@intel.com> > Cc: intel-xe@lists.freedesktop.org; > Thomas.Hellstrom@linux.intel.com; Cavitt, Jonathan > <jonathan.cavitt@intel.com> > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind > > On Tue, Feb 04, 2025 at 01:45:57PM -0500, Oak Zeng wrote: > > When a vm runs under fault mode, if scratch page is enabled, we > need > > to clear the scratch page mapping before vm_bind for the vm_bind > > address range. Under fault mode, we depend on recoverable page > fault > > to establish mapping in page table. If scratch page is not cleared, > > GPU access of address won't cause page fault because it always hits > > the existing scratch page mapping. > > > > When vm_bind with IMMEDIATE flag, there is no need of clearing as > > immediate bind can overwrite the scratch page mapping. > > > > So far only is xe2 and xe3 products are allowed to enable scratch > page > > under fault mode. On other platform we don't allow scratch page > under > > fault mode, so no need of such clearing. > > > > v2: Rework vm_bind pipeline to clear scratch page mapping. This is > similar > > to a map operation, with the exception that PTEs are cleared > instead of > > pointing to valid physical pages. (Matt, Thomas) > > > > TLB invalidation is needed after clear scratch page mapping as larger > > scratch page mapping could be backed by physical page and cached > in > > TLB. (Matt, Thomas) > > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com> > > Given the complexity of the VM bind path, I think we need an IGT > posted > with this series before merging as I suggested in v1. Without it, it > will be fairly difficult to ensure correctness by reviews only. I did have an igt posted in patch 3 of this series. The reason I posted on patch 3 is, without patch 3 this "feature" Is not enabled. Oak > > Matt > > > --- > > drivers/gpu/drm/xe/xe_pt.c | 66 ++++++++++++++++++++++-- > -------- > > drivers/gpu/drm/xe/xe_pt_types.h | 2 + > > drivers/gpu/drm/xe/xe_vm.c | 29 ++++++++++++-- > > drivers/gpu/drm/xe/xe_vm_types.h | 2 + > > 4 files changed, 75 insertions(+), 24 deletions(-) > > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c > b/drivers/gpu/drm/xe/xe_pt.c > > index 1ddcc7e79a93..3fd0ae2dbe7d 100644 > > --- a/drivers/gpu/drm/xe/xe_pt.c > > +++ b/drivers/gpu/drm/xe/xe_pt.c > > @@ -268,6 +268,8 @@ struct xe_pt_stage_bind_walk { > > * granularity. > > */ > > bool needs_64K; > > + /* @clear_pt: clear page table entries during the bind walk */ > > + bool clear_pt; > > /** > > * @vma: VMA being mapped > > */ > > @@ -497,21 +499,25 @@ xe_pt_stage_bind_entry(struct xe_ptw > *parent, pgoff_t offset, > > > > XE_WARN_ON(xe_walk->va_curs_start != addr); > > > > - pte = vm->pt_ops->pte_encode_vma(is_null ? 0 : > > - xe_res_dma(curs) + > xe_walk->dma_offset, > > - xe_walk->vma, > pat_index, level); > > - pte |= xe_walk->default_pte; > > + if (xe_walk->clear_pt) { > > + pte = 0; > > + } else { > > + pte = vm->pt_ops->pte_encode_vma(is_null ? 
> 0 :
> > +				xe_res_dma(curs) + xe_walk->dma_offset,
> > +				xe_walk->vma, pat_index, level);
> > +			pte |= xe_walk->default_pte;
> >
> > -		/*
> > -		 * Set the XE_PTE_PS64 hint if possible, otherwise if
> > -		 * this device *requires* 64K PTE size for VRAM, fail.
> > -		 */
> > -		if (level == 0 && !xe_parent->is_compact) {
> > -			if (xe_pt_is_pte_ps64K(addr, next, xe_walk)) {
> > -				xe_walk->vma->gpuva.flags |= XE_VMA_PTE_64K;
> > -				pte |= XE_PTE_PS64;
> > -			} else if (XE_WARN_ON(xe_walk->needs_64K)) {
> > -				return -EINVAL;
> > +			/*
> > +			 * Set the XE_PTE_PS64 hint if possible, otherwise if
> > +			 * this device *requires* 64K PTE size for VRAM, fail.
> > +			 */
> > +			if (level == 0 && !xe_parent->is_compact) {
> > +				if (xe_pt_is_pte_ps64K(addr, next, xe_walk)) {
> > +					xe_walk->vma->gpuva.flags |= XE_VMA_PTE_64K;
> > +					pte |= XE_PTE_PS64;
> > +				} else if (XE_WARN_ON(xe_walk->needs_64K)) {
> > +					return -EINVAL;
> > +				}
> > 			}
> > 		}
> >
> > @@ -519,7 +525,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
> > 		if (unlikely(ret))
> > 			return ret;
> >
> > -		if (!is_null)
> > +		if (!is_null && !xe_walk->clear_pt)
> > 			xe_res_next(curs, next - addr);
> > 		xe_walk->va_curs_start = next;
> > 		xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << level);
> > @@ -589,6 +595,7 @@ static const struct xe_pt_walk_ops xe_pt_stage_bind_ops = {
> >   * @vma: The vma indicating the address range.
> >   * @entries: Storage for the update entries used for connecting the tree to
> >   * the main tree at commit time.
> > + * @clear_pt: Clear the page table entries.
> >   * @num_entries: On output contains the number of @entries used.
> >   *
> >   * This function builds a disconnected page-table tree for a given address
> > @@ -602,7 +609,8 @@ static const struct xe_pt_walk_ops xe_pt_stage_bind_ops = {
> >   */
> >  static int
> >  xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
> > -		 struct xe_vm_pgtable_update *entries, u32 *num_entries)
> > +		 struct xe_vm_pgtable_update *entries,
> > +		 bool clear_pt, u32 *num_entries)
> >  {
> >  	struct xe_device *xe = tile_to_xe(tile);
> >  	struct xe_bo *bo = xe_vma_bo(vma);
> > @@ -622,10 +630,19 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
> >  		.vma = vma,
> >  		.wupd.entries = entries,
> >  		.needs_64K = (xe_vma_vm(vma)->flags & XE_VM_FLAG_64K) && is_devmem,
> > +		.clear_pt = clear_pt,
> >  	};
> >  	struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id];
> >  	int ret;
> >
> > +	if (clear_pt) {
> > +		ret = xe_pt_walk_range(&pt->base, pt->level, xe_vma_start(vma),
> > +				       xe_vma_end(vma), &xe_walk.base);
> > +
> > +		*num_entries = xe_walk.wupd.num_used_entries;
> > +		return ret;
> > +	}
> > +
> >  	/**
> >  	 * Default atomic expectations for different allocation scenarios are as follows:
> >  	 *
> > @@ -981,12 +998,14 @@ static void xe_pt_free_bind(struct xe_vm_pgtable_update *entries,
> >
> >  static int
> >  xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma,
> > -		   struct xe_vm_pgtable_update *entries, u32 *num_entries)
> > +		   struct xe_vm_pgtable_update *entries,
> > +		   bool invalidate_on_bind, u32 *num_entries)
> >  {
> >  	int err;
> >
> >  	*num_entries = 0;
> > -	err = xe_pt_stage_bind(tile, vma, entries, num_entries);
> > +	err = xe_pt_stage_bind(tile, vma, entries, invalidate_on_bind,
> > +			       num_entries);
> >  	if (!err)
> >  		xe_tile_assert(tile, *num_entries);
> >
> > @@ -1661,6 +1680,7 @@ static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile,
> >  		return err;
> >
> >  	err = xe_pt_prepare_bind(tile, vma, pt_op->entries,
> > +				 pt_update_ops->invalidate_on_bind,
> >  				 &pt_op->num_entries);
> >  	if (!err) {
> >  		xe_tile_assert(tile, pt_op->num_entries <=
> > @@ -1685,7 +1705,7 @@ static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile,
> >  	 * it needs to be done here.
> >  	 */
> >  	if ((!pt_op->rebind && xe_vm_has_scratch(vm) &&
> > -	     xe_vm_in_preempt_fence_mode(vm)))
> > +	     xe_vm_in_preempt_fence_mode(vm)) || pt_update_ops->invalidate_on_bind)
> >  		pt_update_ops->needs_invalidation = true;
> >  	else if (pt_op->rebind && !xe_vm_in_lr_mode(vm))
> >  		/* We bump also if batch_invalidate_tlb is true */
> > @@ -1759,9 +1779,13 @@ static int op_prepare(struct xe_vm *vm,
> >
> >  	switch (op->base.op) {
> >  	case DRM_GPUVA_OP_MAP:
> > -		if (!op->map.immediate && xe_vm_in_fault_mode(vm))
> > +		if (!op->map.immediate && xe_vm_in_fault_mode(vm) &&
> > +		    !op->map.invalidate_on_bind)
> >  			break;
> >
> > +		if (op->map.invalidate_on_bind)
> > +			pt_update_ops->invalidate_on_bind = true;
> > +
> >  		err = bind_op_prepare(vm, tile, pt_update_ops, op->map.vma);
> >  		pt_update_ops->wait_vm_kernel = true;
> >  		break;
> > @@ -1871,6 +1895,8 @@ static void bind_op_commit(struct xe_vm *vm, struct xe_tile *tile,
> >  	}
> >  	vma->tile_present |= BIT(tile->id);
> >  	vma->tile_staged &= ~BIT(tile->id);
> > +	if (pt_update_ops->invalidate_on_bind)
> > +		vma->tile_invalidated |= BIT(tile->id);
> >  	if (xe_vma_is_userptr(vma)) {
> >  		lockdep_assert_held_read(&vm->userptr.notifier_lock);
> >  		to_userptr_vma(vma)->userptr.initial_bind = true;
> > diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
> > index 384cc04de719..3d0aa2a5102e 100644
> > --- a/drivers/gpu/drm/xe/xe_pt_types.h
> > +++ b/drivers/gpu/drm/xe/xe_pt_types.h
> > @@ -108,6 +108,8 @@ struct xe_vm_pgtable_update_ops {
> >  	bool needs_userptr_lock;
> >  	/** @needs_invalidation: Needs invalidation */
> >  	bool needs_invalidation;
> > +	/** @invalidate_on_bind: Invalidate the range before bind */
> > +	bool invalidate_on_bind;
> >  	/**
> >  	 * @wait_vm_bookkeep: PT operations need to wait until VM is idle
> >  	 * (bookkeep dma-resv slots are idle) and stage all future VM activity
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index d664f2e418b2..813d893d9b63 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -1921,6 +1921,23 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
> >  }
> >  #endif
> >
> > +static bool __xe_vm_needs_clear_scratch_pages(struct xe_vm *vm, u32 bind_flags)
> > +{
> > +	if (!xe_vm_in_fault_mode(vm))
> > +		return false;
> > +
> > +	if (!NEEDS_SCRATCH(vm->xe))
> > +		return false;
> > +
> > +	if (!xe_vm_has_scratch(vm))
> > +		return false;
> > +
> > +	if (bind_flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE)
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> >  /*
> >   * Create operations list from IOCTL arguments, setup operations fields so parse
> >   * and commit steps are decoupled from IOCTL arguments. This step can fail.
> > @@ -1991,6 +2008,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> >  			op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
> >  			op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
> >  			op->map.pat_index = pat_index;
> > +			op->map.invalidate_on_bind =
> > +				__xe_vm_needs_clear_scratch_pages(vm, flags);
> >  		} else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
> >  			op->prefetch.region = prefetch_region;
> >  		}
> > @@ -2188,7 +2207,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
> >  				return PTR_ERR(vma);
> >
> >  			op->map.vma = vma;
> > -			if (op->map.immediate || !xe_vm_in_fault_mode(vm))
> > +			if (op->map.immediate || !xe_vm_in_fault_mode(vm) ||
> > +			    op->map.invalidate_on_bind)
> >  				xe_vma_ops_incr_pt_update_ops(vops,
> >  							      op->tile_mask);
> >  			break;
> > @@ -2416,9 +2436,10 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
> >
> >  	switch (op->base.op) {
> >  	case DRM_GPUVA_OP_MAP:
> > -		err = vma_lock_and_validate(exec, op->map.vma,
> > -					    !xe_vm_in_fault_mode(vm) ||
> > -					    op->map.immediate);
> > +		if (!op->map.invalidate_on_bind)
> > +			err = vma_lock_and_validate(exec, op->map.vma,
> > +						    !xe_vm_in_fault_mode(vm) ||
> > +						    op->map.immediate);
> >  		break;
> >  	case DRM_GPUVA_OP_REMAP:
> >  		err = check_ufence(gpuva_to_vma(op->base.remap.unmap->va));
> > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> > index 52467b9b5348..dace04f4ea5e 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > @@ -297,6 +297,8 @@ struct xe_vma_op_map {
> >  	bool is_null;
> >  	/** @dumpable: whether BO is dumped on GPU hang */
> >  	bool dumpable;
> > +	/** @invalidate: invalidate the VMA before bind */
> > +	bool invalidate_on_bind;
> >  	/** @pat_index: The pat index to use for this operation. */
> >  	u16 pat_index;
> >  };
> > --
> > 2.26.3
> >

^ permalink raw reply	[flat|nested] 32+ messages in thread
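At the uapi level, the case this series targets looks roughly like the sketch below: a VM created with both scratch pages and recoverable faults, then a bind without the IMMEDIATE flag, which with the invalidate_on_bind plumbing above is expected to zap the scratch PTEs for the range and invalidate the TLB so the first GPU access takes a fault. This is an illustrative sketch only, not part of the series; the struct, field, and flag names follow my reading of the xe uapi header and may not be exact.

#include <stdint.h>
#include <xf86drm.h>
#include <drm/xe_drm.h>

/* Hypothetical helper: VM with scratch pages *and* recoverable faults. */
static uint32_t create_fault_mode_vm_with_scratch(int fd)
{
	struct drm_xe_vm_create create = {
		.flags = DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE |
			 DRM_XE_VM_CREATE_FLAG_LR_MODE |
			 DRM_XE_VM_CREATE_FLAG_FAULT_MODE,
	};

	drmIoctl(fd, DRM_IOCTL_XE_VM_CREATE, &create);
	return create.vm_id;
}

/*
 * Hypothetical helper: bind without DRM_XE_VM_BIND_FLAG_IMMEDIATE, so the
 * real mapping is only installed on first fault.  With this series the bind
 * is also expected to clear any scratch PTEs covering [addr, addr + range).
 */
static void bind_on_fault(int fd, uint32_t vm_id, uint32_t bo_handle,
			  uint64_t addr, uint64_t range)
{
	struct drm_xe_vm_bind bind = {
		.vm_id = vm_id,
		.num_binds = 1,
		.bind = {
			.obj = bo_handle,
			.addr = addr,
			.range = range,
			.op = DRM_XE_VM_BIND_OP_MAP,
			.flags = 0,	/* deliberately no IMMEDIATE */
		},
	};

	drmIoctl(fd, DRM_IOCTL_XE_VM_BIND, &bind);
}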
* Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
  2025-02-06 15:16           ` Zeng, Oak
@ 2025-02-10 17:09             ` Matthew Brost
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Brost @ 2025-02-10 17:09 UTC (permalink / raw)
  To: Zeng, Oak
  Cc: intel-xe@lists.freedesktop.org, Thomas.Hellstrom@linux.intel.com,
	Cavitt, Jonathan

On Thu, Feb 06, 2025 at 08:16:36AM -0700, Zeng, Oak wrote:
> 
> 
> > -----Original Message-----
> > From: Brost, Matthew <matthew.brost@intel.com>
> > Sent: February 6, 2025 5:35 AM
> > To: Zeng, Oak <oak.zeng@intel.com>
> > Cc: intel-xe@lists.freedesktop.org; Thomas.Hellstrom@linux.intel.com;
> > Cavitt, Jonathan <jonathan.cavitt@intel.com>
> > Subject: Re: [PATCH 2/3] drm/xe: Clear scratch page before vm_bind
> >
> > On Tue, Feb 04, 2025 at 01:45:57PM -0500, Oak Zeng wrote:
> > > When a vm runs under fault mode, if scratch page is enabled, we need
> > > to clear the scratch page mapping before vm_bind for the vm_bind
> > > address range. Under fault mode, we depend on recoverable page fault
> > > to establish mapping in page table. If scratch page is not cleared,
> > > GPU access of address won't cause page fault because it always hits
> > > the existing scratch page mapping.
> > >
> > > When vm_bind with IMMEDIATE flag, there is no need of clearing as
> > > immediate bind can overwrite the scratch page mapping.
> > >
> > > So far only is xe2 and xe3 products are allowed to enable scratch page
> > > under fault mode. On other platform we don't allow scratch page under
> > > fault mode, so no need of such clearing.
> > >
> > > v2: Rework vm_bind pipeline to clear scratch page mapping. This is similar
> > > to a map operation, with the exception that PTEs are cleared instead of
> > > pointing to valid physical pages. (Matt, Thomas)
> > >
> > > TLB invalidation is needed after clear scratch page mapping as larger
> > > scratch page mapping could be backed by physical page and cached in
> > > TLB. (Matt, Thomas)
> > >
> > > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> >
> > Given the complexity of the VM bind path, I think we need an IGT posted
> > with this series before merging as I suggested in v1. Without it, it
> > will be fairly difficult to ensure correctness by reviews only.
> 
> I did have an igt posted in patch 3 of this series.
> 
> The reason I posted on patch 3 is, without patch 3 this "feature"
> is not enabled.
> 

Just replied to Thomas, see my reply there - I did miss an IGT was
included but I think we need to do a bit better, more details in my
reply to Thomas.

Matt

> Oak
> 
> > 
> > Matt
> > 
> > > ---
> > >  drivers/gpu/drm/xe/xe_pt.c       | 66 ++++++++++++++++++++++----------
> > >  drivers/gpu/drm/xe/xe_pt_types.h |  2 +
> > >  drivers/gpu/drm/xe/xe_vm.c       | 29 ++++++++++++--
> > >  drivers/gpu/drm/xe/xe_vm_types.h |  2 +
> > >  4 files changed, 75 insertions(+), 24 deletions(-)
^ permalink raw reply	[flat|nested] 32+ messages in thread
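On the testing point in the exchange above, the behaviour described in the commit message implies a fairly small functional sequence. The sketch below is only my reading of what such a test has to exercise, not the IGT referenced in the thread; create_fault_mode_vm_with_scratch(), bind_on_fault(), submit_gpu_read(), and read_back() are hypothetical helpers standing in for whatever VM-setup, batch-submission, and readback infrastructure a real test would use.

#include <stdint.h>

/* Hypothetical helpers, declarations only; see the earlier uapi sketch. */
uint32_t create_fault_mode_vm_with_scratch(int fd);
void bind_on_fault(int fd, uint32_t vm, uint32_t bo, uint64_t addr, uint64_t range);
uint64_t submit_gpu_read(int fd, uint32_t vm, uint64_t addr);	/* value read by GPU */
uint64_t read_back(int fd, uint32_t bo, uint64_t offset);	/* CPU readback of BO */

static void scratch_then_fault_sequence(int fd, uint32_t bo)
{
	const uint64_t addr = 0x1a0000, range = 0x10000;
	uint32_t vm = create_fault_mode_vm_with_scratch(fd);

	/*
	 * 1. Touch the unbound range: with scratch pages enabled the read is
	 *    expected to complete against the scratch mapping (no page fault).
	 */
	(void)submit_gpu_read(fd, vm, addr);

	/*
	 * 2. Bind a BO over the same range without IMMEDIATE.  With this
	 *    series the scratch PTEs are zapped and the TLB invalidated as
	 *    part of the bind.
	 */
	bind_on_fault(fd, vm, bo, addr, range);

	/*
	 * 3. Touch the range again: the access should now take a recoverable
	 *    page fault and be served from the BO, which the test can verify
	 *    by comparing the GPU-visible data against a CPU readback.
	 */
	(void)submit_gpu_read(fd, vm, addr);
	(void)read_back(fd, bo, 0);
}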