* Re: [PATCH v3 3/4] drm/xe: Split TLB invalidation into submit and wait steps
@ 2026-03-03 14:03 kernel test robot
  0 siblings, 0 replies; 3+ messages in thread
From: kernel test robot @ 2026-03-03 14:03 UTC (permalink / raw)
To: oe-kbuild

::::::
:::::: Manual check reason: "high confidence checkpatch report"
::::::

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20260303133409.11609-4-thomas.hellstrom@linux.intel.com>
References: <20260303133409.11609-4-thomas.hellstrom@linux.intel.com>
TO: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
TO: intel-xe@lists.freedesktop.org

Hi Thomas,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-xe/drm-xe-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/mm-mmu_notifier-Allow-two-pass-struct-mmu_interval_notifiers/20260303-213841
base:   https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link:    https://lore.kernel.org/r/20260303133409.11609-4-thomas.hellstrom%40linux.intel.com
patch subject: [PATCH v3 3/4] drm/xe: Split TLB invalidation into submit and wait steps
:::::: branch date: 23 minutes ago
:::::: commit date: 23 minutes ago
reproduce: (https://download.01.org/0day-ci/archive/20260303/202603031501.fPtJyqCH-lkp@intel.com/reproduce)

# many are suggestions rather than must-fix

ERROR:BAD_SIGN_OFF: Unrecognized email address: 'GitHub Copilot:claude-sonnet-4.6'
#26:
Assisted-by: GitHub Copilot:claude-sonnet-4.6

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 3+ messages in thread
* [PATCH v3 0/4] Two-pass MMU interval notifiers
@ 2026-03-03 13:34 Thomas Hellström
2026-03-03 13:34 ` [PATCH v3 3/4] drm/xe: Split TLB invalidation into submit and wait steps Thomas Hellström
0 siblings, 1 reply; 3+ messages in thread
From: Thomas Hellström @ 2026-03-03 13:34 UTC (permalink / raw)
To: intel-xe
Cc: Thomas Hellström, Matthew Brost, Jason Gunthorpe,
Andrew Morton, Simona Vetter, Dave Airlie, Alistair Popple,
dri-devel, linux-mm, linux-kernel, Christian König
GPU use-cases for mmu_interval_notifiers with hmm often involve
starting a GPU operation and then waiting for it to complete.
These operations are typically context preemption or TLB flushing.
With single-pass notifiers per GPU this doesn't scale in multi-GPU
scenarios: there we'd want to first start preemption or TLB flushing
on all GPUs, and then, as a second pass, wait for all of them to
complete.
This also applies in non-recoverable page-fault scenarios, where we
start preemption requests on the GPUs and wait for them to preempt so
that the system pages they access can be reclaimed.
One could do this on a per-driver basis, multiplexing per-driver
notifiers, but that would mean sharing the notifier "user" lock across
all GPUs, which doesn't scale well either, so adding two-pass support
to the core appears to be the right choice.
So this series does that, with patch 1 implementing the core support
and also describing the choices made.
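To make the intended driver-side flow concrete, here's a rough sketch.
The invalidate_finish() callback is the one patch 1 adds to struct
mmu_interval_notifier_ops; its exact signature below, and the
gpu_submit_ / gpu_wait_ helpers, are just illustration (driver locking
around the seqno update is elided):

  static bool gpu_invalidate(struct mmu_interval_notifier *mni,
                             const struct mmu_notifier_range *range,
                             unsigned long cur_seq)
  {
          /* First pass: bump the seqno and kick off the GPU-side
           * operation (preemption / TLB flush) without waiting for it.
           */
          mmu_interval_set_seq(mni, cur_seq);
          gpu_submit_invalidation(mni, range);

          return true;
  }

  static void gpu_invalidate_finish(struct mmu_interval_notifier *mni,
                                    const struct mmu_notifier_range *range)
  {
          /* Second pass: all notifiers have submitted; now wait. */
          gpu_wait_invalidation(mni);
  }

  static const struct mmu_interval_notifier_ops gpu_notifier_ops = {
          .invalidate = gpu_invalidate,
          .invalidate_finish = gpu_invalidate_finish,
  };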
The rest of the patches implement a POC with xe KMD userptr
invalidation and the potential TLB flushing that follows. A follow-up
series will extend this to drm_gpusvm.
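At the xe invalidation call sites, this ends up looking roughly as
follows (simplified from patches 3 and 4, which introduce the
submit/wait helpers; on submit error the submit helper has already
waited internally, so no explicit wait is needed):

  struct xe_tlb_inval_batch batch;
  int err;

  /* First pass: after zapping PTEs, submit the TLB invalidations
   * to all GTs in the tile mask without waiting.
   */
  err = xe_tlb_inval_range_tilemask_submit(xe, vm->usm.asid, start, end,
                                           tile_mask, &batch);

  /* Second pass: wait for all per-GT invalidation fences. */
  if (!err)
          xe_tlb_inval_batch_wait(&batch);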
v2 highlights:
- Refactor the core mm patch to use the struct
mmu_interval_notifier_ops for the invalidate_finish() callback.
- Rebase on xe driver tlb invalidation changes.
- Provide an initial implementation for userptr instead of drm_gpusvm.
The intent is to handle drm_gpusvm in a follow-up series.
v3:
- Address review comments from Matt Brost: Code formatting,
documentation, additional asserts and removal of
unnecessary waits, as specified in each patch.
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: <dri-devel@lists.freedesktop.org>
Cc: <linux-mm@kvack.org>
Cc: <linux-kernel@vger.kernel.org>
Thomas Hellström (4):
mm/mmu_notifier: Allow two-pass struct mmu_interval_notifiers
drm/xe/userptr: Convert invalidation to two-pass MMU notifier
drm/xe: Split TLB invalidation into submit and wait steps
drm/xe/userptr: Defer waiting for TLB invalidation to the second pass
if possible
drivers/gpu/drm/xe/xe_svm.c | 8 +-
drivers/gpu/drm/xe/xe_tlb_inval.c | 84 +++++++++++++
drivers/gpu/drm/xe/xe_tlb_inval.h | 6 +
drivers/gpu/drm/xe/xe_tlb_inval_types.h | 14 +++
drivers/gpu/drm/xe/xe_userptr.c | 155 ++++++++++++++++++++----
drivers/gpu/drm/xe/xe_userptr.h | 31 ++++-
drivers/gpu/drm/xe/xe_vm.c | 99 +++++----------
drivers/gpu/drm/xe/xe_vm.h | 5 +-
drivers/gpu/drm/xe/xe_vm_madvise.c | 10 +-
drivers/gpu/drm/xe/xe_vm_types.h | 1 +
include/linux/mmu_notifier.h | 38 ++++++
mm/mmu_notifier.c | 65 ++++++++--
12 files changed, 412 insertions(+), 104 deletions(-)
--
2.53.0
^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH v3 3/4] drm/xe: Split TLB invalidation into submit and wait steps
  2026-03-03 13:34 [PATCH v3 0/4] Two-pass MMU interval notifiers Thomas Hellström
@ 2026-03-03 13:34 ` Thomas Hellström
  2026-03-03 18:13   ` Matthew Brost
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Hellström @ 2026-03-03 13:34 UTC (permalink / raw)
To: intel-xe
Cc: Thomas Hellström, Matthew Brost, Christian König, dri-devel,
	Jason Gunthorpe, Andrew Morton, Simona Vetter, Dave Airlie,
	Alistair Popple, linux-mm, linux-kernel

xe_vm_range_tilemask_tlb_inval() submits TLB invalidation requests to
all GTs in a tile mask and then immediately waits for them to complete
before returning. This is fine for the existing callers, but a
subsequent patch will need to defer the wait in order to overlap TLB
invalidations across multiple VMAs.

Introduce xe_tlb_inval_range_tilemask_submit() and
xe_tlb_inval_batch_wait() in xe_tlb_inval.c as the submit and wait
halves respectively. The batch of fences is carried in the new
xe_tlb_inval_batch structure. Remove xe_vm_range_tilemask_tlb_inval()
and convert all three call sites to the new API.

v3:
- Don't wait on TLB invalidation batches if the corresponding batch
  submit returns an error. (Matt Brost)
- s/_batch/batch/ (Matt Brost)

Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c             |  8 ++-
 drivers/gpu/drm/xe/xe_tlb_inval.c       | 84 +++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_tlb_inval.h       |  6 ++
 drivers/gpu/drm/xe/xe_tlb_inval_types.h | 14 +++++
 drivers/gpu/drm/xe/xe_vm.c              | 69 +++----------------
 drivers/gpu/drm/xe/xe_vm.h              |  3 -
 drivers/gpu/drm/xe/xe_vm_madvise.c      | 10 ++-
 drivers/gpu/drm/xe/xe_vm_types.h        |  1 +
 8 files changed, 127 insertions(+), 68 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 002b6c22ad3f..a91c84487a67 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -19,6 +19,7 @@
 #include "xe_pt.h"
 #include "xe_svm.h"
 #include "xe_tile.h"
+#include "xe_tlb_inval.h"
 #include "xe_ttm_vram_mgr.h"
 #include "xe_vm.h"
 #include "xe_vm_types.h"
@@ -225,6 +226,7 @@ static void xe_svm_invalidate(struct drm_gpusvm *gpusvm,
 			      const struct mmu_notifier_range *mmu_range)
 {
 	struct xe_vm *vm = gpusvm_to_vm(gpusvm);
+	struct xe_tlb_inval_batch batch;
 	struct xe_device *xe = vm->xe;
 	struct drm_gpusvm_range *r, *first;
 	struct xe_tile *tile;
@@ -276,8 +278,10 @@ static void xe_svm_invalidate(struct drm_gpusvm *gpusvm,
 
 	xe_device_wmb(xe);
 
-	err = xe_vm_range_tilemask_tlb_inval(vm, adj_start, adj_end, tile_mask);
-	WARN_ON_ONCE(err);
+	err = xe_tlb_inval_range_tilemask_submit(xe, vm->usm.asid, adj_start, adj_end,
+						 tile_mask, &batch);
+	if (!WARN_ON_ONCE(err))
+		xe_tlb_inval_batch_wait(&batch);
 
 range_notifier_event_end:
 	r = first;
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
index 933f30fb617d..10dcd4abb00f 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
@@ -486,3 +486,87 @@ bool xe_tlb_inval_idle(struct xe_tlb_inval *tlb_inval)
 	guard(spinlock_irq)(&tlb_inval->pending_lock);
 	return list_is_singular(&tlb_inval->pending_fences);
 }
+
+/**
+ * xe_tlb_inval_batch_wait() - Wait for all fences in a TLB invalidation batch
+ * @batch: Batch of TLB invalidation fences to wait on
+ *
+ * Waits for every fence in @batch to signal, then resets @batch so it can be
+ * reused for a subsequent invalidation.
+ */
+void xe_tlb_inval_batch_wait(struct xe_tlb_inval_batch *batch)
+{
+	struct xe_tlb_inval_fence *fence = &batch->fence[0];
+	unsigned int i;
+
+	for (i = 0; i < batch->num_fences; ++i)
+		xe_tlb_inval_fence_wait(fence++);
+
+	batch->num_fences = 0;
+}
+
+/**
+ * xe_tlb_inval_range_tilemask_submit() - Submit TLB invalidations for an
+ * address range on a tile mask
+ * @xe: The xe device
+ * @asid: Address space ID
+ * @start: start address
+ * @end: end address
+ * @tile_mask: mask for which gt's issue tlb invalidation
+ * @batch: Batch of tlb invalidate fences
+ *
+ * Issue a range based TLB invalidation for gt's in tilemask.
+ * If the function returns an error, there is no need to call
+ * xe_tlb_inval_batch_wait() on @batch.
+ *
+ * Returns 0 for success, negative error code otherwise.
+ */
+int xe_tlb_inval_range_tilemask_submit(struct xe_device *xe, u32 asid,
+				       u64 start, u64 end, u8 tile_mask,
+				       struct xe_tlb_inval_batch *batch)
+{
+	struct xe_tlb_inval_fence *fence = &batch->fence[0];
+	struct xe_tile *tile;
+	u32 fence_id = 0;
+	u8 id;
+	int err;
+
+	batch->num_fences = 0;
+	if (!tile_mask)
+		return 0;
+
+	for_each_tile(tile, xe, id) {
+		if (!(tile_mask & BIT(id)))
+			continue;
+
+		xe_tlb_inval_fence_init(&tile->primary_gt->tlb_inval,
+					&fence[fence_id], true);
+
+		err = xe_tlb_inval_range(&tile->primary_gt->tlb_inval,
+					 &fence[fence_id], start, end,
+					 asid, NULL);
+		if (err)
+			goto wait;
+		++fence_id;
+
+		if (!tile->media_gt)
+			continue;
+
+		xe_tlb_inval_fence_init(&tile->media_gt->tlb_inval,
+					&fence[fence_id], true);
+
+		err = xe_tlb_inval_range(&tile->media_gt->tlb_inval,
+					 &fence[fence_id], start, end,
+					 asid, NULL);
+		if (err)
+			goto wait;
+		++fence_id;
+	}
+
+wait:
+	batch->num_fences = fence_id;
+	if (err)
+		xe_tlb_inval_batch_wait(batch);
+
+	return err;
+}
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h b/drivers/gpu/drm/xe/xe_tlb_inval.h
index 62089254fa23..a76b7823a5f2 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval.h
+++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
@@ -45,4 +45,10 @@ void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno);
 
 bool xe_tlb_inval_idle(struct xe_tlb_inval *tlb_inval);
 
+int xe_tlb_inval_range_tilemask_submit(struct xe_device *xe, u32 asid,
+				       u64 start, u64 end, u8 tile_mask,
+				       struct xe_tlb_inval_batch *batch);
+
+void xe_tlb_inval_batch_wait(struct xe_tlb_inval_batch *batch);
+
 #endif	/* _XE_TLB_INVAL_ */
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
index 3b089f90f002..3d1797d186fd 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
@@ -9,6 +9,8 @@
 #include <linux/workqueue.h>
 #include <linux/dma-fence.h>
 
+#include "xe_device_types.h"
+
 struct drm_suballoc;
 struct xe_tlb_inval;
 
@@ -132,4 +134,16 @@ struct xe_tlb_inval_fence {
 	ktime_t inval_time;
 };
 
+/**
+ * struct xe_tlb_inval_batch - Batch of TLB invalidation fences
+ *
+ * Holds one fence per GT covered by a TLB invalidation request.
+ */
+struct xe_tlb_inval_batch {
+	/** @fence: per-GT TLB invalidation fences */
+	struct xe_tlb_inval_fence fence[XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_TILE];
+	/** @num_fences: number of valid entries in @fence */
+	unsigned int num_fences;
+};
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 548b0769b3ef..a3c2e8cefec7 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3966,66 +3966,6 @@ void xe_vm_unlock(struct xe_vm *vm)
 	dma_resv_unlock(xe_vm_resv(vm));
 }
 
-/**
- * xe_vm_range_tilemask_tlb_inval - Issue a TLB invalidation on this tilemask for an
- * address range
- * @vm: The VM
- * @start: start address
- * @end: end address
- * @tile_mask: mask for which gt's issue tlb invalidation
- *
- * Issue a range based TLB invalidation for gt's in tilemask
- *
- * Returns 0 for success, negative error code otherwise.
- */
-int xe_vm_range_tilemask_tlb_inval(struct xe_vm *vm, u64 start,
-				   u64 end, u8 tile_mask)
-{
-	struct xe_tlb_inval_fence
-		fence[XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_TILE];
-	struct xe_tile *tile;
-	u32 fence_id = 0;
-	u8 id;
-	int err;
-
-	if (!tile_mask)
-		return 0;
-
-	for_each_tile(tile, vm->xe, id) {
-		if (!(tile_mask & BIT(id)))
-			continue;
-
-		xe_tlb_inval_fence_init(&tile->primary_gt->tlb_inval,
-					&fence[fence_id], true);
-
-		err = xe_tlb_inval_range(&tile->primary_gt->tlb_inval,
-					 &fence[fence_id], start, end,
-					 vm->usm.asid, NULL);
-		if (err)
-			goto wait;
-		++fence_id;
-
-		if (!tile->media_gt)
-			continue;
-
-		xe_tlb_inval_fence_init(&tile->media_gt->tlb_inval,
-					&fence[fence_id], true);
-
-		err = xe_tlb_inval_range(&tile->media_gt->tlb_inval,
-					 &fence[fence_id], start, end,
-					 vm->usm.asid, NULL);
-		if (err)
-			goto wait;
-		++fence_id;
-	}
-
-wait:
-	for (id = 0; id < fence_id; ++id)
-		xe_tlb_inval_fence_wait(&fence[id]);
-
-	return err;
-}
-
 /**
  * xe_vm_invalidate_vma - invalidate GPU mappings for VMA without a lock
  * @vma: VMA to invalidate
@@ -4040,6 +3980,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
 {
 	struct xe_device *xe = xe_vma_vm(vma)->xe;
 	struct xe_vm *vm = xe_vma_vm(vma);
+	struct xe_tlb_inval_batch batch;
 	struct xe_tile *tile;
 	u8 tile_mask = 0;
 	int ret = 0;
@@ -4080,12 +4021,16 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
 
 	xe_device_wmb(xe);
 
-	ret = xe_vm_range_tilemask_tlb_inval(xe_vma_vm(vma), xe_vma_start(vma),
-					     xe_vma_end(vma), tile_mask);
+	ret = xe_tlb_inval_range_tilemask_submit(xe, xe_vma_vm(vma)->usm.asid,
+						 xe_vma_start(vma), xe_vma_end(vma),
+						 tile_mask, &batch);
 
 	/* WRITE_ONCE pairs with READ_ONCE in xe_vm_has_valid_gpu_mapping() */
 	WRITE_ONCE(vma->tile_invalidated, vma->tile_mask);
 
+	if (!ret)
+		xe_tlb_inval_batch_wait(&batch);
+
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index f849e369432b..62f4b6fec0bc 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -240,9 +240,6 @@ struct dma_fence *xe_vm_range_rebind(struct xe_vm *vm,
 struct dma_fence *xe_vm_range_unbind(struct xe_vm *vm,
 				     struct xe_svm_range *range);
 
-int xe_vm_range_tilemask_tlb_inval(struct xe_vm *vm, u64 start,
-				   u64 end, u8 tile_mask);
-
 int xe_vm_invalidate_vma(struct xe_vma *vma);
 
 int xe_vm_validate_protected(struct xe_vm *vm);
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 95bf53cc29e3..02daf8a93044 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -12,6 +12,7 @@
 #include "xe_pat.h"
 #include "xe_pt.h"
 #include "xe_svm.h"
+#include "xe_tlb_inval.h"
 
 struct xe_vmas_in_madvise_range {
 	u64 addr;
@@ -235,13 +236,20 @@ static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *vm, u64 start, u64 end)
 static int xe_vm_invalidate_madvise_range(struct xe_vm *vm, u64 start, u64 end)
 {
 	u8 tile_mask = xe_zap_ptes_in_madvise_range(vm, start, end);
+	struct xe_tlb_inval_batch batch;
+	int err;
 
 	if (!tile_mask)
 		return 0;
 
 	xe_device_wmb(vm->xe);
 
-	return xe_vm_range_tilemask_tlb_inval(vm, start, end, tile_mask);
+	err = xe_tlb_inval_range_tilemask_submit(vm->xe, vm->usm.asid, start, end,
+						 tile_mask, &batch);
+	if (!err)
+		xe_tlb_inval_batch_wait(&batch);
+
+	return err;
 }
 
 static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_xe_madvise *args)
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 1f6f7e30e751..de6544165cfa 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -18,6 +18,7 @@
 #include "xe_device_types.h"
 #include "xe_pt_types.h"
 #include "xe_range_fence.h"
+#include "xe_tlb_inval_types.h"
 #include "xe_userptr.h"
 
 struct drm_pagemap;
--
2.53.0

^ permalink raw reply related	[flat|nested] 3+ messages in thread
* Re: [PATCH v3 3/4] drm/xe: Split TLB invalidation into submit and wait steps
  2026-03-03 13:34 ` [PATCH v3 3/4] drm/xe: Split TLB invalidation into submit and wait steps Thomas Hellström
@ 2026-03-03 18:13   ` Matthew Brost
  0 siblings, 0 replies; 3+ messages in thread
From: Matthew Brost @ 2026-03-03 18:13 UTC (permalink / raw)
To: Thomas Hellström
Cc: intel-xe, Christian König, dri-devel, Jason Gunthorpe,
	Andrew Morton, Simona Vetter, Dave Airlie, Alistair Popple,
	linux-mm, linux-kernel

On Tue, Mar 03, 2026 at 02:34:08PM +0100, Thomas Hellström wrote:
> xe_vm_range_tilemask_tlb_inval() submits TLB invalidation requests to
> all GTs in a tile mask and then immediately waits for them to complete
> before returning. This is fine for the existing callers, but a
> subsequent patch will need to defer the wait in order to overlap TLB
> invalidations across multiple VMAs.
>
> Introduce xe_tlb_inval_range_tilemask_submit() and
> xe_tlb_inval_batch_wait() in xe_tlb_inval.c as the submit and wait
> halves respectively. The batch of fences is carried in the new
> xe_tlb_inval_batch structure. Remove xe_vm_range_tilemask_tlb_inval()
> and convert all three call sites to the new API.
>
> v3:
> - Don't wait on TLB invalidation batches if the corresponding batch
>   submit returns an error. (Matt Brost)
> - s/_batch/batch/ (Matt Brost)
>
> Assisted-by: GitHub Copilot:claude-sonnet-4.6
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

[snip]
^ permalink raw reply	[flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-03-03 18:13 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2026-03-03 14:03 [PATCH v3 3/4] drm/xe: Split TLB invalidation into submit and wait steps kernel test robot
  -- strict thread matches above, loose matches on Subject: below --
2026-03-03 13:34 [PATCH v3 0/4] Two-pass MMU interval notifiers Thomas Hellström
2026-03-03 13:34 ` [PATCH v3 3/4] drm/xe: Split TLB invalidation into submit and wait steps Thomas Hellström
2026-03-03 18:13   ` Matthew Brost