From: "Summers, Stuart" <stuart.summers@intel.com>
To: "Brost, Matthew" <matthew.brost@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"thomas.hellstrom@linux.intel.com"
<thomas.hellstrom@linux.intel.com>
Subject: Re: [PATCH v4 1/5] drm/xe: Add last fence attachment to TLB invalidation job queues
Date: Wed, 29 Oct 2025 22:17:30 +0000
Message-ID: <d5d2b43cbf25ca41906de25366d4eed903fbf9d6.camel@intel.com>
In-Reply-To: <aQJ5Qyb5HgaNc73L@lstrano-desk.jf.intel.com>

On Wed, 2025-10-29 at 13:29 -0700, Matthew Brost wrote:
> On Wed, Oct 29, 2025 at 11:48:29AM -0600, Summers, Stuart wrote:
> > On Mon, 2025-10-27 at 11:27 -0700, Matthew Brost wrote:
> > > To address serialization issues with bursts of unbind jobs, this
> > > patch adds support for attaching the last fence to TLB invalidation
> > > job queues. The idea is that user fence signaling for a bind job
> > > reflects both the bind job itself and the last fences of all related
> > > TLB invalidations. The submission order of bind jobs and TLB
> > > invalidations depends solely on the state of their respective queues.
> > >
> > > This patch only introduces support functions for last fence
> > > attachment to TLB invalidation queues.
> > >
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > >
> > > ---
> > > v3:
> > > - Fix assert in xe_exec_queue_tlb_inval_last_fence_set (CI)
> > > - Ensure migrate lock held for migrate queues (Testing)
> > > ---
> > >  drivers/gpu/drm/xe/xe_exec_queue.c       | 105 ++++++++++++++++++++++-
> > >  drivers/gpu/drm/xe/xe_exec_queue.h       |  18 ++++
> > >  drivers/gpu/drm/xe/xe_exec_queue_types.h |   5 ++
> > >  drivers/gpu/drm/xe/xe_migrate.c          |  14 +++
> > >  drivers/gpu/drm/xe/xe_migrate.h          |   8 ++
> > >  drivers/gpu/drm/xe/xe_vm.c               |   7 +-
> > >  6 files changed, 155 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > index 90cbc95f8e2e..d7d00d4de93c 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > @@ -376,11 +376,15 @@ void xe_exec_queue_destroy(struct kref *ref)
> > >  {
> > >  	struct xe_exec_queue *q = container_of(ref, struct xe_exec_queue, refcount);
> > >  	struct xe_exec_queue *eq, *next;
> > > +	int i;
> > >
> > >  	if (xe_exec_queue_uses_pxp(q))
> > >  		xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
> > >
> > >  	xe_exec_queue_last_fence_put_unlocked(q);
> > > +	for_each_tlb_inval(i)
> > > +		xe_exec_queue_tlb_inval_last_fence_put_unlocked(q, i);
> > > +
> > >  	if (!(q->flags & EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD)) {
> > >  		list_for_each_entry_safe(eq, next, &q->multi_gt_list,
> > >  					 multi_gt_link)
> > > @@ -998,7 +1002,9 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
> > >  static void xe_exec_queue_last_fence_lockdep_assert(struct xe_exec_queue *q,
> > >  						    struct xe_vm *vm)
> > >  {
> > > -	if (q->flags & EXEC_QUEUE_FLAG_VM) {
> > > +	if (q->flags & EXEC_QUEUE_FLAG_MIGRATE) {
> > > +		xe_migrate_job_lock_assert(q);
> > > +	} else if (q->flags & EXEC_QUEUE_FLAG_VM) {
> > >  		lockdep_assert_held(&vm->lock);
> > >  	} else {
> > >  		xe_vm_assert_held(vm);
> > > @@ -1097,6 +1103,7 @@ void xe_exec_queue_last_fence_set(struct xe_exec_queue *q, struct xe_vm *vm,
> > >  				  struct dma_fence *fence)
> > >  {
> > >  	xe_exec_queue_last_fence_lockdep_assert(q, vm);
> > > +	xe_assert(vm->xe, !dma_fence_is_container(fence));
> > >
> > >  	xe_exec_queue_last_fence_put(q, vm);
> > >  	q->last_fence = dma_fence_get(fence);
> > > @@ -1125,6 +1132,102 @@ int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
> > >  	return err;
> > >  }
> > >
> > > +/**
> > > + * xe_exec_queue_tlb_inval_last_fence_put() - Drop ref to last TLB invalidation fence
> > > + * @q: The exec queue
> > > + * @vm: The VM the engine does a bind for
> > > + * @type: Either primary or media GT
> > > + */
> > > +void xe_exec_queue_tlb_inval_last_fence_put(struct xe_exec_queue *q,
> > > +					    struct xe_vm *vm,
> > > +					    unsigned int type)
> > > +{
> > > +	xe_exec_queue_last_fence_lockdep_assert(q, vm);
> > > +	xe_assert(vm->xe, type == XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT ||
> > > +		  type == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT);
> >
> > Do we really need these GT asserts through here?
> >
>
> I tend to be overzealous with asserts to catch any possible bugs, and
> asserts by nature are self-documenting. So yes, I'd like to have these
> included.
Yeah, makes sense. I was thinking of asking to move the check up a level
into the for_each_tlb_inval() loop itself (rough sketch below), but it's
probably right to have this at the lower levels for clearer stack
traces. No issue from my side.
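
For reference, a rough and untested sketch of what I had in mind, using
the xe_vm_close_and_put() call site from this patch (the hoisted assert
here is purely illustrative):

	for_each_tlb_inval(i) {
		/* Type check hoisted out of the helpers into the loop */
		xe_assert(vm->xe, i == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT ||
			  i == XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT);
		xe_exec_queue_tlb_inval_last_fence_put(vm->q[id], vm, i);
	}

Given the macro already bounds i to those two values, the per-helper
asserts are the ones that actually catch a bad caller.
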
Thanks,
Stuart
>
> Matt
>
> > Thanks,
> > Stuart
> >
> > > +
> > > +	xe_exec_queue_tlb_inval_last_fence_put_unlocked(q, type);
> > > +}
> > > +
> > > +/**
> > > + * xe_exec_queue_tlb_inval_last_fence_put_unlocked() - Drop ref to last TLB
> > > + * invalidation fence unlocked
> > > + * @q: The exec queue
> > > + * @type: Either primary or media GT
> > > + *
> > > + * Only safe to be called from xe_exec_queue_destroy().
> > > + */
> > > +void xe_exec_queue_tlb_inval_last_fence_put_unlocked(struct xe_exec_queue *q,
> > > +						     unsigned int type)
> > > +{
> > > +	xe_assert(q->vm->xe, type == XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT ||
> > > +		  type == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT);
> > > +
> > > +	if (q->tlb_inval[type].last_fence) {
> > > +		dma_fence_put(q->tlb_inval[type].last_fence);
> > > +		q->tlb_inval[type].last_fence = NULL;
> > > +	}
> > > +}
> > > +
> > > +/**
> > > + * xe_exec_queue_tlb_inval_last_fence_get() - Get last fence for TLB invalidation
> > > + * @q: The exec queue
> > > + * @vm: The VM the engine does a bind for
> > > + * @type: Either primary or media GT
> > > + *
> > > + * Get last fence, takes a ref
> > > + *
> > > + * Returns: last fence if not signaled, dma fence stub if signaled
> > > + */
> > > +struct dma_fence *xe_exec_queue_tlb_inval_last_fence_get(struct xe_exec_queue *q,
> > > +							  struct xe_vm *vm,
> > > +							  unsigned int type)
> > > +{
> > > +	struct dma_fence *fence;
> > > +
> > > +	xe_exec_queue_last_fence_lockdep_assert(q, vm);
> > > +	xe_assert(vm->xe, type == XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT ||
> > > +		  type == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT);
> > > +	xe_assert(vm->xe, q->flags & (EXEC_QUEUE_FLAG_VM |
> > > +				      EXEC_QUEUE_FLAG_MIGRATE));
> > > +
> > > +	if (q->tlb_inval[type].last_fence &&
> > > +	    test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > > +		     &q->tlb_inval[type].last_fence->flags))
> > > +		xe_exec_queue_tlb_inval_last_fence_put(q, vm, type);
> > > +
> > > +	fence = q->tlb_inval[type].last_fence ?: dma_fence_get_stub();
> > > +	dma_fence_get(fence);
> > > +	return fence;
> > > +}
> > > +
> > > +/**
> > > + * xe_exec_queue_tlb_inval_last_fence_set() - Set last fence for TLB invalidation
> > > + * @q: The exec queue
> > > + * @vm: The VM the engine does a bind for
> > > + * @fence: The fence
> > > + * @type: Either primary or media GT
> > > + *
> > > + * Set the last fence for the TLB invalidation type on the queue. Takes a
> > > + * reference to the fence; when closing the queue,
> > > + * xe_exec_queue_tlb_inval_last_fence_put should be called.
> > > + */
> > > +void xe_exec_queue_tlb_inval_last_fence_set(struct xe_exec_queue *q,
> > > +					    struct xe_vm *vm,
> > > +					    struct dma_fence *fence,
> > > +					    unsigned int type)
> > > +{
> > > +	xe_exec_queue_last_fence_lockdep_assert(q, vm);
> > > +	xe_assert(vm->xe, type == XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT ||
> > > +		  type == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT);
> > > +	xe_assert(vm->xe, q->flags & (EXEC_QUEUE_FLAG_VM |
> > > +				      EXEC_QUEUE_FLAG_MIGRATE));
> > > +	xe_assert(vm->xe, !dma_fence_is_container(fence));
> > > +
> > > +	xe_exec_queue_tlb_inval_last_fence_put(q, vm, type);
> > > +	q->tlb_inval[type].last_fence = dma_fence_get(fence);
> > > +}
> > > +
> > >  /**
> > >   * xe_exec_queue_contexts_hwsp_rebase - Re-compute GGTT references
> > >   * within all LRCs of a queue.
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
> > > index a4dfbe858bda..c4b95fad93f1 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> > > @@ -14,6 +14,10 @@ struct drm_file;
> > >  struct xe_device;
> > >  struct xe_file;
> > >
> > > +#define for_each_tlb_inval(__i) \
> > > +	for (__i = XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT; \
> > > +	     __i <= XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT; ++__i)
> > > +
> > >  struct xe_exec_queue *xe_exec_queue_create(struct xe_device *xe, struct xe_vm *vm,
> > >  					   u32 logical_mask, u16 width,
> > >  					   struct xe_hw_engine *hw_engine, u32 flags,
> > > @@ -86,6 +90,20 @@ void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct xe_vm *vm,
> > >  				  struct dma_fence *fence);
> > >  int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm);
> > > +
> > > +void xe_exec_queue_tlb_inval_last_fence_put(struct xe_exec_queue *q,
> > > +					    struct xe_vm *vm,
> > > +					    unsigned int type);
> > > +void xe_exec_queue_tlb_inval_last_fence_put_unlocked(struct xe_exec_queue *q,
> > > +						     unsigned int type);
> > > +struct dma_fence *xe_exec_queue_tlb_inval_last_fence_get(struct xe_exec_queue *q,
> > > +							  struct xe_vm *vm,
> > > +							  unsigned int type);
> > > +void xe_exec_queue_tlb_inval_last_fence_set(struct xe_exec_queue *q,
> > > +					    struct xe_vm *vm,
> > > +					    struct dma_fence *fence,
> > > +					    unsigned int type);
> > > +
> > >  void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
> > >
> > >  int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q, void *scratch);
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > index 282505fa1377..b4185fee54e1 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > @@ -145,6 +145,11 @@ struct xe_exec_queue {
> > >  		 * dependency scheduler
> > >  		 */
> > >  		struct xe_dep_scheduler *dep_scheduler;
> > > +		/**
> > > +		 * @last_fence: last fence for TLB invalidation, protected by
> > > +		 * vm->lock in write mode
> > > +		 */
> > > +		struct dma_fence *last_fence;
> > >  	} tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
> > >
> > >  	/** @pxp: PXP info tracking */
> > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > > index 921c9c1ea41f..4567bc88a8ec 100644
> > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > @@ -2333,6 +2333,20 @@ void xe_migrate_job_unlock(struct xe_migrate *m, struct xe_exec_queue *q)
> > >  		xe_vm_assert_held(q->vm); /* User queues VM's should be locked */
> > >  }
> > >
> > > +#if IS_ENABLED(CONFIG_PROVE_LOCKING)
> > > +/**
> > > + * xe_migrate_job_lock_assert() - Assert migrate job lock is held for queue
> > > + * @q: Migrate queue
> > > + */
> > > +void xe_migrate_job_lock_assert(struct xe_exec_queue *q)
> > > +{
> > > +	struct xe_migrate *m = gt_to_tile(q->gt)->migrate;
> > > +
> > > +	xe_gt_assert(q->gt, q == m->q);
> > > +	lockdep_assert_held(&m->job_mutex);
> > > +}
> > > +#endif
> > > +
> > >  #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
> > >  #include "tests/xe_migrate.c"
> > >  #endif
> > > diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
> > > index 4fad324b6253..9b5791617f5e 100644
> > > --- a/drivers/gpu/drm/xe/xe_migrate.h
> > > +++ b/drivers/gpu/drm/xe/xe_migrate.h
> > > @@ -152,6 +152,14 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
> > >
> > >  void xe_migrate_wait(struct xe_migrate *m);
> > >
> > > +#if IS_ENABLED(CONFIG_PROVE_LOCKING)
> > > +void xe_migrate_job_lock_assert(struct xe_exec_queue *q);
> > > +#else
> > > +static inline void xe_migrate_job_lock_assert(struct xe_exec_queue *q)
> > > +{
> > > +}
> > > +#endif
> > > +
> > >  void xe_migrate_job_lock(struct xe_migrate *m, struct xe_exec_queue *q);
> > >  void xe_migrate_job_unlock(struct xe_migrate *m, struct xe_exec_queue *q);
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > index 10d77666a425..d2a2f823f1b3 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -1731,8 +1731,13 @@ void xe_vm_close_and_put(struct xe_vm *vm)
> > >
> > >  	down_write(&vm->lock);
> > >  	for_each_tile(tile, xe, id) {
> > > -		if (vm->q[id])
> > > +		if (vm->q[id]) {
> > > +			int i;
> > > +
> > >  			xe_exec_queue_last_fence_put(vm->q[id], vm);
> > > +			for_each_tlb_inval(i)
> > > +				xe_exec_queue_tlb_inval_last_fence_put(vm->q[id], vm, i);
> > > +		}
> > >  	}
> > >  	up_write(&vm->lock);
> > >
> >