Intel-XE Archive on lore.kernel.org
From: Matthew Brost <matthew.brost@intel.com>
To: "Summers, Stuart" <stuart.summers@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"Santa, Carlos" <carlos.santa@intel.com>,
	"thomas.hellstrom@linux.intel.com"
	<thomas.hellstrom@linux.intel.com>
Subject: Re: [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations
Date: Tue, 21 Oct 2025 13:50:26 -0700	[thread overview]
Message-ID: <aPfyEgGRRZXpj7Um@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <7e2afab209224282ab3a211972364bb5ec1fce44.camel@intel.com>

On Tue, Oct 21, 2025 at 02:43:07PM -0600, Summers, Stuart wrote:
> On Tue, 2025-10-21 at 13:36 -0700, Matthew Brost wrote:
> > On Tue, Oct 21, 2025 at 11:55:56AM -0600, Summers, Stuart wrote:
> > > On Fri, 2025-10-17 at 09:52 -0700, Matthew Brost wrote:
> > > > When a burst of unbind jobs is issued, a dependency chain can
> > > > form between the TLB invalidation of a previous unbind job and
> > > > the current one. This leads to undesirable serialization, causing
> > > > current jobs to wait unnecessarily for prior TLB invalidations,
> > > > execute on the GPU when not needed, and significantly slow down
> > > > the unbind burst—resulting in up to a 4× slowdown.
> > > > 
> > > > To break this chain, mask the last bind queue dependency if the
> > > > last fence's DMA context matches the TLB invalidation context.
> > > > This allows full pipelining of unbinds and TLB invalidations
> > > > while preserving correct dma-fence signaling semantics.
> > > > 
> > > > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_exec.c          |  3 +-
> > > >  drivers/gpu/drm/xe/xe_exec_queue.c    | 18 +++++++++--
> > > >  drivers/gpu/drm/xe/xe_exec_queue.h    |  3 +-
> > > >  drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++--
> > > >  drivers/gpu/drm/xe/xe_sched_job.c     | 44 ++++++++++++++++++++++++++-
> > > >  drivers/gpu/drm/xe/xe_sched_job.h     |  7 ++++-
> > > >  drivers/gpu/drm/xe/xe_tlb_inval_job.c | 14 +++++++++
> > > >  drivers/gpu/drm/xe/xe_tlb_inval_job.h |  2 ++
> > > >  8 files changed, 98 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > > > index 0dc27476832b..6034cfc8be06 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec.c
> > > > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > > > @@ -294,7 +294,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > > >                 goto err_put_job;
> > > >  
> > > >         if (!xe_vm_in_lr_mode(vm)) {
> > > > -               err = xe_sched_job_last_fence_add_dep(job, vm);
> > > > +               err = xe_sched_job_last_fence_add_dep(job, vm, NO_MASK_DEP,
> > > > +                                                     NO_MASK_DEP);
> > > >                 if (err)
> > > >                         goto err_put_job;
> > > >  
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > index 90cbc95f8e2e..d6f69d9bccba 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > @@ -25,6 +25,7 @@
> > > >  #include "xe_migrate.h"
> > > >  #include "xe_pm.h"
> > > >  #include "xe_ring_ops_types.h"
> > > > +#include "xe_sched_job.h"
> > > >  #include "xe_trace.h"
> > > >  #include "xe_vm.h"
> > > >  #include "xe_pxp.h"
> > > > @@ -1106,11 +1107,17 @@ void xe_exec_queue_last_fence_set(struct xe_exec_queue *q, struct xe_vm *vm,
> > > >   * xe_exec_queue_last_fence_test_dep - Test last fence dependency of queue
> > > >   * @q: The exec queue
> > > >   * @vm: The VM the engine does a bind or exec for
> > > > + * @mask_ctx0: Mask dma-fence context0
> > > > + * @mask_ctx1: Mask dma-fence context1
> > > > + *
> > > > + * Test last fence dependency of queue, skipping masked dma fence contexts.
> > > >   *
> > > >   * Returns:
> > > > - * -ETIME if there exists an unsignalled last fence dependency, zero otherwise.
> > > > + * -ETIME if there exists an unsignalled and unmasked last fence dependency,
> > > > + * zero otherwise.
> > > >   */
> > > > -int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
> > > > +int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm,
> > > > +                                     u64 mask_ctx0, u64 mask_ctx1)
> > > >  {
> > > >         struct dma_fence *fence;
> > > >         int err = 0;
> > > > @@ -1119,6 +1126,13 @@ int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
> > > >         if (fence) {
> > > >                 err = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) ?
> > > >                         0 : -ETIME;
> > > > +
> > > > +               if (err == -ETIME) {
> > > > +                       if (xe_sched_job_mask_dependency(fence, mask_ctx0,
> > > > +                                                        mask_ctx1))
> > > > +                               err = 0;
> > > > +               }
> > > > +
> > > >                 dma_fence_put(fence);
> > > >         }
> > > >  
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
> > > > index a4dfbe858bda..99a35b22a46c 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> > > > @@ -85,7 +85,8 @@ struct dma_fence *xe_exec_queue_last_fence_get_for_resume(struct xe_exec_queue *
> > > >  void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct xe_vm *vm,
> > > >                                   struct dma_fence *fence);
> > > >  int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> > > > -                                     struct xe_vm *vm);
> > > > +                                     struct xe_vm *vm, u64 mask_ctx0,
> > > > +                                     u64 mask_ctx1);
> > > >  void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
> > > >  
> > > >  int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q, void *scratch);
> > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > > > index d22fd1ccc0ba..bba9ae559f57 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > > @@ -1341,10 +1341,21 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
> > > >         }
> > > >  
> > > >         if (!(pt_update_ops->q->flags & EXEC_QUEUE_FLAG_KERNEL)) {
> > > > +               u64 mask_ctx0 = NO_MASK_DEP, mask_ctx1 = NO_MASK_DEP;
> > > > +
> > > > +               if (ijob)
> > > > +                       mask_ctx0 = xe_tlb_inval_job_fence_context(ijob);
> > > > +               if (mjob)
> > > > +                       mask_ctx1 = xe_tlb_inval_job_fence_context(mjob);
> > > 
> > > Can we rename these ictx and mctx for consistency?
> > > 
> > 
> > Yes.
> > 
> > > Also, do we really need both of these here? Shouldn't we always
> > > have
> > > the primary GT inval (ictx) and so just need to check the one? My
> > 
> > With the code as written, we'd only need to check the primary GT, but
> > Matt R eventually wants the driver to be able to boot without the
> > primary GT. There is a bit of work to be done there, but I didn't
> > want to make it worse in this patch.
> 
> Yeah, makes sense, was just thinking of how to make this a little
> simpler. I don't really like that we're looking at both contexts here
> when we really just need one of them. But we're also doing this in
> other parts of the driver (like the primary/media GT TLB invals this
> is based on), so maybe it's no problem. It would just be good to at
> least note this; otherwise it isn't super clear at a glance why we
> need two contexts here.
> 
> I think at least having those name changes (ictx and mctx) would help
> here.
> 

Yes, will add a comment too.
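For reference, the masking check itself is tiny; a userspace sketch of the idea (the struct and names below are illustrative stand-ins, not the real dma-fence/xe types) of skipping a last-fence dependency when its context matches either GT's invalidation context:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the patch's NO_MASK_DEP sentinel. */
#define NO_MASK_DEP (~0x0ull)

/* Simplified stand-in for struct dma_fence; only the context matters here. */
struct fake_fence {
	uint64_t context;
};

/*
 * Mirrors the core test in the patch: a dependency may be skipped when
 * the fence's context matches either masked context (e.g. the primary
 * or media GT TLB invalidation job's finished-fence context).
 */
static bool mask_dependency(const struct fake_fence *fence,
			    uint64_t mask_ctx0, uint64_t mask_ctx1)
{
	return fence->context == mask_ctx0 || fence->context == mask_ctx1;
}
```

With NO_MASK_DEP as both arguments (the xe_exec_ioctl path), nothing ever matches and the dependency is always kept, which preserves the old behavior.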

> > 
> > > understanding is the reason being that we might be adding either
> > > one of these as the last fence, so we need both checks. But in that
> > > case, would it be better to check against all dependencies, or even
> > > just the last two? Wouldn't that also help if multiple apps are
> > > trying to free at once here, so we have interleaved unbind
> > > dependencies?
> > > 
> > > 
> > 
> > Depending on how the bind is set up, we may check further
> > dependencies in the dma-resv - see all the other checks in this
> > function. This only covers the case of the last queue dependency,
> > which is at least sufficient to help with the ChromeOS case where
> > this is triggered by a burst of user unbinds.
> > 
> > We might still have an issue with a burst of SVM unbinds, where this
> > can serialize; that would, however, likely need some DRM scheduler
> > changes fairly similar to this patch. I can maybe think on that one
> > in a follow-up in a later series. I probably should also switch SVM
> > unbinds over to drain the entire garbage collector list and issue a
> > single unbind job.
> 
> Makes sense to me, but I'm also fine having that in a follow-up patch.
> 
> > 
> > > > +
> > > >                 if (job)
> > > > -                       err = xe_sched_job_last_fence_add_dep(job, vm);
> > > > +                       err = xe_sched_job_last_fence_add_dep(job, vm,
> > > > +                                                             mask_ctx0,
> > > > +                                                             mask_ctx1);
> > > >                 else
> > > > -                       err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q, vm);
> > > > +                       err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q,
> > > > +                                                               vm, mask_ctx0,
> > > > +                                                               mask_ctx1);
> > > >         }
> > > >  
> > > >         for (i = 0; job && !err && i < vops->num_syncs; i++)
> > > > diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> > > > index d21bf8f26964..7cbdd87904c6 100644
> > > > --- a/drivers/gpu/drm/xe/xe_sched_job.c
> > > > +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> > > > @@ -6,6 +6,7 @@
> > > >  #include "xe_sched_job.h"
> > > >  
> > > >  #include <uapi/drm/xe_drm.h>
> > > > +#include <linux/dma-fence-array.h>
> > > >  #include <linux/dma-fence-chain.h>
> > > >  #include <linux/slab.h>
> > > >  
> > > > @@ -295,19 +296,60 @@ void xe_sched_job_push(struct xe_sched_job *job)
> > > >         xe_sched_job_put(job);
> > > >  }
> > > >  
> > > > +/**
> > > > + * xe_sched_job_mask_dependency() - Determine if a dma-fence dependency can be masked
> > > > + * @fence: The dma-fence to check
> > > > + * @mask_ctx0: First context to compare against the fence's context
> > > > + * @mask_ctx1: Second context to compare against the fence's context
> > > > + *
> > > > + * This function checks whether the context of the given dma-fence matches
> > > > + * either of the provided mask contexts. If a match is found, the dependency
> > > > + * represented by the fence can be skipped. If the fence is a dma-fence-array,
> > > > + * its individual fences are unwound and checked.
> > > > + *
> > > > + * Return: true if the fence can be masked (i.e., skipped), false otherwise.
> > > > + */
> > > > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
> > > > +                                 u64 mask_ctx1)
> > > > +{
> > > > +       if (dma_fence_is_array(fence)) {
> > > > +               struct dma_fence *__fence;
> > > > +               int index;
> > > > +
> > > > +               dma_fence_array_for_each(__fence, index, fence)
> > > > +                       if (__fence->context == mask_ctx0 ||
> > > > +                           __fence->context == mask_ctx1)
> > > > +                               return true;
> > > > +       } else if (fence->context == mask_ctx0 ||
> > > > +                  fence->context == mask_ctx1) {
> > > > +               return true;
> > > > +       }
> > > > +
> > > > +       return false;
> > > > +}
> > > > +
> > > >  /**
> > > >   * xe_sched_job_last_fence_add_dep - Add last fence dependency to job
> > > >   * @job: job to add the last fence dependency to
> > > >   * @vm: virtual memory job belongs to
> > > > + * @mask_ctx0: Mask dma-fence context0
> > > > + * @mask_ctx1: Mask dma-fence context1
> > > > + *
> > > > + * Add last fence dependency to job, skipping masked dma fence contexts.
> > > >   *
> > > >   * Returns:
> > > >   * 0 on success, or an error on failing to expand the array.
> > > >   */
> > > > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm)
> > > > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
> > > > +                                   u64 mask_ctx0, u64 mask_ctx1)
> > > >  {
> > > >         struct dma_fence *fence;
> > > >  
> > > >         fence = xe_exec_queue_last_fence_get(job->q, vm);
> > > > +       if (xe_sched_job_mask_dependency(fence, mask_ctx0, mask_ctx1)) {
> > > > +               dma_fence_put(fence);
> > > > +               return 0;
> > > > +       }
> > > >  
> > > >         return drm_sched_job_add_dependency(&job->drm, fence);
> > > >  }
> > > > diff --git a/drivers/gpu/drm/xe/xe_sched_job.h b/drivers/gpu/drm/xe/xe_sched_job.h
> > > > index 3dc72c5c1f13..81d8e848e605 100644
> > > > --- a/drivers/gpu/drm/xe/xe_sched_job.h
> > > > +++ b/drivers/gpu/drm/xe/xe_sched_job.h
> > > > @@ -58,7 +58,8 @@ bool xe_sched_job_completed(struct xe_sched_job *job)
> > > >  void xe_sched_job_arm(struct xe_sched_job *job);
> > > >  void xe_sched_job_push(struct xe_sched_job *job);
> > > >  
> > > > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm);
> > > > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
> > > > +                                   u64 mask_ctx0, u64 mask_ctx1);
> > > >  void xe_sched_job_init_user_fence(struct xe_sched_job *job,
> > > >                                   struct xe_sync_entry *sync);
> > > >  
> > > > @@ -93,4 +94,8 @@ void xe_sched_job_snapshot_print(struct xe_sched_job_snapshot *snapshot, struct
> > > >  int xe_sched_job_add_deps(struct xe_sched_job *job, struct dma_resv *resv,
> > > >                           enum dma_resv_usage usage);
> > > >  
> > > > +#define NO_MASK_DEP    (~0x0ull)
> > > > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
> > > > +                                 u64 mask_ctx1);
> > > > +
> > > >  #endif
> > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > > index 492def04a559..f2fe7f9fbb22 100644
> > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > > @@ -32,6 +32,8 @@ struct xe_tlb_inval_job {
> > > >         u64 start;
> > > >         /** @end: End address to invalidate */
> > > >         u64 end;
> > > > +       /** @fence_context: Fence context for job */
> > > > +       u64 fence_context;
> > > >         /** @asid: Address space ID to invalidate */
> > > >         u32 asid;
> > > >         /** @fence_armed: Fence has been armed */
> > > > @@ -101,6 +103,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
> > > >         job->asid = asid;
> > > >         job->fence_armed = false;
> > > >         job->dep.ops = &dep_job_ops;
> > > 
> > > This means the "finished" context per the entity definition right?
> > > Can
> > > you either add a note here or change the job->fence_context name to
> > > reflect that? Or otherwise why is this adding the +1 here?
> > > 
> > 
> > The scheduled context is entity->fence_context, and the finished
> > context is entity->fence_context + 1 - this is in the DRM scheduler
> > docs. I can add a comment for now, and roll a better fix - a DRM
> > scheduler helper to fish out the finished context - into this series
> > [1]. DRM scheduler stuff moves slowly, so the latter may take a
> > while, and I didn't want to block a fix on that.
> 
> Can you add some quick documentation there to that effect? Just nice
> not to have to go back and forth to the entity documentation which
> right now is just implied.
> 

Will add a comment.
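To spell out what the comment will say: the DRM scheduler reserves two consecutive dma-fence contexts per entity, the first for "scheduled" fences and the next for "finished" fences. A rough userspace sketch of that convention (the struct and helper names are illustrative, not the real drm_sched_entity layout):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for struct drm_sched_entity. */
struct fake_entity {
	uint64_t fence_context;	/* base of the entity's two-context range */
};

/* Scheduled fences signal when a job is picked to run. */
static uint64_t scheduled_context(const struct fake_entity *e)
{
	return e->fence_context;
}

/*
 * Finished fences signal job completion, so this is the context an
 * unbind job's dependency should be compared (and masked) against -
 * hence the +1 in the patch.
 */
static uint64_t finished_context(const struct fake_entity *e)
{
	return e->fence_context + 1;
}
```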

Matt

> Also thanks for the link to that other series, I'll check that out too.
> 
> Thanks,
> Stuart
> 
> > 
> > Matt
> > 
> > [1] https://patchwork.freedesktop.org/series/155314/
> > 
> > > Thanks,
> > > Stuart
> > > 
> > > > +       job->fence_context = entity->fence_context + 1;
> > > >         kref_init(&job->refcount);
> > > >         xe_exec_queue_get(q);   /* Pairs with put in
> > > > xe_tlb_inval_job_destroy */
> > > >  
> > > > @@ -266,3 +269,14 @@ void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job)
> > > >         if (!IS_ERR_OR_NULL(job))
> > > >                 kref_put(&job->refcount, xe_tlb_inval_job_destroy);
> > > >  }
> > > > +
> > > > +/**
> > > > + * xe_tlb_inval_job_fence_context() - TLB invalidation job fence context
> > > > + * @job: TLB invalidation job object
> > > > + *
> > > > + * Return: TLB invalidation job fence context
> > > > + */
> > > > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job)
> > > > +{
> > > > +       return job->fence_context;
> > > > +}
> > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > > index e63edcb26b50..2576165c2228 100644
> > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > > @@ -30,4 +30,6 @@ void xe_tlb_inval_job_get(struct xe_tlb_inval_job *job);
> > > >  
> > > >  void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job);
> > > >  
> > > > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job);
> > > > +
> > > >  #endif
> > > 
> 

Thread overview: 15+ messages
2025-10-17 16:52 [PATCH 0/1] Fix serialization on burst of unbinds Matthew Brost
2025-10-17 16:52 ` [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations Matthew Brost
2025-10-21 17:55   ` Summers, Stuart
2025-10-21 20:36     ` Matthew Brost
2025-10-21 20:43       ` Summers, Stuart
2025-10-21 20:50         ` Matthew Brost [this message]
2025-10-22  8:00   ` Tvrtko Ursulin
2025-10-22 15:10     ` Matthew Brost
2025-10-23 12:46       ` Tvrtko Ursulin
2025-10-23 18:55         ` Matthew Brost
2025-10-23 19:27           ` Matthew Brost
2025-10-23 12:28   ` Thomas Hellström
2025-10-17 18:36 ` ✓ CI.KUnit: success for Fix serialization on burst of unbinds Patchwork
2025-10-17 19:16 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-18 18:20 ` ✗ Xe.CI.Full: failure " Patchwork