Re: [PATCH] drm/xe: Always invalidate TLBs on userptr invalidation


From: Matthew Brost <matthew.brost@intel.com>
To: "Summers, Stuart" <stuart.summers@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"Vishwanathapura, Niranjana" <niranjana.vishwanathapura@intel.com>
Subject: Re: [PATCH] drm/xe: Always invalidate TLBs on userptr invalidation
Date: Wed, 25 Mar 2026 14:47:14 -0700
Message-ID: <acRX4oSHxMJ09yUA@gsse-cloud1>
In-Reply-To: <b62e8f960ccab92bcd045627e77a6d0d14c5cb18.camel@intel.com>

On Wed, Mar 25, 2026 at 02:35:26PM -0600, Summers, Stuart wrote:
> On Tue, 2026-03-24 at 16:53 -0700, Matthew Brost wrote:
> > On Tue, Mar 24, 2026 at 03:51:18PM -0600, Summers, Stuart wrote:
> > > On Tue, 2026-03-24 at 14:53 +0000, Summers, Stuart wrote:
> > > > On Mon, 2026-03-23 at 19:21 -0700, Matthew Brost wrote:
> > > > > On Mon, Mar 23, 2026 at 09:17:42PM +0000, Stuart Summers wrote:
> > > > > > Right now we are only invalidating TLBs when we are
> > > > > > running in fault mode. For non-fault mode based userptr
> > > > > > VMs, we then rely on context switches after an MMAP has
> > > > > > happened to ensure the TLB is clean for the next submission.
> > > > > > 
> > > > > > With context based TLB invalidation, we can no longer rely
> > > > > > on the implicit invalidation happening during context switch,
> > > > > > so remove the fault mode limiter and simply always perform
> > > > > > that invalidation.
> > > > > > 
> > > > > > I was able to see this behavior using the following test:
> > > > > > xe_exec_compute_mode --r twice-userptr-invalidate
> > > > > > 
> > > > > 
> > > > > Hmm, this implies that preempt fences don't issue a TLB invalidation...
> > > > > I thought we landed on preempt fences doing a TLB invalidation in
> > > > > offline discussions.
> > > > 
> > > > So initially I was mostly focused on the xe_vm invalidate cases. In
> > > > hindsight I should have done a little more focused testing here as
> > > > well. This came up after some more extensive testing internally.
> > > > 
> > > > At least in my latest round of testing, I don't see any issues in
> > > > BAT/FULL after this change. I do think getting this merged (or
> > > > something similar) and tested more broadly would be interesting.
> > > > 
> > > > > 
> > > > > We may have more work to do here wrt context based TLB
> > > > > invalidations.
> > > > > 
> > > > > I think BO move path likely also needs to be updated then too.
> > > > 
> > > > Hm.. I haven't seen any issues on the BO side, although it's possible
> > > > I've missed something in testing of course (like this LR case). I'd
> > 
> > You'd only hit those cases if you run an eviction test.
> 
> Ok I'm manually running through some of these now and I don't see any
> issues so far (evict-small at least). 
> 

Try an evict-cm* test, but looking at it now, that probably won’t catch
the issues either.

Our eviction tests just aren’t very good — far from my finest work.
Looking at xe_evict.c, each BO is only touched once by the GPU. What
really needs to happen is multiple touches: the TLB should still
reference the old BO, it gets moved, and then we touch it again after it
moves back into place.
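
To make the multiple-touch point concrete, here is a toy model of why a
single touch can't catch a missing invalidation. This is not IGT or xe
code; every name in it (toy_gpu, toy_touch, toy_move, ...) is made up
for illustration, and the "TLB" is a single cached entry:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device: one BO, one PTE, one cached translation. */
struct toy_gpu {
	int pte;        /* where the page table says the BO currently lives */
	int tlb;        /* cached copy of the translation */
	bool tlb_valid;
};

/* GPU access: use the cached translation if present, else walk and cache. */
static int toy_touch(struct toy_gpu *g)
{
	assert(g);
	if (!g->tlb_valid) {
		g->tlb = g->pte;
		g->tlb_valid = true;
	}
	return g->tlb;
}

/* Move the BO; the invalidate flag models whether the driver flushes. */
static void toy_move(struct toy_gpu *g, int new_loc, bool invalidate)
{
	assert(g);
	g->pte = new_loc;
	if (invalidate)
		g->tlb_valid = false;
}

/* Returns true if this test pattern observes a stale translation. */
static bool toy_detects_missing_inval(bool touch_before_move)
{
	struct toy_gpu g = { .pte = 1, .tlb = 0, .tlb_valid = false };

	if (touch_before_move)
		toy_touch(&g);          /* warm the TLB with the old location */
	toy_move(&g, 2, false);         /* buggy move: no invalidation */
	return toy_touch(&g) != g.pte;  /* stale if we miss the new location */
}
```

With only one touch after the move there is nothing cached to go stale,
so the buggy move passes unnoticed; the touch before the move is what
exposes it.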

At one point I wrote a better eviction test [1], but it never got
merged. We could try to revive that one with this test case in mind. It
would need to be updated to run in preempt-fence mode and fault mode,
and I think we’d need to call touch_all_pages twice in the process
sections that trigger eviction. The way I VM bind addresses would also
need to change, as this version uses malloc and we found in
xe_exec_system_allocator that this doesn't work.

[1] https://patchwork.freedesktop.org/patch/588613/?series=132251&rev=1

> > 
> > > > really like to get this tested more regularly in CI if possible and
> > > > work through issues post merge. I think most of the remaining issues,
> > > > if they come up, should be timing sensitive.
> > > > 
> > > > Also the fact that this isn't enabled in anything in xe_pci.c right
> > > > now means we shouldn't cause any major breakage otherwise for the
> > > > feature generally. Of course this specific change impacts everyone
> > > > using preempt fence like you said. I could add a has_ctx_tlb_inval
> > > > here - I had thought about that. But to me this seems like a more
> > > > general case given the discussion we had about intent of explicit
> > > > invalidations.
> > 
> > I would add a knob for context switch == TLB invalidation.
> 
> So basically if (fault_mode || has_ctx_tlb_inval)? I.e. is there a
> reason not to just tie has_ctx_tlb_inval and has_ctx_switch_inval (what
> you have below) together for now?
> 

I think it is fine to overload has_ctx_tlb_inval, but maybe add some
kernel doc indicating that has_ctx_tlb_inval == true implies GPU
context switches do not invalidate TLBs.
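
Roughly what I have in mind for the doc, as a sketch only. The struct
below is a hypothetical stand-in (the real flag lives wherever the
series puts it), and needs_explicit_userptr_inval() is a made-up helper
that just spells out the (fault_mode || has_ctx_tlb_inval) condition
from your question:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device info; only one field is modelled. */
struct xe_info_sketch {
	/**
	 * @has_ctx_tlb_inval: Device uses context-based TLB invalidation.
	 * When true, GPU context switches do not invalidate TLBs, so any
	 * path that relied on that implicit invalidation has to invalidate
	 * explicitly.
	 */
	bool has_ctx_tlb_inval;
};

/* Made-up helper: the condition proposed for the userptr notifier path. */
static bool needs_explicit_userptr_inval(bool fault_mode,
					 const struct xe_info_sketch *info)
{
	assert(info);
	return fault_mode || info->has_ctx_tlb_inval;
}
```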

> > 
> > > > 
> > > > Anyway let me know what you think here.
> > > 
> > > Little more detail here on further analysis...
> > > 
> > > So I had found this through debug, but looking a little closer, I think
> > > the reason we need this is because of the xe_vm_rebind() function with
> > > these lines at the top:
> > > 
> > > if ((xe_vm_in_lr_mode(vm) && !rebind_worker) ||
> > >     list_empty(&vm->rebind_list))
> > >         return 0;
> > > 
> > > So basically if we aren't in preempt fence mode (lr_mode), we always
> > > invalidate the TLBs in ops_execute() called in the xe_vm_rebind()
> > > function later on. If we are in preempt fence mode, there is a corner
> > > case here it looks like where we call the preempt rebind worker that
> > > then calls xe_preempt_work_begin() -> xe_vm_validate_rebind() ->
> > > xe_vm_rebind(false) and hits the above case, causing us to skip the TLB
> > > invalidation. If we add the hook here, it forces invalidation for
> > 
> > Kinda, if you look at this comment in xe_pt.c:
> > 
> >                 /*
> >                  * If rebind, we have to invalidate TLB on !LR vms to invalidate
> >                  * cached PTEs point to freed memory. On LR vms this is done
> >                  * automatically when the context is re-enabled by the rebind worker,
> >                  * or in fault mode it was invalidated on PTE zapping.
> >                  *
> >                  * If !rebind, and scratch enabled VMs, there is a chance the scratch
> >                  * PTE is already cached in the TLB so it needs to be invalidated.
> >                  * On !LR VMs this is done in the ring ops preceding a batch, but on
> >                  * LR, in particular on user-space batch buffer chaining, it needs to
> >                  * be done here.
> >                  */
> >                 if ((!pt_op->rebind && xe_vm_has_scratch(vm) &&
> >                      xe_vm_in_lr_mode(vm)))
> >                         pt_update_ops->needs_invalidation = true;
> >                 else if (pt_op->rebind && !xe_vm_in_lr_mode(vm))
> >                         /* We bump also if batch_invalidate_tlb is true */
> >                         vm->tlb_flush_seqno++;
> > 
> > We explicitly call out that we rely on “automatically when the
> > context is re-enabled by the rebind worker”
> > 
> > So another option is to adjust this if statement to something like:
> > 
> > else if (pt_op->rebind && xe_vm_in_preempt_fence_mode(vm) &&
> >          !context_switch_invalidation)
> >         pt_update_ops->needs_invalidation = true;
> > 
> > This would also cover the BO eviction case I mentioned above. It also
> > avoids blocking in the notifier or BO eviction code on the TLB invalidation
> 
> Ok I get your point about the notifier path blocking vs the inval jobs
> we're creating in xe_pt.c...
> 

After the notifier runs (or a BO eviction) the code in xe_pt.c will
either execute from the exec IOCTL (dma-fence mode, vm->tlb_flush_seqno
covers this case), the rebind worker (preempt-fence mode, this new if
statement will invalidate TLBs), or a page fault (TLB already
invalidated in notifier or BO eviction in existing code as we don't have
any fences to wait on like the prior two modes).
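
As a rough sketch of those three cases (a standalone model, not the
actual xe_pt.c code; the enums and rebind_inval_source() are names I'm
making up here, and ctx_switch_invalidates stands in for whatever knob
we end up with):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three VM modes discussed above. */
enum vm_mode { VM_DMA_FENCE, VM_PREEMPT_FENCE, VM_FAULT };

/* Where the TLB invalidation for a rebind comes from. */
enum inval_source {
	INVAL_SEQNO_BUMP,   /* vm->tlb_flush_seqno++ from the exec IOCTL path */
	INVAL_CTX_SWITCH,   /* implicit, when the rebind worker re-enables the context */
	INVAL_PT_UPDATE,    /* pt_update_ops->needs_invalidation = true */
	INVAL_ALREADY_DONE, /* done earlier in the notifier or BO eviction */
};

static enum inval_source rebind_inval_source(enum vm_mode mode,
					     bool ctx_switch_invalidates)
{
	switch (mode) {
	case VM_DMA_FENCE:
		return INVAL_SEQNO_BUMP;
	case VM_PREEMPT_FENCE:
		/* The new branch only matters when the implicit flush is gone. */
		return ctx_switch_invalidates ? INVAL_CTX_SWITCH : INVAL_PT_UPDATE;
	case VM_FAULT:
		return INVAL_ALREADY_DONE;
	}
	assert(!"unknown vm_mode");
	return INVAL_ALREADY_DONE;
}
```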

Matt

> I'll give your change a try and let you know...
> 
> Thanks,
> Stuart
> 
> > which in general speeds up the entire kernel.
> > 
> > Matt
> > 
> > > lr_mode also.
> > > 
> > > We could just add this xe_vm_in_lr_mode() check here, but I still think
> > > doing this across the board is safest so we aren't hitting some other
> > > corner case in the future if we decide to rework those other scenarios.
> > > 
> > > Thanks,
> > > Stuart
> > > 
> > > > 
> > > > Thanks,
> > > > Stuart
> > > > 
> > > > >  
> > > > > > Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/xe/xe_userptr.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_userptr.c b/drivers/gpu/drm/xe/xe_userptr.c
> > > > > > index 6761005c0b90..dfd679dd98d9 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_userptr.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_userptr.c
> > > > > > @@ -102,7 +102,7 @@ xe_vma_userptr_do_inval(struct xe_vm *vm, struct xe_userptr_vma *uvma, bool is_d
> > > > > >                                     false, MAX_SCHEDULE_TIMEOUT);
> > > > > >         XE_WARN_ON(err <= 0);
> > > > > >  
> > > > > > -       if (xe_vm_in_fault_mode(vm) && userptr->initial_bind) {
> > > > > > +       if (userptr->initial_bind) {
> > > > > 
> > > > > Should we change this if statement based on hardware support?
> > > > > 
> > > > > Matt
> > > > > 
> > > > > >                 if (!userptr->finish_inuse) {
> > > > > >                         /*
> > > > > >                          * Defer the TLB wait to an extra pass so the caller
> > > > > > -- 
> > > > > > 2.43.0
> > > > > > 
> > > > 
> > > 
>

