Intel-XE Archive on lore.kernel.org
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org, matthew.auld@intel.com
Subject: Re: [PATCH v4 1/2] drm/xe: Userptr invalidation race with binds fixes
Date: Tue, 25 Feb 2025 19:56:36 +0100	[thread overview]
Message-ID: <4267074055afdb298914de6658b64f89e55136c5.camel@linux.intel.com> (raw)
In-Reply-To: <Z74Bo0no06PmV87O@lstrano-desk.jf.intel.com>

On Tue, 2025-02-25 at 09:45 -0800, Matthew Brost wrote:
> On Tue, Feb 25, 2025 at 03:30:54PM +0100, Thomas Hellström wrote:
> > Hi, Matt,
> > 
> > On Mon, 2025-02-24 at 09:01 -0800, Matthew Brost wrote:
> > > Always wait on dma-resv bookkeep slots if a userptr invalidation
> > > has raced with a bind, ensuring PTEs temporarily set up to
> > > invalidated pages are never accessed.
> > > 
> > > Fix up initial bind handling: always add VMAs to the invalidation
> > > list and wait on dma-resv bookkeep slots.
> > > 
> > > Always hold the notifier lock across TLB invalidation in the
> > > notifier to prevent a UAF if an unbind races.
> > > 
> > > All of the above changes are included in a single patch in hopes
> > > of an easier backport.
> > > 
> > > v2:
> > >  - Wait dma-resv bookkeep before issuing PTE zap (Thomas)
> > >  - Support scratch page on invalidation (Thomas)
> > > v3:
> > >  - Drop clear of PTEs (Thomas)
> > 
> > This was what I actually meant.
> > 
> 
> Ok, I presented this as an option and it wasn't clear to me this was
> preferred.

Well, I think the more special cases we can get rid of in the code, the
better? Or at least, like in this case, split out what's common with
the vm notifier into an xe_vm function and call that, making it clearer
to the reader that we force an invalidation.

> 
> > https://patchwork.freedesktop.org/patch/639489/?series=145409&rev=1
> > 
> 
> This patch doesn't work.
> xe_vm.munmap-style-unbind-userptr-one-partial hangs due to the error
> injection always firing on a single userptr bind, so we'd have to fix
> the error injection too.

I have a follow-up patch that splits out a part of the notifier as
described above and calls that for each injection, also invalidating
the userptr's seqno, and that fixes the above problem, but then the
code hangs in

xe_exec_fault_mode --r once-userptr-prefetch

but that's a different failure mode. Apparently the prefetch code
doesn't repin an invalid userptr and returns -EAGAIN forever...

/Thomas



> 
> Matt
>  
> > /Thomas
> > 
> > > v4:
> > >  - Remove double dma-resv wait
> > > 
> > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > Cc: <stable@vger.kernel.org>
> > > Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into
> > > single
> > > job")
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_pt.c | 21 ++++++++++++---------
> > >  drivers/gpu/drm/xe/xe_vm.c |  4 ++--
> > >  2 files changed, 14 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > > index 1ddcc7e79a93..ffd23c3564c5 100644
> > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > @@ -1215,9 +1215,6 @@ static int vma_check_userptr(struct xe_vm *vm, struct xe_vma *vma,
> > >  	uvma = to_userptr_vma(vma);
> > >  	notifier_seq = uvma->userptr.notifier_seq;
> > >  
> > > -	if (uvma->userptr.initial_bind && !xe_vm_in_fault_mode(vm))
> > > -		return 0;
> > > -
> > >  	if (!mmu_interval_read_retry(&uvma->userptr.notifier,
> > >  				     notifier_seq) &&
> > >  	    !xe_pt_userptr_inject_eagain(uvma))
> > > @@ -1226,6 +1223,8 @@ static int vma_check_userptr(struct xe_vm *vm, struct xe_vma *vma,
> > >  	if (xe_vm_in_fault_mode(vm)) {
> > >  		return -EAGAIN;
> > >  	} else {
> > > +		long err;
> > > +
> > >  		spin_lock(&vm->userptr.invalidated_lock);
> > >  		list_move_tail(&uvma->userptr.invalidate_link,
> > >  			       &vm->userptr.invalidated);
> > > @@ -1234,19 +1233,23 @@ static int vma_check_userptr(struct xe_vm *vm, struct xe_vma *vma,
> > >  		if (xe_vm_in_preempt_fence_mode(vm)) {
> > >  			struct dma_resv_iter cursor;
> > >  			struct dma_fence *fence;
> > > -			long err;
> > >  
> > >  			dma_resv_iter_begin(&cursor, xe_vm_resv(vm),
> > >  					    DMA_RESV_USAGE_BOOKKEEP);
> > >  			dma_resv_for_each_fence_unlocked(&cursor, fence)
> > >  				dma_fence_enable_sw_signaling(fence);
> > >  			dma_resv_iter_end(&cursor);
> > > -
> > > -			err = dma_resv_wait_timeout(xe_vm_resv(vm),
> > > -						    DMA_RESV_USAGE_BOOKKEEP,
> > > -						    false, MAX_SCHEDULE_TIMEOUT);
> > > -			XE_WARN_ON(err <= 0);
> > >  		}
> > > +
> > > +		/*
> > > +		 * We are temporarily installing PTEs pointing to invalidated
> > > +		 * pages, ensure VM is idle to avoid data corruption. PTEs fixed
> > > +		 * up upon next exec or in rebind worker.
> > > +		 */
> > > +		err = dma_resv_wait_timeout(xe_vm_resv(vm),
> > > +					    DMA_RESV_USAGE_BOOKKEEP,
> > > +					    false, MAX_SCHEDULE_TIMEOUT);
> > > +		XE_WARN_ON(err <= 0);
> > >  	}
> > >  
> > >  	return 0;
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > index 996000f2424e..9b2acb069a77 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -623,8 +623,6 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
> > >  		spin_unlock(&vm->userptr.invalidated_lock);
> > >  	}
> > >  
> > > -	up_write(&vm->userptr.notifier_lock);
> > > -
> > >  	/*
> > >  	 * Preempt fences turn into schedule disables, pipeline these.
> > >  	 * Note that even in fault mode, we need to wait for binds and
> > > @@ -647,6 +645,8 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
> > >  		XE_WARN_ON(err);
> > >  	}
> > >  
> > > +	up_write(&vm->userptr.notifier_lock);
> > > +
> > >  	trace_xe_vma_userptr_invalidate_complete(vma);
> > >  
> > >  	return true;
> > 



Thread overview: 17+ messages
2025-02-24 17:01 [PATCH v4 0/2] Userptr fixes Matthew Brost
2025-02-24 17:01 ` [PATCH v4 1/2] drm/xe: Userptr invalidation race with binds fixes Matthew Brost
2025-02-25 14:30   ` Thomas Hellström
2025-02-25 17:45     ` Matthew Brost
2025-02-25 18:56       ` Thomas Hellström [this message]
2025-02-25 19:23         ` Matthew Brost
2025-02-25 20:06           ` Matthew Brost
2025-02-26 15:13             ` Thomas Hellström
2025-02-24 17:01 ` [PATCH v4 2/2] drm/xe: Add staging tree for VM binds Matthew Brost
2025-02-25  0:17 ` ✓ CI.Patch_applied: success for Userptr fixes Patchwork
2025-02-25  0:17 ` ✓ CI.checkpatch: " Patchwork
2025-02-25  0:18 ` ✓ CI.KUnit: " Patchwork
2025-02-25  0:35 ` ✓ CI.Build: " Patchwork
2025-02-25  0:37 ` ✓ CI.Hooks: " Patchwork
2025-02-25  0:38 ` ✓ CI.checksparse: " Patchwork
2025-02-25  0:58 ` ✓ Xe.CI.BAT: " Patchwork
2025-02-25  3:53 ` ✗ Xe.CI.Full: failure " Patchwork