Re: [PATCH] drm/xe: Invalidate userptr VMA on page pin fault


From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org, fei.yang@intel.com,
	rodrigo.vivi@intel.com
Subject: Re: [PATCH] drm/xe: Invalidate userptr VMA on page pin fault
Date: Mon, 11 Mar 2024 20:23:27 +0100
Message-ID: <2613ec1be671403e6746ad93e2f2966d3f8898a9.camel@linux.intel.com>
In-Reply-To: <Ze9SNeu2Ifl+/vOI@DUT025-TGLU.fm.intel.com>

On Mon, 2024-03-11 at 18:49 +0000, Matthew Brost wrote:
> On Mon, Mar 11, 2024 at 02:29:26PM +0100, Thomas Hellström wrote:
> > On Mon, 2024-03-11 at 11:55 +0100, Thomas Hellström wrote:
> > > Hi, Matthew
> > > 
> > > On Fri, 2024-03-08 at 13:37 -0800, Matthew Brost wrote:
> > > > Rather than return an error to the user or ban the VM when userptr VMA
> > > > page pin fails with -EFAULT, invalidate VMA mappings. This supports the
> > > > UMD use case of freeing userptr while still having bindings.
> > > > 
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_gt_pagefault.c |  4 ++--
> > > >  drivers/gpu/drm/xe/xe_trace.h        |  2 +-
> > > >  drivers/gpu/drm/xe/xe_vm.c           | 20 +++++++++++++-------
> > > >  drivers/gpu/drm/xe/xe_vm_types.h     |  7 ++-----
> > > >  4 files changed, 18 insertions(+), 15 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > index 73c535193a98..241c294270d9 100644
> > > > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > @@ -69,7 +69,7 @@ static bool access_is_atomic(enum access_type access_type)
> > > >  static bool vma_is_valid(struct xe_tile *tile, struct xe_vma *vma)
> > > >  {
> > > >  	return BIT(tile->id) & vma->tile_present &&
> > > > -		!(BIT(tile->id) & vma->usm.tile_invalidated);
> > > > +		!(BIT(tile->id) & vma->tile_invalidated);
> > > >  }
> > > >  
> > > >  static bool vma_matches(struct xe_vma *vma, u64 page_addr)
> > > > @@ -226,7 +226,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
> > > >  
> > > >  	if (xe_vma_is_userptr(vma))
> > > >  		ret = xe_vma_userptr_check_repin(to_userptr_vma(vma));
> > > > -	vma->usm.tile_invalidated &= ~BIT(tile->id);
> > > > +	vma->tile_invalidated &= ~BIT(tile->id);
> > > >  
> > > >  unlock_dma_resv:
> > > >  	drm_exec_fini(&exec);
> > > > diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> > > > index 4ddc55527f9a..846f14507d5f 100644
> > > > --- a/drivers/gpu/drm/xe/xe_trace.h
> > > > +++ b/drivers/gpu/drm/xe/xe_trace.h
> > > > @@ -468,7 +468,7 @@ DEFINE_EVENT(xe_vma, xe_vma_userptr_invalidate,
> > > >  	     TP_ARGS(vma)
> > > >  );
> > > >  
> > > > -DEFINE_EVENT(xe_vma, xe_vma_usm_invalidate,
> > > > +DEFINE_EVENT(xe_vma, xe_vma_invalidate,
> > > >  	     TP_PROTO(struct xe_vma *vma),
> > > >  	     TP_ARGS(vma)
> > > >  );
> > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > > index 643b3701a738..9a19044f7ef6 100644
> > > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > > @@ -724,11 +724,18 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
> > > >  	list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list,
> > > >  				 userptr.repin_link) {
> > > >  		err = xe_vma_userptr_pin_pages(uvma);
> > > > -		if (err < 0)
> > > > -			return err;
> > > > -
> > > >  		list_del_init(&uvma->userptr.repin_link);
> > > > -		list_move_tail(&uvma->vma.combined_links.rebind, &vm->rebind_list);
> > > > +		if (err == -EFAULT) {
> > > > +			err = xe_vm_invalidate_vma(&uvma->vma);
> > > 
> > > I think we need to check for FAULT_MODE here. If we hit this path in
> > > FAULT_MODE, we already have an invalid gpu access and can kill the VM.
> > > 
> 
> Agree, will fix.
> 
> > > In preempt-fence mode, we should probably be calling
> > > xe_vm_unbind_vma(), because xe_vm_invalidate_vma() isn't safe to call
> > > outside of the mmu_notifier, and if there are still BOOKKEEP fences
> > > pending; see the asserts in that function.
> > 
> > 
> > Actually, xe_vm_invalidate_vma() would probably work if we grabbed the
> > vm resv and waited for bookkeep fences first, and updated the asserts.
> > 
> 
> Yes, I think that will work but is that even needed? We have vm->lock
> here in write mode which should prevent any further updates to the
> page tables. What are we trying to prevent a race against? An
> in-flight bind job which touches the same page tables? I guess that
> is possible.

Yeah, if we adhere to the locking rules for page tables we don't need
to care about keeping track of current and future potential racing code
paths.

>  
> > But then xe_vm_unbind_vma() might still be better since we also clean
> > up the page-tables.
> > 
> 
> I think we do not want to mess with the VMA state as that should only be
> changed by user IOCTLs. An invalidation seems to be the right call here.

I think the tile_present is the only vma_state touched there. What I'm
after is really also freeing the now unused page-directories, but
that's not a necessity. 

/Thomas
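
For readers following the tile_present / tile_invalidated discussion, here
is a small userspace sketch of the per-tile mask bookkeeping that the quoted
diff manipulates. It is only a model: struct vma_model and the vma_model_*()
helpers are hypothetical names invented for illustration, not driver code.
Only the three field names and the bit operations are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Hypothetical stand-in for struct xe_vma, reduced to the per-tile masks. */
struct vma_model {
	uint8_t tile_mask;        /* tiles where a binding should be created */
	uint8_t tile_present;     /* tiles that currently have a binding */
	uint8_t tile_invalidated; /* tiles whose binding has been invalidated */
};

/* Same expression as vma_is_valid() in the quoted diff. */
bool vma_model_is_valid(const struct vma_model *vma, unsigned int tile_id)
{
	assert(tile_id < 8); /* the masks are u8, so at most 8 tiles */
	return (BIT(tile_id) & vma->tile_present) &&
	       !(BIT(tile_id) & vma->tile_invalidated);
}

/* Same assignment as the tail of xe_vm_invalidate_vma() in the diff. */
void vma_model_invalidate(struct vma_model *vma)
{
	vma->tile_invalidated = vma->tile_mask;
}

/* Same bit clear as handle_pagefault() performs in the diff. */
void vma_model_revalidate(struct vma_model *vma, unsigned int tile_id)
{
	assert(tile_id < 8);
	vma->tile_invalidated &= (uint8_t)~BIT(tile_id);
}
```

In this model an invalidation leaves tile_present untouched and only marks
the tiles in tile_mask as invalidated. A later fault on one tile clears that
tile's invalidated bit again.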



> 
> > /Thomas
> > 
> > 
> > > 
> > > > +			if (err)
> > > > +				return err;
> > > > +		} else {
> > > > +			if (err < 0)
> > > > +				return err;
> > > > +
> > > > +			list_move_tail(&uvma->vma.combined_links.rebind,
> > > > +				       &vm->rebind_list);
> > > > +		}
> > > >  	}
> > > >  
> > > >  	return 0;
> > > > @@ -3214,9 +3221,8 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
> > > >  	u8 id;
> > > >  	int ret;
> > > >  
> > > > -	xe_assert(xe, xe_vm_in_fault_mode(xe_vma_vm(vma)));
> > > >  	xe_assert(xe, !xe_vma_is_null(vma));
> > > > -	trace_xe_vma_usm_invalidate(vma);
> > > > +	trace_xe_vma_invalidate(vma);
> > > >  
> > > >  	/* Check that we don't race with page-table updates */
> > > >  	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
> > > > @@ -3254,7 +3260,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
> > > >  		}
> > > >  	}
> > > >  
> > > > -	vma->usm.tile_invalidated = vma->tile_mask;
> > > > +	vma->tile_invalidated = vma->tile_mask;
> > > >  
> > > >  	return 0;
> > > >  }
> > > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> > > > index 79b5cab57711..ae5fb565f6bf 100644
> > > > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > > > @@ -84,11 +84,8 @@ struct xe_vma {
> > > >  		struct work_struct destroy_work;
> > > >  	};
> > > >  
> > > > -	/** @usm: unified shared memory state */
> > > > -	struct {
> > > > -		/** @tile_invalidated: VMA has been invalidated */
> > > > -		u8 tile_invalidated;
> > > > -	} usm;
> > > > +	/** @tile_invalidated: VMA has been invalidated */
> > > > +	u8 tile_invalidated;
> > > 
> > > Add a comment in the commit message about removing the usm struct?
> 
> Will add.
> 
> Matt
> 
> > > /Thomas
> > > 
> > > 
> > > >  
> > > >  	/** @tile_mask: Tile mask of where to create binding for this VMA */
> > > >  	u8 tile_mask;
> > > 
> >
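
The error handling discussed for the repin loop in xe_vm_userptr_pin() can
be summarised as a small decision function. This is a sketch of the review
suggestion only (check for fault mode on -EFAULT), not the code of any
posted revision. enum pin_action and classify_pin_result() are hypothetical
names, and the sketch assumes the pin call returns 0 or a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes for one userptr VMA on the repin list. */
enum pin_action {
	PIN_ACTION_REBIND,       /* pin succeeded: move VMA to the rebind list */
	PIN_ACTION_INVALIDATE,   /* -EFAULT outside fault mode: invalidate mappings */
	PIN_ACTION_KILL_VM,      /* -EFAULT in fault mode: GPU access is already invalid */
	PIN_ACTION_RETURN_ERROR  /* any other error: propagate to the caller */
};

/*
 * Classify the result of pinning one userptr VMA.
 * Assumption: err is 0 on success or a negative errno value.
 */
enum pin_action classify_pin_result(int err, bool vm_in_fault_mode)
{
	assert(err <= 0);

	if (err == -EFAULT)
		return vm_in_fault_mode ? PIN_ACTION_KILL_VM
					: PIN_ACTION_INVALIDATE;
	if (err < 0)
		return PIN_ACTION_RETURN_ERROR;

	return PIN_ACTION_REBIND;
}
```

Whether the invalidate branch should call xe_vm_invalidate_vma() or
xe_vm_unbind_vma() is the open question in the thread. The sketch leaves
that choice to the caller.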
