From: Matthew Brost <matthew.brost@intel.com>
To: Nirmoy Das <nirmoy.das@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, Oak Zeng <oak.zeng@intel.com>
Subject: Re: [RFC] drm/xe: Ensure write lock is held when calling xe_vma_rebind()
Date: Wed, 5 Jun 2024 16:44:36 +0000 [thread overview]
Message-ID: <ZmCV9Kc4LBXtT0ev@DUT025-TGLU.fm.intel.com> (raw)
In-Reply-To: <20240605101607.29931-1-nirmoy.das@intel.com>
On Wed, Jun 05, 2024 at 12:16:07PM +0200, Nirmoy Das wrote:
> xe_vma_rebind() expects the vm write lock to be held, so start out with a
> read lock and upgrade it if a vma rebind is required.
>
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1894
> Fixes: bf69918b7199 ("drm/xe: Use xe_vma_ops to implement page fault rebinds")
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Oak Zeng <oak.zeng@intel.com>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
> drivers/gpu/drm/xe/xe_gt_pagefault.c | 25 +++++++++----------------
> 1 file changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index 040dd142c49c..543cfa9892f6 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -153,24 +153,15 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
> return -EINVAL;
>
> retry_userptr:
> - /*
> - * TODO: Avoid exclusive lock if VM doesn't have userptrs, or
> - * start out read-locked?
> - */
> - down_write(&vm->lock);
> - write_locked = true;
> + /* start out read-locked and upgrade if a vma rebind is needed */
> + down_read(&vm->lock);
> + write_locked = false;
> vma = lookup_vma(vm, pf->page_addr);
> if (!vma) {
> ret = -EINVAL;
> goto unlock_vm;
> }
>
> - if (!xe_vma_is_userptr(vma) ||
> - !xe_vma_userptr_check_repin(to_userptr_vma(vma))) {
> - downgrade_write(&vm->lock);
> - write_locked = false;
> - }
> -
> trace_xe_vma_pagefault(vma);
>
> atomic = access_is_atomic(pf->access_type);
> @@ -179,9 +170,14 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
> if (vma_is_valid(tile, vma) && !atomic)
> goto unlock_vm;
>
> + /* xe_vma_rebind() expects a write lock so upgrade the vm lock */
> + up_read(&vm->lock);
> + down_write(&vm->lock);
You can't do this; even if it works, it is bad practice to drop the
lock, because someone else could come in and acquire the lock in write
mode. In that window a user unbind of the VMA could happen and the VMA
could be destroyed, so the VMA references below would be accessing freed
memory.
I have an ongoing refactor [1] which just takes the vm->lock in write
mode for page faults. I suggest we review / use that now.

Long term I think we need to audit the read / write usage of vm->lock,
as I'm almost certain we have broken the intended usage of this lock. Or
honestly, maybe just change this lock to a mutex: I'm unsure the read
vs. write split buys us anything, and it complicates the driver.
Matt
[1] https://patchwork.freedesktop.org/series/133628/
> + write_locked = true;
> /* TODO: Validate fault */
>
> - if (xe_vma_is_userptr(vma) && write_locked) {
> + if (xe_vma_is_userptr(vma) &&
> + xe_vma_userptr_check_repin(to_userptr_vma(vma))) {
> struct xe_userptr_vma *uvma = to_userptr_vma(vma);
>
> spin_lock(&vm->userptr.invalidated_lock);
> @@ -191,9 +187,6 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
> ret = xe_vma_userptr_pin_pages(uvma);
> if (ret)
> goto unlock_vm;
> -
> - downgrade_write(&vm->lock);
> - write_locked = false;
> }
>
> /* Lock VM and BOs dma-resv */
> --
> 2.42.0
>
Thread overview: 12+ messages [~2024-06-05 16:45 UTC]
2024-06-05 10:16 [RFC] drm/xe: Ensure write lock is held when calling xe_vma_rebind() Nirmoy Das
2024-06-05 10:55 ` ✓ CI.Patch_applied: success for " Patchwork
2024-06-05 10:55 ` ✓ CI.checkpatch: " Patchwork
2024-06-05 10:56 ` ✓ CI.KUnit: " Patchwork
2024-06-05 11:07 ` ✓ CI.Build: " Patchwork
2024-06-05 11:08 ` ✗ CI.Hooks: failure " Patchwork
2024-06-05 11:09 ` ✓ CI.checksparse: success " Patchwork
2024-06-05 11:41 ` ✓ CI.BAT: " Patchwork
2024-06-05 16:44 ` Matthew Brost [this message]
2024-06-06 10:09 ` [RFC] " Nirmoy Das
2024-06-06 16:30 ` Matthew Brost
2024-06-06 1:43 ` ✗ CI.FULL: failure for " Patchwork