Intel-XE Archive on lore.kernel.org
From: Matthew Brost <matthew.brost@intel.com>
To: "Zeng, Oak" <oak.zeng@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH 01/13] drm/xe: Lock all gpuva ops during VM bind IOCTL
Date: Tue, 16 Apr 2024 17:02:35 +0000
Message-ID: <Zh6vKzuCL3LhiOSx@DUT025-TGLU.fm.intel.com>
In-Reply-To: <SA1PR11MB6991AC5A46A3177C4742AEE292082@SA1PR11MB6991.namprd11.prod.outlook.com>

On Tue, Apr 16, 2024 at 09:51:24AM -0600, Zeng, Oak wrote:
> 
> 
> > -----Original Message-----
> > From: Brost, Matthew <matthew.brost@intel.com>
> > Sent: Wednesday, April 10, 2024 1:41 AM
> > To: intel-xe@lists.freedesktop.org
> > Cc: Brost, Matthew <matthew.brost@intel.com>; Zeng, Oak
> > <oak.zeng@intel.com>
> > Subject: [PATCH 01/13] drm/xe: Lock all gpuva ops during VM bind IOCTL
> > 
> > Lock all BOs used in gpuva ops and validate all BOs in a single step
> > during the VM bind IOCTL.
> > 
> > This helps with the transition to making all gpuva ops in a VM bind IOCTL
> > a single atomic job which is required for proper error handling.
> > 
> > v2:
> >  - Better commit message (Oak)
> >  - s/op_lock/op_lock_and_prep, few other renames too (Oak)
> >  - Use DRM_EXEC_IGNORE_DUPLICATES flag in drm_exec_init (local testing)
> >  - Do not reserve slots in locking step (direction based on series from Thomas)
> > 
> > Cc: Oak Zeng <oak.zeng@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_vm.c | 147 +++++++++++++++++++++++++++----------
> >  1 file changed, 107 insertions(+), 40 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index 66b70fd3d105..6375c136e21a 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -414,19 +414,23 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm)
> > 
> >  #define XE_VM_REBIND_RETRY_TIMEOUT_MS 1000
> > 
> > -static void xe_vm_kill(struct xe_vm *vm)
> > +static void xe_vm_kill(struct xe_vm *vm, bool unlocked)
> >  {
> >  	struct xe_exec_queue *q;
> > 
> >  	lockdep_assert_held(&vm->lock);
> > 
> > -	xe_vm_lock(vm, false);
> > +	if (unlocked)
> > +		xe_vm_lock(vm, false);
> > +
> >  	vm->flags |= XE_VM_FLAG_BANNED;
> >  	trace_xe_vm_kill(vm);
> > 
> >  	list_for_each_entry(q, &vm->preempt.exec_queues, compute.link)
> >  		q->ops->kill(q);
> > -	xe_vm_unlock(vm);
> > +
> > +	if (unlocked)
> > +		xe_vm_unlock(vm);
> > 
> >  	/* TODO: Inform user the VM is banned */
> >  }
> > @@ -656,7 +660,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
> > 
> >  	if (err) {
> >  		drm_warn(&vm->xe->drm, "VM worker error: %d\n", err);
> > -		xe_vm_kill(vm);
> > +		xe_vm_kill(vm, true);
> >  	}
> >  	up_write(&vm->lock);
> > 
> > @@ -1876,17 +1880,9 @@ static int xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_exec_queue
> >  		      u32 num_syncs, bool immediate, bool first_op,
> >  		      bool last_op)
> >  {
> > -	int err;
> > -
> >  	xe_vm_assert_held(vm);
> >  	xe_bo_assert_held(bo);
> > 
> > -	if (bo && immediate) {
> > -		err = xe_bo_validate(bo, vm, true);
> 
> In the original code, BO validation is conditional on the immediate bind flag, which comes from DRM_XE_VM_BIND_FLAG_IMMEDIATE. Later in this patch the immediate flag no longer seems to be used. Can you explain? See below.
> 

The immediate-bind code was only recently pulled back in, and I omitted
it here during the rebase. Will fix.

Matt

> > -		if (err)
> > -			return err;
> > -	}
> > -
> >  	return __xe_vm_bind(vm, vma, q, syncs, num_syncs, immediate, first_op,
> >  			    last_op);
> >  }
> > @@ -2539,17 +2535,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
> >  	return 0;
> >  }
> > 
> > -static int op_execute(struct drm_exec *exec, struct xe_vm *vm,
> > -		      struct xe_vma *vma, struct xe_vma_op *op)
> > +static int op_execute(struct xe_vm *vm, struct xe_vma *vma,
> > +		      struct xe_vma_op *op)
> >  {
> >  	int err;
> > 
> >  	lockdep_assert_held_write(&vm->lock);
> > 
> > -	err = xe_vm_lock_vma(exec, vma);
> > -	if (err)
> > -		return err;
> > -
> >  	xe_vm_assert_held(vm);
> >  	xe_bo_assert_held(xe_vma_bo(vma));
> > 
> > @@ -2630,19 +2622,10 @@ static int op_execute(struct drm_exec *exec, struct xe_vm *vm,
> >  static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
> >  			       struct xe_vma_op *op)
> >  {
> > -	struct drm_exec exec;
> >  	int err;
> > 
> >  retry_userptr:
> > -	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> > -	drm_exec_until_all_locked(&exec) {
> > -		err = op_execute(&exec, vm, vma, op);
> > -		drm_exec_retry_on_contention(&exec);
> > -		if (err)
> > -			break;
> > -	}
> > -	drm_exec_fini(&exec);
> > -
> > +	err = op_execute(vm, vma, op);
> >  	if (err == -EAGAIN) {
> >  		lockdep_assert_held_write(&vm->lock);
> > 
> > @@ -2807,29 +2790,113 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
> >  	}
> >  }
> > 
> > +static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
> > +				 bool validate)
> > +{
> > +	struct xe_bo *bo = xe_vma_bo(vma);
> > +	int err = 0;
> > +
> > +	if (bo) {
> > +		if (!bo->vm)
> > +			err = drm_exec_prepare_obj(exec, &bo->ttm.base, 0);
> > +		if (!err && validate)
> > +			err = xe_bo_validate(bo, xe_vma_vm(vma), true);
> > +	}
> > +
> > +	return err;
> > +}
> > +
> > +static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
> > +			    struct xe_vma_op *op)
> > +{
> > +	int err = 0;
> > +
> > +	switch (op->base.op) {
> > +	case DRM_GPUVA_OP_MAP:
> > +		err = vma_lock_and_validate(exec, op->map.vma,
> > +					    !xe_vm_in_fault_mode(vm));
> 
> This is what I meant: should the last parameter be !xe_vm_in_fault_mode(vm) && the IMMEDIATE flag? Or has the IMMEDIATE flag been deleted? I still see it in xe_vm.c:
> 
> #define SUPPORTED_FLAGS \
>     (DRM_XE_VM_BIND_FLAG_READONLY | \
>      DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
>      DRM_XE_VM_BIND_FLAG_NULL | \
> 
> Oak
> 
> 
> > +		break;
> > +	case DRM_GPUVA_OP_REMAP:
> > +		err = vma_lock_and_validate(exec,
> > +					    gpuva_to_vma(op->base.remap.unmap->va),
> > +					    false);
> > +		if (!err && op->remap.prev)
> > +			err = vma_lock_and_validate(exec, op->remap.prev, true);
> > +		if (!err && op->remap.next)
> > +			err = vma_lock_and_validate(exec, op->remap.next, true);
> > +		break;
> > +	case DRM_GPUVA_OP_UNMAP:
> > +		err = vma_lock_and_validate(exec,
> > +					    gpuva_to_vma(op->base.unmap.va),
> > +					    false);
> > +		break;
> > +	case DRM_GPUVA_OP_PREFETCH:
> > +		err = vma_lock_and_validate(exec,
> > +					    gpuva_to_vma(op->base.prefetch.va), true);
> > +		break;
> > +	default:
> > +		drm_warn(&vm->xe->drm, "NOT POSSIBLE");
> > +	}
> > +
> > +	return err;
> > +}
> > +
> > +static int vm_bind_ioctl_ops_lock_and_prep(struct drm_exec *exec,
> > +					   struct xe_vm *vm,
> > +					   struct list_head *ops_list)
> > +{
> > +	struct xe_vma_op *op;
> > +	int err;
> > +
> > +	err = drm_exec_prepare_obj(exec, xe_vm_obj(vm), 0);
> > +	if (err)
> > +		return err;
> > +
> > +	list_for_each_entry(op, ops_list, link) {
> > +		err = op_lock_and_prep(exec, vm, op);
> > +		if (err)
> > +			return err;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
> >  				     struct list_head *ops_list)
> >  {
> > +	struct drm_exec exec;
> >  	struct xe_vma_op *op, *next;
> >  	int err;
> > 
> >  	lockdep_assert_held_write(&vm->lock);
> > 
> > -	list_for_each_entry_safe(op, next, ops_list, link) {
> > -		err = xe_vma_op_execute(vm, op);
> > -		if (err) {
> > -			drm_warn(&vm->xe->drm, "VM op(%d) failed with %d",
> > -				 op->base.op, err);
> > -			/*
> > -			 * FIXME: Killing VM rather than proper error handling
> > -			 */
> > -			xe_vm_kill(vm);
> > -			return -ENOSPC;
> > +	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
> > +		      DRM_EXEC_IGNORE_DUPLICATES, 0);
> > +	drm_exec_until_all_locked(&exec) {
> > +		err = vm_bind_ioctl_ops_lock_and_prep(&exec, vm, ops_list);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (err)
> > +			goto unlock;
> > +
> > +		list_for_each_entry_safe(op, next, ops_list, link) {
> > +			err = xe_vma_op_execute(vm, op);
> > +			if (err) {
> > +				drm_warn(&vm->xe->drm, "VM op(%d) failed with %d",
> > +					 op->base.op, err);
> > +				/*
> > +				 * FIXME: Killing VM rather than proper error handling
> > +				 */
> > +				xe_vm_kill(vm, false);
> > +				err = -ENOSPC;
> > +				goto unlock;
> > +			}
> > +			xe_vma_op_cleanup(vm, op);
> >  		}
> > -		xe_vma_op_cleanup(vm, op);
> >  	}
> > 
> > -	return 0;
> > +unlock:
> > +	drm_exec_fini(&exec);
> > +	return err;
> >  }
> > 
> >  #define SUPPORTED_FLAGS	\
> > --
> > 2.34.1
> 

