* [PATCH 1/7] drm/xe: Make TLB invalidation fences unordered
[not found] <20240321113720.120865-1-thomas.hellstrom@linux.intel.com>
@ 2024-03-21 11:37 ` Thomas Hellström
2024-03-21 11:37 ` [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds Thomas Hellström
` (4 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
To: intel-xe; +Cc: Thomas Hellström, Matthew Brost, stable
They can actually complete out-of-order, so allocate a unique
fence context for each fence.
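In other words (a simplified sketch, not the exact driver code): fences that
share a dma_fence context are assumed to signal in seqno order, so unordered
fences each get their own context:
        /* Before: shared context + incrementing seqno implies ordering. */
        context = gt->tlb_invalidation.fence_context;
        seqno = ++gt->tlb_invalidation.fence_seqno;
        /* After: a fresh context per fence, seqno always 1. */
        dma_fence_init(&ifence->base.base, &invalidation_fence_ops,
                       &gt->tlb_invalidation.lock,
                       dma_fence_context_alloc(1), 1);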
Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after rebinds issued from execs")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 1 -
drivers/gpu/drm/xe/xe_gt_types.h | 7 -------
drivers/gpu/drm/xe/xe_pt.c | 3 +--
3 files changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index a3c4ffba679d..787cba5e49a1 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -63,7 +63,6 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
INIT_LIST_HEAD(&gt->tlb_invalidation.pending_fences);
spin_lock_init(&gt->tlb_invalidation.pending_lock);
spin_lock_init(&gt->tlb_invalidation.lock);
- gt->tlb_invalidation.fence_context = dma_fence_context_alloc(1);
INIT_DELAYED_WORK(&gt->tlb_invalidation.fence_tdr,
xe_gt_tlb_fence_timeout);
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index f6da2ad9719f..2143dffcaf11 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -179,13 +179,6 @@ struct xe_gt {
* xe_gt_tlb_fence_timeout after the timeut interval is over.
*/
struct delayed_work fence_tdr;
- /** @tlb_invalidation.fence_context: context for TLB invalidation fences */
- u64 fence_context;
- /**
- * @tlb_invalidation.fence_seqno: seqno to TLB invalidation fences, protected by
- * tlb_invalidation.lock
- */
- u32 fence_seqno;
/** @tlb_invalidation.lock: protects TLB invalidation fences */
spinlock_t lock;
} tlb_invalidation;
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 8d3922d2206e..a5bb8d20f815 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1135,8 +1135,7 @@ static int invalidation_fence_init(struct xe_gt *gt,
spin_lock_irq(&gt->tlb_invalidation.lock);
dma_fence_init(&ifence->base.base, &invalidation_fence_ops,
&gt->tlb_invalidation.lock,
- gt->tlb_invalidation.fence_context,
- ++gt->tlb_invalidation.fence_seqno);
+ dma_fence_context_alloc(1), 1);
spin_unlock_irq(&gt->tlb_invalidation.lock);
INIT_LIST_HEAD(&ifence->base.link);
--
2.44.0
* [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds
[not found] <20240321113720.120865-1-thomas.hellstrom@linux.intel.com>
2024-03-21 11:37 ` [PATCH 1/7] drm/xe: Make TLB invalidation fences unordered Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
2024-03-21 19:09 ` Matthew Brost
2024-03-21 11:37 ` [PATCH 2/7] drm/xe: Rework rebinding Thomas Hellström
` (3 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
To: intel-xe; +Cc: Thomas Hellström, Matthew Brost, stable
For each rebind we insert a GuC TLB invalidation and add a
corresponding unordered TLB invalidation fence. This might
add a huge number of TLB invalidation fences to wait for so
rather than doing that, defer the TLB invalidation to the
next ring ops for each affected exec queue. Since the TLB
is invalidated on exec_queue switch, we need to invalidate
once for each affected exec_queue.
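Roughly, the mechanism is (a simplified sketch of the diff below):
        /* Rebind path (__xe_pt_bind_vma): only note that a flush is pending. */
        vm->tlb_flush_seqno++;
        /* Job arm path (xe_sched_job_arm): flush once per lagging exec queue. */
        if (vm->tlb_flush_seqno != q->tlb_flush_seqno) {
                q->tlb_flush_seqno = vm->tlb_flush_seqno;
                job->ring_ops_flush_tlb = true; /* ring ops emit the TLB flush */
        }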
Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after rebinds issued from execs")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
drivers/gpu/drm/xe/xe_pt.c | 5 +++--
drivers/gpu/drm/xe/xe_ring_ops.c | 11 ++++-------
drivers/gpu/drm/xe/xe_sched_job.c | 11 +++++++++++
drivers/gpu/drm/xe/xe_sched_job_types.h | 2 ++
drivers/gpu/drm/xe/xe_vm_types.h | 5 +++++
6 files changed, 27 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 62b3d9d1d7cd..891ad30e906f 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -148,6 +148,8 @@ struct xe_exec_queue {
const struct xe_ring_ops *ring_ops;
/** @entity: DRM sched entity for this exec queue (1 to 1 relationship) */
struct drm_sched_entity *entity;
+ /** @tlb_flush_seqno: The seqno of the last rebind tlb flush performed */
+ u64 tlb_flush_seqno;
/** @lrc: logical ring context for this exec queue */
struct xe_lrc lrc[];
};
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 8d3922d2206e..21bc0d13fccf 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1254,11 +1254,12 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
* non-faulting LR, in particular on user-space batch buffer chaining,
* it needs to be done here.
*/
- if ((rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) ||
- (!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {
+ if ((!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {
ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
if (!ifence)
return ERR_PTR(-ENOMEM);
+ } else if (rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) {
+ vm->tlb_flush_seqno++;
}
rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index c4edffcd4a32..5b2b37b59813 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -219,10 +219,9 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
{
u32 dw[MAX_JOB_SIZE_DW], i = 0;
u32 ppgtt_flag = get_ppgtt_flag(job);
- struct xe_vm *vm = job->q->vm;
struct xe_gt *gt = job->q->gt;
- if (vm && vm->batch_invalidate_tlb) {
+ if (job->ring_ops_flush_tlb) {
dw[i++] = preparser_disable(true);
i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
seqno, true, dw, i);
@@ -270,7 +269,6 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
struct xe_gt *gt = job->q->gt;
struct xe_device *xe = gt_to_xe(gt);
bool decode = job->q->class == XE_ENGINE_CLASS_VIDEO_DECODE;
- struct xe_vm *vm = job->q->vm;
dw[i++] = preparser_disable(true);
@@ -282,13 +280,13 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
i = emit_aux_table_inv(gt, VE0_AUX_INV, dw, i);
}
- if (vm && vm->batch_invalidate_tlb)
+ if (job->ring_ops_flush_tlb)
i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
seqno, true, dw, i);
dw[i++] = preparser_disable(false);
- if (!vm || !vm->batch_invalidate_tlb)
+ if (!job->ring_ops_flush_tlb)
i = emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
seqno, dw, i);
@@ -317,7 +315,6 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
struct xe_gt *gt = job->q->gt;
struct xe_device *xe = gt_to_xe(gt);
bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
- struct xe_vm *vm = job->q->vm;
u32 mask_flags = 0;
dw[i++] = preparser_disable(true);
@@ -327,7 +324,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
/* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */
- i = emit_pipe_invalidate(mask_flags, vm && vm->batch_invalidate_tlb, dw, i);
+ i = emit_pipe_invalidate(mask_flags, job->ring_ops_flush_tlb, dw, i);
/* hsdes: 1809175790 */
if (has_aux_ccs(xe))
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index 8151ddafb940..d55458d915a9 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -250,6 +250,17 @@ bool xe_sched_job_completed(struct xe_sched_job *job)
void xe_sched_job_arm(struct xe_sched_job *job)
{
+ struct xe_exec_queue *q = job->q;
+ struct xe_vm *vm = q->vm;
+
+ if (vm && !xe_sched_job_is_migration(q) && !xe_vm_in_lr_mode(vm) &&
+ vm->tlb_flush_seqno != q->tlb_flush_seqno) {
+ q->tlb_flush_seqno = vm->tlb_flush_seqno;
+ job->ring_ops_flush_tlb = true;
+ } else if (vm && vm->batch_invalidate_tlb) {
+ job->ring_ops_flush_tlb = true;
+ }
+
drm_sched_job_arm(&job->drm);
}
diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
index b1d83da50a53..5e12724219fd 100644
--- a/drivers/gpu/drm/xe/xe_sched_job_types.h
+++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
@@ -39,6 +39,8 @@ struct xe_sched_job {
} user_fence;
/** @migrate_flush_flags: Additional flush flags for migration jobs */
u32 migrate_flush_flags;
+ /** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
+ bool ring_ops_flush_tlb;
/** @batch_addr: batch buffer address of job */
u64 batch_addr[];
};
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index ae5fb565f6bf..5747f136d24d 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -264,6 +264,11 @@ struct xe_vm {
bool capture_once;
} error_capture;
+ /**
+ * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
+ * protected by the vm resv.
+ */
+ u64 tlb_flush_seqno;
/** @batch_invalidate_tlb: Always invalidate TLB before batch start */
bool batch_invalidate_tlb;
/** @xef: XE file handle for tracking this VM's drm client */
--
2.44.0
* Re: [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds
2024-03-21 11:37 ` [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds Thomas Hellström
@ 2024-03-21 19:09 ` Matthew Brost
2024-03-21 21:14 ` Thomas Hellström
2024-03-21 21:55 ` Thomas Hellström
0 siblings, 2 replies; 14+ messages in thread
From: Matthew Brost @ 2024-03-21 19:09 UTC (permalink / raw)
To: Thomas Hellström; +Cc: intel-xe, stable
On Thu, Mar 21, 2024 at 12:37:11PM +0100, Thomas Hellström wrote:
> For each rebind we insert a GuC TLB invalidation and add a
> corresponding unordered TLB invalidation fence. This might
> add a huge number of TLB invalidation fences to wait for so
> rather than doing that, defer the TLB invalidation to the
> next ring ops for each affected exec queue. Since the TLB
> is invalidated on exec_queue switch, we need to invalidate
> once for each affected exec_queue.
>
> Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after rebinds issued from execs")
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
> drivers/gpu/drm/xe/xe_pt.c | 5 +++--
> drivers/gpu/drm/xe/xe_ring_ops.c | 11 ++++-------
> drivers/gpu/drm/xe/xe_sched_job.c | 11 +++++++++++
> drivers/gpu/drm/xe/xe_sched_job_types.h | 2 ++
> drivers/gpu/drm/xe/xe_vm_types.h | 5 +++++
> 6 files changed, 27 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index 62b3d9d1d7cd..891ad30e906f 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -148,6 +148,8 @@ struct xe_exec_queue {
> const struct xe_ring_ops *ring_ops;
> /** @entity: DRM sched entity for this exec queue (1 to 1 relationship) */
> struct drm_sched_entity *entity;
> + /** @tlb_flush_seqno: The seqno of the last rebind tlb flush performed */
> + u64 tlb_flush_seqno;
> /** @lrc: logical ring context for this exec queue */
> struct xe_lrc lrc[];
> };
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 8d3922d2206e..21bc0d13fccf 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1254,11 +1254,12 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
> * non-faulting LR, in particular on user-space batch buffer chaining,
> * it needs to be done here.
> */
> - if ((rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) ||
> - (!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {
> + if ((!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {
Looked at why this works in fault mode: we disallow scratch pages in fault
mode. I thought at one point we had an implementation for that [1], but it
looks like it never got merged. Something to keep an eye on.
[1] https://patchwork.freedesktop.org/series/120480/
> ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
> if (!ifence)
> return ERR_PTR(-ENOMEM);
> + } else if (rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) {
> + vm->tlb_flush_seqno++;
Can we unwind this if / else clause a bit?
I think batch_invalidate_tlb can only be true if !xe_vm_in_lr_mode(vm).
So else if 'rebind && !xe_vm_in_lr_mode(vm)' should work. Also, if
batch_invalidate_tlb is true we always issue a TLB invalidation anyway,
and incrementing the seqno is harmless too.
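I.e. something like this (sketch of the suggestion, untested):
        if (!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm)) {
                ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
                if (!ifence)
                        return ERR_PTR(-ENOMEM);
        } else if (rebind && !xe_vm_in_lr_mode(vm)) {
                /* harmless extra seqno bump when batch_invalidate_tlb is set */
                vm->tlb_flush_seqno++;
        }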
Side note, I'd be remiss if I didn't mention that I really do not like
updating these functions (__xe_pt_bind_vma / __xe_pt_unbind_vma) as they
are going away / being reworked here [2] in order to implement 1 job per
IOCTL / proper error handling.
[2] https://patchwork.freedesktop.org/series/125608/
> }
>
> rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index c4edffcd4a32..5b2b37b59813 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -219,10 +219,9 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
> {
> u32 dw[MAX_JOB_SIZE_DW], i = 0;
> u32 ppgtt_flag = get_ppgtt_flag(job);
> - struct xe_vm *vm = job->q->vm;
> struct xe_gt *gt = job->q->gt;
>
> - if (vm && vm->batch_invalidate_tlb) {
> + if (job->ring_ops_flush_tlb) {
> dw[i++] = preparser_disable(true);
> i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> seqno, true, dw, i);
> @@ -270,7 +269,6 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
> struct xe_gt *gt = job->q->gt;
> struct xe_device *xe = gt_to_xe(gt);
> bool decode = job->q->class == XE_ENGINE_CLASS_VIDEO_DECODE;
> - struct xe_vm *vm = job->q->vm;
>
> dw[i++] = preparser_disable(true);
>
> @@ -282,13 +280,13 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
> i = emit_aux_table_inv(gt, VE0_AUX_INV, dw, i);
> }
>
> - if (vm && vm->batch_invalidate_tlb)
> + if (job->ring_ops_flush_tlb)
> i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> seqno, true, dw, i);
>
> dw[i++] = preparser_disable(false);
>
> - if (!vm || !vm->batch_invalidate_tlb)
> + if (!job->ring_ops_flush_tlb)
> i = emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> seqno, dw, i);
>
> @@ -317,7 +315,6 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
> struct xe_gt *gt = job->q->gt;
> struct xe_device *xe = gt_to_xe(gt);
> bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
> - struct xe_vm *vm = job->q->vm;
> u32 mask_flags = 0;
>
> dw[i++] = preparser_disable(true);
> @@ -327,7 +324,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
> mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
>
> /* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */
> - i = emit_pipe_invalidate(mask_flags, vm && vm->batch_invalidate_tlb, dw, i);
> + i = emit_pipe_invalidate(mask_flags, job->ring_ops_flush_tlb, dw, i);
>
> /* hsdes: 1809175790 */
> if (has_aux_ccs(xe))
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> index 8151ddafb940..d55458d915a9 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.c
> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> @@ -250,6 +250,17 @@ bool xe_sched_job_completed(struct xe_sched_job *job)
>
> void xe_sched_job_arm(struct xe_sched_job *job)
> {
> + struct xe_exec_queue *q = job->q;
> + struct xe_vm *vm = q->vm;
> +
> + if (vm && !xe_sched_job_is_migration(q) && !xe_vm_in_lr_mode(vm) &&
> + vm->tlb_flush_seqno != q->tlb_flush_seqno) {
> + q->tlb_flush_seqno = vm->tlb_flush_seqno;
> + job->ring_ops_flush_tlb = true;
> + } else if (vm && vm->batch_invalidate_tlb) {
> + job->ring_ops_flush_tlb = true;
> + }
> +
Can we simplify this too?
if (vm && (vm->batch_invalidate_tlb || (vm->tlb_flush_seqno != q->tlb_flush_seqno))) {
        q->tlb_flush_seqno = vm->tlb_flush_seqno;
        job->ring_ops_flush_tlb = true;
}
I think this works as queues where xe_sched_job_is_migration() is true use
emit_migration_job_gen12, which doesn't look at job->ring_ops_flush_tlb,
so there is no need to check xe_sched_job_is_migration.
Also no need to check xe_vm_in_lr_mode, as we wouldn't increment the
seqno above if that were true.
Lastly, harmless to increment q->tlb_flush_seqno in the case of
batch_invalidate_tlb being true.
> drm_sched_job_arm(&job->drm);
> }
>
> diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
> index b1d83da50a53..5e12724219fd 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job_types.h
> +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
> @@ -39,6 +39,8 @@ struct xe_sched_job {
> } user_fence;
> /** @migrate_flush_flags: Additional flush flags for migration jobs */
> u32 migrate_flush_flags;
> + /** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
> + bool ring_ops_flush_tlb;
How about JOB_FLAG_FLUSH_TLB rather than a new field? See
JOB_FLAG_SUBMIT flag usage.
Matt
> /** @batch_addr: batch buffer address of job */
> u64 batch_addr[];
> };
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index ae5fb565f6bf..5747f136d24d 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -264,6 +264,11 @@ struct xe_vm {
> bool capture_once;
> } error_capture;
>
> + /**
> + * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
> + * protected by the vm resv.
> + */
> + u64 tlb_flush_seqno;
> /** @batch_invalidate_tlb: Always invalidate TLB before batch start */
> bool batch_invalidate_tlb;
> /** @xef: XE file handle for tracking this VM's drm client */
> --
> 2.44.0
>
* Re: [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds
2024-03-21 19:09 ` Matthew Brost
@ 2024-03-21 21:14 ` Thomas Hellström
2024-03-21 21:21 ` Thomas Hellström
2024-03-21 21:55 ` Thomas Hellström
1 sibling, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2024-03-21 21:14 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe, stable
Hi, Matthew,
Thanks for reviewing, please see inline.
On Thu, 2024-03-21 at 19:09 +0000, Matthew Brost wrote:
> On Thu, Mar 21, 2024 at 12:37:11PM +0100, Thomas Hellström wrote:
> > For each rebind we insert a GuC TLB invalidation and add a
> > corresponding unordered TLB invalidation fence. This might
> > add a huge number of TLB invalidation fences to wait for so
> > rather than doing that, defer the TLB invalidation to the
> > next ring ops for each affected exec queue. Since the TLB
> > is invalidated on exec_queue switch, we need to invalidate
> > once for each affected exec_queue.
> >
> > Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after
> > rebinds issued from execs")
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: <stable@vger.kernel.org> # v6.8+
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
> > drivers/gpu/drm/xe/xe_pt.c | 5 +++--
> > drivers/gpu/drm/xe/xe_ring_ops.c | 11 ++++-------
> > drivers/gpu/drm/xe/xe_sched_job.c | 11 +++++++++++
> > drivers/gpu/drm/xe/xe_sched_job_types.h | 2 ++
> > drivers/gpu/drm/xe/xe_vm_types.h | 5 +++++
> > 6 files changed, 27 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > index 62b3d9d1d7cd..891ad30e906f 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > @@ -148,6 +148,8 @@ struct xe_exec_queue {
> > const struct xe_ring_ops *ring_ops;
> > /** @entity: DRM sched entity for this exec queue (1 to 1
> > relationship) */
> > struct drm_sched_entity *entity;
> > + /** @tlb_flush_seqno: The seqno of the last rebind tlb
> > flush performed */
> > + u64 tlb_flush_seqno;
> > /** @lrc: logical ring context for this exec queue */
> > struct xe_lrc lrc[];
> > };
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > b/drivers/gpu/drm/xe/xe_pt.c
> > index 8d3922d2206e..21bc0d13fccf 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -1254,11 +1254,12 @@ __xe_pt_bind_vma(struct xe_tile *tile,
> > struct xe_vma *vma, struct xe_exec_queue
> > * non-faulting LR, in particular on user-space batch
> > buffer chaining,
> > * it needs to be done here.
> > */
> > - if ((rebind && !xe_vm_in_lr_mode(vm) && !vm-
> > >batch_invalidate_tlb) ||
> > - (!rebind && xe_vm_has_scratch(vm) &&
> > xe_vm_in_preempt_fence_mode(vm))) {
> > + if ((!rebind && xe_vm_has_scratch(vm) &&
> > xe_vm_in_preempt_fence_mode(vm))) {
>
> Looked why this works in fault mode, we disallow scratch page in
> fault
> mode. I thought at one point we had implementation for that [1] but
> it
> looks like it never got merged. Some to keep an eye on.
>
> [1] https://patchwork.freedesktop.org/series/120480/
>
> > ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
> > if (!ifence)
> > return ERR_PTR(-ENOMEM);
> > + } else if (rebind && !xe_vm_in_lr_mode(vm) && !vm-
> > >batch_invalidate_tlb) {
> > + vm->tlb_flush_seqno++;
>
> Can we unwind this if / else clause a bit?
>
> I think batch_invalidate_tlb can only be true if
> !xe_vm_in_lr_mode(vm).
>
> So else if 'rebind && !xe_vm_in_lr_mode(vm)' should work. Also if
> batch_invalidate_tlb is we true we always issue TLB invalidate
> anyways
> and incrementing the seqno is harmles too.
Yes, although I don't really like making assumptions in the code about what
is done with a certain variable elsewhere. At some point in the future
people might change that, or say "Hey, they unnecessarily increment the
seqno here", or forget a branch. I could add asserts about
batch_invalidate_tlb, though, to avoid such future mishaps.
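Something like this, perhaps (a sketch; the exact condition is an assumption):
        /* batch_invalidate_tlb is only expected outside LR mode */
        xe_assert(vm->xe, !vm->batch_invalidate_tlb || !xe_vm_in_lr_mode(vm));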
>
> Side note, I'd be remiss if I didn't mention that I really do not
> like
> updating these functions (__xe_pt_bind_vma / __xe_pt_unbind_vma) as
> they
> are going away / being reworked here [2] in order to implement 1 job
> per
> IOCTL / proper error handling.
>
> [2] https://patchwork.freedesktop.org/series/125608/
Fully understandable. That's why I asked whether you thought it
clashed. However I want this backported to 6.8 so I don't see any other
way of doing it.
/Thomas
>
> > }
> >
> > rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
> > diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c
> > b/drivers/gpu/drm/xe/xe_ring_ops.c
> > index c4edffcd4a32..5b2b37b59813 100644
> > --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> > +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> > @@ -219,10 +219,9 @@ static void __emit_job_gen12_simple(struct
> > xe_sched_job *job, struct xe_lrc *lrc
> > {
> > u32 dw[MAX_JOB_SIZE_DW], i = 0;
> > u32 ppgtt_flag = get_ppgtt_flag(job);
> > - struct xe_vm *vm = job->q->vm;
> > struct xe_gt *gt = job->q->gt;
> >
> > - if (vm && vm->batch_invalidate_tlb) {
> > + if (job->ring_ops_flush_tlb) {
> > dw[i++] = preparser_disable(true);
> > i =
> > emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> > seqno, true, dw, i);
> > @@ -270,7 +269,6 @@ static void __emit_job_gen12_video(struct
> > xe_sched_job *job, struct xe_lrc *lrc,
> > struct xe_gt *gt = job->q->gt;
> > struct xe_device *xe = gt_to_xe(gt);
> > bool decode = job->q->class ==
> > XE_ENGINE_CLASS_VIDEO_DECODE;
> > - struct xe_vm *vm = job->q->vm;
> >
> > dw[i++] = preparser_disable(true);
> >
> > @@ -282,13 +280,13 @@ static void __emit_job_gen12_video(struct
> > xe_sched_job *job, struct xe_lrc *lrc,
> > i = emit_aux_table_inv(gt, VE0_AUX_INV,
> > dw, i);
> > }
> >
> > - if (vm && vm->batch_invalidate_tlb)
> > + if (job->ring_ops_flush_tlb)
> > i =
> > emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> > seqno, true, dw, i);
> >
> > dw[i++] = preparser_disable(false);
> >
> > - if (!vm || !vm->batch_invalidate_tlb)
> > + if (!job->ring_ops_flush_tlb)
> > i =
> > emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> > seqno, dw, i);
> >
> > @@ -317,7 +315,6 @@ static void
> > __emit_job_gen12_render_compute(struct xe_sched_job *job,
> > struct xe_gt *gt = job->q->gt;
> > struct xe_device *xe = gt_to_xe(gt);
> > bool lacks_render = !(gt->info.engine_mask &
> > XE_HW_ENGINE_RCS_MASK);
> > - struct xe_vm *vm = job->q->vm;
> > u32 mask_flags = 0;
> >
> > dw[i++] = preparser_disable(true);
> > @@ -327,7 +324,7 @@ static void
> > __emit_job_gen12_render_compute(struct xe_sched_job *job,
> > mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
> >
> > /* See __xe_pt_bind_vma() for a discussion on TLB
> > invalidations. */
> > - i = emit_pipe_invalidate(mask_flags, vm && vm-
> > >batch_invalidate_tlb, dw, i);
> > + i = emit_pipe_invalidate(mask_flags, job-
> > >ring_ops_flush_tlb, dw, i);
> >
> > /* hsdes: 1809175790 */
> > if (has_aux_ccs(xe))
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job.c
> > b/drivers/gpu/drm/xe/xe_sched_job.c
> > index 8151ddafb940..d55458d915a9 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job.c
> > +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> > @@ -250,6 +250,17 @@ bool xe_sched_job_completed(struct
> > xe_sched_job *job)
> >
> > void xe_sched_job_arm(struct xe_sched_job *job)
> > {
> > + struct xe_exec_queue *q = job->q;
> > + struct xe_vm *vm = q->vm;
> > +
> > + if (vm && !xe_sched_job_is_migration(q) &&
> > !xe_vm_in_lr_mode(vm) &&
> > + vm->tlb_flush_seqno != q->tlb_flush_seqno) {
> > + q->tlb_flush_seqno = vm->tlb_flush_seqno;
> > + job->ring_ops_flush_tlb = true;
> > + } else if (vm && vm->batch_invalidate_tlb) {
> > + job->ring_ops_flush_tlb = true;
> > + }
> > +
>
> Can we simplify this too?
>
> if (vm && (vm->batch_invalidate_tlb || (vm->tlb_flush_seqno
> != q->tlb_flush_seqno))) {
> q->tlb_flush_seqno = vm->tlb_flush_seqno;
> job->ring_ops_flush_tlb = true;
> }
>
> I think this works as xe_sched_job_is_migration has
> emit_migration_job_gen12 which doesn't look at job-
> >ring_ops_flush_tlb,
> so no need to xe_sched_job_is_migration.
>
> Also no need to check xe_vm_in_lr_mode as we wouldn'y increment the
> seqno above if that true.
>
> Lastly, harmless to increment q->tlb_flush_seqno in the case of
> batch_invalidate_tlb being true.
>
> > drm_sched_job_arm(&job->drm);
> > }
> >
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h
> > b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > index b1d83da50a53..5e12724219fd 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job_types.h
> > +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > @@ -39,6 +39,8 @@ struct xe_sched_job {
> > } user_fence;
> > /** @migrate_flush_flags: Additional flush flags for
> > migration jobs */
> > u32 migrate_flush_flags;
> > + /** @ring_ops_flush_tlb: The ring ops need to flush TLB
> > before payload. */
> > + bool ring_ops_flush_tlb;
>
> How about JOB_FLAG_FLUSH_TLB rather than a new field? See
> JOB_FLAG_SUBMIT flag usage.
>
> Matt
>
> > /** @batch_addr: batch buffer address of job */
> > u64 batch_addr[];
> > };
> > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> > b/drivers/gpu/drm/xe/xe_vm_types.h
> > index ae5fb565f6bf..5747f136d24d 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > @@ -264,6 +264,11 @@ struct xe_vm {
> > bool capture_once;
> > } error_capture;
> >
> > + /**
> > + * @tlb_flush_seqno: Required TLB flush seqno for the next
> > exec.
> > + * protected by the vm resv.
> > + */
> > + u64 tlb_flush_seqno;
> > /** @batch_invalidate_tlb: Always invalidate TLB before
> > batch start */
> > bool batch_invalidate_tlb;
> > /** @xef: XE file handle for tracking this VM's drm client
> > */
> > --
> > 2.44.0
> >
* Re: [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds
2024-03-21 21:14 ` Thomas Hellström
@ 2024-03-21 21:21 ` Thomas Hellström
2024-03-22 2:36 ` Matthew Brost
0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2024-03-21 21:21 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe, stable
On Thu, 2024-03-21 at 22:14 +0100, Thomas Hellström wrote:
> Hi, Matthew,
>
> Thanks for reviewing, please see inline.
>
> On Thu, 2024-03-21 at 19:09 +0000, Matthew Brost wrote:
> > On Thu, Mar 21, 2024 at 12:37:11PM +0100, Thomas Hellström wrote:
> > > For each rebind we insert a GuC TLB invalidation and add a
> > > corresponding unordered TLB invalidation fence. This might
> > > add a huge number of TLB invalidation fences to wait for so
> > > rather than doing that, defer the TLB invalidation to the
> > > next ring ops for each affected exec queue. Since the TLB
> > > is invalidated on exec_queue switch, we need to invalidate
> > > once for each affected exec_queue.
> > >
> > > Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after
> > > rebinds issued from execs")
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Cc: <stable@vger.kernel.org> # v6.8+
> > > Signed-off-by: Thomas Hellström
> > > <thomas.hellstrom@linux.intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
> > > drivers/gpu/drm/xe/xe_pt.c | 5 +++--
> > > drivers/gpu/drm/xe/xe_ring_ops.c | 11 ++++-------
> > > drivers/gpu/drm/xe/xe_sched_job.c | 11 +++++++++++
> > > drivers/gpu/drm/xe/xe_sched_job_types.h | 2 ++
> > > drivers/gpu/drm/xe/xe_vm_types.h | 5 +++++
> > > 6 files changed, 27 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > index 62b3d9d1d7cd..891ad30e906f 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > @@ -148,6 +148,8 @@ struct xe_exec_queue {
> > > const struct xe_ring_ops *ring_ops;
> > > /** @entity: DRM sched entity for this exec queue (1 to
> > > 1
> > > relationship) */
> > > struct drm_sched_entity *entity;
> > > + /** @tlb_flush_seqno: The seqno of the last rebind tlb
> > > flush performed */
> > > + u64 tlb_flush_seqno;
> > > /** @lrc: logical ring context for this exec queue */
> > > struct xe_lrc lrc[];
> > > };
> > > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > > b/drivers/gpu/drm/xe/xe_pt.c
> > > index 8d3922d2206e..21bc0d13fccf 100644
> > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > @@ -1254,11 +1254,12 @@ __xe_pt_bind_vma(struct xe_tile *tile,
> > > struct xe_vma *vma, struct xe_exec_queue
> > > * non-faulting LR, in particular on user-space batch
> > > buffer chaining,
> > > * it needs to be done here.
> > > */
> > > - if ((rebind && !xe_vm_in_lr_mode(vm) && !vm-
While I remember it: when looking at your series I noticed that this
line got incorrectly moved there. Looks like you used
xe_vm_in_lr_mode() rather than !xe_vm_in_lr_mode().
Thomas
* Re: [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds
2024-03-21 21:21 ` Thomas Hellström
@ 2024-03-22 2:36 ` Matthew Brost
0 siblings, 0 replies; 14+ messages in thread
From: Matthew Brost @ 2024-03-22 2:36 UTC (permalink / raw)
To: Thomas Hellström; +Cc: intel-xe, stable
On Thu, Mar 21, 2024 at 10:21:07PM +0100, Thomas Hellström wrote:
> On Thu, 2024-03-21 at 22:14 +0100, Thomas Hellström wrote:
> > Hi, Matthew,
> >
> > Thanks for reviewing, please see inline.
> >
> > On Thu, 2024-03-21 at 19:09 +0000, Matthew Brost wrote:
> > > On Thu, Mar 21, 2024 at 12:37:11PM +0100, Thomas Hellström wrote:
> > > > For each rebind we insert a GuC TLB invalidation and add a
> > > > corresponding unordered TLB invalidation fence. This might
> > > > add a huge number of TLB invalidation fences to wait for so
> > > > rather than doing that, defer the TLB invalidation to the
> > > > next ring ops for each affected exec queue. Since the TLB
> > > > is invalidated on exec_queue switch, we need to invalidate
> > > > once for each affected exec_queue.
> > > >
> > > > Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after
> > > > rebinds issued from execs")
> > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > Cc: <stable@vger.kernel.org> # v6.8+
> > > > Signed-off-by: Thomas Hellström
> > > > <thomas.hellstrom@linux.intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
> > > > drivers/gpu/drm/xe/xe_pt.c | 5 +++--
> > > > drivers/gpu/drm/xe/xe_ring_ops.c | 11 ++++-------
> > > > drivers/gpu/drm/xe/xe_sched_job.c | 11 +++++++++++
> > > > drivers/gpu/drm/xe/xe_sched_job_types.h | 2 ++
> > > > drivers/gpu/drm/xe/xe_vm_types.h | 5 +++++
> > > > 6 files changed, 27 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > index 62b3d9d1d7cd..891ad30e906f 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > @@ -148,6 +148,8 @@ struct xe_exec_queue {
> > > > const struct xe_ring_ops *ring_ops;
> > > > /** @entity: DRM sched entity for this exec queue (1 to
> > > > 1
> > > > relationship) */
> > > > struct drm_sched_entity *entity;
> > > > + /** @tlb_flush_seqno: The seqno of the last rebind tlb
> > > > flush performed */
> > > > + u64 tlb_flush_seqno;
> > > > /** @lrc: logical ring context for this exec queue */
> > > > struct xe_lrc lrc[];
> > > > };
> > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > > > b/drivers/gpu/drm/xe/xe_pt.c
> > > > index 8d3922d2206e..21bc0d13fccf 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > > @@ -1254,11 +1254,12 @@ __xe_pt_bind_vma(struct xe_tile *tile,
> > > > struct xe_vma *vma, struct xe_exec_queue
> > > > * non-faulting LR, in particular on user-space batch
> > > > buffer chaining,
> > > > * it needs to be done here.
> > > > */
> > > > - if ((rebind && !xe_vm_in_lr_mode(vm) && !vm-
>
> While I remember it. When looking at your series I noticed that this
> line got incorrectly moved there. Looks like you used
> xe_vm_in_lr_mode() rather than !xe_vm_in_lr_mode()..
>
Thanks, indeed I did invert that logic. Will fix in my next rev which
I'll rebase after this is merged.
Matt
> Thomas
>
* Re: [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds
2024-03-21 19:09 ` Matthew Brost
2024-03-21 21:14 ` Thomas Hellström
@ 2024-03-21 21:55 ` Thomas Hellström
1 sibling, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2024-03-21 21:55 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe, stable
On Thu, 2024-03-21 at 19:09 +0000, Matthew Brost wrote:
>
>
> Can we simplify this too?
>
> if (vm && (vm->batch_invalidate_tlb || (vm->tlb_flush_seqno
> != q->tlb_flush_seqno))) {
> q->tlb_flush_seqno = vm->tlb_flush_seqno;
> job->ring_ops_flush_tlb = true;
> }
>
> I think this works as xe_sched_job_is_migration has
> emit_migration_job_gen12 which doesn't look at job-
> >ring_ops_flush_tlb,
> so no need to xe_sched_job_is_migration.
>
> Also no need to check xe_vm_in_lr_mode as we wouldn'y increment the
> seqno above if that true.
>
> Lastly, harmless to increment q->tlb_flush_seqno in the case of
> batch_invalidate_tlb being true.
I think I can simplify it a bit. The problem is that neither the migration
vm nor LR mode grabs the vm->resv at arm time, so we would access the
seqnos unlocked and potentially get caught by clever static analyzers.
I actually had an assert for that at one point. I should probably
re-add it.
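E.g. (a sketch; guarded so the migration vm and LR mode, which don't hold the
resv at arm time, are skipped):
        if (vm && !xe_sched_job_is_migration(q) && !xe_vm_in_lr_mode(vm))
                xe_vm_assert_held(vm);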
/Thomas
* [PATCH 2/7] drm/xe: Rework rebinding
[not found] <20240321113720.120865-1-thomas.hellstrom@linux.intel.com>
2024-03-21 11:37 ` [PATCH 1/7] drm/xe: Make TLB invalidation fences unordered Thomas Hellström
2024-03-21 11:37 ` [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
2024-03-21 19:14 ` Matthew Brost
2024-03-21 11:37 ` [PATCH 2/7] drm/xe: Use ring ops TLB invalidation for rebinds Thomas Hellström
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
To: intel-xe; +Cc: Thomas Hellström, Rodrigo Vivi, Matthew Brost, stable
Instead of handling the vm's rebind fence separately,
which is error prone if they are not strictly ordered,
attach rebind fences as kernel fences to the vm's resv.
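The gist (a sketch; the actual change is in the diff below): rebind fences are
added to the vm's resv with DMA_RESV_USAGE_KERNEL, so anything that already
waits on the resv's kernel fences now also waits for rebinds:
        /* add shared fence now for pagetable delayed destroy */
        dma_resv_add_fence(xe_vm_resv(vm), fence, rebind || last_munmap_rebind ?
                           DMA_RESV_USAGE_KERNEL : DMA_RESV_USAGE_BOOKKEEP);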
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/xe/xe_exec.c | 31 +++----------------------------
drivers/gpu/drm/xe/xe_pt.c | 2 +-
drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++------------------
drivers/gpu/drm/xe/xe_vm.h | 2 +-
drivers/gpu/drm/xe/xe_vm_types.h | 3 ---
5 files changed, 14 insertions(+), 51 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 7692ebfe7d47..759497d4a102 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -152,7 +152,6 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
struct drm_exec *exec = &vm_exec.exec;
u32 i, num_syncs = 0, num_ufence = 0;
struct xe_sched_job *job;
- struct dma_fence *rebind_fence;
struct xe_vm *vm;
bool write_locked, skip_retry = false;
ktime_t end = 0;
@@ -294,35 +293,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
* Rebind any invalidated userptr or evicted BOs in the VM, non-compute
* VM mode only.
*/
- rebind_fence = xe_vm_rebind(vm, false);
- if (IS_ERR(rebind_fence)) {
- err = PTR_ERR(rebind_fence);
+ err = xe_vm_rebind(vm, false);
+ if (err)
goto err_put_job;
- }
-
- /*
- * We store the rebind_fence in the VM so subsequent execs don't get
- * scheduled before the rebinds of userptrs / evicted BOs is complete.
- */
- if (rebind_fence) {
- dma_fence_put(vm->rebind_fence);
- vm->rebind_fence = rebind_fence;
- }
- if (vm->rebind_fence) {
- if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
- &vm->rebind_fence->flags)) {
- dma_fence_put(vm->rebind_fence);
- vm->rebind_fence = NULL;
- } else {
- dma_fence_get(vm->rebind_fence);
- err = drm_sched_job_add_dependency(&job->drm,
- vm->rebind_fence);
- if (err)
- goto err_put_job;
- }
- }
- /* Wait behind munmap style rebinds */
+ /* Wait behind rebinds */
if (!xe_vm_in_lr_mode(vm)) {
err = drm_sched_job_add_resv_dependencies(&job->drm,
xe_vm_resv(vm),
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 21bc0d13fccf..0484ed5b495f 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1298,7 +1298,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
}
/* add shared fence now for pagetable delayed destroy */
- dma_resv_add_fence(xe_vm_resv(vm), fence, !rebind &&
+ dma_resv_add_fence(xe_vm_resv(vm), fence, rebind ||
last_munmap_rebind ?
DMA_RESV_USAGE_KERNEL :
DMA_RESV_USAGE_BOOKKEEP);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 80d43d75b1da..35fba6e3f889 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -522,7 +522,6 @@ static void preempt_rebind_work_func(struct work_struct *w)
{
struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work);
struct drm_exec exec;
- struct dma_fence *rebind_fence;
unsigned int fence_count = 0;
LIST_HEAD(preempt_fences);
ktime_t end = 0;
@@ -568,18 +567,11 @@ static void preempt_rebind_work_func(struct work_struct *w)
if (err)
goto out_unlock;
- rebind_fence = xe_vm_rebind(vm, true);
- if (IS_ERR(rebind_fence)) {
- err = PTR_ERR(rebind_fence);
+ err = xe_vm_rebind(vm, true);
+ if (err)
goto out_unlock;
- }
-
- if (rebind_fence) {
- dma_fence_wait(rebind_fence, false);
- dma_fence_put(rebind_fence);
- }
- /* Wait on munmap style VM unbinds */
+ /* Wait on rebinds and munmap style VM unbinds */
wait = dma_resv_wait_timeout(xe_vm_resv(vm),
DMA_RESV_USAGE_KERNEL,
false, MAX_SCHEDULE_TIMEOUT);
@@ -773,14 +765,14 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
struct xe_sync_entry *syncs, u32 num_syncs,
bool first_op, bool last_op);
-struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
+int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
{
- struct dma_fence *fence = NULL;
+ struct dma_fence *fence;
struct xe_vma *vma, *next;
lockdep_assert_held(&vm->lock);
if (xe_vm_in_lr_mode(vm) && !rebind_worker)
- return NULL;
+ return 0;
xe_vm_assert_held(vm);
list_for_each_entry_safe(vma, next, &vm->rebind_list,
@@ -788,17 +780,17 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
xe_assert(vm->xe, vma->tile_present);
list_del_init(&vma->combined_links.rebind);
- dma_fence_put(fence);
if (rebind_worker)
trace_xe_vma_rebind_worker(vma);
else
trace_xe_vma_rebind_exec(vma);
fence = xe_vm_bind_vma(vma, NULL, NULL, 0, false, false);
if (IS_ERR(fence))
- return fence;
+ return PTR_ERR(fence);
+ dma_fence_put(fence);
}
- return fence;
+ return 0;
}
static void xe_vma_free(struct xe_vma *vma)
@@ -1588,7 +1580,6 @@ static void vm_destroy_work_func(struct work_struct *w)
XE_WARN_ON(vm->pt_root[id]);
trace_xe_vm_free(vm);
- dma_fence_put(vm->rebind_fence);
kfree(vm);
}
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 6df1f1c7f85d..4853354336f2 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -207,7 +207,7 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm);
int xe_vm_userptr_check_repin(struct xe_vm *vm);
-struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
+int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
int xe_vm_invalidate_vma(struct xe_vma *vma);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 5747f136d24d..badf3945083d 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -177,9 +177,6 @@ struct xe_vm {
*/
struct list_head rebind_list;
- /** @rebind_fence: rebind fence from execbuf */
- struct dma_fence *rebind_fence;
-
/**
* @destroy_work: worker to destroy VM, needed as a dma_fence signaling
* from an irq context can be last put and the destroy needs to be able
--
2.44.0
* Re: [PATCH 2/7] drm/xe: Rework rebinding
2024-03-21 11:37 ` [PATCH 2/7] drm/xe: Rework rebinding Thomas Hellström
@ 2024-03-21 19:14 ` Matthew Brost
2024-03-21 21:16 ` Thomas Hellström
0 siblings, 1 reply; 14+ messages in thread
From: Matthew Brost @ 2024-03-21 19:14 UTC (permalink / raw)
To: Thomas Hellström; +Cc: intel-xe, Rodrigo Vivi, stable
On Thu, Mar 21, 2024 at 12:37:12PM +0100, Thomas Hellström wrote:
> Instead of handling the vm's rebind fence separately,
> which is error prone if they are not strictly ordered,
> attach rebind fences as kernel fences to the vm's resv.
>
See comment on the previous patch: I do not like updates to __xe_pt_bind_vma, but I
guess I can live with it. Otherwise LGTM.
With that:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec.c | 31 +++----------------------------
> drivers/gpu/drm/xe/xe_pt.c | 2 +-
> drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++------------------
> drivers/gpu/drm/xe/xe_vm.h | 2 +-
> drivers/gpu/drm/xe/xe_vm_types.h | 3 ---
> 5 files changed, 14 insertions(+), 51 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 7692ebfe7d47..759497d4a102 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -152,7 +152,6 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> struct drm_exec *exec = &vm_exec.exec;
> u32 i, num_syncs = 0, num_ufence = 0;
> struct xe_sched_job *job;
> - struct dma_fence *rebind_fence;
> struct xe_vm *vm;
> bool write_locked, skip_retry = false;
> ktime_t end = 0;
> @@ -294,35 +293,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> * Rebind any invalidated userptr or evicted BOs in the VM, non-compute
> * VM mode only.
> */
> - rebind_fence = xe_vm_rebind(vm, false);
> - if (IS_ERR(rebind_fence)) {
> - err = PTR_ERR(rebind_fence);
> + err = xe_vm_rebind(vm, false);
> + if (err)
> goto err_put_job;
> - }
> -
> - /*
> - * We store the rebind_fence in the VM so subsequent execs don't get
> - * scheduled before the rebinds of userptrs / evicted BOs is complete.
> - */
> - if (rebind_fence) {
> - dma_fence_put(vm->rebind_fence);
> - vm->rebind_fence = rebind_fence;
> - }
> - if (vm->rebind_fence) {
> - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> - &vm->rebind_fence->flags)) {
> - dma_fence_put(vm->rebind_fence);
> - vm->rebind_fence = NULL;
> - } else {
> - dma_fence_get(vm->rebind_fence);
> - err = drm_sched_job_add_dependency(&job->drm,
> - vm->rebind_fence);
> - if (err)
> - goto err_put_job;
> - }
> - }
>
> - /* Wait behind munmap style rebinds */
> + /* Wait behind rebinds */
> if (!xe_vm_in_lr_mode(vm)) {
> err = drm_sched_job_add_resv_dependencies(&job->drm,
> xe_vm_resv(vm),
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 21bc0d13fccf..0484ed5b495f 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1298,7 +1298,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
> }
>
> /* add shared fence now for pagetable delayed destroy */
> - dma_resv_add_fence(xe_vm_resv(vm), fence, !rebind &&
> + dma_resv_add_fence(xe_vm_resv(vm), fence, rebind ||
> last_munmap_rebind ?
> DMA_RESV_USAGE_KERNEL :
> DMA_RESV_USAGE_BOOKKEEP);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 80d43d75b1da..35fba6e3f889 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -522,7 +522,6 @@ static void preempt_rebind_work_func(struct work_struct *w)
> {
> struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work);
> struct drm_exec exec;
> - struct dma_fence *rebind_fence;
> unsigned int fence_count = 0;
> LIST_HEAD(preempt_fences);
> ktime_t end = 0;
> @@ -568,18 +567,11 @@ static void preempt_rebind_work_func(struct work_struct *w)
> if (err)
> goto out_unlock;
>
> - rebind_fence = xe_vm_rebind(vm, true);
> - if (IS_ERR(rebind_fence)) {
> - err = PTR_ERR(rebind_fence);
> + err = xe_vm_rebind(vm, true);
> + if (err)
> goto out_unlock;
> - }
> -
> - if (rebind_fence) {
> - dma_fence_wait(rebind_fence, false);
> - dma_fence_put(rebind_fence);
> - }
>
> - /* Wait on munmap style VM unbinds */
> + /* Wait on rebinds and munmap style VM unbinds */
> wait = dma_resv_wait_timeout(xe_vm_resv(vm),
> DMA_RESV_USAGE_KERNEL,
> false, MAX_SCHEDULE_TIMEOUT);
> @@ -773,14 +765,14 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
> struct xe_sync_entry *syncs, u32 num_syncs,
> bool first_op, bool last_op);
>
> -struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
> +int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
> {
> - struct dma_fence *fence = NULL;
> + struct dma_fence *fence;
> struct xe_vma *vma, *next;
>
> lockdep_assert_held(&vm->lock);
> if (xe_vm_in_lr_mode(vm) && !rebind_worker)
> - return NULL;
> + return 0;
>
> xe_vm_assert_held(vm);
> list_for_each_entry_safe(vma, next, &vm->rebind_list,
> @@ -788,17 +780,17 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
> xe_assert(vm->xe, vma->tile_present);
>
> list_del_init(&vma->combined_links.rebind);
> - dma_fence_put(fence);
> if (rebind_worker)
> trace_xe_vma_rebind_worker(vma);
> else
> trace_xe_vma_rebind_exec(vma);
> fence = xe_vm_bind_vma(vma, NULL, NULL, 0, false, false);
> if (IS_ERR(fence))
> - return fence;
> + return PTR_ERR(fence);
> + dma_fence_put(fence);
> }
>
> - return fence;
> + return 0;
> }
>
> static void xe_vma_free(struct xe_vma *vma)
> @@ -1588,7 +1580,6 @@ static void vm_destroy_work_func(struct work_struct *w)
> XE_WARN_ON(vm->pt_root[id]);
>
> trace_xe_vm_free(vm);
> - dma_fence_put(vm->rebind_fence);
> kfree(vm);
> }
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 6df1f1c7f85d..4853354336f2 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -207,7 +207,7 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm);
>
> int xe_vm_userptr_check_repin(struct xe_vm *vm);
>
> -struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
> +int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
>
> int xe_vm_invalidate_vma(struct xe_vma *vma);
>
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index 5747f136d24d..badf3945083d 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -177,9 +177,6 @@ struct xe_vm {
> */
> struct list_head rebind_list;
>
> - /** @rebind_fence: rebind fence from execbuf */
> - struct dma_fence *rebind_fence;
> -
> /**
> * @destroy_work: worker to destroy VM, needed as a dma_fence signaling
> * from an irq context can be last put and the destroy needs to be able
> --
> 2.44.0
>
* Re: [PATCH 2/7] drm/xe: Rework rebinding
2024-03-21 19:14 ` Matthew Brost
@ 2024-03-21 21:16 ` Thomas Hellström
0 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2024-03-21 21:16 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe, Rodrigo Vivi, stable
On Thu, 2024-03-21 at 19:14 +0000, Matthew Brost wrote:
> On Thu, Mar 21, 2024 at 12:37:12PM +0100, Thomas Hellström wrote:
> > Instead of handling the vm's rebind fence separately,
> > which is error prone if they are not strictly ordered,
> > attach rebind fences as kernel fences to the vm's resv.
> >
>
> See comment from previous, do not like updates to __xe_pt_bind_vma
> but I
> guess I can live with it. Otherwise LGTM.
Thanks, yeah same reason there, unfortunately.
/Thomas
>
> With that:
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
>
> > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel
> > GPUs")
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: <stable@vger.kernel.org> # v6.8+
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_exec.c | 31 +++-------------------------
> > ---
> > drivers/gpu/drm/xe/xe_pt.c | 2 +-
> > drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++------------------
> > drivers/gpu/drm/xe/xe_vm.h | 2 +-
> > drivers/gpu/drm/xe/xe_vm_types.h | 3 ---
> > 5 files changed, 14 insertions(+), 51 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c
> > b/drivers/gpu/drm/xe/xe_exec.c
> > index 7692ebfe7d47..759497d4a102 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -152,7 +152,6 @@ int xe_exec_ioctl(struct drm_device *dev, void
> > *data, struct drm_file *file)
> > struct drm_exec *exec = &vm_exec.exec;
> > u32 i, num_syncs = 0, num_ufence = 0;
> > struct xe_sched_job *job;
> > - struct dma_fence *rebind_fence;
> > struct xe_vm *vm;
> > bool write_locked, skip_retry = false;
> > ktime_t end = 0;
> > @@ -294,35 +293,11 @@ int xe_exec_ioctl(struct drm_device *dev,
> > void *data, struct drm_file *file)
> > * Rebind any invalidated userptr or evicted BOs in the
> > VM, non-compute
> > * VM mode only.
> > */
> > - rebind_fence = xe_vm_rebind(vm, false);
> > - if (IS_ERR(rebind_fence)) {
> > - err = PTR_ERR(rebind_fence);
> > + err = xe_vm_rebind(vm, false);
> > + if (err)
> > goto err_put_job;
> > - }
> > -
> > - /*
> > - * We store the rebind_fence in the VM so subsequent execs
> > don't get
> > - * scheduled before the rebinds of userptrs / evicted BOs
> > is complete.
> > - */
> > - if (rebind_fence) {
> > - dma_fence_put(vm->rebind_fence);
> > - vm->rebind_fence = rebind_fence;
> > - }
> > - if (vm->rebind_fence) {
> > - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > - &vm->rebind_fence->flags)) {
> > - dma_fence_put(vm->rebind_fence);
> > - vm->rebind_fence = NULL;
> > - } else {
> > - dma_fence_get(vm->rebind_fence);
> > - err = drm_sched_job_add_dependency(&job-
> > >drm,
> > - vm-
> > >rebind_fence);
> > - if (err)
> > - goto err_put_job;
> > - }
> > - }
> >
> > - /* Wait behind munmap style rebinds */
> > + /* Wait behind rebinds */
> > if (!xe_vm_in_lr_mode(vm)) {
> > err = drm_sched_job_add_resv_dependencies(&job-
> > >drm,
> >
> > xe_vm_resv(vm),
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > b/drivers/gpu/drm/xe/xe_pt.c
> > index 21bc0d13fccf..0484ed5b495f 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -1298,7 +1298,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct
> > xe_vma *vma, struct xe_exec_queue
> > }
> >
> > /* add shared fence now for pagetable delayed
> > destroy */
> > - dma_resv_add_fence(xe_vm_resv(vm), fence, !rebind
> > &&
> > + dma_resv_add_fence(xe_vm_resv(vm), fence, rebind
> > ||
> > last_munmap_rebind ?
> > DMA_RESV_USAGE_KERNEL :
> > DMA_RESV_USAGE_BOOKKEEP);
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > b/drivers/gpu/drm/xe/xe_vm.c
> > index 80d43d75b1da..35fba6e3f889 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -522,7 +522,6 @@ static void preempt_rebind_work_func(struct
> > work_struct *w)
> > {
> > struct xe_vm *vm = container_of(w, struct xe_vm,
> > preempt.rebind_work);
> > struct drm_exec exec;
> > - struct dma_fence *rebind_fence;
> > unsigned int fence_count = 0;
> > LIST_HEAD(preempt_fences);
> > ktime_t end = 0;
> > @@ -568,18 +567,11 @@ static void preempt_rebind_work_func(struct
> > work_struct *w)
> > if (err)
> > goto out_unlock;
> >
> > - rebind_fence = xe_vm_rebind(vm, true);
> > - if (IS_ERR(rebind_fence)) {
> > - err = PTR_ERR(rebind_fence);
> > + err = xe_vm_rebind(vm, true);
> > + if (err)
> > goto out_unlock;
> > - }
> > -
> > - if (rebind_fence) {
> > - dma_fence_wait(rebind_fence, false);
> > - dma_fence_put(rebind_fence);
> > - }
> >
> > - /* Wait on munmap style VM unbinds */
> > + /* Wait on rebinds and munmap style VM unbinds */
> > wait = dma_resv_wait_timeout(xe_vm_resv(vm),
> > DMA_RESV_USAGE_KERNEL,
> > false, MAX_SCHEDULE_TIMEOUT);
> > @@ -773,14 +765,14 @@ xe_vm_bind_vma(struct xe_vma *vma, struct
> > xe_exec_queue *q,
> > struct xe_sync_entry *syncs, u32 num_syncs,
> > bool first_op, bool last_op);
> >
> > -struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool
> > rebind_worker)
> > +int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
> > {
> > - struct dma_fence *fence = NULL;
> > + struct dma_fence *fence;
> > struct xe_vma *vma, *next;
> >
> > lockdep_assert_held(&vm->lock);
> > if (xe_vm_in_lr_mode(vm) && !rebind_worker)
> > - return NULL;
> > + return 0;
> >
> > xe_vm_assert_held(vm);
> > list_for_each_entry_safe(vma, next, &vm->rebind_list,
> > @@ -788,17 +780,17 @@ struct dma_fence *xe_vm_rebind(struct xe_vm
> > *vm, bool rebind_worker)
> > xe_assert(vm->xe, vma->tile_present);
> >
> > list_del_init(&vma->combined_links.rebind);
> > - dma_fence_put(fence);
> > if (rebind_worker)
> > trace_xe_vma_rebind_worker(vma);
> > else
> > trace_xe_vma_rebind_exec(vma);
> > fence = xe_vm_bind_vma(vma, NULL, NULL, 0, false,
> > false);
> > if (IS_ERR(fence))
> > - return fence;
> > + return PTR_ERR(fence);
> > + dma_fence_put(fence);
> > }
> >
> > - return fence;
> > + return 0;
> > }
> >
> > static void xe_vma_free(struct xe_vma *vma)
> > @@ -1588,7 +1580,6 @@ static void vm_destroy_work_func(struct
> > work_struct *w)
> > XE_WARN_ON(vm->pt_root[id]);
> >
> > trace_xe_vm_free(vm);
> > - dma_fence_put(vm->rebind_fence);
> > kfree(vm);
> > }
> >
> > diff --git a/drivers/gpu/drm/xe/xe_vm.h
> > b/drivers/gpu/drm/xe/xe_vm.h
> > index 6df1f1c7f85d..4853354336f2 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.h
> > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > @@ -207,7 +207,7 @@ int __xe_vm_userptr_needs_repin(struct xe_vm
> > *vm);
> >
> > int xe_vm_userptr_check_repin(struct xe_vm *vm);
> >
> > -struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool
> > rebind_worker);
> > +int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
> >
> > int xe_vm_invalidate_vma(struct xe_vma *vma);
> >
> > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> > b/drivers/gpu/drm/xe/xe_vm_types.h
> > index 5747f136d24d..badf3945083d 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > @@ -177,9 +177,6 @@ struct xe_vm {
> > */
> > struct list_head rebind_list;
> >
> > - /** @rebind_fence: rebind fence from execbuf */
> > - struct dma_fence *rebind_fence;
> > -
> > /**
> > * @destroy_work: worker to destroy VM, needed as a
> > dma_fence signaling
> > * from an irq context can be last put and the destroy
> > needs to be able
> > --
> > 2.44.0
> >
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/7] drm/xe: Use ring ops TLB invalidation for rebinds
[not found] <20240321113720.120865-1-thomas.hellstrom@linux.intel.com>
` (2 preceding siblings ...)
2024-03-21 11:37 ` [PATCH 2/7] drm/xe: Rework rebinding Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
2024-03-21 11:37 ` [PATCH 3/7] drm/xe: Make TLB invalidation fences unordered Thomas Hellström
2024-03-21 11:37 ` [PATCH 4/7] drm/xe: Rework rebinding Thomas Hellström
5 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
To: intel-xe; +Cc: Thomas Hellström, Matthew Brost, stable
For each rebind we insert a GuC TLB invalidation and add a
corresponding unordered TLB invalidation fence. This might
add a huge number of TLB invalidation fences to wait for, so
rather than doing that, defer the TLB invalidation to the
next ring ops for each affected exec queue. Since the TLB
is invalidated on exec_queue switch, we need to invalidate
once for each affected exec_queue.
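As a rough illustration of the handshake the diff below implements, here is a
minimal userspace-style sketch. The field names mirror the patch
(vm->tlb_flush_seqno, q->tlb_flush_seqno, ring_ops_flush_tlb), but the structs
are simplified stand-ins; the real code additionally skips migration jobs and
LR-mode VMs and relies on the vm resv for serialization.
/*
 * Minimal userspace model of the deferred TLB-flush handshake
 * (simplified stand-ins, not the real xe structures).
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
struct model_vm {
	uint64_t tlb_flush_seqno;   /* bumped instead of allocating an ifence */
	bool batch_invalidate_tlb;  /* "always flush before batch" mode */
};
struct model_queue {
	struct model_vm *vm;
	uint64_t tlb_flush_seqno;   /* last VM seqno this queue flushed for */
};
/* __xe_pt_bind_vma() analogue: a rebind only bumps the VM seqno. */
static void model_rebind(struct model_vm *vm)
{
	vm->tlb_flush_seqno++;
}
/* xe_sched_job_arm() analogue: decide whether the ring ops must flush. */
static bool model_job_arm(struct model_queue *q)
{
	if (q->vm->tlb_flush_seqno != q->tlb_flush_seqno) {
		q->tlb_flush_seqno = q->vm->tlb_flush_seqno;
		return true;                /* one flush per affected queue */
	}
	return q->vm->batch_invalidate_tlb; /* unchanged legacy behaviour */
}
int main(void)
{
	struct model_vm vm = {0};
	struct model_queue q0 = { .vm = &vm }, q1 = { .vm = &vm };
	model_rebind(&vm);
	printf("q0 first job flushes:  %d\n", model_job_arm(&q0)); /* 1 */
	printf("q0 second job flushes: %d\n", model_job_arm(&q0)); /* 0 */
	printf("q1 first job flushes:  %d\n", model_job_arm(&q1)); /* 1 */
	return 0;
}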
Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after rebinds issued from execs")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
drivers/gpu/drm/xe/xe_pt.c | 5 +++--
drivers/gpu/drm/xe/xe_ring_ops.c | 11 ++++-------
drivers/gpu/drm/xe/xe_sched_job.c | 11 +++++++++++
drivers/gpu/drm/xe/xe_sched_job_types.h | 2 ++
drivers/gpu/drm/xe/xe_vm_types.h | 5 +++++
6 files changed, 27 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 62b3d9d1d7cd..891ad30e906f 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -148,6 +148,8 @@ struct xe_exec_queue {
const struct xe_ring_ops *ring_ops;
/** @entity: DRM sched entity for this exec queue (1 to 1 relationship) */
struct drm_sched_entity *entity;
+ /** @tlb_flush_seqno: The seqno of the last rebind tlb flush performed */
+ u64 tlb_flush_seqno;
/** @lrc: logical ring context for this exec queue */
struct xe_lrc lrc[];
};
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index a5bb8d20f815..109c26e5689e 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1253,11 +1253,12 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
* non-faulting LR, in particular on user-space batch buffer chaining,
* it needs to be done here.
*/
- if ((rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) ||
- (!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {
+ if ((!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {
ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
if (!ifence)
return ERR_PTR(-ENOMEM);
+ } else if (rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) {
+ vm->tlb_flush_seqno++;
}
rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index c4edffcd4a32..5b2b37b59813 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -219,10 +219,9 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
{
u32 dw[MAX_JOB_SIZE_DW], i = 0;
u32 ppgtt_flag = get_ppgtt_flag(job);
- struct xe_vm *vm = job->q->vm;
struct xe_gt *gt = job->q->gt;
- if (vm && vm->batch_invalidate_tlb) {
+ if (job->ring_ops_flush_tlb) {
dw[i++] = preparser_disable(true);
i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
seqno, true, dw, i);
@@ -270,7 +269,6 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
struct xe_gt *gt = job->q->gt;
struct xe_device *xe = gt_to_xe(gt);
bool decode = job->q->class == XE_ENGINE_CLASS_VIDEO_DECODE;
- struct xe_vm *vm = job->q->vm;
dw[i++] = preparser_disable(true);
@@ -282,13 +280,13 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
i = emit_aux_table_inv(gt, VE0_AUX_INV, dw, i);
}
- if (vm && vm->batch_invalidate_tlb)
+ if (job->ring_ops_flush_tlb)
i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
seqno, true, dw, i);
dw[i++] = preparser_disable(false);
- if (!vm || !vm->batch_invalidate_tlb)
+ if (!job->ring_ops_flush_tlb)
i = emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
seqno, dw, i);
@@ -317,7 +315,6 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
struct xe_gt *gt = job->q->gt;
struct xe_device *xe = gt_to_xe(gt);
bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
- struct xe_vm *vm = job->q->vm;
u32 mask_flags = 0;
dw[i++] = preparser_disable(true);
@@ -327,7 +324,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
/* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */
- i = emit_pipe_invalidate(mask_flags, vm && vm->batch_invalidate_tlb, dw, i);
+ i = emit_pipe_invalidate(mask_flags, job->ring_ops_flush_tlb, dw, i);
/* hsdes: 1809175790 */
if (has_aux_ccs(xe))
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index 8151ddafb940..d55458d915a9 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -250,6 +250,17 @@ bool xe_sched_job_completed(struct xe_sched_job *job)
void xe_sched_job_arm(struct xe_sched_job *job)
{
+ struct xe_exec_queue *q = job->q;
+ struct xe_vm *vm = q->vm;
+
+ if (vm && !xe_sched_job_is_migration(q) && !xe_vm_in_lr_mode(vm) &&
+ vm->tlb_flush_seqno != q->tlb_flush_seqno) {
+ q->tlb_flush_seqno = vm->tlb_flush_seqno;
+ job->ring_ops_flush_tlb = true;
+ } else if (vm && vm->batch_invalidate_tlb) {
+ job->ring_ops_flush_tlb = true;
+ }
+
drm_sched_job_arm(&job->drm);
}
diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
index b1d83da50a53..5e12724219fd 100644
--- a/drivers/gpu/drm/xe/xe_sched_job_types.h
+++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
@@ -39,6 +39,8 @@ struct xe_sched_job {
} user_fence;
/** @migrate_flush_flags: Additional flush flags for migration jobs */
u32 migrate_flush_flags;
+ /** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
+ bool ring_ops_flush_tlb;
/** @batch_addr: batch buffer address of job */
u64 batch_addr[];
};
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index ae5fb565f6bf..5747f136d24d 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -264,6 +264,11 @@ struct xe_vm {
bool capture_once;
} error_capture;
+ /**
+ * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
+ * protected by the vm resv.
+ */
+ u64 tlb_flush_seqno;
/** @batch_invalidate_tlb: Always invalidate TLB before batch start */
bool batch_invalidate_tlb;
/** @xef: XE file handle for tracking this VM's drm client */
--
2.44.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/7] drm/xe: Make TLB invalidation fences unordered
[not found] <20240321113720.120865-1-thomas.hellstrom@linux.intel.com>
` (3 preceding siblings ...)
2024-03-21 11:37 ` [PATCH 2/7] drm/xe: Use ring ops TLB invalidation for rebinds Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
2024-03-22 17:55 ` Matthew Brost
2024-03-21 11:37 ` [PATCH 4/7] drm/xe: Rework rebinding Thomas Hellström
5 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
To: intel-xe; +Cc: Thomas Hellström, Matthew Brost, stable
They can actually complete out-of-order, so allocate a unique
fence context for each fence.
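The point can be illustrated with a toy model of the dma-fence ordering
contract; this is not the kernel API, just a sketch of the assumption that
fences sharing a context signal in seqno order, which is why one gt-wide
context plus an incrementing seqno over-promised ordering the GuC does not
provide, while dma_fence_context_alloc(1) per fence promises nothing.
/*
 * Toy model of the dma_fence ordering contract (not the kernel API).
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
struct toy_fence {
	uint64_t context;
	uint64_t seqno;
};
/* "a is guaranteed to signal no earlier than b" under the contract */
static bool toy_ordered_after(const struct toy_fence *a, const struct toy_fence *b)
{
	return a->context == b->context && a->seqno > b->seqno;
}
int main(void)
{
	/* Old scheme: one gt-wide context, monotonically increasing seqno. */
	struct toy_fence inv1 = { .context = 7, .seqno = 1 };
	struct toy_fence inv2 = { .context = 7, .seqno = 2 };
	/* New scheme: a unique context per fence, seqno 1. */
	struct toy_fence inv3 = { .context = 8, .seqno = 1 };
	struct toy_fence inv4 = { .context = 9, .seqno = 1 };
	printf("shared context implies ordering: %d\n",
	       toy_ordered_after(&inv2, &inv1));   /* 1: a false promise */
	printf("unique contexts imply ordering:  %d\n",
	       toy_ordered_after(&inv4, &inv3));   /* 0: no promise made */
	return 0;
}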
Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after rebinds issued from execs")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 1 -
drivers/gpu/drm/xe/xe_gt_types.h | 7 -------
drivers/gpu/drm/xe/xe_pt.c | 3 +--
3 files changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index a3c4ffba679d..787cba5e49a1 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -63,7 +63,6 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
INIT_LIST_HEAD(&gt->tlb_invalidation.pending_fences);
spin_lock_init(&gt->tlb_invalidation.pending_lock);
spin_lock_init(&gt->tlb_invalidation.lock);
- gt->tlb_invalidation.fence_context = dma_fence_context_alloc(1);
INIT_DELAYED_WORK(&gt->tlb_invalidation.fence_tdr,
xe_gt_tlb_fence_timeout);
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index f6da2ad9719f..2143dffcaf11 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -179,13 +179,6 @@ struct xe_gt {
* xe_gt_tlb_fence_timeout after the timeut interval is over.
*/
struct delayed_work fence_tdr;
- /** @tlb_invalidation.fence_context: context for TLB invalidation fences */
- u64 fence_context;
- /**
- * @tlb_invalidation.fence_seqno: seqno to TLB invalidation fences, protected by
- * tlb_invalidation.lock
- */
- u32 fence_seqno;
/** @tlb_invalidation.lock: protects TLB invalidation fences */
spinlock_t lock;
} tlb_invalidation;
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 0484ed5b495f..045a8c0845ba 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1135,8 +1135,7 @@ static int invalidation_fence_init(struct xe_gt *gt,
spin_lock_irq(&gt->tlb_invalidation.lock);
dma_fence_init(&ifence->base.base, &invalidation_fence_ops,
&gt->tlb_invalidation.lock,
- gt->tlb_invalidation.fence_context,
- ++gt->tlb_invalidation.fence_seqno);
+ dma_fence_context_alloc(1), 1);
spin_unlock_irq(&gt->tlb_invalidation.lock);
INIT_LIST_HEAD(&ifence->base.link);
--
2.44.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/7] drm/xe: Make TLB invalidation fences unordered
2024-03-21 11:37 ` [PATCH 3/7] drm/xe: Make TLB invalidation fences unordered Thomas Hellström
@ 2024-03-22 17:55 ` Matthew Brost
0 siblings, 0 replies; 14+ messages in thread
From: Matthew Brost @ 2024-03-22 17:55 UTC (permalink / raw)
To: Thomas Hellström; +Cc: intel-xe, stable
On Thu, Mar 21, 2024 at 12:37:14PM +0100, Thomas Hellström wrote:
> They can actually complete out-of-order, so allocate a unique
> fence context for each fence.
>
Yes, indeed these can complete out of order on different xe_exec_queues but
should be ordered within an xe_exec_queue.
In addition to this patch I think we will need [1] too.
This patch does LGTM though, with that:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
[1] https://patchwork.freedesktop.org/patch/582006/?series=125608&rev=5
> Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after rebinds issued from execs")
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 1 -
> drivers/gpu/drm/xe/xe_gt_types.h | 7 -------
> drivers/gpu/drm/xe/xe_pt.c | 3 +--
> 3 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> index a3c4ffba679d..787cba5e49a1 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> @@ -63,7 +63,6 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
> INIT_LIST_HEAD(&gt->tlb_invalidation.pending_fences);
> spin_lock_init(&gt->tlb_invalidation.pending_lock);
> spin_lock_init(&gt->tlb_invalidation.lock);
> - gt->tlb_invalidation.fence_context = dma_fence_context_alloc(1);
> INIT_DELAYED_WORK(&gt->tlb_invalidation.fence_tdr,
> xe_gt_tlb_fence_timeout);
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
> index f6da2ad9719f..2143dffcaf11 100644
> --- a/drivers/gpu/drm/xe/xe_gt_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_types.h
> @@ -179,13 +179,6 @@ struct xe_gt {
> * xe_gt_tlb_fence_timeout after the timeut interval is over.
> */
> struct delayed_work fence_tdr;
> - /** @tlb_invalidation.fence_context: context for TLB invalidation fences */
> - u64 fence_context;
> - /**
> - * @tlb_invalidation.fence_seqno: seqno to TLB invalidation fences, protected by
> - * tlb_invalidation.lock
> - */
> - u32 fence_seqno;
> /** @tlb_invalidation.lock: protects TLB invalidation fences */
> spinlock_t lock;
> } tlb_invalidation;
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 0484ed5b495f..045a8c0845ba 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1135,8 +1135,7 @@ static int invalidation_fence_init(struct xe_gt *gt,
> spin_lock_irq(&gt->tlb_invalidation.lock);
> dma_fence_init(&ifence->base.base, &invalidation_fence_ops,
> &gt->tlb_invalidation.lock,
> - gt->tlb_invalidation.fence_context,
> - ++gt->tlb_invalidation.fence_seqno);
> + dma_fence_context_alloc(1), 1);
> spin_unlock_irq(&gt->tlb_invalidation.lock);
>
> INIT_LIST_HEAD(&ifence->base.link);
> --
> 2.44.0
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 4/7] drm/xe: Rework rebinding
[not found] <20240321113720.120865-1-thomas.hellstrom@linux.intel.com>
` (4 preceding siblings ...)
2024-03-21 11:37 ` [PATCH 3/7] drm/xe: Make TLB invalidation fences unordered Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
5 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
To: intel-xe; +Cc: Thomas Hellström, Rodrigo Vivi, Matthew Brost, stable
Instead of handling the vm's rebind fence separately,
which is error-prone if the rebind fences are not strictly ordered,
attach rebind fences as kernel fences to the vm's resv.
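For reference, a condensed kernel-style sketch of the resulting flow follows;
it will not build outside a driver, sketch_bind_one() is a hypothetical
stand-in for xe_vm_bind_vma(), and in the real patch the dma_resv_add_fence()
call sits inside __xe_pt_bind_vma() with the resv lock held.
#include <linux/dma-fence.h>
#include <linux/dma-resv.h>
#include <linux/err.h>
#include <linux/sched.h>
/* Hypothetical stand-in for xe_vm_bind_vma(); not a real xe function. */
struct dma_fence *sketch_bind_one(void *vma);
/*
 * Each rebind fence is published on the VM's reservation object with
 * KERNEL usage and the local reference dropped, so the rebind path can
 * return an int and waiters simply block on the resv.
 */
static int sketch_rebind(struct dma_resv *vm_resv, void **vmas, int count)
{
	int i;
	for (i = 0; i < count; i++) {
		struct dma_fence *fence = sketch_bind_one(vmas[i]);
		if (IS_ERR(fence))
			return PTR_ERR(fence);
		dma_resv_add_fence(vm_resv, fence, DMA_RESV_USAGE_KERNEL);
		dma_fence_put(fence);
	}
	return 0;
}
/* Waiting side, as in the reworked preempt_rebind_work_func(). */
static long sketch_wait_rebinds(struct dma_resv *vm_resv)
{
	/* The real worker checks this result and bails out on failure. */
	return dma_resv_wait_timeout(vm_resv, DMA_RESV_USAGE_KERNEL, false,
				     MAX_SCHEDULE_TIMEOUT);
}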
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/xe/xe_exec.c | 31 +++----------------------------
drivers/gpu/drm/xe/xe_pt.c | 2 +-
drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++------------------
drivers/gpu/drm/xe/xe_vm.h | 2 +-
drivers/gpu/drm/xe/xe_vm_types.h | 3 ---
5 files changed, 14 insertions(+), 51 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 511d23426a5e..397a49b731f1 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -127,7 +127,6 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
struct drm_exec *exec = &vm_exec.exec;
u32 i, num_syncs = 0, num_ufence = 0;
struct xe_sched_job *job;
- struct dma_fence *rebind_fence;
struct xe_vm *vm;
bool write_locked, skip_retry = false;
ktime_t end = 0;
@@ -269,35 +268,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
* Rebind any invalidated userptr or evicted BOs in the VM, non-compute
* VM mode only.
*/
- rebind_fence = xe_vm_rebind(vm, false);
- if (IS_ERR(rebind_fence)) {
- err = PTR_ERR(rebind_fence);
+ err = xe_vm_rebind(vm, false);
+ if (err)
goto err_put_job;
- }
-
- /*
- * We store the rebind_fence in the VM so subsequent execs don't get
- * scheduled before the rebinds of userptrs / evicted BOs is complete.
- */
- if (rebind_fence) {
- dma_fence_put(vm->rebind_fence);
- vm->rebind_fence = rebind_fence;
- }
- if (vm->rebind_fence) {
- if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
- &vm->rebind_fence->flags)) {
- dma_fence_put(vm->rebind_fence);
- vm->rebind_fence = NULL;
- } else {
- dma_fence_get(vm->rebind_fence);
- err = drm_sched_job_add_dependency(&job->drm,
- vm->rebind_fence);
- if (err)
- goto err_put_job;
- }
- }
- /* Wait behind munmap style rebinds */
+ /* Wait behind rebinds */
if (!xe_vm_in_lr_mode(vm)) {
err = drm_sched_job_add_resv_dependencies(&job->drm,
xe_vm_resv(vm),
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 7add08b2954e..92cd660fbcef 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1304,7 +1304,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
}
/* add shared fence now for pagetable delayed destroy */
- dma_resv_add_fence(xe_vm_resv(vm), fence, !rebind &&
+ dma_resv_add_fence(xe_vm_resv(vm), fence, rebind ||
last_munmap_rebind ?
DMA_RESV_USAGE_KERNEL :
DMA_RESV_USAGE_BOOKKEEP);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index cff8db605743..33a85e702e2c 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -529,7 +529,6 @@ static void preempt_rebind_work_func(struct work_struct *w)
{
struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work);
struct drm_exec exec;
- struct dma_fence *rebind_fence;
unsigned int fence_count = 0;
LIST_HEAD(preempt_fences);
ktime_t end = 0;
@@ -575,18 +574,11 @@ static void preempt_rebind_work_func(struct work_struct *w)
if (err)
goto out_unlock;
- rebind_fence = xe_vm_rebind(vm, true);
- if (IS_ERR(rebind_fence)) {
- err = PTR_ERR(rebind_fence);
+ err = xe_vm_rebind(vm, true);
+ if (err)
goto out_unlock;
- }
-
- if (rebind_fence) {
- dma_fence_wait(rebind_fence, false);
- dma_fence_put(rebind_fence);
- }
- /* Wait on munmap style VM unbinds */
+ /* Wait on rebinds and munmap style VM unbinds */
wait = dma_resv_wait_timeout(xe_vm_resv(vm),
DMA_RESV_USAGE_KERNEL,
false, MAX_SCHEDULE_TIMEOUT);
@@ -780,14 +772,14 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
struct xe_sync_entry *syncs, u32 num_syncs,
bool first_op, bool last_op);
-struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
+int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
{
- struct dma_fence *fence = NULL;
+ struct dma_fence *fence;
struct xe_vma *vma, *next;
lockdep_assert_held(&vm->lock);
if (xe_vm_in_lr_mode(vm) && !rebind_worker)
- return NULL;
+ return 0;
xe_vm_assert_held(vm);
list_for_each_entry_safe(vma, next, &vm->rebind_list,
@@ -795,17 +787,17 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
xe_assert(vm->xe, vma->tile_present);
list_del_init(&vma->combined_links.rebind);
- dma_fence_put(fence);
if (rebind_worker)
trace_xe_vma_rebind_worker(vma);
else
trace_xe_vma_rebind_exec(vma);
fence = xe_vm_bind_vma(vma, NULL, NULL, 0, false, false);
if (IS_ERR(fence))
- return fence;
+ return PTR_ERR(fence);
+ dma_fence_put(fence);
}
- return fence;
+ return 0;
}
static void xe_vma_free(struct xe_vma *vma)
@@ -1586,7 +1578,6 @@ static void vm_destroy_work_func(struct work_struct *w)
XE_WARN_ON(vm->pt_root[id]);
trace_xe_vm_free(vm);
- dma_fence_put(vm->rebind_fence);
kfree(vm);
}
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 47f3960120a0..20009d8b4702 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -207,7 +207,7 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm);
int xe_vm_userptr_check_repin(struct xe_vm *vm);
-struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
+int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
int xe_vm_invalidate_vma(struct xe_vma *vma);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 5747f136d24d..badf3945083d 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -177,9 +177,6 @@ struct xe_vm {
*/
struct list_head rebind_list;
- /** @rebind_fence: rebind fence from execbuf */
- struct dma_fence *rebind_fence;
-
/**
* @destroy_work: worker to destroy VM, needed as a dma_fence signaling
* from an irq context can be last put and the destroy needs to be able
--
2.44.0
^ permalink raw reply related [flat|nested] 14+ messages in thread