* [PATCH v3 01/25] drm/xe: Drop struct xe_migrate_pt_update argument from populate/clear vfuns
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-03-05 14:17 ` Francois Dugast
2026-02-28 1:34 ` [PATCH v3 02/25] drm/xe: Add xe_migrate_update_pgtables_cpu_execute helper Matthew Brost
` (29 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Remove the xe_migrate_pt_update argument from the populate and clear
vfuns. This structure will not be available in run_job, where CPU binds
will be implemented. The populate path no longer needs it, and the clear
path already uses the VM field instead.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_migrate.c | 9 +++++----
drivers/gpu/drm/xe/xe_migrate.h | 12 +++++-------
drivers/gpu/drm/xe/xe_pt.c | 12 +++++-------
3 files changed, 15 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 8af6c347bea8..f7e3a044bd78 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1656,6 +1656,7 @@ static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs,
struct xe_migrate_pt_update *pt_update)
{
const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
+ struct xe_vm *vm = pt_update->vops->vm;
u32 chunk;
u32 ofs = update->ofs, size = update->qwords;
@@ -1687,10 +1688,10 @@ static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs,
bb->cs[bb->len++] = lower_32_bits(addr);
bb->cs[bb->len++] = upper_32_bits(addr);
if (pt_op->bind)
- ops->populate(pt_update, tile, NULL, bb->cs + bb->len,
+ ops->populate(tile, NULL, bb->cs + bb->len,
ofs, chunk, update);
else
- ops->clear(pt_update, tile, NULL, bb->cs + bb->len,
+ ops->clear(vm, tile, NULL, bb->cs + bb->len,
ofs, chunk, update);
bb->len += chunk * 2;
@@ -1747,12 +1748,12 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
&pt_op->entries[j];
if (pt_op->bind)
- ops->populate(pt_update, m->tile,
+ ops->populate(m->tile,
&update->pt_bo->vmap, NULL,
update->ofs, update->qwords,
update);
else
- ops->clear(pt_update, m->tile,
+ ops->clear(vm, m->tile,
&update->pt_bo->vmap, NULL,
update->ofs, update->qwords, update);
}
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index 1522afb37dcf..c3c0740f908d 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -40,7 +40,6 @@ enum xe_migrate_copy_dir {
struct xe_migrate_pt_update_ops {
/**
* @populate: Populate a command buffer or page-table with ptes.
- * @pt_update: Embeddable callback argument.
* @tile: The tile for the current operation.
* @map: struct iosys_map into the memory to be populated.
* @pos: If @map is NULL, map into the memory to be populated.
@@ -52,13 +51,12 @@ struct xe_migrate_pt_update_ops {
* page-table system to populate command buffers or shared
* page-tables with PTEs.
*/
- void (*populate)(struct xe_migrate_pt_update *pt_update,
- struct xe_tile *tile, struct iosys_map *map,
+ void (*populate)(struct xe_tile *tile, struct iosys_map *map,
void *pos, u32 ofs, u32 num_qwords,
const struct xe_vm_pgtable_update *update);
/**
* @clear: Clear a command buffer or page-table with ptes.
- * @pt_update: Embeddable callback argument.
+ * @vm: VM being updated
* @tile: The tile for the current operation.
* @map: struct iosys_map into the memory to be populated.
* @pos: If @map is NULL, map into the memory to be populated.
@@ -70,9 +68,9 @@ struct xe_migrate_pt_update_ops {
* page-table system to populate command buffers or shared
* page-tables with PTEs.
*/
- void (*clear)(struct xe_migrate_pt_update *pt_update,
- struct xe_tile *tile, struct iosys_map *map,
- void *pos, u32 ofs, u32 num_qwords,
+ void (*clear)(struct xe_vm *vm, struct xe_tile *tile,
+ struct iosys_map *map, void *pos, u32 ofs,
+ u32 num_qwords,
const struct xe_vm_pgtable_update *update);
/**
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 13b355fadd58..99b15d37267f 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -980,9 +980,8 @@ bool xe_pt_zap_ptes_range(struct xe_tile *tile, struct xe_vm *vm,
}
static void
-xe_vm_populate_pgtable(struct xe_migrate_pt_update *pt_update, struct xe_tile *tile,
- struct iosys_map *map, void *data,
- u32 qword_ofs, u32 num_qwords,
+xe_vm_populate_pgtable(struct xe_tile *tile, struct iosys_map *map,
+ void *data, u32 qword_ofs, u32 num_qwords,
const struct xe_vm_pgtable_update *update)
{
struct xe_pt_entry *ptes = update->pt_entries;
@@ -1811,12 +1810,11 @@ static unsigned int xe_pt_stage_unbind(struct xe_tile *tile,
}
static void
-xe_migrate_clear_pgtable_callback(struct xe_migrate_pt_update *pt_update,
- struct xe_tile *tile, struct iosys_map *map,
- void *ptr, u32 qword_ofs, u32 num_qwords,
+xe_migrate_clear_pgtable_callback(struct xe_vm *vm, struct xe_tile *tile,
+ struct iosys_map *map, void *ptr,
+ u32 qword_ofs, u32 num_qwords,
const struct xe_vm_pgtable_update *update)
{
- struct xe_vm *vm = pt_update->vops->vm;
u64 empty = __xe_pt_empty_pte(tile, vm, update->pt->level);
int i;
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 01/25] drm/xe: Drop struct xe_migrate_pt_update argument from populate/clear vfuns
2026-02-28 1:34 ` [PATCH v3 01/25] drm/xe: Drop struct xe_migrate_pt_update argument from populate/clear vfuns Matthew Brost
@ 2026-03-05 14:17 ` Francois Dugast
0 siblings, 0 replies; 63+ messages in thread
From: Francois Dugast @ 2026-03-05 14:17 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom
On Fri, Feb 27, 2026 at 05:34:37PM -0800, Matthew Brost wrote:
> Remove the xe_migrate_pt_update argument from the populate and clear
> vfuns. This structure will not be available in run_job, where CPU binds
> will be implemented. The populate path no longer needs it, and the clear
> path already uses the VM field instead.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
> ---
> drivers/gpu/drm/xe/xe_migrate.c | 9 +++++----
> drivers/gpu/drm/xe/xe_migrate.h | 12 +++++-------
> drivers/gpu/drm/xe/xe_pt.c | 12 +++++-------
> 3 files changed, 15 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 8af6c347bea8..f7e3a044bd78 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -1656,6 +1656,7 @@ static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs,
> struct xe_migrate_pt_update *pt_update)
> {
> const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
> + struct xe_vm *vm = pt_update->vops->vm;
> u32 chunk;
> u32 ofs = update->ofs, size = update->qwords;
>
> @@ -1687,10 +1688,10 @@ static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs,
> bb->cs[bb->len++] = lower_32_bits(addr);
> bb->cs[bb->len++] = upper_32_bits(addr);
> if (pt_op->bind)
> - ops->populate(pt_update, tile, NULL, bb->cs + bb->len,
> + ops->populate(tile, NULL, bb->cs + bb->len,
> ofs, chunk, update);
> else
> - ops->clear(pt_update, tile, NULL, bb->cs + bb->len,
> + ops->clear(vm, tile, NULL, bb->cs + bb->len,
> ofs, chunk, update);
>
> bb->len += chunk * 2;
> @@ -1747,12 +1748,12 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
> &pt_op->entries[j];
>
> if (pt_op->bind)
> - ops->populate(pt_update, m->tile,
> + ops->populate(m->tile,
> &update->pt_bo->vmap, NULL,
> update->ofs, update->qwords,
> update);
> else
> - ops->clear(pt_update, m->tile,
> + ops->clear(vm, m->tile,
> &update->pt_bo->vmap, NULL,
> update->ofs, update->qwords, update);
> }
> diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
> index 1522afb37dcf..c3c0740f908d 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.h
> +++ b/drivers/gpu/drm/xe/xe_migrate.h
> @@ -40,7 +40,6 @@ enum xe_migrate_copy_dir {
> struct xe_migrate_pt_update_ops {
> /**
> * @populate: Populate a command buffer or page-table with ptes.
> - * @pt_update: Embeddable callback argument.
> * @tile: The tile for the current operation.
> * @map: struct iosys_map into the memory to be populated.
> * @pos: If @map is NULL, map into the memory to be populated.
> @@ -52,13 +51,12 @@ struct xe_migrate_pt_update_ops {
> * page-table system to populate command buffers or shared
> * page-tables with PTEs.
> */
> - void (*populate)(struct xe_migrate_pt_update *pt_update,
> - struct xe_tile *tile, struct iosys_map *map,
> + void (*populate)(struct xe_tile *tile, struct iosys_map *map,
> void *pos, u32 ofs, u32 num_qwords,
> const struct xe_vm_pgtable_update *update);
> /**
> * @clear: Clear a command buffer or page-table with ptes.
> - * @pt_update: Embeddable callback argument.
> + * @vm: VM being updated
> * @tile: The tile for the current operation.
> * @map: struct iosys_map into the memory to be populated.
> * @pos: If @map is NULL, map into the memory to be populated.
> @@ -70,9 +68,9 @@ struct xe_migrate_pt_update_ops {
> * page-table system to populate command buffers or shared
> * page-tables with PTEs.
> */
> - void (*clear)(struct xe_migrate_pt_update *pt_update,
> - struct xe_tile *tile, struct iosys_map *map,
> - void *pos, u32 ofs, u32 num_qwords,
> + void (*clear)(struct xe_vm *vm, struct xe_tile *tile,
> + struct iosys_map *map, void *pos, u32 ofs,
> + u32 num_qwords,
> const struct xe_vm_pgtable_update *update);
>
> /**
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 13b355fadd58..99b15d37267f 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -980,9 +980,8 @@ bool xe_pt_zap_ptes_range(struct xe_tile *tile, struct xe_vm *vm,
> }
>
> static void
> -xe_vm_populate_pgtable(struct xe_migrate_pt_update *pt_update, struct xe_tile *tile,
> - struct iosys_map *map, void *data,
> - u32 qword_ofs, u32 num_qwords,
> +xe_vm_populate_pgtable(struct xe_tile *tile, struct iosys_map *map,
> + void *data, u32 qword_ofs, u32 num_qwords,
> const struct xe_vm_pgtable_update *update)
> {
> struct xe_pt_entry *ptes = update->pt_entries;
> @@ -1811,12 +1810,11 @@ static unsigned int xe_pt_stage_unbind(struct xe_tile *tile,
> }
>
> static void
> -xe_migrate_clear_pgtable_callback(struct xe_migrate_pt_update *pt_update,
> - struct xe_tile *tile, struct iosys_map *map,
> - void *ptr, u32 qword_ofs, u32 num_qwords,
> +xe_migrate_clear_pgtable_callback(struct xe_vm *vm, struct xe_tile *tile,
> + struct iosys_map *map, void *ptr,
> + u32 qword_ofs, u32 num_qwords,
> const struct xe_vm_pgtable_update *update)
> {
> - struct xe_vm *vm = pt_update->vops->vm;
> u64 empty = __xe_pt_empty_pte(tile, vm, update->pt->level);
> int i;
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v3 02/25] drm/xe: Add xe_migrate_update_pgtables_cpu_execute helper
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
2026-02-28 1:34 ` [PATCH v3 01/25] drm/xe: Drop struct xe_migrate_pt_update argument from populate/clear vfuns Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-03-05 14:39 ` Francois Dugast
2026-02-28 1:34 ` [PATCH v3 03/25] drm/xe: Decouple exec queue idle check from LRC Matthew Brost
` (28 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add the xe_migrate_update_pgtables_cpu_execute helper, which performs
the CPU-side page-table update. This will support implementing CPU
binds, as the submission backend can call this helper once a bind job’s
dependencies are resolved. While here, add assertions to provide basic
sanity checks on the function arguments.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_migrate.c | 58 ++++++++++++++++++++-------------
1 file changed, 35 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index f7e3a044bd78..69e6e3135ec6 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1715,6 +1715,38 @@ struct migrate_test_params {
container_of(_priv, struct migrate_test_params, base)
#endif
+static void
+xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct xe_tile *tile,
+ const struct xe_migrate_pt_update_ops *ops,
+ struct xe_vm_pgtable_update_op *pt_op,
+ int num_ops)
+{
+ u32 j, i;
+
+ for (j = 0; j < num_ops; ++j, ++pt_op) {
+ for (i = 0; i < pt_op->num_entries; i++) {
+ const struct xe_vm_pgtable_update *update =
+ &pt_op->entries[i];
+
+ xe_tile_assert(tile, update);
+ xe_tile_assert(tile, update->pt_bo);
+ xe_tile_assert(tile, !iosys_map_is_null(&update->pt_bo->vmap));
+
+ if (pt_op->bind)
+ ops->populate(tile, &update->pt_bo->vmap,
+ NULL, update->ofs, update->qwords,
+ update);
+ else
+ ops->clear(vm, tile, &update->pt_bo->vmap,
+ NULL, update->ofs, update->qwords,
+ update);
+ }
+ }
+
+ trace_xe_vm_cpu_bind(vm);
+ xe_device_wmb(vm->xe);
+}
+
static struct dma_fence *
xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
struct xe_migrate_pt_update *pt_update)
@@ -1727,7 +1759,6 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
struct xe_vm_pgtable_update_ops *pt_update_ops =
&pt_update->vops->pt_update_ops[pt_update->tile_id];
int err;
- u32 i, j;
if (XE_TEST_ONLY(test && test->force_gpu))
return ERR_PTR(-ETIME);
@@ -1739,28 +1770,9 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
return ERR_PTR(err);
}
- for (i = 0; i < pt_update_ops->num_ops; ++i) {
- const struct xe_vm_pgtable_update_op *pt_op =
- &pt_update_ops->ops[i];
-
- for (j = 0; j < pt_op->num_entries; j++) {
- const struct xe_vm_pgtable_update *update =
- &pt_op->entries[j];
-
- if (pt_op->bind)
- ops->populate(m->tile,
- &update->pt_bo->vmap, NULL,
- update->ofs, update->qwords,
- update);
- else
- ops->clear(vm, m->tile,
- &update->pt_bo->vmap, NULL,
- update->ofs, update->qwords, update);
- }
- }
-
- trace_xe_vm_cpu_bind(vm);
- xe_device_wmb(vm->xe);
+ xe_migrate_update_pgtables_cpu_execute(vm, m->tile, ops,
+ pt_update_ops->ops,
+ pt_update_ops->num_ops);
return dma_fence_get_stub();
}
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 02/25] drm/xe: Add xe_migrate_update_pgtables_cpu_execute helper
2026-02-28 1:34 ` [PATCH v3 02/25] drm/xe: Add xe_migrate_update_pgtables_cpu_execute helper Matthew Brost
@ 2026-03-05 14:39 ` Francois Dugast
0 siblings, 0 replies; 63+ messages in thread
From: Francois Dugast @ 2026-03-05 14:39 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom
On Fri, Feb 27, 2026 at 05:34:38PM -0800, Matthew Brost wrote:
> Add the xe_migrate_update_pgtables_cpu_execute helper, which performs
> the CPU-side page-table update. This will support implementing CPU
> binds, as the submission backend can call this helper once a bind job’s
> dependencies are resolved. While here, add assertions to provide basic
> sanity checks on tht function arguments.
s/tht/the/
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_migrate.c | 58 ++++++++++++++++++++-------------
> 1 file changed, 35 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index f7e3a044bd78..69e6e3135ec6 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -1715,6 +1715,38 @@ struct migrate_test_params {
> container_of(_priv, struct migrate_test_params, base)
> #endif
>
> +static void
> +xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct xe_tile *tile,
> + const struct xe_migrate_pt_update_ops *ops,
> + struct xe_vm_pgtable_update_op *pt_op,
> + int num_ops)
> +{
> + u32 j, i;
> +
> + for (j = 0; j < num_ops; ++j, ++pt_op) {
> + for (i = 0; i < pt_op->num_entries; i++) {
> + const struct xe_vm_pgtable_update *update =
> + &pt_op->entries[i];
> +
> + xe_tile_assert(tile, update);
> + xe_tile_assert(tile, update->pt_bo);
> + xe_tile_assert(tile, !iosys_map_is_null(&update->pt_bo->vmap));
> +
> + if (pt_op->bind)
> + ops->populate(tile, &update->pt_bo->vmap,
> + NULL, update->ofs, update->qwords,
> + update);
> + else
> + ops->clear(vm, tile, &update->pt_bo->vmap,
> + NULL, update->ofs, update->qwords,
> + update);
> + }
> + }
> +
> + trace_xe_vm_cpu_bind(vm);
> + xe_device_wmb(vm->xe);
> +}
> +
> static struct dma_fence *
> xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
> struct xe_migrate_pt_update *pt_update)
> @@ -1727,7 +1759,6 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
> struct xe_vm_pgtable_update_ops *pt_update_ops =
> &pt_update->vops->pt_update_ops[pt_update->tile_id];
> int err;
> - u32 i, j;
>
> if (XE_TEST_ONLY(test && test->force_gpu))
> return ERR_PTR(-ETIME);
> @@ -1739,28 +1770,9 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
> return ERR_PTR(err);
> }
>
> - for (i = 0; i < pt_update_ops->num_ops; ++i) {
> - const struct xe_vm_pgtable_update_op *pt_op =
> - &pt_update_ops->ops[i];
> -
> - for (j = 0; j < pt_op->num_entries; j++) {
> - const struct xe_vm_pgtable_update *update =
> - &pt_op->entries[j];
> -
> - if (pt_op->bind)
> - ops->populate(m->tile,
> - &update->pt_bo->vmap, NULL,
> - update->ofs, update->qwords,
> - update);
> - else
> - ops->clear(vm, m->tile,
> - &update->pt_bo->vmap, NULL,
> - update->ofs, update->qwords, update);
> - }
> - }
> -
> - trace_xe_vm_cpu_bind(vm);
> - xe_device_wmb(vm->xe);
> + xe_migrate_update_pgtables_cpu_execute(vm, m->tile, ops,
> + pt_update_ops->ops,
> + pt_update_ops->num_ops);
>
> return dma_fence_get_stub();
> }
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v3 03/25] drm/xe: Decouple exec queue idle check from LRC
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
2026-02-28 1:34 ` [PATCH v3 01/25] drm/xe: Drop struct xe_migrate_pt_update argument from populate/clear vfuns Matthew Brost
2026-02-28 1:34 ` [PATCH v3 02/25] drm/xe: Add xe_migrate_update_pgtables_cpu_execute helper Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-03-02 20:50 ` Summers, Stuart
2026-02-28 1:34 ` [PATCH v3 04/25] drm/xe: Add job count to GuC exec queue snapshot Matthew Brost
` (27 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
We already maintain a job count for each exec queue, so simplify the idle
check to rely on the job count rather than the LRC state. This decouples
exec queues from LRC-based backends and avoids unnecessarily coupling idle
detection to backend-specific implementation details.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 2d0e73a6a6ee..b3f700a9d425 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -1382,20 +1382,7 @@ bool xe_exec_queue_is_lr(struct xe_exec_queue *q)
*/
bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
{
- if (xe_exec_queue_is_parallel(q)) {
- int i;
-
- for (i = 0; i < q->width; ++i) {
- if (xe_lrc_seqno(q->lrc[i]) !=
- q->lrc[i]->fence_ctx.next_seqno - 1)
- return false;
- }
-
- return true;
- }
-
- return xe_lrc_seqno(q->lrc[0]) ==
- q->lrc[0]->fence_ctx.next_seqno - 1;
+ return !atomic_read(&q->job_cnt);
}
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 03/25] drm/xe: Decouple exec queue idle check from LRC
2026-02-28 1:34 ` [PATCH v3 03/25] drm/xe: Decouple exec queue idle check from LRC Matthew Brost
@ 2026-03-02 20:50 ` Summers, Stuart
2026-03-02 21:02 ` Matthew Brost
0 siblings, 1 reply; 63+ messages in thread
From: Summers, Stuart @ 2026-03-02 20:50 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: Ghimiray, Himal Prasad, Yadav, Arvind,
thomas.hellstrom@linux.intel.com, Dugast, Francois
On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> We already maintain a job count for each exec queue, so simplify the
> idle
> check to rely on the job count rather than the LRC state. This
> decouples
> exec queues from LRC-based backends and avoids unnecessarily coupling
> idle
> detection to backend-specific implementation details.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue.c | 15 +--------------
> 1 file changed, 1 insertion(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 2d0e73a6a6ee..b3f700a9d425 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -1382,20 +1382,7 @@ bool xe_exec_queue_is_lr(struct xe_exec_queue
> *q)
> */
> bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
> {
> - if (xe_exec_queue_is_parallel(q)) {
> - int i;
> -
> - for (i = 0; i < q->width; ++i) {
> - if (xe_lrc_seqno(q->lrc[i]) !=
> - q->lrc[i]->fence_ctx.next_seqno - 1)
> - return false;
> - }
> -
> - return true;
> - }
> -
> - return xe_lrc_seqno(q->lrc[0]) ==
> - q->lrc[0]->fence_ctx.next_seqno - 1;
> + return !atomic_read(&q->job_cnt);
Still looking through the series, so might be handled elsewhere, but
just looking at this patch alone, I'm a little worried this will cause
unexpected issues in the exec queue cleanup. This function currently
ensures that the job is idle from the hardware level. The change you
make here moves that to a software level check. And this is getting
decremented and checked before we tear down the exec queue. So
presumably, GuC and the command streamer could still be doing something
here and we're falsely telling other parts of the driver that rely on
the engine to really be idle to trust us here.
For reference, I'm looking at xe_sched_job_destroy() where we do the
decrement and then the exec queue put.
So my question is, how are we guaranteeing that hardware is indeed idle
after this change? Are we moving the sequence number check somewhere
else?
Thanks,
Stuart
> }
>
> /**
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 03/25] drm/xe: Decouple exec queue idle check from LRC
2026-03-02 20:50 ` Summers, Stuart
@ 2026-03-02 21:02 ` Matthew Brost
2026-03-03 21:26 ` Summers, Stuart
0 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-03-02 21:02 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, Ghimiray, Himal Prasad,
Yadav, Arvind, thomas.hellstrom@linux.intel.com, Dugast, Francois
On Mon, Mar 02, 2026 at 01:50:11PM -0700, Summers, Stuart wrote:
> On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> > We already maintain a job count for each exec queue, so simplify the
> > idle
> > check to rely on the job count rather than the LRC state. This
> > decouples
> > exec queues from LRC-based backends and avoids unnecessarily coupling
> > idle
> > detection to backend-specific implementation details.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_exec_queue.c | 15 +--------------
> > 1 file changed, 1 insertion(+), 14 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > index 2d0e73a6a6ee..b3f700a9d425 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > @@ -1382,20 +1382,7 @@ bool xe_exec_queue_is_lr(struct xe_exec_queue
> > *q)
> > */
> > bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
> > {
> > - if (xe_exec_queue_is_parallel(q)) {
> > - int i;
> > -
> > - for (i = 0; i < q->width; ++i) {
> > - if (xe_lrc_seqno(q->lrc[i]) !=
> > - q->lrc[i]->fence_ctx.next_seqno - 1)
> > - return false;
> > - }
> > -
> > - return true;
> > - }
> > -
> > - return xe_lrc_seqno(q->lrc[0]) ==
> > - q->lrc[0]->fence_ctx.next_seqno - 1;
> > + return !atomic_read(&q->job_cnt);
>
> Still looking through the series, so might be handled elsewhere, but
> just looking at this patch alone, I'm a little worried this will cause
> unexpected issues in the exec queue cleanup. This function currently
> ensures that the job is idle from the hardware level. The change you
The current check is actually incorrect if, for example, a queue is
reset and the LRC head != tail. However, I believe the only places we
use xe_exec_queue_is_idle are cases where a queue hasn’t been reset, so
it happens to work in practice. It’s also just an advisory check, so
nothing bad happens if it incorrectly reports "not idle".
> make here moves that to a software level check. And this is getting
> decremented and checked before we tear down the exec queue. So
> presumably, GuC and the command streamer could still be doing something
> here and we're falsely telling other parts of the driver that rely on
> the engine to really be idle to trust us here.
>
See above for part of the explanation, but the other part involves
reference counting and fence signaling. A job can only have its last
reference dropped when its fence is signaled.
A fence can only signal under the following conditions:
- Its seqno is incremented via ring instructions (which corresponds to
the LRC head == tail if it’s the last job on the queue).
- We time out jobs on the queue and signal their fences in software. We
only signal fences in software once the queue has been kicked off the
hardware (i.e., scheduling-disable H2G triggers a G2H response).
> For reference, I'm looking at xe_sched_job_destroy() where we do the
> decrement and then the exec queue put.
>
> So my question is, how are we guaranteeing that hardware is indeed idle
> after this change? Are we moving the sequence number check somewhere
> else?
>
I think above explains this.
Matt
> Thanks,
> Stuart
>
> > }
> >
> > /**
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 03/25] drm/xe: Decouple exec queue idle check from LRC
2026-03-02 21:02 ` Matthew Brost
@ 2026-03-03 21:26 ` Summers, Stuart
2026-03-03 22:42 ` Matthew Brost
0 siblings, 1 reply; 63+ messages in thread
From: Summers, Stuart @ 2026-03-03 21:26 UTC (permalink / raw)
To: Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, Ghimiray, Himal Prasad,
Yadav, Arvind, thomas.hellstrom@linux.intel.com, Dugast, Francois
On Mon, 2026-03-02 at 13:02 -0800, Matthew Brost wrote:
> > On Mon, Mar 02, 2026 at 01:50:11PM -0700, Summers, Stuart wrote:
> > > > On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> > > > > > We already maintain a job count for each exec queue, so
> > > > > > simplify > > > the
> > > > > > idle
> > > > > > check to rely on the job count rather than the LRC state.
> > > > > > This
> > > > > > decouples
> > > > > > exec queues from LRC-based backends and avoids
> > > > > > unnecessarily > > > coupling
> > > > > > idle
> > > > > > detection to backend-specific implementation details.
> > > > > >
> > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > > ---
> > > > > > drivers/gpu/drm/xe/xe_exec_queue.c | 15 +--------------
> > > > > > 1 file changed, 1 insertion(+), 14 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > index 2d0e73a6a6ee..b3f700a9d425 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > @@ -1382,20 +1382,7 @@ bool xe_exec_queue_is_lr(struct > >
> > > > > > > xe_exec_queue
> > > > > > *q)
> > > > > > */
> > > > > > bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
> > > > > > {
> > > > > > - if (xe_exec_queue_is_parallel(q)) {
> > > > > > - int i;
> > > > > > -
> > > > > > - for (i = 0; i < q->width; ++i) {
> > > > > > - if (xe_lrc_seqno(q->lrc[i]) !=
> > > > > > - q->lrc[i]->fence_ctx.next_seqno
> > > > > > - 1)
> > > > > > - return false;
> > > > > > - }
> > > > > > -
> > > > > > - return true;
> > > > > > - }
> > > > > > -
> > > > > > - return xe_lrc_seqno(q->lrc[0]) ==
> > > > > > - q->lrc[0]->fence_ctx.next_seqno - 1;
> > > > > > + return !atomic_read(&q->job_cnt);
> > > >
> > > > Still looking through the series, so might be handled
> > > > elsewhere, > > but
> > > > just looking at this patch alone, I'm a little worried this
> > > > will > > cause
> > > > unexpected issues in the exec queue cleanup. This function > >
> > > > currently
> > > > ensures that the job is idle from the hardware level. The
> > > > change > > you
> >
> > The current check is actually incorrect if, for example, a queue is
> > reset and the LRC head != tail. However, I believe the only places
> > we
> > use xe_exec_queue_is_idle are cases where a queue hasn’t been
> > reset, > so
> > it happens to work in practice. It’s also just an advisory check,
> > so
> > nothing bad happens if it incorrectly reports “not idle".
So reset case aside (which not taking into consideration anything you
said below :) I'd consider a bug here), it does give a false sense of
things being actually idle on the hardware IMO that might be extended
out to other areas without realizing in the future. I agree that the
current use cases match what you said.
> >
> > > > make here moves that to a software level check. And this is
> > > > getting
> > > > decremented and checked before we tear down the exec queue. So
> > > > presumably, GuC and the command streamer could still be doing >
> > > > > something
> > > > here and we're falsely telling other parts of the driver that
> > > > rely > > on
> > > > the engine to really be idle to trust us here.
> > > >
> >
> > See above for part of the explanation, but the other part involves
> > reference counting and fence signaling. A job can only have its
> > last
> > reference dropped when its fence is signaled.
> >
> > A fence can only signal under the following conditions:
> >
> > - Its seqno is incremented via ring instructions (which corresponds
> > > to
> > the LRC head == tail if it’s the last job on the queue).
Right, so technically I guess we could have a hardware hang after the
sequence number was written since that isn't the last instruction
there, but seems very unlikely. And if we did hit that case, the reset
handler would cover that.
Maybe this should be obvious... but just so I'm not missing something
here..
So I think the signaling here we're talking about is via the
MI_USER_INT in:
xe_hw_engine_handle_irq -> xe_hw_fence_rq_run
And that dependency you're talking about is here (xe_exec, although I
know there are a few in xe_migrate, xe_pt, etc)?
/* Wait behind rebinds */
if (!xe_vm_in_lr_mode(vm)) {
err = xe_sched_job_add_deps(job,
xe_vm_resv(vm),
DMA_RESV_USAGE_KERNEL);
if (err)
goto err_put_job;
}
What is the expectation for LR jobs?
Thanks,
Stuart
> > - We time out jobs on the queue and signal their fences in
> > software. > We
> > only signal fences in software once the queue has been kicked off
> > > the
> > hardware (i.e., scheduling-disable H2G triggers a G2H response).
> >
> > > > For reference, I'm looking at xe_sched_job_destroy() where we
> > > > do > > the
> > > > decrement and then the exec queue put.
> > > >
> > > > So my question is, how are we guaranteeing that hardware is
> > > > indeed > > idle
> > > > after this change? Are we moving the sequence number check > >
> > > > somewhere
> > > > else?
> > > >
> >
> > I think above explains this.
> >
> > Matt
> >
> > > > Thanks,
> > > > Stuart
> > > >
> > > > > > }
> > > > > >
> > > > > > /**
> > > >
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 03/25] drm/xe: Decouple exec queue idle check from LRC
2026-03-03 21:26 ` Summers, Stuart
@ 2026-03-03 22:42 ` Matthew Brost
2026-03-03 22:54 ` Summers, Stuart
0 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-03-03 22:42 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, Ghimiray, Himal Prasad,
Yadav, Arvind, thomas.hellstrom@linux.intel.com, Dugast, Francois
On Tue, Mar 03, 2026 at 02:26:56PM -0700, Summers, Stuart wrote:
> On Mon, 2026-03-02 at 13:02 -0800, Matthew Brost wrote:
> > > On Mon, Mar 02, 2026 at 01:50:11PM -0700, Summers, Stuart wrote:
> > > > > On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> > > > > > > We already maintain a job count for each exec queue, so
> > > > > > > simplify > > > the
> > > > > > > idle
> > > > > > > check to rely on the job count rather than the LRC state.
> > > > > > > This
> > > > > > > decouples
> > > > > > > exec queues from LRC-based backends and avoids
> > > > > > > unnecessarily > > > coupling
> > > > > > > idle
> > > > > > > detection to backend-specific implementation details.
> > > > > > >
> > > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > > > ---
> > > > > > > drivers/gpu/drm/xe/xe_exec_queue.c | 15 +--------------
> > > > > > > 1 file changed, 1 insertion(+), 14 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > > index 2d0e73a6a6ee..b3f700a9d425 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > > @@ -1382,20 +1382,7 @@ bool xe_exec_queue_is_lr(struct > >
> > > > > > > > xe_exec_queue
> > > > > > > *q)
> > > > > > > */
> > > > > > > bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
> > > > > > > {
> > > > > > > - if (xe_exec_queue_is_parallel(q)) {
> > > > > > > - int i;
> > > > > > > -
> > > > > > > - for (i = 0; i < q->width; ++i) {
> > > > > > > - if (xe_lrc_seqno(q->lrc[i]) !=
> > > > > > > - q->lrc[i]->fence_ctx.next_seqno
> > > > > > > - 1)
> > > > > > > - return false;
> > > > > > > - }
> > > > > > > -
> > > > > > > - return true;
> > > > > > > - }
> > > > > > > -
> > > > > > > - return xe_lrc_seqno(q->lrc[0]) ==
> > > > > > > - q->lrc[0]->fence_ctx.next_seqno - 1;
> > > > > > > + return !atomic_read(&q->job_cnt);
> > > > >
> > > > > Still looking through the series, so might be handled
> > > > > elsewhere, > > but
> > > > > just looking at this patch alone, I'm a little worried this
> > > > > will > > cause
> > > > > unexpected issues in the exec queue cleanup. This function > >
> > > > > currently
> > > > > ensures that the job is idle from the hardware level. The
> > > > > change > > you
> > >
> > > The current check is actually incorrect if, for example, a queue is
> > > reset and the LRC head != tail. However, I believe the only places
> > > we
> > > use xe_exec_queue_is_idle are cases where a queue hasn’t been
> > > reset, > so
> > > it happens to work in practice. It’s also just an advisory check,
> > > so
> > > nothing bad happens if it incorrectly reports "not idle".
>
> So reset case aside (which not taking into consideration anything you
> said below :) I'd consider a bug here), it does give a false sense of
> things being actually idle on the hardware IMO that might be extended
> out to other areas without realizing in the future. I agree that the
> current use cases match what you said.
>
Yes, so I would say this patch is actually improving things and opening
up this function to other possible use cases.
> > >
> > > > > make here moves that to a software level check. And this is
> > > > > getting
> > > > > decremented and checked before we tear down the exec queue. So
> > > > > presumably, GuC and the command streamer could still be doing >
> > > > > > something
> > > > > here and we're falsely telling other parts of the driver that
> > > > > rely > > on
> > > > > the engine to really be idle to trust us here.
> > > > >
> > >
> > > See above for part of the explanation, but the other part involves
> > > reference counting and fence signaling. A job can only have its
> > > last
> > > reference dropped when its fence is signaled.
> > >
> > > A fence can only signal under the following conditions:
> > >
> > > - Its seqno is incremented via ring instructions (which corresponds
> > > > to
> > > the LRC head == tail if it’s the last job on the queue).
>
> Right, so technically I guess we could have a hardware hang after the
> sequence number was written since that isn't the last instruction
> there, but seems very unlikely. And if we did hit that case, the reset
> handler would cover that.
>
> Maybe this should be obvious... but just so I'm not missing something
> here..
>
> So I think the signaling here we're talking about is via the
> MI_USER_INT in:
> xe_hw_engine_handle_irq -> xe_hw_fence_rq_run
This is where fences are signaled or if we time them out in
guc_exec_queue_timedout_job via xe_sched_job_set_error.
>
> And that dependency you're talking about is here (xe_exec, although I
> know there are a few in xe_migrate, xe_pt, etc)?
> /* Wait behind rebinds */
> if (!xe_vm_in_lr_mode(vm)) {
> err = xe_sched_job_add_deps(job,
> xe_vm_resv(vm),
> DMA_RESV_USAGE_KERNEL);
> if (err)
> goto err_put_job;
> }
>
> What is the expectation for LR jobs?
>
This is completely unrelated but in dma-fence mode (!xe_vm_in_lr_mode)
we can't fault the device so we issue rebinds in the current exec
IOCTL for anything that moved since the last exec IOCTL - this orders
exec IOCTL submission behind moving memory back into place + rebinding
it.
LR mode we either:
- Rebind in preempt rebind worker
- Let the device take a page fault and rebind
Because of this we don't even take the dma-resv lock for LR VMs in the
exec IOCTL.
Matt
> Thanks,
> Stuart
>
> > > - We time out jobs on the queue and signal their fences in
> > > software. > We
> > > only signal fences in software once the queue has been kicked off
> > > > the
> > > hardware (i.e., scheduling-disable H2G triggers a G2H response).
> > >
> > > > > For reference, I'm looking at xe_sched_job_destroy() where we
> > > > > do > > the
> > > > > decrement and then the exec queue put.
> > > > >
> > > > > So my question is, how are we guaranteeing that hardware is
> > > > > indeed > > idle
> > > > > after this change? Are we moving the sequence number check > >
> > > > > somewhere
> > > > > else?
> > > > >
> > >
> > > I think above explains this.
> > >
> > > Matt
> > >
> > > > > Thanks,
> > > > > Stuart
> > > > >
> > > > > > > }
> > > > > > >
> > > > > > > /**
> > > > >
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 03/25] drm/xe: Decouple exec queue idle check from LRC
2026-03-03 22:42 ` Matthew Brost
@ 2026-03-03 22:54 ` Summers, Stuart
0 siblings, 0 replies; 63+ messages in thread
From: Summers, Stuart @ 2026-03-03 22:54 UTC (permalink / raw)
To: Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, Ghimiray, Himal Prasad,
Yadav, Arvind, thomas.hellstrom@linux.intel.com, Dugast, Francois
On Tue, 2026-03-03 at 14:42 -0800, Matthew Brost wrote:
> On Tue, Mar 03, 2026 at 02:26:56PM -0700, Summers, Stuart wrote:
> > On Mon, 2026-03-02 at 13:02 -0800, Matthew Brost wrote:
> > > > On Mon, Mar 02, 2026 at 01:50:11PM -0700, Summers, Stuart
> > > > wrote:
> > > > > > On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> > > > > > > > We already maintain a job count for each exec queue, so
> > > > > > > > simplify > > > the
> > > > > > > > idle
> > > > > > > > check to rely on the job count rather than the LRC
> > > > > > > > state.
> > > > > > > > This
> > > > > > > > decouples
> > > > > > > > exec queues from LRC-based backends and avoids
> > > > > > > > unnecessarily > > > coupling
> > > > > > > > idle
> > > > > > > > detection to backend-specific implementation details.
> > > > > > > >
> > > > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > > > > ---
> > > > > > > > drivers/gpu/drm/xe/xe_exec_queue.c | 15 +-------------
> > > > > > > > -
> > > > > > > > 1 file changed, 1 insertion(+), 14 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > > > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > > > index 2d0e73a6a6ee..b3f700a9d425 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > > > @@ -1382,20 +1382,7 @@ bool xe_exec_queue_is_lr(struct
> > > > > > > > > >
> > > > > > > > > xe_exec_queue
> > > > > > > > *q)
> > > > > > > > */
> > > > > > > > bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
> > > > > > > > {
> > > > > > > > - if (xe_exec_queue_is_parallel(q)) {
> > > > > > > > - int i;
> > > > > > > > -
> > > > > > > > - for (i = 0; i < q->width; ++i) {
> > > > > > > > - if (xe_lrc_seqno(q->lrc[i]) !=
> > > > > > > > - q->lrc[i]-
> > > > > > > > >fence_ctx.next_seqno
> > > > > > > > - 1)
> > > > > > > > - return false;
> > > > > > > > - }
> > > > > > > > -
> > > > > > > > - return true;
> > > > > > > > - }
> > > > > > > > -
> > > > > > > > - return xe_lrc_seqno(q->lrc[0]) ==
> > > > > > > > - q->lrc[0]->fence_ctx.next_seqno - 1;
> > > > > > > > + return !atomic_read(&q->job_cnt);
> > > > > >
> > > > > > Still looking through the series, so might be handled
> > > > > > elsewhere, > > but
> > > > > > just looking at this patch alone, I'm a little worried this
> > > > > > will > > cause
> > > > > > unexpected issues in the exec queue cleanup. This function
> > > > > > > >
> > > > > > currently
> > > > > > ensures that the job is idle from the hardware level. The
> > > > > > change > > you
> > > >
> > > > The current check is actually incorrect if, for example, a
> > > > queue is
> > > > reset and the LRC head != tail. However, I believe the only
> > > > places
> > > > we
> > > > use xe_exec_queue_is_idle are cases where a queue hasn’t been
> > > > reset, > so
> > > > it happens to work in practice. It’s also just an advisory
> > > > check,
> > > > so
> > > > nothing bad happens if it incorrectly reports “not idle".
> >
> > So reset case aside (which not taking into consideration anything
> > you
> > said below :) I'd consider a bug here), it does give a false sense
> > of
> > things being actually idle on the hardware IMO that might be
> > extended
> > out to other areas without realizing in the future. I agree that
> > the
> > current use cases match what you said.
> >
>
> Yes, so I would say this patch is actually improving things and
> opening
> up this function to other possible use cases.
Agreed..
>
> > > >
> > > > > > make here moves that to a software level check. And this is
> > > > > > getting
> > > > > > decremented and checked before we tear down the exec queue.
> > > > > > So
> > > > > > presumably, GuC and the command streamer could still be
> > > > > > doing >
> > > > > > > something
> > > > > > here and we're falsely telling other parts of the driver
> > > > > > that
> > > > > > rely > > on
> > > > > > the engine to really be idle to trust us here.
> > > > > >
> > > >
> > > > See above for part of the explanation, but the other part
> > > > involves
> > > > reference counting and fence signaling. A job can only have its
> > > > last
> > > > reference dropped when its fence is signaled.
> > > >
> > > > A fence can only signal under the following conditions:
> > > >
> > > > - Its seqno is incremented via ring instructions (which
> > > > corresponds
> > > > > to
> > > > the LRC head == tail if it’s the last job on the queue).
> >
> > Right, so technically I guess we could have a hardware hang after
> > the
> > sequence number was written since that isn't the last instruction
> > there, but seems very unlikely. And if we did hit that case, the
> > reset
> > handler would cover that.
> >
> > Maybe this should be obvious... but just so I'm not missing
> > something
> > here..
> >
> > So I think the signaling here we're talking about is via the
> > MI_USER_INT in:
> > xe_hw_engine_handle_irq -> xe_hw_fence_rq_run
>
> This is where fences are signaled or if we time them out in
> guc_exec_queue_timedout_job via xe_sched_job_set_error.
Ah right..
>
> >
> > And that dependency you're talking about is here (xe_exec, although
> > I
> > know there are a few in xe_migrate, xe_pt, etc)?
> > /* Wait behind rebinds */
> > if (!xe_vm_in_lr_mode(vm)) {
> > err = xe_sched_job_add_deps(job,
> > xe_vm_resv(vm),
> > DMA_RESV_USAGE_KERNEL);
> > if (err)
> > goto err_put_job;
> > }
> >
> > What is the expectation for LR jobs?
> >
>
> This is completely unrelated but in dma-fence mode
> (!xe_vm_in_lr_mode)
> we can't fault the device so we issue rebinds in the current exec
> IOCTL for anything that moved since the last exec IOCTL - this
> ordering
> exec IOCTL submission behind moving memory back into place +
> rebinding
> it.
>
> LR mode we either:
> - Rebind in preempt rebind worker
> - Let the device take a page fault and rebind
>
> Because of this we don't even take the dma-resv lock for LR VMs in
> the
> exec IOCTL.
Yeah ok makes sense and I appreciate the explanation :)
Anyway I think with that I agree with the direction here.
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Thanks,
Stuart
>
> Matt
>
> > Thanks,
> > Stuart
> >
> > > > - We time out jobs on the queue and signal their fences in
> > > > software. > We
> > > > only signal fences in software once the queue has been kicked
> > > > off
> > > > > the
> > > > hardware (i.e., scheduling-disable H2G triggers a G2H
> > > > response).
> > > >
> > > > > > For reference, I'm looking at xe_sched_job_destroy() where
> > > > > > we
> > > > > > do > > the
> > > > > > decrement and then the exec queue put.
> > > > > >
> > > > > > So my question is, how are we guaranteeing that hardware is
> > > > > > indeed > > idle
> > > > > > after this change? Are we moving the sequence number check
> > > > > > > >
> > > > > > somewhere
> > > > > > else?
> > > > > >
> > > >
> > > > I think above explains this.
> > > >
> > > > Matt
> > > >
> > > > > > Thanks,
> > > > > > Stuart
> > > > > >
> > > > > > > > }
> > > > > > > >
> > > > > > > > /**
> > > > > >
> >
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v3 04/25] drm/xe: Add job count to GuC exec queue snapshot
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (2 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 03/25] drm/xe: Decouple exec queue idle check from LRC Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-03-02 20:50 ` Summers, Stuart
2026-02-28 1:34 ` [PATCH v3 05/25] drm/xe: Update xe_bo_put_deferred arguments to include writeback flag Matthew Brost
` (26 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add the job count to the GuC exec queue snapshot, as this is useful
debug information.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++
drivers/gpu/drm/xe/xe_guc_submit_types.h | 2 ++
2 files changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index ca7aa4f358d0..453af51fe87b 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -3207,6 +3207,7 @@ xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q)
snapshot->logical_mask = q->logical_mask;
snapshot->width = q->width;
snapshot->refcount = kref_read(&q->refcount);
+ snapshot->jobcount = atomic_read(&q->job_cnt);
snapshot->sched_timeout = sched->base.timeout;
snapshot->sched_props.timeslice_us = q->sched_props.timeslice_us;
snapshot->sched_props.preempt_timeout_us =
@@ -3279,6 +3280,7 @@ xe_guc_exec_queue_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps
drm_printf(p, "\tLogical mask: 0x%x\n", snapshot->logical_mask);
drm_printf(p, "\tWidth: %d\n", snapshot->width);
drm_printf(p, "\tRef: %d\n", snapshot->refcount);
+ drm_printf(p, "\tJob count: %d\n", snapshot->jobcount);
drm_printf(p, "\tTimeout: %ld (ms)\n", snapshot->sched_timeout);
drm_printf(p, "\tTimeslice: %u (us)\n",
snapshot->sched_props.timeslice_us);
diff --git a/drivers/gpu/drm/xe/xe_guc_submit_types.h b/drivers/gpu/drm/xe/xe_guc_submit_types.h
index 5ccc5f959bb3..529d4b1f83fc 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_submit_types.h
@@ -77,6 +77,8 @@ struct xe_guc_submit_exec_queue_snapshot {
u16 width;
/** @refcount: ref count of this exec queue */
u32 refcount;
+ /** @jobcount: job count of this exec queue */
+ u32 jobcount;
/**
* @sched_timeout: the time after which a job is removed from the
* scheduler.
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 04/25] drm/xe: Add job count to GuC exec queue snapshot
2026-02-28 1:34 ` [PATCH v3 04/25] drm/xe: Add job count to GuC exec queue snapshot Matthew Brost
@ 2026-03-02 20:50 ` Summers, Stuart
0 siblings, 0 replies; 63+ messages in thread
From: Summers, Stuart @ 2026-03-02 20:50 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: Ghimiray, Himal Prasad, Yadav, Arvind,
thomas.hellstrom@linux.intel.com, Dugast, Francois
On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> Add the job count to the GuC exec queue snapshot, as this is useful
> debug information.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++
> drivers/gpu/drm/xe/xe_guc_submit_types.h | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> b/drivers/gpu/drm/xe/xe_guc_submit.c
> index ca7aa4f358d0..453af51fe87b 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -3207,6 +3207,7 @@ xe_guc_exec_queue_snapshot_capture(struct
> xe_exec_queue *q)
> snapshot->logical_mask = q->logical_mask;
> snapshot->width = q->width;
> snapshot->refcount = kref_read(&q->refcount);
> + snapshot->jobcount = atomic_read(&q->job_cnt);
> snapshot->sched_timeout = sched->base.timeout;
> snapshot->sched_props.timeslice_us = q-
> >sched_props.timeslice_us;
> snapshot->sched_props.preempt_timeout_us =
> @@ -3279,6 +3280,7 @@ xe_guc_exec_queue_snapshot_print(struct
> xe_guc_submit_exec_queue_snapshot *snaps
> drm_printf(p, "\tLogical mask: 0x%x\n", snapshot-
> >logical_mask);
> drm_printf(p, "\tWidth: %d\n", snapshot->width);
> drm_printf(p, "\tRef: %d\n", snapshot->refcount);
> + drm_printf(p, "\tJob count: %d\n", snapshot->jobcount);
> drm_printf(p, "\tTimeout: %ld (ms)\n", snapshot-
> >sched_timeout);
> drm_printf(p, "\tTimeslice: %u (us)\n",
> snapshot->sched_props.timeslice_us);
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit_types.h
> b/drivers/gpu/drm/xe/xe_guc_submit_types.h
> index 5ccc5f959bb3..529d4b1f83fc 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_submit_types.h
> @@ -77,6 +77,8 @@ struct xe_guc_submit_exec_queue_snapshot {
> u16 width;
> /** @refcount: ref count of this exec queue */
> u32 refcount;
> + /** @jobcount: job count of this exec queue */
> + u32 jobcount;
> /**
> * @sched_timeout: the time after which a job is removed from
> the
> * scheduler.
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v3 05/25] drm/xe: Update xe_bo_put_deferred arguments to include writeback flag
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (3 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 04/25] drm/xe: Add job count to GuC exec queue snapshot Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-04-01 12:20 ` Francois Dugast
2026-02-28 1:34 ` [PATCH v3 06/25] drm/xe: Add XE_BO_FLAG_PUT_VM_ASYNC Matthew Brost
` (25 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Update the xe_bo_put_deferred arguments to include a writeback flag,
which indicates whether the BO was added to the deferred list. This is
useful when the caller needs to take additional actions after the BO has
been queued for deferred release.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_bo.h | 10 ++++++++--
drivers/gpu/drm/xe/xe_drm_client.c | 2 +-
drivers/gpu/drm/xe/xe_pt.c | 2 +-
3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index c914ab719f20..bf284ed47325 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -348,6 +348,8 @@ void __xe_bo_release_dummy(struct kref *kref);
* @bo: The bo to put.
* @deferred: List to which to add the buffer object if we cannot put, or
* NULL if the function is to put unconditionally.
+ * @added: BO was added to deferred list, written back to caller, can be NULL if
+ * writeback is not needed.
*
* Since the final freeing of an object includes both sleeping and (!)
* memory allocation in the dma_resv individualization, it's not ok
@@ -367,7 +369,8 @@ void __xe_bo_release_dummy(struct kref *kref);
* false otherwise.
*/
static inline bool
-xe_bo_put_deferred(struct xe_bo *bo, struct llist_head *deferred)
+xe_bo_put_deferred(struct xe_bo *bo, struct llist_head *deferred,
+ bool *added)
{
if (!deferred) {
xe_bo_put(bo);
@@ -377,6 +380,9 @@ xe_bo_put_deferred(struct xe_bo *bo, struct llist_head *deferred)
if (!kref_put(&bo->ttm.base.refcount, __xe_bo_release_dummy))
return false;
+ if (added)
+ *added = true;
+
return llist_add(&bo->freed, deferred);
}
@@ -393,7 +399,7 @@ xe_bo_put_async(struct xe_bo *bo)
{
struct xe_bo_dev *bo_device = &xe_bo_device(bo)->bo_device;
- if (xe_bo_put_deferred(bo, &bo_device->async_list))
+ if (xe_bo_put_deferred(bo, &bo_device->async_list, NULL))
schedule_work(&bo_device->async_free);
}
diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
index 84b66147bf49..45efe7a55427 100644
--- a/drivers/gpu/drm/xe/xe_drm_client.c
+++ b/drivers/gpu/drm/xe/xe_drm_client.c
@@ -246,7 +246,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
xe_assert(xef->xe, !list_empty(&bo->client_link));
}
- xe_bo_put_deferred(bo, &deferred);
+ xe_bo_put_deferred(bo, &deferred, NULL);
}
spin_unlock(&client->bos_lock);
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 99b15d37267f..83dacc91b7b3 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -212,7 +212,7 @@ void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head *deferred)
XE_WARN_ON(!list_empty(&pt->bo->ttm.base.gpuva.list));
xe_bo_unpin(pt->bo);
- xe_bo_put_deferred(pt->bo, deferred);
+ xe_bo_put_deferred(pt->bo, deferred, NULL);
if (pt->level > 0 && pt->num_live) {
struct xe_pt_dir *pt_dir = as_xe_pt_dir(pt);
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 05/25] drm/xe: Update xe_bo_put_deferred arguments to include writeback flag
2026-02-28 1:34 ` [PATCH v3 05/25] drm/xe: Update xe_bo_put_deferred arguments to include writeback flag Matthew Brost
@ 2026-04-01 12:20 ` Francois Dugast
2026-04-01 22:39 ` Matthew Brost
0 siblings, 1 reply; 63+ messages in thread
From: Francois Dugast @ 2026-04-01 12:20 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom
On Fri, Feb 27, 2026 at 05:34:41PM -0800, Matthew Brost wrote:
> Update the xe_bo_put_deferred arguments to include a writeback flag,
> which indicates whether the BO was added to the deferred list. This is
> useful when the caller needs to take additional actions after the BO has
> been queued for deferred release.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_bo.h | 10 ++++++++--
> drivers/gpu/drm/xe/xe_drm_client.c | 2 +-
> drivers/gpu/drm/xe/xe_pt.c | 2 +-
> 3 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index c914ab719f20..bf284ed47325 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -348,6 +348,8 @@ void __xe_bo_release_dummy(struct kref *kref);
> * @bo: The bo to put.
> * @deferred: List to which to add the buffer object if we cannot put, or
> * NULL if the function is to put unconditionally.
> + * @added: BO was added to deferred list, written back to caller, can be NULL if
> + * writeback is not needed.
We should add in this comment that if set by the caller then it must be
false, because this function only sets it to true. With that:
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
> *
> * Since the final freeing of an object includes both sleeping and (!)
> * memory allocation in the dma_resv individualization, it's not ok
> @@ -367,7 +369,8 @@ void __xe_bo_release_dummy(struct kref *kref);
> * false otherwise.
> */
> static inline bool
> -xe_bo_put_deferred(struct xe_bo *bo, struct llist_head *deferred)
> +xe_bo_put_deferred(struct xe_bo *bo, struct llist_head *deferred,
> + bool *added)
> {
> if (!deferred) {
> xe_bo_put(bo);
> @@ -377,6 +380,9 @@ xe_bo_put_deferred(struct xe_bo *bo, struct llist_head *deferred)
> if (!kref_put(&bo->ttm.base.refcount, __xe_bo_release_dummy))
> return false;
>
> + if (added)
> + *added = true;
> +
> return llist_add(&bo->freed, deferred);
> }
>
> @@ -393,7 +399,7 @@ xe_bo_put_async(struct xe_bo *bo)
> {
> struct xe_bo_dev *bo_device = &xe_bo_device(bo)->bo_device;
>
> - if (xe_bo_put_deferred(bo, &bo_device->async_list))
> + if (xe_bo_put_deferred(bo, &bo_device->async_list, NULL))
> schedule_work(&bo_device->async_free);
> }
>
> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
> index 84b66147bf49..45efe7a55427 100644
> --- a/drivers/gpu/drm/xe/xe_drm_client.c
> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
> @@ -246,7 +246,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
> xe_assert(xef->xe, !list_empty(&bo->client_link));
> }
>
> - xe_bo_put_deferred(bo, &deferred);
> + xe_bo_put_deferred(bo, &deferred, NULL);
> }
> spin_unlock(&client->bos_lock);
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 99b15d37267f..83dacc91b7b3 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -212,7 +212,7 @@ void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head *deferred)
>
> XE_WARN_ON(!list_empty(&pt->bo->ttm.base.gpuva.list));
> xe_bo_unpin(pt->bo);
> - xe_bo_put_deferred(pt->bo, deferred);
> + xe_bo_put_deferred(pt->bo, deferred, NULL);
>
> if (pt->level > 0 && pt->num_live) {
> struct xe_pt_dir *pt_dir = as_xe_pt_dir(pt);
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 05/25] drm/xe: Update xe_bo_put_deferred arguments to include writeback flag
2026-04-01 12:20 ` Francois Dugast
@ 2026-04-01 22:39 ` Matthew Brost
0 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-04-01 22:39 UTC (permalink / raw)
To: Francois Dugast
Cc: intel-xe, stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom
On Wed, Apr 01, 2026 at 02:20:06PM +0200, Francois Dugast wrote:
> On Fri, Feb 27, 2026 at 05:34:41PM -0800, Matthew Brost wrote:
> > Update the xe_bo_put_deferred arguments to include a writeback flag,
> > which indicates whether the BO was added to the deferred list. This is
> > useful when the caller needs to take additional actions after the BO has
> > been queued for deferred release.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_bo.h | 10 ++++++++--
> > drivers/gpu/drm/xe/xe_drm_client.c | 2 +-
> > drivers/gpu/drm/xe/xe_pt.c | 2 +-
> > 3 files changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > index c914ab719f20..bf284ed47325 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.h
> > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > @@ -348,6 +348,8 @@ void __xe_bo_release_dummy(struct kref *kref);
> > * @bo: The bo to put.
> > * @deferred: List to which to add the buffer object if we cannot put, or
> > * NULL if the function is to put unconditionally.
> > + * @added: BO was added to deferred list, written back to caller, can be NULL if
> > + * writeback is not needed.
>
> We should add in this comment that if set by the caller then it must be
> false, because this function only sets it to true. With that:
>
Yes, or I could just change this semantic as it is kinda goofy.
Matt
> Reviewed-by: Francois Dugast <francois.dugast@intel.com>
>
> > *
> > * Since the final freeing of an object includes both sleeping and (!)
> > * memory allocation in the dma_resv individualization, it's not ok
> > @@ -367,7 +369,8 @@ void __xe_bo_release_dummy(struct kref *kref);
> > * false otherwise.
> > */
> > static inline bool
> > -xe_bo_put_deferred(struct xe_bo *bo, struct llist_head *deferred)
> > +xe_bo_put_deferred(struct xe_bo *bo, struct llist_head *deferred,
> > + bool *added)
> > {
> > if (!deferred) {
> > xe_bo_put(bo);
> > @@ -377,6 +380,9 @@ xe_bo_put_deferred(struct xe_bo *bo, struct llist_head *deferred)
> > if (!kref_put(&bo->ttm.base.refcount, __xe_bo_release_dummy))
> > return false;
> >
> > + if (added)
> > + *added = true;
> > +
> > return llist_add(&bo->freed, deferred);
> > }
> >
> > @@ -393,7 +399,7 @@ xe_bo_put_async(struct xe_bo *bo)
> > {
> > struct xe_bo_dev *bo_device = &xe_bo_device(bo)->bo_device;
> >
> > - if (xe_bo_put_deferred(bo, &bo_device->async_list))
> > + if (xe_bo_put_deferred(bo, &bo_device->async_list, NULL))
> > schedule_work(&bo_device->async_free);
> > }
> >
> > diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
> > index 84b66147bf49..45efe7a55427 100644
> > --- a/drivers/gpu/drm/xe/xe_drm_client.c
> > +++ b/drivers/gpu/drm/xe/xe_drm_client.c
> > @@ -246,7 +246,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
> > xe_assert(xef->xe, !list_empty(&bo->client_link));
> > }
> >
> > - xe_bo_put_deferred(bo, &deferred);
> > + xe_bo_put_deferred(bo, &deferred, NULL);
> > }
> > spin_unlock(&client->bos_lock);
> >
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > index 99b15d37267f..83dacc91b7b3 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -212,7 +212,7 @@ void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head *deferred)
> >
> > XE_WARN_ON(!list_empty(&pt->bo->ttm.base.gpuva.list));
> > xe_bo_unpin(pt->bo);
> > - xe_bo_put_deferred(pt->bo, deferred);
> > + xe_bo_put_deferred(pt->bo, deferred, NULL);
> >
> > if (pt->level > 0 && pt->num_live) {
> > struct xe_pt_dir *pt_dir = as_xe_pt_dir(pt);
> > --
> > 2.34.1
> >
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v3 06/25] drm/xe: Add XE_BO_FLAG_PUT_VM_ASYNC
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (4 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 05/25] drm/xe: Update xe_bo_put_deferred arguments to include writeback flag Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-04-01 12:22 ` Francois Dugast
2026-02-28 1:34 ` [PATCH v3 07/25] drm/xe: Update scheduler job layer to support PT jobs Matthew Brost
` (24 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add XE_BO_FLAG_PUT_VM_ASYNC, which indicates that an async BO put must
also drop an additional reference to the BO’s VM. This is useful when a
kernel BO, one that does not normally hold a VM reference, needs to be
put asynchronously, ensuring the shared dma-resv object does not
disappear before the BO.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 8 +++++++-
drivers/gpu/drm/xe/xe_bo.h | 1 +
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 8ff193600443..d4c8ef8ff2c3 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -3567,8 +3567,14 @@ void xe_bo_put_commit(struct llist_head *deferred)
if (!freed)
return;
- llist_for_each_entry_safe(bo, next, freed, freed)
+ llist_for_each_entry_safe(bo, next, freed, freed) {
+ struct xe_vm *vm = bo->vm;
+ bool async = bo->flags & XE_BO_FLAG_PUT_VM_ASYNC;
+
drm_gem_object_free(&bo->ttm.base.refcount);
+ if (async)
+ xe_vm_put(vm);
+ }
}
static void xe_bo_dev_work_func(struct work_struct *work)
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index bf284ed47325..14be9c1bb09d 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -52,6 +52,7 @@
#define XE_BO_FLAG_CPU_ADDR_MIRROR BIT(24)
#define XE_BO_FLAG_FORCE_USER_VRAM BIT(25)
#define XE_BO_FLAG_NO_COMPRESSION BIT(26)
+#define XE_BO_FLAG_PUT_VM_ASYNC BIT(27)
/* this one is trigger internally only */
#define XE_BO_FLAG_INTERNAL_TEST BIT(30)
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 06/25] drm/xe: Add XE_BO_FLAG_PUT_VM_ASYNC
2026-02-28 1:34 ` [PATCH v3 06/25] drm/xe: Add XE_BO_FLAG_PUT_VM_ASYNC Matthew Brost
@ 2026-04-01 12:22 ` Francois Dugast
2026-04-01 22:38 ` Matthew Brost
0 siblings, 1 reply; 63+ messages in thread
From: Francois Dugast @ 2026-04-01 12:22 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom
On Fri, Feb 27, 2026 at 05:34:42PM -0800, Matthew Brost wrote:
> Add XE_BO_FLAG_PUT_VM_ASYNC, which indicates that an async BO put must
> also drop an additional reference to the BO’s VM. This is useful when a
> kernel BO, one that does not normally hold a VM reference, needs to be
> put asynchronously, ensuring the shared dma-resv object does not
> disappear before the BO.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_bo.c | 8 +++++++-
> drivers/gpu/drm/xe/xe_bo.h | 1 +
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 8ff193600443..d4c8ef8ff2c3 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -3567,8 +3567,14 @@ void xe_bo_put_commit(struct llist_head *deferred)
> if (!freed)
> return;
>
> - llist_for_each_entry_safe(bo, next, freed, freed)
> + llist_for_each_entry_safe(bo, next, freed, freed) {
> + struct xe_vm *vm = bo->vm;
> + bool async = bo->flags & XE_BO_FLAG_PUT_VM_ASYNC;
> +
> drm_gem_object_free(&bo->ttm.base.refcount);
> + if (async)
> + xe_vm_put(vm);
> + }
> }
>
> static void xe_bo_dev_work_func(struct work_struct *work)
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index bf284ed47325..14be9c1bb09d 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -52,6 +52,7 @@
> #define XE_BO_FLAG_CPU_ADDR_MIRROR BIT(24)
> #define XE_BO_FLAG_FORCE_USER_VRAM BIT(25)
> #define XE_BO_FLAG_NO_COMPRESSION BIT(26)
> +#define XE_BO_FLAG_PUT_VM_ASYNC BIT(27)
If I understand correctly, this flag should only be used in the deferred
path, which is why it is ignored in the non-deferred path. Maybe it would
be good to add an assert there to make sure we are not mistakenly leaking
a VM reference.
Francois
>
> /* this one is trigger internally only */
> #define XE_BO_FLAG_INTERNAL_TEST BIT(30)
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 06/25] drm/xe: Add XE_BO_FLAG_PUT_VM_ASYNC
2026-04-01 12:22 ` Francois Dugast
@ 2026-04-01 22:38 ` Matthew Brost
0 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-04-01 22:38 UTC (permalink / raw)
To: Francois Dugast
Cc: intel-xe, stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom
On Wed, Apr 01, 2026 at 02:22:09PM +0200, Francois Dugast wrote:
> On Fri, Feb 27, 2026 at 05:34:42PM -0800, Matthew Brost wrote:
> > Add XE_BO_FLAG_PUT_VM_ASYNC, which indicates that an async BO put must
> > also drop an additional reference to the BO’s VM. This is useful when a
> > kernel BO, one that does not normally hold a VM reference, needs to be
> > put asynchronously, ensuring the shared dma-resv object does not
> > disappear before the BO.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_bo.c | 8 +++++++-
> > drivers/gpu/drm/xe/xe_bo.h | 1 +
> > 2 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > index 8ff193600443..d4c8ef8ff2c3 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -3567,8 +3567,14 @@ void xe_bo_put_commit(struct llist_head *deferred)
> > if (!freed)
> > return;
> >
> > - llist_for_each_entry_safe(bo, next, freed, freed)
> > + llist_for_each_entry_safe(bo, next, freed, freed) {
> > + struct xe_vm *vm = bo->vm;
> > + bool async = bo->flags & XE_BO_FLAG_PUT_VM_ASYNC;
> > +
> > drm_gem_object_free(&bo->ttm.base.refcount);
> > + if (async)
> > + xe_vm_put(vm);
> > + }
> > }
> >
> > static void xe_bo_dev_work_func(struct work_struct *work)
> > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > index bf284ed47325..14be9c1bb09d 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.h
> > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > @@ -52,6 +52,7 @@
> > #define XE_BO_FLAG_CPU_ADDR_MIRROR BIT(24)
> > #define XE_BO_FLAG_FORCE_USER_VRAM BIT(25)
> > #define XE_BO_FLAG_NO_COMPRESSION BIT(26)
> > +#define XE_BO_FLAG_PUT_VM_ASYNC BIT(27)
>
> If I understand correctly, this flag should only be used in the deferred
> path, which is why it is ignored in the non-deferred path. Maybe it would
> be good to add an assert there to make sure we are not mistakenly leaking
> a VM reference.
>
I think we'd have to clear the flag here then too as we still get into
xe_gem_object_free + xe_ttm_bo_destroy eventually just not in the
natural ref counted way...
But that is probably a good idea though to make sure we don't screw this
up.
Matt
> Francois
>
> >
> > /* this one is trigger internally only */
> > #define XE_BO_FLAG_INTERNAL_TEST BIT(30)
> > --
> > 2.34.1
> >
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v3 07/25] drm/xe: Update scheduler job layer to support PT jobs
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (5 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 06/25] drm/xe: Add XE_BO_FLAG_PUT_VM_ASYNC Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-03-03 22:50 ` Summers, Stuart
2026-02-28 1:34 ` [PATCH v3 08/25] drm/xe: Add helpers to access PT ops Matthew Brost
` (23 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Update the scheduler job layer to support PT jobs. PT jobs are executed
entirely on the CPU and do not require LRC fences or a batch address.
Repurpose the LRC fence storage to hold PT‑job arguments and update the
scheduler job layer to distinguish between PT jobs and jobs that require
an LRC.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_sched_job.c | 92 ++++++++++++++++---------
drivers/gpu/drm/xe/xe_sched_job_types.h | 31 ++++++++-
2 files changed, 88 insertions(+), 35 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index ae5b38b2a884..a8ba7f90368f 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -26,19 +26,22 @@ static struct kmem_cache *xe_sched_job_parallel_slab;
int __init xe_sched_job_module_init(void)
{
+ struct xe_sched_job *job;
+ size_t size;
+
+ size = struct_size(job, ptrs, 1);
xe_sched_job_slab =
- kmem_cache_create("xe_sched_job",
- sizeof(struct xe_sched_job) +
- sizeof(struct xe_job_ptrs), 0,
+ kmem_cache_create("xe_sched_job", size, 0,
SLAB_HWCACHE_ALIGN, NULL);
if (!xe_sched_job_slab)
return -ENOMEM;
+ size = max_t(size_t,
+ struct_size(job, ptrs,
+ XE_HW_ENGINE_MAX_INSTANCE),
+ struct_size(job, pt_update, 1));
xe_sched_job_parallel_slab =
- kmem_cache_create("xe_sched_job_parallel",
- sizeof(struct xe_sched_job) +
- sizeof(struct xe_job_ptrs) *
- XE_HW_ENGINE_MAX_INSTANCE, 0,
+ kmem_cache_create("xe_sched_job_parallel", size, 0,
SLAB_HWCACHE_ALIGN, NULL);
if (!xe_sched_job_parallel_slab) {
kmem_cache_destroy(xe_sched_job_slab);
@@ -84,7 +87,7 @@ static void xe_sched_job_free_fences(struct xe_sched_job *job)
{
int i;
- for (i = 0; i < job->q->width; ++i) {
+ for (i = 0; !job->is_pt_job && i < job->q->width; ++i) {
struct xe_job_ptrs *ptrs = &job->ptrs[i];
if (ptrs->lrc_fence)
@@ -93,10 +96,23 @@ static void xe_sched_job_free_fences(struct xe_sched_job *job)
}
}
+/**
+ * xe_sched_job_create() - Create a scheduler job
+ * @q: exec queue to create the scheduler job for
+ * @batch: array of batch addresses for the job; must match the width of @q,
+ * or NULL to indicate a PT job that does not require a batch address
+ *
+ * Create a scheduler job for submission.
+ *
+ * Context: Reclaim
+ *
+ * Return: a &xe_sched_job object on success, or an ERR_PTR on failure.
+ */
struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
u64 *batch_addr)
{
bool is_migration = xe_sched_job_is_migration(q);
+ struct xe_device *xe = gt_to_xe(q->gt);
struct xe_sched_job *job;
int err;
int i;
@@ -105,6 +121,9 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
/* only a kernel context can submit a vm-less job */
XE_WARN_ON(!q->vm && !(q->flags & EXEC_QUEUE_FLAG_KERNEL));
+ xe_assert(xe, batch_addr ||
+ q->flags & (EXEC_QUEUE_FLAG_VM | EXEC_QUEUE_FLAG_MIGRATE));
+
job = job_alloc(xe_exec_queue_is_parallel(q) || is_migration);
if (!job)
return ERR_PTR(-ENOMEM);
@@ -119,34 +138,39 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
if (err)
goto err_free;
- for (i = 0; i < q->width; ++i) {
- struct dma_fence *fence = xe_lrc_alloc_seqno_fence();
- struct dma_fence_chain *chain;
-
- if (IS_ERR(fence)) {
- err = PTR_ERR(fence);
- goto err_sched_job;
+ if (!batch_addr) {
+ job->fence = dma_fence_get_stub();
+ job->is_pt_job = true;
+ } else {
+ for (i = 0; i < q->width; ++i) {
+ struct dma_fence *fence = xe_lrc_alloc_seqno_fence();
+ struct dma_fence_chain *chain;
+
+ if (IS_ERR(fence)) {
+ err = PTR_ERR(fence);
+ goto err_sched_job;
+ }
+ job->ptrs[i].lrc_fence = fence;
+
+ if (i + 1 == q->width)
+ continue;
+
+ chain = dma_fence_chain_alloc();
+ if (!chain) {
+ err = -ENOMEM;
+ goto err_sched_job;
+ }
+ job->ptrs[i].chain_fence = chain;
}
- job->ptrs[i].lrc_fence = fence;
- if (i + 1 == q->width)
- continue;
+ width = q->width;
+ if (is_migration)
+ width = 2;
- chain = dma_fence_chain_alloc();
- if (!chain) {
- err = -ENOMEM;
- goto err_sched_job;
- }
- job->ptrs[i].chain_fence = chain;
+ for (i = 0; i < width; ++i)
+ job->ptrs[i].batch_addr = batch_addr[i];
}
- width = q->width;
- if (is_migration)
- width = 2;
-
- for (i = 0; i < width; ++i)
- job->ptrs[i].batch_addr = batch_addr[i];
-
atomic_inc(&q->job_cnt);
xe_pm_runtime_get_noresume(job_to_xe(job));
trace_xe_sched_job_create(job);
@@ -246,7 +270,7 @@ bool xe_sched_job_completed(struct xe_sched_job *job)
void xe_sched_job_arm(struct xe_sched_job *job)
{
struct xe_exec_queue *q = job->q;
- struct dma_fence *fence, *prev;
+ struct dma_fence *fence = job->fence, *prev;
struct xe_vm *vm = q->vm;
u64 seqno = 0;
int i;
@@ -266,6 +290,9 @@ void xe_sched_job_arm(struct xe_sched_job *job)
job->ring_ops_flush_tlb = true;
}
+ if (job->is_pt_job)
+ goto arm;
+
/* Arm the pre-allocated fences */
for (i = 0; i < q->width; prev = fence, ++i) {
struct dma_fence_chain *chain;
@@ -286,6 +313,7 @@ void xe_sched_job_arm(struct xe_sched_job *job)
fence = &chain->base;
}
+arm:
job->fence = dma_fence_get(fence); /* Pairs with put in scheduler */
drm_sched_job_arm(&job->drm);
}
diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
index 13c2970e81a8..9be4e2c5989d 100644
--- a/drivers/gpu/drm/xe/xe_sched_job_types.h
+++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
@@ -10,10 +10,29 @@
#include <drm/gpu_scheduler.h>
-struct xe_exec_queue;
struct dma_fence;
struct dma_fence_chain;
+struct xe_exec_queue;
+struct xe_migrate_pt_update_ops;
+struct xe_pt_job_ops;
+struct xe_tile;
+struct xe_vm;
+
+/**
+ * struct xe_pt_update_args - PT update arguments
+ */
+struct xe_pt_update_args {
+ /** @vm: VM which is being bound */
+ struct xe_vm *vm;
+ /** @tile: Tile which page tables belong to */
+ struct xe_tile *tile;
+ /** @ops: Migrate PT update ops */
+ const struct xe_migrate_pt_update_ops *ops;
+ /** @pt_job_ops: PT job ops state */
+ struct xe_pt_job_ops *pt_job_ops;
+};
+
/**
* struct xe_job_ptrs - Per hw engine instance data
*/
@@ -69,8 +88,14 @@ struct xe_sched_job {
bool restore_replay;
/** @last_replay: last job being replayed */
bool last_replay;
- /** @ptrs: per instance pointers. */
- struct xe_job_ptrs ptrs[];
+ /** @is_pt_job: is a PT job */
+ bool is_pt_job;
+ union {
+ /** @ptrs: per instance pointers. */
+ DECLARE_FLEX_ARRAY(struct xe_job_ptrs, ptrs);
+ /** @pt_update: PT update arguments */
+ DECLARE_FLEX_ARRAY(struct xe_pt_update_args, pt_update);
+ };
};
struct xe_sched_job_snapshot {
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 07/25] drm/xe: Update scheduler job layer to support PT jobs
2026-02-28 1:34 ` [PATCH v3 07/25] drm/xe: Update scheduler job layer to support PT jobs Matthew Brost
@ 2026-03-03 22:50 ` Summers, Stuart
2026-03-03 23:00 ` Matthew Brost
0 siblings, 1 reply; 63+ messages in thread
From: Summers, Stuart @ 2026-03-03 22:50 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: Ghimiray, Himal Prasad, Yadav, Arvind,
thomas.hellstrom@linux.intel.com, Dugast, Francois
On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> Update the scheduler job layer to support PT jobs. PT jobs are
> executed
> entirely on the CPU and do not require LRC fences or a batch address.
> Repurpose the LRC fence storage to hold PT‑job arguments and update
> the
> scheduler job layer to distinguish between PT jobs and jobs that
> require
> an LRC.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_sched_job.c | 92 ++++++++++++++++-------
> --
> drivers/gpu/drm/xe/xe_sched_job_types.h | 31 ++++++++-
> 2 files changed, 88 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c
> b/drivers/gpu/drm/xe/xe_sched_job.c
> index ae5b38b2a884..a8ba7f90368f 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.c
> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> @@ -26,19 +26,22 @@ static struct kmem_cache
> *xe_sched_job_parallel_slab;
>
> int __init xe_sched_job_module_init(void)
> {
> + struct xe_sched_job *job;
> + size_t size;
> +
> + size = struct_size(job, ptrs, 1);
Nice..
> xe_sched_job_slab =
> - kmem_cache_create("xe_sched_job",
> - sizeof(struct xe_sched_job) +
> - sizeof(struct xe_job_ptrs), 0,
> + kmem_cache_create("xe_sched_job", size, 0,
> SLAB_HWCACHE_ALIGN, NULL);
> if (!xe_sched_job_slab)
> return -ENOMEM;
>
> + size = max_t(size_t,
> + struct_size(job, ptrs,
> + XE_HW_ENGINE_MAX_INSTANCE),
> + struct_size(job, pt_update, 1));
> xe_sched_job_parallel_slab =
> - kmem_cache_create("xe_sched_job_parallel",
> - sizeof(struct xe_sched_job) +
> - sizeof(struct xe_job_ptrs) *
> - XE_HW_ENGINE_MAX_INSTANCE, 0,
> + kmem_cache_create("xe_sched_job_parallel", size, 0,
> SLAB_HWCACHE_ALIGN, NULL);
> if (!xe_sched_job_parallel_slab) {
> kmem_cache_destroy(xe_sched_job_slab);
> @@ -84,7 +87,7 @@ static void xe_sched_job_free_fences(struct
> xe_sched_job *job)
> {
> int i;
>
> - for (i = 0; i < job->q->width; ++i) {
> + for (i = 0; !job->is_pt_job && i < job->q->width; ++i) {
> struct xe_job_ptrs *ptrs = &job->ptrs[i];
>
> if (ptrs->lrc_fence)
> @@ -93,10 +96,23 @@ static void xe_sched_job_free_fences(struct
> xe_sched_job *job)
> }
> }
>
> +/**
> + * xe_sched_job_create() - Create a scheduler job
> + * @q: exec queue to create the scheduler job for
> + * @batch: array of batch addresses for the job; must match the
> width of @q,
> + * or NULL to indicate a PT job that does not require a
> batch address
> + *
> + * Create a scheduler job for submission.
> + *
> + * Context: Reclaim
> + *
> + * Return: a &xe_sched_job object on success, or an ERR_PTR on
> failure.
> + */
> struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
> u64 *batch_addr)
> {
> bool is_migration = xe_sched_job_is_migration(q);
> + struct xe_device *xe = gt_to_xe(q->gt);
> struct xe_sched_job *job;
> int err;
> int i;
> @@ -105,6 +121,9 @@ struct xe_sched_job *xe_sched_job_create(struct
> xe_exec_queue *q,
> /* only a kernel context can submit a vm-less job */
> XE_WARN_ON(!q->vm && !(q->flags & EXEC_QUEUE_FLAG_KERNEL));
>
> + xe_assert(xe, batch_addr ||
> + q->flags & (EXEC_QUEUE_FLAG_VM |
> EXEC_QUEUE_FLAG_MIGRATE));
Ok..
> +
> job = job_alloc(xe_exec_queue_is_parallel(q) ||
> is_migration);
> if (!job)
> return ERR_PTR(-ENOMEM);
> @@ -119,34 +138,39 @@ struct xe_sched_job *xe_sched_job_create(struct
> xe_exec_queue *q,
> if (err)
> goto err_free;
>
> - for (i = 0; i < q->width; ++i) {
> - struct dma_fence *fence = xe_lrc_alloc_seqno_fence();
> - struct dma_fence_chain *chain;
> -
> - if (IS_ERR(fence)) {
> - err = PTR_ERR(fence);
> - goto err_sched_job;
> + if (!batch_addr) {
> + job->fence = dma_fence_get_stub();
> + job->is_pt_job = true;
> + } else {
> + for (i = 0; i < q->width; ++i) {
> + struct dma_fence *fence =
> xe_lrc_alloc_seqno_fence();
> + struct dma_fence_chain *chain;
> +
> + if (IS_ERR(fence)) {
> + err = PTR_ERR(fence);
> + goto err_sched_job;
> + }
> + job->ptrs[i].lrc_fence = fence;
> +
> + if (i + 1 == q->width)
> + continue;
> +
> + chain = dma_fence_chain_alloc();
> + if (!chain) {
> + err = -ENOMEM;
> + goto err_sched_job;
> + }
> + job->ptrs[i].chain_fence = chain;
> }
> - job->ptrs[i].lrc_fence = fence;
>
> - if (i + 1 == q->width)
> - continue;
> + width = q->width;
> + if (is_migration)
> + width = 2;
>
> - chain = dma_fence_chain_alloc();
> - if (!chain) {
> - err = -ENOMEM;
> - goto err_sched_job;
> - }
> - job->ptrs[i].chain_fence = chain;
> + for (i = 0; i < width; ++i)
> + job->ptrs[i].batch_addr = batch_addr[i];
> }
>
> - width = q->width;
> - if (is_migration)
> - width = 2;
> -
> - for (i = 0; i < width; ++i)
> - job->ptrs[i].batch_addr = batch_addr[i];
> -
> atomic_inc(&q->job_cnt);
> xe_pm_runtime_get_noresume(job_to_xe(job));
> trace_xe_sched_job_create(job);
> @@ -246,7 +270,7 @@ bool xe_sched_job_completed(struct xe_sched_job
> *job)
> void xe_sched_job_arm(struct xe_sched_job *job)
> {
> struct xe_exec_queue *q = job->q;
> - struct dma_fence *fence, *prev;
> + struct dma_fence *fence = job->fence, *prev;
Looks like this was a bug where prev would be NULL in that first for
each queue width loop below? Would be nice for this to go into a
separate patch.
> struct xe_vm *vm = q->vm;
> u64 seqno = 0;
> int i;
> @@ -266,6 +290,9 @@ void xe_sched_job_arm(struct xe_sched_job *job)
> job->ring_ops_flush_tlb = true;
> }
>
> + if (job->is_pt_job)
> + goto arm;
> +
> /* Arm the pre-allocated fences */
> for (i = 0; i < q->width; prev = fence, ++i) {
Can we either use the goto above or change this to align with what you
had earlier, something like:
for (i = 0; !job->is_pt_job && i < q->width; prev = fence, ++i) {
Just for consistency...
> struct dma_fence_chain *chain;
> @@ -286,6 +313,7 @@ void xe_sched_job_arm(struct xe_sched_job *job)
> fence = &chain->base;
> }
>
> +arm:
> job->fence = dma_fence_get(fence); /* Pairs with put in
> scheduler */
> drm_sched_job_arm(&job->drm);
> }
> diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h
> b/drivers/gpu/drm/xe/xe_sched_job_types.h
> index 13c2970e81a8..9be4e2c5989d 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job_types.h
> +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
> @@ -10,10 +10,29 @@
>
> #include <drm/gpu_scheduler.h>
>
> -struct xe_exec_queue;
> struct dma_fence;
> struct dma_fence_chain;
>
> +struct xe_exec_queue;
> +struct xe_migrate_pt_update_ops;
> +struct xe_pt_job_ops;
> +struct xe_tile;
> +struct xe_vm;
> +
> +/**
> + * struct xe_pt_update_args - PT update arguments
> + */
> +struct xe_pt_update_args {
> + /** @vm: VM which is being bound */
> + struct xe_vm *vm;
> + /** @tile: Tile which page tables belong to */
> + struct xe_tile *tile;
> + /** @ops: Migrate PT update ops */
> + const struct xe_migrate_pt_update_ops *ops;
> + /** @pt_job_ops: PT job ops state */
> + struct xe_pt_job_ops *pt_job_ops;
> +};
> +
> /**
> * struct xe_job_ptrs - Per hw engine instance data
> */
> @@ -69,8 +88,14 @@ struct xe_sched_job {
> bool restore_replay;
> /** @last_replay: last job being replayed */
> bool last_replay;
> - /** @ptrs: per instance pointers. */
> - struct xe_job_ptrs ptrs[];
> + /** @is_pt_job: is a PT job */
> + bool is_pt_job;
> + union {
> + /** @ptrs: per instance pointers. */
> + DECLARE_FLEX_ARRAY(struct xe_job_ptrs, ptrs);
Nice..
Thanks,
Stuart
> + /** @pt_update: PT update arguments */
> + DECLARE_FLEX_ARRAY(struct xe_pt_update_args,
> pt_update);
> + };
> };
>
> struct xe_sched_job_snapshot {
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 07/25] drm/xe: Update scheduler job layer to support PT jobs
2026-03-03 22:50 ` Summers, Stuart
@ 2026-03-03 23:00 ` Matthew Brost
0 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-03-03 23:00 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, Ghimiray, Himal Prasad,
Yadav, Arvind, thomas.hellstrom@linux.intel.com, Dugast, Francois
On Tue, Mar 03, 2026 at 03:50:58PM -0700, Summers, Stuart wrote:
> On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> > Update the scheduler job layer to support PT jobs. PT jobs are
> > executed
> > entirely on the CPU and do not require LRC fences or a batch address.
> > Repurpose the LRC fence storage to hold PT‑job arguments and update
> > the
> > scheduler job layer to distinguish between PT jobs and jobs that
> > require
> > an LRC.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_sched_job.c | 92 ++++++++++++++++-------
> > --
> > drivers/gpu/drm/xe/xe_sched_job_types.h | 31 ++++++++-
> > 2 files changed, 88 insertions(+), 35 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job.c
> > b/drivers/gpu/drm/xe/xe_sched_job.c
> > index ae5b38b2a884..a8ba7f90368f 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job.c
> > +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> > @@ -26,19 +26,22 @@ static struct kmem_cache
> > *xe_sched_job_parallel_slab;
> >
> > int __init xe_sched_job_module_init(void)
> > {
> > + struct xe_sched_job *job;
> > + size_t size;
> > +
> > + size = struct_size(job, ptrs, 1);
>
> Nice..
>
> > xe_sched_job_slab =
> > - kmem_cache_create("xe_sched_job",
> > - sizeof(struct xe_sched_job) +
> > - sizeof(struct xe_job_ptrs), 0,
> > + kmem_cache_create("xe_sched_job", size, 0,
> > SLAB_HWCACHE_ALIGN, NULL);
> > if (!xe_sched_job_slab)
> > return -ENOMEM;
> >
> > + size = max_t(size_t,
> > + struct_size(job, ptrs,
> > + XE_HW_ENGINE_MAX_INSTANCE),
> > + struct_size(job, pt_update, 1));
> > xe_sched_job_parallel_slab =
> > - kmem_cache_create("xe_sched_job_parallel",
> > - sizeof(struct xe_sched_job) +
> > - sizeof(struct xe_job_ptrs) *
> > - XE_HW_ENGINE_MAX_INSTANCE, 0,
> > + kmem_cache_create("xe_sched_job_parallel", size, 0,
> > SLAB_HWCACHE_ALIGN, NULL);
> > if (!xe_sched_job_parallel_slab) {
> > kmem_cache_destroy(xe_sched_job_slab);
> > @@ -84,7 +87,7 @@ static void xe_sched_job_free_fences(struct
> > xe_sched_job *job)
> > {
> > int i;
> >
> > - for (i = 0; i < job->q->width; ++i) {
> > + for (i = 0; !job->is_pt_job && i < job->q->width; ++i) {
> > struct xe_job_ptrs *ptrs = &job->ptrs[i];
> >
> > if (ptrs->lrc_fence)
> > @@ -93,10 +96,23 @@ static void xe_sched_job_free_fences(struct
> > xe_sched_job *job)
> > }
> > }
> >
> > +/**
> > + * xe_sched_job_create() - Create a scheduler job
> > + * @q: exec queue to create the scheduler job for
> > + * @batch: array of batch addresses for the job; must match the
> > width of @q,
> > + * or NULL to indicate a PT job that does not require a
> > batch address
> > + *
> > + * Create a scheduler job for submission.
> > + *
> > + * Context: Reclaim
> > + *
> > + * Return: a &xe_sched_job object on success, or an ERR_PTR on
> > failure.
> > + */
> > struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
> > u64 *batch_addr)
> > {
> > bool is_migration = xe_sched_job_is_migration(q);
> > + struct xe_device *xe = gt_to_xe(q->gt);
> > struct xe_sched_job *job;
> > int err;
> > int i;
> > @@ -105,6 +121,9 @@ struct xe_sched_job *xe_sched_job_create(struct
> > xe_exec_queue *q,
> > /* only a kernel context can submit a vm-less job */
> > XE_WARN_ON(!q->vm && !(q->flags & EXEC_QUEUE_FLAG_KERNEL));
> >
> > + xe_assert(xe, batch_addr ||
> > + q->flags & (EXEC_QUEUE_FLAG_VM |
> > EXEC_QUEUE_FLAG_MIGRATE));
>
> Ok..
>
I do plan on reworking the 'EXEC_QUEUE_FLAG_*' in follow up but
overloading the kernel bind queue on EXEC_QUEUE_FLAG_MIGRATE in this
series as that is what EXEC_QUEUE_FLAG_MIGRATE meant prior to this
series.
> > +
> > job = job_alloc(xe_exec_queue_is_parallel(q) ||
> > is_migration);
> > if (!job)
> > return ERR_PTR(-ENOMEM);
> > @@ -119,34 +138,39 @@ struct xe_sched_job *xe_sched_job_create(struct
> > xe_exec_queue *q,
> > if (err)
> > goto err_free;
> >
> > - for (i = 0; i < q->width; ++i) {
> > - struct dma_fence *fence = xe_lrc_alloc_seqno_fence();
> > - struct dma_fence_chain *chain;
> > -
> > - if (IS_ERR(fence)) {
> > - err = PTR_ERR(fence);
> > - goto err_sched_job;
> > + if (!batch_addr) {
> > + job->fence = dma_fence_get_stub();
> > + job->is_pt_job = true;
> > + } else {
> > + for (i = 0; i < q->width; ++i) {
> > + struct dma_fence *fence =
> > xe_lrc_alloc_seqno_fence();
> > + struct dma_fence_chain *chain;
> > +
> > + if (IS_ERR(fence)) {
> > + err = PTR_ERR(fence);
> > + goto err_sched_job;
> > + }
> > + job->ptrs[i].lrc_fence = fence;
> > +
> > + if (i + 1 == q->width)
> > + continue;
> > +
> > + chain = dma_fence_chain_alloc();
> > + if (!chain) {
> > + err = -ENOMEM;
> > + goto err_sched_job;
> > + }
> > + job->ptrs[i].chain_fence = chain;
> > }
> > - job->ptrs[i].lrc_fence = fence;
> >
> > - if (i + 1 == q->width)
> > - continue;
> > + width = q->width;
> > + if (is_migration)
> > + width = 2;
> >
> > - chain = dma_fence_chain_alloc();
> > - if (!chain) {
> > - err = -ENOMEM;
> > - goto err_sched_job;
> > - }
> > - job->ptrs[i].chain_fence = chain;
> > + for (i = 0; i < width; ++i)
> > + job->ptrs[i].batch_addr = batch_addr[i];
> > }
> >
> > - width = q->width;
> > - if (is_migration)
> > - width = 2;
> > -
> > - for (i = 0; i < width; ++i)
> > - job->ptrs[i].batch_addr = batch_addr[i];
> > -
> > atomic_inc(&q->job_cnt);
> > xe_pm_runtime_get_noresume(job_to_xe(job));
> > trace_xe_sched_job_create(job);
> > @@ -246,7 +270,7 @@ bool xe_sched_job_completed(struct xe_sched_job
> > *job)
> > void xe_sched_job_arm(struct xe_sched_job *job)
> > {
> > struct xe_exec_queue *q = job->q;
> > - struct dma_fence *fence, *prev;
> > + struct dma_fence *fence = job->fence, *prev;
>
> Looks like this was a bug where prev would be NULL in that first for
> each queue width loop below? Would be nice for this to go into a
> separate patch.
>
No, the existing code works, so does the code after.
We set 'fence = job->fence' so when we go directly to arm the
'dma_fence_get' works.
> > struct xe_vm *vm = q->vm;
> > u64 seqno = 0;
> > int i;
> > @@ -266,6 +290,9 @@ void xe_sched_job_arm(struct xe_sched_job *job)
> > job->ring_ops_flush_tlb = true;
> > }
> >
> > + if (job->is_pt_job)
> > + goto arm;
> > +
> > /* Arm the pre-allocated fences */
> > for (i = 0; i < q->width; prev = fence, ++i) {
>
> Can we either use the goto above or change this to align with what you
> had earlier, something like:
> for (i = 0; !job->is_pt_job && i < q->width; prev = fence, ++i) {
>
> Just for consistency...
>
Let me use a 'goto' in both cases. I've gotten push back on boolean
short circuits in the loops in the past.
Matt
> > struct dma_fence_chain *chain;
> > @@ -286,6 +313,7 @@ void xe_sched_job_arm(struct xe_sched_job *job)
> > fence = &chain->base;
> > }
> >
> > +arm:
> > job->fence = dma_fence_get(fence); /* Pairs with put in
> > scheduler */
> > drm_sched_job_arm(&job->drm);
> > }
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h
> > b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > index 13c2970e81a8..9be4e2c5989d 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job_types.h
> > +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > @@ -10,10 +10,29 @@
> >
> > #include <drm/gpu_scheduler.h>
> >
> > -struct xe_exec_queue;
> > struct dma_fence;
> > struct dma_fence_chain;
> >
> > +struct xe_exec_queue;
> > +struct xe_migrate_pt_update_ops;
> > +struct xe_pt_job_ops;
> > +struct xe_tile;
> > +struct xe_vm;
> > +
> > +/**
> > + * struct xe_pt_update_args - PT update arguments
> > + */
> > +struct xe_pt_update_args {
> > + /** @vm: VM which is being bound */
> > + struct xe_vm *vm;
> > + /** @tile: Tile which page tables belong to */
> > + struct xe_tile *tile;
> > + /** @ops: Migrate PT update ops */
> > + const struct xe_migrate_pt_update_ops *ops;
> > + /** @pt_job_ops: PT job ops state */
> > + struct xe_pt_job_ops *pt_job_ops;
> > +};
> > +
> > /**
> > * struct xe_job_ptrs - Per hw engine instance data
> > */
> > @@ -69,8 +88,14 @@ struct xe_sched_job {
> > bool restore_replay;
> > /** @last_replay: last job being replayed */
> > bool last_replay;
> > - /** @ptrs: per instance pointers. */
> > - struct xe_job_ptrs ptrs[];
> > + /** @is_pt_job: is a PT job */
> > + bool is_pt_job;
> > + union {
> > + /** @ptrs: per instance pointers. */
> > + DECLARE_FLEX_ARRAY(struct xe_job_ptrs, ptrs);
>
> Nice..
>
> Thanks,
> Stuart
>
> > + /** @pt_update: PT update arguments */
> > + DECLARE_FLEX_ARRAY(struct xe_pt_update_args,
> > pt_update);
> > + };
> > };
> >
> > struct xe_sched_job_snapshot {
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v3 08/25] drm/xe: Add helpers to access PT ops
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (6 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 07/25] drm/xe: Update scheduler job layer to support PT jobs Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-04-07 15:22 ` Francois Dugast
2026-02-28 1:34 ` [PATCH v3 09/25] drm/xe: Add struct xe_pt_job_ops Matthew Brost
` (22 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add helpers to access PT ops, making it easier to shuffle the location of
the ops structures without requiring widespread code changes.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_pt.c | 65 ++++++++++++++++++++++++++------------
1 file changed, 45 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 83dacc91b7b3..1f24eff75185 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1881,13 +1881,37 @@ xe_pt_commit_prepare_unbind(struct xe_vma *vma,
}
}
+static struct xe_vm_pgtable_update_op *
+to_pt_op(struct xe_vm_pgtable_update_ops *pt_update_ops, u32 op_idx)
+{
+ return &pt_update_ops->ops[op_idx];
+}
+
+static u32
+get_current_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
+{
+ return pt_update_ops->current_op;
+}
+
+static struct xe_vm_pgtable_update_op *
+to_current_pt_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
+{
+ return to_pt_op(pt_update_ops, get_current_op(pt_update_ops));
+}
+
+static void
+incr_current_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
+{
+ ++pt_update_ops->current_op;
+}
+
static void
xe_pt_update_ops_rfence_interval(struct xe_vm_pgtable_update_ops *pt_update_ops,
u64 start, u64 end)
{
u64 last;
- u32 current_op = pt_update_ops->current_op;
- struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
+ struct xe_vm_pgtable_update_op *pt_op =
+ to_current_pt_op(pt_update_ops);
int i, level = 0;
for (i = 0; i < pt_op->num_entries; i++) {
@@ -1922,8 +1946,8 @@ static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile,
struct xe_vm_pgtable_update_ops *pt_update_ops,
struct xe_vma *vma, bool invalidate_on_bind)
{
- u32 current_op = pt_update_ops->current_op;
- struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
+ struct xe_vm_pgtable_update_op *pt_op =
+ to_current_pt_op(pt_update_ops);
int err;
xe_tile_assert(tile, !xe_vma_is_cpu_addr_mirror(vma));
@@ -1952,7 +1976,7 @@ static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile,
xe_pt_update_ops_rfence_interval(pt_update_ops,
xe_vma_start(vma),
xe_vma_end(vma));
- ++pt_update_ops->current_op;
+ incr_current_op(pt_update_ops);
pt_update_ops->needs_svm_lock |= xe_vma_is_userptr(vma);
/*
@@ -1989,8 +2013,8 @@ static int bind_range_prepare(struct xe_vm *vm, struct xe_tile *tile,
struct xe_vm_pgtable_update_ops *pt_update_ops,
struct xe_vma *vma, struct xe_svm_range *range)
{
- u32 current_op = pt_update_ops->current_op;
- struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
+ struct xe_vm_pgtable_update_op *pt_op =
+ to_current_pt_op(pt_update_ops);
int err;
xe_tile_assert(tile, xe_vma_is_cpu_addr_mirror(vma));
@@ -2014,7 +2038,7 @@ static int bind_range_prepare(struct xe_vm *vm, struct xe_tile *tile,
xe_pt_update_ops_rfence_interval(pt_update_ops,
xe_svm_range_start(range),
xe_svm_range_end(range));
- ++pt_update_ops->current_op;
+ incr_current_op(pt_update_ops);
pt_update_ops->needs_svm_lock = true;
pt_op->vma = vma;
@@ -2032,8 +2056,8 @@ static int unbind_op_prepare(struct xe_tile *tile,
struct xe_vma *vma)
{
struct xe_device *xe = tile_to_xe(tile);
- u32 current_op = pt_update_ops->current_op;
- struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
+ struct xe_vm_pgtable_update_op *pt_op =
+ to_current_pt_op(pt_update_ops);
int err;
if (!((vma->tile_present | vma->tile_staged) & BIT(tile->id)))
@@ -2072,7 +2096,7 @@ static int unbind_op_prepare(struct xe_tile *tile,
pt_op->num_entries, false);
xe_pt_update_ops_rfence_interval(pt_update_ops, xe_vma_start(vma),
xe_vma_end(vma));
- ++pt_update_ops->current_op;
+ incr_current_op(pt_update_ops);
pt_update_ops->needs_svm_lock |= xe_vma_is_userptr(vma);
pt_update_ops->needs_invalidation = true;
@@ -2112,8 +2136,8 @@ static int unbind_range_prepare(struct xe_vm *vm,
struct xe_vm_pgtable_update_ops *pt_update_ops,
struct xe_svm_range *range)
{
- u32 current_op = pt_update_ops->current_op;
- struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
+ struct xe_vm_pgtable_update_op *pt_op =
+ to_current_pt_op(pt_update_ops);
if (!(range->tile_present & BIT(tile->id)))
return 0;
@@ -2134,7 +2158,7 @@ static int unbind_range_prepare(struct xe_vm *vm,
pt_op->num_entries, false);
xe_pt_update_ops_rfence_interval(pt_update_ops, xe_svm_range_start(range),
xe_svm_range_end(range));
- ++pt_update_ops->current_op;
+ incr_current_op(pt_update_ops);
pt_update_ops->needs_svm_lock = true;
pt_update_ops->needs_invalidation |= xe_vm_has_scratch(vm) ||
xe_vm_has_valid_gpu_mapping(tile, range->tile_present,
@@ -2282,7 +2306,7 @@ int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops)
return err;
}
- xe_tile_assert(tile, pt_update_ops->current_op <=
+ xe_tile_assert(tile, get_current_op(pt_update_ops) <=
pt_update_ops->num_ops);
#ifdef TEST_VM_OPS_ERROR
@@ -2515,7 +2539,7 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
lockdep_assert_held(&vm->lock);
xe_vm_assert_held(vm);
- if (!pt_update_ops->current_op) {
+ if (!get_current_op(pt_update_ops)) {
xe_tile_assert(tile, xe_vm_in_fault_mode(vm));
return dma_fence_get_stub();
@@ -2583,8 +2607,9 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
}
/* Point of no return - VM killed if failure after this */
- for (i = 0; i < pt_update_ops->current_op; ++i) {
- struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i];
+ for (i = 0; i < get_current_op(pt_update_ops); ++i) {
+ struct xe_vm_pgtable_update_op *pt_op =
+ to_pt_op(pt_update_ops, i);
xe_pt_commit(pt_op->vma, pt_op->entries,
pt_op->num_entries, &pt_update_ops->deferred);
@@ -2708,9 +2733,9 @@ void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops)
for (i = pt_update_ops->num_ops - 1; i >= 0; --i) {
struct xe_vm_pgtable_update_op *pt_op =
- &pt_update_ops->ops[i];
+ to_pt_op(pt_update_ops, i);
- if (!pt_op->vma || i >= pt_update_ops->current_op)
+ if (!pt_op->vma || i >= get_current_op(pt_update_ops))
continue;
if (pt_op->bind)
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 08/25] drm/xe: Add helpers to access PT ops
2026-02-28 1:34 ` [PATCH v3 08/25] drm/xe: Add helpers to access PT ops Matthew Brost
@ 2026-04-07 15:22 ` Francois Dugast
0 siblings, 0 replies; 63+ messages in thread
From: Francois Dugast @ 2026-04-07 15:22 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom
On Fri, Feb 27, 2026 at 05:34:44PM -0800, Matthew Brost wrote:
> Add helpers to access PT ops, making it easier to shuffle the location of
> the ops structures without requiring widespread code changes.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
> ---
> drivers/gpu/drm/xe/xe_pt.c | 65 ++++++++++++++++++++++++++------------
> 1 file changed, 45 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 83dacc91b7b3..1f24eff75185 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1881,13 +1881,37 @@ xe_pt_commit_prepare_unbind(struct xe_vma *vma,
> }
> }
>
> +static struct xe_vm_pgtable_update_op *
> +to_pt_op(struct xe_vm_pgtable_update_ops *pt_update_ops, u32 op_idx)
> +{
> + return &pt_update_ops->ops[op_idx];
> +}
> +
> +static u32
> +get_current_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
> +{
> + return pt_update_ops->current_op;
> +}
> +
> +static struct xe_vm_pgtable_update_op *
> +to_current_pt_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
> +{
> + return to_pt_op(pt_update_ops, get_current_op(pt_update_ops));
> +}
> +
> +static void
> +incr_current_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
> +{
> + ++pt_update_ops->current_op;
> +}
> +
> static void
> xe_pt_update_ops_rfence_interval(struct xe_vm_pgtable_update_ops *pt_update_ops,
> u64 start, u64 end)
> {
> u64 last;
> - u32 current_op = pt_update_ops->current_op;
> - struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
> + struct xe_vm_pgtable_update_op *pt_op =
> + to_current_pt_op(pt_update_ops);
> int i, level = 0;
>
> for (i = 0; i < pt_op->num_entries; i++) {
> @@ -1922,8 +1946,8 @@ static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile,
> struct xe_vm_pgtable_update_ops *pt_update_ops,
> struct xe_vma *vma, bool invalidate_on_bind)
> {
> - u32 current_op = pt_update_ops->current_op;
> - struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
> + struct xe_vm_pgtable_update_op *pt_op =
> + to_current_pt_op(pt_update_ops);
> int err;
>
> xe_tile_assert(tile, !xe_vma_is_cpu_addr_mirror(vma));
> @@ -1952,7 +1976,7 @@ static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile,
> xe_pt_update_ops_rfence_interval(pt_update_ops,
> xe_vma_start(vma),
> xe_vma_end(vma));
> - ++pt_update_ops->current_op;
> + incr_current_op(pt_update_ops);
> pt_update_ops->needs_svm_lock |= xe_vma_is_userptr(vma);
>
> /*
> @@ -1989,8 +2013,8 @@ static int bind_range_prepare(struct xe_vm *vm, struct xe_tile *tile,
> struct xe_vm_pgtable_update_ops *pt_update_ops,
> struct xe_vma *vma, struct xe_svm_range *range)
> {
> - u32 current_op = pt_update_ops->current_op;
> - struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
> + struct xe_vm_pgtable_update_op *pt_op =
> + to_current_pt_op(pt_update_ops);
> int err;
>
> xe_tile_assert(tile, xe_vma_is_cpu_addr_mirror(vma));
> @@ -2014,7 +2038,7 @@ static int bind_range_prepare(struct xe_vm *vm, struct xe_tile *tile,
> xe_pt_update_ops_rfence_interval(pt_update_ops,
> xe_svm_range_start(range),
> xe_svm_range_end(range));
> - ++pt_update_ops->current_op;
> + incr_current_op(pt_update_ops);
> pt_update_ops->needs_svm_lock = true;
>
> pt_op->vma = vma;
> @@ -2032,8 +2056,8 @@ static int unbind_op_prepare(struct xe_tile *tile,
> struct xe_vma *vma)
> {
> struct xe_device *xe = tile_to_xe(tile);
> - u32 current_op = pt_update_ops->current_op;
> - struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
> + struct xe_vm_pgtable_update_op *pt_op =
> + to_current_pt_op(pt_update_ops);
> int err;
>
> if (!((vma->tile_present | vma->tile_staged) & BIT(tile->id)))
> @@ -2072,7 +2096,7 @@ static int unbind_op_prepare(struct xe_tile *tile,
> pt_op->num_entries, false);
> xe_pt_update_ops_rfence_interval(pt_update_ops, xe_vma_start(vma),
> xe_vma_end(vma));
> - ++pt_update_ops->current_op;
> + incr_current_op(pt_update_ops);
> pt_update_ops->needs_svm_lock |= xe_vma_is_userptr(vma);
> pt_update_ops->needs_invalidation = true;
>
> @@ -2112,8 +2136,8 @@ static int unbind_range_prepare(struct xe_vm *vm,
> struct xe_vm_pgtable_update_ops *pt_update_ops,
> struct xe_svm_range *range)
> {
> - u32 current_op = pt_update_ops->current_op;
> - struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
> + struct xe_vm_pgtable_update_op *pt_op =
> + to_current_pt_op(pt_update_ops);
>
> if (!(range->tile_present & BIT(tile->id)))
> return 0;
> @@ -2134,7 +2158,7 @@ static int unbind_range_prepare(struct xe_vm *vm,
> pt_op->num_entries, false);
> xe_pt_update_ops_rfence_interval(pt_update_ops, xe_svm_range_start(range),
> xe_svm_range_end(range));
> - ++pt_update_ops->current_op;
> + incr_current_op(pt_update_ops);
> pt_update_ops->needs_svm_lock = true;
> pt_update_ops->needs_invalidation |= xe_vm_has_scratch(vm) ||
> xe_vm_has_valid_gpu_mapping(tile, range->tile_present,
> @@ -2282,7 +2306,7 @@ int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops)
> return err;
> }
>
> - xe_tile_assert(tile, pt_update_ops->current_op <=
> + xe_tile_assert(tile, get_current_op(pt_update_ops) <=
> pt_update_ops->num_ops);
>
> #ifdef TEST_VM_OPS_ERROR
> @@ -2515,7 +2539,7 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
> lockdep_assert_held(&vm->lock);
> xe_vm_assert_held(vm);
>
> - if (!pt_update_ops->current_op) {
> + if (!get_current_op(pt_update_ops)) {
> xe_tile_assert(tile, xe_vm_in_fault_mode(vm));
>
> return dma_fence_get_stub();
> @@ -2583,8 +2607,9 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
> }
>
> /* Point of no return - VM killed if failure after this */
> - for (i = 0; i < pt_update_ops->current_op; ++i) {
> - struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i];
> + for (i = 0; i < get_current_op(pt_update_ops); ++i) {
> + struct xe_vm_pgtable_update_op *pt_op =
> + to_pt_op(pt_update_ops, i);
>
> xe_pt_commit(pt_op->vma, pt_op->entries,
> pt_op->num_entries, &pt_update_ops->deferred);
> @@ -2708,9 +2733,9 @@ void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops)
>
> for (i = pt_update_ops->num_ops - 1; i >= 0; --i) {
> struct xe_vm_pgtable_update_op *pt_op =
> - &pt_update_ops->ops[i];
> + to_pt_op(pt_update_ops, i);
>
> - if (!pt_op->vma || i >= pt_update_ops->current_op)
> + if (!pt_op->vma || i >= get_current_op(pt_update_ops))
> continue;
>
> if (pt_op->bind)
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v3 09/25] drm/xe: Add struct xe_pt_job_ops
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (7 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 08/25] drm/xe: Add helpers to access PT ops Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-03-03 23:26 ` Summers, Stuart
2026-02-28 1:34 ` [PATCH v3 10/25] drm/xe: Update GuC submission backend to run PT jobs Matthew Brost
` (21 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add struct xe_pt_job_ops, a dynamically refcounted object that contains
the information required to issue a CPU bind via a job after the initial
bind IOCTL returns.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_migrate.c | 10 +--
drivers/gpu/drm/xe/xe_pt.c | 132 +++++++++++++++++++++++++++----
drivers/gpu/drm/xe/xe_pt.h | 4 +
drivers/gpu/drm/xe/xe_pt_types.h | 27 +++++--
drivers/gpu/drm/xe/xe_vm.c | 10 +--
5 files changed, 149 insertions(+), 34 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 69e6e3135ec6..cd6802642ef3 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1771,7 +1771,7 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
}
xe_migrate_update_pgtables_cpu_execute(vm, m->tile, ops,
- pt_update_ops->ops,
+ pt_update_ops->pt_job_ops->ops,
pt_update_ops->num_ops);
return dma_fence_get_stub();
@@ -1798,7 +1798,7 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
bool usm = is_migrate && xe->info.has_usm;
for (i = 0; i < pt_update_ops->num_ops; ++i) {
- struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i];
+ struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->pt_job_ops->ops[i];
struct xe_vm_pgtable_update *updates = pt_op->entries;
num_updates += pt_op->num_entries;
@@ -1867,7 +1867,7 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
for (; i < pt_update_ops->num_ops; ++i) {
struct xe_vm_pgtable_update_op *pt_op =
- &pt_update_ops->ops[i];
+ &pt_update_ops->pt_job_ops->ops[i];
struct xe_vm_pgtable_update *updates = pt_op->entries;
for (; j < pt_op->num_entries; ++j, ++current_update, ++idx) {
@@ -1904,7 +1904,7 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
(page_ofs / sizeof(u64)) * XE_PAGE_SIZE;
for (i = 0; i < pt_update_ops->num_ops; ++i) {
struct xe_vm_pgtable_update_op *pt_op =
- &pt_update_ops->ops[i];
+ &pt_update_ops->pt_job_ops->ops[i];
struct xe_vm_pgtable_update *updates = pt_op->entries;
for (j = 0; j < pt_op->num_entries; ++j) {
@@ -1922,7 +1922,7 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
for (i = 0; i < pt_update_ops->num_ops; ++i) {
struct xe_vm_pgtable_update_op *pt_op =
- &pt_update_ops->ops[i];
+ &pt_update_ops->pt_job_ops->ops[i];
struct xe_vm_pgtable_update *updates = pt_op->entries;
for (j = 0; j < pt_op->num_entries; ++j)
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 1f24eff75185..6b56e62a35c1 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -204,7 +204,9 @@ unsigned int xe_pt_shift(unsigned int level)
* and finally frees @pt. TODO: Can we remove the @flags argument?
*/
void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head *deferred)
+
{
+ bool added = false;
int i;
if (!pt)
@@ -212,7 +214,20 @@ void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head *deferred)
XE_WARN_ON(!list_empty(&pt->bo->ttm.base.gpuva.list));
xe_bo_unpin(pt->bo);
- xe_bo_put_deferred(pt->bo, deferred, NULL);
+ xe_bo_put_deferred(pt->bo, deferred, &added);
+ if (added) {
+ xe_assert(pt->bo->vm->xe, !kref_read(&pt->bo->ttm.base.refcount));
+
+ /*
+ * We need the VM present until the BO is destroyed as it shares
+ * a dma-resv and BO destroy is async. Reinit BO refcount so
+ * xe_bo_put_async can be used when the PT job ops refcount goes
+ * to zero.
+ */
+ xe_vm_get(pt->bo->vm);
+ pt->bo->flags |= XE_BO_FLAG_PUT_VM_ASYNC;
+ kref_init(&pt->bo->ttm.base.refcount);
+ }
if (pt->level > 0 && pt->num_live) {
struct xe_pt_dir *pt_dir = as_xe_pt_dir(pt);
@@ -1884,13 +1899,13 @@ xe_pt_commit_prepare_unbind(struct xe_vma *vma,
static struct xe_vm_pgtable_update_op *
to_pt_op(struct xe_vm_pgtable_update_ops *pt_update_ops, u32 op_idx)
{
- return &pt_update_ops->ops[op_idx];
+ return &pt_update_ops->pt_job_ops->ops[op_idx];
}
static u32
get_current_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
{
- return pt_update_ops->current_op;
+ return pt_update_ops->pt_job_ops->current_op;
}
static struct xe_vm_pgtable_update_op *
@@ -1902,7 +1917,7 @@ to_current_pt_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
static void
incr_current_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
{
- ++pt_update_ops->current_op;
+ ++pt_update_ops->pt_job_ops->current_op;
}
static void
@@ -2264,7 +2279,6 @@ static int op_prepare(struct xe_vm *vm,
static void
xe_pt_update_ops_init(struct xe_vm_pgtable_update_ops *pt_update_ops)
{
- init_llist_head(&pt_update_ops->deferred);
pt_update_ops->start = ~0x0ull;
pt_update_ops->last = 0x0ull;
xe_page_reclaim_list_init(&pt_update_ops->prl);
@@ -2612,7 +2626,8 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
to_pt_op(pt_update_ops, i);
xe_pt_commit(pt_op->vma, pt_op->entries,
- pt_op->num_entries, &pt_update_ops->deferred);
+ pt_op->num_entries,
+ &pt_update_ops->pt_job_ops->deferred);
pt_op->vma = NULL; /* skip in xe_pt_update_ops_abort */
}
@@ -2700,19 +2715,8 @@ void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops *vops)
{
struct xe_vm_pgtable_update_ops *pt_update_ops =
&vops->pt_update_ops[tile->id];
- int i;
xe_page_reclaim_entries_put(pt_update_ops->prl.entries);
-
- lockdep_assert_held(&vops->vm->lock);
- xe_vm_assert_held(vops->vm);
-
- for (i = 0; i < pt_update_ops->current_op; ++i) {
- struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i];
-
- xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
- }
- xe_bo_put_commit(&vops->pt_update_ops[tile->id].deferred);
}
/**
@@ -2749,3 +2753,97 @@ void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops)
xe_pt_update_ops_fini(tile, vops);
}
+
+/**
+ * xe_pt_job_ops_alloc() - Allocate PT job ops
+ * @num_ops: Number of VM PT update ops
+ *
+ * Allocate PT job ops and internal array of VM PT update ops.
+ *
+ * Return: Pointer to PT job ops or NULL
+ */
+struct xe_pt_job_ops *xe_pt_job_ops_alloc(u32 num_ops)
+{
+ struct xe_pt_job_ops *pt_job_ops;
+
+ pt_job_ops = kmalloc(sizeof(*pt_job_ops), GFP_KERNEL);
+ if (!pt_job_ops)
+ return NULL;
+
+ pt_job_ops->ops = kvmalloc_array(num_ops, sizeof(*pt_job_ops->ops),
+ GFP_KERNEL);
+ if (!pt_job_ops->ops) {
+ kvfree(pt_job_ops);
+ return NULL;
+ }
+
+ pt_job_ops->current_op = 0;
+ kref_init(&pt_job_ops->refcount);
+ init_llist_head(&pt_job_ops->deferred);
+
+ return pt_job_ops;
+}
+
+/**
+ * xe_pt_job_ops_get() - Get PT job ops
+ * @pt_job_ops: PT job ops to get
+ *
+ * Take a reference to PT job ops
+ *
+ * Return: Pointer to PT job ops or NULL
+ */
+struct xe_pt_job_ops *xe_pt_job_ops_get(struct xe_pt_job_ops *pt_job_ops)
+{
+ if (pt_job_ops)
+ kref_get(&pt_job_ops->refcount);
+
+ return pt_job_ops;
+}
+
+static void xe_pt_update_ops_free(struct xe_vm_pgtable_update_op *pt_op,
+ u32 num_ops)
+{
+ u32 i;
+
+ for (i = 0; i < num_ops; ++i, ++pt_op)
+ xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
+}
+
+static void xe_pt_job_ops_destroy(struct kref *ref)
+{
+ struct xe_pt_job_ops *pt_job_ops =
+ container_of(ref, struct xe_pt_job_ops, refcount);
+ struct llist_node *freed;
+ struct xe_bo *bo, *next;
+
+ xe_pt_update_ops_free(pt_job_ops->ops,
+ pt_job_ops->current_op);
+
+ freed = llist_del_all(&pt_job_ops->deferred);
+ if (freed) {
+ llist_for_each_entry_safe(bo, next, freed, freed)
+ /*
+ * If called from run_job, we are in the dma-fencing
+ * path and cannot take dma-resv locks so use an async
+ * put.
+ */
+ xe_bo_put_async(bo);
+ }
+
+ kvfree(pt_job_ops->ops);
+ kfree(pt_job_ops);
+}
+
+/**
+ * xe_pt_job_ops_put() - Put PT job ops
+ * @pt_job_ops: PT job ops to put
+ *
+ * Drop a reference to PT job ops
+ */
+void xe_pt_job_ops_put(struct xe_pt_job_ops *pt_job_ops)
+{
+ if (!pt_job_ops)
+ return;
+
+ kref_put(&pt_job_ops->refcount, xe_pt_job_ops_destroy);
+}
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 4daeebaab5a1..5faddb8e700c 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -49,4 +49,8 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
bool xe_pt_zap_ptes_range(struct xe_tile *tile, struct xe_vm *vm,
struct xe_svm_range *range);
+struct xe_pt_job_ops *xe_pt_job_ops_alloc(u32 num_ops);
+struct xe_pt_job_ops *xe_pt_job_ops_get(struct xe_pt_job_ops *pt_job_ops);
+void xe_pt_job_ops_put(struct xe_pt_job_ops *pt_job_ops);
+
#endif
diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
index 84b51d3762a4..92d50573ed1d 100644
--- a/drivers/gpu/drm/xe/xe_pt_types.h
+++ b/drivers/gpu/drm/xe/xe_pt_types.h
@@ -91,12 +91,29 @@ struct xe_vm_pgtable_update_op {
bool rebind;
};
+/**
+ * struct xe_pt_job_ops - Page-table update operations (dynamically allocated)
+ *
+ * This is the portion of &struct xe_vma_ops and
+ * &struct xe_vm_pgtable_update_ops that is dynamically allocated, as it
+ * must remain valid until the associated bind job completes. A reference
+ * count controls its lifetime.
+ */
+struct xe_pt_job_ops {
+ /** @current_op: current page-table update operation */
+ u32 current_op;
+ /** @refcount: reference count */
+ struct kref refcount;
+ /** @deferred: list of deferred PT entries to destroy */
+ struct llist_head deferred;
+ /** @ops: page-table update operations */
+ struct xe_vm_pgtable_update_op *ops;
+};
+
/** struct xe_vm_pgtable_update_ops: page table update operations */
struct xe_vm_pgtable_update_ops {
- /** @ops: operations */
- struct xe_vm_pgtable_update_op *ops;
- /** @deferred: deferred list to destroy PT entries */
- struct llist_head deferred;
+ /** @pt_job_ops: dynamically allocated PT update operations */
+ struct xe_pt_job_ops *pt_job_ops;
/** @q: exec queue for PT operations */
struct xe_exec_queue *q;
/** @prl: embedded page reclaim list */
@@ -107,8 +124,6 @@ struct xe_vm_pgtable_update_ops {
u64 last;
/** @num_ops: number of operations */
u32 num_ops;
- /** @current_op: current operations */
- u32 current_op;
/** @needs_svm_lock: Needs SVM lock */
bool needs_svm_lock;
/** @needs_invalidation: Needs invalidation */
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 548b0769b3ef..3e2d2191b78c 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -585,11 +585,9 @@ static int xe_vma_ops_alloc(struct xe_vma_ops *vops, bool array_of_binds)
if (!vops->pt_update_ops[i].num_ops)
continue;
- vops->pt_update_ops[i].ops =
- kmalloc_objs(*vops->pt_update_ops[i].ops,
- vops->pt_update_ops[i].num_ops,
- GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
- if (!vops->pt_update_ops[i].ops)
+ vops->pt_update_ops[i].pt_job_ops =
+ xe_pt_job_ops_alloc(vops->pt_update_ops[i].num_ops);
+ if (!vops->pt_update_ops[i].pt_job_ops)
return array_of_binds ? -ENOBUFS : -ENOMEM;
}
@@ -625,7 +623,7 @@ static void xe_vma_ops_fini(struct xe_vma_ops *vops)
xe_vma_svm_prefetch_ops_fini(vops);
for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
- kfree(vops->pt_update_ops[i].ops);
+ xe_pt_job_ops_put(vops->pt_update_ops[i].pt_job_ops);
}
static void xe_vma_ops_incr_pt_update_ops(struct xe_vma_ops *vops, u8 tile_mask, int inc_val)
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 09/25] drm/xe: Add struct xe_pt_job_ops
2026-02-28 1:34 ` [PATCH v3 09/25] drm/xe: Add struct xe_pt_job_ops Matthew Brost
@ 2026-03-03 23:26 ` Summers, Stuart
2026-03-03 23:28 ` Matthew Brost
0 siblings, 1 reply; 63+ messages in thread
From: Summers, Stuart @ 2026-03-03 23:26 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: Ghimiray, Himal Prasad, Yadav, Arvind,
thomas.hellstrom@linux.intel.com, Dugast, Francois
On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> Add struct xe_pt_job_ops, a dynamically refcounted object that
> contains
> the information required to issue a CPU bind via a job after the
> initial
> bind IOCTL returns.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_migrate.c | 10 +--
> drivers/gpu/drm/xe/xe_pt.c | 132 +++++++++++++++++++++++++++--
> --
> drivers/gpu/drm/xe/xe_pt.h | 4 +
> drivers/gpu/drm/xe/xe_pt_types.h | 27 +++++--
> drivers/gpu/drm/xe/xe_vm.c | 10 +--
> 5 files changed, 149 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> b/drivers/gpu/drm/xe/xe_migrate.c
> index 69e6e3135ec6..cd6802642ef3 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -1771,7 +1771,7 @@ xe_migrate_update_pgtables_cpu(struct
> xe_migrate *m,
> }
>
> xe_migrate_update_pgtables_cpu_execute(vm, m->tile, ops,
> - pt_update_ops->ops,
> + pt_update_ops-
> >pt_job_ops->ops,
> pt_update_ops-
> >num_ops);
>
> return dma_fence_get_stub();
> @@ -1798,7 +1798,7 @@ __xe_migrate_update_pgtables(struct xe_migrate
> *m,
> bool usm = is_migrate && xe->info.has_usm;
>
> for (i = 0; i < pt_update_ops->num_ops; ++i) {
> - struct xe_vm_pgtable_update_op *pt_op =
> &pt_update_ops->ops[i];
> + struct xe_vm_pgtable_update_op *pt_op =
> &pt_update_ops->pt_job_ops->ops[i];
> struct xe_vm_pgtable_update *updates = pt_op-
> >entries;
>
> num_updates += pt_op->num_entries;
> @@ -1867,7 +1867,7 @@ __xe_migrate_update_pgtables(struct xe_migrate
> *m,
>
> for (; i < pt_update_ops->num_ops; ++i) {
> struct xe_vm_pgtable_update_op *pt_op
> =
> - &pt_update_ops->ops[i];
> + &pt_update_ops->pt_job_ops-
> >ops[i];
> struct xe_vm_pgtable_update *updates
> = pt_op->entries;
>
> for (; j < pt_op->num_entries; ++j,
> ++current_update, ++idx) {
> @@ -1904,7 +1904,7 @@ __xe_migrate_update_pgtables(struct xe_migrate
> *m,
> (page_ofs / sizeof(u64)) * XE_PAGE_SIZE;
> for (i = 0; i < pt_update_ops->num_ops; ++i) {
> struct xe_vm_pgtable_update_op *pt_op =
> - &pt_update_ops->ops[i];
> + &pt_update_ops->pt_job_ops->ops[i];
> struct xe_vm_pgtable_update *updates = pt_op-
> >entries;
>
> for (j = 0; j < pt_op->num_entries; ++j) {
> @@ -1922,7 +1922,7 @@ __xe_migrate_update_pgtables(struct xe_migrate
> *m,
>
> for (i = 0; i < pt_update_ops->num_ops; ++i) {
> struct xe_vm_pgtable_update_op *pt_op =
> - &pt_update_ops->ops[i];
> + &pt_update_ops->pt_job_ops->ops[i];
> struct xe_vm_pgtable_update *updates = pt_op-
> >entries;
>
> for (j = 0; j < pt_op->num_entries; ++j)
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 1f24eff75185..6b56e62a35c1 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -204,7 +204,9 @@ unsigned int xe_pt_shift(unsigned int level)
> * and finally frees @pt. TODO: Can we remove the @flags argument?
> */
> void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head
> *deferred)
> +
Extra blank line here.
> {
> + bool added = false;
> int i;
>
> if (!pt)
> @@ -212,7 +214,20 @@ void xe_pt_destroy(struct xe_pt *pt, u32 flags,
> struct llist_head *deferred)
>
> XE_WARN_ON(!list_empty(&pt->bo->ttm.base.gpuva.list));
> xe_bo_unpin(pt->bo);
> - xe_bo_put_deferred(pt->bo, deferred, NULL);
> + xe_bo_put_deferred(pt->bo, deferred, &added);
> + if (added) {
> + xe_assert(pt->bo->vm->xe, !kref_read(&pt->bo-
> >ttm.base.refcount));
> +
> + /*
> + * We need the VM present until the BO is destroyed
> as it shares
> + * a dma-resv and BO destroy is async. Reinit BO
> refcount so
> + * xe_bo_put_async can be used when the PT job ops
> refcount goes
> + * to zero.
> + */
> + xe_vm_get(pt->bo->vm);
> + pt->bo->flags |= XE_BO_FLAG_PUT_VM_ASYNC;
> + kref_init(&pt->bo->ttm.base.refcount);
> + }
>
> if (pt->level > 0 && pt->num_live) {
> struct xe_pt_dir *pt_dir = as_xe_pt_dir(pt);
> @@ -1884,13 +1899,13 @@ xe_pt_commit_prepare_unbind(struct xe_vma
> *vma,
> static struct xe_vm_pgtable_update_op *
> to_pt_op(struct xe_vm_pgtable_update_ops *pt_update_ops, u32 op_idx)
> {
> - return &pt_update_ops->ops[op_idx];
> + return &pt_update_ops->pt_job_ops->ops[op_idx];
> }
>
> static u32
> get_current_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
> {
> - return pt_update_ops->current_op;
> + return pt_update_ops->pt_job_ops->current_op;
> }
>
> static struct xe_vm_pgtable_update_op *
> @@ -1902,7 +1917,7 @@ to_current_pt_op(struct
> xe_vm_pgtable_update_ops *pt_update_ops)
> static void
> incr_current_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
> {
> - ++pt_update_ops->current_op;
> + ++pt_update_ops->pt_job_ops->current_op;
> }
>
> static void
> @@ -2264,7 +2279,6 @@ static int op_prepare(struct xe_vm *vm,
> static void
> xe_pt_update_ops_init(struct xe_vm_pgtable_update_ops
> *pt_update_ops)
> {
> - init_llist_head(&pt_update_ops->deferred);
> pt_update_ops->start = ~0x0ull;
> pt_update_ops->last = 0x0ull;
> xe_page_reclaim_list_init(&pt_update_ops->prl);
> @@ -2612,7 +2626,8 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> struct xe_vma_ops *vops)
> to_pt_op(pt_update_ops, i);
>
> xe_pt_commit(pt_op->vma, pt_op->entries,
> - pt_op->num_entries, &pt_update_ops-
> >deferred);
> + pt_op->num_entries,
> + &pt_update_ops->pt_job_ops->deferred);
> pt_op->vma = NULL; /* skip in
> xe_pt_update_ops_abort */
> }
>
> @@ -2700,19 +2715,8 @@ void xe_pt_update_ops_fini(struct xe_tile
> *tile, struct xe_vma_ops *vops)
> {
> struct xe_vm_pgtable_update_ops *pt_update_ops =
> &vops->pt_update_ops[tile->id];
> - int i;
>
> xe_page_reclaim_entries_put(pt_update_ops->prl.entries);
> -
> - lockdep_assert_held(&vops->vm->lock);
> - xe_vm_assert_held(vops->vm);
> -
> - for (i = 0; i < pt_update_ops->current_op; ++i) {
> - struct xe_vm_pgtable_update_op *pt_op =
> &pt_update_ops->ops[i];
> -
> - xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
> - }
> - xe_bo_put_commit(&vops->pt_update_ops[tile->id].deferred);
> }
>
> /**
> @@ -2749,3 +2753,97 @@ void xe_pt_update_ops_abort(struct xe_tile
> *tile, struct xe_vma_ops *vops)
>
> xe_pt_update_ops_fini(tile, vops);
> }
> +
> +/**
> + * xe_pt_job_ops_alloc() - Allocate PT job ops
> + * @num_ops: Number of VM PT update ops
> + *
> + * Allocate PT job ops and internal array of VM PT update ops.
> + *
> + * Return: Pointer to PT job ops or NULL
> + */
> +struct xe_pt_job_ops *xe_pt_job_ops_alloc(u32 num_ops)
> +{
> + struct xe_pt_job_ops *pt_job_ops;
> +
> + pt_job_ops = kmalloc(sizeof(*pt_job_ops), GFP_KERNEL);
> + if (!pt_job_ops)
> + return NULL;
> +
> + pt_job_ops->ops = kvmalloc_array(num_ops, sizeof(*pt_job_ops-
> >ops),
> + GFP_KERNEL);
> + if (!pt_job_ops->ops) {
> + kvfree(pt_job_ops);
This should be kfree right?
Thanks,
Stuart
> + return NULL;
> + }
> +
> + pt_job_ops->current_op = 0;
> + kref_init(&pt_job_ops->refcount);
> + init_llist_head(&pt_job_ops->deferred);
> +
> + return pt_job_ops;
> +}
> +
> +/**
> + * xe_pt_job_ops_get() - Get PT job ops
> + * @pt_job_ops: PT job ops to get
> + *
> + * Take a reference to PT job ops
> + *
> + * Return: Pointer to PT job ops or NULL
> + */
> +struct xe_pt_job_ops *xe_pt_job_ops_get(struct xe_pt_job_ops
> *pt_job_ops)
> +{
> + if (pt_job_ops)
> + kref_get(&pt_job_ops->refcount);
> +
> + return pt_job_ops;
> +}
> +
> +static void xe_pt_update_ops_free(struct xe_vm_pgtable_update_op
> *pt_op,
> + u32 num_ops)
> +{
> + u32 i;
> +
> + for (i = 0; i < num_ops; ++i, ++pt_op)
> + xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
> +}
> +
> +static void xe_pt_job_ops_destroy(struct kref *ref)
> +{
> + struct xe_pt_job_ops *pt_job_ops =
> + container_of(ref, struct xe_pt_job_ops, refcount);
> + struct llist_node *freed;
> + struct xe_bo *bo, *next;
> +
> + xe_pt_update_ops_free(pt_job_ops->ops,
> + pt_job_ops->current_op);
> +
> + freed = llist_del_all(&pt_job_ops->deferred);
> + if (freed) {
> + llist_for_each_entry_safe(bo, next, freed, freed)
> + /*
> + * If called from run_job, we are in the dma-
> fencing
> + * path and cannot take dma-resv locks so use
> an async
> + * put.
> + */
> + xe_bo_put_async(bo);
> + }
> +
> + kvfree(pt_job_ops->ops);
> + kfree(pt_job_ops);
> +}
> +
> +/**
> + * xe_pt_job_ops_put() - Put PT job ops
> + * @pt_job_ops: PT job ops to put
> + *
> + * Drop a reference to PT job ops
> + */
> +void xe_pt_job_ops_put(struct xe_pt_job_ops *pt_job_ops)
> +{
> + if (!pt_job_ops)
> + return;
> +
> + kref_put(&pt_job_ops->refcount, xe_pt_job_ops_destroy);
> +}
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> index 4daeebaab5a1..5faddb8e700c 100644
> --- a/drivers/gpu/drm/xe/xe_pt.h
> +++ b/drivers/gpu/drm/xe/xe_pt.h
> @@ -49,4 +49,8 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct
> xe_vma *vma);
> bool xe_pt_zap_ptes_range(struct xe_tile *tile, struct xe_vm *vm,
> struct xe_svm_range *range);
>
> +struct xe_pt_job_ops *xe_pt_job_ops_alloc(u32 num_ops);
> +struct xe_pt_job_ops *xe_pt_job_ops_get(struct xe_pt_job_ops
> *pt_job_ops);
> +void xe_pt_job_ops_put(struct xe_pt_job_ops *pt_job_ops);
> +
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_pt_types.h
> b/drivers/gpu/drm/xe/xe_pt_types.h
> index 84b51d3762a4..92d50573ed1d 100644
> --- a/drivers/gpu/drm/xe/xe_pt_types.h
> +++ b/drivers/gpu/drm/xe/xe_pt_types.h
> @@ -91,12 +91,29 @@ struct xe_vm_pgtable_update_op {
> bool rebind;
> };
>
> +/**
> + * struct xe_pt_job_ops - Page-table update operations (dynamically
> allocated)
> + *
> + * This is the portion of &struct xe_vma_ops and
> + * &struct xe_vm_pgtable_update_ops that is dynamically allocated,
> as it
> + * must remain valid until the associated bind job completes. A
> reference
> + * count controls its lifetime.
> + */
> +struct xe_pt_job_ops {
> + /** @current_op: current page-table update operation */
> + u32 current_op;
> + /** @refcount: reference count */
> + struct kref refcount;
> + /** @deferred: list of deferred PT entries to destroy */
> + struct llist_head deferred;
> + /** @ops: page-table update operations */
> + struct xe_vm_pgtable_update_op *ops;
> +};
> +
> /** struct xe_vm_pgtable_update_ops: page table update operations */
> struct xe_vm_pgtable_update_ops {
> - /** @ops: operations */
> - struct xe_vm_pgtable_update_op *ops;
> - /** @deferred: deferred list to destroy PT entries */
> - struct llist_head deferred;
> + /** @pt_job_ops: PT update operations dynamic allocation*/
> + struct xe_pt_job_ops *pt_job_ops;
> /** @q: exec queue for PT operations */
> struct xe_exec_queue *q;
> /** @prl: embedded page reclaim list */
> @@ -107,8 +124,6 @@ struct xe_vm_pgtable_update_ops {
> u64 last;
> /** @num_ops: number of operations */
> u32 num_ops;
> - /** @current_op: current operations */
> - u32 current_op;
> /** @needs_svm_lock: Needs SVM lock */
> bool needs_svm_lock;
> /** @needs_invalidation: Needs invalidation */
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 548b0769b3ef..3e2d2191b78c 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -585,11 +585,9 @@ static int xe_vma_ops_alloc(struct xe_vma_ops
> *vops, bool array_of_binds)
> if (!vops->pt_update_ops[i].num_ops)
> continue;
>
> - vops->pt_update_ops[i].ops =
> - kmalloc_objs(*vops->pt_update_ops[i].ops,
> - vops->pt_update_ops[i].num_ops,
> - GFP_KERNEL | __GFP_RETRY_MAYFAIL
> | __GFP_NOWARN);
> - if (!vops->pt_update_ops[i].ops)
> + vops->pt_update_ops[i].pt_job_ops =
> + xe_pt_job_ops_alloc(vops-
> >pt_update_ops[i].num_ops);
> + if (!vops->pt_update_ops[i].pt_job_ops)
> return array_of_binds ? -ENOBUFS : -ENOMEM;
> }
>
> @@ -625,7 +623,7 @@ static void xe_vma_ops_fini(struct xe_vma_ops
> *vops)
> xe_vma_svm_prefetch_ops_fini(vops);
>
> for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
> - kfree(vops->pt_update_ops[i].ops);
> + xe_pt_job_ops_put(vops->pt_update_ops[i].pt_job_ops);
> }
>
> static void xe_vma_ops_incr_pt_update_ops(struct xe_vma_ops *vops,
> u8 tile_mask, int inc_val)
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 09/25] drm/xe: Add struct xe_pt_job_ops
2026-03-03 23:26 ` Summers, Stuart
@ 2026-03-03 23:28 ` Matthew Brost
0 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-03-03 23:28 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, Ghimiray, Himal Prasad,
Yadav, Arvind, thomas.hellstrom@linux.intel.com, Dugast, Francois
On Tue, Mar 03, 2026 at 04:26:01PM -0700, Summers, Stuart wrote:
> On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> > Add struct xe_pt_job_ops, a dynamically refcounted object that
> > contains
> > the information required to issue a CPU bind via a job after the
> > initial
> > bind IOCTL returns.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_migrate.c | 10 +--
> > drivers/gpu/drm/xe/xe_pt.c | 132 +++++++++++++++++++++++++++--
> > --
> > drivers/gpu/drm/xe/xe_pt.h | 4 +
> > drivers/gpu/drm/xe/xe_pt_types.h | 27 +++++--
> > drivers/gpu/drm/xe/xe_vm.c | 10 +--
> > 5 files changed, 149 insertions(+), 34 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > b/drivers/gpu/drm/xe/xe_migrate.c
> > index 69e6e3135ec6..cd6802642ef3 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -1771,7 +1771,7 @@ xe_migrate_update_pgtables_cpu(struct
> > xe_migrate *m,
> > }
> >
> > xe_migrate_update_pgtables_cpu_execute(vm, m->tile, ops,
> > - pt_update_ops->ops,
> > + pt_update_ops-
> > >pt_job_ops->ops,
> > pt_update_ops-
> > >num_ops);
> >
> > return dma_fence_get_stub();
> > @@ -1798,7 +1798,7 @@ __xe_migrate_update_pgtables(struct xe_migrate
> > *m,
> > bool usm = is_migrate && xe->info.has_usm;
> >
> > for (i = 0; i < pt_update_ops->num_ops; ++i) {
> > - struct xe_vm_pgtable_update_op *pt_op =
> > &pt_update_ops->ops[i];
> > + struct xe_vm_pgtable_update_op *pt_op =
> > &pt_update_ops->pt_job_ops->ops[i];
> > struct xe_vm_pgtable_update *updates = pt_op-
> > >entries;
> >
> > num_updates += pt_op->num_entries;
> > @@ -1867,7 +1867,7 @@ __xe_migrate_update_pgtables(struct xe_migrate
> > *m,
> >
> > for (; i < pt_update_ops->num_ops; ++i) {
> > struct xe_vm_pgtable_update_op *pt_op
> > =
> > - &pt_update_ops->ops[i];
> > + &pt_update_ops->pt_job_ops-
> > >ops[i];
> > struct xe_vm_pgtable_update *updates
> > = pt_op->entries;
> >
> > for (; j < pt_op->num_entries; ++j,
> > ++current_update, ++idx) {
> > @@ -1904,7 +1904,7 @@ __xe_migrate_update_pgtables(struct xe_migrate
> > *m,
> > (page_ofs / sizeof(u64)) * XE_PAGE_SIZE;
> > for (i = 0; i < pt_update_ops->num_ops; ++i) {
> > struct xe_vm_pgtable_update_op *pt_op =
> > - &pt_update_ops->ops[i];
> > + &pt_update_ops->pt_job_ops->ops[i];
> > struct xe_vm_pgtable_update *updates = pt_op-
> > >entries;
> >
> > for (j = 0; j < pt_op->num_entries; ++j) {
> > @@ -1922,7 +1922,7 @@ __xe_migrate_update_pgtables(struct xe_migrate
> > *m,
> >
> > for (i = 0; i < pt_update_ops->num_ops; ++i) {
> > struct xe_vm_pgtable_update_op *pt_op =
> > - &pt_update_ops->ops[i];
> > + &pt_update_ops->pt_job_ops->ops[i];
> > struct xe_vm_pgtable_update *updates = pt_op-
> > >entries;
> >
> > for (j = 0; j < pt_op->num_entries; ++j)
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > index 1f24eff75185..6b56e62a35c1 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -204,7 +204,9 @@ unsigned int xe_pt_shift(unsigned int level)
> > * and finally frees @pt. TODO: Can we remove the @flags argument?
> > */
> > void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head
> > *deferred)
> > +
>
> Extra line..
>
+1
> > {
> > + bool added = false;
> > int i;
> >
> > if (!pt)
> > @@ -212,7 +214,20 @@ void xe_pt_destroy(struct xe_pt *pt, u32 flags,
> > struct llist_head *deferred)
> >
> > XE_WARN_ON(!list_empty(&pt->bo->ttm.base.gpuva.list));
> > xe_bo_unpin(pt->bo);
> > - xe_bo_put_deferred(pt->bo, deferred, NULL);
> > + xe_bo_put_deferred(pt->bo, deferred, &added);
> > + if (added) {
> > + xe_assert(pt->bo->vm->xe, !kref_read(&pt->bo-
> > >ttm.base.refcount));
> > +
> > + /*
> > + * We need the VM present until the BO is destroyed
> > as it shares
> > + * a dma-resv and BO destroy is async. Reinit BO
> > refcount so
> > + * xe_bo_put_async can be used when the PT job ops
> > refcount goes
> > + * to zero.
> > + */
> > + xe_vm_get(pt->bo->vm);
> > + pt->bo->flags |= XE_BO_FLAG_PUT_VM_ASYNC;
> > + kref_init(&pt->bo->ttm.base.refcount);
> > + }
> >
> > if (pt->level > 0 && pt->num_live) {
> > struct xe_pt_dir *pt_dir = as_xe_pt_dir(pt);
> > @@ -1884,13 +1899,13 @@ xe_pt_commit_prepare_unbind(struct xe_vma
> > *vma,
> > static struct xe_vm_pgtable_update_op *
> > to_pt_op(struct xe_vm_pgtable_update_ops *pt_update_ops, u32 op_idx)
> > {
> > - return &pt_update_ops->ops[op_idx];
> > + return &pt_update_ops->pt_job_ops->ops[op_idx];
> > }
> >
> > static u32
> > get_current_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
> > {
> > - return pt_update_ops->current_op;
> > + return pt_update_ops->pt_job_ops->current_op;
> > }
> >
> > static struct xe_vm_pgtable_update_op *
> > @@ -1902,7 +1917,7 @@ to_current_pt_op(struct
> > xe_vm_pgtable_update_ops *pt_update_ops)
> > static void
> > incr_current_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
> > {
> > - ++pt_update_ops->current_op;
> > + ++pt_update_ops->pt_job_ops->current_op;
> > }
> >
> > static void
> > @@ -2264,7 +2279,6 @@ static int op_prepare(struct xe_vm *vm,
> > static void
> > xe_pt_update_ops_init(struct xe_vm_pgtable_update_ops
> > *pt_update_ops)
> > {
> > - init_llist_head(&pt_update_ops->deferred);
> > pt_update_ops->start = ~0x0ull;
> > pt_update_ops->last = 0x0ull;
> > xe_page_reclaim_list_init(&pt_update_ops->prl);
> > @@ -2612,7 +2626,8 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> > struct xe_vma_ops *vops)
> > to_pt_op(pt_update_ops, i);
> >
> > xe_pt_commit(pt_op->vma, pt_op->entries,
> > - pt_op->num_entries, &pt_update_ops-
> > >deferred);
> > + pt_op->num_entries,
> > + &pt_update_ops->pt_job_ops->deferred);
> > pt_op->vma = NULL; /* skip in
> > xe_pt_update_ops_abort */
> > }
> >
> > @@ -2700,19 +2715,8 @@ void xe_pt_update_ops_fini(struct xe_tile
> > *tile, struct xe_vma_ops *vops)
> > {
> > struct xe_vm_pgtable_update_ops *pt_update_ops =
> > &vops->pt_update_ops[tile->id];
> > - int i;
> >
> > xe_page_reclaim_entries_put(pt_update_ops->prl.entries);
> > -
> > - lockdep_assert_held(&vops->vm->lock);
> > - xe_vm_assert_held(vops->vm);
> > -
> > - for (i = 0; i < pt_update_ops->current_op; ++i) {
> > - struct xe_vm_pgtable_update_op *pt_op =
> > &pt_update_ops->ops[i];
> > -
> > - xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
> > - }
> > - xe_bo_put_commit(&vops->pt_update_ops[tile->id].deferred);
> > }
> >
> > /**
> > @@ -2749,3 +2753,97 @@ void xe_pt_update_ops_abort(struct xe_tile
> > *tile, struct xe_vma_ops *vops)
> >
> > xe_pt_update_ops_fini(tile, vops);
> > }
> > +
> > +/**
> > + * xe_pt_job_ops_alloc() - Allocate PT job ops
> > + * @num_ops: Number of VM PT update ops
> > + *
> > + * Allocate PT job ops and internal array of VM PT update ops.
> > + *
> > + * Return: Pointer to PT job ops or NULL
> > + */
> > +struct xe_pt_job_ops *xe_pt_job_ops_alloc(u32 num_ops)
> > +{
> > + struct xe_pt_job_ops *pt_job_ops;
> > +
> > + pt_job_ops = kmalloc(sizeof(*pt_job_ops), GFP_KERNEL);
> > + if (!pt_job_ops)
> > + return NULL;
> > +
> > + pt_job_ops->ops = kvmalloc_array(num_ops, sizeof(*pt_job_ops-
> > >ops),
> > + GFP_KERNEL);
> > + if (!pt_job_ops->ops) {
> > + kvfree(pt_job_ops);
>
> This should be kfree right?
>
Yes, will fix.
Matt
> Thanks,
> Stuart
>
> > + return NULL;
> > + }
> > +
> > + pt_job_ops->current_op = 0;
> > + kref_init(&pt_job_ops->refcount);
> > + init_llist_head(&pt_job_ops->deferred);
> > +
> > + return pt_job_ops;
> > +}
> > +
> > +/**
> > + * xe_pt_job_ops_get() - Get PT job ops
> > + * @pt_job_ops: PT job ops to get
> > + *
> > + * Take a reference to PT job ops
> > + *
> > + * Return: Pointer to PT job ops or NULL
> > + */
> > +struct xe_pt_job_ops *xe_pt_job_ops_get(struct xe_pt_job_ops
> > *pt_job_ops)
> > +{
> > + if (pt_job_ops)
> > + kref_get(&pt_job_ops->refcount);
> > +
> > + return pt_job_ops;
> > +}
> > +
> > +static void xe_pt_update_ops_free(struct xe_vm_pgtable_update_op
> > *pt_op,
> > + u32 num_ops)
> > +{
> > + u32 i;
> > +
> > + for (i = 0; i < num_ops; ++i, ++pt_op)
> > + xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
> > +}
> > +
> > +static void xe_pt_job_ops_destroy(struct kref *ref)
> > +{
> > + struct xe_pt_job_ops *pt_job_ops =
> > + container_of(ref, struct xe_pt_job_ops, refcount);
> > + struct llist_node *freed;
> > + struct xe_bo *bo, *next;
> > +
> > + xe_pt_update_ops_free(pt_job_ops->ops,
> > + pt_job_ops->current_op);
> > +
> > + freed = llist_del_all(&pt_job_ops->deferred);
> > + if (freed) {
> > + llist_for_each_entry_safe(bo, next, freed, freed)
> > + /*
> > + * If called from run_job, we are in the dma-
> > fencing
> > + * path and cannot take dma-resv locks so use
> > an async
> > + * put.
> > + */
> > + xe_bo_put_async(bo);
> > + }
> > +
> > + kvfree(pt_job_ops->ops);
> > + kfree(pt_job_ops);
> > +}
> > +
> > +/**
> > + * xe_pt_job_ops_put() - Put PT job ops
> > + * @pt_job_ops: PT job ops to put
> > + *
> > + * Drop a reference to PT job ops
> > + */
> > +void xe_pt_job_ops_put(struct xe_pt_job_ops *pt_job_ops)
> > +{
> > + if (!pt_job_ops)
> > + return;
> > +
> > + kref_put(&pt_job_ops->refcount, xe_pt_job_ops_destroy);
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> > index 4daeebaab5a1..5faddb8e700c 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.h
> > +++ b/drivers/gpu/drm/xe/xe_pt.h
> > @@ -49,4 +49,8 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct
> > xe_vma *vma);
> > bool xe_pt_zap_ptes_range(struct xe_tile *tile, struct xe_vm *vm,
> > struct xe_svm_range *range);
> >
> > +struct xe_pt_job_ops *xe_pt_job_ops_alloc(u32 num_ops);
> > +struct xe_pt_job_ops *xe_pt_job_ops_get(struct xe_pt_job_ops
> > *pt_job_ops);
> > +void xe_pt_job_ops_put(struct xe_pt_job_ops *pt_job_ops);
> > +
> > #endif
> > diff --git a/drivers/gpu/drm/xe/xe_pt_types.h
> > b/drivers/gpu/drm/xe/xe_pt_types.h
> > index 84b51d3762a4..92d50573ed1d 100644
> > --- a/drivers/gpu/drm/xe/xe_pt_types.h
> > +++ b/drivers/gpu/drm/xe/xe_pt_types.h
> > @@ -91,12 +91,29 @@ struct xe_vm_pgtable_update_op {
> > bool rebind;
> > };
> >
> > +/**
> > + * struct xe_pt_job_ops - Page-table update operations (dynamically
> > allocated)
> > + *
> > + * This is the portion of &struct xe_vma_ops and
> > + * &struct xe_vm_pgtable_update_ops that is dynamically allocated,
> > as it
> > + * must remain valid until the associated bind job completes. A
> > reference
> > + * count controls its lifetime.
> > + */
> > +struct xe_pt_job_ops {
> > + /** @current_op: current page-table update operation */
> > + u32 current_op;
> > + /** @refcount: reference count */
> > + struct kref refcount;
> > + /** @deferred: list of deferred PT entries to destroy */
> > + struct llist_head deferred;
> > + /** @ops: page-table update operations */
> > + struct xe_vm_pgtable_update_op *ops;
> > +};
> > +
> > /** struct xe_vm_pgtable_update_ops: page table update operations */
> > struct xe_vm_pgtable_update_ops {
> > - /** @ops: operations */
> > - struct xe_vm_pgtable_update_op *ops;
> > - /** @deferred: deferred list to destroy PT entries */
> > - struct llist_head deferred;
> > + /** @pt_job_ops: PT update operations dynamic allocation*/
> > + struct xe_pt_job_ops *pt_job_ops;
> > /** @q: exec queue for PT operations */
> > struct xe_exec_queue *q;
> > /** @prl: embedded page reclaim list */
> > @@ -107,8 +124,6 @@ struct xe_vm_pgtable_update_ops {
> > u64 last;
> > /** @num_ops: number of operations */
> > u32 num_ops;
> > - /** @current_op: current operations */
> > - u32 current_op;
> > /** @needs_svm_lock: Needs SVM lock */
> > bool needs_svm_lock;
> > /** @needs_invalidation: Needs invalidation */
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index 548b0769b3ef..3e2d2191b78c 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -585,11 +585,9 @@ static int xe_vma_ops_alloc(struct xe_vma_ops
> > *vops, bool array_of_binds)
> > if (!vops->pt_update_ops[i].num_ops)
> > continue;
> >
> > - vops->pt_update_ops[i].ops =
> > - kmalloc_objs(*vops->pt_update_ops[i].ops,
> > - vops->pt_update_ops[i].num_ops,
> > - GFP_KERNEL | __GFP_RETRY_MAYFAIL
> > | __GFP_NOWARN);
> > - if (!vops->pt_update_ops[i].ops)
> > + vops->pt_update_ops[i].pt_job_ops =
> > + xe_pt_job_ops_alloc(vops-
> > >pt_update_ops[i].num_ops);
> > + if (!vops->pt_update_ops[i].pt_job_ops)
> > return array_of_binds ? -ENOBUFS : -ENOMEM;
> > }
> >
> > @@ -625,7 +623,7 @@ static void xe_vma_ops_fini(struct xe_vma_ops
> > *vops)
> > xe_vma_svm_prefetch_ops_fini(vops);
> >
> > for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
> > - kfree(vops->pt_update_ops[i].ops);
> > + xe_pt_job_ops_put(vops->pt_update_ops[i].pt_job_ops);
> > }
> >
> > static void xe_vma_ops_incr_pt_update_ops(struct xe_vma_ops *vops,
> > u8 tile_mask, int inc_val)
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v3 10/25] drm/xe: Update GuC submission backend to run PT jobs
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (8 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 09/25] drm/xe: Add struct xe_pt_job_ops Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-03-03 23:28 ` Summers, Stuart
2026-02-28 1:34 ` [PATCH v3 11/25] drm/xe: Store level in struct xe_vm_pgtable_update Matthew Brost
` (20 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
PT jobs bypass GPU execution for the final step of a bind job, using the
CPU to program the required page tables. Teach the GuC submission backend
how to execute these jobs.
PT job submission is implemented in the GuC backend for simplicity. A
follow-up patch could introduce a dedicated backend for PT jobs.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 37 ++++++++++++++++++++++++++----
drivers/gpu/drm/xe/xe_migrate.c | 13 ++++++++++-
drivers/gpu/drm/xe/xe_migrate.h | 8 +++++++
3 files changed, 52 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 453af51fe87b..1d6ac7a6563b 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -36,8 +36,10 @@
#include "xe_lrc.h"
#include "xe_macros.h"
#include "xe_map.h"
+#include "xe_migrate.h"
#include "xe_mocs.h"
#include "xe_pm.h"
+#include "xe_pt.h"
#include "xe_ring_ops_types.h"
#include "xe_sched_job.h"
#include "xe_sleep.h"
@@ -1183,6 +1185,20 @@ static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
}
}
+static bool is_pt_job(struct xe_sched_job *job)
+{
+ return job->is_pt_job;
+}
+
+static void run_pt_job(struct xe_sched_job *job)
+{
+ xe_migrate_update_pgtables_cpu_execute(job->pt_update[0].vm,
+ job->pt_update[0].tile,
+ job->pt_update[0].ops,
+ job->pt_update[0].pt_job_ops->ops,
+ job->pt_update[0].pt_job_ops->current_op);
+}
+
static struct dma_fence *
guc_exec_queue_run_job(struct drm_sched_job *drm_job)
{
@@ -1210,14 +1226,25 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
register_exec_queue(primary, GUC_CONTEXT_NORMAL);
}
- if (!exec_queue_registered(q))
- register_exec_queue(q, GUC_CONTEXT_NORMAL);
- if (!job->restore_replay)
- q->ring_ops->emit_job(job);
- submit_exec_queue(q, job);
+ if (is_pt_job(job)) {
+ xe_gt_assert(guc_to_gt(guc), !exec_queue_registered(q));
+ run_pt_job(job);
+ } else {
+ if (!exec_queue_registered(q))
+ register_exec_queue(q, GUC_CONTEXT_NORMAL);
+ if (!job->restore_replay)
+ q->ring_ops->emit_job(job);
+ submit_exec_queue(q, job);
+ }
job->restore_replay = false;
}
+ if (is_pt_job(job)) {
+ xe_pt_job_ops_put(job->pt_update[0].pt_job_ops);
+ dma_fence_put(job->fence); /* Drop ref from xe_sched_job_arm */
+ return NULL;
+ }
+
run_job_out:
return job->fence;
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index cd6802642ef3..e9b9dfe19e48 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1715,7 +1715,18 @@ struct migrate_test_params {
container_of(_priv, struct migrate_test_params, base)
#endif
-static void
+/**
+ * xe_migrate_update_pgtables_cpu_execute() - Update a VM's PTEs via the CPU
+ * @vm: The VM being updated
+ * @tile: The tile being updated
+ * @ops: The migrate PT update ops
+ * @pt_ops: The VM PT update ops
+ * @num_ops: The number of VM PT update ops
+ *
+ * Execute the VM PT update ops array which results in a VM's PTEs being updated
+ * via the CPU.
+ */
+void
xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct xe_tile *tile,
const struct xe_migrate_pt_update_ops *ops,
struct xe_vm_pgtable_update_op *pt_op,
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index c3c0740f908d..30c9c990a8b1 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -24,6 +24,7 @@ struct xe_pt;
struct xe_tile;
struct xe_vm;
struct xe_vm_pgtable_update;
+struct xe_vm_pgtable_update_op;
struct xe_vma;
enum xe_sriov_vf_ccs_rw_ctxs;
@@ -157,6 +158,13 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m);
+
+void
+xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct xe_tile *tile,
+ const struct xe_migrate_pt_update_ops *ops,
+ struct xe_vm_pgtable_update_op *pt_op,
+ int num_ops);
+
struct dma_fence *
xe_migrate_update_pgtables(struct xe_migrate *m,
struct xe_migrate_pt_update *pt_update);
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 10/25] drm/xe: Update GuC submission backend to run PT jobs
2026-02-28 1:34 ` [PATCH v3 10/25] drm/xe: Update GuC submission backend to run PT jobs Matthew Brost
@ 2026-03-03 23:28 ` Summers, Stuart
2026-03-04 0:26 ` Matthew Brost
0 siblings, 1 reply; 63+ messages in thread
From: Summers, Stuart @ 2026-03-03 23:28 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: Ghimiray, Himal Prasad, Yadav, Arvind,
thomas.hellstrom@linux.intel.com, Dugast, Francois
On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> PT jobs bypass GPU execution for the final step of a bind job, using
> the
> CPU to program the required page tables. Teach the GuC submission
> backend
> how to execute these jobs.
>
> PT job submission is implemented in the GuC backend for simplicity. A
> follow-up patch could introduce a dedicated backend for PT jobs.
Still looking through the whole series, but standing alone, this patch
doesn't feel right to me. I don't see why we'd want to hook together
the PT update flow with the GuC backend...
Thanks,
Stuart
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc_submit.c | 37 ++++++++++++++++++++++++++--
> --
> drivers/gpu/drm/xe/xe_migrate.c | 13 ++++++++++-
> drivers/gpu/drm/xe/xe_migrate.h | 8 +++++++
> 3 files changed, 52 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 453af51fe87b..1d6ac7a6563b 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -36,8 +36,10 @@
> #include "xe_lrc.h"
> #include "xe_macros.h"
> #include "xe_map.h"
> +#include "xe_migrate.h"
> #include "xe_mocs.h"
> #include "xe_pm.h"
> +#include "xe_pt.h"
> #include "xe_ring_ops_types.h"
> #include "xe_sched_job.h"
> #include "xe_sleep.h"
> @@ -1183,6 +1185,20 @@ static void submit_exec_queue(struct
> xe_exec_queue *q, struct xe_sched_job *job)
> }
> }
>
> +static bool is_pt_job(struct xe_sched_job *job)
> +{
> + return job->is_pt_job;
> +}
> +
> +static void run_pt_job(struct xe_sched_job *job)
> +{
> + xe_migrate_update_pgtables_cpu_execute(job->pt_update[0].vm,
> + job-
> >pt_update[0].tile,
> + job->pt_update[0].ops,
> + job-
> >pt_update[0].pt_job_ops->ops,
> + job-
> >pt_update[0].pt_job_ops->current_op);
> +}
> +
> static struct dma_fence *
> guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> {
> @@ -1210,14 +1226,25 @@ guc_exec_queue_run_job(struct drm_sched_job
> *drm_job)
> register_exec_queue(primary,
> GUC_CONTEXT_NORMAL);
> }
>
> - if (!exec_queue_registered(q))
> - register_exec_queue(q, GUC_CONTEXT_NORMAL);
> - if (!job->restore_replay)
> - q->ring_ops->emit_job(job);
> - submit_exec_queue(q, job);
> + if (is_pt_job(job)) {
> + xe_gt_assert(guc_to_gt(guc),
> !exec_queue_registered(q));
> + run_pt_job(job);
> + } else {
> + if (!exec_queue_registered(q))
> + register_exec_queue(q,
> GUC_CONTEXT_NORMAL);
> + if (!job->restore_replay)
> + q->ring_ops->emit_job(job);
> + submit_exec_queue(q, job);
> + }
> job->restore_replay = false;
> }
>
> + if (is_pt_job(job)) {
> + xe_pt_job_ops_put(job->pt_update[0].pt_job_ops);
> + dma_fence_put(job->fence); /* Drop ref from
> xe_sched_job_arm */
> + return NULL;
> + }
> +
> run_job_out:
>
> return job->fence;
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> b/drivers/gpu/drm/xe/xe_migrate.c
> index cd6802642ef3..e9b9dfe19e48 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -1715,7 +1715,18 @@ struct migrate_test_params {
> container_of(_priv, struct migrate_test_params, base)
> #endif
>
> -static void
> +/**
> + * xe_migrate_update_pgtables_cpu_execute() - Update a VM's PTEs via
> the CPU
> + * @vm: The VM being updated
> + * @tile: The tile being updated
> + * @ops: The migrate PT update ops
> + * @pt_ops: The VM PT update ops
> + * @num_ops: The number of The VM PT update ops
> + *
> + * Execute the VM PT update ops array which results in a VM's PTEs
> being updated
> + * via the CPU.
> + */
> +void
> xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct
> xe_tile *tile,
> const struct
> xe_migrate_pt_update_ops *ops,
> struct xe_vm_pgtable_update_op
> *pt_op,
> diff --git a/drivers/gpu/drm/xe/xe_migrate.h
> b/drivers/gpu/drm/xe/xe_migrate.h
> index c3c0740f908d..30c9c990a8b1 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.h
> +++ b/drivers/gpu/drm/xe/xe_migrate.h
> @@ -24,6 +24,7 @@ struct xe_pt;
> struct xe_tile;
> struct xe_vm;
> struct xe_vm_pgtable_update;
> +struct xe_vm_pgtable_update_op;
> struct xe_vma;
>
> enum xe_sriov_vf_ccs_rw_ctxs;
> @@ -157,6 +158,13 @@ struct dma_fence *xe_migrate_clear(struct
> xe_migrate *m,
>
> struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m);
>
> +
> +void
> +xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct
> xe_tile *tile,
> + const struct
> xe_migrate_pt_update_ops *ops,
> + struct xe_vm_pgtable_update_op
> *pt_op,
> + int num_ops);
> +
> struct dma_fence *
> xe_migrate_update_pgtables(struct xe_migrate *m,
> struct xe_migrate_pt_update *pt_update);
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 10/25] drm/xe: Update GuC submission backend to run PT jobs
2026-03-03 23:28 ` Summers, Stuart
@ 2026-03-04 0:26 ` Matthew Brost
2026-03-04 20:43 ` Summers, Stuart
0 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-03-04 0:26 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, Ghimiray, Himal Prasad,
Yadav, Arvind, thomas.hellstrom@linux.intel.com, Dugast, Francois
On Tue, Mar 03, 2026 at 04:28:50PM -0700, Summers, Stuart wrote:
> On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> > PT jobs bypass GPU execution for the final step of a bind job, using
> > the
> > CPU to program the required page tables. Teach the GuC submission
> > backend
> > how to execute these jobs.
> >
> > PT job submission is implemented in the GuC backend for simplicity. A
> > follow-up patch could introduce a dedicated backend for PT jobs.
>
> Still looking through the whole series, but standing alone, this patch
> doesn't feel right to me. I don't see why we'd want to hook together
> the PT update flow with the GuC backend...
>
I don't think it is either, which is why I called out a follow-up to
implement the PT backend. It’s likely a bigger refactor than one would
expect, though...
- Build the backend on top of xe_dep_scheduler
- Introduce xe_pt_job that inherits from xe_dep_job
- Ripple these changes through CPU bind, the PT layer, etc.
A lot of that is just shuffling code around across those three steps,
but it’s not too bad and will likely give us some nice layering cleanups
along the way.
The real tricky part is handling all the flows that stop/start the
backends while various global events occur (e.g., PM enter/exit, GT
resets, VF migration, FLR (WIP), etc.). All of those flows are currently
GT → UC → GuC layered (or, for VF migration, direct to GuC). So we’d
need a refactor there as well. It’s doable, but it will end up touching
quite a few files. Again, once we do this, I suspect we’ll get
additional layering cleanups along the way.
So for now, given the already large size of the series, I’d like to get
the functionality in first and then tackle the layering refactors.
Matt
> Thanks,
> Stuart
>
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_guc_submit.c | 37 ++++++++++++++++++++++++++--
> > --
> > drivers/gpu/drm/xe/xe_migrate.c | 13 ++++++++++-
> > drivers/gpu/drm/xe/xe_migrate.h | 8 +++++++
> > 3 files changed, 52 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> > b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 453af51fe87b..1d6ac7a6563b 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -36,8 +36,10 @@
> > #include "xe_lrc.h"
> > #include "xe_macros.h"
> > #include "xe_map.h"
> > +#include "xe_migrate.h"
> > #include "xe_mocs.h"
> > #include "xe_pm.h"
> > +#include "xe_pt.h"
> > #include "xe_ring_ops_types.h"
> > #include "xe_sched_job.h"
> > #include "xe_sleep.h"
> > @@ -1183,6 +1185,20 @@ static void submit_exec_queue(struct
> > xe_exec_queue *q, struct xe_sched_job *job)
> > }
> > }
> >
> > +static bool is_pt_job(struct xe_sched_job *job)
> > +{
> > + return job->is_pt_job;
> > +}
> > +
> > +static void run_pt_job(struct xe_sched_job *job)
> > +{
> > + xe_migrate_update_pgtables_cpu_execute(job->pt_update[0].vm,
> > + job-
> > >pt_update[0].tile,
> > + job->pt_update[0].ops,
> > + job-
> > >pt_update[0].pt_job_ops->ops,
> > + job-
> > >pt_update[0].pt_job_ops->current_op);
> > +}
> > +
> > static struct dma_fence *
> > guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> > {
> > @@ -1210,14 +1226,25 @@ guc_exec_queue_run_job(struct drm_sched_job
> > *drm_job)
> > register_exec_queue(primary,
> > GUC_CONTEXT_NORMAL);
> > }
> >
> > - if (!exec_queue_registered(q))
> > - register_exec_queue(q, GUC_CONTEXT_NORMAL);
> > - if (!job->restore_replay)
> > - q->ring_ops->emit_job(job);
> > - submit_exec_queue(q, job);
> > + if (is_pt_job(job)) {
> > + xe_gt_assert(guc_to_gt(guc),
> > !exec_queue_registered(q));
> > + run_pt_job(job);
> > + } else {
> > + if (!exec_queue_registered(q))
> > + register_exec_queue(q,
> > GUC_CONTEXT_NORMAL);
> > + if (!job->restore_replay)
> > + q->ring_ops->emit_job(job);
> > + submit_exec_queue(q, job);
> > + }
> > job->restore_replay = false;
> > }
> >
> > + if (is_pt_job(job)) {
> > + xe_pt_job_ops_put(job->pt_update[0].pt_job_ops);
> > + dma_fence_put(job->fence); /* Drop ref from
> > xe_sched_job_arm */
> > + return NULL;
> > + }
> > +
> > run_job_out:
> >
> > return job->fence;
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > b/drivers/gpu/drm/xe/xe_migrate.c
> > index cd6802642ef3..e9b9dfe19e48 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -1715,7 +1715,18 @@ struct migrate_test_params {
> > container_of(_priv, struct migrate_test_params, base)
> > #endif
> >
> > -static void
> > +/**
> > + * xe_migrate_update_pgtables_cpu_execute() - Update a VM's PTEs via
> > the CPU
> > + * @vm: The VM being updated
> > + * @tile: The tile being updated
> > + * @ops: The migrate PT update ops
> > + * @pt_ops: The VM PT update ops
> > + * @num_ops: The number of The VM PT update ops
> > + *
> > + * Execute the VM PT update ops array which results in a VM's PTEs
> > being updated
> > + * via the CPU.
> > + */
> > +void
> > xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct
> > xe_tile *tile,
> > const struct
> > xe_migrate_pt_update_ops *ops,
> > struct xe_vm_pgtable_update_op
> > *pt_op,
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.h
> > b/drivers/gpu/drm/xe/xe_migrate.h
> > index c3c0740f908d..30c9c990a8b1 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.h
> > +++ b/drivers/gpu/drm/xe/xe_migrate.h
> > @@ -24,6 +24,7 @@ struct xe_pt;
> > struct xe_tile;
> > struct xe_vm;
> > struct xe_vm_pgtable_update;
> > +struct xe_vm_pgtable_update_op;
> > struct xe_vma;
> >
> > enum xe_sriov_vf_ccs_rw_ctxs;
> > @@ -157,6 +158,13 @@ struct dma_fence *xe_migrate_clear(struct
> > xe_migrate *m,
> >
> > struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m);
> >
> > +
> > +void
> > +xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct
> > xe_tile *tile,
> > + const struct
> > xe_migrate_pt_update_ops *ops,
> > + struct xe_vm_pgtable_update_op
> > *pt_op,
> > + int num_ops);
> > +
> > struct dma_fence *
> > xe_migrate_update_pgtables(struct xe_migrate *m,
> > struct xe_migrate_pt_update *pt_update);
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 10/25] drm/xe: Update GuC submission backend to run PT jobs
2026-03-04 0:26 ` Matthew Brost
@ 2026-03-04 20:43 ` Summers, Stuart
2026-03-04 21:53 ` Matthew Brost
0 siblings, 1 reply; 63+ messages in thread
From: Summers, Stuart @ 2026-03-04 20:43 UTC (permalink / raw)
To: Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, Ghimiray, Himal Prasad,
Yadav, Arvind, thomas.hellstrom@linux.intel.com, Dugast, Francois
On Tue, 2026-03-03 at 16:26 -0800, Matthew Brost wrote:
> On Tue, Mar 03, 2026 at 04:28:50PM -0700, Summers, Stuart wrote:
> > On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> > > PT jobs bypass GPU execution for the final step of a bind job,
> > > using
> > > the
> > > CPU to program the required page tables. Teach the GuC submission
> > > backend
> > > how to execute these jobs.
> > >
> > > PT job submission is implemented in the GuC backend for
> > > simplicity. A
> > > follow-up patch could introduce a dedicated backend for PT jobs.
> >
> > Still looking through the whole series, but standing alone, this
> > patch
> > doesn't feel right to me. I don't see why we'd want to hook
> > together
> > the PT update flow with the GuC backend...
> >
>
> I don't think it is either, which is why I called out a follow-up to
> implement the PT backend. It’s likely a bigger refactor than one
> would
> expect, though...
>
> - Build the backend on top of xe_dep_scheduler
> - Introduce xe_pt_job that inherits from xe_dep_job
> - Ripple these changes through CPU bind, the PT layer, etc.
>
> A lot of that is just shuffling code around across those three steps,
> but it’s not too bad and will likely give us some nice layering
> cleanups
> along the way.
>
> The real tricky part is handling all the flows that stop/start the
> backends while various global events occur (e.g., PM enter/exit, GT
> resets, VF migration, FLR (WIP), etc.). All of those flows are
> currently
> GT → UC → GuC layered (or, for VF migration, direct to GuC). So we’d
> need a refactor there as well. It’s doable, but it will end up
> touching
> quite a few files. Again, once we do this, I suspect we’ll get
> additional layering cleanups along the way.
>
> So for now, given the already large size of the series, I’d like to
> get
> the functionality in first and then tackle the layering refactors.
I get what you're saying here. Other than complexity, is there a reason
we can't do that work first though? Is there some critical reason we
need to get the CPU binding work in first, basically?
-Stuart
>
> Matt
>
> > Thanks,
> > Stuart
> >
> > >
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_guc_submit.c | 37
> > > ++++++++++++++++++++++++++--
> > > --
> > > drivers/gpu/drm/xe/xe_migrate.c | 13 ++++++++++-
> > > drivers/gpu/drm/xe/xe_migrate.h | 8 +++++++
> > > 3 files changed, 52 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > index 453af51fe87b..1d6ac7a6563b 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > @@ -36,8 +36,10 @@
> > > #include "xe_lrc.h"
> > > #include "xe_macros.h"
> > > #include "xe_map.h"
> > > +#include "xe_migrate.h"
> > > #include "xe_mocs.h"
> > > #include "xe_pm.h"
> > > +#include "xe_pt.h"
> > > #include "xe_ring_ops_types.h"
> > > #include "xe_sched_job.h"
> > > #include "xe_sleep.h"
> > > @@ -1183,6 +1185,20 @@ static void submit_exec_queue(struct
> > > xe_exec_queue *q, struct xe_sched_job *job)
> > > }
> > > }
> > >
> > > +static bool is_pt_job(struct xe_sched_job *job)
> > > +{
> > > + return job->is_pt_job;
> > > +}
> > > +
> > > +static void run_pt_job(struct xe_sched_job *job)
> > > +{
> > > + xe_migrate_update_pgtables_cpu_execute(job-
> > > >pt_update[0].vm,
> > > + job-
> > > > pt_update[0].tile,
> > > + job-
> > > >pt_update[0].ops,
> > > + job-
> > > > pt_update[0].pt_job_ops->ops,
> > > + job-
> > > > pt_update[0].pt_job_ops->current_op);
> > > +}
> > > +
> > > static struct dma_fence *
> > > guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> > > {
> > > @@ -1210,14 +1226,25 @@ guc_exec_queue_run_job(struct
> > > drm_sched_job
> > > *drm_job)
> > > register_exec_queue(primary,
> > > GUC_CONTEXT_NORMAL);
> > > }
> > >
> > > - if (!exec_queue_registered(q))
> > > - register_exec_queue(q,
> > > GUC_CONTEXT_NORMAL);
> > > - if (!job->restore_replay)
> > > - q->ring_ops->emit_job(job);
> > > - submit_exec_queue(q, job);
> > > + if (is_pt_job(job)) {
> > > + xe_gt_assert(guc_to_gt(guc),
> > > !exec_queue_registered(q));
> > > + run_pt_job(job);
> > > + } else {
> > > + if (!exec_queue_registered(q))
> > > + register_exec_queue(q,
> > > GUC_CONTEXT_NORMAL);
> > > + if (!job->restore_replay)
> > > + q->ring_ops->emit_job(job);
> > > + submit_exec_queue(q, job);
> > > + }
> > > job->restore_replay = false;
> > > }
> > >
> > > + if (is_pt_job(job)) {
> > > + xe_pt_job_ops_put(job->pt_update[0].pt_job_ops);
> > > + dma_fence_put(job->fence); /* Drop ref from
> > > xe_sched_job_arm */
> > > + return NULL;
> > > + }
> > > +
> > > run_job_out:
> > >
> > > return job->fence;
> > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > > b/drivers/gpu/drm/xe/xe_migrate.c
> > > index cd6802642ef3..e9b9dfe19e48 100644
> > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > @@ -1715,7 +1715,18 @@ struct migrate_test_params {
> > > container_of(_priv, struct migrate_test_params, base)
> > > #endif
> > >
> > > -static void
> > > +/**
> > > + * xe_migrate_update_pgtables_cpu_execute() - Update a VM's PTEs
> > > via
> > > the CPU
> > > + * @vm: The VM being updated
> > > + * @tile: The tile being updated
> > > + * @ops: The migrate PT update ops
> > > + * @pt_op: The VM PT update ops
> > > + * @num_ops: The number of VM PT update ops
> > > + *
> > > + * Execute the VM PT update ops array which results in a VM's
> > > PTEs
> > > being updated
> > > + * via the CPU.
> > > + */
> > > +void
> > > xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct
> > > xe_tile *tile,
> > > const struct
> > > xe_migrate_pt_update_ops *ops,
> > > struct
> > > xe_vm_pgtable_update_op
> > > *pt_op,
> > > diff --git a/drivers/gpu/drm/xe/xe_migrate.h
> > > b/drivers/gpu/drm/xe/xe_migrate.h
> > > index c3c0740f908d..30c9c990a8b1 100644
> > > --- a/drivers/gpu/drm/xe/xe_migrate.h
> > > +++ b/drivers/gpu/drm/xe/xe_migrate.h
> > > @@ -24,6 +24,7 @@ struct xe_pt;
> > > struct xe_tile;
> > > struct xe_vm;
> > > struct xe_vm_pgtable_update;
> > > +struct xe_vm_pgtable_update_op;
> > > struct xe_vma;
> > >
> > > enum xe_sriov_vf_ccs_rw_ctxs;
> > > @@ -157,6 +158,13 @@ struct dma_fence *xe_migrate_clear(struct
> > > xe_migrate *m,
> > >
> > > struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m);
> > >
> > > +
> > > +void
> > > +xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct
> > > xe_tile *tile,
> > > + const struct
> > > xe_migrate_pt_update_ops *ops,
> > > + struct
> > > xe_vm_pgtable_update_op
> > > *pt_op,
> > > + int num_ops);
> > > +
> > > struct dma_fence *
> > > xe_migrate_update_pgtables(struct xe_migrate *m,
> > > struct xe_migrate_pt_update
> > > *pt_update);
> >
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 10/25] drm/xe: Update GuC submission backend to run PT jobs
2026-03-04 20:43 ` Summers, Stuart
@ 2026-03-04 21:53 ` Matthew Brost
2026-03-05 20:24 ` Summers, Stuart
0 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-03-04 21:53 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, Ghimiray, Himal Prasad,
Yadav, Arvind, thomas.hellstrom@linux.intel.com, Dugast, Francois
On Wed, Mar 04, 2026 at 01:43:27PM -0700, Summers, Stuart wrote:
> On Tue, 2026-03-03 at 16:26 -0800, Matthew Brost wrote:
> > On Tue, Mar 03, 2026 at 04:28:50PM -0700, Summers, Stuart wrote:
> > > On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> > > > PT jobs bypass GPU execution for the final step of a bind job,
> > > > using
> > > > the
> > > > CPU to program the required page tables. Teach the GuC submission
> > > > backend
> > > > how to execute these jobs.
> > > >
> > > > PT job submission is implemented in the GuC backend for
> > > > simplicity. A
> > > > follow-up patch could introduce a dedicated backend for PT jobs.
> > >
> > > Still looking through the whole series, but standing alone, this
> > > patch
> > > doesn't feel right to me. I don't see why we'd want to hook
> > > together
> > > the PT update flow with the GuC backend...
> > >
> >
> > I don't think it is either, which is why I called out a follow-up to
> > implement the PT backend. It’s likely a bigger refactor than one
> > would
> > expect, though...
> >
> > - Build the backend on top of xe_dep_scheduler
> > - Introduce xe_pt_job that inherits from xe_dep_job
> > - Ripple these changes through CPU bind, the PT layer, etc.
> >
> > A lot of that is just shuffling code around across those three steps,
> > but it’s not too bad and will likely give us some nice layering
> > cleanups
> > along the way.
> >
> > The real tricky part is handling all the flows that stop/start the
> > backends while various global events occur (e.g., PM enter/exit, GT
> > resets, VF migration, FLR (WIP), etc.). All of those flows are
> > currently
> > GT → UC → GuC layered (or, for VF migration, direct to GuC). So we’d
> > need a refactor there as well. It’s doable, but it will end up
> > touching
> > quite a few files. Again, once we do this, I suspect we’ll get
> > additional layering cleanups along the way.
> >
> > So for now, given the already large size of the series, I’d like to
> > get
> > the functionality in first and then tackle the layering refactors.
>
> I get what you're saying here. Other than complexity, is there a reason
> we can't do that work first though? Is there some critical reason we
No reason I couldn't pile in those changes into this series.
> need to get the CPU binding work in first, basically?
>
CPU binding is the basis for much of the SVM performance work—ULLS on
the migration queue doesn’t work without it, and parallel
faults/prefetches don’t work all that well either because of ordering
issues on the migration queue between copies, clears, binds, etc.
CPU binds also make some of the multi-tile refactors easier (included in
this series; see patches 12, 15, and 16).
So I’d like to get this in without expanding the scope too much. I will
definitely rework this to use a dedicated backend and wire it up through
all the layers in a follow-up, though.
Matt
> -Stuart
>
> >
> > Matt
> >
> > > Thanks,
> > > Stuart
> > >
> > > >
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_guc_submit.c | 37
> > > > ++++++++++++++++++++++++++--
> > > > --
> > > > drivers/gpu/drm/xe/xe_migrate.c | 13 ++++++++++-
> > > > drivers/gpu/drm/xe/xe_migrate.h | 8 +++++++
> > > > 3 files changed, 52 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > index 453af51fe87b..1d6ac7a6563b 100644
> > > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > @@ -36,8 +36,10 @@
> > > > #include "xe_lrc.h"
> > > > #include "xe_macros.h"
> > > > #include "xe_map.h"
> > > > +#include "xe_migrate.h"
> > > > #include "xe_mocs.h"
> > > > #include "xe_pm.h"
> > > > +#include "xe_pt.h"
> > > > #include "xe_ring_ops_types.h"
> > > > #include "xe_sched_job.h"
> > > > #include "xe_sleep.h"
> > > > @@ -1183,6 +1185,20 @@ static void submit_exec_queue(struct
> > > > xe_exec_queue *q, struct xe_sched_job *job)
> > > > }
> > > > }
> > > >
> > > > +static bool is_pt_job(struct xe_sched_job *job)
> > > > +{
> > > > + return job->is_pt_job;
> > > > +}
> > > > +
> > > > +static void run_pt_job(struct xe_sched_job *job)
> > > > +{
> > > > + xe_migrate_update_pgtables_cpu_execute(job-
> > > > >pt_update[0].vm,
> > > > + job-
> > > > > pt_update[0].tile,
> > > > + job-
> > > > >pt_update[0].ops,
> > > > + job-
> > > > > pt_update[0].pt_job_ops->ops,
> > > > + job-
> > > > > pt_update[0].pt_job_ops->current_op);
> > > > +}
> > > > +
> > > > static struct dma_fence *
> > > > guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> > > > {
> > > > @@ -1210,14 +1226,25 @@ guc_exec_queue_run_job(struct
> > > > drm_sched_job
> > > > *drm_job)
> > > > register_exec_queue(primary,
> > > > GUC_CONTEXT_NORMAL);
> > > > }
> > > >
> > > > - if (!exec_queue_registered(q))
> > > > - register_exec_queue(q,
> > > > GUC_CONTEXT_NORMAL);
> > > > - if (!job->restore_replay)
> > > > - q->ring_ops->emit_job(job);
> > > > - submit_exec_queue(q, job);
> > > > + if (is_pt_job(job)) {
> > > > + xe_gt_assert(guc_to_gt(guc),
> > > > !exec_queue_registered(q));
> > > > + run_pt_job(job);
> > > > + } else {
> > > > + if (!exec_queue_registered(q))
> > > > + register_exec_queue(q,
> > > > GUC_CONTEXT_NORMAL);
> > > > + if (!job->restore_replay)
> > > > + q->ring_ops->emit_job(job);
> > > > + submit_exec_queue(q, job);
> > > > + }
> > > > job->restore_replay = false;
> > > > }
> > > >
> > > > + if (is_pt_job(job)) {
> > > > + xe_pt_job_ops_put(job->pt_update[0].pt_job_ops);
> > > > + dma_fence_put(job->fence); /* Drop ref from
> > > > xe_sched_job_arm */
> > > > + return NULL;
> > > > + }
> > > > +
> > > > run_job_out:
> > > >
> > > > return job->fence;
> > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > > > b/drivers/gpu/drm/xe/xe_migrate.c
> > > > index cd6802642ef3..e9b9dfe19e48 100644
> > > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > > @@ -1715,7 +1715,18 @@ struct migrate_test_params {
> > > > container_of(_priv, struct migrate_test_params, base)
> > > > #endif
> > > >
> > > > -static void
> > > > +/**
> > > > + * xe_migrate_update_pgtables_cpu_execute() - Update a VM's PTEs
> > > > via
> > > > the CPU
> > > > + * @vm: The VM being updated
> > > > + * @tile: The tile being updated
> > > > + * @ops: The migrate PT update ops
> > > > + * @pt_op: The VM PT update ops
> > > > + * @num_ops: The number of VM PT update ops
> > > > + *
> > > > + * Execute the VM PT update ops array which results in a VM's
> > > > PTEs
> > > > being updated
> > > > + * via the CPU.
> > > > + */
> > > > +void
> > > > xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct
> > > > xe_tile *tile,
> > > > const struct
> > > > xe_migrate_pt_update_ops *ops,
> > > > struct
> > > > xe_vm_pgtable_update_op
> > > > *pt_op,
> > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.h
> > > > b/drivers/gpu/drm/xe/xe_migrate.h
> > > > index c3c0740f908d..30c9c990a8b1 100644
> > > > --- a/drivers/gpu/drm/xe/xe_migrate.h
> > > > +++ b/drivers/gpu/drm/xe/xe_migrate.h
> > > > @@ -24,6 +24,7 @@ struct xe_pt;
> > > > struct xe_tile;
> > > > struct xe_vm;
> > > > struct xe_vm_pgtable_update;
> > > > +struct xe_vm_pgtable_update_op;
> > > > struct xe_vma;
> > > >
> > > > enum xe_sriov_vf_ccs_rw_ctxs;
> > > > @@ -157,6 +158,13 @@ struct dma_fence *xe_migrate_clear(struct
> > > > xe_migrate *m,
> > > >
> > > > struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m);
> > > >
> > > > +
> > > > +void
> > > > +xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct
> > > > xe_tile *tile,
> > > > + const struct
> > > > xe_migrate_pt_update_ops *ops,
> > > > + struct
> > > > xe_vm_pgtable_update_op
> > > > *pt_op,
> > > > + int num_ops);
> > > > +
> > > > struct dma_fence *
> > > > xe_migrate_update_pgtables(struct xe_migrate *m,
> > > > struct xe_migrate_pt_update
> > > > *pt_update);
> > >
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 10/25] drm/xe: Update GuC submission backend to run PT jobs
2026-03-04 21:53 ` Matthew Brost
@ 2026-03-05 20:24 ` Summers, Stuart
0 siblings, 0 replies; 63+ messages in thread
From: Summers, Stuart @ 2026-03-05 20:24 UTC (permalink / raw)
To: Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, Ghimiray, Himal Prasad,
Yadav, Arvind, thomas.hellstrom@linux.intel.com, Dugast, Francois
On Wed, 2026-03-04 at 13:53 -0800, Matthew Brost wrote:
> On Wed, Mar 04, 2026 at 01:43:27PM -0700, Summers, Stuart wrote:
> > On Tue, 2026-03-03 at 16:26 -0800, Matthew Brost wrote:
> > > On Tue, Mar 03, 2026 at 04:28:50PM -0700, Summers, Stuart wrote:
> > > > On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> > > > > PT jobs bypass GPU execution for the final step of a bind
> > > > > job,
> > > > > using
> > > > > the
> > > > > CPU to program the required page tables. Teach the GuC
> > > > > submission
> > > > > backend
> > > > > how to execute these jobs.
> > > > >
> > > > > PT job submission is implemented in the GuC backend for
> > > > > simplicity. A
> > > > > follow-up patch could introduce a dedicated backend for PT
> > > > > jobs.
> > > >
> > > > Still looking through the whole series, but standing alone,
> > > > this
> > > > patch
> > > > doesn't feel right to me. I don't see why we'd want to hook
> > > > together
> > > > the PT update flow with the GuC backend...
> > > >
> > >
> > > I don't think it is either, which is why I called out a follow-up
> > > to
> > > implement the PT backend. It’s likely a bigger refactor than one
> > > would
> > > expect, though...
> > >
> > > - Build the backend on top of xe_dep_scheduler
> > > - Introduce xe_pt_job that inherits from xe_dep_job
> > > - Ripple these changes through CPU bind, the PT layer, etc.
> > >
> > > A lot of that is just shuffling code around across those three
> > > steps,
> > > but it’s not too bad and will likely give us some nice layering
> > > cleanups
> > > along the way.
> > >
> > > The real tricky part is handling all the flows that stop/start
> > > the
> > > backends while various global events occur (e.g., PM enter/exit,
> > > GT
> > > resets, VF migration, FLR (WIP), etc.). All of those flows are
> > > currently
> > > GT → UC → GuC layered (or, for VF migration, direct to GuC). So
> > > we’d
> > > need a refactor there as well. It’s doable, but it will end up
> > > touching
> > > quite a few files. Again, once we do this, I suspect we’ll get
> > > additional layering cleanups along the way.
> > >
> > > So for now, given the already large size of the series, I’d like
> > > to
> > > get
> > > the functionality in first and then tackle the layering
> > > refactors.
> >
> > I get what you're saying here. Other than complexity, is there a
> > reason
> > we can't do that work first though? Is there some critical reason
> > we
>
> No reason I couldn't pile in those changes into this series.
>
> > need to get the CPU binding work in first, basically?
> >
>
> CPU binding is the basis for much of the SVM performance work—ULLS on
> the migration queue doesn’t work without it, and parallel
> faults/prefetches don’t work all that well either because of ordering
> issues on the migration queue between copies, clears, binds, etc.
>
> CPU binds also make some of the multi-tile refactors easier (included
> in
> this series; see patches 12, 15, and 16).
>
> So I’d like to get this in without expanding the scope too much. I
> will
> definitely rework this to use a dedicated backend and wire it up
> through
> all the layers in a follow-up, though.
Yeah I get what you're saying. I'll need to finish reviewing those
later patches before I'm ready to review this one. But I'm going to be
a little busy the next few days so likely not able to get back here
until some time next week. Don't feel like you need to hold here for me
if someone else can review in the meantime.
Thanks,
Stuart
>
> Matt
>
> > -Stuart
> >
> > >
> > > Matt
> > >
> > > > Thanks,
> > > > Stuart
> > > >
> > > > >
> > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/xe/xe_guc_submit.c | 37
> > > > > ++++++++++++++++++++++++++--
> > > > > --
> > > > > drivers/gpu/drm/xe/xe_migrate.c | 13 ++++++++++-
> > > > > drivers/gpu/drm/xe/xe_migrate.h | 8 +++++++
> > > > > 3 files changed, 52 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > index 453af51fe87b..1d6ac7a6563b 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > @@ -36,8 +36,10 @@
> > > > > #include "xe_lrc.h"
> > > > > #include "xe_macros.h"
> > > > > #include "xe_map.h"
> > > > > +#include "xe_migrate.h"
> > > > > #include "xe_mocs.h"
> > > > > #include "xe_pm.h"
> > > > > +#include "xe_pt.h"
> > > > > #include "xe_ring_ops_types.h"
> > > > > #include "xe_sched_job.h"
> > > > > #include "xe_sleep.h"
> > > > > @@ -1183,6 +1185,20 @@ static void submit_exec_queue(struct
> > > > > xe_exec_queue *q, struct xe_sched_job *job)
> > > > > }
> > > > > }
> > > > >
> > > > > +static bool is_pt_job(struct xe_sched_job *job)
> > > > > +{
> > > > > + return job->is_pt_job;
> > > > > +}
> > > > > +
> > > > > +static void run_pt_job(struct xe_sched_job *job)
> > > > > +{
> > > > > + xe_migrate_update_pgtables_cpu_execute(job-
> > > > > > pt_update[0].vm,
> > > > > + job-
> > > > > > pt_update[0].tile,
> > > > > + job-
> > > > > > pt_update[0].ops,
> > > > > + job-
> > > > > > pt_update[0].pt_job_ops->ops,
> > > > > + job-
> > > > > > pt_update[0].pt_job_ops->current_op);
> > > > > +}
> > > > > +
> > > > > static struct dma_fence *
> > > > > guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> > > > > {
> > > > > @@ -1210,14 +1226,25 @@ guc_exec_queue_run_job(struct
> > > > > drm_sched_job
> > > > > *drm_job)
> > > > > register_exec_queue(primary,
> > > > > GUC_CONTEXT_NORMAL);
> > > > > }
> > > > >
> > > > > - if (!exec_queue_registered(q))
> > > > > - register_exec_queue(q,
> > > > > GUC_CONTEXT_NORMAL);
> > > > > - if (!job->restore_replay)
> > > > > - q->ring_ops->emit_job(job);
> > > > > - submit_exec_queue(q, job);
> > > > > + if (is_pt_job(job)) {
> > > > > + xe_gt_assert(guc_to_gt(guc),
> > > > > !exec_queue_registered(q));
> > > > > + run_pt_job(job);
> > > > > + } else {
> > > > > + if (!exec_queue_registered(q))
> > > > > + register_exec_queue(q,
> > > > > GUC_CONTEXT_NORMAL);
> > > > > + if (!job->restore_replay)
> > > > > + q->ring_ops->emit_job(job);
> > > > > + submit_exec_queue(q, job);
> > > > > + }
> > > > > job->restore_replay = false;
> > > > > }
> > > > >
> > > > > + if (is_pt_job(job)) {
> > > > > + xe_pt_job_ops_put(job-
> > > > > >pt_update[0].pt_job_ops);
> > > > > + dma_fence_put(job->fence); /* Drop ref
> > > > > from
> > > > > xe_sched_job_arm */
> > > > > + return NULL;
> > > > > + }
> > > > > +
> > > > > run_job_out:
> > > > >
> > > > > return job->fence;
> > > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > > > > b/drivers/gpu/drm/xe/xe_migrate.c
> > > > > index cd6802642ef3..e9b9dfe19e48 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > > > @@ -1715,7 +1715,18 @@ struct migrate_test_params {
> > > > > container_of(_priv, struct migrate_test_params, base)
> > > > > #endif
> > > > >
> > > > > -static void
> > > > > +/**
> > > > > + * xe_migrate_update_pgtables_cpu_execute() - Update a VM's
> > > > > PTEs
> > > > > via
> > > > > the CPU
> > > > > + * @vm: The VM being updated
> > > > > + * @tile: The tile being updated
> > > > > + * @ops: The migrate PT update ops
> > > > > + * @pt_op: The VM PT update ops
> > > > > + * @num_ops: The number of VM PT update ops
> > > > > + *
> > > > > + * Execute the VM PT update ops array which results in a
> > > > > VM's
> > > > > PTEs
> > > > > being updated
> > > > > + * via the CPU.
> > > > > + */
> > > > > +void
> > > > > xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm,
> > > > > struct
> > > > > xe_tile *tile,
> > > > > const struct
> > > > > xe_migrate_pt_update_ops *ops,
> > > > > struct
> > > > > xe_vm_pgtable_update_op
> > > > > *pt_op,
> > > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.h
> > > > > b/drivers/gpu/drm/xe/xe_migrate.h
> > > > > index c3c0740f908d..30c9c990a8b1 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_migrate.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_migrate.h
> > > > > @@ -24,6 +24,7 @@ struct xe_pt;
> > > > > struct xe_tile;
> > > > > struct xe_vm;
> > > > > struct xe_vm_pgtable_update;
> > > > > +struct xe_vm_pgtable_update_op;
> > > > > struct xe_vma;
> > > > >
> > > > > enum xe_sriov_vf_ccs_rw_ctxs;
> > > > > @@ -157,6 +158,13 @@ struct dma_fence
> > > > > *xe_migrate_clear(struct
> > > > > xe_migrate *m,
> > > > >
> > > > > struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m);
> > > > >
> > > > > +
> > > > > +void
> > > > > +xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm,
> > > > > struct
> > > > > xe_tile *tile,
> > > > > + const struct
> > > > > xe_migrate_pt_update_ops *ops,
> > > > > + struct
> > > > > xe_vm_pgtable_update_op
> > > > > *pt_op,
> > > > > + int num_ops);
> > > > > +
> > > > > struct dma_fence *
> > > > > xe_migrate_update_pgtables(struct xe_migrate *m,
> > > > > struct xe_migrate_pt_update
> > > > > *pt_update);
> > > >
> >
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v3 11/25] drm/xe: Store level in struct xe_vm_pgtable_update
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (9 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 10/25] drm/xe: Update GuC submission backend to run PT jobs Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-03-03 23:44 ` Summers, Stuart
2026-02-28 1:34 ` [PATCH v3 12/25] drm/xe: Don't use migrate exec queue for page fault binds Matthew Brost
` (19 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
The level was previously extracted from struct xe_pt inside
xe_vm_pgtable_update during CPU binds, which always occurred during the
bind IOCTL. With CPU binds now supported in bind jobs, struct xe_pt may
no longer be valid in memory at that point. To address this, store the
level directly in struct xe_vm_pgtable_update.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_pt.c | 3 ++-
drivers/gpu/drm/xe/xe_pt_types.h | 8 +++++++-
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 6b56e62a35c1..0a90d1460a8b 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -381,6 +381,7 @@ xe_pt_new_shared(struct xe_walk_update *wupd, struct xe_pt *parent,
entry->flags = 0;
entry->qwords = 0;
entry->pt_bo->update_index = -1;
+ entry->level = parent->level;
if (alloc_entries) {
entry->pt_entries = kmalloc_objs(*entry->pt_entries, XE_PDES);
@@ -1830,7 +1831,7 @@ xe_migrate_clear_pgtable_callback(struct xe_vm *vm, struct xe_tile *tile,
u32 qword_ofs, u32 num_qwords,
const struct xe_vm_pgtable_update *update)
{
- u64 empty = __xe_pt_empty_pte(tile, vm, update->pt->level);
+ u64 empty = __xe_pt_empty_pte(tile, vm, update->level);
int i;
if (map && map->is_iomem)
diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
index 92d50573ed1d..aa1d7c0e8669 100644
--- a/drivers/gpu/drm/xe/xe_pt_types.h
+++ b/drivers/gpu/drm/xe/xe_pt_types.h
@@ -65,12 +65,18 @@ struct xe_vm_pgtable_update {
/** @qwords: number of PTE's to write */
u32 qwords;
- /** @pt: opaque pointer useful for the caller of xe_migrate_update_pgtables */
+ /**
+ * @pt: opaque pointer useful for PT building in the bind IOCTL. Only
+ * safe to touch during the bind IOCTL (i.e., not in bind jobs).
+ */
struct xe_pt *pt;
/** @pt_entries: Newly added pagetable entries */
struct xe_pt_entry *pt_entries;
+ /** @level: level of update */
+ unsigned int level;
+
/** @flags: Target flags */
u32 flags;
};
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 11/25] drm/xe: Store level in struct xe_vm_pgtable_update
2026-02-28 1:34 ` [PATCH v3 11/25] drm/xe: Store level in struct xe_vm_pgtable_update Matthew Brost
@ 2026-03-03 23:44 ` Summers, Stuart
0 siblings, 0 replies; 63+ messages in thread
From: Summers, Stuart @ 2026-03-03 23:44 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: Ghimiray, Himal Prasad, Yadav, Arvind,
thomas.hellstrom@linux.intel.com, Dugast, Francois
On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> The level was previously extracted from struct xe_pt inside
> xe_vm_pgtable_update during CPU binds, which always occurred during
> the
> bind IOCTL. With CPU binds now supported in bind jobs, struct xe_pt
> may
> no longer be valid in memory at that point. To address this, store
> the
> level directly in struct xe_vm_pgtable_update.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
> ---
> drivers/gpu/drm/xe/xe_pt.c | 3 ++-
> drivers/gpu/drm/xe/xe_pt_types.h | 8 +++++++-
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 6b56e62a35c1..0a90d1460a8b 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -381,6 +381,7 @@ xe_pt_new_shared(struct xe_walk_update *wupd,
> struct xe_pt *parent,
> entry->flags = 0;
> entry->qwords = 0;
> entry->pt_bo->update_index = -1;
> + entry->level = parent->level;
>
> if (alloc_entries) {
> entry->pt_entries = kmalloc_objs(*entry->pt_entries,
> XE_PDES);
> @@ -1830,7 +1831,7 @@ xe_migrate_clear_pgtable_callback(struct xe_vm
> *vm, struct xe_tile *tile,
> u32 qword_ofs, u32 num_qwords,
> const struct xe_vm_pgtable_update
> *update)
> {
> - u64 empty = __xe_pt_empty_pte(tile, vm, update->pt->level);
> + u64 empty = __xe_pt_empty_pte(tile, vm, update->level);
> int i;
>
> if (map && map->is_iomem)
> diff --git a/drivers/gpu/drm/xe/xe_pt_types.h
> b/drivers/gpu/drm/xe/xe_pt_types.h
> index 92d50573ed1d..aa1d7c0e8669 100644
> --- a/drivers/gpu/drm/xe/xe_pt_types.h
> +++ b/drivers/gpu/drm/xe/xe_pt_types.h
> @@ -65,12 +65,18 @@ struct xe_vm_pgtable_update {
> /** @qwords: number of PTE's to write */
> u32 qwords;
>
> - /** @pt: opaque pointer useful for the caller of
> xe_migrate_update_pgtables */
> + /**
> + * @pt: opaque pointer useful for PT building in the bind
> IOCTL. Only
> + * safe to touch during the bind IOCTL (i.e., not in bind
> jobs).
> + */
> struct xe_pt *pt;
>
> /** @pt_entries: Newly added pagetable entries */
> struct xe_pt_entry *pt_entries;
>
> + /** @level: level of update */
> + unsigned int level;
> +
> /** @flags: Target flags */
> u32 flags;
> };
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v3 12/25] drm/xe: Don't use migrate exec queue for page fault binds
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (10 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 11/25] drm/xe: Store level in struct xe_vm_pgtable_update Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-02-28 1:34 ` [PATCH v3 13/25] drm/xe: Enable CPU binds for jobs Matthew Brost
` (18 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Now that the CPU is always used for binds even in jobs, CPU bind jobs
can pass GPU jobs in the same exec queue resulting dma-fences signaling
out-of-order. Use a dedicated exec queue for binds issued from page
faults to avoid ordering issues and avoid blocking kernel binds on
unrelated copies / clears.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_migrate.c | 49 ++++++++++++++++++++++++++++++---
drivers/gpu/drm/xe/xe_migrate.h | 1 +
drivers/gpu/drm/xe/xe_vm.c | 17 ++++++++----
3 files changed, 57 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index e9b9dfe19e48..547affe55361 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -49,6 +49,8 @@
struct xe_migrate {
/** @q: Default exec queue used for migration */
struct xe_exec_queue *q;
+ /** @bind_q: Default exec queue used for binds */
+ struct xe_exec_queue *bind_q;
/** @tile: Backpointer to the tile this struct xe_migrate belongs to. */
struct xe_tile *tile;
/** @job_mutex: Timeline mutex for @eng. */
@@ -113,6 +115,7 @@ static void xe_migrate_fini(void *arg)
mutex_destroy(&m->job_mutex);
xe_vm_close_and_put(m->q->vm);
xe_exec_queue_put(m->q);
+ xe_exec_queue_put(m->bind_q);
}
static u64 xe_migrate_vm_addr(u64 slot, u32 level)
@@ -465,6 +468,16 @@ int xe_migrate_init(struct xe_migrate *m)
goto err_out;
}
+ m->bind_q = xe_exec_queue_create(xe, vm, logical_mask, 1, hwe,
+ EXEC_QUEUE_FLAG_KERNEL |
+ EXEC_QUEUE_FLAG_PERMANENT |
+ EXEC_QUEUE_FLAG_HIGH_PRIORITY |
+ EXEC_QUEUE_FLAG_MIGRATE, 0);
+ if (IS_ERR(m->bind_q)) {
+ err = PTR_ERR(m->bind_q);
+ goto err_out;
+ }
+
/*
* XXX: Currently only reserving 1 (likely slow) BCS instance on
* PVC, may want to revisit if performance is needed.
@@ -476,6 +489,16 @@ int xe_migrate_init(struct xe_migrate *m)
EXEC_QUEUE_FLAG_MIGRATE |
EXEC_QUEUE_FLAG_LOW_LATENCY, 0);
} else {
+ m->bind_q = xe_exec_queue_create_class(xe, primary_gt, vm,
+ XE_ENGINE_CLASS_COPY,
+ EXEC_QUEUE_FLAG_KERNEL |
+ EXEC_QUEUE_FLAG_PERMANENT |
+ EXEC_QUEUE_FLAG_MIGRATE, 0);
+ if (IS_ERR(m->bind_q)) {
+ err = PTR_ERR(m->bind_q);
+ goto err_out;
+ }
+
m->q = xe_exec_queue_create_class(xe, primary_gt, vm,
XE_ENGINE_CLASS_COPY,
EXEC_QUEUE_FLAG_KERNEL |
@@ -512,6 +535,8 @@ int xe_migrate_init(struct xe_migrate *m)
return err;
err_out:
+ if (!IS_ERR_OR_NULL(m->bind_q))
+ xe_exec_queue_put(m->bind_q);
xe_vm_close_and_put(vm);
return err;
@@ -1395,6 +1420,17 @@ struct dma_fence *xe_migrate_vram_copy_chunk(struct xe_bo *vram_bo, u64 vram_off
return fence;
}
+/**
+ * xe_migrate_bind_queue() - Get the bind queue from migrate context.
+ * @migrate: Migrate context.
+ *
+ * Return: Pointer to bind queue on success, error on failure
+ */
+struct xe_exec_queue *xe_migrate_bind_queue(struct xe_migrate *migrate)
+{
+ return migrate->bind_q;
+}
+
static void emit_clear_link_copy(struct xe_gt *gt, struct xe_bb *bb, u64 src_ofs,
u32 size, u32 pitch)
{
@@ -1788,6 +1824,11 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
return dma_fence_get_stub();
}
+static bool is_migrate_queue(struct xe_migrate *m, struct xe_exec_queue *q)
+{
+ return m->bind_q == q;
+}
+
static struct dma_fence *
__xe_migrate_update_pgtables(struct xe_migrate *m,
struct xe_migrate_pt_update *pt_update,
@@ -1805,7 +1846,7 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
u32 num_updates = 0, current_update = 0;
u64 addr;
int err = 0;
- bool is_migrate = pt_update_ops->q == m->q;
+ bool is_migrate = is_migrate_queue(m, pt_update_ops->q);
bool usm = is_migrate && xe->info.has_usm;
for (i = 0; i < pt_update_ops->num_ops; ++i) {
@@ -2527,7 +2568,7 @@ int xe_migrate_access_memory(struct xe_migrate *m, struct xe_bo *bo,
*/
void xe_migrate_job_lock(struct xe_migrate *m, struct xe_exec_queue *q)
{
- bool is_migrate = q == m->q;
+ bool is_migrate = is_migrate_queue(m, q);
if (is_migrate)
mutex_lock(&m->job_mutex);
@@ -2545,7 +2586,7 @@ void xe_migrate_job_lock(struct xe_migrate *m, struct xe_exec_queue *q)
*/
void xe_migrate_job_unlock(struct xe_migrate *m, struct xe_exec_queue *q)
{
- bool is_migrate = q == m->q;
+ bool is_migrate = is_migrate_queue(m, q);
if (is_migrate)
mutex_unlock(&m->job_mutex);
@@ -2562,7 +2603,7 @@ void xe_migrate_job_lock_assert(struct xe_exec_queue *q)
{
struct xe_migrate *m = gt_to_tile(q->gt)->migrate;
- xe_gt_assert(q->gt, q == m->q);
+ xe_gt_assert(q->gt, q == m->bind_q);
lockdep_assert_held(&m->job_mutex);
}
#endif
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index 30c9c990a8b1..9865de29fee7 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -140,6 +140,7 @@ void xe_migrate_ccs_rw_copy_clear(struct xe_bo *src_bo,
struct xe_lrc *xe_migrate_lrc(struct xe_migrate *migrate);
struct xe_exec_queue *xe_migrate_exec_queue(struct xe_migrate *migrate);
+struct xe_exec_queue *xe_migrate_bind_queue(struct xe_migrate *migrate);
struct dma_fence *xe_migrate_vram_copy_chunk(struct xe_bo *vram_bo, u64 vram_offset,
struct xe_bo *sysmem_bo, u64 sysmem_offset,
u64 size, enum xe_migrate_copy_dir dir);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 3e2d2191b78c..4ddfdd6a3c2a 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -688,7 +688,9 @@ int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
struct xe_vma *vma, *next;
struct xe_vma_ops vops;
struct xe_vma_op *op, *next_op;
- int err, i;
+ struct xe_tile *tile;
+ u8 id;
+ int err;
lockdep_assert_held(&vm->lock);
if ((xe_vm_in_lr_mode(vm) && !rebind_worker) ||
@@ -696,8 +698,11 @@ int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
return 0;
xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
- for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
- vops.pt_update_ops[i].wait_vm_bookkeep = true;
+ for_each_tile(tile, vm->xe, id) {
+ vops.pt_update_ops[id].wait_vm_bookkeep = true;
+ vops.pt_update_ops[id].q =
+ xe_migrate_bind_queue(tile->migrate);
+ }
xe_vm_assert_held(vm);
list_for_each_entry(vma, &vm->rebind_list, combined_links.rebind) {
@@ -755,7 +760,7 @@ struct dma_fence *xe_vma_rebind(struct xe_vm *vm, struct xe_vma *vma, u8 tile_ma
for_each_tile(tile, vm->xe, id) {
vops.pt_update_ops[id].wait_vm_bookkeep = true;
vops.pt_update_ops[tile->id].q =
- xe_migrate_exec_queue(tile->migrate);
+ xe_migrate_bind_queue(tile->migrate);
}
err = xe_vm_ops_add_rebind(&vops, vma, tile_mask);
@@ -846,7 +851,7 @@ struct dma_fence *xe_vm_range_rebind(struct xe_vm *vm,
for_each_tile(tile, vm->xe, id) {
vops.pt_update_ops[id].wait_vm_bookkeep = true;
vops.pt_update_ops[tile->id].q =
- xe_migrate_exec_queue(tile->migrate);
+ xe_migrate_bind_queue(tile->migrate);
}
err = xe_vm_ops_add_range_rebind(&vops, vma, range, tile_mask);
@@ -929,7 +934,7 @@ struct dma_fence *xe_vm_range_unbind(struct xe_vm *vm,
for_each_tile(tile, vm->xe, id) {
vops.pt_update_ops[id].wait_vm_bookkeep = true;
vops.pt_update_ops[tile->id].q =
- xe_migrate_exec_queue(tile->migrate);
+ xe_migrate_bind_queue(tile->migrate);
}
err = xe_vm_ops_add_range_unbind(&vops, range);
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v3 13/25] drm/xe: Enable CPU binds for jobs
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (11 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 12/25] drm/xe: Don't use migrate exec queue for page fault binds Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-02-28 1:34 ` [PATCH v3 14/25] drm/xe: Remove unused arguments from xe_migrate_pt_update_ops Matthew Brost
` (17 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
No reason to use the GPU for binds.
Benefits of CPU-based binds:
- Lower latency once dependencies are resolved, as there is no
interaction with the GuC or a hardware context switch both of which
are relatively slow.
- Large arrays of binds do not risk running out of migration PTEs,
avoiding -ENOBUFS being returned to userspace.
- Kernel binds are decoupled from the migration exec queue (which issues
copies and clears), so they cannot get stuck behind unrelated
jobs—this can be a problem with parallel GPU faults.
- Paves the way for decoupling binds from tiles and individual engines
- Enables ULLS on the migration exec queue, as this queue has exclusive
access to the paging copy engine.
Update migration layer to formulate a PT job which will issue CPU bind
in the submission backend.
All code related to GPU-based binding has been removed.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_bo_types.h | 2 -
drivers/gpu/drm/xe/xe_migrate.c | 239 ++-----------------------------
drivers/gpu/drm/xe/xe_pt.c | 1 -
3 files changed, 14 insertions(+), 228 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index d4fe3c8dca5b..bcbd23c7d2ed 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -79,8 +79,6 @@ struct xe_bo {
/** @freed: List node for delayed put. */
struct llist_node freed;
- /** @update_index: Update index if PT BO */
- int update_index;
/** @created: Whether the bo has passed initial creation */
bool created;
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 547affe55361..00288a2ead00 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -75,18 +75,12 @@ struct xe_migrate {
* Protected by @job_mutex.
*/
struct dma_fence *fence;
- /**
- * @vm_update_sa: For integrated, used to suballocate page-tables
- * out of the pt_bo.
- */
- struct drm_suballoc_manager vm_update_sa;
/** @min_chunk_size: For dgfx, Minimum chunk size */
u64 min_chunk_size;
};
#define MAX_PREEMPTDISABLE_TRANSFER SZ_8M /* Around 1ms. */
#define MAX_CCS_LIMITED_TRANSFER SZ_4M /* XE_PAGE_SIZE * (FIELD_MAX(XE2_CCS_SIZE_MASK) + 1) */
-#define NUM_KERNEL_PDE 15
#define NUM_PT_SLOTS 32
#define LEVEL0_PAGE_TABLE_ENCODE_SIZE SZ_2M
#define MAX_NUM_PTE 512
@@ -111,7 +105,6 @@ static void xe_migrate_fini(void *arg)
dma_fence_put(m->fence);
xe_bo_put(m->pt_bo);
- drm_suballoc_manager_fini(&m->vm_update_sa);
mutex_destroy(&m->job_mutex);
xe_vm_close_and_put(m->q->vm);
xe_exec_queue_put(m->q);
@@ -205,8 +198,6 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
BUILD_BUG_ON(NUM_PT_SLOTS > SZ_2M/XE_PAGE_SIZE);
/* Must be a multiple of 64K to support all platforms */
BUILD_BUG_ON(NUM_PT_SLOTS * XE_PAGE_SIZE % SZ_64K);
- /* And one slot reserved for the 4KiB page table updates */
- BUILD_BUG_ON(!(NUM_KERNEL_PDE & 1));
/* Need to be sure everything fits in the first PT, or create more */
xe_tile_assert(tile, m->batch_base_ofs + xe_bo_size(batch) < SZ_2M);
@@ -344,8 +335,6 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
/*
* Example layout created above, with root level = 3:
* [PT0...PT7]: kernel PT's for copy/clear; 64 or 4KiB PTE's
- * [PT8]: Kernel PT for VM_BIND, 4 KiB PTE's
- * [PT9...PT26]: Userspace PT's for VM_BIND, 4 KiB PTE's
* [PT27 = PDE 0] [PT28 = PDE 1] [PT29 = PDE 2] [PT30 & PT31 = 2M vram identity map]
*
* This makes the lowest part of the VM point to the pagetables.
@@ -353,19 +342,10 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
* and flushes, other parts of the VM can be used either for copying and
* clearing.
*
- * For performance, the kernel reserves PDE's, so about 20 are left
- * for async VM updates.
- *
* To make it easier to work, each scratch PT is put in slot (1 + PT #)
* everywhere, this allows lockless updates to scratch pages by using
* the different addresses in VM.
*/
-#define NUM_VMUSA_UNIT_PER_PAGE 32
-#define VM_SA_UPDATE_UNIT_SIZE (XE_PAGE_SIZE / NUM_VMUSA_UNIT_PER_PAGE)
-#define NUM_VMUSA_WRITES_PER_UNIT (VM_SA_UPDATE_UNIT_SIZE / sizeof(u64))
- drm_suballoc_manager_init(&m->vm_update_sa,
- (size_t)(map_ofs / XE_PAGE_SIZE - NUM_KERNEL_PDE) *
- NUM_VMUSA_UNIT_PER_PAGE, 0);
m->pt_bo = bo;
return 0;
@@ -1078,6 +1058,9 @@ struct xe_lrc *xe_migrate_lrc(struct xe_migrate *migrate)
return migrate->q->lrc[0];
}
+/* XXX: With CPU binds this can be removed in a follow up */
+#define NUM_KERNEL_PDE 15
+
static u64 migrate_vm_ppgtt_addr_tlb_inval(void)
{
/*
@@ -1686,56 +1669,6 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
return fence;
}
-static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs,
- const struct xe_vm_pgtable_update_op *pt_op,
- const struct xe_vm_pgtable_update *update,
- struct xe_migrate_pt_update *pt_update)
-{
- const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
- struct xe_vm *vm = pt_update->vops->vm;
- u32 chunk;
- u32 ofs = update->ofs, size = update->qwords;
-
- /*
- * If we have 512 entries (max), we would populate it ourselves,
- * and update the PDE above it to the new pointer.
- * The only time this can only happen if we have to update the top
- * PDE. This requires a BO that is almost vm->size big.
- *
- * This shouldn't be possible in practice.. might change when 16K
- * pages are used. Hence the assert.
- */
- xe_tile_assert(tile, update->qwords < MAX_NUM_PTE);
- if (!ppgtt_ofs)
- ppgtt_ofs = xe_migrate_vram_ofs(tile_to_xe(tile),
- xe_bo_addr(update->pt_bo, 0,
- XE_PAGE_SIZE), false);
-
- do {
- u64 addr = ppgtt_ofs + ofs * 8;
-
- chunk = min(size, MAX_PTE_PER_SDI);
-
- /* Ensure populatefn can do memset64 by aligning bb->cs */
- if (!(bb->len & 1))
- bb->cs[bb->len++] = MI_NOOP;
-
- bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk);
- bb->cs[bb->len++] = lower_32_bits(addr);
- bb->cs[bb->len++] = upper_32_bits(addr);
- if (pt_op->bind)
- ops->populate(tile, NULL, bb->cs + bb->len,
- ofs, chunk, update);
- else
- ops->clear(vm, tile, NULL, bb->cs + bb->len,
- ofs, chunk, update);
-
- bb->len += chunk * 2;
- ofs += chunk;
- size -= chunk;
- } while (size);
-}
-
struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m)
{
return xe_vm_get(m->q->vm);
@@ -1836,162 +1769,18 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
{
const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
struct xe_tile *tile = m->tile;
- struct xe_gt *gt = tile->primary_gt;
- struct xe_device *xe = tile_to_xe(tile);
struct xe_sched_job *job;
struct dma_fence *fence;
- struct drm_suballoc *sa_bo = NULL;
- struct xe_bb *bb;
- u32 i, j, batch_size = 0, ppgtt_ofs, update_idx, page_ofs = 0;
- u32 num_updates = 0, current_update = 0;
- u64 addr;
- int err = 0;
bool is_migrate = is_migrate_queue(m, pt_update_ops->q);
- bool usm = is_migrate && xe->info.has_usm;
-
- for (i = 0; i < pt_update_ops->num_ops; ++i) {
- struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->pt_job_ops->ops[i];
- struct xe_vm_pgtable_update *updates = pt_op->entries;
-
- num_updates += pt_op->num_entries;
- for (j = 0; j < pt_op->num_entries; ++j) {
- u32 num_cmds = DIV_ROUND_UP(updates[j].qwords,
- MAX_PTE_PER_SDI);
-
- /* align noop + MI_STORE_DATA_IMM cmd prefix */
- batch_size += 4 * num_cmds + updates[j].qwords * 2;
- }
- }
-
- /* fixed + PTE entries */
- if (IS_DGFX(xe))
- batch_size += 2;
- else
- batch_size += 6 * (num_updates / MAX_PTE_PER_SDI + 1) +
- num_updates * 2;
-
- bb = xe_bb_new(gt, batch_size, usm);
- if (IS_ERR(bb))
- return ERR_CAST(bb);
-
- /* For sysmem PTE's, need to map them in our hole.. */
- if (!IS_DGFX(xe)) {
- u16 pat_index = xe->pat.idx[XE_CACHE_WB];
- u32 ptes, ofs;
-
- ppgtt_ofs = NUM_KERNEL_PDE - 1;
- if (!is_migrate) {
- u32 num_units = DIV_ROUND_UP(num_updates,
- NUM_VMUSA_WRITES_PER_UNIT);
-
- if (num_units > m->vm_update_sa.size) {
- err = -ENOBUFS;
- goto err_bb;
- }
- sa_bo = drm_suballoc_new(&m->vm_update_sa, num_units,
- GFP_KERNEL, true, 0);
- if (IS_ERR(sa_bo)) {
- err = PTR_ERR(sa_bo);
- goto err_bb;
- }
-
- ppgtt_ofs = NUM_KERNEL_PDE +
- (drm_suballoc_soffset(sa_bo) /
- NUM_VMUSA_UNIT_PER_PAGE);
- page_ofs = (drm_suballoc_soffset(sa_bo) %
- NUM_VMUSA_UNIT_PER_PAGE) *
- VM_SA_UPDATE_UNIT_SIZE;
- }
-
- /* Map our PT's to gtt */
- i = 0;
- j = 0;
- ptes = num_updates;
- ofs = ppgtt_ofs * XE_PAGE_SIZE + page_ofs;
- while (ptes) {
- u32 chunk = min(MAX_PTE_PER_SDI, ptes);
- u32 idx = 0;
-
- bb->cs[bb->len++] = MI_STORE_DATA_IMM |
- MI_SDI_NUM_QW(chunk);
- bb->cs[bb->len++] = ofs;
- bb->cs[bb->len++] = 0; /* upper_32_bits */
-
- for (; i < pt_update_ops->num_ops; ++i) {
- struct xe_vm_pgtable_update_op *pt_op =
- &pt_update_ops->pt_job_ops->ops[i];
- struct xe_vm_pgtable_update *updates = pt_op->entries;
-
- for (; j < pt_op->num_entries; ++j, ++current_update, ++idx) {
- struct xe_vm *vm = pt_update->vops->vm;
- struct xe_bo *pt_bo = updates[j].pt_bo;
-
- if (idx == chunk)
- goto next_cmd;
-
- xe_tile_assert(tile, xe_bo_size(pt_bo) == SZ_4K);
-
- /* Map a PT at most once */
- if (pt_bo->update_index < 0)
- pt_bo->update_index = current_update;
-
- addr = vm->pt_ops->pte_encode_bo(pt_bo, 0,
- pat_index, 0);
- bb->cs[bb->len++] = lower_32_bits(addr);
- bb->cs[bb->len++] = upper_32_bits(addr);
- }
-
- j = 0;
- }
-
-next_cmd:
- ptes -= chunk;
- ofs += chunk * sizeof(u64);
- }
-
- bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
- update_idx = bb->len;
-
- addr = xe_migrate_vm_addr(ppgtt_ofs, 0) +
- (page_ofs / sizeof(u64)) * XE_PAGE_SIZE;
- for (i = 0; i < pt_update_ops->num_ops; ++i) {
- struct xe_vm_pgtable_update_op *pt_op =
- &pt_update_ops->pt_job_ops->ops[i];
- struct xe_vm_pgtable_update *updates = pt_op->entries;
-
- for (j = 0; j < pt_op->num_entries; ++j) {
- struct xe_bo *pt_bo = updates[j].pt_bo;
-
- write_pgtable(tile, bb, addr +
- pt_bo->update_index * XE_PAGE_SIZE,
- pt_op, &updates[j], pt_update);
- }
- }
- } else {
- /* phys pages, no preamble required */
- bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
- update_idx = bb->len;
-
- for (i = 0; i < pt_update_ops->num_ops; ++i) {
- struct xe_vm_pgtable_update_op *pt_op =
- &pt_update_ops->pt_job_ops->ops[i];
- struct xe_vm_pgtable_update *updates = pt_op->entries;
-
- for (j = 0; j < pt_op->num_entries; ++j)
- write_pgtable(tile, bb, 0, pt_op, &updates[j],
- pt_update);
- }
- }
+ int err;
- job = xe_bb_create_migration_job(pt_update_ops->q, bb,
- xe_migrate_batch_base(m, usm),
- update_idx);
+ job = xe_sched_job_create(pt_update_ops->q, NULL);
if (IS_ERR(job)) {
err = PTR_ERR(job);
- goto err_sa;
+ goto err_out;
}
- xe_sched_job_add_migrate_flush(job, MI_INVALIDATE_TLB);
+ xe_tile_assert(tile, job->is_pt_job);
if (ops->pre_commit) {
pt_update->job = job;
@@ -2002,6 +1791,12 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
if (is_migrate)
mutex_lock(&m->job_mutex);
+ job->pt_update[0].vm = pt_update->vops->vm;
+ job->pt_update[0].tile = tile;
+ job->pt_update[0].ops = ops;
+ job->pt_update[0].pt_job_ops =
+ xe_pt_job_ops_get(pt_update_ops->pt_job_ops);
+
xe_sched_job_arm(job);
fence = dma_fence_get(&job->drm.s_fence->finished);
xe_sched_job_push(job);
@@ -2009,17 +1804,11 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
if (is_migrate)
mutex_unlock(&m->job_mutex);
- xe_bb_free(bb, fence);
- drm_suballoc_free(sa_bo, fence);
-
return fence;
err_job:
xe_sched_job_put(job);
-err_sa:
- drm_suballoc_free(sa_bo, NULL);
-err_bb:
- xe_bb_free(bb, NULL);
+err_out:
return ERR_PTR(err);
}
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 0a90d1460a8b..dc567e442db2 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -380,7 +380,6 @@ xe_pt_new_shared(struct xe_walk_update *wupd, struct xe_pt *parent,
entry->pt = parent;
entry->flags = 0;
entry->qwords = 0;
- entry->pt_bo->update_index = -1;
entry->level = parent->level;
if (alloc_entries) {
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v3 14/25] drm/xe: Remove unused arguments from xe_migrate_pt_update_ops
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (12 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 13/25] drm/xe: Enable CPU binds for jobs Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-02-28 1:34 ` [PATCH v3 15/25] drm/xe: Make bind queues operate cross-tile Matthew Brost
` (16 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Both populate and clear have an unused void * ptr argument; remove it.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_migrate.c | 4 ++--
drivers/gpu/drm/xe/xe_migrate.h | 7 ++-----
drivers/gpu/drm/xe/xe_pt.c | 29 ++++++++++++++---------------
3 files changed, 18 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 00288a2ead00..fe5c9bdcb555 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1714,11 +1714,11 @@ xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct xe_tile *tile,
if (pt_op->bind)
ops->populate(tile, &update->pt_bo->vmap,
- NULL, update->ofs, update->qwords,
+ update->ofs, update->qwords,
update);
else
ops->clear(vm, tile, &update->pt_bo->vmap,
- NULL, update->ofs, update->qwords,
+ update->ofs, update->qwords,
update);
}
}
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index 9865de29fee7..ae979f6bf8ef 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -43,7 +43,6 @@ struct xe_migrate_pt_update_ops {
* @populate: Populate a command buffer or page-table with ptes.
* @tile: The tile for the current operation.
* @map: struct iosys_map into the memory to be populated.
- * @pos: If @map is NULL, map into the memory to be populated.
* @ofs: qword offset into @map, unused if @map is NULL.
* @num_qwords: Number of qwords to write.
* @update: Information about the PTEs to be inserted.
@@ -53,14 +52,13 @@ struct xe_migrate_pt_update_ops {
* page-tables with PTEs.
*/
void (*populate)(struct xe_tile *tile, struct iosys_map *map,
- void *pos, u32 ofs, u32 num_qwords,
+ u32 ofs, u32 num_qwords,
const struct xe_vm_pgtable_update *update);
/**
* @clear: Clear a command buffer or page-table with ptes.
* @vm: VM being updated
* @tile: The tile for the current operation.
* @map: struct iosys_map into the memory to be populated.
- * @pos: If @map is NULL, map into the memory to be populated.
* @ofs: qword offset into @map, unused if @map is NULL.
* @num_qwords: Number of qwords to write.
* @update: Information about the PTEs to be inserted.
@@ -70,8 +68,7 @@ struct xe_migrate_pt_update_ops {
* page-tables with PTEs.
*/
void (*clear)(struct xe_vm *vm, struct xe_tile *tile,
- struct iosys_map *map, void *pos, u32 ofs,
- u32 num_qwords,
+ struct iosys_map *map, u32 ofs, u32 num_qwords,
const struct xe_vm_pgtable_update *update);
/**
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index dc567e442db2..ed7cb34c958c 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -996,20 +996,18 @@ bool xe_pt_zap_ptes_range(struct xe_tile *tile, struct xe_vm *vm,
static void
xe_vm_populate_pgtable(struct xe_tile *tile, struct iosys_map *map,
- void *data, u32 qword_ofs, u32 num_qwords,
+ u32 qword_ofs, u32 num_qwords,
const struct xe_vm_pgtable_update *update)
{
struct xe_pt_entry *ptes = update->pt_entries;
- u64 *ptr = data;
u32 i;
- for (i = 0; i < num_qwords; i++) {
- if (map)
- xe_map_wr(tile_to_xe(tile), map, (qword_ofs + i) *
- sizeof(u64), u64, ptes[i].pte);
- else
- ptr[i] = ptes[i].pte;
- }
+ xe_assert(tile_to_xe(tile), map);
+ xe_assert(tile_to_xe(tile), !iosys_map_is_null(map));
+
+ for (i = 0; i < num_qwords; i++)
+ xe_map_wr(tile_to_xe(tile), map, (qword_ofs + i) *
+ sizeof(u64), u64, ptes[i].pte);
}
static void xe_pt_cancel_bind(struct xe_vma *vma,
@@ -1826,22 +1824,23 @@ static unsigned int xe_pt_stage_unbind(struct xe_tile *tile,
static void
xe_migrate_clear_pgtable_callback(struct xe_vm *vm, struct xe_tile *tile,
- struct iosys_map *map, void *ptr,
- u32 qword_ofs, u32 num_qwords,
+ struct iosys_map *map, u32 qword_ofs,
+ u32 num_qwords,
const struct xe_vm_pgtable_update *update)
{
u64 empty = __xe_pt_empty_pte(tile, vm, update->level);
int i;
- if (map && map->is_iomem)
+ xe_assert(vm->xe, map);
+ xe_assert(vm->xe, !iosys_map_is_null(map));
+
+ if (map->is_iomem)
for (i = 0; i < num_qwords; ++i)
xe_map_wr(tile_to_xe(tile), map, (qword_ofs + i) *
sizeof(u64), u64, empty);
- else if (map)
+ else
memset64(map->vaddr + qword_ofs * sizeof(u64), empty,
num_qwords);
- else
- memset64(ptr, empty, num_qwords);
}
static void xe_pt_abort_unbind(struct xe_vma *vma,
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v3 15/25] drm/xe: Make bind queues operate cross-tile
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (13 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 14/25] drm/xe: Remove unused arguments from xe_migrate_pt_update_ops Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-02-28 1:34 ` [PATCH v3 16/25] drm/xe: Add CPU bind layer Matthew Brost
` (15 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Since bind jobs execute on the CPU rather than the GPU, maintaining a
per-tile bind queue no longer provides value. Convert the driver to use
a single bind queue shared across tiles. The primary change is routing
all GT TLB invalidations through this unified bind queue.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 142 +++++++++--------------
drivers/gpu/drm/xe/xe_exec_queue.h | 14 +--
drivers/gpu/drm/xe/xe_exec_queue_types.h | 21 ++--
drivers/gpu/drm/xe/xe_pt.c | 22 ++--
drivers/gpu/drm/xe/xe_sync.c | 20 +---
drivers/gpu/drm/xe/xe_tlb_inval_job.c | 15 ++-
drivers/gpu/drm/xe/xe_tlb_inval_job.h | 2 +-
drivers/gpu/drm/xe/xe_vm.c | 65 +++++------
drivers/gpu/drm/xe/xe_vm_types.h | 2 +-
9 files changed, 126 insertions(+), 177 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index b3f700a9d425..0201b8159e63 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -142,9 +142,8 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
{
int i;
- for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i)
- if (q->tlb_inval[i].dep_scheduler)
- xe_dep_scheduler_fini(q->tlb_inval[i].dep_scheduler);
+ for_each_tlb_inval(q, i)
+ xe_dep_scheduler_fini(q->tlb_inval[i].dep_scheduler);
if (xe_exec_queue_uses_pxp(q))
xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
@@ -166,31 +165,34 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
static int alloc_dep_schedulers(struct xe_device *xe, struct xe_exec_queue *q)
{
- struct xe_tile *tile = gt_to_tile(q->gt);
- int i;
+ struct xe_tile *tile;
+ int i = 0, j;
+ u8 id;
- for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i) {
- struct xe_dep_scheduler *dep_scheduler;
- struct xe_gt *gt;
- struct workqueue_struct *wq;
+ for_each_tile(tile, xe, id) {
+ for (j = 0; j < (XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT + 1); ++j, ++i) {
+ struct xe_dep_scheduler *dep_scheduler;
+ struct xe_gt *gt;
+ struct workqueue_struct *wq;
- if (i == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT)
- gt = tile->primary_gt;
- else
- gt = tile->media_gt;
+ if (j == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT)
+ gt = tile->primary_gt;
+ else
+ gt = tile->media_gt;
- if (!gt)
- continue;
+ if (!gt)
+ continue;
- wq = gt->tlb_inval.job_wq;
+ wq = gt->tlb_inval.job_wq;
#define MAX_TLB_INVAL_JOBS 16 /* Picking a reasonable value */
- dep_scheduler = xe_dep_scheduler_create(xe, wq, q->name,
- MAX_TLB_INVAL_JOBS);
- if (IS_ERR(dep_scheduler))
- return PTR_ERR(dep_scheduler);
+ dep_scheduler = xe_dep_scheduler_create(xe, wq, q->name,
+ MAX_TLB_INVAL_JOBS);
+ if (IS_ERR(dep_scheduler))
+ return PTR_ERR(dep_scheduler);
- q->tlb_inval[i].dep_scheduler = dep_scheduler;
+ q->tlb_inval[i].dep_scheduler = dep_scheduler;
+ }
}
#undef MAX_TLB_INVAL_JOBS
@@ -227,7 +229,6 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
q->ops = gt->exec_queue_ops;
INIT_LIST_HEAD(&q->lr.link);
INIT_LIST_HEAD(&q->vm_exec_queue_link);
- INIT_LIST_HEAD(&q->multi_gt_link);
INIT_LIST_HEAD(&q->hw_engine_group_link);
INIT_LIST_HEAD(&q->pxp.link);
spin_lock_init(&q->multi_queue.lock);
@@ -536,7 +537,6 @@ ALLOW_ERROR_INJECTION(xe_exec_queue_create_bind, ERRNO);
void xe_exec_queue_destroy(struct kref *ref)
{
struct xe_exec_queue *q = container_of(ref, struct xe_exec_queue, refcount);
- struct xe_exec_queue *eq, *next;
int i;
xe_assert(gt_to_xe(q->gt), atomic_read(&q->job_cnt) == 0);
@@ -548,15 +548,9 @@ void xe_exec_queue_destroy(struct kref *ref)
xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
xe_exec_queue_last_fence_put_unlocked(q);
- for_each_tlb_inval(i)
+ for_each_tlb_inval(q, i)
xe_exec_queue_tlb_inval_last_fence_put_unlocked(q, i);
- if (!(q->flags & EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD)) {
- list_for_each_entry_safe(eq, next, &q->multi_gt_list,
- multi_gt_link)
- xe_exec_queue_put(eq);
- }
-
if (q->user_vm) {
xe_vm_put(q->user_vm);
q->user_vm = NULL;
@@ -1159,7 +1153,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
u64_to_user_ptr(args->instances);
struct xe_hw_engine *hwe;
struct xe_vm *vm;
- struct xe_tile *tile;
struct xe_exec_queue *q = NULL;
u32 logical_mask;
u32 flags = 0;
@@ -1208,31 +1201,16 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
return -ENOENT;
}
- for_each_tile(tile, xe, id) {
- struct xe_exec_queue *new;
-
- flags |= EXEC_QUEUE_FLAG_VM;
- if (id)
- flags |= EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD;
-
- new = xe_exec_queue_create_bind(xe, tile, vm, flags,
- args->extensions);
- if (IS_ERR(new)) {
- up_read(&vm->lock);
- xe_vm_put(vm);
- err = PTR_ERR(new);
- if (q)
- goto put_exec_queue;
- return err;
- }
- if (id == 0)
- q = new;
- else
- list_add_tail(&new->multi_gt_list,
- &q->multi_gt_link);
- }
+ flags |= EXEC_QUEUE_FLAG_VM;
+
+ q = xe_exec_queue_create_bind(xe, xe_device_get_root_tile(xe),
+ vm, flags, args->extensions);
up_read(&vm->lock);
xe_vm_put(vm);
+ if (IS_ERR(q)) {
+ err = PTR_ERR(q);
+ return err;
+ }
} else {
logical_mask = calc_validate_logical_mask(xe, eci,
args->width,
@@ -1436,14 +1414,6 @@ void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q)
*/
void xe_exec_queue_kill(struct xe_exec_queue *q)
{
- struct xe_exec_queue *eq = q, *next;
-
- list_for_each_entry_safe(eq, next, &eq->multi_gt_list,
- multi_gt_link) {
- q->ops->kill(eq);
- xe_vm_remove_compute_exec_queue(q->vm, eq);
- }
-
q->ops->kill(q);
xe_vm_remove_compute_exec_queue(q->vm, q);
}
@@ -1594,42 +1564,40 @@ void xe_exec_queue_last_fence_set(struct xe_exec_queue *q, struct xe_vm *vm,
* xe_exec_queue_tlb_inval_last_fence_put() - Drop ref to last TLB invalidation fence
* @q: The exec queue
* @vm: The VM the engine does a bind for
- * @type: Either primary or media GT
+ * @idx: Index of tlb invalidation
*/
void xe_exec_queue_tlb_inval_last_fence_put(struct xe_exec_queue *q,
struct xe_vm *vm,
- unsigned int type)
+ unsigned int idx)
{
xe_exec_queue_last_fence_lockdep_assert(q, vm);
- xe_assert(vm->xe, type == XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT ||
- type == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT);
+ xe_assert(vm->xe, idx < XE_EXEC_QUEUE_TLB_INVAL_COUNT);
- xe_exec_queue_tlb_inval_last_fence_put_unlocked(q, type);
+ xe_exec_queue_tlb_inval_last_fence_put_unlocked(q, idx);
}
/**
* xe_exec_queue_tlb_inval_last_fence_put_unlocked() - Drop ref to last TLB
* invalidation fence unlocked
* @q: The exec queue
- * @type: Either primary or media GT
+ * @idx: Index of tlb invalidation
*
* Only safe to be called from xe_exec_queue_destroy().
*/
void xe_exec_queue_tlb_inval_last_fence_put_unlocked(struct xe_exec_queue *q,
- unsigned int type)
+ unsigned int idx)
{
- xe_assert(q->vm->xe, type == XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT ||
- type == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT);
+ xe_assert(q->vm->xe, idx < XE_EXEC_QUEUE_TLB_INVAL_COUNT);
- dma_fence_put(q->tlb_inval[type].last_fence);
- q->tlb_inval[type].last_fence = NULL;
+ dma_fence_put(q->tlb_inval[idx].last_fence);
+ q->tlb_inval[idx].last_fence = NULL;
}
/**
* xe_exec_queue_tlb_inval_last_fence_get() - Get last fence for TLB invalidation
* @q: The exec queue
* @vm: The VM the engine does a bind for
- * @type: Either primary or media GT
+ * @idx: Index of tlb invalidation
*
* Get last fence, takes a ref
*
@@ -1637,22 +1605,21 @@ void xe_exec_queue_tlb_inval_last_fence_put_unlocked(struct xe_exec_queue *q,
*/
struct dma_fence *xe_exec_queue_tlb_inval_last_fence_get(struct xe_exec_queue *q,
struct xe_vm *vm,
- unsigned int type)
+ unsigned int idx)
{
struct dma_fence *fence;
xe_exec_queue_last_fence_lockdep_assert(q, vm);
- xe_assert(vm->xe, type == XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT ||
- type == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT);
+ xe_assert(vm->xe, idx < XE_EXEC_QUEUE_TLB_INVAL_COUNT);
xe_assert(vm->xe, q->flags & (EXEC_QUEUE_FLAG_VM |
EXEC_QUEUE_FLAG_MIGRATE));
- if (q->tlb_inval[type].last_fence &&
+ if (q->tlb_inval[idx].last_fence &&
test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
- &q->tlb_inval[type].last_fence->flags))
- xe_exec_queue_tlb_inval_last_fence_put(q, vm, type);
+ &q->tlb_inval[idx].last_fence->flags))
+ xe_exec_queue_tlb_inval_last_fence_put(q, vm, idx);
- fence = q->tlb_inval[type].last_fence ?: dma_fence_get_stub();
+ fence = q->tlb_inval[idx].last_fence ?: dma_fence_get_stub();
dma_fence_get(fence);
return fence;
}
@@ -1662,26 +1629,25 @@ struct dma_fence *xe_exec_queue_tlb_inval_last_fence_get(struct xe_exec_queue *q
* @q: The exec queue
* @vm: The VM the engine does a bind for
* @fence: The fence
- * @type: Either primary or media GT
+ * @idx: Index of tlb invalidation
*
- * Set the last fence for the tlb invalidation type on the queue. Increases
+ * Set the last fence for the tlb invalidation client on the queue. Increases
* reference count for fence, when closing queue
* xe_exec_queue_tlb_inval_last_fence_put should be called.
*/
void xe_exec_queue_tlb_inval_last_fence_set(struct xe_exec_queue *q,
struct xe_vm *vm,
struct dma_fence *fence,
- unsigned int type)
+ unsigned int idx)
{
xe_exec_queue_last_fence_lockdep_assert(q, vm);
- xe_assert(vm->xe, type == XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT ||
- type == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT);
+ xe_assert(vm->xe, idx < XE_EXEC_QUEUE_TLB_INVAL_COUNT);
xe_assert(vm->xe, q->flags & (EXEC_QUEUE_FLAG_VM |
EXEC_QUEUE_FLAG_MIGRATE));
xe_assert(vm->xe, !dma_fence_is_container(fence));
- xe_exec_queue_tlb_inval_last_fence_put(q, vm, type);
- q->tlb_inval[type].last_fence = dma_fence_get(fence);
+ xe_exec_queue_tlb_inval_last_fence_put(q, vm, idx);
+ q->tlb_inval[idx].last_fence = dma_fence_get(fence);
}
/**
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
index a82d99bd77bc..b5aabab388c1 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue.h
@@ -14,9 +14,9 @@ struct drm_file;
struct xe_device;
struct xe_file;
-#define for_each_tlb_inval(__i) \
- for (__i = XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT; \
- __i <= XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT; ++__i)
+#define for_each_tlb_inval(__q, __i) \
+ for (__i = 0; __i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++__i) \
+ for_each_if((__q)->tlb_inval[__i].dep_scheduler)
struct xe_exec_queue *xe_exec_queue_create(struct xe_device *xe, struct xe_vm *vm,
u32 logical_mask, u16 width,
@@ -141,19 +141,19 @@ void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct xe_vm *vm,
void xe_exec_queue_tlb_inval_last_fence_put(struct xe_exec_queue *q,
struct xe_vm *vm,
- unsigned int type);
+ unsigned int idx);
void xe_exec_queue_tlb_inval_last_fence_put_unlocked(struct xe_exec_queue *q,
- unsigned int type);
+ unsigned int idx);
struct dma_fence *xe_exec_queue_tlb_inval_last_fence_get(struct xe_exec_queue *q,
struct xe_vm *vm,
- unsigned int type);
+ unsigned int idx);
void xe_exec_queue_tlb_inval_last_fence_set(struct xe_exec_queue *q,
struct xe_vm *vm,
struct dma_fence *fence,
- unsigned int type);
+ unsigned int idx);
void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index a1f3938f4173..d2a25db0a835 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -126,14 +126,12 @@ struct xe_exec_queue {
#define EXEC_QUEUE_FLAG_PERMANENT BIT(1)
/* for VM jobs. Caller needs to hold rpm ref when creating queue with this flag */
#define EXEC_QUEUE_FLAG_VM BIT(2)
-/* child of VM queue for multi-tile VM jobs */
-#define EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD BIT(3)
/* kernel exec_queue only, set priority to highest level */
-#define EXEC_QUEUE_FLAG_HIGH_PRIORITY BIT(4)
+#define EXEC_QUEUE_FLAG_HIGH_PRIORITY BIT(3)
/* flag to indicate low latency hint to guc */
-#define EXEC_QUEUE_FLAG_LOW_LATENCY BIT(5)
+#define EXEC_QUEUE_FLAG_LOW_LATENCY BIT(4)
/* for migration (kernel copy, clear, bind) jobs */
-#define EXEC_QUEUE_FLAG_MIGRATE BIT(6)
+#define EXEC_QUEUE_FLAG_MIGRATE BIT(5)
/**
* @flags: flags for this exec queue, should statically setup aside from ban
@@ -141,13 +139,6 @@ struct xe_exec_queue {
*/
unsigned long flags;
- union {
- /** @multi_gt_list: list head for VM bind engines if multi-GT */
- struct list_head multi_gt_list;
- /** @multi_gt_link: link for VM bind engines if multi-GT */
- struct list_head multi_gt_link;
- };
-
union {
/** @execlist: execlist backend specific state for exec queue */
struct xe_execlist_exec_queue *execlist;
@@ -202,7 +193,8 @@ struct xe_exec_queue {
#define XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT 0
#define XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT 1
-#define XE_EXEC_QUEUE_TLB_INVAL_COUNT (XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT + 1)
+#define XE_EXEC_QUEUE_TLB_INVAL_COUNT \
+ ((XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT + 1) * 2)
/** @tlb_inval: TLB invalidations exec queue state */
struct {
@@ -213,7 +205,8 @@ struct xe_exec_queue {
struct xe_dep_scheduler *dep_scheduler;
/**
* @last_fence: last fence for tlb invalidation, protected by
- * vm->lock in write mode
+ * vm->lock in write mode for user queues, protected by
+ * tile->m->lock for migration queues
*/
struct dma_fence *last_fence;
} tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index ed7cb34c958c..032947a10806 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -2510,12 +2510,18 @@ static const struct xe_migrate_pt_update_ops svm_userptr_migrate_ops;
#endif
static struct xe_dep_scheduler *to_dep_scheduler(struct xe_exec_queue *q,
- struct xe_gt *gt)
+ struct xe_tile *tile,
+ struct xe_gt *gt,
+ unsigned int *type)
{
+ int tile_ofs = tile->id * (XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT + 1);
+
if (xe_gt_is_media_type(gt))
- return q->tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT].dep_scheduler;
+ *type = tile_ofs + XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT;
+ else
+ *type = tile_ofs + XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT;
- return q->tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT].dep_scheduler;
+ return q->tlb_inval[*type].dep_scheduler;
}
/**
@@ -2540,6 +2546,7 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
struct xe_tlb_inval_job *ijob = NULL, *mjob = NULL;
struct xe_range_fence *rfence;
struct xe_vma_op *op;
+ unsigned int type;
int err = 0, i;
struct xe_migrate_pt_update update = {
.ops = pt_update_ops->needs_svm_lock ?
@@ -2566,13 +2573,13 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
if (pt_update_ops->needs_invalidation) {
struct xe_dep_scheduler *dep_scheduler =
- to_dep_scheduler(q, tile->primary_gt);
+ to_dep_scheduler(q, tile, tile->primary_gt, &type);
ijob = xe_tlb_inval_job_create(q, &tile->primary_gt->tlb_inval,
dep_scheduler, vm,
pt_update_ops->start,
pt_update_ops->last,
- XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT);
+ type);
if (IS_ERR(ijob)) {
err = PTR_ERR(ijob);
goto kill_vm_tile1;
@@ -2591,14 +2598,15 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
}
if (tile->media_gt) {
- dep_scheduler = to_dep_scheduler(q, tile->media_gt);
+ dep_scheduler = to_dep_scheduler(q, tile,
+ tile->media_gt, &type);
mjob = xe_tlb_inval_job_create(q,
&tile->media_gt->tlb_inval,
dep_scheduler, vm,
pt_update_ops->start,
pt_update_ops->last,
- XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT);
+ type);
if (IS_ERR(mjob)) {
err = PTR_ERR(mjob);
goto free_ijob;
diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
index 24d6d9af20d6..8a0de78395f1 100644
--- a/drivers/gpu/drm/xe/xe_sync.c
+++ b/drivers/gpu/drm/xe/xe_sync.c
@@ -345,15 +345,9 @@ xe_sync_in_fence_get(struct xe_sync_entry *sync, int num_sync,
return ERR_PTR(-EOPNOTSUPP);
if (q->flags & EXEC_QUEUE_FLAG_VM) {
- struct xe_exec_queue *__q;
- struct xe_tile *tile;
- u8 id;
-
- for_each_tile(tile, vm->xe, id) {
+ num_fence++;
+ for_each_tlb_inval(q, i)
num_fence++;
- for_each_tlb_inval(i)
- num_fence++;
- }
fences = kmalloc_objs(*fences, num_fence);
if (!fences)
@@ -361,17 +355,9 @@ xe_sync_in_fence_get(struct xe_sync_entry *sync, int num_sync,
fences[current_fence++] =
xe_exec_queue_last_fence_get(q, vm);
- for_each_tlb_inval(i)
+ for_each_tlb_inval(q, i)
fences[current_fence++] =
xe_exec_queue_tlb_inval_last_fence_get(q, vm, i);
- list_for_each_entry(__q, &q->multi_gt_list,
- multi_gt_link) {
- fences[current_fence++] =
- xe_exec_queue_last_fence_get(__q, vm);
- for_each_tlb_inval(i)
- fences[current_fence++] =
- xe_exec_queue_tlb_inval_last_fence_get(__q, vm, i);
- }
xe_assert(vm->xe, current_fence == num_fence);
cf = dma_fence_array_create(num_fence, fences,
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
index 04d21015cd5d..81f560068d3c 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
@@ -39,8 +39,8 @@ struct xe_tlb_inval_job {
u64 start;
/** @end: End address to invalidate */
u64 end;
- /** @type: GT type */
- int type;
+ /** @idx: Index of tlb invalidation */
+ int idx;
/** @fence_armed: Fence has been armed */
bool fence_armed;
};
@@ -87,7 +87,7 @@ static const struct xe_dep_job_ops dep_job_ops = {
* @vm: VM which TLB invalidation is being issued for
* @start: Start address to invalidate
* @end: End address to invalidate
- * @type: GT type
+ * @idx: Index of tlb invalidation
*
* Create a TLB invalidation job and initialize internal fields. The caller is
* responsible for releasing the creation reference.
@@ -97,7 +97,7 @@ static const struct xe_dep_job_ops dep_job_ops = {
struct xe_tlb_inval_job *
xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
struct xe_dep_scheduler *dep_scheduler,
- struct xe_vm *vm, u64 start, u64 end, int type)
+ struct xe_vm *vm, u64 start, u64 end, int idx)
{
struct xe_tlb_inval_job *job;
struct drm_sched_entity *entity =
@@ -105,8 +105,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
struct xe_tlb_inval_fence *ifence;
int err;
- xe_assert(vm->xe, type == XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT ||
- type == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT);
+ xe_assert(vm->xe, idx < XE_EXEC_QUEUE_TLB_INVAL_COUNT);
job = kmalloc_obj(*job);
if (!job)
@@ -120,7 +119,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
job->fence_armed = false;
xe_page_reclaim_list_init(&job->prl);
job->dep.ops = &dep_job_ops;
- job->type = type;
+ job->idx = idx;
kref_init(&job->refcount);
xe_exec_queue_get(q); /* Pairs with put in xe_tlb_inval_job_destroy */
xe_vm_get(vm); /* Pairs with put in xe_tlb_inval_job_destroy */
@@ -280,7 +279,7 @@ struct dma_fence *xe_tlb_inval_job_push(struct xe_tlb_inval_job *job,
/* Let the upper layers fish this out */
xe_exec_queue_tlb_inval_last_fence_set(job->q, job->vm,
&job->dep.drm.s_fence->finished,
- job->type);
+ job->idx);
xe_migrate_job_unlock(m, job->q);
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
index 03d6e21cd611..2a4478f529e6 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
@@ -20,7 +20,7 @@ struct xe_vm;
struct xe_tlb_inval_job *
xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
struct xe_dep_scheduler *dep_scheduler,
- struct xe_vm *vm, u64 start, u64 end, int type);
+ struct xe_vm *vm, u64 start, u64 end, int idx);
void xe_tlb_inval_job_add_page_reclaim(struct xe_tlb_inval_job *job,
struct xe_page_reclaim_list *prl);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 4ddfdd6a3c2a..52212b51caa8 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1657,7 +1657,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
struct xe_exec_queue *q;
u32 create_flags = EXEC_QUEUE_FLAG_VM;
- if (!vm->pt_root[id])
+ if (!vm->pt_root[id] || vm->q)
continue;
if (!xef) /* Not from userspace */
@@ -1668,7 +1668,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
err = PTR_ERR(q);
goto err_close;
}
- vm->q[id] = q;
+ vm->q = q;
}
}
@@ -1775,24 +1775,18 @@ void xe_vm_close_and_put(struct xe_vm *vm)
if (xe_vm_in_fault_mode(vm))
xe_svm_close(vm);
- down_write(&vm->lock);
- for_each_tile(tile, xe, id) {
- if (vm->q[id]) {
- int i;
+ if (vm->q) {
+ int i;
- xe_exec_queue_last_fence_put(vm->q[id], vm);
- for_each_tlb_inval(i)
- xe_exec_queue_tlb_inval_last_fence_put(vm->q[id], vm, i);
- }
- }
- up_write(&vm->lock);
+ down_write(&vm->lock);
+ xe_exec_queue_last_fence_put(vm->q, vm);
+ for_each_tlb_inval(vm->q, i)
+ xe_exec_queue_tlb_inval_last_fence_put(vm->q, vm, i);
+ up_write(&vm->lock);
- for_each_tile(tile, xe, id) {
- if (vm->q[id]) {
- xe_exec_queue_kill(vm->q[id]);
- xe_exec_queue_put(vm->q[id]);
- vm->q[id] = NULL;
- }
+ xe_exec_queue_kill(vm->q);
+ xe_exec_queue_put(vm->q);
+ vm->q = NULL;
}
down_write(&vm->lock);
@@ -1924,7 +1918,7 @@ u64 xe_vm_pdp4_descriptor(struct xe_vm *vm, struct xe_tile *tile)
static struct xe_exec_queue *
to_wait_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
{
- return q ? q : vm->q[0];
+ return q ? q : vm->q;
}
static struct xe_user_fence *
@@ -3159,13 +3153,10 @@ static int vm_ops_setup_tile_args(struct xe_vm *vm, struct xe_vma_ops *vops)
if (vops->pt_update_ops[id].q)
continue;
- if (q) {
+ if (q)
vops->pt_update_ops[id].q = q;
- if (vm->pt_root[id] && !list_empty(&q->multi_gt_list))
- q = list_next_entry(q, multi_gt_list);
- } else {
- vops->pt_update_ops[id].q = vm->q[id];
- }
+ else
+ vops->pt_update_ops[id].q = vm->q;
}
return number_tiles;
@@ -3185,15 +3176,15 @@ static struct dma_fence *ops_execute(struct xe_vm *vm,
if (number_tiles == 0)
return ERR_PTR(-ENODATA);
- for_each_tile(tile, vm->xe, id) {
+ for_each_tile(tile, vm->xe, id)
++n_fence;
- if (!(vops->flags & XE_VMA_OPS_FLAG_SKIP_TLB_WAIT))
- for_each_tlb_inval(i)
- ++n_fence;
+ if (!(vops->flags & XE_VMA_OPS_FLAG_SKIP_TLB_WAIT)) {
+ for_each_tlb_inval(vops->pt_update_ops[0].q, i)
+ ++n_fence;
}
- fences = kmalloc_objs(*fences, n_fence);
+ fences = kcalloc(n_fence, sizeof(*fences), GFP_KERNEL);
if (!fences) {
fence = ERR_PTR(-ENOMEM);
goto err_trace;
@@ -3235,9 +3226,15 @@ static struct dma_fence *ops_execute(struct xe_vm *vm,
continue;
xe_migrate_job_lock(tile->migrate, q);
- for_each_tlb_inval(i)
- fences[current_fence++] =
- xe_exec_queue_tlb_inval_last_fence_get(q, vm, i);
+ for_each_tlb_inval(q, i) {
+ if (i >= (tile->id + 1) * XE_MAX_GT_PER_TILE ||
+ i < tile->id * XE_MAX_GT_PER_TILE)
+ continue;
+
+ fences[current_fence++] = fence ?
+ xe_exec_queue_tlb_inval_last_fence_get(q, vm, i) :
+ dma_fence_get_stub();
+ }
xe_migrate_job_unlock(tile->migrate, q);
}
@@ -3746,7 +3743,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
syncs_user = u64_to_user_ptr(args->syncs);
for (num_syncs = 0; num_syncs < args->num_syncs; num_syncs++) {
- struct xe_exec_queue *__q = q ?: vm->q[0];
+ struct xe_exec_queue *__q = q ?: vm->q;
err = xe_sync_entry_parse(xe, xef, &syncs[num_syncs],
&syncs_user[num_syncs],
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 1f6f7e30e751..2c173550346a 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -209,7 +209,7 @@ struct xe_vm {
struct xe_device *xe;
/* exec queue used for (un)binding vma's */
- struct xe_exec_queue *q[XE_MAX_TILES_PER_DEVICE];
+ struct xe_exec_queue *q;
/** @lru_bulk_move: Bulk LRU move list for this VM's BOs */
struct ttm_lru_bulk_move lru_bulk_move;
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v3 16/25] drm/xe: Add CPU bind layer
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (14 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 15/25] drm/xe: Make bind queues operate cross-tile Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-02-28 1:34 ` [PATCH v3 17/25] drm/xe: Add device flag to enable PT mirroring across tiles Matthew Brost
` (14 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
With CPU binds, it no longer makes sense to implement CPU bind handling
in the migrate layer, as these operations are entirely decoupled from
hardware. Introduce a dedicated CPU bind layer stored at the device
level.
Since CPU binds are tile-independent, update the PT layer to generate a
single bind job even when pages are mirrored across tiles.
This patch is large because the refactor touches multiple files and layers
and ensures functional equivalence before and after the change.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/Makefile | 1 +
drivers/gpu/drm/xe/xe_cpu_bind.c | 296 +++++++++++++
drivers/gpu/drm/xe/xe_cpu_bind.h | 118 +++++
drivers/gpu/drm/xe/xe_device.c | 5 +
drivers/gpu/drm/xe/xe_device_types.h | 4 +
drivers/gpu/drm/xe/xe_exec_queue.c | 3 +-
drivers/gpu/drm/xe/xe_guc_submit.c | 41 +-
drivers/gpu/drm/xe/xe_migrate.c | 248 -----------
drivers/gpu/drm/xe/xe_migrate.h | 95 ----
drivers/gpu/drm/xe/xe_pt.c | 553 ++++++++++++------------
drivers/gpu/drm/xe/xe_pt.h | 8 +-
drivers/gpu/drm/xe/xe_pt_types.h | 14 -
drivers/gpu/drm/xe/xe_sched_job.c | 10 +-
drivers/gpu/drm/xe/xe_sched_job_types.h | 11 +-
drivers/gpu/drm/xe/xe_tlb_inval_job.c | 13 +-
drivers/gpu/drm/xe/xe_tlb_inval_job.h | 2 -
drivers/gpu/drm/xe/xe_vm.c | 156 ++-----
drivers/gpu/drm/xe/xe_vm_types.h | 20 +-
18 files changed, 818 insertions(+), 780 deletions(-)
create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.c
create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.h
diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index ff778fb2d4ff..f923e54c1082 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -35,6 +35,7 @@ $(obj)/generated/%_device_wa_oob.c $(obj)/generated/%_device_wa_oob.h: $(obj)/xe
xe-y += xe_bb.o \
xe_bo.o \
xe_bo_evict.o \
+ xe_cpu_bind.o \
xe_dep_scheduler.o \
xe_devcoredump.o \
xe_device.o \
diff --git a/drivers/gpu/drm/xe/xe_cpu_bind.c b/drivers/gpu/drm/xe/xe_cpu_bind.c
new file mode 100644
index 000000000000..4a9c72250ca9
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_cpu_bind.c
@@ -0,0 +1,296 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#include <drm/drm_managed.h>
+#include <linux/mutex.h>
+
+#include "xe_cpu_bind.h"
+#include "xe_device_types.h"
+#include "xe_exec_queue.h"
+#include "xe_pt.h"
+#include "xe_sched_job.h"
+#include "xe_trace_bo.h"
+#include "xe_vm.h"
+
+/**
+ * struct xe_cpu_bind - cpu_bind context.
+ */
+struct xe_cpu_bind {
+ /** @xe: Xe device */
+ struct xe_device *xe;
+ /** @q: Default exec queue used for kernel binds */
+ struct xe_exec_queue *q;
+ /** @job_mutex: Timeline mutex for @q. */
+ struct mutex job_mutex;
+};
+
+static bool is_cpu_bind_queue(struct xe_cpu_bind *cpu_bind,
+ struct xe_exec_queue *q)
+{
+ return cpu_bind->q == q;
+}
+
+static void xe_cpu_bind_fini(void *arg)
+{
+ struct xe_cpu_bind *cpu_bind = arg;
+
+ mutex_destroy(&cpu_bind->job_mutex);
+ xe_exec_queue_put(cpu_bind->q);
+}
+
+/**
+ * xe_cpu_bind_init() - Initialize a cpu_bind context
+ * @xe: &xe_device
+ *
+ * Return: 0 if successful, negative error code on failure
+ */
+int xe_cpu_bind_init(struct xe_device *xe)
+{
+ struct xe_cpu_bind *cpu_bind =
+ drmm_kzalloc(&xe->drm, sizeof(*cpu_bind), GFP_KERNEL);
+ struct xe_exec_queue *q;
+
+ q = xe_exec_queue_create_bind(xe, xe_device_get_root_tile(xe), NULL,
+ EXEC_QUEUE_FLAG_KERNEL |
+ EXEC_QUEUE_FLAG_PERMANENT |
+ EXEC_QUEUE_FLAG_MIGRATE, 0);
+ if (IS_ERR(q))
+ return PTR_ERR(q);
+
+ cpu_bind->xe = xe;
+ cpu_bind->q = q;
+ xe->cpu_bind = cpu_bind;
+
+ mutex_init(&cpu_bind->job_mutex);
+
+ fs_reclaim_acquire(GFP_KERNEL);
+ might_lock(&cpu_bind->job_mutex);
+ fs_reclaim_release(GFP_KERNEL);
+
+ return devm_add_action_or_reset(cpu_bind->xe->drm.dev, xe_cpu_bind_fini,
+ cpu_bind);
+}
+
+/**
+ * xe_cpu_bind_queue() - Get the bind queue from cpu_bind context.
+ * @cpu_bind: The cpu bind context.
+ *
+ * Return: Pointer to the bind queue (cannot fail)
+ */
+struct xe_exec_queue *xe_cpu_bind_queue(struct xe_cpu_bind *cpu_bind)
+{
+ return cpu_bind->q;
+}
+
+/**
+ * xe_cpu_bind_update_pgtables_execute() - Update a VM's PTEs via the CPU
+ * @vm: The VM being updated
+ * @tile: The tile being updated
+ * @ops: The CPU bind PT update ops
+ * @pt_op: Array of VM PT update ops
+ * @num_ops: The number of VM PT update ops
+ *
+ * Execute the VM PT update ops array which results in a VM's PTEs being updated
+ * via the CPU.
+ */
+void
+xe_cpu_bind_update_pgtables_execute(struct xe_vm *vm, struct xe_tile *tile,
+ const struct xe_cpu_bind_pt_update_ops *ops,
+ struct xe_vm_pgtable_update_op *pt_op,
+ int num_ops)
+{
+ u32 j, i;
+
+ for (j = 0; j < num_ops; ++j, ++pt_op) {
+ for (i = 0; i < pt_op->num_entries; i++) {
+ const struct xe_vm_pgtable_update *update =
+ &pt_op->entries[i];
+
+ xe_assert(vm->xe, update);
+ xe_assert(vm->xe, update->pt_bo);
+ xe_assert(vm->xe, !iosys_map_is_null(&update->pt_bo->vmap));
+
+ if (pt_op->bind)
+ ops->populate(tile, &update->pt_bo->vmap,
+ update->ofs, update->qwords,
+ update);
+ else
+ ops->clear(vm, tile, &update->pt_bo->vmap,
+ update->ofs, update->qwords,
+ update);
+ }
+ }
+
+ trace_xe_vm_cpu_bind(vm);
+ xe_device_wmb(vm->xe);
+}
+
+static struct dma_fence *
+xe_cpu_bind_update_pgtables_no_job(struct xe_cpu_bind *cpu_bind,
+ struct xe_cpu_bind_pt_update *pt_update)
+{
+ const struct xe_cpu_bind_pt_update_ops *ops = pt_update->ops;
+ struct xe_vm *vm = pt_update->vops->vm;
+ struct xe_tile *tile;
+ int err, id;
+
+ if (ops->pre_commit) {
+ pt_update->job = NULL;
+ err = ops->pre_commit(pt_update);
+ if (err)
+ return ERR_PTR(err);
+ }
+
+ for_each_tile(tile, vm->xe, id) {
+ struct xe_vm_pgtable_update_ops *pt_update_ops =
+ &pt_update->vops->pt_update_ops[tile->id];
+
+ if (!pt_update_ops->pt_job_ops)
+ continue;
+
+ xe_cpu_bind_update_pgtables_execute(vm, tile, ops,
+ pt_update_ops->pt_job_ops->ops,
+ pt_update_ops->pt_job_ops->current_op);
+ }
+
+ return dma_fence_get_stub();
+}
+
+static struct dma_fence *
+xe_cpu_bind_update_pgtables_job(struct xe_cpu_bind *cpu_bind,
+ struct xe_cpu_bind_pt_update *pt_update)
+{
+ const struct xe_cpu_bind_pt_update_ops *ops = pt_update->ops;
+ struct xe_exec_queue *q = pt_update->vops->q;
+ struct xe_device *xe = cpu_bind->xe;
+ struct xe_sched_job *job;
+ struct dma_fence *fence;
+ struct xe_tile *tile;
+ int err, id;
+ bool is_cpu_bind = is_cpu_bind_queue(cpu_bind, q);
+
+ job = xe_sched_job_create(q, NULL);
+ if (IS_ERR(job))
+ return ERR_CAST(job);
+
+ xe_assert(xe, job->is_pt_job);
+
+ if (ops->pre_commit) {
+ pt_update->job = job;
+ err = ops->pre_commit(pt_update);
+ if (err)
+ goto err_job;
+ }
+
+ if (is_cpu_bind)
+ mutex_lock(&cpu_bind->job_mutex);
+
+ job->pt_update[0].vm = pt_update->vops->vm;
+ job->pt_update[0].ops = ops;
+ for_each_tile(tile, xe, id) {
+ struct xe_vm_pgtable_update_ops *pt_update_ops =
+ &pt_update->vops->pt_update_ops[tile->id];
+
+ job->pt_update[0].pt_job_ops[tile->id] =
+ xe_pt_job_ops_get(pt_update_ops->pt_job_ops);
+ }
+
+ xe_sched_job_arm(job);
+ fence = dma_fence_get(&job->drm.s_fence->finished);
+ xe_sched_job_push(job);
+
+ if (is_cpu_bind)
+ mutex_unlock(&cpu_bind->job_mutex);
+
+ return fence;
+
+err_job:
+ xe_sched_job_put(job);
+ return ERR_PTR(err);
+}
+
+/**
+ * xe_cpu_bind_update_pgtables() - Pipelined page-table update
+ * @cpu_bind: The cpu bind context.
+ * @pt_update: PT update arguments
+ *
+ * Perform a pipelined page-table update. The update descriptors are typically
+ * built under the same lock critical section as a call to this function. If
+ * using the default engine for the updates, they will be performed in the
+ * order they grab the job_mutex. If different engines are used, external
+ * synchronization is needed for overlapping updates to maintain page-table
+ * consistency. Note that the meaning of "overlapping" is that the updates
+ * touch the same page-table, which might be a higher-level page-directory.
+ * If no pipelining is needed, then updates may be performed by the cpu.
+ *
+ * Return: A dma_fence that, when signaled, indicates the update completion.
+ */
+struct dma_fence *
+xe_cpu_bind_update_pgtables(struct xe_cpu_bind *cpu_bind,
+ struct xe_cpu_bind_pt_update *pt_update)
+{
+ struct dma_fence *fence;
+
+ fence = xe_cpu_bind_update_pgtables_no_job(cpu_bind, pt_update);
+
+ /* -ETIME indicates a job is needed, anything else is legit error */
+ if (!IS_ERR(fence) || PTR_ERR(fence) != -ETIME)
+ return fence;
+
+ return xe_cpu_bind_update_pgtables_job(cpu_bind, pt_update);
+}
+
+/**
+ * xe_cpu_bind_job_lock() - Lock cpu_bind job lock
+ * @cpu_bind: The cpu bind context.
+ * @q: Queue associated with the operation which requires a lock
+ *
+ * Lock the cpu_bind job lock if the queue is a cpu bind queue, otherwise
+ * assert the VM's dma-resv is held (user queues have their own locking).
+ */
+void xe_cpu_bind_job_lock(struct xe_cpu_bind *cpu_bind,
+ struct xe_exec_queue *q)
+{
+ bool is_cpu_bind = is_cpu_bind_queue(cpu_bind, q);
+
+ if (is_cpu_bind)
+ mutex_lock(&cpu_bind->job_mutex);
+ else
+ xe_vm_assert_held(q->user_vm); /* User queues VM's should be locked */
+}
+
+/**
+ * xe_cpu_bind_job_unlock() - Unlock cpu_bind job lock
+ * @cpu_bind: The cpu bind context.
+ * @q: Queue associated with the operation which requires a lock
+ *
+ * Unlock the cpu_bind job lock if the queue is a cpu bind queue, otherwise
+ * assert the VM's dma-resv is held (user queues have their own locking).
+ */
+void xe_cpu_bind_job_unlock(struct xe_cpu_bind *cpu_bind,
+ struct xe_exec_queue *q)
+{
+ bool is_cpu_bind = is_cpu_bind_queue(cpu_bind, q);
+
+ if (is_cpu_bind)
+ mutex_unlock(&cpu_bind->job_mutex);
+ else
+ xe_vm_assert_held(q->user_vm); /* User queues VM's should be locked */
+}
+
+#if IS_ENABLED(CONFIG_PROVE_LOCKING)
+/**
+ * xe_cpu_bind_job_lock_assert() - Assert cpu_bind job lock held of queue
+ * @q: cpu bind queue
+ */
+void xe_cpu_bind_job_lock_assert(struct xe_exec_queue *q)
+{
+ struct xe_device *xe = gt_to_xe(q->gt);
+ struct xe_cpu_bind *cpu_bind = xe->cpu_bind;
+
+ xe_assert(xe, q == cpu_bind->q);
+ lockdep_assert_held(&cpu_bind->job_mutex);
+}
+#endif
diff --git a/drivers/gpu/drm/xe/xe_cpu_bind.h b/drivers/gpu/drm/xe/xe_cpu_bind.h
new file mode 100644
index 000000000000..95996a6a5c20
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_cpu_bind.h
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_CPU_BIND_H_
+#define _XE_CPU_BIND_H_
+
+#include <linux/types.h>
+
+struct dma_fence;
+struct iosys_map;
+struct xe_cpu_bind;
+struct xe_cpu_bind_pt_update;
+struct xe_device;
+struct xe_tlb_inval_job;
+struct xe_tile;
+struct xe_vm;
+struct xe_vm_pgtable_update;
+struct xe_vm_pgtable_update_op;
+struct xe_vma_ops;
+
+/**
+ * struct xe_cpu_bind_pt_update_ops - Callbacks for the
+ * xe_cpu_bind_update_pgtables() function.
+ */
+struct xe_cpu_bind_pt_update_ops {
+ /**
+ * @populate: Populate a command buffer or page-table with ptes.
+ * @tile: The tile for the current operation.
+ * @map: struct iosys_map into the memory to be populated.
+ * @ofs: qword offset into @map, unused if @map is NULL.
+ * @num_qwords: Number of qwords to write.
+ * @update: Information about the PTEs to be inserted.
+ *
+ * This interface is intended to be used as a callback into the
+ * page-table system to populate command buffers or shared
+ * page-tables with PTEs.
+ */
+ void (*populate)(struct xe_tile *tile, struct iosys_map *map,
+ u32 ofs, u32 num_qwords,
+ const struct xe_vm_pgtable_update *update);
+ /**
+ * @clear: Clear a command buffer or page-table with ptes.
+ * @vm: VM being updated
+ * @tile: The tile for the current operation.
+ * @map: struct iosys_map into the memory to be populated.
+ * @ofs: qword offset into @map, unused if @map is NULL.
+ * @num_qwords: Number of qwords to write.
+ * @update: Information about the PTEs to be inserted.
+ *
+ * This interface is intended to be used as a callback into the
+ * page-table system to populate command buffers or shared
+ * page-tables with PTEs.
+ */
+ void (*clear)(struct xe_vm *vm, struct xe_tile *tile,
+ struct iosys_map *map, u32 ofs, u32 num_qwords,
+ const struct xe_vm_pgtable_update *update);
+
+ /**
+ * @pre_commit: Callback to be called just before arming the
+ * sched_job.
+ * @pt_update: Pointer to embeddable callback argument.
+ *
+ * Return: 0 on success, negative error code on error.
+ */
+ int (*pre_commit)(struct xe_cpu_bind_pt_update *pt_update);
+};
+
+/**
+ * struct xe_cpu_bind_pt_update - Argument to the struct
+ * xe_cpu_bind_pt_update_ops callbacks.
+ *
+ * Intended to be subclassed to support additional arguments if necessary.
+ */
+struct xe_cpu_bind_pt_update {
+ /** @ops: Pointer to the struct xe_cpu_bind_pt_update_ops callbacks */
+ const struct xe_cpu_bind_pt_update_ops *ops;
+ /** @vops: VMA operations */
+ struct xe_vma_ops *vops;
+ /** @job: The job if a GPU page-table update. NULL otherwise */
+ struct xe_sched_job *job;
+ /**
+ * @ijobs: The TLB invalidation jobs, individual instances can be NULL
+ */
+#define XE_CPU_BIND_INVAL_JOB_COUNT 4
+ struct xe_tlb_inval_job *ijobs[XE_CPU_BIND_INVAL_JOB_COUNT];
+};
+
+int xe_cpu_bind_init(struct xe_device *xe);
+
+struct xe_exec_queue *xe_cpu_bind_queue(struct xe_cpu_bind *cpu_bind);
+
+void
+xe_cpu_bind_update_pgtables_execute(struct xe_vm *vm, struct xe_tile *tile,
+ const struct xe_cpu_bind_pt_update_ops *ops,
+ struct xe_vm_pgtable_update_op *pt_op,
+ int num_ops);
+
+struct dma_fence *
+xe_cpu_bind_update_pgtables(struct xe_cpu_bind *cpu_bind,
+ struct xe_cpu_bind_pt_update *pt_update);
+
+void xe_cpu_bind_job_lock(struct xe_cpu_bind *cpu_bind,
+ struct xe_exec_queue *q);
+
+void xe_cpu_bind_job_unlock(struct xe_cpu_bind *cpu_bind,
+ struct xe_exec_queue *q);
+
+#if IS_ENABLED(CONFIG_PROVE_LOCKING)
+void xe_cpu_bind_job_lock_assert(struct xe_exec_queue *q);
+#else
+static inline void xe_cpu_bind_job_lock_assert(struct xe_exec_queue *q)
+{
+}
+#endif
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 3462645ca13c..b7ad7f97e68c 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -25,6 +25,7 @@
#include "regs/xe_regs.h"
#include "xe_bo.h"
#include "xe_bo_evict.h"
+#include "xe_cpu_bind.h"
#include "xe_debugfs.h"
#include "xe_defaults.h"
#include "xe_devcoredump.h"
@@ -929,6 +930,10 @@ int xe_device_probe(struct xe_device *xe)
return err;
}
+ err = xe_cpu_bind_init(xe);
+ if (err)
+ return err;
+
err = xe_pagefault_init(xe);
if (err)
return err;
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index caa8f34a6744..776e9e190320 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -36,6 +36,7 @@
struct drm_pagemap_shrinker;
struct intel_display;
struct intel_dg_nvm_dev;
+struct xe_cpu_bind;
struct xe_ggtt;
struct xe_i2c;
struct xe_pat_ops;
@@ -512,6 +513,9 @@ struct xe_device {
/** @i2c: I2C host controller */
struct xe_i2c *i2c;
+ /** @cpu_bind: CPU bind object */
+ struct xe_cpu_bind *cpu_bind;
+
/** @atomic_svm_timeslice_ms: Atomic SVM fault timeslice MS */
u32 atomic_svm_timeslice_ms;
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 0201b8159e63..ee2119cf45c1 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -14,6 +14,7 @@
#include <uapi/drm/xe_drm.h>
#include "xe_bo.h"
+#include "xe_cpu_bind.h"
#include "xe_dep_scheduler.h"
#include "xe_device.h"
#include "xe_gt.h"
@@ -1454,7 +1455,7 @@ static void xe_exec_queue_last_fence_lockdep_assert(struct xe_exec_queue *q,
struct xe_vm *vm)
{
if (q->flags & EXEC_QUEUE_FLAG_MIGRATE) {
- xe_migrate_job_lock_assert(q);
+ xe_cpu_bind_job_lock_assert(q);
} else if (q->flags & EXEC_QUEUE_FLAG_VM) {
lockdep_assert_held(&vm->lock);
} else {
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 1d6ac7a6563b..f7b56a1eaed4 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -17,6 +17,7 @@
#include "abi/guc_klvs_abi.h"
#include "xe_assert.h"
#include "xe_bo.h"
+#include "xe_cpu_bind.h"
#include "xe_devcoredump.h"
#include "xe_device.h"
#include "xe_exec_queue.h"
@@ -36,7 +37,6 @@
#include "xe_lrc.h"
#include "xe_macros.h"
#include "xe_map.h"
-#include "xe_migrate.h"
#include "xe_mocs.h"
#include "xe_pm.h"
#include "xe_pt.h"
@@ -1190,13 +1190,36 @@ static bool is_pt_job(struct xe_sched_job *job)
return job->is_pt_job;
}
-static void run_pt_job(struct xe_sched_job *job)
+static void run_pt_job(struct xe_device *xe, struct xe_sched_job *job)
{
- xe_migrate_update_pgtables_cpu_execute(job->pt_update[0].vm,
- job->pt_update[0].tile,
- job->pt_update[0].ops,
- job->pt_update[0].pt_job_ops->ops,
- job->pt_update[0].pt_job_ops->current_op);
+ struct xe_tile *tile;
+ int id;
+
+ for_each_tile(tile, xe, id) {
+ struct xe_pt_job_ops *pt_job_ops =
+ job->pt_update[0].pt_job_ops[id];
+
+ if (!pt_job_ops || !pt_job_ops->current_op)
+ continue;
+
+ xe_cpu_bind_update_pgtables_execute(job->pt_update[0].vm, tile,
+ job->pt_update[0].ops,
+ pt_job_ops->ops,
+ pt_job_ops->current_op);
+ }
+}
+
+static void put_pt_job(struct xe_device *xe, struct xe_sched_job *job)
+{
+ struct xe_tile *tile;
+ int id;
+
+ for_each_tile(tile, xe, id) {
+ struct xe_pt_job_ops *pt_job_ops =
+ job->pt_update[0].pt_job_ops[id];
+
+ xe_pt_job_ops_put(pt_job_ops);
+ }
}
static struct dma_fence *
@@ -1228,7 +1251,7 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
if (is_pt_job(job)) {
xe_gt_assert(guc_to_gt(guc), !exec_queue_registered(q));
- run_pt_job(job);
+ run_pt_job(guc_to_xe(guc), job);
} else {
if (!exec_queue_registered(q))
register_exec_queue(q, GUC_CONTEXT_NORMAL);
@@ -1240,7 +1263,7 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
}
if (is_pt_job(job)) {
- xe_pt_job_ops_put(job->pt_update[0].pt_job_ops);
+ put_pt_job(guc_to_xe(guc), job);
dma_fence_put(job->fence); /* Drop ref from xe_sched_job_arm */
return NULL;
}
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index fe5c9bdcb555..b5d4fc4d4c62 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -49,8 +49,6 @@
struct xe_migrate {
/** @q: Default exec queue used for migration */
struct xe_exec_queue *q;
- /** @bind_q: Default exec queue used for binds */
- struct xe_exec_queue *bind_q;
/** @tile: Backpointer to the tile this struct xe_migrate belongs to. */
struct xe_tile *tile;
/** @job_mutex: Timeline mutex for @eng. */
@@ -108,7 +106,6 @@ static void xe_migrate_fini(void *arg)
mutex_destroy(&m->job_mutex);
xe_vm_close_and_put(m->q->vm);
xe_exec_queue_put(m->q);
- xe_exec_queue_put(m->bind_q);
}
static u64 xe_migrate_vm_addr(u64 slot, u32 level)
@@ -448,16 +445,6 @@ int xe_migrate_init(struct xe_migrate *m)
goto err_out;
}
- m->bind_q = xe_exec_queue_create(xe, vm, logical_mask, 1, hwe,
- EXEC_QUEUE_FLAG_KERNEL |
- EXEC_QUEUE_FLAG_PERMANENT |
- EXEC_QUEUE_FLAG_HIGH_PRIORITY |
- EXEC_QUEUE_FLAG_MIGRATE, 0);
- if (IS_ERR(m->bind_q)) {
- err = PTR_ERR(m->bind_q);
- goto err_out;
- }
-
/*
* XXX: Currently only reserving 1 (likely slow) BCS instance on
* PVC, may want to revisit if performance is needed.
@@ -469,16 +456,6 @@ int xe_migrate_init(struct xe_migrate *m)
EXEC_QUEUE_FLAG_MIGRATE |
EXEC_QUEUE_FLAG_LOW_LATENCY, 0);
} else {
- m->bind_q = xe_exec_queue_create_class(xe, primary_gt, vm,
- XE_ENGINE_CLASS_COPY,
- EXEC_QUEUE_FLAG_KERNEL |
- EXEC_QUEUE_FLAG_PERMANENT |
- EXEC_QUEUE_FLAG_MIGRATE, 0);
- if (IS_ERR(m->bind_q)) {
- err = PTR_ERR(m->bind_q);
- goto err_out;
- }
-
m->q = xe_exec_queue_create_class(xe, primary_gt, vm,
XE_ENGINE_CLASS_COPY,
EXEC_QUEUE_FLAG_KERNEL |
@@ -515,8 +492,6 @@ int xe_migrate_init(struct xe_migrate *m)
return err;
err_out:
- if (!IS_ERR_OR_NULL(m->bind_q))
- xe_exec_queue_put(m->bind_q);
xe_vm_close_and_put(vm);
return err;
@@ -1403,17 +1378,6 @@ struct dma_fence *xe_migrate_vram_copy_chunk(struct xe_bo *vram_bo, u64 vram_off
return fence;
}
-/**
- * xe_get_migrate_bind_queue() - Get the bind queue from migrate context.
- * @migrate: Migrate context.
- *
- * Return: Pointer to bind queue on success, error on failure
- */
-struct xe_exec_queue *xe_migrate_bind_queue(struct xe_migrate *migrate)
-{
- return migrate->bind_q;
-}
-
static void emit_clear_link_copy(struct xe_gt *gt, struct xe_bb *bb, u64 src_ofs,
u32 size, u32 pitch)
{
@@ -1684,168 +1648,6 @@ struct migrate_test_params {
container_of(_priv, struct migrate_test_params, base)
#endif
-/**
- * xe_migrate_update_pgtables_cpu_execute() - Update a VM's PTEs via the CPU
- * @vm: The VM being updated
- * @tile: The tile being updated
- * @ops: The migrate PT update ops
- * @pt_ops: The VM PT update ops
- * @num_ops: The number of The VM PT update ops
- *
- * Execute the VM PT update ops array which results in a VM's PTEs being updated
- * via the CPU.
- */
-void
-xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct xe_tile *tile,
- const struct xe_migrate_pt_update_ops *ops,
- struct xe_vm_pgtable_update_op *pt_op,
- int num_ops)
-{
- u32 j, i;
-
- for (j = 0; j < num_ops; ++j, ++pt_op) {
- for (i = 0; i < pt_op->num_entries; i++) {
- const struct xe_vm_pgtable_update *update =
- &pt_op->entries[i];
-
- xe_tile_assert(tile, update);
- xe_tile_assert(tile, update->pt_bo);
- xe_tile_assert(tile, !iosys_map_is_null(&update->pt_bo->vmap));
-
- if (pt_op->bind)
- ops->populate(tile, &update->pt_bo->vmap,
- update->ofs, update->qwords,
- update);
- else
- ops->clear(vm, tile, &update->pt_bo->vmap,
- update->ofs, update->qwords,
- update);
- }
- }
-
- trace_xe_vm_cpu_bind(vm);
- xe_device_wmb(vm->xe);
-}
-
-static struct dma_fence *
-xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
- struct xe_migrate_pt_update *pt_update)
-{
- XE_TEST_DECLARE(struct migrate_test_params *test =
- to_migrate_test_params
- (xe_cur_kunit_priv(XE_TEST_LIVE_MIGRATE));)
- const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
- struct xe_vm *vm = pt_update->vops->vm;
- struct xe_vm_pgtable_update_ops *pt_update_ops =
- &pt_update->vops->pt_update_ops[pt_update->tile_id];
- int err;
-
- if (XE_TEST_ONLY(test && test->force_gpu))
- return ERR_PTR(-ETIME);
-
- if (ops->pre_commit) {
- pt_update->job = NULL;
- err = ops->pre_commit(pt_update);
- if (err)
- return ERR_PTR(err);
- }
-
- xe_migrate_update_pgtables_cpu_execute(vm, m->tile, ops,
- pt_update_ops->pt_job_ops->ops,
- pt_update_ops->num_ops);
-
- return dma_fence_get_stub();
-}
-
-static bool is_migrate_queue(struct xe_migrate *m, struct xe_exec_queue *q)
-{
- return m->bind_q == q;
-}
-
-static struct dma_fence *
-__xe_migrate_update_pgtables(struct xe_migrate *m,
- struct xe_migrate_pt_update *pt_update,
- struct xe_vm_pgtable_update_ops *pt_update_ops)
-{
- const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
- struct xe_tile *tile = m->tile;
- struct xe_sched_job *job;
- struct dma_fence *fence;
- bool is_migrate = is_migrate_queue(m, pt_update_ops->q);
- int err;
-
- job = xe_sched_job_create(pt_update_ops->q, NULL);
- if (IS_ERR(job)) {
- err = PTR_ERR(job);
- goto err_out;
- }
-
- xe_tile_assert(tile, job->is_pt_job);
-
- if (ops->pre_commit) {
- pt_update->job = job;
- err = ops->pre_commit(pt_update);
- if (err)
- goto err_job;
- }
- if (is_migrate)
- mutex_lock(&m->job_mutex);
-
- job->pt_update[0].vm = pt_update->vops->vm;
- job->pt_update[0].tile = tile;
- job->pt_update[0].ops = ops;
- job->pt_update[0].pt_job_ops =
- xe_pt_job_ops_get(pt_update_ops->pt_job_ops);
-
- xe_sched_job_arm(job);
- fence = dma_fence_get(&job->drm.s_fence->finished);
- xe_sched_job_push(job);
-
- if (is_migrate)
- mutex_unlock(&m->job_mutex);
-
- return fence;
-
-err_job:
- xe_sched_job_put(job);
-err_out:
- return ERR_PTR(err);
-}
-
-/**
- * xe_migrate_update_pgtables() - Pipelined page-table update
- * @m: The migrate context.
- * @pt_update: PT update arguments
- *
- * Perform a pipelined page-table update. The update descriptors are typically
- * built under the same lock critical section as a call to this function. If
- * using the default engine for the updates, they will be performed in the
- * order they grab the job_mutex. If different engines are used, external
- * synchronization is needed for overlapping updates to maintain page-table
- * consistency. Note that the meaning of "overlapping" is that the updates
- * touch the same page-table, which might be a higher-level page-directory.
- * If no pipelining is needed, then updates may be performed by the cpu.
- *
- * Return: A dma_fence that, when signaled, indicates the update completion.
- */
-struct dma_fence *
-xe_migrate_update_pgtables(struct xe_migrate *m,
- struct xe_migrate_pt_update *pt_update)
-
-{
- struct xe_vm_pgtable_update_ops *pt_update_ops =
- &pt_update->vops->pt_update_ops[pt_update->tile_id];
- struct dma_fence *fence;
-
- fence = xe_migrate_update_pgtables_cpu(m, pt_update);
-
- /* -ETIME indicates a job is needed, anything else is legit error */
- if (!IS_ERR(fence) || PTR_ERR(fence) != -ETIME)
- return fence;
-
- return __xe_migrate_update_pgtables(m, pt_update, pt_update_ops);
-}
-
/**
* xe_migrate_wait() - Complete all operations using the xe_migrate context
* @m: Migrate context to wait for.
@@ -2347,56 +2149,6 @@ int xe_migrate_access_memory(struct xe_migrate *m, struct xe_bo *bo,
return IS_ERR(fence) ? PTR_ERR(fence) : 0;
}
-/**
- * xe_migrate_job_lock() - Lock migrate job lock
- * @m: The migration context.
- * @q: Queue associated with the operation which requires a lock
- *
- * Lock the migrate job lock if the queue is a migration queue, otherwise
- * assert the VM's dma-resv is held (user queue's have own locking).
- */
-void xe_migrate_job_lock(struct xe_migrate *m, struct xe_exec_queue *q)
-{
- bool is_migrate = is_migrate_queue(m, q);
-
- if (is_migrate)
- mutex_lock(&m->job_mutex);
- else
- xe_vm_assert_held(q->user_vm); /* User queues VM's should be locked */
-}
-
-/**
- * xe_migrate_job_unlock() - Unlock migrate job lock
- * @m: The migration context.
- * @q: Queue associated with the operation which requires a lock
- *
- * Unlock the migrate job lock if the queue is a migration queue, otherwise
- * assert the VM's dma-resv is held (user queue's have own locking).
- */
-void xe_migrate_job_unlock(struct xe_migrate *m, struct xe_exec_queue *q)
-{
- bool is_migrate = is_migrate_queue(m, q);
-
- if (is_migrate)
- mutex_unlock(&m->job_mutex);
- else
- xe_vm_assert_held(q->user_vm); /* User queues VM's should be locked */
-}
-
-#if IS_ENABLED(CONFIG_PROVE_LOCKING)
-/**
- * xe_migrate_job_lock_assert() - Assert migrate job lock held of queue
- * @q: Migrate queue
- */
-void xe_migrate_job_lock_assert(struct xe_exec_queue *q)
-{
- struct xe_migrate *m = gt_to_tile(q->gt)->migrate;
-
- xe_gt_assert(q->gt, q == m->bind_q);
- lockdep_assert_held(&m->job_mutex);
-}
-#endif
-
#if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
#include "tests/xe_migrate.c"
#endif
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index ae979f6bf8ef..f6fa23c6c4fb 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -34,78 +34,6 @@ enum xe_migrate_copy_dir {
XE_MIGRATE_COPY_TO_SRAM,
};
-/**
- * struct xe_migrate_pt_update_ops - Callbacks for the
- * xe_migrate_update_pgtables() function.
- */
-struct xe_migrate_pt_update_ops {
- /**
- * @populate: Populate a command buffer or page-table with ptes.
- * @tile: The tile for the current operation.
- * @map: struct iosys_map into the memory to be populated.
- * @ofs: qword offset into @map, unused if @map is NULL.
- * @num_qwords: Number of qwords to write.
- * @update: Information about the PTEs to be inserted.
- *
- * This interface is intended to be used as a callback into the
- * page-table system to populate command buffers or shared
- * page-tables with PTEs.
- */
- void (*populate)(struct xe_tile *tile, struct iosys_map *map,
- u32 ofs, u32 num_qwords,
- const struct xe_vm_pgtable_update *update);
- /**
- * @clear: Clear a command buffer or page-table with ptes.
- * @vm: VM being updated
- * @tile: The tile for the current operation.
- * @map: struct iosys_map into the memory to be populated.
- * @ofs: qword offset into @map, unused if @map is NULL.
- * @num_qwords: Number of qwords to write.
- * @update: Information about the PTEs to be inserted.
- *
- * This interface is intended to be used as a callback into the
- * page-table system to populate command buffers or shared
- * page-tables with PTEs.
- */
- void (*clear)(struct xe_vm *vm, struct xe_tile *tile,
- struct iosys_map *map, u32 ofs, u32 num_qwords,
- const struct xe_vm_pgtable_update *update);
-
- /**
- * @pre_commit: Callback to be called just before arming the
- * sched_job.
- * @pt_update: Pointer to embeddable callback argument.
- *
- * Return: 0 on success, negative error code on error.
- */
- int (*pre_commit)(struct xe_migrate_pt_update *pt_update);
-};
-
-/**
- * struct xe_migrate_pt_update - Argument to the
- * struct xe_migrate_pt_update_ops callbacks.
- *
- * Intended to be subclassed to support additional arguments if necessary.
- */
-struct xe_migrate_pt_update {
- /** @ops: Pointer to the struct xe_migrate_pt_update_ops callbacks */
- const struct xe_migrate_pt_update_ops *ops;
- /** @vops: VMA operations */
- struct xe_vma_ops *vops;
- /** @job: The job if a GPU page-table update. NULL otherwise */
- struct xe_sched_job *job;
- /**
- * @ijob: The TLB invalidation job for primary GT. NULL otherwise
- */
- struct xe_tlb_inval_job *ijob;
- /**
- * @mjob: The TLB invalidation job for media GT. NULL otherwise
- */
- struct xe_tlb_inval_job *mjob;
- /** @tile_id: Tile ID of the update */
- u8 tile_id;
-};
-
struct xe_migrate *xe_migrate_alloc(struct xe_tile *tile);
int xe_migrate_init(struct xe_migrate *m);
@@ -137,7 +65,6 @@ void xe_migrate_ccs_rw_copy_clear(struct xe_bo *src_bo,
struct xe_lrc *xe_migrate_lrc(struct xe_migrate *migrate);
struct xe_exec_queue *xe_migrate_exec_queue(struct xe_migrate *migrate);
-struct xe_exec_queue *xe_migrate_bind_queue(struct xe_migrate *migrate);
struct dma_fence *xe_migrate_vram_copy_chunk(struct xe_bo *vram_bo, u64 vram_offset,
struct xe_bo *sysmem_bo, u64 sysmem_offset,
u64 size, enum xe_migrate_copy_dir dir);
@@ -156,28 +83,6 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m);
-
-void
-xe_migrate_update_pgtables_cpu_execute(struct xe_vm *vm, struct xe_tile *tile,
- const struct xe_migrate_pt_update_ops *ops,
- struct xe_vm_pgtable_update_op *pt_op,
- int num_ops);
-
-struct dma_fence *
-xe_migrate_update_pgtables(struct xe_migrate *m,
- struct xe_migrate_pt_update *pt_update);
-
void xe_migrate_wait(struct xe_migrate *m);
-#if IS_ENABLED(CONFIG_PROVE_LOCKING)
-void xe_migrate_job_lock_assert(struct xe_exec_queue *q);
-#else
-static inline void xe_migrate_job_lock_assert(struct xe_exec_queue *q)
-{
-}
-#endif
-
-void xe_migrate_job_lock(struct xe_migrate *m, struct xe_exec_queue *q);
-void xe_migrate_job_unlock(struct xe_migrate *m, struct xe_exec_queue *q);
-
#endif
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 032947a10806..d91d80c92957 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -7,12 +7,12 @@
#include "regs/xe_gtt_defs.h"
#include "xe_bo.h"
+#include "xe_cpu_bind.h"
#include "xe_device.h"
#include "xe_drm_client.h"
#include "xe_exec_queue.h"
#include "xe_gt.h"
#include "xe_gt_stats.h"
-#include "xe_migrate.h"
#include "xe_page_reclaim.h"
#include "xe_pt_types.h"
#include "xe_pt_walk.h"
@@ -1291,11 +1291,9 @@ static int op_add_deps(struct xe_vm *vm, struct xe_vma_op *op,
}
static int xe_pt_vm_dependencies(struct xe_sched_job *job,
- struct xe_tlb_inval_job *ijob,
- struct xe_tlb_inval_job *mjob,
+ struct xe_tlb_inval_job **ijobs,
struct xe_vm *vm,
struct xe_vma_ops *vops,
- struct xe_vm_pgtable_update_ops *pt_update_ops,
struct xe_range_fence_tree *rftree)
{
struct xe_range_fence *rtfence;
@@ -1308,20 +1306,22 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
if (!job && !no_in_syncs(vops->syncs, vops->num_syncs))
return -ETIME;
- if (!job && !xe_exec_queue_is_idle(pt_update_ops->q))
+ if (!job && !xe_exec_queue_is_idle(vops->q))
return -ETIME;
- if (pt_update_ops->wait_vm_bookkeep || pt_update_ops->wait_vm_kernel) {
- err = job_test_add_deps(job, xe_vm_resv(vm),
- pt_update_ops->wait_vm_bookkeep ?
- DMA_RESV_USAGE_BOOKKEEP :
- DMA_RESV_USAGE_KERNEL);
+ if (vops->flags & (XE_VMA_OPS_FLAG_WAIT_VM_BOOKKEEP |
+ XE_VMA_OPS_FLAG_WAIT_VM_KERNEL)) {
+ enum dma_resv_usage usage = DMA_RESV_USAGE_KERNEL;
+
+ if (vops->flags & XE_VMA_OPS_FLAG_WAIT_VM_BOOKKEEP)
+ usage = DMA_RESV_USAGE_BOOKKEEP;
+
+ err = job_test_add_deps(job, xe_vm_resv(vm), usage);
if (err)
return err;
}
- rtfence = xe_range_fence_tree_first(rftree, pt_update_ops->start,
- pt_update_ops->last);
+ rtfence = xe_range_fence_tree_first(rftree, vops->start, vops->last);
while (rtfence) {
fence = rtfence->fence;
@@ -1339,9 +1339,8 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
return err;
}
- rtfence = xe_range_fence_tree_next(rtfence,
- pt_update_ops->start,
- pt_update_ops->last);
+ rtfence = xe_range_fence_tree_next(rtfence, vops->start,
+ vops->last);
}
list_for_each_entry(op, &vops->list, link) {
@@ -1354,14 +1353,11 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
err = xe_sync_entry_add_deps(&vops->syncs[i], job);
if (job) {
- if (ijob) {
- err = xe_tlb_inval_job_alloc_dep(ijob);
- if (err)
- return err;
- }
+ for (i = 0; i < XE_CPU_BIND_INVAL_JOB_COUNT; ++i) {
+ if (!ijobs[i])
+ continue;
- if (mjob) {
- err = xe_tlb_inval_job_alloc_dep(mjob);
+ err = xe_tlb_inval_job_alloc_dep(ijobs[i]);
if (err)
return err;
}
@@ -1370,17 +1366,14 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
return err;
}
-static int xe_pt_pre_commit(struct xe_migrate_pt_update *pt_update)
+static int xe_pt_pre_commit(struct xe_cpu_bind_pt_update *pt_update)
{
struct xe_vma_ops *vops = pt_update->vops;
struct xe_vm *vm = vops->vm;
- struct xe_range_fence_tree *rftree = &vm->rftree[pt_update->tile_id];
- struct xe_vm_pgtable_update_ops *pt_update_ops =
- &vops->pt_update_ops[pt_update->tile_id];
+ struct xe_range_fence_tree *rftree = &vm->rftree;
- return xe_pt_vm_dependencies(pt_update->job, pt_update->ijob,
- pt_update->mjob, vm, pt_update->vops,
- pt_update_ops, rftree);
+ return xe_pt_vm_dependencies(pt_update->job, pt_update->ijobs,
+ vm, vops, rftree);
}
#if IS_ENABLED(CONFIG_DRM_GPUSVM)
@@ -1408,8 +1401,7 @@ static bool xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma)
#endif
-static int vma_check_userptr(struct xe_vm *vm, struct xe_vma *vma,
- struct xe_vm_pgtable_update_ops *pt_update)
+static int vma_check_userptr(struct xe_vm *vm, struct xe_vma *vma)
{
struct xe_userptr_vma *uvma;
unsigned long notifier_seq;
@@ -1439,8 +1431,7 @@ static int vma_check_userptr(struct xe_vm *vm, struct xe_vma *vma,
return 0;
}
-static int op_check_svm_userptr(struct xe_vm *vm, struct xe_vma_op *op,
- struct xe_vm_pgtable_update_ops *pt_update)
+static int op_check_svm_userptr(struct xe_vm *vm, struct xe_vma_op *op)
{
int err = 0;
@@ -1451,13 +1442,13 @@ static int op_check_svm_userptr(struct xe_vm *vm, struct xe_vma_op *op,
if (!op->map.immediate && xe_vm_in_fault_mode(vm))
break;
- err = vma_check_userptr(vm, op->map.vma, pt_update);
+ err = vma_check_userptr(vm, op->map.vma);
break;
case DRM_GPUVA_OP_REMAP:
if (op->remap.prev)
- err = vma_check_userptr(vm, op->remap.prev, pt_update);
+ err = vma_check_userptr(vm, op->remap.prev);
if (!err && op->remap.next)
- err = vma_check_userptr(vm, op->remap.next, pt_update);
+ err = vma_check_userptr(vm, op->remap.next);
break;
case DRM_GPUVA_OP_UNMAP:
break;
@@ -1477,7 +1468,7 @@ static int op_check_svm_userptr(struct xe_vm *vm, struct xe_vma_op *op,
}
}
} else {
- err = vma_check_userptr(vm, gpuva_to_vma(op->base.prefetch.va), pt_update);
+ err = vma_check_userptr(vm, gpuva_to_vma(op->base.prefetch.va));
}
break;
#if IS_ENABLED(CONFIG_DRM_XE_GPUSVM)
@@ -1503,12 +1494,10 @@ static int op_check_svm_userptr(struct xe_vm *vm, struct xe_vma_op *op,
return err;
}
-static int xe_pt_svm_userptr_pre_commit(struct xe_migrate_pt_update *pt_update)
+static int xe_pt_svm_userptr_pre_commit(struct xe_cpu_bind_pt_update *pt_update)
{
struct xe_vm *vm = pt_update->vops->vm;
struct xe_vma_ops *vops = pt_update->vops;
- struct xe_vm_pgtable_update_ops *pt_update_ops =
- &vops->pt_update_ops[pt_update->tile_id];
struct xe_vma_op *op;
int err;
@@ -1519,7 +1508,7 @@ static int xe_pt_svm_userptr_pre_commit(struct xe_migrate_pt_update *pt_update)
xe_svm_notifier_lock(vm);
list_for_each_entry(op, &vops->list, link) {
- err = op_check_svm_userptr(vm, op, pt_update_ops);
+ err = op_check_svm_userptr(vm, op);
if (err) {
xe_svm_notifier_unlock(vm);
break;
@@ -1823,10 +1812,10 @@ static unsigned int xe_pt_stage_unbind(struct xe_tile *tile,
}
static void
-xe_migrate_clear_pgtable_callback(struct xe_vm *vm, struct xe_tile *tile,
- struct iosys_map *map, u32 qword_ofs,
- u32 num_qwords,
- const struct xe_vm_pgtable_update *update)
+xe_pt_clear_pgtable_callback(struct xe_vm *vm, struct xe_tile *tile,
+ struct iosys_map *map, u32 qword_ofs,
+ u32 num_qwords,
+ const struct xe_vm_pgtable_update *update)
{
u64 empty = __xe_pt_empty_pte(tile, vm, update->level);
int i;
@@ -1904,6 +1893,9 @@ to_pt_op(struct xe_vm_pgtable_update_ops *pt_update_ops, u32 op_idx)
static u32
get_current_op(struct xe_vm_pgtable_update_ops *pt_update_ops)
{
+ if (!pt_update_ops->pt_job_ops)
+ return 0;
+
return pt_update_ops->pt_job_ops->current_op;
}
@@ -2187,6 +2179,7 @@ static int unbind_range_prepare(struct xe_vm *vm,
static int op_prepare(struct xe_vm *vm,
struct xe_tile *tile,
+ struct xe_vma_ops *vops,
struct xe_vm_pgtable_update_ops *pt_update_ops,
struct xe_vma_op *op)
{
@@ -2203,7 +2196,7 @@ static int op_prepare(struct xe_vm *vm,
err = bind_op_prepare(vm, tile, pt_update_ops, op->map.vma,
op->map.invalidate_on_bind);
- pt_update_ops->wait_vm_kernel = true;
+ vops->flags |= XE_VMA_OPS_FLAG_WAIT_VM_KERNEL;
break;
case DRM_GPUVA_OP_REMAP:
{
@@ -2217,12 +2210,12 @@ static int op_prepare(struct xe_vm *vm,
if (!err && op->remap.prev) {
err = bind_op_prepare(vm, tile, pt_update_ops,
op->remap.prev, false);
- pt_update_ops->wait_vm_bookkeep = true;
+ vops->flags |= XE_VMA_OPS_FLAG_WAIT_VM_KERNEL;
}
if (!err && op->remap.next) {
err = bind_op_prepare(vm, tile, pt_update_ops,
op->remap.next, false);
- pt_update_ops->wait_vm_bookkeep = true;
+ vops->flags |= XE_VMA_OPS_FLAG_WAIT_VM_KERNEL;
}
break;
}
@@ -2252,7 +2245,7 @@ static int op_prepare(struct xe_vm *vm,
}
} else {
err = bind_op_prepare(vm, tile, pt_update_ops, vma, false);
- pt_update_ops->wait_vm_kernel = true;
+ vops->flags |= XE_VMA_OPS_FLAG_WAIT_VM_KERNEL;
}
break;
}
@@ -2283,18 +2276,8 @@ xe_pt_update_ops_init(struct xe_vm_pgtable_update_ops *pt_update_ops)
xe_page_reclaim_list_init(&pt_update_ops->prl);
}
-/**
- * xe_pt_update_ops_prepare() - Prepare PT update operations
- * @tile: Tile of PT update operations
- * @vops: VMA operationa
- *
- * Prepare PT update operations which includes updating internal PT state,
- * allocate memory for page tables, populate page table being pruned in, and
- * create PT update operations for leaf insertion / removal.
- *
- * Return: 0 on success, negative error code on error.
- */
-int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops)
+static int __xe_pt_update_ops_prepare(struct xe_tile *tile,
+ struct xe_vma_ops *vops)
{
struct xe_vm_pgtable_update_ops *pt_update_ops =
&vops->pt_update_ops[tile->id];
@@ -2313,7 +2296,7 @@ int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops)
return err;
list_for_each_entry(op, &vops->list, link) {
- err = op_prepare(vops->vm, tile, pt_update_ops, op);
+ err = op_prepare(vops->vm, tile, vops, pt_update_ops, op);
if (err)
return err;
@@ -2322,6 +2305,16 @@ int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops)
xe_tile_assert(tile, get_current_op(pt_update_ops) <=
pt_update_ops->num_ops);
+ /* Propagate individual tile state up to VMA operation */
+ if (pt_update_ops->start < vops->start)
+ vops->start = pt_update_ops->start;
+ if (pt_update_ops->last > vops->last)
+ vops->last = pt_update_ops->last;
+ if (pt_update_ops->needs_invalidation)
+ vops->flags |= XE_VMA_OPS_FLAG_NEEDS_INVALIDATION;
+ if (pt_update_ops->needs_svm_lock)
+ vops->flags |= XE_VMA_OPS_FLAG_NEEDS_SVM_LOCK;
+
#ifdef TEST_VM_OPS_ERROR
if (vops->inject_error &&
vops->vm->xe->vm_inject_error_position == FORCE_OP_ERROR_PREPARE)
@@ -2330,35 +2323,68 @@ int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops)
return 0;
}
-ALLOW_ERROR_INJECTION(xe_pt_update_ops_prepare, ERRNO);
-static void bind_op_commit(struct xe_vm *vm, struct xe_tile *tile,
- struct xe_vm_pgtable_update_ops *pt_update_ops,
- struct xe_vma *vma, struct dma_fence *fence,
- struct dma_fence *fence2, bool invalidate_on_bind)
+/**
+ * xe_pt_update_ops_prepare() - Prepare PT update operations
+ * @xe: xe device.
+ * @vops: VMA operations
+ *
+ * Prepare PT update operations which includes updating internal PT state,
+ * allocate memory for page tables, populate page table being pruned in, and
+ * create PT update operations for leaf insertion / removal.
+ *
+ * Return: 0 on success, negative error code on error.
+ */
+int xe_pt_update_ops_prepare(struct xe_device *xe, struct xe_vma_ops *vops)
{
- xe_tile_assert(tile, !xe_vma_is_cpu_addr_mirror(vma));
+ struct xe_tile *tile;
+ int id, err;
+
+ for_each_tile(tile, xe, id) {
+ if (!vops->pt_update_ops[id].num_ops)
+ continue;
- if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm) {
- dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
- pt_update_ops->wait_vm_bookkeep ?
- DMA_RESV_USAGE_KERNEL :
- DMA_RESV_USAGE_BOOKKEEP);
- if (fence2)
- dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence2,
- pt_update_ops->wait_vm_bookkeep ?
- DMA_RESV_USAGE_KERNEL :
- DMA_RESV_USAGE_BOOKKEEP);
+ err = __xe_pt_update_ops_prepare(tile, vops);
+ if (err)
+ return err;
}
+
+ return 0;
+}
+ALLOW_ERROR_INJECTION(xe_pt_update_ops_prepare, ERRNO);
+
+static void vma_add_fences(struct xe_vma *vma, struct dma_fence **fences,
+ int fence_count, enum dma_resv_usage usage)
+{
+ int i;
+
+ if (xe_vma_has_no_bo(vma) || xe_vma_bo(vma)->vm)
+ return;
+
+ for (i = 0; i < fence_count; ++i)
+ if (fences[i])
+ dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv,
+ fences[i], usage);
+}
+
+static void bind_op_commit(struct xe_vm *vm, struct xe_vma *vma,
+ struct dma_fence **fences, int fence_count,
+ enum dma_resv_usage usage, u8 tile_mask,
+ bool invalidate_on_bind)
+{
+ xe_assert(vm->xe, !xe_vma_is_cpu_addr_mirror(vma));
+
+ vma_add_fences(vma, fences, fence_count, usage);
+
/* All WRITE_ONCE pair with READ_ONCE in xe_vm_has_valid_gpu_mapping() */
- WRITE_ONCE(vma->tile_present, vma->tile_present | BIT(tile->id));
+ WRITE_ONCE(vma->tile_present, vma->tile_present | tile_mask);
if (invalidate_on_bind)
WRITE_ONCE(vma->tile_invalidated,
- vma->tile_invalidated | BIT(tile->id));
+ vma->tile_invalidated | tile_mask);
else
WRITE_ONCE(vma->tile_invalidated,
- vma->tile_invalidated & ~BIT(tile->id));
- vma->tile_staged &= ~BIT(tile->id);
+ vma->tile_invalidated & ~tile_mask);
+ vma->tile_staged &= ~tile_mask;
if (xe_vma_is_userptr(vma)) {
xe_svm_assert_held_read(vm);
to_userptr_vma(vma)->userptr.initial_bind = true;
@@ -2368,31 +2394,21 @@ static void bind_op_commit(struct xe_vm *vm, struct xe_tile *tile,
* Kick rebind worker if this bind triggers preempt fences and not in
* the rebind worker
*/
- if (pt_update_ops->wait_vm_bookkeep &&
+ if (usage == DMA_RESV_USAGE_KERNEL &&
xe_vm_in_preempt_fence_mode(vm) &&
!current->mm)
xe_vm_queue_rebind_worker(vm);
}
-static void unbind_op_commit(struct xe_vm *vm, struct xe_tile *tile,
- struct xe_vm_pgtable_update_ops *pt_update_ops,
- struct xe_vma *vma, struct dma_fence *fence,
- struct dma_fence *fence2)
+static void unbind_op_commit(struct xe_vm *vm, struct xe_vma *vma,
+ struct dma_fence **fences, int fence_count,
+ enum dma_resv_usage usage, u8 tile_mask)
{
- xe_tile_assert(tile, !xe_vma_is_cpu_addr_mirror(vma));
+ xe_assert(vm->xe, !xe_vma_is_cpu_addr_mirror(vma));
- if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm) {
- dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
- pt_update_ops->wait_vm_bookkeep ?
- DMA_RESV_USAGE_KERNEL :
- DMA_RESV_USAGE_BOOKKEEP);
- if (fence2)
- dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence2,
- pt_update_ops->wait_vm_bookkeep ?
- DMA_RESV_USAGE_KERNEL :
- DMA_RESV_USAGE_BOOKKEEP);
- }
- vma->tile_present &= ~BIT(tile->id);
+ vma_add_fences(vma, fences, fence_count, usage);
+
+ vma->tile_present &= ~tile_mask;
if (!vma->tile_present) {
list_del_init(&vma->combined_links.rebind);
if (xe_vma_is_userptr(vma)) {
@@ -2407,21 +2423,19 @@ static void unbind_op_commit(struct xe_vm *vm, struct xe_tile *tile,
static void range_present_and_invalidated_tile(struct xe_vm *vm,
struct xe_svm_range *range,
- u8 tile_id)
+ u8 tile_mask)
{
/* All WRITE_ONCE pair with READ_ONCE in xe_vm_has_valid_gpu_mapping() */
lockdep_assert_held(&vm->svm.gpusvm.notifier_lock);
- WRITE_ONCE(range->tile_present, range->tile_present | BIT(tile_id));
- WRITE_ONCE(range->tile_invalidated, range->tile_invalidated & ~BIT(tile_id));
+ WRITE_ONCE(range->tile_present, range->tile_present | tile_mask);
+ WRITE_ONCE(range->tile_invalidated, range->tile_invalidated & ~tile_mask);
}
-static void op_commit(struct xe_vm *vm,
- struct xe_tile *tile,
- struct xe_vm_pgtable_update_ops *pt_update_ops,
- struct xe_vma_op *op, struct dma_fence *fence,
- struct dma_fence *fence2)
+static void op_commit(struct xe_vm *vm, struct xe_vma_op *op,
+ struct dma_fence **fences, int fence_count,
+ enum dma_resv_usage usage, u8 tile_mask)
{
xe_vm_assert_held(vm);
@@ -2431,8 +2445,8 @@ static void op_commit(struct xe_vm *vm,
(op->map.vma_flags & XE_VMA_SYSTEM_ALLOCATOR))
break;
- bind_op_commit(vm, tile, pt_update_ops, op->map.vma, fence,
- fence2, op->map.invalidate_on_bind);
+ bind_op_commit(vm, op->map.vma, fences, fence_count, usage,
+ tile_mask, op->map.invalidate_on_bind);
break;
case DRM_GPUVA_OP_REMAP:
{
@@ -2441,14 +2455,15 @@ static void op_commit(struct xe_vm *vm,
if (xe_vma_is_cpu_addr_mirror(old))
break;
- unbind_op_commit(vm, tile, pt_update_ops, old, fence, fence2);
+ unbind_op_commit(vm, old, fences, fence_count, usage,
+ tile_mask);
if (op->remap.prev)
- bind_op_commit(vm, tile, pt_update_ops, op->remap.prev,
- fence, fence2, false);
+ bind_op_commit(vm, op->remap.prev, fences, fence_count,
+ usage, tile_mask, false);
if (op->remap.next)
- bind_op_commit(vm, tile, pt_update_ops, op->remap.next,
- fence, fence2, false);
+ bind_op_commit(vm, op->remap.next, fences, fence_count,
+ usage, tile_mask, false);
break;
}
case DRM_GPUVA_OP_UNMAP:
@@ -2456,8 +2471,8 @@ static void op_commit(struct xe_vm *vm,
struct xe_vma *vma = gpuva_to_vma(op->base.unmap.va);
if (!xe_vma_is_cpu_addr_mirror(vma))
- unbind_op_commit(vm, tile, pt_update_ops, vma, fence,
- fence2);
+			unbind_op_commit(vm, vma, fences, fence_count,
+					 usage, tile_mask);
break;
}
case DRM_GPUVA_OP_PREFETCH:
@@ -2469,10 +2484,11 @@ static void op_commit(struct xe_vm *vm,
unsigned long i;
xa_for_each(&op->prefetch_range.range, i, range)
- range_present_and_invalidated_tile(vm, range, tile->id);
+ range_present_and_invalidated_tile(vm, range,
+ tile_mask);
} else {
- bind_op_commit(vm, tile, pt_update_ops, vma, fence,
- fence2, false);
+ bind_op_commit(vm, vma, fences, fence_count, usage,
+ tile_mask, false);
}
break;
}
@@ -2480,11 +2496,12 @@ static void op_commit(struct xe_vm *vm,
{
/* WRITE_ONCE pairs with READ_ONCE in xe_vm_has_valid_gpu_mapping() */
if (op->subop == XE_VMA_SUBOP_MAP_RANGE)
- range_present_and_invalidated_tile(vm, op->map_range.range, tile->id);
+ range_present_and_invalidated_tile(vm, op->map_range.range,
+ tile_mask);
else if (op->subop == XE_VMA_SUBOP_UNMAP_RANGE)
WRITE_ONCE(op->unmap_range.range->tile_present,
op->unmap_range.range->tile_present &
- ~BIT(tile->id));
+ ~tile_mask);
break;
}
@@ -2493,40 +2510,25 @@ static void op_commit(struct xe_vm *vm,
}
}
-static const struct xe_migrate_pt_update_ops migrate_ops = {
+static const struct xe_cpu_bind_pt_update_ops cpu_bind_ops = {
.populate = xe_vm_populate_pgtable,
- .clear = xe_migrate_clear_pgtable_callback,
+ .clear = xe_pt_clear_pgtable_callback,
.pre_commit = xe_pt_pre_commit,
};
#if IS_ENABLED(CONFIG_DRM_GPUSVM)
-static const struct xe_migrate_pt_update_ops svm_userptr_migrate_ops = {
+static const struct xe_cpu_bind_pt_update_ops svm_userptr_cpu_bind_ops = {
.populate = xe_vm_populate_pgtable,
- .clear = xe_migrate_clear_pgtable_callback,
+ .clear = xe_pt_clear_pgtable_callback,
.pre_commit = xe_pt_svm_userptr_pre_commit,
};
#else
-static const struct xe_migrate_pt_update_ops svm_userptr_migrate_ops;
+static const struct xe_cpu_bind_pt_update_ops svm_userptr_cpu_bind_ops;
#endif
-static struct xe_dep_scheduler *to_dep_scheduler(struct xe_exec_queue *q,
- struct xe_tile *tile,
- struct xe_gt *gt,
- unsigned int *type)
-{
- int tile_ofs = tile->id * (XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT + 1);
-
- if (xe_gt_is_media_type(gt))
- *type = tile_ofs + XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT;
- else
- *type = tile_ofs + XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT;
-
- return q->tlb_inval[*type].dep_scheduler;
-}
-
/**
* xe_pt_update_ops_run() - Run PT update operations
- * @tile: Tile of PT update operations
+ * @xe: xe device.
* @vops: VMA operations
*
* Run PT update operations which includes committing internal PT state changes,
@@ -2536,82 +2538,83 @@ static struct xe_dep_scheduler *to_dep_scheduler(struct xe_exec_queue *q,
* Return: fence on success, negative ERR_PTR on error.
*/
struct dma_fence *
-xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
+xe_pt_update_ops_run(struct xe_device *xe, struct xe_vma_ops *vops)
{
struct xe_vm *vm = vops->vm;
- struct xe_vm_pgtable_update_ops *pt_update_ops =
- &vops->pt_update_ops[tile->id];
- struct xe_exec_queue *q = pt_update_ops->q;
- struct dma_fence *fence, *ifence = NULL, *mfence = NULL;
- struct xe_tlb_inval_job *ijob = NULL, *mjob = NULL;
+ struct xe_exec_queue *q = vops->q;
+ struct dma_fence *fence;
+ struct dma_fence *ifences[XE_CPU_BIND_INVAL_JOB_COUNT] = {};
struct xe_range_fence *rfence;
+ enum dma_resv_usage usage = DMA_RESV_USAGE_BOOKKEEP;
struct xe_vma_op *op;
- unsigned int type;
- int err = 0, i;
- struct xe_migrate_pt_update update = {
- .ops = pt_update_ops->needs_svm_lock ?
- &svm_userptr_migrate_ops :
- &migrate_ops,
+ struct xe_tile *tile;
+ int err = 0, total_ops = 0, i, j;
+ u8 tile_mask = 0;
+ bool needs_invalidation = vops->flags &
+ XE_VMA_OPS_FLAG_NEEDS_INVALIDATION;
+ bool needs_svm_lock = vops->flags &
+ XE_VMA_OPS_FLAG_NEEDS_SVM_LOCK;
+ struct xe_cpu_bind_pt_update update = {
+ .ops = needs_svm_lock ? &svm_userptr_cpu_bind_ops :
+ &cpu_bind_ops,
.vops = vops,
- .tile_id = tile->id,
};
lockdep_assert_held(&vm->lock);
xe_vm_assert_held(vm);
- if (!get_current_op(pt_update_ops)) {
- xe_tile_assert(tile, xe_vm_in_fault_mode(vm));
+ for_each_tile(tile, xe, j) {
+ struct xe_vm_pgtable_update_ops *pt_update_ops =
+ &vops->pt_update_ops[j];
+ total_ops += get_current_op(pt_update_ops);
+ }
+ if (!total_ops) {
+ xe_assert(xe, xe_vm_in_fault_mode(vm));
return dma_fence_get_stub();
}
#ifdef TEST_VM_OPS_ERROR
if (vops->inject_error &&
- vm->xe->vm_inject_error_position == FORCE_OP_ERROR_RUN)
+ xe->vm_inject_error_position == FORCE_OP_ERROR_RUN)
return ERR_PTR(-ENOSPC);
#endif
- if (pt_update_ops->needs_invalidation) {
- struct xe_dep_scheduler *dep_scheduler =
- to_dep_scheduler(q, tile, tile->primary_gt, &type);
-
- ijob = xe_tlb_inval_job_create(q, &tile->primary_gt->tlb_inval,
- dep_scheduler, vm,
- pt_update_ops->start,
- pt_update_ops->last,
- type);
- if (IS_ERR(ijob)) {
- err = PTR_ERR(ijob);
- goto kill_vm_tile1;
- }
- update.ijob = ijob;
- /*
- * Only add page reclaim for the primary GT. Media GT does not have
- * any PPC to flush, so enabling the PPC flush bit for media is
- * effectively a NOP and provides no performance benefit nor
- * interfere with primary GT.
- */
- if (xe_page_reclaim_list_valid(&pt_update_ops->prl)) {
- xe_tlb_inval_job_add_page_reclaim(ijob, &pt_update_ops->prl);
- /* Release ref from alloc, job will now handle it */
- xe_page_reclaim_list_invalidate(&pt_update_ops->prl);
- }
-
- if (tile->media_gt) {
- dep_scheduler = to_dep_scheduler(q, tile,
- tile->media_gt, &type);
-
- mjob = xe_tlb_inval_job_create(q,
- &tile->media_gt->tlb_inval,
- dep_scheduler, vm,
- pt_update_ops->start,
- pt_update_ops->last,
- type);
- if (IS_ERR(mjob)) {
- err = PTR_ERR(mjob);
+ if (needs_invalidation) {
+ for_each_tlb_inval(q, i) {
+ struct xe_dep_scheduler *dep_scheduler =
+ q->tlb_inval[i].dep_scheduler;
+ struct xe_tile *tile =
+ &xe->tiles[i / XE_MAX_GT_PER_TILE];
+ struct xe_vm_pgtable_update_ops *pt_update_ops =
+ &vops->pt_update_ops[tile->id];
+ struct xe_page_reclaim_list *prl = &pt_update_ops->prl;
+ struct xe_tlb_inval_job *ijob;
+ struct xe_gt *gt = i % XE_MAX_GT_PER_TILE ?
+ tile->media_gt : tile->primary_gt;
+
+ ijob = xe_tlb_inval_job_create(q, >->tlb_inval,
+ dep_scheduler,
+ vm, pt_update_ops->start,
+ pt_update_ops->last, i);
+ if (IS_ERR(ijob)) {
+ err = PTR_ERR(ijob);
goto free_ijob;
}
- update.mjob = mjob;
+
+ update.ijobs[i] = ijob;
+
+ /*
+ * Only add page reclaim for the primary GT. Media GT
+ * does not have any PPC to flush, so enabling the PPC
+ * flush bit for media is effectively a NOP and provides
+ * no performance benefit nor does it interfere with the primary GT.
+ */
+ if (xe_page_reclaim_list_valid(prl)) {
+ xe_tlb_inval_job_add_page_reclaim(ijob, prl);
+ /* Release ref from alloc, job will now handle it */
+ xe_page_reclaim_list_invalidate(prl);
+ }
}
}
@@ -2621,67 +2624,61 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
goto free_ijob;
}
- fence = xe_migrate_update_pgtables(tile->migrate, &update);
+ fence = xe_cpu_bind_update_pgtables(xe->cpu_bind, &update);
if (IS_ERR(fence)) {
err = PTR_ERR(fence);
goto free_rfence;
}
/* Point of no return - VM killed if failure after this */
- for (i = 0; i < get_current_op(pt_update_ops); ++i) {
- struct xe_vm_pgtable_update_op *pt_op =
- to_pt_op(pt_update_ops, i);
-
- xe_pt_commit(pt_op->vma, pt_op->entries,
- pt_op->num_entries,
- &pt_update_ops->pt_job_ops->deferred);
- pt_op->vma = NULL; /* skip in xe_pt_update_ops_abort */
+ for_each_tile(tile, xe, j) {
+ struct xe_vm_pgtable_update_ops *pt_update_ops =
+ &vops->pt_update_ops[j];
+
+ for (i = 0; i < get_current_op(pt_update_ops); ++i) {
+ struct xe_vm_pgtable_update_op *pt_op =
+ to_pt_op(pt_update_ops, i);
+
+ xe_pt_commit(pt_op->vma, pt_op->entries,
+ pt_op->num_entries,
+ &pt_update_ops->pt_job_ops->deferred);
+ pt_op->vma = NULL; /* skip in xe_pt_update_ops_abort */
+ tile_mask |= BIT(tile->id);
+ }
}
- if (xe_range_fence_insert(&vm->rftree[tile->id], rfence,
+ if (xe_range_fence_insert(&vm->rftree, rfence,
&xe_range_fence_kfree_ops,
- pt_update_ops->start,
- pt_update_ops->last, fence))
+ vops->start, vops->last, fence))
dma_fence_wait(fence, false);
- if (ijob)
- ifence = xe_tlb_inval_job_push(ijob, tile->migrate, fence);
- if (mjob)
- mfence = xe_tlb_inval_job_push(mjob, tile->migrate, fence);
+ if (vops->flags & XE_VMA_OPS_FLAG_WAIT_VM_BOOKKEEP)
+ usage = DMA_RESV_USAGE_KERNEL;
- if (!mjob && !ijob) {
- dma_resv_add_fence(xe_vm_resv(vm), fence,
- pt_update_ops->wait_vm_bookkeep ?
- DMA_RESV_USAGE_KERNEL :
- DMA_RESV_USAGE_BOOKKEEP);
+ if (!needs_invalidation) {
+ dma_resv_add_fence(xe_vm_resv(vm), fence, usage);
list_for_each_entry(op, &vops->list, link)
- op_commit(vops->vm, tile, pt_update_ops, op, fence, NULL);
- } else if (ijob && !mjob) {
- dma_resv_add_fence(xe_vm_resv(vm), ifence,
- pt_update_ops->wait_vm_bookkeep ?
- DMA_RESV_USAGE_KERNEL :
- DMA_RESV_USAGE_BOOKKEEP);
-
- list_for_each_entry(op, &vops->list, link)
- op_commit(vops->vm, tile, pt_update_ops, op, ifence, NULL);
+ op_commit(vops->vm, op, &fence, 1, usage, tile_mask);
} else {
- dma_resv_add_fence(xe_vm_resv(vm), ifence,
- pt_update_ops->wait_vm_bookkeep ?
- DMA_RESV_USAGE_KERNEL :
- DMA_RESV_USAGE_BOOKKEEP);
+ for (i = 0; i < XE_CPU_BIND_INVAL_JOB_COUNT; ++i) {
+ if (!update.ijobs[i])
+ continue;
+
+ ifences[i] = xe_tlb_inval_job_push(update.ijobs[i],
+ fence);
+ xe_assert(xe, !IS_ERR_OR_NULL(ifences[i]));
- dma_resv_add_fence(xe_vm_resv(vm), mfence,
- pt_update_ops->wait_vm_bookkeep ?
- DMA_RESV_USAGE_KERNEL :
- DMA_RESV_USAGE_BOOKKEEP);
+ dma_resv_add_fence(xe_vm_resv(vm), ifences[i], usage);
+ }
list_for_each_entry(op, &vops->list, link)
- op_commit(vops->vm, tile, pt_update_ops, op, ifence,
- mfence);
+ op_commit(vops->vm, op, ifences,
+ XE_CPU_BIND_INVAL_JOB_COUNT, usage,
+ tile_mask);
}
- if (pt_update_ops->needs_svm_lock)
+ if (needs_svm_lock)
xe_svm_notifier_unlock(vm);
/*
@@ -2691,21 +2688,18 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
if (!(q->flags & EXEC_QUEUE_FLAG_MIGRATE))
xe_exec_queue_last_fence_set(q, vm, fence);
- xe_tlb_inval_job_put(mjob);
- xe_tlb_inval_job_put(ijob);
- dma_fence_put(ifence);
- dma_fence_put(mfence);
+ for (i = 0; i < XE_CPU_BIND_INVAL_JOB_COUNT; ++i) {
+ xe_tlb_inval_job_put(update.ijobs[i]);
+ dma_fence_put(ifences[i]);
+ }
return fence;
free_rfence:
kfree(rfence);
free_ijob:
- xe_tlb_inval_job_put(mjob);
- xe_tlb_inval_job_put(ijob);
-kill_vm_tile1:
- if (err != -EAGAIN && err != -ENODATA && tile->id)
- xe_vm_kill(vops->vm, false);
+ for (i = 0; i < XE_CPU_BIND_INVAL_JOB_COUNT; ++i)
+ xe_tlb_inval_job_put(update.ijobs[i]);
return ERR_PTR(err);
}
@@ -2713,52 +2707,65 @@ ALLOW_ERROR_INJECTION(xe_pt_update_ops_run, ERRNO);
/**
* xe_pt_update_ops_fini() - Finish PT update operations
- * @tile: Tile of PT update operations
+ * @xe: xe device.
* @vops: VMA operations
*
* Finish PT update operations by committing to destroy page table memory
*/
-void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops *vops)
+void xe_pt_update_ops_fini(struct xe_device *xe, struct xe_vma_ops *vops)
{
- struct xe_vm_pgtable_update_ops *pt_update_ops =
- &vops->pt_update_ops[tile->id];
+ struct xe_tile *tile;
+ int id;
+
+ for_each_tile(tile, xe, id) {
+ struct xe_vm_pgtable_update_ops *pt_update_ops =
+ &vops->pt_update_ops[id];
- xe_page_reclaim_entries_put(pt_update_ops->prl.entries);
+ if (!pt_update_ops->num_ops)
+ continue;
+
+ xe_page_reclaim_entries_put(pt_update_ops->prl.entries);
+ }
}
/**
* xe_pt_update_ops_abort() - Abort PT update operations
- * @tile: Tile of PT update operations
+ * @xe: xe device.
* @vops: VMA operations
*
* Abort PT update operations by unwinding internal PT state
*/
-void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops)
+void xe_pt_update_ops_abort(struct xe_device *xe, struct xe_vma_ops *vops)
{
- struct xe_vm_pgtable_update_ops *pt_update_ops =
- &vops->pt_update_ops[tile->id];
- int i;
+ struct xe_tile *tile;
+ int id;
lockdep_assert_held(&vops->vm->lock);
xe_vm_assert_held(vops->vm);
- for (i = pt_update_ops->num_ops - 1; i >= 0; --i) {
- struct xe_vm_pgtable_update_op *pt_op =
- to_pt_op(pt_update_ops, i);
-
- if (!pt_op->vma || i >= get_current_op(pt_update_ops))
- continue;
-
- if (pt_op->bind)
- xe_pt_abort_bind(pt_op->vma, pt_op->entries,
- pt_op->num_entries,
- pt_op->rebind);
- else
- xe_pt_abort_unbind(pt_op->vma, pt_op->entries,
- pt_op->num_entries);
+ for_each_tile(tile, xe, id) {
+ struct xe_vm_pgtable_update_ops *pt_update_ops =
+ &vops->pt_update_ops[id];
+ int i;
+
+ for (i = pt_update_ops->num_ops - 1; i >= 0; --i) {
+ struct xe_vm_pgtable_update_op *pt_op =
+ to_pt_op(pt_update_ops, i);
+
+ if (!pt_op->vma || i >= get_current_op(pt_update_ops))
+ continue;
+
+ if (pt_op->bind)
+ xe_pt_abort_bind(pt_op->vma, pt_op->entries,
+ pt_op->num_entries,
+ pt_op->rebind);
+ else
+ xe_pt_abort_unbind(pt_op->vma, pt_op->entries,
+ pt_op->num_entries);
+ }
}
- xe_pt_update_ops_fini(tile, vops);
+ xe_pt_update_ops_fini(xe, vops);
}
/**
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 5faddb8e700c..cd78141fb81c 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -39,11 +39,11 @@ void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head *deferred);
void xe_pt_clear(struct xe_device *xe, struct xe_pt *pt);
-int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops);
-struct dma_fence *xe_pt_update_ops_run(struct xe_tile *tile,
+int xe_pt_update_ops_prepare(struct xe_device *xe, struct xe_vma_ops *vops);
+struct dma_fence *xe_pt_update_ops_run(struct xe_device *xe,
struct xe_vma_ops *vops);
-void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops *vops);
-void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops);
+void xe_pt_update_ops_fini(struct xe_device *xe, struct xe_vma_ops *vops);
+void xe_pt_update_ops_abort(struct xe_device *xe, struct xe_vma_ops *vops);
bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
bool xe_pt_zap_ptes_range(struct xe_tile *tile, struct xe_vm *vm,
diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
index aa1d7c0e8669..5cdd7cd25a91 100644
--- a/drivers/gpu/drm/xe/xe_pt_types.h
+++ b/drivers/gpu/drm/xe/xe_pt_types.h
@@ -120,8 +120,6 @@ struct xe_pt_job_ops {
struct xe_vm_pgtable_update_ops {
/** @pt_job_ops: PT update operations dynamic allocation*/
struct xe_pt_job_ops *pt_job_ops;
- /** @q: exec queue for PT operations */
- struct xe_exec_queue *q;
/** @prl: embedded page reclaim list */
struct xe_page_reclaim_list prl;
/** @start: start address of ops */
@@ -134,18 +132,6 @@ struct xe_vm_pgtable_update_ops {
bool needs_svm_lock;
/** @needs_invalidation: Needs invalidation */
bool needs_invalidation;
- /**
- * @wait_vm_bookkeep: PT operations need to wait until VM is idle
- * (bookkeep dma-resv slots are idle) and stage all future VM activity
- * behind these operations (install PT operations into VM kernel
- * dma-resv slot).
- */
- bool wait_vm_bookkeep;
- /**
- * @wait_vm_kernel: PT operations need to wait until VM kernel dma-resv
- * slots are idle.
- */
- bool wait_vm_kernel;
};
#endif
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index a8ba7f90368f..3fde9b386bb9 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -73,8 +73,9 @@ static void job_free(struct xe_sched_job *job)
struct xe_exec_queue *q = job->q;
bool is_migration = xe_sched_job_is_migration(q);
- kmem_cache_free(xe_exec_queue_is_parallel(job->q) || is_migration ?
- xe_sched_job_parallel_slab : xe_sched_job_slab, job);
+ kmem_cache_free(job->is_pt_job || xe_exec_queue_is_parallel(job->q) ||
+ is_migration ? xe_sched_job_parallel_slab :
+ xe_sched_job_slab, job);
}
static struct xe_device *job_to_xe(struct xe_sched_job *job)
@@ -124,10 +125,12 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
xe_assert(xe, batch_addr ||
q->flags & (EXEC_QUEUE_FLAG_VM | EXEC_QUEUE_FLAG_MIGRATE));
- job = job_alloc(xe_exec_queue_is_parallel(q) || is_migration);
+ job = job_alloc(!batch_addr || xe_exec_queue_is_parallel(q) ||
+ is_migration);
if (!job)
return ERR_PTR(-ENOMEM);
+ job->is_pt_job = !batch_addr;
job->q = q;
job->sample_timestamp = U64_MAX;
kref_init(&job->refcount);
@@ -140,7 +143,6 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
if (!batch_addr) {
job->fence = dma_fence_get_stub();
- job->is_pt_job = true;
} else {
for (i = 0; i < q->width; ++i) {
struct dma_fence *fence = xe_lrc_alloc_seqno_fence();
diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
index 9be4e2c5989d..3a797de746ad 100644
--- a/drivers/gpu/drm/xe/xe_sched_job_types.h
+++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
@@ -14,7 +14,7 @@ struct dma_fence;
struct dma_fence_chain;
struct xe_exec_queue;
-struct xe_migrate_pt_update_ops;
+struct xe_cpu_bind_pt_update_ops;
struct xe_pt_job_ops;
struct xe_tile;
struct xe_vm;
@@ -25,12 +25,11 @@ struct xe_vm;
struct xe_pt_update_args {
/** @vm: VM which is being bound */
struct xe_vm *vm;
- /** @tile: Tile which page tables belong to */
- struct xe_tile *tile;
- /** @ops: Migrate PT update ops */
- const struct xe_migrate_pt_update_ops *ops;
+ /** @ops: CPU bind PT update ops */
+ const struct xe_cpu_bind_pt_update_ops *ops;
+#define XE_PT_UPDATE_JOB_OPS_COUNT 2
/** @pt_job_ops: PT job ops state */
- struct xe_pt_job_ops *pt_job_ops;
+ struct xe_pt_job_ops *pt_job_ops[XE_PT_UPDATE_JOB_OPS_COUNT];
};
/**
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
index 81f560068d3c..7378cfe6e855 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
@@ -4,6 +4,7 @@
*/
#include "xe_assert.h"
+#include "xe_cpu_bind.h"
#include "xe_dep_job_types.h"
#include "xe_dep_scheduler.h"
#include "xe_exec_queue.h"
@@ -12,7 +13,6 @@
#include "xe_page_reclaim.h"
#include "xe_tlb_inval.h"
#include "xe_tlb_inval_job.h"
-#include "xe_migrate.h"
#include "xe_pm.h"
#include "xe_vm.h"
@@ -218,7 +218,6 @@ int xe_tlb_inval_job_alloc_dep(struct xe_tlb_inval_job *job)
/**
* xe_tlb_inval_job_push() - TLB invalidation job push
* @job: TLB invalidation job to push
- * @m: The migration object being used
* @fence: Dependency for TLB invalidation job
*
* Pushes a TLB invalidation job for execution, using @fence as a dependency.
@@ -230,11 +229,11 @@ int xe_tlb_inval_job_alloc_dep(struct xe_tlb_inval_job *job)
* Return: Job's finished fence on success, cannot fail
*/
struct dma_fence *xe_tlb_inval_job_push(struct xe_tlb_inval_job *job,
- struct xe_migrate *m,
struct dma_fence *fence)
{
struct xe_tlb_inval_fence *ifence =
container_of(job->fence, typeof(*ifence), base);
+ struct xe_cpu_bind *cpu_bind = gt_to_xe(job->q->gt)->cpu_bind;
if (!dma_fence_is_signaled(fence)) {
void *ptr;
@@ -258,11 +257,11 @@ struct dma_fence *xe_tlb_inval_job_push(struct xe_tlb_inval_job *job,
job->fence_armed = true;
/*
- * We need the migration lock to protect the job's seqno and the spsc
- * queue, only taken on migration queue, user queues protected dma-resv
+ * We need the cpu_bind lock to protect the job's seqno and the spsc
+ * queue, only taken on cpu_bind queue, user queues protected dma-resv
* VM lock.
*/
- xe_migrate_job_lock(m, job->q);
+ xe_cpu_bind_job_lock(cpu_bind, job->q);
/* Creation ref pairs with put in xe_tlb_inval_job_destroy */
xe_tlb_inval_fence_init(job->tlb_inval, ifence, false);
@@ -281,7 +280,7 @@ struct dma_fence *xe_tlb_inval_job_push(struct xe_tlb_inval_job *job,
&job->dep.drm.s_fence->finished,
job->idx);
- xe_migrate_job_unlock(m, job->q);
+ xe_cpu_bind_job_unlock(cpu_bind, job->q);
/*
* Not using job->fence, as it has its own dma-fence context, which does
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
index 2a4478f529e6..97e032ea21c3 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
@@ -11,7 +11,6 @@
struct dma_fence;
struct xe_dep_scheduler;
struct xe_exec_queue;
-struct xe_migrate;
struct xe_page_reclaim_list;
struct xe_tlb_inval;
struct xe_tlb_inval_job;
@@ -28,7 +27,6 @@ void xe_tlb_inval_job_add_page_reclaim(struct xe_tlb_inval_job *job,
int xe_tlb_inval_job_alloc_dep(struct xe_tlb_inval_job *job);
struct dma_fence *xe_tlb_inval_job_push(struct xe_tlb_inval_job *job,
- struct xe_migrate *m,
struct dma_fence *fence);
void xe_tlb_inval_job_get(struct xe_tlb_inval_job *job);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 52212b51caa8..b3928e05b70a 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -24,6 +24,7 @@
#include "regs/xe_gtt_defs.h"
#include "xe_assert.h"
#include "xe_bo.h"
+#include "xe_cpu_bind.h"
#include "xe_device.h"
#include "xe_drm_client.h"
#include "xe_exec_queue.h"
@@ -688,8 +689,6 @@ int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
struct xe_vma *vma, *next;
struct xe_vma_ops vops;
struct xe_vma_op *op, *next_op;
- struct xe_tile *tile;
- u8 id;
int err;
lockdep_assert_held(&vm->lock);
@@ -697,12 +696,9 @@ int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
list_empty(&vm->rebind_list))
return 0;
- xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
- for_each_tile(tile, vm->xe, id) {
- vops.pt_update_ops[id].wait_vm_bookkeep = true;
- vops.pt_update_ops[id].q =
- xe_migrate_bind_queue(tile->migrate);
- }
+ xe_vma_ops_init(&vops, vm, xe_cpu_bind_queue(vm->xe->cpu_bind),
+ NULL, 0);
+ vops.flags |= XE_VMA_OPS_FLAG_WAIT_VM_BOOKKEEP;
xe_vm_assert_held(vm);
list_for_each_entry(vma, &vm->rebind_list, combined_links.rebind) {
@@ -747,21 +743,16 @@ struct dma_fence *xe_vma_rebind(struct xe_vm *vm, struct xe_vma *vma, u8 tile_ma
struct dma_fence *fence = NULL;
struct xe_vma_ops vops;
struct xe_vma_op *op, *next_op;
- struct xe_tile *tile;
- u8 id;
int err;
lockdep_assert_held(&vm->lock);
xe_vm_assert_held(vm);
xe_assert(vm->xe, xe_vm_in_fault_mode(vm));
- xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
- vops.flags |= XE_VMA_OPS_FLAG_SKIP_TLB_WAIT;
- for_each_tile(tile, vm->xe, id) {
- vops.pt_update_ops[id].wait_vm_bookkeep = true;
- vops.pt_update_ops[tile->id].q =
- xe_migrate_bind_queue(tile->migrate);
- }
+ xe_vma_ops_init(&vops, vm, xe_cpu_bind_queue(vm->xe->cpu_bind),
+ NULL, 0);
+ vops.flags |= XE_VMA_OPS_FLAG_SKIP_TLB_WAIT |
+ XE_VMA_OPS_FLAG_WAIT_VM_BOOKKEEP;
err = xe_vm_ops_add_rebind(&vops, vma, tile_mask);
if (err)
@@ -837,8 +828,6 @@ struct dma_fence *xe_vm_range_rebind(struct xe_vm *vm,
struct dma_fence *fence = NULL;
struct xe_vma_ops vops;
struct xe_vma_op *op, *next_op;
- struct xe_tile *tile;
- u8 id;
int err;
lockdep_assert_held(&vm->lock);
@@ -846,13 +835,10 @@ struct dma_fence *xe_vm_range_rebind(struct xe_vm *vm,
xe_assert(vm->xe, xe_vm_in_fault_mode(vm));
xe_assert(vm->xe, xe_vma_is_cpu_addr_mirror(vma));
- xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
- vops.flags |= XE_VMA_OPS_FLAG_SKIP_TLB_WAIT;
- for_each_tile(tile, vm->xe, id) {
- vops.pt_update_ops[id].wait_vm_bookkeep = true;
- vops.pt_update_ops[tile->id].q =
- xe_migrate_bind_queue(tile->migrate);
- }
+ xe_vma_ops_init(&vops, vm, xe_cpu_bind_queue(vm->xe->cpu_bind),
+ NULL, 0);
+ vops.flags |= XE_VMA_OPS_FLAG_SKIP_TLB_WAIT |
+ XE_VMA_OPS_FLAG_WAIT_VM_BOOKKEEP;
err = xe_vm_ops_add_range_rebind(&vops, vma, range, tile_mask);
if (err)
@@ -919,8 +905,6 @@ struct dma_fence *xe_vm_range_unbind(struct xe_vm *vm,
struct dma_fence *fence = NULL;
struct xe_vma_ops vops;
struct xe_vma_op *op, *next_op;
- struct xe_tile *tile;
- u8 id;
int err;
lockdep_assert_held(&vm->lock);
@@ -930,12 +914,9 @@ struct dma_fence *xe_vm_range_unbind(struct xe_vm *vm,
if (!range->tile_present)
return dma_fence_get_stub();
- xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
- for_each_tile(tile, vm->xe, id) {
- vops.pt_update_ops[id].wait_vm_bookkeep = true;
- vops.pt_update_ops[tile->id].q =
- xe_migrate_bind_queue(tile->migrate);
- }
+ xe_vma_ops_init(&vops, vm, xe_cpu_bind_queue(vm->xe->cpu_bind),
+ NULL, 0);
+ vops.flags |= XE_VMA_OPS_FLAG_WAIT_VM_BOOKKEEP;
err = xe_vm_ops_add_range_unbind(&vops, range);
if (err)
@@ -1555,9 +1536,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
init_rwsem(&vm->exec_queues.lock);
xe_vm_init_prove_locking(xe, vm);
-
- for_each_tile(tile, xe, id)
- xe_range_fence_tree_init(&vm->rftree[id]);
+ xe_range_fence_tree_init(&vm->rftree);
vm->pt_ops = &xelp_pt_ops;
@@ -1701,8 +1680,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
}
err_no_resv:
mutex_destroy(&vm->snap_mutex);
- for_each_tile(tile, xe, id)
- xe_range_fence_tree_fini(&vm->rftree[id]);
+ xe_range_fence_tree_fini(&vm->rftree);
ttm_lru_bulk_move_fini(&xe->ttm, &vm->lru_bulk_move);
if (vm->xef)
xe_file_put(vm->xef);
@@ -1758,10 +1736,8 @@ void xe_vm_close_and_put(struct xe_vm *vm)
{
LIST_HEAD(contested);
struct xe_device *xe = vm->xe;
- struct xe_tile *tile;
struct xe_vma *vma, *next_vma;
struct drm_gpuva *gpuva, *next;
- u8 id;
xe_assert(xe, !vm->preempt.num_exec_queues);
@@ -1851,8 +1827,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
}
up_write(&xe->usm.lock);
- for_each_tile(tile, xe, id)
- xe_range_fence_tree_fini(&vm->rftree[id]);
+ xe_range_fence_tree_fini(&vm->rftree);
xe_vm_put(vm);
}
@@ -3141,23 +3116,16 @@ static void trace_xe_vm_ops_execute(struct xe_vma_ops *vops)
static int vm_ops_setup_tile_args(struct xe_vm *vm, struct xe_vma_ops *vops)
{
- struct xe_exec_queue *q = vops->q;
struct xe_tile *tile;
int number_tiles = 0;
u8 id;
- for_each_tile(tile, vm->xe, id) {
+ for_each_tile(tile, vm->xe, id)
if (vops->pt_update_ops[id].num_ops)
++number_tiles;
- if (vops->pt_update_ops[id].q)
- continue;
-
- if (q)
- vops->pt_update_ops[id].q = q;
- else
- vops->pt_update_ops[id].q = vm->q;
- }
+ if (!vops->q)
+ vops->q = vm->q;
return number_tiles;
}
@@ -3165,22 +3133,17 @@ static int vm_ops_setup_tile_args(struct xe_vm *vm, struct xe_vma_ops *vops)
static struct dma_fence *ops_execute(struct xe_vm *vm,
struct xe_vma_ops *vops)
{
- struct xe_tile *tile;
+ struct xe_device *xe = vm->xe;
struct dma_fence *fence = NULL;
struct dma_fence **fences = NULL;
struct dma_fence_array *cf = NULL;
- int number_tiles = 0, current_fence = 0, n_fence = 0, err, i;
- u8 id;
+ int current_fence = 0, n_fence = 1, err, i;
- number_tiles = vm_ops_setup_tile_args(vm, vops);
- if (number_tiles == 0)
+ if (!vm_ops_setup_tile_args(vm, vops))
return ERR_PTR(-ENODATA);
- for_each_tile(tile, vm->xe, id)
- ++n_fence;
-
if (!(vops->flags & XE_VMA_OPS_FLAG_SKIP_TLB_WAIT)) {
- for_each_tlb_inval(vops->pt_update_ops[0].q, i)
+ for_each_tlb_inval(vops->q, i)
++n_fence;
}
@@ -3196,71 +3159,40 @@ static struct dma_fence *ops_execute(struct xe_vm *vm,
goto err_out;
}
- for_each_tile(tile, vm->xe, id) {
- if (!vops->pt_update_ops[id].num_ops)
- continue;
-
- err = xe_pt_update_ops_prepare(tile, vops);
- if (err) {
- fence = ERR_PTR(err);
- goto err_out;
- }
+ err = xe_pt_update_ops_prepare(xe, vops);
+ if (err) {
+ fence = ERR_PTR(err);
+ goto err_out;
}
trace_xe_vm_ops_execute(vops);
- for_each_tile(tile, vm->xe, id) {
- struct xe_exec_queue *q = vops->pt_update_ops[tile->id].q;
-
- fence = NULL;
- if (!vops->pt_update_ops[id].num_ops)
- goto collect_fences;
-
- fence = xe_pt_update_ops_run(tile, vops);
- if (IS_ERR(fence))
- goto err_out;
-
-collect_fences:
- fences[current_fence++] = fence ?: dma_fence_get_stub();
- if (vops->flags & XE_VMA_OPS_FLAG_SKIP_TLB_WAIT)
- continue;
+ fence = xe_pt_update_ops_run(xe, vops);
+ if (IS_ERR(fence))
+ goto err_out;
- xe_migrate_job_lock(tile->migrate, q);
- for_each_tlb_inval(q, i) {
- if (i >= (tile->id + 1) * XE_MAX_GT_PER_TILE ||
- i < tile->id * XE_MAX_GT_PER_TILE)
- continue;
+ fences[current_fence++] = fence;
- fences[current_fence++] = fence ?
- xe_exec_queue_tlb_inval_last_fence_get(q, vm, i) :
- dma_fence_get_stub();
- }
- xe_migrate_job_unlock(tile->migrate, q);
+ if (!(vops->flags & XE_VMA_OPS_FLAG_SKIP_TLB_WAIT)) {
+ xe_cpu_bind_job_lock(xe->cpu_bind, vops->q);
+ for_each_tlb_inval(vops->q, i)
+ fences[current_fence++] =
+ xe_exec_queue_tlb_inval_last_fence_get(vops->q,
+ vm, i);
+ xe_cpu_bind_job_unlock(xe->cpu_bind, vops->q);
}
- xe_assert(vm->xe, current_fence == n_fence);
+ xe_assert(xe, current_fence == n_fence);
dma_fence_array_init(cf, n_fence, fences, dma_fence_context_alloc(1),
1, false);
fence = &cf->base;
- for_each_tile(tile, vm->xe, id) {
- if (!vops->pt_update_ops[id].num_ops)
- continue;
-
- xe_pt_update_ops_fini(tile, vops);
- }
+ xe_pt_update_ops_fini(xe, vops);
return fence;
err_out:
- for_each_tile(tile, vm->xe, id) {
- if (!vops->pt_update_ops[id].num_ops)
- continue;
-
- xe_pt_update_ops_abort(tile, vops);
- }
- while (current_fence)
- dma_fence_put(fences[--current_fence]);
+ xe_pt_update_ops_abort(xe, vops);
kfree(fences);
kfree(cf);
@@ -3553,6 +3485,8 @@ static void xe_vma_ops_init(struct xe_vma_ops *vops, struct xe_vm *vm,
vops->syncs = syncs;
vops->num_syncs = num_syncs;
vops->flags = 0;
+ vops->start = ~0x0ull;
+ vops->last = 0x0ull;
}
static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo,
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 2c173550346a..b4593bd3fe58 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -264,7 +264,7 @@ struct xe_vm {
* @rftree: range fence tree to track updates to page table structure.
* Used to implement conflict tracking between independent bind engines.
*/
- struct xe_range_fence_tree rftree[XE_MAX_TILES_PER_DEVICE];
+ struct xe_range_fence_tree rftree;
const struct xe_pt_ops *pt_ops;
@@ -492,12 +492,20 @@ struct xe_vma_ops {
u32 num_syncs;
/** @pt_update_ops: page table update operations */
struct xe_vm_pgtable_update_ops pt_update_ops[XE_MAX_TILES_PER_DEVICE];
+ /** @start: start address of ops */
+ u64 start;
+ /** @last: last address of ops */
+ u64 last;
/** @flag: signify the properties within xe_vma_ops*/
-#define XE_VMA_OPS_FLAG_HAS_SVM_PREFETCH BIT(0)
-#define XE_VMA_OPS_FLAG_MADVISE BIT(1)
-#define XE_VMA_OPS_ARRAY_OF_BINDS BIT(2)
-#define XE_VMA_OPS_FLAG_SKIP_TLB_WAIT BIT(3)
-#define XE_VMA_OPS_FLAG_ALLOW_SVM_UNMAP BIT(4)
+#define XE_VMA_OPS_FLAG_HAS_SVM_PREFETCH BIT(0)
+#define XE_VMA_OPS_FLAG_MADVISE BIT(1)
+#define XE_VMA_OPS_ARRAY_OF_BINDS BIT(2)
+#define XE_VMA_OPS_FLAG_SKIP_TLB_WAIT BIT(3)
+#define XE_VMA_OPS_FLAG_ALLOW_SVM_UNMAP BIT(4)
+#define XE_VMA_OPS_FLAG_WAIT_VM_BOOKKEEP BIT(5)
+#define XE_VMA_OPS_FLAG_WAIT_VM_KERNEL BIT(6)
+#define XE_VMA_OPS_FLAG_NEEDS_INVALIDATION BIT(7)
+#define XE_VMA_OPS_FLAG_NEEDS_SVM_LOCK BIT(8)
u32 flags;
#ifdef TEST_VM_OPS_ERROR
/** @inject_error: inject error to test error handling */
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v3 17/25] drm/xe: Add device flag to enable PT mirroring across tiles
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (15 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 16/25] drm/xe: Add CPU bind layer Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-02-28 1:34 ` [PATCH v3 18/25] drm/xe: Add xe_hw_engine_write_ring_tail Matthew Brost
` (13 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Some multi-tile devices may want to mirror page tables across tiles for
memory-bandwidth reasons, while others may not. Add a device flag that
allows enabling or disabling page-table mirroring across tiles.
The flag is set to true (the existing behavior) on PVC, but both modes
have been tested and work on PVC.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_device_types.h | 2 ++
drivers/gpu/drm/xe/xe_migrate.c | 5 ++--
drivers/gpu/drm/xe/xe_pci.c | 2 ++
drivers/gpu/drm/xe/xe_pci_types.h | 1 +
drivers/gpu/drm/xe/xe_pt.c | 38 ++++++++++++++++++++++++++--
drivers/gpu/drm/xe/xe_vm.c | 37 ++++++++++++++++++++++++---
drivers/gpu/drm/xe/xe_vm.h | 3 +++
7 files changed, 80 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 776e9e190320..b3737dfcc45c 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -208,6 +208,8 @@ struct xe_device {
u8 has_usm:1;
/** @info.has_64bit_timestamp: Device supports 64-bit timestamps */
u8 has_64bit_timestamp:1;
+ /** @info.has_pt_mirror: Device has PT mirroring across tiles */
+ u8 has_pt_mirror:1;
/** @info.is_dgfx: is discrete device */
u8 is_dgfx:1;
/** @info.needs_scratch: needs scratch page for oob prefetch to work */
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index b5d4fc4d4c62..c9ee6325ec9d 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -183,7 +183,8 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
struct xe_device *xe = tile_to_xe(tile);
u16 pat_index = xe->pat.idx[XE_CACHE_WB];
u8 id = tile->id;
- u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level;
+ u32 num_entries = NUM_PT_SLOTS, num_level =
+ xe_vm_pt_root(vm, id)->level;
#define VRAM_IDENTITY_MAP_COUNT 2
u32 num_setup = num_level + VRAM_IDENTITY_MAP_COUNT;
#undef VRAM_IDENTITY_MAP_COUNT
@@ -210,7 +211,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
/* PT30 & PT31 reserved for 2M identity map */
pt29_ofs = xe_bo_size(bo) - 3 * XE_PAGE_SIZE;
entry = vm->pt_ops->pde_encode_bo(bo, pt29_ofs);
- xe_pt_write(xe, &vm->pt_root[id]->bo->vmap, 0, entry);
+ xe_pt_write(xe, &xe_vm_pt_root(vm, id)->bo->vmap, 0, entry);
map_ofs = (num_entries - num_setup) * XE_PAGE_SIZE;
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 3ac99472d6dd..b8c8953833a5 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -352,6 +352,7 @@ static const __maybe_unused struct xe_device_desc pvc_desc = {
.dma_mask_size = 52,
.has_display = false,
.has_gsc_nvm = 1,
+ .has_pt_mirror = 1,
.has_heci_gscfi = 1,
.max_gt_per_tile = 1,
.max_remote_tiles = 1,
@@ -761,6 +762,7 @@ static int xe_info_init_early(struct xe_device *xe,
xe->info.has_soc_remapper_telem = desc->has_soc_remapper_telem;
xe->info.has_sriov = xe_configfs_primary_gt_allowed(to_pci_dev(xe->drm.dev)) &&
desc->has_sriov;
+ xe->info.has_pt_mirror = desc->has_pt_mirror;
xe->info.skip_guc_pc = desc->skip_guc_pc;
xe->info.skip_mtcfg = desc->skip_mtcfg;
xe->info.skip_pcode = desc->skip_pcode;
diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h
index 47e8a1552c2b..14b688fbb4c1 100644
--- a/drivers/gpu/drm/xe/xe_pci_types.h
+++ b/drivers/gpu/drm/xe/xe_pci_types.h
@@ -57,6 +57,7 @@ struct xe_device_desc {
u8 has_soc_remapper_sysctrl:1;
u8 has_soc_remapper_telem:1;
u8 has_sriov:1;
+ u8 has_pt_mirror:1;
u8 needs_scratch:1;
u8 skip_guc_pc:1;
u8 skip_mtcfg:1;
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index d91d80c92957..ef34fbfc14f0 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -734,7 +734,7 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
.wupd.entries = entries,
.clear_pt = clear_pt,
};
- struct xe_pt *pt = vm->pt_root[tile->id];
+ struct xe_pt *pt = xe_vm_pt_root(vm, tile->id);
int ret;
if (range) {
@@ -895,6 +895,11 @@ static int xe_pt_zap_ptes_entry(struct xe_ptw *parent, pgoff_t offset,
return 0;
}
+static bool pt_mirroring_disabled_for_tile(struct xe_vm *vm, u8 tile_id)
+{
+ return xe_vm_pt_root(vm, tile_id) != vm->pt_root[tile_id];
+}
+
static const struct xe_pt_walk_ops xe_pt_zap_ptes_ops = {
.pt_entry = xe_pt_zap_ptes_entry,
};
@@ -936,6 +941,9 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma)
if (!(pt_mask & BIT(tile->id)))
return false;
+ if (pt_mirroring_disabled_for_tile(xe_vma_vm(vma), tile->id))
+ return true;
+
(void)xe_pt_walk_shared(&pt->base, pt->level, xe_vma_start(vma),
xe_vma_end(vma), &xe_walk.base);
@@ -988,6 +996,9 @@ bool xe_pt_zap_ptes_range(struct xe_tile *tile, struct xe_vm *vm,
if (!(pt_mask & BIT(tile->id)))
return false;
+ if (pt_mirroring_disabled_for_tile(vm, tile->id))
+ return true;
+
(void)xe_pt_walk_shared(&pt->base, pt->level, xe_svm_range_start(range),
xe_svm_range_end(range), &xe_walk.base);
@@ -1803,7 +1814,7 @@ static unsigned int xe_pt_stage_unbind(struct xe_tile *tile,
.wupd.entries = entries,
.prl = pt_update_op->prl,
};
- struct xe_pt *pt = vm->pt_root[tile->id];
+ struct xe_pt *pt = xe_vm_pt_root(vm, tile->id);
(void)xe_pt_walk_shared(&pt->base, pt->level, start, end,
&xe_walk.base);
@@ -2341,9 +2352,20 @@ int xe_pt_update_ops_prepare(struct xe_device *xe, struct xe_vma_ops *vops)
int id, err;
for_each_tile(tile, xe, id) {
+ struct xe_vm_pgtable_update_ops *pt_update_ops =
+ &vops->pt_update_ops[id];
+
if (!vops->pt_update_ops[id].num_ops)
continue;
+ if (pt_mirroring_disabled_for_tile(vops->vm, id)) {
+ struct xe_page_reclaim_list *prl = &pt_update_ops->prl;
+
+ /* Transfer root PT update ops PRL to current */
+ *prl = vops->pt_update_ops[0].prl;
+ continue;
+ }
+
err = __xe_pt_update_ops_prepare(tile, vops);
if (err)
return err;
@@ -2635,6 +2657,12 @@ xe_pt_update_ops_run(struct xe_device *xe, struct xe_vma_ops *vops)
struct xe_vm_pgtable_update_ops *pt_update_ops =
&vops->pt_update_ops[j];
+ if (pt_mirroring_disabled_for_tile(vm, j)) {
+ xe_tile_assert(tile, !get_current_op(pt_update_ops));
+ tile_mask |= BIT(tile->id);
+ continue;
+ }
+
for (i = 0; i < get_current_op(pt_update_ops); ++i) {
struct xe_vm_pgtable_update_op *pt_op =
to_pt_op(pt_update_ops, i);
@@ -2724,6 +2752,9 @@ void xe_pt_update_ops_fini(struct xe_device *xe, struct xe_vma_ops *vops)
if (!pt_update_ops->num_ops)
continue;
+ if (pt_mirroring_disabled_for_tile(vops->vm, id))
+ continue;
+
xe_page_reclaim_entries_put(pt_update_ops->prl.entries);
}
}
@@ -2748,6 +2779,9 @@ void xe_pt_update_ops_abort(struct xe_device *xe, struct xe_vma_ops *vops)
&vops->pt_update_ops[id];
int i;
+ if (pt_mirroring_disabled_for_tile(vops->vm, id))
+ continue;
+
for (i = pt_update_ops->num_ops - 1; i >= 0; --i) {
struct xe_vm_pgtable_update_op *pt_op =
to_pt_op(pt_update_ops, i);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index b3928e05b70a..d4629e953b01 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -738,6 +738,14 @@ int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
return err;
}
+static u8 adjust_rebind_tile_mask(struct xe_vm *vm, u8 tile_mask)
+{
+ if (vm->xe->info.has_pt_mirror)
+ return tile_mask;
+
+ return (0x1 << vm->xe->info.max_gt_per_tile) - 1;
+}
+
struct dma_fence *xe_vma_rebind(struct xe_vm *vm, struct xe_vma *vma, u8 tile_mask)
{
struct dma_fence *fence = NULL;
@@ -754,7 +762,8 @@ struct dma_fence *xe_vma_rebind(struct xe_vm *vm, struct xe_vma *vma, u8 tile_ma
vops.flags |= XE_VMA_OPS_FLAG_SKIP_TLB_WAIT |
XE_VMA_OPS_FLAG_WAIT_VM_BOOKKEEP;
- err = xe_vm_ops_add_rebind(&vops, vma, tile_mask);
+ err = xe_vm_ops_add_rebind(&vops, vma,
+ adjust_rebind_tile_mask(vm, tile_mask));
if (err)
return ERR_PTR(err);
@@ -840,7 +849,8 @@ struct dma_fence *xe_vm_range_rebind(struct xe_vm *vm,
vops.flags |= XE_VMA_OPS_FLAG_SKIP_TLB_WAIT |
XE_VMA_OPS_FLAG_WAIT_VM_BOOKKEEP;
- err = xe_vm_ops_add_range_rebind(&vops, vma, range, tile_mask);
+ err = xe_vm_ops_add_range_rebind(&vops, vma, range,
+ adjust_rebind_tile_mask(vm, tile_mask));
if (err)
return ERR_PTR(err);
@@ -1578,7 +1588,8 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
for_each_tile(tile, xe, id) {
if (flags & XE_VM_FLAG_MIGRATION &&
- tile->id != XE_VM_FLAG_TILE_ID(flags))
+ tile->id != XE_VM_FLAG_TILE_ID(flags) &&
+ (vm->xe->info.has_pt_mirror || id))
continue;
vm->pt_root[id] = xe_pt_create(vm, tile, xe->info.vm_max_level,
@@ -1887,7 +1898,7 @@ struct xe_vm *xe_vm_lookup(struct xe_file *xef, u32 id)
u64 xe_vm_pdp4_descriptor(struct xe_vm *vm, struct xe_tile *tile)
{
- return vm->pt_ops->pde_encode_bo(vm->pt_root[tile->id]->bo, 0);
+ return vm->pt_ops->pde_encode_bo(xe_vm_pt_root(vm, tile->id)->bo, 0);
}
static struct xe_exec_queue *
@@ -4583,3 +4594,21 @@ void xe_vm_remove_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
}
up_write(&vm->exec_queues.lock);
}
+
+/**
+ * xe_vm_pt_root() - Retrieve VM page-table root
+ * @vm: The VM.
+ * @tile_id: Tile ID
+ *
+ * Retrieve VM page-table root for a tile ID, used to abstract if PT mirroring is
+ * enabled across tiles.
+ *
+ * Return: VM page-table root for a tile ID
+ */
+struct xe_pt *xe_vm_pt_root(struct xe_vm *vm, u8 tile_id)
+{
+ if (vm->xe->info.has_pt_mirror)
+ return vm->pt_root[tile_id];
+
+ return vm->pt_root[0];
+}
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index f849e369432b..5a22f74ff332 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -425,4 +425,7 @@ static inline struct drm_exec *xe_vm_validation_exec(struct xe_vm *vm)
((READ_ONCE(tile_present) & ~READ_ONCE(tile_invalidated)) & BIT((tile)->id))
void xe_vma_mem_attr_copy(struct xe_vma_mem_attr *to, struct xe_vma_mem_attr *from);
+
+struct xe_pt *xe_vm_pt_root(struct xe_vm *vm, u8 tile_id);
+
#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v3 18/25] drm/xe: Add xe_hw_engine_write_ring_tail
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (16 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 17/25] drm/xe: Add device flag to enable PT mirroring across tiles Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-02-28 1:34 ` [PATCH v3 19/25] drm/xe: Add ULLS support to LRC Matthew Brost
` (12 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
ULLS for migration jobs needs to directly set the hw engine ring tail;
add a function to support this.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_hw_engine.c | 10 ++++++++++
drivers/gpu/drm/xe/xe_hw_engine.h | 1 +
2 files changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
index ea3ad600d7c7..253c65583b7f 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine.c
@@ -303,6 +303,16 @@ void xe_hw_engine_mmio_write32(struct xe_hw_engine *hwe,
xe_mmio_write32(&hwe->gt->mmio, reg, val);
}
+/**
+ * xe_hw_engine_write_ring_tail() - Write ring tail
+ * @hwe: engine
+ * @val: desired 32-bit value to write
+ */
+void xe_hw_engine_write_ring_tail(struct xe_hw_engine *hwe, u32 val)
+{
+ xe_hw_engine_mmio_write32(hwe, RING_TAIL(0), val);
+}
+
/**
* xe_hw_engine_mmio_read32() - Read engine register
* @hwe: engine
diff --git a/drivers/gpu/drm/xe/xe_hw_engine.h b/drivers/gpu/drm/xe/xe_hw_engine.h
index 6b5f9fa2a594..b93c3eabca06 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine.h
+++ b/drivers/gpu/drm/xe/xe_hw_engine.h
@@ -78,5 +78,6 @@ enum xe_force_wake_domains xe_hw_engine_to_fw_domain(struct xe_hw_engine *hwe);
void xe_hw_engine_mmio_write32(struct xe_hw_engine *hwe, struct xe_reg reg, u32 val);
u32 xe_hw_engine_mmio_read32(struct xe_hw_engine *hwe, struct xe_reg reg);
+void xe_hw_engine_write_ring_tail(struct xe_hw_engine *hwe, u32 val);
#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v3 19/25] drm/xe: Add ULLS support to LRC
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (17 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 18/25] drm/xe: Add xe_hw_engine_write_ring_tail Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-03-05 20:21 ` Francois Dugast
2026-02-28 1:34 ` [PATCH v3 20/25] drm/xe: Add ULLS migration job support to migration layer Matthew Brost
` (11 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Define memory layout for ULLS semaphores stored in LRC memory. Add
support functions to return GGTT address and set semaphore based on a
job's seqno.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_lrc.c | 51 +++++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_lrc.h | 3 ++
drivers/gpu/drm/xe/xe_lrc_types.h | 4 +++
3 files changed, 58 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 384f9b31421e..44fb600bd228 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -718,6 +718,7 @@ u32 xe_lrc_pphwsp_offset(struct xe_lrc *lrc)
#define LRC_CTX_JOB_TIMESTAMP_OFFSET 512
#define LRC_ENGINE_ID_PPHWSP_OFFSET 1024
#define LRC_PARALLEL_PPHWSP_OFFSET 2048
+#define LRC_ULLS_PPHWSP_OFFSET 2048 /* Mutually exclusive with parallel */
#define LRC_SEQNO_OFFSET 0
#define LRC_START_SEQNO_OFFSET (LRC_SEQNO_OFFSET + 8)
@@ -773,6 +774,12 @@ static inline u32 __xe_lrc_engine_id_offset(struct xe_lrc *lrc)
return xe_lrc_pphwsp_offset(lrc) + LRC_ENGINE_ID_PPHWSP_OFFSET;
}
+static u32 __xe_lrc_ulls_offset(struct xe_lrc *lrc)
+{
+ /* The ulls is stored in the driver-defined portion of PPHWSP */
+ return xe_lrc_pphwsp_offset(lrc) + LRC_ULLS_PPHWSP_OFFSET;
+}
+
static u32 __xe_lrc_ctx_timestamp_offset(struct xe_lrc *lrc)
{
return __xe_lrc_regs_offset(lrc) + CTX_TIMESTAMP * sizeof(u32);
@@ -830,6 +837,7 @@ DECL_MAP_ADDR_HELPERS(ctx_job_timestamp, lrc->bo)
DECL_MAP_ADDR_HELPERS(ctx_timestamp, lrc->bo)
DECL_MAP_ADDR_HELPERS(ctx_timestamp_udw, lrc->bo)
DECL_MAP_ADDR_HELPERS(parallel, lrc->bo)
+DECL_MAP_ADDR_HELPERS(ulls, lrc->bo)
DECL_MAP_ADDR_HELPERS(indirect_ring, lrc->bo)
DECL_MAP_ADDR_HELPERS(engine_id, lrc->bo)
@@ -1860,6 +1868,49 @@ static u32 xe_lrc_engine_id(struct xe_lrc *lrc)
return xe_map_read32(xe, &map);
}
+#define semaphore_offset(seqno) \
+ (sizeof(u32) * ((seqno) % LRC_MIGRATION_ULLS_SEMAPORE_COUNT))
+
+/**
+ * xe_lrc_ulls_semaphore_ggtt_addr() - ULLS semaphore GGTT address
+ * @lrc: Pointer to the lrc.
+ * @seqno: seqno of current job.
+ *
+ * Calculate ULLS semaphore GGTT address based on input seqno
+ *
+ * Returns: ULLS semaphore GGTT address
+ */
+u32 xe_lrc_ulls_semaphore_ggtt_addr(struct xe_lrc *lrc, u32 seqno)
+{
+ xe_assert(lrc_to_xe(lrc), semaphore_offset(seqno) <
+ LRC_PPHWSP_SIZE - LRC_ULLS_PPHWSP_OFFSET);
+
+ return __xe_lrc_ulls_ggtt_addr(lrc) + semaphore_offset(seqno);
+}
+
+/**
+ * xe_lrc_set_ulls_semaphore() - Set ULLS semaphore
+ * @lrc: Pointer to the lrc.
+ * @seqno: seqno of current job.
+ *
+ * Set ULLS semaphore based on input seqno
+ */
+void xe_lrc_set_ulls_semaphore(struct xe_lrc *lrc, u32 seqno)
+{
+ struct xe_device *xe = lrc_to_xe(lrc);
+ struct iosys_map map = __xe_lrc_ulls_map(lrc);
+
+ xe_assert(xe, semaphore_offset(seqno) <
+ LRC_PPHWSP_SIZE - LRC_ULLS_PPHWSP_OFFSET);
+
+ xe_device_wmb(xe); /* Ensure everything before in code is ordered */
+
+ iosys_map_incr(&map, semaphore_offset(seqno));
+ xe_map_write32(xe, &map, LRC_MIGRATION_ULLS_SEMAPORE_SINGAL);
+
+ xe_device_wmb(xe); /* Flush write to hardware */
+}
+
static int instr_dw(u32 cmd_header)
{
/* GFXPIPE "SINGLE_DW" opcodes are a single dword */
diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
index 48f7c26cf129..9e51222191ea 100644
--- a/drivers/gpu/drm/xe/xe_lrc.h
+++ b/drivers/gpu/drm/xe/xe_lrc.h
@@ -111,6 +111,9 @@ void xe_default_lrc_update_memirq_regs_with_address(struct xe_hw_engine *hwe);
void xe_lrc_update_memirq_regs_with_address(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
u32 *regs);
+u32 xe_lrc_ulls_semaphore_ggtt_addr(struct xe_lrc *lrc, u32 seqno);
+void xe_lrc_set_ulls_semaphore(struct xe_lrc *lrc, u32 seqno);
+
u32 xe_lrc_read_ctx_reg(struct xe_lrc *lrc, int reg_nr);
void xe_lrc_write_ctx_reg(struct xe_lrc *lrc, int reg_nr, u32 val);
diff --git a/drivers/gpu/drm/xe/xe_lrc_types.h b/drivers/gpu/drm/xe/xe_lrc_types.h
index 5a718f759ed6..7cf84e32f998 100644
--- a/drivers/gpu/drm/xe/xe_lrc_types.h
+++ b/drivers/gpu/drm/xe/xe_lrc_types.h
@@ -12,6 +12,10 @@
struct xe_bo;
+#define LRC_MIGRATION_ULLS_SEMAPORE_COUNT 64 /* Must be pow2 */
+#define LRC_MIGRATION_ULLS_SEMAPORE_CLEAR 0
+#define LRC_MIGRATION_ULLS_SEMAPORE_SINGAL 1
+
/**
* struct xe_lrc - Logical ring context (LRC) and submission ring object
*/
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 19/25] drm/xe: Add ULLS support to LRC
2026-02-28 1:34 ` [PATCH v3 19/25] drm/xe: Add ULLS support to LRC Matthew Brost
@ 2026-03-05 20:21 ` Francois Dugast
0 siblings, 0 replies; 63+ messages in thread
From: Francois Dugast @ 2026-03-05 20:21 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom
On Fri, Feb 27, 2026 at 05:34:55PM -0800, Matthew Brost wrote:
> Define memory layout for ULLS semaphores stored in LRC memory. Add
> support functions to return GGTT address and set semaphore based on a
> job's seqno.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_lrc.c | 51 +++++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_lrc.h | 3 ++
> drivers/gpu/drm/xe/xe_lrc_types.h | 4 +++
> 3 files changed, 58 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
> index 384f9b31421e..44fb600bd228 100644
> --- a/drivers/gpu/drm/xe/xe_lrc.c
> +++ b/drivers/gpu/drm/xe/xe_lrc.c
> @@ -718,6 +718,7 @@ u32 xe_lrc_pphwsp_offset(struct xe_lrc *lrc)
> #define LRC_CTX_JOB_TIMESTAMP_OFFSET 512
> #define LRC_ENGINE_ID_PPHWSP_OFFSET 1024
> #define LRC_PARALLEL_PPHWSP_OFFSET 2048
> +#define LRC_ULLS_PPHWSP_OFFSET 2048 /* Mutually exclusive with parallel */
>
> #define LRC_SEQNO_OFFSET 0
> #define LRC_START_SEQNO_OFFSET (LRC_SEQNO_OFFSET + 8)
> @@ -773,6 +774,12 @@ static inline u32 __xe_lrc_engine_id_offset(struct xe_lrc *lrc)
> return xe_lrc_pphwsp_offset(lrc) + LRC_ENGINE_ID_PPHWSP_OFFSET;
> }
>
> +static u32 __xe_lrc_ulls_offset(struct xe_lrc *lrc)
> +{
> + /* The ulls is stored in the driver-defined portion of PPHWSP */
> + return xe_lrc_pphwsp_offset(lrc) + LRC_ULLS_PPHWSP_OFFSET;
> +}
> +
> static u32 __xe_lrc_ctx_timestamp_offset(struct xe_lrc *lrc)
> {
> return __xe_lrc_regs_offset(lrc) + CTX_TIMESTAMP * sizeof(u32);
> @@ -830,6 +837,7 @@ DECL_MAP_ADDR_HELPERS(ctx_job_timestamp, lrc->bo)
> DECL_MAP_ADDR_HELPERS(ctx_timestamp, lrc->bo)
> DECL_MAP_ADDR_HELPERS(ctx_timestamp_udw, lrc->bo)
> DECL_MAP_ADDR_HELPERS(parallel, lrc->bo)
> +DECL_MAP_ADDR_HELPERS(ulls, lrc->bo)
> DECL_MAP_ADDR_HELPERS(indirect_ring, lrc->bo)
> DECL_MAP_ADDR_HELPERS(engine_id, lrc->bo)
>
> @@ -1860,6 +1868,49 @@ static u32 xe_lrc_engine_id(struct xe_lrc *lrc)
> return xe_map_read32(xe, &map);
> }
>
> +#define semaphore_offset(seqno) \
> + (sizeof(u32) * ((seqno) % LRC_MIGRATION_ULLS_SEMAPORE_COUNT))
> +
> +/**
> + * xe_lrc_ulls_semaphore_ggtt_addr() - ULLS semaphore GGTT address
> + * @lrc: Pointer to the lrc.
> + * @seqno: seqno of current job.
> + *
> + * Calculate ULLS semaphore GGTT address based on input seqno
> + *
> + * Returns: ULLS semaphore GGTT address
> + */
> +u32 xe_lrc_ulls_semaphore_ggtt_addr(struct xe_lrc *lrc, u32 seqno)
> +{
> + xe_assert(lrc_to_xe(lrc), semaphore_offset(seqno) <
> + LRC_PPHWSP_SIZE - LRC_ULLS_PPHWSP_OFFSET);
> +
> + return __xe_lrc_ulls_ggtt_addr(lrc) + semaphore_offset(seqno);
> +}
> +
> +/**
> + * xe_lrc_set_ulls_semaphore() - Set ULLS semaphore
> + * @lrc: Pointer to the lrc.
> + * @seqno: seqno of current job.
> + *
> + * Set ULLS semaphore based on input seqno
> + */
> +void xe_lrc_set_ulls_semaphore(struct xe_lrc *lrc, u32 seqno)
> +{
> + struct xe_device *xe = lrc_to_xe(lrc);
> + struct iosys_map map = __xe_lrc_ulls_map(lrc);
> +
> + xe_assert(xe, semaphore_offset(seqno) <
> + LRC_PPHWSP_SIZE - LRC_ULLS_PPHWSP_OFFSET);
> +
> + xe_device_wmb(xe); /* Ensure everything before in code is ordered */
> +
> + iosys_map_incr(&map, semaphore_offset(seqno));
> + xe_map_write32(xe, &map, LRC_MIGRATION_ULLS_SEMAPORE_SINGAL);
> +
> + xe_device_wmb(xe); /* Flush write to hardware */
> +}
> +
> static int instr_dw(u32 cmd_header)
> {
> /* GFXPIPE "SINGLE_DW" opcodes are a single dword */
> diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
> index 48f7c26cf129..9e51222191ea 100644
> --- a/drivers/gpu/drm/xe/xe_lrc.h
> +++ b/drivers/gpu/drm/xe/xe_lrc.h
> @@ -111,6 +111,9 @@ void xe_default_lrc_update_memirq_regs_with_address(struct xe_hw_engine *hwe);
> void xe_lrc_update_memirq_regs_with_address(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
> u32 *regs);
>
> +u32 xe_lrc_ulls_semaphore_ggtt_addr(struct xe_lrc *lrc, u32 seqno);
> +void xe_lrc_set_ulls_semaphore(struct xe_lrc *lrc, u32 seqno);
> +
> u32 xe_lrc_read_ctx_reg(struct xe_lrc *lrc, int reg_nr);
> void xe_lrc_write_ctx_reg(struct xe_lrc *lrc, int reg_nr, u32 val);
>
> diff --git a/drivers/gpu/drm/xe/xe_lrc_types.h b/drivers/gpu/drm/xe/xe_lrc_types.h
> index 5a718f759ed6..7cf84e32f998 100644
> --- a/drivers/gpu/drm/xe/xe_lrc_types.h
> +++ b/drivers/gpu/drm/xe/xe_lrc_types.h
> @@ -12,6 +12,10 @@
>
> struct xe_bo;
>
> +#define LRC_MIGRATION_ULLS_SEMAPORE_COUNT 64 /* Must be pow2 */
> +#define LRC_MIGRATION_ULLS_SEMAPORE_CLEAR 0
> +#define LRC_MIGRATION_ULLS_SEMAPORE_SINGAL 1
s/SEMAPORE/SEMAPHORE/
s/SINGAL/SIGNAL/
> +
> /**
> * struct xe_lrc - Logical ring context (LRC) and submission ring object
> */
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v3 20/25] drm/xe: Add ULLS migration job support to migration layer
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (18 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 19/25] drm/xe: Add ULLS support to LRC Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-03-05 23:34 ` Summers, Stuart
2026-02-28 1:34 ` [PATCH v3 21/25] drm/xe: Add MI_SEMAPHORE_WAIT instruction defs Matthew Brost
` (10 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add a function to enter ULLS mode for migration jobs and a delayed worker
to exit it (power saving). ULLS mode is expected to be entered upon page
fault or SVM prefetch. The ULLS mode exit delay is currently set to 5us.
ULLS mode is only supported on DGFX and USM platforms where a hardware
engine is reserved for migration jobs. When in ULLS mode, set several
flags on migration jobs so the submission backend / ring ops can properly
submit in ULLS mode.
Upon ULLS mode enter, send a job that waits on a semaphore, pipelining
the initial GuC / HW context switch.
Upon ULLS mode exit, send a job to trigger the current ULLS semaphore so
the ring can be taken off the hardware.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 5 +-
drivers/gpu/drm/xe/xe_exec_queue.h | 4 +-
drivers/gpu/drm/xe/xe_migrate.c | 180 ++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_migrate.h | 2 +
drivers/gpu/drm/xe/xe_pt.c | 2 +-
drivers/gpu/drm/xe/xe_sched_job_types.h | 6 +
drivers/gpu/drm/xe/xe_vm.c | 2 +-
7 files changed, 195 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index ee2119cf45c1..4fa99f12c566 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -1348,6 +1348,7 @@ bool xe_exec_queue_is_lr(struct xe_exec_queue *q)
/**
* xe_exec_queue_is_idle() - Whether an exec_queue is idle.
* @q: The exec_queue
+ * @extra_jobs: Extra jobs on the queue
*
* FIXME: Need to determine what to use as the short-lived
* timeline lock for the exec_queues, so that the return value
@@ -1359,9 +1360,9 @@ bool xe_exec_queue_is_lr(struct xe_exec_queue *q)
*
* Return: True if the exec_queue is idle, false otherwise.
*/
-bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
+bool xe_exec_queue_is_idle(struct xe_exec_queue *q, int extra_jobs)
{
- return !atomic_read(&q->job_cnt);
+ return !(atomic_read(&q->job_cnt) - extra_jobs);
}
/**
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
index b5aabab388c1..a11648b62a98 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue.h
@@ -116,7 +116,7 @@ static inline struct xe_exec_queue *xe_exec_queue_multi_queue_primary(struct xe_
bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
-bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
+bool xe_exec_queue_is_idle(struct xe_exec_queue *q, int extra_jobs);
void xe_exec_queue_kill(struct xe_exec_queue *q);
@@ -176,7 +176,7 @@ struct xe_lrc *xe_exec_queue_get_lrc(struct xe_exec_queue *q, u16 idx);
*/
static inline bool xe_exec_queue_idle_skip_suspend(struct xe_exec_queue *q)
{
- return !xe_exec_queue_is_parallel(q) && xe_exec_queue_is_idle(q);
+ return !xe_exec_queue_is_parallel(q) && xe_exec_queue_is_idle(q, 0);
}
#endif
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index c9ee6325ec9d..62f27868f56b 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -8,6 +8,7 @@
#include <linux/bitfield.h>
#include <linux/sizes.h>
+#include <drm/drm_drv.h>
#include <drm/drm_managed.h>
#include <drm/drm_pagemap.h>
#include <drm/ttm/ttm_tt.h>
@@ -23,6 +24,7 @@
#include "xe_bb.h"
#include "xe_bo.h"
#include "xe_exec_queue.h"
+#include "xe_force_wake.h"
#include "xe_ggtt.h"
#include "xe_gt.h"
#include "xe_gt_printk.h"
@@ -30,6 +32,7 @@
#include "xe_lrc.h"
#include "xe_map.h"
#include "xe_mocs.h"
+#include "xe_pm.h"
#include "xe_printk.h"
#include "xe_pt.h"
#include "xe_res_cursor.h"
@@ -75,6 +78,14 @@ struct xe_migrate {
struct dma_fence *fence;
/** @min_chunk_size: For dgfx, Minimum chunk size */
u64 min_chunk_size;
+ /** @ulls: ULLS support */
+ struct {
+ /** @ulls.enabled: ULLS is enabled */
+ bool enabled;
+#define ULLS_EXIT_JIFFIES (HZ / 50)
+ /** @ulls.exit_work: ULLS exit worker */
+ struct delayed_work exit_work;
+ } ulls;
};
#define MAX_PREEMPTDISABLE_TRANSFER SZ_8M /* Around 1ms. */
@@ -96,6 +107,16 @@ struct xe_migrate {
static void xe_migrate_fini(void *arg)
{
struct xe_migrate *m = arg;
+ struct xe_device *xe = tile_to_xe(m->tile);
+
+ disable_delayed_work_sync(&m->ulls.exit_work);
+ mutex_lock(&m->job_mutex);
+ if (m->ulls.enabled) {
+ xe_force_wake_put(gt_to_fw(m->q->hwe->gt), m->q->hwe->domain);
+ xe_pm_runtime_put(xe);
+ m->ulls.enabled = false;
+ }
+ mutex_unlock(&m->job_mutex);
xe_vm_lock(m->q->vm, false);
xe_bo_unpin(m->pt_bo);
@@ -410,6 +431,140 @@ static int xe_migrate_lock_prepare_vm(struct xe_tile *tile, struct xe_migrate *m
return err;
}
+/**
+ * xe_migrate_ulls_enter() - Enter ULLS mode
+ * @m: The migration context.
+ *
+ * If DGFX and not a VF, enter ULLS mode bypassing GuC / HW context
+ * switches by utilizing semaphore and continuously running batches.
+ */
+void xe_migrate_ulls_enter(struct xe_migrate *m)
+{
+ struct xe_device *xe = tile_to_xe(m->tile);
+ struct xe_sched_job *job = NULL;
+ u64 batch_addr[2] = { 0, 0 };
+ bool alloc = false;
+
+ xe_assert(xe, xe->info.has_usm);
+
+ if (!IS_DGFX(xe) || IS_SRIOV_VF(xe))
+ return;
+
+job_alloc:
+ if (alloc) {
+ /*
+ * Must be done outside job_mutex as that lock is tainted with
+ * reclaim.
+ */
+ job = xe_sched_job_create(m->q, batch_addr);
+ if (WARN_ON_ONCE(IS_ERR(job)))
+ return; /* Not fatal */
+ }
+
+ mutex_lock(&m->job_mutex);
+ if (!m->ulls.enabled) {
+ unsigned int fw_ref;
+
+ if (!job) {
+ alloc = true;
+ mutex_unlock(&m->job_mutex);
+ goto job_alloc;
+ }
+
+ /* Pairs with FW put on ULLS exit */
+ fw_ref = xe_force_wake_get(gt_to_fw(m->q->hwe->gt),
+ m->q->hwe->domain);
+ if (fw_ref) {
+ struct xe_device *xe = tile_to_xe(m->tile);
+ struct dma_fence *fence;
+
+ /* Pairs with PM put on ULLS exit */
+ xe_pm_runtime_get_noresume(xe);
+
+ xe_sched_job_get(job);
+ xe_sched_job_arm(job);
+ job->is_ulls = true;
+ job->is_ulls_first = true;
+ fence = dma_fence_get(&job->drm.s_fence->finished);
+ xe_sched_job_push(job);
+
+ dma_fence_put(fence);
+
+ xe_dbg(xe, "Migrate ULLS mode enter");
+ m->ulls.enabled = true;
+ }
+ }
+ if (job)
+ xe_sched_job_put(job);
+ if (m->ulls.enabled)
+ mod_delayed_work(system_percpu_wq, &m->ulls.exit_work,
+ ULLS_EXIT_JIFFIES);
+ mutex_unlock(&m->job_mutex);
+}
+
+static void xe_migrate_ulls_exit(struct work_struct *work)
+{
+ struct xe_migrate *m = container_of(work, struct xe_migrate,
+ ulls.exit_work.work);
+ struct xe_device *xe = tile_to_xe(m->tile);
+ struct xe_sched_job *job = NULL;
+ struct dma_fence *fence;
+ u64 batch_addr[2] = { 0, 0 };
+ int idx;
+
+ xe_assert(xe, m->ulls.enabled);
+
+ if (!drm_dev_enter(&xe->drm, &idx))
+ return;
+
+ /*
+ * Must be done outside job_mutex as that lock is tainted with
+ * reclaim and must be done holding a pm ref.
+ */
+ job = xe_sched_job_create(m->q, batch_addr);
+ if (WARN_ON_ONCE(IS_ERR(job))) {
+ drm_dev_exit(idx);
+ mod_delayed_work(system_percpu_wq, &m->ulls.exit_work,
+ ULLS_EXIT_JIFFIES);
+ return; /* Not fatal */
+ }
+
+ mutex_lock(&m->job_mutex);
+
+ if (!xe_exec_queue_is_idle(m->q, 1))
+ goto unlock_exit;
+
+ xe_sched_job_get(job);
+ xe_sched_job_arm(job);
+ job->is_ulls = true;
+ job->is_ulls_last = true;
+ fence = dma_fence_get(&job->drm.s_fence->finished);
+ xe_sched_job_push(job);
+
+ /* Serialize force wake put */
+ dma_fence_wait(fence, false);
+ dma_fence_put(fence);
+
+ m->ulls.enabled = false;
+unlock_exit:
+ if (job)
+ xe_sched_job_put(job);
+ if (!m->ulls.enabled) {
+ /* Pairs with PM gets on enter */
+ xe_force_wake_put(gt_to_fw(m->q->hwe->gt), m->q->hwe->domain);
+ xe_pm_runtime_put(xe);
+
+ cancel_delayed_work(&m->ulls.exit_work);
+ xe_dbg(xe, "Migrate ULLS mode exit");
+ } else {
+ mod_delayed_work(system_percpu_wq, &m->ulls.exit_work,
+ ULLS_EXIT_JIFFIES);
+ }
+
+ drm_dev_exit(idx);
+ mutex_unlock(&m->job_mutex);
+}
+
/**
* xe_migrate_init() - Initialize a migrate context
* @m: The migration context
@@ -473,6 +628,8 @@ int xe_migrate_init(struct xe_migrate *m)
might_lock(&m->job_mutex);
fs_reclaim_release(GFP_KERNEL);
+ INIT_DELAYED_WORK(&m->ulls.exit_work, xe_migrate_ulls_exit);
+
err = devm_add_action_or_reset(xe->drm.dev, xe_migrate_fini, m);
if (err)
return err;
@@ -818,6 +975,26 @@ static u32 xe_migrate_ccs_copy(struct xe_migrate *m,
return flush_flags;
}
+static bool xe_migrate_is_ulls(struct xe_migrate *m)
+{
+ lockdep_assert_held(&m->job_mutex);
+
+ return m->ulls.enabled;
+}
+
+static void xe_migrate_job_set_ulls_flags(struct xe_migrate *m,
+ struct xe_sched_job *job)
+{
+ lockdep_assert_held(&m->job_mutex);
+ xe_tile_assert(m->tile, m->q == job->q);
+
+ if (xe_migrate_is_ulls(m)) {
+ job->is_ulls = true;
+ mod_delayed_work(system_percpu_wq, &m->ulls.exit_work,
+ ULLS_EXIT_JIFFIES);
+ }
+}
+
/**
* xe_migrate_copy() - Copy content of TTM resources.
* @m: The migration context.
@@ -992,6 +1169,7 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m,
mutex_lock(&m->job_mutex);
xe_sched_job_arm(job);
+ xe_migrate_job_set_ulls_flags(m, job);
dma_fence_put(fence);
fence = dma_fence_get(&job->drm.s_fence->finished);
xe_sched_job_push(job);
@@ -1602,6 +1780,7 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
mutex_lock(&m->job_mutex);
xe_sched_job_arm(job);
+ xe_migrate_job_set_ulls_flags(m, job);
dma_fence_put(fence);
fence = dma_fence_get(&job->drm.s_fence->finished);
xe_sched_job_push(job);
@@ -1881,6 +2060,7 @@ static struct dma_fence *xe_migrate_vram(struct xe_migrate *m,
mutex_lock(&m->job_mutex);
xe_sched_job_arm(job);
+ xe_migrate_job_set_ulls_flags(m, job);
fence = dma_fence_get(&job->drm.s_fence->finished);
xe_sched_job_push(job);
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index f6fa23c6c4fb..71606fb4fad0 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -85,4 +85,6 @@ struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m);
void xe_migrate_wait(struct xe_migrate *m);
+void xe_migrate_ulls_enter(struct xe_migrate *m);
+
#endif
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index ef34fbfc14f0..2c0f9a99d7a9 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1317,7 +1317,7 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
if (!job && !no_in_syncs(vops->syncs, vops->num_syncs))
return -ETIME;
- if (!job && !xe_exec_queue_is_idle(vops->q))
+ if (!job && !xe_exec_queue_is_idle(vops->q, 0))
return -ETIME;
if (vops->flags & (XE_VMA_OPS_FLAG_WAIT_VM_BOOKKEEP |
diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
index 3a797de746ad..fe2d2ee12efc 100644
--- a/drivers/gpu/drm/xe/xe_sched_job_types.h
+++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
@@ -89,6 +89,12 @@ struct xe_sched_job {
bool last_replay;
/** @is_pt_job: is a PT job */
bool is_pt_job;
+ /** @is_ulls: is ULLS job */
+ bool is_ulls;
+ /** @is_ulls_first: is first ULLS job */
+ bool is_ulls_first;
+ /** @is_ulls_last: is last ULLS job */
+ bool is_ulls_last;
union {
/** @ptrs: per instance pointers. */
DECLARE_FLEX_ARRAY(struct xe_job_ptrs, ptrs);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index d4629e953b01..931d46696811 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -146,7 +146,7 @@ static bool xe_vm_is_idle(struct xe_vm *vm)
xe_vm_assert_held(vm);
list_for_each_entry(q, &vm->preempt.exec_queues, lr.link) {
- if (!xe_exec_queue_is_idle(q))
+ if (!xe_exec_queue_is_idle(q, 0))
return false;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 20/25] drm/xe: Add ULLS migration job support to migration layer
2026-02-28 1:34 ` [PATCH v3 20/25] drm/xe: Add ULLS migration job support to migration layer Matthew Brost
@ 2026-03-05 23:34 ` Summers, Stuart
2026-03-09 23:11 ` Matthew Brost
0 siblings, 1 reply; 63+ messages in thread
From: Summers, Stuart @ 2026-03-05 23:34 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: Ghimiray, Himal Prasad, Yadav, Arvind,
thomas.hellstrom@linux.intel.com, Dugast, Francois
On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> Add a function to enter ULLS mode for migration jobs and a delayed
> worker to exit it (power saving). ULLS mode is expected to be entered
> upon page fault or SVM prefetch. The ULLS mode exit delay is currently
> set to 5us.
>
> ULLS mode is only supported on DGFX and USM platforms where a hardware
> engine is reserved for migration jobs. When in ULLS mode, set several
> flags on migration jobs so the submission backend / ring ops can
> properly submit in ULLS mode.
>
> Upon ULLS mode enter, send a job to trigger waiting on a semaphore,
> pipelining the initial GuC / HW context switch.
>
> Upon ULLS mode exit, send a job to trigger the current ULLS
> semaphore so the ring can be taken off the hardware.
Assuming we do go down the ULLS in the KMD route, can you add a little
documentation for how this is being managed? Just in terms of how the
KMD is interacting with GuC and HW to manage this basically, how you
might configure, etc. Not specific to this patch, but maybe more for
the ULLS portion of the series generally...
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue.c | 5 +-
> drivers/gpu/drm/xe/xe_exec_queue.h | 4 +-
> drivers/gpu/drm/xe/xe_migrate.c | 180
> ++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_migrate.h | 2 +
> drivers/gpu/drm/xe/xe_pt.c | 2 +-
> drivers/gpu/drm/xe/xe_sched_job_types.h | 6 +
> drivers/gpu/drm/xe/xe_vm.c | 2 +-
> 7 files changed, 195 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> b/drivers/gpu/drm/xe/xe_exec_queue.c
> index ee2119cf45c1..4fa99f12c566 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -1348,6 +1348,7 @@ bool xe_exec_queue_is_lr(struct xe_exec_queue
> *q)
> /**
> * xe_exec_queue_is_idle() - Whether an exec_queue is idle.
> * @q: The exec_queue
> + * @extra_jobs: Extra jobs on the queue
> *
> * FIXME: Need to determine what to use as the short-lived
> * timeline lock for the exec_queues, so that the return value
> @@ -1359,9 +1360,9 @@ bool xe_exec_queue_is_lr(struct xe_exec_queue
> *q)
> *
> * Return: True if the exec_queue is idle, false otherwise.
> */
> -bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
> +bool xe_exec_queue_is_idle(struct xe_exec_queue *q, int extra_jobs)
> {
> - return !atomic_read(&q->job_cnt);
> + return !(atomic_read(&q->job_cnt) - extra_jobs);
> }
>
> /**
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h
> b/drivers/gpu/drm/xe/xe_exec_queue.h
> index b5aabab388c1..a11648b62a98 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> @@ -116,7 +116,7 @@ static inline struct xe_exec_queue
> *xe_exec_queue_multi_queue_primary(struct xe_
>
> bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
>
> -bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
> +bool xe_exec_queue_is_idle(struct xe_exec_queue *q, int extra_jobs);
Is this extra_jobs bit something coming in a future patch? I might have
missed, but I'm not seeing any non-zero usage here.
>
> void xe_exec_queue_kill(struct xe_exec_queue *q);
>
> @@ -176,7 +176,7 @@ struct xe_lrc *xe_exec_queue_get_lrc(struct
> xe_exec_queue *q, u16 idx);
> */
> static inline bool xe_exec_queue_idle_skip_suspend(struct
> xe_exec_queue *q)
> {
> - return !xe_exec_queue_is_parallel(q) &&
> xe_exec_queue_is_idle(q);
> + return !xe_exec_queue_is_parallel(q) &&
> xe_exec_queue_is_idle(q, 0);
> }
>
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> b/drivers/gpu/drm/xe/xe_migrate.c
> index c9ee6325ec9d..62f27868f56b 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -8,6 +8,7 @@
> #include <linux/bitfield.h>
> #include <linux/sizes.h>
>
> +#include <drm/drm_drv.h>
> #include <drm/drm_managed.h>
> #include <drm/drm_pagemap.h>
> #include <drm/ttm/ttm_tt.h>
> @@ -23,6 +24,7 @@
> #include "xe_bb.h"
> #include "xe_bo.h"
> #include "xe_exec_queue.h"
> +#include "xe_force_wake.h"
> #include "xe_ggtt.h"
> #include "xe_gt.h"
> #include "xe_gt_printk.h"
> @@ -30,6 +32,7 @@
> #include "xe_lrc.h"
> #include "xe_map.h"
> #include "xe_mocs.h"
> +#include "xe_pm.h"
> #include "xe_printk.h"
> #include "xe_pt.h"
> #include "xe_res_cursor.h"
> @@ -75,6 +78,14 @@ struct xe_migrate {
> struct dma_fence *fence;
> /** @min_chunk_size: For dgfx, Minimum chunk size */
> u64 min_chunk_size;
> + /** @ulls: ULLS support */
> + struct {
> + /** @ulls.enabled: ULLS is enabled */
> + bool enabled;
> +#define ULLS_EXIT_JIFFIES (HZ / 50)
It might be nice to make this configurable through sysfs or debugfs
even...
> + /** @ulls.exit_work: ULLS exit worker */
> + struct delayed_work exit_work;
> + } ulls;
> };
>
> #define MAX_PREEMPTDISABLE_TRANSFER SZ_8M /* Around 1ms. */
> @@ -96,6 +107,16 @@ struct xe_migrate {
> static void xe_migrate_fini(void *arg)
> {
> struct xe_migrate *m = arg;
> + struct xe_device *xe = tile_to_xe(m->tile);
> +
> + disable_delayed_work_sync(&m->ulls.exit_work);
> + mutex_lock(&m->job_mutex);
> + if (m->ulls.enabled) {
> + xe_force_wake_put(gt_to_fw(m->q->hwe->gt), m->q->hwe-
> >domain);
> + xe_pm_runtime_put(xe);
> + m->ulls.enabled = false;
> + }
> + mutex_unlock(&m->job_mutex);
>
> xe_vm_lock(m->q->vm, false);
> xe_bo_unpin(m->pt_bo);
> @@ -410,6 +431,140 @@ static int xe_migrate_lock_prepare_vm(struct
> xe_tile *tile, struct xe_migrate *m
> return err;
> }
>
> +/**
> + * xe_migrate_ulls_enter() - Enter ULLS mode
> + * @m: The migration context.
> + *
> + * If DGFX and not a VF, enter ULLS mode bypassing GuC / HW context
> + * switches by utilizing semaphore and continuously running batches.
> + */
> +void xe_migrate_ulls_enter(struct xe_migrate *m)
> +{
> + struct xe_device *xe = tile_to_xe(m->tile);
> + struct xe_sched_job *job = NULL;
> + u64 batch_addr[2] = { 0, 0 };
> + bool alloc = false;
> +
> + xe_assert(xe, xe->info.has_usm);
> +
> + if (!IS_DGFX(xe) || IS_SRIOV_VF(xe))
> + return;
> +
> +job_alloc:
> + if (alloc) {
> + /*
> + * Must be done outside job_mutex as that lock is
> tainted with
> + * reclaim.
Where is the reclaim happening for this? It seems ugly jumping back and
forth like this to avoid the lock.
> + */
> + job = xe_sched_job_create(m->q, batch_addr);
> + if (WARN_ON_ONCE(IS_ERR(job)))
> + return; /* Not fatal */
> + }
> +
> + mutex_lock(&m->job_mutex);
> + if (!m->ulls.enabled) {
> + unsigned int fw_ref;
> +
> + if (!job) {
> + alloc = true;
> + mutex_unlock(&m->job_mutex);
> + goto job_alloc;
Why are you jumping through this alloc/!job hoop here? Can we just do
this in one place instead of jumping back and forth?
> + }
> +
> + /* Pairs with FW put on ULLS exit */
> + fw_ref = xe_force_wake_get(gt_to_fw(m->q->hwe->gt),
> + m->q->hwe->domain);
> + if (fw_ref) {
> + struct xe_device *xe = tile_to_xe(m->tile);
> + struct dma_fence *fence;
> +
> + /* Pairs with PM put on ULLS exit */
> + xe_pm_runtime_get_noresume(xe);
> +
> + xe_sched_job_get(job);
> + xe_sched_job_arm(job);
> + job->is_ulls = true;
> + job->is_ulls_first = true;
> + fence = dma_fence_get(&job->drm.s_fence-
> >finished);
> + xe_sched_job_push(job);
> +
> + dma_fence_put(fence);
> +
> + xe_dbg(xe, "Migrate ULLS mode enter");
> + m->ulls.enabled = true;
> + }
> + }
> + if (job)
> + xe_sched_job_put(job);
> + if (m->ulls.enabled)
> + mod_delayed_work(system_percpu_wq, &m-
> >ulls.exit_work,
> + ULLS_EXIT_JIFFIES);
> + mutex_unlock(&m->job_mutex);
> +}
> +
> +static void xe_migrate_ulls_exit(struct work_struct *work)
> +{
> + struct xe_migrate *m = container_of(work, struct xe_migrate,
> + ulls.exit_work.work);
> + struct xe_device *xe = tile_to_xe(m->tile);
> + struct xe_sched_job *job = NULL;
> + struct dma_fence *fence;
> + u64 batch_addr[2] = { 0, 0 };
> + int idx;
> +
> + xe_assert(xe, m->ulls.enabled);
> +
> + if (!drm_dev_enter(&xe->drm, &idx))
> + return;
> +
> + /*
> + * Must be done outside job_mutex as that lock is tainted
> with
> + * reclaim and must be done holding a pm ref.
> + */
> + job = xe_sched_job_create(m->q, batch_addr);
> + if (WARN_ON_ONCE(IS_ERR(job))) {
> + drm_dev_exit(idx);
> + mod_delayed_work(system_percpu_wq, &m-
> >ulls.exit_work,
> + ULLS_EXIT_JIFFIES);
> + return; /* Not fatal */
> + }
> +
> + mutex_lock(&m->job_mutex);
> +
> + if (!xe_exec_queue_is_idle(m->q, 1))
> + goto unlock_exit;
> +
> + xe_sched_job_get(job);
> + xe_sched_job_arm(job);
> + job->is_ulls = true;
> + job->is_ulls_last = true;
> + fence = dma_fence_get(&job->drm.s_fence->finished);
> + xe_sched_job_push(job);
> +
> + /* Serialize force wake put */
> + dma_fence_wait(fence, false);
> + dma_fence_put(fence);
> +
> + m->ulls.enabled = false;
> +unlock_exit:
> + if (job)
> + xe_sched_job_put(job);
> + if (!m->ulls.enabled) {
> + /* Pairs with PM gets on enter */
> + xe_force_wake_put(gt_to_fw(m->q->hwe->gt), m->q->hwe-
> >domain);
> + xe_pm_runtime_put(xe);
Maybe reverse these to match the gets above.
> +
> + cancel_delayed_work(&m->ulls.exit_work);
> + xe_dbg(xe, "Migrate ULLS mode exit");
> + } else {
> + mod_delayed_work(system_percpu_wq, &m-
> >ulls.exit_work,
> + ULLS_EXIT_JIFFIES);
> + }
> +
> + drm_dev_exit(idx);
> + mutex_unlock(&m->job_mutex);
> +}
> +
> /**
> * xe_migrate_init() - Initialize a migrate context
> * @m: The migration context
> @@ -473,6 +628,8 @@ int xe_migrate_init(struct xe_migrate *m)
> might_lock(&m->job_mutex);
> fs_reclaim_release(GFP_KERNEL);
>
> + INIT_DELAYED_WORK(&m->ulls.exit_work, xe_migrate_ulls_exit);
> +
> err = devm_add_action_or_reset(xe->drm.dev, xe_migrate_fini,
> m);
> if (err)
> return err;
> @@ -818,6 +975,26 @@ static u32 xe_migrate_ccs_copy(struct xe_migrate
> *m,
> return flush_flags;
> }
>
> +static bool xe_migrate_is_ulls(struct xe_migrate *m)
> +{
> + lockdep_assert_held(&m->job_mutex);
> +
> + return m->ulls.enabled;
> +}
> +
> +static void xe_migrate_job_set_ulls_flags(struct xe_migrate *m,
> + struct xe_sched_job *job)
> +{
> + lockdep_assert_held(&m->job_mutex);
> + xe_tile_assert(m->tile, m->q == job->q);
Nit: Should we have a helper here like you have for the bind queue?
> +
> + if (xe_migrate_is_ulls(m)) {
> + job->is_ulls = true;
> + mod_delayed_work(system_percpu_wq, &m-
> >ulls.exit_work,
> + ULLS_EXIT_JIFFIES);
> + }
> +}
> +
> /**
> * xe_migrate_copy() - Copy content of TTM resources.
> * @m: The migration context.
> @@ -992,6 +1169,7 @@ struct dma_fence *xe_migrate_copy(struct
> xe_migrate *m,
>
> mutex_lock(&m->job_mutex);
> xe_sched_job_arm(job);
> + xe_migrate_job_set_ulls_flags(m, job);
> dma_fence_put(fence);
> fence = dma_fence_get(&job->drm.s_fence->finished);
> xe_sched_job_push(job);
> @@ -1602,6 +1780,7 @@ struct dma_fence *xe_migrate_clear(struct
> xe_migrate *m,
>
> mutex_lock(&m->job_mutex);
> xe_sched_job_arm(job);
> + xe_migrate_job_set_ulls_flags(m, job);
> dma_fence_put(fence);
> fence = dma_fence_get(&job->drm.s_fence->finished);
> xe_sched_job_push(job);
> @@ -1881,6 +2060,7 @@ static struct dma_fence *xe_migrate_vram(struct
> xe_migrate *m,
>
> mutex_lock(&m->job_mutex);
> xe_sched_job_arm(job);
> + xe_migrate_job_set_ulls_flags(m, job);
> fence = dma_fence_get(&job->drm.s_fence->finished);
> xe_sched_job_push(job);
>
> diff --git a/drivers/gpu/drm/xe/xe_migrate.h
> b/drivers/gpu/drm/xe/xe_migrate.h
> index f6fa23c6c4fb..71606fb4fad0 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.h
> +++ b/drivers/gpu/drm/xe/xe_migrate.h
> @@ -85,4 +85,6 @@ struct xe_vm *xe_migrate_get_vm(struct xe_migrate
> *m);
>
> void xe_migrate_wait(struct xe_migrate *m);
>
> +void xe_migrate_ulls_enter(struct xe_migrate *m);
> +
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index ef34fbfc14f0..2c0f9a99d7a9 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1317,7 +1317,7 @@ static int xe_pt_vm_dependencies(struct
> xe_sched_job *job,
> if (!job && !no_in_syncs(vops->syncs, vops->num_syncs))
> return -ETIME;
>
> - if (!job && !xe_exec_queue_is_idle(vops->q))
> + if (!job && !xe_exec_queue_is_idle(vops->q, 0))
> return -ETIME;
>
> if (vops->flags & (XE_VMA_OPS_FLAG_WAIT_VM_BOOKKEEP |
> diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h
> b/drivers/gpu/drm/xe/xe_sched_job_types.h
> index 3a797de746ad..fe2d2ee12efc 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job_types.h
> +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
> @@ -89,6 +89,12 @@ struct xe_sched_job {
> bool last_replay;
> /** @is_pt_job: is a PT job */
> bool is_pt_job;
> + /** @is_ulls: is ULLS job */
> + bool is_ulls;
> + /** @is_ulls_first: is first ULLS job */
This flag I'm not fully understanding. Why do we need to separate this
from is_ulls?
Thanks,
Stuart
> + bool is_ulls_first;
> + /** @is_ulls_last: is last ULLS job */
> + bool is_ulls_last;
> union {
> /** @ptrs: per instance pointers. */
> DECLARE_FLEX_ARRAY(struct xe_job_ptrs, ptrs);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index d4629e953b01..931d46696811 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -146,7 +146,7 @@ static bool xe_vm_is_idle(struct xe_vm *vm)
>
> xe_vm_assert_held(vm);
> list_for_each_entry(q, &vm->preempt.exec_queues, lr.link) {
> - if (!xe_exec_queue_is_idle(q))
> + if (!xe_exec_queue_is_idle(q, 0))
> return false;
> }
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 20/25] drm/xe: Add ULLS migration job support to migration layer
2026-03-05 23:34 ` Summers, Stuart
@ 2026-03-09 23:11 ` Matthew Brost
0 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-03-09 23:11 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, Ghimiray, Himal Prasad,
Yadav, Arvind, thomas.hellstrom@linux.intel.com, Dugast, Francois
On Thu, Mar 05, 2026 at 04:34:36PM -0700, Summers, Stuart wrote:
> On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> > Add a function to enter ULLS mode for migration jobs and a delayed
> > worker to exit it (power saving). ULLS mode is expected to be
> > entered upon page fault or SVM prefetch. The ULLS mode exit delay is
> > currently set to 5us.
> >
> > ULLS mode is only supported on DGFX and USM platforms where a
> > hardware engine is reserved for migration jobs. When in ULLS mode,
> > set several flags on migration jobs so the submission backend / ring
> > ops can properly submit in ULLS mode.
> >
> > Upon ULLS mode enter, send a job to trigger waiting on a semaphore,
> > pipelining the initial GuC / HW context switch.
> >
> > Upon ULLS mode exit, send a job to trigger the current ULLS
> > semaphore so the ring can be taken off the hardware.
>
> Assuming we do go down the ULLS in the KMD route, can you add a little
> documentation for how this is being managed? Just in terms of how the
> KMD is interacting with GuC and HW to manage this basically, how you
> might configure, etc. Not specific to this patch, but maybe more for
> the ULLS portion of the series generally...
>
I can write a proper kernel doc section for ULLS explaining the design -
I should have done that to make reviews easier.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_exec_queue.c | 5 +-
> > drivers/gpu/drm/xe/xe_exec_queue.h | 4 +-
> > drivers/gpu/drm/xe/xe_migrate.c | 180
> > ++++++++++++++++++++++++
> > drivers/gpu/drm/xe/xe_migrate.h | 2 +
> > drivers/gpu/drm/xe/xe_pt.c | 2 +-
> > drivers/gpu/drm/xe/xe_sched_job_types.h | 6 +
> > drivers/gpu/drm/xe/xe_vm.c | 2 +-
> > 7 files changed, 195 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > index ee2119cf45c1..4fa99f12c566 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > @@ -1348,6 +1348,7 @@ bool xe_exec_queue_is_lr(struct xe_exec_queue
> > *q)
> > /**
> > * xe_exec_queue_is_idle() - Whether an exec_queue is idle.
> > * @q: The exec_queue
> > + * @extra_jobs: Extra jobs on the queue
> > *
> > * FIXME: Need to determine what to use as the short-lived
> > * timeline lock for the exec_queues, so that the return value
> > @@ -1359,9 +1360,9 @@ bool xe_exec_queue_is_lr(struct xe_exec_queue
> > *q)
> > *
> > * Return: True if the exec_queue is idle, false otherwise.
> > */
> > -bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
> > +bool xe_exec_queue_is_idle(struct xe_exec_queue *q, int extra_jobs)
> > {
> > - return !atomic_read(&q->job_cnt);
> > + return !(atomic_read(&q->job_cnt) - extra_jobs);
> > }
> >
> > /**
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h
> > b/drivers/gpu/drm/xe/xe_exec_queue.h
> > index b5aabab388c1..a11648b62a98 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> > @@ -116,7 +116,7 @@ static inline struct xe_exec_queue
> > *xe_exec_queue_multi_queue_primary(struct xe_
> >
> > bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
> >
> > -bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
> > +bool xe_exec_queue_is_idle(struct xe_exec_queue *q, int extra_jobs);
>
> Is this extra_jobs bit something coming in a future patch? I might have
> missed, but I'm not seeing any non-zero usage here.
>
It is used in xe_migrate_ulls_exit...
> >
> > void xe_exec_queue_kill(struct xe_exec_queue *q);
> >
> > @@ -176,7 +176,7 @@ struct xe_lrc *xe_exec_queue_get_lrc(struct
> > xe_exec_queue *q, u16 idx);
> > */
> > static inline bool xe_exec_queue_idle_skip_suspend(struct
> > xe_exec_queue *q)
> > {
> > - return !xe_exec_queue_is_parallel(q) &&
> > xe_exec_queue_is_idle(q);
> > + return !xe_exec_queue_is_parallel(q) &&
> > xe_exec_queue_is_idle(q, 0);
> > }
> >
> > #endif
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > b/drivers/gpu/drm/xe/xe_migrate.c
> > index c9ee6325ec9d..62f27868f56b 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -8,6 +8,7 @@
> > #include <linux/bitfield.h>
> > #include <linux/sizes.h>
> >
> > +#include <drm/drm_drv.h>
> > #include <drm/drm_managed.h>
> > #include <drm/drm_pagemap.h>
> > #include <drm/ttm/ttm_tt.h>
> > @@ -23,6 +24,7 @@
> > #include "xe_bb.h"
> > #include "xe_bo.h"
> > #include "xe_exec_queue.h"
> > +#include "xe_force_wake.h"
> > #include "xe_ggtt.h"
> > #include "xe_gt.h"
> > #include "xe_gt_printk.h"
> > @@ -30,6 +32,7 @@
> > #include "xe_lrc.h"
> > #include "xe_map.h"
> > #include "xe_mocs.h"
> > +#include "xe_pm.h"
> > #include "xe_printk.h"
> > #include "xe_pt.h"
> > #include "xe_res_cursor.h"
> > @@ -75,6 +78,14 @@ struct xe_migrate {
> > struct dma_fence *fence;
> > /** @min_chunk_size: For dgfx, Minimum chunk size */
> > u64 min_chunk_size;
> > + /** @ulls: ULLS support */
> > + struct {
> > + /** @ulls.enabled: ULLS is enabled */
> > + bool enabled;
> > +#define ULLS_EXIT_JIFFIES (HZ / 50)
>
> It might be nice to make this configurable through sysfs or debugfs
> even...
>
I agree. Will do in next rev.
> > + /** @ulls.exit_work: ULLS exit worker */
> > + struct delayed_work exit_work;
> > + } ulls;
> > };
> >
> > #define MAX_PREEMPTDISABLE_TRANSFER SZ_8M /* Around 1ms. */
> > @@ -96,6 +107,16 @@ struct xe_migrate {
> > static void xe_migrate_fini(void *arg)
> > {
> > struct xe_migrate *m = arg;
> > + struct xe_device *xe = tile_to_xe(m->tile);
> > +
> > + disable_delayed_work_sync(&m->ulls.exit_work);
> > + mutex_lock(&m->job_mutex);
> > + if (m->ulls.enabled) {
> > + xe_force_wake_put(gt_to_fw(m->q->hwe->gt), m->q->hwe-
> > >domain);
> > + xe_pm_runtime_put(xe);
> > + m->ulls.enabled = false;
> > + }
> > + mutex_unlock(&m->job_mutex);
> >
> > xe_vm_lock(m->q->vm, false);
> > xe_bo_unpin(m->pt_bo);
> > @@ -410,6 +431,140 @@ static int xe_migrate_lock_prepare_vm(struct
> > xe_tile *tile, struct xe_migrate *m
> > return err;
> > }
> >
> > +/**
> > + * xe_migrate_ulls_enter() - Enter ULLS mode
> > + * @m: The migration context.
> > + *
> > + * If DGFX and not a VF, enter ULLS mode bypassing GuC / HW context
> > + * switches by utilizing semaphore and continuously running batches.
> > + */
> > +void xe_migrate_ulls_enter(struct xe_migrate *m)
> > +{
> > + struct xe_device *xe = tile_to_xe(m->tile);
> > + struct xe_sched_job *job = NULL;
> > + u64 batch_addr[2] = { 0, 0 };
> > + bool alloc = false;
> > +
> > + xe_assert(xe, xe->info.has_usm);
> > +
> > + if (!IS_DGFX(xe) || IS_SRIOV_VF(xe))
> > + return;
> > +
> > +job_alloc:
> > + if (alloc) {
> > + /*
> > + * Must be done outside job_mutex as that lock is
> > tainted with
> > + * reclaim.
>
> Where is the reclaim happening for this? It seems ugly jumping back and
> forth like this to avoid the lock.
>
The job_mutex is annotated with reclaim because it is in the dma-fence
signaling path. I forget the details of exactly why, but Thomas
figured this out a while back, fixed the bugs we had there, and added
the annotation. I can try to page back in why the job_mutex is tainted
with reclaim and perhaps add some kernel doc explaining this.
> > + */
> > + job = xe_sched_job_create(m->q, batch_addr);
> > + if (WARN_ON_ONCE(IS_ERR(job)))
> > + return; /* Not fatal */
> > + }
> > +
> > + mutex_lock(&m->job_mutex);
> > + if (!m->ulls.enabled) {
> > + unsigned int fw_ref;
> > +
> > + if (!job) {
> > + alloc = true;
> > + mutex_unlock(&m->job_mutex);
> > + goto job_alloc;
>
> Why are you jumping through this alloc/!job hoop here? Can we just do
> this in one place instead of jumping back and forth?
>
It could be rewritten - maybe I always just allocate the job and discard
it if it isn't needed.
> > + }
> > +
> > + /* Pairs with FW put on ULLS exit */
> > + fw_ref = xe_force_wake_get(gt_to_fw(m->q->hwe->gt),
> > + m->q->hwe->domain);
> > + if (fw_ref) {
> > + struct xe_device *xe = tile_to_xe(m->tile);
> > + struct dma_fence *fence;
> > +
> > + /* Pairs with PM put on ULLS exit */
> > + xe_pm_runtime_get_noresume(xe);
> > +
> > + xe_sched_job_get(job);
> > + xe_sched_job_arm(job);
> > + job->is_ulls = true;
> > + job->is_ulls_first = true;
> > + fence = dma_fence_get(&job->drm.s_fence-
> > >finished);
> > + xe_sched_job_push(job);
> > +
> > + dma_fence_put(fence);
> > +
> > + xe_dbg(xe, "Migrate ULLS mode enter");
> > + m->ulls.enabled = true;
> > + }
> > + }
> > + if (job)
> > + xe_sched_job_put(job);
> > + if (m->ulls.enabled)
> > + mod_delayed_work(system_percpu_wq, &m-
> > >ulls.exit_work,
> > + ULLS_EXIT_JIFFIES);
> > + mutex_unlock(&m->job_mutex);
> > +}
> > +
> > +static void xe_migrate_ulls_exit(struct work_struct *work)
> > +{
> > + struct xe_migrate *m = container_of(work, struct xe_migrate,
> > + ulls.exit_work.work);
> > + struct xe_device *xe = tile_to_xe(m->tile);
> > + struct xe_sched_job *job = NULL;
> > + struct dma_fence *fence;
> > + u64 batch_addr[2] = { 0, 0 };
> > + int idx;
> > +
> > + xe_assert(xe, m->ulls.enabled);
> > +
> > + if (!drm_dev_enter(&xe->drm, &idx))
> > + return;
> > +
> > + /*
> > + * Must be done outside job_mutex as that lock is tainted
> > with
> > + * reclaim and must be done holding a pm ref.
> > + */
> > + job = xe_sched_job_create(m->q, batch_addr);
> > + if (WARN_ON_ONCE(IS_ERR(job))) {
> > + drm_dev_exit(idx);
> > + mod_delayed_work(system_percpu_wq, &m-
> > >ulls.exit_work,
> > + ULLS_EXIT_JIFFIES);
> > + return; /* Not fatal */
> > + }
> > +
> > + mutex_lock(&m->job_mutex);
> > +
> > + if (!xe_exec_queue_is_idle(m->q, 1))
> > + goto unlock_exit;
> > +
> > + xe_sched_job_get(job);
> > + xe_sched_job_arm(job);
> > + job->is_ulls = true;
> > + job->is_ulls_last = true;
> > + fence = dma_fence_get(&job->drm.s_fence->finished);
> > + xe_sched_job_push(job);
> > +
> > + /* Serialize force wake put */
> > + dma_fence_wait(fence, false);
> > + dma_fence_put(fence);
> > +
> > + m->ulls.enabled = false;
> > +unlock_exit:
> > + if (job)
> > + xe_sched_job_put(job);
> > + if (!m->ulls.enabled) {
> > + /* Pairs with PM gets on enter */
> > + xe_force_wake_put(gt_to_fw(m->q->hwe->gt), m->q->hwe-
> > >domain);
> > + xe_pm_runtime_put(xe);
>
> Maybe reverse these to match the gets above.
>
Yes.
> > +
> > + cancel_delayed_work(&m->ulls.exit_work);
> > + xe_dbg(xe, "Migrate ULLS mode exit");
> > + } else {
> > + mod_delayed_work(system_percpu_wq, &m-
> > >ulls.exit_work,
> > + ULLS_EXIT_JIFFIES);
> > + }
> > +
> > + drm_dev_exit(idx);
> > + mutex_unlock(&m->job_mutex);
> > +}
> > +
> > /**
> > * xe_migrate_init() - Initialize a migrate context
> > * @m: The migration context
> > @@ -473,6 +628,8 @@ int xe_migrate_init(struct xe_migrate *m)
> > might_lock(&m->job_mutex);
> > fs_reclaim_release(GFP_KERNEL);
> >
> > + INIT_DELAYED_WORK(&m->ulls.exit_work, xe_migrate_ulls_exit);
> > +
> > err = devm_add_action_or_reset(xe->drm.dev, xe_migrate_fini,
> > m);
> > if (err)
> > return err;
> > @@ -818,6 +975,26 @@ static u32 xe_migrate_ccs_copy(struct xe_migrate
> > *m,
> > return flush_flags;
> > }
> >
> > +static bool xe_migrate_is_ulls(struct xe_migrate *m)
> > +{
> > + lockdep_assert_held(&m->job_mutex);
> > +
> > + return m->ulls.enabled;
> > +}
> > +
> > +static void xe_migrate_job_set_ulls_flags(struct xe_migrate *m,
> > + struct xe_sched_job *job)
> > +{
> > + lockdep_assert_held(&m->job_mutex);
> > + xe_tile_assert(m->tile, m->q == job->q);
>
> Nit: Should we have a helper here like you have for the bind queue?
>
Let me see if there is a helper I can use here.
> > +
> > + if (xe_migrate_is_ulls(m)) {
> > + job->is_ulls = true;
> > + mod_delayed_work(system_percpu_wq, &m-
> > >ulls.exit_work,
> > + ULLS_EXIT_JIFFIES);
> > + }
> > +}
> > +
> > /**
> > * xe_migrate_copy() - Copy content of TTM resources.
> > * @m: The migration context.
> > @@ -992,6 +1169,7 @@ struct dma_fence *xe_migrate_copy(struct
> > xe_migrate *m,
> >
> > mutex_lock(&m->job_mutex);
> > xe_sched_job_arm(job);
> > + xe_migrate_job_set_ulls_flags(m, job);
> > dma_fence_put(fence);
> > fence = dma_fence_get(&job->drm.s_fence->finished);
> > xe_sched_job_push(job);
> > @@ -1602,6 +1780,7 @@ struct dma_fence *xe_migrate_clear(struct
> > xe_migrate *m,
> >
> > mutex_lock(&m->job_mutex);
> > xe_sched_job_arm(job);
> > + xe_migrate_job_set_ulls_flags(m, job);
> > dma_fence_put(fence);
> > fence = dma_fence_get(&job->drm.s_fence->finished);
> > xe_sched_job_push(job);
> > @@ -1881,6 +2060,7 @@ static struct dma_fence *xe_migrate_vram(struct
> > xe_migrate *m,
> >
> > mutex_lock(&m->job_mutex);
> > xe_sched_job_arm(job);
> > + xe_migrate_job_set_ulls_flags(m, job);
> > fence = dma_fence_get(&job->drm.s_fence->finished);
> > xe_sched_job_push(job);
> >
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.h
> > b/drivers/gpu/drm/xe/xe_migrate.h
> > index f6fa23c6c4fb..71606fb4fad0 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.h
> > +++ b/drivers/gpu/drm/xe/xe_migrate.h
> > @@ -85,4 +85,6 @@ struct xe_vm *xe_migrate_get_vm(struct xe_migrate
> > *m);
> >
> > void xe_migrate_wait(struct xe_migrate *m);
> >
> > +void xe_migrate_ulls_enter(struct xe_migrate *m);
> > +
> > #endif
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > index ef34fbfc14f0..2c0f9a99d7a9 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -1317,7 +1317,7 @@ static int xe_pt_vm_dependencies(struct
> > xe_sched_job *job,
> > if (!job && !no_in_syncs(vops->syncs, vops->num_syncs))
> > return -ETIME;
> >
> > - if (!job && !xe_exec_queue_is_idle(vops->q))
> > + if (!job && !xe_exec_queue_is_idle(vops->q, 0))
> > return -ETIME;
> >
> > if (vops->flags & (XE_VMA_OPS_FLAG_WAIT_VM_BOOKKEEP |
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h
> > b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > index 3a797de746ad..fe2d2ee12efc 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job_types.h
> > +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > @@ -89,6 +89,12 @@ struct xe_sched_job {
> > bool last_replay;
> > /** @is_pt_job: is a PT job */
> > bool is_pt_job;
> > + /** @is_ulls: is ULLS job */
> > + bool is_ulls;
> > + /** @is_ulls_first: is first ULLS job */
>
> This flag I'm not fully understanding. Why do we need to separate this
> from is_ulls?
>
This is where kernel doc would help. Let me explain:
The first ULLS job only submits ring operations to start spinning on a
semaphore in the ring (no BB execution). This is a GuC submission, if
that isn’t clear.
The middle ULLS jobs submit by moving the ring tail via an MMIO write,
signaling the spinning semaphore from the previous job (GuC bypass, BB
execution). The last instructions in the ring set up a new spinning
semaphore.
The last ULLS job submits by moving the ring tail via an MMIO write,
signaling the spinning semaphore from the previous job (GuC bypass, no
BB execution). There are no instructions in the ring for a spinning
semaphore, so the queue will context-switch off the hardware.
Also, if it isn’t clear, this whole scheme only works if it runs on a
dedicated hardware engine—which we have after CPU binds, since the
paging copy engine is mapped only to the migration queue. Also, since
this relies on MMIO writes, it doesn't work on VFs.
Matt
> Thanks,
> Stuart
>
> > + bool is_ulls_first;
> > + /** @is_ulls_last: is last ULLS job */
> > + bool is_ulls_last;
> > union {
> > /** @ptrs: per instance pointers. */
> > DECLARE_FLEX_ARRAY(struct xe_job_ptrs, ptrs);
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index d4629e953b01..931d46696811 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -146,7 +146,7 @@ static bool xe_vm_is_idle(struct xe_vm *vm)
> >
> > xe_vm_assert_held(vm);
> > list_for_each_entry(q, &vm->preempt.exec_queues, lr.link) {
> > - if (!xe_exec_queue_is_idle(q))
> > + if (!xe_exec_queue_is_idle(q, 0))
> > return false;
> > }
> >
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v3 21/25] drm/xe: Add MI_SEMAPHORE_WAIT instruction defs
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (19 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 20/25] drm/xe: Add ULLS migration job support to migration layer Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-02-28 1:34 ` [PATCH v3 22/25] drm/xe: Add ULLS migration job support to ring ops Matthew Brost
` (9 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add MI_SEMAPHORE_WAIT instruction defs which are needed for kernel ULLS
migration jobs.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/instructions/xe_mi_commands.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
index c47b290e0e9f..4491f880513a 100644
--- a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
+++ b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
@@ -34,6 +34,12 @@
#define MI_FORCE_WAKEUP __MI_INSTR(0x1D)
#define MI_MATH(n) (__MI_INSTR(0x1A) | XE_INSTR_NUM_DW((n) + 1))
+#define MI_SEMAPHORE_WAIT (__MI_INSTR(0x1c) | XE_INSTR_NUM_DW(4))
+#define MI_SEMAPHORE_GLOBAL_GTT REG_BIT(22)
+#define MI_SEMAPHORE_POLL REG_BIT(15)
+#define MI_SEMAPHORE_COMPARE GENMASK(15, 12)
+#define MI_SEMAPHORE_SAD_EQ_SDD REG_FIELD_PREP(MI_SEMAPHORE_COMPARE, 4)
+
#define MI_STORE_DATA_IMM __MI_INSTR(0x20)
#define MI_SDI_GGTT REG_BIT(22)
#define MI_SDI_LEN_DW GENMASK(9, 0)
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v3 22/25] drm/xe: Add ULLS migration job support to ring ops
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (20 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 21/25] drm/xe: Add MI_SEMAPHORE_WAIT instruction defs Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-02-28 1:34 ` [PATCH v3 23/25] drm/xe: Add ULLS migration job support to GuC submission Matthew Brost
` (8 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add preamble and postamble for ULLS migration jobs. Preamble clears
current semaphore for reuse. Postamble waits on next semaphore which is
set upon next job submission. The last ULLS migration job skips BB
submission and postamble (clear current semaphore, write seqno, exit
ULLS).
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_ring_ops.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 53d420d72164..4e233651beb9 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -422,6 +422,27 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
xe_lrc_write_ring(lrc, dw, i * sizeof(*dw));
}
+static int emit_ulls_preamble(struct xe_lrc *lrc, u32 *dw, int i, u32 seqno)
+{
+ u32 addr = xe_lrc_ulls_semaphore_ggtt_addr(lrc, seqno);
+
+ return emit_store_imm_ggtt(addr, LRC_MIGRATION_ULLS_SEMAPORE_CLEAR,
+ dw, i);
+}
+
+static int emit_ulls_postamble(struct xe_lrc *lrc, u32 *dw, int i, u32 seqno)
+{
+ dw[i++] = MI_SEMAPHORE_WAIT |
+ MI_SEMAPHORE_GLOBAL_GTT |
+ MI_SEMAPHORE_POLL |
+ MI_SEMAPHORE_SAD_EQ_SDD;
+ dw[i++] = LRC_MIGRATION_ULLS_SEMAPORE_SINGAL;
+ dw[i++] = xe_lrc_ulls_semaphore_ggtt_addr(lrc, seqno + 1);
+ dw[i++] = 0;
+
+ return i;
+}
+
static void emit_migration_job_gen12(struct xe_sched_job *job,
struct xe_lrc *lrc, u32 *head,
u32 seqno)
@@ -433,10 +454,16 @@ static void emit_migration_job_gen12(struct xe_sched_job *job,
*head = lrc->ring.tail;
+ if (job->is_ulls)
+ i = emit_ulls_preamble(lrc, dw, i, seqno);
+
i = emit_copy_timestamp(xe, lrc, dw, i);
i = emit_store_imm_ggtt(saddr, seqno, dw, i);
+ if (job->is_ulls_last || job->is_ulls_first)
+ goto seqno_write;
+
dw[i++] = MI_ARB_ON_OFF | MI_ARB_DISABLE; /* Enabled again below */
i = emit_bb_start(job->ptrs[0].batch_addr, BIT(8), dw, i);
@@ -447,12 +474,16 @@ static void emit_migration_job_gen12(struct xe_sched_job *job,
i = emit_bb_start(job->ptrs[1].batch_addr, BIT(8), dw, i);
+seqno_write:
i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno,
job->migrate_flush_flags,
dw, i);
i = emit_user_interrupt(dw, i);
+ if (job->is_ulls && !job->is_ulls_last)
+ i = emit_ulls_postamble(lrc, dw, i, seqno);
+
xe_gt_assert(job->q->gt, i <= MAX_JOB_SIZE_DW);
xe_lrc_write_ring(lrc, dw, i * sizeof(*dw));
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v3 23/25] drm/xe: Add ULLS migration job support to GuC submission
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (21 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 22/25] drm/xe: Add ULLS migration job support to ring ops Matthew Brost
@ 2026-02-28 1:34 ` Matthew Brost
2026-02-28 1:35 ` [PATCH v3 24/25] drm/xe: Enter ULLS for migration jobs upon page fault or SVM prefetch Matthew Brost
` (7 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:34 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add ULLS migration job support to GuC submission backend.
Changes required:
- On migration queue, reduce max jobs to the number of ULLS semaphores
minus one
- Directly set the hardware engine tail via a MMIO write for ULLS jobs
except for first ULLS job
- Set ULLS semaphore for current job releasing last job except for first
ULLS job
- Suppress submit H2G for ULLS except for first ULLS job
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index f7b56a1eaed4..db096bfb640c 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1154,6 +1154,11 @@ static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
*/
q = xe_exec_queue_multi_queue_primary(q);
+ if (job->is_ulls && !job->is_ulls_first) {
+ xe_hw_engine_write_ring_tail(q->hwe, lrc->ring.tail);
+ xe_lrc_set_ulls_semaphore(lrc, xe_sched_job_lrc_seqno(job));
+ }
+
if (!exec_queue_enabled(q) && !exec_queue_suspended(q)) {
action[len++] = XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
action[len++] = q->guc->id;
@@ -1167,13 +1172,14 @@ static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
set_exec_queue_pending_enable(q);
set_exec_queue_enabled(q);
trace_xe_exec_queue_scheduling_enable(q);
- } else {
+ } else if (!job->is_ulls || job->is_ulls_first) {
action[len++] = XE_GUC_ACTION_SCHED_CONTEXT;
action[len++] = q->guc->id;
trace_xe_exec_queue_submit(q);
}
- xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
+ if (!job->is_ulls || job->is_ulls_first || num_g2h)
+ xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
if (extra_submit) {
len = 0;
@@ -2000,6 +2006,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
struct xe_guc_exec_queue *ge;
long timeout;
int err, i;
+ int max_jobs = (xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES);
xe_gt_assert(guc_to_gt(guc), xe_device_uc_enabled(guc_to_xe(guc)));
@@ -2029,8 +2036,15 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
submit_wq = primary->guc->sched.base.submit_wq;
}
+ if (q->vm && q->vm->flags & XE_VM_FLAG_MIGRATION) {
+ xe_assert(guc_to_xe(guc),
+ LRC_MIGRATION_ULLS_SEMAPORE_COUNT - 1 < max_jobs);
+
+ max_jobs = LRC_MIGRATION_ULLS_SEMAPORE_COUNT - 1;
+ }
+
err = xe_sched_init(&ge->sched, &drm_sched_ops, &xe_sched_ops,
- submit_wq, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
+ submit_wq, max_jobs, 64,
timeout, guc_to_gt(guc)->ordered_wq, NULL,
q->name, gt_to_xe(q->gt)->drm.dev);
if (err)
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v3 24/25] drm/xe: Enter ULLS for migration jobs upon page fault or SVM prefetch
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (22 preceding siblings ...)
2026-02-28 1:34 ` [PATCH v3 23/25] drm/xe: Add ULLS migration job support to GuC submission Matthew Brost
@ 2026-02-28 1:35 ` Matthew Brost
2026-02-28 1:35 ` [PATCH v3 25/25] drm/xe: Add modparam to enable / disable ULLS on migrate queue Matthew Brost
` (6 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:35 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Call xe_migrate_ulls_enter upon page fault or SVM prefetch in an
effort to speed up these critical paths.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_pagefault.c | 3 +++
drivers/gpu/drm/xe/xe_vm.c | 4 +++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index ea4857acf28d..a025d2cbb021 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -14,6 +14,7 @@
#include "xe_gt_types.h"
#include "xe_gt_stats.h"
#include "xe_hw_engine.h"
+#include "xe_migrate.h"
#include "xe_pagefault.h"
#include "xe_pagefault_types.h"
#include "xe_svm.h"
@@ -171,6 +172,8 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
if (IS_ERR(vm))
return PTR_ERR(vm);
+ xe_migrate_ulls_enter(gt_to_tile(gt)->migrate);
+
/*
* TODO: Change to read lock? Using write lock for simplicity.
*/
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 931d46696811..9103ce82aacf 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2366,8 +2366,10 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_vma_ops *vops,
ctx.devmem_possible = IS_DGFX(vm->xe) &&
IS_ENABLED(CONFIG_DRM_XE_PAGEMAP);
- for_each_tile(tile, vm->xe, id)
+ for_each_tile(tile, vm->xe, id) {
+ xe_migrate_ulls_enter(tile->migrate);
tile_mask |= 0x1 << id;
+ }
xa_init_flags(&op->prefetch_range.range, XA_FLAGS_ALLOC);
op->prefetch_range.ranges_count = 0;
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v3 25/25] drm/xe: Add modparam to enable / disable ULLS on migrate queue
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (23 preceding siblings ...)
2026-02-28 1:35 ` [PATCH v3 24/25] drm/xe: Enter ULLS for migration jobs upon page fault or SVM prefetch Matthew Brost
@ 2026-02-28 1:35 ` Matthew Brost
2026-03-05 22:59 ` Summers, Stuart
2026-02-28 1:43 ` ✗ CI.checkpatch: warning for CPU binds and ULLS on migration queue (rev3) Patchwork
` (5 subsequent siblings)
30 siblings, 1 reply; 63+ messages in thread
From: Matthew Brost @ 2026-02-28 1:35 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Having a modparam to enable / disable ULLS on the migrate queue will help
with quick experiments.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_debugfs.c | 1 +
drivers/gpu/drm/xe/xe_defaults.h | 1 +
drivers/gpu/drm/xe/xe_device.c | 12 +++++++++---
drivers/gpu/drm/xe/xe_device_types.h | 5 +++++
drivers/gpu/drm/xe/xe_migrate.c | 2 +-
drivers/gpu/drm/xe/xe_module.c | 4 ++++
drivers/gpu/drm/xe/xe_module.h | 1 +
7 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
index 844cfafe1ec7..049389205b3f 100644
--- a/drivers/gpu/drm/xe/xe_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_debugfs.c
@@ -85,6 +85,7 @@ static int info(struct seq_file *m, void *data)
drm_printf(&p, "tile_count %d\n", xe->info.tile_count);
drm_printf(&p, "vm_max_level %d\n", xe->info.vm_max_level);
drm_printf(&p, "force_execlist %s\n", str_yes_no(xe->info.force_execlist));
+ drm_printf(&p, "ulls_enable %s\n", str_yes_no(xe->info.ulls_enable));
drm_printf(&p, "has_flat_ccs %s\n", str_yes_no(xe->info.has_flat_ccs));
drm_printf(&p, "has_usm %s\n", str_yes_no(xe->info.has_usm));
drm_printf(&p, "skip_guc_pc %s\n", str_yes_no(xe->info.skip_guc_pc));
diff --git a/drivers/gpu/drm/xe/xe_defaults.h b/drivers/gpu/drm/xe/xe_defaults.h
index c8ae1d5f3d60..299360546283 100644
--- a/drivers/gpu/drm/xe/xe_defaults.h
+++ b/drivers/gpu/drm/xe/xe_defaults.h
@@ -14,6 +14,7 @@
#endif
#define XE_DEFAULT_PROBE_DISPLAY IS_ENABLED(CONFIG_DRM_XE_DISPLAY)
+#define XE_DEFAULT_ULLS_ENABLE true
#define XE_DEFAULT_VRAM_BAR_SIZE 0
#define XE_DEFAULT_FORCE_PROBE CONFIG_DRM_XE_FORCE_PROBE
#define XE_DEFAULT_MAX_VFS ~0
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index b7ad7f97e68c..18af003c95c5 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -437,6 +437,14 @@ static void xe_device_destroy(struct drm_device *dev, void *dummy)
ttm_device_fini(&xe->ttm);
}
+static void xe_device_parse_modparam(struct xe_device *xe)
+{
+ xe->info.force_execlist = xe_modparam.force_execlist;
+ xe->info.ulls_enable = xe_modparam.ulls_enable;
+ xe->atomic_svm_timeslice_ms = 5;
+ xe->min_run_period_lr_ms = 5;
+}
+
struct xe_device *xe_device_create(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
@@ -470,9 +478,7 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
xe->info.devid = pdev->device;
xe->info.revid = pdev->revision;
- xe->info.force_execlist = xe_modparam.force_execlist;
- xe->atomic_svm_timeslice_ms = 5;
- xe->min_run_period_lr_ms = 5;
+ xe_device_parse_modparam(xe);
err = xe_irq_init(xe);
if (err)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index b3737dfcc45c..a20ff1707227 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -231,6 +231,11 @@ struct xe_device {
u8 skip_pcode:1;
/** @info.needs_shared_vf_gt_wq: needs shared GT WQ on VF */
u8 needs_shared_vf_gt_wq:1;
+ /**
+ * @info.ulls_enable: Enable ULLS on migration queue in LR VM
+ * open
+ */
+ u8 ulls_enable:1;
} info;
/** @wa_active: keep track of active workarounds */
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 62f27868f56b..9f02e238e7c6 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -447,7 +447,7 @@ void xe_migrate_ulls_enter(struct xe_migrate *m)
xe_assert(xe, xe->info.has_usm);
- if (!IS_DGFX(xe) || IS_SRIOV_VF(xe))
+ if (!IS_DGFX(xe) || IS_SRIOV_VF(xe) || !xe->info.ulls_enable)
return;
job_alloc:
diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 4cb578182912..bb4fb967aec9 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -22,6 +22,7 @@
struct xe_modparam xe_modparam = {
.probe_display = XE_DEFAULT_PROBE_DISPLAY,
+ .ulls_enable = XE_DEFAULT_ULLS_ENABLE,
.guc_log_level = XE_DEFAULT_GUC_LOG_LEVEL,
.force_probe = XE_DEFAULT_FORCE_PROBE,
#ifdef CONFIG_PCI_IOV
@@ -45,6 +46,9 @@ MODULE_PARM_DESC(probe_display, "Probe display HW, otherwise it's left untouched
"[default=" __stringify(XE_DEFAULT_PROBE_DISPLAY) "])");
#endif
+module_param_named(ulls_enable, xe_modparam.ulls_enable, bool, 0444);
+MODULE_PARM_DESC(ulls_enable, "Enable ULLS on migration queue if LR VM open (default: true)");
+
module_param_named(vram_bar_size, xe_modparam.force_vram_bar_size, int, 0600);
MODULE_PARM_DESC(vram_bar_size, "Set the vram bar size in MiB (<0=disable-resize, 0=max-needed-size, >0=force-size "
"[default=" __stringify(XE_DEFAULT_VRAM_BAR_SIZE) "])");
diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h
index 79cb9639c0f3..f0220b694c40 100644
--- a/drivers/gpu/drm/xe/xe_module.h
+++ b/drivers/gpu/drm/xe/xe_module.h
@@ -12,6 +12,7 @@
struct xe_modparam {
bool force_execlist;
bool probe_display;
+ bool ulls_enable;
int force_vram_bar_size;
int guc_log_level;
char *guc_firmware_path;
--
2.34.1
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [PATCH v3 25/25] drm/xe: Add modparam to enable / disable ULLS on migrate queue
2026-02-28 1:35 ` [PATCH v3 25/25] drm/xe: Add modparam to enable / disable ULLS on migrate queue Matthew Brost
@ 2026-03-05 22:59 ` Summers, Stuart
2026-04-01 22:44 ` Matthew Brost
0 siblings, 1 reply; 63+ messages in thread
From: Summers, Stuart @ 2026-03-05 22:59 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: Ghimiray, Himal Prasad, Yadav, Arvind,
thomas.hellstrom@linux.intel.com, Dugast, Francois
On Fri, 2026-02-27 at 17:35 -0800, Matthew Brost wrote:
> Having modparam to enable / disable ULLS on migrate queue will help
> with
> quick experiments.
Can we do this in configfs instead?
Thanks,
Stuart
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_debugfs.c | 1 +
> drivers/gpu/drm/xe/xe_defaults.h | 1 +
> drivers/gpu/drm/xe/xe_device.c | 12 +++++++++---
> drivers/gpu/drm/xe/xe_device_types.h | 5 +++++
> drivers/gpu/drm/xe/xe_migrate.c | 2 +-
> drivers/gpu/drm/xe/xe_module.c | 4 ++++
> drivers/gpu/drm/xe/xe_module.h | 1 +
> 7 files changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c
> b/drivers/gpu/drm/xe/xe_debugfs.c
> index 844cfafe1ec7..049389205b3f 100644
> --- a/drivers/gpu/drm/xe/xe_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
> @@ -85,6 +85,7 @@ static int info(struct seq_file *m, void *data)
> drm_printf(&p, "tile_count %d\n", xe->info.tile_count);
> drm_printf(&p, "vm_max_level %d\n", xe->info.vm_max_level);
> drm_printf(&p, "force_execlist %s\n", str_yes_no(xe-
> >info.force_execlist));
> + drm_printf(&p, "ulls_enable %s\n", str_yes_no(xe-
> >info.ulls_enable));
> drm_printf(&p, "has_flat_ccs %s\n", str_yes_no(xe-
> >info.has_flat_ccs));
> drm_printf(&p, "has_usm %s\n", str_yes_no(xe->info.has_usm));
> drm_printf(&p, "skip_guc_pc %s\n", str_yes_no(xe-
> >info.skip_guc_pc));
> diff --git a/drivers/gpu/drm/xe/xe_defaults.h
> b/drivers/gpu/drm/xe/xe_defaults.h
> index c8ae1d5f3d60..299360546283 100644
> --- a/drivers/gpu/drm/xe/xe_defaults.h
> +++ b/drivers/gpu/drm/xe/xe_defaults.h
> @@ -14,6 +14,7 @@
> #endif
>
> #define
> XE_DEFAULT_PROBE_DISPLAY IS_ENABLED(CONFIG_DRM_XE_DISPL
> AY)
> +#define XE_DEFAULT_ULLS_ENABLE true
> #define XE_DEFAULT_VRAM_BAR_SIZE 0
> #define
> XE_DEFAULT_FORCE_PROBE CONFIG_DRM_XE_FORCE_PROBE
> #define XE_DEFAULT_MAX_VFS ~0
> diff --git a/drivers/gpu/drm/xe/xe_device.c
> b/drivers/gpu/drm/xe/xe_device.c
> index b7ad7f97e68c..18af003c95c5 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -437,6 +437,14 @@ static void xe_device_destroy(struct drm_device
> *dev, void *dummy)
> ttm_device_fini(&xe->ttm);
> }
>
> +static void xe_device_parse_modparam(struct xe_device *xe)
> +{
> + xe->info.force_execlist = xe_modparam.force_execlist;
> + xe->info.ulls_enable = xe_modparam.ulls_enable;
> + xe->atomic_svm_timeslice_ms = 5;
> + xe->min_run_period_lr_ms = 5;
> +}
> +
> struct xe_device *xe_device_create(struct pci_dev *pdev,
> const struct pci_device_id *ent)
> {
> @@ -470,9 +478,7 @@ struct xe_device *xe_device_create(struct pci_dev
> *pdev,
>
> xe->info.devid = pdev->device;
> xe->info.revid = pdev->revision;
> - xe->info.force_execlist = xe_modparam.force_execlist;
> - xe->atomic_svm_timeslice_ms = 5;
> - xe->min_run_period_lr_ms = 5;
> + xe_device_parse_modparam(xe);
>
> err = xe_irq_init(xe);
> if (err)
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> b/drivers/gpu/drm/xe/xe_device_types.h
> index b3737dfcc45c..a20ff1707227 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -231,6 +231,11 @@ struct xe_device {
> u8 skip_pcode:1;
> /** @info.needs_shared_vf_gt_wq: needs shared GT WQ
> on VF */
> u8 needs_shared_vf_gt_wq:1;
> + /**
> + * @info.ulls_enable: Enable ULLS on migration queue
> in LR VM
> + * open
> + */
> + u8 ulls_enable:1;
> } info;
>
> /** @wa_active: keep track of active workarounds */
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> b/drivers/gpu/drm/xe/xe_migrate.c
> index 62f27868f56b..9f02e238e7c6 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -447,7 +447,7 @@ void xe_migrate_ulls_enter(struct xe_migrate *m)
>
> xe_assert(xe, xe->info.has_usm);
>
> - if (!IS_DGFX(xe) || IS_SRIOV_VF(xe))
> + if (!IS_DGFX(xe) || IS_SRIOV_VF(xe) || !xe->info.ulls_enable)
> return;
>
> job_alloc:
> diff --git a/drivers/gpu/drm/xe/xe_module.c
> b/drivers/gpu/drm/xe/xe_module.c
> index 4cb578182912..bb4fb967aec9 100644
> --- a/drivers/gpu/drm/xe/xe_module.c
> +++ b/drivers/gpu/drm/xe/xe_module.c
> @@ -22,6 +22,7 @@
>
> struct xe_modparam xe_modparam = {
> .probe_display = XE_DEFAULT_PROBE_DISPLAY,
> + .ulls_enable = XE_DEFAULT_ULLS_ENABLE,
> .guc_log_level = XE_DEFAULT_GUC_LOG_LEVEL,
> .force_probe = XE_DEFAULT_FORCE_PROBE,
> #ifdef CONFIG_PCI_IOV
> @@ -45,6 +46,9 @@ MODULE_PARM_DESC(probe_display, "Probe display HW,
> otherwise it's left untouched
> "[default=" __stringify(XE_DEFAULT_PROBE_DISPLAY)
> "])");
> #endif
>
> +module_param_named(ulls_enable, xe_modparam.ulls_enable, bool,
> 0444);
> +MODULE_PARM_DESC(ulls_enable, "Enable ULLS on migration queue if LR
> VM open (default: true)");
> +
> module_param_named(vram_bar_size, xe_modparam.force_vram_bar_size,
> int, 0600);
> MODULE_PARM_DESC(vram_bar_size, "Set the vram bar size in MiB
> (<0=disable-resize, 0=max-needed-size, >0=force-size "
> "[default=" __stringify(XE_DEFAULT_VRAM_BAR_SIZE)
> "])");
> diff --git a/drivers/gpu/drm/xe/xe_module.h
> b/drivers/gpu/drm/xe/xe_module.h
> index 79cb9639c0f3..f0220b694c40 100644
> --- a/drivers/gpu/drm/xe/xe_module.h
> +++ b/drivers/gpu/drm/xe/xe_module.h
> @@ -12,6 +12,7 @@
> struct xe_modparam {
> bool force_execlist;
> bool probe_display;
> + bool ulls_enable;
> int force_vram_bar_size;
> int guc_log_level;
> char *guc_firmware_path;
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 25/25] drm/xe: Add modparam to enable / disable ULLS on migrate queue
2026-03-05 22:59 ` Summers, Stuart
@ 2026-04-01 22:44 ` Matthew Brost
0 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-04-01 22:44 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, Ghimiray, Himal Prasad,
Yadav, Arvind, thomas.hellstrom@linux.intel.com, Dugast, Francois
On Thu, Mar 05, 2026 at 03:59:24PM -0700, Summers, Stuart wrote:
> On Fri, 2026-02-27 at 17:35 -0800, Matthew Brost wrote:
> > Having modparam to enable / disable ULLS on migrate queue will help
> > with
> > quick experiments.
>
> Can we do this in configfs instead?
>
We could... I don't know about the guidance rules on configfs though -
e.g., is that ABI? Modparams definitely are not... I don't really want
anything to be ABI here - I'd rather just drop this than add ABI.
Matt
> Thanks,
> Stuart
>
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_debugfs.c | 1 +
> > drivers/gpu/drm/xe/xe_defaults.h | 1 +
> > drivers/gpu/drm/xe/xe_device.c | 12 +++++++++---
> > drivers/gpu/drm/xe/xe_device_types.h | 5 +++++
> > drivers/gpu/drm/xe/xe_migrate.c | 2 +-
> > drivers/gpu/drm/xe/xe_module.c | 4 ++++
> > drivers/gpu/drm/xe/xe_module.h | 1 +
> > 7 files changed, 22 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_debugfs.c
> > b/drivers/gpu/drm/xe/xe_debugfs.c
> > index 844cfafe1ec7..049389205b3f 100644
> > --- a/drivers/gpu/drm/xe/xe_debugfs.c
> > +++ b/drivers/gpu/drm/xe/xe_debugfs.c
> > @@ -85,6 +85,7 @@ static int info(struct seq_file *m, void *data)
> > drm_printf(&p, "tile_count %d\n", xe->info.tile_count);
> > drm_printf(&p, "vm_max_level %d\n", xe->info.vm_max_level);
> > drm_printf(&p, "force_execlist %s\n", str_yes_no(xe-
> > >info.force_execlist));
> > + drm_printf(&p, "ulls_enable %s\n", str_yes_no(xe-
> > >info.ulls_enable));
> > drm_printf(&p, "has_flat_ccs %s\n", str_yes_no(xe-
> > >info.has_flat_ccs));
> > drm_printf(&p, "has_usm %s\n", str_yes_no(xe->info.has_usm));
> > drm_printf(&p, "skip_guc_pc %s\n", str_yes_no(xe-
> > >info.skip_guc_pc));
> > diff --git a/drivers/gpu/drm/xe/xe_defaults.h
> > b/drivers/gpu/drm/xe/xe_defaults.h
> > index c8ae1d5f3d60..299360546283 100644
> > --- a/drivers/gpu/drm/xe/xe_defaults.h
> > +++ b/drivers/gpu/drm/xe/xe_defaults.h
> > @@ -14,6 +14,7 @@
> > #endif
> >
> > #define
> > XE_DEFAULT_PROBE_DISPLAY IS_ENABLED(CONFIG_DRM_XE_DISPL
> > AY)
> > +#define XE_DEFAULT_ULLS_ENABLE true
> > #define XE_DEFAULT_VRAM_BAR_SIZE 0
> > #define
> > XE_DEFAULT_FORCE_PROBE CONFIG_DRM_XE_FORCE_PROBE
> > #define XE_DEFAULT_MAX_VFS ~0
> > diff --git a/drivers/gpu/drm/xe/xe_device.c
> > b/drivers/gpu/drm/xe/xe_device.c
> > index b7ad7f97e68c..18af003c95c5 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -437,6 +437,14 @@ static void xe_device_destroy(struct drm_device
> > *dev, void *dummy)
> > ttm_device_fini(&xe->ttm);
> > }
> >
> > +static void xe_device_parse_modparam(struct xe_device *xe)
> > +{
> > + xe->info.force_execlist = xe_modparam.force_execlist;
> > + xe->info.ulls_enable = xe_modparam.ulls_enable;
> > + xe->atomic_svm_timeslice_ms = 5;
> > + xe->min_run_period_lr_ms = 5;
> > +}
> > +
> > struct xe_device *xe_device_create(struct pci_dev *pdev,
> > const struct pci_device_id *ent)
> > {
> > @@ -470,9 +478,7 @@ struct xe_device *xe_device_create(struct pci_dev
> > *pdev,
> >
> > xe->info.devid = pdev->device;
> > xe->info.revid = pdev->revision;
> > - xe->info.force_execlist = xe_modparam.force_execlist;
> > - xe->atomic_svm_timeslice_ms = 5;
> > - xe->min_run_period_lr_ms = 5;
> > + xe_device_parse_modparam(xe);
> >
> > err = xe_irq_init(xe);
> > if (err)
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> > b/drivers/gpu/drm/xe/xe_device_types.h
> > index b3737dfcc45c..a20ff1707227 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -231,6 +231,11 @@ struct xe_device {
> > u8 skip_pcode:1;
> > /** @info.needs_shared_vf_gt_wq: needs shared GT WQ
> > on VF */
> > u8 needs_shared_vf_gt_wq:1;
> > + /**
> > + * @info.ulls_enable: Enable ULLS on migration queue
> > in LR VM
> > + * open
> > + */
> > + u8 ulls_enable:1;
> > } info;
> >
> > /** @wa_active: keep track of active workarounds */
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > b/drivers/gpu/drm/xe/xe_migrate.c
> > index 62f27868f56b..9f02e238e7c6 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -447,7 +447,7 @@ void xe_migrate_ulls_enter(struct xe_migrate *m)
> >
> > xe_assert(xe, xe->info.has_usm);
> >
> > - if (!IS_DGFX(xe) || IS_SRIOV_VF(xe))
> > + if (!IS_DGFX(xe) || IS_SRIOV_VF(xe) || !xe->info.ulls_enable)
> > return;
> >
> > job_alloc:
> > diff --git a/drivers/gpu/drm/xe/xe_module.c
> > b/drivers/gpu/drm/xe/xe_module.c
> > index 4cb578182912..bb4fb967aec9 100644
> > --- a/drivers/gpu/drm/xe/xe_module.c
> > +++ b/drivers/gpu/drm/xe/xe_module.c
> > @@ -22,6 +22,7 @@
> >
> > struct xe_modparam xe_modparam = {
> > .probe_display = XE_DEFAULT_PROBE_DISPLAY,
> > + .ulls_enable = XE_DEFAULT_ULLS_ENABLE,
> > .guc_log_level = XE_DEFAULT_GUC_LOG_LEVEL,
> > .force_probe = XE_DEFAULT_FORCE_PROBE,
> > #ifdef CONFIG_PCI_IOV
> > @@ -45,6 +46,9 @@ MODULE_PARM_DESC(probe_display, "Probe display HW,
> > otherwise it's left untouched
> > "[default=" __stringify(XE_DEFAULT_PROBE_DISPLAY)
> > "])");
> > #endif
> >
> > +module_param_named(ulls_enable, xe_modparam.ulls_enable, bool,
> > 0444);
> > +MODULE_PARM_DESC(ulls_enable, "Enable ULLS on migration queue if LR
> > VM open (default: true)");
> > +
> > module_param_named(vram_bar_size, xe_modparam.force_vram_bar_size,
> > int, 0600);
> > MODULE_PARM_DESC(vram_bar_size, "Set the vram bar size in MiB
> > (<0=disable-resize, 0=max-needed-size, >0=force-size "
> > "[default=" __stringify(XE_DEFAULT_VRAM_BAR_SIZE)
> > "])");
> > diff --git a/drivers/gpu/drm/xe/xe_module.h
> > b/drivers/gpu/drm/xe/xe_module.h
> > index 79cb9639c0f3..f0220b694c40 100644
> > --- a/drivers/gpu/drm/xe/xe_module.h
> > +++ b/drivers/gpu/drm/xe/xe_module.h
> > @@ -12,6 +12,7 @@
> > struct xe_modparam {
> > bool force_execlist;
> > bool probe_display;
> > + bool ulls_enable;
> > int force_vram_bar_size;
> > int guc_log_level;
> > char *guc_firmware_path;
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* ✗ CI.checkpatch: warning for CPU binds and ULLS on migration queue (rev3)
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (24 preceding siblings ...)
2026-02-28 1:35 ` [PATCH v3 25/25] drm/xe: Add modparam to enable / disable ULLS on migrate queue Matthew Brost
@ 2026-02-28 1:43 ` Patchwork
2026-02-28 1:44 ` ✓ CI.KUnit: success " Patchwork
` (4 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Patchwork @ 2026-02-28 1:43 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
== Series Details ==
Series: CPU binds and ULLS on migration queue (rev3)
URL : https://patchwork.freedesktop.org/series/149888/
State : warning
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
1f57ba1afceae32108bd24770069f764d940a0e4
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 531eaee59c3ac68792e045084aa7d5ae6a0b0ea6
Author: Matthew Brost <matthew.brost@intel.com>
Date: Fri Feb 27 17:35:01 2026 -0800
drm/xe: Add modparam to enable / disable ULLS on migrate queue
Having modparam to enable / disable ULLS on migrate queue will help with
quick experiments.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
+ /mt/dim checkpatch a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1 drm-intel
c6d7a5d86f5a drm/xe: Drop struct xe_migrate_pt_update argument from populate/clear vfuns
03799a61f336 drm/xe: Add xe_migrate_update_pgtables_cpu_execute helper
7e1e3cd4526c drm/xe: Decouple exec queue idle check from LRC
06c680c5a5e4 drm/xe: Add job count to GuC exec queue snapshot
9b9d70d7490a drm/xe: Update xe_bo_put_deferred arguments to include writeback flag
ac11091a4b48 drm/xe: Add XE_BO_FLAG_PUT_VM_ASYNC
71e369db81f3 drm/xe: Update scheduler job layer to support PT jobs
84b14a4ca5b4 drm/xe: Add helpers to access PT ops
549a82d6b216 drm/xe: Add struct xe_pt_job_ops
-:177: WARNING:ALLOC_WITH_SIZEOF: Prefer kmalloc_obj over kmalloc with sizeof
#177: FILE: drivers/gpu/drm/xe/xe_pt.c:2769:
+ pt_job_ops = kmalloc(sizeof(*pt_job_ops), GFP_KERNEL);
total: 0 errors, 1 warnings, 0 checks, 296 lines checked
6a327420218d drm/xe: Update GuC submission backend to run PT jobs
-:122: CHECK:LINE_SPACING: Please don't use multiple blank lines
#122: FILE: drivers/gpu/drm/xe/xe_migrate.h:161:
+
total: 0 errors, 0 warnings, 1 checks, 99 lines checked
b856a54d5750 drm/xe: Store level in struct xe_vm_pgtable_update
53b3d37165f1 drm/xe: Don't use migrate exec queue for page fault binds
a627e2044eca drm/xe: Enable CPU binds for jobs
4435208ab89b drm/xe: Remove unused arguments from xe_migrate_pt_update_ops
9a118a617834 drm/xe: Make bind queues operate cross-tile
-:304: CHECK:MACRO_ARG_REUSE: Macro argument reuse '__i' - possible side-effects?
#304: FILE: drivers/gpu/drm/xe/xe_exec_queue.h:17:
+#define for_each_tlb_inval(__q, __i) \
+ for (__i = 0; __i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++__i) \
+ for_each_if((__q)->tlb_inval[__i].dep_scheduler)
total: 0 errors, 0 warnings, 1 checks, 630 lines checked
c91664d91904 drm/xe: Add CPU bind layer
-:32: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#32:
new file mode 100644
total: 0 errors, 1 warnings, 0 checks, 2335 lines checked
e7a0e614bea1 drm/xe: Add device flag to enable PT mirroring across tiles
a2c648d76671 drm/xe: Add xe_hw_engine_write_ring_tail
31a9b7b87b03 drm/xe: Add ULLS support to LRC
c5f1ef0cc157 drm/xe: Add ULLS migration job support to migration layer
9c4f8e2efd57 drm/xe: Add MI_SEMAPHORE_WAIT instruction defs
1cd235bcabf8 drm/xe: Add ULLS migration job support to ring ops
118337e217f7 drm/xe: Add ULLS migration job support to GuC submission
582f957830f0 drm/xe: Enter ULLS for migration jobs upon page fault or SVM prefetch
531eaee59c3a drm/xe: Add modparam to enable / disable ULLS on migrate queue
^ permalink raw reply [flat|nested] 63+ messages in thread* ✓ CI.KUnit: success for CPU binds and ULLS on migration queue (rev3)
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (25 preceding siblings ...)
2026-02-28 1:43 ` ✗ CI.checkpatch: warning for CPU binds and ULLS on migration queue (rev3) Patchwork
@ 2026-02-28 1:44 ` Patchwork
2026-02-28 2:32 ` ✓ Xe.CI.BAT: " Patchwork
` (3 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Patchwork @ 2026-02-28 1:44 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
== Series Details ==
Series: CPU binds and ULLS on migration queue (rev3)
URL : https://patchwork.freedesktop.org/series/149888/
State : success
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[01:43:40] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[01:43:45] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[01:44:15] Starting KUnit Kernel (1/1)...
[01:44:15] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[01:44:15] ================== guc_buf (11 subtests) ===================
[01:44:15] [PASSED] test_smallest
[01:44:15] [PASSED] test_largest
[01:44:15] [PASSED] test_granular
[01:44:15] [PASSED] test_unique
[01:44:15] [PASSED] test_overlap
[01:44:15] [PASSED] test_reusable
[01:44:15] [PASSED] test_too_big
[01:44:15] [PASSED] test_flush
[01:44:15] [PASSED] test_lookup
[01:44:15] [PASSED] test_data
[01:44:15] [PASSED] test_class
[01:44:15] ===================== [PASSED] guc_buf =====================
[01:44:15] =================== guc_dbm (7 subtests) ===================
[01:44:15] [PASSED] test_empty
[01:44:15] [PASSED] test_default
[01:44:15] ======================== test_size ========================
[01:44:15] [PASSED] 4
[01:44:15] [PASSED] 8
[01:44:15] [PASSED] 32
[01:44:15] [PASSED] 256
[01:44:15] ==================== [PASSED] test_size ====================
[01:44:15] ======================= test_reuse ========================
[01:44:15] [PASSED] 4
[01:44:15] [PASSED] 8
[01:44:15] [PASSED] 32
[01:44:15] [PASSED] 256
[01:44:15] =================== [PASSED] test_reuse ====================
[01:44:15] =================== test_range_overlap ====================
[01:44:15] [PASSED] 4
[01:44:15] [PASSED] 8
[01:44:15] [PASSED] 32
[01:44:15] [PASSED] 256
[01:44:15] =============== [PASSED] test_range_overlap ================
[01:44:15] =================== test_range_compact ====================
[01:44:15] [PASSED] 4
[01:44:15] [PASSED] 8
[01:44:15] [PASSED] 32
[01:44:15] [PASSED] 256
[01:44:15] =============== [PASSED] test_range_compact ================
[01:44:15] ==================== test_range_spare =====================
[01:44:15] [PASSED] 4
[01:44:15] [PASSED] 8
[01:44:15] [PASSED] 32
[01:44:15] [PASSED] 256
[01:44:15] ================ [PASSED] test_range_spare =================
[01:44:15] ===================== [PASSED] guc_dbm =====================
[01:44:15] =================== guc_idm (6 subtests) ===================
[01:44:15] [PASSED] bad_init
[01:44:15] [PASSED] no_init
[01:44:15] [PASSED] init_fini
[01:44:15] [PASSED] check_used
[01:44:15] [PASSED] check_quota
[01:44:15] [PASSED] check_all
[01:44:15] ===================== [PASSED] guc_idm =====================
[01:44:15] ================== no_relay (3 subtests) ===================
[01:44:15] [PASSED] xe_drops_guc2pf_if_not_ready
[01:44:15] [PASSED] xe_drops_guc2vf_if_not_ready
[01:44:15] [PASSED] xe_rejects_send_if_not_ready
[01:44:15] ==================== [PASSED] no_relay =====================
[01:44:15] ================== pf_relay (14 subtests) ==================
[01:44:15] [PASSED] pf_rejects_guc2pf_too_short
[01:44:15] [PASSED] pf_rejects_guc2pf_too_long
[01:44:15] [PASSED] pf_rejects_guc2pf_no_payload
[01:44:15] [PASSED] pf_fails_no_payload
[01:44:15] [PASSED] pf_fails_bad_origin
[01:44:15] [PASSED] pf_fails_bad_type
[01:44:15] [PASSED] pf_txn_reports_error
[01:44:15] [PASSED] pf_txn_sends_pf2guc
[01:44:15] [PASSED] pf_sends_pf2guc
[01:44:15] [SKIPPED] pf_loopback_nop
[01:44:15] [SKIPPED] pf_loopback_echo
[01:44:15] [SKIPPED] pf_loopback_fail
[01:44:15] [SKIPPED] pf_loopback_busy
[01:44:15] [SKIPPED] pf_loopback_retry
[01:44:15] ==================== [PASSED] pf_relay =====================
[01:44:15] ================== vf_relay (3 subtests) ===================
[01:44:15] [PASSED] vf_rejects_guc2vf_too_short
[01:44:15] [PASSED] vf_rejects_guc2vf_too_long
[01:44:15] [PASSED] vf_rejects_guc2vf_no_payload
[01:44:15] ==================== [PASSED] vf_relay =====================
[01:44:15] ================ pf_gt_config (9 subtests) =================
[01:44:15] [PASSED] fair_contexts_1vf
[01:44:15] [PASSED] fair_doorbells_1vf
[01:44:15] [PASSED] fair_ggtt_1vf
[01:44:15] ====================== fair_vram_1vf ======================
[01:44:15] [PASSED] 3.50 GiB
[01:44:15] [PASSED] 11.5 GiB
[01:44:15] [PASSED] 15.5 GiB
[01:44:15] [PASSED] 31.5 GiB
[01:44:15] [PASSED] 63.5 GiB
[01:44:15] [PASSED] 13.9 GiB
[01:44:15] ================== [PASSED] fair_vram_1vf ==================
[01:44:15] ================ fair_vram_1vf_admin_only =================
[01:44:15] [PASSED] 3.50 GiB
[01:44:15] [PASSED] 11.5 GiB
[01:44:15] [PASSED] 15.5 GiB
[01:44:15] [PASSED] 31.5 GiB
[01:44:15] [PASSED] 63.5 GiB
[01:44:15] [PASSED] 13.9 GiB
[01:44:15] ============ [PASSED] fair_vram_1vf_admin_only =============
[01:44:15] ====================== fair_contexts ======================
[01:44:15] [PASSED] 1 VF
[01:44:15] [PASSED] 2 VFs
[01:44:15] [PASSED] 3 VFs
[01:44:15] [PASSED] 4 VFs
[01:44:15] [PASSED] 5 VFs
[01:44:15] [PASSED] 6 VFs
[01:44:15] [PASSED] 7 VFs
[01:44:15] [PASSED] 8 VFs
[01:44:15] [PASSED] 9 VFs
[01:44:15] [PASSED] 10 VFs
[01:44:15] [PASSED] 11 VFs
[01:44:15] [PASSED] 12 VFs
[01:44:15] [PASSED] 13 VFs
[01:44:15] [PASSED] 14 VFs
[01:44:15] [PASSED] 15 VFs
[01:44:15] [PASSED] 16 VFs
[01:44:15] [PASSED] 17 VFs
[01:44:15] [PASSED] 18 VFs
[01:44:15] [PASSED] 19 VFs
[01:44:15] [PASSED] 20 VFs
[01:44:16] [PASSED] 21 VFs
[01:44:16] [PASSED] 22 VFs
[01:44:16] [PASSED] 23 VFs
[01:44:16] [PASSED] 24 VFs
[01:44:16] [PASSED] 25 VFs
[01:44:16] [PASSED] 26 VFs
[01:44:16] [PASSED] 27 VFs
[01:44:16] [PASSED] 28 VFs
[01:44:16] [PASSED] 29 VFs
[01:44:16] [PASSED] 30 VFs
[01:44:16] [PASSED] 31 VFs
[01:44:16] [PASSED] 32 VFs
[01:44:16] [PASSED] 33 VFs
[01:44:16] [PASSED] 34 VFs
[01:44:16] [PASSED] 35 VFs
[01:44:16] [PASSED] 36 VFs
[01:44:16] [PASSED] 37 VFs
[01:44:16] [PASSED] 38 VFs
[01:44:16] [PASSED] 39 VFs
[01:44:16] [PASSED] 40 VFs
[01:44:16] [PASSED] 41 VFs
[01:44:16] [PASSED] 42 VFs
[01:44:16] [PASSED] 43 VFs
[01:44:16] [PASSED] 44 VFs
[01:44:16] [PASSED] 45 VFs
[01:44:16] [PASSED] 46 VFs
[01:44:16] [PASSED] 47 VFs
[01:44:16] [PASSED] 48 VFs
[01:44:16] [PASSED] 49 VFs
[01:44:16] [PASSED] 50 VFs
[01:44:16] [PASSED] 51 VFs
[01:44:16] [PASSED] 52 VFs
[01:44:16] [PASSED] 53 VFs
[01:44:16] [PASSED] 54 VFs
[01:44:16] [PASSED] 55 VFs
[01:44:16] [PASSED] 56 VFs
[01:44:16] [PASSED] 57 VFs
[01:44:16] [PASSED] 58 VFs
[01:44:16] [PASSED] 59 VFs
[01:44:16] [PASSED] 60 VFs
[01:44:16] [PASSED] 61 VFs
[01:44:16] [PASSED] 62 VFs
[01:44:16] [PASSED] 63 VFs
[01:44:16] ================== [PASSED] fair_contexts ==================
[01:44:16] ===================== fair_doorbells ======================
[01:44:16] [PASSED] 1 VF
[01:44:16] [PASSED] 2 VFs
[01:44:16] [PASSED] 3 VFs
[01:44:16] [PASSED] 4 VFs
[01:44:16] [PASSED] 5 VFs
[01:44:16] [PASSED] 6 VFs
[01:44:16] [PASSED] 7 VFs
[01:44:16] [PASSED] 8 VFs
[01:44:16] [PASSED] 9 VFs
[01:44:16] [PASSED] 10 VFs
[01:44:16] [PASSED] 11 VFs
[01:44:16] [PASSED] 12 VFs
[01:44:16] [PASSED] 13 VFs
[01:44:16] [PASSED] 14 VFs
[01:44:16] [PASSED] 15 VFs
[01:44:16] [PASSED] 16 VFs
[01:44:16] [PASSED] 17 VFs
[01:44:16] [PASSED] 18 VFs
[01:44:16] [PASSED] 19 VFs
[01:44:16] [PASSED] 20 VFs
[01:44:16] [PASSED] 21 VFs
[01:44:16] [PASSED] 22 VFs
[01:44:16] [PASSED] 23 VFs
[01:44:16] [PASSED] 24 VFs
[01:44:16] [PASSED] 25 VFs
[01:44:16] [PASSED] 26 VFs
[01:44:16] [PASSED] 27 VFs
[01:44:16] [PASSED] 28 VFs
[01:44:16] [PASSED] 29 VFs
[01:44:16] [PASSED] 30 VFs
[01:44:16] [PASSED] 31 VFs
[01:44:16] [PASSED] 32 VFs
[01:44:16] [PASSED] 33 VFs
[01:44:16] [PASSED] 34 VFs
[01:44:16] [PASSED] 35 VFs
[01:44:16] [PASSED] 36 VFs
[01:44:16] [PASSED] 37 VFs
[01:44:16] [PASSED] 38 VFs
[01:44:16] [PASSED] 39 VFs
[01:44:16] [PASSED] 40 VFs
[01:44:16] [PASSED] 41 VFs
[01:44:16] [PASSED] 42 VFs
[01:44:16] [PASSED] 43 VFs
[01:44:16] [PASSED] 44 VFs
[01:44:16] [PASSED] 45 VFs
[01:44:16] [PASSED] 46 VFs
[01:44:16] [PASSED] 47 VFs
[01:44:16] [PASSED] 48 VFs
[01:44:16] [PASSED] 49 VFs
[01:44:16] [PASSED] 50 VFs
[01:44:16] [PASSED] 51 VFs
[01:44:16] [PASSED] 52 VFs
[01:44:16] [PASSED] 53 VFs
[01:44:16] [PASSED] 54 VFs
[01:44:16] [PASSED] 55 VFs
[01:44:16] [PASSED] 56 VFs
[01:44:16] [PASSED] 57 VFs
[01:44:16] [PASSED] 58 VFs
[01:44:16] [PASSED] 59 VFs
[01:44:16] [PASSED] 60 VFs
[01:44:16] [PASSED] 61 VFs
[01:44:16] [PASSED] 62 VFs
[01:44:16] [PASSED] 63 VFs
[01:44:16] ================= [PASSED] fair_doorbells ==================
[01:44:16] ======================== fair_ggtt ========================
[01:44:16] [PASSED] 1 VF
[01:44:16] [PASSED] 2 VFs
[01:44:16] [PASSED] 3 VFs
[01:44:16] [PASSED] 4 VFs
[01:44:16] [PASSED] 5 VFs
[01:44:16] [PASSED] 6 VFs
[01:44:16] [PASSED] 7 VFs
[01:44:16] [PASSED] 8 VFs
[01:44:16] [PASSED] 9 VFs
[01:44:16] [PASSED] 10 VFs
[01:44:16] [PASSED] 11 VFs
[01:44:16] [PASSED] 12 VFs
[01:44:16] [PASSED] 13 VFs
[01:44:16] [PASSED] 14 VFs
[01:44:16] [PASSED] 15 VFs
[01:44:16] [PASSED] 16 VFs
[01:44:16] [PASSED] 17 VFs
[01:44:16] [PASSED] 18 VFs
[01:44:16] [PASSED] 19 VFs
[01:44:16] [PASSED] 20 VFs
[01:44:16] [PASSED] 21 VFs
[01:44:16] [PASSED] 22 VFs
[01:44:16] [PASSED] 23 VFs
[01:44:16] [PASSED] 24 VFs
[01:44:16] [PASSED] 25 VFs
[01:44:16] [PASSED] 26 VFs
[01:44:16] [PASSED] 27 VFs
[01:44:16] [PASSED] 28 VFs
[01:44:16] [PASSED] 29 VFs
[01:44:16] [PASSED] 30 VFs
[01:44:16] [PASSED] 31 VFs
[01:44:16] [PASSED] 32 VFs
[01:44:16] [PASSED] 33 VFs
[01:44:16] [PASSED] 34 VFs
[01:44:16] [PASSED] 35 VFs
[01:44:16] [PASSED] 36 VFs
[01:44:16] [PASSED] 37 VFs
[01:44:16] [PASSED] 38 VFs
[01:44:16] [PASSED] 39 VFs
[01:44:16] [PASSED] 40 VFs
[01:44:16] [PASSED] 41 VFs
[01:44:16] [PASSED] 42 VFs
[01:44:16] [PASSED] 43 VFs
[01:44:16] [PASSED] 44 VFs
[01:44:16] [PASSED] 45 VFs
[01:44:16] [PASSED] 46 VFs
[01:44:16] [PASSED] 47 VFs
[01:44:16] [PASSED] 48 VFs
[01:44:16] [PASSED] 49 VFs
[01:44:16] [PASSED] 50 VFs
[01:44:16] [PASSED] 51 VFs
[01:44:16] [PASSED] 52 VFs
[01:44:16] [PASSED] 53 VFs
[01:44:16] [PASSED] 54 VFs
[01:44:16] [PASSED] 55 VFs
[01:44:16] [PASSED] 56 VFs
[01:44:16] [PASSED] 57 VFs
[01:44:16] [PASSED] 58 VFs
[01:44:16] [PASSED] 59 VFs
[01:44:16] [PASSED] 60 VFs
[01:44:16] [PASSED] 61 VFs
[01:44:16] [PASSED] 62 VFs
[01:44:16] [PASSED] 63 VFs
[01:44:16] ==================== [PASSED] fair_ggtt ====================
[01:44:16] ======================== fair_vram ========================
[01:44:16] [PASSED] 1 VF
[01:44:16] [PASSED] 2 VFs
[01:44:16] [PASSED] 3 VFs
[01:44:16] [PASSED] 4 VFs
[01:44:16] [PASSED] 5 VFs
[01:44:16] [PASSED] 6 VFs
[01:44:16] [PASSED] 7 VFs
[01:44:16] [PASSED] 8 VFs
[01:44:16] [PASSED] 9 VFs
[01:44:16] [PASSED] 10 VFs
[01:44:16] [PASSED] 11 VFs
[01:44:16] [PASSED] 12 VFs
[01:44:16] [PASSED] 13 VFs
[01:44:16] [PASSED] 14 VFs
[01:44:16] [PASSED] 15 VFs
[01:44:16] [PASSED] 16 VFs
[01:44:16] [PASSED] 17 VFs
[01:44:16] [PASSED] 18 VFs
[01:44:16] [PASSED] 19 VFs
[01:44:16] [PASSED] 20 VFs
[01:44:16] [PASSED] 21 VFs
[01:44:16] [PASSED] 22 VFs
[01:44:16] [PASSED] 23 VFs
[01:44:16] [PASSED] 24 VFs
[01:44:16] [PASSED] 25 VFs
[01:44:16] [PASSED] 26 VFs
[01:44:16] [PASSED] 27 VFs
[01:44:16] [PASSED] 28 VFs
[01:44:16] [PASSED] 29 VFs
[01:44:16] [PASSED] 30 VFs
[01:44:16] [PASSED] 31 VFs
[01:44:16] [PASSED] 32 VFs
[01:44:16] [PASSED] 33 VFs
[01:44:16] [PASSED] 34 VFs
[01:44:16] [PASSED] 35 VFs
[01:44:16] [PASSED] 36 VFs
[01:44:16] [PASSED] 37 VFs
[01:44:16] [PASSED] 38 VFs
[01:44:16] [PASSED] 39 VFs
[01:44:16] [PASSED] 40 VFs
[01:44:16] [PASSED] 41 VFs
[01:44:16] [PASSED] 42 VFs
[01:44:16] [PASSED] 43 VFs
[01:44:16] [PASSED] 44 VFs
[01:44:16] [PASSED] 45 VFs
[01:44:16] [PASSED] 46 VFs
[01:44:16] [PASSED] 47 VFs
[01:44:16] [PASSED] 48 VFs
[01:44:16] [PASSED] 49 VFs
[01:44:16] [PASSED] 50 VFs
[01:44:16] [PASSED] 51 VFs
[01:44:16] [PASSED] 52 VFs
[01:44:16] [PASSED] 53 VFs
[01:44:16] [PASSED] 54 VFs
[01:44:16] [PASSED] 55 VFs
[01:44:16] [PASSED] 56 VFs
[01:44:16] [PASSED] 57 VFs
[01:44:16] [PASSED] 58 VFs
[01:44:16] [PASSED] 59 VFs
[01:44:16] [PASSED] 60 VFs
[01:44:16] [PASSED] 61 VFs
[01:44:16] [PASSED] 62 VFs
[01:44:16] [PASSED] 63 VFs
[01:44:16] ==================== [PASSED] fair_vram ====================
[01:44:16] ================== [PASSED] pf_gt_config ===================
[01:44:16] ===================== lmtt (1 subtest) =====================
[01:44:16] ======================== test_ops =========================
[01:44:16] [PASSED] 2-level
[01:44:16] [PASSED] multi-level
[01:44:16] ==================== [PASSED] test_ops =====================
[01:44:16] ====================== [PASSED] lmtt =======================
[01:44:16] ================= pf_service (11 subtests) =================
[01:44:16] [PASSED] pf_negotiate_any
[01:44:16] [PASSED] pf_negotiate_base_match
[01:44:16] [PASSED] pf_negotiate_base_newer
[01:44:16] [PASSED] pf_negotiate_base_next
[01:44:16] [SKIPPED] pf_negotiate_base_older
[01:44:16] [PASSED] pf_negotiate_base_prev
[01:44:16] [PASSED] pf_negotiate_latest_match
[01:44:16] [PASSED] pf_negotiate_latest_newer
[01:44:16] [PASSED] pf_negotiate_latest_next
[01:44:16] [SKIPPED] pf_negotiate_latest_older
[01:44:16] [SKIPPED] pf_negotiate_latest_prev
[01:44:16] =================== [PASSED] pf_service ====================
[01:44:16] ================= xe_guc_g2g (2 subtests) ==================
[01:44:16] ============== xe_live_guc_g2g_kunit_default ==============
[01:44:16] ========= [SKIPPED] xe_live_guc_g2g_kunit_default ==========
[01:44:16] ============== xe_live_guc_g2g_kunit_allmem ===============
[01:44:16] ========== [SKIPPED] xe_live_guc_g2g_kunit_allmem ==========
[01:44:16] =================== [SKIPPED] xe_guc_g2g ===================
[01:44:16] =================== xe_mocs (2 subtests) ===================
[01:44:16] ================ xe_live_mocs_kernel_kunit ================
[01:44:16] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[01:44:16] ================ xe_live_mocs_reset_kunit =================
[01:44:16] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[01:44:16] ==================== [SKIPPED] xe_mocs =====================
[01:44:16] ================= xe_migrate (2 subtests) ==================
[01:44:16] ================= xe_migrate_sanity_kunit =================
[01:44:16] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[01:44:16] ================== xe_validate_ccs_kunit ==================
[01:44:16] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[01:44:16] =================== [SKIPPED] xe_migrate ===================
[01:44:16] ================== xe_dma_buf (1 subtest) ==================
[01:44:16] ==================== xe_dma_buf_kunit =====================
[01:44:16] ================ [SKIPPED] xe_dma_buf_kunit ================
[01:44:16] =================== [SKIPPED] xe_dma_buf ===================
[01:44:16] ================= xe_bo_shrink (1 subtest) =================
[01:44:16] =================== xe_bo_shrink_kunit ====================
[01:44:16] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[01:44:16] ================== [SKIPPED] xe_bo_shrink ==================
[01:44:16] ==================== xe_bo (2 subtests) ====================
[01:44:16] ================== xe_ccs_migrate_kunit ===================
[01:44:16] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[01:44:16] ==================== xe_bo_evict_kunit ====================
[01:44:16] =============== [SKIPPED] xe_bo_evict_kunit ================
[01:44:16] ===================== [SKIPPED] xe_bo ======================
[01:44:16] ==================== args (13 subtests) ====================
[01:44:16] [PASSED] count_args_test
[01:44:16] [PASSED] call_args_example
[01:44:16] [PASSED] call_args_test
[01:44:16] [PASSED] drop_first_arg_example
[01:44:16] [PASSED] drop_first_arg_test
[01:44:16] [PASSED] first_arg_example
[01:44:16] [PASSED] first_arg_test
[01:44:16] [PASSED] last_arg_example
[01:44:16] [PASSED] last_arg_test
[01:44:16] [PASSED] pick_arg_example
[01:44:16] [PASSED] if_args_example
[01:44:16] [PASSED] if_args_test
[01:44:16] [PASSED] sep_comma_example
[01:44:16] ====================== [PASSED] args =======================
[01:44:16] =================== xe_pci (3 subtests) ====================
[01:44:16] ==================== check_graphics_ip ====================
[01:44:16] [PASSED] 12.00 Xe_LP
[01:44:16] [PASSED] 12.10 Xe_LP+
[01:44:16] [PASSED] 12.55 Xe_HPG
[01:44:16] [PASSED] 12.60 Xe_HPC
[01:44:16] [PASSED] 12.70 Xe_LPG
[01:44:16] [PASSED] 12.71 Xe_LPG
[01:44:16] [PASSED] 12.74 Xe_LPG+
[01:44:16] [PASSED] 20.01 Xe2_HPG
[01:44:16] [PASSED] 20.02 Xe2_HPG
[01:44:16] [PASSED] 20.04 Xe2_LPG
[01:44:16] [PASSED] 30.00 Xe3_LPG
[01:44:16] [PASSED] 30.01 Xe3_LPG
[01:44:16] [PASSED] 30.03 Xe3_LPG
[01:44:16] [PASSED] 30.04 Xe3_LPG
[01:44:16] [PASSED] 30.05 Xe3_LPG
[01:44:16] [PASSED] 35.10 Xe3p_LPG
[01:44:16] [PASSED] 35.11 Xe3p_XPC
[01:44:16] ================ [PASSED] check_graphics_ip ================
[01:44:16] ===================== check_media_ip ======================
[01:44:16] [PASSED] 12.00 Xe_M
[01:44:16] [PASSED] 12.55 Xe_HPM
[01:44:16] [PASSED] 13.00 Xe_LPM+
[01:44:16] [PASSED] 13.01 Xe2_HPM
[01:44:16] [PASSED] 20.00 Xe2_LPM
[01:44:16] [PASSED] 30.00 Xe3_LPM
[01:44:16] [PASSED] 30.02 Xe3_LPM
[01:44:16] [PASSED] 35.00 Xe3p_LPM
[01:44:16] [PASSED] 35.03 Xe3p_HPM
[01:44:16] ================= [PASSED] check_media_ip ==================
[01:44:16] =================== check_platform_desc ===================
[01:44:16] [PASSED] 0x9A60 (TIGERLAKE)
[01:44:16] [PASSED] 0x9A68 (TIGERLAKE)
[01:44:16] [PASSED] 0x9A70 (TIGERLAKE)
[01:44:16] [PASSED] 0x9A40 (TIGERLAKE)
[01:44:16] [PASSED] 0x9A49 (TIGERLAKE)
[01:44:16] [PASSED] 0x9A59 (TIGERLAKE)
[01:44:16] [PASSED] 0x9A78 (TIGERLAKE)
[01:44:16] [PASSED] 0x9AC0 (TIGERLAKE)
[01:44:16] [PASSED] 0x9AC9 (TIGERLAKE)
[01:44:16] [PASSED] 0x9AD9 (TIGERLAKE)
[01:44:16] [PASSED] 0x9AF8 (TIGERLAKE)
[01:44:16] [PASSED] 0x4C80 (ROCKETLAKE)
[01:44:16] [PASSED] 0x4C8A (ROCKETLAKE)
[01:44:16] [PASSED] 0x4C8B (ROCKETLAKE)
[01:44:16] [PASSED] 0x4C8C (ROCKETLAKE)
[01:44:16] [PASSED] 0x4C90 (ROCKETLAKE)
[01:44:16] [PASSED] 0x4C9A (ROCKETLAKE)
[01:44:16] [PASSED] 0x4680 (ALDERLAKE_S)
[01:44:16] [PASSED] 0x4682 (ALDERLAKE_S)
[01:44:16] [PASSED] 0x4688 (ALDERLAKE_S)
[01:44:16] [PASSED] 0x468A (ALDERLAKE_S)
[01:44:16] [PASSED] 0x468B (ALDERLAKE_S)
[01:44:16] [PASSED] 0x4690 (ALDERLAKE_S)
[01:44:16] [PASSED] 0x4692 (ALDERLAKE_S)
[01:44:16] [PASSED] 0x4693 (ALDERLAKE_S)
[01:44:16] [PASSED] 0x46A0 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46A1 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46A2 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46A3 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46A6 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46A8 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46AA (ALDERLAKE_P)
[01:44:16] [PASSED] 0x462A (ALDERLAKE_P)
[01:44:16] [PASSED] 0x4626 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x4628 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46B0 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46B1 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46B2 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46B3 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46C0 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46C1 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46C2 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46C3 (ALDERLAKE_P)
[01:44:16] [PASSED] 0x46D0 (ALDERLAKE_N)
[01:44:16] [PASSED] 0x46D1 (ALDERLAKE_N)
[01:44:16] [PASSED] 0x46D2 (ALDERLAKE_N)
[01:44:16] [PASSED] 0x46D3 (ALDERLAKE_N)
[01:44:16] [PASSED] 0x46D4 (ALDERLAKE_N)
[01:44:16] [PASSED] 0xA721 (ALDERLAKE_P)
[01:44:16] [PASSED] 0xA7A1 (ALDERLAKE_P)
[01:44:16] [PASSED] 0xA7A9 (ALDERLAKE_P)
[01:44:16] [PASSED] 0xA7AC (ALDERLAKE_P)
[01:44:16] [PASSED] 0xA7AD (ALDERLAKE_P)
[01:44:16] [PASSED] 0xA720 (ALDERLAKE_P)
[01:44:16] [PASSED] 0xA7A0 (ALDERLAKE_P)
[01:44:16] [PASSED] 0xA7A8 (ALDERLAKE_P)
[01:44:16] [PASSED] 0xA7AA (ALDERLAKE_P)
[01:44:16] [PASSED] 0xA7AB (ALDERLAKE_P)
[01:44:16] [PASSED] 0xA780 (ALDERLAKE_S)
[01:44:16] [PASSED] 0xA781 (ALDERLAKE_S)
[01:44:16] [PASSED] 0xA782 (ALDERLAKE_S)
[01:44:16] [PASSED] 0xA783 (ALDERLAKE_S)
[01:44:16] [PASSED] 0xA788 (ALDERLAKE_S)
[01:44:16] [PASSED] 0xA789 (ALDERLAKE_S)
[01:44:16] [PASSED] 0xA78A (ALDERLAKE_S)
[01:44:16] [PASSED] 0xA78B (ALDERLAKE_S)
[01:44:16] [PASSED] 0x4905 (DG1)
[01:44:16] [PASSED] 0x4906 (DG1)
[01:44:16] [PASSED] 0x4907 (DG1)
[01:44:16] [PASSED] 0x4908 (DG1)
[01:44:16] [PASSED] 0x4909 (DG1)
[01:44:16] [PASSED] 0x56C0 (DG2)
[01:44:16] [PASSED] 0x56C2 (DG2)
[01:44:16] [PASSED] 0x56C1 (DG2)
[01:44:16] [PASSED] 0x7D51 (METEORLAKE)
[01:44:16] [PASSED] 0x7DD1 (METEORLAKE)
[01:44:16] [PASSED] 0x7D41 (METEORLAKE)
[01:44:16] [PASSED] 0x7D67 (METEORLAKE)
[01:44:16] [PASSED] 0xB640 (METEORLAKE)
[01:44:16] [PASSED] 0x56A0 (DG2)
[01:44:16] [PASSED] 0x56A1 (DG2)
[01:44:16] [PASSED] 0x56A2 (DG2)
[01:44:16] [PASSED] 0x56BE (DG2)
[01:44:16] [PASSED] 0x56BF (DG2)
[01:44:16] [PASSED] 0x5690 (DG2)
[01:44:16] [PASSED] 0x5691 (DG2)
[01:44:16] [PASSED] 0x5692 (DG2)
[01:44:16] [PASSED] 0x56A5 (DG2)
[01:44:16] [PASSED] 0x56A6 (DG2)
[01:44:16] [PASSED] 0x56B0 (DG2)
[01:44:16] [PASSED] 0x56B1 (DG2)
[01:44:16] [PASSED] 0x56BA (DG2)
[01:44:16] [PASSED] 0x56BB (DG2)
[01:44:16] [PASSED] 0x56BC (DG2)
[01:44:16] [PASSED] 0x56BD (DG2)
[01:44:16] [PASSED] 0x5693 (DG2)
[01:44:16] [PASSED] 0x5694 (DG2)
[01:44:16] [PASSED] 0x5695 (DG2)
[01:44:16] [PASSED] 0x56A3 (DG2)
[01:44:16] [PASSED] 0x56A4 (DG2)
[01:44:16] [PASSED] 0x56B2 (DG2)
[01:44:16] [PASSED] 0x56B3 (DG2)
[01:44:16] [PASSED] 0x5696 (DG2)
[01:44:16] [PASSED] 0x5697 (DG2)
[01:44:16] [PASSED] 0xB69 (PVC)
[01:44:16] [PASSED] 0xB6E (PVC)
[01:44:16] [PASSED] 0xBD4 (PVC)
[01:44:16] [PASSED] 0xBD5 (PVC)
[01:44:16] [PASSED] 0xBD6 (PVC)
[01:44:16] [PASSED] 0xBD7 (PVC)
[01:44:16] [PASSED] 0xBD8 (PVC)
[01:44:16] [PASSED] 0xBD9 (PVC)
[01:44:16] [PASSED] 0xBDA (PVC)
[01:44:16] [PASSED] 0xBDB (PVC)
[01:44:16] [PASSED] 0xBE0 (PVC)
[01:44:16] [PASSED] 0xBE1 (PVC)
[01:44:16] [PASSED] 0xBE5 (PVC)
[01:44:16] [PASSED] 0x7D40 (METEORLAKE)
[01:44:16] [PASSED] 0x7D45 (METEORLAKE)
[01:44:16] [PASSED] 0x7D55 (METEORLAKE)
[01:44:16] [PASSED] 0x7D60 (METEORLAKE)
[01:44:16] [PASSED] 0x7DD5 (METEORLAKE)
[01:44:16] [PASSED] 0x6420 (LUNARLAKE)
[01:44:16] [PASSED] 0x64A0 (LUNARLAKE)
[01:44:16] [PASSED] 0x64B0 (LUNARLAKE)
[01:44:16] [PASSED] 0xE202 (BATTLEMAGE)
[01:44:16] [PASSED] 0xE209 (BATTLEMAGE)
[01:44:16] [PASSED] 0xE20B (BATTLEMAGE)
[01:44:16] [PASSED] 0xE20C (BATTLEMAGE)
[01:44:16] [PASSED] 0xE20D (BATTLEMAGE)
[01:44:16] [PASSED] 0xE210 (BATTLEMAGE)
[01:44:16] [PASSED] 0xE211 (BATTLEMAGE)
[01:44:16] [PASSED] 0xE212 (BATTLEMAGE)
[01:44:16] [PASSED] 0xE216 (BATTLEMAGE)
[01:44:16] [PASSED] 0xE220 (BATTLEMAGE)
[01:44:16] [PASSED] 0xE221 (BATTLEMAGE)
[01:44:16] [PASSED] 0xE222 (BATTLEMAGE)
[01:44:16] [PASSED] 0xE223 (BATTLEMAGE)
[01:44:16] [PASSED] 0xB080 (PANTHERLAKE)
[01:44:16] [PASSED] 0xB081 (PANTHERLAKE)
[01:44:16] [PASSED] 0xB082 (PANTHERLAKE)
[01:44:16] [PASSED] 0xB083 (PANTHERLAKE)
[01:44:16] [PASSED] 0xB084 (PANTHERLAKE)
[01:44:16] [PASSED] 0xB085 (PANTHERLAKE)
[01:44:16] [PASSED] 0xB086 (PANTHERLAKE)
[01:44:16] [PASSED] 0xB087 (PANTHERLAKE)
[01:44:16] [PASSED] 0xB08F (PANTHERLAKE)
[01:44:16] [PASSED] 0xB090 (PANTHERLAKE)
[01:44:16] [PASSED] 0xB0A0 (PANTHERLAKE)
[01:44:16] [PASSED] 0xB0B0 (PANTHERLAKE)
[01:44:16] [PASSED] 0xFD80 (PANTHERLAKE)
[01:44:16] [PASSED] 0xFD81 (PANTHERLAKE)
[01:44:16] [PASSED] 0xD740 (NOVALAKE_S)
[01:44:16] [PASSED] 0xD741 (NOVALAKE_S)
[01:44:16] [PASSED] 0xD742 (NOVALAKE_S)
[01:44:16] [PASSED] 0xD743 (NOVALAKE_S)
[01:44:16] [PASSED] 0xD744 (NOVALAKE_S)
[01:44:16] [PASSED] 0xD745 (NOVALAKE_S)
[01:44:16] [PASSED] 0x674C (CRESCENTISLAND)
[01:44:16] [PASSED] 0xD750 (NOVALAKE_P)
[01:44:16] [PASSED] 0xD751 (NOVALAKE_P)
[01:44:16] [PASSED] 0xD752 (NOVALAKE_P)
[01:44:16] [PASSED] 0xD753 (NOVALAKE_P)
[01:44:16] [PASSED] 0xD754 (NOVALAKE_P)
[01:44:16] [PASSED] 0xD755 (NOVALAKE_P)
[01:44:16] [PASSED] 0xD756 (NOVALAKE_P)
[01:44:16] [PASSED] 0xD757 (NOVALAKE_P)
[01:44:16] [PASSED] 0xD75F (NOVALAKE_P)
[01:44:16] =============== [PASSED] check_platform_desc ===============
[01:44:16] ===================== [PASSED] xe_pci ======================
[01:44:16] =================== xe_rtp (2 subtests) ====================
[01:44:16] =============== xe_rtp_process_to_sr_tests ================
[01:44:16] [PASSED] coalesce-same-reg
[01:44:16] [PASSED] no-match-no-add
[01:44:16] [PASSED] match-or
[01:44:16] [PASSED] match-or-xfail
[01:44:16] [PASSED] no-match-no-add-multiple-rules
[01:44:16] [PASSED] two-regs-two-entries
[01:44:16] [PASSED] clr-one-set-other
[01:44:16] [PASSED] set-field
[01:44:16] [PASSED] conflict-duplicate
stty: 'standard input': Inappropriate ioctl for device
[01:44:16] [PASSED] conflict-not-disjoint
[01:44:16] [PASSED] conflict-reg-type
[01:44:16] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[01:44:16] ================== xe_rtp_process_tests ===================
[01:44:16] [PASSED] active1
[01:44:16] [PASSED] active2
[01:44:16] [PASSED] active-inactive
[01:44:16] [PASSED] inactive-active
[01:44:16] [PASSED] inactive-1st_or_active-inactive
[01:44:16] [PASSED] inactive-2nd_or_active-inactive
[01:44:16] [PASSED] inactive-last_or_active-inactive
[01:44:16] [PASSED] inactive-no_or_active-inactive
[01:44:16] ============== [PASSED] xe_rtp_process_tests ===============
[01:44:16] ===================== [PASSED] xe_rtp ======================
[01:44:16] ==================== xe_wa (1 subtest) =====================
[01:44:16] ======================== xe_wa_gt =========================
[01:44:16] [PASSED] TIGERLAKE B0
[01:44:16] [PASSED] DG1 A0
[01:44:16] [PASSED] DG1 B0
[01:44:16] [PASSED] ALDERLAKE_S A0
[01:44:16] [PASSED] ALDERLAKE_S B0
[01:44:16] [PASSED] ALDERLAKE_S C0
[01:44:16] [PASSED] ALDERLAKE_S D0
[01:44:16] [PASSED] ALDERLAKE_P A0
[01:44:16] [PASSED] ALDERLAKE_P B0
[01:44:16] [PASSED] ALDERLAKE_P C0
[01:44:16] [PASSED] ALDERLAKE_S RPLS D0
[01:44:16] [PASSED] ALDERLAKE_P RPLU E0
[01:44:16] [PASSED] DG2 G10 C0
[01:44:16] [PASSED] DG2 G11 B1
[01:44:16] [PASSED] DG2 G12 A1
[01:44:16] [PASSED] METEORLAKE 12.70(Xe_LPG) A0 13.00(Xe_LPM+) A0
[01:44:16] [PASSED] METEORLAKE 12.71(Xe_LPG) A0 13.00(Xe_LPM+) A0
[01:44:16] [PASSED] METEORLAKE 12.74(Xe_LPG+) A0 13.00(Xe_LPM+) A0
[01:44:16] [PASSED] LUNARLAKE 20.04(Xe2_LPG) A0 20.00(Xe2_LPM) A0
[01:44:16] [PASSED] LUNARLAKE 20.04(Xe2_LPG) B0 20.00(Xe2_LPM) A0
[01:44:16] [PASSED] BATTLEMAGE 20.01(Xe2_HPG) A0 13.01(Xe2_HPM) A1
[01:44:16] [PASSED] PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0
[01:44:16] ==================== [PASSED] xe_wa_gt =====================
[01:44:16] ====================== [PASSED] xe_wa ======================
[01:44:16] ============================================================
[01:44:16] Testing complete. Ran 597 tests: passed: 579, skipped: 18
[01:44:16] Elapsed time: 35.471s total, 4.218s configuring, 30.586s building, 0.614s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[01:44:16] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[01:44:18] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[01:44:42] Starting KUnit Kernel (1/1)...
[01:44:42] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[01:44:42] ============ drm_test_pick_cmdline (2 subtests) ============
[01:44:42] [PASSED] drm_test_pick_cmdline_res_1920_1080_60
[01:44:42] =============== drm_test_pick_cmdline_named ===============
[01:44:42] [PASSED] NTSC
[01:44:42] [PASSED] NTSC-J
[01:44:42] [PASSED] PAL
[01:44:42] [PASSED] PAL-M
[01:44:42] =========== [PASSED] drm_test_pick_cmdline_named ===========
[01:44:42] ============== [PASSED] drm_test_pick_cmdline ==============
[01:44:42] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[01:44:42] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[01:44:42] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[01:44:42] =========== drm_validate_clone_mode (2 subtests) ===========
[01:44:42] ============== drm_test_check_in_clone_mode ===============
[01:44:42] [PASSED] in_clone_mode
[01:44:42] [PASSED] not_in_clone_mode
[01:44:42] ========== [PASSED] drm_test_check_in_clone_mode ===========
[01:44:42] =============== drm_test_check_valid_clones ===============
[01:44:42] [PASSED] not_in_clone_mode
[01:44:42] [PASSED] valid_clone
[01:44:42] [PASSED] invalid_clone
[01:44:42] =========== [PASSED] drm_test_check_valid_clones ===========
[01:44:42] ============= [PASSED] drm_validate_clone_mode =============
[01:44:42] ============= drm_validate_modeset (1 subtest) =============
[01:44:42] [PASSED] drm_test_check_connector_changed_modeset
[01:44:42] ============== [PASSED] drm_validate_modeset ===============
[01:44:42] ====== drm_test_bridge_get_current_state (2 subtests) ======
[01:44:42] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[01:44:42] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[01:44:42] ======== [PASSED] drm_test_bridge_get_current_state ========
[01:44:42] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ======
[01:44:42] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[01:44:42] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[01:44:42] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[01:44:42] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[01:44:42] ============== drm_bridge_alloc (2 subtests) ===============
[01:44:42] [PASSED] drm_test_drm_bridge_alloc_basic
[01:44:42] [PASSED] drm_test_drm_bridge_alloc_get_put
[01:44:42] ================ [PASSED] drm_bridge_alloc =================
[01:44:42] ============= drm_cmdline_parser (40 subtests) =============
[01:44:42] [PASSED] drm_test_cmdline_force_d_only
[01:44:42] [PASSED] drm_test_cmdline_force_D_only_dvi
[01:44:42] [PASSED] drm_test_cmdline_force_D_only_hdmi
[01:44:42] [PASSED] drm_test_cmdline_force_D_only_not_digital
[01:44:42] [PASSED] drm_test_cmdline_force_e_only
[01:44:42] [PASSED] drm_test_cmdline_res
[01:44:42] [PASSED] drm_test_cmdline_res_vesa
[01:44:42] [PASSED] drm_test_cmdline_res_vesa_rblank
[01:44:42] [PASSED] drm_test_cmdline_res_rblank
[01:44:42] [PASSED] drm_test_cmdline_res_bpp
[01:44:42] [PASSED] drm_test_cmdline_res_refresh
[01:44:42] [PASSED] drm_test_cmdline_res_bpp_refresh
[01:44:42] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[01:44:42] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[01:44:42] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[01:44:42] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[01:44:42] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[01:44:42] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[01:44:42] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[01:44:42] [PASSED] drm_test_cmdline_res_margins_force_on
[01:44:42] [PASSED] drm_test_cmdline_res_vesa_margins
[01:44:42] [PASSED] drm_test_cmdline_name
[01:44:42] [PASSED] drm_test_cmdline_name_bpp
[01:44:42] [PASSED] drm_test_cmdline_name_option
[01:44:42] [PASSED] drm_test_cmdline_name_bpp_option
[01:44:42] [PASSED] drm_test_cmdline_rotate_0
[01:44:42] [PASSED] drm_test_cmdline_rotate_90
[01:44:42] [PASSED] drm_test_cmdline_rotate_180
[01:44:42] [PASSED] drm_test_cmdline_rotate_270
[01:44:42] [PASSED] drm_test_cmdline_hmirror
[01:44:42] [PASSED] drm_test_cmdline_vmirror
[01:44:42] [PASSED] drm_test_cmdline_margin_options
[01:44:42] [PASSED] drm_test_cmdline_multiple_options
[01:44:42] [PASSED] drm_test_cmdline_bpp_extra_and_option
[01:44:42] [PASSED] drm_test_cmdline_extra_and_option
[01:44:42] [PASSED] drm_test_cmdline_freestanding_options
[01:44:42] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[01:44:42] [PASSED] drm_test_cmdline_panel_orientation
[01:44:42] ================ drm_test_cmdline_invalid =================
[01:44:42] [PASSED] margin_only
[01:44:42] [PASSED] interlace_only
[01:44:42] [PASSED] res_missing_x
[01:44:42] [PASSED] res_missing_y
[01:44:42] [PASSED] res_bad_y
[01:44:42] [PASSED] res_missing_y_bpp
[01:44:42] [PASSED] res_bad_bpp
[01:44:42] [PASSED] res_bad_refresh
[01:44:42] [PASSED] res_bpp_refresh_force_on_off
[01:44:42] [PASSED] res_invalid_mode
[01:44:42] [PASSED] res_bpp_wrong_place_mode
[01:44:42] [PASSED] name_bpp_refresh
[01:44:42] [PASSED] name_refresh
[01:44:42] [PASSED] name_refresh_wrong_mode
[01:44:42] [PASSED] name_refresh_invalid_mode
[01:44:42] [PASSED] rotate_multiple
[01:44:42] [PASSED] rotate_invalid_val
[01:44:42] [PASSED] rotate_truncated
[01:44:42] [PASSED] invalid_option
[01:44:42] [PASSED] invalid_tv_option
[01:44:42] [PASSED] truncated_tv_option
[01:44:42] ============ [PASSED] drm_test_cmdline_invalid =============
[01:44:42] =============== drm_test_cmdline_tv_options ===============
[01:44:42] [PASSED] NTSC
[01:44:42] [PASSED] NTSC_443
[01:44:42] [PASSED] NTSC_J
[01:44:42] [PASSED] PAL
[01:44:42] [PASSED] PAL_M
[01:44:42] [PASSED] PAL_N
[01:44:42] [PASSED] SECAM
[01:44:42] [PASSED] MONO_525
[01:44:42] [PASSED] MONO_625
[01:44:42] =========== [PASSED] drm_test_cmdline_tv_options ===========
[01:44:42] =============== [PASSED] drm_cmdline_parser ================
[01:44:42] ========== drmm_connector_hdmi_init (20 subtests) ==========
[01:44:42] [PASSED] drm_test_connector_hdmi_init_valid
[01:44:42] [PASSED] drm_test_connector_hdmi_init_bpc_8
[01:44:42] [PASSED] drm_test_connector_hdmi_init_bpc_10
[01:44:42] [PASSED] drm_test_connector_hdmi_init_bpc_12
[01:44:42] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[01:44:42] [PASSED] drm_test_connector_hdmi_init_bpc_null
[01:44:42] [PASSED] drm_test_connector_hdmi_init_formats_empty
[01:44:42] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[01:44:42] === drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[01:44:42] [PASSED] supported_formats=0x9 yuv420_allowed=1
[01:44:42] [PASSED] supported_formats=0x9 yuv420_allowed=0
[01:44:42] [PASSED] supported_formats=0x3 yuv420_allowed=1
[01:44:42] [PASSED] supported_formats=0x3 yuv420_allowed=0
[01:44:42] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[01:44:42] [PASSED] drm_test_connector_hdmi_init_null_ddc
[01:44:42] [PASSED] drm_test_connector_hdmi_init_null_product
[01:44:42] [PASSED] drm_test_connector_hdmi_init_null_vendor
[01:44:42] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[01:44:42] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[01:44:42] [PASSED] drm_test_connector_hdmi_init_product_valid
[01:44:42] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[01:44:42] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[01:44:42] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[01:44:42] ========= drm_test_connector_hdmi_init_type_valid =========
[01:44:42] [PASSED] HDMI-A
[01:44:42] [PASSED] HDMI-B
[01:44:42] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[01:44:42] ======== drm_test_connector_hdmi_init_type_invalid ========
[01:44:42] [PASSED] Unknown
[01:44:42] [PASSED] VGA
[01:44:42] [PASSED] DVI-I
[01:44:42] [PASSED] DVI-D
[01:44:42] [PASSED] DVI-A
[01:44:42] [PASSED] Composite
[01:44:42] [PASSED] SVIDEO
[01:44:42] [PASSED] LVDS
[01:44:42] [PASSED] Component
[01:44:42] [PASSED] DIN
[01:44:42] [PASSED] DP
[01:44:42] [PASSED] TV
[01:44:42] [PASSED] eDP
[01:44:42] [PASSED] Virtual
[01:44:42] [PASSED] DSI
[01:44:42] [PASSED] DPI
[01:44:42] [PASSED] Writeback
[01:44:42] [PASSED] SPI
[01:44:42] [PASSED] USB
[01:44:42] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[01:44:42] ============ [PASSED] drmm_connector_hdmi_init =============
[01:44:42] ============= drmm_connector_init (3 subtests) =============
[01:44:42] [PASSED] drm_test_drmm_connector_init
[01:44:42] [PASSED] drm_test_drmm_connector_init_null_ddc
[01:44:42] ========= drm_test_drmm_connector_init_type_valid =========
[01:44:42] [PASSED] Unknown
[01:44:42] [PASSED] VGA
[01:44:42] [PASSED] DVI-I
[01:44:42] [PASSED] DVI-D
[01:44:42] [PASSED] DVI-A
[01:44:42] [PASSED] Composite
[01:44:42] [PASSED] SVIDEO
[01:44:42] [PASSED] LVDS
[01:44:42] [PASSED] Component
[01:44:42] [PASSED] DIN
[01:44:42] [PASSED] DP
[01:44:42] [PASSED] HDMI-A
[01:44:42] [PASSED] HDMI-B
[01:44:42] [PASSED] TV
[01:44:42] [PASSED] eDP
[01:44:42] [PASSED] Virtual
[01:44:42] [PASSED] DSI
[01:44:42] [PASSED] DPI
[01:44:42] [PASSED] Writeback
[01:44:42] [PASSED] SPI
[01:44:42] [PASSED] USB
[01:44:42] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[01:44:42] =============== [PASSED] drmm_connector_init ===============
[01:44:42] ========= drm_connector_dynamic_init (6 subtests) ==========
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_init
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_init_properties
[01:44:42] ===== drm_test_drm_connector_dynamic_init_type_valid ======
[01:44:42] [PASSED] Unknown
[01:44:42] [PASSED] VGA
[01:44:42] [PASSED] DVI-I
[01:44:42] [PASSED] DVI-D
[01:44:42] [PASSED] DVI-A
[01:44:42] [PASSED] Composite
[01:44:42] [PASSED] SVIDEO
[01:44:42] [PASSED] LVDS
[01:44:42] [PASSED] Component
[01:44:42] [PASSED] DIN
[01:44:42] [PASSED] DP
[01:44:42] [PASSED] HDMI-A
[01:44:42] [PASSED] HDMI-B
[01:44:42] [PASSED] TV
[01:44:42] [PASSED] eDP
[01:44:42] [PASSED] Virtual
[01:44:42] [PASSED] DSI
[01:44:42] [PASSED] DPI
[01:44:42] [PASSED] Writeback
[01:44:42] [PASSED] SPI
[01:44:42] [PASSED] USB
[01:44:42] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[01:44:42] ======== drm_test_drm_connector_dynamic_init_name =========
[01:44:42] [PASSED] Unknown
[01:44:42] [PASSED] VGA
[01:44:42] [PASSED] DVI-I
[01:44:42] [PASSED] DVI-D
[01:44:42] [PASSED] DVI-A
[01:44:42] [PASSED] Composite
[01:44:42] [PASSED] SVIDEO
[01:44:42] [PASSED] LVDS
[01:44:42] [PASSED] Component
[01:44:42] [PASSED] DIN
[01:44:42] [PASSED] DP
[01:44:42] [PASSED] HDMI-A
[01:44:42] [PASSED] HDMI-B
[01:44:42] [PASSED] TV
[01:44:42] [PASSED] eDP
[01:44:42] [PASSED] Virtual
[01:44:42] [PASSED] DSI
[01:44:42] [PASSED] DPI
[01:44:42] [PASSED] Writeback
[01:44:42] [PASSED] SPI
[01:44:42] [PASSED] USB
[01:44:42] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[01:44:42] =========== [PASSED] drm_connector_dynamic_init ============
[01:44:42] ==== drm_connector_dynamic_register_early (4 subtests) =====
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[01:44:42] ====== [PASSED] drm_connector_dynamic_register_early =======
[01:44:42] ======= drm_connector_dynamic_register (7 subtests) ========
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[01:44:42] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[01:44:42] ========= [PASSED] drm_connector_dynamic_register ==========
[01:44:42] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[01:44:42] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[01:44:42] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[01:44:42] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[01:44:42] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[01:44:42] ========== drm_test_get_tv_mode_from_name_valid ===========
[01:44:42] [PASSED] NTSC
[01:44:42] [PASSED] NTSC-443
[01:44:42] [PASSED] NTSC-J
[01:44:42] [PASSED] PAL
[01:44:42] [PASSED] PAL-M
[01:44:42] [PASSED] PAL-N
[01:44:42] [PASSED] SECAM
[01:44:42] [PASSED] Mono
[01:44:42] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[01:44:42] [PASSED] drm_test_get_tv_mode_from_name_truncated
[01:44:42] ============ [PASSED] drm_get_tv_mode_from_name ============
[01:44:42] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[01:44:42] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[01:44:42] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[01:44:42] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[01:44:42] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[01:44:42] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[01:44:42] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[01:44:42] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid =
[01:44:42] [PASSED] VIC 96
[01:44:42] [PASSED] VIC 97
[01:44:42] [PASSED] VIC 101
[01:44:42] [PASSED] VIC 102
[01:44:42] [PASSED] VIC 106
[01:44:42] [PASSED] VIC 107
[01:44:42] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[01:44:42] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[01:44:42] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[01:44:42] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[01:44:42] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[01:44:42] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[01:44:42] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[01:44:42] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[01:44:42] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name ====
[01:44:42] [PASSED] Automatic
[01:44:42] [PASSED] Full
[01:44:42] [PASSED] Limited 16:235
[01:44:42] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[01:44:42] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[01:44:42] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[01:44:42] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[01:44:42] === drm_test_drm_hdmi_connector_get_output_format_name ====
[01:44:42] [PASSED] RGB
[01:44:42] [PASSED] YUV 4:2:0
[01:44:42] [PASSED] YUV 4:2:2
[01:44:42] [PASSED] YUV 4:4:4
[01:44:42] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[01:44:42] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[01:44:42] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[01:44:42] ============= drm_damage_helper (21 subtests) ==============
[01:44:42] [PASSED] drm_test_damage_iter_no_damage
[01:44:42] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[01:44:42] [PASSED] drm_test_damage_iter_no_damage_src_moved
[01:44:42] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[01:44:42] [PASSED] drm_test_damage_iter_no_damage_not_visible
[01:44:42] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[01:44:42] [PASSED] drm_test_damage_iter_no_damage_no_fb
[01:44:42] [PASSED] drm_test_damage_iter_simple_damage
[01:44:42] [PASSED] drm_test_damage_iter_single_damage
[01:44:42] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[01:44:42] [PASSED] drm_test_damage_iter_single_damage_outside_src
[01:44:42] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[01:44:42] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[01:44:42] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[01:44:42] [PASSED] drm_test_damage_iter_single_damage_src_moved
[01:44:42] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[01:44:42] [PASSED] drm_test_damage_iter_damage
[01:44:42] [PASSED] drm_test_damage_iter_damage_one_intersect
[01:44:42] [PASSED] drm_test_damage_iter_damage_one_outside
[01:44:42] [PASSED] drm_test_damage_iter_damage_src_moved
[01:44:42] [PASSED] drm_test_damage_iter_damage_not_visible
[01:44:42] ================ [PASSED] drm_damage_helper ================
[01:44:42] ============== drm_dp_mst_helper (3 subtests) ==============
[01:44:42] ============== drm_test_dp_mst_calc_pbn_mode ==============
[01:44:42] [PASSED] Clock 154000 BPP 30 DSC disabled
[01:44:42] [PASSED] Clock 234000 BPP 30 DSC disabled
[01:44:42] [PASSED] Clock 297000 BPP 24 DSC disabled
[01:44:42] [PASSED] Clock 332880 BPP 24 DSC enabled
[01:44:42] [PASSED] Clock 324540 BPP 24 DSC enabled
[01:44:42] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[01:44:42] ============== drm_test_dp_mst_calc_pbn_div ===============
[01:44:42] [PASSED] Link rate 2000000 lane count 4
[01:44:42] [PASSED] Link rate 2000000 lane count 2
[01:44:42] [PASSED] Link rate 2000000 lane count 1
[01:44:42] [PASSED] Link rate 1350000 lane count 4
[01:44:42] [PASSED] Link rate 1350000 lane count 2
[01:44:42] [PASSED] Link rate 1350000 lane count 1
[01:44:42] [PASSED] Link rate 1000000 lane count 4
[01:44:42] [PASSED] Link rate 1000000 lane count 2
[01:44:42] [PASSED] Link rate 1000000 lane count 1
[01:44:42] [PASSED] Link rate 810000 lane count 4
[01:44:42] [PASSED] Link rate 810000 lane count 2
[01:44:42] [PASSED] Link rate 810000 lane count 1
[01:44:42] [PASSED] Link rate 540000 lane count 4
[01:44:42] [PASSED] Link rate 540000 lane count 2
[01:44:42] [PASSED] Link rate 540000 lane count 1
[01:44:42] [PASSED] Link rate 270000 lane count 4
[01:44:42] [PASSED] Link rate 270000 lane count 2
[01:44:42] [PASSED] Link rate 270000 lane count 1
[01:44:42] [PASSED] Link rate 162000 lane count 4
[01:44:42] [PASSED] Link rate 162000 lane count 2
[01:44:42] [PASSED] Link rate 162000 lane count 1
[01:44:42] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[01:44:42] ========= drm_test_dp_mst_sideband_msg_req_decode =========
[01:44:42] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[01:44:42] [PASSED] DP_POWER_UP_PHY with port number
[01:44:42] [PASSED] DP_POWER_DOWN_PHY with port number
[01:44:42] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[01:44:42] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[01:44:42] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[01:44:42] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[01:44:42] [PASSED] DP_QUERY_PAYLOAD with port number
[01:44:42] [PASSED] DP_QUERY_PAYLOAD with VCPI
[01:44:42] [PASSED] DP_REMOTE_DPCD_READ with port number
[01:44:42] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[01:44:42] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[01:44:42] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[01:44:42] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[01:44:42] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[01:44:42] [PASSED] DP_REMOTE_I2C_READ with port number
[01:44:42] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[01:44:42] [PASSED] DP_REMOTE_I2C_READ with transactions array
[01:44:42] [PASSED] DP_REMOTE_I2C_WRITE with port number
[01:44:42] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[01:44:42] [PASSED] DP_REMOTE_I2C_WRITE with data array
[01:44:42] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[01:44:42] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[01:44:42] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[01:44:42] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[01:44:42] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[01:44:42] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[01:44:42] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[01:44:42] ================ [PASSED] drm_dp_mst_helper ================
[01:44:42] ================== drm_exec (7 subtests) ===================
[01:44:42] [PASSED] sanitycheck
[01:44:42] [PASSED] test_lock
[01:44:42] [PASSED] test_lock_unlock
[01:44:42] [PASSED] test_duplicates
[01:44:42] [PASSED] test_prepare
[01:44:42] [PASSED] test_prepare_array
[01:44:42] [PASSED] test_multiple_loops
[01:44:42] ==================== [PASSED] drm_exec =====================
[01:44:42] =========== drm_format_helper_test (17 subtests) ===========
[01:44:42] ============== drm_test_fb_xrgb8888_to_gray8 ==============
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[01:44:42] ============= drm_test_fb_xrgb8888_to_rgb332 ==============
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[01:44:42] ============= drm_test_fb_xrgb8888_to_rgb565 ==============
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[01:44:42] ============ drm_test_fb_xrgb8888_to_xrgb1555 =============
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[01:44:42] ============ drm_test_fb_xrgb8888_to_argb1555 =============
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[01:44:42] ============ drm_test_fb_xrgb8888_to_rgba5551 =============
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[01:44:42] ============= drm_test_fb_xrgb8888_to_rgb888 ==============
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[01:44:42] ============= drm_test_fb_xrgb8888_to_bgr888 ==============
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[01:44:42] ============ drm_test_fb_xrgb8888_to_argb8888 =============
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[01:44:42] =========== drm_test_fb_xrgb8888_to_xrgb2101010 ===========
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[01:44:42] =========== drm_test_fb_xrgb8888_to_argb2101010 ===========
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[01:44:42] ============== drm_test_fb_xrgb8888_to_mono ===============
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[01:44:42] ==================== drm_test_fb_swab =====================
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ================ [PASSED] drm_test_fb_swab =================
[01:44:42] ============ drm_test_fb_xrgb8888_to_xbgr8888 =============
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[01:44:42] ============ drm_test_fb_xrgb8888_to_abgr8888 =============
[01:44:42] [PASSED] single_pixel_source_buffer
[01:44:42] [PASSED] single_pixel_clip_rectangle
[01:44:42] [PASSED] well_known_colors
[01:44:42] [PASSED] destination_pitch
[01:44:42] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[01:44:42] ================= drm_test_fb_clip_offset =================
[01:44:42] [PASSED] pass through
[01:44:42] [PASSED] horizontal offset
[01:44:42] [PASSED] vertical offset
[01:44:42] [PASSED] horizontal and vertical offset
[01:44:42] [PASSED] horizontal offset (custom pitch)
[01:44:42] [PASSED] vertical offset (custom pitch)
[01:44:42] [PASSED] horizontal and vertical offset (custom pitch)
[01:44:42] ============= [PASSED] drm_test_fb_clip_offset =============
[01:44:42] =================== drm_test_fb_memcpy ====================
[01:44:42] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[01:44:42] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[01:44:42] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[01:44:42] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[01:44:42] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[01:44:42] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[01:44:42] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[01:44:42] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[01:44:42] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[01:44:42] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[01:44:42] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[01:44:42] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[01:44:42] =============== [PASSED] drm_test_fb_memcpy ================
[01:44:42] ============= [PASSED] drm_format_helper_test ==============
[01:44:42] ================= drm_format (18 subtests) =================
[01:44:42] [PASSED] drm_test_format_block_width_invalid
[01:44:42] [PASSED] drm_test_format_block_width_one_plane
[01:44:42] [PASSED] drm_test_format_block_width_two_plane
[01:44:42] [PASSED] drm_test_format_block_width_three_plane
[01:44:42] [PASSED] drm_test_format_block_width_tiled
[01:44:42] [PASSED] drm_test_format_block_height_invalid
[01:44:42] [PASSED] drm_test_format_block_height_one_plane
[01:44:42] [PASSED] drm_test_format_block_height_two_plane
[01:44:42] [PASSED] drm_test_format_block_height_three_plane
[01:44:42] [PASSED] drm_test_format_block_height_tiled
[01:44:42] [PASSED] drm_test_format_min_pitch_invalid
[01:44:42] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[01:44:42] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[01:44:42] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[01:44:42] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[01:44:42] [PASSED] drm_test_format_min_pitch_two_plane
[01:44:42] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[01:44:42] [PASSED] drm_test_format_min_pitch_tiled
[01:44:42] =================== [PASSED] drm_format ====================
[01:44:42] ============== drm_framebuffer (10 subtests) ===============
[01:44:42] ========== drm_test_framebuffer_check_src_coords ==========
[01:44:42] [PASSED] Success: source fits into fb
[01:44:42] [PASSED] Fail: overflowing fb with x-axis coordinate
[01:44:42] [PASSED] Fail: overflowing fb with y-axis coordinate
[01:44:42] [PASSED] Fail: overflowing fb with source width
[01:44:42] [PASSED] Fail: overflowing fb with source height
[01:44:42] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[01:44:42] [PASSED] drm_test_framebuffer_cleanup
[01:44:42] =============== drm_test_framebuffer_create ===============
[01:44:42] [PASSED] ABGR8888 normal sizes
[01:44:42] [PASSED] ABGR8888 max sizes
[01:44:42] [PASSED] ABGR8888 pitch greater than min required
[01:44:42] [PASSED] ABGR8888 pitch less than min required
[01:44:42] [PASSED] ABGR8888 Invalid width
[01:44:42] [PASSED] ABGR8888 Invalid buffer handle
[01:44:42] [PASSED] No pixel format
[01:44:42] [PASSED] ABGR8888 Width 0
[01:44:42] [PASSED] ABGR8888 Height 0
[01:44:42] [PASSED] ABGR8888 Out of bound height * pitch combination
[01:44:42] [PASSED] ABGR8888 Large buffer offset
[01:44:42] [PASSED] ABGR8888 Buffer offset for inexistent plane
[01:44:42] [PASSED] ABGR8888 Invalid flag
[01:44:42] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[01:44:42] [PASSED] ABGR8888 Valid buffer modifier
[01:44:42] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[01:44:42] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[01:44:42] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[01:44:42] [PASSED] NV12 Normal sizes
[01:44:42] [PASSED] NV12 Max sizes
[01:44:42] [PASSED] NV12 Invalid pitch
[01:44:42] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[01:44:42] [PASSED] NV12 different modifier per-plane
[01:44:42] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[01:44:42] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[01:44:42] [PASSED] NV12 Modifier for inexistent plane
[01:44:42] [PASSED] NV12 Handle for inexistent plane
[01:44:42] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[01:44:42] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[01:44:42] [PASSED] YVU420 Normal sizes
[01:44:42] [PASSED] YVU420 Max sizes
[01:44:42] [PASSED] YVU420 Invalid pitch
[01:44:42] [PASSED] YVU420 Different pitches
[01:44:42] [PASSED] YVU420 Different buffer offsets/pitches
[01:44:42] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[01:44:42] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[01:44:42] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[01:44:42] [PASSED] YVU420 Valid modifier
[01:44:42] [PASSED] YVU420 Different modifiers per plane
[01:44:42] [PASSED] YVU420 Modifier for inexistent plane
[01:44:42] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[01:44:42] [PASSED] X0L2 Normal sizes
[01:44:42] [PASSED] X0L2 Max sizes
[01:44:42] [PASSED] X0L2 Invalid pitch
[01:44:42] [PASSED] X0L2 Pitch greater than minimum required
[01:44:42] [PASSED] X0L2 Handle for inexistent plane
[01:44:42] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[01:44:42] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[01:44:42] [PASSED] X0L2 Valid modifier
[01:44:42] [PASSED] X0L2 Modifier for inexistent plane
[01:44:42] =========== [PASSED] drm_test_framebuffer_create ===========
[01:44:42] [PASSED] drm_test_framebuffer_free
[01:44:42] [PASSED] drm_test_framebuffer_init
[01:44:42] [PASSED] drm_test_framebuffer_init_bad_format
[01:44:42] [PASSED] drm_test_framebuffer_init_dev_mismatch
[01:44:42] [PASSED] drm_test_framebuffer_lookup
[01:44:42] [PASSED] drm_test_framebuffer_lookup_inexistent
[01:44:42] [PASSED] drm_test_framebuffer_modifiers_not_supported
[01:44:42] ================= [PASSED] drm_framebuffer =================
[01:44:42] ================ drm_gem_shmem (8 subtests) ================
[01:44:42] [PASSED] drm_gem_shmem_test_obj_create
[01:44:42] [PASSED] drm_gem_shmem_test_obj_create_private
[01:44:42] [PASSED] drm_gem_shmem_test_pin_pages
[01:44:42] [PASSED] drm_gem_shmem_test_vmap
[01:44:42] [PASSED] drm_gem_shmem_test_get_sg_table
[01:44:42] [PASSED] drm_gem_shmem_test_get_pages_sgt
[01:44:42] [PASSED] drm_gem_shmem_test_madvise
[01:44:42] [PASSED] drm_gem_shmem_test_purge
[01:44:42] ================== [PASSED] drm_gem_shmem ==================
[01:44:42] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[01:44:42] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[01:44:42] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[01:44:42] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[01:44:42] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[01:44:42] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[01:44:42] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[01:44:42] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420 =======
[01:44:42] [PASSED] Automatic
[01:44:42] [PASSED] Full
[01:44:42] [PASSED] Limited 16:235
[01:44:42] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[01:44:42] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[01:44:42] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[01:44:42] [PASSED] drm_test_check_disable_connector
[01:44:42] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[01:44:42] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[01:44:42] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[01:44:42] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[01:44:42] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[01:44:42] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[01:44:42] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[01:44:42] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[01:44:42] [PASSED] drm_test_check_output_bpc_dvi
[01:44:42] [PASSED] drm_test_check_output_bpc_format_vic_1
[01:44:42] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[01:44:42] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[01:44:42] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[01:44:42] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[01:44:42] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[01:44:42] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[01:44:42] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[01:44:42] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[01:44:42] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[01:44:42] [PASSED] drm_test_check_broadcast_rgb_value
[01:44:42] [PASSED] drm_test_check_bpc_8_value
[01:44:42] [PASSED] drm_test_check_bpc_10_value
[01:44:42] [PASSED] drm_test_check_bpc_12_value
[01:44:42] [PASSED] drm_test_check_format_value
[01:44:42] [PASSED] drm_test_check_tmds_char_value
[01:44:42] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[01:44:42] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[01:44:42] [PASSED] drm_test_check_mode_valid
[01:44:42] [PASSED] drm_test_check_mode_valid_reject
[01:44:42] [PASSED] drm_test_check_mode_valid_reject_rate
[01:44:42] [PASSED] drm_test_check_mode_valid_reject_max_clock
[01:44:42] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[01:44:42] = drm_atomic_helper_connector_hdmi_infoframes (5 subtests) =
[01:44:42] [PASSED] drm_test_check_infoframes
[01:44:42] [PASSED] drm_test_check_reject_avi_infoframe
[01:44:42] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_8
[01:44:42] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_10
[01:44:42] [PASSED] drm_test_check_reject_audio_infoframe
[01:44:42] === [PASSED] drm_atomic_helper_connector_hdmi_infoframes ===
[01:44:42] ================= drm_managed (2 subtests) =================
[01:44:42] [PASSED] drm_test_managed_release_action
[01:44:42] [PASSED] drm_test_managed_run_action
[01:44:42] =================== [PASSED] drm_managed ===================
[01:44:42] =================== drm_mm (6 subtests) ====================
[01:44:42] [PASSED] drm_test_mm_init
[01:44:42] [PASSED] drm_test_mm_debug
[01:44:42] [PASSED] drm_test_mm_align32
[01:44:42] [PASSED] drm_test_mm_align64
[01:44:42] [PASSED] drm_test_mm_lowest
[01:44:42] [PASSED] drm_test_mm_highest
[01:44:42] ===================== [PASSED] drm_mm ======================
[01:44:42] ============= drm_modes_analog_tv (5 subtests) =============
[01:44:42] [PASSED] drm_test_modes_analog_tv_mono_576i
[01:44:42] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[01:44:42] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[01:44:42] [PASSED] drm_test_modes_analog_tv_pal_576i
[01:44:42] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[01:44:42] =============== [PASSED] drm_modes_analog_tv ===============
[01:44:42] ============== drm_plane_helper (2 subtests) ===============
[01:44:42] =============== drm_test_check_plane_state ================
[01:44:42] [PASSED] clipping_simple
[01:44:42] [PASSED] clipping_rotate_reflect
[01:44:42] [PASSED] positioning_simple
[01:44:42] [PASSED] upscaling
[01:44:42] [PASSED] downscaling
[01:44:42] [PASSED] rounding1
[01:44:42] [PASSED] rounding2
[01:44:42] [PASSED] rounding3
[01:44:42] [PASSED] rounding4
[01:44:42] =========== [PASSED] drm_test_check_plane_state ============
[01:44:42] =========== drm_test_check_invalid_plane_state ============
[01:44:42] [PASSED] positioning_invalid
[01:44:42] [PASSED] upscaling_invalid
[01:44:42] [PASSED] downscaling_invalid
[01:44:42] ======= [PASSED] drm_test_check_invalid_plane_state ========
[01:44:42] ================ [PASSED] drm_plane_helper =================
[01:44:42] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[01:44:42] ====== drm_test_connector_helper_tv_get_modes_check =======
[01:44:42] [PASSED] None
[01:44:42] [PASSED] PAL
[01:44:42] [PASSED] NTSC
[01:44:42] [PASSED] Both, NTSC Default
[01:44:42] [PASSED] Both, PAL Default
[01:44:42] [PASSED] Both, NTSC Default, with PAL on command-line
[01:44:42] [PASSED] Both, PAL Default, with NTSC on command-line
[01:44:42] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[01:44:42] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[01:44:42] ================== drm_rect (9 subtests) ===================
[01:44:42] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[01:44:42] [PASSED] drm_test_rect_clip_scaled_not_clipped
[01:44:42] [PASSED] drm_test_rect_clip_scaled_clipped
[01:44:42] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[01:44:42] ================= drm_test_rect_intersect =================
[01:44:42] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[01:44:42] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[01:44:42] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[01:44:42] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[01:44:42] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[01:44:42] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[01:44:42] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[01:44:42] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[01:44:42] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[01:44:42] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[01:44:42] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[01:44:42] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[01:44:42] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[01:44:42] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[01:44:42] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[01:44:42] ============= [PASSED] drm_test_rect_intersect =============
[01:44:42] ================ drm_test_rect_calc_hscale ================
[01:44:42] [PASSED] normal use
[01:44:42] [PASSED] out of max range
[01:44:42] [PASSED] out of min range
[01:44:42] [PASSED] zero dst
[01:44:42] [PASSED] negative src
[01:44:42] [PASSED] negative dst
[01:44:42] ============ [PASSED] drm_test_rect_calc_hscale ============
[01:44:42] ================ drm_test_rect_calc_vscale ================
[01:44:42] [PASSED] normal use
[01:44:42] [PASSED] out of max range
[01:44:42] [PASSED] out of min range
[01:44:42] [PASSED] zero dst
[01:44:42] [PASSED] negative src
[01:44:42] [PASSED] negative dst
stty: 'standard input': Inappropriate ioctl for device
[01:44:42] ============ [PASSED] drm_test_rect_calc_vscale ============
[01:44:42] ================== drm_test_rect_rotate ===================
[01:44:42] [PASSED] reflect-x
[01:44:42] [PASSED] reflect-y
[01:44:42] [PASSED] rotate-0
[01:44:42] [PASSED] rotate-90
[01:44:42] [PASSED] rotate-180
[01:44:42] [PASSED] rotate-270
[01:44:42] ============== [PASSED] drm_test_rect_rotate ===============
[01:44:42] ================ drm_test_rect_rotate_inv =================
[01:44:42] [PASSED] reflect-x
[01:44:42] [PASSED] reflect-y
[01:44:42] [PASSED] rotate-0
[01:44:42] [PASSED] rotate-90
[01:44:42] [PASSED] rotate-180
[01:44:42] [PASSED] rotate-270
[01:44:42] ============ [PASSED] drm_test_rect_rotate_inv =============
[01:44:42] ==================== [PASSED] drm_rect =====================
[01:44:42] ============ drm_sysfb_modeset_test (1 subtest) ============
[01:44:42] ============ drm_test_sysfb_build_fourcc_list =============
[01:44:42] [PASSED] no native formats
[01:44:42] [PASSED] XRGB8888 as native format
[01:44:42] [PASSED] remove duplicates
[01:44:42] [PASSED] convert alpha formats
[01:44:42] [PASSED] random formats
[01:44:42] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[01:44:42] ============= [PASSED] drm_sysfb_modeset_test ==============
[01:44:42] ================== drm_fixp (2 subtests) ===================
[01:44:42] [PASSED] drm_test_int2fixp
[01:44:42] [PASSED] drm_test_sm2fixp
[01:44:42] ==================== [PASSED] drm_fixp =====================
[01:44:42] ============================================================
[01:44:42] Testing complete. Ran 621 tests: passed: 621
[01:44:42] Elapsed time: 26.047s total, 1.733s configuring, 24.148s building, 0.116s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[01:44:42] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[01:44:44] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[01:44:53] Starting KUnit Kernel (1/1)...
[01:44:53] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[01:44:53] ================= ttm_device (5 subtests) ==================
[01:44:53] [PASSED] ttm_device_init_basic
[01:44:53] [PASSED] ttm_device_init_multiple
[01:44:53] [PASSED] ttm_device_fini_basic
[01:44:53] [PASSED] ttm_device_init_no_vma_man
[01:44:53] ================== ttm_device_init_pools ==================
[01:44:53] [PASSED] No DMA allocations, no DMA32 required
[01:44:53] [PASSED] DMA allocations, DMA32 required
[01:44:53] [PASSED] No DMA allocations, DMA32 required
[01:44:53] [PASSED] DMA allocations, no DMA32 required
[01:44:53] ============== [PASSED] ttm_device_init_pools ==============
[01:44:53] =================== [PASSED] ttm_device ====================
[01:44:53] ================== ttm_pool (8 subtests) ===================
[01:44:53] ================== ttm_pool_alloc_basic ===================
[01:44:53] [PASSED] One page
[01:44:53] [PASSED] More than one page
[01:44:53] [PASSED] Above the allocation limit
[01:44:53] [PASSED] One page, with coherent DMA mappings enabled
[01:44:53] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[01:44:53] ============== [PASSED] ttm_pool_alloc_basic ===============
[01:44:53] ============== ttm_pool_alloc_basic_dma_addr ==============
[01:44:53] [PASSED] One page
[01:44:53] [PASSED] More than one page
[01:44:53] [PASSED] Above the allocation limit
[01:44:53] [PASSED] One page, with coherent DMA mappings enabled
[01:44:53] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[01:44:53] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[01:44:53] [PASSED] ttm_pool_alloc_order_caching_match
[01:44:53] [PASSED] ttm_pool_alloc_caching_mismatch
[01:44:53] [PASSED] ttm_pool_alloc_order_mismatch
[01:44:53] [PASSED] ttm_pool_free_dma_alloc
[01:44:53] [PASSED] ttm_pool_free_no_dma_alloc
[01:44:53] [PASSED] ttm_pool_fini_basic
[01:44:53] ==================== [PASSED] ttm_pool =====================
[01:44:53] ================ ttm_resource (8 subtests) =================
[01:44:53] ================= ttm_resource_init_basic =================
[01:44:53] [PASSED] Init resource in TTM_PL_SYSTEM
[01:44:53] [PASSED] Init resource in TTM_PL_VRAM
[01:44:53] [PASSED] Init resource in a private placement
[01:44:53] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[01:44:53] ============= [PASSED] ttm_resource_init_basic =============
[01:44:53] [PASSED] ttm_resource_init_pinned
[01:44:53] [PASSED] ttm_resource_fini_basic
[01:44:53] [PASSED] ttm_resource_manager_init_basic
[01:44:53] [PASSED] ttm_resource_manager_usage_basic
[01:44:53] [PASSED] ttm_resource_manager_set_used_basic
[01:44:53] [PASSED] ttm_sys_man_alloc_basic
[01:44:53] [PASSED] ttm_sys_man_free_basic
[01:44:53] ================== [PASSED] ttm_resource ===================
[01:44:53] =================== ttm_tt (15 subtests) ===================
[01:44:53] ==================== ttm_tt_init_basic ====================
[01:44:53] [PASSED] Page-aligned size
[01:44:53] [PASSED] Extra pages requested
[01:44:53] ================ [PASSED] ttm_tt_init_basic ================
[01:44:53] [PASSED] ttm_tt_init_misaligned
[01:44:53] [PASSED] ttm_tt_fini_basic
[01:44:53] [PASSED] ttm_tt_fini_sg
[01:44:53] [PASSED] ttm_tt_fini_shmem
[01:44:53] [PASSED] ttm_tt_create_basic
[01:44:53] [PASSED] ttm_tt_create_invalid_bo_type
[01:44:53] [PASSED] ttm_tt_create_ttm_exists
[01:44:53] [PASSED] ttm_tt_create_failed
[01:44:53] [PASSED] ttm_tt_destroy_basic
[01:44:53] [PASSED] ttm_tt_populate_null_ttm
[01:44:53] [PASSED] ttm_tt_populate_populated_ttm
[01:44:53] [PASSED] ttm_tt_unpopulate_basic
[01:44:53] [PASSED] ttm_tt_unpopulate_empty_ttm
[01:44:53] [PASSED] ttm_tt_swapin_basic
[01:44:53] ===================== [PASSED] ttm_tt ======================
[01:44:53] =================== ttm_bo (14 subtests) ===================
[01:44:53] =========== ttm_bo_reserve_optimistic_no_ticket ===========
[01:44:53] [PASSED] Cannot be interrupted and sleeps
[01:44:53] [PASSED] Cannot be interrupted, locks straight away
[01:44:53] [PASSED] Can be interrupted, sleeps
[01:44:53] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[01:44:53] [PASSED] ttm_bo_reserve_locked_no_sleep
[01:44:53] [PASSED] ttm_bo_reserve_no_wait_ticket
[01:44:53] [PASSED] ttm_bo_reserve_double_resv
[01:44:53] [PASSED] ttm_bo_reserve_interrupted
[01:44:53] [PASSED] ttm_bo_reserve_deadlock
[01:44:53] [PASSED] ttm_bo_unreserve_basic
[01:44:53] [PASSED] ttm_bo_unreserve_pinned
[01:44:53] [PASSED] ttm_bo_unreserve_bulk
[01:44:53] [PASSED] ttm_bo_fini_basic
[01:44:53] [PASSED] ttm_bo_fini_shared_resv
[01:44:53] [PASSED] ttm_bo_pin_basic
[01:44:53] [PASSED] ttm_bo_pin_unpin_resource
[01:44:53] [PASSED] ttm_bo_multiple_pin_one_unpin
[01:44:53] ===================== [PASSED] ttm_bo ======================
[01:44:53] ============== ttm_bo_validate (21 subtests) ===============
[01:44:53] ============== ttm_bo_init_reserved_sys_man ===============
[01:44:53] [PASSED] Buffer object for userspace
[01:44:53] [PASSED] Kernel buffer object
[01:44:53] [PASSED] Shared buffer object
[01:44:53] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[01:44:53] ============== ttm_bo_init_reserved_mock_man ==============
[01:44:53] [PASSED] Buffer object for userspace
[01:44:53] [PASSED] Kernel buffer object
[01:44:53] [PASSED] Shared buffer object
[01:44:53] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[01:44:53] [PASSED] ttm_bo_init_reserved_resv
[01:44:53] ================== ttm_bo_validate_basic ==================
[01:44:53] [PASSED] Buffer object for userspace
[01:44:53] [PASSED] Kernel buffer object
[01:44:53] [PASSED] Shared buffer object
[01:44:53] ============== [PASSED] ttm_bo_validate_basic ==============
[01:44:53] [PASSED] ttm_bo_validate_invalid_placement
[01:44:53] ============= ttm_bo_validate_same_placement ==============
[01:44:53] [PASSED] System manager
[01:44:53] [PASSED] VRAM manager
[01:44:53] ========= [PASSED] ttm_bo_validate_same_placement ==========
[01:44:53] [PASSED] ttm_bo_validate_failed_alloc
[01:44:53] [PASSED] ttm_bo_validate_pinned
[01:44:53] [PASSED] ttm_bo_validate_busy_placement
[01:44:53] ================ ttm_bo_validate_multihop =================
[01:44:53] [PASSED] Buffer object for userspace
[01:44:53] [PASSED] Kernel buffer object
[01:44:53] [PASSED] Shared buffer object
[01:44:53] ============ [PASSED] ttm_bo_validate_multihop =============
[01:44:53] ========== ttm_bo_validate_no_placement_signaled ==========
[01:44:53] [PASSED] Buffer object in system domain, no page vector
[01:44:53] [PASSED] Buffer object in system domain with an existing page vector
[01:44:53] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[01:44:53] ======== ttm_bo_validate_no_placement_not_signaled ========
[01:44:53] [PASSED] Buffer object for userspace
[01:44:53] [PASSED] Kernel buffer object
[01:44:53] [PASSED] Shared buffer object
[01:44:53] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[01:44:53] [PASSED] ttm_bo_validate_move_fence_signaled
[01:44:53] ========= ttm_bo_validate_move_fence_not_signaled =========
[01:44:53] [PASSED] Waits for GPU
[01:44:53] [PASSED] Tries to lock straight away
[01:44:53] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[01:44:53] [PASSED] ttm_bo_validate_happy_evict
[01:44:53] [PASSED] ttm_bo_validate_all_pinned_evict
[01:44:53] [PASSED] ttm_bo_validate_allowed_only_evict
[01:44:53] [PASSED] ttm_bo_validate_deleted_evict
[01:44:53] [PASSED] ttm_bo_validate_busy_domain_evict
[01:44:53] [PASSED] ttm_bo_validate_evict_gutting
[01:44:53] [PASSED] ttm_bo_validate_recrusive_evict
stty: 'standard input': Inappropriate ioctl for device
[01:44:53] ================= [PASSED] ttm_bo_validate =================
[01:44:53] ============================================================
[01:44:53] Testing complete. Ran 101 tests: passed: 101
[01:44:53] Elapsed time: 11.162s total, 1.651s configuring, 9.295s building, 0.185s running
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
^ permalink raw reply [flat|nested] 63+ messages in thread* ✓ Xe.CI.BAT: success for CPU binds and ULLS on migration queue (rev3)
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (26 preceding siblings ...)
2026-02-28 1:44 ` ✓ CI.KUnit: success " Patchwork
@ 2026-02-28 2:32 ` Patchwork
2026-02-28 13:59 ` ✗ Xe.CI.FULL: failure " Patchwork
` (2 subsequent siblings)
30 siblings, 0 replies; 63+ messages in thread
From: Patchwork @ 2026-02-28 2:32 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
[-- Attachment #1: Type: text/plain, Size: 28377 bytes --]
== Series Details ==
Series: CPU binds and ULLS on migration queue (rev3)
URL : https://patchwork.freedesktop.org/series/149888/
State : success
== Summary ==
CI Bug Log - changes from xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1_BAT -> xe-pw-149888v3_BAT
====================================================
Summary
-------
**SUCCESS**
No regressions found.
Participating hosts (4 -> 13)
------------------------------
Additional (9): bat-bmg-1 bat-lnl-2 bat-lnl-1 bat-ptl-vm bat-atsm-2 bat-wcl-1 bat-wcl-2 bat-bmg-2 bat-adlp-7
Known issues
------------
Here are the changes found in xe-pw-149888v3_BAT that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@fbdev@info:
- bat-atsm-2: NOTRUN -> [SKIP][1] ([Intel XE#2134]) +4 other tests skip
[1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@fbdev@info.html
* igt@fbdev@nullptr:
- bat-bmg-2: NOTRUN -> [SKIP][2] ([Intel XE#2134]) +4 other tests skip
[2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-2/igt@fbdev@nullptr.html
- bat-lnl-2: NOTRUN -> [SKIP][3] ([Intel XE#2134]) +4 other tests skip
[3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@fbdev@nullptr.html
* igt@fbdev@write:
- bat-wcl-2: NOTRUN -> [SKIP][4] ([Intel XE#7241]) +4 other tests skip
[4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-2/igt@fbdev@write.html
* igt@kms_addfb_basic@addfb25-y-tiled-small-legacy:
- bat-lnl-1: NOTRUN -> [SKIP][5] ([Intel XE#1466])
[5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-1/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html
- bat-wcl-2: NOTRUN -> [SKIP][6] ([Intel XE#7245])
[6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-2/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html
- bat-bmg-2: NOTRUN -> [SKIP][7] ([Intel XE#2233])
[7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-2/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html
- bat-bmg-1: NOTRUN -> [SKIP][8] ([Intel XE#2233])
[8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-1/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html
- bat-lnl-2: NOTRUN -> [SKIP][9] ([Intel XE#1466] / [Intel XE#2235])
[9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html
- bat-wcl-1: NOTRUN -> [SKIP][10] ([Intel XE#7245])
[10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-1/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html
* igt@kms_addfb_basic@invalid-set-prop-any:
- bat-atsm-2: NOTRUN -> [SKIP][11] ([i915#6077]) +30 other tests skip
[11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@kms_addfb_basic@invalid-set-prop-any.html
* igt@kms_cursor_legacy@basic-flip-after-cursor-atomic:
- bat-atsm-2: NOTRUN -> [SKIP][12] ([Intel XE#1024] / [Intel XE#782] / [Intel XE#947]) +5 other tests skip
[12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@kms_cursor_legacy@basic-flip-after-cursor-atomic.html
* igt@kms_cursor_legacy@basic-flip-after-cursor-legacy:
- bat-bmg-2: NOTRUN -> [SKIP][13] ([Intel XE#2489] / [Intel XE#3419]) +13 other tests skip
[13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-2/igt@kms_cursor_legacy@basic-flip-after-cursor-legacy.html
* igt@kms_dsc@dsc-basic:
- bat-lnl-1: NOTRUN -> [SKIP][14] ([Intel XE#2244])
[14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-1/igt@kms_dsc@dsc-basic.html
- bat-adlp-7: NOTRUN -> [SKIP][15] ([Intel XE#2244] / [Intel XE#455])
[15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-adlp-7/igt@kms_dsc@dsc-basic.html
- bat-bmg-1: NOTRUN -> [SKIP][16] ([Intel XE#2244])
[16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-1/igt@kms_dsc@dsc-basic.html
- bat-atsm-2: NOTRUN -> [SKIP][17] ([Intel XE#1024] / [Intel XE#784] / [Intel XE#947])
[17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@kms_dsc@dsc-basic.html
- bat-wcl-1: NOTRUN -> [SKIP][18] ([Intel XE#7244])
[18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-1/igt@kms_dsc@dsc-basic.html
* igt@kms_flip@basic-flip-vs-dpms:
- bat-adlp-7: NOTRUN -> [DMESG-WARN][19] ([Intel XE#7483]) +12 other tests dmesg-warn
[19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-adlp-7/igt@kms_flip@basic-flip-vs-dpms.html
- bat-lnl-2: NOTRUN -> [SKIP][20] ([Intel XE#2235] / [Intel XE#2482]) +3 other tests skip
[20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@kms_flip@basic-flip-vs-dpms.html
* igt@kms_flip@basic-flip-vs-modeset:
- bat-wcl-2: NOTRUN -> [SKIP][21] ([Intel XE#7240]) +3 other tests skip
[21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-2/igt@kms_flip@basic-flip-vs-modeset.html
- bat-bmg-2: NOTRUN -> [SKIP][22] ([Intel XE#2482]) +3 other tests skip
[22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-2/igt@kms_flip@basic-flip-vs-modeset.html
* igt@kms_force_connector_basic@force-connector-state:
- bat-lnl-1: NOTRUN -> [SKIP][23] ([Intel XE#352]) +2 other tests skip
[23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-1/igt@kms_force_connector_basic@force-connector-state.html
- bat-lnl-2: NOTRUN -> [SKIP][24] ([Intel XE#2235] / [Intel XE#352]) +2 other tests skip
[24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@kms_force_connector_basic@force-connector-state.html
* igt@kms_frontbuffer_tracking@basic:
- bat-wcl-2: NOTRUN -> [SKIP][25] ([Intel XE#7246])
[25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-2/igt@kms_frontbuffer_tracking@basic.html
- bat-bmg-2: NOTRUN -> [SKIP][26] ([Intel XE#2434] / [Intel XE#2548] / [Intel XE#6314])
[26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-2/igt@kms_frontbuffer_tracking@basic.html
- bat-lnl-2: NOTRUN -> [SKIP][27] ([Intel XE#2235] / [Intel XE#2548])
[27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@kms_frontbuffer_tracking@basic.html
- bat-atsm-2: NOTRUN -> [SKIP][28] ([Intel XE#1024] / [Intel XE#783] / [Intel XE#947])
[28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@kms_frontbuffer_tracking@basic.html
* igt@kms_hdmi_inject@inject-audio:
- bat-atsm-2: NOTRUN -> [SKIP][29] ([Intel XE#540]) +3 other tests skip
[29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@kms_hdmi_inject@inject-audio.html
- bat-lnl-1: NOTRUN -> [SKIP][30] ([Intel XE#1470] / [Intel XE#2853])
[30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-1/igt@kms_hdmi_inject@inject-audio.html
- bat-lnl-2: NOTRUN -> [SKIP][31] ([Intel XE#1470] / [Intel XE#2235] / [Intel XE#2853])
[31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@kms_hdmi_inject@inject-audio.html
* igt@kms_pipe_crc_basic@compare-crc-sanitycheck-xr24:
- bat-lnl-2: NOTRUN -> [SKIP][32] ([Intel XE#2235]) +13 other tests skip
[32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-xr24.html
* igt@kms_pipe_crc_basic@hang-read-crc:
- bat-wcl-2: NOTRUN -> [SKIP][33] ([Intel XE#7237]) +13 other tests skip
[33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-2/igt@kms_pipe_crc_basic@hang-read-crc.html
* igt@kms_pipe_crc_basic@nonblocking-crc:
- bat-atsm-2: NOTRUN -> [SKIP][34] ([Intel XE#829] / [i915#1836]) +6 other tests skip
[34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@kms_pipe_crc_basic@nonblocking-crc.html
* igt@kms_prop_blob@basic:
- bat-atsm-2: NOTRUN -> [SKIP][35] ([Intel XE#780])
[35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@kms_prop_blob@basic.html
* igt@kms_psr@psr-cursor-plane-move:
- bat-lnl-2: NOTRUN -> [SKIP][36] ([Intel XE#2850] / [Intel XE#929]) +2 other tests skip
[36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@kms_psr@psr-cursor-plane-move.html
* igt@kms_psr@psr-primary-page-flip:
- bat-atsm-2: NOTRUN -> [SKIP][37] ([Intel XE#1024] / [Intel XE#947]) +6 other tests skip
[37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@kms_psr@psr-primary-page-flip.html
* igt@kms_psr@psr-sprite-plane-onoff:
- bat-wcl-2: NOTRUN -> [SKIP][38] ([Intel XE#2850]) +2 other tests skip
[38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-2/igt@kms_psr@psr-sprite-plane-onoff.html
- bat-bmg-2: NOTRUN -> [SKIP][39] ([Intel XE#2234] / [Intel XE#2850]) +2 other tests skip
[39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-2/igt@kms_psr@psr-sprite-plane-onoff.html
- bat-bmg-1: NOTRUN -> [SKIP][40] ([Intel XE#2234] / [Intel XE#2850]) +2 other tests skip
[40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-1/igt@kms_psr@psr-sprite-plane-onoff.html
* igt@sriov_basic@enable-vfs-autoprobe-off:
- bat-lnl-1: NOTRUN -> [SKIP][41] ([Intel XE#1091] / [Intel XE#2849]) +1 other test skip
[41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-1/igt@sriov_basic@enable-vfs-autoprobe-off.html
- bat-lnl-2: NOTRUN -> [SKIP][42] ([Intel XE#1091] / [Intel XE#2849]) +1 other test skip
[42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@sriov_basic@enable-vfs-autoprobe-off.html
* igt@xe_evict@evict-beng-small:
- bat-adlp-7: NOTRUN -> [SKIP][43] ([Intel XE#261] / [Intel XE#5564] / [Intel XE#688]) +9 other tests skip
[43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-adlp-7/igt@xe_evict@evict-beng-small.html
- bat-lnl-2: NOTRUN -> [SKIP][44] ([Intel XE#6540] / [Intel XE#688]) +11 other tests skip
[44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@xe_evict@evict-beng-small.html
* igt@xe_evict@evict-beng-small-cm:
- bat-ptl-vm: NOTRUN -> [SKIP][45] ([Intel XE#5764]) +10 other tests skip
[45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-ptl-vm/igt@xe_evict@evict-beng-small-cm.html
- bat-lnl-1: NOTRUN -> [SKIP][46] ([Intel XE#6540] / [Intel XE#688]) +11 other tests skip
[46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-1/igt@xe_evict@evict-beng-small-cm.html
* igt@xe_evict@evict-small-external-cm:
- bat-wcl-1: NOTRUN -> [SKIP][47] ([Intel XE#7238]) +11 other tests skip
[47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-1/igt@xe_evict@evict-small-external-cm.html
* igt@xe_evict@evict-small-multi-vm:
- bat-wcl-2: NOTRUN -> [SKIP][48] ([Intel XE#7238]) +11 other tests skip
[48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-2/igt@xe_evict@evict-small-multi-vm.html
* igt@xe_evict_ccs@evict-overcommit-parallel-nofree-samefd:
- bat-adlp-7: NOTRUN -> [SKIP][49] ([Intel XE#5563] / [Intel XE#688]) +1 other test skip
[49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-adlp-7/igt@xe_evict_ccs@evict-overcommit-parallel-nofree-samefd.html
* igt@xe_exec_balancer@no-exec-parallel-basic:
- bat-lnl-1: NOTRUN -> [SKIP][50] ([Intel XE#7482]) +17 other tests skip
[50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-1/igt@xe_exec_balancer@no-exec-parallel-basic.html
* igt@xe_exec_balancer@no-exec-virtual-basic:
- bat-lnl-2: NOTRUN -> [SKIP][51] ([Intel XE#7482]) +17 other tests skip
[51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@xe_exec_balancer@no-exec-virtual-basic.html
* igt@xe_exec_balancer@twice-cm-virtual-userptr:
- bat-wcl-2: NOTRUN -> [SKIP][52] ([Intel XE#7482]) +17 other tests skip
[52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-2/igt@xe_exec_balancer@twice-cm-virtual-userptr.html
* igt@xe_exec_balancer@twice-parallel-basic:
- bat-ptl-vm: NOTRUN -> [SKIP][53] ([Intel XE#7482]) +17 other tests skip
[53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-ptl-vm/igt@xe_exec_balancer@twice-parallel-basic.html
* igt@xe_exec_balancer@twice-virtual-rebind:
- bat-wcl-1: NOTRUN -> [SKIP][54] ([Intel XE#7482]) +17 other tests skip
[54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-1/igt@xe_exec_balancer@twice-virtual-rebind.html
* igt@xe_exec_balancer@twice-virtual-userptr-invalidate:
- bat-adlp-7: NOTRUN -> [SKIP][55] ([Intel XE#7482]) +17 other tests skip
[55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-adlp-7/igt@xe_exec_balancer@twice-virtual-userptr-invalidate.html
* igt@xe_exec_fault_mode@twice-userptr-invalidate-imm:
- bat-atsm-2: NOTRUN -> [SKIP][56] ([Intel XE#288]) +32 other tests skip
[56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@xe_exec_fault_mode@twice-userptr-invalidate-imm.html
* igt@xe_exec_fault_mode@twice-userptr-invalidate-prefetch:
- bat-adlp-7: NOTRUN -> [SKIP][57] ([Intel XE#288] / [Intel XE#5561]) +32 other tests skip
[57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-adlp-7/igt@xe_exec_fault_mode@twice-userptr-invalidate-prefetch.html
* igt@xe_huc_copy@huc_copy:
- bat-atsm-2: NOTRUN -> [SKIP][58] ([Intel XE#255])
[58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@xe_huc_copy@huc_copy.html
* igt@xe_live_ktest@xe_bo:
- bat-lnl-1: NOTRUN -> [SKIP][59] ([Intel XE#2229]) +2 other tests skip
[59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-1/igt@xe_live_ktest@xe_bo.html
* igt@xe_live_ktest@xe_bo@xe_bo_evict_kunit:
- bat-wcl-2: NOTRUN -> [SKIP][60] ([Intel XE#7239]) +2 other tests skip
[60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-2/igt@xe_live_ktest@xe_bo@xe_bo_evict_kunit.html
- bat-adlp-7: NOTRUN -> [SKIP][61] ([Intel XE#2229] / [Intel XE#455]) +2 other tests skip
[61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-adlp-7/igt@xe_live_ktest@xe_bo@xe_bo_evict_kunit.html
- bat-lnl-2: NOTRUN -> [SKIP][62] ([Intel XE#2229]) +2 other tests skip
[62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@xe_live_ktest@xe_bo@xe_bo_evict_kunit.html
* igt@xe_live_ktest@xe_bo@xe_ccs_migrate_kunit:
- bat-bmg-2: NOTRUN -> [SKIP][63] ([Intel XE#2229])
[63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-2/igt@xe_live_ktest@xe_bo@xe_ccs_migrate_kunit.html
- bat-bmg-1: NOTRUN -> [SKIP][64] ([Intel XE#2229])
[64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-1/igt@xe_live_ktest@xe_bo@xe_ccs_migrate_kunit.html
* igt@xe_live_ktest@xe_migrate@xe_validate_ccs_kunit:
- bat-atsm-2: NOTRUN -> [SKIP][65] ([Intel XE#2229])
[65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@xe_live_ktest@xe_migrate@xe_validate_ccs_kunit.html
- bat-wcl-1: NOTRUN -> [SKIP][66] ([Intel XE#7239]) +2 other tests skip
[66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-1/igt@xe_live_ktest@xe_migrate@xe_validate_ccs_kunit.html
- bat-ptl-vm: NOTRUN -> [SKIP][67] ([Intel XE#5775])
[67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-ptl-vm/igt@xe_live_ktest@xe_migrate@xe_validate_ccs_kunit.html
- bat-adlp-7: NOTRUN -> [SKIP][68] ([Intel XE#2229] / [Intel XE#5488])
[68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-adlp-7/igt@xe_live_ktest@xe_migrate@xe_validate_ccs_kunit.html
* igt@xe_mmap@vram:
- bat-ptl-vm: NOTRUN -> [SKIP][69] ([Intel XE#5776])
[69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-ptl-vm/igt@xe_mmap@vram.html
- bat-lnl-1: NOTRUN -> [SKIP][70] ([Intel XE#1416])
[70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-1/igt@xe_mmap@vram.html
- bat-wcl-2: NOTRUN -> [SKIP][71] ([Intel XE#7243])
[71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-2/igt@xe_mmap@vram.html
- bat-adlp-7: NOTRUN -> [SKIP][72] ([Intel XE#1008] / [Intel XE#5591])
[72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-adlp-7/igt@xe_mmap@vram.html
- bat-lnl-2: NOTRUN -> [SKIP][73] ([Intel XE#1416])
[73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@xe_mmap@vram.html
- bat-wcl-1: NOTRUN -> [SKIP][74] ([Intel XE#7243])
[74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-1/igt@xe_mmap@vram.html
* igt@xe_pat@pat-index-xe2:
- bat-atsm-2: NOTRUN -> [SKIP][75] ([Intel XE#977])
[75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@xe_pat@pat-index-xe2.html
- bat-adlp-7: NOTRUN -> [SKIP][76] ([Intel XE#977])
[76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-adlp-7/igt@xe_pat@pat-index-xe2.html
* igt@xe_pat@pat-index-xehpc:
- bat-ptl-vm: NOTRUN -> [SKIP][77] ([Intel XE#5777])
[77]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-ptl-vm/igt@xe_pat@pat-index-xehpc.html
- bat-lnl-1: NOTRUN -> [SKIP][78] ([Intel XE#1420] / [Intel XE#2838])
[78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-1/igt@xe_pat@pat-index-xehpc.html
- bat-wcl-2: NOTRUN -> [SKIP][79] ([Intel XE#7247])
[79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-2/igt@xe_pat@pat-index-xehpc.html
- bat-bmg-2: NOTRUN -> [SKIP][80] ([Intel XE#1420])
[80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-2/igt@xe_pat@pat-index-xehpc.html
- bat-bmg-1: NOTRUN -> [SKIP][81] ([Intel XE#1420])
[81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-1/igt@xe_pat@pat-index-xehpc.html
- bat-adlp-7: NOTRUN -> [SKIP][82] ([Intel XE#2838] / [Intel XE#979])
[82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-adlp-7/igt@xe_pat@pat-index-xehpc.html
- bat-lnl-2: NOTRUN -> [SKIP][83] ([Intel XE#1420] / [Intel XE#2838])
[83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@xe_pat@pat-index-xehpc.html
- bat-wcl-1: NOTRUN -> [SKIP][84] ([Intel XE#7247])
[84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-1/igt@xe_pat@pat-index-xehpc.html
- bat-atsm-2: NOTRUN -> [SKIP][85] ([Intel XE#2838] / [Intel XE#979])
[85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@xe_pat@pat-index-xehpc.html
* igt@xe_pat@pat-index-xelp:
- bat-bmg-2: NOTRUN -> [SKIP][86] ([Intel XE#2245])
[86]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-2/igt@xe_pat@pat-index-xelp.html
- bat-bmg-1: NOTRUN -> [SKIP][87] ([Intel XE#2245])
[87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-1/igt@xe_pat@pat-index-xelp.html
- bat-lnl-2: NOTRUN -> [SKIP][88] ([Intel XE#977])
[88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@xe_pat@pat-index-xelp.html
- bat-wcl-1: NOTRUN -> [SKIP][89] ([Intel XE#7242])
[89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-1/igt@xe_pat@pat-index-xelp.html
- bat-ptl-vm: NOTRUN -> [SKIP][90] ([Intel XE#5771])
[90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-ptl-vm/igt@xe_pat@pat-index-xelp.html
- bat-wcl-2: NOTRUN -> [SKIP][91] ([Intel XE#7242])
[91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-2/igt@xe_pat@pat-index-xelp.html
- bat-lnl-1: NOTRUN -> [SKIP][92] ([Intel XE#977])
[92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-1/igt@xe_pat@pat-index-xelp.html
* igt@xe_pat@pat-index-xelpg:
- bat-ptl-vm: NOTRUN -> [SKIP][93] ([Intel XE#5780])
[93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-ptl-vm/igt@xe_pat@pat-index-xelpg.html
- bat-lnl-1: NOTRUN -> [SKIP][94] ([Intel XE#979])
[94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-1/igt@xe_pat@pat-index-xelpg.html
- bat-wcl-2: NOTRUN -> [SKIP][95] ([Intel XE#7248])
[95]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-2/igt@xe_pat@pat-index-xelpg.html
- bat-bmg-2: NOTRUN -> [SKIP][96] ([Intel XE#2236])
[96]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-2/igt@xe_pat@pat-index-xelpg.html
- bat-bmg-1: NOTRUN -> [SKIP][97] ([Intel XE#2236])
[97]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-bmg-1/igt@xe_pat@pat-index-xelpg.html
- bat-adlp-7: NOTRUN -> [SKIP][98] ([Intel XE#979])
[98]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-adlp-7/igt@xe_pat@pat-index-xelpg.html
- bat-lnl-2: NOTRUN -> [SKIP][99] ([Intel XE#2236] / [Intel XE#979])
[99]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@xe_pat@pat-index-xelpg.html
- bat-wcl-1: NOTRUN -> [SKIP][100] ([Intel XE#7248])
[100]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-wcl-1/igt@xe_pat@pat-index-xelpg.html
- bat-atsm-2: NOTRUN -> [SKIP][101] ([Intel XE#979])
[101]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-atsm-2/igt@xe_pat@pat-index-xelpg.html
* igt@xe_sriov_flr@flr-vf1-clear:
- bat-lnl-2: NOTRUN -> [SKIP][102] ([Intel XE#3342])
[102]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-2/igt@xe_sriov_flr@flr-vf1-clear.html
- bat-lnl-1: NOTRUN -> [SKIP][103] ([Intel XE#3342])
[103]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/bat-lnl-1/igt@xe_sriov_flr@flr-vf1-clear.html
[Intel XE#1008]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1008
[Intel XE#1024]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1024
[Intel XE#1091]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1091
[Intel XE#1416]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1416
[Intel XE#1420]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1420
[Intel XE#1466]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1466
[Intel XE#1470]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1470
[Intel XE#2134]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2134
[Intel XE#2229]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2229
[Intel XE#2233]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2233
[Intel XE#2234]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2234
[Intel XE#2235]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2235
[Intel XE#2236]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2236
[Intel XE#2244]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2244
[Intel XE#2245]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2245
[Intel XE#2434]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2434
[Intel XE#2482]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2482
[Intel XE#2489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2489
[Intel XE#2548]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2548
[Intel XE#255]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/255
[Intel XE#261]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/261
[Intel XE#2838]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2838
[Intel XE#2849]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2849
[Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
[Intel XE#2853]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2853
[Intel XE#288]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/288
[Intel XE#3342]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3342
[Intel XE#3419]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3419
[Intel XE#352]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/352
[Intel XE#455]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/455
[Intel XE#540]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/540
[Intel XE#5488]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5488
[Intel XE#5561]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5561
[Intel XE#5563]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5563
[Intel XE#5564]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5564
[Intel XE#5591]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5591
[Intel XE#5764]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5764
[Intel XE#5771]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5771
[Intel XE#5775]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5775
[Intel XE#5776]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5776
[Intel XE#5777]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5777
[Intel XE#5780]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5780
[Intel XE#6314]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6314
[Intel XE#6540]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6540
[Intel XE#688]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/688
[Intel XE#7237]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7237
[Intel XE#7238]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7238
[Intel XE#7239]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7239
[Intel XE#7240]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7240
[Intel XE#7241]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7241
[Intel XE#7242]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7242
[Intel XE#7243]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7243
[Intel XE#7244]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7244
[Intel XE#7245]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7245
[Intel XE#7246]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7246
[Intel XE#7247]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7247
[Intel XE#7248]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7248
[Intel XE#7482]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7482
[Intel XE#7483]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7483
[Intel XE#780]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/780
[Intel XE#782]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/782
[Intel XE#783]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/783
[Intel XE#784]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/784
[Intel XE#829]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/829
[Intel XE#929]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/929
[Intel XE#947]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/947
[Intel XE#977]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/977
[Intel XE#979]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/979
[i915#1836]: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/1836
[i915#6077]: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/6077
Build changes
-------------
* Linux: xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1 -> xe-pw-149888v3
IGT_8775: 8775
xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1: a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1
xe-pw-149888v3: 149888v3
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/index.html
[-- Attachment #2: Type: text/html, Size: 34947 bytes --]
^ permalink raw reply [flat|nested] 63+ messages in thread
* ✗ Xe.CI.FULL: failure for CPU binds and ULLS on migration queue (rev3)
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (27 preceding siblings ...)
2026-02-28 2:32 ` ✓ Xe.CI.BAT: " Patchwork
@ 2026-02-28 13:59 ` Patchwork
2026-03-02 17:54 ` Summers, Stuart
2026-03-05 22:56 ` [PATCH v3 00/25] CPU binds and ULLS on migration queue Summers, Stuart
2026-03-20 15:31 ` Thomas Hellström
30 siblings, 1 reply; 63+ messages in thread
From: Patchwork @ 2026-02-28 13:59 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
[-- Attachment #1: Type: text/plain, Size: 23374 bytes --]
== Series Details ==
Series: CPU binds and ULLS on migration queue (rev3)
URL : https://patchwork.freedesktop.org/series/149888/
State : failure
== Summary ==
CI Bug Log - changes from xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1_FULL -> xe-pw-149888v3_FULL
====================================================
Summary
-------
**FAILURE**
Serious unknown changes coming with xe-pw-149888v3_FULL absolutely need to be
verified manually.
If you think the reported changes have nothing to do with the changes
introduced in xe-pw-149888v3_FULL, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
to document this new failure mode, which will reduce false positives in CI.
Participating hosts (2 -> 2)
------------------------------
No changes in participating hosts
Possible new issues
-------------------
Here are the unknown changes that may have been introduced in xe-pw-149888v3_FULL:
### IGT changes ###
#### Possible regressions ####
* igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs@pipe-d-dp-2:
- shard-bmg: [PASS][1] -> [INCOMPLETE][2]
[1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-bmg-3/igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs@pipe-d-dp-2.html
[2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-1/igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs@pipe-d-dp-2.html
* igt@xe_module_load@many-reload:
- shard-bmg: [PASS][3] -> [ABORT][4] +1 other test abort
[3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-bmg-8/igt@xe_module_load@many-reload.html
[4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-8/igt@xe_module_load@many-reload.html
* igt@xe_vm@bind-array-enobufs:
- shard-lnl: [PASS][5] -> [FAIL][6]
[5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-lnl-3/igt@xe_vm@bind-array-enobufs.html
[6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-4/igt@xe_vm@bind-array-enobufs.html
- shard-bmg: [PASS][7] -> [FAIL][8]
[7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-bmg-2/igt@xe_vm@bind-array-enobufs.html
[8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-4/igt@xe_vm@bind-array-enobufs.html
Known issues
------------
Here are the changes found in xe-pw-149888v3_FULL that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@kms_big_fb@x-tiled-16bpp-rotate-270:
- shard-lnl: NOTRUN -> [SKIP][9] ([Intel XE#1407])
[9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_big_fb@x-tiled-16bpp-rotate-270.html
* igt@kms_big_fb@y-tiled-64bpp-rotate-270:
- shard-lnl: NOTRUN -> [SKIP][10] ([Intel XE#1124]) +2 other tests skip
[10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_big_fb@y-tiled-64bpp-rotate-270.html
* igt@kms_bw@connected-linear-tiling-2-displays-2160x1440p:
- shard-lnl: NOTRUN -> [SKIP][11] ([Intel XE#2191])
[11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_bw@connected-linear-tiling-2-displays-2160x1440p.html
* igt@kms_ccs@ccs-on-another-bo-4-tiled-mtl-rc-ccs:
- shard-lnl: NOTRUN -> [SKIP][12] ([Intel XE#2887]) +3 other tests skip
[12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_ccs@ccs-on-another-bo-4-tiled-mtl-rc-ccs.html
* igt@kms_ccs@crc-primary-rotation-180-4-tiled-lnl-ccs@pipe-a-dp-1:
- shard-bmg: NOTRUN -> [SKIP][13] ([Intel XE#2652]) +3 other tests skip
[13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-5/igt@kms_ccs@crc-primary-rotation-180-4-tiled-lnl-ccs@pipe-a-dp-1.html
* igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs:
- shard-bmg: [PASS][14] -> [INCOMPLETE][15] ([Intel XE#7084])
[14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-bmg-3/igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs.html
[15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-1/igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs.html
* igt@kms_chamelium_frames@hdmi-crc-fast:
- shard-lnl: NOTRUN -> [SKIP][16] ([Intel XE#373]) +1 other test skip
[16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_chamelium_frames@hdmi-crc-fast.html
* igt@kms_content_protection@atomic-dpms:
- shard-lnl: NOTRUN -> [SKIP][17] ([Intel XE#3278] / [Intel XE#6973])
[17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_content_protection@atomic-dpms.html
* igt@kms_content_protection@legacy-hdcp14@pipe-a-dp-2:
- shard-bmg: NOTRUN -> [FAIL][18] ([Intel XE#1178] / [Intel XE#3304])
[18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-1/igt@kms_content_protection@legacy-hdcp14@pipe-a-dp-2.html
* igt@kms_content_protection@lic-type-0@pipe-a-dp-1:
- shard-bmg: NOTRUN -> [FAIL][19] ([Intel XE#3304]) +2 other tests fail
[19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-5/igt@kms_content_protection@lic-type-0@pipe-a-dp-1.html
* igt@kms_cursor_crc@cursor-onscreen-128x42:
- shard-lnl: NOTRUN -> [SKIP][20] ([Intel XE#1424]) +1 other test skip
[20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_cursor_crc@cursor-onscreen-128x42.html
* igt@kms_cursor_crc@cursor-sliding-512x170:
- shard-lnl: NOTRUN -> [SKIP][21] ([Intel XE#2321])
[21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_cursor_crc@cursor-sliding-512x170.html
* igt@kms_cursor_legacy@cursorb-vs-flipa-atomic:
- shard-lnl: NOTRUN -> [SKIP][22] ([Intel XE#309])
[22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic.html
* igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-dirtyfb-tests:
- shard-lnl: NOTRUN -> [SKIP][23] ([Intel XE#4422])
[23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-dirtyfb-tests.html
* igt@kms_flip@2x-flip-vs-expired-vblank:
- shard-bmg: [PASS][24] -> [FAIL][25] ([Intel XE#3149] / [Intel XE#3321])
[24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-bmg-6/igt@kms_flip@2x-flip-vs-expired-vblank.html
[25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-3/igt@kms_flip@2x-flip-vs-expired-vblank.html
* igt@kms_flip@2x-flip-vs-expired-vblank@cd-dp2-hdmi-a3:
- shard-bmg: [PASS][26] -> [FAIL][27] ([Intel XE#3149])
[26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-bmg-6/igt@kms_flip@2x-flip-vs-expired-vblank@cd-dp2-hdmi-a3.html
[27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-3/igt@kms_flip@2x-flip-vs-expired-vblank@cd-dp2-hdmi-a3.html
* igt@kms_flip@2x-wf_vblank-ts-check-interruptible:
- shard-lnl: NOTRUN -> [SKIP][28] ([Intel XE#1421]) +1 other test skip
[28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_flip@2x-wf_vblank-ts-check-interruptible.html
* igt@kms_flip@flip-vs-suspend-interruptible:
- shard-bmg: [PASS][29] -> [INCOMPLETE][30] ([Intel XE#2049] / [Intel XE#2597]) +3 other tests incomplete
[29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-bmg-2/igt@kms_flip@flip-vs-suspend-interruptible.html
[30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-6/igt@kms_flip@flip-vs-suspend-interruptible.html
* igt@kms_frontbuffer_tracking@drrs-slowdraw:
- shard-lnl: NOTRUN -> [SKIP][31] ([Intel XE#6312] / [Intel XE#651]) +2 other tests skip
[31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_frontbuffer_tracking@drrs-slowdraw.html
* igt@kms_frontbuffer_tracking@fbc-2p-rte:
- shard-lnl: NOTRUN -> [SKIP][32] ([Intel XE#656]) +8 other tests skip
[32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_frontbuffer_tracking@fbc-2p-rte.html
* igt@kms_frontbuffer_tracking@psr-abgr161616f-draw-blt:
- shard-lnl: NOTRUN -> [SKIP][33] ([Intel XE#7061]) +1 other test skip
[33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_frontbuffer_tracking@psr-abgr161616f-draw-blt.html
* igt@kms_joiner@invalid-modeset-force-ultra-joiner:
- shard-lnl: NOTRUN -> [SKIP][34] ([Intel XE#6900])
[34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_joiner@invalid-modeset-force-ultra-joiner.html
* igt@kms_multipipe_modeset@basic-max-pipe-crc-check:
- shard-lnl: NOTRUN -> [SKIP][35] ([Intel XE#356])
[35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_multipipe_modeset@basic-max-pipe-crc-check.html
* igt@kms_plane@pixel-format-4-tiled-modifier@pipe-a-plane-5:
- shard-lnl: NOTRUN -> [SKIP][36] ([Intel XE#7130]) +1 other test skip
[36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_plane@pixel-format-4-tiled-modifier@pipe-a-plane-5.html
* igt@kms_plane_lowres@tiling-y:
- shard-lnl: NOTRUN -> [SKIP][37] ([Intel XE#599])
[37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_plane_lowres@tiling-y.html
* igt@kms_plane_scaling@planes-downscale-factor-0-75@pipe-a:
- shard-lnl: NOTRUN -> [SKIP][38] ([Intel XE#2763] / [Intel XE#6886]) +3 other tests skip
[38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_plane_scaling@planes-downscale-factor-0-75@pipe-a.html
* igt@kms_psr2_sf@fbc-pr-cursor-plane-update-sf:
- shard-lnl: NOTRUN -> [SKIP][39] ([Intel XE#2893])
[39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_psr2_sf@fbc-pr-cursor-plane-update-sf.html
* igt@kms_psr@pr-sprite-render:
- shard-lnl: NOTRUN -> [SKIP][40] ([Intel XE#1406]) +2 other tests skip
[40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_psr@pr-sprite-render.html
* igt@kms_rotation_crc@primary-y-tiled-reflect-x-90:
- shard-lnl: NOTRUN -> [SKIP][41] ([Intel XE#3414] / [Intel XE#3904])
[41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_rotation_crc@primary-y-tiled-reflect-x-90.html
* igt@kms_vrr@negative-basic:
- shard-lnl: NOTRUN -> [SKIP][42] ([Intel XE#1499])
[42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@kms_vrr@negative-basic.html
* igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1:
- shard-lnl: [PASS][43] -> [FAIL][44] ([Intel XE#2142]) +1 other test fail
[43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-lnl-2/igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1.html
[44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-5/igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1.html
* igt@xe_configfs@survivability-mode:
- shard-lnl: NOTRUN -> [SKIP][45] ([Intel XE#6010] / [Intel XE#7317])
[45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@xe_configfs@survivability-mode.html
* igt@xe_eudebug@read-metadata:
- shard-lnl: NOTRUN -> [SKIP][46] ([Intel XE#4837]) +3 other tests skip
[46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@xe_eudebug@read-metadata.html
* igt@xe_evict@evict-mixed-threads-large-multi-vm:
- shard-lnl: NOTRUN -> [SKIP][47] ([Intel XE#6540] / [Intel XE#688]) +2 other tests skip
[47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@xe_evict@evict-mixed-threads-large-multi-vm.html
* igt@xe_exec_balancer@twice-cm-parallel-rebind:
- shard-lnl: NOTRUN -> [SKIP][48] ([Intel XE#7482]) +4 other tests skip
[48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@xe_exec_balancer@twice-cm-parallel-rebind.html
* igt@xe_exec_basic@multigpu-many-execqueues-many-vm-bindexecqueue-userptr-invalidate-race:
- shard-lnl: NOTRUN -> [SKIP][49] ([Intel XE#1392]) +2 other tests skip
[49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-bindexecqueue-userptr-invalidate-race.html
* igt@xe_exec_fault_mode@twice-multi-queue-prefetch:
- shard-lnl: NOTRUN -> [SKIP][50] ([Intel XE#7136]) +2 other tests skip
[50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@xe_exec_fault_mode@twice-multi-queue-prefetch.html
* igt@xe_exec_multi_queue@one-queue-basic:
- shard-lnl: NOTRUN -> [SKIP][51] ([Intel XE#6874]) +6 other tests skip
[51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@xe_exec_multi_queue@one-queue-basic.html
* igt@xe_exec_threads@threads-multi-queue-fd-userptr-invalidate:
- shard-lnl: NOTRUN -> [SKIP][52] ([Intel XE#7138]) +2 other tests skip
[52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@xe_exec_threads@threads-multi-queue-fd-userptr-invalidate.html
* igt@xe_live_ktest@xe_bo@xe_bo_evict_kunit:
- shard-lnl: NOTRUN -> [SKIP][53] ([Intel XE#2229])
[53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@xe_live_ktest@xe_bo@xe_bo_evict_kunit.html
* igt@xe_multigpu_svm@mgpu-concurrent-access-basic:
- shard-lnl: NOTRUN -> [SKIP][54] ([Intel XE#6964])
[54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@xe_multigpu_svm@mgpu-concurrent-access-basic.html
* igt@xe_pm@d3cold-mmap-system:
- shard-lnl: NOTRUN -> [SKIP][55] ([Intel XE#2284] / [Intel XE#366]) +1 other test skip
[55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-2/igt@xe_pm@d3cold-mmap-system.html
#### Possible fixes ####
* igt@kms_async_flips@async-flip-with-page-flip-events-linear-atomic@pipe-b-edp-1:
- shard-lnl: [DMESG-FAIL][56] ([Intel XE#1727]) -> [PASS][57] +1 other test pass
[56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-lnl-4/igt@kms_async_flips@async-flip-with-page-flip-events-linear-atomic@pipe-b-edp-1.html
[57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-6/igt@kms_async_flips@async-flip-with-page-flip-events-linear-atomic@pipe-b-edp-1.html
* igt@kms_cursor_legacy@flip-vs-cursor-atomic:
- shard-bmg: [FAIL][58] ([Intel XE#7480]) -> [PASS][59]
[58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-bmg-4/igt@kms_cursor_legacy@flip-vs-cursor-atomic.html
[59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-2/igt@kms_cursor_legacy@flip-vs-cursor-atomic.html
* igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1:
- shard-lnl: [FAIL][60] ([Intel XE#301]) -> [PASS][61] +1 other test pass
[60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1.html
[61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1.html
* igt@kms_pm_dc@dc6-dpms:
- shard-lnl: [FAIL][62] ([Intel XE#7340]) -> [PASS][63]
[62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-lnl-5/igt@kms_pm_dc@dc6-dpms.html
[63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-5/igt@kms_pm_dc@dc6-dpms.html
* igt@xe_evict@evict-mixed-many-threads-small:
- shard-bmg: [INCOMPLETE][64] ([Intel XE#6321]) -> [PASS][65]
[64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-bmg-2/igt@xe_evict@evict-mixed-many-threads-small.html
[65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-6/igt@xe_evict@evict-mixed-many-threads-small.html
* igt@xe_pm_residency@aspm_link_residency:
- shard-bmg: [SKIP][66] ([Intel XE#7258]) -> [PASS][67]
[66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-bmg-5/igt@xe_pm_residency@aspm_link_residency.html
[67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-3/igt@xe_pm_residency@aspm_link_residency.html
#### Warnings ####
* igt@kms_hdr@brightness-with-hdr:
- shard-bmg: [SKIP][68] ([Intel XE#3544]) -> [SKIP][69] ([Intel XE#3374] / [Intel XE#3544])
[68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-bmg-6/igt@kms_hdr@brightness-with-hdr.html
[69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-1/igt@kms_hdr@brightness-with-hdr.html
* igt@kms_tiled_display@basic-test-pattern:
- shard-bmg: [SKIP][70] ([Intel XE#2426]) -> [FAIL][71] ([Intel XE#1729])
[70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-bmg-9/igt@kms_tiled_display@basic-test-pattern.html
[71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-bmg-4/igt@kms_tiled_display@basic-test-pattern.html
* igt@xe_exec_system_allocator@pat-index-madvise-pat-idx-uc-single-vma:
- shard-lnl: [FAIL][72] -> [FAIL][73] ([Intel XE#5625])
[72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1/shard-lnl-5/igt@xe_exec_system_allocator@pat-index-madvise-pat-idx-uc-single-vma.html
[73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/shard-lnl-5/igt@xe_exec_system_allocator@pat-index-madvise-pat-idx-uc-single-vma.html
[Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
[Intel XE#1178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1178
[Intel XE#1392]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1392
[Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
[Intel XE#1407]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1407
[Intel XE#1421]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1421
[Intel XE#1424]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1424
[Intel XE#1499]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1499
[Intel XE#1727]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1727
[Intel XE#1729]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1729
[Intel XE#2049]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2049
[Intel XE#2142]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2142
[Intel XE#2191]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2191
[Intel XE#2229]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2229
[Intel XE#2284]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2284
[Intel XE#2321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2321
[Intel XE#2426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426
[Intel XE#2597]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2597
[Intel XE#2652]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2652
[Intel XE#2763]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2763
[Intel XE#2887]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2887
[Intel XE#2893]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2893
[Intel XE#301]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/301
[Intel XE#309]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/309
[Intel XE#3149]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3149
[Intel XE#3278]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3278
[Intel XE#3304]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3304
[Intel XE#3321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3321
[Intel XE#3374]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3374
[Intel XE#3414]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3414
[Intel XE#3544]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3544
[Intel XE#356]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/356
[Intel XE#366]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/366
[Intel XE#373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/373
[Intel XE#3904]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3904
[Intel XE#4422]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4422
[Intel XE#4837]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4837
[Intel XE#5625]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5625
[Intel XE#599]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/599
[Intel XE#6010]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6010
[Intel XE#6312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6312
[Intel XE#6321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6321
[Intel XE#651]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/651
[Intel XE#6540]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6540
[Intel XE#656]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/656
[Intel XE#6874]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6874
[Intel XE#688]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/688
[Intel XE#6886]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6886
[Intel XE#6900]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6900
[Intel XE#6964]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6964
[Intel XE#6973]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6973
[Intel XE#7061]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7061
[Intel XE#7084]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7084
[Intel XE#7130]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7130
[Intel XE#7136]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7136
[Intel XE#7138]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7138
[Intel XE#7258]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7258
[Intel XE#7317]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7317
[Intel XE#7340]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7340
[Intel XE#7480]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7480
[Intel XE#7482]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7482
Build changes
-------------
* Linux: xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1 -> xe-pw-149888v3
IGT_8775: 8775
xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1: a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1
xe-pw-149888v3: 149888v3
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/index.html
[-- Attachment #2: Type: text/html, Size: 25313 bytes --]
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Xe.CI.FULL: failure for CPU binds and ULLS on migration queue (rev3)
2026-02-28 13:59 ` ✗ Xe.CI.FULL: failure " Patchwork
@ 2026-03-02 17:54 ` Summers, Stuart
2026-03-02 18:13 ` Matthew Brost
0 siblings, 1 reply; 63+ messages in thread
From: Summers, Stuart @ 2026-03-02 17:54 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
On Sat, 2026-02-28 at 13:59 +0000, Patchwork wrote:
> Patch Details
> Series: CPU binds and ULLS on migration queue (rev3) URL:
> https://patchwork.freedesktop.org/series/149888/ State: failure
> Details:
> https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/index.html
> CI Bug Log - changes from xe-4635-
> a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1_FULL -> xe-pw-
> 149888v3_FULLSummaryFAILURE
> Serious unknown changes coming with xe-pw-149888v3_FULL absolutely
> need to be
> verified manually.
> If you think the reported changes have nothing to do with the changes
> introduced in xe-pw-149888v3_FULL, please notify your bug team
> (I915-ci-infra@lists.freedesktop.org) to allow them
> to document this new failure mode, which will reduce false positives
> in CI.
> Participating hosts (2 -> 2)No changes in participating hosts
> Possible new issuesHere are the unknown changes that may have been
> introduced in xe-pw-149888v3_FULL:
> IGT changesPossible regressions *
> igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs@pipe-d-dp-2:shard-
> bmg: PASS -> INCOMPLETE
> * igt@xe_module_load@many-reload:shard-bmg: PASS -> ABORT +1 other
Haven't looked through all the patches yet, but wondering if this might
be related here...
<4> [73.639432] Oops: general protection fault, probably for non-
canonical address 0x6b6b6b6b6b6b6b9b: 0000 [#1] SMP NOPTI
<4> [73.639448] CPU: 11 UID: 0 PID: 2750 Comm: xe_module_load Tainted:
G S U 7.0.0-rc1-lgci-xe-xe-pw-149888v3-debug+ #1
PREEMPT(lazy)
<4> [73.639459] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
<4> [73.639462] Hardware name: ASUS System Product Name/PRIME Z790-P
WIFI, BIOS 1645 03/15/2024
<4> [73.639468] RIP: 0010:kernfs_root+0x3e/0x1b0
<4> [73.639476] Code: 31 c9 45 31 c0 48 8d 05 00 00 00 00 50 b9 02 00
00 00 31 f6 48 c7 c7 e0 5b 5c 83 e8 6c c2 ac ff e8 37 87 f6 00 5a 85 c0
75 62 <49> 8b 5c 24 30 e8 28 87 f6 00 85 c0 0f 85 f9 00 00 00 48 85 db
49
<4> [73.639485] RSP: 0018:ffffc900045cb7e8 EFLAGS: 00010202
<4> [73.639491] RAX: 0000000000000001 RBX: 6b6b6b6b6b6b6b6b RCX:
0000000000000000
<4> [73.639496] RDX: ffffffff819c2880 RSI: 0000000000000000 RDI:
0000000000000000
<4> [73.639500] RBP: ffffc900045cb800 R08: 0000000000000000 R09:
0000000000000000
<4> [73.639505] R10: 0000000000000000 R11: 0000000000000000 R12:
6b6b6b6b6b6b6b6b
<4> [73.639509] R13: 0000000000000000 R14: ffffffffa12018cf R15:
ffff88812ed6ee40
<4> [73.639513] FS: 00007360f3e40940(0000) GS:ffff8888db21b000(0000)
knlGS:0000000000000000
<4> [73.639520] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [73.639524] CR2: 00005bc7ecde6708 CR3: 0000000120b07002 CR4:
0000000000f72ef0
<4> [73.639530] PKRU: 55555554
<4> [73.639533] Call Trace:
<4> [73.639536] <TASK>
<4> [73.639540] kernfs_remove_by_name_ns+0x27/0xc0
<4> [73.639547] sysfs_remove_link+0x19/0x50
<4> [73.639555] action_rm_device_link+0x15/0x20 [xe]
<4> [73.639818] devm_action_release+0x15/0x30
<4> [73.639825] release_nodes+0x44/0x160
<4> [73.639830] ? trace_hardirqs_on+0x22/0x100
<4> [73.639839] devres_release_all+0x96/0xd0
<4> [73.639847] device_unbind_cleanup+0x12/0xb0
<4> [73.639855] device_release_driver_internal+0x23a/0x280
<4> [73.639860] ? bus_find_device+0xa5/0xe0
<4> [73.639867] device_driver_detach+0x14/0x20
<4> [73.639872] unbind_store+0xac/0xc0
<4> [73.639879] drv_attr_store+0x24/0x50
<4> [73.639884] sysfs_kf_write+0x4d/0x80
<4> [73.639892] kernfs_fop_write_iter+0x188/0x240
<4> [73.639899] vfs_write+0x283/0x540
<4> [73.639910] ksys_write+0x6f/0xf0
<4> [73.639917] __x64_sys_write+0x19/0x30
<4> [73.639923] x64_sys_call+0x259/0x26e0
<4> [73.639930] do_syscall_64+0xdd/0x1470
<4> [73.639937] ? check_bytes_and_report+0x59/0x150
<4> [73.639944] ? find_held_lock+0x31/0x90
<4> [73.639950] ? free_to_partial_list+0x46d/0x640
<4> [73.639956] ? lock_release+0xd0/0x2b0
<4> [73.639962] ? _raw_spin_unlock_irqrestore+0x51/0x80
<4> [73.639970] ? free_to_partial_list+0x46d/0x640
<4> [73.639975] ? trace_hardirqs_on+0x22/0x100
<4> [73.639981] ? _raw_spin_unlock_irqrestore+0x51/0x80
<4> [73.639987] ? free_to_partial_list+0x46d/0x640
<4> [73.639992] ? putname+0x41/0x90
<4> [73.640000] ? __slab_free+0x129/0x2b0
<4> [73.640005] ? __pcs_replace_full_main+0x29a/0x660
<4> [73.640013] ? putname+0x41/0x90
<4> [73.640019] ? kmem_cache_free+0x165/0x510
<4> [73.640025] ? putname+0x41/0x90
<4> [73.640031] ? do_sys_openat2+0x85/0xd0
<4> [73.640038] ? __x64_sys_openat+0x54/0xa0
<4> [73.640043] ? trace_hardirqs_on_prepare+0xe1/0x100
<4> [73.640050] ? do_syscall_64+0x22e/0x1470
<4> [73.640055] ? trace_hardirqs_on_prepare+0xe1/0x100
<4> [73.640061] ? do_syscall_64+0x22e/0x1470
<4> [73.640067] ? do_syscall_64+0x22e/0x1470
<4> [73.640072] ? trace_hardirqs_on_prepare+0xe1/0x100
<4> [73.640079] ? do_syscall_64+0x22e/0x1470
<4> [73.640084] ? trace_hardirqs_on_prepare+0xe1/0x100
<4> [73.640090] ? do_syscall_64+0x22e/0x1470
<4> [73.640096] entry_SYSCALL_64_after_hwframe+0x76/0x7e
<4> [73.640102] RIP: 0033:0x7360f611c5a4
<4> [73.640107] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f
84 00 00 00 00 00 f3 0f 1e fa 80 3d a5 ea 0e 00 00 74 13 b8 01 00 00 00
0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48
89
<4> [73.640117] RSP: 002b:00007ffc58eed568 EFLAGS: 00000202 ORIG_RAX:
0000000000000001
<4> [73.640124] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
00007360f611c5a4
<4> [73.640129] RDX: 000000000000000c RSI: 0000561d73ef2e83 RDI:
0000000000000006
<4> [73.640134] RBP: 000000000000000c R08: 00007360f6203b20 R09:
0000000000000000
<4> [73.640138] R10: 0000000000000000 R11: 0000000000000202 R12:
0000561d73ef2e83
<4> [73.640143] R13: 0000000000000006 R14: 0000561d6256fcf8 R15:
00007360f6485000
<4> [73.640151] </TASK>
> test abort
> * igt@xe_vm@bind-array-enobufs:shard-lnl: PASS -> FAILshard-bmg:
> PASS -> FAIL
> Known issuesHere are the changes found in xe-pw-149888v3_FULL that
> come from known issues:
> IGT changesIssues hit *
> igt@kms_big_fb@x-tiled-16bpp-rotate-270:shard-lnl: NOTRUN -> SKIP
> (Intel XE#1407)
> * igt@kms_big_fb@y-tiled-64bpp-rotate-270:shard-lnl: NOTRUN -> SKIP
> (Intel XE#1124) +2 other tests skip
> * igt@kms_bw@connected-linear-tiling-2-displays-2160x1440p:shard-
> lnl: NOTRUN -> SKIP (Intel XE#2191)
> * igt@kms_ccs@ccs-on-another-bo-4-tiled-mtl-rc-ccs:shard-lnl: NOTRUN
> -> SKIP (Intel XE#2887) +3 other tests skip
> * igt@kms_ccs@crc-primary-rotation-180-4-tiled-lnl-ccs@pipe-a-dp-
> 1:shard-bmg: NOTRUN -> SKIP (Intel XE#2652) +3 other tests skip
> * igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs:shard-bmg: PASS ->
> INCOMPLETE (Intel XE#7084)
> * igt@kms_chamelium_frames@hdmi-crc-fast:shard-lnl: NOTRUN -> SKIP
> (Intel XE#373) +1 other test skip
> * igt@kms_content_protection@atomic-dpms:shard-lnl: NOTRUN -> SKIP
> (Intel XE#3278 / Intel XE#6973)
> * igt@kms_content_protection@legacy-hdcp14@pipe-a-dp-2:shard-bmg:
> NOTRUN -> FAIL (Intel XE#1178 / Intel XE#3304)
> * igt@kms_content_protection@lic-type-0@pipe-a-dp-1:shard-bmg:
> NOTRUN -> FAIL (Intel XE#3304) +2 other tests fail
> * igt@kms_cursor_crc@cursor-onscreen-128x42:shard-lnl: NOTRUN ->
> SKIP (Intel XE#1424) +1 other test skip
> * igt@kms_cursor_crc@cursor-sliding-512x170:shard-lnl: NOTRUN ->
> SKIP (Intel XE#2321)
> * igt@kms_cursor_legacy@cursorb-vs-flipa-atomic:shard-lnl: NOTRUN ->
> SKIP (Intel XE#309)
> * igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-dirtyfb-tests:shard-
> lnl: NOTRUN -> SKIP (Intel XE#4422)
> * igt@kms_flip@2x-flip-vs-expired-vblank:shard-bmg: PASS -> FAIL
> (Intel XE#3149 / Intel XE#3321)
> * igt@kms_flip@2x-flip-vs-expired-vblank@cd-dp2-hdmi-a3:shard-bmg:
> PASS -> FAIL (Intel XE#3149)
> * igt@kms_flip@2x-wf_vblank-ts-check-interruptible:shard-lnl: NOTRUN
> -> SKIP (Intel XE#1421) +1 other test skip
> * igt@kms_flip@flip-vs-suspend-interruptible:shard-bmg: PASS ->
> INCOMPLETE (Intel XE#2049 / Intel XE#2597) +3 other tests incomplete
> * igt@kms_frontbuffer_tracking@drrs-slowdraw:shard-lnl: NOTRUN ->
> SKIP (Intel XE#6312 / Intel XE#651) +2 other tests skip
> * igt@kms_frontbuffer_tracking@fbc-2p-rte:shard-lnl: NOTRUN -> SKIP
> (Intel XE#656) +8 other tests skip
> * igt@kms_frontbuffer_tracking@psr-abgr161616f-draw-blt:shard-lnl:
> NOTRUN -> SKIP (Intel XE#7061) +1 other test skip
> * igt@kms_joiner@invalid-modeset-force-ultra-joiner:shard-lnl:
> NOTRUN -> SKIP (Intel XE#6900)
> * igt@kms_multipipe_modeset@basic-max-pipe-crc-check:shard-lnl:
> NOTRUN -> SKIP (Intel XE#356)
> * igt@kms_plane@pixel-format-4-tiled-modifier@pipe-a-plane-5:shard-
> lnl: NOTRUN -> SKIP (Intel XE#7130) +1 other test skip
> * igt@kms_plane_lowres@tiling-y:shard-lnl: NOTRUN -> SKIP (Intel
> XE#599)
> * igt@kms_plane_scaling@planes-downscale-factor-0-75@pipe-a:shard-
> lnl: NOTRUN -> SKIP (Intel XE#2763 / Intel XE#6886) +3 other tests
> skip
> * igt@kms_psr2_sf@fbc-pr-cursor-plane-update-sf:shard-lnl: NOTRUN ->
> SKIP (Intel XE#2893)
> * igt@kms_psr@pr-sprite-render:shard-lnl: NOTRUN -> SKIP (Intel
> XE#1406) +2 other tests skip
> * igt@kms_rotation_crc@primary-y-tiled-reflect-x-90:shard-lnl:
> NOTRUN -> SKIP (Intel XE#3414 / Intel XE#3904)
> * igt@kms_vrr@negative-basic:shard-lnl: NOTRUN -> SKIP (Intel
> XE#1499)
> * igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1:shard-lnl:
> PASS -> FAIL (Intel XE#2142) +1 other test fail
> * igt@xe_configfs@survivability-mode:shard-lnl: NOTRUN -> SKIP
> (Intel XE#6010 / Intel XE#7317)
> * igt@xe_eudebug@read-metadata:shard-lnl: NOTRUN -> SKIP (Intel
> XE#4837) +3 other tests skip
> * igt@xe_evict@evict-mixed-threads-large-multi-vm:shard-lnl: NOTRUN
> -> SKIP (Intel XE#6540 / Intel XE#688) +2 other tests skip
> * igt@xe_exec_balancer@twice-cm-parallel-rebind:shard-lnl: NOTRUN ->
> SKIP (Intel XE#7482) +4 other tests skip
> *
> igt@xe_exec_basic@multigpu-many-execqueues-many-vm-bindexecqueue-userptr-invalida
> te-race:shard-lnl: NOTRUN -> SKIP (Intel XE#1392) +2 other tests skip
> * igt@xe_exec_fault_mode@twice-multi-queue-prefetch:shard-lnl:
> NOTRUN -> SKIP (Intel XE#7136) +2 other tests skip
> * igt@xe_exec_multi_queue@one-queue-basic:shard-lnl: NOTRUN -> SKIP
> (Intel XE#6874) +6 other tests skip
> *
> igt@xe_exec_threads@threads-multi-queue-fd-userptr-invalidate:shard-
> lnl: NOTRUN -> SKIP (Intel XE#7138) +2 other tests skip
> * igt@xe_live_ktest@xe_bo@xe_bo_evict_kunit:shard-lnl: NOTRUN ->
> SKIP (Intel XE#2229)
> * igt@xe_multigpu_svm@mgpu-concurrent-access-basic:shard-lnl: NOTRUN
> -> SKIP (Intel XE#6964)
> * igt@xe_pm@d3cold-mmap-system:shard-lnl: NOTRUN -> SKIP (Intel
> XE#2284 / Intel XE#366) +1 other test skip
> Possible fixes *
> igt@kms_async_flips@async-flip-with-page-flip-events-linear-atomic@pi
> pe-b-edp-1:shard-lnl: DMESG-FAIL (Intel XE#1727) -> PASS +1 other
> test pass
> * igt@kms_cursor_legacy@flip-vs-cursor-atomic:shard-bmg: FAIL (Intel
> XE#7480) -> PASS
> * igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1:shard-
> lnl: FAIL (Intel XE#301) -> PASS +1 other test pass
> * igt@kms_pm_dc@dc6-dpms:shard-lnl: FAIL (Intel XE#7340) -> PASS
> * igt@xe_evict@evict-mixed-many-threads-small:shard-bmg: INCOMPLETE
> (Intel XE#6321) -> PASS
> * igt@xe_pm_residency@aspm_link_residency:shard-bmg: SKIP (Intel
> XE#7258) -> PASS
> Warnings * igt@kms_hdr@brightness-with-hdr:shard-bmg: SKIP (Intel
> XE#3544) -> SKIP (Intel XE#3374 / Intel XE#3544)
> * igt@kms_tiled_display@basic-test-pattern:shard-bmg: SKIP (Intel
> XE#2426) -> FAIL (Intel XE#1729)
> *
> igt@xe_exec_system_allocator@pat-index-madvise-pat-idx-uc-single-vma:
> shard-lnl: FAIL -> FAIL (Intel XE#5625)
> Build changes * Linux: xe-4635-
> a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1 -> xe-pw-149888v3
> IGT_8775: 8775
> xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1:
> a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1
> xe-pw-149888v3: 149888v3
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Xe.CI.FULL: failure for CPU binds and ULLS on migration queue (rev3)
2026-03-02 17:54 ` Summers, Stuart
@ 2026-03-02 18:13 ` Matthew Brost
0 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-03-02 18:13 UTC (permalink / raw)
To: Summers, Stuart; +Cc: intel-xe@lists.freedesktop.org
On Mon, Mar 02, 2026 at 10:54:56AM -0700, Summers, Stuart wrote:
> On Sat, 2026-02-28 at 13:59 +0000, Patchwork wrote:
> > Patch Details
> > Series: CPU binds and ULLS on migration queue (rev3) URL:
> > https://patchwork.freedesktop.org/series/149888/ State: failure
> > Details:
> > https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-149888v3/index.html
> > CI Bug Log - changes from xe-4635-
> > a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1_FULL -> xe-pw-
> > 149888v3_FULLSummaryFAILURE
> > Serious unknown changes coming with xe-pw-149888v3_FULL absolutely
> > need to be
> > verified manually.
> > If you think the reported changes have nothing to do with the changes
> > introduced in xe-pw-149888v3_FULL, please notify your bug team
> > (I915-ci-infra@lists.freedesktop.org) to allow them
> > to document this new failure mode, which will reduce false positives
> > in CI.
> > Participating hosts (2 -> 2)No changes in participating hosts
> > Possible new issuesHere are the unknown changes that may have been
> > introduced in xe-pw-149888v3_FULL:
> > IGT changesPossible regressions *
> > igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs@pipe-d-dp-2:shard-
> > bmg: PASS -> INCOMPLETE
> > * igt@xe_module_load@many-reload:shard-bmg: PASS -> ABORT +1 other
>
> Haven't looked through all the patches yet, but wondering if this might
> be related here...
>
I don't see anything obvious in the series which relates to this and I can't
recreate this locally. I'm putting this one in early rc cycle bugs for
now but will keep an eye on it if it pops up in future revs.
Matt
> <4> [73.639432] Oops: general protection fault, probably for non-
> canonical address 0x6b6b6b6b6b6b6b9b: 0000 [#1] SMP NOPTI
> <4> [73.639448] CPU: 11 UID: 0 PID: 2750 Comm: xe_module_load Tainted:
> G S U 7.0.0-rc1-lgci-xe-xe-pw-149888v3-debug+ #1
> PREEMPT(lazy)
> <4> [73.639459] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
> <4> [73.639462] Hardware name: ASUS System Product Name/PRIME Z790-P
> WIFI, BIOS 1645 03/15/2024
> <4> [73.639468] RIP: 0010:kernfs_root+0x3e/0x1b0
> <4> [73.639476] Code: 31 c9 45 31 c0 48 8d 05 00 00 00 00 50 b9 02 00
> 00 00 31 f6 48 c7 c7 e0 5b 5c 83 e8 6c c2 ac ff e8 37 87 f6 00 5a 85 c0
> 75 62 <49> 8b 5c 24 30 e8 28 87 f6 00 85 c0 0f 85 f9 00 00 00 48 85 db
> 49
> <4> [73.639485] RSP: 0018:ffffc900045cb7e8 EFLAGS: 00010202
> <4> [73.639491] RAX: 0000000000000001 RBX: 6b6b6b6b6b6b6b6b RCX:
> 0000000000000000
> <4> [73.639496] RDX: ffffffff819c2880 RSI: 0000000000000000 RDI:
> 0000000000000000
> <4> [73.639500] RBP: ffffc900045cb800 R08: 0000000000000000 R09:
> 0000000000000000
> <4> [73.639505] R10: 0000000000000000 R11: 0000000000000000 R12:
> 6b6b6b6b6b6b6b6b
> <4> [73.639509] R13: 0000000000000000 R14: ffffffffa12018cf R15:
> ffff88812ed6ee40
> <4> [73.639513] FS: 00007360f3e40940(0000) GS:ffff8888db21b000(0000)
> knlGS:0000000000000000
> <4> [73.639520] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [73.639524] CR2: 00005bc7ecde6708 CR3: 0000000120b07002 CR4:
> 0000000000f72ef0
> <4> [73.639530] PKRU: 55555554
> <4> [73.639533] Call Trace:
> <4> [73.639536] <TASK>
> <4> [73.639540] kernfs_remove_by_name_ns+0x27/0xc0
> <4> [73.639547] sysfs_remove_link+0x19/0x50
> <4> [73.639555] action_rm_device_link+0x15/0x20 [xe]
> <4> [73.639818] devm_action_release+0x15/0x30
> <4> [73.639825] release_nodes+0x44/0x160
> <4> [73.639830] ? trace_hardirqs_on+0x22/0x100
> <4> [73.639839] devres_release_all+0x96/0xd0
> <4> [73.639847] device_unbind_cleanup+0x12/0xb0
> <4> [73.639855] device_release_driver_internal+0x23a/0x280
> <4> [73.639860] ? bus_find_device+0xa5/0xe0
> <4> [73.639867] device_driver_detach+0x14/0x20
> <4> [73.639872] unbind_store+0xac/0xc0
> <4> [73.639879] drv_attr_store+0x24/0x50
> <4> [73.639884] sysfs_kf_write+0x4d/0x80
> <4> [73.639892] kernfs_fop_write_iter+0x188/0x240
> <4> [73.639899] vfs_write+0x283/0x540
> <4> [73.639910] ksys_write+0x6f/0xf0
> <4> [73.639917] __x64_sys_write+0x19/0x30
> <4> [73.639923] x64_sys_call+0x259/0x26e0
> <4> [73.639930] do_syscall_64+0xdd/0x1470
> <4> [73.639937] ? check_bytes_and_report+0x59/0x150
> <4> [73.639944] ? find_held_lock+0x31/0x90
> <4> [73.639950] ? free_to_partial_list+0x46d/0x640
> <4> [73.639956] ? lock_release+0xd0/0x2b0
> <4> [73.639962] ? _raw_spin_unlock_irqrestore+0x51/0x80
> <4> [73.639970] ? free_to_partial_list+0x46d/0x640
> <4> [73.639975] ? trace_hardirqs_on+0x22/0x100
> <4> [73.639981] ? _raw_spin_unlock_irqrestore+0x51/0x80
> <4> [73.639987] ? free_to_partial_list+0x46d/0x640
> <4> [73.639992] ? putname+0x41/0x90
> <4> [73.640000] ? __slab_free+0x129/0x2b0
> <4> [73.640005] ? __pcs_replace_full_main+0x29a/0x660
> <4> [73.640013] ? putname+0x41/0x90
> <4> [73.640019] ? kmem_cache_free+0x165/0x510
> <4> [73.640025] ? putname+0x41/0x90
> <4> [73.640031] ? do_sys_openat2+0x85/0xd0
> <4> [73.640038] ? __x64_sys_openat+0x54/0xa0
> <4> [73.640043] ? trace_hardirqs_on_prepare+0xe1/0x100
> <4> [73.640050] ? do_syscall_64+0x22e/0x1470
> <4> [73.640055] ? trace_hardirqs_on_prepare+0xe1/0x100
> <4> [73.640061] ? do_syscall_64+0x22e/0x1470
> <4> [73.640067] ? do_syscall_64+0x22e/0x1470
> <4> [73.640072] ? trace_hardirqs_on_prepare+0xe1/0x100
> <4> [73.640079] ? do_syscall_64+0x22e/0x1470
> <4> [73.640084] ? trace_hardirqs_on_prepare+0xe1/0x100
> <4> [73.640090] ? do_syscall_64+0x22e/0x1470
> <4> [73.640096] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> <4> [73.640102] RIP: 0033:0x7360f611c5a4
> <4> [73.640107] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f
> 84 00 00 00 00 00 f3 0f 1e fa 80 3d a5 ea 0e 00 00 74 13 b8 01 00 00 00
> 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48
> 89
> <4> [73.640117] RSP: 002b:00007ffc58eed568 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000001
> <4> [73.640124] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007360f611c5a4
> <4> [73.640129] RDX: 000000000000000c RSI: 0000561d73ef2e83 RDI:
> 0000000000000006
> <4> [73.640134] RBP: 000000000000000c R08: 00007360f6203b20 R09:
> 0000000000000000
> <4> [73.640138] R10: 0000000000000000 R11: 0000000000000202 R12:
> 0000561d73ef2e83
> <4> [73.640143] R13: 0000000000000006 R14: 0000561d6256fcf8 R15:
> 00007360f6485000
> <4> [73.640151] </TASK>
>
> > test abort
> > * igt@xe_vm@bind-array-enobufs:shard-lnl: PASS -> FAILshard-bmg:
> > PASS -> FAIL
> > Known issuesHere are the changes found in xe-pw-149888v3_FULL that
> > come from known issues:
> > IGT changesIssues hit *
> > igt@kms_big_fb@x-tiled-16bpp-rotate-270:shard-lnl: NOTRUN -> SKIP
> > (Intel XE#1407)
> > * igt@kms_big_fb@y-tiled-64bpp-rotate-270:shard-lnl: NOTRUN -> SKIP
> > (Intel XE#1124) +2 other tests skip
> > * igt@kms_bw@connected-linear-tiling-2-displays-2160x1440p:shard-
> > lnl: NOTRUN -> SKIP (Intel XE#2191)
> > * igt@kms_ccs@ccs-on-another-bo-4-tiled-mtl-rc-ccs:shard-lnl: NOTRUN
> > -> SKIP (Intel XE#2887) +3 other tests skip
> > * igt@kms_ccs@crc-primary-rotation-180-4-tiled-lnl-ccs@pipe-a-dp-
> > 1:shard-bmg: NOTRUN -> SKIP (Intel XE#2652) +3 other tests skip
> > * igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs:shard-bmg: PASS ->
> > INCOMPLETE (Intel XE#7084)
> > * igt@kms_chamelium_frames@hdmi-crc-fast:shard-lnl: NOTRUN -> SKIP
> > (Intel XE#373) +1 other test skip
> > * igt@kms_content_protection@atomic-dpms:shard-lnl: NOTRUN -> SKIP
> > (Intel XE#3278 / Intel XE#6973)
> > * igt@kms_content_protection@legacy-hdcp14@pipe-a-dp-2:shard-bmg:
> > NOTRUN -> FAIL (Intel XE#1178 / Intel XE#3304)
> > * igt@kms_content_protection@lic-type-0@pipe-a-dp-1:shard-bmg:
> > NOTRUN -> FAIL (Intel XE#3304) +2 other tests fail
> > * igt@kms_cursor_crc@cursor-onscreen-128x42:shard-lnl: NOTRUN ->
> > SKIP (Intel XE#1424) +1 other test skip
> > * igt@kms_cursor_crc@cursor-sliding-512x170:shard-lnl: NOTRUN ->
> > SKIP (Intel XE#2321)
> > * igt@kms_cursor_legacy@cursorb-vs-flipa-atomic:shard-lnl: NOTRUN ->
> > SKIP (Intel XE#309)
> > * igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-dirtyfb-tests:shard-
> > lnl: NOTRUN -> SKIP (Intel XE#4422)
> > * igt@kms_flip@2x-flip-vs-expired-vblank:shard-bmg: PASS -> FAIL
> > (Intel XE#3149 / Intel XE#3321)
> > * igt@kms_flip@2x-flip-vs-expired-vblank@cd-dp2-hdmi-a3:shard-bmg:
> > PASS -> FAIL (Intel XE#3149)
> > * igt@kms_flip@2x-wf_vblank-ts-check-interruptible:shard-lnl: NOTRUN
> > -> SKIP (Intel XE#1421) +1 other test skip
> > * igt@kms_flip@flip-vs-suspend-interruptible:shard-bmg: PASS ->
> > INCOMPLETE (Intel XE#2049 / Intel XE#2597) +3 other tests incomplete
> > * igt@kms_frontbuffer_tracking@drrs-slowdraw:shard-lnl: NOTRUN ->
> > SKIP (Intel XE#6312 / Intel XE#651) +2 other tests skip
> > * igt@kms_frontbuffer_tracking@fbc-2p-rte:shard-lnl: NOTRUN -> SKIP
> > (Intel XE#656) +8 other tests skip
> > * igt@kms_frontbuffer_tracking@psr-abgr161616f-draw-blt:shard-lnl:
> > NOTRUN -> SKIP (Intel XE#7061) +1 other test skip
> > * igt@kms_joiner@invalid-modeset-force-ultra-joiner:shard-lnl:
> > NOTRUN -> SKIP (Intel XE#6900)
> > * igt@kms_multipipe_modeset@basic-max-pipe-crc-check:shard-lnl:
> > NOTRUN -> SKIP (Intel XE#356)
> > * igt@kms_plane@pixel-format-4-tiled-modifier@pipe-a-plane-5:shard-
> > lnl: NOTRUN -> SKIP (Intel XE#7130) +1 other test skip
> > * igt@kms_plane_lowres@tiling-y:shard-lnl: NOTRUN -> SKIP (Intel
> > XE#599)
> > * igt@kms_plane_scaling@planes-downscale-factor-0-75@pipe-a:shard-
> > lnl: NOTRUN -> SKIP (Intel XE#2763 / Intel XE#6886) +3 other tests
> > skip
> > * igt@kms_psr2_sf@fbc-pr-cursor-plane-update-sf:shard-lnl: NOTRUN ->
> > SKIP (Intel XE#2893)
> > * igt@kms_psr@pr-sprite-render:shard-lnl: NOTRUN -> SKIP (Intel
> > XE#1406) +2 other tests skip
> > * igt@kms_rotation_crc@primary-y-tiled-reflect-x-90:shard-lnl:
> > NOTRUN -> SKIP (Intel XE#3414 / Intel XE#3904)
> > * igt@kms_vrr@negative-basic:shard-lnl: NOTRUN -> SKIP (Intel
> > XE#1499)
> > * igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1:shard-lnl:
> > PASS -> FAIL (Intel XE#2142) +1 other test fail
> > * igt@xe_configfs@survivability-mode:shard-lnl: NOTRUN -> SKIP
> > (Intel XE#6010 / Intel XE#7317)
> > * igt@xe_eudebug@read-metadata:shard-lnl: NOTRUN -> SKIP (Intel
> > XE#4837) +3 other tests skip
> > * igt@xe_evict@evict-mixed-threads-large-multi-vm:shard-lnl: NOTRUN
> > -> SKIP (Intel XE#6540 / Intel XE#688) +2 other tests skip
> > * igt@xe_exec_balancer@twice-cm-parallel-rebind:shard-lnl: NOTRUN ->
> > SKIP (Intel XE#7482) +4 other tests skip
> > *
> > igt@xe_exec_basic@multigpu-many-execqueues-many-vm-bindexecqueue-userptr-invalida
> > te-race:shard-lnl: NOTRUN -> SKIP (Intel XE#1392) +2 other tests skip
> > * igt@xe_exec_fault_mode@twice-multi-queue-prefetch:shard-lnl:
> > NOTRUN -> SKIP (Intel XE#7136) +2 other tests skip
> > * igt@xe_exec_multi_queue@one-queue-basic:shard-lnl: NOTRUN -> SKIP
> > (Intel XE#6874) +6 other tests skip
> > *
> > igt@xe_exec_threads@threads-multi-queue-fd-userptr-invalidate:shard-
> > lnl: NOTRUN -> SKIP (Intel XE#7138) +2 other tests skip
> > * igt@xe_live_ktest@xe_bo@xe_bo_evict_kunit:shard-lnl: NOTRUN ->
> > SKIP (Intel XE#2229)
> > * igt@xe_multigpu_svm@mgpu-concurrent-access-basic:shard-lnl: NOTRUN
> > -> SKIP (Intel XE#6964)
> > * igt@xe_pm@d3cold-mmap-system:shard-lnl: NOTRUN -> SKIP (Intel
> > XE#2284 / Intel XE#366) +1 other test skip
> > Possible fixes *
> > igt@kms_async_flips@async-flip-with-page-flip-events-linear-atomic@pi
> > pe-b-edp-1:shard-lnl: DMESG-FAIL (Intel XE#1727) -> PASS +1 other
> > test pass
> > * igt@kms_cursor_legacy@flip-vs-cursor-atomic:shard-bmg: FAIL (Intel
> > XE#7480) -> PASS
> > * igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1:shard-
> > lnl: FAIL (Intel XE#301) -> PASS +1 other test pass
> > * igt@kms_pm_dc@dc6-dpms:shard-lnl: FAIL (Intel XE#7340) -> PASS
> > * igt@xe_evict@evict-mixed-many-threads-small:shard-bmg: INCOMPLETE
> > (Intel XE#6321) -> PASS
> > * igt@xe_pm_residency@aspm_link_residency:shard-bmg: SKIP (Intel
> > XE#7258) -> PASS
> > Warnings * igt@kms_hdr@brightness-with-hdr:shard-bmg: SKIP (Intel
> > XE#3544) -> SKIP (Intel XE#3374 / Intel XE#3544)
> > * igt@kms_tiled_display@basic-test-pattern:shard-bmg: SKIP (Intel
> > XE#2426) -> FAIL (Intel XE#1729)
> > *
> > igt@xe_exec_system_allocator@pat-index-madvise-pat-idx-uc-single-vma:
> > shard-lnl: FAIL -> FAIL (Intel XE#5625)
> > Build changes * Linux: xe-4635-
> > a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1 -> xe-pw-149888v3
> > IGT_8775: 8775
> > xe-4635-a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1:
> > a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1
> > xe-pw-149888v3: 149888v3
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v3 00/25] CPU binds and ULLS on migration queue
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (28 preceding siblings ...)
2026-02-28 13:59 ` ✗ Xe.CI.FULL: failure " Patchwork
@ 2026-03-05 22:56 ` Summers, Stuart
2026-03-10 22:17 ` Matthew Brost
2026-03-20 15:31 ` Thomas Hellström
30 siblings, 1 reply; 63+ messages in thread
From: Summers, Stuart @ 2026-03-05 22:56 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: Ghimiray, Himal Prasad, Yadav, Arvind,
thomas.hellstrom@linux.intel.com, Dugast, Francois
One question I have reading through some of the ULLS patches, why do we
need to do this in the kernel? This adds quite a bit of complexity here
that IMO might be a better fit for userspace migration, particularly
for the userspace that already handles ULLS like L0. What is the
benefit of doing this in the kernel vs adding a new API to allow some
kind of opt in for migration in the UMD?
The CPU binding generally even outside of the ULLS piece makes sense to
me, so I don't think this is blocking particularly. But it would be
nice to have a little more detail on the above before we move forward
here.
Thanks,
Stuart
On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> We now have data demonstrating the need for CPU binds and ULLS on the
> migration queue, based on results generated from [1].
>
> On BMG, measurements show that when the GPU is continuously
> processing
> faults, copy jobs run approximately 30–40µs faster (depending on the
> test case) with ULLS compared to traditional GuC submission with SLPC
> enabled on the migration queue. Startup from a cold GPU shows an even
> larger speedup. Given the critical nature of fault performance, ULLS
> appears to be a worthwhile feature.
>
> In addition to driver telemetry, UMD compute benchmarks consistently
> show over 1GB/s improvement in pagefault benchmarks with ULLS
> enabled.
>
> ULLS will consume more power (not yet measured) due to a continuously
> running batch on the paging engine. However, compute UMDs already do
> this on engines exposed to users, so this seems like a worthwhile
> tradeoff. To mitigate power concerns, ULLS will exit after a period
> of
> time in which no faults have been processed.
>
> CPU binds are required for ULLS to function, as the migration queue
> needs exclusive access to the paging hardware engine. Thus, CPU binds
> are included here.
>
> Beyond being a requirement for ULLS, CPU binds should also reduce
> VM-bind latency, provide clearer multi-tile and TLB-invalidation
> layering, reduce pressure on GuC during fault storms as it is
> bypassed,
> and decouple kernel binds from unrelated copy/clear jobs—especially
> beneficial when faults are serviced in parallel. In a parallel-
> faulting
> test case, average bind time was reduced by approximately 15µs. In
> the
> worst case, 2MB copy time (~60–140µs) × (number of pagefault threads
> −
> 1) of latency would otherwise be added to a single fault. Reducing
> this
> latency increases overall throughput of the fault handler.
>
> This series can be merged in phases:
>
> Phase 1: CPU binds (patches 1–13)
> Phase 2: CPU-bind components and multi-tile relayers (patches 14–17)
> Phase 3: ULLS on the migration execution queue (patches 18–25)
>
> v2:
> - Use delayed worker to exit ULLS mode in an effort to save on power
> - Various other cleanups
> v3:
> - CPU bind component, multi-tile relayer
> - Split CPU bind patches in many small patches
>
> Matt
>
> [1] https://patchwork.freedesktop.org/series/149811/
>
> Matthew Brost (25):
> drm/xe: Drop struct xe_migrate_pt_update argument from
> populate/clear
> vfuns
> drm/xe: Add xe_migrate_update_pgtables_cpu_execute helper
> drm/xe: Decouple exec queue idle check from LRC
> drm/xe: Add job count to GuC exec queue snapshot
> drm/xe: Update xe_bo_put_deferred arguments to include writeback
> flag
> drm/xe: Add XE_BO_FLAG_PUT_VM_ASYNC
> drm/xe: Update scheduler job layer to support PT jobs
> drm/xe: Add helpers to access PT ops
> drm/xe: Add struct xe_pt_job_ops
> drm/xe: Update GuC submission backend to run PT jobs
> drm/xe: Store level in struct xe_vm_pgtable_update
> drm/xe: Don't use migrate exec queue for page fault binds
> drm/xe: Enable CPU binds for jobs
> drm/xe: Remove unused arguments from xe_migrate_pt_update_ops
> drm/xe: Make bind queues operate cross-tile
> drm/xe: Add CPU bind layer
> drm/xe: Add device flag to enable PT mirroring across tiles
> drm/xe: Add xe_hw_engine_write_ring_tail
> drm/xe: Add ULLS support to LRC
> drm/xe: Add ULLS migration job support to migration layer
> drm/xe: Add MI_SEMAPHORE_WAIT instruction defs
> drm/xe: Add ULLS migration job support to ring ops
> drm/xe: Add ULLS migration job support to GuC submission
> drm/xe: Enter ULLS for migration jobs upon page fault or SVM
> prefetch
> drm/xe: Add modparam to enable / disable ULLS on migrate queue
>
> drivers/gpu/drm/xe/Makefile | 1 +
> .../gpu/drm/xe/instructions/xe_mi_commands.h | 6 +
> drivers/gpu/drm/xe/xe_bo.c | 8 +-
> drivers/gpu/drm/xe/xe_bo.h | 11 +-
> drivers/gpu/drm/xe/xe_bo_types.h | 2 -
> drivers/gpu/drm/xe/xe_cpu_bind.c | 296 +++++++
> drivers/gpu/drm/xe/xe_cpu_bind.h | 118 +++
> drivers/gpu/drm/xe/xe_debugfs.c | 1 +
> drivers/gpu/drm/xe/xe_defaults.h | 1 +
> drivers/gpu/drm/xe/xe_device.c | 17 +-
> drivers/gpu/drm/xe/xe_device_types.h | 11 +
> drivers/gpu/drm/xe/xe_drm_client.c | 2 +-
> drivers/gpu/drm/xe/xe_exec_queue.c | 163 ++--
> drivers/gpu/drm/xe/xe_exec_queue.h | 18 +-
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 21 +-
> drivers/gpu/drm/xe/xe_guc_submit.c | 82 +-
> drivers/gpu/drm/xe/xe_guc_submit_types.h | 2 +
> drivers/gpu/drm/xe/xe_hw_engine.c | 10 +
> drivers/gpu/drm/xe/xe_hw_engine.h | 1 +
> drivers/gpu/drm/xe/xe_lrc.c | 51 ++
> drivers/gpu/drm/xe/xe_lrc.h | 3 +
> drivers/gpu/drm/xe/xe_lrc_types.h | 4 +
> drivers/gpu/drm/xe/xe_migrate.c | 585 +++++--------
> drivers/gpu/drm/xe/xe_migrate.h | 93 +--
> drivers/gpu/drm/xe/xe_module.c | 4 +
> drivers/gpu/drm/xe/xe_module.h | 1 +
> drivers/gpu/drm/xe/xe_pagefault.c | 3 +
> drivers/gpu/drm/xe/xe_pci.c | 2 +
> drivers/gpu/drm/xe/xe_pci_types.h | 1 +
> drivers/gpu/drm/xe/xe_pt.c | 773 +++++++++++-----
> --
> drivers/gpu/drm/xe/xe_pt.h | 12 +-
> drivers/gpu/drm/xe/xe_pt_types.h | 49 +-
> drivers/gpu/drm/xe/xe_ring_ops.c | 31 +
> drivers/gpu/drm/xe/xe_sched_job.c | 100 ++-
> drivers/gpu/drm/xe/xe_sched_job_types.h | 36 +-
> drivers/gpu/drm/xe/xe_sync.c | 20 +-
> drivers/gpu/drm/xe/xe_tlb_inval_job.c | 28 +-
> drivers/gpu/drm/xe/xe_tlb_inval_job.h | 4 +-
> drivers/gpu/drm/xe/xe_vm.c | 241 +++---
> drivers/gpu/drm/xe/xe_vm.h | 3 +
> drivers/gpu/drm/xe/xe_vm_types.h | 22 +-
> 41 files changed, 1658 insertions(+), 1179 deletions(-)
> create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.c
> create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.h
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v3 00/25] CPU binds and ULLS on migration queue
2026-03-05 22:56 ` [PATCH v3 00/25] CPU binds and ULLS on migration queue Summers, Stuart
@ 2026-03-10 22:17 ` Matthew Brost
0 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2026-03-10 22:17 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, Ghimiray, Himal Prasad,
Yadav, Arvind, thomas.hellstrom@linux.intel.com, Dugast, Francois
On Thu, Mar 05, 2026 at 03:56:50PM -0700, Summers, Stuart wrote:
> One question I have reading through some of the ULLS patches, why do we
> need to do this in the kernel? This adds quite a bit of complexity here
> that IMO might be a better fit for userspace migration, particularly
> for the userspace that already handles ULLS like L0. What is the
> benefit of doing this in the kernel vs adding a new API to allow some
> kind of opt in for migration in the UMD?
>
I cover some of this in the cover letter, but I’ll type it out again.
Page faults are handled by the kernel and trigger migrations. For
something like SVM, where we migrate chunks of 4K, 64K, or 2M, this
makes a large difference in performance.
Our SVM stats show that a GuC-based submission has roughly a 30µs
overhead. Now consider the individual copy time on BMG with the fastest
available PCIe: 15µs, 20µs, or 60µs for 4K, 64K, and 2M respectively.
The context-switch overhead is huge in the critical path. With future
devices and faster PCIe links, the ratio between context-switch cost and
copy time becomes even worse.
We also have compute benchmarks measuring pagefault bandwidth, and
enabling ULLS (via modparam) shows multi-GB/s bandwidth improvements.
François and I can provide exact numbers, but we already have a ton of
performance fixes in flight that collectively result in roughly 10×
speedups in compute benchmarks. If we can get SVM operating close to the
PCIe line rate, this becomes a very useful feature for
applications—we’re basically already hitting line rate on BMG after all
the performance improvements land.
I’d also argue that it really isn’t all that complex. It largely fits
into existing concepts and is layered quite nicely.
There are prefetch APIs for SVM, but those require applications to
explicitly call them rather than relying on the device to fault in a
malloc(). Those APIs also speed up when ULLS is enabled. Also, if it
isn’t clear—the only option for SVM is KMD‑driven migration, since we
also have to modify the CPU page tables (i.e., user space can’t just
issue a copy command to migrate a malloc’d buffer to the device).
Matt
> The CPU binding generally even outside of the ULLS piece makes sense to
> me, so I don't think this is blocking particularly. But it would be
> nice to have a little more detail on the above before we move forward
> here.
>
> Thanks,
> Stuart
>
> On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> > We now have data demonstrating the need for CPU binds and ULLS on the
> > migration queue, based on results generated from [1].
> >
> > On BMG, measurements show that when the GPU is continuously
> > processing
> > faults, copy jobs run approximately 30–40µs faster (depending on the
> > test case) with ULLS compared to traditional GuC submission with SLPC
> > enabled on the migration queue. Startup from a cold GPU shows an even
> > larger speedup. Given the critical nature of fault performance, ULLS
> > appears to be a worthwhile feature.
> >
> > In addition to driver telemetry, UMD compute benchmarks consistently
> > show over 1GB/s improvement in pagefault benchmarks with ULLS
> > enabled.
> >
> > ULLS will consume more power (not yet measured) due to a continuously
> > running batch on the paging engine. However, compute UMDs already do
> > this on engines exposed to users, so this seems like a worthwhile
> > tradeoff. To mitigate power concerns, ULLS will exit after a period
> > of
> > time in which no faults have been processed.
> >
> > CPU binds are required for ULLS to function, as the migration queue
> > needs exclusive access to the paging hardware engine. Thus, CPU binds
> > are included here.
> >
> > Beyond being a requirement for ULLS, CPU binds should also reduce
> > VM-bind latency, provide clearer multi-tile and TLB-invalidation
> > layering, reduce pressure on GuC during fault storms as it is
> > bypassed,
> > and decouple kernel binds from unrelated copy/clear jobs—especially
> > beneficial when faults are serviced in parallel. In a parallel-
> > faulting
> > test case, average bind time was reduced by approximately 15µs. In
> > the
> > worst case, 2MB copy time (~60–140µs) × (number of pagefault threads
> > −
> > 1) of latency would otherwise be added to a single fault. Reducing
> > this
> > latency increases overall throughput of the fault handler.
> >
> > This series can be merged in phases:
> >
> > Phase 1: CPU binds (patches 1–13)
> > Phase 2: CPU-bind components and multi-tile relayers (patches 14–17)
> > Phase 3: ULLS on the migration execution queue (patches 18–25)
> >
> > v2:
> > - Use delayed worker to exit ULLS mode in an effort to save on power
> > - Various other cleanups
> > v3:
> > - CPU bind component, multi-tile relayer
> > - Split CPU bind patches in many small patches
> >
> > Matt
> >
> > [1] https://patchwork.freedesktop.org/series/149811/
> >
> > Matthew Brost (25):
> > drm/xe: Drop struct xe_migrate_pt_update argument from
> > populate/clear
> > vfuns
> > drm/xe: Add xe_migrate_update_pgtables_cpu_execute helper
> > drm/xe: Decouple exec queue idle check from LRC
> > drm/xe: Add job count to GuC exec queue snapshot
> > drm/xe: Update xe_bo_put_deferred arguments to include writeback
> > flag
> > drm/xe: Add XE_BO_FLAG_PUT_VM_ASYNC
> > drm/xe: Update scheduler job layer to support PT jobs
> > drm/xe: Add helpers to access PT ops
> > drm/xe: Add struct xe_pt_job_ops
> > drm/xe: Update GuC submission backend to run PT jobs
> > drm/xe: Store level in struct xe_vm_pgtable_update
> > drm/xe: Don't use migrate exec queue for page fault binds
> > drm/xe: Enable CPU binds for jobs
> > drm/xe: Remove unused arguments from xe_migrate_pt_update_ops
> > drm/xe: Make bind queues operate cross-tile
> > drm/xe: Add CPU bind layer
> > drm/xe: Add device flag to enable PT mirroring across tiles
> > drm/xe: Add xe_hw_engine_write_ring_tail
> > drm/xe: Add ULLS support to LRC
> > drm/xe: Add ULLS migration job support to migration layer
> > drm/xe: Add MI_SEMAPHORE_WAIT instruction defs
> > drm/xe: Add ULLS migration job support to ring ops
> > drm/xe: Add ULLS migration job support to GuC submission
> > drm/xe: Enter ULLS for migration jobs upon page fault or SVM
> > prefetch
> > drm/xe: Add modparam to enable / disable ULLS on migrate queue
> >
> > drivers/gpu/drm/xe/Makefile | 1 +
> > .../gpu/drm/xe/instructions/xe_mi_commands.h | 6 +
> > drivers/gpu/drm/xe/xe_bo.c | 8 +-
> > drivers/gpu/drm/xe/xe_bo.h | 11 +-
> > drivers/gpu/drm/xe/xe_bo_types.h | 2 -
> > drivers/gpu/drm/xe/xe_cpu_bind.c | 296 +++++++
> > drivers/gpu/drm/xe/xe_cpu_bind.h | 118 +++
> > drivers/gpu/drm/xe/xe_debugfs.c | 1 +
> > drivers/gpu/drm/xe/xe_defaults.h | 1 +
> > drivers/gpu/drm/xe/xe_device.c | 17 +-
> > drivers/gpu/drm/xe/xe_device_types.h | 11 +
> > drivers/gpu/drm/xe/xe_drm_client.c | 2 +-
> > drivers/gpu/drm/xe/xe_exec_queue.c | 163 ++--
> > drivers/gpu/drm/xe/xe_exec_queue.h | 18 +-
> > drivers/gpu/drm/xe/xe_exec_queue_types.h | 21 +-
> > drivers/gpu/drm/xe/xe_guc_submit.c | 82 +-
> > drivers/gpu/drm/xe/xe_guc_submit_types.h | 2 +
> > drivers/gpu/drm/xe/xe_hw_engine.c | 10 +
> > drivers/gpu/drm/xe/xe_hw_engine.h | 1 +
> > drivers/gpu/drm/xe/xe_lrc.c | 51 ++
> > drivers/gpu/drm/xe/xe_lrc.h | 3 +
> > drivers/gpu/drm/xe/xe_lrc_types.h | 4 +
> > drivers/gpu/drm/xe/xe_migrate.c | 585 +++++--------
> > drivers/gpu/drm/xe/xe_migrate.h | 93 +--
> > drivers/gpu/drm/xe/xe_module.c | 4 +
> > drivers/gpu/drm/xe/xe_module.h | 1 +
> > drivers/gpu/drm/xe/xe_pagefault.c | 3 +
> > drivers/gpu/drm/xe/xe_pci.c | 2 +
> > drivers/gpu/drm/xe/xe_pci_types.h | 1 +
> > drivers/gpu/drm/xe/xe_pt.c | 773 +++++++++++-----
> > --
> > drivers/gpu/drm/xe/xe_pt.h | 12 +-
> > drivers/gpu/drm/xe/xe_pt_types.h | 49 +-
> > drivers/gpu/drm/xe/xe_ring_ops.c | 31 +
> > drivers/gpu/drm/xe/xe_sched_job.c | 100 ++-
> > drivers/gpu/drm/xe/xe_sched_job_types.h | 36 +-
> > drivers/gpu/drm/xe/xe_sync.c | 20 +-
> > drivers/gpu/drm/xe/xe_tlb_inval_job.c | 28 +-
> > drivers/gpu/drm/xe/xe_tlb_inval_job.h | 4 +-
> > drivers/gpu/drm/xe/xe_vm.c | 241 +++---
> > drivers/gpu/drm/xe/xe_vm.h | 3 +
> > drivers/gpu/drm/xe/xe_vm_types.h | 22 +-
> > 41 files changed, 1658 insertions(+), 1179 deletions(-)
> > create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.c
> > create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.h
> >
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v3 00/25] CPU binds and ULLS on migration queue
2026-02-28 1:34 [PATCH v3 00/25] CPU binds and ULLS on migration queue Matthew Brost
` (29 preceding siblings ...)
2026-03-05 22:56 ` [PATCH v3 00/25] CPU binds and ULLS on migration queue Summers, Stuart
@ 2026-03-20 15:31 ` Thomas Hellström
30 siblings, 0 replies; 63+ messages in thread
From: Thomas Hellström @ 2026-03-20 15:31 UTC (permalink / raw)
To: Matthew Brost, intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
francois.dugast
On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> We now have data demonstrating the need for CPU binds and ULLS on the
> migration queue, based on results generated from [1].
>
> On BMG, measurements show that when the GPU is continuously
> processing
> faults, copy jobs run approximately 30–40µs faster (depending on the
> test case) with ULLS compared to traditional GuC submission with SLPC
> enabled on the migration queue. Startup from a cold GPU shows an even
> larger speedup. Given the critical nature of fault performance, ULLS
> appears to be a worthwhile feature.
>
> In addition to driver telemetry, UMD compute benchmarks consistently
> show over 1GB/s improvement in pagefault benchmarks with ULLS
> enabled.
>
> ULLS will consume more power (not yet measured) due to a continuously
> running batch on the paging engine. However, compute UMDs already do
> this on engines exposed to users, so this seems like a worthwhile
> tradeoff. To mitigate power concerns, ULLS will exit after a period
> of
> time in which no faults have been processed.
So the primary use-case would be when there are one or more SVM VMAs
present, right? Is a check for this included?
>
> CPU binds are required for ULLS to function, as the migration queue
> needs exclusive access to the paging hardware engine.
Could you remind me why exclusive access to the paging hardware engine
is necessary?
> Thus, CPU binds
> are included here.
>
> Beyond being a requirement for ULLS, CPU binds should also reduce
> VM-bind latency, provide clearer multi-tile and TLB-invalidation
> layering, reduce pressure on GuC during fault storms as it is
> bypassed,
> and decouple kernel binds from unrelated copy/clear jobs—especially
> beneficial when faults are serviced in parallel.
Looking at single-fault cases, I still find this surprising, considering
it should, at least in theory, be possible to proceed from the copy /
clear to a bind operation without performing a TLB flush or context
switch on BMG, and in the CPU bind case we also have the latency of
waking the bind CPU thread once the copy has finished.
> In a parallel-faulting
> test case, average bind time was reduced by approximately 15µs. In
> the
> worst case, 2MB copy time (~60–140µs) × (number of pagefault threads
> −
> 1) of latency would otherwise be added to a single fault. Reducing
> this
> latency increases overall throughput of the fault handler.
>
> This series can be merged in phases:
>
> Phase 1: CPU binds (patches 1–13)
> Phase 2: CPU-bind components and multi-tile relayers (patches 14–17)
> Phase 3: ULLS on the migration execution queue (patches 18–25)
In any case, I'll proceed trying to review this. A future direction
maybe if we completely ditch the CPU binds is to rework the page-table
locking (alternatively craft a different xe_pt_pagefault page-table
implementation with CPU-page-table like locking. Since we don't need to
sync the PT access with the GPU).
/Thomas
>
> v2:
> - Use delayed worker to exit ULLS mode in an effort to save on power
> - Various other cleanups
> v3:
> - CPU bind component, multi-tile relayer
> - Split CPU bind patches in many small patches
>
> Matt
>
> [1] https://patchwork.freedesktop.org/series/149811/
>
> Matthew Brost (25):
> drm/xe: Drop struct xe_migrate_pt_update argument from
> populate/clear
> vfuns
> drm/xe: Add xe_migrate_update_pgtables_cpu_execute helper
> drm/xe: Decouple exec queue idle check from LRC
> drm/xe: Add job count to GuC exec queue snapshot
> drm/xe: Update xe_bo_put_deferred arguments to include writeback
> flag
> drm/xe: Add XE_BO_FLAG_PUT_VM_ASYNC
> drm/xe: Update scheduler job layer to support PT jobs
> drm/xe: Add helpers to access PT ops
> drm/xe: Add struct xe_pt_job_ops
> drm/xe: Update GuC submission backend to run PT jobs
> drm/xe: Store level in struct xe_vm_pgtable_update
> drm/xe: Don't use migrate exec queue for page fault binds
> drm/xe: Enable CPU binds for jobs
> drm/xe: Remove unused arguments from xe_migrate_pt_update_ops
> drm/xe: Make bind queues operate cross-tile
> drm/xe: Add CPU bind layer
> drm/xe: Add device flag to enable PT mirroring across tiles
> drm/xe: Add xe_hw_engine_write_ring_tail
> drm/xe: Add ULLS support to LRC
> drm/xe: Add ULLS migration job support to migration layer
> drm/xe: Add MI_SEMAPHORE_WAIT instruction defs
> drm/xe: Add ULLS migration job support to ring ops
> drm/xe: Add ULLS migration job support to GuC submission
> drm/xe: Enter ULLS for migration jobs upon page fault or SVM
> prefetch
> drm/xe: Add modparam to enable / disable ULLS on migrate queue
>
> drivers/gpu/drm/xe/Makefile | 1 +
> .../gpu/drm/xe/instructions/xe_mi_commands.h | 6 +
> drivers/gpu/drm/xe/xe_bo.c | 8 +-
> drivers/gpu/drm/xe/xe_bo.h | 11 +-
> drivers/gpu/drm/xe/xe_bo_types.h | 2 -
> drivers/gpu/drm/xe/xe_cpu_bind.c | 296 +++++++
> drivers/gpu/drm/xe/xe_cpu_bind.h | 118 +++
> drivers/gpu/drm/xe/xe_debugfs.c | 1 +
> drivers/gpu/drm/xe/xe_defaults.h | 1 +
> drivers/gpu/drm/xe/xe_device.c | 17 +-
> drivers/gpu/drm/xe/xe_device_types.h | 11 +
> drivers/gpu/drm/xe/xe_drm_client.c | 2 +-
> drivers/gpu/drm/xe/xe_exec_queue.c | 163 ++--
> drivers/gpu/drm/xe/xe_exec_queue.h | 18 +-
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 21 +-
> drivers/gpu/drm/xe/xe_guc_submit.c | 82 +-
> drivers/gpu/drm/xe/xe_guc_submit_types.h | 2 +
> drivers/gpu/drm/xe/xe_hw_engine.c | 10 +
> drivers/gpu/drm/xe/xe_hw_engine.h | 1 +
> drivers/gpu/drm/xe/xe_lrc.c | 51 ++
> drivers/gpu/drm/xe/xe_lrc.h | 3 +
> drivers/gpu/drm/xe/xe_lrc_types.h | 4 +
> drivers/gpu/drm/xe/xe_migrate.c | 585 +++++--------
> drivers/gpu/drm/xe/xe_migrate.h | 93 +--
> drivers/gpu/drm/xe/xe_module.c | 4 +
> drivers/gpu/drm/xe/xe_module.h | 1 +
> drivers/gpu/drm/xe/xe_pagefault.c | 3 +
> drivers/gpu/drm/xe/xe_pci.c | 2 +
> drivers/gpu/drm/xe/xe_pci_types.h | 1 +
> drivers/gpu/drm/xe/xe_pt.c | 773 +++++++++++-----
> --
> drivers/gpu/drm/xe/xe_pt.h | 12 +-
> drivers/gpu/drm/xe/xe_pt_types.h | 49 +-
> drivers/gpu/drm/xe/xe_ring_ops.c | 31 +
> drivers/gpu/drm/xe/xe_sched_job.c | 100 ++-
> drivers/gpu/drm/xe/xe_sched_job_types.h | 36 +-
> drivers/gpu/drm/xe/xe_sync.c | 20 +-
> drivers/gpu/drm/xe/xe_tlb_inval_job.c | 28 +-
> drivers/gpu/drm/xe/xe_tlb_inval_job.h | 4 +-
> drivers/gpu/drm/xe/xe_vm.c | 241 +++---
> drivers/gpu/drm/xe/xe_vm.h | 3 +
> drivers/gpu/drm/xe/xe_vm_types.h | 22 +-
> 41 files changed, 1658 insertions(+), 1179 deletions(-)
> create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.c
> create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.h
^ permalink raw reply [flat|nested] 63+ messages in thread