Intel-XE Archive on lore.kernel.org
* [PATCH v2 00/12] Context based TLB invalidations
@ 2025-11-04 19:56 Matthew Brost
  2025-11-04 19:56 ` [PATCH v2 01/12] drm/xe: Add normalize_invalidation_range Matthew Brost
                   ` (11 more replies)
  0 siblings, 12 replies; 32+ messages in thread
From: Matthew Brost @ 2025-11-04 19:56 UTC (permalink / raw)
  To: intel-xe; +Cc: stuart.summers, lucas.demarchi, matthew.d.roper

Add support for context-based TLB invalidations. Rather than issuing a
single ASID-based PPGTT invalidation, the GuC backend iterates over the
exec queues attached to a VM and issues per-context (GuC ID)
invalidations, falling back to a full invalidation once the number of
queues on a GT crosses a threshold.

Matt

Matthew Brost (12):
  drm/xe: Add normalize_invalidation_range
  drm/xe: Make usm.asid_to_vm allocation use GFP_NOWAIT
  drm/xe: Add xe_device_asid_to_vm helper
  drm/xe: Add vm to exec queues association
  drm/xe: Taint TLB invalidation seqno lock with GFP_KERNEL
  drm/xe: Do not forward invalid TLB invalidation seqnos to upper layers
  drm/xe: Rename send_tlb_inval_ppgtt to send_tlb_inval_asid_ppgtt
  drm/xe: Add send_tlb_inval_ppgtt helper
  drm/xe: Add xe_tlb_inval_idle helper
  drm/xe: Add exec queue active vfunc
  drm/xe: Add context-based invalidation to GuC TLB invalidation backend
  drm/xe: Enable context TLB invalidations for CI

 drivers/gpu/drm/xe/xe_device.c           |  25 +++
 drivers/gpu/drm/xe/xe_device.h           |  11 +-
 drivers/gpu/drm/xe/xe_device_types.h     |   9 +
 drivers/gpu/drm/xe/xe_exec_queue.c       |   7 +-
 drivers/gpu/drm/xe/xe_exec_queue_types.h |   5 +
 drivers/gpu/drm/xe/xe_execlist.c         |   7 +
 drivers/gpu/drm/xe/xe_guc_submit.c       |   6 +
 drivers/gpu/drm/xe/xe_guc_tlb_inval.c    | 215 ++++++++++++++++++-----
 drivers/gpu/drm/xe/xe_pci.c              |   2 +
 drivers/gpu/drm/xe/xe_pci_types.h        |   1 +
 drivers/gpu/drm/xe/xe_tlb_inval.c        |  33 ++++
 drivers/gpu/drm/xe/xe_tlb_inval.h        |   2 +
 drivers/gpu/drm/xe/xe_tlb_inval_types.h  |   1 +
 drivers/gpu/drm/xe/xe_vm.c               |  49 +++++-
 drivers/gpu/drm/xe/xe_vm.h               |   3 +
 drivers/gpu/drm/xe/xe_vm_types.h         |  13 ++
 16 files changed, 337 insertions(+), 52 deletions(-)

-- 
2.34.1



* [PATCH v2 01/12] drm/xe: Add normalize_invalidation_range
  2025-11-04 19:56 [PATCH v2 00/12] Context based TLB invalidations Matthew Brost
@ 2025-11-04 19:56 ` Matthew Brost
  2025-11-06 20:03   ` Summers, Stuart
  2025-11-04 19:56 ` [PATCH v2 02/12] drm/xe: Make usm.asid_to_vm allocation use GFP_NOWAIT Matthew Brost
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Matthew Brost @ 2025-11-04 19:56 UTC (permalink / raw)
  To: intel-xe; +Cc: stuart.summers, lucas.demarchi, matthew.d.roper

Extract the code that determines the alignment of TLB invalidation into
a helper function — normalize_invalidation_range. This will be useful
when adding context-based invalidations to the GuC TLB invalidation
backend.
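
The helper widens the requested range to a single power-of-two-sized,
naturally aligned block that the selective-invalidation action can
encode. As an illustration (hypothetical values): invalidating
[0x11000, 0x13000) gives an 8K length, so the range is first rounded to
the 8K-aligned span [0x10000, 0x14000); since 0x10000 + 8K does not
reach 0x14000, the length doubles to 16K and the start re-aligns to
0x10000, yielding a single 16K, naturally aligned invalidation. Lengths
of 2M and above are additionally bumped to at least 16M, the minimum
invalidation size the hardware expects for 2M pages.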

Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 71 +++++++++++++--------------
 1 file changed, 35 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
index a80175c7c478..61bfa0d485f6 100644
--- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
@@ -92,6 +92,38 @@ static int send_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval, u32 seqno)
 	return -ECANCELED;
 }
 
+static u64 normalize_invalidation_range(struct xe_gt *gt, u64 *start, u64 *end)
+{
+	u64 orig_start = *start;
+	u64 length = *end - *start;
+	u64 align;
+
+	if (length < SZ_4K)
+		length = SZ_4K;
+
+	align = roundup_pow_of_two(length);
+	*start = ALIGN_DOWN(*start, align);
+	*end = ALIGN(*end, align);
+	length = align;
+	while (*start + length < *end) {
+		length <<= 1;
+		*start = ALIGN_DOWN(orig_start, length);
+	}
+
+	if (length >= SZ_2M) {
+		length = max_t(u64, SZ_16M, length);
+		*start = ALIGN_DOWN(orig_start, length);
+	}
+
+	xe_gt_assert(gt, length >= SZ_4K);
+	xe_gt_assert(gt, is_power_of_2(length));
+	xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
+					    ilog2(SZ_2M) + 1)));
+	xe_gt_assert(gt, IS_ALIGNED(*start, length));
+
+	return length;
+}
+
 /*
  * Ensure that roundup_pow_of_two(length) doesn't overflow.
  * Note that roundup_pow_of_two() operates on unsigned long,
@@ -118,47 +150,14 @@ static int send_tlb_inval_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
 	    length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
 		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
 	} else {
-		u64 orig_start = start;
-		u64 align;
-
-		if (length < SZ_4K)
-			length = SZ_4K;
-
-		/*
-		 * We need to invalidate a higher granularity if start address
-		 * is not aligned to length. When start is not aligned with
-		 * length we need to find the length large enough to create an
-		 * address mask covering the required range.
-		 */
-		align = roundup_pow_of_two(length);
-		start = ALIGN_DOWN(start, align);
-		end = ALIGN(end, align);
-		length = align;
-		while (start + length < end) {
-			length <<= 1;
-			start = ALIGN_DOWN(orig_start, length);
-		}
-
-		/*
-		 * Minimum invalidation size for a 2MB page that the hardware
-		 * expects is 16MB
-		 */
-		if (length >= SZ_2M) {
-			length = max_t(u64, SZ_16M, length);
-			start = ALIGN_DOWN(orig_start, length);
-		}
-
-		xe_gt_assert(gt, length >= SZ_4K);
-		xe_gt_assert(gt, is_power_of_2(length));
-		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
-						    ilog2(SZ_2M) + 1)));
-		xe_gt_assert(gt, IS_ALIGNED(start, length));
+		u64 normalize_len = normalize_invalidation_range(gt, &start,
+								 &end);
 
 		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
 		action[len++] = asid;
 		action[len++] = lower_32_bits(start);
 		action[len++] = upper_32_bits(start);
-		action[len++] = ilog2(length) - ilog2(SZ_4K);
+		action[len++] = ilog2(normalize_len) - ilog2(SZ_4K);
 	}
 
 	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
-- 
2.34.1



* [PATCH v2 02/12] drm/xe: Make usm.asid_to_vm allocation use GFP_NOWAIT
  2025-11-04 19:56 [PATCH v2 00/12] Context based TLB invalidations Matthew Brost
  2025-11-04 19:56 ` [PATCH v2 01/12] drm/xe: Add normalize_invalidation_range Matthew Brost
@ 2025-11-04 19:56 ` Matthew Brost
  2025-11-06 22:08   ` Summers, Stuart
  2025-11-06 22:13   ` Summers, Stuart
  2025-11-04 19:56 ` [PATCH v2 03/12] drm/xe: Add xe_device_asid_to_vm helper Matthew Brost
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 32+ messages in thread
From: Matthew Brost @ 2025-11-04 19:56 UTC (permalink / raw)
  To: intel-xe; +Cc: stuart.summers, lucas.demarchi, matthew.d.roper

Allocate the usm.asid_to_vm entry with GFP_NOWAIT so that xe->usm.lock
is never held across a reclaiming allocation. This keeps the
asid_to_vm lookup, which takes the same lock, reclaim-safe and allows
it to be performed during TLB invalidations, as required for
context-based TLB invalidation support.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 00f3520dec38..84f4c8f1be33 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1632,7 +1632,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
 		down_write(&xe->usm.lock);
 		err = xa_alloc_cyclic(&xe->usm.asid_to_vm, &asid, vm,
 				      XA_LIMIT(1, XE_MAX_ASID - 1),
-				      &xe->usm.next_asid, GFP_KERNEL);
+				      &xe->usm.next_asid, GFP_NOWAIT);
 		up_write(&xe->usm.lock);
 		if (err < 0)
 			goto err_close;
-- 
2.34.1



* [PATCH v2 03/12] drm/xe: Add xe_device_asid_to_vm helper
  2025-11-04 19:56 [PATCH v2 00/12] Context based TLB invalidations Matthew Brost
  2025-11-04 19:56 ` [PATCH v2 01/12] drm/xe: Add normalize_invalidation_range Matthew Brost
  2025-11-04 19:56 ` [PATCH v2 02/12] drm/xe: Make usm.asid_to_vm allocation use GFP_NOWAIT Matthew Brost
@ 2025-11-04 19:56 ` Matthew Brost
  2025-12-11 22:07   ` Matt Atwood
  2025-11-04 19:56 ` [PATCH v2 04/12] drm/xe: Add vm to exec queues association Matthew Brost
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Matthew Brost @ 2025-11-04 19:56 UTC (permalink / raw)
  To: intel-xe; +Cc: stuart.summers, lucas.demarchi, matthew.d.roper

Introduce the xe_device_asid_to_vm helper, which can be used throughout
the driver to resolve the VM from a given ASID.
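
A minimal caller sketch (illustrative only, not part of this patch),
assuming an xe_device pointer and an ASID are already in hand; the
helper takes a reference on the returned VM which the caller must drop:

	struct xe_vm *vm = xe_device_asid_to_vm(xe, asid);

	if (IS_ERR(vm))
		return PTR_ERR(vm);

	/* ... use vm ... */

	xe_vm_put(vm);	/* drop the reference taken by the helper */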

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c | 25 +++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_device.h |  4 ++++
 2 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 86d5960476af..57907904c49d 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -1290,3 +1290,28 @@ void xe_device_declare_wedged(struct xe_device *xe)
 		drm_dev_wedged_event(&xe->drm, xe->wedged.method, NULL);
 	}
 }
+
+/**
+ * xe_device_asid_to_vm() - Find VM from ASID
+ * @xe: the &xe_device
+ * @asid: Address space ID
+ *
+ * Find a VM from ASID and take a reference to VM which caller must drop.
+ * Reclaim safe.
+ *
+ * Return: VM on success, ERR_PTR on failure
+ */
+struct xe_vm *xe_device_asid_to_vm(struct xe_device *xe, u32 asid)
+{
+	struct xe_vm *vm;
+
+	down_read(&xe->usm.lock);
+	vm = xa_load(&xe->usm.asid_to_vm, asid);
+	if (vm)
+		xe_vm_get(vm);
+	else
+		vm = ERR_PTR(-EINVAL);
+	up_read(&xe->usm.lock);
+
+	return vm;
+}
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 32cc6323b7f6..538202eebc16 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -6,6 +6,8 @@
 #ifndef _XE_DEVICE_H_
 #define _XE_DEVICE_H_
 
+struct xe_vm;
+
 #include <drm/drm_util.h>
 
 #include "xe_device_types.h"
@@ -195,6 +197,8 @@ void xe_file_put(struct xe_file *xef);
 
 int xe_is_injection_active(void);
 
+struct xe_vm *xe_device_asid_to_vm(struct xe_device *xe, u32 asid);
+
 /*
  * Occasionally it is seen that the G2H worker starts running after a delay of more than
  * a second even after being queued and activated by the Linux workqueue subsystem. This
-- 
2.34.1



* [PATCH v2 04/12] drm/xe: Add vm to exec queues association
  2025-11-04 19:56 [PATCH v2 00/12] Context based TLB invalidations Matthew Brost
                   ` (2 preceding siblings ...)
  2025-11-04 19:56 ` [PATCH v2 03/12] drm/xe: Add xe_device_asid_to_vm helper Matthew Brost
@ 2025-11-04 19:56 ` Matthew Brost
  2025-11-06 22:15   ` Summers, Stuart
  2025-12-12 21:03   ` Summers, Stuart
  2025-11-04 19:56 ` [PATCH v2 05/12] drm/xe: Taint TLB invalidation seqno lock with GFP_KERNEL Matthew Brost
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 32+ messages in thread
From: Matthew Brost @ 2025-11-04 19:56 UTC (permalink / raw)
  To: intel-xe; +Cc: stuart.summers, lucas.demarchi, matthew.d.roper

Maintain a list of exec queues per VM, which the TLB invalidation code
will use to issue context-ID based TLB invalidations.

Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_device.h           |  7 ----
 drivers/gpu/drm/xe/xe_device_types.h     |  7 ++++
 drivers/gpu/drm/xe/xe_exec_queue.c       |  7 +++-
 drivers/gpu/drm/xe/xe_exec_queue_types.h |  3 ++
 drivers/gpu/drm/xe/xe_vm.c               | 47 ++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h               |  3 ++
 drivers/gpu/drm/xe/xe_vm_types.h         | 13 +++++++
 7 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 538202eebc16..764f24f4adfc 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -62,13 +62,6 @@ static inline struct xe_tile *xe_device_get_root_tile(struct xe_device *xe)
 	return &xe->tiles[0];
 }
 
-/*
- * Highest GT/tile count for any platform.  Used only for memory allocation
- * sizing.  Any logic looping over GTs or mapping userspace GT IDs into GT
- * structures should use the per-platform xe->info.max_gt_per_tile instead.
- */
-#define XE_MAX_GT_PER_TILE 2
-
 static inline struct xe_gt *xe_device_get_gt(struct xe_device *xe, u8 gt_id)
 {
 	struct xe_tile *tile;
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index af0ce275b032..145951dd95c9 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -57,6 +57,13 @@ struct xe_vram_region;
 #define XE_GT1		1
 #define XE_MAX_TILES_PER_DEVICE	(XE_GT1 + 1)
 
+/*
+ * Highest GT/tile count for any platform.  Used only for memory allocation
+ * sizing.  Any logic looping over GTs or mapping userspace GT IDs into GT
+ * structures should use the per-platform xe->info.max_gt_per_tile instead.
+ */
+#define XE_MAX_GT_PER_TILE 2
+
 #define XE_MAX_ASID	(BIT(20))
 
 #define IS_PLATFORM_STEP(_xe, _platform, min_step, max_step)	\
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 1b57d7c2cc94..49822baf5967 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -72,8 +72,10 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
 
 	if (xe_exec_queue_uses_pxp(q))
 		xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
-	if (q->vm)
+	if (q->vm) {
+		xe_vm_remove_exec_queue(q->vm, q);
 		xe_vm_put(q->vm);
+	}
 
 	if (q->xef)
 		xe_file_put(q->xef);
@@ -143,6 +145,7 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
 	q->ring_ops = gt->ring_ops[hwe->class];
 	q->ops = gt->exec_queue_ops;
 	INIT_LIST_HEAD(&q->lr.link);
+	INIT_LIST_HEAD(&q->vm_exec_queue_link);
 	INIT_LIST_HEAD(&q->multi_gt_link);
 	INIT_LIST_HEAD(&q->hw_engine_group_link);
 	INIT_LIST_HEAD(&q->pxp.link);
@@ -796,6 +799,8 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
 	}
 
 	q->xef = xe_file_get(xef);
+	if (eci[0].engine_class != DRM_XE_ENGINE_CLASS_VM_BIND)
+		xe_vm_add_exec_queue(vm, q);
 
 	/* user id alloc must always be last in ioctl to prevent UAF */
 	err = xa_alloc(&xef->exec_queue.xa, &id, q, xa_limit_32b, GFP_KERNEL);
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index c8807268ec6c..a2281fcb55b1 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -147,6 +147,9 @@ struct xe_exec_queue {
 		struct xe_dep_scheduler *dep_scheduler;
 	} tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
 
+	/** @vm_exec_queue_link: Link to track exec queue within a VM's list of exec queues. */
+	struct list_head vm_exec_queue_link;
+
 	/** @pxp: PXP info tracking */
 	struct {
 		/** @pxp.type: PXP session type used by this queue */
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 84f4c8f1be33..cccdd931dd5e 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1507,8 +1507,20 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
 	INIT_WORK(&vm->destroy_work, vm_destroy_work_func);
 
 	INIT_LIST_HEAD(&vm->preempt.exec_queues);
+	INIT_LIST_HEAD(&vm->exec_queues.list);
 	vm->preempt.min_run_period_ms = 10;	/* FIXME: Wire up to uAPI */
 
+	init_rwsem(&vm->exec_queues.lock);
+	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
+		fs_reclaim_acquire(GFP_KERNEL);
+		might_lock(&vm->exec_queues.lock);
+		fs_reclaim_release(GFP_KERNEL);
+
+		down_read(&vm->exec_queues.lock);
+		might_lock(&xe_root_mmio_gt(xe)->uc.guc.ct.lock);
+		up_read(&vm->exec_queues.lock);
+	}
+
 	for_each_tile(tile, xe, id)
 		xe_range_fence_tree_init(&vm->rftree[id]);
 
@@ -4387,3 +4399,38 @@ int xe_vm_alloc_cpu_addr_mirror_vma(struct xe_vm *vm, uint64_t start, uint64_t r
 
 	return xe_vm_alloc_vma(vm, &map_req, false);
 }
+
+/**
+ * xe_vm_add_exec_queue() - Add exec queue to VM
+ * @vm: The VM.
+ * @q: The exec_queue
+ */
+void xe_vm_add_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
+{
+	/* User VMs and queues only */
+	xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_KERNEL));
+	xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_PERMANENT));
+	xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_VM));
+	xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_MIGRATE));
+	xe_assert(vm->xe, vm->xef);
+
+	down_write(&vm->exec_queues.lock);
+	list_add(&q->vm_exec_queue_link, &vm->exec_queues.list);
+	++vm->exec_queues.count[q->gt->info.id];
+	up_write(&vm->exec_queues.lock);
+}
+
+/**
+ * xe_vm_remove_exec_queue() - Remove exec queue from VM
+ * @vm: The VM.
+ * @q: The exec_queue
+ */
+void xe_vm_remove_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
+{
+	down_write(&vm->exec_queues.lock);
+	if (!list_empty(&q->vm_exec_queue_link)) {
+		list_del(&q->vm_exec_queue_link);
+		--vm->exec_queues.count[q->gt->info.id];
+	}
+	up_write(&vm->exec_queues.lock);
+}
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index ef8a5019574e..5f3341ef99d2 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -284,6 +284,9 @@ static inline struct dma_resv *xe_vm_resv(struct xe_vm *vm)
 
 void xe_vm_kill(struct xe_vm *vm, bool unlocked);
 
+void xe_vm_add_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q);
+void xe_vm_remove_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q);
+
 /**
  * xe_vm_assert_held(vm) - Assert that the vm's reservation object is held.
  * @vm: The vm
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 830ed7b05c27..180b48d62480 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -290,6 +290,19 @@ struct xe_vm {
 		struct list_head pm_activate_link;
 	} preempt;
 
+	/** @exec_queues: Manages list of exec queues attached to this VM, protected by lock. */
+	struct {
+		/** @exec_queues.list: list of exec queues attached to this VM */
+		struct list_head list;
+		/**
+		 * @exec_queues.count: count of exec queues attached to this VM,
+		 * per GT
+		 */
+		int count[XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_TILE];
+		/** @exec_queues.lock: lock to protect exec_queues list */
+		struct rw_semaphore lock;
+	} exec_queues;
+
 	/** @um: unified memory state */
 	struct {
 		/** @asid: address space ID, unique to each VM */
-- 
2.34.1



* [PATCH v2 05/12] drm/xe: Taint TLB invalidation seqno lock with GFP_KERNEL
  2025-11-04 19:56 [PATCH v2 00/12] Context based TLB invalidations Matthew Brost
                   ` (3 preceding siblings ...)
  2025-11-04 19:56 ` [PATCH v2 04/12] drm/xe: Add vm to exec queues association Matthew Brost
@ 2025-11-04 19:56 ` Matthew Brost
  2025-12-11 22:35   ` Matt Atwood
  2025-11-04 19:56 ` [PATCH v2 06/12] drm/xe: Do not forward invalid TLB invalidation seqnos to upper layers Matthew Brost
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Matthew Brost @ 2025-11-04 19:56 UTC (permalink / raw)
  To: intel-xe; +Cc: stuart.summers, lucas.demarchi, matthew.d.roper

Taint the TLB invalidation seqno lock with GFP_KERNEL (prime lockdep
against fs_reclaim), as TLB invalidations can be in the path of reclaim
(e.g., MMU notifiers).

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_tlb_inval.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
index 918a59e686ea..7ee081b94f90 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
@@ -114,6 +114,16 @@ static void tlb_inval_fini(struct drm_device *drm, void *arg)
 	xe_tlb_inval_reset(tlb_inval);
 }
 
+static void primelockdep(struct xe_tlb_inval *tlb_inval)
+{
+	if (!IS_ENABLED(CONFIG_LOCKDEP))
+		return;
+
+	fs_reclaim_acquire(GFP_KERNEL);
+	might_lock(&tlb_inval->seqno_lock);
+	fs_reclaim_release(GFP_KERNEL);
+}
+
 /**
  * xe_gt_tlb_inval_init - Initialize TLB invalidation state
  * @gt: GT structure
@@ -140,6 +150,8 @@ int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
 	if (err)
 		return err;
 
+	primelockdep(tlb_inval);
+
 	tlb_inval->job_wq = drmm_alloc_ordered_workqueue(&xe->drm,
 							 "gt-tbl-inval-job-wq",
 							 WQ_MEM_RECLAIM);
-- 
2.34.1



* [PATCH v2 06/12] drm/xe: Do not forward invalid TLB invalidation seqnos to upper layers
  2025-11-04 19:56 [PATCH v2 00/12] Context based TLB invalidations Matthew Brost
                   ` (4 preceding siblings ...)
  2025-11-04 19:56 ` [PATCH v2 05/12] drm/xe: Taint TLB invalidation seqno lock with GFP_KERNEL Matthew Brost
@ 2025-11-04 19:56 ` Matthew Brost
  2025-11-06 22:05   ` Summers, Stuart
  2025-11-04 19:56 ` [PATCH v2 07/12] drm/xe: Rename send_tlb_inval_ppgtt to send_tlb_inval_asid_ppgtt Matthew Brost
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Matthew Brost @ 2025-11-04 19:56 UTC (permalink / raw)
  To: intel-xe; +Cc: stuart.summers, lucas.demarchi, matthew.d.roper

Context-based TLB invalidations send multiple H2G messages per seqno,
with only the final H2G containing a valid seqno — the others carry an
invalid seqno. The G2H handler drops these invalid seqnos to avoid
prematurely signaling a TLB invalidation fence.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_tlb_inval.c   | 3 ++-
 drivers/gpu/drm/xe/xe_tlb_inval_types.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
index 61bfa0d485f6..995789f0d31f 100644
--- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
@@ -235,7 +235,8 @@ int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 	if (unlikely(len != 1))
 		return -EPROTO;
 
-	xe_tlb_inval_done_handler(&gt->tlb_inval, msg[0]);
+	if (msg[0] != TLB_INVALIDATION_SEQNO_INVALID)
+		xe_tlb_inval_done_handler(&gt->tlb_inval, msg[0]);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
index 8f8b060e9005..7a6967ce3b76 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
@@ -80,6 +80,7 @@ struct xe_tlb_inval {
 	const struct xe_tlb_inval_ops *ops;
 	/** @tlb_inval.seqno: TLB invalidation seqno, protected by CT lock */
 #define TLB_INVALIDATION_SEQNO_MAX	0x100000
+#define TLB_INVALIDATION_SEQNO_INVALID	TLB_INVALIDATION_SEQNO_MAX
 	int seqno;
 	/** @tlb_invalidation.seqno_lock: protects @tlb_invalidation.seqno */
 	struct mutex seqno_lock;
-- 
2.34.1



* [PATCH v2 07/12] drm/xe: Rename send_tlb_inval_ppgtt to send_tlb_inval_asid_ppgtt
  2025-11-04 19:56 [PATCH v2 00/12] Context based TLB invalidations Matthew Brost
                   ` (5 preceding siblings ...)
  2025-11-04 19:56 ` [PATCH v2 06/12] drm/xe: Do not forward invalid TLB invalidation seqnos to upper layers Matthew Brost
@ 2025-11-04 19:56 ` Matthew Brost
  2025-11-06 20:22   ` Summers, Stuart
  2025-11-04 19:56 ` [PATCH v2 08/12] drm/xe: Add send_tlb_inval_ppgtt helper Matthew Brost
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Matthew Brost @ 2025-11-04 19:56 UTC (permalink / raw)
  To: intel-xe; +Cc: stuart.summers, lucas.demarchi, matthew.d.roper

Context-based TLB invalidations have their own set of GuC TLB
invalidation operations. Rename the current PPGTT invalidation function,
which operates on ASIDs, to a more descriptive name that reflects its
purpose.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
index 995789f0d31f..42e9fbd062ba 100644
--- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
@@ -131,8 +131,8 @@ static u64 normalize_invalidation_range(struct xe_gt *gt, u64 *start, u64 *end)
  */
 #define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
 
-static int send_tlb_inval_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
-				u64 start, u64 end, u32 asid)
+static int send_tlb_inval_asid_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
+				     u64 start, u64 end, u32 asid)
 {
 #define MAX_TLB_INVALIDATION_LEN	7
 	struct xe_guc *guc = tlb_inval->private;
@@ -195,7 +195,7 @@ static long tlb_inval_timeout_delay(struct xe_tlb_inval *tlb_inval)
 static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
 	.all = send_tlb_inval_all,
 	.ggtt = send_tlb_inval_ggtt,
-	.ppgtt = send_tlb_inval_ppgtt,
+	.ppgtt = send_tlb_inval_asid_ppgtt,
 	.initialized = tlb_inval_initialized,
 	.flush = tlb_inval_flush,
 	.timeout_delay = tlb_inval_timeout_delay,
-- 
2.34.1



* [PATCH v2 08/12] drm/xe: Add send_tlb_inval_ppgtt helper
  2025-11-04 19:56 [PATCH v2 00/12] Context based TLB invalidations Matthew Brost
                   ` (6 preceding siblings ...)
  2025-11-04 19:56 ` [PATCH v2 07/12] drm/xe: Rename send_tlb_inval_ppgtt to send_tlb_inval_asid_ppgtt Matthew Brost
@ 2025-11-04 19:56 ` Matthew Brost
  2025-11-06 20:25   ` Summers, Stuart
  2025-11-04 19:56 ` [PATCH v2 09/12] drm/xe: Add xe_tlb_inval_idle helper Matthew Brost
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Matthew Brost @ 2025-11-04 19:56 UTC (permalink / raw)
  To: intel-xe; +Cc: stuart.summers, lucas.demarchi, matthew.d.roper

Extract the common code that issues a TLB invalidation H2G for PPGTTs
into a helper function. This helper can be reused for both ASID-based
and context-based TLB invalidations.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
index 42e9fbd062ba..6978ee8edf2e 100644
--- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
@@ -131,19 +131,15 @@ static u64 normalize_invalidation_range(struct xe_gt *gt, u64 *start, u64 *end)
  */
 #define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
 
-static int send_tlb_inval_asid_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
-				     u64 start, u64 end, u32 asid)
+static int send_tlb_inval_ppgtt(struct xe_guc *guc, u32 seqno, u64 start,
+				u64 end, u32 id, u32 type)
 {
 #define MAX_TLB_INVALIDATION_LEN	7
-	struct xe_guc *guc = tlb_inval->private;
 	struct xe_gt *gt = guc_to_gt(guc);
 	u32 action[MAX_TLB_INVALIDATION_LEN];
 	u64 length = end - start;
 	int len = 0;
 
-	if (guc_to_xe(guc)->info.force_execlist)
-		return -ECANCELED;
-
 	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
 	action[len++] = seqno;
 	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
@@ -153,18 +149,33 @@ static int send_tlb_inval_asid_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
 		u64 normalize_len = normalize_invalidation_range(gt, &start,
 								 &end);
 
-		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
-		action[len++] = asid;
+		action[len++] = MAKE_INVAL_OP(type);
+		action[len++] = id;
 		action[len++] = lower_32_bits(start);
 		action[len++] = upper_32_bits(start);
 		action[len++] = ilog2(normalize_len) - ilog2(SZ_4K);
 	}
 
 	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
+#undef MAX_TLB_INVALIDATION_LEN
 
 	return send_tlb_inval(guc, action, len);
 }
 
+static int send_tlb_inval_asid_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
+				     u64 start, u64 end, u32 asid)
+{
+	struct xe_guc *guc = tlb_inval->private;
+
+	lockdep_assert_held(&tlb_inval->seqno_lock);
+
+	if (guc_to_xe(guc)->info.force_execlist)
+		return -ECANCELED;
+
+	return send_tlb_inval_ppgtt(guc, seqno, start, end, asid,
+				    XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
+}
+
 static bool tlb_inval_initialized(struct xe_tlb_inval *tlb_inval)
 {
 	struct xe_guc *guc = tlb_inval->private;
-- 
2.34.1



* [PATCH v2 09/12] drm/xe: Add xe_tlb_inval_idle helper
  2025-11-04 19:56 [PATCH v2 00/12] Context based TLB invalidations Matthew Brost
                   ` (7 preceding siblings ...)
  2025-11-04 19:56 ` [PATCH v2 08/12] drm/xe: Add send_tlb_inval_ppgtt helper Matthew Brost
@ 2025-11-04 19:56 ` Matthew Brost
  2025-11-10 18:48   ` Summers, Stuart
  2025-11-04 19:56 ` [PATCH v2 10/12] drm/xe: Add exec queue active vfunc Matthew Brost
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Matthew Brost @ 2025-11-04 19:56 UTC (permalink / raw)
  To: intel-xe; +Cc: stuart.summers, lucas.demarchi, matthew.d.roper

Introduce the xe_tlb_inval_idle helper to detect whether any TLB
invalidations are currently in flight. This is used in context-based TLB
invalidations to determine whether dummy TLB invalidations need to be
sent to maintain proper TLB invalidation fence ordering.

v2:
 - Implement xe_tlb_inval_idle based on pending list

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_tlb_inval.c | 21 +++++++++++++++++++++
 drivers/gpu/drm/xe/xe_tlb_inval.h |  2 ++
 2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
index 7ee081b94f90..7ae6b9f884e9 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
@@ -44,11 +44,14 @@ static void xe_tlb_inval_fence_fini(struct xe_tlb_inval_fence *fence)
 static void
 xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence)
 {
+	struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
 	bool stack = test_bit(FENCE_STACK_BIT, &fence->base.flags);
 
 	lockdep_assert_held(&fence->tlb_inval->pending_lock);
 
 	list_del(&fence->link);
+	if (list_empty(&tlb_inval->pending_fences))
+		cancel_delayed_work(&tlb_inval->fence_tdr);
 	trace_xe_tlb_inval_fence_signal(fence->tlb_inval->xe, fence);
 	xe_tlb_inval_fence_fini(fence);
 	dma_fence_signal(&fence->base);
@@ -443,3 +446,21 @@ void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
 		dma_fence_get(&fence->base);
 	fence->tlb_inval = tlb_inval;
 }
+
+/**
+ * xe_tlb_inval_idle() - Check if TLB invalidation is idle
+ * @tlb_inval: TLB invalidation client
+ *
+ * Check the pending fence list to determine if it is idle (i.e., no TLB
+ * invalidations are in flight). Expected to be called in the backend after the
+ * fence has been added to the pending list, and takes this into account.
+ *
+ * Return: True if TLB invalidation client is idle, False otherwise
+ */
+bool xe_tlb_inval_idle(struct xe_tlb_inval *tlb_inval)
+{
+	lockdep_assert_held(&tlb_inval->seqno_lock);
+
+	guard(spinlock_irq)(&tlb_inval->pending_lock);
+	return list_is_singular(&tlb_inval->pending_fences);
+}
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h b/drivers/gpu/drm/xe/xe_tlb_inval.h
index 05614915463a..44a6d9177489 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval.h
+++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
@@ -43,4 +43,6 @@ xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence *fence)
 
 void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno);
 
+bool xe_tlb_inval_idle(struct xe_tlb_inval *tlb_inval);
+
 #endif	/* _XE_TLB_INVAL_ */
-- 
2.34.1



* [PATCH v2 10/12] drm/xe: Add exec queue active vfunc
  2025-11-04 19:56 [PATCH v2 00/12] Context based TLB invalidations Matthew Brost
                   ` (8 preceding siblings ...)
  2025-11-04 19:56 ` [PATCH v2 09/12] drm/xe: Add xe_tlb_inval_idle helper Matthew Brost
@ 2025-11-04 19:56 ` Matthew Brost
  2025-11-04 19:56 ` [PATCH v2 11/12] drm/xe: Add context-based invalidation to GuC TLB invalidation backend Matthew Brost
  2025-11-04 19:56 ` [PATCH v2 12/12] drm/xe: Enable context TLB invalidations for CI Matthew Brost
  11 siblings, 0 replies; 32+ messages in thread
From: Matthew Brost @ 2025-11-04 19:56 UTC (permalink / raw)
  To: intel-xe; +Cc: stuart.summers, lucas.demarchi, matthew.d.roper

If an exec queue is inactive (e.g., not registered or scheduling is
disabled), TLB invalidations are not issued for that queue. Add a
virtual function to determine the active state, which TLB invalidation
logic can hook into.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
 drivers/gpu/drm/xe/xe_execlist.c         | 7 +++++++
 drivers/gpu/drm/xe/xe_guc_submit.c       | 6 ++++++
 3 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index a2281fcb55b1..4ffd55e39e6b 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -228,6 +228,8 @@ struct xe_exec_queue_ops {
 	void (*resume)(struct xe_exec_queue *q);
 	/** @reset_status: check exec queue reset status */
 	bool (*reset_status)(struct xe_exec_queue *q);
+	/** @active: check exec queue is active */
+	bool (*active)(struct xe_exec_queue *q);
 };
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
index 769d05517f93..07574ff6d53f 100644
--- a/drivers/gpu/drm/xe/xe_execlist.c
+++ b/drivers/gpu/drm/xe/xe_execlist.c
@@ -469,6 +469,12 @@ static bool execlist_exec_queue_reset_status(struct xe_exec_queue *q)
 	return false;
 }
 
+static bool execlist_exec_queue_active(struct xe_exec_queue *q)
+{
+	/* NIY */
+	return false;
+}
+
 static const struct xe_exec_queue_ops execlist_exec_queue_ops = {
 	.init = execlist_exec_queue_init,
 	.kill = execlist_exec_queue_kill,
@@ -481,6 +487,7 @@ static const struct xe_exec_queue_ops execlist_exec_queue_ops = {
 	.suspend_wait = execlist_exec_queue_suspend_wait,
 	.resume = execlist_exec_queue_resume,
 	.reset_status = execlist_exec_queue_reset_status,
+	.active = execlist_exec_queue_active,
 };
 
 int xe_execlist_init(struct xe_gt *gt)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index d4ffdb71ef3d..e0126ab12ac2 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1919,6 +1919,11 @@ static bool guc_exec_queue_reset_status(struct xe_exec_queue *q)
 	return exec_queue_reset(q) || exec_queue_killed_or_banned_or_wedged(q);
 }
 
+static bool guc_exec_queue_active(struct xe_exec_queue *q)
+{
+	return exec_queue_enabled(q) && !exec_queue_pending_disable(q);
+}
+
 /*
  * All of these functions are an abstraction layer which other parts of Xe can
  * use to trap into the GuC backend. All of these functions, aside from init,
@@ -1937,6 +1942,7 @@ static const struct xe_exec_queue_ops guc_exec_queue_ops = {
 	.suspend_wait = guc_exec_queue_suspend_wait,
 	.resume = guc_exec_queue_resume,
 	.reset_status = guc_exec_queue_reset_status,
+	.active = guc_exec_queue_active,
 };
 
 static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
-- 
2.34.1



* [PATCH v2 11/12] drm/xe: Add context-based invalidation to GuC TLB invalidation backend
  2025-11-04 19:56 [PATCH v2 00/12] Context based TLB invalidations Matthew Brost
                   ` (9 preceding siblings ...)
  2025-11-04 19:56 ` [PATCH v2 10/12] drm/xe: Add exec queue active vfunc Matthew Brost
@ 2025-11-04 19:56 ` Matthew Brost
  2025-11-06 21:50   ` Summers, Stuart
  2025-12-12 22:30   ` Summers, Stuart
  2025-11-04 19:56 ` [PATCH v2 12/12] drm/xe: Enable context TLB invalidations for CI Matthew Brost
  11 siblings, 2 replies; 32+ messages in thread
From: Matthew Brost @ 2025-11-04 19:56 UTC (permalink / raw)
  To: intel-xe; +Cc: stuart.summers, lucas.demarchi, matthew.d.roper

Introduce context-based invalidation support to the GuC TLB invalidation
backend. This is implemented by iterating over each exec queue per GT
within a VM, skipping inactive queues, and issuing a context-based (GuC
ID) H2G TLB invalidation. All H2G messages, except the final one, are
sent with an invalid seqno, which the G2H handler drops to ensure the
TLB invalidation fence is only signaled once all H2G messages are
completed.

A watermark mechanism is also added to switch between context-based TLB
invalidations and full device-wide invalidations, as the return on
investment for context-based invalidation diminishes when many exec
queues are mapped.
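
As a concrete illustration of the scheme (values taken from this
patch): for a VM with three active exec queues on the target GT, three
XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX H2Gs are sent, the first two
carrying TLB_INVALIDATION_SEQNO_INVALID and the last carrying the real
seqno, so the fence signals only after the final G2H arrives. Once the
per-GT queue count reaches EXEC_QUEUE_COUNT_FULL_THRESHOLD (currently
8), a single XE_GUC_TLB_INVAL_FULL invalidation is sent instead.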

v2:
 - Fix checkpatch warnings

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h  |   2 +
 drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 122 +++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_pci.c           |   1 +
 drivers/gpu/drm/xe/xe_pci_types.h     |   1 +
 4 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 145951dd95c9..ca285f4bce11 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -316,6 +316,8 @@ struct xe_device {
 		u8 has_mem_copy_instr:1;
 		/** @info.has_pxp: Device has PXP support */
 		u8 has_pxp:1;
+		/** @info.has_ctx_tlb_inval: Has context based TLB invalidations */
+		u8 has_ctx_tlb_inval:1;
 		/** @info.has_range_tlb_inval: Has range based TLB invalidations */
 		u8 has_range_tlb_inval:1;
 		/** @info.has_sriov: Supports SR-IOV */
diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
index 6978ee8edf2e..1baaf577cded 100644
--- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
@@ -6,14 +6,17 @@
 #include "abi/guc_actions_abi.h"
 
 #include "xe_device.h"
+#include "xe_exec_queue_types.h"
 #include "xe_gt_stats.h"
 #include "xe_gt_types.h"
 #include "xe_guc.h"
 #include "xe_guc_ct.h"
+#include "xe_guc_exec_queue_types.h"
 #include "xe_guc_tlb_inval.h"
 #include "xe_force_wake.h"
 #include "xe_mmio.h"
 #include "xe_tlb_inval.h"
+#include "xe_vm.h"
 
 #include "regs/xe_guc_regs.h"
 
@@ -136,10 +139,16 @@ static int send_tlb_inval_ppgtt(struct xe_guc *guc, u32 seqno, u64 start,
 {
 #define MAX_TLB_INVALIDATION_LEN	7
 	struct xe_gt *gt = guc_to_gt(guc);
+	struct xe_device *xe = guc_to_xe(guc);
 	u32 action[MAX_TLB_INVALIDATION_LEN];
 	u64 length = end - start;
 	int len = 0;
 
+	xe_gt_assert(gt, (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE &&
+			  !xe->info.has_ctx_tlb_inval) ||
+		     (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX &&
+		      xe->info.has_ctx_tlb_inval));
+
 	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
 	action[len++] = seqno;
 	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
@@ -176,6 +185,100 @@ static int send_tlb_inval_asid_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
 				    XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
 }
 
+static bool queue_mapped_in_guc(struct xe_guc *guc, struct xe_exec_queue *q)
+{
+	return q->gt == guc_to_gt(guc);
+}
+
+static int send_tlb_inval_ctx_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
+				    u64 start, u64 end, u32 asid)
+{
+	struct xe_guc *guc = tlb_inval->private;
+	struct xe_device *xe = guc_to_xe(guc);
+	struct xe_exec_queue *q, *last_q = NULL;
+	struct xe_vm *vm;
+	int err = 0;
+
+	lockdep_assert_held(&tlb_inval->seqno_lock);
+
+	if (xe->info.force_execlist)
+		return -ECANCELED;
+
+	vm = xe_device_asid_to_vm(xe, asid);
+	if (IS_ERR(vm))
+		return PTR_ERR(vm);
+
+	down_read(&vm->exec_queues.lock);
+
+	/*
+	 * XXX: Randomly picking a threshold for now. This will need to be
+	 * tuned based on expected UMD queue counts and performance profiling.
+	 */
+#define EXEC_QUEUE_COUNT_FULL_THRESHOLD	8
+	if (vm->exec_queues.count[guc_to_gt(guc)->info.id] >=
+	    EXEC_QUEUE_COUNT_FULL_THRESHOLD) {
+		u32 action[] = {
+			XE_GUC_ACTION_TLB_INVALIDATION,
+			seqno,
+			MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
+		};
+
+		err = send_tlb_inval(guc, action, ARRAY_SIZE(action));
+		goto err_unlock;
+	}
+#undef EXEC_QUEUE_COUNT_FULL_THRESHOLD
+
+	list_for_each_entry_reverse(q, &vm->exec_queues.list,
+				    vm_exec_queue_link)
+		if (queue_mapped_in_guc(guc, q) && q->ops->active(q)) {
+			last_q = q;
+			break;
+		}
+
+	if (!last_q) {
+		/*
+	 * We can't break fence ordering for TLB invalidation jobs: if
+	 * TLB invalidations are in flight, issue a dummy invalidation to
+	 * maintain ordering. Nor can we safely move the seqno_recv when
+	 * returning -ECANCELED while TLB invalidations are in flight. Use
+	 * a GGTT invalidation as the dummy invalidation, since ASID
+	 * invalidations are unsupported here.
+		 */
+		if (xe_tlb_inval_idle(tlb_inval))
+			err = -ECANCELED;
+		else
+			err = send_tlb_inval_ggtt(tlb_inval, seqno);
+		goto err_unlock;
+	}
+
+	list_for_each_entry(q, &vm->exec_queues.list, vm_exec_queue_link) {
+		int __seqno = last_q == q ? seqno :
+			TLB_INVALIDATION_SEQNO_INVALID;
+		u32 type = XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX;
+
+		/*
+		 * XXX: Technically we can race here and the queue can become
+		 * inactive, not ideal. The TLB invalidation will time out in
+		 * this case unless we get GuC support to convert to NOP...
+		 */
+		if (!queue_mapped_in_guc(guc, q) || !q->ops->active(q))
+			continue;
+
+		xe_assert(xe, q->vm == vm);
+
+		err = send_tlb_inval_ppgtt(guc, __seqno, start, end,
+					   q->guc->id, type);
+		if (err)
+			goto err_unlock;
+	}
+
+err_unlock:
+	up_read(&vm->exec_queues.lock);
+	xe_vm_put(vm);
+
+	return err;
+}
+
 static bool tlb_inval_initialized(struct xe_tlb_inval *tlb_inval)
 {
 	struct xe_guc *guc = tlb_inval->private;
@@ -203,7 +306,7 @@ static long tlb_inval_timeout_delay(struct xe_tlb_inval *tlb_inval)
 	return hw_tlb_timeout + 2 * delay;
 }
 
-static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
+static const struct xe_tlb_inval_ops guc_tlb_inval_asid_ops = {
 	.all = send_tlb_inval_all,
 	.ggtt = send_tlb_inval_ggtt,
 	.ppgtt = send_tlb_inval_asid_ppgtt,
@@ -212,6 +315,15 @@ static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
 	.timeout_delay = tlb_inval_timeout_delay,
 };
 
+static const struct xe_tlb_inval_ops guc_tlb_inval_ctx_ops = {
+	.ggtt = send_tlb_inval_ggtt,
+	.all = send_tlb_inval_all,
+	.ppgtt = send_tlb_inval_ctx_ppgtt,
+	.initialized = tlb_inval_initialized,
+	.flush = tlb_inval_flush,
+	.timeout_delay = tlb_inval_timeout_delay,
+};
+
 /**
  * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation early
  * @guc: GuC object
@@ -223,8 +335,14 @@ static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
 void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
 				 struct xe_tlb_inval *tlb_inval)
 {
+	struct xe_device *xe = guc_to_xe(guc);
+
 	tlb_inval->private = guc;
-	tlb_inval->ops = &guc_tlb_inval_ops;
+
+	if (xe->info.has_ctx_tlb_inval)
+		tlb_inval->ops = &guc_tlb_inval_ctx_ops;
+	else
+		tlb_inval->ops = &guc_tlb_inval_asid_ops;
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 1959de3f7a27..9a11066c7d4a 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -863,6 +863,7 @@ static int xe_info_init(struct xe_device *xe,
 		xe->info.has_device_atomics_on_smem = 1;
 
 	xe->info.has_range_tlb_inval = graphics_desc->has_range_tlb_inval;
+	xe->info.has_ctx_tlb_inval = graphics_desc->has_ctx_tlb_inval;
 	xe->info.has_usm = graphics_desc->has_usm;
 	xe->info.has_64bit_timestamp = graphics_desc->has_64bit_timestamp;
 
diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h
index 9892c063a9c5..c08857c06c7e 100644
--- a/drivers/gpu/drm/xe/xe_pci_types.h
+++ b/drivers/gpu/drm/xe/xe_pci_types.h
@@ -63,6 +63,7 @@ struct xe_graphics_desc {
 	u8 has_atomic_enable_pte_bit:1;
 	u8 has_indirect_ring_state:1;
 	u8 has_range_tlb_inval:1;
+	u8 has_ctx_tlb_inval:1;
 	u8 has_usm:1;
 	u8 has_64bit_timestamp:1;
 };
-- 
2.34.1



* [PATCH v2 12/12] drm/xe: Enable context TLB invalidations for CI
  2025-11-04 19:56 [PATCH v2 00/12] Context based TLB invalidations Matthew Brost
                   ` (10 preceding siblings ...)
  2025-11-04 19:56 ` [PATCH v2 11/12] drm/xe: Add context-based invalidation to GuC TLB invalidation backend Matthew Brost
@ 2025-11-04 19:56 ` Matthew Brost
  11 siblings, 0 replies; 32+ messages in thread
From: Matthew Brost @ 2025-11-04 19:56 UTC (permalink / raw)
  To: intel-xe; +Cc: stuart.summers, lucas.demarchi, matthew.d.roper

Do not review, CI only.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 9a11066c7d4a..7d643233a6fa 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -95,6 +95,7 @@ static const struct xe_graphics_desc graphics_xelpg = {
 	.has_asid = 1, \
 	.has_atomic_enable_pte_bit = 1, \
 	.has_range_tlb_inval = 1, \
+	.has_ctx_tlb_inval = 1, \
 	.has_usm = 1, \
 	.has_64bit_timestamp = 1, \
 	.hw_engine_mask = \
-- 
2.34.1



* Re: [PATCH v2 01/12] drm/xe: Add normalize_invalidation_range
  2025-11-04 19:56 ` [PATCH v2 01/12] drm/xe: Add normalize_invalidation_range Matthew Brost
@ 2025-11-06 20:03   ` Summers, Stuart
  0 siblings, 0 replies; 32+ messages in thread
From: Summers, Stuart @ 2025-11-06 20:03 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Brost,  Matthew
  Cc: Roper, Matthew D, De Marchi, Lucas

On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> Extract the code that determines the alignment of TLB invalidation
> into
> a helper function — normalize_invalidation_range. This will be useful
> when adding context-based invalidations to the GuC TLB invalidation
> backend.
> 
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Stuart Summers <stuart.summers@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 71 +++++++++++++------------
> --
>  1 file changed, 35 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> index a80175c7c478..61bfa0d485f6 100644
> --- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> @@ -92,6 +92,38 @@ static int send_tlb_inval_ggtt(struct xe_tlb_inval
> *tlb_inval, u32 seqno)
>         return -ECANCELED;
>  }
>  
> +static u64 normalize_invalidation_range(struct xe_gt *gt, u64
> *start, u64 *end)
> +{
> +       u64 orig_start = *start;
> +       u64 length = *end - *start;
> +       u64 align;
> +
> +       if (length < SZ_4K)
> +               length = SZ_4K;
> +
> +       align = roundup_pow_of_two(length);
> +       *start = ALIGN_DOWN(*start, align);
> +       *end = ALIGN(*end, align);
> +       length = align;
> +       while (*start + length < *end) {
> +               length <<= 1;
> +               *start = ALIGN_DOWN(orig_start, length);
> +       }
> +
> +       if (length >= SZ_2M) {
> +               length = max_t(u64, SZ_16M, length);
> +               *start = ALIGN_DOWN(orig_start, length);
> +       }
> +
> +       xe_gt_assert(gt, length >= SZ_4K);
> +       xe_gt_assert(gt, is_power_of_2(length));
> +       xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
> +                                           ilog2(SZ_2M) + 1)));
> +       xe_gt_assert(gt, IS_ALIGNED(*start, length));
> +
> +       return length;
> +}
> +
>  /*
>   * Ensure that roundup_pow_of_two(length) doesn't overflow.
>   * Note that roundup_pow_of_two() operates on unsigned long,
> @@ -118,47 +150,14 @@ static int send_tlb_inval_ppgtt(struct
> xe_tlb_inval *tlb_inval, u32 seqno,
>             length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
>                 action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
>         } else {
> -               u64 orig_start = start;
> -               u64 align;
> -
> -               if (length < SZ_4K)
> -                       length = SZ_4K;
> -
> -               /*
> -                * We need to invalidate a higher granularity if
> start address
> -                * is not aligned to length. When start is not
> aligned with
> -                * length we need to find the length large enough to
> create an
> -                * address mask covering the required range.
> -                */
> -               align = roundup_pow_of_two(length);
> -               start = ALIGN_DOWN(start, align);
> -               end = ALIGN(end, align);
> -               length = align;
> -               while (start + length < end) {
> -                       length <<= 1;
> -                       start = ALIGN_DOWN(orig_start, length);
> -               }
> -
> -               /*
> -                * Minimum invalidation size for a 2MB page that the
> hardware
> -                * expects is 16MB
> -                */
> -               if (length >= SZ_2M) {
> -                       length = max_t(u64, SZ_16M, length);
> -                       start = ALIGN_DOWN(orig_start, length);
> -               }
> -
> -               xe_gt_assert(gt, length >= SZ_4K);
> -               xe_gt_assert(gt, is_power_of_2(length));
> -               xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) -
> 1,
> -                                                   ilog2(SZ_2M) +
> 1)));
> -               xe_gt_assert(gt, IS_ALIGNED(start, length));
> +               u64 normalize_len = normalize_invalidation_range(gt,
> &start,
> +                                                               
> &end);
>  
>                 action[len++] =
> MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
>                 action[len++] = asid;
>                 action[len++] = lower_32_bits(start);
>                 action[len++] = upper_32_bits(start);
> -               action[len++] = ilog2(length) - ilog2(SZ_4K);
> +               action[len++] = ilog2(normalize_len) - ilog2(SZ_4K);
>         }
>  
>         xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);



* Re: [PATCH v2 07/12] drm/xe: Rename send_tlb_inval_ppgtt to send_tlb_inval_asid_ppgtt
  2025-11-04 19:56 ` [PATCH v2 07/12] drm/xe: Rename send_tlb_inval_ppgtt to send_tlb_inval_asid_ppgtt Matthew Brost
@ 2025-11-06 20:22   ` Summers, Stuart
  0 siblings, 0 replies; 32+ messages in thread
From: Summers, Stuart @ 2025-11-06 20:22 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Brost,  Matthew
  Cc: Roper, Matthew D, De Marchi, Lucas

On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> Context-based TLB invalidations have their own set of GuC TLB
> invalidation operations. Rename the current PPGTT invalidation
> function,
> which operates on ASIDs, to a more descriptive name that reflects its
> purpose.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Stuart Summers <stuart.summers@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> index 995789f0d31f..42e9fbd062ba 100644
> --- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> @@ -131,8 +131,8 @@ static u64 normalize_invalidation_range(struct
> xe_gt *gt, u64 *start, u64 *end)
>   */
>  #define MAX_RANGE_TLB_INVALIDATION_LENGTH
> (rounddown_pow_of_two(ULONG_MAX))
>  
> -static int send_tlb_inval_ppgtt(struct xe_tlb_inval *tlb_inval, u32
> seqno,
> -                               u64 start, u64 end, u32 asid)
> +static int send_tlb_inval_asid_ppgtt(struct xe_tlb_inval *tlb_inval,
> u32 seqno,
> +                                    u64 start, u64 end, u32 asid)
>  {
>  #define MAX_TLB_INVALIDATION_LEN       7
>         struct xe_guc *guc = tlb_inval->private;
> @@ -195,7 +195,7 @@ static long tlb_inval_timeout_delay(struct
> xe_tlb_inval *tlb_inval)
>  static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
>         .all = send_tlb_inval_all,
>         .ggtt = send_tlb_inval_ggtt,
> -       .ppgtt = send_tlb_inval_ppgtt,
> +       .ppgtt = send_tlb_inval_asid_ppgtt,
>         .initialized = tlb_inval_initialized,
>         .flush = tlb_inval_flush,
>         .timeout_delay = tlb_inval_timeout_delay,



* Re: [PATCH v2 08/12] drm/xe: Add send_tlb_inval_ppgtt helper
  2025-11-04 19:56 ` [PATCH v2 08/12] drm/xe: Add send_tlb_inval_ppgtt helper Matthew Brost
@ 2025-11-06 20:25   ` Summers, Stuart
  0 siblings, 0 replies; 32+ messages in thread
From: Summers, Stuart @ 2025-11-06 20:25 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Brost,  Matthew
  Cc: Roper, Matthew D, De Marchi, Lucas

On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> Extract the common code that issues a TLB invalidation H2G for PPGTTs
> into a helper function. This helper can be reused for both ASID-based
> and context-based TLB invalidations.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Stuart Summers <stuart.summers@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 27 +++++++++++++++++++------
> --
>  1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> index 42e9fbd062ba..6978ee8edf2e 100644
> --- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> @@ -131,19 +131,15 @@ static u64 normalize_invalidation_range(struct
> xe_gt *gt, u64 *start, u64 *end)
>   */
>  #define MAX_RANGE_TLB_INVALIDATION_LENGTH
> (rounddown_pow_of_two(ULONG_MAX))
>  
> -static int send_tlb_inval_asid_ppgtt(struct xe_tlb_inval *tlb_inval,
> u32 seqno,
> -                                    u64 start, u64 end, u32 asid)
> +static int send_tlb_inval_ppgtt(struct xe_guc *guc, u32 seqno, u64
> start,
> +                               u64 end, u32 id, u32 type)
>  {
>  #define MAX_TLB_INVALIDATION_LEN       7
> -       struct xe_guc *guc = tlb_inval->private;
>         struct xe_gt *gt = guc_to_gt(guc);
>         u32 action[MAX_TLB_INVALIDATION_LEN];
>         u64 length = end - start;
>         int len = 0;
>  
> -       if (guc_to_xe(guc)->info.force_execlist)
> -               return -ECANCELED;
> -
>         action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
>         action[len++] = seqno;
>         if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> @@ -153,18 +149,33 @@ static int send_tlb_inval_asid_ppgtt(struct
> xe_tlb_inval *tlb_inval, u32 seqno,
>                 u64 normalize_len = normalize_invalidation_range(gt,
> &start,
>                                                                 
> &end);
>  
> -               action[len++] =
> MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> -               action[len++] = asid;
> +               action[len++] = MAKE_INVAL_OP(type);
> +               action[len++] = id;
>                 action[len++] = lower_32_bits(start);
>                 action[len++] = upper_32_bits(start);
>                 action[len++] = ilog2(normalize_len) - ilog2(SZ_4K);
>         }
>  
>         xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> +#undef MAX_TLB_INVALIDATION_LEN
>  
>         return send_tlb_inval(guc, action, len);
>  }
>  
> +static int send_tlb_inval_asid_ppgtt(struct xe_tlb_inval *tlb_inval,
> u32 seqno,
> +                                    u64 start, u64 end, u32 asid)
> +{
> +       struct xe_guc *guc = tlb_inval->private;
> +
> +       lockdep_assert_held(&tlb_inval->seqno_lock);
> +
> +       if (guc_to_xe(guc)->info.force_execlist)
> +               return -ECANCELED;
> +
> +       return send_tlb_inval_ppgtt(guc, seqno, start, end, asid,
> +                                   XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> +}
> +
>  static bool tlb_inval_initialized(struct xe_tlb_inval *tlb_inval)
>  {
>         struct xe_guc *guc = tlb_inval->private;



* Re: [PATCH v2 11/12] drm/xe: Add context-based invalidation to GuC TLB invalidation backend
  2025-11-04 19:56 ` [PATCH v2 11/12] drm/xe: Add context-based invalidation to GuC TLB invalidation backend Matthew Brost
@ 2025-11-06 21:50   ` Summers, Stuart
  2025-11-07  7:01     ` Matthew Brost
  2025-12-12 22:30   ` Summers, Stuart
  1 sibling, 1 reply; 32+ messages in thread
From: Summers, Stuart @ 2025-11-06 21:50 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Brost,  Matthew
  Cc: Roper, Matthew D, De Marchi, Lucas

On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> Introduce context-based invalidation support to the GuC TLB
> invalidation
> backend. This is implemented by iterating over each exec queue per GT
> within a VM, skipping inactive queues, and issuing a context-based
> (GuC
> ID) H2G TLB invalidation. All H2G messages, except the final one, are
> sent with an invalid seqno, which the G2H handler drops to ensure the
> TLB invalidation fence is only signaled once all H2G messages are
> completed.
> 
> A watermark mechanism is also added to switch between context-based
> TLB
> invalidations and full device-wide invalidations, as the return on
> investment for context-based invalidation diminishes when many exec
> queues are mapped.
> 
> v2:
>  - Fix checkpatch warnings
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device_types.h  |   2 +
>  drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 122
> +++++++++++++++++++++++++-
>  drivers/gpu/drm/xe/xe_pci.c           |   1 +
>  drivers/gpu/drm/xe/xe_pci_types.h     |   1 +
>  4 files changed, 124 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> b/drivers/gpu/drm/xe/xe_device_types.h
> index 145951dd95c9..ca285f4bce11 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -316,6 +316,8 @@ struct xe_device {
>                 u8 has_mem_copy_instr:1;
>                 /** @info.has_pxp: Device has PXP support */
>                 u8 has_pxp:1;
> +               /** @info.has_ctx_tlb_inval: Has context based TLB
> invalidations */
> +               u8 has_ctx_tlb_inval:1;
>                 /** @info.has_range_tlb_inval: Has range based TLB
> invalidations */
>                 u8 has_range_tlb_inval:1;
>                 /** @info.has_sriov: Supports SR-IOV */
> diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> index 6978ee8edf2e..1baaf577cded 100644
> --- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> @@ -6,14 +6,17 @@
>  #include "abi/guc_actions_abi.h"
>  
>  #include "xe_device.h"
> +#include "xe_exec_queue_types.h"
>  #include "xe_gt_stats.h"
>  #include "xe_gt_types.h"
>  #include "xe_guc.h"
>  #include "xe_guc_ct.h"
> +#include "xe_guc_exec_queue_types.h"
>  #include "xe_guc_tlb_inval.h"
>  #include "xe_force_wake.h"
>  #include "xe_mmio.h"
>  #include "xe_tlb_inval.h"
> +#include "xe_vm.h"
>  
>  #include "regs/xe_guc_regs.h"
>  
> @@ -136,10 +139,16 @@ static int send_tlb_inval_ppgtt(struct xe_guc
> *guc, u32 seqno, u64 start,
>  {
>  #define MAX_TLB_INVALIDATION_LEN       7
>         struct xe_gt *gt = guc_to_gt(guc);
> +       struct xe_device *xe = guc_to_xe(guc);
>         u32 action[MAX_TLB_INVALIDATION_LEN];
>         u64 length = end - start;
>         int len = 0;
>  
> +       xe_gt_assert(gt, (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE &&
> +                         !xe->info.has_ctx_tlb_inval) ||
> +                    (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX &&
> +                     xe->info.has_ctx_tlb_inval));
> +
>         action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
>         action[len++] = seqno;
>         if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> @@ -176,6 +185,100 @@ static int send_tlb_inval_asid_ppgtt(struct
> xe_tlb_inval *tlb_inval, u32 seqno,
>                                     XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
>  }
>  
> +static bool queue_mapped_in_guc(struct xe_guc *guc, struct
> xe_exec_queue *q)
> +{
> +       return q->gt == guc_to_gt(guc);
> +}
> +
> +static int send_tlb_inval_ctx_ppgtt(struct xe_tlb_inval *tlb_inval,
> u32 seqno,
> +                                   u64 start, u64 end, u32 asid)
> +{
> +       struct xe_guc *guc = tlb_inval->private;
> +       struct xe_device *xe = guc_to_xe(guc);
> +       struct xe_exec_queue *q, *last_q = NULL;
> +       struct xe_vm *vm;
> +       int err = 0;
> +
> +       lockdep_assert_held(&tlb_inval->seqno_lock);
> +
> +       if (xe->info.force_execlist)
> +               return -ECANCELED;
> +
> +       vm = xe_device_asid_to_vm(xe, asid);
> +       if (IS_ERR(vm))
> +               return PTR_ERR(vm);
> +
> +       down_read(&vm->exec_queues.lock);
> +
> +       /*
> +        * XXX: Randomly picking a threshold for now. This will need
> to be
> +        * tuned based on expected UMD queue counts and performance
> profiling.
> +        */
> +#define EXEC_QUEUE_COUNT_FULL_THRESHOLD        8

Does seem interesting for this to be configurable. Maybe different
applications have different requirements here... But also we should
have some kind of default. No issue with what you 

> +       if (vm->exec_queues.count[guc_to_gt(guc)->info.id] >=
> +           EXEC_QUEUE_COUNT_FULL_THRESHOLD) {
> +               u32 action[] = {
> +                       XE_GUC_ACTION_TLB_INVALIDATION,
> +                       seqno,
> +                       MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> +               };
> +
> +               err = send_tlb_inval(guc, action,
> ARRAY_SIZE(action));
> +               goto err_unlock;
> +       }
> +#undef EXEC_QUEUE_COUNT_FULL_THRESHOLD
> +
> +       list_for_each_entry_reverse(q, &vm->exec_queues.list,
> +                                   vm_exec_queue_link)

Braces here around the if() {}. Otherwise it's too easy to accidentally
add lines below the if condition with unintended behavior.
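
Something like this, just a sketch of the brace placement (same logic as
the patch):

        list_for_each_entry_reverse(q, &vm->exec_queues.list,
                                    vm_exec_queue_link) {
                if (queue_mapped_in_guc(guc, q) && q->ops->active(q)) {
                        last_q = q;
                        break;
                }
        }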

> +               if (queue_mapped_in_guc(guc, q) && q->ops->active(q))
> {
> +                       last_q = q;
> +                       break;
> +               }
> +
> +       if (!last_q) {
> +               /*
> +                * We can't break fence ordering for TLB invalidation
> jobs, if
> +                * TLB invalidations are inflight issue a dummy
> invalidation to
> +                * maintain ordering. Nor can we move safely the
> seqno_recv when
> +                * returning -ECANCELED if TLB invalidations are in
> flight. Use
> +                * GGTT invalidation as dummy invalidation given ASID
> +                * invalidations are unsupported here.
> +                */
> +               if (xe_tlb_inval_idle(tlb_inval))
> +                       err = -ECANCELED;
> +               else
> +                       err = send_tlb_inval_ggtt(tlb_inval, seqno);
> +               goto err_unlock;
> +       }
> +
> +       list_for_each_entry(q, &vm->exec_queues.list,
> vm_exec_queue_link) {
> +               int __seqno = last_q == q ? seqno :
> +                       TLB_INVALIDATION_SEQNO_INVALID;
> +               u32 type = XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX;
> +
> +               /*
> +                * XXX: Technically we can race here and queue can become
> +                * inactive, not ideal. The TLB invalidation will time out
> +                * in this case unless we get GuC support to convert to
> +                * NOP...

We actually can't support this. It won't just time out, GuC will return
an error (invalid parameter that the context isn't registered and we're
trying to submit something via context invalidation - nothing actually
gets sent to hardware) and we will issue a GT reset. We should make
sure when we send this, the context is registered and any
deregistration explicitly happens afterwards.

This is why generally we want to do this add/remove not at the queue
level but at the guc registration/deregistration level to guarantee
this. But I see you split the queue add and the submission based on
this active flag. Right now I believe we're tracking "pending
deregistration" with exec_queue_destroyed && exec_queue_registered,
although it might be nice to have an explicit state there.

Alternatively we could also capture that error response from GuC,
although there are a few different places we can get an error there and
we might not want to overload the CT error handler in the event there
is a legitimate state or hardware error that requires a guc reload/GT
reset.

Thanks,
Stuart

> +                */
> +               if (!queue_mapped_in_guc(guc, q) || !q->ops-
> >active(q))
> +                       continue;
> +
> +               xe_assert(xe, q->vm == vm);
> +
> +               err = send_tlb_inval_ppgtt(guc, __seqno, start, end,
> +                                          q->guc->id, type);
> +               if (err)
> +                       goto err_unlock;
> +       }
> +
> +err_unlock:
> +       up_read(&vm->exec_queues.lock);
> +       xe_vm_put(vm);
> +
> +       return err;
> +}
> +
>  static bool tlb_inval_initialized(struct xe_tlb_inval *tlb_inval)
>  {
>         struct xe_guc *guc = tlb_inval->private;
> @@ -203,7 +306,7 @@ static long tlb_inval_timeout_delay(struct
> xe_tlb_inval *tlb_inval)
>         return hw_tlb_timeout + 2 * delay;
>  }
>  
> -static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> +static const struct xe_tlb_inval_ops guc_tlb_inval_asid_ops = {
>         .all = send_tlb_inval_all,
>         .ggtt = send_tlb_inval_ggtt,
>         .ppgtt = send_tlb_inval_asid_ppgtt,
> @@ -212,6 +315,15 @@ static const struct xe_tlb_inval_ops
> guc_tlb_inval_ops = {
>         .timeout_delay = tlb_inval_timeout_delay,
>  };
>  
> +static const struct xe_tlb_inval_ops guc_tlb_inval_ctx_ops = {
> +       .ggtt = send_tlb_inval_ggtt,
> +       .all = send_tlb_inval_all,
> +       .ppgtt = send_tlb_inval_ctx_ppgtt,
> +       .initialized = tlb_inval_initialized,
> +       .flush = tlb_inval_flush,
> +       .timeout_delay = tlb_inval_timeout_delay,
> +};
> +
>  /**
>   * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation early
>   * @guc: GuC object
> @@ -223,8 +335,14 @@ static const struct xe_tlb_inval_ops
> guc_tlb_inval_ops = {
>  void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
>                                  struct xe_tlb_inval *tlb_inval)
>  {
> +       struct xe_device *xe = guc_to_xe(guc);
> +
>         tlb_inval->private = guc;
> -       tlb_inval->ops = &guc_tlb_inval_ops;
> +
> +       if (xe->info.has_ctx_tlb_inval)
> +               tlb_inval->ops = &guc_tlb_inval_ctx_ops;
> +       else
> +               tlb_inval->ops = &guc_tlb_inval_asid_ops;
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/xe/xe_pci.c
> b/drivers/gpu/drm/xe/xe_pci.c
> index 1959de3f7a27..9a11066c7d4a 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -863,6 +863,7 @@ static int xe_info_init(struct xe_device *xe,
>                 xe->info.has_device_atomics_on_smem = 1;
>  
>         xe->info.has_range_tlb_inval = graphics_desc-
> >has_range_tlb_inval;
> +       xe->info.has_ctx_tlb_inval = graphics_desc-
> >has_ctx_tlb_inval;
>         xe->info.has_usm = graphics_desc->has_usm;
>         xe->info.has_64bit_timestamp = graphics_desc-
> >has_64bit_timestamp;
>  
> diff --git a/drivers/gpu/drm/xe/xe_pci_types.h
> b/drivers/gpu/drm/xe/xe_pci_types.h
> index 9892c063a9c5..c08857c06c7e 100644
> --- a/drivers/gpu/drm/xe/xe_pci_types.h
> +++ b/drivers/gpu/drm/xe/xe_pci_types.h
> @@ -63,6 +63,7 @@ struct xe_graphics_desc {
>         u8 has_atomic_enable_pte_bit:1;
>         u8 has_indirect_ring_state:1;
>         u8 has_range_tlb_inval:1;
> +       u8 has_ctx_tlb_inval:1;
>         u8 has_usm:1;
>         u8 has_64bit_timestamp:1;
>  };


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 06/12] drm/xe: Do not forward invalid TLB invalidation seqnos to upper layers
  2025-11-04 19:56 ` [PATCH v2 06/12] drm/xe: Do not forward invalid TLB invalidation seqnos to upper layers Matthew Brost
@ 2025-11-06 22:05   ` Summers, Stuart
  0 siblings, 0 replies; 32+ messages in thread
From: Summers, Stuart @ 2025-11-06 22:05 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Brost,  Matthew
  Cc: Roper, Matthew D, De Marchi, Lucas

On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> Context-based TLB invalidations send multiple H2G messages per seqno,
> with only the final H2G containing a valid seqno — the others carry
> an
> invalid seqno. The G2H handler drops these invalid seqnos to avoid
> prematurely signaling a TLB invalidation fence.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_guc_tlb_inval.c   | 3 ++-
>  drivers/gpu/drm/xe/xe_tlb_inval_types.h | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> index 61bfa0d485f6..995789f0d31f 100644
> --- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> @@ -235,7 +235,8 @@ int xe_guc_tlb_inval_done_handler(struct xe_guc
> *guc, u32 *msg, u32 len)
>         if (unlikely(len != 1))
>                 return -EPROTO;
>  
> -       xe_tlb_inval_done_handler(&gt->tlb_inval, msg[0]);
> +       if (msg[0] != TLB_INVALIDATION_SEQNO_INVALID)
> +               xe_tlb_inval_done_handler(&gt->tlb_inval, msg[0]);

Honestly I feel like this should be included as part of the patch that
actually uses it. No problem with the actual contents though.

Thanks,
Stuart

>  
>         return 0;
>  }
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> index 8f8b060e9005..7a6967ce3b76 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> @@ -80,6 +80,7 @@ struct xe_tlb_inval {
>         const struct xe_tlb_inval_ops *ops;
>         /** @tlb_inval.seqno: TLB invalidation seqno, protected by CT
> lock */
>  #define TLB_INVALIDATION_SEQNO_MAX     0x100000
> +#define TLB_INVALIDATION_SEQNO_INVALID TLB_INVALIDATION_SEQNO_MAX
>         int seqno;
>         /** @tlb_invalidation.seqno_lock: protects
> @tlb_invalidation.seqno */
>         struct mutex seqno_lock;


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/12] drm/xe: Make usm.asid_to_vm allocation use GFP_NOWAIT
  2025-11-04 19:56 ` [PATCH v2 02/12] drm/xe: Make usm.asid_to_vm allocation use GFP_NOWAIT Matthew Brost
@ 2025-11-06 22:08   ` Summers, Stuart
  2025-11-06 22:13   ` Summers, Stuart
  1 sibling, 0 replies; 32+ messages in thread
From: Summers, Stuart @ 2025-11-06 22:08 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Brost,  Matthew
  Cc: Roper, Matthew D, De Marchi, Lucas

On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> Ensure the asid_to_vm lookup is reclaim-safe so it can be performed
> during TLB invalidations, which is necessary for context-based TLB
> invalidation support.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Stuart Summers <stuart.summers@intel.com>

Although best to merge this with the full series of course...

> ---
>  drivers/gpu/drm/xe/xe_vm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 00f3520dec38..84f4c8f1be33 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1632,7 +1632,7 @@ struct xe_vm *xe_vm_create(struct xe_device
> *xe, u32 flags, struct xe_file *xef)
>                 down_write(&xe->usm.lock);
>                 err = xa_alloc_cyclic(&xe->usm.asid_to_vm, &asid, vm,
>                                       XA_LIMIT(1, XE_MAX_ASID - 1),
> -                                     &xe->usm.next_asid,
> GFP_KERNEL);
> +                                     &xe->usm.next_asid,
> GFP_NOWAIT);
>                 up_write(&xe->usm.lock);
>                 if (err < 0)
>                         goto err_close;


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/12] drm/xe: Make usm.asid_to_vm allocation use GFP_NOWAIT
  2025-11-04 19:56 ` [PATCH v2 02/12] drm/xe: Make usm.asid_to_vm allocation use GFP_NOWAIT Matthew Brost
  2025-11-06 22:08   ` Summers, Stuart
@ 2025-11-06 22:13   ` Summers, Stuart
  1 sibling, 0 replies; 32+ messages in thread
From: Summers, Stuart @ 2025-11-06 22:13 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Brost,  Matthew
  Cc: Roper, Matthew D, De Marchi, Lucas

On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> Ensure the asid_to_vm lookup is reclaim-safe so it can be performed
> during TLB invalidations, which is necessary for context-based TLB
> invalidation support.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_vm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 00f3520dec38..84f4c8f1be33 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1632,7 +1632,7 @@ struct xe_vm *xe_vm_create(struct xe_device
> *xe, u32 flags, struct xe_file *xef)
>                 down_write(&xe->usm.lock);
>                 err = xa_alloc_cyclic(&xe->usm.asid_to_vm, &asid, vm,
>                                       XA_LIMIT(1, XE_MAX_ASID - 1),
> -                                     &xe->usm.next_asid,
> GFP_KERNEL);
> +                                     &xe->usm.next_asid,
> GFP_NOWAIT);

Looks like there's also a for-ci usage of this in xe_device.c that
attempts to force an asid_to_vm wrap. Should we make the same change
there?

Thanks,
Stuart

>                 up_write(&xe->usm.lock);
>                 if (err < 0)
>                         goto err_close;


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 04/12] drm/xe: Add vm to exec queues association
  2025-11-04 19:56 ` [PATCH v2 04/12] drm/xe: Add vm to exec queues association Matthew Brost
@ 2025-11-06 22:15   ` Summers, Stuart
  2025-12-12 21:03   ` Summers, Stuart
  1 sibling, 0 replies; 32+ messages in thread
From: Summers, Stuart @ 2025-11-06 22:15 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Brost,  Matthew
  Cc: Roper, Matthew D, De Marchi, Lucas

On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> Maintain a list of exec queues per VM which will be used by the TLB
> invalidation code to do context-ID based TLB invalidations.

I had mentioned this in a subsequent patch, but just to confirm here:
if we're going to do this in the exec queue layer, we need to make sure
the actual usage of this ensures registration/deregistration
compliance. The series as it stands today does not.

Thanks,
Stuart

> 
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device.h           |  7 ----
>  drivers/gpu/drm/xe/xe_device_types.h     |  7 ++++
>  drivers/gpu/drm/xe/xe_exec_queue.c       |  7 +++-
>  drivers/gpu/drm/xe/xe_exec_queue_types.h |  3 ++
>  drivers/gpu/drm/xe/xe_vm.c               | 47
> ++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_vm.h               |  3 ++
>  drivers/gpu/drm/xe/xe_vm_types.h         | 13 +++++++
>  7 files changed, 79 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.h
> b/drivers/gpu/drm/xe/xe_device.h
> index 538202eebc16..764f24f4adfc 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -62,13 +62,6 @@ static inline struct xe_tile
> *xe_device_get_root_tile(struct xe_device *xe)
>         return &xe->tiles[0];
>  }
>  
> -/*
> - * Highest GT/tile count for any platform.  Used only for memory
> allocation
> - * sizing.  Any logic looping over GTs or mapping userspace GT IDs
> into GT
> - * structures should use the per-platform xe->info.max_gt_per_tile
> instead.
> - */
> -#define XE_MAX_GT_PER_TILE 2
> -
>  static inline struct xe_gt *xe_device_get_gt(struct xe_device *xe,
> u8 gt_id)
>  {
>         struct xe_tile *tile;
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> b/drivers/gpu/drm/xe/xe_device_types.h
> index af0ce275b032..145951dd95c9 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -57,6 +57,13 @@ struct xe_vram_region;
>  #define XE_GT1         1
>  #define XE_MAX_TILES_PER_DEVICE        (XE_GT1 + 1)
>  
> +/*
> + * Highest GT/tile count for any platform.  Used only for memory
> allocation
> + * sizing.  Any logic looping over GTs or mapping userspace GT IDs
> into GT
> + * structures should use the per-platform xe->info.max_gt_per_tile
> instead.
> + */
> +#define XE_MAX_GT_PER_TILE 2
> +
>  #define XE_MAX_ASID    (BIT(20))
>  
>  #define IS_PLATFORM_STEP(_xe, _platform, min_step, max_step)   \
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 1b57d7c2cc94..49822baf5967 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -72,8 +72,10 @@ static void __xe_exec_queue_free(struct
> xe_exec_queue *q)
>  
>         if (xe_exec_queue_uses_pxp(q))
>                 xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
> -       if (q->vm)
> +       if (q->vm) {
> +               xe_vm_remove_exec_queue(q->vm, q);
>                 xe_vm_put(q->vm);
> +       }
>  
>         if (q->xef)
>                 xe_file_put(q->xef);
> @@ -143,6 +145,7 @@ static struct xe_exec_queue
> *__xe_exec_queue_alloc(struct xe_device *xe,
>         q->ring_ops = gt->ring_ops[hwe->class];
>         q->ops = gt->exec_queue_ops;
>         INIT_LIST_HEAD(&q->lr.link);
> +       INIT_LIST_HEAD(&q->vm_exec_queue_link);
>         INIT_LIST_HEAD(&q->multi_gt_link);
>         INIT_LIST_HEAD(&q->hw_engine_group_link);
>         INIT_LIST_HEAD(&q->pxp.link);
> @@ -796,6 +799,8 @@ int xe_exec_queue_create_ioctl(struct drm_device
> *dev, void *data,
>         }
>  
>         q->xef = xe_file_get(xef);
> +       if (eci[0].engine_class != DRM_XE_ENGINE_CLASS_VM_BIND)
> +               xe_vm_add_exec_queue(vm, q);
>  
>         /* user id alloc must always be last in ioctl to prevent UAF
> */
>         err = xa_alloc(&xef->exec_queue.xa, &id, q, xa_limit_32b,
> GFP_KERNEL);
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index c8807268ec6c..a2281fcb55b1 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -147,6 +147,9 @@ struct xe_exec_queue {
>                 struct xe_dep_scheduler *dep_scheduler;
>         } tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
>  
> +       /** @vm_exec_queue_link: Link to track exec queue within a
> VM's list of exec queues. */
> +       struct list_head vm_exec_queue_link;
> +
>         /** @pxp: PXP info tracking */
>         struct {
>                 /** @pxp.type: PXP session type used by this queue */
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 84f4c8f1be33..cccdd931dd5e 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1507,8 +1507,20 @@ struct xe_vm *xe_vm_create(struct xe_device
> *xe, u32 flags, struct xe_file *xef)
>         INIT_WORK(&vm->destroy_work, vm_destroy_work_func);
>  
>         INIT_LIST_HEAD(&vm->preempt.exec_queues);
> +       INIT_LIST_HEAD(&vm->exec_queues.list);
>         vm->preempt.min_run_period_ms = 10;     /* FIXME: Wire up to
> uAPI */
>  
> +       init_rwsem(&vm->exec_queues.lock);
> +       if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
> +               fs_reclaim_acquire(GFP_KERNEL);
> +               might_lock(&vm->exec_queues.lock);
> +               fs_reclaim_release(GFP_KERNEL);
> +
> +               down_read(&vm->exec_queues.lock);
> +               might_lock(&xe_root_mmio_gt(xe)->uc.guc.ct.lock);
> +               up_read(&vm->exec_queues.lock);
> +       }
> +
>         for_each_tile(tile, xe, id)
>                 xe_range_fence_tree_init(&vm->rftree[id]);
>  
> @@ -4387,3 +4399,38 @@ int xe_vm_alloc_cpu_addr_mirror_vma(struct
> xe_vm *vm, uint64_t start, uint64_t r
>  
>         return xe_vm_alloc_vma(vm, &map_req, false);
>  }
> +
> +/**
> + * xe_vm_add_exec_queue() - Add exec queue to VM
> + * @vm: The VM.
> + * @q: The exec_queue
> + */
> +void xe_vm_add_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
> +{
> +       /* User VMs and queues only */
> +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_KERNEL));
> +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_PERMANENT));
> +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_VM));
> +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_MIGRATE));
> +       xe_assert(vm->xe, vm->xef);
> +
> +       down_write(&vm->exec_queues.lock);
> +       list_add(&q->vm_exec_queue_link, &vm->exec_queues.list);
> +       ++vm->exec_queues.count[q->gt->info.id];
> +       up_write(&vm->exec_queues.lock);
> +}
> +
> +/**
> + * xe_vm_remove_exec_queue() - Remove exec queue from VM
> + * @vm: The VM.
> + * @q: The exec_queue
> + */
> +void xe_vm_remove_exec_queue(struct xe_vm *vm, struct xe_exec_queue
> *q)
> +{
> +       down_write(&vm->exec_queues.lock);
> +       if (!list_empty(&q->vm_exec_queue_link)) {
> +               list_del(&q->vm_exec_queue_link);
> +               --vm->exec_queues.count[q->gt->info.id];
> +       }
> +       up_write(&vm->exec_queues.lock);
> +}
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index ef8a5019574e..5f3341ef99d2 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -284,6 +284,9 @@ static inline struct dma_resv *xe_vm_resv(struct
> xe_vm *vm)
>  
>  void xe_vm_kill(struct xe_vm *vm, bool unlocked);
>  
> +void xe_vm_add_exec_queue(struct xe_vm *vm, struct xe_exec_queue
> *q);
> +void xe_vm_remove_exec_queue(struct xe_vm *vm, struct xe_exec_queue
> *q);
> +
>  /**
>   * xe_vm_assert_held(vm) - Assert that the vm's reservation object
> is held.
>   * @vm: The vm
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> b/drivers/gpu/drm/xe/xe_vm_types.h
> index 830ed7b05c27..180b48d62480 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -290,6 +290,19 @@ struct xe_vm {
>                 struct list_head pm_activate_link;
>         } preempt;
>  
> +       /** @exec_queues: Manages list of exec queues attached to
> this VM, protected by lock. */
> +       struct {
> +               /** @exec_queues.list: list of exec queues attached
> to this VM */
> +               struct list_head list;
> +               /**
> +                * @exec_queues.count: count of exec queues attached
> to this VM,
> +                * per GT
> +                */
> +               int count[XE_MAX_TILES_PER_DEVICE *
> XE_MAX_GT_PER_TILE];
> +               /** @exec_queues.lock: lock to protect exec_queues
> list */
> +               struct rw_semaphore lock;
> +       } exec_queues;
> +
>         /** @um: unified memory state */
>         struct {
>                 /** @asid: address space ID, unique to each VM */


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 11/12] drm/xe: Add context-based invalidation to GuC TLB invalidation backend
  2025-11-06 21:50   ` Summers, Stuart
@ 2025-11-07  7:01     ` Matthew Brost
  2025-11-10 19:29       ` Summers, Stuart
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Brost @ 2025-11-07  7:01 UTC (permalink / raw)
  To: Summers, Stuart
  Cc: intel-xe@lists.freedesktop.org, Roper,  Matthew D,
	De Marchi, Lucas

On Thu, Nov 06, 2025 at 02:50:45PM -0700, Summers, Stuart wrote:
> On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> > Introduce context-based invalidation support to the GuC TLB
> > invalidation
> > backend. This is implemented by iterating over each exec queue per GT
> > within a VM, skipping inactive queues, and issuing a context-based
> > (GuC
> > ID) H2G TLB invalidation. All H2G messages, except the final one, are
> > sent with an invalid seqno, which the G2H handler drops to ensure the
> > TLB invalidation fence is only signaled once all H2G messages are
> > completed.
> > 
> > A watermark mechanism is also added to switch between context-based
> > TLB
> > invalidations and full device-wide invalidations, as the return on
> > investment for context-based invalidation diminishes when many exec
> > queues are mapped.
> > 
> > v2:
> >  - Fix checkpatch warnings
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_device_types.h  |   2 +
> >  drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 122
> > +++++++++++++++++++++++++-
> >  drivers/gpu/drm/xe/xe_pci.c           |   1 +
> >  drivers/gpu/drm/xe/xe_pci_types.h     |   1 +
> >  4 files changed, 124 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> > b/drivers/gpu/drm/xe/xe_device_types.h
> > index 145951dd95c9..ca285f4bce11 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -316,6 +316,8 @@ struct xe_device {
> >                 u8 has_mem_copy_instr:1;
> >                 /** @info.has_pxp: Device has PXP support */
> >                 u8 has_pxp:1;
> > +               /** @info.has_ctx_tlb_inval: Has context based TLB
> > invalidations */
> > +               u8 has_ctx_tlb_inval:1;
> >                 /** @info.has_range_tlb_inval: Has range based TLB
> > invalidations */
> >                 u8 has_range_tlb_inval:1;
> >                 /** @info.has_sriov: Supports SR-IOV */
> > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > index 6978ee8edf2e..1baaf577cded 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > @@ -6,14 +6,17 @@
> >  #include "abi/guc_actions_abi.h"
> >  
> >  #include "xe_device.h"
> > +#include "xe_exec_queue_types.h"
> >  #include "xe_gt_stats.h"
> >  #include "xe_gt_types.h"
> >  #include "xe_guc.h"
> >  #include "xe_guc_ct.h"
> > +#include "xe_guc_exec_queue_types.h"
> >  #include "xe_guc_tlb_inval.h"
> >  #include "xe_force_wake.h"
> >  #include "xe_mmio.h"
> >  #include "xe_tlb_inval.h"
> > +#include "xe_vm.h"
> >  
> >  #include "regs/xe_guc_regs.h"
> >  
> > @@ -136,10 +139,16 @@ static int send_tlb_inval_ppgtt(struct xe_guc
> > *guc, u32 seqno, u64 start,
> >  {
> >  #define MAX_TLB_INVALIDATION_LEN       7
> >         struct xe_gt *gt = guc_to_gt(guc);
> > +       struct xe_device *xe = guc_to_xe(guc);
> >         u32 action[MAX_TLB_INVALIDATION_LEN];
> >         u64 length = end - start;
> >         int len = 0;
> >  
> > +       xe_gt_assert(gt, (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE &&
> > +                         !xe->info.has_ctx_tlb_inval) ||
> > +                    (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX &&
> > +                     xe->info.has_ctx_tlb_inval));
> > +
> >         action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> >         action[len++] = seqno;
> >         if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > @@ -176,6 +185,100 @@ static int send_tlb_inval_asid_ppgtt(struct
> > xe_tlb_inval *tlb_inval, u32 seqno,
> >                                     XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> >  }
> >  
> > +static bool queue_mapped_in_guc(struct xe_guc *guc, struct
> > xe_exec_queue *q)
> > +{
> > +       return q->gt == guc_to_gt(guc);
> > +}
> > +
> > +static int send_tlb_inval_ctx_ppgtt(struct xe_tlb_inval *tlb_inval,
> > u32 seqno,
> > +                                   u64 start, u64 end, u32 asid)
> > +{
> > +       struct xe_guc *guc = tlb_inval->private;
> > +       struct xe_device *xe = guc_to_xe(guc);
> > +       struct xe_exec_queue *q, *last_q = NULL;
> > +       struct xe_vm *vm;
> > +       int err = 0;
> > +
> > +       lockdep_assert_held(&tlb_inval->seqno_lock);
> > +
> > +       if (xe->info.force_execlist)
> > +               return -ECANCELED;
> > +
> > +       vm = xe_device_asid_to_vm(xe, asid);
> > +       if (IS_ERR(vm))
> > +               return PTR_ERR(vm);
> > +
> > +       down_read(&vm->exec_queues.lock);
> > +
> > +       /*
> > +        * XXX: Randomly picking a threshold for now. This will need
> > to be
> > +        * tuned based on expected UMD queue counts and performance
> > profiling.
> > +        */
> > +#define EXEC_QUEUE_COUNT_FULL_THRESHOLD        8
> 
> Does seem interesting for this to be configurable. Maybe different
> applications have different requirements here... But also we should
> have some kind of default. No issue with what you 
> 

Yes, I figure we can make this tunable somehow.
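
A module parameter would be a simple first pass, e.g. (hypothetical
sketch, name made up, default matching the current hard-coded value):

        /* Hypothetical tunable replacing EXEC_QUEUE_COUNT_FULL_THRESHOLD */
        static unsigned int ctx_tlb_inval_full_threshold = 8;
        module_param(ctx_tlb_inval_full_threshold, uint, 0600);
        MODULE_PARM_DESC(ctx_tlb_inval_full_threshold,
                         "Exec queue count at which to fall back to a full TLB invalidation");

and then compare vm->exec_queues.count[] against that instead of the
#define below. A per-device debugfs knob may fit the existing xe tunables
better, but either way the default can stay at 8 until we have profiling
data.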

> > +       if (vm->exec_queues.count[guc_to_gt(guc)->info.id] >=
> > +           EXEC_QUEUE_COUNT_FULL_THRESHOLD) {
> > +               u32 action[] = {
> > +                       XE_GUC_ACTION_TLB_INVALIDATION,
> > +                       seqno,
> > +                       MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > +               };
> > +
> > +               err = send_tlb_inval(guc, action,
> > ARRAY_SIZE(action));
> > +               goto err_unlock;
> > +       }
> > +#undef EXEC_QUEUE_COUNT_FULL_THRESHOLD
> > +
> > +       list_for_each_entry_reverse(q, &vm->exec_queues.list,
> > +                                   vm_exec_queue_link)
> 
> Braces here around the if() {}. Otherwise it's too easy to accidentally
> add lines below the if condition with unintended behavior.
> 
> > +               if (queue_mapped_in_guc(guc, q) && q->ops->active(q))
> > {
> > +                       last_q = q;
> > +                       break;
> > +               }
> > +
> > +       if (!last_q) {
> > +               /*
> > +                * We can't break fence ordering for TLB invalidation
> > jobs, if
> > +                * TLB invalidations are inflight issue a dummy
> > invalidation to
> > +                * maintain ordering. Nor can we move safely the
> > seqno_recv when
> > +                * returning -ECANCELED if TLB invalidations are in
> > flight. Use
> > +                * GGTT invalidation as dummy invalidation given ASID
> > +                * invalidations are unsupported here.
> > +                */
> > +               if (xe_tlb_inval_idle(tlb_inval))
> > +                       err = -ECANCELED;
> > +               else
> > +                       err = send_tlb_inval_ggtt(tlb_inval, seqno);
> > +               goto err_unlock;
> > +       }
> > +
> > +       list_for_each_entry(q, &vm->exec_queues.list,
> > vm_exec_queue_link) {
> > +               int __seqno = last_q == q ? seqno :
> > +                       TLB_INVALIDATION_SEQNO_INVALID;
> > +               u32 type = XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX;
> > +
> > +               /*
> > +                * XXX: Technically we can race here and queue can become
> > +                * inactive, not ideal. The TLB invalidation will time out
> > +                * in this case unless we get GuC support to convert to
> > +                * NOP...
> 
> We actually can't support this. It won't just time out, GuC will return

Yes, I agree. I think we'd receive a failure response to
GUC_HXG_TYPE_FAST_REQUEST, but if I recall correctly, in my testing I
only saw timeouts on TLB invalidations sent to unregistered GuC IDs. Let
me follow up on the failure case (I hit this when the multi-GT
code wasn't quite right yet).

> an error (invalid parameter that the context isn't registered and we're
> trying to submit something via context invalidation - nothing actually
> gets sent to hardware) and we will issue a GT reset. We should make

A failure response to GUC_HXG_TYPE_FAST_REQUEST is indeed a GT reset.

> sure when we send this, the context is registered and any
> deregistration explicitly happens afterwards.
> 
> This is why generally we want to do this add/remove not at the queue
> level but at the guc registration/deregistration level to guarantee
> this. But I see you split the queue add and the submission based on
> this active flag. Right now I believe we're tracking "pending
> deregistration" with exec_queue_destroyed && exec_queue_registered,
> although it might be nice to have an explicit state there.

Locking is required to make this race-free, which just doesn't work.

- TLB invalidations must be issued if the context is enabled or even if a
  disable is pending—otherwise, we risk memory corruption, which could be
  catastrophic.
- The context disable "done" G2H is received under the CT lock; the
  subsequent deregister is also issued under the CT lock.
- The lock that iterates over a VM's exec queues (contexts) is the outer
  lock of the CT lock.

As far as I can tell, we can't resolve this race without inverting the
locking order—at least not in any way I'd consider a reasonable locking
scheme for the KMD.
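
To spell the nesting out (this is what the might_lock() annotations added
in xe_vm_create() encode):

        /*
         * vm->exec_queues.lock (outer) - taken in send_tlb_inval_ctx_ppgtt()
         *   GuC CT lock (inner)        - taken when the H2G is sent
         *
         * The scheduling-disable done G2H and the follow-up deregister both
         * run under the CT lock, so they can't take vm->exec_queues.lock
         * without inverting the order above.
         */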

> 
> Alternatively we could also capture that error response from GuC,
> although there are a few different places we can get an error there and
> we might not want to overload the CT error handler in the event there
> is a legitimate state or hardware error that requires a guc reload/GT
> reset.
> 

I believe the solution lies in GuC support. We could introduce a flag in
the H2G TLB invalidation that signals: "I understand the risks—this may
race. If it does, discard the TLB invalidation and still return a valid
G2H TLB done response."
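
Roughly (purely illustrative - no such bit exists in the GuC ABI today,
the flag name is made up):

        /*
         * Hypothetical: tell GuC to drop the invalidation but still ack
         * the seqno if the GuC ID is no longer registered.
         */
        action[len++] = MAKE_INVAL_OP(type) |
                        XE_GUC_TLB_INVAL_FLAG_NOP_IF_UNREGISTERED;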

We’d likely disable this in CI builds unless it causes noise. In
production, if we hit this race (which should be rare, given CI passes
with the race exposed), the worst-case outcome is an extra TLB
invalidation—which is acceptable.

Matt

> Thanks,
> Stuart
> 
> > +                */
> > +               if (!queue_mapped_in_guc(guc, q) || !q->ops-
> > >active(q))
> > +                       continue;
> > +
> > +               xe_assert(xe, q->vm == vm);
> > +
> > +               err = send_tlb_inval_ppgtt(guc, __seqno, start, end,
> > +                                          q->guc->id, type);
> > +               if (err)
> > +                       goto err_unlock;
> > +       }
> > +
> > +err_unlock:
> > +       up_read(&vm->exec_queues.lock);
> > +       xe_vm_put(vm);
> > +
> > +       return err;
> > +}
> > +
> >  static bool tlb_inval_initialized(struct xe_tlb_inval *tlb_inval)
> >  {
> >         struct xe_guc *guc = tlb_inval->private;
> > @@ -203,7 +306,7 @@ static long tlb_inval_timeout_delay(struct
> > xe_tlb_inval *tlb_inval)
> >         return hw_tlb_timeout + 2 * delay;
> >  }
> >  
> > -static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> > +static const struct xe_tlb_inval_ops guc_tlb_inval_asid_ops = {
> >         .all = send_tlb_inval_all,
> >         .ggtt = send_tlb_inval_ggtt,
> >         .ppgtt = send_tlb_inval_asid_ppgtt,
> > @@ -212,6 +315,15 @@ static const struct xe_tlb_inval_ops
> > guc_tlb_inval_ops = {
> >         .timeout_delay = tlb_inval_timeout_delay,
> >  };
> >  
> > +static const struct xe_tlb_inval_ops guc_tlb_inval_ctx_ops = {
> > +       .ggtt = send_tlb_inval_ggtt,
> > +       .all = send_tlb_inval_all,
> > +       .ppgtt = send_tlb_inval_ctx_ppgtt,
> > +       .initialized = tlb_inval_initialized,
> > +       .flush = tlb_inval_flush,
> > +       .timeout_delay = tlb_inval_timeout_delay,
> > +};
> > +
> >  /**
> >   * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation early
> >   * @guc: GuC object
> > @@ -223,8 +335,14 @@ static const struct xe_tlb_inval_ops
> > guc_tlb_inval_ops = {
> >  void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> >                                  struct xe_tlb_inval *tlb_inval)
> >  {
> > +       struct xe_device *xe = guc_to_xe(guc);
> > +
> >         tlb_inval->private = guc;
> > -       tlb_inval->ops = &guc_tlb_inval_ops;
> > +
> > +       if (xe->info.has_ctx_tlb_inval)
> > +               tlb_inval->ops = &guc_tlb_inval_ctx_ops;
> > +       else
> > +               tlb_inval->ops = &guc_tlb_inval_asid_ops;
> >  }
> >  
> >  /**
> > diff --git a/drivers/gpu/drm/xe/xe_pci.c
> > b/drivers/gpu/drm/xe/xe_pci.c
> > index 1959de3f7a27..9a11066c7d4a 100644
> > --- a/drivers/gpu/drm/xe/xe_pci.c
> > +++ b/drivers/gpu/drm/xe/xe_pci.c
> > @@ -863,6 +863,7 @@ static int xe_info_init(struct xe_device *xe,
> >                 xe->info.has_device_atomics_on_smem = 1;
> >  
> >         xe->info.has_range_tlb_inval = graphics_desc-
> > >has_range_tlb_inval;
> > +       xe->info.has_ctx_tlb_inval = graphics_desc-
> > >has_ctx_tlb_inval;
> >         xe->info.has_usm = graphics_desc->has_usm;
> >         xe->info.has_64bit_timestamp = graphics_desc-
> > >has_64bit_timestamp;
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_pci_types.h
> > b/drivers/gpu/drm/xe/xe_pci_types.h
> > index 9892c063a9c5..c08857c06c7e 100644
> > --- a/drivers/gpu/drm/xe/xe_pci_types.h
> > +++ b/drivers/gpu/drm/xe/xe_pci_types.h
> > @@ -63,6 +63,7 @@ struct xe_graphics_desc {
> >         u8 has_atomic_enable_pte_bit:1;
> >         u8 has_indirect_ring_state:1;
> >         u8 has_range_tlb_inval:1;
> > +       u8 has_ctx_tlb_inval:1;
> >         u8 has_usm:1;
> >         u8 has_64bit_timestamp:1;
> >  };
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 09/12] drm/xe: Add xe_tlb_inval_idle helper
  2025-11-04 19:56 ` [PATCH v2 09/12] drm/xe: Add xe_tlb_inval_idle helper Matthew Brost
@ 2025-11-10 18:48   ` Summers, Stuart
  2025-12-12 22:00     ` Summers, Stuart
  0 siblings, 1 reply; 32+ messages in thread
From: Summers, Stuart @ 2025-11-10 18:48 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Brost,  Matthew
  Cc: Roper, Matthew D, De Marchi, Lucas

On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> Introduce the xe_tlb_inval_idle helper to detect whether any TLB
> invalidations are currently in flight. This is used in context-based TLB
> invalidations to determine whether dummy TLB invalidations need to be
> sent to maintain proper TLB invalidation fence ordering.
> 
> v2:
>  - Implement xe_tlb_inval_idle based on pending list
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

I don't have any issue with this patch or the next one, but I want to
hold off on full review until we have a working solution for the
registration/deregistration placement of this.

Thanks,
Stuart

> ---
>  drivers/gpu/drm/xe/xe_tlb_inval.c | 21 +++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_tlb_inval.h |  2 ++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c
> b/drivers/gpu/drm/xe/xe_tlb_inval.c
> index 7ee081b94f90..7ae6b9f884e9 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval.c
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
> @@ -44,11 +44,14 @@ static void xe_tlb_inval_fence_fini(struct
> xe_tlb_inval_fence *fence)
>  static void
>  xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence)
>  {
> +       struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
>         bool stack = test_bit(FENCE_STACK_BIT, &fence->base.flags);
>  
>         lockdep_assert_held(&fence->tlb_inval->pending_lock);
>  
>         list_del(&fence->link);
> +       if (list_empty(&tlb_inval->pending_fences))
> +               cancel_delayed_work(&tlb_inval->fence_tdr);
>         trace_xe_tlb_inval_fence_signal(fence->tlb_inval->xe, fence);
>         xe_tlb_inval_fence_fini(fence);
>         dma_fence_signal(&fence->base);
> @@ -443,3 +446,21 @@ void xe_tlb_inval_fence_init(struct xe_tlb_inval
> *tlb_inval,
>                 dma_fence_get(&fence->base);
>         fence->tlb_inval = tlb_inval;
>  }
> +
> +/**
> + * xe_tlb_inval_idle() - Check if TLB invalidation is idle
> + * @tlb_inval: TLB invalidation client
> + *
> + * Check the TLB invalidation pending list to determine if the client is
> + * idle (i.e., no TLB invalidations are in flight). Expected to be called
> + * in the backend after the fence has been added to the pending list, and
> + * takes this into account.
> + *
> + * Return: True if TLB invalidation client is idle, False otherwise
> + */
> +bool xe_tlb_inval_idle(struct xe_tlb_inval *tlb_inval)
> +{
> +       lockdep_assert_held(&tlb_inval->seqno_lock);
> +
> +       guard(spinlock_irq)(&tlb_inval->pending_lock);
> +       return list_is_singular(&tlb_inval->pending_fences);
> +}
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h
> b/drivers/gpu/drm/xe/xe_tlb_inval.h
> index 05614915463a..44a6d9177489 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> @@ -43,4 +43,6 @@ xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence
> *fence)
>  
>  void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int
> seqno);
>  
> +bool xe_tlb_inval_idle(struct xe_tlb_inval *tlb_inval);
> +
>  #endif /* _XE_TLB_INVAL_ */


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 11/12] drm/xe: Add context-based invalidation to GuC TLB invalidation backend
  2025-11-07  7:01     ` Matthew Brost
@ 2025-11-10 19:29       ` Summers, Stuart
  2025-11-11  1:01         ` Matthew Brost
  0 siblings, 1 reply; 32+ messages in thread
From: Summers, Stuart @ 2025-11-10 19:29 UTC (permalink / raw)
  To: Brost, Matthew
  Cc: intel-xe@lists.freedesktop.org, Roper,  Matthew D,
	De Marchi, Lucas

On Thu, 2025-11-06 at 23:01 -0800, Matthew Brost wrote:
> On Thu, Nov 06, 2025 at 02:50:45PM -0700, Summers, Stuart wrote:
> > On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> > > Introduce context-based invalidation support to the GuC TLB
> > > invalidation
> > > backend. This is implemented by iterating over each exec queue
> > > per GT
> > > within a VM, skipping inactive queues, and issuing a context-
> > > based
> > > (GuC
> > > ID) H2G TLB invalidation. All H2G messages, except the final one,
> > > are
> > > sent with an invalid seqno, which the G2H handler drops to ensure
> > > the
> > > TLB invalidation fence is only signaled once all H2G messages are
> > > completed.
> > > 
> > > A watermark mechanism is also added to switch between context-
> > > based
> > > TLB
> > > invalidations and full device-wide invalidations, as the return
> > > on
> > > investment for context-based invalidation diminishes when many
> > > exec
> > > queues are mapped.
> > > 
> > > v2:
> > >  - Fix checkpatch warnings
> > > 
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_device_types.h  |   2 +
> > >  drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 122
> > > +++++++++++++++++++++++++-
> > >  drivers/gpu/drm/xe/xe_pci.c           |   1 +
> > >  drivers/gpu/drm/xe/xe_pci_types.h     |   1 +
> > >  4 files changed, 124 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> > > b/drivers/gpu/drm/xe/xe_device_types.h
> > > index 145951dd95c9..ca285f4bce11 100644
> > > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > @@ -316,6 +316,8 @@ struct xe_device {
> > >                 u8 has_mem_copy_instr:1;
> > >                 /** @info.has_pxp: Device has PXP support */
> > >                 u8 has_pxp:1;
> > > +               /** @info.has_ctx_tlb_inval: Has context based
> > > TLB
> > > invalidations */
> > > +               u8 has_ctx_tlb_inval:1;
> > >                 /** @info.has_range_tlb_inval: Has range based
> > > TLB
> > > invalidations */
> > >                 u8 has_range_tlb_inval:1;
> > >                 /** @info.has_sriov: Supports SR-IOV */
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > index 6978ee8edf2e..1baaf577cded 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > @@ -6,14 +6,17 @@
> > >  #include "abi/guc_actions_abi.h"
> > >  
> > >  #include "xe_device.h"
> > > +#include "xe_exec_queue_types.h"
> > >  #include "xe_gt_stats.h"
> > >  #include "xe_gt_types.h"
> > >  #include "xe_guc.h"
> > >  #include "xe_guc_ct.h"
> > > +#include "xe_guc_exec_queue_types.h"
> > >  #include "xe_guc_tlb_inval.h"
> > >  #include "xe_force_wake.h"
> > >  #include "xe_mmio.h"
> > >  #include "xe_tlb_inval.h"
> > > +#include "xe_vm.h"
> > >  
> > >  #include "regs/xe_guc_regs.h"
> > >  
> > > @@ -136,10 +139,16 @@ static int send_tlb_inval_ppgtt(struct
> > > xe_guc
> > > *guc, u32 seqno, u64 start,
> > >  {
> > >  #define MAX_TLB_INVALIDATION_LEN       7
> > >         struct xe_gt *gt = guc_to_gt(guc);
> > > +       struct xe_device *xe = guc_to_xe(guc);
> > >         u32 action[MAX_TLB_INVALIDATION_LEN];
> > >         u64 length = end - start;
> > >         int len = 0;
> > >  
> > > +       xe_gt_assert(gt, (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE
> > > &&
> > > +                         !xe->info.has_ctx_tlb_inval) ||
> > > +                    (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX
> > > &&
> > > +                     xe->info.has_ctx_tlb_inval));
> > > +
> > >         action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > >         action[len++] = seqno;
> > >         if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > @@ -176,6 +185,100 @@ static int send_tlb_inval_asid_ppgtt(struct
> > > xe_tlb_inval *tlb_inval, u32 seqno,
> > >                                    
> > > XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > >  }
> > >  
> > > +static bool queue_mapped_in_guc(struct xe_guc *guc, struct
> > > xe_exec_queue *q)
> > > +{
> > > +       return q->gt == guc_to_gt(guc);
> > > +}
> > > +
> > > +static int send_tlb_inval_ctx_ppgtt(struct xe_tlb_inval
> > > *tlb_inval,
> > > u32 seqno,
> > > +                                   u64 start, u64 end, u32 asid)
> > > +{
> > > +       struct xe_guc *guc = tlb_inval->private;
> > > +       struct xe_device *xe = guc_to_xe(guc);
> > > +       struct xe_exec_queue *q, *last_q = NULL;
> > > +       struct xe_vm *vm;
> > > +       int err = 0;
> > > +
> > > +       lockdep_assert_held(&tlb_inval->seqno_lock);
> > > +
> > > +       if (xe->info.force_execlist)
> > > +               return -ECANCELED;
> > > +
> > > +       vm = xe_device_asid_to_vm(xe, asid);
> > > +       if (IS_ERR(vm))
> > > +               return PTR_ERR(vm);
> > > +
> > > +       down_read(&vm->exec_queues.lock);
> > > +
> > > +       /*
> > > +        * XXX: Randomly picking a threshold for now. This will
> > > need
> > > to be
> > > +        * tuned based on expected UMD queue counts and
> > > performance
> > > profiling.
> > > +        */
> > > +#define EXEC_QUEUE_COUNT_FULL_THRESHOLD        8
> > 
> > Does seem interesting for this to be configurable. Maybe different
> > applications have different requirements here... But also we should
> > have some kind of default. No issue with what you 
> > 
> 
> Yes, I figure we can make this tunable somehow.
> 
> > > +       if (vm->exec_queues.count[guc_to_gt(guc)->info.id] >=
> > > +           EXEC_QUEUE_COUNT_FULL_THRESHOLD) {
> > > +               u32 action[] = {
> > > +                       XE_GUC_ACTION_TLB_INVALIDATION,
> > > +                       seqno,
> > > +                       MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > +               };
> > > +
> > > +               err = send_tlb_inval(guc, action,
> > > ARRAY_SIZE(action));
> > > +               goto err_unlock;
> > > +       }
> > > +#undef EXEC_QUEUE_COUNT_FULL_THRESHOLD
> > > +
> > > +       list_for_each_entry_reverse(q, &vm->exec_queues.list,
> > > +                                   vm_exec_queue_link)
> > 
> > Braces here around the if() {}. Otherwise it's too easy to
> > accidentally
> > add lines below the if condition with unintended behavior.
> > 
> > > +               if (queue_mapped_in_guc(guc, q) && q->ops-
> > > >active(q))
> > > {
> > > +                       last_q = q;
> > > +                       break;
> > > +               }
> > > +
> > > +       if (!last_q) {
> > > +               /*
> > > +                * We can't break fence ordering for TLB
> > > invalidation
> > > jobs, if
> > > +                * TLB invalidations are inflight issue a dummy
> > > invalidation to
> > > +                * maintain ordering. Nor can we move safely the
> > > seqno_recv when
> > > +                * returning -ECANCELED if TLB invalidations are
> > > in
> > > flight. Use
> > > +                * GGTT invalidation as dummy invalidation given
> > > ASID
> > > +                * invalidations are unsupported here.
> > > +                */
> > > +               if (xe_tlb_inval_idle(tlb_inval))
> > > +                       err = -ECANCELED;
> > > +               else
> > > +                       err = send_tlb_inval_ggtt(tlb_inval,
> > > seqno);
> > > +               goto err_unlock;
> > > +       }
> > > +
> > > +       list_for_each_entry(q, &vm->exec_queues.list,
> > > vm_exec_queue_link) {
> > > +               int __seqno = last_q == q ? seqno :
> > > +                       TLB_INVALIDATION_SEQNO_INVALID;
> > > +               u32 type = XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX;
> > > +
> > > +               /*
> > > +                * XXX: Technically we can race here and queue can become
> > > +                * inactive, not ideal. The TLB invalidation will time out
> > > +                * in this case unless we get GuC support to convert to
> > > +                * NOP...
> > 
> > We actually can't support this. It won't just time out, GuC will
> > return
> 
> Yes, I agree. I think we'd receive a failure response to
> GUC_HXG_TYPE_FAST_REQUEST, but if I recall correctly, in my testing I
> only saw timeouts on TLB invalidations sent to unregistered GuC IDs.
> Let
> me follow up on the failure case (I hit this when the multi-GT
> code wasn't quite right yet).

I can look up the exact cases, but I had done a lot of testing around
this internally and any sufficiently large exec queue
registration/deregistration count is likely going to hit this. It has a
pretty large impact on CI.

> 
> > an error (invalid parameter that the context isn't registered and
> > we're
> > trying to submit something via context invalidation - nothing
> > actually
> > gets sent to hardware) and we will issue a GT reset. We should make
> 
> A failure response to GUC_HXG_TYPE_FAST_REQUEST is indeed a GT reset.
> 
> > sure when we send this, the context is registered and any
> > deregistration explicitly happens afterwards.
> > 
> > This is why generally we want to do this add/remove not at the
> > queue
> > level but at the guc registration/deregistration level to guarantee
> > this. But I see you split the queue add and the submission based on
> > this active flag. Right now I believe we're tracking "pending
> > deregistration" with exec_queue_destroyed && exec_queue_registered,
> > although it might be nice to have an explicit state there.
> 
> Locking is required to make this race-free, which just doesn't work.
> 
> - TLB invalidations must be issued if the context is enabled or even
> if a
>   disable is pending—otherwise, we risk memory corruption, which
> could be
>   catastrophic.

Agreed.

> - The context disable "done" G2H is received under the CT lock; the
>   subsequent deregister is also issued under the CT lock.
> - The lock that iterates over a VM's exec queues (contexts) is the
> outer
>   lock of the CT lock.
> 
> As far as I can tell, we can't resolve this race without inverting
> the
> locking order—at least not in any way I'd consider a reasonable
> locking
> scheme for the KMD.

Can't we just tie this to register_exec_queue() (for the add),
disable_scheduling_deregister(), __deregister_exec_queue(), and
guc_exec_queue_stop() (for the remove) and avoid the in-flight issue
around locking? So basically we ensure that the list is only populated
after the "register" CT is sent and before the "deregister" CT is sent,
therefore ensuring that any TLB inval sent to those queues is
sequenced properly around the register and deregister CT messages.

So basically (see the removal after we mark the queue as "destroyed"):
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index ec0d1d1b0249..7a6ef3811808 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -708,6 +708,8 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
        else
                __register_exec_queue(guc, &info);
        init_policies(guc, q);
+
+       add_to_vm_exec_queue_list();
 }
 
 static u32 wq_space_until_wrap(struct xe_exec_queue *q)
@@ -950,6 +952,7 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
        clear_exec_queue_enabled(q);
        set_exec_queue_pending_disable(q);
        set_exec_queue_destroyed(q);
+       remove_from_vm_exec_queue_list();
        trace_xe_exec_queue_scheduling_disable(q);
 
        /*
@@ -1219,6 +1222,7 @@ static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
        xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
 
        set_exec_queue_destroyed(q);
+       remove_from_vm_exec_queue_list();
        trace_xe_exec_queue_deregister(q);
 
        xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
@@ -1971,6 +1975,7 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
                   EXEC_QUEUE_STATE_KILLED | EXEC_QUEUE_STATE_DESTROYED |
                   EXEC_QUEUE_STATE_SUSPENDED,
                   &q->guc->state);
+       remove_from_vm_exec_queue_list();
        q->guc->resume_time = 0;
        trace_xe_exec_queue_stop(q);

> 
> > 
> > Alternatively we could also capture that error response from GuC,
> > although there are a few different places we can get an error there
> > and
> > we might not want to overload the CT error handler in the event
> > there
> > is a legitimate state or hardware error that requires a guc
> > reload/GT
> > reset.
> > 
> 
> I believe the solution lies in GuC support. We could introduce a flag
> in
> the H2G TLB invalidation that signals: "I understand the risks—this
> may
> race. If it does, discard the TLB invalidation and still return a
> valid
> G2H TLB done response."

Of course, it would be nice if we had a dedicated error G2H that would
call this out explicitly. Right now we just get a general "invalid
parameter". Even having a way to easily identify the sequence number in
question (which I know is slightly different after your changes in this
series) would help.

> 
> We’d likely disable this in CI builds unless it causes noise. In
> production, if we hit this race (which should be rare, given CI

Not rare, see my comment above. I think this needs to be fixed - we
can't merge this with GT resets happening after the exec queue
submission and destroy count gets high enough.

Thanks,
Stuart

> passes
> with the race exposed), the worst-case outcome is an extra TLB
> invalidation—which is acceptable.
> 
> Matt
> 
> > Thanks,
> > Stuart
> > 
> > > +                */
> > > +               if (!queue_mapped_in_guc(guc, q) || !q->ops-
> > > > active(q))
> > > +                       continue;
> > > +
> > > +               xe_assert(xe, q->vm == vm);
> > > +
> > > +               err = send_tlb_inval_ppgtt(guc, __seqno, start,
> > > end,
> > > +                                          q->guc->id, type);
> > > +               if (err)
> > > +                       goto err_unlock;
> > > +       }
> > > +
> > > +err_unlock:
> > > +       up_read(&vm->exec_queues.lock);
> > > +       xe_vm_put(vm);
> > > +
> > > +       return err;
> > > +}
> > > +
> > >  static bool tlb_inval_initialized(struct xe_tlb_inval
> > > *tlb_inval)
> > >  {
> > >         struct xe_guc *guc = tlb_inval->private;
> > > @@ -203,7 +306,7 @@ static long tlb_inval_timeout_delay(struct
> > > xe_tlb_inval *tlb_inval)
> > >         return hw_tlb_timeout + 2 * delay;
> > >  }
> > >  
> > > -static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> > > +static const struct xe_tlb_inval_ops guc_tlb_inval_asid_ops = {
> > >         .all = send_tlb_inval_all,
> > >         .ggtt = send_tlb_inval_ggtt,
> > >         .ppgtt = send_tlb_inval_asid_ppgtt,
> > > @@ -212,6 +315,15 @@ static const struct xe_tlb_inval_ops
> > > guc_tlb_inval_ops = {
> > >         .timeout_delay = tlb_inval_timeout_delay,
> > >  };
> > >  
> > > +static const struct xe_tlb_inval_ops guc_tlb_inval_ctx_ops = {
> > > +       .ggtt = send_tlb_inval_ggtt,
> > > +       .all = send_tlb_inval_all,
> > > +       .ppgtt = send_tlb_inval_ctx_ppgtt,
> > > +       .initialized = tlb_inval_initialized,
> > > +       .flush = tlb_inval_flush,
> > > +       .timeout_delay = tlb_inval_timeout_delay,
> > > +};
> > > +
> > >  /**
> > >   * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation
> > > early
> > >   * @guc: GuC object
> > > @@ -223,8 +335,14 @@ static const struct xe_tlb_inval_ops
> > > guc_tlb_inval_ops = {
> > >  void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> > >                                  struct xe_tlb_inval *tlb_inval)
> > >  {
> > > +       struct xe_device *xe = guc_to_xe(guc);
> > > +
> > >         tlb_inval->private = guc;
> > > -       tlb_inval->ops = &guc_tlb_inval_ops;
> > > +
> > > +       if (xe->info.has_ctx_tlb_inval)
> > > +               tlb_inval->ops = &guc_tlb_inval_ctx_ops;
> > > +       else
> > > +               tlb_inval->ops = &guc_tlb_inval_asid_ops;
> > >  }
> > >  
> > >  /**
> > > diff --git a/drivers/gpu/drm/xe/xe_pci.c
> > > b/drivers/gpu/drm/xe/xe_pci.c
> > > index 1959de3f7a27..9a11066c7d4a 100644
> > > --- a/drivers/gpu/drm/xe/xe_pci.c
> > > +++ b/drivers/gpu/drm/xe/xe_pci.c
> > > @@ -863,6 +863,7 @@ static int xe_info_init(struct xe_device *xe,
> > >                 xe->info.has_device_atomics_on_smem = 1;
> > >  
> > >         xe->info.has_range_tlb_inval = graphics_desc-
> > > > has_range_tlb_inval;
> > > +       xe->info.has_ctx_tlb_inval = graphics_desc-
> > > > has_ctx_tlb_inval;
> > >         xe->info.has_usm = graphics_desc->has_usm;
> > >         xe->info.has_64bit_timestamp = graphics_desc-
> > > > has_64bit_timestamp;
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_pci_types.h
> > > b/drivers/gpu/drm/xe/xe_pci_types.h
> > > index 9892c063a9c5..c08857c06c7e 100644
> > > --- a/drivers/gpu/drm/xe/xe_pci_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_pci_types.h
> > > @@ -63,6 +63,7 @@ struct xe_graphics_desc {
> > >         u8 has_atomic_enable_pte_bit:1;
> > >         u8 has_indirect_ring_state:1;
> > >         u8 has_range_tlb_inval:1;
> > > +       u8 has_ctx_tlb_inval:1;
> > >         u8 has_usm:1;
> > >         u8 has_64bit_timestamp:1;
> > >  };
> > 


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 11/12] drm/xe: Add context-based invalidation to GuC TLB invalidation backend
  2025-11-10 19:29       ` Summers, Stuart
@ 2025-11-11  1:01         ` Matthew Brost
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Brost @ 2025-11-11  1:01 UTC (permalink / raw)
  To: Summers, Stuart
  Cc: intel-xe@lists.freedesktop.org, Roper,  Matthew D,
	De Marchi, Lucas

On Mon, Nov 10, 2025 at 12:29:57PM -0700, Summers, Stuart wrote:
> On Thu, 2025-11-06 at 23:01 -0800, Matthew Brost wrote:
> > On Thu, Nov 06, 2025 at 02:50:45PM -0700, Summers, Stuart wrote:
> > > On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> > > > Introduce context-based invalidation support to the GuC TLB
> > > > invalidation
> > > > backend. This is implemented by iterating over each exec queue
> > > > per GT
> > > > within a VM, skipping inactive queues, and issuing a context-
> > > > based
> > > > (GuC
> > > > ID) H2G TLB invalidation. All H2G messages, except the final one,
> > > > are
> > > > sent with an invalid seqno, which the G2H handler drops to ensure
> > > > the
> > > > TLB invalidation fence is only signaled once all H2G messages are
> > > > completed.
> > > > 
> > > > A watermark mechanism is also added to switch between context-
> > > > based
> > > > TLB
> > > > invalidations and full device-wide invalidations, as the return
> > > > on
> > > > investment for context-based invalidation diminishes when many
> > > > exec
> > > > queues are mapped.
> > > > 
> > > > v2:
> > > >  - Fix checkpatch warnings
> > > > 
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_device_types.h  |   2 +
> > > >  drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 122
> > > > +++++++++++++++++++++++++-
> > > >  drivers/gpu/drm/xe/xe_pci.c           |   1 +
> > > >  drivers/gpu/drm/xe/xe_pci_types.h     |   1 +
> > > >  4 files changed, 124 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> > > > b/drivers/gpu/drm/xe/xe_device_types.h
> > > > index 145951dd95c9..ca285f4bce11 100644
> > > > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > > @@ -316,6 +316,8 @@ struct xe_device {
> > > >                 u8 has_mem_copy_instr:1;
> > > >                 /** @info.has_pxp: Device has PXP support */
> > > >                 u8 has_pxp:1;
> > > > +               /** @info.has_ctx_tlb_inval: Has context based
> > > > TLB
> > > > invalidations */
> > > > +               u8 has_ctx_tlb_inval:1;
> > > >                 /** @info.has_range_tlb_inval: Has range based
> > > > TLB
> > > > invalidations */
> > > >                 u8 has_range_tlb_inval:1;
> > > >                 /** @info.has_sriov: Supports SR-IOV */
> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > > b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > > index 6978ee8edf2e..1baaf577cded 100644
> > > > --- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > > @@ -6,14 +6,17 @@
> > > >  #include "abi/guc_actions_abi.h"
> > > >  
> > > >  #include "xe_device.h"
> > > > +#include "xe_exec_queue_types.h"
> > > >  #include "xe_gt_stats.h"
> > > >  #include "xe_gt_types.h"
> > > >  #include "xe_guc.h"
> > > >  #include "xe_guc_ct.h"
> > > > +#include "xe_guc_exec_queue_types.h"
> > > >  #include "xe_guc_tlb_inval.h"
> > > >  #include "xe_force_wake.h"
> > > >  #include "xe_mmio.h"
> > > >  #include "xe_tlb_inval.h"
> > > > +#include "xe_vm.h"
> > > >  
> > > >  #include "regs/xe_guc_regs.h"
> > > >  
> > > > @@ -136,10 +139,16 @@ static int send_tlb_inval_ppgtt(struct
> > > > xe_guc
> > > > *guc, u32 seqno, u64 start,
> > > >  {
> > > >  #define MAX_TLB_INVALIDATION_LEN       7
> > > >         struct xe_gt *gt = guc_to_gt(guc);
> > > > +       struct xe_device *xe = guc_to_xe(guc);
> > > >         u32 action[MAX_TLB_INVALIDATION_LEN];
> > > >         u64 length = end - start;
> > > >         int len = 0;
> > > >  
> > > > +       xe_gt_assert(gt, (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE
> > > > &&
> > > > +                         !xe->info.has_ctx_tlb_inval) ||
> > > > +                    (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX
> > > > &&
> > > > +                     xe->info.has_ctx_tlb_inval));
> > > > +
> > > >         action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > > >         action[len++] = seqno;
> > > >         if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > > @@ -176,6 +185,100 @@ static int send_tlb_inval_asid_ppgtt(struct
> > > > xe_tlb_inval *tlb_inval, u32 seqno,
> > > >                                    
> > > > XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > > >  }
> > > >  
> > > > +static bool queue_mapped_in_guc(struct xe_guc *guc, struct
> > > > xe_exec_queue *q)
> > > > +{
> > > > +       return q->gt == guc_to_gt(guc);
> > > > +}
> > > > +
> > > > +static int send_tlb_inval_ctx_ppgtt(struct xe_tlb_inval
> > > > *tlb_inval,
> > > > u32 seqno,
> > > > +                                   u64 start, u64 end, u32 asid)
> > > > +{
> > > > +       struct xe_guc *guc = tlb_inval->private;
> > > > +       struct xe_device *xe = guc_to_xe(guc);
> > > > +       struct xe_exec_queue *q, *last_q = NULL;
> > > > +       struct xe_vm *vm;
> > > > +       int err = 0;
> > > > +
> > > > +       lockdep_assert_held(&tlb_inval->seqno_lock);
> > > > +
> > > > +       if (xe->info.force_execlist)
> > > > +               return -ECANCELED;
> > > > +
> > > > +       vm = xe_device_asid_to_vm(xe, asid);
> > > > +       if (IS_ERR(vm))
> > > > +               return PTR_ERR(vm);
> > > > +
> > > > +       down_read(&vm->exec_queues.lock);
> > > > +
> > > > +       /*
> > > > +        * XXX: Randomly picking a threshold for now. This will
> > > > need
> > > > to be
> > > > +        * tuned based on expected UMD queue counts and
> > > > performance
> > > > profiling.
> > > > +        */
> > > > +#define EXEC_QUEUE_COUNT_FULL_THRESHOLD        8
> > > 
> > > Does seem interesting for this to be configurable. Maybe different
> > > applications have different requirements here... But also we should
> > > have some kind of default. No issue with what you 
> > > 
> > 
> > Yes, I figure we can make this tunable somehow.
> > 
> > > > +       if (vm->exec_queues.count[guc_to_gt(guc)->info.id] >=
> > > > +           EXEC_QUEUE_COUNT_FULL_THRESHOLD) {
> > > > +               u32 action[] = {
> > > > +                       XE_GUC_ACTION_TLB_INVALIDATION,
> > > > +                       seqno,
> > > > +                       MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > > +               };
> > > > +
> > > > +               err = send_tlb_inval(guc, action,
> > > > ARRAY_SIZE(action));
> > > > +               goto err_unlock;
> > > > +       }
> > > > +#undef EXEC_QUEUE_COUNT_FULL_THRESHOLD
> > > > +
> > > > +       list_for_each_entry_reverse(q, &vm->exec_queues.list,
> > > > +                                   vm_exec_queue_link)
> > > 
> > > Braces here around the if() {}. Otherwise it's too easy to
> > > accidentally
> > > add lines below the if condition with unintended behavior.
> > > 
> > > > +               if (queue_mapped_in_guc(guc, q) && q->ops-
> > > > >active(q))
> > > > {
> > > > +                       last_q = q;
> > > > +                       break;
> > > > +               }
> > > > +
> > > > +       if (!last_q) {
> > > > +               /*
> > > > +                * We can't break fence ordering for TLB
> > > > invalidation
> > > > jobs, if
> > > > +                * TLB invalidations are inflight issue a dummy
> > > > invalidation to
> > > > +                * maintain ordering. Nor can we move safely the
> > > > seqno_recv when
> > > > +                * returning -ECANCELED if TLB invalidations are
> > > > in
> > > > flight. Use
> > > > +                * GGTT invalidation as dummy invalidation given
> > > > ASID
> > > > +                * invalidations are unsupported here.
> > > > +                */
> > > > +               if (xe_tlb_inval_idle(tlb_inval))
> > > > +                       err = -ECANCELED;
> > > > +               else
> > > > +                       err = send_tlb_inval_ggtt(tlb_inval,
> > > > seqno);
> > > > +               goto err_unlock;
> > > > +       }
> > > > +
> > > > +       list_for_each_entry(q, &vm->exec_queues.list,
> > > > vm_exec_queue_link) {
> > > > +               int __seqno = last_q == q ? seqno :
> > > > +                       TLB_INVALIDATION_SEQNO_INVALID;
> > > > +               u32 type = XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX;
> > > > +
> > > > +               /*
> > > > +                * XXX: Techincally we can race here and queue
> > > > can
> > > > become
> > > > +                * inactive, not ideal. The TLB invalidation will
> > > > timeout in
> > > > +                * this case unless we get GuC support to convert
> > > > to
> > > > NOP...
> > > 
> > > We actually can't support this. It won't just time out, GuC will
> > > return
> > 
> > Yes, I agree. I think we'd receive a failure response to
> > GUC_HXG_TYPE_FAST_REQUEST, but if I recall correctly, in my testing I
> > only saw timeouts on TLB invalidations sent to unregistered GuC IDs.
> > Let me follow up on the failure case (I hit this when the multi-GT
> > code wasn't quite right yet).
> 
> I can look up the exact cases, but I had done a lot of testing around
> this internally and any sufficiently large exec queue
> registration/deregistration count is going to likely hit this. It has a
> pretty large impact on CI.
> 
> > 
> > > an error (invalid parameter that the context isn't registered and
> > > we're
> > > trying to submit something via context invalidation - nothing
> > > actually
> > > gets sent to hardware) and we will issue a GT reset. We should make
> > 
> > A failure response to GUC_HXG_TYPE_FAST_REQUEST is indeed a GT reset.
> > 
> > > sure when we send this, the context is registered and any
> > > deregistration explicitly happens afterwards.
> > > 
> > > This is why generally we want to do this add/remove not at the
> > > queue
> > > level but at the guc registration/deregistration level to guarantee
> > > this. But I see you split the queue add and the submission based on
> > > this active flag. Right now I believe we're tracking "pending
> > > deregistration" with exec_queue_destroyed && exec_queue_registered,
> > > although it might be nice to have an explicit state there.
> > 
> > Locking is required to make this race-free, which just doesn't work.
> > 
> > - TLB invalidations must be issued if the context is enabled or even
> > if a
> >   disable is pending—otherwise, we risk memory corruption, which
> > could be
> >   catastrophic.
> 
> Agreed.
> 
> > - The context disable "done" G2H is received under the CT lock; the
> >   subsequent deregister is also issued under the CT lock.
> > - The lock that iterates over a VM's exec queues (contexts) is the
> > outer
> >   lock of the CT lock.
> > 
> > As far as I can tell, we can't resolve this race without inverting
> > the
> > locking order—at least not in any way I'd consider a reasonable
> > locking
> > scheme for the KMD.
> 
> Can't we just tie this to register_exec_queue() (for the add),
> disable_scheduling_deregister(), __deregister_exec_queue(), and
> guc_exec_queue_stop() (for the remove) and avoid the in-flight issue
> around locking? So basically we ensure that the list is only populated
> after the "register" CT is sent and before the "deregister" CT is sent,
> therefore ensuring that any TLB inval sent to those queues are
> sequenced properly around the register and deregister CT messages.
> 
> So basically (see the removal after we mark the queue as "destroyed"):
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> b/drivers/gpu/drm/xe/xe_guc_submit.c
> index ec0d1d1b0249..7a6ef3811808 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -708,6 +708,8 @@ static void register_exec_queue(struct
> xe_exec_queue *q, int ctx_type)
>         else
>                 __register_exec_queue(guc, &info);
>         init_policies(guc, q);
> +
> +       add_to_vm_exec_queue_list();
>  }
>  
>  static u32 wq_space_until_wrap(struct xe_exec_queue *q)
> @@ -950,6 +952,7 @@ static void disable_scheduling_deregister(struct
> xe_guc *guc,
>         clear_exec_queue_enabled(q);
>         set_exec_queue_pending_disable(q);
>         set_exec_queue_destroyed(q);
> +       remove_from_vm_exec_queue_list();

We can't skip the TLB invalidation here because the context is possibly
still running. If we skip a TLB invalidation, the context touches a page
that should have been invalidated, and boom, memory corruption.
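
Roughly the bad interleaving, using the hypothetical helper name from
your diff:

  unbind / invalidation path            context (still running on HW)
  --------------------------            ------------------------------
  remove_from_vm_exec_queue_list()
  issue per-context TLB invals,
    this queue is skipped               access served from stale TLB entry
  backing pages freed / reused          stray read or write -> corruption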

>         trace_xe_exec_queue_scheduling_disable(q);
>  
>         /*
> @@ -1219,6 +1222,7 @@ static void __deregister_exec_queue(struct xe_guc
> *guc, struct xe_exec_queue *q)
>         xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
>  
>         set_exec_queue_destroyed(q);
> +       remove_from_vm_exec_queue_list();

This is actually fine.

But the 2nd half of disable_scheduling_deregister (the safe point to
skip the TLB invalidation) is done in deregister_exec_queue, which is in
the G2H handler under the CT lock, and that is where we'd invert the
locking order. Again, changing the locking model is basically a
non-starter for me.
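
To spell out the inversion, taking the lockdep priming in the VM patch
as the source of truth for the intended order:

  TLB inval path (this series):  vm->exec_queues.lock -> guc->ct.lock
  removal in the G2H handler:    guc->ct.lock -> vm->exec_queues.lock

which is exactly what lockdep would flag.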

>         trace_xe_exec_queue_deregister(q);
>  
>         xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> @@ -1971,6 +1975,7 @@ static void guc_exec_queue_stop(struct xe_guc
> *guc, struct xe_exec_queue *q)
>                    EXEC_QUEUE_STATE_KILLED | EXEC_QUEUE_STATE_DESTROYED
> |
>                    EXEC_QUEUE_STATE_SUSPENDED,
>                    &q->guc->state);
> +       remove_from_vm_exec_queue_list();
>         q->guc->resume_time = 0;
>         trace_xe_exec_queue_stop(q);
> 
> > 
> > > 
> > > Alternatively we could also capture that error response from GuC,
> > > although there are a few different places we can get an error there
> > > and
> > > we might not want to overload the CT error handler in the event
> > > there
> > > is a legitimate state or hardware error that requires a guc
> > > reload/GT
> > > reset.
> > > 
> > 
> > I believe the solution lies in GuC support. We could introduce a flag
> > in
> > the H2G TLB invalidation that signals: "I understand the risks—this
> > may
> > race. If it does, discard the TLB invalidation and still return a
> > valid
> > G2H TLB done response."
> 
> Of course, it would be nice if we had a dedicated error G2H that would
> call this out explicitly. Right now we just get a general "invalid
> parameter". Even having a way to easily identify the sequence number in
> question (which I know is slightly different after your changes in this
> series) would help.
> 
> > 
> > We’d likely disable this in CI builds unless it causes noise. In
> > production, if we hit this race (which should be rare, given CI
> 
> Not rare, see my comment above. I think this needs to be fixed - we
> can't merge this with GT resets happening after the exec queue
> submission and destroy count gets high enough.
> 

Ok, then I think we really need to take this up with arch - either we do
something like I suggest above, or we change the GuC interface to remain
ASID based (no KMD changes) and have the GuC do an ASID -> context hash
table lookup to figure out which contexts actually need an invalidation
(it has this information, race free; it can also likely be smart and
skip contexts which don't need an invalidation, which should reduce the
overall cost of TLB invalidations too).
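
Purely to illustrate the second option - none of these names exist in
the GuC or in Xe, and the hashtable macro is only borrowed from Linux's
hashtable.h for the sketch:

	/* firmware side, pseudo code only */
	hash_for_each_possible(asid_to_ctx_ht, ctx, node, h2g->asid) {
		if (!ctx->tlb_maybe_dirty)	/* firmware-tracked hint */
			continue;		/* skip -> cheaper invals */
		invalidate_tlbs_for_context(ctx->guc_id);
	}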

Matt

> Thanks,
> Stuart
> 
> > passes
> > with the race exposed), the worst-case outcome is an extra TLB
> > invalidation—which is acceptable.
> > 
> > Matt
> > 
> > > Thanks,
> > > Stuart
> > > 
> > > > +                */
> > > > +               if (!queue_mapped_in_guc(guc, q) || !q->ops-
> > > > > active(q))
> > > > +                       continue;
> > > > +
> > > > +               xe_assert(xe, q->vm == vm);
> > > > +
> > > > +               err = send_tlb_inval_ppgtt(guc, __seqno, start,
> > > > end,
> > > > +                                          q->guc->id, type);
> > > > +               if (err)
> > > > +                       goto err_unlock;
> > > > +       }
> > > > +
> > > > +err_unlock:
> > > > +       up_read(&vm->exec_queues.lock);
> > > > +       xe_vm_put(vm);
> > > > +
> > > > +       return err;
> > > > +}
> > > > +
> > > >  static bool tlb_inval_initialized(struct xe_tlb_inval
> > > > *tlb_inval)
> > > >  {
> > > >         struct xe_guc *guc = tlb_inval->private;
> > > > @@ -203,7 +306,7 @@ static long tlb_inval_timeout_delay(struct
> > > > xe_tlb_inval *tlb_inval)
> > > >         return hw_tlb_timeout + 2 * delay;
> > > >  }
> > > >  
> > > > -static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> > > > +static const struct xe_tlb_inval_ops guc_tlb_inval_asid_ops = {
> > > >         .all = send_tlb_inval_all,
> > > >         .ggtt = send_tlb_inval_ggtt,
> > > >         .ppgtt = send_tlb_inval_asid_ppgtt,
> > > > @@ -212,6 +315,15 @@ static const struct xe_tlb_inval_ops
> > > > guc_tlb_inval_ops = {
> > > >         .timeout_delay = tlb_inval_timeout_delay,
> > > >  };
> > > >  
> > > > +static const struct xe_tlb_inval_ops guc_tlb_inval_ctx_ops = {
> > > > +       .ggtt = send_tlb_inval_ggtt,
> > > > +       .all = send_tlb_inval_all,
> > > > +       .ppgtt = send_tlb_inval_ctx_ppgtt,
> > > > +       .initialized = tlb_inval_initialized,
> > > > +       .flush = tlb_inval_flush,
> > > > +       .timeout_delay = tlb_inval_timeout_delay,
> > > > +};
> > > > +
> > > >  /**
> > > >   * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation
> > > > early
> > > >   * @guc: GuC object
> > > > @@ -223,8 +335,14 @@ static const struct xe_tlb_inval_ops
> > > > guc_tlb_inval_ops = {
> > > >  void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> > > >                                  struct xe_tlb_inval *tlb_inval)
> > > >  {
> > > > +       struct xe_device *xe = guc_to_xe(guc);
> > > > +
> > > >         tlb_inval->private = guc;
> > > > -       tlb_inval->ops = &guc_tlb_inval_ops;
> > > > +
> > > > +       if (xe->info.has_ctx_tlb_inval)
> > > > +               tlb_inval->ops = &guc_tlb_inval_ctx_ops;
> > > > +       else
> > > > +               tlb_inval->ops = &guc_tlb_inval_asid_ops;
> > > >  }
> > > >  
> > > >  /**
> > > > diff --git a/drivers/gpu/drm/xe/xe_pci.c
> > > > b/drivers/gpu/drm/xe/xe_pci.c
> > > > index 1959de3f7a27..9a11066c7d4a 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pci.c
> > > > +++ b/drivers/gpu/drm/xe/xe_pci.c
> > > > @@ -863,6 +863,7 @@ static int xe_info_init(struct xe_device *xe,
> > > >                 xe->info.has_device_atomics_on_smem = 1;
> > > >  
> > > >         xe->info.has_range_tlb_inval = graphics_desc-
> > > > > has_range_tlb_inval;
> > > > +       xe->info.has_ctx_tlb_inval = graphics_desc-
> > > > > has_ctx_tlb_inval;
> > > >         xe->info.has_usm = graphics_desc->has_usm;
> > > >         xe->info.has_64bit_timestamp = graphics_desc-
> > > > > has_64bit_timestamp;
> > > >  
> > > > diff --git a/drivers/gpu/drm/xe/xe_pci_types.h
> > > > b/drivers/gpu/drm/xe/xe_pci_types.h
> > > > index 9892c063a9c5..c08857c06c7e 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pci_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_pci_types.h
> > > > @@ -63,6 +63,7 @@ struct xe_graphics_desc {
> > > >         u8 has_atomic_enable_pte_bit:1;
> > > >         u8 has_indirect_ring_state:1;
> > > >         u8 has_range_tlb_inval:1;
> > > > +       u8 has_ctx_tlb_inval:1;
> > > >         u8 has_usm:1;
> > > >         u8 has_64bit_timestamp:1;
> > > >  };
> > > 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 03/12] drm/xe: Add xe_device_asid_to_vm helper
  2025-11-04 19:56 ` [PATCH v2 03/12] drm/xe: Add xe_device_asid_to_vm helper Matthew Brost
@ 2025-12-11 22:07   ` Matt Atwood
  0 siblings, 0 replies; 32+ messages in thread
From: Matt Atwood @ 2025-12-11 22:07 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, stuart.summers, lucas.demarchi, matthew.d.roper

On Tue, Nov 04, 2025 at 11:56:07AM -0800, Matthew Brost wrote:
> Introduce the xe_device_asid_to_vm helper, which can be used throughout
> the driver to resolve the VM from a given ASID.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device.c | 25 +++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_device.h |  4 ++++
>  2 files changed, 29 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 86d5960476af..57907904c49d 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -1290,3 +1290,28 @@ void xe_device_declare_wedged(struct xe_device *xe)
>  		drm_dev_wedged_event(&xe->drm, xe->wedged.method, NULL);
>  	}
>  }
> +
> +/**
> + * xe_device_asid_to_vm() - Find VM from ASID
> + * @xe: the &xe_device
> + * @asid: Address space ID
> + *
> + * Find a VM from ASID and take a reference to VM which caller must drop.
> + * Reclaim safe.
> + *
> + * Return: VM on success, ERR_PTR on failure
> + */
> +struct xe_vm *xe_device_asid_to_vm(struct xe_device *xe, u32 asid)
> +{
> +	struct xe_vm *vm;
> +
> +	down_read(&xe->usm.lock);
> +	vm = xa_load(&xe->usm.asid_to_vm, asid);
> +	if (vm)
> +		xe_vm_get(vm);
> +	else
> +		vm = ERR_PTR(-EINVAL);
> +	up_read(&xe->usm.lock);
> +
> +	return vm;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 32cc6323b7f6..538202eebc16 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -6,6 +6,8 @@
>  #ifndef _XE_DEVICE_H_
>  #define _XE_DEVICE_H_
>  
> +struct xe_vm;
> +
>  #include <drm/drm_util.h>
>  
>  #include "xe_device_types.h"
> @@ -195,6 +197,8 @@ void xe_file_put(struct xe_file *xef);
>  
>  int xe_is_injection_active(void);
>  
> +struct xe_vm *xe_device_asid_to_vm(struct xe_device *xe, u32 asid);
> +
>  /*
>   * Occasionally it is seen that the G2H worker starts running after a delay of more than
>   * a second even after being queued and activated by the Linux workqueue subsystem. This
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 05/12] drm/xe: Taint TLB invalidation seqno lock with GFP_KERNEL
  2025-11-04 19:56 ` [PATCH v2 05/12] drm/xe: Taint TLB invalidation seqno lock with GFP_KERNEL Matthew Brost
@ 2025-12-11 22:35   ` Matt Atwood
  0 siblings, 0 replies; 32+ messages in thread
From: Matt Atwood @ 2025-12-11 22:35 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, stuart.summers, lucas.demarchi, matthew.d.roper

On Tue, Nov 04, 2025 at 11:56:09AM -0800, Matthew Brost wrote:
> Taint TLB invalidation seqno lock with GFP_KERNEL as TLB invalidations
> can be in the path of reclaim (e.g., MMU notifiers).
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_tlb_inval.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
> index 918a59e686ea..7ee081b94f90 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval.c
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
> @@ -114,6 +114,16 @@ static void tlb_inval_fini(struct drm_device *drm, void *arg)
>  	xe_tlb_inval_reset(tlb_inval);
>  }
>  
> +static void primelockdep(struct xe_tlb_inval *tlb_inval)
> +{
> +	if (!IS_ENABLED(CONFIG_LOCKDEP))
> +		return;
> +
> +	fs_reclaim_acquire(GFP_KERNEL);
> +	might_lock(&tlb_inval->seqno_lock);
> +	fs_reclaim_release(GFP_KERNEL);
> +}
> +
>  /**
>   * xe_gt_tlb_inval_init - Initialize TLB invalidation state
>   * @gt: GT structure
> @@ -140,6 +150,8 @@ int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
>  	if (err)
>  		return err;
>  
> +	primelockdep(tlb_inval);
> +
>  	tlb_inval->job_wq = drmm_alloc_ordered_workqueue(&xe->drm,
>  							 "gt-tbl-inval-job-wq",
>  							 WQ_MEM_RECLAIM);
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 04/12] drm/xe: Add vm to exec queues association
  2025-11-04 19:56 ` [PATCH v2 04/12] drm/xe: Add vm to exec queues association Matthew Brost
  2025-11-06 22:15   ` Summers, Stuart
@ 2025-12-12 21:03   ` Summers, Stuart
  2025-12-12 21:24     ` Matthew Brost
  1 sibling, 1 reply; 32+ messages in thread
From: Summers, Stuart @ 2025-12-12 21:03 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Brost,  Matthew
  Cc: Roper, Matthew D, lucas.demarchi@intel.com

On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> Maintain a list of exec queues per vm which will be used by TLB
> invalidation code to do context-ID based tlb invalidations.
> 
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device.h           |  7 ----
>  drivers/gpu/drm/xe/xe_device_types.h     |  7 ++++
>  drivers/gpu/drm/xe/xe_exec_queue.c       |  7 +++-
>  drivers/gpu/drm/xe/xe_exec_queue_types.h |  3 ++
>  drivers/gpu/drm/xe/xe_vm.c               | 47
> ++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_vm.h               |  3 ++
>  drivers/gpu/drm/xe/xe_vm_types.h         | 13 +++++++
>  7 files changed, 79 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.h
> b/drivers/gpu/drm/xe/xe_device.h
> index 538202eebc16..764f24f4adfc 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -62,13 +62,6 @@ static inline struct xe_tile
> *xe_device_get_root_tile(struct xe_device *xe)
>         return &xe->tiles[0];
>  }
>  
> -/*
> - * Highest GT/tile count for any platform.  Used only for memory
> allocation
> - * sizing.  Any logic looping over GTs or mapping userspace GT IDs
> into GT
> - * structures should use the per-platform xe->info.max_gt_per_tile
> instead.
> - */
> -#define XE_MAX_GT_PER_TILE 2
> -
>  static inline struct xe_gt *xe_device_get_gt(struct xe_device *xe,
> u8 gt_id)
>  {
>         struct xe_tile *tile;
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> b/drivers/gpu/drm/xe/xe_device_types.h
> index af0ce275b032..145951dd95c9 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -57,6 +57,13 @@ struct xe_vram_region;
>  #define XE_GT1         1
>  #define XE_MAX_TILES_PER_DEVICE        (XE_GT1 + 1)
>  
> +/*
> + * Highest GT/tile count for any platform.  Used only for memory
> allocation
> + * sizing.  Any logic looping over GTs or mapping userspace GT IDs
> into GT
> + * structures should use the per-platform xe->info.max_gt_per_tile
> instead.
> + */
> +#define XE_MAX_GT_PER_TILE 2
> +
>  #define XE_MAX_ASID    (BIT(20))
>  
>  #define IS_PLATFORM_STEP(_xe, _platform, min_step, max_step)   \
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 1b57d7c2cc94..49822baf5967 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -72,8 +72,10 @@ static void __xe_exec_queue_free(struct
> xe_exec_queue *q)
>  
>         if (xe_exec_queue_uses_pxp(q))
>                 xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
> -       if (q->vm)
> +       if (q->vm) {
> +               xe_vm_remove_exec_queue(q->vm, q);
>                 xe_vm_put(q->vm);
> +       }
>  
>         if (q->xef)
>                 xe_file_put(q->xef);
> @@ -143,6 +145,7 @@ static struct xe_exec_queue
> *__xe_exec_queue_alloc(struct xe_device *xe,
>         q->ring_ops = gt->ring_ops[hwe->class];
>         q->ops = gt->exec_queue_ops;
>         INIT_LIST_HEAD(&q->lr.link);
> +       INIT_LIST_HEAD(&q->vm_exec_queue_link);
>         INIT_LIST_HEAD(&q->multi_gt_link);
>         INIT_LIST_HEAD(&q->hw_engine_group_link);
>         INIT_LIST_HEAD(&q->pxp.link);
> @@ -796,6 +799,8 @@ int xe_exec_queue_create_ioctl(struct drm_device
> *dev, void *data,
>         }
>  
>         q->xef = xe_file_get(xef);
> +       if (eci[0].engine_class != DRM_XE_ENGINE_CLASS_VM_BIND)
> +               xe_vm_add_exec_queue(vm, q);

Discussed this offline, but because of potential memory corruption in
the register/deregister path, the plan is to continue for now in the
queue create/destroy. This means we'll get errors from the GuC when we
race for the register/deregister and submission which will be addressed
in some later cleanup. Since this feature is blocked behind a feature
flag anyway for testing, it shouldn't have any other adverse effects.

One other minor comment below...

>  
>         /* user id alloc must always be last in ioctl to prevent UAF
> */
>         err = xa_alloc(&xef->exec_queue.xa, &id, q, xa_limit_32b,
> GFP_KERNEL);
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index c8807268ec6c..a2281fcb55b1 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -147,6 +147,9 @@ struct xe_exec_queue {
>                 struct xe_dep_scheduler *dep_scheduler;
>         } tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
>  
> +       /** @vm_exec_queue_link: Link to track exec queue within a
> VM's list of exec queues. */
> +       struct list_head vm_exec_queue_link;
> +
>         /** @pxp: PXP info tracking */
>         struct {
>                 /** @pxp.type: PXP session type used by this queue */
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 84f4c8f1be33..cccdd931dd5e 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1507,8 +1507,20 @@ struct xe_vm *xe_vm_create(struct xe_device
> *xe, u32 flags, struct xe_file *xef)
>         INIT_WORK(&vm->destroy_work, vm_destroy_work_func);
>  
>         INIT_LIST_HEAD(&vm->preempt.exec_queues);
> +       INIT_LIST_HEAD(&vm->exec_queues.list);
>         vm->preempt.min_run_period_ms = 10;     /* FIXME: Wire up to
> uAPI */
>  
> +       init_rwsem(&vm->exec_queues.lock);
> +       if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
> +               fs_reclaim_acquire(GFP_KERNEL);
> +               might_lock(&vm->exec_queues.lock);
> +               fs_reclaim_release(GFP_KERNEL);
> +
> +               down_read(&vm->exec_queues.lock);
> +               might_lock(&xe_root_mmio_gt(xe)->uc.guc.ct.lock);
> +               up_read(&vm->exec_queues.lock);
> +       }
> +
>         for_each_tile(tile, xe, id)
>                 xe_range_fence_tree_init(&vm->rftree[id]);
>  
> @@ -4387,3 +4399,38 @@ int xe_vm_alloc_cpu_addr_mirror_vma(struct
> xe_vm *vm, uint64_t start, uint64_t r
>  
>         return xe_vm_alloc_vma(vm, &map_req, false);
>  }
> +
> +/**
> + * xe_vm_add_exec_queue() - Add exec queue to VM
> + * @vm: The VM.
> + * @q: The exec_queue
> + */
> +void xe_vm_add_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
> +{
> +       /* User VMs and queues only */

Why? So the expectation is we always do a full invalidation for kernel
VMs? Do we want the potential performance impact of that, particularly
for the migration queues?

I get that in the current version we are doing this as an explicit
result of a user queue create, but I don't otherwise see why we need
these restrictions.

Thanks,
Stuart

> +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_KERNEL));
> +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_PERMANENT));
> +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_VM));
> +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_MIGRATE));
> +       xe_assert(vm->xe, vm->xef);
> +
> +       down_write(&vm->exec_queues.lock);
> +       list_add(&q->vm_exec_queue_link, &vm->exec_queues.list);
> +       ++vm->exec_queues.count[q->gt->info.id];
> +       up_write(&vm->exec_queues.lock);
> +}
> +
> +/**
> + * xe_vm_remove_exec_queue() - Remove exec queue from VM
> + * @vm: The VM.
> + * @q: The exec_queue
> + */
> +void xe_vm_remove_exec_queue(struct xe_vm *vm, struct xe_exec_queue
> *q)
> +{
> +       down_write(&vm->exec_queues.lock);
> +       if (!list_empty(&q->vm_exec_queue_link)) {
> +               list_del(&q->vm_exec_queue_link);
> +               --vm->exec_queues.count[q->gt->info.id];
> +       }
> +       up_write(&vm->exec_queues.lock);
> +}
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index ef8a5019574e..5f3341ef99d2 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -284,6 +284,9 @@ static inline struct dma_resv *xe_vm_resv(struct
> xe_vm *vm)
>  
>  void xe_vm_kill(struct xe_vm *vm, bool unlocked);
>  
> +void xe_vm_add_exec_queue(struct xe_vm *vm, struct xe_exec_queue
> *q);
> +void xe_vm_remove_exec_queue(struct xe_vm *vm, struct xe_exec_queue
> *q);
> +
>  /**
>   * xe_vm_assert_held(vm) - Assert that the vm's reservation object
> is held.
>   * @vm: The vm
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> b/drivers/gpu/drm/xe/xe_vm_types.h
> index 830ed7b05c27..180b48d62480 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -290,6 +290,19 @@ struct xe_vm {
>                 struct list_head pm_activate_link;
>         } preempt;
>  
> +       /** @exec_queues: Manages list of exec queues attached to
> this VM, protected by lock. */
> +       struct {
> +               /** @exec_queues.list: list of exec queues attached
> to this VM */
> +               struct list_head list;
> +               /**
> +                * @exec_queues.count: count of exec queues attached
> to this VM,
> +                * per GT
> +                */
> +               int count[XE_MAX_TILES_PER_DEVICE *
> XE_MAX_GT_PER_TILE];
> +               /** @exec_queues.lock: lock to protect exec_queues
> list */
> +               struct rw_semaphore lock;
> +       } exec_queues;
> +
>         /** @um: unified memory state */
>         struct {
>                 /** @asid: address space ID, unique to each VM */


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 04/12] drm/xe: Add vm to exec queues association
  2025-12-12 21:03   ` Summers, Stuart
@ 2025-12-12 21:24     ` Matthew Brost
  2025-12-12 21:37       ` Summers, Stuart
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Brost @ 2025-12-12 21:24 UTC (permalink / raw)
  To: Summers, Stuart
  Cc: intel-xe@lists.freedesktop.org, Roper,  Matthew D,
	lucas.demarchi@intel.com

On Fri, Dec 12, 2025 at 02:03:32PM -0700, Summers, Stuart wrote:
> On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> > Maintain a list of exec queues per vm which will be used by TLB
> > invalidation code to do context-ID based tlb invalidations.
> > 
> > Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_device.h           |  7 ----
> >  drivers/gpu/drm/xe/xe_device_types.h     |  7 ++++
> >  drivers/gpu/drm/xe/xe_exec_queue.c       |  7 +++-
> >  drivers/gpu/drm/xe/xe_exec_queue_types.h |  3 ++
> >  drivers/gpu/drm/xe/xe_vm.c               | 47
> > ++++++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_vm.h               |  3 ++
> >  drivers/gpu/drm/xe/xe_vm_types.h         | 13 +++++++
> >  7 files changed, 79 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_device.h
> > b/drivers/gpu/drm/xe/xe_device.h
> > index 538202eebc16..764f24f4adfc 100644
> > --- a/drivers/gpu/drm/xe/xe_device.h
> > +++ b/drivers/gpu/drm/xe/xe_device.h
> > @@ -62,13 +62,6 @@ static inline struct xe_tile
> > *xe_device_get_root_tile(struct xe_device *xe)
> >         return &xe->tiles[0];
> >  }
> >  
> > -/*
> > - * Highest GT/tile count for any platform.  Used only for memory
> > allocation
> > - * sizing.  Any logic looping over GTs or mapping userspace GT IDs
> > into GT
> > - * structures should use the per-platform xe->info.max_gt_per_tile
> > instead.
> > - */
> > -#define XE_MAX_GT_PER_TILE 2
> > -
> >  static inline struct xe_gt *xe_device_get_gt(struct xe_device *xe,
> > u8 gt_id)
> >  {
> >         struct xe_tile *tile;
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> > b/drivers/gpu/drm/xe/xe_device_types.h
> > index af0ce275b032..145951dd95c9 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -57,6 +57,13 @@ struct xe_vram_region;
> >  #define XE_GT1         1
> >  #define XE_MAX_TILES_PER_DEVICE        (XE_GT1 + 1)
> >  
> > +/*
> > + * Highest GT/tile count for any platform.  Used only for memory
> > allocation
> > + * sizing.  Any logic looping over GTs or mapping userspace GT IDs
> > into GT
> > + * structures should use the per-platform xe->info.max_gt_per_tile
> > instead.
> > + */
> > +#define XE_MAX_GT_PER_TILE 2
> > +
> >  #define XE_MAX_ASID    (BIT(20))
> >  
> >  #define IS_PLATFORM_STEP(_xe, _platform, min_step, max_step)   \
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > index 1b57d7c2cc94..49822baf5967 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > @@ -72,8 +72,10 @@ static void __xe_exec_queue_free(struct
> > xe_exec_queue *q)
> >  
> >         if (xe_exec_queue_uses_pxp(q))
> >                 xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
> > -       if (q->vm)
> > +       if (q->vm) {
> > +               xe_vm_remove_exec_queue(q->vm, q);
> >                 xe_vm_put(q->vm);
> > +       }
> >  
> >         if (q->xef)
> >                 xe_file_put(q->xef);
> > @@ -143,6 +145,7 @@ static struct xe_exec_queue
> > *__xe_exec_queue_alloc(struct xe_device *xe,
> >         q->ring_ops = gt->ring_ops[hwe->class];
> >         q->ops = gt->exec_queue_ops;
> >         INIT_LIST_HEAD(&q->lr.link);
> > +       INIT_LIST_HEAD(&q->vm_exec_queue_link);
> >         INIT_LIST_HEAD(&q->multi_gt_link);
> >         INIT_LIST_HEAD(&q->hw_engine_group_link);
> >         INIT_LIST_HEAD(&q->pxp.link);
> > @@ -796,6 +799,8 @@ int xe_exec_queue_create_ioctl(struct drm_device
> > *dev, void *data,
> >         }
> >  
> >         q->xef = xe_file_get(xef);
> > +       if (eci[0].engine_class != DRM_XE_ENGINE_CLASS_VM_BIND)
> > +               xe_vm_add_exec_queue(vm, q);
> 
> Discussed this offline, but because of potential memory corruption in
> the register/deregister path, the plan is to continue for now in the
> queue create/destroy. This means we'll get errors from the GuC when we
> race for the register/deregister and submission which will be addressed
> in some later cleanup. Since this feature is blocked behind a feature
> flag anyway for testing, it shouldn't have any other adverse effects.
> 
> One other minor comment below...
> 
> >  
> >         /* user id alloc must always be last in ioctl to prevent UAF
> > */
> >         err = xa_alloc(&xef->exec_queue.xa, &id, q, xa_limit_32b,
> > GFP_KERNEL);
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > index c8807268ec6c..a2281fcb55b1 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > @@ -147,6 +147,9 @@ struct xe_exec_queue {
> >                 struct xe_dep_scheduler *dep_scheduler;
> >         } tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
> >  
> > +       /** @vm_exec_queue_link: Link to track exec queue within a
> > VM's list of exec queues. */
> > +       struct list_head vm_exec_queue_link;
> > +
> >         /** @pxp: PXP info tracking */
> >         struct {
> >                 /** @pxp.type: PXP session type used by this queue */
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index 84f4c8f1be33..cccdd931dd5e 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -1507,8 +1507,20 @@ struct xe_vm *xe_vm_create(struct xe_device
> > *xe, u32 flags, struct xe_file *xef)
> >         INIT_WORK(&vm->destroy_work, vm_destroy_work_func);
> >  
> >         INIT_LIST_HEAD(&vm->preempt.exec_queues);
> > +       INIT_LIST_HEAD(&vm->exec_queues.list);
> >         vm->preempt.min_run_period_ms = 10;     /* FIXME: Wire up to
> > uAPI */
> >  
> > +       init_rwsem(&vm->exec_queues.lock);
> > +       if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
> > +               fs_reclaim_acquire(GFP_KERNEL);
> > +               might_lock(&vm->exec_queues.lock);
> > +               fs_reclaim_release(GFP_KERNEL);
> > +
> > +               down_read(&vm->exec_queues.lock);
> > +               might_lock(&xe_root_mmio_gt(xe)->uc.guc.ct.lock);
> > +               up_read(&vm->exec_queues.lock);
> > +       }
> > +
> >         for_each_tile(tile, xe, id)
> >                 xe_range_fence_tree_init(&vm->rftree[id]);
> >  
> > @@ -4387,3 +4399,38 @@ int xe_vm_alloc_cpu_addr_mirror_vma(struct
> > xe_vm *vm, uint64_t start, uint64_t r
> >  
> >         return xe_vm_alloc_vma(vm, &map_req, false);
> >  }
> > +
> > +/**
> > + * xe_vm_add_exec_queue() - Add exec queue to VM
> > + * @vm: The VM.
> > + * @q: The exec_queue
> > + */
> > +void xe_vm_add_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
> > +{
> > +       /* User VMs and queues only */
> 
> Why? So the expectation is we always do a full invalidation for kernel
> VMs? Do we want the potential performance impact of that, particularly
> for the migration queues?
> 

We don't issue TLB invalidations via H2G on kernel VMs; instead, we
insert ring instructions to perform them. We fully control kernel VMs
and queues, so we can trust that TLB invalidations are correctly
inserted. TLB invalidations are only issued when we have a VMA and
attempt to move the backing memory or perform an unbind. The migration
VM doesn't have VMAs; rather, it has self-mapped page tables that are
dynamically programmed to perform operations such as clears, copies, or
updating another VM's page tables. PXP does have a single VMA, but the
memory backing it is pinned and never unbound.

> I get that in the current version we are doing this as an explicit
> result of a user queue create, but I don't otherwise see why we need
> these restrictions.

It is about ensuring correct usage in the current code. If we need to
add invalidations to a kernel VM, we can remove these, but any change
like that would require a proper review. That said, we likely should
have more asserts in the TLB invalidation layer(s) to ensure we are not
issuing TLB invalidations on kernel VMs by accident.
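
Something like this in send_tlb_inval_ctx_ppgtt would catch it - just a
sketch, reusing the vm->xef check that patch 4 already relies on:

	vm = xe_device_asid_to_vm(xe, asid);
	if (IS_ERR(vm))
		return PTR_ERR(vm);

	/* Only user VMs take the H2G path; kernel VMs invalidate via ring ops */
	xe_assert(xe, vm->xef);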

Matt

> 
> Thanks,
> Stuart
> 
> > +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_KERNEL));
> > +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_PERMANENT));
> > +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_VM));
> > +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_MIGRATE));
> > +       xe_assert(vm->xe, vm->xef);
> > +
> > +       down_write(&vm->exec_queues.lock);
> > +       list_add(&q->vm_exec_queue_link, &vm->exec_queues.list);
> > +       ++vm->exec_queues.count[q->gt->info.id];
> > +       up_write(&vm->exec_queues.lock);
> > +}
> > +
> > +/**
> > + * xe_vm_remove_exec_queue() - Remove exec queue from VM
> > + * @vm: The VM.
> > + * @q: The exec_queue
> > + */
> > +void xe_vm_remove_exec_queue(struct xe_vm *vm, struct xe_exec_queue
> > *q)
> > +{
> > +       down_write(&vm->exec_queues.lock);
> > +       if (!list_empty(&q->vm_exec_queue_link)) {
> > +               list_del(&q->vm_exec_queue_link);
> > +               --vm->exec_queues.count[q->gt->info.id];
> > +       }
> > +       up_write(&vm->exec_queues.lock);
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> > index ef8a5019574e..5f3341ef99d2 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.h
> > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > @@ -284,6 +284,9 @@ static inline struct dma_resv *xe_vm_resv(struct
> > xe_vm *vm)
> >  
> >  void xe_vm_kill(struct xe_vm *vm, bool unlocked);
> >  
> > +void xe_vm_add_exec_queue(struct xe_vm *vm, struct xe_exec_queue
> > *q);
> > +void xe_vm_remove_exec_queue(struct xe_vm *vm, struct xe_exec_queue
> > *q);
> > +
> >  /**
> >   * xe_vm_assert_held(vm) - Assert that the vm's reservation object
> > is held.
> >   * @vm: The vm
> > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> > b/drivers/gpu/drm/xe/xe_vm_types.h
> > index 830ed7b05c27..180b48d62480 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > @@ -290,6 +290,19 @@ struct xe_vm {
> >                 struct list_head pm_activate_link;
> >         } preempt;
> >  
> > +       /** @exec_queues: Manages list of exec queues attached to
> > this VM, protected by lock. */
> > +       struct {
> > +               /** @exec_queues.list: list of exec queues attached
> > to this VM */
> > +               struct list_head list;
> > +               /**
> > +                * @exec_queues.count: count of exec queues attached
> > to this VM,
> > +                * per GT
> > +                */
> > +               int count[XE_MAX_TILES_PER_DEVICE *
> > XE_MAX_GT_PER_TILE];
> > +               /** @exec_queues.lock: lock to protect exec_queues
> > list */
> > +               struct rw_semaphore lock;
> > +       } exec_queues;
> > +
> >         /** @um: unified memory state */
> >         struct {
> >                 /** @asid: address space ID, unique to each VM */
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 04/12] drm/xe: Add vm to exec queues association
  2025-12-12 21:24     ` Matthew Brost
@ 2025-12-12 21:37       ` Summers, Stuart
  0 siblings, 0 replies; 32+ messages in thread
From: Summers, Stuart @ 2025-12-12 21:37 UTC (permalink / raw)
  To: Brost, Matthew
  Cc: intel-xe@lists.freedesktop.org, Roper,  Matthew D,
	lucas.demarchi@intel.com

On Fri, 2025-12-12 at 13:24 -0800, Matthew Brost wrote:
> On Fri, Dec 12, 2025 at 02:03:32PM -0700, Summers, Stuart wrote:
> > On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> > > Maintain a list of exec queues per vm which will be used by TLB
> > > invalidation code to do context-ID based tlb invalidations.
> > > 
> > > Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_device.h           |  7 ----
> > >  drivers/gpu/drm/xe/xe_device_types.h     |  7 ++++
> > >  drivers/gpu/drm/xe/xe_exec_queue.c       |  7 +++-
> > >  drivers/gpu/drm/xe/xe_exec_queue_types.h |  3 ++
> > >  drivers/gpu/drm/xe/xe_vm.c               | 47
> > > ++++++++++++++++++++++++
> > >  drivers/gpu/drm/xe/xe_vm.h               |  3 ++
> > >  drivers/gpu/drm/xe/xe_vm_types.h         | 13 +++++++
> > >  7 files changed, 79 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_device.h
> > > b/drivers/gpu/drm/xe/xe_device.h
> > > index 538202eebc16..764f24f4adfc 100644
> > > --- a/drivers/gpu/drm/xe/xe_device.h
> > > +++ b/drivers/gpu/drm/xe/xe_device.h
> > > @@ -62,13 +62,6 @@ static inline struct xe_tile
> > > *xe_device_get_root_tile(struct xe_device *xe)
> > >         return &xe->tiles[0];
> > >  }
> > >  
> > > -/*
> > > - * Highest GT/tile count for any platform.  Used only for memory
> > > allocation
> > > - * sizing.  Any logic looping over GTs or mapping userspace GT
> > > IDs
> > > into GT
> > > - * structures should use the per-platform xe-
> > > >info.max_gt_per_tile
> > > instead.
> > > - */
> > > -#define XE_MAX_GT_PER_TILE 2
> > > -
> > >  static inline struct xe_gt *xe_device_get_gt(struct xe_device
> > > *xe,
> > > u8 gt_id)
> > >  {
> > >         struct xe_tile *tile;
> > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> > > b/drivers/gpu/drm/xe/xe_device_types.h
> > > index af0ce275b032..145951dd95c9 100644
> > > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > @@ -57,6 +57,13 @@ struct xe_vram_region;
> > >  #define XE_GT1         1
> > >  #define XE_MAX_TILES_PER_DEVICE        (XE_GT1 + 1)
> > >  
> > > +/*
> > > + * Highest GT/tile count for any platform.  Used only for memory
> > > allocation
> > > + * sizing.  Any logic looping over GTs or mapping userspace GT
> > > IDs
> > > into GT
> > > + * structures should use the per-platform xe-
> > > >info.max_gt_per_tile
> > > instead.
> > > + */
> > > +#define XE_MAX_GT_PER_TILE 2
> > > +
> > >  #define XE_MAX_ASID    (BIT(20))
> > >  
> > >  #define IS_PLATFORM_STEP(_xe, _platform, min_step, max_step)   \
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > index 1b57d7c2cc94..49822baf5967 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > @@ -72,8 +72,10 @@ static void __xe_exec_queue_free(struct
> > > xe_exec_queue *q)
> > >  
> > >         if (xe_exec_queue_uses_pxp(q))
> > >                 xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp,
> > > q);
> > > -       if (q->vm)
> > > +       if (q->vm) {
> > > +               xe_vm_remove_exec_queue(q->vm, q);
> > >                 xe_vm_put(q->vm);
> > > +       }
> > >  
> > >         if (q->xef)
> > >                 xe_file_put(q->xef);
> > > @@ -143,6 +145,7 @@ static struct xe_exec_queue
> > > *__xe_exec_queue_alloc(struct xe_device *xe,
> > >         q->ring_ops = gt->ring_ops[hwe->class];
> > >         q->ops = gt->exec_queue_ops;
> > >         INIT_LIST_HEAD(&q->lr.link);
> > > +       INIT_LIST_HEAD(&q->vm_exec_queue_link);
> > >         INIT_LIST_HEAD(&q->multi_gt_link);
> > >         INIT_LIST_HEAD(&q->hw_engine_group_link);
> > >         INIT_LIST_HEAD(&q->pxp.link);
> > > @@ -796,6 +799,8 @@ int xe_exec_queue_create_ioctl(struct
> > > drm_device
> > > *dev, void *data,
> > >         }
> > >  
> > >         q->xef = xe_file_get(xef);
> > > +       if (eci[0].engine_class != DRM_XE_ENGINE_CLASS_VM_BIND)
> > > +               xe_vm_add_exec_queue(vm, q);
> > 
> > Discussed this offline, but because of potential memory corruption
> > in
> > the register/deregister path, the plan is to continue for now in
> > the
> > queue create/destroy. This means we'll get errors from the GuC when
> > we
> > race for the register/deregister and submission which will be
> > addressed
> > in some later cleanup. Since this feature is blocked behind a
> > feature
> > flag anyway for testing, it shouldn't have any other adverse
> > effects.
> > 
> > One other minor comment below...
> > 
> > >  
> > >         /* user id alloc must always be last in ioctl to prevent
> > > UAF
> > > */
> > >         err = xa_alloc(&xef->exec_queue.xa, &id, q, xa_limit_32b,
> > > GFP_KERNEL);
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > index c8807268ec6c..a2281fcb55b1 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > @@ -147,6 +147,9 @@ struct xe_exec_queue {
> > >                 struct xe_dep_scheduler *dep_scheduler;
> > >         } tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
> > >  
> > > +       /** @vm_exec_queue_link: Link to track exec queue within
> > > a
> > > VM's list of exec queues. */
> > > +       struct list_head vm_exec_queue_link;
> > > +
> > >         /** @pxp: PXP info tracking */
> > >         struct {
> > >                 /** @pxp.type: PXP session type used by this
> > > queue */
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > > b/drivers/gpu/drm/xe/xe_vm.c
> > > index 84f4c8f1be33..cccdd931dd5e 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -1507,8 +1507,20 @@ struct xe_vm *xe_vm_create(struct
> > > xe_device
> > > *xe, u32 flags, struct xe_file *xef)
> > >         INIT_WORK(&vm->destroy_work, vm_destroy_work_func);
> > >  
> > >         INIT_LIST_HEAD(&vm->preempt.exec_queues);
> > > +       INIT_LIST_HEAD(&vm->exec_queues.list);
> > >         vm->preempt.min_run_period_ms = 10;     /* FIXME: Wire up
> > > to
> > > uAPI */
> > >  
> > > +       init_rwsem(&vm->exec_queues.lock);
> > > +       if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
> > > +               fs_reclaim_acquire(GFP_KERNEL);
> > > +               might_lock(&vm->exec_queues.lock);
> > > +               fs_reclaim_release(GFP_KERNEL);
> > > +
> > > +               down_read(&vm->exec_queues.lock);
> > > +               might_lock(&xe_root_mmio_gt(xe)->uc.guc.ct.lock);
> > > +               up_read(&vm->exec_queues.lock);
> > > +       }
> > > +
> > >         for_each_tile(tile, xe, id)
> > >                 xe_range_fence_tree_init(&vm->rftree[id]);
> > >  
> > > @@ -4387,3 +4399,38 @@ int xe_vm_alloc_cpu_addr_mirror_vma(struct
> > > xe_vm *vm, uint64_t start, uint64_t r
> > >  
> > >         return xe_vm_alloc_vma(vm, &map_req, false);
> > >  }
> > > +
> > > +/**
> > > + * xe_vm_add_exec_queue() - Add exec queue to VM
> > > + * @vm: The VM.
> > > + * @q: The exec_queue
> > > + */
> > > +void xe_vm_add_exec_queue(struct xe_vm *vm, struct xe_exec_queue
> > > *q)
> > > +{
> > > +       /* User VMs and queues only */
> > 
> > Why? So the expectation is we always do a full invalidation for
> > kernel
> > VMs? Do we want the potential performance impact of that,
> > particularly
> > for the migration queues?
> > 
> 
> We don't issue TLB invalidations via H2G on kernel VMs; instead, we
> insert ring instructions to perform them. We fully control kernel VMs
> and queues, so we can trust that TLB invalidations are correctly
> inserted. TLB invalidations are only issued when we have a VMA and
> attempt to move the backing memory or perform an unbind. The
> migration
> VM doesn't have VMAs; rather, it has self-mapped page tables that are
> dynamically programmed to perform operations such as clears, copies,
> or
> updating another VM's page tables. PXP does have a single VMA, but
> the memory backing it is pinned and never unbound.
> 
> > I get that in the current version we are doing this as an explicit
> > result of a user queue create, but I don't otherwise see why we
> > need
> > these restrictions.
> 
> It is about ensuring correct usage in the current code. If we need to
> add invalidations to a kernel VM, we can remove these, but any change
> like that would require a proper review. That said, we likely should
> have more asserts in the TLB invalidation layer(s) to ensure we are
> not
> issuing TLB invalidations on kernel VMs by accident.

Ok yeah, this all makes sense, and thanks for the detail there. I agree
with what you said about ensuring we aren't accidentally going down
this path from the kernel VMs.
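
Something like the below is roughly what I'd picture for the extra
assert - a rough sketch only, not part of this series. The vm->xef
check is the same one xe_vm_add_exec_queue() already makes; I'm
assuming XE_VM_FLAG_MIGRATION is still the flag the migration VM is
created with.

        /* Sketch: catch accidental H2G invalidations on kernel VMs. */
        static void assert_user_vm_tlb_inval(struct xe_device *xe,
                                             struct xe_vm *vm)
        {
                /* Kernel/migration VMs invalidate via ring instructions, not H2G. */
                xe_assert(xe, vm->xef);
                xe_assert(xe, !(vm->flags & XE_VM_FLAG_MIGRATION));
        }

Calling something like that right after xe_device_asid_to_vm() in the
ASID/context backends would catch a kernel VM sneaking into this path.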

Reviewed-by: Stuart Summers <stuart.summers@intel.com>

> 
> Matt
> 
> > 
> > Thanks,
> > Stuart
> > 
> > > +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_KERNEL));
> > > +       xe_assert(vm->xe, !(q->flags &
> > > EXEC_QUEUE_FLAG_PERMANENT));
> > > +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_VM));
> > > +       xe_assert(vm->xe, !(q->flags & EXEC_QUEUE_FLAG_MIGRATE));
> > > +       xe_assert(vm->xe, vm->xef);
> > > +
> > > +       down_write(&vm->exec_queues.lock);
> > > +       list_add(&q->vm_exec_queue_link, &vm->exec_queues.list);
> > > +       ++vm->exec_queues.count[q->gt->info.id];
> > > +       up_write(&vm->exec_queues.lock);
> > > +}
> > > +
> > > +/**
> > > + * xe_vm_remove_exec_queue() - Remove exec queue from VM
> > > + * @vm: The VM.
> > > + * @q: The exec_queue
> > > + */
> > > +void xe_vm_remove_exec_queue(struct xe_vm *vm, struct
> > > xe_exec_queue
> > > *q)
> > > +{
> > > +       down_write(&vm->exec_queues.lock);
> > > +       if (!list_empty(&q->vm_exec_queue_link)) {
> > > +               list_del(&q->vm_exec_queue_link);
> > > +               --vm->exec_queues.count[q->gt->info.id];
> > > +       }
> > > +       up_write(&vm->exec_queues.lock);
> > > +}
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.h
> > > b/drivers/gpu/drm/xe/xe_vm.h
> > > index ef8a5019574e..5f3341ef99d2 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.h
> > > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > > @@ -284,6 +284,9 @@ static inline struct dma_resv
> > > *xe_vm_resv(struct
> > > xe_vm *vm)
> > >  
> > >  void xe_vm_kill(struct xe_vm *vm, bool unlocked);
> > >  
> > > +void xe_vm_add_exec_queue(struct xe_vm *vm, struct xe_exec_queue
> > > *q);
> > > +void xe_vm_remove_exec_queue(struct xe_vm *vm, struct
> > > xe_exec_queue
> > > *q);
> > > +
> > >  /**
> > >   * xe_vm_assert_held(vm) - Assert that the vm's reservation
> > > object
> > > is held.
> > >   * @vm: The vm
> > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> > > b/drivers/gpu/drm/xe/xe_vm_types.h
> > > index 830ed7b05c27..180b48d62480 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > > @@ -290,6 +290,19 @@ struct xe_vm {
> > >                 struct list_head pm_activate_link;
> > >         } preempt;
> > >  
> > > +       /** @exec_queues: Manages list of exec queues attached to
> > > this VM, protected by lock. */
> > > +       struct {
> > > +               /** @exec_queues.list: list of exec queues
> > > attached
> > > to this VM */
> > > +               struct list_head list;
> > > +               /**
> > > +                * @exec_queues.count: count of exec queues
> > > attached
> > > to this VM,
> > > +                * per GT
> > > +                */
> > > +               int count[XE_MAX_TILES_PER_DEVICE *
> > > XE_MAX_GT_PER_TILE];
> > > +               /** @exec_queues.lock: lock to protect
> > > exec_queues
> > > list */
> > > +               struct rw_semaphore lock;
> > > +       } exec_queues;
> > > +
> > >         /** @um: unified memory state */
> > >         struct {
> > >                 /** @asid: address space ID, unique to each VM */
> > 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 09/12] drm/xe: Add xe_tlb_inval_idle helper
  2025-11-10 18:48   ` Summers, Stuart
@ 2025-12-12 22:00     ` Summers, Stuart
  0 siblings, 0 replies; 32+ messages in thread
From: Summers, Stuart @ 2025-12-12 22:00 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Brost,  Matthew
  Cc: Roper, Matthew D, lucas.demarchi@intel.com

On Mon, 2025-11-10 at 18:48 +0000, Summers, Stuart wrote:
> On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> > Introduce the xe_tlb_inval_idle helper to detect whether any TLB
> > invalidations are currently in flight. This is used in context-based
> > TLB invalidations to determine whether dummy TLB invalidations need
> > to be sent to maintain proper TLB invalidation fence ordering.
> > 
> > v2:
> >  - Implement xe_tlb_inval_idle based on pending list
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> 
> I don't have any issue with this patch or the next one, but I want to
> hold off on full review until we have a working solution for the
> registration/deregistration placement of this.

Reviewed-by: Stuart Summers <stuart.summers@intel.com>

> 
> Thanks,
> Stuart
> 
> > ---
> >  drivers/gpu/drm/xe/xe_tlb_inval.c | 21 +++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_tlb_inval.h |  2 ++
> >  2 files changed, 23 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c
> > b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > index 7ee081b94f90..7ae6b9f884e9 100644
> > --- a/drivers/gpu/drm/xe/xe_tlb_inval.c
> > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > @@ -44,11 +44,14 @@ static void xe_tlb_inval_fence_fini(struct
> > xe_tlb_inval_fence *fence)
> >  static void
> >  xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence)
> >  {
> > +       struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> >         bool stack = test_bit(FENCE_STACK_BIT, &fence->base.flags);
> >  
> >         lockdep_assert_held(&fence->tlb_inval->pending_lock);
> >  
> >         list_del(&fence->link);
> > +       if (list_empty(&tlb_inval->pending_fences))
> > +               cancel_delayed_work(&tlb_inval->fence_tdr);
> >         trace_xe_tlb_inval_fence_signal(fence->tlb_inval->xe,
> > fence);
> >         xe_tlb_inval_fence_fini(fence);
> >         dma_fence_signal(&fence->base);
> > @@ -443,3 +446,21 @@ void xe_tlb_inval_fence_init(struct
> > xe_tlb_inval
> > *tlb_inval,
> >                 dma_fence_get(&fence->base);
> >         fence->tlb_inval = tlb_inval;
> >  }
> > +
> > +/**
> > + * xe_tlb_inval_idle() - Check if TLB invalidation is idle
> > + * @tlb_inval: TLB invalidation client
> > + *
> > + * Check the pending fence list to determine whether the TLB invalidation
> > + * client is idle (i.e., no TLB invalidations are in flight). Expected to
> > + * be called in the backend after the fence has been added to the pending
> > + * list, and takes this into account.
> > + *
> > + * Return: True if TLB invalidation client is idle, False otherwise
> > + */
> > +bool xe_tlb_inval_idle(struct xe_tlb_inval *tlb_inval)
> > +{
> > +       lockdep_assert_held(&tlb_inval->seqno_lock);
> > +
> > +       guard(spinlock_irq)(&tlb_inval->pending_lock);
> > +       return list_is_singular(&tlb_inval->pending_fences);
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > index 05614915463a..44a6d9177489 100644
> > --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > @@ -43,4 +43,6 @@ xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence
> > *fence)
> >  
> >  void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int
> > seqno);
> >  
> > +bool xe_tlb_inval_idle(struct xe_tlb_inval *tlb_inval);
> > +
> >  #endif /* _XE_TLB_INVAL_ */
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 11/12] drm/xe: Add context-based invalidation to GuC TLB invalidation backend
  2025-11-04 19:56 ` [PATCH v2 11/12] drm/xe: Add context-based invalidation to GuC TLB invalidation backend Matthew Brost
  2025-11-06 21:50   ` Summers, Stuart
@ 2025-12-12 22:30   ` Summers, Stuart
  1 sibling, 0 replies; 32+ messages in thread
From: Summers, Stuart @ 2025-12-12 22:30 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Brost,  Matthew
  Cc: Roper, Matthew D, lucas.demarchi@intel.com

On Tue, 2025-11-04 at 11:56 -0800, Matthew Brost wrote:
> Introduce context-based invalidation support to the GuC TLB
> invalidation
> backend. This is implemented by iterating over each exec queue per GT
> within a VM, skipping inactive queues, and issuing a context-based
> (GuC
> ID) H2G TLB invalidation. All H2G messages, except the final one, are
> sent with an invalid seqno, which the G2H handler drops to ensure the
> TLB invalidation fence is only signaled once all H2G messages are
> completed.
> 
> A watermark mechanism is also added to switch between context-based
> TLB
> invalidations and full device-wide invalidations, as the return on
> investment for context-based invalidation diminishes when many exec
> queues are mapped.
> 
> v2:
>  - Fix checkpatch warnings
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Mentioned in an earlier patch, but after offline discussion, let's go
with what you have here. We will address this as a follow-up, either
working with the GuC team to rework how they handle the
registration/deregistration requirements or reworking the KMD sequence
to ensure we aren't exposed to the memory corruption mentioned along
that path.

Reviewed-by: Stuart Summers <stuart.summers@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_device_types.h  |   2 +
>  drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 122
> +++++++++++++++++++++++++-
>  drivers/gpu/drm/xe/xe_pci.c           |   1 +
>  drivers/gpu/drm/xe/xe_pci_types.h     |   1 +
>  4 files changed, 124 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> b/drivers/gpu/drm/xe/xe_device_types.h
> index 145951dd95c9..ca285f4bce11 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -316,6 +316,8 @@ struct xe_device {
>                 u8 has_mem_copy_instr:1;
>                 /** @info.has_pxp: Device has PXP support */
>                 u8 has_pxp:1;
> +               /** @info.has_ctx_tlb_inval: Has context based TLB
> invalidations */
> +               u8 has_ctx_tlb_inval:1;
>                 /** @info.has_range_tlb_inval: Has range based TLB
> invalidations */
>                 u8 has_range_tlb_inval:1;
>                 /** @info.has_sriov: Supports SR-IOV */
> diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> index 6978ee8edf2e..1baaf577cded 100644
> --- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> @@ -6,14 +6,17 @@
>  #include "abi/guc_actions_abi.h"
>  
>  #include "xe_device.h"
> +#include "xe_exec_queue_types.h"
>  #include "xe_gt_stats.h"
>  #include "xe_gt_types.h"
>  #include "xe_guc.h"
>  #include "xe_guc_ct.h"
> +#include "xe_guc_exec_queue_types.h"
>  #include "xe_guc_tlb_inval.h"
>  #include "xe_force_wake.h"
>  #include "xe_mmio.h"
>  #include "xe_tlb_inval.h"
> +#include "xe_vm.h"
>  
>  #include "regs/xe_guc_regs.h"
>  
> @@ -136,10 +139,16 @@ static int send_tlb_inval_ppgtt(struct xe_guc
> *guc, u32 seqno, u64 start,
>  {
>  #define MAX_TLB_INVALIDATION_LEN       7
>         struct xe_gt *gt = guc_to_gt(guc);
> +       struct xe_device *xe = guc_to_xe(guc);
>         u32 action[MAX_TLB_INVALIDATION_LEN];
>         u64 length = end - start;
>         int len = 0;
>  
> +       xe_gt_assert(gt, (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE &&
> +                         !xe->info.has_ctx_tlb_inval) ||
> +                    (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX &&
> +                     xe->info.has_ctx_tlb_inval));
> +
>         action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
>         action[len++] = seqno;
>         if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> @@ -176,6 +185,100 @@ static int send_tlb_inval_asid_ppgtt(struct
> xe_tlb_inval *tlb_inval, u32 seqno,
>                                     XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
>  }
>  
> +static bool queue_mapped_in_guc(struct xe_guc *guc, struct
> xe_exec_queue *q)
> +{
> +       return q->gt == guc_to_gt(guc);
> +}
> +
> +static int send_tlb_inval_ctx_ppgtt(struct xe_tlb_inval *tlb_inval,
> u32 seqno,
> +                                   u64 start, u64 end, u32 asid)
> +{
> +       struct xe_guc *guc = tlb_inval->private;
> +       struct xe_device *xe = guc_to_xe(guc);
> +       struct xe_exec_queue *q, *last_q = NULL;
> +       struct xe_vm *vm;
> +       int err = 0;
> +
> +       lockdep_assert_held(&tlb_inval->seqno_lock);
> +
> +       if (xe->info.force_execlist)
> +               return -ECANCELED;
> +
> +       vm = xe_device_asid_to_vm(xe, asid);
> +       if (IS_ERR(vm))
> +               return PTR_ERR(vm);
> +
> +       down_read(&vm->exec_queues.lock);
> +
> +       /*
> +        * XXX: Randomly picking a threshold for now. This will need
> to be
> +        * tuned based on expected UMD queue counts and performance
> profiling.
> +        */
> +#define EXEC_QUEUE_COUNT_FULL_THRESHOLD        8
> +       if (vm->exec_queues.count[guc_to_gt(guc)->info.id] >=
> +           EXEC_QUEUE_COUNT_FULL_THRESHOLD) {
> +               u32 action[] = {
> +                       XE_GUC_ACTION_TLB_INVALIDATION,
> +                       seqno,
> +                       MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> +               };
> +
> +               err = send_tlb_inval(guc, action,
> ARRAY_SIZE(action));
> +               goto err_unlock;
> +       }
> +#undef EXEC_QUEUE_COUNT_FULL_THRESHOLD
> +
> +       list_for_each_entry_reverse(q, &vm->exec_queues.list,
> +                                   vm_exec_queue_link)
> +               if (queue_mapped_in_guc(guc, q) && q->ops->active(q))
> {
> +                       last_q = q;
> +                       break;
> +               }
> +
> +       if (!last_q) {
> +               /*
> +                * We can't break fence ordering for TLB invalidation
> +                * jobs; if TLB invalidations are in flight, issue a dummy
> +                * invalidation to maintain ordering. Nor can we safely
> +                * move seqno_recv when returning -ECANCELED if TLB
> +                * invalidations are in flight. Use a GGTT invalidation as
> +                * the dummy invalidation, given ASID invalidations are
> +                * unsupported here.
> +                */
> +               if (xe_tlb_inval_idle(tlb_inval))
> +                       err = -ECANCELED;
> +               else
> +                       err = send_tlb_inval_ggtt(tlb_inval, seqno);
> +               goto err_unlock;
> +       }
> +
> +       list_for_each_entry(q, &vm->exec_queues.list,
> vm_exec_queue_link) {
> +               int __seqno = last_q == q ? seqno :
> +                       TLB_INVALIDATION_SEQNO_INVALID;
> +               u32 type = XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX;
> +
> +               /*
> +                * XXX: Technically we can race here and the queue can
> +                * become inactive, which is not ideal. The TLB invalidation
> +                * will time out in this case unless we get GuC support to
> +                * convert it to a NOP...
> +                */
> +               if (!queue_mapped_in_guc(guc, q) || !q->ops-
> >active(q))
> +                       continue;
> +
> +               xe_assert(xe, q->vm == vm);
> +
> +               err = send_tlb_inval_ppgtt(guc, __seqno, start, end,
> +                                          q->guc->id, type);
> +               if (err)
> +                       goto err_unlock;
> +       }
> +
> +err_unlock:
> +       up_read(&vm->exec_queues.lock);
> +       xe_vm_put(vm);
> +
> +       return err;
> +}
> +
>  static bool tlb_inval_initialized(struct xe_tlb_inval *tlb_inval)
>  {
>         struct xe_guc *guc = tlb_inval->private;
> @@ -203,7 +306,7 @@ static long tlb_inval_timeout_delay(struct
> xe_tlb_inval *tlb_inval)
>         return hw_tlb_timeout + 2 * delay;
>  }
>  
> -static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> +static const struct xe_tlb_inval_ops guc_tlb_inval_asid_ops = {
>         .all = send_tlb_inval_all,
>         .ggtt = send_tlb_inval_ggtt,
>         .ppgtt = send_tlb_inval_asid_ppgtt,
> @@ -212,6 +315,15 @@ static const struct xe_tlb_inval_ops
> guc_tlb_inval_ops = {
>         .timeout_delay = tlb_inval_timeout_delay,
>  };
>  
> +static const struct xe_tlb_inval_ops guc_tlb_inval_ctx_ops = {
> +       .ggtt = send_tlb_inval_ggtt,
> +       .all = send_tlb_inval_all,
> +       .ppgtt = send_tlb_inval_ctx_ppgtt,
> +       .initialized = tlb_inval_initialized,
> +       .flush = tlb_inval_flush,
> +       .timeout_delay = tlb_inval_timeout_delay,
> +};
> +
>  /**
>   * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation early
>   * @guc: GuC object
> @@ -223,8 +335,14 @@ static const struct xe_tlb_inval_ops
> guc_tlb_inval_ops = {
>  void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
>                                  struct xe_tlb_inval *tlb_inval)
>  {
> +       struct xe_device *xe = guc_to_xe(guc);
> +
>         tlb_inval->private = guc;
> -       tlb_inval->ops = &guc_tlb_inval_ops;
> +
> +       if (xe->info.has_ctx_tlb_inval)
> +               tlb_inval->ops = &guc_tlb_inval_ctx_ops;
> +       else
> +               tlb_inval->ops = &guc_tlb_inval_asid_ops;
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/xe/xe_pci.c
> b/drivers/gpu/drm/xe/xe_pci.c
> index 1959de3f7a27..9a11066c7d4a 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -863,6 +863,7 @@ static int xe_info_init(struct xe_device *xe,
>                 xe->info.has_device_atomics_on_smem = 1;
>  
>         xe->info.has_range_tlb_inval = graphics_desc-
> >has_range_tlb_inval;
> +       xe->info.has_ctx_tlb_inval = graphics_desc-
> >has_ctx_tlb_inval;
>         xe->info.has_usm = graphics_desc->has_usm;
>         xe->info.has_64bit_timestamp = graphics_desc-
> >has_64bit_timestamp;
>  
> diff --git a/drivers/gpu/drm/xe/xe_pci_types.h
> b/drivers/gpu/drm/xe/xe_pci_types.h
> index 9892c063a9c5..c08857c06c7e 100644
> --- a/drivers/gpu/drm/xe/xe_pci_types.h
> +++ b/drivers/gpu/drm/xe/xe_pci_types.h
> @@ -63,6 +63,7 @@ struct xe_graphics_desc {
>         u8 has_atomic_enable_pte_bit:1;
>         u8 has_indirect_ring_state:1;
>         u8 has_range_tlb_inval:1;
> +       u8 has_ctx_tlb_inval:1;
>         u8 has_usm:1;
>         u8 has_64bit_timestamp:1;
>  };


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2025-12-12 22:30 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-04 19:56 [PATCH v2 00/12] Context based TLB invalidations Matthew Brost
2025-11-04 19:56 ` [PATCH v2 01/12] drm/xe: Add normalize_invalidation_range Matthew Brost
2025-11-06 20:03   ` Summers, Stuart
2025-11-04 19:56 ` [PATCH v2 02/12] drm/xe: Make usm.asid_to_vm allocation use GFP_NOWAIT Matthew Brost
2025-11-06 22:08   ` Summers, Stuart
2025-11-06 22:13   ` Summers, Stuart
2025-11-04 19:56 ` [PATCH v2 03/12] drm/xe: Add xe_device_asid_to_vm helper Matthew Brost
2025-12-11 22:07   ` Matt Atwood
2025-11-04 19:56 ` [PATCH v2 04/12] drm/xe: Add vm to exec queues association Matthew Brost
2025-11-06 22:15   ` Summers, Stuart
2025-12-12 21:03   ` Summers, Stuart
2025-12-12 21:24     ` Matthew Brost
2025-12-12 21:37       ` Summers, Stuart
2025-11-04 19:56 ` [PATCH v2 05/12] drm/xe: Taint TLB invalidation seqno lock with GFP_KERNEL Matthew Brost
2025-12-11 22:35   ` Matt Atwood
2025-11-04 19:56 ` [PATCH v2 06/12] drm/xe: Do not forward invalid TLB invalidation seqnos to upper layers Matthew Brost
2025-11-06 22:05   ` Summers, Stuart
2025-11-04 19:56 ` [PATCH v2 07/12] drm/xe: Rename send_tlb_inval_ppgtt to send_tlb_inval_asid_ppgtt Matthew Brost
2025-11-06 20:22   ` Summers, Stuart
2025-11-04 19:56 ` [PATCH v2 08/12] drm/xe: Add send_tlb_inval_ppgtt helper Matthew Brost
2025-11-06 20:25   ` Summers, Stuart
2025-11-04 19:56 ` [PATCH v2 09/12] drm/xe: Add xe_tlb_inval_idle helper Matthew Brost
2025-11-10 18:48   ` Summers, Stuart
2025-12-12 22:00     ` Summers, Stuart
2025-11-04 19:56 ` [PATCH v2 10/12] drm/xe: Add exec queue active vfunc Matthew Brost
2025-11-04 19:56 ` [PATCH v2 11/12] drm/xe: Add context-based invalidation to GuC TLB invalidation backend Matthew Brost
2025-11-06 21:50   ` Summers, Stuart
2025-11-07  7:01     ` Matthew Brost
2025-11-10 19:29       ` Summers, Stuart
2025-11-11  1:01         ` Matthew Brost
2025-12-12 22:30   ` Summers, Stuart
2025-11-04 19:56 ` [PATCH v2 12/12] drm/xe: Enable context TLB invalidations for CI Matthew Brost

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox