[PATCH 0/5] Add TLB invalidation abstraction


* [PATCH 0/5] Add TLB invalidation abstraction
From: stuartsummers @ 2025-07-23 18:22 UTC
  Cc: matthew.brost, matthew.auld, maarten.lankhorst, farah.kassabri,
	intel-xe, stuartsummers

This is a collection of patches from Matt that have been
floating around internally and on the mailing list.
The goal is to separate the mechanism that performs a TLB
invalidation from the higher-level events that trigger one
(such as page table updates).
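
To illustrate that separation, here is a minimal user-space
sketch. It is not code from this series: every name in it
(inval_client, inval_backend_ops, page_table_update,
counting_range) is hypothetical, and the function-pointer table
is only one assumed way of hiding the mechanism from the trigger.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the xe driver's API. */
struct inval_backend_ops {
	/* Mechanism: how a range invalidation actually gets issued. */
	int (*range)(void *priv, unsigned long long start,
		     unsigned long long end);
};

struct inval_client {
	const struct inval_backend_ops *ops;
	void *priv;	/* backend-owned state, opaque to callers */
};

/* Trigger side: a page table update only knows about the client. */
static int page_table_update(struct inval_client *c,
			     unsigned long long start,
			     unsigned long long end)
{
	assert(c && c->ops && c->ops->range);

	/* ... the page tables themselves would be modified here ... */
	return c->ops->range(c->priv, start, end);
}

/* One possible backend: counts requests instead of issuing them. */
static int counting_range(void *priv, unsigned long long start,
			  unsigned long long end)
{
	int *count = priv;

	if (start >= end)
		return -1;	/* reject an empty or inverted range */

	(*count)++;
	return 0;
}

static const struct inval_backend_ops counting_ops = {
	.range = counting_range,
};
```

In this sketch, swapping counting_ops for another ops table changes
how the invalidation is carried out without touching
page_table_update().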

Most of these are carried over unmodified from Matt's
originals. I've done some minor rebase work here and there
and added my Signed-off-by where the rebase was a little
more extensive.

This is built on top of [1] and [2]. I've also brought
in [3] and [4] individually from a separate series
from Matt.

Stuart

[1] https://patchwork.freedesktop.org/series/151670/#rev1
[2] https://patchwork.freedesktop.org/series/150402/#rev3
[3] https://lists.freedesktop.org/archives/intel-xe/2024-July/041363.html
[4] https://lists.freedesktop.org/archives/intel-xe/2024-July/041364.html

Matthew Brost (5):
  drm/xe: Add xe_gt_tlb_invalidation_done_handler
  drm/xe: Decouple TLB invalidations from GT
  drm/xe: Prep TLB invalidation fence before sending
  drm/xe: Add helpers to send TLB invalidations
  drm/xe: Split TLB invalidation code in frontend and backend

 drivers/gpu/drm/xe/Makefile                   |   5 +-
 drivers/gpu/drm/xe/xe_ggtt.c                  |   6 +-
 drivers/gpu/drm/xe/xe_gt.c                    |   6 +-
 drivers/gpu/drm/xe/xe_gt_tlb_inval.c          | 600 ------------------
 drivers/gpu/drm/xe/xe_gt_tlb_inval.h          |  40 --
 drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h      |  34 -
 drivers/gpu/drm/xe/xe_gt_tlb_inval_types.h    |  64 --
 drivers/gpu/drm/xe/xe_gt_types.h              |   2 +-
 drivers/gpu/drm/xe/xe_guc_ct.c                |   2 +-
 drivers/gpu/drm/xe/xe_guc_tlb_inval.c         | 263 ++++++++
 drivers/gpu/drm/xe/xe_guc_tlb_inval.h         |  19 +
 drivers/gpu/drm/xe/xe_lmtt.c                  |  12 +-
 drivers/gpu/drm/xe/xe_migrate.h               |  10 +-
 drivers/gpu/drm/xe/xe_pt.c                    |  63 +-
 drivers/gpu/drm/xe/xe_svm.c                   |   1 -
 drivers/gpu/drm/xe/xe_tlb_inval.c             | 411 ++++++++++++
 drivers/gpu/drm/xe/xe_tlb_inval.h             |  47 ++
 ..._gt_tlb_inval_job.c => xe_tlb_inval_job.c} | 154 +++--
 drivers/gpu/drm/xe/xe_tlb_inval_job.h         |  34 +
 drivers/gpu/drm/xe/xe_tlb_inval_types.h       | 138 ++++
 drivers/gpu/drm/xe/xe_trace.h                 |  24 +-
 drivers/gpu/drm/xe/xe_vm.c                    |  26 +-
 22 files changed, 1073 insertions(+), 888 deletions(-)
 delete mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval.c
 delete mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval.h
 delete mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
 delete mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.c
 create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.h
 create mode 100644 drivers/gpu/drm/xe/xe_tlb_inval.c
 create mode 100644 drivers/gpu/drm/xe/xe_tlb_inval.h
 rename drivers/gpu/drm/xe/{xe_gt_tlb_inval_job.c => xe_tlb_inval_job.c} (51%)
 create mode 100644 drivers/gpu/drm/xe/xe_tlb_inval_job.h
 create mode 100644 drivers/gpu/drm/xe/xe_tlb_inval_types.h

-- 
2.34.1



* [PATCH 1/5] drm/xe: Add xe_gt_tlb_invalidation_done_handler
From: stuartsummers @ 2025-07-23 18:22 UTC
  Cc: matthew.brost, matthew.auld, maarten.lankhorst, farah.kassabri,
	intel-xe, Stuart Summers

From: Matthew Brost <matthew.brost@intel.com>

Decouple GT TLB invalidation seqno handling from the G2H message
handler, so that the seqno logic no longer depends on the GuC
message format.
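
A simplified user-space model of the split, as a sketch only:
struct gt_model and both function names are hypothetical stand-ins,
and the locking, fence signalling, TDR handling and seqno
wraparound check of the real handler are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-GT invalidation state. */
struct gt_model {
	int seqno_recv;
};

/* Transport-agnostic part: takes an already parsed seqno. */
static void gt_inval_done(struct gt_model *gt, int seqno)
{
	assert(gt);
	gt->seqno_recv = seqno;
}

/* Message-specific wrapper: validates the length, then delegates. */
static int guc_inval_done(struct gt_model *gt, const uint32_t *msg,
			  uint32_t len)
{
	if (len != 1)
		return -EPROTO;	/* malformed message */

	gt_inval_done(gt, (int)msg[0]);
	return 0;
}
```

With this shape, a caller that obtains the seqno some other way can
call gt_inval_done() directly and skip the message parsing.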

v2:
 - Add kernel doc

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_inval.c | 47 ++++++++++++++++++----------
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval.c b/drivers/gpu/drm/xe/xe_gt_tlb_inval.c
index 2da0c5243a52..ea2231be79fe 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_inval.c
@@ -487,27 +487,18 @@ void xe_gt_tlb_inval_vm(struct xe_gt *gt, struct xe_vm *vm)
 }
 
 /**
- * xe_guc_tlb_inval_done_handler - TLB invalidation done handler
- * @guc: guc
- * @msg: message indicating TLB invalidation done
- * @len: length of message
- *
- * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
- * invalidation fences for seqno. Algorithm for this depends on seqno being
- * received in-order and asserts this assumption.
+ * xe_gt_tlb_inval_done_handler - GT TLB invalidation done handler
+ * @gt: gt
+ * @seqno: seqno of invalidation that is done
  *
- * Return: 0 on success, -EPROTO for malformed messages.
+ * Update recv seqno, signal any GT TLB invalidation fences, and restart TDR
  */
-int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
+static void xe_gt_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
 {
-	struct xe_gt *gt = guc_to_gt(guc);
 	struct xe_device *xe = gt_to_xe(gt);
 	struct xe_gt_tlb_inval_fence *fence, *next;
 	unsigned long flags;
 
-	if (unlikely(len != 1))
-		return -EPROTO;
-
 	/*
 	 * This can also be run both directly from the IRQ handler and also in
 	 * process_g2h_msg(). Only one may process any individual CT message,
@@ -524,12 +515,12 @@ int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 	 * process_g2h_msg().
 	 */
 	spin_lock_irqsave(&gt->tlb_inval.pending_lock, flags);
-	if (tlb_inval_seqno_past(gt, msg[0])) {
+	if (tlb_inval_seqno_past(gt, seqno)) {
 		spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
-		return 0;
+		return;
 	}
 
-	WRITE_ONCE(gt->tlb_inval.seqno_recv, msg[0]);
+	WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
 
 	list_for_each_entry_safe(fence, next,
 				 &gt->tlb_inval.pending_fences, link) {
@@ -549,6 +540,28 @@ int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 		cancel_delayed_work(&gt->tlb_inval.fence_tdr);
 
 	spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
+}
+
+/**
+ * xe_guc_tlb_inval_done_handler - TLB invalidation done handler
+ * @guc: guc
+ * @msg: message indicating TLB invalidation done
+ * @len: length of message
+ *
+ * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
+ * invalidation fences for seqno. Algorithm for this depends on seqno being
+ * received in-order and asserts this assumption.
+ *
+ * Return: 0 on success, -EPROTO for malformed messages.
+ */
+int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
+{
+	struct xe_gt *gt = guc_to_gt(guc);
+
+	if (unlikely(len != 1))
+		return -EPROTO;
+
+	xe_gt_tlb_inval_done_handler(gt, msg[0]);
 
 	return 0;
 }
-- 
2.34.1



* [PATCH 2/5] drm/xe: Decouple TLB invalidations from GT
From: stuartsummers @ 2025-07-23 18:22 UTC
  Cc: matthew.brost, matthew.auld, maarten.lankhorst, farah.kassabri,
	intel-xe, Stuart Summers

From: Matthew Brost <matthew.brost@intel.com>

Decouple TLB invalidations from the GT by updating the TLB invalidation
layer to accept a `struct xe_tlb_inval` instead of a `struct xe_gt`.
Also, rename *gt_tlb* to *tlb*. The internals of the TLB invalidation
code still operate on a GT, but this is now hidden from the rest of the
driver.
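
The back-pointer pattern this relies on can be modelled in user
space as follows. This is a sketch under assumptions: gt_model,
tlb_inval_model and tlb_inval_owner_id() are hypothetical names,
and only the `private` field and the init-time assignment mirror
what the diff below does.

```c
#include <assert.h>

/* Hypothetical stand-in for the invalidation client. */
struct tlb_inval_model {
	void *private;	/* back-pointer to the owning GT */
	int seqno;
};

/* Hypothetical stand-in for the GT, embedding the client. */
struct gt_model {
	int id;
	struct tlb_inval_model tlb_inval;
};

/* Software-only init: record the owner and seed the seqno. */
static void tlb_inval_init_early(struct gt_model *gt)
{
	gt->tlb_inval.private = gt;
	gt->tlb_inval.seqno = 1;
}

/* Callers pass &gt->tlb_inval; only this layer recovers the GT. */
static int tlb_inval_owner_id(struct tlb_inval_model *tlb_inval)
{
	struct gt_model *gt;

	assert(tlb_inval && tlb_inval->private);
	gt = tlb_inval->private;

	return gt->id;
}
```

Callers outside the invalidation layer then never need a GT
pointer of their own; they hand over the embedded client and the
layer resolves the owner internally.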

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
---
 drivers/gpu/drm/xe/Makefile                   |   4 +-
 drivers/gpu/drm/xe/xe_ggtt.c                  |   6 +-
 drivers/gpu/drm/xe/xe_gt.c                    |   6 +-
 drivers/gpu/drm/xe/xe_gt_tlb_inval.h          |  40 -----
 drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h      |  34 ----
 drivers/gpu/drm/xe/xe_gt_types.h              |   2 +-
 drivers/gpu/drm/xe/xe_guc_ct.c                |   2 +-
 drivers/gpu/drm/xe/xe_lmtt.c                  |  12 +-
 drivers/gpu/drm/xe/xe_migrate.h               |  10 +-
 drivers/gpu/drm/xe/xe_pt.c                    |  63 ++++---
 drivers/gpu/drm/xe/xe_svm.c                   |   1 -
 .../xe/{xe_gt_tlb_inval.c => xe_tlb_inval.c}  | 143 ++++++++--------
 drivers/gpu/drm/xe/xe_tlb_inval.h             |  41 +++++
 ..._gt_tlb_inval_job.c => xe_tlb_inval_job.c} | 154 +++++++++---------
 drivers/gpu/drm/xe/xe_tlb_inval_job.h         |  34 ++++
 ...tlb_inval_types.h => xe_tlb_inval_types.h} |  35 ++--
 drivers/gpu/drm/xe/xe_trace.h                 |  24 +--
 drivers/gpu/drm/xe/xe_vm.c                    |  26 +--
 18 files changed, 331 insertions(+), 306 deletions(-)
 delete mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval.h
 delete mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
 rename drivers/gpu/drm/xe/{xe_gt_tlb_inval.c => xe_tlb_inval.c} (79%)
 create mode 100644 drivers/gpu/drm/xe/xe_tlb_inval.h
 rename drivers/gpu/drm/xe/{xe_gt_tlb_inval_job.c => xe_tlb_inval_job.c} (51%)
 create mode 100644 drivers/gpu/drm/xe/xe_tlb_inval_job.h
 rename drivers/gpu/drm/xe/{xe_gt_tlb_inval_types.h => xe_tlb_inval_types.h} (54%)

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 77dd145bbe2e..332b2057cc00 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -61,8 +61,6 @@ xe-y += xe_bb.o \
 	xe_gt_pagefault.o \
 	xe_gt_sysfs.o \
 	xe_gt_throttle.o \
-	xe_gt_tlb_inval.o \
-	xe_gt_tlb_inval_job.o \
 	xe_gt_topology.o \
 	xe_guc.o \
 	xe_guc_ads.o \
@@ -116,6 +114,8 @@ xe-y += xe_bb.o \
 	xe_sync.o \
 	xe_tile.o \
 	xe_tile_sysfs.o \
+	xe_tlb_inval.o \
+	xe_tlb_inval_job.o \
 	xe_trace.o \
 	xe_trace_bo.o \
 	xe_trace_guc.o \
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index 9a06d68946cf..70476e7a3ff3 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -23,13 +23,13 @@
 #include "xe_device.h"
 #include "xe_gt.h"
 #include "xe_gt_printk.h"
-#include "xe_gt_tlb_inval.h"
 #include "xe_map.h"
 #include "xe_mmio.h"
 #include "xe_pm.h"
 #include "xe_res_cursor.h"
 #include "xe_sriov.h"
 #include "xe_tile_sriov_vf.h"
+#include "xe_tlb_inval.h"
 #include "xe_wa.h"
 #include "xe_wopcm.h"
 
@@ -438,9 +438,9 @@ static void ggtt_invalidate_gt_tlb(struct xe_gt *gt)
 	if (!gt)
 		return;
 
-	err = xe_gt_tlb_inval_ggtt(gt);
+	err = xe_tlb_inval_ggtt(&gt->tlb_inval);
 	if (err)
-		drm_warn(&gt_to_xe(gt)->drm, "xe_gt_tlb_inval_ggtt error=%d", err);
+		drm_warn(&gt_to_xe(gt)->drm, "xe_tlb_inval_ggtt error=%d", err);
 }
 
 static void xe_ggtt_invalidate(struct xe_ggtt *ggtt)
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index a7048e7c7177..6ab8590b60a4 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -37,7 +37,6 @@
 #include "xe_gt_sriov_pf.h"
 #include "xe_gt_sriov_vf.h"
 #include "xe_gt_sysfs.h"
-#include "xe_gt_tlb_inval.h"
 #include "xe_gt_topology.h"
 #include "xe_guc_exec_queue_types.h"
 #include "xe_guc_pc.h"
@@ -57,6 +56,7 @@
 #include "xe_sa.h"
 #include "xe_sched_job.h"
 #include "xe_sriov.h"
+#include "xe_tlb_inval.h"
 #include "xe_tuning.h"
 #include "xe_uc.h"
 #include "xe_uc_fw.h"
@@ -842,7 +842,7 @@ static int gt_reset(struct xe_gt *gt)
 
 	xe_uc_stop(&gt->uc);
 
-	xe_gt_tlb_inval_reset(gt);
+	xe_tlb_inval_reset(&gt->tlb_inval);
 
 	err = do_gt_reset(gt);
 	if (err)
@@ -1056,5 +1056,5 @@ void xe_gt_declare_wedged(struct xe_gt *gt)
 	xe_gt_assert(gt, gt_to_xe(gt)->wedged.mode);
 
 	xe_uc_declare_wedged(&gt->uc);
-	xe_gt_tlb_inval_reset(gt);
+	xe_tlb_inval_reset(&gt->tlb_inval);
 }
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval.h b/drivers/gpu/drm/xe/xe_gt_tlb_inval.h
deleted file mode 100644
index 801d4ecf88f0..000000000000
--- a/drivers/gpu/drm/xe/xe_gt_tlb_inval.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/* SPDX-License-Identifier: MIT */
-/*
- * Copyright © 2023 Intel Corporation
- */
-
-#ifndef _XE_GT_TLB_INVAL_H_
-#define _XE_GT_TLB_INVAL_H_
-
-#include <linux/types.h>
-
-#include "xe_gt_tlb_inval_types.h"
-
-struct xe_gt;
-struct xe_guc;
-struct xe_vm;
-struct xe_vma;
-
-int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
-
-void xe_gt_tlb_inval_reset(struct xe_gt *gt);
-int xe_gt_tlb_inval_ggtt(struct xe_gt *gt);
-void xe_gt_tlb_inval_vm(struct xe_gt *gt, struct xe_vm *vm);
-int xe_gt_tlb_inval_all(struct xe_gt *gt, struct xe_gt_tlb_inval_fence *fence);
-int xe_gt_tlb_inval_range(struct xe_gt *gt,
-			  struct xe_gt_tlb_inval_fence *fence,
-			  u64 start, u64 end, u32 asid);
-int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
-
-void xe_gt_tlb_inval_fence_init(struct xe_gt *gt,
-				struct xe_gt_tlb_inval_fence *fence,
-				bool stack);
-void xe_gt_tlb_inval_fence_signal(struct xe_gt_tlb_inval_fence *fence);
-
-static inline void
-xe_gt_tlb_inval_fence_wait(struct xe_gt_tlb_inval_fence *fence)
-{
-	dma_fence_wait(&fence->base, false);
-}
-
-#endif	/* _XE_GT_TLB_INVAL_ */
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
deleted file mode 100644
index 883896194a34..000000000000
--- a/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
+++ /dev/null
@@ -1,34 +0,0 @@
-/* SPDX-License-Identifier: MIT */
-/*
- * Copyright © 2025 Intel Corporation
- */
-
-#ifndef _XE_GT_TLB_INVAL_JOB_H_
-#define _XE_GT_TLB_INVAL_JOB_H_
-
-#include <linux/types.h>
-
-struct dma_fence;
-struct drm_sched_job;
-struct kref;
-struct xe_exec_queue;
-struct xe_gt;
-struct xe_gt_tlb_inval_job;
-struct xe_migrate;
-
-struct xe_gt_tlb_inval_job *xe_gt_tlb_inval_job_create(struct xe_exec_queue *q,
-						       struct xe_gt *gt,
-						       u64 start, u64 end,
-						       u32 asid);
-
-int xe_gt_tlb_inval_job_alloc_dep(struct xe_gt_tlb_inval_job *job);
-
-struct dma_fence *xe_gt_tlb_inval_job_push(struct xe_gt_tlb_inval_job *job,
-					   struct xe_migrate *m,
-					   struct dma_fence *fence);
-
-void xe_gt_tlb_inval_job_get(struct xe_gt_tlb_inval_job *job);
-
-void xe_gt_tlb_inval_job_put(struct xe_gt_tlb_inval_job *job);
-
-#endif
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index ed21bd63b001..cdba2e9f584b 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -12,12 +12,12 @@
 #include "xe_gt_sriov_pf_types.h"
 #include "xe_gt_sriov_vf_types.h"
 #include "xe_gt_stats_types.h"
-#include "xe_gt_tlb_inval_types.h"
 #include "xe_hw_engine_types.h"
 #include "xe_hw_fence_types.h"
 #include "xe_oa_types.h"
 #include "xe_reg_sr_types.h"
 #include "xe_sa_types.h"
+#include "xe_tlb_inval_types.h"
 #include "xe_uc_types.h"
 
 struct xe_exec_queue_ops;
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index c213a037b346..2ef86c0ae8b4 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -26,13 +26,13 @@
 #include "xe_gt_sriov_pf_control.h"
 #include "xe_gt_sriov_pf_monitor.h"
 #include "xe_gt_sriov_printk.h"
-#include "xe_gt_tlb_inval.h"
 #include "xe_guc.h"
 #include "xe_guc_log.h"
 #include "xe_guc_relay.h"
 #include "xe_guc_submit.h"
 #include "xe_map.h"
 #include "xe_pm.h"
+#include "xe_tlb_inval.h"
 #include "xe_trace_guc.h"
 
 static void receive_g2h(struct xe_guc_ct *ct);
diff --git a/drivers/gpu/drm/xe/xe_lmtt.c b/drivers/gpu/drm/xe/xe_lmtt.c
index 8869ad491d99..16fcee64025b 100644
--- a/drivers/gpu/drm/xe/xe_lmtt.c
+++ b/drivers/gpu/drm/xe/xe_lmtt.c
@@ -11,7 +11,7 @@
 
 #include "xe_assert.h"
 #include "xe_bo.h"
-#include "xe_gt_tlb_inval.h"
+#include "xe_tlb_inval.h"
 #include "xe_lmtt.h"
 #include "xe_map.h"
 #include "xe_mmio.h"
@@ -225,8 +225,8 @@ void xe_lmtt_init_hw(struct xe_lmtt *lmtt)
 
 static int lmtt_invalidate_hw(struct xe_lmtt *lmtt)
 {
-	struct xe_gt_tlb_inval_fence fences[XE_MAX_GT_PER_TILE];
-	struct xe_gt_tlb_inval_fence *fence = fences;
+	struct xe_tlb_inval_fence fences[XE_MAX_GT_PER_TILE];
+	struct xe_tlb_inval_fence *fence = fences;
 	struct xe_tile *tile = lmtt_to_tile(lmtt);
 	struct xe_gt *gt;
 	int result = 0;
@@ -234,8 +234,8 @@ static int lmtt_invalidate_hw(struct xe_lmtt *lmtt)
 	u8 id;
 
 	for_each_gt_on_tile(gt, tile, id) {
-		xe_gt_tlb_inval_fence_init(gt, fence, true);
-		err = xe_gt_tlb_inval_all(gt, fence);
+		xe_tlb_inval_fence_init(&gt->tlb_inval, fence, true);
+		err = xe_tlb_inval_all(&gt->tlb_inval, fence);
 		result = result ?: err;
 		fence++;
 	}
@@ -249,7 +249,7 @@ static int lmtt_invalidate_hw(struct xe_lmtt *lmtt)
 	 */
 	fence = fences;
 	for_each_gt_on_tile(gt, tile, id)
-		xe_gt_tlb_inval_fence_wait(fence++);
+		xe_tlb_inval_fence_wait(fence++);
 
 	return result;
 }
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index 3758f9615484..d1611f5d9369 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -14,7 +14,7 @@ struct ttm_resource;
 
 struct xe_bo;
 struct xe_gt;
-struct xe_gt_tlb_inval_job;
+struct xe_tlb_inval_job;
 struct xe_exec_queue;
 struct xe_migrate;
 struct xe_migrate_pt_update;
@@ -93,13 +93,13 @@ struct xe_migrate_pt_update {
 	/** @job: The job if a GPU page-table update. NULL otherwise */
 	struct xe_sched_job *job;
 	/**
-	 * @ijob: The GT TLB invalidation job for primary tile. NULL otherwise
+	 * @ijob: The TLB invalidation job for primary GT. NULL otherwise
 	 */
-	struct xe_gt_tlb_inval_job *ijob;
+	struct xe_tlb_inval_job *ijob;
 	/**
-	 * @mjob: The GT TLB invalidation job for media tile. NULL otherwise
+	 * @mjob: The TLB invalidation job for media GT. NULL otherwise
 	 */
-	struct xe_gt_tlb_inval_job *mjob;
+	struct xe_tlb_inval_job *mjob;
 	/** @tile_id: Tile ID of the update */
 	u8 tile_id;
 };
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 0645b2fee645..5493ab36fbcf 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -13,8 +13,6 @@
 #include "xe_drm_client.h"
 #include "xe_exec_queue.h"
 #include "xe_gt.h"
-#include "xe_gt_tlb_inval.h"
-#include "xe_gt_tlb_inval_job.h"
 #include "xe_migrate.h"
 #include "xe_pt_types.h"
 #include "xe_pt_walk.h"
@@ -22,6 +20,7 @@
 #include "xe_sched_job.h"
 #include "xe_sync.h"
 #include "xe_svm.h"
+#include "xe_tlb_inval_job.h"
 #include "xe_trace.h"
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_vm.h"
@@ -1262,8 +1261,8 @@ static int op_add_deps(struct xe_vm *vm, struct xe_vma_op *op,
 }
 
 static int xe_pt_vm_dependencies(struct xe_sched_job *job,
-				 struct xe_gt_tlb_inval_job *ijob,
-				 struct xe_gt_tlb_inval_job *mjob,
+				 struct xe_tlb_inval_job *ijob,
+				 struct xe_tlb_inval_job *mjob,
 				 struct xe_vm *vm,
 				 struct xe_vma_ops *vops,
 				 struct xe_vm_pgtable_update_ops *pt_update_ops,
@@ -1333,13 +1332,13 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
 
 	if (job) {
 		if (ijob) {
-			err = xe_gt_tlb_inval_job_alloc_dep(ijob);
+			err = xe_tlb_inval_job_alloc_dep(ijob);
 			if (err)
 				return err;
 		}
 
 		if (mjob) {
-			err = xe_gt_tlb_inval_job_alloc_dep(mjob);
+			err = xe_tlb_inval_job_alloc_dep(mjob);
 			if (err)
 				return err;
 		}
@@ -2339,6 +2338,15 @@ static const struct xe_migrate_pt_update_ops svm_migrate_ops = {
 static const struct xe_migrate_pt_update_ops svm_migrate_ops;
 #endif
 
+static struct xe_dep_scheduler *to_dep_scheduler(struct xe_exec_queue *q,
+						 struct xe_gt *gt)
+{
+	if (xe_gt_is_media_type(gt))
+		return q->tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT].dep_scheduler;
+
+	return q->tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT].dep_scheduler;
+}
+
 /**
  * xe_pt_update_ops_run() - Run PT update operations
  * @tile: Tile of PT update operations
@@ -2357,7 +2365,7 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
 	struct xe_vm_pgtable_update_ops *pt_update_ops =
 		&vops->pt_update_ops[tile->id];
 	struct dma_fence *fence, *ifence, *mfence;
-	struct xe_gt_tlb_inval_job *ijob = NULL, *mjob = NULL;
+	struct xe_tlb_inval_job *ijob = NULL, *mjob = NULL;
 	struct dma_fence **fences = NULL;
 	struct dma_fence_array *cf = NULL;
 	struct xe_range_fence *rfence;
@@ -2389,22 +2397,29 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
 #endif
 
 	if (pt_update_ops->needs_invalidation) {
-		ijob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
-						  tile->primary_gt,
-						  pt_update_ops->start,
-						  pt_update_ops->last,
-						  vm->usm.asid);
+		struct xe_exec_queue *q = pt_update_ops->q;
+		struct xe_dep_scheduler *dep_scheduler =
+			to_dep_scheduler(q, tile->primary_gt);
+
+		ijob = xe_tlb_inval_job_create(q, &tile->primary_gt->tlb_inval,
+					       dep_scheduler,
+					       pt_update_ops->start,
+					       pt_update_ops->last,
+					       vm->usm.asid);
 		if (IS_ERR(ijob)) {
 			err = PTR_ERR(ijob);
 			goto kill_vm_tile1;
 		}
 
 		if (tile->media_gt) {
-			mjob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
-							  tile->media_gt,
-							  pt_update_ops->start,
-							  pt_update_ops->last,
-							  vm->usm.asid);
+			dep_scheduler = to_dep_scheduler(q, tile->media_gt);
+
+			mjob = xe_tlb_inval_job_create(q,
+						       &tile->media_gt->tlb_inval,
+						       dep_scheduler,
+						       pt_update_ops->start,
+						       pt_update_ops->last,
+						       vm->usm.asid);
 			if (IS_ERR(mjob)) {
 				err = PTR_ERR(mjob);
 				goto free_ijob;
@@ -2455,13 +2470,13 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
 	if (ijob) {
 		struct dma_fence *__fence;
 
-		ifence = xe_gt_tlb_inval_job_push(ijob, tile->migrate, fence);
+		ifence = xe_tlb_inval_job_push(ijob, tile->migrate, fence);
 		__fence = ifence;
 
 		if (mjob) {
 			fences[0] = ifence;
-			mfence = xe_gt_tlb_inval_job_push(mjob, tile->migrate,
-							  fence);
+			mfence = xe_tlb_inval_job_push(mjob, tile->migrate,
+						       fence);
 			fences[1] = mfence;
 
 			dma_fence_array_init(cf, 2, fences,
@@ -2504,8 +2519,8 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
 	if (pt_update_ops->needs_userptr_lock)
 		up_read(&vm->userptr.notifier_lock);
 
-	xe_gt_tlb_inval_job_put(mjob);
-	xe_gt_tlb_inval_job_put(ijob);
+	xe_tlb_inval_job_put(mjob);
+	xe_tlb_inval_job_put(ijob);
 
 	return fence;
 
@@ -2514,8 +2529,8 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
 free_ijob:
 	kfree(cf);
 	kfree(fences);
-	xe_gt_tlb_inval_job_put(mjob);
-	xe_gt_tlb_inval_job_put(ijob);
+	xe_tlb_inval_job_put(mjob);
+	xe_tlb_inval_job_put(ijob);
 kill_vm_tile1:
 	if (err != -EAGAIN && err != -ENODATA && tile->id)
 		xe_vm_kill(vops->vm, false);
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 0c9f7fe42af6..b8f8b1a577b6 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -7,7 +7,6 @@
 
 #include "xe_bo.h"
 #include "xe_gt_stats.h"
-#include "xe_gt_tlb_inval.h"
 #include "xe_migrate.h"
 #include "xe_module.h"
 #include "xe_pm.h"
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
similarity index 79%
rename from drivers/gpu/drm/xe/xe_gt_tlb_inval.c
rename to drivers/gpu/drm/xe/xe_tlb_inval.c
index ea2231be79fe..a25c35005689 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
@@ -13,7 +13,7 @@
 #include "xe_guc.h"
 #include "xe_guc_ct.h"
 #include "xe_gt_stats.h"
-#include "xe_gt_tlb_inval.h"
+#include "xe_tlb_inval.h"
 #include "xe_mmio.h"
 #include "xe_pm.h"
 #include "xe_sriov.h"
@@ -38,40 +38,46 @@ static long tlb_timeout_jiffies(struct xe_gt *gt)
 	return hw_tlb_timeout + 2 * delay;
 }
 
-static void xe_gt_tlb_inval_fence_fini(struct xe_gt_tlb_inval_fence *fence)
+static void xe_tlb_inval_fence_fini(struct xe_tlb_inval_fence *fence)
 {
-	if (WARN_ON_ONCE(!fence->gt))
+	struct xe_gt *gt;
+
+	if (WARN_ON_ONCE(!fence->tlb_inval))
 		return;
 
-	xe_pm_runtime_put(gt_to_xe(fence->gt));
-	fence->gt = NULL; /* fini() should be called once */
+	gt = fence->tlb_inval->private;
+	xe_pm_runtime_put(gt_to_xe(gt));
+	fence->tlb_inval = NULL; /* fini() should be called once */
 }
 
 static void
-__inval_fence_signal(struct xe_device *xe, struct xe_gt_tlb_inval_fence *fence)
+__inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
 {
 	bool stack = test_bit(FENCE_STACK_BIT, &fence->base.flags);
 
-	trace_xe_gt_tlb_inval_fence_signal(xe, fence);
-	xe_gt_tlb_inval_fence_fini(fence);
+	trace_xe_tlb_inval_fence_signal(xe, fence);
+	xe_tlb_inval_fence_fini(fence);
 	dma_fence_signal(&fence->base);
 	if (!stack)
 		dma_fence_put(&fence->base);
 }
 
 static void
-inval_fence_signal(struct xe_device *xe, struct xe_gt_tlb_inval_fence *fence)
+inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
 {
 	list_del(&fence->link);
 	__inval_fence_signal(xe, fence);
 }
 
-void xe_gt_tlb_inval_fence_signal(struct xe_gt_tlb_inval_fence *fence)
+void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence)
 {
-	if (WARN_ON_ONCE(!fence->gt))
+	struct xe_gt *gt;
+
+	if (WARN_ON_ONCE(!fence->tlb_inval))
 		return;
 
-	__inval_fence_signal(gt_to_xe(fence->gt), fence);
+	gt = fence->tlb_inval->private;
+	__inval_fence_signal(gt_to_xe(gt), fence);
 }
 
 static void xe_gt_tlb_fence_timeout(struct work_struct *work)
@@ -79,7 +85,7 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
 	struct xe_gt *gt = container_of(work, struct xe_gt,
 					tlb_inval.fence_tdr.work);
 	struct xe_device *xe = gt_to_xe(gt);
-	struct xe_gt_tlb_inval_fence *fence, *next;
+	struct xe_tlb_inval_fence *fence, *next;
 
 	LNL_FLUSH_WORK(&gt->uc.guc.ct.g2h_worker);
 
@@ -92,7 +98,7 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
 		if (msecs_to_jiffies(since_inval_ms) < tlb_timeout_jiffies(gt))
 			break;
 
-		trace_xe_gt_tlb_inval_fence_timeout(xe, fence);
+		trace_xe_tlb_inval_fence_timeout(xe, fence);
 		xe_gt_err(gt, "TLB invalidation fence timeout, seqno=%d recv=%d",
 			  fence->seqno, gt->tlb_inval.seqno_recv);
 
@@ -107,16 +113,17 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
 }
 
 /**
- * xe_gt_tlb_inval_init_early - Initialize GT TLB invalidation state
+ * xe_tlb_inval_init_early - Initialize TLB invalidation state
  * @gt: GT structure
  *
- * Initialize GT TLB invalidation state, purely software initialization, should
+ * Initialize TLB invalidation state, purely software initialization, should
  * be called once during driver load.
  *
  * Return: 0 on success, negative error code on error.
  */
 int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
 {
+	gt->tlb_inval.private = gt;
 	gt->tlb_inval.seqno = 1;
 	INIT_LIST_HEAD(&gt->tlb_inval.pending_fences);
 	spin_lock_init(&gt->tlb_inval.pending_lock);
@@ -134,14 +141,15 @@ int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
 }
 
 /**
- * xe_gt_tlb_inval_reset - Initialize GT TLB invalidation reset
- * @gt: GT structure
+ * xe_tlb_inval_reset - Initialize TLB invalidation reset
+ * @tlb_inval: TLB invalidation client
  *
  * Signal any pending invalidation fences, should be called during a GT reset
  */
-void xe_gt_tlb_inval_reset(struct xe_gt *gt)
+void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
 {
-	struct xe_gt_tlb_inval_fence *fence, *next;
+	struct xe_gt *gt = tlb_inval->private;
+	struct xe_tlb_inval_fence *fence, *next;
 	int pending_seqno;
 
 	/*
@@ -194,7 +202,7 @@ static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
 }
 
 static int send_tlb_inval(struct xe_guc *guc,
-			  struct xe_gt_tlb_inval_fence *fence,
+			  struct xe_tlb_inval_fence *fence,
 			  u32 *action, int len)
 {
 	struct xe_gt *gt = guc_to_gt(guc);
@@ -213,7 +221,7 @@ static int send_tlb_inval(struct xe_guc *guc,
 	mutex_lock(&guc->ct.lock);
 	seqno = gt->tlb_inval.seqno;
 	fence->seqno = seqno;
-	trace_xe_gt_tlb_inval_fence_send(xe, fence);
+	trace_xe_tlb_inval_fence_send(xe, fence);
 	action[1] = seqno;
 	ret = xe_guc_ct_send_locked(&guc->ct, action, len,
 				    G2H_LEN_DW_TLB_INVALIDATE, 1);
@@ -258,7 +266,7 @@ static int send_tlb_inval(struct xe_guc *guc,
 		XE_GUC_TLB_INVAL_FLUSH_CACHE)
 
 /**
- * xe_gt_tlb_inval_guc - Issue a TLB invalidation on this GT for the GuC
+ * xe_tlb_inval_guc - Issue a TLB invalidation on this GT for the GuC
  * @gt: GT structure
  * @fence: invalidation fence which will be signal on TLB invalidation
  * completion
@@ -268,8 +276,8 @@ static int send_tlb_inval(struct xe_guc *guc,
  *
  * Return: 0 on success, negative error code on error
  */
-static int xe_gt_tlb_inval_guc(struct xe_gt *gt,
-			       struct xe_gt_tlb_inval_fence *fence)
+static int xe_tlb_inval_guc(struct xe_gt *gt,
+			    struct xe_tlb_inval_fence *fence)
 {
 	u32 action[] = {
 		XE_GUC_ACTION_TLB_INVALIDATION,
@@ -290,30 +298,31 @@ static int xe_gt_tlb_inval_guc(struct xe_gt *gt,
 }
 
 /**
- * xe_gt_tlb_inval_ggtt - Issue a TLB invalidation on this GT for the GGTT
- * @gt: GT structure
+ * xe_tlb_inval_ggtt - Issue a TLB invalidation on this GT for the GGTT
+ * @tlb_inval: TLB invalidation client
  *
  * Issue a TLB invalidation for the GGTT. Completion of TLB invalidation is
  * synchronous.
  *
  * Return: 0 on success, negative error code on error
  */
-int xe_gt_tlb_inval_ggtt(struct xe_gt *gt)
+int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
 {
+	struct xe_gt *gt = tlb_inval->private;
 	struct xe_device *xe = gt_to_xe(gt);
 	unsigned int fw_ref;
 
 	if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
 	    gt->uc.guc.submission_state.enabled) {
-		struct xe_gt_tlb_inval_fence fence;
+		struct xe_tlb_inval_fence fence;
 		int ret;
 
-		xe_gt_tlb_inval_fence_init(gt, &fence, true);
-		ret = xe_gt_tlb_inval_guc(gt, &fence);
+		xe_tlb_inval_fence_init(tlb_inval, &fence, true);
+		ret = xe_tlb_inval_guc(gt, &fence);
 		if (ret)
 			return ret;
 
-		xe_gt_tlb_inval_fence_wait(&fence);
+		xe_tlb_inval_fence_wait(&fence);
 	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
 		struct xe_mmio *mmio = &gt->mmio;
 
@@ -336,14 +345,17 @@ int xe_gt_tlb_inval_ggtt(struct xe_gt *gt)
 	return 0;
 }
 
-static int send_tlb_inval_all(struct xe_gt *gt,
-			      struct xe_gt_tlb_inval_fence *fence)
+static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
+			      struct xe_tlb_inval_fence *fence)
 {
 	u32 action[] = {
 		XE_GUC_ACTION_TLB_INVALIDATION_ALL,
 		0,  /* seqno, replaced in send_tlb_inval */
 		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
 	};
+	struct xe_gt *gt = tlb_inval->private;
+
+	xe_gt_assert(gt, fence);
 
 	return send_tlb_inval(&gt->uc.guc, fence, action, ARRAY_SIZE(action));
 }
@@ -351,19 +363,19 @@ static int send_tlb_inval_all(struct xe_gt *gt,
 /**
  * xe_gt_tlb_invalidation_all - Invalidate all TLBs across PF and all VFs.
  * @gt: the &xe_gt structure
- * @fence: the &xe_gt_tlb_inval_fence to be signaled on completion
+ * @fence: the &xe_tlb_inval_fence to be signaled on completion
  *
  * Send a request to invalidate all TLBs across PF and all VFs.
  *
  * Return: 0 on success, negative error code on error
  */
-int xe_gt_tlb_inval_all(struct xe_gt *gt, struct xe_gt_tlb_inval_fence *fence)
+int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
+		     struct xe_tlb_inval_fence *fence)
 {
+	struct xe_gt *gt = tlb_inval->private;
 	int err;
 
-	xe_gt_assert(gt, gt == fence->gt);
-
-	err = send_tlb_inval_all(gt, fence);
+	err = send_tlb_inval_all(tlb_inval, fence);
 	if (err)
 		xe_gt_err(gt, "TLB invalidation request failed (%pe)", ERR_PTR(err));
 
@@ -378,9 +390,8 @@ int xe_gt_tlb_inval_all(struct xe_gt *gt, struct xe_gt_tlb_inval_fence *fence)
 #define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
 
 /**
- * xe_gt_tlb_inval_range - Issue a TLB invalidation on this GT for an address range
- *
- * @gt: GT structure
+ * xe_tlb_inval_range - Issue a TLB invalidation on this GT for an address range
+ * @tlb_inval: TLB invalidation client
  * @fence: invalidation fence which will be signal on TLB invalidation
  * completion
  * @start: start address
@@ -393,9 +404,11 @@ int xe_gt_tlb_inval_all(struct xe_gt *gt, struct xe_gt_tlb_inval_fence *fence)
  *
  * Return: Negative error code on error, 0 on success
  */
-int xe_gt_tlb_inval_range(struct xe_gt *gt, struct xe_gt_tlb_inval_fence *fence,
-			  u64 start, u64 end, u32 asid)
+int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
+		       struct xe_tlb_inval_fence *fence, u64 start, u64 end,
+		       u32 asid)
 {
+	struct xe_gt *gt = tlb_inval->private;
 	struct xe_device *xe = gt_to_xe(gt);
 #define MAX_TLB_INVALIDATION_LEN	7
 	u32 action[MAX_TLB_INVALIDATION_LEN];
@@ -465,38 +478,38 @@ int xe_gt_tlb_inval_range(struct xe_gt *gt, struct xe_gt_tlb_inval_fence *fence,
 }
 
 /**
- * xe_gt_tlb_inval_vm - Issue a TLB invalidation on this GT for a VM
- * @gt: graphics tile
+ * xe_tlb_inval_vm - Issue a TLB invalidation on this GT for a VM
+ * @tlb_inval: TLB invalidation client
  * @vm: VM to invalidate
  *
  * Invalidate entire VM's address space
  */
-void xe_gt_tlb_inval_vm(struct xe_gt *gt, struct xe_vm *vm)
+void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm)
 {
-	struct xe_gt_tlb_inval_fence fence;
+	struct xe_tlb_inval_fence fence;
 	u64 range = 1ull << vm->xe->info.va_bits;
 	int ret;
 
-	xe_gt_tlb_inval_fence_init(gt, &fence, true);
+	xe_tlb_inval_fence_init(tlb_inval, &fence, true);
 
-	ret = xe_gt_tlb_inval_range(gt, &fence, 0, range, vm->usm.asid);
+	ret = xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid);
 	if (ret < 0)
 		return;
 
-	xe_gt_tlb_inval_fence_wait(&fence);
+	xe_tlb_inval_fence_wait(&fence);
 }
 
 /**
- * xe_gt_tlb_inval_done_handler - GT TLB invalidation done handler
+ * xe_tlb_inval_done_handler - TLB invalidation done handler
  * @gt: gt
  * @seqno: seqno of invalidation that is done
  *
- * Update recv seqno, signal any GT TLB invalidation fences, and restart TDR
+ * Update recv seqno, signal any TLB invalidation fences, and restart TDR
  */
-static void xe_gt_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
+static void xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
 {
 	struct xe_device *xe = gt_to_xe(gt);
-	struct xe_gt_tlb_inval_fence *fence, *next;
+	struct xe_tlb_inval_fence *fence, *next;
 	unsigned long flags;
 
 	/*
@@ -524,7 +537,7 @@ static void xe_gt_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
 
 	list_for_each_entry_safe(fence, next,
 				 &gt->tlb_inval.pending_fences, link) {
-		trace_xe_gt_tlb_inval_fence_recv(xe, fence);
+		trace_xe_tlb_inval_fence_recv(xe, fence);
 
 		if (!tlb_inval_seqno_past(gt, fence->seqno))
 			break;
@@ -561,7 +574,7 @@ int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 	if (unlikely(len != 1))
 		return -EPROTO;
 
-	xe_gt_tlb_inval_done_handler(gt, msg[0]);
+	xe_tlb_inval_done_handler(gt, msg[0]);
 
 	return 0;
 }
@@ -584,19 +597,21 @@ static const struct dma_fence_ops inval_fence_ops = {
 };
 
 /**
- * xe_gt_tlb_inval_fence_init - Initialize TLB invalidation fence
- * @gt: GT
+ * xe_tlb_inval_fence_init - Initialize TLB invalidation fence
+ * @tlb_inval: TLB invalidation client
  * @fence: TLB invalidation fence to initialize
  * @stack: fence is stack variable
  *
- * Initialize TLB invalidation fence for use. xe_gt_tlb_inval_fence_fini
+ * Initialize TLB invalidation fence for use. xe_tlb_inval_fence_fini
  * will be automatically called when fence is signalled (all fences must signal),
  * even on error.
  */
-void xe_gt_tlb_inval_fence_init(struct xe_gt *gt,
-				struct xe_gt_tlb_inval_fence *fence,
-				bool stack)
+void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
+			     struct xe_tlb_inval_fence *fence,
+			     bool stack)
 {
+	struct xe_gt *gt = tlb_inval->private;
+
 	xe_pm_runtime_get_noresume(gt_to_xe(gt));
 
 	spin_lock_irq(&gt->tlb_inval.lock);
@@ -609,5 +624,5 @@ void xe_gt_tlb_inval_fence_init(struct xe_gt *gt,
 		set_bit(FENCE_STACK_BIT, &fence->base.flags);
 	else
 		dma_fence_get(&fence->base);
-	fence->gt = gt;
+	fence->tlb_inval = tlb_inval;
 }
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h b/drivers/gpu/drm/xe/xe_tlb_inval.h
new file mode 100644
index 000000000000..7adee3f8c551
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#ifndef _XE_TLB_INVAL_H_
+#define _XE_TLB_INVAL_H_
+
+#include <linux/types.h>
+
+#include "xe_tlb_inval_types.h"
+
+struct xe_gt;
+struct xe_guc;
+struct xe_vm;
+struct xe_vma;
+
+int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
+
+void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
+int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
+void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm);
+int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
+		     struct xe_tlb_inval_fence *fence);
+int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
+		       struct xe_tlb_inval_fence *fence,
+		       u64 start, u64 end, u32 asid);
+int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
+
+void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
+			     struct xe_tlb_inval_fence *fence,
+			     bool stack);
+void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence);
+
+static inline void
+xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence *fence)
+{
+	dma_fence_wait(&fence->base, false);
+}
+
+#endif	/* _XE_TLB_INVAL_H_ */
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
similarity index 51%
rename from drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
rename to drivers/gpu/drm/xe/xe_tlb_inval_job.c
index 41e0ea92ea5a..492def04a559 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
@@ -3,21 +3,22 @@
  * Copyright © 2025 Intel Corporation
  */
 
+#include "xe_assert.h"
 #include "xe_dep_job_types.h"
 #include "xe_dep_scheduler.h"
 #include "xe_exec_queue.h"
-#include "xe_gt.h"
-#include "xe_gt_tlb_inval.h"
-#include "xe_gt_tlb_inval_job.h"
+#include "xe_gt_types.h"
+#include "xe_tlb_inval.h"
+#include "xe_tlb_inval_job.h"
 #include "xe_migrate.h"
 #include "xe_pm.h"
 
-/** struct xe_gt_tlb_inval_job - GT TLB invalidation job */
-struct xe_gt_tlb_inval_job {
+/** struct xe_tlb_inval_job - TLB invalidation job */
+struct xe_tlb_inval_job {
 	/** @dep: base generic dependency Xe job */
 	struct xe_dep_job dep;
-	/** @gt: GT to invalidate */
-	struct xe_gt *gt;
+	/** @tlb_inval: TLB invalidation client */
+	struct xe_tlb_inval *tlb_inval;
 	/** @q: exec queue issuing the invalidate */
 	struct xe_exec_queue *q;
 	/** @refcount: ref count of this job */
@@ -37,63 +38,56 @@ struct xe_gt_tlb_inval_job {
 	bool fence_armed;
 };
 
-static struct dma_fence *xe_gt_tlb_inval_job_run(struct xe_dep_job *dep_job)
+static struct dma_fence *xe_tlb_inval_job_run(struct xe_dep_job *dep_job)
 {
-	struct xe_gt_tlb_inval_job *job =
+	struct xe_tlb_inval_job *job =
 		container_of(dep_job, typeof(*job), dep);
-	struct xe_gt_tlb_inval_fence *ifence =
+	struct xe_tlb_inval_fence *ifence =
 		container_of(job->fence, typeof(*ifence), base);
 
-	xe_gt_tlb_inval_range(job->gt, ifence, job->start,
-			      job->end, job->asid);
+	xe_tlb_inval_range(job->tlb_inval, ifence, job->start,
+			   job->end, job->asid);
 
 	return job->fence;
 }
 
-static void xe_gt_tlb_inval_job_free(struct xe_dep_job *dep_job)
+static void xe_tlb_inval_job_free(struct xe_dep_job *dep_job)
 {
-	struct xe_gt_tlb_inval_job *job =
+	struct xe_tlb_inval_job *job =
 		container_of(dep_job, typeof(*job), dep);
 
-	/* Pairs with get in xe_gt_tlb_inval_job_push */
-	xe_gt_tlb_inval_job_put(job);
+	/* Pairs with get in xe_tlb_inval_job_push */
+	xe_tlb_inval_job_put(job);
 }
 
 static const struct xe_dep_job_ops dep_job_ops = {
-	.run_job = xe_gt_tlb_inval_job_run,
-	.free_job = xe_gt_tlb_inval_job_free,
+	.run_job = xe_tlb_inval_job_run,
+	.free_job = xe_tlb_inval_job_free,
 };
 
-static int xe_gt_tlb_inval_context(struct xe_gt *gt)
-{
-	return xe_gt_is_media_type(gt) ? XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT :
-		XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT;
-}
-
 /**
- * xe_gt_tlb_inval_job_create() - GT TLB invalidation job create
- * @gt: GT to invalidate
+ * xe_tlb_inval_job_create() - TLB invalidation job create
  * @q: exec queue issuing the invalidate
+ * @tlb_inval: TLB invalidation client
+ * @dep_scheduler: Dependency scheduler for job
  * @start: Start address to invalidate
  * @end: End address to invalidate
  * @asid: Address space ID to invalidate
  *
- * Create a GT TLB invalidation job and initialize internal fields. The caller is
+ * Create a TLB invalidation job and initialize internal fields. The caller is
  * responsible for releasing the creation reference.
  *
- * Return: GT TLB invalidation job object on success, ERR_PTR failure
+ * Return: TLB invalidation job object on success, ERR_PTR failure
  */
-struct xe_gt_tlb_inval_job *xe_gt_tlb_inval_job_create(struct xe_exec_queue *q,
-						       struct xe_gt *gt,
-						       u64 start, u64 end,
-						       u32 asid)
+struct xe_tlb_inval_job *
+xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
+			struct xe_dep_scheduler *dep_scheduler, u64 start,
+			u64 end, u32 asid)
 {
-	struct xe_gt_tlb_inval_job *job;
-	struct xe_dep_scheduler *dep_scheduler =
-		q->tlb_inval[xe_gt_tlb_inval_context(gt)].dep_scheduler;
+	struct xe_tlb_inval_job *job;
 	struct drm_sched_entity *entity =
 		xe_dep_scheduler_entity(dep_scheduler);
-	struct xe_gt_tlb_inval_fence *ifence;
+	struct xe_tlb_inval_fence *ifence;
 	int err;
 
 	job = kmalloc(sizeof(*job), GFP_KERNEL);
@@ -101,14 +95,14 @@ struct xe_gt_tlb_inval_job *xe_gt_tlb_inval_job_create(struct xe_exec_queue *q,
 		return ERR_PTR(-ENOMEM);
 
 	job->q = q;
-	job->gt = gt;
+	job->tlb_inval = tlb_inval;
 	job->start = start;
 	job->end = end;
 	job->asid = asid;
 	job->fence_armed = false;
 	job->dep.ops = &dep_job_ops;
 	kref_init(&job->refcount);
-	xe_exec_queue_get(q);	/* Pairs with put in xe_gt_tlb_inval_job_destroy */
+	xe_exec_queue_get(q);	/* Pairs with put in xe_tlb_inval_job_destroy */
 
 	ifence = kmalloc(sizeof(*ifence), GFP_KERNEL);
 	if (!ifence) {
@@ -122,8 +116,8 @@ struct xe_gt_tlb_inval_job *xe_gt_tlb_inval_job_create(struct xe_exec_queue *q,
 	if (err)
 		goto err_fence;
 
-	/* Pairs with put in xe_gt_tlb_inval_job_destroy */
-	xe_pm_runtime_get_noresume(gt_to_xe(job->gt));
+	/* Pairs with put in xe_tlb_inval_job_destroy */
+	xe_pm_runtime_get_noresume(gt_to_xe(q->gt));
 
 	return job;
 
@@ -136,40 +130,40 @@ struct xe_gt_tlb_inval_job *xe_gt_tlb_inval_job_create(struct xe_exec_queue *q,
 	return ERR_PTR(err);
 }
 
-static void xe_gt_tlb_inval_job_destroy(struct kref *ref)
+static void xe_tlb_inval_job_destroy(struct kref *ref)
 {
-	struct xe_gt_tlb_inval_job *job = container_of(ref, typeof(*job),
-						       refcount);
-	struct xe_gt_tlb_inval_fence *ifence =
+	struct xe_tlb_inval_job *job = container_of(ref, typeof(*job),
+						    refcount);
+	struct xe_tlb_inval_fence *ifence =
 		container_of(job->fence, typeof(*ifence), base);
-	struct xe_device *xe = gt_to_xe(job->gt);
 	struct xe_exec_queue *q = job->q;
+	struct xe_device *xe = gt_to_xe(q->gt);
 
 	if (!job->fence_armed)
 		kfree(ifence);
 	else
-		/* Ref from xe_gt_tlb_inval_fence_init */
+		/* Ref from xe_tlb_inval_fence_init */
 		dma_fence_put(job->fence);
 
 	drm_sched_job_cleanup(&job->dep.drm);
 	kfree(job);
-	xe_exec_queue_put(q);	/* Pairs with get from xe_gt_tlb_inval_job_create */
-	xe_pm_runtime_put(xe);	/* Pairs with get from xe_gt_tlb_inval_job_create */
+	xe_exec_queue_put(q);	/* Pairs with get from xe_tlb_inval_job_create */
+	xe_pm_runtime_put(xe);	/* Pairs with get from xe_tlb_inval_job_create */
 }
 
 /**
- * xe_gt_tlb_inval_alloc_dep() - GT TLB invalidation job alloc dependency
- * @job: GT TLB invalidation job to alloc dependency for
+ * xe_tlb_inval_job_alloc_dep() - TLB invalidation job alloc dependency
+ * @job: TLB invalidation job to alloc dependency for
  *
- * Allocate storage for a dependency in the GT TLB invalidation fence. This
+ * Allocate storage for a dependency in the TLB invalidation fence. This
  * function should be called at most once per job and must be paired with
- * xe_gt_tlb_inval_job_push being called with a real fence.
+ * xe_tlb_inval_job_push being called with a real fence.
  *
  * Return: 0 on success, -errno on failure
  */
-int xe_gt_tlb_inval_job_alloc_dep(struct xe_gt_tlb_inval_job *job)
+int xe_tlb_inval_job_alloc_dep(struct xe_tlb_inval_job *job)
 {
-	xe_assert(gt_to_xe(job->gt), !xa_load(&job->dep.drm.dependencies, 0));
+	xe_assert(gt_to_xe(job->q->gt), !xa_load(&job->dep.drm.dependencies, 0));
 	might_alloc(GFP_KERNEL);
 
 	return drm_sched_job_add_dependency(&job->dep.drm,
@@ -177,24 +171,24 @@ int xe_gt_tlb_inval_job_alloc_dep(struct xe_gt_tlb_inval_job *job)
 }
 
 /**
- * xe_gt_tlb_inval_job_push() - GT TLB invalidation job push
- * @job: GT TLB invalidation job to push
+ * xe_tlb_inval_job_push() - TLB invalidation job push
+ * @job: TLB invalidation job to push
  * @m: The migration object being used
- * @fence: Dependency for GT TLB invalidation job
+ * @fence: Dependency for TLB invalidation job
  *
- * Pushes a GT TLB invalidation job for execution, using @fence as a dependency.
- * Storage for @fence must be preallocated with xe_gt_tlb_inval_job_alloc_dep
+ * Pushes a TLB invalidation job for execution, using @fence as a dependency.
+ * Storage for @fence must be preallocated with xe_tlb_inval_job_alloc_dep
  * prior to this call if @fence is not signaled. Takes a reference to the job’s
  * finished fence, which the caller is responsible for releasing, and return it
  * to the caller. This function is safe to be called in the path of reclaim.
  *
  * Return: Job's finished fence on success, cannot fail
  */
-struct dma_fence *xe_gt_tlb_inval_job_push(struct xe_gt_tlb_inval_job *job,
-					   struct xe_migrate *m,
-					   struct dma_fence *fence)
+struct dma_fence *xe_tlb_inval_job_push(struct xe_tlb_inval_job *job,
+					struct xe_migrate *m,
+					struct dma_fence *fence)
 {
-	struct xe_gt_tlb_inval_fence *ifence =
+	struct xe_tlb_inval_fence *ifence =
 		container_of(job->fence, typeof(*ifence), base);
 
 	if (!dma_fence_is_signaled(fence)) {
@@ -202,20 +196,20 @@ struct dma_fence *xe_gt_tlb_inval_job_push(struct xe_gt_tlb_inval_job *job,
 
 		/*
 		 * Can be in path of reclaim, hence the preallocation of fence
-		 * storage in xe_gt_tlb_inval_job_alloc_dep. Verify caller did
+		 * storage in xe_tlb_inval_job_alloc_dep. Verify caller did
 		 * this correctly.
 		 */
-		xe_assert(gt_to_xe(job->gt),
+		xe_assert(gt_to_xe(job->q->gt),
 			  xa_load(&job->dep.drm.dependencies, 0) ==
 			  dma_fence_get_stub());
 
 		dma_fence_get(fence);	/* ref released once dependency processed by scheduler */
 		ptr = xa_store(&job->dep.drm.dependencies, 0, fence,
 			       GFP_ATOMIC);
-		xe_assert(gt_to_xe(job->gt), !xa_is_err(ptr));
+		xe_assert(gt_to_xe(job->q->gt), !xa_is_err(ptr));
 	}
 
-	xe_gt_tlb_inval_job_get(job);	/* Pairs with put in free_job */
+	xe_tlb_inval_job_get(job);	/* Pairs with put in free_job */
 	job->fence_armed = true;
 
 	/*
@@ -225,8 +219,8 @@ struct dma_fence *xe_gt_tlb_inval_job_push(struct xe_gt_tlb_inval_job *job,
 	 */
 	xe_migrate_job_lock(m, job->q);
 
-	/* Creation ref pairs with put in xe_gt_tlb_inval_job_destroy */
-	xe_gt_tlb_inval_fence_init(job->gt, ifence, false);
+	/* Creation ref pairs with put in xe_tlb_inval_job_destroy */
+	xe_tlb_inval_fence_init(job->tlb_inval, ifence, false);
 	dma_fence_get(job->fence);	/* Pairs with put in DRM scheduler */
 
 	drm_sched_job_arm(&job->dep.drm);
@@ -241,7 +235,7 @@ struct dma_fence *xe_gt_tlb_inval_job_push(struct xe_gt_tlb_inval_job *job,
 
 	/*
 	 * Not using job->fence, as it has its own dma-fence context, which does
-	 * not allow GT TLB invalidation fences on the same queue, GT tuple to
+	 * not allow TLB invalidation fences on the same queue, GT tuple to
 	 * be squashed in dma-resv/DRM scheduler. Instead, we use the DRM scheduler
 	 * context and job's finished fence, which enables squashing.
 	 */
@@ -249,26 +243,26 @@ struct dma_fence *xe_gt_tlb_inval_job_push(struct xe_gt_tlb_inval_job *job,
 }
 
 /**
- * xe_gt_tlb_inval_job_get() - Get a reference to GT TLB invalidation job
- * @job: GT TLB invalidation job object
+ * xe_tlb_inval_job_get() - Get a reference to TLB invalidation job
+ * @job: TLB invalidation job object
  *
- * Increment the GT TLB invalidation job's reference count
+ * Increment the TLB invalidation job's reference count
  */
-void xe_gt_tlb_inval_job_get(struct xe_gt_tlb_inval_job *job)
+void xe_tlb_inval_job_get(struct xe_tlb_inval_job *job)
 {
 	kref_get(&job->refcount);
 }
 
 /**
- * xe_gt_tlb_inval_job_put() - Put a reference to GT TLB invalidation job
- * @job: GT TLB invalidation job object
+ * xe_tlb_inval_job_put() - Put a reference to TLB invalidation job
+ * @job: TLB invalidation job object
  *
- * Decrement the GT TLB invalidation job's reference count, call
- * xe_gt_tlb_inval_job_destroy when reference count == 0. Skips decrement if
+ * Decrement the TLB invalidation job's reference count, call
+ * xe_tlb_inval_job_destroy when reference count == 0. Skips decrement if
  * input @job is NULL or IS_ERR.
  */
-void xe_gt_tlb_inval_job_put(struct xe_gt_tlb_inval_job *job)
+void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job)
 {
 	if (!IS_ERR_OR_NULL(job))
-		kref_put(&job->refcount, xe_gt_tlb_inval_job_destroy);
+		kref_put(&job->refcount, xe_tlb_inval_job_destroy);
 }
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
new file mode 100644
index 000000000000..32fb41599a19
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#ifndef _XE_TLB_INVAL_JOB_H_
+#define _XE_TLB_INVAL_JOB_H_
+
+#include <linux/types.h>
+
+struct dma_fence;
+struct drm_sched_job;
+struct xe_dep_scheduler;
+struct xe_exec_queue;
+struct xe_tlb_inval;
+struct xe_tlb_inval_job;
+struct xe_migrate;
+
+struct xe_tlb_inval_job *
+xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
+			struct xe_dep_scheduler *dep_scheduler,
+			u64 start, u64 end, u32 asid);
+
+int xe_tlb_inval_job_alloc_dep(struct xe_tlb_inval_job *job);
+
+struct dma_fence *xe_tlb_inval_job_push(struct xe_tlb_inval_job *job,
+					struct xe_migrate *m,
+					struct dma_fence *fence);
+
+void xe_tlb_inval_job_get(struct xe_tlb_inval_job *job);
+
+void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job);
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval_types.h b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
similarity index 54%
rename from drivers/gpu/drm/xe/xe_gt_tlb_inval_types.h
rename to drivers/gpu/drm/xe/xe_tlb_inval_types.h
index 8b37d0b8f545..05b6adc929bb 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_inval_types.h
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
@@ -3,56 +3,55 @@
  * Copyright © 2023 Intel Corporation
  */
 
-#ifndef _XE_GT_TLB_INVAL_TYPES_H_
-#define _XE_GT_TLB_INVAL_TYPES_H_
+#ifndef _XE_TLB_INVAL_TYPES_H_
+#define _XE_TLB_INVAL_TYPES_H_
 
 #include <linux/workqueue.h>
 #include <linux/dma-fence.h>
 
-struct xe_gt;
-
 /** struct xe_tlb_inval - TLB invalidation client */
 struct xe_tlb_inval {
+	/** @private: Backend private pointer */
+	void *private;
 	/** @tlb_inval.seqno: TLB invalidation seqno, protected by CT lock */
 #define TLB_INVALIDATION_SEQNO_MAX	0x100000
 	int seqno;
 	/**
-	 * @tlb_inval.seqno_recv: last received TLB invalidation seqno,
-	 * protected by CT lock
+	 * @seqno_recv: last received TLB invalidation seqno, protected by
+	 * CT lock
 	 */
 	int seqno_recv;
 	/**
-	 * @tlb_inval.pending_fences: list of pending fences waiting TLB
-	 * invaliations, protected by CT lock
+	 * @pending_fences: list of pending fences waiting for TLB
+	 * invalidations, protected by CT lock
 	 */
 	struct list_head pending_fences;
 	/**
-	 * @tlb_inval.pending_lock: protects @tlb_inval.pending_fences
-	 * and updating @tlb_inval.seqno_recv.
+	 * @pending_lock: protects @pending_fences and updating @seqno_recv.
 	 */
 	spinlock_t pending_lock;
 	/**
-	 * @tlb_inval.fence_tdr: schedules a delayed call to
-	 * xe_gt_tlb_fence_timeout after the timeut interval is over.
+	 * @fence_tdr: schedules a delayed call to xe_tlb_fence_timeout after
+	 * the timeout interval is over.
 	 */
 	struct delayed_work fence_tdr;
-	/** @wtlb_invalidation.wq: schedules GT TLB invalidation jobs */
+	/** @job_wq: schedules TLB invalidation jobs */
 	struct workqueue_struct *job_wq;
 	/** @tlb_inval.lock: protects TLB invalidation fences */
 	spinlock_t lock;
 };
 
 /**
- * struct xe_gt_tlb_inval_fence - XE GT TLB invalidation fence
+ * struct xe_tlb_inval_fence - TLB invalidation fence
  *
- * Optionally passed to xe_gt_tlb_inval and will be signaled upon TLB
+ * Optionally passed to xe_tlb_inval* functions and will be signaled upon TLB
  * invalidation completion.
  */
-struct xe_gt_tlb_inval_fence {
+struct xe_tlb_inval_fence {
 	/** @base: dma fence base */
 	struct dma_fence base;
-	/** @gt: GT which fence belong to */
-	struct xe_gt *gt;
+	/** @tlb_inval: TLB invalidation client which the fence belongs to */
+	struct xe_tlb_inval *tlb_inval;
 	/** @link: link into list of pending tlb fences */
 	struct list_head link;
 	/** @seqno: seqno of TLB invalidation to signal fence one */
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index 36538f50d06f..314f42fcbcbd 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -14,10 +14,10 @@
 
 #include "xe_exec_queue_types.h"
 #include "xe_gpu_scheduler_types.h"
-#include "xe_gt_tlb_inval_types.h"
 #include "xe_gt_types.h"
 #include "xe_guc_exec_queue_types.h"
 #include "xe_sched_job.h"
+#include "xe_tlb_inval_types.h"
 #include "xe_vm.h"
 
 #define __dev_name_xe(xe)	dev_name((xe)->drm.dev)
@@ -25,13 +25,13 @@
 #define __dev_name_gt(gt)	__dev_name_xe(gt_to_xe((gt)))
 #define __dev_name_eq(q)	__dev_name_gt((q)->gt)
 
-DECLARE_EVENT_CLASS(xe_gt_tlb_inval_fence,
-		    TP_PROTO(struct xe_device *xe, struct xe_gt_tlb_inval_fence *fence),
+DECLARE_EVENT_CLASS(xe_tlb_inval_fence,
+		    TP_PROTO(struct xe_device *xe, struct xe_tlb_inval_fence *fence),
 		    TP_ARGS(xe, fence),
 
 		    TP_STRUCT__entry(
 			     __string(dev, __dev_name_xe(xe))
-			     __field(struct xe_gt_tlb_inval_fence *, fence)
+			     __field(struct xe_tlb_inval_fence *, fence)
 			     __field(int, seqno)
 			     ),
 
@@ -45,23 +45,23 @@ DECLARE_EVENT_CLASS(xe_gt_tlb_inval_fence,
 			      __get_str(dev), __entry->fence, __entry->seqno)
 );
 
-DEFINE_EVENT(xe_gt_tlb_inval_fence, xe_gt_tlb_inval_fence_send,
-	     TP_PROTO(struct xe_device *xe, struct xe_gt_tlb_inval_fence *fence),
+DEFINE_EVENT(xe_tlb_inval_fence, xe_tlb_inval_fence_send,
+	     TP_PROTO(struct xe_device *xe, struct xe_tlb_inval_fence *fence),
 	     TP_ARGS(xe, fence)
 );
 
-DEFINE_EVENT(xe_gt_tlb_inval_fence, xe_gt_tlb_inval_fence_recv,
-	     TP_PROTO(struct xe_device *xe, struct xe_gt_tlb_inval_fence *fence),
+DEFINE_EVENT(xe_tlb_inval_fence, xe_tlb_inval_fence_recv,
+	     TP_PROTO(struct xe_device *xe, struct xe_tlb_inval_fence *fence),
 	     TP_ARGS(xe, fence)
 );
 
-DEFINE_EVENT(xe_gt_tlb_inval_fence, xe_gt_tlb_inval_fence_signal,
-	     TP_PROTO(struct xe_device *xe, struct xe_gt_tlb_inval_fence *fence),
+DEFINE_EVENT(xe_tlb_inval_fence, xe_tlb_inval_fence_signal,
+	     TP_PROTO(struct xe_device *xe, struct xe_tlb_inval_fence *fence),
 	     TP_ARGS(xe, fence)
 );
 
-DEFINE_EVENT(xe_gt_tlb_inval_fence, xe_gt_tlb_inval_fence_timeout,
-	     TP_PROTO(struct xe_device *xe, struct xe_gt_tlb_inval_fence *fence),
+DEFINE_EVENT(xe_tlb_inval_fence, xe_tlb_inval_fence_timeout,
+	     TP_PROTO(struct xe_device *xe, struct xe_tlb_inval_fence *fence),
 	     TP_ARGS(xe, fence)
 );
 
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index a4011a1e5e2c..5c530051723f 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -28,7 +28,6 @@
 #include "xe_drm_client.h"
 #include "xe_exec_queue.h"
 #include "xe_gt_pagefault.h"
-#include "xe_gt_tlb_inval.h"
 #include "xe_migrate.h"
 #include "xe_pat.h"
 #include "xe_pm.h"
@@ -38,6 +37,7 @@
 #include "xe_res_cursor.h"
 #include "xe_svm.h"
 #include "xe_sync.h"
+#include "xe_tlb_inval.h"
 #include "xe_trace_bo.h"
 #include "xe_wa.h"
 #include "xe_hmm.h"
@@ -1850,7 +1850,7 @@ static void xe_vm_close(struct xe_vm *vm)
 					xe_pt_clear(xe, vm->pt_root[id]);
 
 			for_each_gt(gt, xe, id)
-				xe_gt_tlb_inval_vm(gt, vm);
+				xe_tlb_inval_vm(&gt->tlb_inval, vm);
 		}
 	}
 
@@ -3857,7 +3857,7 @@ void xe_vm_unlock(struct xe_vm *vm)
 int xe_vm_range_tilemask_tlb_inval(struct xe_vm *vm, u64 start,
 				   u64 end, u8 tile_mask)
 {
-	struct xe_gt_tlb_inval_fence
+	struct xe_tlb_inval_fence
 		fence[XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_TILE];
 	struct xe_tile *tile;
 	u32 fence_id = 0;
@@ -3871,11 +3871,12 @@ int xe_vm_range_tilemask_tlb_inval(struct xe_vm *vm, u64 start,
 		if (!(tile_mask & BIT(id)))
 			continue;
 
-		xe_gt_tlb_inval_fence_init(tile->primary_gt,
-					   &fence[fence_id], true);
+		xe_tlb_inval_fence_init(&tile->primary_gt->tlb_inval,
+					&fence[fence_id], true);
 
-		err = xe_gt_tlb_inval_range(tile->primary_gt, &fence[fence_id],
-					    start, end, vm->usm.asid);
+		err = xe_tlb_inval_range(&tile->primary_gt->tlb_inval,
+					 &fence[fence_id], start, end,
+					 vm->usm.asid);
 		if (err)
 			goto wait;
 		++fence_id;
@@ -3883,11 +3884,12 @@ int xe_vm_range_tilemask_tlb_inval(struct xe_vm *vm, u64 start,
 		if (!tile->media_gt)
 			continue;
 
-		xe_gt_tlb_inval_fence_init(tile->media_gt,
-					   &fence[fence_id], true);
+		xe_tlb_inval_fence_init(&tile->media_gt->tlb_inval,
+					&fence[fence_id], true);
 
-		err = xe_gt_tlb_inval_range(tile->media_gt, &fence[fence_id],
-					    start, end, vm->usm.asid);
+		err = xe_tlb_inval_range(&tile->media_gt->tlb_inval,
+					 &fence[fence_id], start, end,
+					 vm->usm.asid);
 		if (err)
 			goto wait;
 		++fence_id;
@@ -3895,7 +3897,7 @@ int xe_vm_range_tilemask_tlb_inval(struct xe_vm *vm, u64 start,
 
 wait:
 	for (id = 0; id < fence_id; ++id)
-		xe_gt_tlb_inval_fence_wait(&fence[id]);
+		xe_tlb_inval_fence_wait(&fence[id]);
 
 	return err;
 }
-- 
2.34.1



* [PATCH 3/5] drm/xe: Prep TLB invalidation fence before sending
  2025-07-23 18:22 [PATCH 0/5] Add TLB invalidation abstraction stuartsummers
  2025-07-23 18:22 ` [PATCH 1/5] drm/xe: Add xe_gt_tlb_invalidation_done_handler stuartsummers
  2025-07-23 18:22 ` [PATCH 2/5] drm/xe: Decouple TLB invalidations from GT stuartsummers
@ 2025-07-23 18:22 ` stuartsummers
  2025-07-23 18:22 ` [PATCH 4/5] drm/xe: Add helpers to send TLB invalidations stuartsummers
  2025-07-23 18:22 ` [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend stuartsummers
  4 siblings, 0 replies; 19+ messages in thread
From: stuartsummers @ 2025-07-23 18:22 UTC (permalink / raw)
  Cc: matthew.brost, matthew.auld, maarten.lankhorst, farah.kassabri,
	intel-xe

From: Matthew Brost <matthew.brost@intel.com>

It is a bit backwards to add a TLB invalidation fence to the pending
list after issuing the invalidation. Move this step into a helper
function that runs before the TLB invalidation is issued, and signal
the fence if the send fails.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
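For reviewers, the intended ordering can be modelled in plain userspace
C. This is only a sketch under stated assumptions, not the driver code:
the model_* names, the fixed-size pending array and the send callback
are all hypothetical, and locking, the TDR worker and dma-fence are left
out. It shows the fence being published and its seqno reserved before
the send, and a failed send signalling the fence straight away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SEQNO_MAX	0x100000	/* mirrors TLB_INVALIDATION_SEQNO_MAX */
#define MAX_PENDING	8		/* hypothetical bound for this model */

struct model_fence {
	int seqno;
	bool pending;
	bool signaled;
};

struct model_inval {
	int seqno;	/* next seqno to hand out, 0 is never used */
	struct model_fence *pending[MAX_PENDING];
	size_t nr_pending;
};

/* Publish the fence and reserve its seqno before anything is sent. */
static void model_fence_prep(struct model_inval *inval,
			     struct model_fence *fence)
{
	assert(inval->nr_pending < MAX_PENDING);

	fence->seqno = inval->seqno;
	fence->pending = true;
	fence->signaled = false;
	inval->pending[inval->nr_pending++] = fence;

	inval->seqno = (inval->seqno + 1) % SEQNO_MAX;
	if (!inval->seqno)
		inval->seqno = 1;	/* skip 0 on wrap */
}

/* Drop the fence from the pending array (keeping order) and signal it. */
static void model_fence_signal(struct model_inval *inval,
			       struct model_fence *fence)
{
	size_t i, j;

	for (i = 0; i < inval->nr_pending; i++) {
		if (inval->pending[i] != fence)
			continue;
		for (j = i + 1; j < inval->nr_pending; j++)
			inval->pending[j - 1] = inval->pending[j];
		inval->nr_pending--;
		break;
	}
	fence->pending = false;
	fence->signaled = true;
}

/* Prep first, then send; a failed send signals the fence immediately. */
static int model_inval_issue(struct model_inval *inval,
			     struct model_fence *fence,
			     int (*send)(int seqno))
{
	int ret;

	model_fence_prep(inval, fence);
	ret = send(fence->seqno);
	if (ret < 0)
		model_fence_signal(inval, fence);

	return ret;
}

/* Stand-ins for a backend send that succeeds or fails. */
static int send_ok(int seqno) { (void)seqno; return 0; }
static int send_fail(int seqno) { (void)seqno; return -1; }
```

One consequence visible in the model: because the counter advances in
the prep step, a failed send still consumes a seqno, whereas the old
code only advanced it on success.
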
 drivers/gpu/drm/xe/xe_tlb_inval.c | 116 ++++++++++++++++--------------
 1 file changed, 62 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
index a25c35005689..61bc16410228 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
@@ -65,19 +65,19 @@ __inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
 static void
 inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
 {
+	lockdep_assert_held(&fence->tlb_inval->pending_lock);
+
 	list_del(&fence->link);
 	__inval_fence_signal(xe, fence);
 }
 
-void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence)
+static void
+inval_fence_signal_unlocked(struct xe_device *xe,
+			    struct xe_tlb_inval_fence *fence)
 {
-	struct xe_gt *gt;
-
-	if (WARN_ON_ONCE(!fence->tlb_inval))
-		return;
-
-	gt = fence->tlb_inval->private;
-	__inval_fence_signal(gt_to_xe(gt), fence);
+	spin_lock_irq(&fence->tlb_inval->pending_lock);
+	inval_fence_signal(xe, fence);
+	spin_unlock_irq(&fence->tlb_inval->pending_lock);
 }
 
 static void xe_gt_tlb_fence_timeout(struct work_struct *work)
@@ -201,16 +201,13 @@ static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
 	return seqno_recv >= seqno;
 }
 
-static int send_tlb_inval(struct xe_guc *guc,
-			  struct xe_tlb_inval_fence *fence,
+static int send_tlb_inval(struct xe_guc *guc, struct xe_tlb_inval_fence *fence,
 			  u32 *action, int len)
 {
 	struct xe_gt *gt = guc_to_gt(guc);
-	struct xe_device *xe = gt_to_xe(gt);
-	int seqno;
-	int ret;
 
 	xe_gt_assert(gt, fence);
+	lockdep_assert_held(&guc->ct.lock);
 
 	/*
 	 * XXX: The seqno algorithm relies on TLB invalidation being processed
@@ -218,47 +215,38 @@ static int send_tlb_inval(struct xe_guc *guc,
 	 * need to be updated.
 	 */
 
-	mutex_lock(&guc->ct.lock);
-	seqno = gt->tlb_inval.seqno;
-	fence->seqno = seqno;
-	trace_xe_tlb_inval_fence_send(xe, fence);
-	action[1] = seqno;
-	ret = xe_guc_ct_send_locked(&guc->ct, action, len,
-				    G2H_LEN_DW_TLB_INVALIDATE, 1);
-	if (!ret) {
-		spin_lock_irq(&gt->tlb_inval.pending_lock);
-		/*
-		 * We haven't actually published the TLB fence as per
-		 * pending_fences, but in theory our seqno could have already
-		 * been written as we acquired the pending_lock. In such a case
-		 * we can just go ahead and signal the fence here.
-		 */
-		if (tlb_inval_seqno_past(gt, seqno)) {
-			__inval_fence_signal(xe, fence);
-		} else {
-			fence->inval_time = ktime_get();
-			list_add_tail(&fence->link,
-				      &gt->tlb_inval.pending_fences);
-
-			if (list_is_singular(&gt->tlb_inval.pending_fences))
-				queue_delayed_work(system_wq,
-						   &gt->tlb_inval.fence_tdr,
-						   tlb_timeout_jiffies(gt));
-		}
-		spin_unlock_irq(&gt->tlb_inval.pending_lock);
-	} else {
-		__inval_fence_signal(xe, fence);
-	}
-	if (!ret) {
-		gt->tlb_inval.seqno = (gt->tlb_inval.seqno + 1) %
-			TLB_INVALIDATION_SEQNO_MAX;
-		if (!gt->tlb_inval.seqno)
-			gt->tlb_inval.seqno = 1;
-	}
-	mutex_unlock(&guc->ct.lock);
 	xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
+	action[1] = fence->seqno;
 
-	return ret;
+	return xe_guc_ct_send_locked(&guc->ct, action, len,
+				     G2H_LEN_DW_TLB_INVALIDATE, 1);
+}
+
+static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
+{
+	struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
+	struct xe_gt *gt = tlb_inval->private;
+	struct xe_device *xe = gt_to_xe(gt);
+
+	lockdep_assert_held(&gt->uc.guc.ct.lock);
+
+	fence->seqno = tlb_inval->seqno;
+	trace_xe_tlb_inval_fence_send(xe, fence);
+
+	spin_lock_irq(&tlb_inval->pending_lock);
+	fence->inval_time = ktime_get();
+	list_add_tail(&fence->link, &tlb_inval->pending_fences);
+
+	if (list_is_singular(&tlb_inval->pending_fences))
+		queue_delayed_work(system_wq,
+				   &tlb_inval->fence_tdr,
+				   tlb_timeout_jiffies(gt));
+	spin_unlock_irq(&tlb_inval->pending_lock);
+
+	tlb_inval->seqno = (tlb_inval->seqno + 1) %
+		TLB_INVALIDATION_SEQNO_MAX;
+	if (!tlb_inval->seqno)
+		tlb_inval->seqno = 1;
 }
 
 #define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
@@ -286,7 +274,16 @@ static int xe_tlb_inval_guc(struct xe_gt *gt,
 	};
 	int ret;
 
+	mutex_lock(&gt->uc.guc.ct.lock);
+
+	xe_tlb_inval_fence_prep(fence);
+
 	ret = send_tlb_inval(&gt->uc.guc, fence, action, ARRAY_SIZE(action));
+	if (ret < 0)
+		inval_fence_signal_unlocked(gt_to_xe(gt), fence);
+
+	mutex_unlock(&gt->uc.guc.ct.lock);
+
 	/*
 	 * -ECANCELED indicates the CT is stopped for a GT reset. TLB caches
 	 *  should be nuked on a GT reset so this error can be ignored.
@@ -413,7 +410,7 @@ int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
 #define MAX_TLB_INVALIDATION_LEN	7
 	u32 action[MAX_TLB_INVALIDATION_LEN];
 	u64 length = end - start;
-	int len = 0;
+	int len = 0, ret;
 
 	xe_gt_assert(gt, fence);
 
@@ -474,7 +471,18 @@ int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
 
 	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
 
-	return send_tlb_inval(&gt->uc.guc, fence, action, len);
+	mutex_lock(&gt->uc.guc.ct.lock);
+
+	xe_tlb_inval_fence_prep(fence);
+
+	ret = send_tlb_inval(&gt->uc.guc, fence, action,
+			     ARRAY_SIZE(action));
+	if (ret < 0)
+		inval_fence_signal_unlocked(xe, fence);
+
+	mutex_unlock(&gt->uc.guc.ct.lock);
+
+	return ret;
 }
 
 /**
-- 
2.34.1



* [PATCH 4/5] drm/xe: Add helpers to send TLB invalidations
  2025-07-23 18:22 [PATCH 0/5] Add TLB invalidation abstraction stuartsummers
                   ` (2 preceding siblings ...)
  2025-07-23 18:22 ` [PATCH 3/5] drm/xe: Prep TLB invalidation fence before sending stuartsummers
@ 2025-07-23 18:22 ` stuartsummers
  2025-07-23 18:22 ` [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend stuartsummers
  4 siblings, 0 replies; 19+ messages in thread
From: stuartsummers @ 2025-07-23 18:22 UTC (permalink / raw)
  Cc: matthew.brost, matthew.auld, maarten.lankhorst, farah.kassabri,
	intel-xe, Stuart Summers

From: Matthew Brost <matthew.brost@intel.com>

Break out the GuC specific code into helpers as part of the process to
decouple frontend TLB invalidation code from the backend.
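
As a rough illustration of the helper shape this patch moves toward, here is
a minimal user-space sketch. Every demo_* name, the opcode values, and the
mock transport are hypothetical stand-ins, not the driver's real ABI. The
point is only that the per-type helper receives a plain seqno and builds the
message, while the common send path never sees the fence:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical opcodes, stand-ins for the real GuC ABI values. */
enum { DEMO_ACTION_TLB_INVAL = 0x7000, DEMO_OP_GUC = 0x3 };

/* Mock transport: remembers the last message instead of talking to firmware. */
struct demo_transport {
	uint32_t last[8];
	int last_len;
};

/* Common send path: only sees a finished message, never the fence. */
static int demo_send(struct demo_transport *t, const uint32_t *action, int len)
{
	assert(action[1]);	/* a zero seqno is never valid, as in the patch */
	memcpy(t->last, action, len * sizeof(*action));
	t->last_len = len;
	return 0;
}

/* Per-type helper: builds the message from plain values. */
static int demo_send_ggtt(struct demo_transport *t, uint32_t seqno)
{
	uint32_t action[] = { DEMO_ACTION_TLB_INVAL, seqno, DEMO_OP_GUC };

	return demo_send(t, action, ARRAY_SIZE(action));
}
```

A caller would prepare the fence first and then pass only fence->seqno into
the helper, which is what lets the fence handling stay on the frontend side.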

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
---
 drivers/gpu/drm/xe/xe_tlb_inval.c | 238 +++++++++++++++---------------
 1 file changed, 119 insertions(+), 119 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
index 61bc16410228..c795b78362bf 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
@@ -201,12 +201,11 @@ static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
 	return seqno_recv >= seqno;
 }
 
-static int send_tlb_inval(struct xe_guc *guc, struct xe_tlb_inval_fence *fence,
-			  u32 *action, int len)
+static int send_tlb_inval(struct xe_guc *guc, const u32 *action, int len)
 {
 	struct xe_gt *gt = guc_to_gt(guc);
 
-	xe_gt_assert(gt, fence);
+	xe_gt_assert(gt, action[1]);	/* Seqno */
 	lockdep_assert_held(&guc->ct.lock);
 
 	/*
@@ -216,7 +215,6 @@ static int send_tlb_inval(struct xe_guc *guc, struct xe_tlb_inval_fence *fence,
 	 */
 
 	xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
-	action[1] = fence->seqno;
 
 	return xe_guc_ct_send(&guc->ct, action, len,
 			      G2H_LEN_DW_TLB_INVALIDATE, 1);
@@ -253,93 +251,15 @@ static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
 		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
 		XE_GUC_TLB_INVAL_FLUSH_CACHE)
 
-/**
- * xe_tlb_inval_guc - Issue a TLB invalidation on this GT for the GuC
- * @gt: GT structure
- * @fence: invalidation fence which will be signal on TLB invalidation
- * completion
- *
- * Issue a TLB invalidation for the GuC. Completion of TLB is asynchronous and
- * caller can use the invalidation fence to wait for completion.
- *
- * Return: 0 on success, negative error code on error
- */
-static int xe_tlb_inval_guc(struct xe_gt *gt,
-			    struct xe_tlb_inval_fence *fence)
+static int send_tlb_inval_ggtt(struct xe_gt *gt, int seqno)
 {
 	u32 action[] = {
 		XE_GUC_ACTION_TLB_INVALIDATION,
-		0,  /* seqno, replaced in send_tlb_inval */
+		seqno,
 		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
 	};
-	int ret;
-
-	mutex_lock(&gt->uc.guc.ct.lock);
-
-	xe_tlb_inval_fence_prep(fence);
-
-	ret = send_tlb_inval(&gt->uc.guc, fence, action, ARRAY_SIZE(action));
-	if (ret < 0)
-		inval_fence_signal_unlocked(gt_to_xe(gt), fence);
-
-	mutex_unlock(&gt->uc.guc.ct.lock);
-
-	/*
-	 * -ECANCELED indicates the CT is stopped for a GT reset. TLB caches
-	 *  should be nuked on a GT reset so this error can be ignored.
-	 */
-	if (ret == -ECANCELED)
-		return 0;
-
-	return ret;
-}
-
-/**
- * xe_tlb_inval_ggtt - Issue a TLB invalidation on this GT for the GGTT
- * @tlb_inval: TLB invalidation client
- *
- * Issue a TLB invalidation for the GGTT. Completion of TLB invalidation is
- * synchronous.
- *
- * Return: 0 on success, negative error code on error
- */
-int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
-{
-	struct xe_gt *gt = tlb_inval->private;
-	struct xe_device *xe = gt_to_xe(gt);
-	unsigned int fw_ref;
-
-	if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
-	    gt->uc.guc.submission_state.enabled) {
-		struct xe_tlb_inval_fence fence;
-		int ret;
 
-		xe_tlb_inval_fence_init(tlb_inval, &fence, true);
-		ret = xe_tlb_inval_guc(gt, &fence);
-		if (ret)
-			return ret;
-
-		xe_tlb_inval_fence_wait(&fence);
-	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
-		struct xe_mmio *mmio = &gt->mmio;
-
-		if (IS_SRIOV_VF(xe))
-			return 0;
-
-		fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
-		if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
-			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
-					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
-			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
-					PVC_GUC_TLB_INV_DESC0_VALID);
-		} else {
-			xe_mmio_write32(mmio, GUC_TLB_INV_CR,
-					GUC_TLB_INV_CR_INVALIDATE);
-		}
-		xe_force_wake_put(gt_to_fw(gt), fw_ref);
-	}
-
-	return 0;
+	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
 }
 
 static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
@@ -354,7 +274,7 @@ static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
 
 	xe_gt_assert(gt, fence);
 
-	return send_tlb_inval(&gt->uc.guc, fence, action, ARRAY_SIZE(action));
+	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
 }
 
 /**
@@ -386,43 +306,17 @@ int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
  */
 #define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
 
-/**
- * xe_tlb_inval_range - Issue a TLB invalidation on this GT for an address range
- * @tlb_inval: TLB invalidation client
- * @fence: invalidation fence which will be signal on TLB invalidation
- * completion
- * @start: start address
- * @end: end address
- * @asid: address space id
- *
- * Issue a range based TLB invalidation if supported, if not fallback to a full
- * TLB invalidation. Completion of TLB is asynchronous and caller can use
- * the invalidation fence to wait for completion.
- *
- * Return: Negative error code on error, 0 on success
- */
-int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
-		       struct xe_tlb_inval_fence *fence, u64 start, u64 end,
-		       u32 asid)
+static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64 start, u64 end,
+				u32 asid, int seqno)
 {
-	struct xe_gt *gt = tlb_inval->private;
-	struct xe_device *xe = gt_to_xe(gt);
 #define MAX_TLB_INVALIDATION_LEN	7
 	u32 action[MAX_TLB_INVALIDATION_LEN];
 	u64 length = end - start;
-	int len = 0, ret;
-
-	xe_gt_assert(gt, fence);
-
-	/* Execlists not supported */
-	if (gt_to_xe(gt)->info.force_execlist) {
-		__inval_fence_signal(xe, fence);
-		return 0;
-	}
+	int len = 0;
 
 	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
-	action[len++] = 0; /* seqno, replaced in send_tlb_inval */
-	if (!xe->info.has_range_tlb_inval ||
+	action[len++] = seqno;
+	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
 	    length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
 		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
 	} else {
@@ -471,12 +365,118 @@ int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
 
 	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
 
+	return send_tlb_inval(&gt->uc.guc, action, len);
+}
+
+static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
+			       struct xe_tlb_inval_fence *fence)
+{
+	int ret;
+
+	mutex_lock(&gt->uc.guc.ct.lock);
+
+	xe_tlb_inval_fence_prep(fence);
+
+	ret = send_tlb_inval_ggtt(gt, fence->seqno);
+	if (ret < 0)
+		inval_fence_signal_unlocked(gt_to_xe(gt), fence);
+
+	mutex_unlock(&gt->uc.guc.ct.lock);
+
+	/*
+	 * -ECANCELED indicates the CT is stopped for a GT reset. TLB caches
+	 *  should be nuked on a GT reset so this error can be ignored.
+	 */
+	if (ret == -ECANCELED)
+		return 0;
+
+	return ret;
+}
+
+/**
+ * xe_tlb_inval_ggtt - Issue a TLB invalidation on this GT for the GGTT
+ * @tlb_inval: TLB invalidation client
+ *
+ * Issue a TLB invalidation for the GGTT. Completion of TLB invalidation is
+ * synchronous.
+ *
+ * Return: 0 on success, negative error code on error
+ */
+int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
+{
+	struct xe_gt *gt = tlb_inval->private;
+	struct xe_device *xe = gt_to_xe(gt);
+	unsigned int fw_ref;
+
+	if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
+	    gt->uc.guc.submission_state.enabled) {
+		struct xe_tlb_inval_fence fence;
+		int ret;
+
+		xe_tlb_inval_fence_init(tlb_inval, &fence, true);
+		ret = __xe_tlb_inval_ggtt(gt, &fence);
+		if (ret)
+			return ret;
+
+		xe_tlb_inval_fence_wait(&fence);
+	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
+		struct xe_mmio *mmio = &gt->mmio;
+
+		if (IS_SRIOV_VF(xe))
+			return 0;
+
+		fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
+		if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
+			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
+					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
+			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
+					PVC_GUC_TLB_INV_DESC0_VALID);
+		} else {
+			xe_mmio_write32(mmio, GUC_TLB_INV_CR,
+					GUC_TLB_INV_CR_INVALIDATE);
+		}
+		xe_force_wake_put(gt_to_fw(gt), fw_ref);
+	}
+
+	return 0;
+}
+
+/**
+ * xe_tlb_inval_range - Issue a TLB invalidation on this GT for an address range
+ * @tlb_inval: TLB invalidation client
+ * @fence: invalidation fence which will be signaled on TLB invalidation
+ * completion
+ * @start: start address
+ * @end: end address
+ * @asid: address space id
+ *
+ * Issue a range based TLB invalidation if supported, if not fallback to a full
+ * TLB invalidation. Completion of TLB is asynchronous and caller can use
+ * the invalidation fence to wait for completion.
+ *
+ * Return: Negative error code on error, 0 on success
+ */
+int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
+		       struct xe_tlb_inval_fence *fence, u64 start, u64 end,
+		       u32 asid)
+{
+	struct xe_gt *gt = tlb_inval->private;
+	struct xe_device *xe = gt_to_xe(gt);
+	int ret;
+
+	xe_gt_assert(gt, fence);
+
+	/* Execlists not supported */
+	if (xe->info.force_execlist) {
+		__inval_fence_signal(xe, fence);
+		return 0;
+	}
+
 	mutex_lock(&gt->uc.guc.ct.lock);
 
 	xe_tlb_inval_fence_prep(fence);
 
-	ret = send_tlb_inval(&gt->uc.guc, fence, action,
-			     ARRAY_SIZE(action));
+	ret = send_tlb_inval_ppgtt(gt, start, end, asid, fence->seqno);
 	if (ret < 0)
 		inval_fence_signal_unlocked(xe, fence);
 
-- 
2.34.1



* [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 18:22 [PATCH 0/5] Add TLB invalidation abstraction stuartsummers
                   ` (3 preceding siblings ...)
  2025-07-23 18:22 ` [PATCH 4/5] drm/xe: Add helpers to send TLB invalidations stuartsummers
@ 2025-07-23 18:22 ` stuartsummers
  2025-07-23 18:45   ` Matthew Brost
  2025-07-23 19:17   ` Matthew Brost
  4 siblings, 2 replies; 19+ messages in thread
From: stuartsummers @ 2025-07-23 18:22 UTC (permalink / raw)
  Cc: matthew.brost, matthew.auld, maarten.lankhorst, farah.kassabri,
	intel-xe, Stuart Summers

From: Matthew Brost <matthew.brost@intel.com>

The frontend exposes an API to the driver to send invalidations, handles
sequence number assignment, synchronization (fences), and provides a
timeout mechanism. The backend issues the actual invalidation to the
hardware (or firmware).
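
For readers unfamiliar with the seqno handling the frontend owns, the
following is a small standalone sketch of the wrapping scheme. DEMO_SEQNO_MAX
is a deliberately tiny, hypothetical value (the driver's
TLB_INVALIDATION_SEQNO_MAX is much larger), the demo_* names are made up, and
the second half-range check is assumed to mirror the first one visible in the
diff:

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_SEQNO_MAX 16	/* hypothetical; chosen small to show wrapping */

/* Advance the seqno, skipping 0 so that 0 can never be a valid seqno. */
static int demo_seqno_next(int seqno)
{
	assert(seqno > 0 && seqno < DEMO_SEQNO_MAX);

	seqno = (seqno + 1) % DEMO_SEQNO_MAX;
	return seqno ? seqno : 1;
}

/*
 * Has @seqno already been completed, given the last received seqno?
 * Differences larger than half the range are treated as a wrap.
 */
static bool demo_seqno_past(int seqno_recv, int seqno)
{
	if (seqno - seqno_recv < -(DEMO_SEQNO_MAX / 2))
		return false;	/* seqno was issued after a wrap, still pending */

	if (seqno - seqno_recv > (DEMO_SEQNO_MAX / 2))
		return true;	/* seqno_recv has wrapped past seqno */

	return seqno_recv >= seqno;
}
```

With these values, demo_seqno_next(15) wraps to 1 rather than 0, and a
received seqno of 2 counts as being past seqno 15. This only works if
completions arrive in order, which is the assumption called out in the XXX
comment in the backend.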

The new layering makes it easy to issue TLB invalidations through
different hardware or firmware interfaces.
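
To make the layering concrete, here is a reduced user-space sketch of a
frontend driving a backend through an ops table. All demo_* names are
hypothetical, the ops table is cut down to three hooks (the real one in this
patch has more), seqno wrapping is left out, and the squashing of -ECANCELED
in the frontend is an assumption based on the comment in
send_tlb_inval_ggtt():

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct demo_tlb_inval;

/* Backend hooks; a cut-down stand-in for the ops table in the patch. */
struct demo_tlb_inval_ops {
	int (*all)(struct demo_tlb_inval *inval, uint32_t seqno);
	void (*lock)(struct demo_tlb_inval *inval);
	void (*unlock)(struct demo_tlb_inval *inval);
};

struct demo_tlb_inval {
	void *private;				/* backend-owned state */
	const struct demo_tlb_inval_ops *ops;	/* backend hooks */
	uint32_t seqno;				/* next seqno to hand out */
};

/* Frontend: owns seqno assignment and locking order, knows no hardware. */
static int demo_tlb_inval_all(struct demo_tlb_inval *inval)
{
	uint32_t seqno;
	int ret;

	inval->ops->lock(inval);
	seqno = inval->seqno++;		/* simplified: no wrap handling here */
	ret = inval->ops->all(inval, seqno);
	inval->ops->unlock(inval);

	/* Assumed convention: -ECANCELED means there is nothing to wait for. */
	return ret == -ECANCELED ? 0 : ret;
}

/* Hypothetical backend: counts requests and tracks lock depth. */
struct demo_backend {
	int sent;
	int lock_depth;
	uint32_t last_seqno;
};

static int demo_backend_all(struct demo_tlb_inval *inval, uint32_t seqno)
{
	struct demo_backend *b = inval->private;

	assert(b->lock_depth == 1);	/* must be called under the backend lock */
	b->sent++;
	b->last_seqno = seqno;
	return 0;
}

static void demo_backend_lock(struct demo_tlb_inval *inval)
{
	((struct demo_backend *)inval->private)->lock_depth++;
}

static void demo_backend_unlock(struct demo_tlb_inval *inval)
{
	((struct demo_backend *)inval->private)->lock_depth--;
}

static const struct demo_tlb_inval_ops demo_backend_ops = {
	.all = demo_backend_all,
	.lock = demo_backend_lock,
	.unlock = demo_backend_unlock,
};

/* Mirrors the role of an init_early hook: set back pointer and ops. */
static void demo_backend_init(struct demo_backend *b,
			      struct demo_tlb_inval *inval)
{
	inval->private = b;
	inval->ops = &demo_backend_ops;
	inval->seqno = 1;
}
```

Supporting another interface would then mean providing a second ops table
and init function, with no change to demo_tlb_inval_all().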

Normalize some naming while here too.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
---
 drivers/gpu/drm/xe/Makefile             |   1 +
 drivers/gpu/drm/xe/xe_guc_ct.c          |   2 +-
 drivers/gpu/drm/xe/xe_guc_tlb_inval.c   | 263 +++++++++++++
 drivers/gpu/drm/xe/xe_guc_tlb_inval.h   |  19 +
 drivers/gpu/drm/xe/xe_tlb_inval.c       | 495 +++++++-----------------
 drivers/gpu/drm/xe/xe_tlb_inval.h       |  14 +-
 drivers/gpu/drm/xe/xe_tlb_inval_types.h |  77 +++-
 7 files changed, 505 insertions(+), 366 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.c
 create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 332b2057cc00..8a2f836b3ab2 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -75,6 +75,7 @@ xe-y += xe_bb.o \
 	xe_guc_log.o \
 	xe_guc_pc.o \
 	xe_guc_submit.o \
+	xe_guc_tlb_inval.o \
 	xe_heci_gsc.o \
 	xe_huc.o \
 	xe_hw_engine.o \
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 2ef86c0ae8b4..90ebda5b3790 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -30,9 +30,9 @@
 #include "xe_guc_log.h"
 #include "xe_guc_relay.h"
 #include "xe_guc_submit.h"
+#include "xe_guc_tlb_inval.h"
 #include "xe_map.h"
 #include "xe_pm.h"
-#include "xe_tlb_inval.h"
 #include "xe_trace_guc.h"
 
 static void receive_g2h(struct xe_guc_ct *ct);
diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
new file mode 100644
index 000000000000..27d7dc938cb1
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
@@ -0,0 +1,263 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#include "abi/guc_actions_abi.h"
+
+#include "xe_device.h"
+#include "xe_gt_stats.h"
+#include "xe_gt_types.h"
+#include "xe_guc.h"
+#include "xe_guc_ct.h"
+#include "xe_guc_tlb_inval.h"
+#include "xe_force_wake.h"
+#include "xe_mmio.h"
+#include "xe_tlb_inval.h"
+
+#include "regs/xe_guc_regs.h"
+
+/*
+ * XXX: The seqno algorithm relies on TLB invalidations being processed in
+ * order, which they currently are by the GuC. If that changes, the algorithm
+ * will need to be updated.
+ */
+
+static int send_tlb_inval(struct xe_guc *guc, const u32 *action, int len)
+{
+	struct xe_gt *gt = guc_to_gt(guc);
+
+	lockdep_assert_held(&guc->ct.lock);
+	xe_gt_assert(gt, action[1]);	/* Seqno */
+
+	xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
+	return xe_guc_ct_send(&guc->ct, action, len,
+			      G2H_LEN_DW_TLB_INVALIDATE, 1);
+}
+
+#define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
+		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
+		XE_GUC_TLB_INVAL_FLUSH_CACHE)
+
+static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval, u32 seqno)
+{
+	struct xe_guc *guc = tlb_inval->private;
+	u32 action[] = {
+		XE_GUC_ACTION_TLB_INVALIDATION_ALL,
+		seqno,
+		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
+	};
+
+	return send_tlb_inval(guc, action, ARRAY_SIZE(action));
+}
+
+static int send_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval, u32 seqno)
+{
+	struct xe_guc *guc = tlb_inval->private;
+	struct xe_gt *gt = guc_to_gt(guc);
+	struct xe_device *xe = guc_to_xe(guc);
+
+	lockdep_assert_held(&guc->ct.lock);
+
+	/*
+	 * Returning -ECANCELED in this function is squashed at the caller and
+	 * signals waiters.
+	 */
+
+	if (xe_guc_ct_enabled(&guc->ct) && guc->submission_state.enabled) {
+		u32 action[] = {
+			XE_GUC_ACTION_TLB_INVALIDATION,
+			seqno,
+			MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
+		};
+
+		return send_tlb_inval(guc, action, ARRAY_SIZE(action));
+	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
+		struct xe_mmio *mmio = &gt->mmio;
+		unsigned int fw_ref;
+
+		if (IS_SRIOV_VF(xe))
+			return -ECANCELED;
+
+		fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
+		if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
+			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
+					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
+			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
+					PVC_GUC_TLB_INV_DESC0_VALID);
+		} else {
+			xe_mmio_write32(mmio, GUC_TLB_INV_CR,
+					GUC_TLB_INV_CR_INVALIDATE);
+		}
+		xe_force_wake_put(gt_to_fw(gt), fw_ref);
+	}
+
+	return -ECANCELED;
+}
+
+/*
+ * Ensure that roundup_pow_of_two(length) doesn't overflow.
+ * Note that roundup_pow_of_two() operates on unsigned long,
+ * not on u64.
+ */
+#define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
+
+static int send_tlb_inval_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
+				u64 start, u64 end, u32 asid)
+{
+#define MAX_TLB_INVALIDATION_LEN	7
+	struct xe_guc *guc = tlb_inval->private;
+	struct xe_gt *gt = guc_to_gt(guc);
+	u32 action[MAX_TLB_INVALIDATION_LEN];
+	u64 length = end - start;
+	int len = 0;
+
+	lockdep_assert_held(&guc->ct.lock);
+
+	if (guc_to_xe(guc)->info.force_execlist)
+		return -ECANCELED;
+
+	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
+	action[len++] = seqno;
+	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
+	    length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
+		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
+	} else {
+		u64 orig_start = start;
+		u64 align;
+
+		if (length < SZ_4K)
+			length = SZ_4K;
+
+		/*
+		 * We need to invalidate a higher granularity if start address
+		 * is not aligned to length. When start is not aligned with
+		 * length we need to find the length large enough to create an
+		 * address mask covering the required range.
+		 */
+		align = roundup_pow_of_two(length);
+		start = ALIGN_DOWN(start, align);
+		end = ALIGN(end, align);
+		length = align;
+		while (start + length < end) {
+			length <<= 1;
+			start = ALIGN_DOWN(orig_start, length);
+		}
+
+		/*
+		 * Minimum invalidation size for a 2MB page that the hardware
+		 * expects is 16MB
+		 */
+		if (length >= SZ_2M) {
+			length = max_t(u64, SZ_16M, length);
+			start = ALIGN_DOWN(orig_start, length);
+		}
+
+		xe_gt_assert(gt, length >= SZ_4K);
+		xe_gt_assert(gt, is_power_of_2(length));
+		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
+						    ilog2(SZ_2M) + 1)));
+		xe_gt_assert(gt, IS_ALIGNED(start, length));
+
+		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
+		action[len++] = asid;
+		action[len++] = lower_32_bits(start);
+		action[len++] = upper_32_bits(start);
+		action[len++] = ilog2(length) - ilog2(SZ_4K);
+	}
+
+	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
+
+	return send_tlb_inval(guc, action, len);
+}
+
+static bool tlb_inval_initialized(struct xe_tlb_inval *tlb_inval)
+{
+	struct xe_guc *guc = tlb_inval->private;
+
+	return xe_guc_ct_initialized(&guc->ct);
+}
+
+static void tlb_inval_flush(struct xe_tlb_inval *tlb_inval)
+{
+	struct xe_guc *guc = tlb_inval->private;
+
+	LNL_FLUSH_WORK(&guc->ct.g2h_worker);
+}
+
+static long tlb_inval_timeout_delay(struct xe_tlb_inval *tlb_inval)
+{
+	struct xe_guc *guc = tlb_inval->private;
+
+	/* this reflects what HW/GuC needs to process TLB inv request */
+	const long hw_tlb_timeout = HZ / 4;
+
+	/* this estimates actual delay caused by the CTB transport */
+	long delay = xe_guc_ct_queue_proc_time_jiffies(&guc->ct);
+
+	return hw_tlb_timeout + 2 * delay;
+}
+
+static void tlb_inval_lock(struct xe_tlb_inval *tlb_inval)
+{
+	struct xe_guc *guc = tlb_inval->private;
+
+	mutex_lock(&guc->ct.lock);
+}
+
+static void tlb_inval_unlock(struct xe_tlb_inval *tlb_inval)
+{
+	struct xe_guc *guc = tlb_inval->private;
+
+	mutex_unlock(&guc->ct.lock);
+}
+
+static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
+	.all = send_tlb_inval_all,
+	.ggtt = send_tlb_inval_ggtt,
+	.ppgtt = send_tlb_inval_ppgtt,
+	.initialized = tlb_inval_initialized,
+	.flush = tlb_inval_flush,
+	.timeout_delay = tlb_inval_timeout_delay,
+	.lock = tlb_inval_lock,
+	.unlock = tlb_inval_unlock,
+};
+
+/**
+ * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation early
+ * @guc: GuC object
+ * @tlb_inval: TLB invalidation client
+ *
+ * Initialize GuC TLB invalidation by setting back pointer in TLB invalidation
+ * client to the GuC and setting GuC backend ops.
+ */
+void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
+				 struct xe_tlb_inval *tlb_inval)
+{
+	tlb_inval->private = guc;
+	tlb_inval->ops = &guc_tlb_inval_ops;
+}
+
+/**
+ * xe_guc_tlb_inval_done_handler() - TLB invalidation done handler
+ * @guc: guc
+ * @msg: message indicating TLB invalidation done
+ * @len: length of message
+ *
+ * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
+ * invalidation fences for seqno. Algorithm for this depends on seqno being
+ * received in-order and asserts this assumption.
+ *
+ * Return: 0 on success, -EPROTO for malformed messages.
+ */
+int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
+{
+	struct xe_gt *gt = guc_to_gt(guc);
+
+	if (unlikely(len != 1))
+		return -EPROTO;
+
+	xe_tlb_inval_done_handler(&gt->tlb_inval, msg[0]);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.h b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
new file mode 100644
index 000000000000..07d668b02e3d
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#ifndef _XE_GUC_TLB_INVAL_H_
+#define _XE_GUC_TLB_INVAL_H_
+
+#include <linux/types.h>
+
+struct xe_guc;
+struct xe_tlb_inval;
+
+void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
+				 struct xe_tlb_inval *tlb_inval);
+
+int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
index c795b78362bf..071c25fbdbac 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
@@ -12,50 +12,45 @@
 #include "xe_gt_printk.h"
 #include "xe_guc.h"
 #include "xe_guc_ct.h"
+#include "xe_guc_tlb_inval.h"
 #include "xe_gt_stats.h"
 #include "xe_tlb_inval.h"
 #include "xe_mmio.h"
 #include "xe_pm.h"
-#include "xe_sriov.h"
+#include "xe_tlb_inval.h"
 #include "xe_trace.h"
-#include "regs/xe_guc_regs.h"
-
-#define FENCE_STACK_BIT		DMA_FENCE_FLAG_USER_BITS
 
-/*
- * TLB inval depends on pending commands in the CT queue and then the real
- * invalidation time. Double up the time to process full CT queue
- * just to be on the safe side.
+/**
+ * DOC: Xe TLB invalidation
+ *
+ * Xe TLB invalidation is implemented in two layers. The first is the frontend
+ * API, which provides an interface for TLB invalidations to the driver code.
+ * The frontend handles seqno assignment, synchronization (fences), and the
+ * timeout mechanism. The frontend is implemented via an embedded structure
+ * xe_tlb_inval that includes a set of ops hooking into the backend. The backend
+ * interacts with the hardware (or firmware) to perform the actual invalidation.
  */
-static long tlb_timeout_jiffies(struct xe_gt *gt)
-{
-	/* this reflects what HW/GuC needs to process TLB inv request */
-	const long hw_tlb_timeout = HZ / 4;
 
-	/* this estimates actual delay caused by the CTB transport */
-	long delay = xe_guc_ct_queue_proc_time_jiffies(&gt->uc.guc.ct);
-
-	return hw_tlb_timeout + 2 * delay;
-}
+#define FENCE_STACK_BIT		DMA_FENCE_FLAG_USER_BITS
 
 static void xe_tlb_inval_fence_fini(struct xe_tlb_inval_fence *fence)
 {
-	struct xe_gt *gt;
-
 	if (WARN_ON_ONCE(!fence->tlb_inval))
 		return;
 
-	gt = fence->tlb_inval->private;
-	xe_pm_runtime_put(gt_to_xe(gt));
+	xe_pm_runtime_put(fence->tlb_inval->xe);
 	fence->tlb_inval = NULL; /* fini() should be called once */
 }
 
 static void
-__inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
+xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence)
 {
 	bool stack = test_bit(FENCE_STACK_BIT, &fence->base.flags);
 
-	trace_xe_tlb_inval_fence_signal(xe, fence);
+	lockdep_assert_held(&fence->tlb_inval->pending_lock);
+
+	list_del(&fence->link);
+	trace_xe_tlb_inval_fence_signal(fence->tlb_inval->xe, fence);
 	xe_tlb_inval_fence_fini(fence);
 	dma_fence_signal(&fence->base);
 	if (!stack)
@@ -63,57 +58,50 @@ __inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
 }
 
 static void
-inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
+xe_tlb_inval_fence_signal_unlocked(struct xe_tlb_inval_fence *fence)
 {
-	lockdep_assert_held(&fence->tlb_inval->pending_lock);
-
-	list_del(&fence->link);
-	__inval_fence_signal(xe, fence);
-}
+	struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
 
-static void
-inval_fence_signal_unlocked(struct xe_device *xe,
-			    struct xe_tlb_inval_fence *fence)
-{
-	spin_lock_irq(&fence->tlb_inval->pending_lock);
-	inval_fence_signal(xe, fence);
-	spin_unlock_irq(&fence->tlb_inval->pending_lock);
+	spin_lock_irq(&tlb_inval->pending_lock);
+	xe_tlb_inval_fence_signal(fence);
+	spin_unlock_irq(&tlb_inval->pending_lock);
 }
 
-static void xe_gt_tlb_fence_timeout(struct work_struct *work)
+static void xe_tlb_inval_fence_timeout(struct work_struct *work)
 {
-	struct xe_gt *gt = container_of(work, struct xe_gt,
-					tlb_inval.fence_tdr.work);
-	struct xe_device *xe = gt_to_xe(gt);
+	struct xe_tlb_inval *tlb_inval = container_of(work, struct xe_tlb_inval,
+						      fence_tdr.work);
+	struct xe_device *xe = tlb_inval->xe;
 	struct xe_tlb_inval_fence *fence, *next;
+	long timeout_delay = tlb_inval->ops->timeout_delay(tlb_inval);
 
-	LNL_FLUSH_WORK(&gt->uc.guc.ct.g2h_worker);
+	tlb_inval->ops->flush(tlb_inval);
 
-	spin_lock_irq(&gt->tlb_inval.pending_lock);
+	spin_lock_irq(&tlb_inval->pending_lock);
 	list_for_each_entry_safe(fence, next,
-				 &gt->tlb_inval.pending_fences, link) {
+				 &tlb_inval->pending_fences, link) {
 		s64 since_inval_ms = ktime_ms_delta(ktime_get(),
 						    fence->inval_time);
 
-		if (msecs_to_jiffies(since_inval_ms) < tlb_timeout_jiffies(gt))
+		if (msecs_to_jiffies(since_inval_ms) < timeout_delay)
 			break;
 
 		trace_xe_tlb_inval_fence_timeout(xe, fence);
-		xe_gt_err(gt, "TLB invalidation fence timeout, seqno=%d recv=%d",
-			  fence->seqno, gt->tlb_inval.seqno_recv);
+		drm_err(&xe->drm,
+			"TLB invalidation fence timeout, seqno=%d recv=%d",
+			fence->seqno, tlb_inval->seqno_recv);
 
 		fence->base.error = -ETIME;
-		inval_fence_signal(xe, fence);
+		xe_tlb_inval_fence_signal(fence);
 	}
-	if (!list_empty(&gt->tlb_inval.pending_fences))
-		queue_delayed_work(system_wq,
-				   &gt->tlb_inval.fence_tdr,
-				   tlb_timeout_jiffies(gt));
-	spin_unlock_irq(&gt->tlb_inval.pending_lock);
+	if (!list_empty(&tlb_inval->pending_fences))
+		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
+				   timeout_delay);
+	spin_unlock_irq(&tlb_inval->pending_lock);
 }
 
 /**
- * xe_tlb_inval_init_early - Initialize TLB invalidation state
+ * xe_gt_tlb_inval_init_early() - Initialize TLB invalidation state
  * @gt: GT structure
  *
  * Initialize TLB invalidation state, purely software initialization, should
@@ -123,13 +111,12 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
  */
 int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
 {
-	gt->tlb_inval.private = gt;
+	gt->tlb_inval.xe = gt_to_xe(gt);
 	gt->tlb_inval.seqno = 1;
 	INIT_LIST_HEAD(&gt->tlb_inval.pending_fences);
 	spin_lock_init(&gt->tlb_inval.pending_lock);
 	spin_lock_init(&gt->tlb_inval.lock);
-	INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr,
-			  xe_gt_tlb_fence_timeout);
+	INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr, xe_tlb_inval_fence_timeout);
 
 	gt->tlb_inval.job_wq =
 		drmm_alloc_ordered_workqueue(&gt_to_xe(gt)->drm, "gt-tbl-inval-job-wq",
@@ -137,60 +124,64 @@ int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
 	if (IS_ERR(gt->tlb_inval.job_wq))
 		return PTR_ERR(gt->tlb_inval.job_wq);
 
+	/* XXX: Blindly setting up backend to GuC */
+	xe_guc_tlb_inval_init_early(&gt->uc.guc, &gt->tlb_inval);
+
 	return 0;
 }
 
 /**
- * xe_tlb_inval_reset - Initialize TLB invalidation reset
+ * xe_tlb_inval_reset() - TLB invalidation reset
  * @tlb_inval: TLB invalidation client
  *
  * Signal any pending invalidation fences, should be called during a GT reset
  */
 void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
 {
-	struct xe_gt *gt = tlb_inval->private;
 	struct xe_tlb_inval_fence *fence, *next;
 	int pending_seqno;
 
 	/*
-	 * we can get here before the CTs are even initialized if we're wedging
-	 * very early, in which case there are not going to be any pending
-	 * fences so we can bail immediately.
+	 * we can get here before the backends are even initialized if we're
+	 * wedging very early, in which case there are not going to be any
+	 * pending fences so we can bail immediately.
 	 */
-	if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
+	if (!tlb_inval->ops->initialized(tlb_inval))
 		return;
 
 	/*
-	 * CT channel is already disabled at this point. No new TLB requests can
+	 * Backend is already disabled at this point. No new TLB requests can
 	 * appear.
 	 */
 
-	mutex_lock(&gt->uc.guc.ct.lock);
-	spin_lock_irq(&gt->tlb_inval.pending_lock);
-	cancel_delayed_work(&gt->tlb_inval.fence_tdr);
+	tlb_inval->ops->lock(tlb_inval);
+	spin_lock_irq(&tlb_inval->pending_lock);
+	cancel_delayed_work(&tlb_inval->fence_tdr);
 	/*
 	 * We might have various kworkers waiting for TLB flushes to complete
 	 * which are not tracked with an explicit TLB fence, however at this
-	 * stage that will never happen since the CT is already disabled, so
-	 * make sure we signal them here under the assumption that we have
+	 * stage that will never happen since the backend is already disabled,
+	 * so make sure we signal them here under the assumption that we have
 	 * completed a full GT reset.
 	 */
-	if (gt->tlb_inval.seqno == 1)
+	if (tlb_inval->seqno == 1)
 		pending_seqno = TLB_INVALIDATION_SEQNO_MAX - 1;
 	else
-		pending_seqno = gt->tlb_inval.seqno - 1;
-	WRITE_ONCE(gt->tlb_inval.seqno_recv, pending_seqno);
+		pending_seqno = tlb_inval->seqno - 1;
+	WRITE_ONCE(tlb_inval->seqno_recv, pending_seqno);
 
 	list_for_each_entry_safe(fence, next,
-				 &gt->tlb_inval.pending_fences, link)
-		inval_fence_signal(gt_to_xe(gt), fence);
-	spin_unlock_irq(&gt->tlb_inval.pending_lock);
-	mutex_unlock(&gt->uc.guc.ct.lock);
+				 &tlb_inval->pending_fences, link)
+		xe_tlb_inval_fence_signal(fence);
+	spin_unlock_irq(&tlb_inval->pending_lock);
+	tlb_inval->ops->unlock(tlb_inval);
 }
 
-static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
+static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval *tlb_inval, int seqno)
 {
-	int seqno_recv = READ_ONCE(gt->tlb_inval.seqno_recv);
+	int seqno_recv = READ_ONCE(tlb_inval->seqno_recv);
+
+	lockdep_assert_held(&tlb_inval->pending_lock);
 
 	if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX / 2))
 		return false;
@@ -201,44 +192,20 @@ static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
 	return seqno_recv >= seqno;
 }
 
-static int send_tlb_inval(struct xe_guc *guc, const u32 *action, int len)
-{
-	struct xe_gt *gt = guc_to_gt(guc);
-
-	xe_gt_assert(gt, action[1]);	/* Seqno */
-	lockdep_assert_held(&guc->ct.lock);
-
-	/*
-	 * XXX: The seqno algorithm relies on TLB invalidation being processed
-	 * in order which they currently are, if that changes the algorithm will
-	 * need to be updated.
-	 */
-
-	xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
-
-	return xe_guc_ct_send(&guc->ct, action, len,
-			      G2H_LEN_DW_TLB_INVALIDATE, 1);
-}
-
 static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
 {
 	struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
-	struct xe_gt *gt = tlb_inval->private;
-	struct xe_device *xe = gt_to_xe(gt);
-
-	lockdep_assert_held(&gt->uc.guc.ct.lock);
 
 	fence->seqno = tlb_inval->seqno;
-	trace_xe_tlb_inval_fence_send(xe, fence);
+	trace_xe_tlb_inval_fence_send(tlb_inval->xe, fence);
 
 	spin_lock_irq(&tlb_inval->pending_lock);
 	fence->inval_time = ktime_get();
 	list_add_tail(&fence->link, &tlb_inval->pending_fences);
 
 	if (list_is_singular(&tlb_inval->pending_fences))
-		queue_delayed_work(system_wq,
-				   &tlb_inval->fence_tdr,
-				   tlb_timeout_jiffies(gt));
+		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
+				   tlb_inval->ops->timeout_delay(tlb_inval));
 	spin_unlock_irq(&tlb_inval->pending_lock);
 
 	tlb_inval->seqno = (tlb_inval->seqno + 1) %
@@ -247,202 +214,63 @@ static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
 		tlb_inval->seqno = 1;
 }
 
-#define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
-		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
-		XE_GUC_TLB_INVAL_FLUSH_CACHE)
-
-static int send_tlb_inval_ggtt(struct xe_gt *gt, int seqno)
-{
-	u32 action[] = {
-		XE_GUC_ACTION_TLB_INVALIDATION,
-		seqno,
-		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
-	};
-
-	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
-}
-
-static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
-			      struct xe_tlb_inval_fence *fence)
-{
-	u32 action[] = {
-		XE_GUC_ACTION_TLB_INVALIDATION_ALL,
-		0,  /* seqno, replaced in send_tlb_inval */
-		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
-	};
-	struct xe_gt *gt = tlb_inval->private;
-
-	xe_gt_assert(gt, fence);
-
-	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
-}
+#define xe_tlb_inval_issue(__tlb_inval, __fence, op, args...)	\
+({								\
+	int __ret;						\
+								\
+	xe_assert((__tlb_inval)->xe, (__tlb_inval)->ops);	\
+	xe_assert((__tlb_inval)->xe, (__fence));		\
+								\
+	(__tlb_inval)->ops->lock((__tlb_inval));		\
+	xe_tlb_inval_fence_prep((__fence));			\
+	__ret = op((__tlb_inval), (__fence)->seqno, ##args);	\
+	if (__ret < 0)						\
+		xe_tlb_inval_fence_signal_unlocked((__fence));	\
+	(__tlb_inval)->ops->unlock((__tlb_inval));		\
+								\
+	__ret == -ECANCELED ? 0 : __ret;			\
+})
 
 /**
- * xe_gt_tlb_invalidation_all - Invalidate all TLBs across PF and all VFs.
- * @gt: the &xe_gt structure
- * @fence: the &xe_tlb_inval_fence to be signaled on completion
+ * xe_tlb_inval_all() - Issue a TLB invalidation for all TLBs
+ * @tlb_inval: TLB invalidation client
+ * @fence: invalidation fence which will be signaled on TLB invalidation
+ * completion
  *
- * Send a request to invalidate all TLBs across PF and all VFs.
+ * Issue a TLB invalidation for all TLBs. Completion of the invalidation is
+ * asynchronous; the caller can use the invalidation fence to wait for it.
  *
  * Return: 0 on success, negative error code on error
  */
 int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
 		     struct xe_tlb_inval_fence *fence)
 {
-	struct xe_gt *gt = tlb_inval->private;
-	int err;
-
-	err = send_tlb_inval_all(tlb_inval, fence);
-	if (err)
-		xe_gt_err(gt, "TLB invalidation request failed (%pe)", ERR_PTR(err));
-
-	return err;
-}
-
-/*
- * Ensure that roundup_pow_of_two(length) doesn't overflow.
- * Note that roundup_pow_of_two() operates on unsigned long,
- * not on u64.
- */
-#define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
-
-static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64 start, u64 end,
-				u32 asid, int seqno)
-{
-#define MAX_TLB_INVALIDATION_LEN	7
-	u32 action[MAX_TLB_INVALIDATION_LEN];
-	u64 length = end - start;
-	int len = 0;
-
-	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
-	action[len++] = seqno;
-	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
-	    length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
-		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
-	} else {
-		u64 orig_start = start;
-		u64 align;
-
-		if (length < SZ_4K)
-			length = SZ_4K;
-
-		/*
-		 * We need to invalidate a higher granularity if start address
-		 * is not aligned to length. When start is not aligned with
-		 * length we need to find the length large enough to create an
-		 * address mask covering the required range.
-		 */
-		align = roundup_pow_of_two(length);
-		start = ALIGN_DOWN(start, align);
-		end = ALIGN(end, align);
-		length = align;
-		while (start + length < end) {
-			length <<= 1;
-			start = ALIGN_DOWN(orig_start, length);
-		}
-
-		/*
-		 * Minimum invalidation size for a 2MB page that the hardware
-		 * expects is 16MB
-		 */
-		if (length >= SZ_2M) {
-			length = max_t(u64, SZ_16M, length);
-			start = ALIGN_DOWN(orig_start, length);
-		}
-
-		xe_gt_assert(gt, length >= SZ_4K);
-		xe_gt_assert(gt, is_power_of_2(length));
-		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
-						    ilog2(SZ_2M) + 1)));
-		xe_gt_assert(gt, IS_ALIGNED(start, length));
-
-		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
-		action[len++] = asid;
-		action[len++] = lower_32_bits(start);
-		action[len++] = upper_32_bits(start);
-		action[len++] = ilog2(length) - ilog2(SZ_4K);
-	}
-
-	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
-
-	return send_tlb_inval(&gt->uc.guc, action, len);
-}
-
-static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
-			       struct xe_tlb_inval_fence *fence)
-{
-	int ret;
-
-	mutex_lock(&gt->uc.guc.ct.lock);
-
-	xe_tlb_inval_fence_prep(fence);
-
-	ret = send_tlb_inval_ggtt(gt, fence->seqno);
-	if (ret < 0)
-		inval_fence_signal_unlocked(gt_to_xe(gt), fence);
-
-	mutex_unlock(&gt->uc.guc.ct.lock);
-
-	/*
-	 * -ECANCELED indicates the CT is stopped for a GT reset. TLB caches
-	 *  should be nuked on a GT reset so this error can be ignored.
-	 */
-	if (ret == -ECANCELED)
-		return 0;
-
-	return ret;
+	return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval->ops->all);
 }
 
 /**
- * xe_tlb_inval_ggtt - Issue a TLB invalidation on this GT for the GGTT
+ * xe_tlb_inval_ggtt() - Issue a TLB invalidation for the GGTT
  * @tlb_inval: TLB invalidation client
  *
- * Issue a TLB invalidation for the GGTT. Completion of TLB invalidation is
- * synchronous.
+ * Issue a TLB invalidation for the GGTT. Completion of the invalidation is
+ * synchronous: this function waits on an internal fence before returning.
  *
  * Return: 0 on success, negative error code on error
  */
 int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
 {
-	struct xe_gt *gt = tlb_inval->private;
-	struct xe_device *xe = gt_to_xe(gt);
-	unsigned int fw_ref;
-
-	if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
-	    gt->uc.guc.submission_state.enabled) {
-		struct xe_tlb_inval_fence fence;
-		int ret;
-
-		xe_tlb_inval_fence_init(tlb_inval, &fence, true);
-		ret = __xe_tlb_inval_ggtt(gt, &fence);
-		if (ret)
-			return ret;
-
-		xe_tlb_inval_fence_wait(&fence);
-	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
-		struct xe_mmio *mmio = &gt->mmio;
-
-		if (IS_SRIOV_VF(xe))
-			return 0;
-
-		fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
-		if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
-			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
-					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
-			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
-					PVC_GUC_TLB_INV_DESC0_VALID);
-		} else {
-			xe_mmio_write32(mmio, GUC_TLB_INV_CR,
-					GUC_TLB_INV_CR_INVALIDATE);
-		}
-		xe_force_wake_put(gt_to_fw(gt), fw_ref);
-	}
+	struct xe_tlb_inval_fence fence, *fence_ptr = &fence;
+	int ret;
 
-	return 0;
+	xe_tlb_inval_fence_init(tlb_inval, fence_ptr, true);
+	ret = xe_tlb_inval_issue(tlb_inval, fence_ptr, tlb_inval->ops->ggtt);
+	xe_tlb_inval_fence_wait(fence_ptr);
+
+	return ret;
 }
 
 /**
- * xe_tlb_inval_range - Issue a TLB invalidation on this GT for an address range
+ * xe_tlb_inval_range() - Issue a TLB invalidation for an address range
  * @tlb_inval: TLB invalidation client
  * @fence: invalidation fence which will be signal on TLB invalidation
  * completion
@@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
 		       struct xe_tlb_inval_fence *fence, u64 start, u64 end,
 		       u32 asid)
 {
-	struct xe_gt *gt = tlb_inval->private;
-	struct xe_device *xe = gt_to_xe(gt);
-	int  ret;
-
-	xe_gt_assert(gt, fence);
-
-	/* Execlists not supported */
-	if (xe->info.force_execlist) {
-		__inval_fence_signal(xe, fence);
-		return 0;
-	}
-
-	mutex_lock(&gt->uc.guc.ct.lock);
-
-	xe_tlb_inval_fence_prep(fence);
-
-	ret = send_tlb_inval_ppgtt(gt, start, end, asid, fence->seqno);
-	if (ret < 0)
-		inval_fence_signal_unlocked(xe, fence);
-
-	mutex_unlock(&gt->uc.guc.ct.lock);
-
-	return ret;
+	return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval->ops->ppgtt,
+				  start, end, asid);
 }
 
 /**
- * xe_tlb_inval_vm - Issue a TLB invalidation on this GT for a VM
+ * xe_tlb_inval_vm() - Issue a TLB invalidation for a VM
  * @tlb_inval: TLB invalidation client
  * @vm: VM to invalidate
  *
@@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm)
 {
 	struct xe_tlb_inval_fence fence;
 	u64 range = 1ull << vm->xe->info.va_bits;
-	int ret;
 
 	xe_tlb_inval_fence_init(tlb_inval, &fence, true);
-
-	ret = xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid);
-	if (ret < 0)
-		return;
-
+	xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid);
 	xe_tlb_inval_fence_wait(&fence);
 }
 
 /**
- * xe_tlb_inval_done_handler - TLB invalidation done handler
- * @gt: gt
+ * xe_tlb_inval_done_handler() - TLB invalidation done handler
+ * @tlb_inval: TLB invalidation client
  * @seqno: seqno of invalidation that is done
  *
  * Update recv seqno, signal any TLB invalidation fences, and restart TDR
  */
-static void xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
+void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno)
 {
-	struct xe_device *xe = gt_to_xe(gt);
+	struct xe_device *xe = tlb_inval->xe;
 	struct xe_tlb_inval_fence *fence, *next;
 	unsigned long flags;
 
@@ -535,77 +337,53 @@ static void xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
 	 * officially process the CT message like if racing against
 	 * process_g2h_msg().
 	 */
-	spin_lock_irqsave(&gt->tlb_inval.pending_lock, flags);
-	if (tlb_inval_seqno_past(gt, seqno)) {
-		spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
+	spin_lock_irqsave(&tlb_inval->pending_lock, flags);
+	if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) {
+		spin_unlock_irqrestore(&tlb_inval->pending_lock, flags);
 		return;
 	}
 
-	WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
+	WRITE_ONCE(tlb_inval->seqno_recv, seqno);
 
 	list_for_each_entry_safe(fence, next,
-				 &gt->tlb_inval.pending_fences, link) {
+				 &tlb_inval->pending_fences, link) {
 		trace_xe_tlb_inval_fence_recv(xe, fence);
 
-		if (!tlb_inval_seqno_past(gt, fence->seqno))
+		if (!xe_tlb_inval_seqno_past(tlb_inval, fence->seqno))
 			break;
 
-		inval_fence_signal(xe, fence);
+		xe_tlb_inval_fence_signal(fence);
 	}
 
-	if (!list_empty(&gt->tlb_inval.pending_fences))
+	if (!list_empty(&tlb_inval->pending_fences))
 		mod_delayed_work(system_wq,
-				 &gt->tlb_inval.fence_tdr,
-				 tlb_timeout_jiffies(gt));
+				 &tlb_inval->fence_tdr,
+				 tlb_inval->ops->timeout_delay(tlb_inval));
 	else
-		cancel_delayed_work(&gt->tlb_inval.fence_tdr);
+		cancel_delayed_work(&tlb_inval->fence_tdr);
 
-	spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
-}
-
-/**
- * xe_guc_tlb_inval_done_handler - TLB invalidation done handler
- * @guc: guc
- * @msg: message indicating TLB invalidation done
- * @len: length of message
- *
- * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
- * invalidation fences for seqno. Algorithm for this depends on seqno being
- * received in-order and asserts this assumption.
- *
- * Return: 0 on success, -EPROTO for malformed messages.
- */
-int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
-{
-	struct xe_gt *gt = guc_to_gt(guc);
-
-	if (unlikely(len != 1))
-		return -EPROTO;
-
-	xe_tlb_inval_done_handler(gt, msg[0]);
-
-	return 0;
+	spin_unlock_irqrestore(&tlb_inval->pending_lock, flags);
 }
 
 static const char *
-inval_fence_get_driver_name(struct dma_fence *dma_fence)
+xe_inval_fence_get_driver_name(struct dma_fence *dma_fence)
 {
 	return "xe";
 }
 
 static const char *
-inval_fence_get_timeline_name(struct dma_fence *dma_fence)
+xe_inval_fence_get_timeline_name(struct dma_fence *dma_fence)
 {
-	return "inval_fence";
+	return "tlb_inval_fence";
 }
 
 static const struct dma_fence_ops inval_fence_ops = {
-	.get_driver_name = inval_fence_get_driver_name,
-	.get_timeline_name = inval_fence_get_timeline_name,
+	.get_driver_name = xe_inval_fence_get_driver_name,
+	.get_timeline_name = xe_inval_fence_get_timeline_name,
 };
 
 /**
- * xe_tlb_inval_fence_init - Initialize TLB invalidation fence
+ * xe_tlb_inval_fence_init() - Initialize TLB invalidation fence
  * @tlb_inval: TLB invalidation client
  * @fence: TLB invalidation fence to initialize
  * @stack: fence is stack variable
@@ -618,15 +396,12 @@ void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
 			     struct xe_tlb_inval_fence *fence,
 			     bool stack)
 {
-	struct xe_gt *gt = tlb_inval->private;
-
-	xe_pm_runtime_get_noresume(gt_to_xe(gt));
+	xe_pm_runtime_get_noresume(tlb_inval->xe);
 
-	spin_lock_irq(&gt->tlb_inval.lock);
-	dma_fence_init(&fence->base, &inval_fence_ops,
-		       &gt->tlb_inval.lock,
+	spin_lock_irq(&tlb_inval->lock);
+	dma_fence_init(&fence->base, &inval_fence_ops, &tlb_inval->lock,
 		       dma_fence_context_alloc(1), 1);
-	spin_unlock_irq(&gt->tlb_inval.lock);
+	spin_unlock_irq(&tlb_inval->lock);
 	INIT_LIST_HEAD(&fence->link);
 	if (stack)
 		set_bit(FENCE_STACK_BIT, &fence->base.flags);
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h b/drivers/gpu/drm/xe/xe_tlb_inval.h
index 7adee3f8c551..cdeafc8d4391 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval.h
+++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
@@ -18,24 +18,30 @@ struct xe_vma;
 int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
 
 void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
-int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
-void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm);
 int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
 		     struct xe_tlb_inval_fence *fence);
+int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
+void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm);
 int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
 		       struct xe_tlb_inval_fence *fence,
 		       u64 start, u64 end, u32 asid);
-int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
 
 void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
 			     struct xe_tlb_inval_fence *fence,
 			     bool stack);
-void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence);
 
+/**
+ * xe_tlb_inval_fence_wait() - TLB invalidation fence wait
+ * @fence: TLB invalidation fence to wait on
+ *
+ * Wait on a TLB invalidation fence until it signals, non-interruptible
+ */
 static inline void
 xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence *fence)
 {
 	dma_fence_wait(&fence->base, false);
 }
 
+void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno);
+
 #endif	/* _XE_TLB_INVAL_ */
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
index 05b6adc929bb..c1ad96d24fc8 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
@@ -9,10 +9,85 @@
 #include <linux/workqueue.h>
 #include <linux/dma-fence.h>
 
-/** struct xe_tlb_inval - TLB invalidation client */
+struct xe_tlb_inval;
+
+/** struct xe_tlb_inval_ops - TLB invalidation ops (backend) */
+struct xe_tlb_inval_ops {
+	/**
+	 * @all: Invalidate all TLBs
+	 * @tlb_inval: TLB invalidation client
+	 * @seqno: Seqno of TLB invalidation
+	 *
+	 * Return: 0 on success, -ECANCELED if backend is mid-reset, error on
+	 * failure
+	 */
+	int (*all)(struct xe_tlb_inval *tlb_inval, u32 seqno);
+
+	/**
+	 * @ggtt: Invalidate global translation TLBs
+	 * @tlb_inval: TLB invalidation client
+	 * @seqno: Seqno of TLB invalidation
+	 *
+	 * Return: 0 on success, -ECANCELED if backend is mid-reset, error on
+	 * failure
+	 */
+	int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32 seqno);
+
+	/**
+	 * @ppgtt: Invalidate per-process translation TLBs
+	 * @tlb_inval: TLB invalidation client
+	 * @seqno: Seqno of TLB invalidation
+	 * @start: Start address
+	 * @end: End address
+	 * @asid: Address space ID
+	 *
+	 * Return: 0 on success, -ECANCELED if backend is mid-reset, error on
+	 * failure
+	 */
+	int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32 seqno, u64 start,
+		     u64 end, u32 asid);
+
+	/**
+	 * @initialized: Backend is initialized
+	 * @tlb_inval: TLB invalidation client
+	 *
+	 * Return: True if backend is initialized, False otherwise
+	 */
+	bool (*initialized)(struct xe_tlb_inval *tlb_inval);
+
+	/**
+	 * @flush: Flush pending TLB invalidations
+	 * @tlb_inval: TLB invalidation client
+	 */
+	void (*flush)(struct xe_tlb_inval *tlb_inval);
+
+	/**
+	 * @timeout_delay: Timeout delay for TLB invalidation
+	 * @tlb_inval: TLB invalidation client
+	 *
+	 * Return: Timeout delay for TLB invalidation in jiffies
+	 */
+	long (*timeout_delay)(struct xe_tlb_inval *tlb_inval);
+
+	/**
+	 * @lock: Lock resources protecting the backend seqno management
+	 */
+	void (*lock)(struct xe_tlb_inval *tlb_inval);
+
+	/**
+	 * @unlock: Unlock resources protecting the backend seqno management
+	 */
+	void (*unlock)(struct xe_tlb_inval *tlb_inval);
+};
+
+/** struct xe_tlb_inval - TLB invalidation client (frontend) */
 struct xe_tlb_inval {
 	/** @private: Backend private pointer */
 	void *private;
+	/** @xe: Pointer to Xe device */
+	struct xe_device *xe;
+	/** @ops: TLB invalidation ops */
+	const struct xe_tlb_inval_ops *ops;
 	/** @tlb_inval.seqno: TLB invalidation seqno, protected by CT lock */
 #define TLB_INVALIDATION_SEQNO_MAX	0x100000
 	int seqno;
-- 
2.34.1
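
For readers following the layering, the issue path wrapped by the
xe_tlb_inval_issue() macro can be sketched in user space as below: take the
backend lock, assign a seqno, call the backend op, signal the fence if the
backend fails, drop the lock, and squash -ECANCELED. This is only a sketch
under stated assumptions, not the driver's code. The names inval_client,
inval_ops, inval_fence, inval_issue_ggtt and the mock_* functions are
hypothetical, and the integer lock_depth stands in for the real backend lock.

```c
#include <errno.h>
#include <stdbool.h>

struct inval_client;

/* Backend hooks: a reduced, hypothetical version of xe_tlb_inval_ops. */
struct inval_ops {
	int (*ggtt)(struct inval_client *c, unsigned int seqno);
	void (*lock)(struct inval_client *c);
	void (*unlock)(struct inval_client *c);
};

/* Frontend state: a reduced stand-in for struct xe_tlb_inval. */
struct inval_client {
	const struct inval_ops *ops;
	int seqno;		/* next seqno to hand out */
	int lock_depth;		/* toy lock, the sketch is single-threaded */
};

/* Stand-in for the invalidation fence; only tracks what the sketch needs. */
struct inval_fence {
	int seqno;
	bool signaled;
};

/* Frontend: lock, assign a seqno, call the backend, unlock. */
static int inval_issue_ggtt(struct inval_client *c, struct inval_fence *f)
{
	int ret;

	c->ops->lock(c);
	f->seqno = c->seqno++;
	f->signaled = false;
	ret = c->ops->ggtt(c, f->seqno);
	if (ret < 0)
		f->signaled = true;	/* no completion will arrive later */
	c->ops->unlock(c);

	/* A backend that is mid-reset is not an error for the caller. */
	return ret == -ECANCELED ? 0 : ret;
}

/* Mock backend that always reports "mid-reset". */
static int mock_ggtt(struct inval_client *c, unsigned int seqno)
{
	(void)c;
	(void)seqno;
	return -ECANCELED;
}

static void mock_lock(struct inval_client *c)
{
	c->lock_depth++;
}

static void mock_unlock(struct inval_client *c)
{
	c->lock_depth--;
}

static const struct inval_ops mock_ops = {
	.ggtt = mock_ggtt,
	.lock = mock_lock,
	.unlock = mock_unlock,
};
```

With the mock backend, a call to inval_issue_ggtt() returns 0, leaves the
fence signaled and leaves the toy lock released. This mirrors why
xe_tlb_inval_ggtt() can wait on its fence unconditionally after issuing.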


^ permalink raw reply related	[flat|nested] 19+ messages in thread
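
The seqno handling in the patch above wraps at TLB_INVALIDATION_SEQNO_MAX and
skips 0, so "has this seqno completed" has to be a wraparound-aware compare.
A minimal sketch of that compare follows. The helper names seqno_next and
seqno_past are hypothetical, and the middle branch of seqno_past is an
assumption, since the corresponding lines fall between two hunks and are not
shown in the diff.

```c
#include <stdbool.h>

/* Same value as TLB_INVALIDATION_SEQNO_MAX in the patch. */
#define SEQNO_MAX 0x100000

/* Hypothetical helper: advance a seqno, wrapping at SEQNO_MAX and skipping 0. */
static int seqno_next(int seqno)
{
	seqno = (seqno + 1) % SEQNO_MAX;
	return seqno ? seqno : 1;
}

/*
 * Hypothetical stand-in for xe_tlb_inval_seqno_past(): has the invalidation
 * numbered 'seqno' completed, given the last received seqno 'seqno_recv'?
 */
static bool seqno_past(int seqno_recv, int seqno)
{
	/* seqno is far below recv: seqno has wrapped and is still pending. */
	if (seqno - seqno_recv < -(SEQNO_MAX / 2))
		return false;

	/* Assumed branch: seqno is far above recv, so recv wrapped past it. */
	if (seqno - seqno_recv > (SEQNO_MAX / 2))
		return true;

	return seqno_recv >= seqno;
}
```

The half-range window only gives correct answers while completions arrive in
order and fewer than SEQNO_MAX / 2 invalidations are outstanding.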

* Re: [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 18:22 ` [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend stuartsummers
@ 2025-07-23 18:45   ` Matthew Brost
  2025-07-23 18:51     ` Matthew Brost
  2025-07-23 19:17   ` Matthew Brost
  1 sibling, 1 reply; 19+ messages in thread
From: Matthew Brost @ 2025-07-23 18:45 UTC (permalink / raw)
  To: stuartsummers; +Cc: matthew.auld, maarten.lankhorst, farah.kassabri, intel-xe

On Wed, Jul 23, 2025 at 06:22:22PM +0000, stuartsummers wrote:
> From: Matthew Brost <matthew.brost@intel.com>
> 
> The frontend exposes an API to the driver to send invalidations, handles
> sequence number assignment, synchronization (fences), and provides a
> timeout mechanism. The backend issues the actual invalidation to the
> hardware (or firmware).
> 
> The new layering easily allows issuing TLB invalidations to different
> hardware or firmware interfaces.
> 
> Normalize some naming while here too.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> ---
>  drivers/gpu/drm/xe/Makefile             |   1 +
>  drivers/gpu/drm/xe/xe_guc_ct.c          |   2 +-
>  drivers/gpu/drm/xe/xe_guc_tlb_inval.c   | 263 +++++++++++++
>  drivers/gpu/drm/xe/xe_guc_tlb_inval.h   |  19 +
>  drivers/gpu/drm/xe/xe_tlb_inval.c       | 495 +++++++-----------------
>  drivers/gpu/drm/xe/xe_tlb_inval.h       |  14 +-
>  drivers/gpu/drm/xe/xe_tlb_inval_types.h |  77 +++-
>  7 files changed, 505 insertions(+), 366 deletions(-)
>  create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.c
>  create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 332b2057cc00..8a2f836b3ab2 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -75,6 +75,7 @@ xe-y += xe_bb.o \
>  	xe_guc_log.o \
>  	xe_guc_pc.o \
>  	xe_guc_submit.o \
> +	xe_guc_tlb_inval.o \
>  	xe_heci_gsc.o \
>  	xe_huc.o \
>  	xe_hw_engine.o \
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 2ef86c0ae8b4..90ebda5b3790 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -30,9 +30,9 @@
>  #include "xe_guc_log.h"
>  #include "xe_guc_relay.h"
>  #include "xe_guc_submit.h"
> +#include "xe_guc_tlb_inval.h"
>  #include "xe_map.h"
>  #include "xe_pm.h"
> -#include "xe_tlb_inval.h"
>  #include "xe_trace_guc.h"
>  
>  static void receive_g2h(struct xe_guc_ct *ct);
> diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> new file mode 100644
> index 000000000000..27d7dc938cb1
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> @@ -0,0 +1,263 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include "abi/guc_actions_abi.h"
> +
> +#include "xe_device.h"
> +#include "xe_gt_stats.h"
> +#include "xe_gt_types.h"
> +#include "xe_guc.h"
> +#include "xe_guc_ct.h"
> +#include "xe_guc_tlb_inval.h"
> +#include "xe_force_wake.h"
> +#include "xe_mmio.h"
> +#include "xe_tlb_inval.h"
> +
> +#include "regs/xe_guc_regs.h"
> +
> +/*
> + * XXX: The seqno algorithm relies on TLB invalidations being processed in
> + * order, which the GuC currently does. If that changes, the algorithm will
> + * need to be updated.
> + */
> +
> +static int send_tlb_inval(struct xe_guc *guc, const u32 *action, int len)
> +{
> +	struct xe_gt *gt = guc_to_gt(guc);
> +
> +	lockdep_assert_held(&guc->ct.lock);
> +	xe_gt_assert(gt, action[1]);	/* Seqno */
> +
> +	xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> +	return xe_guc_ct_send(&guc->ct, action, len,
> +			      G2H_LEN_DW_TLB_INVALIDATE, 1);
> +}
> +
> +#define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> +		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
> +		XE_GUC_TLB_INVAL_FLUSH_CACHE)
> +
> +static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval, u32 seqno)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +	u32 action[] = {
> +		XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> +		seqno,
> +		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> +	};
> +
> +	return send_tlb_inval(guc, action, ARRAY_SIZE(action));
> +}
> +
> +static int send_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval, u32 seqno)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +	struct xe_gt *gt = guc_to_gt(guc);
> +	struct xe_device *xe = guc_to_xe(guc);
> +
> +	lockdep_assert_held(&guc->ct.lock);
> +
> +	/*
> +	 * Returning -ECANCELED in this function is squashed at the caller and
> +	 * signals waiters.
> +	 */
> +
> +	if (xe_guc_ct_enabled(&guc->ct) && guc->submission_state.enabled) {
> +		u32 action[] = {
> +			XE_GUC_ACTION_TLB_INVALIDATION,
> +			seqno,
> +			MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> +		};
> +
> +		return send_tlb_inval(guc, action, ARRAY_SIZE(action));
> +	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
> +		struct xe_mmio *mmio = &gt->mmio;
> +		unsigned int fw_ref;
> +
> +		if (IS_SRIOV_VF(xe))
> +			return -ECANCELED;
> +
> +		fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> +		if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
> +			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
> +					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
> +			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
> +					PVC_GUC_TLB_INV_DESC0_VALID);
> +		} else {
> +			xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> +					GUC_TLB_INV_CR_INVALIDATE);
> +		}
> +		xe_force_wake_put(gt_to_fw(gt), fw_ref);
> +	}
> +
> +	return -ECANCELED;
> +}
> +
> +/*
> + * Ensure that roundup_pow_of_two(length) doesn't overflow.
> + * Note that roundup_pow_of_two() operates on unsigned long,
> + * not on u64.
> + */
> +#define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
> +
> +static int send_tlb_inval_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
> +				u64 start, u64 end, u32 asid)
> +{
> +#define MAX_TLB_INVALIDATION_LEN	7
> +	struct xe_guc *guc = tlb_inval->private;
> +	struct xe_gt *gt = guc_to_gt(guc);
> +	u32 action[MAX_TLB_INVALIDATION_LEN];
> +	u64 length = end - start;
> +	int len = 0;
> +
> +	lockdep_assert_held(&guc->ct.lock);
> +
> +	if (guc_to_xe(guc)->info.force_execlist)
> +		return -ECANCELED;
> +
> +	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> +	action[len++] = seqno;
> +	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> +	    length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> +		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> +	} else {
> +		u64 orig_start = start;
> +		u64 align;
> +
> +		if (length < SZ_4K)
> +			length = SZ_4K;
> +
> +		/*
> +		 * We need to invalidate a higher granularity if start address
> +		 * is not aligned to length. When start is not aligned with
> +		 * length we need to find the length large enough to create an
> +		 * address mask covering the required range.
> +		 */
> +		align = roundup_pow_of_two(length);
> +		start = ALIGN_DOWN(start, align);
> +		end = ALIGN(end, align);
> +		length = align;
> +		while (start + length < end) {
> +			length <<= 1;
> +			start = ALIGN_DOWN(orig_start, length);
> +		}
> +
> +		/*
> +		 * Minimum invalidation size for a 2MB page that the hardware
> +		 * expects is 16MB
> +		 */
> +		if (length >= SZ_2M) {
> +			length = max_t(u64, SZ_16M, length);
> +			start = ALIGN_DOWN(orig_start, length);
> +		}
> +
> +		xe_gt_assert(gt, length >= SZ_4K);
> +		xe_gt_assert(gt, is_power_of_2(length));
> +		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
> +						    ilog2(SZ_2M) + 1)));
> +		xe_gt_assert(gt, IS_ALIGNED(start, length));
> +
> +		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> +		action[len++] = asid;
> +		action[len++] = lower_32_bits(start);
> +		action[len++] = upper_32_bits(start);
> +		action[len++] = ilog2(length) - ilog2(SZ_4K);
> +	}
> +
> +	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> +
> +	return send_tlb_inval(guc, action, len);
> +}
> +
> +static bool tlb_inval_initialized(struct xe_tlb_inval *tlb_inval)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +
> +	return xe_guc_ct_initialized(&guc->ct);
> +}
> +
> +static void tlb_inval_flush(struct xe_tlb_inval *tlb_inval)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +
> +	LNL_FLUSH_WORK(&guc->ct.g2h_worker);
> +}
> +
> +static long tlb_inval_timeout_delay(struct xe_tlb_inval *tlb_inval)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +
> +	/* this reflects what HW/GuC needs to process TLB inv request */
> +	const long hw_tlb_timeout = HZ / 4;
> +
> +	/* this estimates actual delay caused by the CTB transport */
> +	long delay = xe_guc_ct_queue_proc_time_jiffies(&guc->ct);
> +
> +	return hw_tlb_timeout + 2 * delay;
> +}
> +
> +static void tlb_inval_lock(struct xe_tlb_inval *tlb_inval)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +
> +	mutex_lock(&guc->ct.lock);
> +}
> +
> +static void tlb_inval_unlock(struct xe_tlb_inval *tlb_inval)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +
> +	mutex_unlock(&guc->ct.lock);
> +}
> +
> +static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> +	.all = send_tlb_inval_all,
> +	.ggtt = send_tlb_inval_ggtt,
> +	.ppgtt = send_tlb_inval_ppgtt,
> +	.initialized = tlb_inval_initialized,
> +	.flush = tlb_inval_flush,
> +	.timeout_delay = tlb_inval_timeout_delay,
> +	.lock = tlb_inval_lock,
> +	.unlock = tlb_inval_unlock,

Where are the lock / unlock helpers planned to be used?

Matt

> +};
> +
> +/**
> + * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation early
> + * @guc: GuC object
> + * @tlb_inval: TLB invalidation client
> + *
> + * Initialize GuC TLB invalidation by setting back pointer in TLB invalidation
> + * client to the GuC and setting GuC backend ops.
> + */
> +void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> +				 struct xe_tlb_inval *tlb_inval)
> +{
> +	tlb_inval->private = guc;
> +	tlb_inval->ops = &guc_tlb_inval_ops;
> +}
> +
> +/**
> + * xe_guc_tlb_inval_done_handler() - TLB invalidation done handler
> + * @guc: guc
> + * @msg: message indicating TLB invalidation done
> + * @len: length of message
> + *
> + * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
> + * invalidation fences for seqno. Algorithm for this depends on seqno being
> + * received in-order and asserts this assumption.
> + *
> + * Return: 0 on success, -EPROTO for malformed messages.
> + */
> +int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> +{
> +	struct xe_gt *gt = guc_to_gt(guc);
> +
> +	if (unlikely(len != 1))
> +		return -EPROTO;
> +
> +	xe_tlb_inval_done_handler(&gt->tlb_inval, msg[0]);
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.h b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> new file mode 100644
> index 000000000000..07d668b02e3d
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_GUC_TLB_INVAL_H_
> +#define _XE_GUC_TLB_INVAL_H_
> +
> +#include <linux/types.h>
> +
> +struct xe_guc;
> +struct xe_tlb_inval;
> +
> +void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> +				 struct xe_tlb_inval *tlb_inval);
> +
> +int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
> index c795b78362bf..071c25fbdbac 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval.c
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
> @@ -12,50 +12,45 @@
>  #include "xe_gt_printk.h"
>  #include "xe_guc.h"
>  #include "xe_guc_ct.h"
> +#include "xe_guc_tlb_inval.h"
>  #include "xe_gt_stats.h"
>  #include "xe_tlb_inval.h"
>  #include "xe_mmio.h"
>  #include "xe_pm.h"
> -#include "xe_sriov.h"
> +#include "xe_tlb_inval.h"
>  #include "xe_trace.h"
> -#include "regs/xe_guc_regs.h"
> -
> -#define FENCE_STACK_BIT		DMA_FENCE_FLAG_USER_BITS
>  
> -/*
> - * TLB inval depends on pending commands in the CT queue and then the real
> - * invalidation time. Double up the time to process full CT queue
> - * just to be on the safe side.
> +/**
> + * DOC: Xe TLB invalidation
> + *
> + * Xe TLB invalidation is implemented in two layers. The first is the frontend
> + * API, which provides an interface for TLB invalidations to the driver code.
> + * The frontend handles seqno assignment, synchronization (fences), and the
> + * timeout mechanism. The frontend is implemented via an embedded structure
> + * xe_tlb_inval that includes a set of ops hooking into the backend. The backend
> + * interacts with the hardware (or firmware) to perform the actual invalidation.
>   */
> -static long tlb_timeout_jiffies(struct xe_gt *gt)
> -{
> -	/* this reflects what HW/GuC needs to process TLB inv request */
> -	const long hw_tlb_timeout = HZ / 4;
>  
> -	/* this estimates actual delay caused by the CTB transport */
> -	long delay = xe_guc_ct_queue_proc_time_jiffies(&gt->uc.guc.ct);
> -
> -	return hw_tlb_timeout + 2 * delay;
> -}
> +#define FENCE_STACK_BIT		DMA_FENCE_FLAG_USER_BITS
>  
>  static void xe_tlb_inval_fence_fini(struct xe_tlb_inval_fence *fence)
>  {
> -	struct xe_gt *gt;
> -
>  	if (WARN_ON_ONCE(!fence->tlb_inval))
>  		return;
>  
> -	gt = fence->tlb_inval->private;
> -	xe_pm_runtime_put(gt_to_xe(gt));
> +	xe_pm_runtime_put(fence->tlb_inval->xe);
>  	fence->tlb_inval = NULL; /* fini() should be called once */
>  }
>  
>  static void
> -__inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
> +xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence)
>  {
>  	bool stack = test_bit(FENCE_STACK_BIT, &fence->base.flags);
>  
> -	trace_xe_tlb_inval_fence_signal(xe, fence);
> +	lockdep_assert_held(&fence->tlb_inval->pending_lock);
> +
> +	list_del(&fence->link);
> +	trace_xe_tlb_inval_fence_signal(fence->tlb_inval->xe, fence);
>  	xe_tlb_inval_fence_fini(fence);
>  	dma_fence_signal(&fence->base);
>  	if (!stack)
> @@ -63,57 +58,50 @@ __inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
>  }
>  
>  static void
> -inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
> +xe_tlb_inval_fence_signal_unlocked(struct xe_tlb_inval_fence *fence)
>  {
> -	lockdep_assert_held(&fence->tlb_inval->pending_lock);
> -
> -	list_del(&fence->link);
> -	__inval_fence_signal(xe, fence);
> -}
> +	struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
>  
> -static void
> -inval_fence_signal_unlocked(struct xe_device *xe,
> -			    struct xe_tlb_inval_fence *fence)
> -{
> -	spin_lock_irq(&fence->tlb_inval->pending_lock);
> -	inval_fence_signal(xe, fence);
> -	spin_unlock_irq(&fence->tlb_inval->pending_lock);
> +	spin_lock_irq(&tlb_inval->pending_lock);
> +	xe_tlb_inval_fence_signal(fence);
> +	spin_unlock_irq(&tlb_inval->pending_lock);
>  }
>  
> -static void xe_gt_tlb_fence_timeout(struct work_struct *work)
> +static void xe_tlb_inval_fence_timeout(struct work_struct *work)
>  {
> -	struct xe_gt *gt = container_of(work, struct xe_gt,
> -					tlb_inval.fence_tdr.work);
> -	struct xe_device *xe = gt_to_xe(gt);
> +	struct xe_tlb_inval *tlb_inval = container_of(work, struct xe_tlb_inval,
> +						      fence_tdr.work);
> +	struct xe_device *xe = tlb_inval->xe;
>  	struct xe_tlb_inval_fence *fence, *next;
> +	long timeout_delay = tlb_inval->ops->timeout_delay(tlb_inval);
>  
> -	LNL_FLUSH_WORK(&gt->uc.guc.ct.g2h_worker);
> +	tlb_inval->ops->flush(tlb_inval);
>  
> -	spin_lock_irq(&gt->tlb_inval.pending_lock);
> +	spin_lock_irq(&tlb_inval->pending_lock);
>  	list_for_each_entry_safe(fence, next,
> -				 &gt->tlb_inval.pending_fences, link) {
> +				 &tlb_inval->pending_fences, link) {
>  		s64 since_inval_ms = ktime_ms_delta(ktime_get(),
>  						    fence->inval_time);
>  
> -		if (msecs_to_jiffies(since_inval_ms) < tlb_timeout_jiffies(gt))
> +		if (msecs_to_jiffies(since_inval_ms) < timeout_delay)
>  			break;
>  
>  		trace_xe_tlb_inval_fence_timeout(xe, fence);
> -		xe_gt_err(gt, "TLB invalidation fence timeout, seqno=%d recv=%d",
> -			  fence->seqno, gt->tlb_inval.seqno_recv);
> +		drm_err(&xe->drm,
> +			"TLB invalidation fence timeout, seqno=%d recv=%d",
> +			fence->seqno, tlb_inval->seqno_recv);
>  
>  		fence->base.error = -ETIME;
> -		inval_fence_signal(xe, fence);
> +		xe_tlb_inval_fence_signal(fence);
>  	}
> -	if (!list_empty(&gt->tlb_inval.pending_fences))
> -		queue_delayed_work(system_wq,
> -				   &gt->tlb_inval.fence_tdr,
> -				   tlb_timeout_jiffies(gt));
> -	spin_unlock_irq(&gt->tlb_inval.pending_lock);
> +	if (!list_empty(&tlb_inval->pending_fences))
> +		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
> +				   timeout_delay);
> +	spin_unlock_irq(&tlb_inval->pending_lock);
>  }
>  
>  /**
> - * xe_tlb_inval_init_early - Initialize TLB invalidation state
> + * xe_gt_tlb_inval_init_early() - Initialize TLB invalidation state
>   * @gt: GT structure
>   *
>   * Initialize TLB invalidation state, purely software initialization, should
> @@ -123,13 +111,12 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
>   */
>  int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
>  {
> -	gt->tlb_inval.private = gt;
> +	gt->tlb_inval.xe = gt_to_xe(gt);
>  	gt->tlb_inval.seqno = 1;
>  	INIT_LIST_HEAD(&gt->tlb_inval.pending_fences);
>  	spin_lock_init(&gt->tlb_inval.pending_lock);
>  	spin_lock_init(&gt->tlb_inval.lock);
> -	INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr,
> -			  xe_gt_tlb_fence_timeout);
> +	INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr, xe_tlb_inval_fence_timeout);
>  
>  	gt->tlb_inval.job_wq =
>  		drmm_alloc_ordered_workqueue(&gt_to_xe(gt)->drm, "gt-tbl-inval-job-wq",
> @@ -137,60 +124,64 @@ int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
>  	if (IS_ERR(gt->tlb_inval.job_wq))
>  		return PTR_ERR(gt->tlb_inval.job_wq);
>  
> +	/* XXX: Blindly setting up backend to GuC */
> +	xe_guc_tlb_inval_init_early(&gt->uc.guc, &gt->tlb_inval);
> +
>  	return 0;
>  }
>  
>  /**
> - * xe_tlb_inval_reset - Initialize TLB invalidation reset
> + * xe_tlb_inval_reset() - TLB invalidation reset
>   * @tlb_inval: TLB invalidation client
>   *
>   * Signal any pending invalidation fences, should be called during a GT reset
>   */
>  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
>  {
> -	struct xe_gt *gt = tlb_inval->private;
>  	struct xe_tlb_inval_fence *fence, *next;
>  	int pending_seqno;
>  
>  	/*
> -	 * we can get here before the CTs are even initialized if we're wedging
> -	 * very early, in which case there are not going to be any pending
> -	 * fences so we can bail immediately.
> +	 * we can get here before the backends are even initialized if we're
> +	 * wedging very early, in which case there are not going to be any
> +	 * pending fences so we can bail immediately.
>  	 */
> -	if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> +	if (!tlb_inval->ops->initialized(tlb_inval))
>  		return;
>  
>  	/*
> -	 * CT channel is already disabled at this point. No new TLB requests can
> +	 * Backend is already disabled at this point. No new TLB requests can
>  	 * appear.
>  	 */
>  
> -	mutex_lock(&gt->uc.guc.ct.lock);
> -	spin_lock_irq(&gt->tlb_inval.pending_lock);
> -	cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> +	tlb_inval->ops->lock(tlb_inval);
> +	spin_lock_irq(&tlb_inval->pending_lock);
> +	cancel_delayed_work(&tlb_inval->fence_tdr);
>  	/*
>  	 * We might have various kworkers waiting for TLB flushes to complete
>  	 * which are not tracked with an explicit TLB fence, however at this
> -	 * stage that will never happen since the CT is already disabled, so
> -	 * make sure we signal them here under the assumption that we have
> +	 * stage that will never happen since the backend is already disabled,
> +	 * so make sure we signal them here under the assumption that we have
>  	 * completed a full GT reset.
>  	 */
> -	if (gt->tlb_inval.seqno == 1)
> +	if (tlb_inval->seqno == 1)
>  		pending_seqno = TLB_INVALIDATION_SEQNO_MAX - 1;
>  	else
> -		pending_seqno = gt->tlb_inval.seqno - 1;
> -	WRITE_ONCE(gt->tlb_inval.seqno_recv, pending_seqno);
> +		pending_seqno = tlb_inval->seqno - 1;
> +	WRITE_ONCE(tlb_inval->seqno_recv, pending_seqno);
>  
>  	list_for_each_entry_safe(fence, next,
> -				 &gt->tlb_inval.pending_fences, link)
> -		inval_fence_signal(gt_to_xe(gt), fence);
> -	spin_unlock_irq(&gt->tlb_inval.pending_lock);
> -	mutex_unlock(&gt->uc.guc.ct.lock);
> +				 &tlb_inval->pending_fences, link)
> +		xe_tlb_inval_fence_signal(fence);
> +	spin_unlock_irq(&tlb_inval->pending_lock);
> +	tlb_inval->ops->unlock(tlb_inval);
>  }
>  
> -static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
> +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval *tlb_inval, int seqno)
>  {
> -	int seqno_recv = READ_ONCE(gt->tlb_inval.seqno_recv);
> +	int seqno_recv = READ_ONCE(tlb_inval->seqno_recv);
> +
> +	lockdep_assert_held(&tlb_inval->pending_lock);
>  
>  	if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX / 2))
>  		return false;
> @@ -201,44 +192,20 @@ static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
>  	return seqno_recv >= seqno;
>  }
>  
> -static int send_tlb_inval(struct xe_guc *guc, const u32 *action, int len)
> -{
> -	struct xe_gt *gt = guc_to_gt(guc);
> -
> -	xe_gt_assert(gt, action[1]);	/* Seqno */
> -	lockdep_assert_held(&guc->ct.lock);
> -
> -	/*
> -	 * XXX: The seqno algorithm relies on TLB invalidation being processed
> -	 * in order which they currently are, if that changes the algorithm will
> -	 * need to be updated.
> -	 */
> -
> -	xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> -
> -	return xe_guc_ct_send(&guc->ct, action, len,
> -			      G2H_LEN_DW_TLB_INVALIDATE, 1);
> -}
> -
>  static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
>  {
>  	struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> -	struct xe_gt *gt = tlb_inval->private;
> -	struct xe_device *xe = gt_to_xe(gt);
> -
> -	lockdep_assert_held(&gt->uc.guc.ct.lock);
>  
>  	fence->seqno = tlb_inval->seqno;
> -	trace_xe_tlb_inval_fence_send(xe, fence);
> +	trace_xe_tlb_inval_fence_send(tlb_inval->xe, fence);
>  
>  	spin_lock_irq(&tlb_inval->pending_lock);
>  	fence->inval_time = ktime_get();
>  	list_add_tail(&fence->link, &tlb_inval->pending_fences);
>  
>  	if (list_is_singular(&tlb_inval->pending_fences))
> -		queue_delayed_work(system_wq,
> -				   &tlb_inval->fence_tdr,
> -				   tlb_timeout_jiffies(gt));
> +		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
> +				   tlb_inval->ops->timeout_delay(tlb_inval));
>  	spin_unlock_irq(&tlb_inval->pending_lock);
>  
>  	tlb_inval->seqno = (tlb_inval->seqno + 1) %
> @@ -247,202 +214,63 @@ static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
>  		tlb_inval->seqno = 1;
>  }
>  
> -#define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> -		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
> -		XE_GUC_TLB_INVAL_FLUSH_CACHE)
> -
> -static int send_tlb_inval_ggtt(struct xe_gt *gt, int seqno)
> -{
> -	u32 action[] = {
> -		XE_GUC_ACTION_TLB_INVALIDATION,
> -		seqno,
> -		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> -	};
> -
> -	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
> -}
> -
> -static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> -			      struct xe_tlb_inval_fence *fence)
> -{
> -	u32 action[] = {
> -		XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> -		0,  /* seqno, replaced in send_tlb_inval */
> -		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> -	};
> -	struct xe_gt *gt = tlb_inval->private;
> -
> -	xe_gt_assert(gt, fence);
> -
> -	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
> -}
> +#define xe_tlb_inval_issue(__tlb_inval, __fence, op, args...)	\
> +({								\
> +	int __ret;						\
> +								\
> +	xe_assert((__tlb_inval)->xe, (__tlb_inval)->ops);	\
> +	xe_assert((__tlb_inval)->xe, (__fence));		\
> +								\
> +	(__tlb_inval)->ops->lock((__tlb_inval));		\
> +	xe_tlb_inval_fence_prep((__fence));			\
> +	__ret = op((__tlb_inval), (__fence)->seqno, ##args);	\
> +	if (__ret < 0)						\
> +		xe_tlb_inval_fence_signal_unlocked((__fence));	\
> +	(__tlb_inval)->ops->unlock((__tlb_inval));		\
> +								\
> +	__ret == -ECANCELED ? 0 : __ret;			\
> +})
>  
>  /**
> - * xe_gt_tlb_invalidation_all - Invalidate all TLBs across PF and all VFs.
> - * @gt: the &xe_gt structure
> - * @fence: the &xe_tlb_inval_fence to be signaled on completion
> + * xe_tlb_inval_all() - Issue a TLB invalidation for all TLBs
> + * @tlb_inval: TLB invalidation client
> + * @fence: invalidation fence which will be signaled on TLB invalidation
> + * completion
>   *
> - * Send a request to invalidate all TLBs across PF and all VFs.
> + * Issue a TLB invalidation for all TLBs. Completion of the TLB invalidation is
> + * asynchronous and the caller can use the invalidation fence to wait for it.
>   *
>   * Return: 0 on success, negative error code on error
>   */
>  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
>  		     struct xe_tlb_inval_fence *fence)
>  {
> -	struct xe_gt *gt = tlb_inval->private;
> -	int err;
> -
> -	err = send_tlb_inval_all(tlb_inval, fence);
> -	if (err)
> -		xe_gt_err(gt, "TLB invalidation request failed (%pe)", ERR_PTR(err));
> -
> -	return err;
> -}
> -
> -/*
> - * Ensure that roundup_pow_of_two(length) doesn't overflow.
> - * Note that roundup_pow_of_two() operates on unsigned long,
> - * not on u64.
> - */
> -#define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
> -
> -static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64 start, u64 end,
> -				u32 asid, int seqno)
> -{
> -#define MAX_TLB_INVALIDATION_LEN	7
> -	u32 action[MAX_TLB_INVALIDATION_LEN];
> -	u64 length = end - start;
> -	int len = 0;
> -
> -	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> -	action[len++] = seqno;
> -	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> -	    length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> -		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> -	} else {
> -		u64 orig_start = start;
> -		u64 align;
> -
> -		if (length < SZ_4K)
> -			length = SZ_4K;
> -
> -		/*
> -		 * We need to invalidate a higher granularity if start address
> -		 * is not aligned to length. When start is not aligned with
> -		 * length we need to find the length large enough to create an
> -		 * address mask covering the required range.
> -		 */
> -		align = roundup_pow_of_two(length);
> -		start = ALIGN_DOWN(start, align);
> -		end = ALIGN(end, align);
> -		length = align;
> -		while (start + length < end) {
> -			length <<= 1;
> -			start = ALIGN_DOWN(orig_start, length);
> -		}
> -
> -		/*
> -		 * Minimum invalidation size for a 2MB page that the hardware
> -		 * expects is 16MB
> -		 */
> -		if (length >= SZ_2M) {
> -			length = max_t(u64, SZ_16M, length);
> -			start = ALIGN_DOWN(orig_start, length);
> -		}
> -
> -		xe_gt_assert(gt, length >= SZ_4K);
> -		xe_gt_assert(gt, is_power_of_2(length));
> -		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
> -						    ilog2(SZ_2M) + 1)));
> -		xe_gt_assert(gt, IS_ALIGNED(start, length));
> -
> -		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> -		action[len++] = asid;
> -		action[len++] = lower_32_bits(start);
> -		action[len++] = upper_32_bits(start);
> -		action[len++] = ilog2(length) - ilog2(SZ_4K);
> -	}
> -
> -	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> -
> -	return send_tlb_inval(&gt->uc.guc, action, len);
> -}
> -
> -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> -			       struct xe_tlb_inval_fence *fence)
> -{
> -	int ret;
> -
> -	mutex_lock(&gt->uc.guc.ct.lock);
> -
> -	xe_tlb_inval_fence_prep(fence);
> -
> -	ret = send_tlb_inval_ggtt(gt, fence->seqno);
> -	if (ret < 0)
> -		inval_fence_signal_unlocked(gt_to_xe(gt), fence);
> -
> -	mutex_unlock(&gt->uc.guc.ct.lock);
> -
> -	/*
> -	 * -ECANCELED indicates the CT is stopped for a GT reset. TLB caches
> -	 *  should be nuked on a GT reset so this error can be ignored.
> -	 */
> -	if (ret == -ECANCELED)
> -		return 0;
> -
> -	return ret;
> +	return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval->ops->all);
>  }
>  
>  /**
> - * xe_tlb_inval_ggtt - Issue a TLB invalidation on this GT for the GGTT
> + * xe_tlb_inval_ggtt() - Issue a TLB invalidation for the GGTT
>   * @tlb_inval: TLB invalidation client
>   *
> - * Issue a TLB invalidation for the GGTT. Completion of TLB invalidation is
> - * synchronous.
> + * Issue a TLB invalidation for the GGTT. Completion of the TLB invalidation is
> + * synchronous: an on-stack fence is waited on before returning.
>   *
>   * Return: 0 on success, negative error code on error
>   */
>  int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
>  {
> -	struct xe_gt *gt = tlb_inval->private;
> -	struct xe_device *xe = gt_to_xe(gt);
> -	unsigned int fw_ref;
> -
> -	if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> -	    gt->uc.guc.submission_state.enabled) {
> -		struct xe_tlb_inval_fence fence;
> -		int ret;
> -
> -		xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> -		ret = __xe_tlb_inval_ggtt(gt, &fence);
> -		if (ret)
> -			return ret;
> -
> -		xe_tlb_inval_fence_wait(&fence);
> -	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
> -		struct xe_mmio *mmio = &gt->mmio;
> -
> -		if (IS_SRIOV_VF(xe))
> -			return 0;
> -
> -		fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> -		if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
> -			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
> -					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
> -			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
> -					PVC_GUC_TLB_INV_DESC0_VALID);
> -		} else {
> -			xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> -					GUC_TLB_INV_CR_INVALIDATE);
> -		}
> -		xe_force_wake_put(gt_to_fw(gt), fw_ref);
> -	}
> +	struct xe_tlb_inval_fence fence, *fence_ptr = &fence;
> +	int ret;
>  
> -	return 0;
> +	xe_tlb_inval_fence_init(tlb_inval, fence_ptr, true);
> +	ret = xe_tlb_inval_issue(tlb_inval, fence_ptr, tlb_inval->ops->ggtt);
> +	xe_tlb_inval_fence_wait(fence_ptr);
> +
> +	return ret;
>  }
>  
>  /**
> - * xe_tlb_inval_range - Issue a TLB invalidation on this GT for an address range
> + * xe_tlb_inval_range() - Issue a TLB invalidation for an address range
>   * @tlb_inval: TLB invalidation client
>   * @fence: invalidation fence which will be signal on TLB invalidation
>   * completion
> @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
>  		       struct xe_tlb_inval_fence *fence, u64 start, u64 end,
>  		       u32 asid)
>  {
> -	struct xe_gt *gt = tlb_inval->private;
> -	struct xe_device *xe = gt_to_xe(gt);
> -	int  ret;
> -
> -	xe_gt_assert(gt, fence);
> -
> -	/* Execlists not supported */
> -	if (xe->info.force_execlist) {
> -		__inval_fence_signal(xe, fence);
> -		return 0;
> -	}
> -
> -	mutex_lock(&gt->uc.guc.ct.lock);
> -
> -	xe_tlb_inval_fence_prep(fence);
> -
> -	ret = send_tlb_inval_ppgtt(gt, start, end, asid, fence->seqno);
> -	if (ret < 0)
> -		inval_fence_signal_unlocked(xe, fence);
> -
> -	mutex_unlock(&gt->uc.guc.ct.lock);
> -
> -	return ret;
> +	return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval->ops->ppgtt,
> +				  start, end, asid);
>  }
>  
>  /**
> - * xe_tlb_inval_vm - Issue a TLB invalidation on this GT for a VM
> + * xe_tlb_inval_vm() - Issue a TLB invalidation for a VM
>   * @tlb_inval: TLB invalidation client
>   * @vm: VM to invalidate
>   *
> @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm)
>  {
>  	struct xe_tlb_inval_fence fence;
>  	u64 range = 1ull << vm->xe->info.va_bits;
> -	int ret;
>  
>  	xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> -
> -	ret = xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid);
> -	if (ret < 0)
> -		return;
> -
> +	xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid);
>  	xe_tlb_inval_fence_wait(&fence);
>  }
>  
>  /**
> - * xe_tlb_inval_done_handler - TLB invalidation done handler
> - * @gt: gt
> + * xe_tlb_inval_done_handler() - TLB invalidation done handler
> + * @tlb_inval: TLB invalidation client
>   * @seqno: seqno of invalidation that is done
>   *
>   * Update recv seqno, signal any TLB invalidation fences, and restart TDR
>   */
> -static void xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> +void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno)
>  {
> -	struct xe_device *xe = gt_to_xe(gt);
> +	struct xe_device *xe = tlb_inval->xe;
>  	struct xe_tlb_inval_fence *fence, *next;
>  	unsigned long flags;
>  
> @@ -535,77 +337,53 @@ static void xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
>  	 * officially process the CT message like if racing against
>  	 * process_g2h_msg().
>  	 */
> -	spin_lock_irqsave(&gt->tlb_inval.pending_lock, flags);
> -	if (tlb_inval_seqno_past(gt, seqno)) {
> -		spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
> +	spin_lock_irqsave(&tlb_inval->pending_lock, flags);
> +	if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) {
> +		spin_unlock_irqrestore(&tlb_inval->pending_lock, flags);
>  		return;
>  	}
>  
> -	WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> +	WRITE_ONCE(tlb_inval->seqno_recv, seqno);
>  
>  	list_for_each_entry_safe(fence, next,
> -				 &gt->tlb_inval.pending_fences, link) {
> +				 &tlb_inval->pending_fences, link) {
>  		trace_xe_tlb_inval_fence_recv(xe, fence);
>  
> -		if (!tlb_inval_seqno_past(gt, fence->seqno))
> +		if (!xe_tlb_inval_seqno_past(tlb_inval, fence->seqno))
>  			break;
>  
> -		inval_fence_signal(xe, fence);
> +		xe_tlb_inval_fence_signal(fence);
>  	}
>  
> -	if (!list_empty(&gt->tlb_inval.pending_fences))
> +	if (!list_empty(&tlb_inval->pending_fences))
>  		mod_delayed_work(system_wq,
> -				 &gt->tlb_inval.fence_tdr,
> -				 tlb_timeout_jiffies(gt));
> +				 &tlb_inval->fence_tdr,
> +				 tlb_inval->ops->timeout_delay(tlb_inval));
>  	else
> -		cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> +		cancel_delayed_work(&tlb_inval->fence_tdr);
>  
> -	spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
> -}
> -
> -/**
> - * xe_guc_tlb_inval_done_handler - TLB invalidation done handler
> - * @guc: guc
> - * @msg: message indicating TLB invalidation done
> - * @len: length of message
> - *
> - * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
> - * invalidation fences for seqno. Algorithm for this depends on seqno being
> - * received in-order and asserts this assumption.
> - *
> - * Return: 0 on success, -EPROTO for malformed messages.
> - */
> -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> -{
> -	struct xe_gt *gt = guc_to_gt(guc);
> -
> -	if (unlikely(len != 1))
> -		return -EPROTO;
> -
> -	xe_tlb_inval_done_handler(gt, msg[0]);
> -
> -	return 0;
> +	spin_unlock_irqrestore(&tlb_inval->pending_lock, flags);
>  }
>  
>  static const char *
> -inval_fence_get_driver_name(struct dma_fence *dma_fence)
> +xe_inval_fence_get_driver_name(struct dma_fence *dma_fence)
>  {
>  	return "xe";
>  }
>  
>  static const char *
> -inval_fence_get_timeline_name(struct dma_fence *dma_fence)
> +xe_inval_fence_get_timeline_name(struct dma_fence *dma_fence)
>  {
> -	return "inval_fence";
> +	return "tlb_inval_fence";
>  }
>  
>  static const struct dma_fence_ops inval_fence_ops = {
> -	.get_driver_name = inval_fence_get_driver_name,
> -	.get_timeline_name = inval_fence_get_timeline_name,
> +	.get_driver_name = xe_inval_fence_get_driver_name,
> +	.get_timeline_name = xe_inval_fence_get_timeline_name,
>  };
>  
>  /**
> - * xe_tlb_inval_fence_init - Initialize TLB invalidation fence
> + * xe_tlb_inval_fence_init() - Initialize TLB invalidation fence
>   * @tlb_inval: TLB invalidation client
>   * @fence: TLB invalidation fence to initialize
>   * @stack: fence is stack variable
> @@ -618,15 +396,12 @@ void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
>  			     struct xe_tlb_inval_fence *fence,
>  			     bool stack)
>  {
> -	struct xe_gt *gt = tlb_inval->private;
> -
> -	xe_pm_runtime_get_noresume(gt_to_xe(gt));
> +	xe_pm_runtime_get_noresume(tlb_inval->xe);
>  
> -	spin_lock_irq(&gt->tlb_inval.lock);
> -	dma_fence_init(&fence->base, &inval_fence_ops,
> -		       &gt->tlb_inval.lock,
> +	spin_lock_irq(&tlb_inval->lock);
> +	dma_fence_init(&fence->base, &inval_fence_ops, &tlb_inval->lock,
>  		       dma_fence_context_alloc(1), 1);
> -	spin_unlock_irq(&gt->tlb_inval.lock);
> +	spin_unlock_irq(&tlb_inval->lock);
>  	INIT_LIST_HEAD(&fence->link);
>  	if (stack)
>  		set_bit(FENCE_STACK_BIT, &fence->base.flags);
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h b/drivers/gpu/drm/xe/xe_tlb_inval.h
> index 7adee3f8c551..cdeafc8d4391 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> @@ -18,24 +18,30 @@ struct xe_vma;
>  int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
>  
>  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
> -int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm);
>  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
>  		     struct xe_tlb_inval_fence *fence);
> +int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm);
>  int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
>  		       struct xe_tlb_inval_fence *fence,
>  		       u64 start, u64 end, u32 asid);
> -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
>  
>  void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
>  			     struct xe_tlb_inval_fence *fence,
>  			     bool stack);
> -void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence);
>  
> +/**
> + * xe_tlb_inval_fence_wait() - TLB invalidation fence wait
> + * @fence: TLB invalidation fence to wait on
> + *
> + * Wait on a TLB invalidation fence until it signals, non-interruptible
> + */
>  static inline void
>  xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence *fence)
>  {
>  	dma_fence_wait(&fence->base, false);
>  }
>  
> +void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno);
> +
>  #endif	/* _XE_TLB_INVAL_ */
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> index 05b6adc929bb..c1ad96d24fc8 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> @@ -9,10 +9,85 @@
>  #include <linux/workqueue.h>
>  #include <linux/dma-fence.h>
>  
> -/** struct xe_tlb_inval - TLB invalidation client */
> +struct xe_tlb_inval;
> +
> +/** struct xe_tlb_inval_ops - TLB invalidation ops (backend) */
> +struct xe_tlb_inval_ops {
> +	/**
> +	 * @all: Invalidate all TLBs
> +	 * @tlb_inval: TLB invalidation client
> +	 * @seqno: Seqno of TLB invalidation
> +	 *
> +	 * Return 0 on success, -ECANCELED if backend is mid-reset, error on
> +	 * failure
> +	 */
> +	int (*all)(struct xe_tlb_inval *tlb_inval, u32 seqno);
> +
> +	/**
> +	 * @ggtt: Invalidate global translation TLBs
> +	 * @tlb_inval: TLB invalidation client
> +	 * @seqno: Seqno of TLB invalidation
> +	 *
> +	 * Return 0 on success, -ECANCELED if backend is mid-reset, error on
> +	 * failure
> +	 */
> +	int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32 seqno);
> +
> +	/**
> +	 * @ppgtt: Invalidate per-process translation TLBs
> +	 * @tlb_inval: TLB invalidation client
> +	 * @seqno: Seqno of TLB invalidation
> +	 * @start: Start address
> +	 * @end: End address
> +	 * @asid: Address space ID
> +	 *
> +	 * Return 0 on success, -ECANCELED if backend is mid-reset, error on
> +	 * failure
> +	 */
> +	int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32 seqno, u64 start,
> +		     u64 end, u32 asid);
> +
> +	/**
> +	 * @initialized: Backend is initialized
> +	 * @tlb_inval: TLB invalidation client
> +	 *
> +	 * Return: True if backend is initialized, False otherwise
> +	 */
> +	bool (*initialized)(struct xe_tlb_inval *tlb_inval);
> +
> +	/**
> +	 * @flush: Flush pending TLB invalidations
> +	 * @tlb_inval: TLB invalidation client
> +	 */
> +	void (*flush)(struct xe_tlb_inval *tlb_inval);
> +
> +	/**
> +	 * @timeout_delay: Timeout delay for TLB invalidation
> +	 * @tlb_inval: TLB invalidation client
> +	 *
> +	 * Return: Timeout delay for TLB invalidation in jiffies
> +	 */
> +	long (*timeout_delay)(struct xe_tlb_inval *tlb_inval);
> +
> +	/**
> +	 * @lock: Lock resources protecting the backend seqno management
> +	 */
> +	void (*lock)(struct xe_tlb_inval *tlb_inval);
> +
> +	/**
> +	 * @unlock: Unlock resources protecting the backend seqno management
> +	 */
> +	void (*unlock)(struct xe_tlb_inval *tlb_inval);
> +};
> +
> +/** struct xe_tlb_inval - TLB invalidation client (frontend) */
>  struct xe_tlb_inval {
>  	/** @private: Backend private pointer */
>  	void *private;
> +	/** @xe: Pointer to Xe device */
> +	struct xe_device *xe;
> +	/** @ops: TLB invalidation ops */
> +	const struct xe_tlb_inval_ops *ops;
>  	/** @tlb_inval.seqno: TLB invalidation seqno, protected by CT lock */
>  #define TLB_INVALIDATION_SEQNO_MAX	0x100000
>  	int seqno;
> -- 
> 2.34.1
> 
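The frontend/backend split described in the DOC comment of the patch can be
sketched in plain userspace C. This is a minimal illustration, not the driver
code: struct tlb_inval, struct tlb_inval_ops, tlb_inval_issue_all() and the
fake_* backend are hypothetical stand-ins, fences are left out entirely, and
the lock is modelled as a counter. It only shows the order of operations that
the xe_tlb_inval_issue() macro follows (lock, assign seqno, call the backend
op, unlock, squash -ECANCELED).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SEQNO_MAX 0x100000	/* mirrors TLB_INVALIDATION_SEQNO_MAX */

struct tlb_inval;

/* Backend hooks: a reduced, hypothetical subset of xe_tlb_inval_ops. */
struct tlb_inval_ops {
	int (*all)(struct tlb_inval *ti, uint32_t seqno);
	void (*lock)(struct tlb_inval *ti);
	void (*unlock)(struct tlb_inval *ti);
};

/* Frontend state: owns seqno assignment, knows nothing about the HW. */
struct tlb_inval {
	const struct tlb_inval_ops *ops;
	void *private;	/* backend private data */
	int seqno;	/* next seqno to hand out, never 0 */
	int lock_depth;	/* stand-in for a real lock */
	int last_sent;	/* test hook: seqno last seen by the backend */
};

/* Frontend: hand out the current seqno and advance, skipping 0 on wrap. */
static int tlb_inval_prep(struct tlb_inval *ti)
{
	int seqno = ti->seqno;

	ti->seqno = (ti->seqno + 1) % SEQNO_MAX;
	if (!ti->seqno)
		ti->seqno = 1;
	return seqno;
}

/* Frontend: lock, prep, call the backend op, unlock, squash -ECANCELED. */
static int tlb_inval_issue_all(struct tlb_inval *ti)
{
	int ret;

	assert(ti->ops);
	ti->ops->lock(ti);
	ret = ti->ops->all(ti, (uint32_t)tlb_inval_prep(ti));
	ti->ops->unlock(ti);

	return ret == -ECANCELED ? 0 : ret;
}

/* A fake backend standing in for the GuC one. */
static void fake_lock(struct tlb_inval *ti) { ti->lock_depth++; }
static void fake_unlock(struct tlb_inval *ti) { ti->lock_depth--; }

static int fake_all(struct tlb_inval *ti, uint32_t seqno)
{
	assert(ti->lock_depth == 1);	/* plays the role of lockdep_assert_held() */
	ti->last_sent = (int)seqno;
	/* Assumption for the sketch: NULL private means "backend mid-reset". */
	return ti->private ? 0 : -ECANCELED;
}

static const struct tlb_inval_ops fake_ops = {
	.all = fake_all,
	.lock = fake_lock,
	.unlock = fake_unlock,
};
```

A caller would fill in a struct tlb_inval with .ops = &fake_ops and a non-zero
starting seqno, then call tlb_inval_issue_all(). Swapping in a different ops
table changes the transport without touching the seqno handling, which is the
point of the layering.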

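The wrap-aware seqno comparison used by xe_tlb_inval_seqno_past() can also be
tried out in isolation. The sketch below is an assumption-laden reconstruction:
the diff hunk above only shows the first and last branch, so the middle branch
here is a guess at the symmetric case, and seqno_past() and SEQNO_MAX are
illustrative names rather than the driver's.

```c
#include <stdbool.h>

#define SEQNO_MAX 0x100000	/* mirrors TLB_INVALIDATION_SEQNO_MAX */

/* Return true if @seqno is at or before the last received @seqno_recv. */
static bool seqno_past(int seqno_recv, int seqno)
{
	/* seqno is small, recv is large: seqno has wrapped and is newer. */
	if (seqno - seqno_recv < -(SEQNO_MAX / 2))
		return false;

	/* Assumed symmetric case: recv has wrapped, so seqno is older. */
	if (seqno - seqno_recv > (SEQNO_MAX / 2))
		return true;

	/* No wrap between the two values: plain comparison. */
	return seqno_recv >= seqno;
}
```

With this shape, a fence carrying seqno 3 is still treated as pending when the
last received seqno is SEQNO_MAX - 2, while a fence carrying SEQNO_MAX - 2 is
treated as complete once seqno 3 has been received. It only holds as long as
invalidations complete in order, as the XXX comment in the patch notes.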

* Re: [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 18:45   ` Matthew Brost
@ 2025-07-23 18:51     ` Matthew Brost
  0 siblings, 0 replies; 19+ messages in thread
From: Matthew Brost @ 2025-07-23 18:51 UTC (permalink / raw)
  To: stuartsummers; +Cc: matthew.auld, maarten.lankhorst, farah.kassabri, intel-xe

On Wed, Jul 23, 2025 at 11:45:48AM -0700, Matthew Brost wrote:
> On Wed, Jul 23, 2025 at 06:22:22PM +0000, stuartsummers wrote:
> > From: Matthew Brost <matthew.brost@intel.com>
> > 
> > The frontend exposes an API to the driver to send invalidations, handles
> > sequence number assignment, synchronization (fences), and provides a
> > timeout mechanism. The backend issues the actual invalidation to the
> > hardware (or firmware).
> > 
> > The new layering easily allows issuing TLB invalidations to different
> > hardware or firmware interfaces.
> > 
> > Normalize some naming while here too.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> > ---
> >  drivers/gpu/drm/xe/Makefile             |   1 +
> >  drivers/gpu/drm/xe/xe_guc_ct.c          |   2 +-
> >  drivers/gpu/drm/xe/xe_guc_tlb_inval.c   | 263 +++++++++++++
> >  drivers/gpu/drm/xe/xe_guc_tlb_inval.h   |  19 +
> >  drivers/gpu/drm/xe/xe_tlb_inval.c       | 495 +++++++-----------------
> >  drivers/gpu/drm/xe/xe_tlb_inval.h       |  14 +-
> >  drivers/gpu/drm/xe/xe_tlb_inval_types.h |  77 +++-
> >  7 files changed, 505 insertions(+), 366 deletions(-)
> >  create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> >  create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > 
> > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > index 332b2057cc00..8a2f836b3ab2 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -75,6 +75,7 @@ xe-y += xe_bb.o \
> >  	xe_guc_log.o \
> >  	xe_guc_pc.o \
> >  	xe_guc_submit.o \
> > +	xe_guc_tlb_inval.o \
> >  	xe_heci_gsc.o \
> >  	xe_huc.o \
> >  	xe_hw_engine.o \
> > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > index 2ef86c0ae8b4..90ebda5b3790 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > @@ -30,9 +30,9 @@
> >  #include "xe_guc_log.h"
> >  #include "xe_guc_relay.h"
> >  #include "xe_guc_submit.h"
> > +#include "xe_guc_tlb_inval.h"
> >  #include "xe_map.h"
> >  #include "xe_pm.h"
> > -#include "xe_tlb_inval.h"
> >  #include "xe_trace_guc.h"
> >  
> >  static void receive_g2h(struct xe_guc_ct *ct);
> > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > new file mode 100644
> > index 000000000000..27d7dc938cb1
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > @@ -0,0 +1,263 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#include "abi/guc_actions_abi.h"
> > +
> > +#include "xe_device.h"
> > +#include "xe_gt_stats.h"
> > +#include "xe_gt_types.h"
> > +#include "xe_guc.h"
> > +#include "xe_guc_ct.h"
> > +#include "xe_guc_tlb_inval.h"
> > +#include "xe_force_wake.h"
> > +#include "xe_mmio.h"
> > +#include "xe_tlb_inval.h"
> > +
> > +#include "regs/xe_guc_regs.h"
> > +
> > +/*
> > + * XXX: The seqno algorithm relies on TLB invalidations being processed in
> > + * order, which they currently are by the GuC. If that changes, the algorithm
> > + * will need to be updated.
> > + */
> > +
> > +static int send_tlb_inval(struct xe_guc *guc, const u32 *action, int len)
> > +{
> > +	struct xe_gt *gt = guc_to_gt(guc);
> > +
> > +	lockdep_assert_held(&guc->ct.lock);
> > +	xe_gt_assert(gt, action[1]);	/* Seqno */
> > +
> > +	xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> > +	return xe_guc_ct_send(&guc->ct, action, len,
> > +			      G2H_LEN_DW_TLB_INVALIDATE, 1);
> > +}
> > +
> > +#define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > +		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > +		XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > +
> > +static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval, u32 seqno)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +	u32 action[] = {
> > +		XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > +		seqno,
> > +		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > +	};
> > +
> > +	return send_tlb_inval(guc, action, ARRAY_SIZE(action));
> > +}
> > +
> > +static int send_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval, u32 seqno)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +	struct xe_gt *gt = guc_to_gt(guc);
> > +	struct xe_device *xe = guc_to_xe(guc);
> > +
> > +	lockdep_assert_held(&guc->ct.lock);
> > +
> > +	/*
> > +	 * Returning -ECANCELED in this function is squashed at the caller and
> > +	 * signals waiters.
> > +	 */
> > +
> > +	if (xe_guc_ct_enabled(&guc->ct) && guc->submission_state.enabled) {
> > +		u32 action[] = {
> > +			XE_GUC_ACTION_TLB_INVALIDATION,
> > +			seqno,
> > +			MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > +		};
> > +
> > +		return send_tlb_inval(guc, action, ARRAY_SIZE(action));
> > +	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
> > +		struct xe_mmio *mmio = &gt->mmio;
> > +		unsigned int fw_ref;
> > +
> > +		if (IS_SRIOV_VF(xe))
> > +			return -ECANCELED;
> > +
> > +		fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> > +		if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
> > +			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
> > +					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
> > +			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
> > +					PVC_GUC_TLB_INV_DESC0_VALID);
> > +		} else {
> > +			xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> > +					GUC_TLB_INV_CR_INVALIDATE);
> > +		}
> > +		xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > +	}
> > +
> > +	return -ECANCELED;
> > +}
> > +
> > +/*
> > + * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > + * Note that roundup_pow_of_two() operates on unsigned long,
> > + * not on u64.
> > + */
> > +#define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
> > +
> > +static int send_tlb_inval_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
> > +				u64 start, u64 end, u32 asid)
> > +{
> > +#define MAX_TLB_INVALIDATION_LEN	7
> > +	struct xe_guc *guc = tlb_inval->private;
> > +	struct xe_gt *gt = guc_to_gt(guc);
> > +	u32 action[MAX_TLB_INVALIDATION_LEN];
> > +	u64 length = end - start;
> > +	int len = 0;
> > +
> > +	lockdep_assert_held(&guc->ct.lock);
> > +
> > +	if (guc_to_xe(guc)->info.force_execlist)
> > +		return -ECANCELED;
> > +
> > +	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > +	action[len++] = seqno;
> > +	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > +	    length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > +		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > +	} else {
> > +		u64 orig_start = start;
> > +		u64 align;
> > +
> > +		if (length < SZ_4K)
> > +			length = SZ_4K;
> > +
> > +		/*
> > +		 * We need to invalidate a higher granularity if start address
> > +		 * is not aligned to length. When start is not aligned with
> > +		 * length we need to find the length large enough to create an
> > +		 * address mask covering the required range.
> > +		 */
> > +		align = roundup_pow_of_two(length);
> > +		start = ALIGN_DOWN(start, align);
> > +		end = ALIGN(end, align);
> > +		length = align;
> > +		while (start + length < end) {
> > +			length <<= 1;
> > +			start = ALIGN_DOWN(orig_start, length);
> > +		}
> > +
> > +		/*
> > +		 * Minimum invalidation size for a 2MB page that the hardware
> > +		 * expects is 16MB
> > +		 */
> > +		if (length >= SZ_2M) {
> > +			length = max_t(u64, SZ_16M, length);
> > +			start = ALIGN_DOWN(orig_start, length);
> > +		}
> > +
> > +		xe_gt_assert(gt, length >= SZ_4K);
> > +		xe_gt_assert(gt, is_power_of_2(length));
> > +		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
> > +						    ilog2(SZ_2M) + 1)));
> > +		xe_gt_assert(gt, IS_ALIGNED(start, length));
> > +
> > +		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > +		action[len++] = asid;
> > +		action[len++] = lower_32_bits(start);
> > +		action[len++] = upper_32_bits(start);
> > +		action[len++] = ilog2(length) - ilog2(SZ_4K);
> > +	}
> > +
> > +	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> > +
> > +	return send_tlb_inval(guc, action, len);
> > +}
> > +
> > +static bool tlb_inval_initialized(struct xe_tlb_inval *tlb_inval)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +
> > +	return xe_guc_ct_initialized(&guc->ct);
> > +}
> > +
> > +static void tlb_inval_flush(struct xe_tlb_inval *tlb_inval)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +
> > +	LNL_FLUSH_WORK(&guc->ct.g2h_worker);
> > +}
> > +
> > +static long tlb_inval_timeout_delay(struct xe_tlb_inval *tlb_inval)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +
> > +	/* this reflects what HW/GuC needs to process TLB inv request */
> > +	const long hw_tlb_timeout = HZ / 4;
> > +
> > +	/* this estimates actual delay caused by the CTB transport */
> > +	long delay = xe_guc_ct_queue_proc_time_jiffies(&guc->ct);
> > +
> > +	return hw_tlb_timeout + 2 * delay;
> > +}
> > +
> > +static void tlb_inval_lock(struct xe_tlb_inval *tlb_inval)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +
> > +	mutex_lock(&guc->ct.lock);
> > +}
> > +
> > +static void tlb_inval_unlock(struct xe_tlb_inval *tlb_inval)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +
> > +	mutex_unlock(&guc->ct.lock);
> > +}
> > +
> > +static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> > +	.all = send_tlb_inval_all,
> > +	.ggtt = send_tlb_inval_ggtt,
> > +	.ppgtt = send_tlb_inval_ppgtt,
> > +	.initialized = tlb_inval_initialized,
> > +	.flush = tlb_inval_flush,
> > +	.timeout_delay = tlb_inval_timeout_delay,
> > +	.lock = tlb_inval_lock,
> > +	.unlock = tlb_inval_unlock,
> 
> Where are the lock / unlock helpers planned to be used?
> 
> Matt
> 

Ignore this, I see that these vfuncs are used here. Let me do a proper
reply to the whole patch with my thoughts here.

Matt
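
For anyone following the thread, here is a rough userspace sketch of how the
frontend appears to use the lock / unlock vfuncs around the backend send. It
is not the xe code: every name below (toy_backend, inval_issue, TOY_SEQNO_MAX,
and so on) is a hypothetical stand-in, the lock is just a depth counter, and
the fence is reduced to a "signalled" count. It only illustrates the order of
operations: take the backend lock, assign a seqno, call the backend op, signal
the fence early on failure, drop the lock, and squash -ECANCELED to 0.

```c
#include <assert.h>
#include <errno.h>

#define TOY_SEQNO_MAX 0x100000u	/* hypothetical wrap point, not the xe value */

struct inval;

/* Backend vtable: one send op plus the lock hooks the frontend calls. */
struct inval_ops {
	int (*all)(struct inval *iv, unsigned int seqno);
	void (*lock)(struct inval *iv);
	void (*unlock)(struct inval *iv);
};

/* Frontend state; 'private' points at whatever the backend needs. */
struct inval {
	const struct inval_ops *ops;
	void *private;
	unsigned int seqno;	/* next seqno to hand out, never 0 */
	int lock_depth;		/* toy stand-in for a real mutex */
	int signalled;		/* fences signalled without reaching HW */
};

struct toy_backend {
	int enabled;
	unsigned int last_seqno;
};

static void toy_lock(struct inval *iv) { iv->lock_depth++; }
static void toy_unlock(struct inval *iv) { iv->lock_depth--; }

static int toy_send_all(struct inval *iv, unsigned int seqno)
{
	struct toy_backend *be = iv->private;

	assert(iv->lock_depth == 1);	/* plays the role of lockdep_assert_held() */
	if (!be->enabled)
		return -ECANCELED;	/* backend stopped, nothing was sent */

	be->last_seqno = seqno;
	return 0;
}

static const struct inval_ops toy_ops = {
	.all = toy_send_all,
	.lock = toy_lock,
	.unlock = toy_unlock,
};

/* Frontend helper: lock, assign seqno, send, clean up on error, unlock. */
static int inval_issue(struct inval *iv,
		       int (*op)(struct inval *iv, unsigned int seqno))
{
	unsigned int seqno;
	int ret;

	iv->ops->lock(iv);

	seqno = iv->seqno;
	iv->seqno = (iv->seqno + 1) % TOY_SEQNO_MAX;
	if (!iv->seqno)
		iv->seqno = 1;	/* 0 is reserved as "no seqno" */

	ret = op(iv, seqno);
	if (ret < 0)
		iv->signalled++;	/* nobody should wait on a failed send */

	iv->ops->unlock(iv);

	return ret == -ECANCELED ? 0 : ret;
}
```

With the backend enabled, inval_issue(&iv, iv.ops->all) hands seqno 1 to the
backend and returns 0. With it disabled, the call still returns 0, but the
fence is counted as signalled instead of being left pending.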

> > +};
> > +
> > +/**
> > + * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation early
> > + * @guc: GuC object
> > + * @tlb_inval: TLB invalidation client
> > + *
> > + * Initialize GuC TLB invalidation by setting back pointer in TLB invalidation
> > + * client to the GuC and setting GuC backend ops.
> > + */
> > +void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> > +				 struct xe_tlb_inval *tlb_inval)
> > +{
> > +	tlb_inval->private = guc;
> > +	tlb_inval->ops = &guc_tlb_inval_ops;
> > +}
> > +
> > +/**
> > + * xe_guc_tlb_inval_done_handler() - TLB invalidation done handler
> > + * @guc: guc
> > + * @msg: message indicating TLB invalidation done
> > + * @len: length of message
> > + *
> > + * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
> > + * invalidation fences for seqno. Algorithm for this depends on seqno being
> > + * received in-order and asserts this assumption.
> > + *
> > + * Return: 0 on success, -EPROTO for malformed messages.
> > + */
> > +int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> > +{
> > +	struct xe_gt *gt = guc_to_gt(guc);
> > +
> > +	if (unlikely(len != 1))
> > +		return -EPROTO;
> > +
> > +	xe_tlb_inval_done_handler(&gt->tlb_inval, msg[0]);
> > +
> > +	return 0;
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.h b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > new file mode 100644
> > index 000000000000..07d668b02e3d
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > @@ -0,0 +1,19 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#ifndef _XE_GUC_TLB_INVAL_H_
> > +#define _XE_GUC_TLB_INVAL_H_
> > +
> > +#include <linux/types.h>
> > +
> > +struct xe_guc;
> > +struct xe_tlb_inval;
> > +
> > +void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> > +				 struct xe_tlb_inval *tlb_inval);
> > +
> > +int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
> > +
> > +#endif
> > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > index c795b78362bf..071c25fbdbac 100644
> > --- a/drivers/gpu/drm/xe/xe_tlb_inval.c
> > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > @@ -12,50 +12,45 @@
> >  #include "xe_gt_printk.h"
> >  #include "xe_guc.h"
> >  #include "xe_guc_ct.h"
> > +#include "xe_guc_tlb_inval.h"
> >  #include "xe_gt_stats.h"
> >  #include "xe_tlb_inval.h"
> >  #include "xe_mmio.h"
> >  #include "xe_pm.h"
> > -#include "xe_sriov.h"
> > +#include "xe_tlb_inval.h"
> >  #include "xe_trace.h"
> > -#include "regs/xe_guc_regs.h"
> > -
> > -#define FENCE_STACK_BIT		DMA_FENCE_FLAG_USER_BITS
> >  
> > -/*
> > - * TLB inval depends on pending commands in the CT queue and then the real
> > - * invalidation time. Double up the time to process full CT queue
> > - * just to be on the safe side.
> > +/**
> > + * DOC: Xe TLB invalidation
> > + *
> > + * Xe TLB invalidation is implemented in two layers. The first is the frontend
> > + * API, which provides an interface for TLB invalidations to the driver code.
> > + * The frontend handles seqno assignment, synchronization (fences), and the
> > + * timeout mechanism. The frontend is implemented via an embedded structure
> > + * xe_tlb_inval that includes a set of ops hooking into the backend. The backend
> > + * interacts with the hardware (or firmware) to perform the actual invalidation.
> >   */
> > -static long tlb_timeout_jiffies(struct xe_gt *gt)
> > -{
> > -	/* this reflects what HW/GuC needs to process TLB inv request */
> > -	const long hw_tlb_timeout = HZ / 4;
> >  
> > -	/* this estimates actual delay caused by the CTB transport */
> > -	long delay = xe_guc_ct_queue_proc_time_jiffies(&gt->uc.guc.ct);
> > -
> > -	return hw_tlb_timeout + 2 * delay;
> > -}
> > +#define FENCE_STACK_BIT		DMA_FENCE_FLAG_USER_BITS
> >  
> >  static void xe_tlb_inval_fence_fini(struct xe_tlb_inval_fence *fence)
> >  {
> > -	struct xe_gt *gt;
> > -
> >  	if (WARN_ON_ONCE(!fence->tlb_inval))
> >  		return;
> >  
> > -	gt = fence->tlb_inval->private;
> > -	xe_pm_runtime_put(gt_to_xe(gt));
> > +	xe_pm_runtime_put(fence->tlb_inval->xe);
> >  	fence->tlb_inval = NULL; /* fini() should be called once */
> >  }
> >  
> >  static void
> > -__inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
> > +xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence)
> >  {
> >  	bool stack = test_bit(FENCE_STACK_BIT, &fence->base.flags);
> >  
> > -	trace_xe_tlb_inval_fence_signal(xe, fence);
> > +	lockdep_assert_held(&fence->tlb_inval->pending_lock);
> > +
> > +	list_del(&fence->link);
> > +	trace_xe_tlb_inval_fence_signal(fence->tlb_inval->xe, fence);
> >  	xe_tlb_inval_fence_fini(fence);
> >  	dma_fence_signal(&fence->base);
> >  	if (!stack)
> > @@ -63,57 +58,50 @@ __inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
> >  }
> >  
> >  static void
> > -inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
> > +xe_tlb_inval_fence_signal_unlocked(struct xe_tlb_inval_fence *fence)
> >  {
> > -	lockdep_assert_held(&fence->tlb_inval->pending_lock);
> > -
> > -	list_del(&fence->link);
> > -	__inval_fence_signal(xe, fence);
> > -}
> > +	struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> >  
> > -static void
> > -inval_fence_signal_unlocked(struct xe_device *xe,
> > -			    struct xe_tlb_inval_fence *fence)
> > -{
> > -	spin_lock_irq(&fence->tlb_inval->pending_lock);
> > -	inval_fence_signal(xe, fence);
> > -	spin_unlock_irq(&fence->tlb_inval->pending_lock);
> > +	spin_lock_irq(&tlb_inval->pending_lock);
> > +	xe_tlb_inval_fence_signal(fence);
> > +	spin_unlock_irq(&tlb_inval->pending_lock);
> >  }
> >  
> > -static void xe_gt_tlb_fence_timeout(struct work_struct *work)
> > +static void xe_tlb_inval_fence_timeout(struct work_struct *work)
> >  {
> > -	struct xe_gt *gt = container_of(work, struct xe_gt,
> > -					tlb_inval.fence_tdr.work);
> > -	struct xe_device *xe = gt_to_xe(gt);
> > +	struct xe_tlb_inval *tlb_inval = container_of(work, struct xe_tlb_inval,
> > +						      fence_tdr.work);
> > +	struct xe_device *xe = tlb_inval->xe;
> >  	struct xe_tlb_inval_fence *fence, *next;
> > +	long timeout_delay = tlb_inval->ops->timeout_delay(tlb_inval);
> >  
> > -	LNL_FLUSH_WORK(&gt->uc.guc.ct.g2h_worker);
> > +	tlb_inval->ops->flush(tlb_inval);
> >  
> > -	spin_lock_irq(&gt->tlb_inval.pending_lock);
> > +	spin_lock_irq(&tlb_inval->pending_lock);
> >  	list_for_each_entry_safe(fence, next,
> > -				 &gt->tlb_inval.pending_fences, link) {
> > +				 &tlb_inval->pending_fences, link) {
> >  		s64 since_inval_ms = ktime_ms_delta(ktime_get(),
> >  						    fence->inval_time);
> >  
> > -		if (msecs_to_jiffies(since_inval_ms) < tlb_timeout_jiffies(gt))
> > +		if (msecs_to_jiffies(since_inval_ms) < timeout_delay)
> >  			break;
> >  
> >  		trace_xe_tlb_inval_fence_timeout(xe, fence);
> > -		xe_gt_err(gt, "TLB invalidation fence timeout, seqno=%d recv=%d",
> > -			  fence->seqno, gt->tlb_inval.seqno_recv);
> > +		drm_err(&xe->drm,
> > +			"TLB invalidation fence timeout, seqno=%d recv=%d",
> > +			fence->seqno, tlb_inval->seqno_recv);
> >  
> >  		fence->base.error = -ETIME;
> > -		inval_fence_signal(xe, fence);
> > +		xe_tlb_inval_fence_signal(fence);
> >  	}
> > -	if (!list_empty(&gt->tlb_inval.pending_fences))
> > -		queue_delayed_work(system_wq,
> > -				   &gt->tlb_inval.fence_tdr,
> > -				   tlb_timeout_jiffies(gt));
> > -	spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > +	if (!list_empty(&tlb_inval->pending_fences))
> > +		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
> > +				   timeout_delay);
> > +	spin_unlock_irq(&tlb_inval->pending_lock);
> >  }
> >  
> >  /**
> > - * xe_tlb_inval_init_early - Initialize TLB invalidation state
> > + * xe_gt_tlb_inval_init_early() - Initialize TLB invalidation state
> >   * @gt: GT structure
> >   *
> >   * Initialize TLB invalidation state, purely software initialization, should
> > @@ -123,13 +111,12 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
> >   */
> >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
> >  {
> > -	gt->tlb_inval.private = gt;
> > +	gt->tlb_inval.xe = gt_to_xe(gt);
> >  	gt->tlb_inval.seqno = 1;
> >  	INIT_LIST_HEAD(&gt->tlb_inval.pending_fences);
> >  	spin_lock_init(&gt->tlb_inval.pending_lock);
> >  	spin_lock_init(&gt->tlb_inval.lock);
> > -	INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr,
> > -			  xe_gt_tlb_fence_timeout);
> > +	INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr, xe_tlb_inval_fence_timeout);
> >  
> >  	gt->tlb_inval.job_wq =
> >  		drmm_alloc_ordered_workqueue(&gt_to_xe(gt)->drm, "gt-tbl-inval-job-wq",
> > @@ -137,60 +124,64 @@ int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
> >  	if (IS_ERR(gt->tlb_inval.job_wq))
> >  		return PTR_ERR(gt->tlb_inval.job_wq);
> >  
> > +	/* XXX: Blindly setting up backend to GuC */
> > +	xe_guc_tlb_inval_init_early(&gt->uc.guc, &gt->tlb_inval);
> > +
> >  	return 0;
> >  }
> >  
> >  /**
> > - * xe_tlb_inval_reset - Initialize TLB invalidation reset
> > + * xe_tlb_inval_reset() - TLB invalidation reset
> >   * @tlb_inval: TLB invalidation client
> >   *
> >   * Signal any pending invalidation fences, should be called during a GT reset
> >   */
> >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
> >  {
> > -	struct xe_gt *gt = tlb_inval->private;
> >  	struct xe_tlb_inval_fence *fence, *next;
> >  	int pending_seqno;
> >  
> >  	/*
> > -	 * we can get here before the CTs are even initialized if we're wedging
> > -	 * very early, in which case there are not going to be any pending
> > -	 * fences so we can bail immediately.
> > +	 * we can get here before the backends are even initialized if we're
> > +	 * wedging very early, in which case there are not going to be any
> > +	 * pending fences so we can bail immediately.
> >  	 */
> > -	if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> > +	if (!tlb_inval->ops->initialized(tlb_inval))
> >  		return;
> >  
> >  	/*
> > -	 * CT channel is already disabled at this point. No new TLB requests can
> > +	 * Backend is already disabled at this point. No new TLB requests can
> >  	 * appear.
> >  	 */
> >  
> > -	mutex_lock(&gt->uc.guc.ct.lock);
> > -	spin_lock_irq(&gt->tlb_inval.pending_lock);
> > -	cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > +	tlb_inval->ops->lock(tlb_inval);
> > +	spin_lock_irq(&tlb_inval->pending_lock);
> > +	cancel_delayed_work(&tlb_inval->fence_tdr);
> >  	/*
> >  	 * We might have various kworkers waiting for TLB flushes to complete
> >  	 * which are not tracked with an explicit TLB fence, however at this
> > -	 * stage that will never happen since the CT is already disabled, so
> > -	 * make sure we signal them here under the assumption that we have
> > +	 * stage that will never happen since the backend is already disabled,
> > +	 * so make sure we signal them here under the assumption that we have
> >  	 * completed a full GT reset.
> >  	 */
> > -	if (gt->tlb_inval.seqno == 1)
> > +	if (tlb_inval->seqno == 1)
> >  		pending_seqno = TLB_INVALIDATION_SEQNO_MAX - 1;
> >  	else
> > -		pending_seqno = gt->tlb_inval.seqno - 1;
> > -	WRITE_ONCE(gt->tlb_inval.seqno_recv, pending_seqno);
> > +		pending_seqno = tlb_inval->seqno - 1;
> > +	WRITE_ONCE(tlb_inval->seqno_recv, pending_seqno);
> >  
> >  	list_for_each_entry_safe(fence, next,
> > -				 &gt->tlb_inval.pending_fences, link)
> > -		inval_fence_signal(gt_to_xe(gt), fence);
> > -	spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > -	mutex_unlock(&gt->uc.guc.ct.lock);
> > +				 &tlb_inval->pending_fences, link)
> > +		xe_tlb_inval_fence_signal(fence);
> > +	spin_unlock_irq(&tlb_inval->pending_lock);
> > +	tlb_inval->ops->unlock(tlb_inval);
> >  }
> >  
> > -static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
> > +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval *tlb_inval, int seqno)
> >  {
> > -	int seqno_recv = READ_ONCE(gt->tlb_inval.seqno_recv);
> > +	int seqno_recv = READ_ONCE(tlb_inval->seqno_recv);
> > +
> > +	lockdep_assert_held(&tlb_inval->pending_lock);
> >  
> >  	if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX / 2))
> >  		return false;
> > @@ -201,44 +192,20 @@ static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
> >  	return seqno_recv >= seqno;
> >  }
> >  
> > -static int send_tlb_inval(struct xe_guc *guc, const u32 *action, int len)
> > -{
> > -	struct xe_gt *gt = guc_to_gt(guc);
> > -
> > -	xe_gt_assert(gt, action[1]);	/* Seqno */
> > -	lockdep_assert_held(&guc->ct.lock);
> > -
> > -	/*
> > -	 * XXX: The seqno algorithm relies on TLB invalidation being processed
> > -	 * in order which they currently are, if that changes the algorithm will
> > -	 * need to be updated.
> > -	 */
> > -
> > -	xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> > -
> > -	return xe_guc_ct_send(&guc->ct, action, len,
> > -			      G2H_LEN_DW_TLB_INVALIDATE, 1);
> > -}
> > -
> >  static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
> >  {
> >  	struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> > -	struct xe_gt *gt = tlb_inval->private;
> > -	struct xe_device *xe = gt_to_xe(gt);
> > -
> > -	lockdep_assert_held(&gt->uc.guc.ct.lock);
> >  
> >  	fence->seqno = tlb_inval->seqno;
> > -	trace_xe_tlb_inval_fence_send(xe, fence);
> > +	trace_xe_tlb_inval_fence_send(tlb_inval->xe, fence);
> >  
> >  	spin_lock_irq(&tlb_inval->pending_lock);
> >  	fence->inval_time = ktime_get();
> >  	list_add_tail(&fence->link, &tlb_inval->pending_fences);
> >  
> >  	if (list_is_singular(&tlb_inval->pending_fences))
> > -		queue_delayed_work(system_wq,
> > -				   &tlb_inval->fence_tdr,
> > -				   tlb_timeout_jiffies(gt));
> > +		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
> > +				   tlb_inval->ops->timeout_delay(tlb_inval));
> >  	spin_unlock_irq(&tlb_inval->pending_lock);
> >  
> >  	tlb_inval->seqno = (tlb_inval->seqno + 1) %
> > @@ -247,202 +214,63 @@ static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
> >  		tlb_inval->seqno = 1;
> >  }
> >  
> > -#define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > -		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > -		XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > -
> > -static int send_tlb_inval_ggtt(struct xe_gt *gt, int seqno)
> > -{
> > -	u32 action[] = {
> > -		XE_GUC_ACTION_TLB_INVALIDATION,
> > -		seqno,
> > -		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > -	};
> > -
> > -	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
> > -}
> > -
> > -static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > -			      struct xe_tlb_inval_fence *fence)
> > -{
> > -	u32 action[] = {
> > -		XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > -		0,  /* seqno, replaced in send_tlb_inval */
> > -		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > -	};
> > -	struct xe_gt *gt = tlb_inval->private;
> > -
> > -	xe_gt_assert(gt, fence);
> > -
> > -	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
> > -}
> > +#define xe_tlb_inval_issue(__tlb_inval, __fence, op, args...)	\
> > +({								\
> > +	int __ret;						\
> > +								\
> > +	xe_assert((__tlb_inval)->xe, (__tlb_inval)->ops);	\
> > +	xe_assert((__tlb_inval)->xe, (__fence));		\
> > +								\
> > +	(__tlb_inval)->ops->lock((__tlb_inval));		\
> > +	xe_tlb_inval_fence_prep((__fence));			\
> > +	__ret = op((__tlb_inval), (__fence)->seqno, ##args);	\
> > +	if (__ret < 0)						\
> > +		xe_tlb_inval_fence_signal_unlocked((__fence));	\
> > +	(__tlb_inval)->ops->unlock((__tlb_inval));		\
> > +								\
> > +	__ret == -ECANCELED ? 0 : __ret;			\
> > +})
> >  
> >  /**
> > - * xe_gt_tlb_invalidation_all - Invalidate all TLBs across PF and all VFs.
> > - * @gt: the &xe_gt structure
> > - * @fence: the &xe_tlb_inval_fence to be signaled on completion
> > + * xe_tlb_inval_all() - Issue a TLB invalidation for all TLBs
> > + * @tlb_inval: TLB invalidation client
> > + * @fence: invalidation fence which will be signaled on TLB invalidation
> > + * completion
> >   *
> > - * Send a request to invalidate all TLBs across PF and all VFs.
> > + * Issue a TLB invalidation for all TLBs. Completion is asynchronous; the
> > + * caller can use the invalidation fence to wait for it.
> >   *
> >   * Return: 0 on success, negative error code on error
> >   */
> >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> >  		     struct xe_tlb_inval_fence *fence)
> >  {
> > -	struct xe_gt *gt = tlb_inval->private;
> > -	int err;
> > -
> > -	err = send_tlb_inval_all(tlb_inval, fence);
> > -	if (err)
> > -		xe_gt_err(gt, "TLB invalidation request failed (%pe)", ERR_PTR(err));
> > -
> > -	return err;
> > -}
> > -
> > -/*
> > - * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > - * Note that roundup_pow_of_two() operates on unsigned long,
> > - * not on u64.
> > - */
> > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
> > -
> > -static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64 start, u64 end,
> > -				u32 asid, int seqno)
> > -{
> > -#define MAX_TLB_INVALIDATION_LEN	7
> > -	u32 action[MAX_TLB_INVALIDATION_LEN];
> > -	u64 length = end - start;
> > -	int len = 0;
> > -
> > -	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > -	action[len++] = seqno;
> > -	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > -	    length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > -		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > -	} else {
> > -		u64 orig_start = start;
> > -		u64 align;
> > -
> > -		if (length < SZ_4K)
> > -			length = SZ_4K;
> > -
> > -		/*
> > -		 * We need to invalidate a higher granularity if start address
> > -		 * is not aligned to length. When start is not aligned with
> > -		 * length we need to find the length large enough to create an
> > -		 * address mask covering the required range.
> > -		 */
> > -		align = roundup_pow_of_two(length);
> > -		start = ALIGN_DOWN(start, align);
> > -		end = ALIGN(end, align);
> > -		length = align;
> > -		while (start + length < end) {
> > -			length <<= 1;
> > -			start = ALIGN_DOWN(orig_start, length);
> > -		}
> > -
> > -		/*
> > -		 * Minimum invalidation size for a 2MB page that the hardware
> > -		 * expects is 16MB
> > -		 */
> > -		if (length >= SZ_2M) {
> > -			length = max_t(u64, SZ_16M, length);
> > -			start = ALIGN_DOWN(orig_start, length);
> > -		}
> > -
> > -		xe_gt_assert(gt, length >= SZ_4K);
> > -		xe_gt_assert(gt, is_power_of_2(length));
> > -		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
> > -						    ilog2(SZ_2M) + 1)));
> > -		xe_gt_assert(gt, IS_ALIGNED(start, length));
> > -
> > -		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > -		action[len++] = asid;
> > -		action[len++] = lower_32_bits(start);
> > -		action[len++] = upper_32_bits(start);
> > -		action[len++] = ilog2(length) - ilog2(SZ_4K);
> > -	}
> > -
> > -	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> > -
> > -	return send_tlb_inval(&gt->uc.guc, action, len);
> > -}
> > -
> > -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> > -			       struct xe_tlb_inval_fence *fence)
> > -{
> > -	int ret;
> > -
> > -	mutex_lock(&gt->uc.guc.ct.lock);
> > -
> > -	xe_tlb_inval_fence_prep(fence);
> > -
> > -	ret = send_tlb_inval_ggtt(gt, fence->seqno);
> > -	if (ret < 0)
> > -		inval_fence_signal_unlocked(gt_to_xe(gt), fence);
> > -
> > -	mutex_unlock(&gt->uc.guc.ct.lock);
> > -
> > -	/*
> > -	 * -ECANCELED indicates the CT is stopped for a GT reset. TLB caches
> > -	 *  should be nuked on a GT reset so this error can be ignored.
> > -	 */
> > -	if (ret == -ECANCELED)
> > -		return 0;
> > -
> > -	return ret;
> > +	return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval->ops->all);
> >  }
> >  
> >  /**
> > - * xe_tlb_inval_ggtt - Issue a TLB invalidation on this GT for the GGTT
> > + * xe_tlb_inval_ggtt() - Issue a TLB invalidation for the GGTT
> >   * @tlb_inval: TLB invalidation client
> >   *
> > - * Issue a TLB invalidation for the GGTT. Completion of TLB invalidation is
> > - * synchronous.
> > + * Issue a TLB invalidation for the GGTT. Completion is synchronous: the
> > + * function waits on an internal stack fence before returning.
> >   *
> >   * Return: 0 on success, negative error code on error
> >   */
> >  int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
> >  {
> > -	struct xe_gt *gt = tlb_inval->private;
> > -	struct xe_device *xe = gt_to_xe(gt);
> > -	unsigned int fw_ref;
> > -
> > -	if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> > -	    gt->uc.guc.submission_state.enabled) {
> > -		struct xe_tlb_inval_fence fence;
> > -		int ret;
> > -
> > -		xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > -		ret = __xe_tlb_inval_ggtt(gt, &fence);
> > -		if (ret)
> > -			return ret;
> > -
> > -		xe_tlb_inval_fence_wait(&fence);
> > -	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
> > -		struct xe_mmio *mmio = &gt->mmio;
> > -
> > -		if (IS_SRIOV_VF(xe))
> > -			return 0;
> > -
> > -		fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> > -		if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
> > -			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
> > -					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
> > -			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
> > -					PVC_GUC_TLB_INV_DESC0_VALID);
> > -		} else {
> > -			xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> > -					GUC_TLB_INV_CR_INVALIDATE);
> > -		}
> > -		xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > -	}
> > +	struct xe_tlb_inval_fence fence, *fence_ptr = &fence;
> > +	int ret;
> >  
> > -	return 0;
> > +	xe_tlb_inval_fence_init(tlb_inval, fence_ptr, true);
> > +	ret = xe_tlb_inval_issue(tlb_inval, fence_ptr, tlb_inval->ops->ggtt);
> > +	xe_tlb_inval_fence_wait(fence_ptr);
> > +
> > +	return ret;
> >  }
> >  
> >  /**
> > - * xe_tlb_inval_range - Issue a TLB invalidation on this GT for an address range
> > + * xe_tlb_inval_range() - Issue a TLB invalidation for an address range
> >   * @tlb_inval: TLB invalidation client
> >   * @fence: invalidation fence which will be signal on TLB invalidation
> >   * completion
> > @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
> >  		       struct xe_tlb_inval_fence *fence, u64 start, u64 end,
> >  		       u32 asid)
> >  {
> > -	struct xe_gt *gt = tlb_inval->private;
> > -	struct xe_device *xe = gt_to_xe(gt);
> > -	int  ret;
> > -
> > -	xe_gt_assert(gt, fence);
> > -
> > -	/* Execlists not supported */
> > -	if (xe->info.force_execlist) {
> > -		__inval_fence_signal(xe, fence);
> > -		return 0;
> > -	}
> > -
> > -	mutex_lock(&gt->uc.guc.ct.lock);
> > -
> > -	xe_tlb_inval_fence_prep(fence);
> > -
> > -	ret = send_tlb_inval_ppgtt(gt, start, end, asid, fence->seqno);
> > -	if (ret < 0)
> > -		inval_fence_signal_unlocked(xe, fence);
> > -
> > -	mutex_unlock(&gt->uc.guc.ct.lock);
> > -
> > -	return ret;
> > +	return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval->ops->ppgtt,
> > +				  start, end, asid);
> >  }
> >  
> >  /**
> > - * xe_tlb_inval_vm - Issue a TLB invalidation on this GT for a VM
> > + * xe_tlb_inval_vm() - Issue a TLB invalidation for a VM
> >   * @tlb_inval: TLB invalidation client
> >   * @vm: VM to invalidate
> >   *
> > @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm)
> >  {
> >  	struct xe_tlb_inval_fence fence;
> >  	u64 range = 1ull << vm->xe->info.va_bits;
> > -	int ret;
> >  
> >  	xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > -
> > -	ret = xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid);
> > -	if (ret < 0)
> > -		return;
> > -
> > +	xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid);
> >  	xe_tlb_inval_fence_wait(&fence);
> >  }
> >  
> >  /**
> > - * xe_tlb_inval_done_handler - TLB invalidation done handler
> > - * @gt: gt
> > + * xe_tlb_inval_done_handler() - TLB invalidation done handler
> > + * @tlb_inval: TLB invalidation client
> >   * @seqno: seqno of invalidation that is done
> >   *
> >   * Update recv seqno, signal any TLB invalidation fences, and restart TDR
> >   */
> > -static void xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> > +void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno)
> >  {
> > -	struct xe_device *xe = gt_to_xe(gt);
> > +	struct xe_device *xe = tlb_inval->xe;
> >  	struct xe_tlb_inval_fence *fence, *next;
> >  	unsigned long flags;
> >  
> > @@ -535,77 +337,53 @@ static void xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> >  	 * officially process the CT message like if racing against
> >  	 * process_g2h_msg().
> >  	 */
> > -	spin_lock_irqsave(&gt->tlb_inval.pending_lock, flags);
> > -	if (tlb_inval_seqno_past(gt, seqno)) {
> > -		spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
> > +	spin_lock_irqsave(&tlb_inval->pending_lock, flags);
> > +	if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) {
> > +		spin_unlock_irqrestore(&tlb_inval->pending_lock, flags);
> >  		return;
> >  	}
> >  
> > -	WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> > +	WRITE_ONCE(tlb_inval->seqno_recv, seqno);
> >  
> >  	list_for_each_entry_safe(fence, next,
> > -				 &gt->tlb_inval.pending_fences, link) {
> > +				 &tlb_inval->pending_fences, link) {
> >  		trace_xe_tlb_inval_fence_recv(xe, fence);
> >  
> > -		if (!tlb_inval_seqno_past(gt, fence->seqno))
> > +		if (!xe_tlb_inval_seqno_past(tlb_inval, fence->seqno))
> >  			break;
> >  
> > -		inval_fence_signal(xe, fence);
> > +		xe_tlb_inval_fence_signal(fence);
> >  	}
> >  
> > -	if (!list_empty(&gt->tlb_inval.pending_fences))
> > +	if (!list_empty(&tlb_inval->pending_fences))
> >  		mod_delayed_work(system_wq,
> > -				 &gt->tlb_inval.fence_tdr,
> > -				 tlb_timeout_jiffies(gt));
> > +				 &tlb_inval->fence_tdr,
> > +				 tlb_inval->ops->timeout_delay(tlb_inval));
> >  	else
> > -		cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > +		cancel_delayed_work(&tlb_inval->fence_tdr);
> >  
> > -	spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
> > -}
> > -
> > -/**
> > - * xe_guc_tlb_inval_done_handler - TLB invalidation done handler
> > - * @guc: guc
> > - * @msg: message indicating TLB invalidation done
> > - * @len: length of message
> > - *
> > - * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
> > - * invalidation fences for seqno. Algorithm for this depends on seqno being
> > - * received in-order and asserts this assumption.
> > - *
> > - * Return: 0 on success, -EPROTO for malformed messages.
> > - */
> > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> > -{
> > -	struct xe_gt *gt = guc_to_gt(guc);
> > -
> > -	if (unlikely(len != 1))
> > -		return -EPROTO;
> > -
> > -	xe_tlb_inval_done_handler(gt, msg[0]);
> > -
> > -	return 0;
> > +	spin_unlock_irqrestore(&tlb_inval->pending_lock, flags);
> >  }
> >  
> >  static const char *
> > -inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > +xe_inval_fence_get_driver_name(struct dma_fence *dma_fence)
> >  {
> >  	return "xe";
> >  }
> >  
> >  static const char *
> > -inval_fence_get_timeline_name(struct dma_fence *dma_fence)
> > +xe_inval_fence_get_timeline_name(struct dma_fence *dma_fence)
> >  {
> > -	return "inval_fence";
> > +	return "tlb_inval_fence";
> >  }
> >  
> >  static const struct dma_fence_ops inval_fence_ops = {
> > -	.get_driver_name = inval_fence_get_driver_name,
> > -	.get_timeline_name = inval_fence_get_timeline_name,
> > +	.get_driver_name = xe_inval_fence_get_driver_name,
> > +	.get_timeline_name = xe_inval_fence_get_timeline_name,
> >  };
> >  
> >  /**
> > - * xe_tlb_inval_fence_init - Initialize TLB invalidation fence
> > + * xe_tlb_inval_fence_init() - Initialize TLB invalidation fence
> >   * @tlb_inval: TLB invalidation client
> >   * @fence: TLB invalidation fence to initialize
> >   * @stack: fence is stack variable
> > @@ -618,15 +396,12 @@ void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
> >  			     struct xe_tlb_inval_fence *fence,
> >  			     bool stack)
> >  {
> > -	struct xe_gt *gt = tlb_inval->private;
> > -
> > -	xe_pm_runtime_get_noresume(gt_to_xe(gt));
> > +	xe_pm_runtime_get_noresume(tlb_inval->xe);
> >  
> > -	spin_lock_irq(&gt->tlb_inval.lock);
> > -	dma_fence_init(&fence->base, &inval_fence_ops,
> > -		       &gt->tlb_inval.lock,
> > +	spin_lock_irq(&tlb_inval->lock);
> > +	dma_fence_init(&fence->base, &inval_fence_ops, &tlb_inval->lock,
> >  		       dma_fence_context_alloc(1), 1);
> > -	spin_unlock_irq(&gt->tlb_inval.lock);
> > +	spin_unlock_irq(&tlb_inval->lock);
> >  	INIT_LIST_HEAD(&fence->link);
> >  	if (stack)
> >  		set_bit(FENCE_STACK_BIT, &fence->base.flags);
> > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > index 7adee3f8c551..cdeafc8d4391 100644
> > --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > @@ -18,24 +18,30 @@ struct xe_vma;
> >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
> >  
> >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
> > -int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm);
> >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> >  		     struct xe_tlb_inval_fence *fence);
> > +int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm);
> >  int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
> >  		       struct xe_tlb_inval_fence *fence,
> >  		       u64 start, u64 end, u32 asid);
> > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
> >  
> >  void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
> >  			     struct xe_tlb_inval_fence *fence,
> >  			     bool stack);
> > -void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence);
> >  
> > +/**
> > + * xe_tlb_inval_fence_wait() - TLB invalidation fence wait
> > + * @fence: TLB invalidation fence to wait on
> > + *
> > + * Wait on a TLB invalidation fence until it signals, non-interruptible
> > + */
> >  static inline void
> >  xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence *fence)
> >  {
> >  	dma_fence_wait(&fence->base, false);
> >  }
> >  
> > +void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno);
> > +
> >  #endif	/* _XE_TLB_INVAL_ */
> > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > index 05b6adc929bb..c1ad96d24fc8 100644
> > --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > @@ -9,10 +9,85 @@
> >  #include <linux/workqueue.h>
> >  #include <linux/dma-fence.h>
> >  
> > -/** struct xe_tlb_inval - TLB invalidation client */
> > +struct xe_tlb_inval;
> > +
> > +/** struct xe_tlb_inval_ops - TLB invalidation ops (backend) */
> > +struct xe_tlb_inval_ops {
> > +	/**
> > +	 * @all: Invalidate all TLBs
> > +	 * @tlb_inval: TLB invalidation client
> > +	 * @seqno: Seqno of TLB invalidation
> > +	 *
> > +	 * Return 0 on success, -ECANCELED if backend is mid-reset, error on
> > +	 * failure
> > +	 */
> > +	int (*all)(struct xe_tlb_inval *tlb_inval, u32 seqno);
> > +
> > +	/**
> > +	 * @ggtt: Invalidate global translation TLBs
> > +	 * @tlb_inval: TLB invalidation client
> > +	 * @seqno: Seqno of TLB invalidation
> > +	 *
> > +	 * Return 0 on success, -ECANCELED if backend is mid-reset, error on
> > +	 * failure
> > +	 */
> > +	int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32 seqno);
> > +
> > +	/**
> > +	 * @ppgtt: Invalidate per-process translation TLBs
> > +	 * @tlb_inval: TLB invalidation client
> > +	 * @seqno: Seqno of TLB invalidation
> > +	 * @start: Start address
> > +	 * @end: End address
> > +	 * @asid: Address space ID
> > +	 *
> > +	 * Return 0 on success, -ECANCELED if backend is mid-reset, error on
> > +	 * failure
> > +	 */
> > +	int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32 seqno, u64 start,
> > +		     u64 end, u32 asid);
> > +
> > +	/**
> > +	 * @initialized: Backend is initialized
> > +	 * @tlb_inval: TLB invalidation client
> > +	 *
> > +	 * Return: True if backend is initialized, False otherwise
> > +	 */
> > +	bool (*initialized)(struct xe_tlb_inval *tlb_inval);
> > +
> > +	/**
> > +	 * @flush: Flush pending TLB invalidations
> > +	 * @tlb_inval: TLB invalidation client
> > +	 */
> > +	void (*flush)(struct xe_tlb_inval *tlb_inval);
> > +
> > +	/**
> > +	 * @timeout_delay: Timeout delay for TLB invalidation
> > +	 * @tlb_inval: TLB invalidation client
> > +	 *
> > +	 * Return: Timeout delay for TLB invalidation in jiffies
> > +	 */
> > +	long (*timeout_delay)(struct xe_tlb_inval *tlb_inval);
> > +
> > +	/**
> > +	 * @lock: Lock resources protecting the backend seqno management
> > +	 */
> > +	void (*lock)(struct xe_tlb_inval *tlb_inval);
> > +
> > +	/**
> > +	 * @unlock: Unlock resources protecting the backend seqno management
> > +	 */
> > +	void (*unlock)(struct xe_tlb_inval *tlb_inval);
> > +};
> > +
> > +/** struct xe_tlb_inval - TLB invalidation client (frontend) */
> >  struct xe_tlb_inval {
> >  	/** @private: Backend private pointer */
> >  	void *private;
> > +	/** @xe: Pointer to Xe device */
> > +	struct xe_device *xe;
> > +	/** @ops: TLB invalidation ops */
> > +	const struct xe_tlb_inval_ops *ops;
> >  	/** @tlb_inval.seqno: TLB invalidation seqno, protected by CT lock */
> >  #define TLB_INVALIDATION_SEQNO_MAX	0x100000
> >  	int seqno;
> > -- 
> > 2.34.1
> > 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 18:22 ` [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend stuartsummers
  2025-07-23 18:45   ` Matthew Brost
@ 2025-07-23 19:17   ` Matthew Brost
  2025-07-23 20:18     ` Matthew Brost
  1 sibling, 1 reply; 19+ messages in thread
From: Matthew Brost @ 2025-07-23 19:17 UTC (permalink / raw)
  To: stuartsummers; +Cc: matthew.auld, maarten.lankhorst, farah.kassabri, intel-xe

On Wed, Jul 23, 2025 at 06:22:22PM +0000, stuartsummers wrote:
> From: Matthew Brost <matthew.brost@intel.com>
> 
> The frontend exposes an API to the driver to send invalidations, handles
> sequence number assignment, synchronization (fences), and provides a
> timeout mechanism. The backend issues the actual invalidation to the
> hardware (or firmware).
> 
> The new layering easily allows issuing TLB invalidations to different
> hardware or firmware interfaces.
> 
> Normalize some naming while here too.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> ---
>  drivers/gpu/drm/xe/Makefile             |   1 +
>  drivers/gpu/drm/xe/xe_guc_ct.c          |   2 +-
>  drivers/gpu/drm/xe/xe_guc_tlb_inval.c   | 263 +++++++++++++
>  drivers/gpu/drm/xe/xe_guc_tlb_inval.h   |  19 +
>  drivers/gpu/drm/xe/xe_tlb_inval.c       | 495 +++++++-----------------
>  drivers/gpu/drm/xe/xe_tlb_inval.h       |  14 +-
>  drivers/gpu/drm/xe/xe_tlb_inval_types.h |  77 +++-
>  7 files changed, 505 insertions(+), 366 deletions(-)
>  create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.c
>  create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 332b2057cc00..8a2f836b3ab2 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -75,6 +75,7 @@ xe-y += xe_bb.o \
>  	xe_guc_log.o \
>  	xe_guc_pc.o \
>  	xe_guc_submit.o \
> +	xe_guc_tlb_inval.o \
>  	xe_heci_gsc.o \
>  	xe_huc.o \
>  	xe_hw_engine.o \
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 2ef86c0ae8b4..90ebda5b3790 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -30,9 +30,9 @@
>  #include "xe_guc_log.h"
>  #include "xe_guc_relay.h"
>  #include "xe_guc_submit.h"
> +#include "xe_guc_tlb_inval.h"
>  #include "xe_map.h"
>  #include "xe_pm.h"
> -#include "xe_tlb_inval.h"
>  #include "xe_trace_guc.h"
>  
>  static void receive_g2h(struct xe_guc_ct *ct);
> diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> new file mode 100644
> index 000000000000..27d7dc938cb1
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> @@ -0,0 +1,263 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include "abi/guc_actions_abi.h"
> +
> +#include "xe_device.h"
> +#include "xe_gt_stats.h"
> +#include "xe_gt_types.h"
> +#include "xe_guc.h"
> +#include "xe_guc_ct.h"
> +#include "xe_guc_tlb_inval.h"
> +#include "xe_force_wake.h"
> +#include "xe_mmio.h"
> +#include "xe_tlb_inval.h"
> +
> +#include "regs/xe_guc_regs.h"
> +
> +/*
> + * XXX: The seqno algorithm relies on TLB invalidations being processed in
> + * order, which they currently are by the GuC. If that changes, the algorithm
> + * will need to be updated.
> + */
> +
> +static int send_tlb_inval(struct xe_guc *guc, const u32 *action, int len)
> +{
> +	struct xe_gt *gt = guc_to_gt(guc);
> +
> +	lockdep_assert_held(&guc->ct.lock);
> +	xe_gt_assert(gt, action[1]);	/* Seqno */
> +
> +	xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> +	return xe_guc_ct_send(&guc->ct, action, len,
> +			      G2H_LEN_DW_TLB_INVALIDATE, 1);

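As written, this would need to be xe_guc_ct_send_locked(), since the
lockdep assert above expects ct->lock to be held. With the locking change
suggested below, though, you actually don't need it.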

> +}
> +
> +#define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> +		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
> +		XE_GUC_TLB_INVAL_FLUSH_CACHE)
> +
> +static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval, u32 seqno)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +	u32 action[] = {
> +		XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> +		seqno,
> +		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> +	};
> +
> +	return send_tlb_inval(guc, action, ARRAY_SIZE(action));
> +}
> +
> +static int send_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval, u32 seqno)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +	struct xe_gt *gt = guc_to_gt(guc);
> +	struct xe_device *xe = guc_to_xe(guc);
> +
> +	lockdep_assert_held(&guc->ct.lock);
> +
> +	/*
> +	 * Returning -ECANCELED in this function is squashed at the caller and
> +	 * signals waiters.
> +	 */
> +
> +	if (xe_guc_ct_enabled(&guc->ct) && guc->submission_state.enabled) {
> +		u32 action[] = {
> +			XE_GUC_ACTION_TLB_INVALIDATION,
> +			seqno,
> +			MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> +		};
> +
> +		return send_tlb_inval(guc, action, ARRAY_SIZE(action));
> +	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
> +		struct xe_mmio *mmio = &gt->mmio;
> +		unsigned int fw_ref;
> +
> +		if (IS_SRIOV_VF(xe))
> +			return -ECANCELED;
> +
> +		fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> +		if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
> +			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
> +					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
> +			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
> +					PVC_GUC_TLB_INV_DESC0_VALID);
> +		} else {
> +			xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> +					GUC_TLB_INV_CR_INVALIDATE);
> +		}
> +		xe_force_wake_put(gt_to_fw(gt), fw_ref);
> +	}
> +
> +	return -ECANCELED;
> +}
> +
> +/*
> + * Ensure that roundup_pow_of_two(length) doesn't overflow.
> + * Note that roundup_pow_of_two() operates on unsigned long,
> + * not on u64.
> + */
> +#define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
> +
> +static int send_tlb_inval_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
> +				u64 start, u64 end, u32 asid)
> +{
> +#define MAX_TLB_INVALIDATION_LEN	7
> +	struct xe_guc *guc = tlb_inval->private;
> +	struct xe_gt *gt = guc_to_gt(guc);
> +	u32 action[MAX_TLB_INVALIDATION_LEN];
> +	u64 length = end - start;
> +	int len = 0;
> +
> +	lockdep_assert_held(&guc->ct.lock);
> +
> +	if (guc_to_xe(guc)->info.force_execlist)
> +		return -ECANCELED;
> +
> +	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> +	action[len++] = seqno;
> +	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> +	    length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> +		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> +	} else {
> +		u64 orig_start = start;
> +		u64 align;
> +
> +		if (length < SZ_4K)
> +			length = SZ_4K;
> +
> +		/*
> +		 * We need to invalidate a higher granularity if start address
> +		 * is not aligned to length. When start is not aligned with
> +		 * length we need to find the length large enough to create an
> +		 * address mask covering the required range.
> +		 */
> +		align = roundup_pow_of_two(length);
> +		start = ALIGN_DOWN(start, align);
> +		end = ALIGN(end, align);
> +		length = align;
> +		while (start + length < end) {
> +			length <<= 1;
> +			start = ALIGN_DOWN(orig_start, length);
> +		}
> +
> +		/*
> +		 * Minimum invalidation size for a 2MB page that the hardware
> +		 * expects is 16MB
> +		 */
> +		if (length >= SZ_2M) {
> +			length = max_t(u64, SZ_16M, length);
> +			start = ALIGN_DOWN(orig_start, length);
> +		}
> +
> +		xe_gt_assert(gt, length >= SZ_4K);
> +		xe_gt_assert(gt, is_power_of_2(length));
> +		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
> +						    ilog2(SZ_2M) + 1)));
> +		xe_gt_assert(gt, IS_ALIGNED(start, length));
> +
> +		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> +		action[len++] = asid;
> +		action[len++] = lower_32_bits(start);
> +		action[len++] = upper_32_bits(start);
> +		action[len++] = ilog2(length) - ilog2(SZ_4K);
> +	}
> +
> +	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> +
> +	return send_tlb_inval(guc, action, len);
> +}
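
A worked userspace sketch of the range computation in
send_tlb_inval_ppgtt() above. The helper names (roundup_pow2, align_down,
align_up, compute_inval_range) are hypothetical stand-ins for the kernel's
roundup_pow_of_two(), ALIGN_DOWN() and ALIGN(); this is only an
illustration of the arithmetic, not the driver code, and it assumes the
length has already been capped as the full-invalidation check does.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K	(4ULL << 10)
#define SZ_2M	(2ULL << 20)
#define SZ_16M	(16ULL << 20)

/* Hypothetical helpers; 'a' must be a power of two. */
static uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	/* Assumes v is small enough that p does not overflow. */
	while (p < v)
		p <<= 1;
	return p;
}

static uint64_t align_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

struct inval_range {
	uint64_t start;		/* aligned to length */
	uint64_t length;	/* power of two, >= 4K */
};

/* Compute the power-of-two window covering [start, end). */
static struct inval_range compute_inval_range(uint64_t start, uint64_t end)
{
	uint64_t orig_start = start;
	uint64_t length = end - start;
	uint64_t align;

	if (length < SZ_4K)
		length = SZ_4K;

	align = roundup_pow2(length);
	start = align_down(start, align);
	end = align_up(end, align);
	length = align;

	/* Grow the window until a single aligned mask covers the range. */
	while (start + length < end) {
		length <<= 1;
		start = align_down(orig_start, length);
	}

	/* 2M and larger invalidations are widened to at least 16M. */
	if (length >= SZ_2M) {
		if (length < SZ_16M)
			length = SZ_16M;
		start = align_down(orig_start, length);
	}

	return (struct inval_range){ .start = start, .length = length };
}
```

For example, [0x3000, 0x5000) has length 0x2000 but is not aligned to it,
so the loop widens the window to start 0 with length 0x8000. A 2M range
starting at 2M is widened to 16M starting at 0.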
> +
> +static bool tlb_inval_initialized(struct xe_tlb_inval *tlb_inval)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +
> +	return xe_guc_ct_initialized(&guc->ct);
> +}
> +
> +static void tlb_inval_flush(struct xe_tlb_inval *tlb_inval)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +
> +	LNL_FLUSH_WORK(&guc->ct.g2h_worker);
> +}
> +
> +static long tlb_inval_timeout_delay(struct xe_tlb_inval *tlb_inval)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +
> +	/* this reflects what HW/GuC needs to process TLB inv request */
> +	const long hw_tlb_timeout = HZ / 4;
> +
> +	/* this estimates actual delay caused by the CTB transport */
> +	long delay = xe_guc_ct_queue_proc_time_jiffies(&guc->ct);
> +
> +	return hw_tlb_timeout + 2 * delay;
> +}
> +
> +static void tlb_inval_lock(struct xe_tlb_inval *tlb_inval)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +
> +	mutex_lock(&guc->ct.lock);
> +}
> +
> +static void tlb_inval_unlock(struct xe_tlb_inval *tlb_inval)
> +{
> +	struct xe_guc *guc = tlb_inval->private;
> +
> +	mutex_unlock(&guc->ct.lock);
> +}
> +
> +static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> +	.all = send_tlb_inval_all,
> +	.ggtt = send_tlb_inval_ggtt,
> +	.ppgtt = send_tlb_inval_ppgtt,
> +	.initialized = tlb_inval_initialized,
> +	.flush = tlb_inval_flush,
> +	.timeout_delay = tlb_inval_timeout_delay,
> +	.lock = tlb_inval_lock,
> +	.unlock = tlb_inval_unlock,
> +};
> +
> +/**
> + * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation early
> + * @guc: GuC object
> + * @tlb_inval: TLB invalidation client
> + *
> + * Initialize GuC TLB invalidation by setting back pointer in TLB invalidation
> + * client to the GuC and setting GuC backend ops.
> + */
> +void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> +				 struct xe_tlb_inval *tlb_inval)
> +{
> +	tlb_inval->private = guc;
> +	tlb_inval->ops = &guc_tlb_inval_ops;
> +}
> +
> +/**
> + * xe_guc_tlb_inval_done_handler() - TLB invalidation done handler
> + * @guc: guc
> + * @msg: message indicating TLB invalidation done
> + * @len: length of message
> + *
> + * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
> + * invalidation fences for seqno. Algorithm for this depends on seqno being
> + * received in-order and asserts this assumption.
> + *
> + * Return: 0 on success, -EPROTO for malformed messages.
> + */
> +int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> +{
> +	struct xe_gt *gt = guc_to_gt(guc);
> +
> +	if (unlikely(len != 1))
> +		return -EPROTO;
> +
> +	xe_tlb_inval_done_handler(&gt->tlb_inval, msg[0]);
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.h b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> new file mode 100644
> index 000000000000..07d668b02e3d
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_GUC_TLB_INVAL_H_
> +#define _XE_GUC_TLB_INVAL_H_
> +
> +#include <linux/types.h>
> +
> +struct xe_guc;
> +struct xe_tlb_inval;
> +
> +void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> +				 struct xe_tlb_inval *tlb_inval);
> +
> +int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
> index c795b78362bf..071c25fbdbac 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval.c
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
> @@ -12,50 +12,45 @@
>  #include "xe_gt_printk.h"
>  #include "xe_guc.h"
>  #include "xe_guc_ct.h"
> +#include "xe_guc_tlb_inval.h"
>  #include "xe_gt_stats.h"
>  #include "xe_tlb_inval.h"
>  #include "xe_mmio.h"
>  #include "xe_pm.h"
> -#include "xe_sriov.h"
> +#include "xe_tlb_inval.h"
>  #include "xe_trace.h"
> -#include "regs/xe_guc_regs.h"
> -
> -#define FENCE_STACK_BIT		DMA_FENCE_FLAG_USER_BITS
>  
> -/*
> - * TLB inval depends on pending commands in the CT queue and then the real
> - * invalidation time. Double up the time to process full CT queue
> - * just to be on the safe side.
> +/**
> + * DOC: Xe TLB invalidation
> + *
> + * Xe TLB invalidation is implemented in two layers. The first is the frontend
> + * API, which provides an interface for TLB invalidations to the driver code.
> + * The frontend handles seqno assignment, synchronization (fences), and the
> + * timeout mechanism. The frontend is implemented via an embedded structure
> + * xe_tlb_inval that includes a set of ops hooking into the backend. The backend
> + * interacts with the hardware (or firmware) to perform the actual invalidation.
>   */
> -static long tlb_timeout_jiffies(struct xe_gt *gt)
> -{
> -	/* this reflects what HW/GuC needs to process TLB inv request */
> -	const long hw_tlb_timeout = HZ / 4;
>  
> -	/* this estimates actual delay caused by the CTB transport */
> -	long delay = xe_guc_ct_queue_proc_time_jiffies(&gt->uc.guc.ct);
> -
> -	return hw_tlb_timeout + 2 * delay;
> -}
> +#define FENCE_STACK_BIT		DMA_FENCE_FLAG_USER_BITS
>  
>  static void xe_tlb_inval_fence_fini(struct xe_tlb_inval_fence *fence)
>  {
> -	struct xe_gt *gt;
> -
>  	if (WARN_ON_ONCE(!fence->tlb_inval))
>  		return;
>  
> -	gt = fence->tlb_inval->private;
> -	xe_pm_runtime_put(gt_to_xe(gt));
> +	xe_pm_runtime_put(fence->tlb_inval->xe);
>  	fence->tlb_inval = NULL; /* fini() should be called once */
>  }
>  
>  static void
> -__inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
> +xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence)
>  {
>  	bool stack = test_bit(FENCE_STACK_BIT, &fence->base.flags);
>  
> -	trace_xe_tlb_inval_fence_signal(xe, fence);
> +	lockdep_assert_held(&fence->tlb_inval->pending_lock);
> +
> +	list_del(&fence->link);
> +	trace_xe_tlb_inval_fence_signal(fence->tlb_inval->xe, fence);
>  	xe_tlb_inval_fence_fini(fence);
>  	dma_fence_signal(&fence->base);
>  	if (!stack)
> @@ -63,57 +58,50 @@ __inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
>  }
>  
>  static void
> -inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
> +xe_tlb_inval_fence_signal_unlocked(struct xe_tlb_inval_fence *fence)
>  {
> -	lockdep_assert_held(&fence->tlb_inval->pending_lock);
> -
> -	list_del(&fence->link);
> -	__inval_fence_signal(xe, fence);
> -}
> +	struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
>  
> -static void
> -inval_fence_signal_unlocked(struct xe_device *xe,
> -			    struct xe_tlb_inval_fence *fence)
> -{
> -	spin_lock_irq(&fence->tlb_inval->pending_lock);
> -	inval_fence_signal(xe, fence);
> -	spin_unlock_irq(&fence->tlb_inval->pending_lock);
> +	spin_lock_irq(&tlb_inval->pending_lock);
> +	xe_tlb_inval_fence_signal(fence);
> +	spin_unlock_irq(&tlb_inval->pending_lock);
>  }
>  
> -static void xe_gt_tlb_fence_timeout(struct work_struct *work)
> +static void xe_tlb_inval_fence_timeout(struct work_struct *work)
>  {
> -	struct xe_gt *gt = container_of(work, struct xe_gt,
> -					tlb_inval.fence_tdr.work);
> -	struct xe_device *xe = gt_to_xe(gt);
> +	struct xe_tlb_inval *tlb_inval = container_of(work, struct xe_tlb_inval,
> +						      fence_tdr.work);
> +	struct xe_device *xe = tlb_inval->xe;
>  	struct xe_tlb_inval_fence *fence, *next;
> +	long timeout_delay = tlb_inval->ops->timeout_delay(tlb_inval);
>  
> -	LNL_FLUSH_WORK(&gt->uc.guc.ct.g2h_worker);
> +	tlb_inval->ops->flush(tlb_inval);
>  
> -	spin_lock_irq(&gt->tlb_inval.pending_lock);
> +	spin_lock_irq(&tlb_inval->pending_lock);
>  	list_for_each_entry_safe(fence, next,
> -				 &gt->tlb_inval.pending_fences, link) {
> +				 &tlb_inval->pending_fences, link) {
>  		s64 since_inval_ms = ktime_ms_delta(ktime_get(),
>  						    fence->inval_time);
>  
> -		if (msecs_to_jiffies(since_inval_ms) < tlb_timeout_jiffies(gt))
> +		if (msecs_to_jiffies(since_inval_ms) < timeout_delay)
>  			break;
>  
>  		trace_xe_tlb_inval_fence_timeout(xe, fence);
> -		xe_gt_err(gt, "TLB invalidation fence timeout, seqno=%d recv=%d",
> -			  fence->seqno, gt->tlb_inval.seqno_recv);
> +		drm_err(&xe->drm,
> +			"TLB invalidation fence timeout, seqno=%d recv=%d",
> +			fence->seqno, tlb_inval->seqno_recv);
>  
>  		fence->base.error = -ETIME;
> -		inval_fence_signal(xe, fence);
> +		xe_tlb_inval_fence_signal(fence);
>  	}
> -	if (!list_empty(&gt->tlb_inval.pending_fences))
> -		queue_delayed_work(system_wq,
> -				   &gt->tlb_inval.fence_tdr,
> -				   tlb_timeout_jiffies(gt));
> -	spin_unlock_irq(&gt->tlb_inval.pending_lock);
> +	if (!list_empty(&tlb_inval->pending_fences))
> +		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
> +				   timeout_delay);
> +	spin_unlock_irq(&tlb_inval->pending_lock);
>  }
>  
>  /**
> - * xe_tlb_inval_init_early - Initialize TLB invalidation state
> + * xe_gt_tlb_inval_init_early() - Initialize TLB invalidation state
>   * @gt: GT structure
>   *
>   * Initialize TLB invalidation state, purely software initialization, should
> @@ -123,13 +111,12 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
>   */
>  int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
>  {
> -	gt->tlb_inval.private = gt;
> +	gt->tlb_inval.xe = gt_to_xe(gt);
>  	gt->tlb_inval.seqno = 1;
>  	INIT_LIST_HEAD(&gt->tlb_inval.pending_fences);
>  	spin_lock_init(&gt->tlb_inval.pending_lock);
>  	spin_lock_init(&gt->tlb_inval.lock);
> -	INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr,
> -			  xe_gt_tlb_fence_timeout);
> +	INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr, xe_tlb_inval_fence_timeout);
>  
>  	gt->tlb_inval.job_wq =
>  		drmm_alloc_ordered_workqueue(&gt_to_xe(gt)->drm, "gt-tbl-inval-job-wq",
> @@ -137,60 +124,64 @@ int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
>  	if (IS_ERR(gt->tlb_inval.job_wq))
>  		return PTR_ERR(gt->tlb_inval.job_wq);
>  
> +	/* XXX: Blindly setting up backend to GuC */
> +	xe_guc_tlb_inval_init_early(&gt->uc.guc, &gt->tlb_inval);
> +
>  	return 0;
>  }
>  
>  /**
> - * xe_tlb_inval_reset - Initialize TLB invalidation reset
> + * xe_tlb_inval_reset() - TLB invalidation reset
>   * @tlb_inval: TLB invalidation client
>   *
>   * Signal any pending invalidation fences, should be called during a GT reset
>   */
>  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
>  {
> -	struct xe_gt *gt = tlb_inval->private;
>  	struct xe_tlb_inval_fence *fence, *next;
>  	int pending_seqno;
>  
>  	/*
> -	 * we can get here before the CTs are even initialized if we're wedging
> -	 * very early, in which case there are not going to be any pending
> -	 * fences so we can bail immediately.
> +	 * we can get here before the backends are even initialized if we're
> +	 * wedging very early, in which case there are not going to be any
> +	 * pending fences so we can bail immediately.
>  	 */
> -	if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> +	if (!tlb_inval->ops->initialized(tlb_inval))
>  		return;
>  
>  	/*
> -	 * CT channel is already disabled at this point. No new TLB requests can
> +	 * Backend is already disabled at this point. No new TLB requests can
>  	 * appear.
>  	 */
>  
> -	mutex_lock(&gt->uc.guc.ct.lock);
> -	spin_lock_irq(&gt->tlb_inval.pending_lock);
> -	cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> +	tlb_inval->ops->lock(tlb_inval);

I think you want a dedicated lock embedded in struct xe_tlb_inval,
rather than reaching into the backend to grab one.

This will deadlock as written: G2H TLB inval messages are sometimes
processed while holding ct->lock (non-fast path, unlikely) and sometimes
without it (fast path, likely).

I'd call this lock seqno_lock, since that is exactly what it protects: the
order in which a seqno is assigned by the frontend and handed to the
backend.

Prime this lock for reclaim as well (do what primelockdep() does in
xe_guc_ct.c) to make it clear that memory allocations are not allowed
while the lock is held, as TLB invalidations can be called from two
reclaim paths:

- MMU notifier callbacks
- The dma-fence signaling path of VM binds that require a TLB
  invalidation
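
A minimal userspace sketch of the frontend-owned lock suggested above,
assuming a pthread mutex in place of a kernel mutex. All names here
(struct tlb_inval, seqno_lock, tlb_inval_issue_all, the mock backend) are
hypothetical. Lockdep priming for reclaim and fence signalling on error
have no userspace equivalent and are left out. The point is only that the
seqno is assigned and handed to the backend under one lock the frontend
owns, so the backend never has to expose its own.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define SEQNO_MAX	0x100000	/* mirrors TLB_INVALIDATION_SEQNO_MAX */

struct tlb_inval;

/* Hypothetical backend ops, reduced to a single op for brevity. */
struct tlb_inval_ops {
	int (*all)(struct tlb_inval *inval, uint32_t seqno);
};

struct tlb_inval {
	void *private;			/* backend private pointer */
	const struct tlb_inval_ops *ops;
	pthread_mutex_t seqno_lock;	/* owned by the frontend */
	int seqno;			/* next seqno to assign, never 0 */
};

static void tlb_inval_init(struct tlb_inval *inval,
			   const struct tlb_inval_ops *ops, void *private)
{
	inval->private = private;
	inval->ops = ops;
	inval->seqno = 1;
	pthread_mutex_init(&inval->seqno_lock, NULL);
}

/* Assign a seqno and hand it to the backend under seqno_lock. */
static int tlb_inval_issue_all(struct tlb_inval *inval, int *seqno_out)
{
	int ret;

	pthread_mutex_lock(&inval->seqno_lock);
	*seqno_out = inval->seqno;
	inval->seqno = (inval->seqno + 1) % SEQNO_MAX;
	if (!inval->seqno)
		inval->seqno = 1;	/* 0 is not a valid seqno */
	ret = inval->ops->all(inval, (uint32_t)*seqno_out);
	pthread_mutex_unlock(&inval->seqno_lock);

	/* -ECANCELED (backend mid-reset) is squashed, as in the patch. */
	return ret == -ECANCELED ? 0 : ret;
}

/* Mock backend recording the seqnos it was handed, in order. */
struct mock_backend {
	uint32_t seen[4];
	int count;
	int fail_with;	/* non-zero: return this instead of sending */
};

static int mock_all(struct tlb_inval *inval, uint32_t seqno)
{
	struct mock_backend *be = inval->private;

	if (be->fail_with)
		return be->fail_with;
	if (be->count < 4)
		be->seen[be->count++] = seqno;
	return 0;
}

static const struct tlb_inval_ops mock_ops = { .all = mock_all };
```

With this shape the backend's send path is free to take whatever lock it
needs internally, and the G2H handler only ever needs pending_lock.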

> +	spin_lock_irq(&tlb_inval->pending_lock);
> +	cancel_delayed_work(&tlb_inval->fence_tdr);
>  	/*
>  	 * We might have various kworkers waiting for TLB flushes to complete
>  	 * which are not tracked with an explicit TLB fence, however at this
> -	 * stage that will never happen since the CT is already disabled, so
> -	 * make sure we signal them here under the assumption that we have
> +	 * stage that will never happen since the backend is already disabled,
> +	 * so make sure we signal them here under the assumption that we have
>  	 * completed a full GT reset.
>  	 */
> -	if (gt->tlb_inval.seqno == 1)
> +	if (tlb_inval->seqno == 1)
>  		pending_seqno = TLB_INVALIDATION_SEQNO_MAX - 1;
>  	else
> -		pending_seqno = gt->tlb_inval.seqno - 1;
> -	WRITE_ONCE(gt->tlb_inval.seqno_recv, pending_seqno);
> +		pending_seqno = tlb_inval->seqno - 1;
> +	WRITE_ONCE(tlb_inval->seqno_recv, pending_seqno);
>  
>  	list_for_each_entry_safe(fence, next,
> -				 &gt->tlb_inval.pending_fences, link)
> -		inval_fence_signal(gt_to_xe(gt), fence);
> -	spin_unlock_irq(&gt->tlb_inval.pending_lock);
> -	mutex_unlock(&gt->uc.guc.ct.lock);
> +				 &tlb_inval->pending_fences, link)
> +		xe_tlb_inval_fence_signal(fence);
> +	spin_unlock_irq(&tlb_inval->pending_lock);
> +	tlb_inval->ops->unlock(tlb_inval);
>  }
>  
> -static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
> +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval *tlb_inval, int seqno)
>  {
> -	int seqno_recv = READ_ONCE(gt->tlb_inval.seqno_recv);
> +	int seqno_recv = READ_ONCE(tlb_inval->seqno_recv);
> +
> +	lockdep_assert_held(&tlb_inval->pending_lock);
>  
>  	if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX / 2))
>  		return false;
> @@ -201,44 +192,20 @@ static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
>  	return seqno_recv >= seqno;
>  }
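
A small standalone sketch of the wraparound-aware comparison used by
xe_tlb_inval_seqno_past(). The hunk above elides the middle of the
function, so the second branch below is an assumption (the symmetric
case of the first), and seqno_past() and SEQNO_MAX are hypothetical
names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define SEQNO_MAX	0x100000	/* mirrors TLB_INVALIDATION_SEQNO_MAX */

/*
 * Return true if 'seqno' has already been received, given the last
 * received seqno 'seqno_recv'. Seqnos wrap at SEQNO_MAX, so a large
 * distance between the two is taken to mean that one of them wrapped.
 */
static bool seqno_past(int seqno_recv, int seqno)
{
	/* seqno is far behind: it wrapped and is actually ahead of recv. */
	if (seqno - seqno_recv < -(SEQNO_MAX / 2))
		return false;

	/* Assumed symmetric case: recv wrapped, so seqno is behind it. */
	if (seqno - seqno_recv > (SEQNO_MAX / 2))
		return true;

	return seqno_recv >= seqno;
}
```

For instance, with seqno_recv == 2 just after a wrap, a fence with seqno
SEQNO_MAX - 1 counts as past, while with seqno_recv == SEQNO_MAX - 1 a
fence with seqno 2 does not.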
>  
> -static int send_tlb_inval(struct xe_guc *guc, const u32 *action, int len)
> -{
> -	struct xe_gt *gt = guc_to_gt(guc);
> -
> -	xe_gt_assert(gt, action[1]);	/* Seqno */
> -	lockdep_assert_held(&guc->ct.lock);
> -
> -	/*
> -	 * XXX: The seqno algorithm relies on TLB invalidation being processed
> -	 * in order which they currently are, if that changes the algorithm will
> -	 * need to be updated.
> -	 */
> -
> -	xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> -
> -	return xe_guc_ct_send(&guc->ct, action, len,
> -			      G2H_LEN_DW_TLB_INVALIDATE, 1);
> -}
> -
>  static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
>  {
>  	struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> -	struct xe_gt *gt = tlb_inval->private;
> -	struct xe_device *xe = gt_to_xe(gt);
> -
> -	lockdep_assert_held(&gt->uc.guc.ct.lock);
>  
>  	fence->seqno = tlb_inval->seqno;
> -	trace_xe_tlb_inval_fence_send(xe, fence);
> +	trace_xe_tlb_inval_fence_send(tlb_inval->xe, fence);
>  
>  	spin_lock_irq(&tlb_inval->pending_lock);
>  	fence->inval_time = ktime_get();
>  	list_add_tail(&fence->link, &tlb_inval->pending_fences);
>  
>  	if (list_is_singular(&tlb_inval->pending_fences))
> -		queue_delayed_work(system_wq,
> -				   &tlb_inval->fence_tdr,
> -				   tlb_timeout_jiffies(gt));
> +		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
> +				   tlb_inval->ops->timeout_delay(tlb_inval));
>  	spin_unlock_irq(&tlb_inval->pending_lock);
>  
>  	tlb_inval->seqno = (tlb_inval->seqno + 1) %
> @@ -247,202 +214,63 @@ static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
>  		tlb_inval->seqno = 1;
>  }
>  
> -#define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> -		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
> -		XE_GUC_TLB_INVAL_FLUSH_CACHE)
> -
> -static int send_tlb_inval_ggtt(struct xe_gt *gt, int seqno)
> -{
> -	u32 action[] = {
> -		XE_GUC_ACTION_TLB_INVALIDATION,
> -		seqno,
> -		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> -	};
> -
> -	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
> -}
> -
> -static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> -			      struct xe_tlb_inval_fence *fence)
> -{
> -	u32 action[] = {
> -		XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> -		0,  /* seqno, replaced in send_tlb_inval */
> -		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> -	};
> -	struct xe_gt *gt = tlb_inval->private;
> -
> -	xe_gt_assert(gt, fence);
> -
> -	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
> -}
> +#define xe_tlb_inval_issue(__tlb_inval, __fence, op, args...)	\
> +({								\
> +	int __ret;						\
> +								\
> +	xe_assert((__tlb_inval)->xe, (__tlb_inval)->ops);	\
> +	xe_assert((__tlb_inval)->xe, (__fence));		\
> +								\
> +	(__tlb_inval)->ops->lock((__tlb_inval));		\
> +	xe_tlb_inval_fence_prep((__fence));			\
> +	__ret = op((__tlb_inval), (__fence)->seqno, ##args);	\
> +	if (__ret < 0)						\
> +		xe_tlb_inval_fence_signal_unlocked((__fence));	\
> +	(__tlb_inval)->ops->unlock((__tlb_inval));		\
> +								\
> +	__ret == -ECANCELED ? 0 : __ret;			\
> +})
>  
>  /**
> - * xe_gt_tlb_invalidation_all - Invalidate all TLBs across PF and all VFs.
> - * @gt: the &xe_gt structure
> - * @fence: the &xe_tlb_inval_fence to be signaled on completion
> + * xe_tlb_inval_all() - Issue a TLB invalidation for all TLBs
> + * @tlb_inval: TLB invalidation client
> + * @fence: invalidation fence which will be signaled on TLB invalidation
> + * completion
>   *
> - * Send a request to invalidate all TLBs across PF and all VFs.
> + * Issue a TLB invalidation for all TLBs. Completion is asynchronous; the
> + * caller can use the invalidation fence to wait for it.
>   *
>   * Return: 0 on success, negative error code on error
>   */
>  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
>  		     struct xe_tlb_inval_fence *fence)
>  {
> -	struct xe_gt *gt = tlb_inval->private;
> -	int err;
> -
> -	err = send_tlb_inval_all(tlb_inval, fence);
> -	if (err)
> -		xe_gt_err(gt, "TLB invalidation request failed (%pe)", ERR_PTR(err));
> -
> -	return err;
> -}
> -
> -/*
> - * Ensure that roundup_pow_of_two(length) doesn't overflow.
> - * Note that roundup_pow_of_two() operates on unsigned long,
> - * not on u64.
> - */
> -#define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
> -
> -static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64 start, u64 end,
> -				u32 asid, int seqno)
> -{
> -#define MAX_TLB_INVALIDATION_LEN	7
> -	u32 action[MAX_TLB_INVALIDATION_LEN];
> -	u64 length = end - start;
> -	int len = 0;
> -
> -	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> -	action[len++] = seqno;
> -	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> -	    length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> -		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> -	} else {
> -		u64 orig_start = start;
> -		u64 align;
> -
> -		if (length < SZ_4K)
> -			length = SZ_4K;
> -
> -		/*
> -		 * We need to invalidate a higher granularity if start address
> -		 * is not aligned to length. When start is not aligned with
> -		 * length we need to find the length large enough to create an
> -		 * address mask covering the required range.
> -		 */
> -		align = roundup_pow_of_two(length);
> -		start = ALIGN_DOWN(start, align);
> -		end = ALIGN(end, align);
> -		length = align;
> -		while (start + length < end) {
> -			length <<= 1;
> -			start = ALIGN_DOWN(orig_start, length);
> -		}
> -
> -		/*
> -		 * Minimum invalidation size for a 2MB page that the hardware
> -		 * expects is 16MB
> -		 */
> -		if (length >= SZ_2M) {
> -			length = max_t(u64, SZ_16M, length);
> -			start = ALIGN_DOWN(orig_start, length);
> -		}
> -
> -		xe_gt_assert(gt, length >= SZ_4K);
> -		xe_gt_assert(gt, is_power_of_2(length));
> -		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
> -						    ilog2(SZ_2M) + 1)));
> -		xe_gt_assert(gt, IS_ALIGNED(start, length));
> -
> -		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> -		action[len++] = asid;
> -		action[len++] = lower_32_bits(start);
> -		action[len++] = upper_32_bits(start);
> -		action[len++] = ilog2(length) - ilog2(SZ_4K);
> -	}
> -
> -	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> -
> -	return send_tlb_inval(&gt->uc.guc, action, len);
> -}
> -
> -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> -			       struct xe_tlb_inval_fence *fence)
> -{
> -	int ret;
> -
> -	mutex_lock(&gt->uc.guc.ct.lock);
> -
> -	xe_tlb_inval_fence_prep(fence);
> -
> -	ret = send_tlb_inval_ggtt(gt, fence->seqno);
> -	if (ret < 0)
> -		inval_fence_signal_unlocked(gt_to_xe(gt), fence);
> -
> -	mutex_unlock(&gt->uc.guc.ct.lock);
> -
> -	/*
> -	 * -ECANCELED indicates the CT is stopped for a GT reset. TLB caches
> -	 *  should be nuked on a GT reset so this error can be ignored.
> -	 */
> -	if (ret == -ECANCELED)
> -		return 0;
> -
> -	return ret;
> +	return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval->ops->all);
>  }
>  
>  /**
> - * xe_tlb_inval_ggtt - Issue a TLB invalidation on this GT for the GGTT
> + * xe_tlb_inval_ggtt() - Issue a TLB invalidation for the GGTT
>   * @tlb_inval: TLB invalidation client
>   *
> - * Issue a TLB invalidation for the GGTT. Completion of TLB invalidation is
> - * synchronous.
> + * Issue a TLB invalidation for the GGTT. Completion is synchronous: the
> + * function waits on an internal fence before returning.
>   *
>   * Return: 0 on success, negative error code on error
>   */
>  int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
>  {
> -	struct xe_gt *gt = tlb_inval->private;
> -	struct xe_device *xe = gt_to_xe(gt);
> -	unsigned int fw_ref;
> -
> -	if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> -	    gt->uc.guc.submission_state.enabled) {
> -		struct xe_tlb_inval_fence fence;
> -		int ret;
> -
> -		xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> -		ret = __xe_tlb_inval_ggtt(gt, &fence);
> -		if (ret)
> -			return ret;
> -
> -		xe_tlb_inval_fence_wait(&fence);
> -	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
> -		struct xe_mmio *mmio = &gt->mmio;
> -
> -		if (IS_SRIOV_VF(xe))
> -			return 0;
> -
> -		fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> -		if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
> -			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
> -					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
> -			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
> -					PVC_GUC_TLB_INV_DESC0_VALID);
> -		} else {
> -			xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> -					GUC_TLB_INV_CR_INVALIDATE);
> -		}
> -		xe_force_wake_put(gt_to_fw(gt), fw_ref);
> -	}
> +	struct xe_tlb_inval_fence fence, *fence_ptr = &fence;
> +	int ret;
>  
> -	return 0;
> +	xe_tlb_inval_fence_init(tlb_inval, fence_ptr, true);
> +	ret = xe_tlb_inval_issue(tlb_inval, fence_ptr, tlb_inval->ops->ggtt);
> +	xe_tlb_inval_fence_wait(fence_ptr);
> +
> +	return ret;
>  }
>  
>  /**
> - * xe_tlb_inval_range - Issue a TLB invalidation on this GT for an address range
> + * xe_tlb_inval_range() - Issue a TLB invalidation for an address range
>   * @tlb_inval: TLB invalidation client
>   * @fence: invalidation fence which will be signal on TLB invalidation
>   * completion
> @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
>  		       struct xe_tlb_inval_fence *fence, u64 start, u64 end,
>  		       u32 asid)
>  {
> -	struct xe_gt *gt = tlb_inval->private;
> -	struct xe_device *xe = gt_to_xe(gt);
> -	int  ret;
> -
> -	xe_gt_assert(gt, fence);
> -
> -	/* Execlists not supported */
> -	if (xe->info.force_execlist) {
> -		__inval_fence_signal(xe, fence);
> -		return 0;
> -	}
> -
> -	mutex_lock(&gt->uc.guc.ct.lock);
> -
> -	xe_tlb_inval_fence_prep(fence);
> -
> -	ret = send_tlb_inval_ppgtt(gt, start, end, asid, fence->seqno);
> -	if (ret < 0)
> -		inval_fence_signal_unlocked(xe, fence);
> -
> -	mutex_unlock(&gt->uc.guc.ct.lock);
> -
> -	return ret;
> +	return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval->ops->ppgtt,
> +				  start, end, asid);
>  }
>  
>  /**
> - * xe_tlb_inval_vm - Issue a TLB invalidation on this GT for a VM
> + * xe_tlb_inval_vm() - Issue a TLB invalidation for a VM
>   * @tlb_inval: TLB invalidation client
>   * @vm: VM to invalidate
>   *
> @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm)
>  {
>  	struct xe_tlb_inval_fence fence;
>  	u64 range = 1ull << vm->xe->info.va_bits;
> -	int ret;
>  
>  	xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> -
> -	ret = xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid);
> -	if (ret < 0)
> -		return;
> -
> +	xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid);
>  	xe_tlb_inval_fence_wait(&fence);
>  }
>  
>  /**
> - * xe_tlb_inval_done_handler - TLB invalidation done handler
> - * @gt: gt
> + * xe_tlb_inval_done_handler() - TLB invalidation done handler
> + * @tlb_inval: TLB invalidation client
>   * @seqno: seqno of invalidation that is done
>   *
>   * Update recv seqno, signal any TLB invalidation fences, and restart TDR

I'd mention that this function is safe to call from any context (i.e.,
process, atomic, and hardirq contexts are allowed).

We might need to convert tlb_inval.pending_lock to a raw_spinlock_t for
PREEMPT_RT enablement. Same for the GuC fast_lock. AFAIK we haven’t had
any complaints, so maybe I’m just overthinking it, but also perhaps not.
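
For reference, here is a userspace model of what the handler does once it
holds pending_lock: drop stale notifications, record the received seqno, and
signal pending fences in submission order until the first one that is not yet
complete. This is only a sketch. The wrap-aware compare is my assumption of
what xe_tlb_inval_seqno_past() does (it is not part of this hunk), and
struct fence, seqno_past() and done_handler() are hypothetical names. Locking
and the TDR rearm are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SEQNO_MAX 0x100000	/* TLB_INVALIDATION_SEQNO_MAX in the patch */

/* Hypothetical stand-in for struct xe_tlb_inval_fence. */
struct fence {
	int seqno;
	bool signaled;
};

/*
 * Assumed wrap-aware compare: true if @seqno has already been received.
 * A difference larger than half the seqno space is treated as a wrap.
 */
static bool seqno_past(int seqno_recv, int seqno)
{
	if (seqno - seqno_recv < -(SEQNO_MAX / 2))
		return false;
	if (seqno - seqno_recv > (SEQNO_MAX / 2))
		return true;
	return seqno_recv >= seqno;
}

/*
 * Model of the done handler. @pending is ordered by submission, which is
 * what the in-order seqno assumption buys us. Returns the number of fences
 * signaled by this notification.
 */
static size_t done_handler(struct fence *pending, size_t n,
			   int *seqno_recv, int seqno)
{
	size_t i, signaled = 0;

	if (seqno_past(*seqno_recv, seqno))
		return 0;	/* stale or duplicate notification */

	*seqno_recv = seqno;

	for (i = 0; i < n; i++) {
		if (pending[i].signaled)
			continue;	/* the real code removes these from the list */
		if (!seqno_past(*seqno_recv, pending[i].seqno))
			break;		/* everything after this is still in flight */
		pending[i].signaled = true;
		signaled++;
	}

	return signaled;
}
```

With fences 1, 2 and 3 pending, a notification for seqno 2 signals the first
two, a repeated notification for 2 is dropped, and 3 signals the last one.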

>   */
> -static void xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> +void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno)
>  {
> -	struct xe_device *xe = gt_to_xe(gt);
> +	struct xe_device *xe = tlb_inval->xe;
>  	struct xe_tlb_inval_fence *fence, *next;
>  	unsigned long flags;
>  
> @@ -535,77 +337,53 @@ static void xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
>  	 * officially process the CT message like if racing against
>  	 * process_g2h_msg().
>  	 */
> -	spin_lock_irqsave(&gt->tlb_inval.pending_lock, flags);
> -	if (tlb_inval_seqno_past(gt, seqno)) {
> -		spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
> +	spin_lock_irqsave(&tlb_inval->pending_lock, flags);
> +	if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) {
> +		spin_unlock_irqrestore(&tlb_inval->pending_lock, flags);
>  		return;
>  	}
>  
> -	WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> +	WRITE_ONCE(tlb_inval->seqno_recv, seqno);
>  
>  	list_for_each_entry_safe(fence, next,
> -				 &gt->tlb_inval.pending_fences, link) {
> +				 &tlb_inval->pending_fences, link) {
>  		trace_xe_tlb_inval_fence_recv(xe, fence);
>  
> -		if (!tlb_inval_seqno_past(gt, fence->seqno))
> +		if (!xe_tlb_inval_seqno_past(tlb_inval, fence->seqno))
>  			break;
>  
> -		inval_fence_signal(xe, fence);
> +		xe_tlb_inval_fence_signal(fence);
>  	}
>  
> -	if (!list_empty(&gt->tlb_inval.pending_fences))
> +	if (!list_empty(&tlb_inval->pending_fences))
>  		mod_delayed_work(system_wq,
> -				 &gt->tlb_inval.fence_tdr,
> -				 tlb_timeout_jiffies(gt));
> +				 &tlb_inval->fence_tdr,
> +				 tlb_inval->ops->timeout_delay(tlb_inval));
>  	else
> -		cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> +		cancel_delayed_work(&tlb_inval->fence_tdr);
>  
> -	spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
> -}
> -
> -/**
> - * xe_guc_tlb_inval_done_handler - TLB invalidation done handler
> - * @guc: guc
> - * @msg: message indicating TLB invalidation done
> - * @len: length of message
> - *
> - * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
> - * invalidation fences for seqno. Algorithm for this depends on seqno being
> - * received in-order and asserts this assumption.
> - *
> - * Return: 0 on success, -EPROTO for malformed messages.
> - */
> -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> -{
> -	struct xe_gt *gt = guc_to_gt(guc);
> -
> -	if (unlikely(len != 1))
> -		return -EPROTO;
> -
> -	xe_tlb_inval_done_handler(gt, msg[0]);
> -
> -	return 0;
> +	spin_unlock_irqrestore(&tlb_inval->pending_lock, flags);
>  }
>  
>  static const char *
> -inval_fence_get_driver_name(struct dma_fence *dma_fence)
> +xe_inval_fence_get_driver_name(struct dma_fence *dma_fence)
>  {
>  	return "xe";
>  }
>  
>  static const char *
> -inval_fence_get_timeline_name(struct dma_fence *dma_fence)
> +xe_inval_fence_get_timeline_name(struct dma_fence *dma_fence)
>  {
> -	return "inval_fence";
> +	return "tlb_inval_fence";
>  }
>  
>  static const struct dma_fence_ops inval_fence_ops = {
> -	.get_driver_name = inval_fence_get_driver_name,
> -	.get_timeline_name = inval_fence_get_timeline_name,
> +	.get_driver_name = xe_inval_fence_get_driver_name,
> +	.get_timeline_name = xe_inval_fence_get_timeline_name,
>  };
>  
>  /**
> - * xe_tlb_inval_fence_init - Initialize TLB invalidation fence
> + * xe_tlb_inval_fence_init() - Initialize TLB invalidation fence
>   * @tlb_inval: TLB invalidation client
>   * @fence: TLB invalidation fence to initialize
>   * @stack: fence is stack variable
> @@ -618,15 +396,12 @@ void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
>  			     struct xe_tlb_inval_fence *fence,
>  			     bool stack)
>  {
> -	struct xe_gt *gt = tlb_inval->private;
> -
> -	xe_pm_runtime_get_noresume(gt_to_xe(gt));
> +	xe_pm_runtime_get_noresume(tlb_inval->xe);
>  
> -	spin_lock_irq(&gt->tlb_inval.lock);
> -	dma_fence_init(&fence->base, &inval_fence_ops,
> -		       &gt->tlb_inval.lock,
> +	spin_lock_irq(&tlb_inval->lock);
> +	dma_fence_init(&fence->base, &inval_fence_ops, &tlb_inval->lock,
>  		       dma_fence_context_alloc(1), 1);
> -	spin_unlock_irq(&gt->tlb_inval.lock);
> +	spin_unlock_irq(&tlb_inval->lock);

While here, 'fence_lock' is probably a better name.

Matt

>  	INIT_LIST_HEAD(&fence->link);
>  	if (stack)
>  		set_bit(FENCE_STACK_BIT, &fence->base.flags);
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h b/drivers/gpu/drm/xe/xe_tlb_inval.h
> index 7adee3f8c551..cdeafc8d4391 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> @@ -18,24 +18,30 @@ struct xe_vma;
>  int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
>  
>  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
> -int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm);
>  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
>  		     struct xe_tlb_inval_fence *fence);
> +int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm);
>  int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
>  		       struct xe_tlb_inval_fence *fence,
>  		       u64 start, u64 end, u32 asid);
> -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
>  
>  void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
>  			     struct xe_tlb_inval_fence *fence,
>  			     bool stack);
> -void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence);
>  
> +/**
> + * xe_tlb_inval_fence_wait() - TLB invalidation fence wait
> + * @fence: TLB invalidation fence to wait on
> + *
> + * Wait on a TLB invalidation fence until it signals (uninterruptible)
> + */
>  static inline void
>  xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence *fence)
>  {
>  	dma_fence_wait(&fence->base, false);
>  }
>  
> +void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno);
> +
>  #endif	/* _XE_TLB_INVAL_ */
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> index 05b6adc929bb..c1ad96d24fc8 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> @@ -9,10 +9,85 @@
>  #include <linux/workqueue.h>
>  #include <linux/dma-fence.h>
>  
> -/** struct xe_tlb_inval - TLB invalidation client */
> +struct xe_tlb_inval;
> +
> +/** struct xe_tlb_inval_ops - TLB invalidation ops (backend) */
> +struct xe_tlb_inval_ops {
> +	/**
> +	 * @all: Invalidate all TLBs
> +	 * @tlb_inval: TLB invalidation client
> +	 * @seqno: Seqno of TLB invalidation
> +	 *
> +	 * Return 0 on success, -ECANCELED if backend is mid-reset, error on
> +	 * failure
> +	 */
> +	int (*all)(struct xe_tlb_inval *tlb_inval, u32 seqno);
> +
> +	/**
> +	 * @ggtt: Invalidate global translation TLBs
> +	 * @tlb_inval: TLB invalidation client
> +	 * @seqno: Seqno of TLB invalidation
> +	 *
> +	 * Return 0 on success, -ECANCELED if backend is mid-reset, error on
> +	 * failure
> +	 */
> +	int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32 seqno);
> +
> +	/**
> +	 * @ppgtt: Invalidate per-process translation TLBs
> +	 * @tlb_inval: TLB invalidation client
> +	 * @seqno: Seqno of TLB invalidation
> +	 * @start: Start address
> +	 * @end: End address
> +	 * @asid: Address space ID
> +	 *
> +	 * Return 0 on success, -ECANCELED if backend is mid-reset, error on
> +	 * failure
> +	 */
> +	int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32 seqno, u64 start,
> +		     u64 end, u32 asid);
> +
> +	/**
> +	 * @initialized: Backend is initialized
> +	 * @tlb_inval: TLB invalidation client
> +	 *
> +	 * Return: True if backend is initialized, False otherwise
> +	 */
> +	bool (*initialized)(struct xe_tlb_inval *tlb_inval);
> +
> +	/**
> +	 * @flush: Flush pending TLB invalidations
> +	 * @tlb_inval: TLB invalidation client
> +	 */
> +	void (*flush)(struct xe_tlb_inval *tlb_inval);
> +
> +	/**
> +	 * @timeout_delay: Timeout delay for TLB invalidation
> +	 * @tlb_inval: TLB invalidation client
> +	 *
> +	 * Return: Timeout delay for TLB invalidation in jiffies
> +	 */
> +	long (*timeout_delay)(struct xe_tlb_inval *tlb_inval);
> +
> +	/**
> +	 * @lock: Lock resources protecting the backend seqno management
> +	 */
> +	void (*lock)(struct xe_tlb_inval *tlb_inval);
> +
> +	/**
> +	 * @unlock: Unlock resources protecting the backend seqno management
> +	 */
> +	void (*unlock)(struct xe_tlb_inval *tlb_inval);
> +};
> +
> +/** struct xe_tlb_inval - TLB invalidation client (frontend) */
>  struct xe_tlb_inval {
>  	/** @private: Backend private pointer */
>  	void *private;
> +	/** @xe: Pointer to Xe device */
> +	struct xe_device *xe;
> +	/** @ops: TLB invalidation ops */
> +	const struct xe_tlb_inval_ops *ops;
>  	/** @tlb_inval.seqno: TLB invalidation seqno, protected by CT lock */
>  #define TLB_INVALIDATION_SEQNO_MAX	0x100000
>  	int seqno;
> -- 
> 2.34.1
> 
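
To make the frontend/backend contract concrete, below is a small userspace
model of the xe_tlb_inval_issue() flow from this patch: take the backend lock,
assign a seqno to the fence, call the backend op, signal the fence if the op
fails, drop the lock, and squash -ECANCELED to 0. It is a sketch under
assumptions, not the driver code: struct inval, struct fence, the fake_*
backend and issue() are hypothetical names, the fence is a plain flag rather
than a dma_fence, and the pending list and TDR are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SEQNO_MAX 0x100000	/* TLB_INVALIDATION_SEQNO_MAX in the patch */

struct inval;

/* Hypothetical subset of struct xe_tlb_inval_ops. */
struct inval_ops {
	int (*ggtt)(struct inval *iv, unsigned int seqno);
	void (*lock)(struct inval *iv);
	void (*unlock)(struct inval *iv);
};

/* Hypothetical stand-in for struct xe_tlb_inval. */
struct inval {
	const struct inval_ops *ops;
	unsigned int seqno;	/* next seqno to hand out, never 0 */
	int locked;		/* models the backend lock */
	int backend_ret;	/* what the fake backend op returns */
};

struct fence {
	unsigned int seqno;
	bool signaled;
};

static void fake_lock(struct inval *iv)
{
	assert(!iv->locked);
	iv->locked = 1;
}

static void fake_unlock(struct inval *iv)
{
	assert(iv->locked);
	iv->locked = 0;
}

static int fake_ggtt(struct inval *iv, unsigned int seqno)
{
	assert(iv->locked);	/* plays the role of lockdep_assert_held() */
	assert(seqno);		/* seqno 0 is never issued */
	return iv->backend_ret;
}

static const struct inval_ops fake_ops = {
	.ggtt = fake_ggtt,
	.lock = fake_lock,
	.unlock = fake_unlock,
};

/* Same shape as the xe_tlb_inval_issue() macro, minus the varargs. */
static int issue(struct inval *iv, struct fence *f,
		 int (*op)(struct inval *, unsigned int))
{
	int ret;

	iv->ops->lock(iv);

	/* Fence prep: hand out the current seqno, then advance and skip 0. */
	f->seqno = iv->seqno;
	iv->seqno = (iv->seqno + 1) % SEQNO_MAX;
	if (!iv->seqno)
		iv->seqno = 1;

	ret = op(iv, f->seqno);
	if (ret < 0)
		f->signaled = true;	/* nothing in flight, release waiters */

	iv->ops->unlock(iv);

	return ret == -ECANCELED ? 0 : ret;
}
```

The point of the model is the error path: any negative return from the backend
signals the fence before the lock is dropped, so a caller that waits
unconditionally afterwards (as the new xe_tlb_inval_ggtt() does) cannot block
on a fence that was never sent. On success the fence stays unsignaled until
the done handler sees its seqno.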

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 19:17   ` Matthew Brost
@ 2025-07-23 20:18     ` Matthew Brost
  2025-07-23 20:20       ` Summers, Stuart
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Brost @ 2025-07-23 20:18 UTC (permalink / raw)
  To: stuartsummers; +Cc: matthew.auld, maarten.lankhorst, farah.kassabri, intel-xe

On Wed, Jul 23, 2025 at 12:17:49PM -0700, Matthew Brost wrote:
> On Wed, Jul 23, 2025 at 06:22:22PM +0000, stuartsummers wrote:
> > From: Matthew Brost <matthew.brost@intel.com>
> > 
> > The frontend exposes an API to the driver to send invalidations, handles
> > sequence number assignment, synchronization (fences), and provides a
> > timeout mechanism. The backend issues the actual invalidation to the
> > hardware (or firmware).
> > 
> > The new layering easily allows issuing TLB invalidations to different
> > hardware or firmware interfaces.
> > 
> > Normalize some naming while here too.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> > ---
> >  drivers/gpu/drm/xe/Makefile             |   1 +
> >  drivers/gpu/drm/xe/xe_guc_ct.c          |   2 +-
> >  drivers/gpu/drm/xe/xe_guc_tlb_inval.c   | 263 +++++++++++++
> >  drivers/gpu/drm/xe/xe_guc_tlb_inval.h   |  19 +
> >  drivers/gpu/drm/xe/xe_tlb_inval.c       | 495 +++++++-----------------
> >  drivers/gpu/drm/xe/xe_tlb_inval.h       |  14 +-
> >  drivers/gpu/drm/xe/xe_tlb_inval_types.h |  77 +++-
> >  7 files changed, 505 insertions(+), 366 deletions(-)
> >  create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> >  create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > 
> > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > index 332b2057cc00..8a2f836b3ab2 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -75,6 +75,7 @@ xe-y += xe_bb.o \
> >  	xe_guc_log.o \
> >  	xe_guc_pc.o \
> >  	xe_guc_submit.o \
> > +	xe_guc_tlb_inval.o \
> >  	xe_heci_gsc.o \
> >  	xe_huc.o \
> >  	xe_hw_engine.o \
> > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > index 2ef86c0ae8b4..90ebda5b3790 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > @@ -30,9 +30,9 @@
> >  #include "xe_guc_log.h"
> >  #include "xe_guc_relay.h"
> >  #include "xe_guc_submit.h"
> > +#include "xe_guc_tlb_inval.h"
> >  #include "xe_map.h"
> >  #include "xe_pm.h"
> > -#include "xe_tlb_inval.h"
> >  #include "xe_trace_guc.h"
> >  
> >  static void receive_g2h(struct xe_guc_ct *ct);
> > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > new file mode 100644
> > index 000000000000..27d7dc938cb1
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > @@ -0,0 +1,263 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#include "abi/guc_actions_abi.h"
> > +
> > +#include "xe_device.h"
> > +#include "xe_gt_stats.h"
> > +#include "xe_gt_types.h"
> > +#include "xe_guc.h"
> > +#include "xe_guc_ct.h"
> > +#include "xe_guc_tlb_inval.h"
> > +#include "xe_force_wake.h"
> > +#include "xe_mmio.h"
> > +#include "xe_tlb_inval.h"
> > +
> > +#include "regs/xe_guc_regs.h"
> > +
> > +/*
> > + * XXX: The seqno algorithm relies on TLB invalidation being processed in order
> > + * which they currently are by the GuC, if that changes the algorithm will need
> > + * to be updated.
> > + */
> > +
> > +static int send_tlb_inval(struct xe_guc *guc, const u32 *action, int len)
> > +{
> > +	struct xe_gt *gt = guc_to_gt(guc);
> > +
> > +	lockdep_assert_held(&guc->ct.lock);
> > +	xe_gt_assert(gt, action[1]);	/* Seqno */
> > +
> > +	xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> > +	return xe_guc_ct_send(&guc->ct, action, len,
> > +			      G2H_LEN_DW_TLB_INVALIDATE, 1);
> 
> As written, you’d need xe_guc_ct_send_locked here—but you actually
> don’t. More on that below.
> 
> > +}
> > +
> > +#define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > +		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > +		XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > +
> > +static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval, u32 seqno)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +	u32 action[] = {
> > +		XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > +		seqno,
> > +		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > +	};
> > +
> > +	return send_tlb_inval(guc, action, ARRAY_SIZE(action));
> > +}
> > +
> > +static int send_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval, u32 seqno)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +	struct xe_gt *gt = guc_to_gt(guc);
> > +	struct xe_device *xe = guc_to_xe(guc);
> > +
> > +	lockdep_assert_held(&guc->ct.lock);
> > +
> > +	/*
> > +	 * Returning -ECANCELED in this function is squashed at the caller and
> > +	 * signals waiters.
> > +	 */
> > +
> > +	if (xe_guc_ct_enabled(&guc->ct) && guc->submission_state.enabled) {
> > +		u32 action[] = {
> > +			XE_GUC_ACTION_TLB_INVALIDATION,
> > +			seqno,
> > +			MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > +		};
> > +
> > +		return send_tlb_inval(guc, action, ARRAY_SIZE(action));
> > +	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
> > +		struct xe_mmio *mmio = &gt->mmio;
> > +		unsigned int fw_ref;
> > +
> > +		if (IS_SRIOV_VF(xe))
> > +			return -ECANCELED;
> > +
> > +		fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> > +		if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
> > +			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
> > +					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
> > +			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
> > +					PVC_GUC_TLB_INV_DESC0_VALID);
> > +		} else {
> > +			xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> > +					GUC_TLB_INV_CR_INVALIDATE);
> > +		}
> > +		xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > +	}
> > +
> > +	return -ECANCELED;
> > +}
> > +
> > +/*
> > + * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > + * Note that roundup_pow_of_two() operates on unsigned long,
> > + * not on u64.
> > + */
> > +#define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
> > +
> > +static int send_tlb_inval_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
> > +				u64 start, u64 end, u32 asid)
> > +{
> > +#define MAX_TLB_INVALIDATION_LEN	7
> > +	struct xe_guc *guc = tlb_inval->private;
> > +	struct xe_gt *gt = guc_to_gt(guc);
> > +	u32 action[MAX_TLB_INVALIDATION_LEN];
> > +	u64 length = end - start;
> > +	int len = 0;
> > +
> > +	lockdep_assert_held(&guc->ct.lock);
> > +
> > +	if (guc_to_xe(guc)->info.force_execlist)
> > +		return -ECANCELED;
> > +
> > +	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > +	action[len++] = seqno;
> > +	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > +	    length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > +		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > +	} else {
> > +		u64 orig_start = start;
> > +		u64 align;
> > +
> > +		if (length < SZ_4K)
> > +			length = SZ_4K;
> > +
> > +		/*
> > +		 * We need to invalidate a higher granularity if start address
> > +		 * is not aligned to length. When start is not aligned with
> > +		 * length we need to find the length large enough to create an
> > +		 * address mask covering the required range.
> > +		 */
> > +		align = roundup_pow_of_two(length);
> > +		start = ALIGN_DOWN(start, align);
> > +		end = ALIGN(end, align);
> > +		length = align;
> > +		while (start + length < end) {
> > +			length <<= 1;
> > +			start = ALIGN_DOWN(orig_start, length);
> > +		}
> > +
> > +		/*
> > +		 * Minimum invalidation size for a 2MB page that the hardware
> > +		 * expects is 16MB
> > +		 */
> > +		if (length >= SZ_2M) {
> > +			length = max_t(u64, SZ_16M, length);
> > +			start = ALIGN_DOWN(orig_start, length);
> > +		}
> > +
> > +		xe_gt_assert(gt, length >= SZ_4K);
> > +		xe_gt_assert(gt, is_power_of_2(length));
> > +		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
> > +						    ilog2(SZ_2M) + 1)));
> > +		xe_gt_assert(gt, IS_ALIGNED(start, length));
> > +
> > +		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > +		action[len++] = asid;
> > +		action[len++] = lower_32_bits(start);
> > +		action[len++] = upper_32_bits(start);
> > +		action[len++] = ilog2(length) - ilog2(SZ_4K);
> > +	}
> > +
> > +	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> > +
> > +	return send_tlb_inval(guc, action, len);
> > +}
> > +
> > +static bool tlb_inval_initialized(struct xe_tlb_inval *tlb_inval)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +
> > +	return xe_guc_ct_initialized(&guc->ct);
> > +}
> > +
> > +static void tlb_inval_flush(struct xe_tlb_inval *tlb_inval)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +
> > +	LNL_FLUSH_WORK(&guc->ct.g2h_worker);
> > +}
> > +
> > +static long tlb_inval_timeout_delay(struct xe_tlb_inval *tlb_inval)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +
> > +	/* this reflects what HW/GuC needs to process TLB inv request */
> > +	const long hw_tlb_timeout = HZ / 4;
> > +
> > +	/* this estimates actual delay caused by the CTB transport */
> > +	long delay = xe_guc_ct_queue_proc_time_jiffies(&guc->ct);
> > +
> > +	return hw_tlb_timeout + 2 * delay;
> > +}
> > +
> > +static void tlb_inval_lock(struct xe_tlb_inval *tlb_inval)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +
> > +	mutex_lock(&guc->ct.lock);
> > +}
> > +
> > +static void tlb_inval_unlock(struct xe_tlb_inval *tlb_inval)
> > +{
> > +	struct xe_guc *guc = tlb_inval->private;
> > +
> > +	mutex_unlock(&guc->ct.lock);
> > +}
> > +
> > +static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> > +	.all = send_tlb_inval_all,
> > +	.ggtt = send_tlb_inval_ggtt,
> > +	.ppgtt = send_tlb_inval_ppgtt,
> > +	.initialized = tlb_inval_initialized,
> > +	.flush = tlb_inval_flush,
> > +	.timeout_delay = tlb_inval_timeout_delay,
> > +	.lock = tlb_inval_lock,
> > +	.unlock = tlb_inval_unlock,
> > +};
> > +
> > +/**
> > + * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation early
> > + * @guc: GuC object
> > + * @tlb_inval: TLB invalidation client
> > + *
> > + * Initialize GuC TLB invalidation by setting back pointer in TLB invalidation
> > + * client to the GuC and setting GuC backend ops.
> > + */
> > +void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> > +				 struct xe_tlb_inval *tlb_inval)
> > +{
> > +	tlb_inval->private = guc;
> > +	tlb_inval->ops = &guc_tlb_inval_ops;
> > +}
> > +
> > +/**
> > + * xe_guc_tlb_inval_done_handler() - TLB invalidation done handler
> > + * @guc: guc
> > + * @msg: message indicating TLB invalidation done
> > + * @len: length of message
> > + *
> > + * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
> > + * invalidation fences for seqno. Algorithm for this depends on seqno being
> > + * received in-order and asserts this assumption.
> > + *
> > + * Return: 0 on success, -EPROTO for malformed messages.
> > + */
> > +int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> > +{
> > +	struct xe_gt *gt = guc_to_gt(guc);
> > +
> > +	if (unlikely(len != 1))
> > +		return -EPROTO;
> > +
> > +	xe_tlb_inval_done_handler(&gt->tlb_inval, msg[0]);
> > +
> > +	return 0;
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.h b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > new file mode 100644
> > index 000000000000..07d668b02e3d
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > @@ -0,0 +1,19 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#ifndef _XE_GUC_TLB_INVAL_H_
> > +#define _XE_GUC_TLB_INVAL_H_
> > +
> > +#include <linux/types.h>
> > +
> > +struct xe_guc;
> > +struct xe_tlb_inval;
> > +
> > +void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> > +				 struct xe_tlb_inval *tlb_inval);
> > +
> > +int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
> > +
> > +#endif
> > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > index c795b78362bf..071c25fbdbac 100644
> > --- a/drivers/gpu/drm/xe/xe_tlb_inval.c
> > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > @@ -12,50 +12,45 @@
> >  #include "xe_gt_printk.h"
> >  #include "xe_guc.h"
> >  #include "xe_guc_ct.h"
> > +#include "xe_guc_tlb_inval.h"
> >  #include "xe_gt_stats.h"
> >  #include "xe_tlb_inval.h"
> >  #include "xe_mmio.h"
> >  #include "xe_pm.h"
> > -#include "xe_sriov.h"
> > +#include "xe_tlb_inval.h"
> >  #include "xe_trace.h"
> > -#include "regs/xe_guc_regs.h"
> > -
> > -#define FENCE_STACK_BIT		DMA_FENCE_FLAG_USER_BITS
> >  
> > -/*
> > - * TLB inval depends on pending commands in the CT queue and then the real
> > - * invalidation time. Double up the time to process full CT queue
> > - * just to be on the safe side.
> > +/**
> > + * DOC: Xe TLB invalidation
> > + *
> > + * Xe TLB invalidation is implemented in two layers. The first is the frontend
> > + * API, which provides an interface for TLB invalidations to the driver code.
> > + * The frontend handles seqno assignment, synchronization (fences), and the
> > + * timeout mechanism. The frontend is implemented via an embedded structure
> > + * xe_tlb_inval that includes a set of ops hooking into the backend. The backend
> > + * interacts with the hardware (or firmware) to perform the actual invalidation.
> >   */
> > -static long tlb_timeout_jiffies(struct xe_gt *gt)
> > -{
> > -	/* this reflects what HW/GuC needs to process TLB inv request */
> > -	const long hw_tlb_timeout = HZ / 4;
> >  
> > -	/* this estimates actual delay caused by the CTB transport */
> > -	long delay = xe_guc_ct_queue_proc_time_jiffies(&gt->uc.guc.ct);
> > -
> > -	return hw_tlb_timeout + 2 * delay;
> > -}
> > +#define FENCE_STACK_BIT		DMA_FENCE_FLAG_USER_BITS
> >  
> >  static void xe_tlb_inval_fence_fini(struct xe_tlb_inval_fence *fence)
> >  {
> > -	struct xe_gt *gt;
> > -
> >  	if (WARN_ON_ONCE(!fence->tlb_inval))
> >  		return;
> >  
> > -	gt = fence->tlb_inval->private;
> > -	xe_pm_runtime_put(gt_to_xe(gt));
> > +	xe_pm_runtime_put(fence->tlb_inval->xe);
> >  	fence->tlb_inval = NULL; /* fini() should be called once */
> >  }
> >  
> >  static void
> > -__inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
> > +xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence)
> >  {
> >  	bool stack = test_bit(FENCE_STACK_BIT, &fence->base.flags);
> >  
> > -	trace_xe_tlb_inval_fence_signal(xe, fence);
> > +	lockdep_assert_held(&fence->tlb_inval->pending_lock);
> > +
> > +	list_del(&fence->link);
> > +	trace_xe_tlb_inval_fence_signal(fence->tlb_inval->xe, fence);
> >  	xe_tlb_inval_fence_fini(fence);
> >  	dma_fence_signal(&fence->base);
> >  	if (!stack)
> > @@ -63,57 +58,50 @@ __inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
> >  }
> >  
> >  static void
> > -inval_fence_signal(struct xe_device *xe, struct xe_tlb_inval_fence *fence)
> > +xe_tlb_inval_fence_signal_unlocked(struct xe_tlb_inval_fence *fence)
> >  {
> > -	lockdep_assert_held(&fence->tlb_inval->pending_lock);
> > -
> > -	list_del(&fence->link);
> > -	__inval_fence_signal(xe, fence);
> > -}
> > +	struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> >  
> > -static void
> > -inval_fence_signal_unlocked(struct xe_device *xe,
> > -			    struct xe_tlb_inval_fence *fence)
> > -{
> > -	spin_lock_irq(&fence->tlb_inval->pending_lock);
> > -	inval_fence_signal(xe, fence);
> > -	spin_unlock_irq(&fence->tlb_inval->pending_lock);
> > +	spin_lock_irq(&tlb_inval->pending_lock);
> > +	xe_tlb_inval_fence_signal(fence);
> > +	spin_unlock_irq(&tlb_inval->pending_lock);
> >  }
> >  
> > -static void xe_gt_tlb_fence_timeout(struct work_struct *work)
> > +static void xe_tlb_inval_fence_timeout(struct work_struct *work)
> >  {
> > -	struct xe_gt *gt = container_of(work, struct xe_gt,
> > -					tlb_inval.fence_tdr.work);
> > -	struct xe_device *xe = gt_to_xe(gt);
> > +	struct xe_tlb_inval *tlb_inval = container_of(work, struct xe_tlb_inval,
> > +						      fence_tdr.work);
> > +	struct xe_device *xe = tlb_inval->xe;
> >  	struct xe_tlb_inval_fence *fence, *next;
> > +	long timeout_delay = tlb_inval->ops->timeout_delay(tlb_inval);
> >  
> > -	LNL_FLUSH_WORK(&gt->uc.guc.ct.g2h_worker);
> > +	tlb_inval->ops->flush(tlb_inval);
> >  
> > -	spin_lock_irq(&gt->tlb_inval.pending_lock);
> > +	spin_lock_irq(&tlb_inval->pending_lock);
> >  	list_for_each_entry_safe(fence, next,
> > -				 &gt->tlb_inval.pending_fences, link) {
> > +				 &tlb_inval->pending_fences, link) {
> >  		s64 since_inval_ms = ktime_ms_delta(ktime_get(),
> >  						    fence->inval_time);
> >  
> > -		if (msecs_to_jiffies(since_inval_ms) < tlb_timeout_jiffies(gt))
> > +		if (msecs_to_jiffies(since_inval_ms) < timeout_delay)
> >  			break;
> >  
> >  		trace_xe_tlb_inval_fence_timeout(xe, fence);
> > -		xe_gt_err(gt, "TLB invalidation fence timeout, seqno=%d recv=%d",
> > -			  fence->seqno, gt->tlb_inval.seqno_recv);
> > +		drm_err(&xe->drm,
> > +			"TLB invalidation fence timeout, seqno=%d recv=%d",
> > +			fence->seqno, tlb_inval->seqno_recv);
> >  
> >  		fence->base.error = -ETIME;
> > -		inval_fence_signal(xe, fence);
> > +		xe_tlb_inval_fence_signal(fence);
> >  	}
> > -	if (!list_empty(&gt->tlb_inval.pending_fences))
> > -		queue_delayed_work(system_wq,
> > -				   &gt->tlb_inval.fence_tdr,
> > -				   tlb_timeout_jiffies(gt));
> > -	spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > +	if (!list_empty(&tlb_inval->pending_fences))
> > +		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
> > +				   timeout_delay);
> > +	spin_unlock_irq(&tlb_inval->pending_lock);
> >  }
> >  
> >  /**
> > - * xe_tlb_inval_init_early - Initialize TLB invalidation state
> > + * xe_gt_tlb_inval_init_early() - Initialize TLB invalidation state
> >   * @gt: GT structure
> >   *
> >   * Initialize TLB invalidation state, purely software initialization, should
> > @@ -123,13 +111,12 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
> >   */
> >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
> >  {
> > -	gt->tlb_inval.private = gt;
> > +	gt->tlb_inval.xe = gt_to_xe(gt);
> >  	gt->tlb_inval.seqno = 1;
> >  	INIT_LIST_HEAD(&gt->tlb_inval.pending_fences);
> >  	spin_lock_init(&gt->tlb_inval.pending_lock);
> >  	spin_lock_init(&gt->tlb_inval.lock);
> > -	INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr,
> > -			  xe_gt_tlb_fence_timeout);
> > +	INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr, xe_tlb_inval_fence_timeout);
> >  
> >  	gt->tlb_inval.job_wq =
> >  		drmm_alloc_ordered_workqueue(&gt_to_xe(gt)->drm, "gt-tbl-inval-job-wq",
> > @@ -137,60 +124,64 @@ int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
> >  	if (IS_ERR(gt->tlb_inval.job_wq))
> >  		return PTR_ERR(gt->tlb_inval.job_wq);
> >  
> > +	/* XXX: Blindly setting up backend to GuC */
> > +	xe_guc_tlb_inval_init_early(&gt->uc.guc, &gt->tlb_inval);
> > +
> >  	return 0;
> >  }
> >  
> >  /**
> > - * xe_tlb_inval_reset - Initialize TLB invalidation reset
> > + * xe_tlb_inval_reset() - TLB invalidation reset
> >   * @tlb_inval: TLB invalidation client
> >   *
> >   * Signal any pending invalidation fences, should be called during a GT reset
> >   */
> >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
> >  {
> > -	struct xe_gt *gt = tlb_inval->private;
> >  	struct xe_tlb_inval_fence *fence, *next;
> >  	int pending_seqno;
> >  
> >  	/*
> > -	 * we can get here before the CTs are even initialized if we're wedging
> > -	 * very early, in which case there are not going to be any pending
> > -	 * fences so we can bail immediately.
> > +	 * we can get here before the backends are even initialized if we're
> > +	 * wedging very early, in which case there are not going to be any
> > +	 * pending fences so we can bail immediately.
> >  	 */
> > -	if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> > +	if (!tlb_inval->ops->initialized(tlb_inval))
> >  		return;
> >  
> >  	/*
> > -	 * CT channel is already disabled at this point. No new TLB requests can
> > +	 * Backend is already disabled at this point. No new TLB requests can
> >  	 * appear.
> >  	 */
> >  
> > -	mutex_lock(&gt->uc.guc.ct.lock);
> > -	spin_lock_irq(&gt->tlb_inval.pending_lock);
> > -	cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > +	tlb_inval->ops->lock(tlb_inval);
> 
> I think you want a dedicated lock embedded in struct xe_tlb_inval,
> rather than reaching into the backend to grab one.
> 
> This will deadlock as written: G2H TLB inval messages are sometimes
> processed while holding ct->lock (non-fast path, unlikely) and sometimes
> without it (fast path, likely).

Ugh, I'm off today. Ignore the deadlock part, I was confusing myself...
I was thinking this was the function xe_tlb_inval_done_handler, it is
not. I still think xe_tlb_inval should have its own lock, but this patch
as written should work with s/xe_guc_ct_send/xe_guc_ct_send_locked.

Matt 

> 
> I’d call this lock seqno_lock, since it protects exactly that—the order
> in which a seqno is assigned by the frontend and handed to the backend.
> 
> Prime this lock for reclaim as well—do what primelockdep() does in
> xe_guc_ct.c—to make it clear that memory allocations are not allowed
> while the lock is held as TLB invalidations can be called from two
> reclaim paths:
> 
> - MMU notifier callbacks
> - The dma-fence signaling path of VM binds that require a TLB
>   invalidation
> 
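
A rough userspace sketch of the seqno_lock idea above, for illustration only. The struct, the boolean standing in for a real lock, and backend_send() are hypothetical and not part of the patch; the skip of seqno 0 on wrap is an assumption based on the partially shown lines of xe_tlb_inval_fence_prep(). The point is only that seqno assignment and the hand-off to the backend sit in one critical section owned by the frontend.

```c
#include <assert.h>
#include <stdbool.h>

#define TLB_INVALIDATION_SEQNO_MAX 0x100000

/* Hypothetical userspace stand-in for struct xe_tlb_inval. */
struct tlb_inval_model {
	bool seqno_lock_held;	/* models a frontend-owned seqno_lock */
	int seqno;		/* next seqno to hand out, never 0 */
	int last_sent;		/* last seqno the model backend received */
};

static void seqno_lock(struct tlb_inval_model *m)
{
	assert(!m->seqno_lock_held);
	m->seqno_lock_held = true;
}

static void seqno_unlock(struct tlb_inval_model *m)
{
	assert(m->seqno_lock_held);
	m->seqno_lock_held = false;
}

/* Hypothetical backend send: checks the lock (like lockdep_assert_held). */
static int backend_send(struct tlb_inval_model *m, int seqno)
{
	assert(m->seqno_lock_held);
	m->last_sent = seqno;
	return 0;
}

/* Assign a seqno and hand it to the backend inside one critical section. */
static int issue(struct tlb_inval_model *m)
{
	int seqno, ret;

	seqno_lock(m);
	seqno = m->seqno;
	m->seqno = (m->seqno + 1) % TLB_INVALIDATION_SEQNO_MAX;
	if (!m->seqno)
		m->seqno = 1;	/* assumption: 0 is skipped on wrap */
	ret = backend_send(m, seqno);
	seqno_unlock(m);

	return ret < 0 ? ret : seqno;
}
```

Because the backend only ever sees a seqno while the frontend lock is held, the order of assignment and the order of submission cannot diverge, which is what the in-order seqno algorithm depends on.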
> > +	spin_lock_irq(&tlb_inval->pending_lock);
> > +	cancel_delayed_work(&tlb_inval->fence_tdr);
> >  	/*
> >  	 * We might have various kworkers waiting for TLB flushes to complete
> >  	 * which are not tracked with an explicit TLB fence, however at this
> > -	 * stage that will never happen since the CT is already disabled, so
> > -	 * make sure we signal them here under the assumption that we have
> > +	 * stage that will never happen since the backend is already disabled,
> > +	 * so make sure we signal them here under the assumption that we have
> >  	 * completed a full GT reset.
> >  	 */
> > -	if (gt->tlb_inval.seqno == 1)
> > +	if (tlb_inval->seqno == 1)
> >  		pending_seqno = TLB_INVALIDATION_SEQNO_MAX - 1;
> >  	else
> > -		pending_seqno = gt->tlb_inval.seqno - 1;
> > -	WRITE_ONCE(gt->tlb_inval.seqno_recv, pending_seqno);
> > +		pending_seqno = tlb_inval->seqno - 1;
> > +	WRITE_ONCE(tlb_inval->seqno_recv, pending_seqno);
> >  
> >  	list_for_each_entry_safe(fence, next,
> > -				 &gt->tlb_inval.pending_fences, link)
> > -		inval_fence_signal(gt_to_xe(gt), fence);
> > -	spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > -	mutex_unlock(&gt->uc.guc.ct.lock);
> > +				 &tlb_inval->pending_fences, link)
> > +		xe_tlb_inval_fence_signal(fence);
> > +	spin_unlock_irq(&tlb_inval->pending_lock);
> > +	tlb_inval->ops->unlock(tlb_inval);
> >  }
> >  
> > -static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
> > +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval *tlb_inval, int seqno)
> >  {
> > -	int seqno_recv = READ_ONCE(gt->tlb_inval.seqno_recv);
> > +	int seqno_recv = READ_ONCE(tlb_inval->seqno_recv);
> > +
> > +	lockdep_assert_held(&tlb_inval->pending_lock);
> >  
> >  	if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX / 2))
> >  		return false;
> > @@ -201,44 +192,20 @@ static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
> >  	return seqno_recv >= seqno;
> >  }
> >  
> > -static int send_tlb_inval(struct xe_guc *guc, const u32 *action, int len)
> > -{
> > -	struct xe_gt *gt = guc_to_gt(guc);
> > -
> > -	xe_gt_assert(gt, action[1]);	/* Seqno */
> > -	lockdep_assert_held(&guc->ct.lock);
> > -
> > -	/*
> > -	 * XXX: The seqno algorithm relies on TLB invalidation being processed
> > -	 * in order which they currently are, if that changes the algorithm will
> > -	 * need to be updated.
> > -	 */
> > -
> > -	xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> > -
> > -	return xe_guc_ct_send(&guc->ct, action, len,
> > -			      G2H_LEN_DW_TLB_INVALIDATE, 1);
> > -}
> > -
> >  static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
> >  {
> >  	struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> > -	struct xe_gt *gt = tlb_inval->private;
> > -	struct xe_device *xe = gt_to_xe(gt);
> > -
> > -	lockdep_assert_held(&gt->uc.guc.ct.lock);
> >  
> >  	fence->seqno = tlb_inval->seqno;
> > -	trace_xe_tlb_inval_fence_send(xe, fence);
> > +	trace_xe_tlb_inval_fence_send(tlb_inval->xe, fence);
> >  
> >  	spin_lock_irq(&tlb_inval->pending_lock);
> >  	fence->inval_time = ktime_get();
> >  	list_add_tail(&fence->link, &tlb_inval->pending_fences);
> >  
> >  	if (list_is_singular(&tlb_inval->pending_fences))
> > -		queue_delayed_work(system_wq,
> > -				   &tlb_inval->fence_tdr,
> > -				   tlb_timeout_jiffies(gt));
> > +		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
> > +				   tlb_inval->ops->timeout_delay(tlb_inval));
> >  	spin_unlock_irq(&tlb_inval->pending_lock);
> >  
> >  	tlb_inval->seqno = (tlb_inval->seqno + 1) %
> > @@ -247,202 +214,63 @@ static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
> >  		tlb_inval->seqno = 1;
> >  }
> >  
> > -#define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > -		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > -		XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > -
> > -static int send_tlb_inval_ggtt(struct xe_gt *gt, int seqno)
> > -{
> > -	u32 action[] = {
> > -		XE_GUC_ACTION_TLB_INVALIDATION,
> > -		seqno,
> > -		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > -	};
> > -
> > -	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
> > -}
> > -
> > -static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > -			      struct xe_tlb_inval_fence *fence)
> > -{
> > -	u32 action[] = {
> > -		XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > -		0,  /* seqno, replaced in send_tlb_inval */
> > -		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > -	};
> > -	struct xe_gt *gt = tlb_inval->private;
> > -
> > -	xe_gt_assert(gt, fence);
> > -
> > -	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
> > -}
> > +#define xe_tlb_inval_issue(__tlb_inval, __fence, op, args...)	\
> > +({								\
> > +	int __ret;						\
> > +								\
> > +	xe_assert((__tlb_inval)->xe, (__tlb_inval)->ops);	\
> > +	xe_assert((__tlb_inval)->xe, (__fence));		\
> > +								\
> > +	(__tlb_inval)->ops->lock((__tlb_inval));		\
> > +	xe_tlb_inval_fence_prep((__fence));			\
> > +	__ret = op((__tlb_inval), (__fence)->seqno, ##args);	\
> > +	if (__ret < 0)						\
> > +		xe_tlb_inval_fence_signal_unlocked((__fence));	\
> > +	(__tlb_inval)->ops->unlock((__tlb_inval));		\
> > +								\
> > +	__ret == -ECANCELED ? 0 : __ret;			\
> > +})
> >  
> >  /**
> > - * xe_gt_tlb_invalidation_all - Invalidate all TLBs across PF and all VFs.
> > - * @gt: the &xe_gt structure
> > - * @fence: the &xe_tlb_inval_fence to be signaled on completion
> > + * xe_tlb_inval_all() - Issue a TLB invalidation for all TLBs
> > + * @tlb_inval: TLB invalidation client
> > + * @fence: invalidation fence which will be signaled on TLB invalidation
> > + * completion
> >   *
> > - * Send a request to invalidate all TLBs across PF and all VFs.
> > + * Issue a TLB invalidation for all TLBs. Completion of the TLB invalidation
> > + * is asynchronous and the caller can use the invalidation fence to wait for
> > + * completion.
> >   *
> >   * Return: 0 on success, negative error code on error
> >   */
> >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> >  		     struct xe_tlb_inval_fence *fence)
> >  {
> > -	struct xe_gt *gt = tlb_inval->private;
> > -	int err;
> > -
> > -	err = send_tlb_inval_all(tlb_inval, fence);
> > -	if (err)
> > -		xe_gt_err(gt, "TLB invalidation request failed (%pe)", ERR_PTR(err));
> > -
> > -	return err;
> > -}
> > -
> > -/*
> > - * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > - * Note that roundup_pow_of_two() operates on unsigned long,
> > - * not on u64.
> > - */
> > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
> > -
> > -static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64 start, u64 end,
> > -				u32 asid, int seqno)
> > -{
> > -#define MAX_TLB_INVALIDATION_LEN	7
> > -	u32 action[MAX_TLB_INVALIDATION_LEN];
> > -	u64 length = end - start;
> > -	int len = 0;
> > -
> > -	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > -	action[len++] = seqno;
> > -	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > -	    length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > -		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > -	} else {
> > -		u64 orig_start = start;
> > -		u64 align;
> > -
> > -		if (length < SZ_4K)
> > -			length = SZ_4K;
> > -
> > -		/*
> > -		 * We need to invalidate a higher granularity if start address
> > -		 * is not aligned to length. When start is not aligned with
> > -		 * length we need to find the length large enough to create an
> > -		 * address mask covering the required range.
> > -		 */
> > -		align = roundup_pow_of_two(length);
> > -		start = ALIGN_DOWN(start, align);
> > -		end = ALIGN(end, align);
> > -		length = align;
> > -		while (start + length < end) {
> > -			length <<= 1;
> > -			start = ALIGN_DOWN(orig_start, length);
> > -		}
> > -
> > -		/*
> > -		 * Minimum invalidation size for a 2MB page that the hardware
> > -		 * expects is 16MB
> > -		 */
> > -		if (length >= SZ_2M) {
> > -			length = max_t(u64, SZ_16M, length);
> > -			start = ALIGN_DOWN(orig_start, length);
> > -		}
> > -
> > -		xe_gt_assert(gt, length >= SZ_4K);
> > -		xe_gt_assert(gt, is_power_of_2(length));
> > -		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
> > -						    ilog2(SZ_2M) + 1)));
> > -		xe_gt_assert(gt, IS_ALIGNED(start, length));
> > -
> > -		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > -		action[len++] = asid;
> > -		action[len++] = lower_32_bits(start);
> > -		action[len++] = upper_32_bits(start);
> > -		action[len++] = ilog2(length) - ilog2(SZ_4K);
> > -	}
> > -
> > -	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> > -
> > -	return send_tlb_inval(&gt->uc.guc, action, len);
> > -}
> > -
> > -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> > -			       struct xe_tlb_inval_fence *fence)
> > -{
> > -	int ret;
> > -
> > -	mutex_lock(&gt->uc.guc.ct.lock);
> > -
> > -	xe_tlb_inval_fence_prep(fence);
> > -
> > -	ret = send_tlb_inval_ggtt(gt, fence->seqno);
> > -	if (ret < 0)
> > -		inval_fence_signal_unlocked(gt_to_xe(gt), fence);
> > -
> > -	mutex_unlock(&gt->uc.guc.ct.lock);
> > -
> > -	/*
> > -	 * -ECANCELED indicates the CT is stopped for a GT reset. TLB caches
> > -	 *  should be nuked on a GT reset so this error can be ignored.
> > -	 */
> > -	if (ret == -ECANCELED)
> > -		return 0;
> > -
> > -	return ret;
> > +	return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval->ops->all);
> >  }
> >  
> >  /**
> > - * xe_tlb_inval_ggtt - Issue a TLB invalidation on this GT for the GGTT
> > + * xe_tlb_inval_ggtt() - Issue a TLB invalidation for the GGTT
> >   * @tlb_inval: TLB invalidation client
> >   *
> > - * Issue a TLB invalidation for the GGTT. Completion of TLB invalidation is
> > - * synchronous.
> > + * Issue a TLB invalidation for the GGTT. Completion of TLB invalidation is
> > + * synchronous.
> >   *
> >   * Return: 0 on success, negative error code on error
> >   */
> >  int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
> >  {
> > -	struct xe_gt *gt = tlb_inval->private;
> > -	struct xe_device *xe = gt_to_xe(gt);
> > -	unsigned int fw_ref;
> > -
> > -	if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> > -	    gt->uc.guc.submission_state.enabled) {
> > -		struct xe_tlb_inval_fence fence;
> > -		int ret;
> > -
> > -		xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > -		ret = __xe_tlb_inval_ggtt(gt, &fence);
> > -		if (ret)
> > -			return ret;
> > -
> > -		xe_tlb_inval_fence_wait(&fence);
> > -	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
> > -		struct xe_mmio *mmio = &gt->mmio;
> > -
> > -		if (IS_SRIOV_VF(xe))
> > -			return 0;
> > -
> > -		fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> > -		if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
> > -			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
> > -					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
> > -			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
> > -					PVC_GUC_TLB_INV_DESC0_VALID);
> > -		} else {
> > -			xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> > -					GUC_TLB_INV_CR_INVALIDATE);
> > -		}
> > -		xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > -	}
> > +	struct xe_tlb_inval_fence fence, *fence_ptr = &fence;
> > +	int ret;
> >  
> > -	return 0;
> > +	xe_tlb_inval_fence_init(tlb_inval, fence_ptr, true);
> > +	ret = xe_tlb_inval_issue(tlb_inval, fence_ptr, tlb_inval->ops->ggtt);
> > +	xe_tlb_inval_fence_wait(fence_ptr);
> > +
> > +	return ret;
> >  }
> >  
> >  /**
> > - * xe_tlb_inval_range - Issue a TLB invalidation on this GT for an address range
> > + * xe_tlb_inval_range() - Issue a TLB invalidation for an address range
> >   * @tlb_inval: TLB invalidation client
> >   * @fence: invalidation fence which will be signal on TLB invalidation
> >   * completion
> > @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
> >  		       struct xe_tlb_inval_fence *fence, u64 start, u64 end,
> >  		       u32 asid)
> >  {
> > -	struct xe_gt *gt = tlb_inval->private;
> > -	struct xe_device *xe = gt_to_xe(gt);
> > -	int  ret;
> > -
> > -	xe_gt_assert(gt, fence);
> > -
> > -	/* Execlists not supported */
> > -	if (xe->info.force_execlist) {
> > -		__inval_fence_signal(xe, fence);
> > -		return 0;
> > -	}
> > -
> > -	mutex_lock(&gt->uc.guc.ct.lock);
> > -
> > -	xe_tlb_inval_fence_prep(fence);
> > -
> > -	ret = send_tlb_inval_ppgtt(gt, start, end, asid, fence->seqno);
> > -	if (ret < 0)
> > -		inval_fence_signal_unlocked(xe, fence);
> > -
> > -	mutex_unlock(&gt->uc.guc.ct.lock);
> > -
> > -	return ret;
> > +	return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval->ops->ppgtt,
> > +				  start, end, asid);
> >  }
> >  
> >  /**
> > - * xe_tlb_inval_vm - Issue a TLB invalidation on this GT for a VM
> > + * xe_tlb_inval_vm() - Issue a TLB invalidation for a VM
> >   * @tlb_inval: TLB invalidation client
> >   * @vm: VM to invalidate
> >   *
> > @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm)
> >  {
> >  	struct xe_tlb_inval_fence fence;
> >  	u64 range = 1ull << vm->xe->info.va_bits;
> > -	int ret;
> >  
> >  	xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > -
> > -	ret = xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid);
> > -	if (ret < 0)
> > -		return;
> > -
> > +	xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid);
> >  	xe_tlb_inval_fence_wait(&fence);
> >  }
> >  
> >  /**
> > - * xe_tlb_inval_done_handler - TLB invalidation done handler
> > - * @gt: gt
> > + * xe_tlb_inval_done_handler() - TLB invalidation done handler
> > + * @tlb_inval: TLB invalidation client
> >   * @seqno: seqno of invalidation that is done
> >   *
> >   * Update recv seqno, signal any TLB invalidation fences, and restart TDR
> 
> I'd mention that this function is safe to call from any context (i.e.,
> process, atomic, and hardirq contexts are allowed).
> 
> We might need to convert tlb_inval.pending_lock to a raw_spinlock_t for
> PREEMPT_RT enablement. Same for the GuC fast_lock. AFAIK we haven’t had
> any complaints, so maybe I’m just overthinking it, but also perhaps not.
> 
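
For reference while reading the done handler: its early return and its fence loop both depend on the wrap-aware comparison in xe_tlb_inval_seqno_past(). Below is a minimal userspace sketch of that comparison. The first and last branches are taken from the lines shown in the diff; the middle branch is hidden behind a hunk header, so the mirror-image check here is an assumption. The function name and the plain int parameter (instead of READ_ONCE() on the struct field) are simplifications.

```c
#include <assert.h>
#include <stdbool.h>

#define TLB_INVALIDATION_SEQNO_MAX 0x100000

/*
 * Return true if @seqno has already been received, given that
 * @seqno_recv is the most recently received seqno. Differences larger
 * than half the seqno space are treated as wraparound.
 */
static bool seqno_past(int seqno_recv, int seqno)
{
	/* seqno_recv is near the top, seqno has wrapped: not yet received */
	if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX / 2))
		return false;

	/* assumed mirror case: seqno_recv has wrapped, seqno is old */
	if (seqno - seqno_recv > (TLB_INVALIDATION_SEQNO_MAX / 2))
		return true;

	return seqno_recv >= seqno;
}
```

With seqno_recv at 0xFFFFE, a freshly wrapped seqno of 3 is still treated as pending, while with seqno_recv at 3 an old seqno of 0xFFFFE counts as already received.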
> >   */
> > -static void xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> > +void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno)
> >  {
> > -	struct xe_device *xe = gt_to_xe(gt);
> > +	struct xe_device *xe = tlb_inval->xe;
> >  	struct xe_tlb_inval_fence *fence, *next;
> >  	unsigned long flags;
> >  
> > @@ -535,77 +337,53 @@ static void xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> >  	 * officially process the CT message like if racing against
> >  	 * process_g2h_msg().
> >  	 */
> > -	spin_lock_irqsave(&gt->tlb_inval.pending_lock, flags);
> > -	if (tlb_inval_seqno_past(gt, seqno)) {
> > -		spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
> > +	spin_lock_irqsave(&tlb_inval->pending_lock, flags);
> > +	if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) {
> > +		spin_unlock_irqrestore(&tlb_inval->pending_lock, flags);
> >  		return;
> >  	}
> >  
> > -	WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> > +	WRITE_ONCE(tlb_inval->seqno_recv, seqno);
> >  
> >  	list_for_each_entry_safe(fence, next,
> > -				 &gt->tlb_inval.pending_fences, link) {
> > +				 &tlb_inval->pending_fences, link) {
> >  		trace_xe_tlb_inval_fence_recv(xe, fence);
> >  
> > -		if (!tlb_inval_seqno_past(gt, fence->seqno))
> > +		if (!xe_tlb_inval_seqno_past(tlb_inval, fence->seqno))
> >  			break;
> >  
> > -		inval_fence_signal(xe, fence);
> > +		xe_tlb_inval_fence_signal(fence);
> >  	}
> >  
> > -	if (!list_empty(&gt->tlb_inval.pending_fences))
> > +	if (!list_empty(&tlb_inval->pending_fences))
> >  		mod_delayed_work(system_wq,
> > -				 &gt->tlb_inval.fence_tdr,
> > -				 tlb_timeout_jiffies(gt));
> > +				 &tlb_inval->fence_tdr,
> > +				 tlb_inval->ops->timeout_delay(tlb_inval));
> >  	else
> > -		cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > +		cancel_delayed_work(&tlb_inval->fence_tdr);
> >  
> > -	spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
> > -}
> > -
> > -/**
> > - * xe_guc_tlb_inval_done_handler - TLB invalidation done handler
> > - * @guc: guc
> > - * @msg: message indicating TLB invalidation done
> > - * @len: length of message
> > - *
> > - * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
> > - * invalidation fences for seqno. Algorithm for this depends on seqno being
> > - * received in-order and asserts this assumption.
> > - *
> > - * Return: 0 on success, -EPROTO for malformed messages.
> > - */
> > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> > -{
> > -	struct xe_gt *gt = guc_to_gt(guc);
> > -
> > -	if (unlikely(len != 1))
> > -		return -EPROTO;
> > -
> > -	xe_tlb_inval_done_handler(gt, msg[0]);
> > -
> > -	return 0;
> > +	spin_unlock_irqrestore(&tlb_inval->pending_lock, flags);
> >  }
> >  
> >  static const char *
> > -inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > +xe_inval_fence_get_driver_name(struct dma_fence *dma_fence)
> >  {
> >  	return "xe";
> >  }
> >  
> >  static const char *
> > -inval_fence_get_timeline_name(struct dma_fence *dma_fence)
> > +xe_inval_fence_get_timeline_name(struct dma_fence *dma_fence)
> >  {
> > -	return "inval_fence";
> > +	return "tlb_inval_fence";
> >  }
> >  
> >  static const struct dma_fence_ops inval_fence_ops = {
> > -	.get_driver_name = inval_fence_get_driver_name,
> > -	.get_timeline_name = inval_fence_get_timeline_name,
> > +	.get_driver_name = xe_inval_fence_get_driver_name,
> > +	.get_timeline_name = xe_inval_fence_get_timeline_name,
> >  };
> >  
> >  /**
> > - * xe_tlb_inval_fence_init - Initialize TLB invalidation fence
> > + * xe_tlb_inval_fence_init() - Initialize TLB invalidation fence
> >   * @tlb_inval: TLB invalidation client
> >   * @fence: TLB invalidation fence to initialize
> >   * @stack: fence is stack variable
> > @@ -618,15 +396,12 @@ void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
> >  			     struct xe_tlb_inval_fence *fence,
> >  			     bool stack)
> >  {
> > -	struct xe_gt *gt = tlb_inval->private;
> > -
> > -	xe_pm_runtime_get_noresume(gt_to_xe(gt));
> > +	xe_pm_runtime_get_noresume(tlb_inval->xe);
> >  
> > -	spin_lock_irq(&gt->tlb_inval.lock);
> > -	dma_fence_init(&fence->base, &inval_fence_ops,
> > -		       &gt->tlb_inval.lock,
> > +	spin_lock_irq(&tlb_inval->lock);
> > +	dma_fence_init(&fence->base, &inval_fence_ops, &tlb_inval->lock,
> >  		       dma_fence_context_alloc(1), 1);
> > -	spin_unlock_irq(&gt->tlb_inval.lock);
> > +	spin_unlock_irq(&tlb_inval->lock);
> 
> While here, 'fence_lock' is probably a better name.
> 
> Matt
> 
> >  	INIT_LIST_HEAD(&fence->link);
> >  	if (stack)
> >  		set_bit(FENCE_STACK_BIT, &fence->base.flags);
> > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > index 7adee3f8c551..cdeafc8d4391 100644
> > --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > @@ -18,24 +18,30 @@ struct xe_vma;
> >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
> >  
> >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
> > -int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm);
> >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> >  		     struct xe_tlb_inval_fence *fence);
> > +int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm);
> >  int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
> >  		       struct xe_tlb_inval_fence *fence,
> >  		       u64 start, u64 end, u32 asid);
> > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
> >  
> >  void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
> >  			     struct xe_tlb_inval_fence *fence,
> >  			     bool stack);
> > -void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence);
> >  
> > +/**
> > + * xe_tlb_inval_fence_wait() - TLB invalidation fence wait
> > + * @fence: TLB invalidation fence to wait on
> > + *
> > + * Wait on a TLB invalidation fence until it signals, non-interruptible
> > + */
> >  static inline void
> >  xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence *fence)
> >  {
> >  	dma_fence_wait(&fence->base, false);
> >  }
> >  
> > +void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno);
> > +
> >  #endif	/* _XE_TLB_INVAL_ */
> > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > index 05b6adc929bb..c1ad96d24fc8 100644
> > --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > @@ -9,10 +9,85 @@
> >  #include <linux/workqueue.h>
> >  #include <linux/dma-fence.h>
> >  
> > -/** struct xe_tlb_inval - TLB invalidation client */
> > +struct xe_tlb_inval;
> > +
> > +/** struct xe_tlb_inval_ops - TLB invalidation ops (backend) */
> > +struct xe_tlb_inval_ops {
> > +	/**
> > +	 * @all: Invalidate all TLBs
> > +	 * @tlb_inval: TLB invalidation client
> > +	 * @seqno: Seqno of TLB invalidation
> > +	 *
> > +	 * Return 0 on success, -ECANCELED if backend is mid-reset, error on
> > +	 * failure
> > +	 */
> > +	int (*all)(struct xe_tlb_inval *tlb_inval, u32 seqno);
> > +
> > +	/**
> > +	 * @ggtt: Invalidate global translation TLBs
> > +	 * @tlb_inval: TLB invalidation client
> > +	 * @seqno: Seqno of TLB invalidation
> > +	 *
> > +	 * Return 0 on success, -ECANCELED if backend is mid-reset, error on
> > +	 * failure
> > +	 */
> > +	int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32 seqno);
> > +
> > +	/**
> > +	 * @ppgtt: Invalidate per-process translation TLBs
> > +	 * @tlb_inval: TLB invalidation client
> > +	 * @seqno: Seqno of TLB invalidation
> > +	 * @start: Start address
> > +	 * @end: End address
> > +	 * @asid: Address space ID
> > +	 *
> > +	 * Return 0 on success, -ECANCELED if backend is mid-reset, error on
> > +	 * failure
> > +	 */
> > +	int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32 seqno, u64 start,
> > +		     u64 end, u32 asid);
> > +
> > +	/**
> > +	 * @initialized: Backend is initialized
> > +	 * @tlb_inval: TLB invalidation client
> > +	 *
> > +	 * Return: True if backend is initialized, False otherwise
> > +	 */
> > +	bool (*initialized)(struct xe_tlb_inval *tlb_inval);
> > +
> > +	/**
> > +	 * @flush: Flush pending TLB invalidations
> > +	 * @tlb_inval: TLB invalidation client
> > +	 */
> > +	void (*flush)(struct xe_tlb_inval *tlb_inval);
> > +
> > +	/**
> > +	 * @timeout_delay: Timeout delay for TLB invalidation
> > +	 * @tlb_inval: TLB invalidation client
> > +	 *
> > +	 * Return: Timeout delay for TLB invalidation in jiffies
> > +	 */
> > +	long (*timeout_delay)(struct xe_tlb_inval *tlb_inval);
> > +
> > +	/**
> > +	 * @lock: Lock resources protecting the backend seqno management
> > +	 */
> > +	void (*lock)(struct xe_tlb_inval *tlb_inval);
> > +
> > +	/**
> > +	 * @unlock: Unlock resources protecting the backend seqno management
> > +	 */
> > +	void (*unlock)(struct xe_tlb_inval *tlb_inval);
> > +};
> > +
> > +/** struct xe_tlb_inval - TLB invalidation client (frontend) */
> >  struct xe_tlb_inval {
> >  	/** @private: Backend private pointer */
> >  	void *private;
> > +	/** @xe: Pointer to Xe device */
> > +	struct xe_device *xe;
> > +	/** @ops: TLB invalidation ops */
> > +	const struct xe_tlb_inval_ops *ops;
> >  	/** @tlb_inval.seqno: TLB invalidation seqno, protected by CT lock */
> >  #define TLB_INVALIDATION_SEQNO_MAX	0x100000
> >  	int seqno;
> > -- 
> > 2.34.1
> > 
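
To make the frontend/backend split concrete, here is a small userspace sketch of the pattern used by xe_tlb_inval_issue(): the frontend calls through an ops table, brackets the call with the backend's lock/unlock hooks, and maps -ECANCELED to success. All names below (struct inval, fake_ops, inval_issue_all()) are hypothetical, the ops table is trimmed to three hooks, and fence handling is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct inval;

/* Hypothetical, trimmed-down version of the backend ops table. */
struct inval_ops {
	int (*all)(struct inval *iv, unsigned int seqno);
	void (*lock)(struct inval *iv);
	void (*unlock)(struct inval *iv);
};

/* Hypothetical frontend state; the real struct carries much more. */
struct inval {
	const struct inval_ops *ops;
	unsigned int seqno;
	unsigned int last_sent;
	bool resetting;		/* models a backend that is mid-reset */
	int lock_depth;
};

static int fake_all(struct inval *iv, unsigned int seqno)
{
	if (iv->resetting)
		return -ECANCELED;
	iv->last_sent = seqno;
	return 0;
}

static void fake_lock(struct inval *iv)
{
	iv->lock_depth++;
}

static void fake_unlock(struct inval *iv)
{
	iv->lock_depth--;
}

static const struct inval_ops fake_ops = {
	.all = fake_all,
	.lock = fake_lock,
	.unlock = fake_unlock,
};

/* Frontend entry point: dispatch to the backend, hide -ECANCELED. */
static int inval_issue_all(struct inval *iv)
{
	int ret;

	iv->ops->lock(iv);
	ret = iv->ops->all(iv, iv->seqno++);
	iv->ops->unlock(iv);

	/* A reset flushes the TLBs anyway, so this is not an error. */
	return ret == -ECANCELED ? 0 : ret;
}
```

Swapping in a different backend then means supplying a different ops table; the frontend code stays the same.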

* Re: [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 20:18     ` Matthew Brost
@ 2025-07-23 20:20       ` Summers, Stuart
  2025-07-23 20:47         ` Matthew Brost
  0 siblings, 1 reply; 19+ messages in thread
From: Summers, Stuart @ 2025-07-23 20:20 UTC (permalink / raw)
  To: Brost, Matthew
  Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
	Kassabri, Farah, Auld, Matthew

On Wed, 2025-07-23 at 13:18 -0700, Matthew Brost wrote:
> On Wed, Jul 23, 2025 at 12:17:49PM -0700, Matthew Brost wrote:
> > On Wed, Jul 23, 2025 at 06:22:22PM +0000, stuartsummers wrote:
> > > From: Matthew Brost <matthew.brost@intel.com>
> > > 
> > > > The frontend exposes an API to the driver to send invalidations, handles
> > > > sequence number assignment, synchronization (fences), and provides a
> > > > timeout mechanism. The backend issues the actual invalidation to the
> > > > hardware (or firmware).
> > > > 
> > > > The new layering easily allows issuing TLB invalidations to different
> > > > hardware or firmware interfaces.
> > > 
> > > Normalize some naming while here too.
> > > 
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/Makefile             |   1 +
> > >  drivers/gpu/drm/xe/xe_guc_ct.c          |   2 +-
> > >  drivers/gpu/drm/xe/xe_guc_tlb_inval.c   | 263 +++++++++++++
> > >  drivers/gpu/drm/xe/xe_guc_tlb_inval.h   |  19 +
> > > >  drivers/gpu/drm/xe/xe_tlb_inval.c       | 495 +++++++-----------------
> > >  drivers/gpu/drm/xe/xe_tlb_inval.h       |  14 +-
> > >  drivers/gpu/drm/xe/xe_tlb_inval_types.h |  77 +++-
> > >  7 files changed, 505 insertions(+), 366 deletions(-)
> > >  create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > >  create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > > 
> > > > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > > index 332b2057cc00..8a2f836b3ab2 100644
> > > --- a/drivers/gpu/drm/xe/Makefile
> > > +++ b/drivers/gpu/drm/xe/Makefile
> > > @@ -75,6 +75,7 @@ xe-y += xe_bb.o \
> > >         xe_guc_log.o \
> > >         xe_guc_pc.o \
> > >         xe_guc_submit.o \
> > > +       xe_guc_tlb_inval.o \
> > >         xe_heci_gsc.o \
> > >         xe_huc.o \
> > >         xe_hw_engine.o \
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > index 2ef86c0ae8b4..90ebda5b3790 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > @@ -30,9 +30,9 @@
> > >  #include "xe_guc_log.h"
> > >  #include "xe_guc_relay.h"
> > >  #include "xe_guc_submit.h"
> > > +#include "xe_guc_tlb_inval.h"
> > >  #include "xe_map.h"
> > >  #include "xe_pm.h"
> > > -#include "xe_tlb_inval.h"
> > >  #include "xe_trace_guc.h"
> > >  
> > >  static void receive_g2h(struct xe_guc_ct *ct);
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > new file mode 100644
> > > index 000000000000..27d7dc938cb1
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > @@ -0,0 +1,263 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2025 Intel Corporation
> > > + */
> > > +
> > > +#include "abi/guc_actions_abi.h"
> > > +
> > > +#include "xe_device.h"
> > > +#include "xe_gt_stats.h"
> > > +#include "xe_gt_types.h"
> > > +#include "xe_guc.h"
> > > +#include "xe_guc_ct.h"
> > > +#include "xe_guc_tlb_inval.h"
> > > +#include "xe_force_wake.h"
> > > +#include "xe_mmio.h"
> > > +#include "xe_tlb_inval.h"
> > > +
> > > +#include "regs/xe_guc_regs.h"
> > > +
> > > +/*
> > > + * XXX: The seqno algorithm relies on TLB invalidation being
> > > processed in order
> > > + * which they currently are by the GuC, if that changes the
> > > algorithm will need
> > > + * to be updated.
> > > + */
> > > +
> > > +static int send_tlb_inval(struct xe_guc *guc, const u32 *action,
> > > int len)
> > > +{
> > > +       struct xe_gt *gt = guc_to_gt(guc);
> > > +
> > > +       lockdep_assert_held(&guc->ct.lock);
> > > +       xe_gt_assert(gt, action[1]);    /* Seqno */
> > > +
> > > +       xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> > > +       return xe_guc_ct_send(&guc->ct, action, len,
> > > +                             G2H_LEN_DW_TLB_INVALIDATE, 1);
> > 
> > As written, you’d need xe_guc_ct_send_locked here—but you actually
> > don’t. More on that below.
> > 
> > > +}
> > > +
> > > +#define MAKE_INVAL_OP(type)    ((type <<
> > > XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > > +               XE_GUC_TLB_INVAL_MODE_HEAVY <<
> > > XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > > +               XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > > +
> > > +static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > u32 seqno)
> > > +{
> > > +       struct xe_guc *guc = tlb_inval->private;
> > > +       u32 action[] = {
> > > +               XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > > +               seqno,
> > > +               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > +       };
> > > +
> > > +       return send_tlb_inval(guc, action, ARRAY_SIZE(action));
> > > +}
> > > +
> > > +static int send_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval,
> > > u32 seqno)
> > > +{
> > > +       struct xe_guc *guc = tlb_inval->private;
> > > +       struct xe_gt *gt = guc_to_gt(guc);
> > > +       struct xe_device *xe = guc_to_xe(guc);
> > > +
> > > +       lockdep_assert_held(&guc->ct.lock);
> > > +
> > > +       /*
> > > +        * Returning -ECANCELED in this function is squashed at
> > > the caller and
> > > +        * signals waiters.
> > > +        */
> > > +
> > > +       if (xe_guc_ct_enabled(&guc->ct) && guc-
> > > >submission_state.enabled) {
> > > +               u32 action[] = {
> > > +                       XE_GUC_ACTION_TLB_INVALIDATION,
> > > +                       seqno,
> > > +                       MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > > +               };
> > > +
> > > +               return send_tlb_inval(guc, action,
> > > ARRAY_SIZE(action));
> > > +       } else if (xe_device_uc_enabled(xe) &&
> > > !xe_device_wedged(xe)) {
> > > +               struct xe_mmio *mmio = &gt->mmio;
> > > +               unsigned int fw_ref;
> > > +
> > > +               if (IS_SRIOV_VF(xe))
> > > +                       return -ECANCELED;
> > > +
> > > +               fw_ref = xe_force_wake_get(gt_to_fw(gt),
> > > XE_FW_GT);
> > > +               if (xe->info.platform == XE_PVC ||
> > > GRAPHICS_VER(xe) >= 20) {
> > > +                       xe_mmio_write32(mmio,
> > > PVC_GUC_TLB_INV_DESC1,
> > > +                                       PVC_GUC_TLB_INV_DESC1_INV
> > > ALIDATE);
> > > +                       xe_mmio_write32(mmio,
> > > PVC_GUC_TLB_INV_DESC0,
> > > +                                       PVC_GUC_TLB_INV_DESC0_VAL
> > > ID);
> > > +               } else {
> > > +                       xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> > > +                                       GUC_TLB_INV_CR_INVALIDATE
> > > );
> > > +               }
> > > +               xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > +       }
> > > +
> > > +       return -ECANCELED;
> > > +}
> > > +
> > > +/*
> > > + * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > > + * Note that roundup_pow_of_two() operates on unsigned long,
> > > + * not on u64.
> > > + */
> > > +#define MAX_RANGE_TLB_INVALIDATION_LENGTH
> > > (rounddown_pow_of_two(ULONG_MAX))
> > > +
> > > +static int send_tlb_inval_ppgtt(struct xe_tlb_inval *tlb_inval,
> > > u32 seqno,
> > > +                               u64 start, u64 end, u32 asid)
> > > +{
> > > +#define MAX_TLB_INVALIDATION_LEN       7
> > > +       struct xe_guc *guc = tlb_inval->private;
> > > +       struct xe_gt *gt = guc_to_gt(guc);
> > > +       u32 action[MAX_TLB_INVALIDATION_LEN];
> > > +       u64 length = end - start;
> > > +       int len = 0;
> > > +
> > > +       lockdep_assert_held(&guc->ct.lock);
> > > +
> > > +       if (guc_to_xe(guc)->info.force_execlist)
> > > +               return -ECANCELED;
> > > +
> > > +       action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > > +       action[len++] = seqno;
> > > +       if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > +           length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > > +               action[len++] =
> > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > > +       } else {
> > > +               u64 orig_start = start;
> > > +               u64 align;
> > > +
> > > +               if (length < SZ_4K)
> > > +                       length = SZ_4K;
> > > +
> > > +               /*
> > > +                * We need to invalidate a higher granularity if
> > > start address
> > > +                * is not aligned to length. When start is not
> > > aligned with
> > > +                * length we need to find the length large enough
> > > to create an
> > > +                * address mask covering the required range.
> > > +                */
> > > +               align = roundup_pow_of_two(length);
> > > +               start = ALIGN_DOWN(start, align);
> > > +               end = ALIGN(end, align);
> > > +               length = align;
> > > +               while (start + length < end) {
> > > +                       length <<= 1;
> > > +                       start = ALIGN_DOWN(orig_start, length);
> > > +               }
> > > +
> > > +               /*
> > > +                * Minimum invalidation size for a 2MB page that
> > > the hardware
> > > +                * expects is 16MB
> > > +                */
> > > +               if (length >= SZ_2M) {
> > > +                       length = max_t(u64, SZ_16M, length);
> > > +                       start = ALIGN_DOWN(orig_start, length);
> > > +               }
> > > +
> > > +               xe_gt_assert(gt, length >= SZ_4K);
> > > +               xe_gt_assert(gt, is_power_of_2(length));
> > > +               xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M)
> > > - 1,
> > > +                                                   ilog2(SZ_2M)
> > > + 1)));
> > > +               xe_gt_assert(gt, IS_ALIGNED(start, length));
> > > +
> > > +               action[len++] =
> > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > > +               action[len++] = asid;
> > > +               action[len++] = lower_32_bits(start);
> > > +               action[len++] = upper_32_bits(start);
> > > +               action[len++] = ilog2(length) - ilog2(SZ_4K);
> > > +       }
> > > +
> > > +       xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> > > +
> > > +       return send_tlb_inval(guc, action, len);
> > > +}
> > > +
> > > +static bool tlb_inval_initialized(struct xe_tlb_inval
> > > *tlb_inval)
> > > +{
> > > +       struct xe_guc *guc = tlb_inval->private;
> > > +
> > > +       return xe_guc_ct_initialized(&guc->ct);
> > > +}
> > > +
> > > +static void tlb_inval_flush(struct xe_tlb_inval *tlb_inval)
> > > +{
> > > +       struct xe_guc *guc = tlb_inval->private;
> > > +
> > > +       LNL_FLUSH_WORK(&guc->ct.g2h_worker);
> > > +}
> > > +
> > > +static long tlb_inval_timeout_delay(struct xe_tlb_inval
> > > *tlb_inval)
> > > +{
> > > +       struct xe_guc *guc = tlb_inval->private;
> > > +
> > > +       /* this reflects what HW/GuC needs to process TLB inv
> > > request */
> > > +       const long hw_tlb_timeout = HZ / 4;
> > > +
> > > +       /* this estimates actual delay caused by the CTB
> > > transport */
> > > +       long delay = xe_guc_ct_queue_proc_time_jiffies(&guc->ct);
> > > +
> > > +       return hw_tlb_timeout + 2 * delay;
> > > +}
> > > +
> > > +static void tlb_inval_lock(struct xe_tlb_inval *tlb_inval)
> > > +{
> > > +       struct xe_guc *guc = tlb_inval->private;
> > > +
> > > +       mutex_lock(&guc->ct.lock);
> > > +}
> > > +
> > > +static void tlb_inval_unlock(struct xe_tlb_inval *tlb_inval)
> > > +{
> > > +       struct xe_guc *guc = tlb_inval->private;
> > > +
> > > +       mutex_unlock(&guc->ct.lock);
> > > +}
> > > +
> > > +static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> > > +       .all = send_tlb_inval_all,
> > > +       .ggtt = send_tlb_inval_ggtt,
> > > +       .ppgtt = send_tlb_inval_ppgtt,
> > > +       .initialized = tlb_inval_initialized,
> > > +       .flush = tlb_inval_flush,
> > > +       .timeout_delay = tlb_inval_timeout_delay,
> > > +       .lock = tlb_inval_lock,
> > > +       .unlock = tlb_inval_unlock,
> > > +};
> > > +
> > > +/**
> > > + * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation
> > > early
> > > + * @guc: GuC object
> > > + * @tlb_inval: TLB invalidation client
> > > + *
> > > > + * Initialize GuC TLB invalidation by setting back pointer in
> > > TLB invalidation
> > > + * client to the GuC and setting GuC backend ops.
> > > + */
> > > +void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> > > +                                struct xe_tlb_inval *tlb_inval)
> > > +{
> > > +       tlb_inval->private = guc;
> > > +       tlb_inval->ops = &guc_tlb_inval_ops;
> > > +}
> > > +
> > > +/**
> > > + * xe_guc_tlb_inval_done_handler() - TLB invalidation done
> > > handler
> > > + * @guc: guc
> > > + * @msg: message indicating TLB invalidation done
> > > + * @len: length of message
> > > + *
> > > + * Parse seqno of TLB invalidation, wake any waiters for seqno,
> > > and signal any
> > > + * invalidation fences for seqno. Algorithm for this depends on
> > > seqno being
> > > + * received in-order and asserts this assumption.
> > > + *
> > > + * Return: 0 on success, -EPROTO for malformed messages.
> > > + */
> > > +int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg,
> > > u32 len)
> > > +{
> > > +       struct xe_gt *gt = guc_to_gt(guc);
> > > +
> > > +       if (unlikely(len != 1))
> > > +               return -EPROTO;
> > > +
> > > +       xe_tlb_inval_done_handler(&gt->tlb_inval, msg[0]);
> > > +
> > > +       return 0;
> > > +}
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > > b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > > new file mode 100644
> > > index 000000000000..07d668b02e3d
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > > @@ -0,0 +1,19 @@
> > > +/* SPDX-License-Identifier: MIT */
> > > +/*
> > > + * Copyright © 2025 Intel Corporation
> > > + */
> > > +
> > > +#ifndef _XE_GUC_TLB_INVAL_H_
> > > +#define _XE_GUC_TLB_INVAL_H_
> > > +
> > > +#include <linux/types.h>
> > > +
> > > +struct xe_guc;
> > > +struct xe_tlb_inval;
> > > +
> > > +void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> > > +                                struct xe_tlb_inval *tlb_inval);
> > > +
> > > +int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg,
> > > u32 len);
> > > +
> > > +#endif
> > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > index c795b78362bf..071c25fbdbac 100644
> > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > @@ -12,50 +12,45 @@
> > >  #include "xe_gt_printk.h"
> > >  #include "xe_guc.h"
> > >  #include "xe_guc_ct.h"
> > > +#include "xe_guc_tlb_inval.h"
> > >  #include "xe_gt_stats.h"
> > >  #include "xe_tlb_inval.h"
> > >  #include "xe_mmio.h"
> > >  #include "xe_pm.h"
> > > -#include "xe_sriov.h"
> > > +#include "xe_tlb_inval.h"
> > >  #include "xe_trace.h"
> > > -#include "regs/xe_guc_regs.h"
> > > -
> > > -#define FENCE_STACK_BIT                DMA_FENCE_FLAG_USER_BITS
> > >  
> > > -/*
> > > - * TLB inval depends on pending commands in the CT queue and
> > > then the real
> > > - * invalidation time. Double up the time to process full CT
> > > queue
> > > - * just to be on the safe side.
> > > +/**
> > > + * DOC: Xe TLB invalidation
> > > + *
> > > + * Xe TLB invalidation is implemented in two layers. The first
> > > is the frontend
> > > + * API, which provides an interface for TLB invalidations to the
> > > driver code.
> > > + * The frontend handles seqno assignment, synchronization
> > > (fences), and the
> > > + * timeout mechanism. The frontend is implemented via an
> > > embedded structure
> > > + * xe_tlb_inval that includes a set of ops hooking into the
> > > backend. The backend
> > > + * interacts with the hardware (or firmware) to perform the
> > > actual invalidation.
> > >   */
> > > -static long tlb_timeout_jiffies(struct xe_gt *gt)
> > > -{
> > > -       /* this reflects what HW/GuC needs to process TLB inv
> > > request */
> > > -       const long hw_tlb_timeout = HZ / 4;
> > >  
> > > -       /* this estimates actual delay caused by the CTB
> > > transport */
> > > -       long delay = xe_guc_ct_queue_proc_time_jiffies(&gt-
> > > >uc.guc.ct);
> > > -
> > > -       return hw_tlb_timeout + 2 * delay;
> > > -}
> > > +#define FENCE_STACK_BIT                DMA_FENCE_FLAG_USER_BITS
> > >  
> > >  static void xe_tlb_inval_fence_fini(struct xe_tlb_inval_fence
> > > *fence)
> > >  {
> > > -       struct xe_gt *gt;
> > > -
> > >         if (WARN_ON_ONCE(!fence->tlb_inval))
> > >                 return;
> > >  
> > > -       gt = fence->tlb_inval->private;
> > > -       xe_pm_runtime_put(gt_to_xe(gt));
> > > +       xe_pm_runtime_put(fence->tlb_inval->xe);
> > >         fence->tlb_inval = NULL; /* fini() should be called once
> > > */
> > >  }
> > >  
> > >  static void
> > > -__inval_fence_signal(struct xe_device *xe, struct
> > > xe_tlb_inval_fence *fence)
> > > +xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence)
> > >  {
> > >         bool stack = test_bit(FENCE_STACK_BIT, &fence-
> > > >base.flags);
> > >  
> > > -       trace_xe_tlb_inval_fence_signal(xe, fence);
> > > +       lockdep_assert_held(&fence->tlb_inval->pending_lock);
> > > +
> > > +       list_del(&fence->link);
> > > +       trace_xe_tlb_inval_fence_signal(fence->tlb_inval->xe,
> > > fence);
> > >         xe_tlb_inval_fence_fini(fence);
> > >         dma_fence_signal(&fence->base);
> > >         if (!stack)
> > > @@ -63,57 +58,50 @@ __inval_fence_signal(struct xe_device *xe,
> > > struct xe_tlb_inval_fence *fence)
> > >  }
> > >  
> > >  static void
> > > -inval_fence_signal(struct xe_device *xe, struct
> > > xe_tlb_inval_fence *fence)
> > > +xe_tlb_inval_fence_signal_unlocked(struct xe_tlb_inval_fence
> > > *fence)
> > >  {
> > > -       lockdep_assert_held(&fence->tlb_inval->pending_lock);
> > > -
> > > -       list_del(&fence->link);
> > > -       __inval_fence_signal(xe, fence);
> > > -}
> > > +       struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> > >  
> > > -static void
> > > -inval_fence_signal_unlocked(struct xe_device *xe,
> > > -                           struct xe_tlb_inval_fence *fence)
> > > -{
> > > -       spin_lock_irq(&fence->tlb_inval->pending_lock);
> > > -       inval_fence_signal(xe, fence);
> > > -       spin_unlock_irq(&fence->tlb_inval->pending_lock);
> > > +       spin_lock_irq(&tlb_inval->pending_lock);
> > > +       xe_tlb_inval_fence_signal(fence);
> > > +       spin_unlock_irq(&tlb_inval->pending_lock);
> > >  }
> > >  
> > > -static void xe_gt_tlb_fence_timeout(struct work_struct *work)
> > > +static void xe_tlb_inval_fence_timeout(struct work_struct *work)
> > >  {
> > > -       struct xe_gt *gt = container_of(work, struct xe_gt,
> > > -
> > >                                        tlb_inval.fence_tdr.work);
> > > -       struct xe_device *xe = gt_to_xe(gt);
> > > +       struct xe_tlb_inval *tlb_inval = container_of(work,
> > > struct xe_tlb_inval,
> > > +                                                    
> > > fence_tdr.work);
> > > +       struct xe_device *xe = tlb_inval->xe;
> > >         struct xe_tlb_inval_fence *fence, *next;
> > > +       long timeout_delay = tlb_inval->ops-
> > > >timeout_delay(tlb_inval);
> > >  
> > > -       LNL_FLUSH_WORK(&gt->uc.guc.ct.g2h_worker);
> > > +       tlb_inval->ops->flush(tlb_inval);
> > >  
> > > -       spin_lock_irq(&gt->tlb_inval.pending_lock);
> > > +       spin_lock_irq(&tlb_inval->pending_lock);
> > >         list_for_each_entry_safe(fence, next,
> > > -                                &gt->tlb_inval.pending_fences,
> > > link) {
> > > +                                &tlb_inval->pending_fences,
> > > link) {
> > >                 s64 since_inval_ms = ktime_ms_delta(ktime_get(),
> > >                                                     fence-
> > > >inval_time);
> > >  
> > > -               if (msecs_to_jiffies(since_inval_ms) <
> > > tlb_timeout_jiffies(gt))
> > > +               if (msecs_to_jiffies(since_inval_ms) <
> > > timeout_delay)
> > >                         break;
> > >  
> > >                 trace_xe_tlb_inval_fence_timeout(xe, fence);
> > > -               xe_gt_err(gt, "TLB invalidation fence timeout,
> > > seqno=%d recv=%d",
> > > -                         fence->seqno, gt-
> > > >tlb_inval.seqno_recv);
> > > +               drm_err(&xe->drm,
> > > +                       "TLB invalidation fence timeout, seqno=%d
> > > recv=%d",
> > > +                       fence->seqno, tlb_inval->seqno_recv);
> > >  
> > >                 fence->base.error = -ETIME;
> > > -               inval_fence_signal(xe, fence);
> > > +               xe_tlb_inval_fence_signal(fence);
> > >         }
> > > -       if (!list_empty(&gt->tlb_inval.pending_fences))
> > > -               queue_delayed_work(system_wq,
> > > -                                  &gt->tlb_inval.fence_tdr,
> > > -                                  tlb_timeout_jiffies(gt));
> > > -       spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > > +       if (!list_empty(&tlb_inval->pending_fences))
> > > +               queue_delayed_work(system_wq, &tlb_inval-
> > > >fence_tdr,
> > > +                                  timeout_delay);
> > > +       spin_unlock_irq(&tlb_inval->pending_lock);
> > >  }
> > >  
> > >  /**
> > > - * xe_tlb_inval_init_early - Initialize TLB invalidation state
> > > + * xe_gt_tlb_inval_init_early() - Initialize TLB invalidation
> > > state
> > >   * @gt: GT structure
> > >   *
> > >   * Initialize TLB invalidation state, purely software
> > > initialization, should
> > > @@ -123,13 +111,12 @@ static void xe_gt_tlb_fence_timeout(struct
> > > work_struct *work)
> > >   */
> > >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
> > >  {
> > > -       gt->tlb_inval.private = gt;
> > > +       gt->tlb_inval.xe = gt_to_xe(gt);
> > >         gt->tlb_inval.seqno = 1;
> > >         INIT_LIST_HEAD(&gt->tlb_inval.pending_fences);
> > >         spin_lock_init(&gt->tlb_inval.pending_lock);
> > >         spin_lock_init(&gt->tlb_inval.lock);
> > > -       INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr,
> > > -                         xe_gt_tlb_fence_timeout);
> > > +       INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr,
> > > xe_tlb_inval_fence_timeout);
> > >  
> > >         gt->tlb_inval.job_wq =
> > >                 drmm_alloc_ordered_workqueue(&gt_to_xe(gt)->drm,
> > > "gt-tbl-inval-job-wq",
> > > @@ -137,60 +124,64 @@ int xe_gt_tlb_inval_init_early(struct xe_gt
> > > *gt)
> > >         if (IS_ERR(gt->tlb_inval.job_wq))
> > >                 return PTR_ERR(gt->tlb_inval.job_wq);
> > >  
> > > +       /* XXX: Blindly setting up backend to GuC */
> > > +       xe_guc_tlb_inval_init_early(&gt->uc.guc, &gt->tlb_inval);
> > > +
> > >         return 0;
> > >  }
> > >  
> > >  /**
> > > - * xe_tlb_inval_reset - Initialize TLB invalidation reset
> > > + * xe_tlb_inval_reset() - TLB invalidation reset
> > >   * @tlb_inval: TLB invalidation client
> > >   *
> > >   * Signal any pending invalidation fences, should be called
> > > during a GT reset
> > >   */
> > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
> > >  {
> > > -       struct xe_gt *gt = tlb_inval->private;
> > >         struct xe_tlb_inval_fence *fence, *next;
> > >         int pending_seqno;
> > >  
> > >         /*
> > > -        * we can get here before the CTs are even initialized if
> > > we're wedging
> > > -        * very early, in which case there are not going to be
> > > any pending
> > > -        * fences so we can bail immediately.
> > > +        * we can get here before the backends are even
> > > initialized if we're
> > > +        * wedging very early, in which case there are not going
> > > to be any
> > > > +        * pending fences so we can bail immediately.
> > >          */
> > > -       if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> > > +       if (!tlb_inval->ops->initialized(tlb_inval))
> > >                 return;
> > >  
> > >         /*
> > > -        * CT channel is already disabled at this point. No new
> > > TLB requests can
> > > +        * Backend is already disabled at this point. No new TLB
> > > requests can
> > >          * appear.
> > >          */
> > >  
> > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > -       spin_lock_irq(&gt->tlb_inval.pending_lock);
> > > -       cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > > +       tlb_inval->ops->lock(tlb_inval);
> > 
> > I think you want a dedicated lock embedded in struct xe_tlb_inval,
> > rather than reaching into the backend to grab one.
> > 
> > This will deadlock as written: G2H TLB inval messages are sometimes
> > processed while holding ct->lock (non-fast path, unlikely) and
> > sometimes
> > without it (fast path, likely).
> 
> Ugh, I'm off today. Ignore the deadlock part, I was confusing
> myself...
> I was thinking this was the function xe_tlb_inval_done_handler; it is
> not. I still think xe_tlb_inval should have its own lock, but this patch
> as written should work with s/xe_guc_ct_send/xe_guc_ct_send_locked.

So one reason I didn't go that way is that we recently did just the
reverse: we moved from a TLB-dedicated lock to the more specific CT lock,
since these all go through the CT handler anyway when we use GuC
submission. The embedded version here lets us lock at the bottom data
layer rather than having a separate lock in the upper layer. We also
might want to run different types of invalidation in parallel without
locking the data in the upper layer, since the real contention would be
in the lower-level pipelining anyway.
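
To make the layering concrete, here is a minimal user-space sketch of a
frontend that assigns the seqno while holding a lock owned by the backend.
It is not code from the patch: struct fake_ct, the fake_* helpers, and the
SEQNO_MAX value are hypothetical stand-ins, and a pthread mutex stands in
for the CT lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define SEQNO_MAX 0x100000u	/* illustrative wrap point, an assumption */

struct tlb_inval;

/* Backend hooks; the frontend never touches the backend lock directly. */
struct tlb_inval_ops {
	int (*all)(struct tlb_inval *ti, uint32_t seqno);
	void (*lock)(struct tlb_inval *ti);
	void (*unlock)(struct tlb_inval *ti);
};

/* Frontend state: owns the seqno, borrows the lock via ops. */
struct tlb_inval {
	const struct tlb_inval_ops *ops;
	void *private;		/* backend object */
	uint32_t seqno;		/* next seqno to hand out, never 0 */
};

/* Hypothetical backend standing in for the GuC CT channel. */
struct fake_ct {
	pthread_mutex_t lock;
	uint32_t last_seqno;
	int sent;
};

static void fake_lock(struct tlb_inval *ti)
{
	struct fake_ct *ct = ti->private;

	pthread_mutex_lock(&ct->lock);
}

static void fake_unlock(struct tlb_inval *ti)
{
	struct fake_ct *ct = ti->private;

	pthread_mutex_unlock(&ct->lock);
}

/* "Send" just records the seqno; the caller already holds ct->lock. */
static int fake_send_all(struct tlb_inval *ti, uint32_t seqno)
{
	struct fake_ct *ct = ti->private;

	ct->last_seqno = seqno;
	ct->sent++;
	return 0;
}

static const struct tlb_inval_ops fake_ops = {
	.all = fake_send_all,
	.lock = fake_lock,
	.unlock = fake_unlock,
};

static void fake_backend_init(struct fake_ct *ct, struct tlb_inval *ti)
{
	pthread_mutex_init(&ct->lock, NULL);
	ct->last_seqno = 0;
	ct->sent = 0;
	ti->private = ct;
	ti->ops = &fake_ops;
	ti->seqno = 1;
}

/* Frontend entry point: seqno assignment and send share one critical section. */
static int tlb_inval_all(struct tlb_inval *ti, uint32_t *seqno_out)
{
	int ret;

	assert(ti->ops);

	ti->ops->lock(ti);
	*seqno_out = ti->seqno;
	ti->seqno = (ti->seqno + 1) % SEQNO_MAX;
	if (!ti->seqno)		/* 0 is reserved, skip it on wrap */
		ti->seqno = 1;
	ret = ti->ops->all(ti, *seqno_out);
	ti->ops->unlock(ti);

	return ret;
}
```

The point of the sketch is only that the ordering guarantee (seqno handed
out and sent in the same critical section) holds whichever layer owns the
lock; a frontend-owned seqno_lock would replace the two ops calls with a
lock embedded in struct tlb_inval.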

Thanks,
Stuart

> 
> Matt 
> 
> > 
> > I’d call this lock seqno_lock, since it protects exactly that—the
> > order
> > in which a seqno is assigned by the frontend and handed to the
> > backend.
> > 
> > Prime this lock for reclaim as well—do what primelockdep() does in
> > xe_guc_ct.c—to make it clear that memory allocations are not
> > allowed
> > while the lock is held as TLB invalidations can be called from two
> > reclaim paths:
> > 
> > - MMU notifier callbacks
> > - The dma-fence signaling path of VM binds that require a TLB
> >   invalidation
> > 
> > > +       spin_lock_irq(&tlb_inval->pending_lock);
> > > +       cancel_delayed_work(&tlb_inval->fence_tdr);
> > >         /*
> > >          * We might have various kworkers waiting for TLB flushes
> > > to complete
> > >          * which are not tracked with an explicit TLB fence,
> > > however at this
> > > -        * stage that will never happen since the CT is already
> > > disabled, so
> > > -        * make sure we signal them here under the assumption
> > > that we have
> > > +        * stage that will never happen since the backend is
> > > already disabled,
> > > +        * so make sure we signal them here under the assumption
> > > that we have
> > >          * completed a full GT reset.
> > >          */
> > > -       if (gt->tlb_inval.seqno == 1)
> > > +       if (tlb_inval->seqno == 1)
> > >                 pending_seqno = TLB_INVALIDATION_SEQNO_MAX - 1;
> > >         else
> > > -               pending_seqno = gt->tlb_inval.seqno - 1;
> > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv, pending_seqno);
> > > +               pending_seqno = tlb_inval->seqno - 1;
> > > +       WRITE_ONCE(tlb_inval->seqno_recv, pending_seqno);
> > >  
> > >         list_for_each_entry_safe(fence, next,
> > > -                                &gt->tlb_inval.pending_fences,
> > > link)
> > > -               inval_fence_signal(gt_to_xe(gt), fence);
> > > -       spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > +                                &tlb_inval->pending_fences,
> > > link)
> > > +               xe_tlb_inval_fence_signal(fence);
> > > +       spin_unlock_irq(&tlb_inval->pending_lock);
> > > +       tlb_inval->ops->unlock(tlb_inval);
> > >  }
> > >  
> > > -static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
> > > +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval
> > > *tlb_inval, int seqno)
> > >  {
> > > -       int seqno_recv = READ_ONCE(gt->tlb_inval.seqno_recv);
> > > +       int seqno_recv = READ_ONCE(tlb_inval->seqno_recv);
> > > +
> > > +       lockdep_assert_held(&tlb_inval->pending_lock);
> > >  
> > >         if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX /
> > > 2))
> > >                 return false;
> > > @@ -201,44 +192,20 @@ static bool tlb_inval_seqno_past(struct
> > > xe_gt *gt, int seqno)
> > >         return seqno_recv >= seqno;
> > >  }
> > >  
> > > -static int send_tlb_inval(struct xe_guc *guc, const u32 *action,
> > > int len)
> > > -{
> > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > -
> > > -       xe_gt_assert(gt, action[1]);    /* Seqno */
> > > -       lockdep_assert_held(&guc->ct.lock);
> > > -
> > > -       /*
> > > -        * XXX: The seqno algorithm relies on TLB invalidation
> > > being processed
> > > -        * in order which they currently are, if that changes the
> > > algorithm will
> > > -        * need to be updated.
> > > -        */
> > > -
> > > -       xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> > > -
> > > -       return xe_guc_ct_send(&guc->ct, action, len,
> > > -                             G2H_LEN_DW_TLB_INVALIDATE, 1);
> > > -}
> > > -
> > >  static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence
> > > *fence)
> > >  {
> > >         struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> > > -       struct xe_gt *gt = tlb_inval->private;
> > > -       struct xe_device *xe = gt_to_xe(gt);
> > > -
> > > -       lockdep_assert_held(&gt->uc.guc.ct.lock);
> > >  
> > >         fence->seqno = tlb_inval->seqno;
> > > -       trace_xe_tlb_inval_fence_send(xe, fence);
> > > +       trace_xe_tlb_inval_fence_send(tlb_inval->xe, fence);
> > >  
> > >         spin_lock_irq(&tlb_inval->pending_lock);
> > >         fence->inval_time = ktime_get();
> > >         list_add_tail(&fence->link, &tlb_inval->pending_fences);
> > >  
> > >         if (list_is_singular(&tlb_inval->pending_fences))
> > > -               queue_delayed_work(system_wq,
> > > -                                  &tlb_inval->fence_tdr,
> > > -                                  tlb_timeout_jiffies(gt));
> > > +               queue_delayed_work(system_wq, &tlb_inval-
> > > >fence_tdr,
> > > +                                  tlb_inval->ops-
> > > >timeout_delay(tlb_inval));
> > >         spin_unlock_irq(&tlb_inval->pending_lock);
> > >  
> > >         tlb_inval->seqno = (tlb_inval->seqno + 1) %
> > > @@ -247,202 +214,63 @@ static void xe_tlb_inval_fence_prep(struct
> > > xe_tlb_inval_fence *fence)
> > >                 tlb_inval->seqno = 1;
> > >  }
> > >  
> > > -#define MAKE_INVAL_OP(type)    ((type <<
> > > XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > > -               XE_GUC_TLB_INVAL_MODE_HEAVY <<
> > > XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > > -               XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > > -
> > > -static int send_tlb_inval_ggtt(struct xe_gt *gt, int seqno)
> > > -{
> > > -       u32 action[] = {
> > > -               XE_GUC_ACTION_TLB_INVALIDATION,
> > > -               seqno,
> > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > > -       };
> > > -
> > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > ARRAY_SIZE(action));
> > > -}
> > > -
> > > -static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > -                             struct xe_tlb_inval_fence *fence)
> > > -{
> > > -       u32 action[] = {
> > > -               XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > > -               0,  /* seqno, replaced in send_tlb_inval */
> > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > -       };
> > > -       struct xe_gt *gt = tlb_inval->private;
> > > -
> > > -       xe_gt_assert(gt, fence);
> > > -
> > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > ARRAY_SIZE(action));
> > > -}
> > > +#define xe_tlb_inval_issue(__tlb_inval, __fence, op, args...)  \
> > > +({                                                             \
> > > +       int __ret;                                              \
> > > +                                                               \
> > > +       xe_assert((__tlb_inval)->xe, (__tlb_inval)->ops);       \
> > > +       xe_assert((__tlb_inval)->xe, (__fence));                \
> > > +                                                               \
> > > +       (__tlb_inval)->ops->lock((__tlb_inval));                \
> > > +       xe_tlb_inval_fence_prep((__fence));                     \
> > > +       __ret = op((__tlb_inval), (__fence)->seqno, ##args);    \
> > > +       if (__ret < 0)                                          \
> > > +               xe_tlb_inval_fence_signal_unlocked((__fence));  \
> > > +       (__tlb_inval)->ops->unlock((__tlb_inval));              \
> > > +                                                               \
> > > +       __ret == -ECANCELED ? 0 : __ret;                        \
> > > +})
> > >  
> > >  /**
> > > - * xe_gt_tlb_invalidation_all - Invalidate all TLBs across PF
> > > and all VFs.
> > > - * @gt: the &xe_gt structure
> > > - * @fence: the &xe_tlb_inval_fence to be signaled on completion
> > > + * xe_tlb_inval_all() - Issue a TLB invalidation for all TLBs
> > > + * @tlb_inval: TLB invalidation client
> > > > + * @fence: invalidation fence which will be signaled on TLB
> > > invalidation
> > > + * completion
> > >   *
> > > - * Send a request to invalidate all TLBs across PF and all VFs.
> > > + * Issue a TLB invalidation for all TLBs. Completion of TLB is
> > > asynchronous and
> > > + * caller can use the invalidation fence to wait for completion.
> > >   *
> > >   * Return: 0 on success, negative error code on error
> > >   */
> > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > >                      struct xe_tlb_inval_fence *fence)
> > >  {
> > > -       struct xe_gt *gt = tlb_inval->private;
> > > -       int err;
> > > -
> > > -       err = send_tlb_inval_all(tlb_inval, fence);
> > > -       if (err)
> > > -               xe_gt_err(gt, "TLB invalidation request failed
> > > (%pe)", ERR_PTR(err));
> > > -
> > > -       return err;
> > > -}
> > > -
> > > -/*
> > > - * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > > - * Note that roundup_pow_of_two() operates on unsigned long,
> > > - * not on u64.
> > > - */
> > > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH
> > > (rounddown_pow_of_two(ULONG_MAX))
> > > -
> > > -static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64 start, u64
> > > end,
> > > -                               u32 asid, int seqno)
> > > -{
> > > -#define MAX_TLB_INVALIDATION_LEN       7
> > > -       u32 action[MAX_TLB_INVALIDATION_LEN];
> > > -       u64 length = end - start;
> > > -       int len = 0;
> > > -
> > > -       action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > > -       action[len++] = seqno;
> > > -       if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > -           length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > > -               action[len++] =
> > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > > -       } else {
> > > -               u64 orig_start = start;
> > > -               u64 align;
> > > -
> > > -               if (length < SZ_4K)
> > > -                       length = SZ_4K;
> > > -
> > > -               /*
> > > -                * We need to invalidate a higher granularity if
> > > start address
> > > -                * is not aligned to length. When start is not
> > > aligned with
> > > -                * length we need to find the length large enough
> > > to create an
> > > -                * address mask covering the required range.
> > > -                */
> > > -               align = roundup_pow_of_two(length);
> > > -               start = ALIGN_DOWN(start, align);
> > > -               end = ALIGN(end, align);
> > > -               length = align;
> > > -               while (start + length < end) {
> > > -                       length <<= 1;
> > > -                       start = ALIGN_DOWN(orig_start, length);
> > > -               }
> > > -
> > > -               /*
> > > -                * Minimum invalidation size for a 2MB page that
> > > the hardware
> > > -                * expects is 16MB
> > > -                */
> > > -               if (length >= SZ_2M) {
> > > -                       length = max_t(u64, SZ_16M, length);
> > > -                       start = ALIGN_DOWN(orig_start, length);
> > > -               }
> > > -
> > > -               xe_gt_assert(gt, length >= SZ_4K);
> > > -               xe_gt_assert(gt, is_power_of_2(length));
> > > -               xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M)
> > > - 1,
> > > -                                                   ilog2(SZ_2M)
> > > + 1)));
> > > -               xe_gt_assert(gt, IS_ALIGNED(start, length));
> > > -
> > > -               action[len++] =
> > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > > -               action[len++] = asid;
> > > -               action[len++] = lower_32_bits(start);
> > > -               action[len++] = upper_32_bits(start);
> > > -               action[len++] = ilog2(length) - ilog2(SZ_4K);
> > > -       }
> > > -
> > > -       xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> > > -
> > > -       return send_tlb_inval(&gt->uc.guc, action, len);
> > > -}
> > > -
> > > -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> > > -                              struct xe_tlb_inval_fence *fence)
> > > -{
> > > -       int ret;
> > > -
> > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > -
> > > -       xe_tlb_inval_fence_prep(fence);
> > > -
> > > -       ret = send_tlb_inval_ggtt(gt, fence->seqno);
> > > -       if (ret < 0)
> > > -               inval_fence_signal_unlocked(gt_to_xe(gt), fence);
> > > -
> > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > -
> > > -       /*
> > > -        * -ECANCELED indicates the CT is stopped for a GT reset.
> > > TLB caches
> > > -        *  should be nuked on a GT reset so this error can be
> > > ignored.
> > > -        */
> > > -       if (ret == -ECANCELED)
> > > -               return 0;
> > > -
> > > -       return ret;
> > > +       return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval-
> > > >ops->all);
> > >  }
> > >  
> > >  /**
> > > - * xe_tlb_inval_ggtt - Issue a TLB invalidation on this GT for
> > > the GGTT
> > > + * xe_tlb_inval_ggtt() - Issue a TLB invalidation for the GGTT
> > >   * @tlb_inval: TLB invalidation client
> > >   *
> > > - * Issue a TLB invalidation for the GGTT. Completion of TLB
> > > invalidation is
> > > - * synchronous.
> > > + * Issue a TLB invalidation for the GGTT. Completion of the TLB
> > > invalidation is synchronous:
> > > + * the function waits on an internal fence before returning.
> > >   *
> > >   * Return: 0 on success, negative error code on error
> > >   */
> > >  int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
> > >  {
> > > -       struct xe_gt *gt = tlb_inval->private;
> > > -       struct xe_device *xe = gt_to_xe(gt);
> > > -       unsigned int fw_ref;
> > > -
> > > -       if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> > > -           gt->uc.guc.submission_state.enabled) {
> > > -               struct xe_tlb_inval_fence fence;
> > > -               int ret;
> > > -
> > > -               xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > > -               ret = __xe_tlb_inval_ggtt(gt, &fence);
> > > -               if (ret)
> > > -                       return ret;
> > > -
> > > -               xe_tlb_inval_fence_wait(&fence);
> > > -       } else if (xe_device_uc_enabled(xe) &&
> > > !xe_device_wedged(xe)) {
> > > -               struct xe_mmio *mmio = &gt->mmio;
> > > -
> > > -               if (IS_SRIOV_VF(xe))
> > > -                       return 0;
> > > -
> > > -               fw_ref = xe_force_wake_get(gt_to_fw(gt),
> > > XE_FW_GT);
> > > -               if (xe->info.platform == XE_PVC ||
> > > GRAPHICS_VER(xe) >= 20) {
> > > -                       xe_mmio_write32(mmio,
> > > PVC_GUC_TLB_INV_DESC1,
> > > -
> > >                                        PVC_GUC_TLB_INV_DESC1_INVAL
> > > IDATE);
> > > -                       xe_mmio_write32(mmio,
> > > PVC_GUC_TLB_INV_DESC0,
> > > -
> > >                                        PVC_GUC_TLB_INV_DESC0_VALID
> > > );
> > > -               } else {
> > > -                       xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> > > -
> > >                                        GUC_TLB_INV_CR_INVALIDATE);
> > > -               }
> > > -               xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > -       }
> > > +       struct xe_tlb_inval_fence fence, *fence_ptr = &fence;
> > > +       int ret;
> > >  
> > > -       return 0;
> > > +       xe_tlb_inval_fence_init(tlb_inval, fence_ptr, true);
> > > +       ret = xe_tlb_inval_issue(tlb_inval, fence_ptr, tlb_inval-
> > > >ops->ggtt);
> > > +       xe_tlb_inval_fence_wait(fence_ptr);
> > > +
> > > +       return ret;
> > >  }
> > >  
> > >  /**
> > > - * xe_tlb_inval_range - Issue a TLB invalidation on this GT for
> > > an address range
> > > + * xe_tlb_inval_range() - Issue a TLB invalidation for an
> > > address range
> > >   * @tlb_inval: TLB invalidation client
> > >   * @fence: invalidation fence which will be signal on TLB
> > > invalidation
> > >   * completion
> > > @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct xe_tlb_inval
> > > *tlb_inval,
> > >                        struct xe_tlb_inval_fence *fence, u64
> > > start, u64 end,
> > >                        u32 asid)
> > >  {
> > > -       struct xe_gt *gt = tlb_inval->private;
> > > -       struct xe_device *xe = gt_to_xe(gt);
> > > -       int  ret;
> > > -
> > > -       xe_gt_assert(gt, fence);
> > > -
> > > -       /* Execlists not supported */
> > > -       if (xe->info.force_execlist) {
> > > -               __inval_fence_signal(xe, fence);
> > > -               return 0;
> > > -       }
> > > -
> > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > -
> > > -       xe_tlb_inval_fence_prep(fence);
> > > -
> > > -       ret = send_tlb_inval_ppgtt(gt, start, end, asid, fence-
> > > >seqno);
> > > -       if (ret < 0)
> > > -               inval_fence_signal_unlocked(xe, fence);
> > > -
> > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > -
> > > -       return ret;
> > > +       return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval-
> > > >ops->ppgtt,
> > > +                                 start, end, asid);
> > >  }
> > >  
> > >  /**
> > > - * xe_tlb_inval_vm - Issue a TLB invalidation on this GT for a
> > > VM
> > > + * xe_tlb_inval_vm() - Issue a TLB invalidation for a VM
> > >   * @tlb_inval: TLB invalidation client
> > >   * @vm: VM to invalidate
> > >   *
> > > @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct xe_tlb_inval
> > > *tlb_inval, struct xe_vm *vm)
> > >  {
> > >         struct xe_tlb_inval_fence fence;
> > >         u64 range = 1ull << vm->xe->info.va_bits;
> > > -       int ret;
> > >  
> > >         xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > > -
> > > -       ret = xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm-
> > > >usm.asid);
> > > -       if (ret < 0)
> > > -               return;
> > > -
> > > +       xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm-
> > > >usm.asid);
> > >         xe_tlb_inval_fence_wait(&fence);
> > >  }
> > >  
> > >  /**
> > > - * xe_tlb_inval_done_handler - TLB invalidation done handler
> > > - * @gt: gt
> > > + * xe_tlb_inval_done_handler() - TLB invalidation done handler
> > > + * @tlb_inval: TLB invalidation client
> > >   * @seqno: seqno of invalidation that is done
> > >   *
> > >   * Update recv seqno, signal any TLB invalidation fences, and
> > > restart TDR
> > 
> > I'd mention that this function is safe to call from any context
> > (i.e., process, atomic, and hardirq contexts are allowed).
> > 
> > We might need to convert tlb_inval.pending_lock to a raw_spinlock_t
> > for
> > PREEMPT_RT enablement. Same for the GuC fast_lock. AFAIK we haven’t
> > had
> > any complaints, so maybe I’m just overthinking it, but also perhaps
> > not.
> > 
> > >   */
> > > -static void xe_tlb_inval_done_handler(struct xe_gt *gt, int
> > > seqno)
> > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval,
> > > int seqno)
> > >  {
> > > -       struct xe_device *xe = gt_to_xe(gt);
> > > +       struct xe_device *xe = tlb_inval->xe;
> > >         struct xe_tlb_inval_fence *fence, *next;
> > >         unsigned long flags;
> > >  
> > > @@ -535,77 +337,53 @@ static void
> > > xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> > >          * officially process the CT message like if racing
> > > against
> > >          * process_g2h_msg().
> > >          */
> > > -       spin_lock_irqsave(&gt->tlb_inval.pending_lock, flags);
> > > -       if (tlb_inval_seqno_past(gt, seqno)) {
> > > -               spin_unlock_irqrestore(&gt-
> > > >tlb_inval.pending_lock, flags);
> > > +       spin_lock_irqsave(&tlb_inval->pending_lock, flags);
> > > +       if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) {
> > > +               spin_unlock_irqrestore(&tlb_inval->pending_lock,
> > > flags);
> > >                 return;
> > >         }
> > >  
> > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> > > +       WRITE_ONCE(tlb_inval->seqno_recv, seqno);
> > >  
> > >         list_for_each_entry_safe(fence, next,
> > > -                                &gt->tlb_inval.pending_fences,
> > > link) {
> > > +                                &tlb_inval->pending_fences,
> > > link) {
> > >                 trace_xe_tlb_inval_fence_recv(xe, fence);
> > >  
> > > -               if (!tlb_inval_seqno_past(gt, fence->seqno))
> > > +               if (!xe_tlb_inval_seqno_past(tlb_inval, fence-
> > > >seqno))
> > >                         break;
> > >  
> > > -               inval_fence_signal(xe, fence);
> > > +               xe_tlb_inval_fence_signal(fence);
> > >         }
> > >  
> > > -       if (!list_empty(&gt->tlb_inval.pending_fences))
> > > +       if (!list_empty(&tlb_inval->pending_fences))
> > >                 mod_delayed_work(system_wq,
> > > -                                &gt->tlb_inval.fence_tdr,
> > > -                                tlb_timeout_jiffies(gt));
> > > +                                &tlb_inval->fence_tdr,
> > > +                                tlb_inval->ops-
> > > >timeout_delay(tlb_inval));
> > >         else
> > > -               cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > > +               cancel_delayed_work(&tlb_inval->fence_tdr);
> > >  
> > > -       spin_unlock_irqrestore(&gt->tlb_inval.pending_lock,
> > > flags);
> > > -}
> > > -
> > > -/**
> > > - * xe_guc_tlb_inval_done_handler - TLB invalidation done handler
> > > - * @guc: guc
> > > - * @msg: message indicating TLB invalidation done
> > > - * @len: length of message
> > > - *
> > > - * Parse seqno of TLB invalidation, wake any waiters for seqno,
> > > and signal any
> > > - * invalidation fences for seqno. Algorithm for this depends on
> > > seqno being
> > > - * received in-order and asserts this assumption.
> > > - *
> > > - * Return: 0 on success, -EPROTO for malformed messages.
> > > - */
> > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg,
> > > u32 len)
> > > -{
> > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > -
> > > -       if (unlikely(len != 1))
> > > -               return -EPROTO;
> > > -
> > > -       xe_tlb_inval_done_handler(gt, msg[0]);
> > > -
> > > -       return 0;
> > > +       spin_unlock_irqrestore(&tlb_inval->pending_lock, flags);
> > >  }
> > >  
> > >  static const char *
> > > -inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > > +xe_inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > >  {
> > >         return "xe";
> > >  }
> > >  
> > >  static const char *
> > > -inval_fence_get_timeline_name(struct dma_fence *dma_fence)
> > > +xe_inval_fence_get_timeline_name(struct dma_fence *dma_fence)
> > >  {
> > > -       return "inval_fence";
> > > +       return "tlb_inval_fence";
> > >  }
> > >  
> > >  static const struct dma_fence_ops inval_fence_ops = {
> > > -       .get_driver_name = inval_fence_get_driver_name,
> > > -       .get_timeline_name = inval_fence_get_timeline_name,
> > > +       .get_driver_name = xe_inval_fence_get_driver_name,
> > > +       .get_timeline_name = xe_inval_fence_get_timeline_name,
> > >  };
> > >  
> > >  /**
> > > - * xe_tlb_inval_fence_init - Initialize TLB invalidation fence
> > > + * xe_tlb_inval_fence_init() - Initialize TLB invalidation fence
> > >   * @tlb_inval: TLB invalidation client
> > >   * @fence: TLB invalidation fence to initialize
> > >   * @stack: fence is stack variable
> > > @@ -618,15 +396,12 @@ void xe_tlb_inval_fence_init(struct
> > > xe_tlb_inval *tlb_inval,
> > >                              struct xe_tlb_inval_fence *fence,
> > >                              bool stack)
> > >  {
> > > -       struct xe_gt *gt = tlb_inval->private;
> > > -
> > > -       xe_pm_runtime_get_noresume(gt_to_xe(gt));
> > > +       xe_pm_runtime_get_noresume(tlb_inval->xe);
> > >  
> > > -       spin_lock_irq(&gt->tlb_inval.lock);
> > > -       dma_fence_init(&fence->base, &inval_fence_ops,
> > > -                      &gt->tlb_inval.lock,
> > > +       spin_lock_irq(&tlb_inval->lock);
> > > +       dma_fence_init(&fence->base, &inval_fence_ops,
> > > &tlb_inval->lock,
> > >                        dma_fence_context_alloc(1), 1);
> > > -       spin_unlock_irq(&gt->tlb_inval.lock);
> > > +       spin_unlock_irq(&tlb_inval->lock);
> > 
> > While here, 'fence_lock' is probably a better name.
> > 
> > Matt
> > 
> > >         INIT_LIST_HEAD(&fence->link);
> > >         if (stack)
> > >                 set_bit(FENCE_STACK_BIT, &fence->base.flags);
> > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > index 7adee3f8c551..cdeafc8d4391 100644
> > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > @@ -18,24 +18,30 @@ struct xe_vma;
> > >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
> > >  
> > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
> > > -int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct
> > > xe_vm *vm);
> > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > >                      struct xe_tlb_inval_fence *fence);
> > > +int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct
> > > xe_vm *vm);
> > >  int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
> > >                        struct xe_tlb_inval_fence *fence,
> > >                        u64 start, u64 end, u32 asid);
> > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg,
> > > u32 len);
> > >  
> > >  void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
> > >                              struct xe_tlb_inval_fence *fence,
> > >                              bool stack);
> > > -void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence
> > > *fence);
> > >  
> > > +/**
> > > + * xe_tlb_inval_fence_wait() - TLB invalidation fence wait
> > > + * @fence: TLB invalidation fence to wait on
> > > + *
> > > + * Wait on a TLB invalidation fence until it signals, non
> > > interruptible
> > > + */
> > >  static inline void
> > >  xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence *fence)
> > >  {
> > >         dma_fence_wait(&fence->base, false);
> > >  }
> > >  
> > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval,
> > > int seqno);
> > > +
> > >  #endif /* _XE_TLB_INVAL_ */
> > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > index 05b6adc929bb..c1ad96d24fc8 100644
> > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > @@ -9,10 +9,85 @@
> > >  #include <linux/workqueue.h>
> > >  #include <linux/dma-fence.h>
> > >  
> > > -/** struct xe_tlb_inval - TLB invalidation client */
> > > +struct xe_tlb_inval;
> > > +
> > > +/** struct xe_tlb_inval_ops - TLB invalidation ops (backend) */
> > > +struct xe_tlb_inval_ops {
> > > +       /**
> > > +        * @all: Invalidate all TLBs
> > > +        * @tlb_inval: TLB invalidation client
> > > +        * @seqno: Seqno of TLB invalidation
> > > +        *
> > > +        * Return 0 on success, -ECANCELED if backend is mid-
> > > reset, error on
> > > +        * failure
> > > +        */
> > > +       int (*all)(struct xe_tlb_inval *tlb_inval, u32 seqno);
> > > +
> > > +       /**
> > > +        * @ggtt: Invalidate global translation TLBs
> > > +        * @tlb_inval: TLB invalidation client
> > > +        * @seqno: Seqno of TLB invalidation
> > > +        *
> > > +        * Return 0 on success, -ECANCELED if backend is mid-
> > > reset, error on
> > > +        * failure
> > > +        */
> > > +       int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32 seqno);
> > > +
> > > +       /**
> > > +        * @ppgtt: Invalidate per-process translation TLBs
> > > +        * @tlb_inval: TLB invalidation client
> > > +        * @seqno: Seqno of TLB invalidation
> > > +        * @start: Start address
> > > +        * @end: End address
> > > +        * @asid: Address space ID
> > > +        *
> > > +        * Return 0 on success, -ECANCELED if backend is mid-
> > > reset, error on
> > > +        * failure
> > > +        */
> > > +       int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32 seqno,
> > > u64 start,
> > > +                    u64 end, u32 asid);
> > > +
> > > +       /**
> > > +        * @initialized: Backend is initialized
> > > +        * @tlb_inval: TLB invalidation client
> > > +        *
> > > +        * Return: True if backend is initialized, False otherwise
> > > +        */
> > > +       bool (*initialized)(struct xe_tlb_inval *tlb_inval);
> > > +
> > > +       /**
> > > +        * @flush: Flush pending TLB invalidations
> > > +        * @tlb_inval: TLB invalidation client
> > > +        */
> > > +       void (*flush)(struct xe_tlb_inval *tlb_inval);
> > > +
> > > +       /**
> > > +        * @timeout_delay: Timeout delay for TLB invalidation
> > > +        * @tlb_inval: TLB invalidation client
> > > +        *
> > > +        * Return: Timeout delay for TLB invalidation in jiffies
> > > +        */
> > > +       long (*timeout_delay)(struct xe_tlb_inval *tlb_inval);
> > > +
> > > +       /**
> > > +        * @lock: Lock resources protecting the backend seqno
> > > management
> > > +        */
> > > +       void (*lock)(struct xe_tlb_inval *tlb_inval);
> > > +
> > > +       /**
> > > +        * @unlock: Unlock resources protecting the backend seqno
> > > management
> > > +        */
> > > +       void (*unlock)(struct xe_tlb_inval *tlb_inval);
> > > +};
> > > +
> > > +/** struct xe_tlb_inval - TLB invalidation client (frontend) */
> > >  struct xe_tlb_inval {
> > >         /** @private: Backend private pointer */
> > >         void *private;
> > > +       /** @xe: Pointer to Xe device */
> > > +       struct xe_device *xe;
> > > +       /** @ops: TLB invalidation ops */
> > > +       const struct xe_tlb_inval_ops *ops;
> > >         /** @tlb_inval.seqno: TLB invalidation seqno, protected
> > > by CT lock */
> > >  #define TLB_INVALIDATION_SEQNO_MAX     0x100000
> > >         int seqno;
> > > -- 
> > > 2.34.1
> > > 



* Re: [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 20:20       ` Summers, Stuart
@ 2025-07-23 20:47         ` Matthew Brost
  2025-07-23 20:55           ` Summers, Stuart
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Brost @ 2025-07-23 20:47 UTC (permalink / raw)
  To: Summers, Stuart
  Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
	Kassabri, Farah, Auld, Matthew

On Wed, Jul 23, 2025 at 02:20:31PM -0600, Summers, Stuart wrote:
> On Wed, 2025-07-23 at 13:18 -0700, Matthew Brost wrote:
> > On Wed, Jul 23, 2025 at 12:17:49PM -0700, Matthew Brost wrote:
> > > On Wed, Jul 23, 2025 at 06:22:22PM +0000, stuartsummers wrote:
> > > > From: Matthew Brost <matthew.brost@intel.com>
> > > > 
> > > > The frontend exposes an API to the driver to send invalidations,
> > > > handles
> > > > sequence number assignment, synchronization (fences), and
> > > > provides a
> > > > timeout mechanism. The backend issues the actual invalidation to
> > > > the
> > > > hardware (or firmware).
> > > > 
> > > > The new layering easily allows issuing TLB invalidations to
> > > > different
> > > > hardware or firmware interfaces.
> > > > 
> > > > Normalize some naming while here too.
> > > > 
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/Makefile             |   1 +
> > > >  drivers/gpu/drm/xe/xe_guc_ct.c          |   2 +-
> > > >  drivers/gpu/drm/xe/xe_guc_tlb_inval.c   | 263 +++++++++++++
> > > >  drivers/gpu/drm/xe/xe_guc_tlb_inval.h   |  19 +
> > > >  drivers/gpu/drm/xe/xe_tlb_inval.c       | 495 +++++++-----------
> > > > ------
> > > >  drivers/gpu/drm/xe/xe_tlb_inval.h       |  14 +-
> > > >  drivers/gpu/drm/xe/xe_tlb_inval_types.h |  77 +++-
> > > >  7 files changed, 505 insertions(+), 366 deletions(-)
> > > >  create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > >  create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/Makefile
> > > > b/drivers/gpu/drm/xe/Makefile
> > > > index 332b2057cc00..8a2f836b3ab2 100644
> > > > --- a/drivers/gpu/drm/xe/Makefile
> > > > +++ b/drivers/gpu/drm/xe/Makefile
> > > > @@ -75,6 +75,7 @@ xe-y += xe_bb.o \
> > > >         xe_guc_log.o \
> > > >         xe_guc_pc.o \
> > > >         xe_guc_submit.o \
> > > > +       xe_guc_tlb_inval.o \
> > > >         xe_heci_gsc.o \
> > > >         xe_huc.o \
> > > >         xe_hw_engine.o \
> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > > b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > > index 2ef86c0ae8b4..90ebda5b3790 100644
> > > > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > > @@ -30,9 +30,9 @@
> > > >  #include "xe_guc_log.h"
> > > >  #include "xe_guc_relay.h"
> > > >  #include "xe_guc_submit.h"
> > > > +#include "xe_guc_tlb_inval.h"
> > > >  #include "xe_map.h"
> > > >  #include "xe_pm.h"
> > > > -#include "xe_tlb_inval.h"
> > > >  #include "xe_trace_guc.h"
> > > >  
> > > >  static void receive_g2h(struct xe_guc_ct *ct);
> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > > b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > > new file mode 100644
> > > > index 000000000000..27d7dc938cb1
> > > > --- /dev/null
> > > > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > > @@ -0,0 +1,263 @@
> > > > +// SPDX-License-Identifier: MIT
> > > > +/*
> > > > + * Copyright © 2025 Intel Corporation
> > > > + */
> > > > +
> > > > +#include "abi/guc_actions_abi.h"
> > > > +
> > > > +#include "xe_device.h"
> > > > +#include "xe_gt_stats.h"
> > > > +#include "xe_gt_types.h"
> > > > +#include "xe_guc.h"
> > > > +#include "xe_guc_ct.h"
> > > > +#include "xe_guc_tlb_inval.h"
> > > > +#include "xe_force_wake.h"
> > > > +#include "xe_mmio.h"
> > > > +#include "xe_tlb_inval.h"
> > > > +
> > > > +#include "regs/xe_guc_regs.h"
> > > > +
> > > > +/*
> > > > + * XXX: The seqno algorithm relies on TLB invalidation being
> > > > processed in order
> > > > + * which they currently are by the GuC, if that changes the
> > > > algorithm will need
> > > > + * to be updated.
> > > > + */
> > > > +
> > > > +static int send_tlb_inval(struct xe_guc *guc, const u32 *action,
> > > > int len)
> > > > +{
> > > > +       struct xe_gt *gt = guc_to_gt(guc);
> > > > +
> > > > +       lockdep_assert_held(&guc->ct.lock);
> > > > +       xe_gt_assert(gt, action[1]);    /* Seqno */
> > > > +
> > > > +       xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> > > > +       return xe_guc_ct_send(&guc->ct, action, len,
> > > > +                             G2H_LEN_DW_TLB_INVALIDATE, 1);
> > > 
> > > As written, you’d need xe_guc_ct_send_locked here—but you actually
> > > don’t. More on that below.
> > > 
> > > > +}
> > > > +
> > > > +#define MAKE_INVAL_OP(type)    ((type <<
> > > > XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > > > +               XE_GUC_TLB_INVAL_MODE_HEAVY <<
> > > > XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > > > +               XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > > > +
> > > > +static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > u32 seqno)
> > > > +{
> > > > +       struct xe_guc *guc = tlb_inval->private;
> > > > +       u32 action[] = {
> > > > +               XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > > > +               seqno,
> > > > +               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > > +       };
> > > > +
> > > > +       return send_tlb_inval(guc, action, ARRAY_SIZE(action));
> > > > +}
> > > > +
> > > > +static int send_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval,
> > > > u32 seqno)
> > > > +{
> > > > +       struct xe_guc *guc = tlb_inval->private;
> > > > +       struct xe_gt *gt = guc_to_gt(guc);
> > > > +       struct xe_device *xe = guc_to_xe(guc);
> > > > +
> > > > +       lockdep_assert_held(&guc->ct.lock);
> > > > +
> > > > +       /*
> > > > +        * Returning -ECANCELED in this function is squashed at
> > > > the caller and
> > > > +        * signals waiters.
> > > > +        */
> > > > +
> > > > +       if (xe_guc_ct_enabled(&guc->ct) && guc-
> > > > >submission_state.enabled) {
> > > > +               u32 action[] = {
> > > > +                       XE_GUC_ACTION_TLB_INVALIDATION,
> > > > +                       seqno,
> > > > +                       MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > > > +               };
> > > > +
> > > > +               return send_tlb_inval(guc, action,
> > > > ARRAY_SIZE(action));
> > > > +       } else if (xe_device_uc_enabled(xe) &&
> > > > !xe_device_wedged(xe)) {
> > > > +               struct xe_mmio *mmio = &gt->mmio;
> > > > +               unsigned int fw_ref;
> > > > +
> > > > +               if (IS_SRIOV_VF(xe))
> > > > +                       return -ECANCELED;
> > > > +
> > > > +               fw_ref = xe_force_wake_get(gt_to_fw(gt),
> > > > XE_FW_GT);
> > > > +               if (xe->info.platform == XE_PVC ||
> > > > GRAPHICS_VER(xe) >= 20) {
> > > > +                       xe_mmio_write32(mmio,
> > > > PVC_GUC_TLB_INV_DESC1,
> > > > +                                       PVC_GUC_TLB_INV_DESC1_INV
> > > > ALIDATE);
> > > > +                       xe_mmio_write32(mmio,
> > > > PVC_GUC_TLB_INV_DESC0,
> > > > +                                       PVC_GUC_TLB_INV_DESC0_VAL
> > > > ID);
> > > > +               } else {
> > > > +                       xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> > > > +                                       GUC_TLB_INV_CR_INVALIDATE
> > > > );
> > > > +               }
> > > > +               xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > > +       }
> > > > +
> > > > +       return -ECANCELED;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > > > + * Note that roundup_pow_of_two() operates on unsigned long,
> > > > + * not on u64.
> > > > + */
> > > > +#define MAX_RANGE_TLB_INVALIDATION_LENGTH
> > > > (rounddown_pow_of_two(ULONG_MAX))
> > > > +
> > > > +static int send_tlb_inval_ppgtt(struct xe_tlb_inval *tlb_inval,
> > > > u32 seqno,
> > > > +                               u64 start, u64 end, u32 asid)
> > > > +{
> > > > +#define MAX_TLB_INVALIDATION_LEN       7
> > > > +       struct xe_guc *guc = tlb_inval->private;
> > > > +       struct xe_gt *gt = guc_to_gt(guc);
> > > > +       u32 action[MAX_TLB_INVALIDATION_LEN];
> > > > +       u64 length = end - start;
> > > > +       int len = 0;
> > > > +
> > > > +       lockdep_assert_held(&guc->ct.lock);
> > > > +
> > > > +       if (guc_to_xe(guc)->info.force_execlist)
> > > > +               return -ECANCELED;
> > > > +
> > > > +       action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > > > +       action[len++] = seqno;
> > > > +       if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > > +           length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > > > +               action[len++] =
> > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > > > +       } else {
> > > > +               u64 orig_start = start;
> > > > +               u64 align;
> > > > +
> > > > +               if (length < SZ_4K)
> > > > +                       length = SZ_4K;
> > > > +
> > > > +               /*
> > > > +                * We need to invalidate a higher granularity if
> > > > start address
> > > > +                * is not aligned to length. When start is not
> > > > aligned with
> > > > +                * length we need to find the length large enough
> > > > to create an
> > > > +                * address mask covering the required range.
> > > > +                */
> > > > +               align = roundup_pow_of_two(length);
> > > > +               start = ALIGN_DOWN(start, align);
> > > > +               end = ALIGN(end, align);
> > > > +               length = align;
> > > > +               while (start + length < end) {
> > > > +                       length <<= 1;
> > > > +                       start = ALIGN_DOWN(orig_start, length);
> > > > +               }
> > > > +
> > > > +               /*
> > > > +                * Minimum invalidation size for a 2MB page that
> > > > the hardware
> > > > +                * expects is 16MB
> > > > +                */
> > > > +               if (length >= SZ_2M) {
> > > > +                       length = max_t(u64, SZ_16M, length);
> > > > +                       start = ALIGN_DOWN(orig_start, length);
> > > > +               }
> > > > +
> > > > +               xe_gt_assert(gt, length >= SZ_4K);
> > > > +               xe_gt_assert(gt, is_power_of_2(length));
> > > > +               xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M)
> > > > - 1,
> > > > +                                                   ilog2(SZ_2M)
> > > > + 1)));
> > > > +               xe_gt_assert(gt, IS_ALIGNED(start, length));
> > > > +
> > > > +               action[len++] =
> > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > > > +               action[len++] = asid;
> > > > +               action[len++] = lower_32_bits(start);
> > > > +               action[len++] = upper_32_bits(start);
> > > > +               action[len++] = ilog2(length) - ilog2(SZ_4K);
> > > > +       }
> > > > +
> > > > +       xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> > > > +
> > > > +       return send_tlb_inval(guc, action, len);
> > > > +}
> > > > +
> > > > +static bool tlb_inval_initialized(struct xe_tlb_inval
> > > > *tlb_inval)
> > > > +{
> > > > +       struct xe_guc *guc = tlb_inval->private;
> > > > +
> > > > +       return xe_guc_ct_initialized(&guc->ct);
> > > > +}
> > > > +
> > > > +static void tlb_inval_flush(struct xe_tlb_inval *tlb_inval)
> > > > +{
> > > > +       struct xe_guc *guc = tlb_inval->private;
> > > > +
> > > > +       LNL_FLUSH_WORK(&guc->ct.g2h_worker);
> > > > +}
> > > > +
> > > > +static long tlb_inval_timeout_delay(struct xe_tlb_inval
> > > > *tlb_inval)
> > > > +{
> > > > +       struct xe_guc *guc = tlb_inval->private;
> > > > +
> > > > +       /* this reflects what HW/GuC needs to process TLB inv
> > > > request */
> > > > +       const long hw_tlb_timeout = HZ / 4;
> > > > +
> > > > +       /* this estimates actual delay caused by the CTB
> > > > transport */
> > > > +       long delay = xe_guc_ct_queue_proc_time_jiffies(&guc->ct);
> > > > +
> > > > +       return hw_tlb_timeout + 2 * delay;
> > > > +}
> > > > +
> > > > +static void tlb_inval_lock(struct xe_tlb_inval *tlb_inval)
> > > > +{
> > > > +       struct xe_guc *guc = tlb_inval->private;
> > > > +
> > > > +       mutex_lock(&guc->ct.lock);
> > > > +}
> > > > +
> > > > +static void tlb_inval_unlock(struct xe_tlb_inval *tlb_inval)
> > > > +{
> > > > +       struct xe_guc *guc = tlb_inval->private;
> > > > +
> > > > +       mutex_unlock(&guc->ct.lock);
> > > > +}
> > > > +
> > > > +static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> > > > +       .all = send_tlb_inval_all,
> > > > +       .ggtt = send_tlb_inval_ggtt,
> > > > +       .ppgtt = send_tlb_inval_ppgtt,
> > > > +       .initialized = tlb_inval_initialized,
> > > > +       .flush = tlb_inval_flush,
> > > > +       .timeout_delay = tlb_inval_timeout_delay,
> > > > +       .lock = tlb_inval_lock,
> > > > +       .unlock = tlb_inval_unlock,
> > > > +};
> > > > +
> > > > +/**
> > > > + * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation
> > > > early
> > > > + * @guc: GuC object
> > > > + * @tlb_inval: TLB invalidation client
> > > > + *
> > > > + * Inititialize GuC TLB invalidation by setting back pointer in
> > > > TLB invalidation
> > > > + * client to the GuC and setting GuC backend ops.
> > > > + */
> > > > +void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> > > > +                                struct xe_tlb_inval *tlb_inval)
> > > > +{
> > > > +       tlb_inval->private = guc;
> > > > +       tlb_inval->ops = &guc_tlb_inval_ops;
> > > > +}
> > > > +
> > > > +/**
> > > > + * xe_guc_tlb_inval_done_handler() - TLB invalidation done
> > > > handler
> > > > + * @guc: guc
> > > > + * @msg: message indicating TLB invalidation done
> > > > + * @len: length of message
> > > > + *
> > > > + * Parse seqno of TLB invalidation, wake any waiters for seqno,
> > > > and signal any
> > > > + * invalidation fences for seqno. Algorithm for this depends on
> > > > seqno being
> > > > + * received in-order and asserts this assumption.
> > > > + *
> > > > + * Return: 0 on success, -EPROTO for malformed messages.
> > > > + */
> > > > +int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg,
> > > > u32 len)
> > > > +{
> > > > +       struct xe_gt *gt = guc_to_gt(guc);
> > > > +
> > > > +       if (unlikely(len != 1))
> > > > +               return -EPROTO;
> > > > +
> > > > +       xe_tlb_inval_done_handler(&gt->tlb_inval, msg[0]);
> > > > +
> > > > +       return 0;
> > > > +}
> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > > > b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > > > new file mode 100644
> > > > index 000000000000..07d668b02e3d
> > > > --- /dev/null
> > > > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.h
> > > > @@ -0,0 +1,19 @@
> > > > +/* SPDX-License-Identifier: MIT */
> > > > +/*
> > > > + * Copyright © 2025 Intel Corporation
> > > > + */
> > > > +
> > > > +#ifndef _XE_GUC_TLB_INVAL_H_
> > > > +#define _XE_GUC_TLB_INVAL_H_
> > > > +
> > > > +#include <linux/types.h>
> > > > +
> > > > +struct xe_guc;
> > > > +struct xe_tlb_inval;
> > > > +
> > > > +void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> > > > +                                struct xe_tlb_inval *tlb_inval);
> > > > +
> > > > +int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg,
> > > > u32 len);
> > > > +
> > > > +#endif
> > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > > b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > > index c795b78362bf..071c25fbdbac 100644
> > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > > @@ -12,50 +12,45 @@
> > > >  #include "xe_gt_printk.h"
> > > >  #include "xe_guc.h"
> > > >  #include "xe_guc_ct.h"
> > > > +#include "xe_guc_tlb_inval.h"
> > > >  #include "xe_gt_stats.h"
> > > >  #include "xe_tlb_inval.h"
> > > >  #include "xe_mmio.h"
> > > >  #include "xe_pm.h"
> > > > -#include "xe_sriov.h"
> > > > +#include "xe_tlb_inval.h"
> > > >  #include "xe_trace.h"
> > > > -#include "regs/xe_guc_regs.h"
> > > > -
> > > > -#define FENCE_STACK_BIT                DMA_FENCE_FLAG_USER_BITS
> > > >  
> > > > -/*
> > > > - * TLB inval depends on pending commands in the CT queue and
> > > > then the real
> > > > - * invalidation time. Double up the time to process full CT
> > > > queue
> > > > - * just to be on the safe side.
> > > > +/**
> > > > + * DOC: Xe TLB invalidation
> > > > + *
> > > > + * Xe TLB invalidation is implemented in two layers. The first
> > > > is the frontend
> > > > + * API, which provides an interface for TLB invalidations to the
> > > > driver code.
> > > > + * The frontend handles seqno assignment, synchronization
> > > > (fences), and the
> > > > + * timeout mechanism. The frontend is implemented via an
> > > > embedded structure
> > > > + * xe_tlb_inval that includes a set of ops hooking into the
> > > > backend. The backend
> > > > + * interacts with the hardware (or firmware) to perform the
> > > > actual invalidation.
> > > >   */
> > > > -static long tlb_timeout_jiffies(struct xe_gt *gt)
> > > > -{
> > > > -       /* this reflects what HW/GuC needs to process TLB inv
> > > > request */
> > > > -       const long hw_tlb_timeout = HZ / 4;
> > > >  
> > > > -       /* this estimates actual delay caused by the CTB
> > > > transport */
> > > > -       long delay = xe_guc_ct_queue_proc_time_jiffies(&gt-
> > > > >uc.guc.ct);
> > > > -
> > > > -       return hw_tlb_timeout + 2 * delay;
> > > > -}
> > > > +#define FENCE_STACK_BIT                DMA_FENCE_FLAG_USER_BITS
> > > >  
> > > >  static void xe_tlb_inval_fence_fini(struct xe_tlb_inval_fence
> > > > *fence)
> > > >  {
> > > > -       struct xe_gt *gt;
> > > > -
> > > >         if (WARN_ON_ONCE(!fence->tlb_inval))
> > > >                 return;
> > > >  
> > > > -       gt = fence->tlb_inval->private;
> > > > -       xe_pm_runtime_put(gt_to_xe(gt));
> > > > +       xe_pm_runtime_put(fence->tlb_inval->xe);
> > > >         fence->tlb_inval = NULL; /* fini() should be called once
> > > > */
> > > >  }
> > > >  
> > > >  static void
> > > > -__inval_fence_signal(struct xe_device *xe, struct
> > > > xe_tlb_inval_fence *fence)
> > > > +xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence)
> > > >  {
> > > >         bool stack = test_bit(FENCE_STACK_BIT, &fence-
> > > > >base.flags);
> > > >  
> > > > -       trace_xe_tlb_inval_fence_signal(xe, fence);
> > > > +       lockdep_assert_held(&fence->tlb_inval->pending_lock);
> > > > +
> > > > +       list_del(&fence->link);
> > > > +       trace_xe_tlb_inval_fence_signal(fence->tlb_inval->xe,
> > > > fence);
> > > >         xe_tlb_inval_fence_fini(fence);
> > > >         dma_fence_signal(&fence->base);
> > > >         if (!stack)
> > > > @@ -63,57 +58,50 @@ __inval_fence_signal(struct xe_device *xe,
> > > > struct xe_tlb_inval_fence *fence)
> > > >  }
> > > >  
> > > >  static void
> > > > -inval_fence_signal(struct xe_device *xe, struct
> > > > xe_tlb_inval_fence *fence)
> > > > +xe_tlb_inval_fence_signal_unlocked(struct xe_tlb_inval_fence
> > > > *fence)
> > > >  {
> > > > -       lockdep_assert_held(&fence->tlb_inval->pending_lock);
> > > > -
> > > > -       list_del(&fence->link);
> > > > -       __inval_fence_signal(xe, fence);
> > > > -}
> > > > +       struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> > > >  
> > > > -static void
> > > > -inval_fence_signal_unlocked(struct xe_device *xe,
> > > > -                           struct xe_tlb_inval_fence *fence)
> > > > -{
> > > > -       spin_lock_irq(&fence->tlb_inval->pending_lock);
> > > > -       inval_fence_signal(xe, fence);
> > > > -       spin_unlock_irq(&fence->tlb_inval->pending_lock);
> > > > +       spin_lock_irq(&tlb_inval->pending_lock);
> > > > +       xe_tlb_inval_fence_signal(fence);
> > > > +       spin_unlock_irq(&tlb_inval->pending_lock);
> > > >  }
> > > >  
> > > > -static void xe_gt_tlb_fence_timeout(struct work_struct *work)
> > > > +static void xe_tlb_inval_fence_timeout(struct work_struct *work)
> > > >  {
> > > > -       struct xe_gt *gt = container_of(work, struct xe_gt,
> > > > -
> > > >                                        tlb_inval.fence_tdr.work);
> > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > +       struct xe_tlb_inval *tlb_inval = container_of(work,
> > > > struct xe_tlb_inval,
> > > > +                                                    
> > > > fence_tdr.work);
> > > > +       struct xe_device *xe = tlb_inval->xe;
> > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > +       long timeout_delay = tlb_inval->ops-
> > > > >timeout_delay(tlb_inval);
> > > >  
> > > > -       LNL_FLUSH_WORK(&gt->uc.guc.ct.g2h_worker);
> > > > +       tlb_inval->ops->flush(tlb_inval);
> > > >  
> > > > -       spin_lock_irq(&gt->tlb_inval.pending_lock);
> > > > +       spin_lock_irq(&tlb_inval->pending_lock);
> > > >         list_for_each_entry_safe(fence, next,
> > > > -                                &gt->tlb_inval.pending_fences,
> > > > link) {
> > > > +                                &tlb_inval->pending_fences,
> > > > link) {
> > > >                 s64 since_inval_ms = ktime_ms_delta(ktime_get(),
> > > >                                                     fence-
> > > > >inval_time);
> > > >  
> > > > -               if (msecs_to_jiffies(since_inval_ms) <
> > > > tlb_timeout_jiffies(gt))
> > > > +               if (msecs_to_jiffies(since_inval_ms) <
> > > > timeout_delay)
> > > >                         break;
> > > >  
> > > >                 trace_xe_tlb_inval_fence_timeout(xe, fence);
> > > > -               xe_gt_err(gt, "TLB invalidation fence timeout,
> > > > seqno=%d recv=%d",
> > > > -                         fence->seqno, gt-
> > > > >tlb_inval.seqno_recv);
> > > > +               drm_err(&xe->drm,
> > > > +                       "TLB invalidation fence timeout, seqno=%d
> > > > recv=%d",
> > > > +                       fence->seqno, tlb_inval->seqno_recv);
> > > >  
> > > >                 fence->base.error = -ETIME;
> > > > -               inval_fence_signal(xe, fence);
> > > > +               xe_tlb_inval_fence_signal(fence);
> > > >         }
> > > > -       if (!list_empty(&gt->tlb_inval.pending_fences))
> > > > -               queue_delayed_work(system_wq,
> > > > -                                  &gt->tlb_inval.fence_tdr,
> > > > -                                  tlb_timeout_jiffies(gt));
> > > > -       spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > > > +       if (!list_empty(&tlb_inval->pending_fences))
> > > > +               queue_delayed_work(system_wq, &tlb_inval-
> > > > >fence_tdr,
> > > > +                                  timeout_delay);
> > > > +       spin_unlock_irq(&tlb_inval->pending_lock);
> > > >  }
> > > >  
> > > >  /**
> > > > - * xe_tlb_inval_init_early - Initialize TLB invalidation state
> > > > + * xe_gt_tlb_inval_init_early() - Initialize TLB invalidation
> > > > state
> > > >   * @gt: GT structure
> > > >   *
> > > >   * Initialize TLB invalidation state, purely software
> > > > initialization, should
> > > > @@ -123,13 +111,12 @@ static void xe_gt_tlb_fence_timeout(struct
> > > > work_struct *work)
> > > >   */
> > > >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
> > > >  {
> > > > -       gt->tlb_inval.private = gt;
> > > > +       gt->tlb_inval.xe = gt_to_xe(gt);
> > > >         gt->tlb_inval.seqno = 1;
> > > >         INIT_LIST_HEAD(&gt->tlb_inval.pending_fences);
> > > >         spin_lock_init(&gt->tlb_inval.pending_lock);
> > > >         spin_lock_init(&gt->tlb_inval.lock);
> > > > -       INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr,
> > > > -                         xe_gt_tlb_fence_timeout);
> > > > +       INIT_DELAYED_WORK(&gt->tlb_inval.fence_tdr,
> > > > xe_tlb_inval_fence_timeout);
> > > >  
> > > >         gt->tlb_inval.job_wq =
> > > >                 drmm_alloc_ordered_workqueue(&gt_to_xe(gt)->drm,
> > > > "gt-tbl-inval-job-wq",
> > > > @@ -137,60 +124,64 @@ int xe_gt_tlb_inval_init_early(struct xe_gt
> > > > *gt)
> > > >         if (IS_ERR(gt->tlb_inval.job_wq))
> > > >                 return PTR_ERR(gt->tlb_inval.job_wq);
> > > >  
> > > > +       /* XXX: Blindly setting up backend to GuC */
> > > > +       xe_guc_tlb_inval_init_early(&gt->uc.guc, &gt->tlb_inval);
> > > > +
> > > >         return 0;
> > > >  }
> > > >  
> > > >  /**
> > > > - * xe_tlb_inval_reset - Initialize TLB invalidation reset
> > > > + * xe_tlb_inval_reset() - TLB invalidation reset
> > > >   * @tlb_inval: TLB invalidation client
> > > >   *
> > > >   * Signal any pending invalidation fences, should be called
> > > > during a GT reset
> > > >   */
> > > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
> > > >  {
> > > > -       struct xe_gt *gt = tlb_inval->private;
> > > >         struct xe_tlb_inval_fence *fence, *next;
> > > >         int pending_seqno;
> > > >  
> > > >         /*
> > > > -        * we can get here before the CTs are even initialized if
> > > > we're wedging
> > > > -        * very early, in which case there are not going to be
> > > > any pending
> > > > -        * fences so we can bail immediately.
> > > > +        * we can get here before the backends are even
> > > > initialized if we're
> > > > +        * wedging very early, in which case there are not going
> > > > to be any
> > > > +        * pendind fences so we can bail immediately.
> > > >          */
> > > > -       if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> > > > +       if (!tlb_inval->ops->initialized(tlb_inval))
> > > >                 return;
> > > >  
> > > >         /*
> > > > -        * CT channel is already disabled at this point. No new
> > > > TLB requests can
> > > > +        * Backend is already disabled at this point. No new TLB
> > > > requests can
> > > >          * appear.
> > > >          */
> > > >  
> > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > -       spin_lock_irq(&gt->tlb_inval.pending_lock);
> > > > -       cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > > > +       tlb_inval->ops->lock(tlb_inval);
> > > 
> > > I think you want a dedicated lock embedded in struct xe_tlb_inval,
> > > rather than reaching into the backend to grab one.
> > > 
> > > This will deadlock as written: G2H TLB inval messages are sometimes
> > > processed while holding ct->lock (non-fast path, unlikely) and
> > > sometimes
> > > without it (fast path, likely).
> > 
> > Ugh, I'm off today. Ignore the deadlock part, I was confusing
> > myself...
> > I was thinking this was the function xe_tlb_inval_done_handler, it is
> > > not. I still think xe_tlb_inval should have its own lock, but this patch
> > > as written should work with s/xe_guc_ct_send/xe_guc_ct_send_locked.
> 
> So one reason I didn't go that way is we did just the reverse recently
> - moved from a TLB dedicated lock to the more specific CT lock since
> these are all going into the CT handler anyway when we use GuC
> submission. Then this embedded version allows us to lock at the bottom
> data layer rather than having a separate lock in the upper layer.
> Another thing is we might want to have different types of invalidation
> running in parallel without locking the data in the upper layer since
> the real contention would be in the lower level pipelining anyway.
> 

I can see the reasoning behind this approach, and maybe it’s fine.

But consider the case where the GuC backend has to look up a VM, iterate
over a list of exec queues, and send multiple H2Gs to the hardware, each
with a corresponding G2H (per-context invalidations). In the worst case,
the CT code may have to wait for and process some G2Hs because our G2H
credits are exhausted—all while holding the CT lock, which currently
blocks any hardware submissions (i.e., hardware submissions need the CT
lock). Now imagine multiple sources issuing invalidations: they could
grab the CT lock before a submission waiting on it, further delaying that
submission. 

The longer a mutex is held, the more likely the CPU thread holding it
could be switched out while holding it.

This doesn’t seem scalable compared to using a finer-grained CT lock
(e.g., only taking it in xe_guc_ct_send).

I’m not saying this won’t work as you have it—I think it will—but the
consequences of holding the CT lock for an extended period need to be
considered.
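
For illustration only, here is a rough user-space sketch of the
finer-grained alternative discussed above: the frontend owns a dedicated
seqno_lock, and the backend takes its transport lock only around the
send itself. This is not driver code. Every sketch_* name, the pthread
mutexes standing in for the kernel locks, and the tiny SKETCH_SEQNO_MAX
value are assumptions made up for the example.

```c
#include <assert.h>
#include <pthread.h>

#define SKETCH_SEQNO_MAX 8	/* hypothetical, tiny so the wrap is visible */

/* Hypothetical backend; ct_lock stands in for the GuC CT lock. */
struct sketch_backend {
	pthread_mutex_t ct_lock;
	int last_sent;		/* last seqno handed to the transport */
	int sends;		/* number of sends issued */
};

/* Hypothetical frontend with its own lock, as suggested in the review. */
struct sketch_tlb_inval {
	pthread_mutex_t seqno_lock;	/* orders seqno assignment and handoff */
	int seqno;			/* next seqno to assign, never 0 */
	struct sketch_backend *be;
};

/* The backend takes the transport lock only for the send itself. */
static int sketch_backend_send(struct sketch_backend *be, int seqno)
{
	pthread_mutex_lock(&be->ct_lock);
	be->last_sent = seqno;
	be->sends++;
	pthread_mutex_unlock(&be->ct_lock);
	return 0;
}

/* Assign a seqno and hand it to the backend under the frontend lock. */
static int sketch_tlb_inval_issue(struct sketch_tlb_inval *ti, int *seqno)
{
	int ret;

	pthread_mutex_lock(&ti->seqno_lock);
	*seqno = ti->seqno;
	ti->seqno = (ti->seqno + 1) % SKETCH_SEQNO_MAX;
	if (!ti->seqno)
		ti->seqno = 1;	/* seqno 0 is never handed out */
	ret = sketch_backend_send(ti->be, *seqno);
	pthread_mutex_unlock(&ti->seqno_lock);

	return ret;
}

/* Issue n invalidations on a fresh frontend; return the last seqno used. */
static int sketch_issue_n(int n)
{
	struct sketch_backend be = { .last_sent = 0, .sends = 0 };
	struct sketch_tlb_inval ti = { .seqno = 1, .be = &be };
	int seqno = 0;

	pthread_mutex_init(&be.ct_lock, NULL);
	pthread_mutex_init(&ti.seqno_lock, NULL);

	for (int i = 0; i < n; i++) {
		int ret = sketch_tlb_inval_issue(&ti, &seqno);

		assert(ret == 0);
		assert(be.last_sent == seqno);	/* handed over in assignment order */
	}
	assert(be.sends == n);

	pthread_mutex_destroy(&ti.seqno_lock);
	pthread_mutex_destroy(&be.ct_lock);
	return seqno;
}
```

In this shape the transport lock is held only for the duration of one
send, so a submission waiting on it is delayed by at most one message
rather than by a whole multi-message invalidation sequence, while
seqno_lock still keeps assignment and handoff in the same order.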

Matt

> Thanks,
> Stuart
> 
> > 
> > Matt 
> > 
> > > 
> > > I’d call this lock seqno_lock, since it protects exactly that—the
> > > order
> > > in which a seqno is assigned by the frontend and handed to the
> > > backend.
> > > 
> > > Prime this lock for reclaim as well—do what primelockdep() does in
> > > xe_guc_ct.c—to make it clear that memory allocations are not
> > > allowed
> > > while the lock is held as TLB invalidations can be called from two
> > > reclaim paths:
> > > 
> > > - MMU notifier callbacks
> > > - The dma-fence signaling path of VM binds that require a TLB
> > >   invalidation
> > > 
> > > > +       spin_lock_irq(&tlb_inval->pending_lock);
> > > > +       cancel_delayed_work(&tlb_inval->fence_tdr);
> > > >         /*
> > > >          * We might have various kworkers waiting for TLB flushes
> > > > to complete
> > > >          * which are not tracked with an explicit TLB fence,
> > > > however at this
> > > > -        * stage that will never happen since the CT is already
> > > > disabled, so
> > > > -        * make sure we signal them here under the assumption
> > > > that we have
> > > > +        * stage that will never happen since the backend is
> > > > already disabled,
> > > > +        * so make sure we signal them here under the assumption
> > > > that we have
> > > >          * completed a full GT reset.
> > > >          */
> > > > -       if (gt->tlb_inval.seqno == 1)
> > > > +       if (tlb_inval->seqno == 1)
> > > >                 pending_seqno = TLB_INVALIDATION_SEQNO_MAX - 1;
> > > >         else
> > > > -               pending_seqno = gt->tlb_inval.seqno - 1;
> > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv, pending_seqno);
> > > > +               pending_seqno = tlb_inval->seqno - 1;
> > > > +       WRITE_ONCE(tlb_inval->seqno_recv, pending_seqno);
> > > >  
> > > >         list_for_each_entry_safe(fence, next,
> > > > -                                &gt->tlb_inval.pending_fences,
> > > > link)
> > > > -               inval_fence_signal(gt_to_xe(gt), fence);
> > > > -       spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > +                                &tlb_inval->pending_fences,
> > > > link)
> > > > +               xe_tlb_inval_fence_signal(fence);
> > > > +       spin_unlock_irq(&tlb_inval->pending_lock);
> > > > +       tlb_inval->ops->unlock(tlb_inval);
> > > >  }
> > > >  
> > > > -static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
> > > > +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval
> > > > *tlb_inval, int seqno)
> > > >  {
> > > > -       int seqno_recv = READ_ONCE(gt->tlb_inval.seqno_recv);
> > > > +       int seqno_recv = READ_ONCE(tlb_inval->seqno_recv);
> > > > +
> > > > +       lockdep_assert_held(&tlb_inval->pending_lock);
> > > >  
> > > >         if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX /
> > > > 2))
> > > >                 return false;
> > > > @@ -201,44 +192,20 @@ static bool tlb_inval_seqno_past(struct
> > > > xe_gt *gt, int seqno)
> > > >         return seqno_recv >= seqno;
> > > >  }
> > > >  
> > > > -static int send_tlb_inval(struct xe_guc *guc, const u32 *action,
> > > > int len)
> > > > -{
> > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > -
> > > > -       xe_gt_assert(gt, action[1]);    /* Seqno */
> > > > -       lockdep_assert_held(&guc->ct.lock);
> > > > -
> > > > -       /*
> > > > -        * XXX: The seqno algorithm relies on TLB invalidation
> > > > being processed
> > > > -        * in order which they currently are, if that changes the
> > > > algorithm will
> > > > -        * need to be updated.
> > > > -        */
> > > > -
> > > > -       xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> > > > -
> > > > -       return xe_guc_ct_send(&guc->ct, action, len,
> > > > -                             G2H_LEN_DW_TLB_INVALIDATE, 1);
> > > > -}
> > > > -
> > > >  static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence
> > > > *fence)
> > > >  {
> > > >         struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > -
> > > > -       lockdep_assert_held(&gt->uc.guc.ct.lock);
> > > >  
> > > >         fence->seqno = tlb_inval->seqno;
> > > > -       trace_xe_tlb_inval_fence_send(xe, fence);
> > > > +       trace_xe_tlb_inval_fence_send(tlb_inval->xe, fence);
> > > >  
> > > >         spin_lock_irq(&tlb_inval->pending_lock);
> > > >         fence->inval_time = ktime_get();
> > > >         list_add_tail(&fence->link, &tlb_inval->pending_fences);
> > > >  
> > > >         if (list_is_singular(&tlb_inval->pending_fences))
> > > > -               queue_delayed_work(system_wq,
> > > > -                                  &tlb_inval->fence_tdr,
> > > > -                                  tlb_timeout_jiffies(gt));
> > > > +               queue_delayed_work(system_wq, &tlb_inval-
> > > > >fence_tdr,
> > > > +                                  tlb_inval->ops-
> > > > >timeout_delay(tlb_inval));
> > > >         spin_unlock_irq(&tlb_inval->pending_lock);
> > > >  
> > > >         tlb_inval->seqno = (tlb_inval->seqno + 1) %
> > > > @@ -247,202 +214,63 @@ static void xe_tlb_inval_fence_prep(struct
> > > > xe_tlb_inval_fence *fence)
> > > >                 tlb_inval->seqno = 1;
> > > >  }
> > > >  
> > > > -#define MAKE_INVAL_OP(type)    ((type <<
> > > > XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > > > -               XE_GUC_TLB_INVAL_MODE_HEAVY <<
> > > > XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > > > -               XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > > > -
> > > > -static int send_tlb_inval_ggtt(struct xe_gt *gt, int seqno)
> > > > -{
> > > > -       u32 action[] = {
> > > > -               XE_GUC_ACTION_TLB_INVALIDATION,
> > > > -               seqno,
> > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > > > -       };
> > > > -
> > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > ARRAY_SIZE(action));
> > > > -}
> > > > -
> > > > -static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > -                             struct xe_tlb_inval_fence *fence)
> > > > -{
> > > > -       u32 action[] = {
> > > > -               XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > > > -               0,  /* seqno, replaced in send_tlb_inval */
> > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > > -       };
> > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > -
> > > > -       xe_gt_assert(gt, fence);
> > > > -
> > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > ARRAY_SIZE(action));
> > > > -}
> > > > +#define xe_tlb_inval_issue(__tlb_inval, __fence, op, args...)  \
> > > > +({                                                             \
> > > > +       int __ret;                                              \
> > > > +                                                               \
> > > > +       xe_assert((__tlb_inval)->xe, (__tlb_inval)->ops);       \
> > > > +       xe_assert((__tlb_inval)->xe, (__fence));                \
> > > > +                                                               \
> > > > +       (__tlb_inval)->ops->lock((__tlb_inval));                \
> > > > +       xe_tlb_inval_fence_prep((__fence));                     \
> > > > +       __ret = op((__tlb_inval), (__fence)->seqno, ##args);    \
> > > > +       if (__ret < 0)                                          \
> > > > +               xe_tlb_inval_fence_signal_unlocked((__fence));  \
> > > > +       (__tlb_inval)->ops->unlock((__tlb_inval));              \
> > > > +                                                               \
> > > > +       __ret == -ECANCELED ? 0 : __ret;                        \
> > > > +})
> > > >  
> > > >  /**
> > > > - * xe_gt_tlb_invalidation_all - Invalidate all TLBs across PF
> > > > and all VFs.
> > > > - * @gt: the &xe_gt structure
> > > > - * @fence: the &xe_tlb_inval_fence to be signaled on completion
> > > > + * xe_tlb_inval_all() - Issue a TLB invalidation for all TLBs
> > > > + * @tlb_inval: TLB invalidation client
> > > > + * @fence: invalidation fence which will be signal on TLB
> > > > invalidation
> > > > + * completion
> > > >   *
> > > > - * Send a request to invalidate all TLBs across PF and all VFs.
> > > > + * Issue a TLB invalidation for all TLBs. Completion of TLB is
> > > > asynchronous and
> > > > + * caller can use the invalidation fence to wait for completion.
> > > >   *
> > > >   * Return: 0 on success, negative error code on error
> > > >   */
> > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > >                      struct xe_tlb_inval_fence *fence)
> > > >  {
> > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > -       int err;
> > > > -
> > > > -       err = send_tlb_inval_all(tlb_inval, fence);
> > > > -       if (err)
> > > > -               xe_gt_err(gt, "TLB invalidation request failed
> > > > (%pe)", ERR_PTR(err));
> > > > -
> > > > -       return err;
> > > > -}
> > > > -
> > > > -/*
> > > > - * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > > > - * Note that roundup_pow_of_two() operates on unsigned long,
> > > > - * not on u64.
> > > > - */
> > > > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH
> > > > (rounddown_pow_of_two(ULONG_MAX))
> > > > -
> > > > -static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64 start, u64
> > > > end,
> > > > -                               u32 asid, int seqno)
> > > > -{
> > > > -#define MAX_TLB_INVALIDATION_LEN       7
> > > > -       u32 action[MAX_TLB_INVALIDATION_LEN];
> > > > -       u64 length = end - start;
> > > > -       int len = 0;
> > > > -
> > > > -       action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > > > -       action[len++] = seqno;
> > > > -       if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > > -           length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > > > -               action[len++] =
> > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > > > -       } else {
> > > > -               u64 orig_start = start;
> > > > -               u64 align;
> > > > -
> > > > -               if (length < SZ_4K)
> > > > -                       length = SZ_4K;
> > > > -
> > > > -               /*
> > > > -                * We need to invalidate a higher granularity if
> > > > start address
> > > > -                * is not aligned to length. When start is not
> > > > aligned with
> > > > -                * length we need to find the length large enough
> > > > to create an
> > > > -                * address mask covering the required range.
> > > > -                */
> > > > -               align = roundup_pow_of_two(length);
> > > > -               start = ALIGN_DOWN(start, align);
> > > > -               end = ALIGN(end, align);
> > > > -               length = align;
> > > > -               while (start + length < end) {
> > > > -                       length <<= 1;
> > > > -                       start = ALIGN_DOWN(orig_start, length);
> > > > -               }
> > > > -
> > > > -               /*
> > > > -                * Minimum invalidation size for a 2MB page that
> > > > the hardware
> > > > -                * expects is 16MB
> > > > -                */
> > > > -               if (length >= SZ_2M) {
> > > > -                       length = max_t(u64, SZ_16M, length);
> > > > -                       start = ALIGN_DOWN(orig_start, length);
> > > > -               }
> > > > -
> > > > -               xe_gt_assert(gt, length >= SZ_4K);
> > > > -               xe_gt_assert(gt, is_power_of_2(length));
> > > > -               xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M)
> > > > - 1,
> > > > -                                                   ilog2(SZ_2M)
> > > > + 1)));
> > > > -               xe_gt_assert(gt, IS_ALIGNED(start, length));
> > > > -
> > > > -               action[len++] =
> > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > > > -               action[len++] = asid;
> > > > -               action[len++] = lower_32_bits(start);
> > > > -               action[len++] = upper_32_bits(start);
> > > > -               action[len++] = ilog2(length) - ilog2(SZ_4K);
> > > > -       }
> > > > -
> > > > -       xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> > > > -
> > > > -       return send_tlb_inval(&gt->uc.guc, action, len);
> > > > -}
> > > > -
> > > > -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> > > > -                              struct xe_tlb_inval_fence *fence)
> > > > -{
> > > > -       int ret;
> > > > -
> > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > -
> > > > -       xe_tlb_inval_fence_prep(fence);
> > > > -
> > > > -       ret = send_tlb_inval_ggtt(gt, fence->seqno);
> > > > -       if (ret < 0)
> > > > -               inval_fence_signal_unlocked(gt_to_xe(gt), fence);
> > > > -
> > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > -
> > > > -       /*
> > > > -        * -ECANCELED indicates the CT is stopped for a GT reset.
> > > > TLB caches
> > > > -        *  should be nuked on a GT reset so this error can be
> > > > ignored.
> > > > -        */
> > > > -       if (ret == -ECANCELED)
> > > > -               return 0;
> > > > -
> > > > -       return ret;
> > > > +       return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval-
> > > > >ops->all);
> > > >  }
> > > >  
> > > >  /**
> > > > - * xe_tlb_inval_ggtt - Issue a TLB invalidation on this GT for
> > > > the GGTT
> > > > + * xe_tlb_inval_ggtt() - Issue a TLB invalidation for the GGTT
> > > >   * @tlb_inval: TLB invalidation client
> > > >   *
> > > > - * Issue a TLB invalidation for the GGTT. Completion of TLB
> > > > invalidation is
> > > > - * synchronous.
> > > > + * Issue a TLB invalidation for the GGTT. Completion of TLB is
> > > > asynchronous and
> > > > + * caller can use the invalidation fence to wait for completion.
> > > >   *
> > > >   * Return: 0 on success, negative error code on error
> > > >   */
> > > >  int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
> > > >  {
> > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > -       unsigned int fw_ref;
> > > > -
> > > > -       if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> > > > -           gt->uc.guc.submission_state.enabled) {
> > > > -               struct xe_tlb_inval_fence fence;
> > > > -               int ret;
> > > > -
> > > > -               xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > > > -               ret = __xe_tlb_inval_ggtt(gt, &fence);
> > > > -               if (ret)
> > > > -                       return ret;
> > > > -
> > > > -               xe_tlb_inval_fence_wait(&fence);
> > > > -       } else if (xe_device_uc_enabled(xe) &&
> > > > !xe_device_wedged(xe)) {
> > > > -               struct xe_mmio *mmio = &gt->mmio;
> > > > -
> > > > -               if (IS_SRIOV_VF(xe))
> > > > -                       return 0;
> > > > -
> > > > -               fw_ref = xe_force_wake_get(gt_to_fw(gt),
> > > > XE_FW_GT);
> > > > -               if (xe->info.platform == XE_PVC ||
> > > > GRAPHICS_VER(xe) >= 20) {
> > > > -                       xe_mmio_write32(mmio,
> > > > PVC_GUC_TLB_INV_DESC1,
> > > > -
> > > >                                        PVC_GUC_TLB_INV_DESC1_INVAL
> > > > IDATE);
> > > > -                       xe_mmio_write32(mmio,
> > > > PVC_GUC_TLB_INV_DESC0,
> > > > -
> > > >                                        PVC_GUC_TLB_INV_DESC0_VALID
> > > > );
> > > > -               } else {
> > > > -                       xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> > > > -
> > > >                                        GUC_TLB_INV_CR_INVALIDATE);
> > > > -               }
> > > > -               xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > > -       }
> > > > +       struct xe_tlb_inval_fence fence, *fence_ptr = &fence;
> > > > +       int ret;
> > > >  
> > > > -       return 0;
> > > > +       xe_tlb_inval_fence_init(tlb_inval, fence_ptr, true);
> > > > +       ret = xe_tlb_inval_issue(tlb_inval, fence_ptr, tlb_inval-
> > > > >ops->ggtt);
> > > > +       xe_tlb_inval_fence_wait(fence_ptr);
> > > > +
> > > > +       return ret;
> > > >  }
> > > >  
> > > >  /**
> > > > - * xe_tlb_inval_range - Issue a TLB invalidation on this GT for
> > > > an address range
> > > > + * xe_tlb_inval_range() - Issue a TLB invalidation for an
> > > > address range
> > > >   * @tlb_inval: TLB invalidation client
> > > >   * @fence: invalidation fence which will be signal on TLB
> > > > invalidation
> > > >   * completion
> > > > @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct xe_tlb_inval
> > > > *tlb_inval,
> > > >                        struct xe_tlb_inval_fence *fence, u64
> > > > start, u64 end,
> > > >                        u32 asid)
> > > >  {
> > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > -       int  ret;
> > > > -
> > > > -       xe_gt_assert(gt, fence);
> > > > -
> > > > -       /* Execlists not supported */
> > > > -       if (xe->info.force_execlist) {
> > > > -               __inval_fence_signal(xe, fence);
> > > > -               return 0;
> > > > -       }
> > > > -
> > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > -
> > > > -       xe_tlb_inval_fence_prep(fence);
> > > > -
> > > > -       ret = send_tlb_inval_ppgtt(gt, start, end, asid, fence-
> > > > >seqno);
> > > > -       if (ret < 0)
> > > > -               inval_fence_signal_unlocked(xe, fence);
> > > > -
> > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > -
> > > > -       return ret;
> > > > +       return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval-
> > > > >ops->ppgtt,
> > > > +                                 start, end, asid);
> > > >  }
> > > >  
> > > >  /**
> > > > - * xe_tlb_inval_vm - Issue a TLB invalidation on this GT for a
> > > > VM
> > > > + * xe_tlb_inval_vm() - Issue a TLB invalidation for a VM
> > > >   * @tlb_inval: TLB invalidation client
> > > >   * @vm: VM to invalidate
> > > >   *
> > > > @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct xe_tlb_inval
> > > > *tlb_inval, struct xe_vm *vm)
> > > >  {
> > > >         struct xe_tlb_inval_fence fence;
> > > >         u64 range = 1ull << vm->xe->info.va_bits;
> > > > -       int ret;
> > > >  
> > > >         xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > > > -
> > > > -       ret = xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm-
> > > > >usm.asid);
> > > > -       if (ret < 0)
> > > > -               return;
> > > > -
> > > > +       xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm-
> > > > >usm.asid);
> > > >         xe_tlb_inval_fence_wait(&fence);
> > > >  }
> > > >  
> > > >  /**
> > > > - * xe_tlb_inval_done_handler - TLB invalidation done handler
> > > > - * @gt: gt
> > > > + * xe_tlb_inval_done_handler() - TLB invalidation done handler
> > > > + * @tlb_inval: TLB invalidation client
> > > >   * @seqno: seqno of invalidation that is done
> > > >   *
> > > >   * Update recv seqno, signal any TLB invalidation fences, and
> > > > restart TDR
> > > 
> > > > I'd mention that this function is safe to be called from any context
> > > > (i.e., process, atomic, and hardirq contexts are allowed).
> > > 
> > > We might need to convert tlb_inval.pending_lock to a raw_spinlock_t
> > > for
> > > PREEMPT_RT enablement. Same for the GuC fast_lock. AFAIK we haven’t
> > > had
> > > any complaints, so maybe I’m just overthinking it, but also perhaps
> > > not.
> > > 
> > > >   */
> > > > -static void xe_tlb_inval_done_handler(struct xe_gt *gt, int
> > > > seqno)
> > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval,
> > > > int seqno)
> > > >  {
> > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > +       struct xe_device *xe = tlb_inval->xe;
> > > >         struct xe_tlb_inval_fence *fence, *next;
> > > >         unsigned long flags;
> > > >  
> > > > @@ -535,77 +337,53 @@ static void
> > > > xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> > > >          * officially process the CT message like if racing
> > > > against
> > > >          * process_g2h_msg().
> > > >          */
> > > > -       spin_lock_irqsave(&gt->tlb_inval.pending_lock, flags);
> > > > -       if (tlb_inval_seqno_past(gt, seqno)) {
> > > > -               spin_unlock_irqrestore(&gt-
> > > > >tlb_inval.pending_lock, flags);
> > > > +       spin_lock_irqsave(&tlb_inval->pending_lock, flags);
> > > > +       if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) {
> > > > +               spin_unlock_irqrestore(&tlb_inval->pending_lock,
> > > > flags);
> > > >                 return;
> > > >         }
> > > >  
> > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> > > > +       WRITE_ONCE(tlb_inval->seqno_recv, seqno);
> > > >  
> > > >         list_for_each_entry_safe(fence, next,
> > > > -                                &gt->tlb_inval.pending_fences,
> > > > link) {
> > > > +                                &tlb_inval->pending_fences,
> > > > link) {
> > > >                 trace_xe_tlb_inval_fence_recv(xe, fence);
> > > >  
> > > > -               if (!tlb_inval_seqno_past(gt, fence->seqno))
> > > > +               if (!xe_tlb_inval_seqno_past(tlb_inval, fence-
> > > > >seqno))
> > > >                         break;
> > > >  
> > > > -               inval_fence_signal(xe, fence);
> > > > +               xe_tlb_inval_fence_signal(fence);
> > > >         }
> > > >  
> > > > -       if (!list_empty(&gt->tlb_inval.pending_fences))
> > > > +       if (!list_empty(&tlb_inval->pending_fences))
> > > >                 mod_delayed_work(system_wq,
> > > > -                                &gt->tlb_inval.fence_tdr,
> > > > -                                tlb_timeout_jiffies(gt));
> > > > +                                &tlb_inval->fence_tdr,
> > > > +                                tlb_inval->ops-
> > > > >timeout_delay(tlb_inval));
> > > >         else
> > > > -               cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > > > +               cancel_delayed_work(&tlb_inval->fence_tdr);
> > > >  
> > > > -       spin_unlock_irqrestore(&gt->tlb_inval.pending_lock,
> > > > flags);
> > > > -}
> > > > -
> > > > -/**
> > > > - * xe_guc_tlb_inval_done_handler - TLB invalidation done handler
> > > > - * @guc: guc
> > > > - * @msg: message indicating TLB invalidation done
> > > > - * @len: length of message
> > > > - *
> > > > - * Parse seqno of TLB invalidation, wake any waiters for seqno,
> > > > and signal any
> > > > - * invalidation fences for seqno. Algorithm for this depends on
> > > > seqno being
> > > > - * received in-order and asserts this assumption.
> > > > - *
> > > > - * Return: 0 on success, -EPROTO for malformed messages.
> > > > - */
> > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg,
> > > > u32 len)
> > > > -{
> > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > -
> > > > -       if (unlikely(len != 1))
> > > > -               return -EPROTO;
> > > > -
> > > > -       xe_tlb_inval_done_handler(gt, msg[0]);
> > > > -
> > > > -       return 0;
> > > > +       spin_unlock_irqrestore(&tlb_inval->pending_lock, flags);
> > > >  }
> > > >  
> > > >  static const char *
> > > > -inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > > > +xe_inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > > >  {
> > > >         return "xe";
> > > >  }
> > > >  
> > > >  static const char *
> > > > -inval_fence_get_timeline_name(struct dma_fence *dma_fence)
> > > > +xe_inval_fence_get_timeline_name(struct dma_fence *dma_fence)
> > > >  {
> > > > -       return "inval_fence";
> > > > +       return "tlb_inval_fence";
> > > >  }
> > > >  
> > > >  static const struct dma_fence_ops inval_fence_ops = {
> > > > -       .get_driver_name = inval_fence_get_driver_name,
> > > > -       .get_timeline_name = inval_fence_get_timeline_name,
> > > > +       .get_driver_name = xe_inval_fence_get_driver_name,
> > > > +       .get_timeline_name = xe_inval_fence_get_timeline_name,
> > > >  };
> > > >  
> > > >  /**
> > > > - * xe_tlb_inval_fence_init - Initialize TLB invalidation fence
> > > > + * xe_tlb_inval_fence_init() - Initialize TLB invalidation fence
> > > >   * @tlb_inval: TLB invalidation client
> > > >   * @fence: TLB invalidation fence to initialize
> > > >   * @stack: fence is stack variable
> > > > @@ -618,15 +396,12 @@ void xe_tlb_inval_fence_init(struct
> > > > xe_tlb_inval *tlb_inval,
> > > >                              struct xe_tlb_inval_fence *fence,
> > > >                              bool stack)
> > > >  {
> > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > -
> > > > -       xe_pm_runtime_get_noresume(gt_to_xe(gt));
> > > > +       xe_pm_runtime_get_noresume(tlb_inval->xe);
> > > >  
> > > > -       spin_lock_irq(&gt->tlb_inval.lock);
> > > > -       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > -                      &gt->tlb_inval.lock,
> > > > +       spin_lock_irq(&tlb_inval->lock);
> > > > +       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > &tlb_inval->lock,
> > > >                        dma_fence_context_alloc(1), 1);
> > > > -       spin_unlock_irq(&gt->tlb_inval.lock);
> > > > +       spin_unlock_irq(&tlb_inval->lock);
> > > 
> > > While here, 'fence_lock' is probably a better name.
> > > 
> > > Matt
> > > 
> > > >         INIT_LIST_HEAD(&fence->link);
> > > >         if (stack)
> > > >                 set_bit(FENCE_STACK_BIT, &fence->base.flags);
> > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > index 7adee3f8c551..cdeafc8d4391 100644
> > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > @@ -18,24 +18,30 @@ struct xe_vma;
> > > >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
> > > >  
> > > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
> > > > -int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct
> > > > xe_vm *vm);
> > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > >                      struct xe_tlb_inval_fence *fence);
> > > > +int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct
> > > > xe_vm *vm);
> > > >  int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
> > > >                        struct xe_tlb_inval_fence *fence,
> > > >                        u64 start, u64 end, u32 asid);
> > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32 *msg,
> > > > u32 len);
> > > >  
> > > >  void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
> > > >                              struct xe_tlb_inval_fence *fence,
> > > >                              bool stack);
> > > > -void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence
> > > > *fence);
> > > >  
> > > > +/**
> > > > > + * xe_tlb_inval_fence_wait() - TLB invalidation fence wait
> > > > > + * @fence: TLB invalidation fence to wait on
> > > > > + *
> > > > > + * Wait on a TLB invalidation fence until it signals, non-interruptible
> > > > + */
> > > >  static inline void
> > > >  xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence *fence)
> > > >  {
> > > >         dma_fence_wait(&fence->base, false);
> > > >  }
> > > >  
> > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval,
> > > > int seqno);
> > > > +
> > > >  #endif /* _XE_TLB_INVAL_ */
> > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > index 05b6adc929bb..c1ad96d24fc8 100644
> > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > @@ -9,10 +9,85 @@
> > > >  #include <linux/workqueue.h>
> > > >  #include <linux/dma-fence.h>
> > > >  
> > > > -/** struct xe_tlb_inval - TLB invalidation client */
> > > > +struct xe_tlb_inval;
> > > > +
> > > > +/** struct xe_tlb_inval_ops - TLB invalidation ops (backend) */
> > > > +struct xe_tlb_inval_ops {
> > > > +       /**
> > > > +        * @all: Invalidate all TLBs
> > > > +        * @tlb_inval: TLB invalidation client
> > > > +        * @seqno: Seqno of TLB invalidation
> > > > +        *
> > > > +        * Return 0 on success, -ECANCELED if backend is mid-
> > > > reset, error on
> > > > +        * failure
> > > > +        */
> > > > +       int (*all)(struct xe_tlb_inval *tlb_inval, u32 seqno);
> > > > +
> > > > +       /**
> > > > +        * @ggtt: Invalidate global translation TLBs
> > > > +        * @tlb_inval: TLB invalidation client
> > > > +        * @seqno: Seqno of TLB invalidation
> > > > +        *
> > > > +        * Return 0 on success, -ECANCELED if backend is mid-
> > > > reset, error on
> > > > +        * failure
> > > > +        */
> > > > +       int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32 seqno);
> > > > +
> > > > +       /**
> > > > > +        * @ppgtt: Invalidate per-process translation TLBs
> > > > +        * @tlb_inval: TLB invalidation client
> > > > +        * @seqno: Seqno of TLB invalidation
> > > > +        * @start: Start address
> > > > +        * @end: End address
> > > > +        * @asid: Address space ID
> > > > +        *
> > > > +        * Return 0 on success, -ECANCELED if backend is mid-
> > > > reset, error on
> > > > +        * failure
> > > > +        */
> > > > +       int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32 seqno,
> > > > u64 start,
> > > > +                    u64 end, u32 asid);
> > > > +
> > > > +       /**
> > > > +        * @initialized: Backend is initialized
> > > > +        * @tlb_inval: TLB invalidation client
> > > > +        *
> > > > > +        * Return: True if backend is initialized, False otherwise
> > > > +        */
> > > > +       bool (*initialized)(struct xe_tlb_inval *tlb_inval);
> > > > +
> > > > +       /**
> > > > +        * @flush: Flush pending TLB invalidations
> > > > +        * @tlb_inval: TLB invalidation client
> > > > +        */
> > > > +       void (*flush)(struct xe_tlb_inval *tlb_inval);
> > > > +
> > > > +       /**
> > > > +        * @timeout_delay: Timeout delay for TLB invalidation
> > > > +        * @tlb_inval: TLB invalidation client
> > > > +        *
> > > > +        * Return: Timeout delay for TLB invalidation in jiffies
> > > > +        */
> > > > +       long (*timeout_delay)(struct xe_tlb_inval *tlb_inval);
> > > > +
> > > > +       /**
> > > > +        * @lock: Lock resources protecting the backend seqno
> > > > management
> > > > +        */
> > > > +       void (*lock)(struct xe_tlb_inval *tlb_inval);
> > > > +
> > > > +       /**
> > > > > +        * @unlock: Unlock resources protecting the backend seqno management
> > > > +        */
> > > > +       void (*unlock)(struct xe_tlb_inval *tlb_inval);
> > > > +};
> > > > +
> > > > +/** struct xe_tlb_inval - TLB invalidation client (frontend) */
> > > >  struct xe_tlb_inval {
> > > >         /** @private: Backend private pointer */
> > > >         void *private;
> > > > +       /** @xe: Pointer to Xe device */
> > > > +       struct xe_device *xe;
> > > > +       /** @ops: TLB invalidation ops */
> > > > +       const struct xe_tlb_inval_ops *ops;
> > > >         /** @tlb_inval.seqno: TLB invalidation seqno, protected
> > > > by CT lock */
> > > >  #define TLB_INVALIDATION_SEQNO_MAX     0x100000
> > > >         int seqno;
> > > > -- 
> > > > 2.34.1
> > > > 
> 


* Re: [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 20:47         ` Matthew Brost
@ 2025-07-23 20:55           ` Summers, Stuart
  2025-07-23 21:22             ` Matthew Brost
  0 siblings, 1 reply; 19+ messages in thread
From: Summers, Stuart @ 2025-07-23 20:55 UTC (permalink / raw)
  To: Brost, Matthew
  Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
	Kassabri, Farah, Auld, Matthew

On Wed, 2025-07-23 at 13:47 -0700, Matthew Brost wrote:
> 

<cut>
(just to reduce the noise in the rest of the patch here for now...)

> > > > >  
> > > > >  /**
> > > > > - * xe_tlb_inval_reset - Initialize TLB invalidation reset
> > > > > + * xe_tlb_inval_reset() - TLB invalidation reset
> > > > >   * @tlb_inval: TLB invalidation client
> > > > >   *
> > > > >   * Signal any pending invalidation fences, should be called
> > > > > during a GT reset
> > > > >   */
> > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
> > > > >  {
> > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > >         int pending_seqno;
> > > > >  
> > > > >         /*
> > > > > -        * we can get here before the CTs are even
> > > > > initialized if
> > > > > we're wedging
> > > > > -        * very early, in which case there are not going to
> > > > > be
> > > > > any pending
> > > > > -        * fences so we can bail immediately.
> > > > > +        * we can get here before the backends are even
> > > > > initialized if we're
> > > > > +        * wedging very early, in which case there are not
> > > > > going
> > > > > to be any
> > > > > > +        * pending fences so we can bail immediately.
> > > > >          */
> > > > > -       if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> > > > > +       if (!tlb_inval->ops->initialized(tlb_inval))
> > > > >                 return;
> > > > >  
> > > > >         /*
> > > > > -        * CT channel is already disabled at this point. No
> > > > > new
> > > > > TLB requests can
> > > > > +        * Backend is already disabled at this point. No new
> > > > > TLB
> > > > > requests can
> > > > >          * appear.
> > > > >          */
> > > > >  
> > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > -       spin_lock_irq(&gt->tlb_inval.pending_lock);
> > > > > -       cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > > > > +       tlb_inval->ops->lock(tlb_inval);
> > > > 
> > > > I think you want a dedicated lock embedded in struct
> > > > xe_tlb_inval,
> > > > rather than reaching into the backend to grab one.
> > > > 
> > > > This will deadlock as written: G2H TLB inval messages are
> > > > sometimes
> > > > processed while holding ct->lock (non-fast path, unlikely) and
> > > > sometimes
> > > > without it (fast path, likely).
> > > 
> > > > Ugh, I'm off today. Ignore the deadlock part, I was confusing
> > > > myself... I was thinking this was the function
> > > > xe_tlb_inval_done_handler; it is not. I still think xe_tlb_inval
> > > > should have its own lock, but this patch as written should work
> > > > with s/xe_guc_ct_send/xe_guc_ct_send_locked.
> > 
> > So one reason I didn't go that way is we did just the reverse
> > recently
> > - moved from a TLB dedicated lock to the more specific CT lock
> > since
> > these are all going into the CT handler anyway when we use GuC
> > submission. Then this embedded version allows us to lock at the
> > bottom
> > data layer rather than having a separate lock in the upper layer.
> > Another thing is we might want to have different types of
> > invalidation
> > running in parallel without locking the data in the upper layer
> > since
> > the real contention would be in the lower level pipelining anyway.
> > 
> 
> I can see the reasoning behind this approach, and maybe it’s fine.
> 
> But consider the case where the GuC backend has to look up a VM,
> iterate
> over a list of exec queues, and send multiple H2Gs to the hardware,
> each
> with a corresponding G2H (per-context invalidations). In the worst
> case,
> the CT code may have to wait for and process some G2Hs because our
> G2H
> credits are exhausted—all while holding the CT lock, which currently
> blocks any hardware submissions (i.e., hardware submissions need the
> CT
> lock). Now imagine multiple sources issuing invalidations: they could
> grab the CT lock before a submission waiting on it, further delaying
> that
> submission. 
> 
> The longer a mutex is held, the more likely it is that the CPU thread
> holding it gets switched out while holding it.
> 
> This doesn’t seem scalable compared to using a finer-grained CT lock
> (e.g., only taking it in xe_guc_ct_send).
> 
> I’m not saying this won’t work as you have it—I think it will—but the
> consequences of holding the CT lock for an extended period need to be
> considered.

A couple more thoughts. In the case you mentioned, ideally I'd like to
have just a single invalidation per request, rather than across a
whole VM. That's the reason we have the range-based invalidation to
begin with. If we get to the point where we want to make that even
finer, that's great, but we should still have just a single
invalidation per request (again, ideally).

Also, you already have some patches up on the list that coalesce
invalidations, so we reduce the number of invalidations for multiple
ranges. I didn't want to include those patches here because IMO they
are really a separate feature, and it'd be nice to review that on its
own.

So basically, the per-request lock here also pushes us to implement
this in a more efficient and precise way, rather than just hammering as
many invalidations over a given range as possible.

And of course we are going to need bigger-hammer invalidations
sometimes (like the full VF invalidation we're doing in the
invalidate_all() routines), but those still fall into the same category
of precision, just with a larger scope (rather than multiple smaller
invalidations).

Thanks,
Stuart

> 
> Matt
> 
> > Thanks,
> > Stuart
> > 
> > > 
> > > Matt 
> > > 
> > > > 
> > > > I’d call this lock seqno_lock, since it protects exactly
> > > > that—the
> > > > order
> > > > in which a seqno is assigned by the frontend and handed to the
> > > > backend.
> > > > 
> > > > Prime this lock for reclaim as well—do what primelockdep() does
> > > > in
> > > > xe_guc_ct.c—to make it clear that memory allocations are not
> > > > allowed
> > > > while the lock is held as TLB invalidations can be called from
> > > > two
> > > > reclaim paths:
> > > > 
> > > > - MMU notifier callbacks
> > > > - The dma-fence signaling path of VM binds that require a TLB
> > > >   invalidation
> > > > 
> > > > > +       spin_lock_irq(&tlb_inval->pending_lock);
> > > > > +       cancel_delayed_work(&tlb_inval->fence_tdr);
> > > > >         /*
> > > > >          * We might have various kworkers waiting for TLB
> > > > > flushes
> > > > > to complete
> > > > >          * which are not tracked with an explicit TLB fence,
> > > > > however at this
> > > > > -        * stage that will never happen since the CT is
> > > > > already
> > > > > disabled, so
> > > > > -        * make sure we signal them here under the assumption
> > > > > that we have
> > > > > +        * stage that will never happen since the backend is
> > > > > already disabled,
> > > > > +        * so make sure we signal them here under the
> > > > > assumption
> > > > > that we have
> > > > >          * completed a full GT reset.
> > > > >          */
> > > > > -       if (gt->tlb_inval.seqno == 1)
> > > > > +       if (tlb_inval->seqno == 1)
> > > > >                 pending_seqno = TLB_INVALIDATION_SEQNO_MAX -
> > > > > 1;
> > > > >         else
> > > > > -               pending_seqno = gt->tlb_inval.seqno - 1;
> > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv, pending_seqno);
> > > > > +               pending_seqno = tlb_inval->seqno - 1;
> > > > > +       WRITE_ONCE(tlb_inval->seqno_recv, pending_seqno);
> > > > >  
> > > > >         list_for_each_entry_safe(fence, next,
> > > > > -                                &gt-
> > > > > >tlb_inval.pending_fences,
> > > > > link)
> > > > > -               inval_fence_signal(gt_to_xe(gt), fence);
> > > > > -       spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > +                                &tlb_inval->pending_fences,
> > > > > link)
> > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > > +       spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > +       tlb_inval->ops->unlock(tlb_inval);
> > > > >  }
> > > > >  
> > > > > -static bool tlb_inval_seqno_past(struct xe_gt *gt, int
> > > > > seqno)
> > > > > +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval
> > > > > *tlb_inval, int seqno)
> > > > >  {
> > > > > -       int seqno_recv = READ_ONCE(gt->tlb_inval.seqno_recv);
> > > > > +       int seqno_recv = READ_ONCE(tlb_inval->seqno_recv);
> > > > > +
> > > > > +       lockdep_assert_held(&tlb_inval->pending_lock);
> > > > >  
> > > > >         if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX
> > > > > /
> > > > > 2))
> > > > >                 return false;
> > > > > @@ -201,44 +192,20 @@ static bool tlb_inval_seqno_past(struct
> > > > > xe_gt *gt, int seqno)
> > > > >         return seqno_recv >= seqno;
> > > > >  }
> > > > >  
> > > > > -static int send_tlb_inval(struct xe_guc *guc, const u32
> > > > > *action,
> > > > > int len)
> > > > > -{
> > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > -
> > > > > -       xe_gt_assert(gt, action[1]);    /* Seqno */
> > > > > -       lockdep_assert_held(&guc->ct.lock);
> > > > > -
> > > > > -       /*
> > > > > -        * XXX: The seqno algorithm relies on TLB
> > > > > invalidation
> > > > > being processed
> > > > > -        * in order which they currently are, if that changes
> > > > > the
> > > > > algorithm will
> > > > > -        * need to be updated.
> > > > > -        */
> > > > > -
> > > > > -       xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> > > > > -
> > > > > -       return xe_guc_ct_send(&guc->ct, action, len,
> > > > > -                             G2H_LEN_DW_TLB_INVALIDATE, 1);
> > > > > -}
> > > > > -
> > > > >  static void xe_tlb_inval_fence_prep(struct
> > > > > xe_tlb_inval_fence
> > > > > *fence)
> > > > >  {
> > > > >         struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > -
> > > > > -       lockdep_assert_held(&gt->uc.guc.ct.lock);
> > > > >  
> > > > >         fence->seqno = tlb_inval->seqno;
> > > > > -       trace_xe_tlb_inval_fence_send(xe, fence);
> > > > > +       trace_xe_tlb_inval_fence_send(tlb_inval->xe, fence);
> > > > >  
> > > > >         spin_lock_irq(&tlb_inval->pending_lock);
> > > > >         fence->inval_time = ktime_get();
> > > > >         list_add_tail(&fence->link, &tlb_inval-
> > > > > >pending_fences);
> > > > >  
> > > > >         if (list_is_singular(&tlb_inval->pending_fences))
> > > > > -               queue_delayed_work(system_wq,
> > > > > -                                  &tlb_inval->fence_tdr,
> > > > > -                                  tlb_timeout_jiffies(gt));
> > > > > +               queue_delayed_work(system_wq, &tlb_inval-
> > > > > > fence_tdr,
> > > > > +                                  tlb_inval->ops-
> > > > > > timeout_delay(tlb_inval));
> > > > >         spin_unlock_irq(&tlb_inval->pending_lock);
> > > > >  
> > > > >         tlb_inval->seqno = (tlb_inval->seqno + 1) %
> > > > > @@ -247,202 +214,63 @@ static void
> > > > > xe_tlb_inval_fence_prep(struct
> > > > > xe_tlb_inval_fence *fence)
> > > > >                 tlb_inval->seqno = 1;
> > > > >  }
> > > > >  
> > > > > -#define MAKE_INVAL_OP(type)    ((type <<
> > > > > XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > > > > -               XE_GUC_TLB_INVAL_MODE_HEAVY <<
> > > > > XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > > > > -               XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > > > > -
> > > > > -static int send_tlb_inval_ggtt(struct xe_gt *gt, int seqno)
> > > > > -{
> > > > > -       u32 action[] = {
> > > > > -               XE_GUC_ACTION_TLB_INVALIDATION,
> > > > > -               seqno,
> > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > > > > -       };
> > > > > -
> > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > ARRAY_SIZE(action));
> > > > > -}
> > > > > -
> > > > > -static int send_tlb_inval_all(struct xe_tlb_inval
> > > > > *tlb_inval,
> > > > > -                             struct xe_tlb_inval_fence
> > > > > *fence)
> > > > > -{
> > > > > -       u32 action[] = {
> > > > > -               XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > > > > -               0,  /* seqno, replaced in send_tlb_inval */
> > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > > > -       };
> > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > -
> > > > > -       xe_gt_assert(gt, fence);
> > > > > -
> > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > ARRAY_SIZE(action));
> > > > > -}
> > > > > > +#define xe_tlb_inval_issue(__tlb_inval, __fence, op, args...)  \
> > > > > > +({                                                             \
> > > > > > +       int __ret;                                              \
> > > > > > +                                                               \
> > > > > > +       xe_assert((__tlb_inval)->xe, (__tlb_inval)->ops);       \
> > > > > > +       xe_assert((__tlb_inval)->xe, (__fence));                \
> > > > > > +                                                               \
> > > > > > +       (__tlb_inval)->ops->lock((__tlb_inval));                \
> > > > > > +       xe_tlb_inval_fence_prep((__fence));                     \
> > > > > > +       __ret = op((__tlb_inval), (__fence)->seqno, ##args);    \
> > > > > > +       if (__ret < 0)                                          \
> > > > > > +               xe_tlb_inval_fence_signal_unlocked((__fence));  \
> > > > > > +       (__tlb_inval)->ops->unlock((__tlb_inval));              \
> > > > > > +                                                               \
> > > > > > +       __ret == -ECANCELED ? 0 : __ret;                        \
> > > > > > +})
> > > > >  
> > > > >  /**
> > > > > - * xe_gt_tlb_invalidation_all - Invalidate all TLBs across
> > > > > PF
> > > > > and all VFs.
> > > > > - * @gt: the &xe_gt structure
> > > > > - * @fence: the &xe_tlb_inval_fence to be signaled on
> > > > > completion
> > > > > + * xe_tlb_inval_all() - Issue a TLB invalidation for all
> > > > > TLBs
> > > > > + * @tlb_inval: TLB invalidation client
> > > > > + * @fence: invalidation fence which will be signal on TLB
> > > > > invalidation
> > > > > + * completion
> > > > >   *
> > > > > - * Send a request to invalidate all TLBs across PF and all
> > > > > VFs.
> > > > > + * Issue a TLB invalidation for all TLBs. Completion of TLB
> > > > > is
> > > > > asynchronous and
> > > > > + * caller can use the invalidation fence to wait for
> > > > > completion.
> > > > >   *
> > > > >   * Return: 0 on success, negative error code on error
> > > > >   */
> > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > >                      struct xe_tlb_inval_fence *fence)
> > > > >  {
> > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > -       int err;
> > > > > -
> > > > > -       err = send_tlb_inval_all(tlb_inval, fence);
> > > > > -       if (err)
> > > > > -               xe_gt_err(gt, "TLB invalidation request
> > > > > failed
> > > > > (%pe)", ERR_PTR(err));
> > > > > -
> > > > > -       return err;
> > > > > -}
> > > > > -
> > > > > -/*
> > > > > - * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > > > > - * Note that roundup_pow_of_two() operates on unsigned long,
> > > > > - * not on u64.
> > > > > - */
> > > > > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH
> > > > > (rounddown_pow_of_two(ULONG_MAX))
> > > > > -
> > > > > -static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64 start,
> > > > > u64
> > > > > end,
> > > > > -                               u32 asid, int seqno)
> > > > > -{
> > > > > -#define MAX_TLB_INVALIDATION_LEN       7
> > > > > -       u32 action[MAX_TLB_INVALIDATION_LEN];
> > > > > -       u64 length = end - start;
> > > > > -       int len = 0;
> > > > > -
> > > > > -       action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > > > > -       action[len++] = seqno;
> > > > > -       if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > > > -           length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > > > > -               action[len++] =
> > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > > > > -       } else {
> > > > > -               u64 orig_start = start;
> > > > > -               u64 align;
> > > > > -
> > > > > -               if (length < SZ_4K)
> > > > > -                       length = SZ_4K;
> > > > > -
> > > > > -               /*
> > > > > -                * We need to invalidate a higher granularity
> > > > > if
> > > > > start address
> > > > > -                * is not aligned to length. When start is
> > > > > not
> > > > > aligned with
> > > > > -                * length we need to find the length large
> > > > > enough
> > > > > to create an
> > > > > -                * address mask covering the required range.
> > > > > -                */
> > > > > -               align = roundup_pow_of_two(length);
> > > > > -               start = ALIGN_DOWN(start, align);
> > > > > -               end = ALIGN(end, align);
> > > > > -               length = align;
> > > > > -               while (start + length < end) {
> > > > > -                       length <<= 1;
> > > > > -                       start = ALIGN_DOWN(orig_start,
> > > > > length);
> > > > > -               }
> > > > > -
> > > > > -               /*
> > > > > -                * Minimum invalidation size for a 2MB page
> > > > > that
> > > > > the hardware
> > > > > -                * expects is 16MB
> > > > > -                */
> > > > > -               if (length >= SZ_2M) {
> > > > > -                       length = max_t(u64, SZ_16M, length);
> > > > > -                       start = ALIGN_DOWN(orig_start,
> > > > > length);
> > > > > -               }
> > > > > -
> > > > > -               xe_gt_assert(gt, length >= SZ_4K);
> > > > > -               xe_gt_assert(gt, is_power_of_2(length));
> > > > > -               xe_gt_assert(gt, !(length &
> > > > > GENMASK(ilog2(SZ_16M)
> > > > > - 1,
> > > > > -                                                  
> > > > > ilog2(SZ_2M)
> > > > > + 1)));
> > > > > -               xe_gt_assert(gt, IS_ALIGNED(start, length));
> > > > > -
> > > > > -               action[len++] =
> > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > > > > -               action[len++] = asid;
> > > > > -               action[len++] = lower_32_bits(start);
> > > > > -               action[len++] = upper_32_bits(start);
> > > > > -               action[len++] = ilog2(length) - ilog2(SZ_4K);
> > > > > -       }
> > > > > -
> > > > > -       xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> > > > > -
> > > > > -       return send_tlb_inval(&gt->uc.guc, action, len);
> > > > > -}
> > > > > -
> > > > > -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> > > > > -                              struct xe_tlb_inval_fence
> > > > > *fence)
> > > > > -{
> > > > > -       int ret;
> > > > > -
> > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > -
> > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > -
> > > > > -       ret = send_tlb_inval_ggtt(gt, fence->seqno);
> > > > > -       if (ret < 0)
> > > > > -               inval_fence_signal_unlocked(gt_to_xe(gt),
> > > > > fence);
> > > > > -
> > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > -
> > > > > -       /*
> > > > > -        * -ECANCELED indicates the CT is stopped for a GT
> > > > > reset.
> > > > > TLB caches
> > > > > -        *  should be nuked on a GT reset so this error can
> > > > > be
> > > > > ignored.
> > > > > -        */
> > > > > -       if (ret == -ECANCELED)
> > > > > -               return 0;
> > > > > -
> > > > > -       return ret;
> > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > tlb_inval-
> > > > > > ops->all);
> > > > >  }
> > > > >  
> > > > >  /**
> > > > > - * xe_tlb_inval_ggtt - Issue a TLB invalidation on this GT
> > > > > for
> > > > > the GGTT
> > > > > + * xe_tlb_inval_ggtt() - Issue a TLB invalidation for the
> > > > > GGTT
> > > > >   * @tlb_inval: TLB invalidation client
> > > > >   *
> > > > > - * Issue a TLB invalidation for the GGTT. Completion of TLB
> > > > > invalidation is
> > > > > - * synchronous.
> > > > > + * Issue a TLB invalidation for the GGTT. Completion of TLB
> > > > > is
> > > > > asynchronous and
> > > > > + * caller can use the invalidation fence to wait for
> > > > > completion.
> > > > >   *
> > > > >   * Return: 0 on success, negative error code on error
> > > > >   */
> > > > >  int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
> > > > >  {
> > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > -       unsigned int fw_ref;
> > > > > -
> > > > > -       if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> > > > > -           gt->uc.guc.submission_state.enabled) {
> > > > > -               struct xe_tlb_inval_fence fence;
> > > > > -               int ret;
> > > > > -
> > > > > -               xe_tlb_inval_fence_init(tlb_inval, &fence,
> > > > > true);
> > > > > -               ret = __xe_tlb_inval_ggtt(gt, &fence);
> > > > > -               if (ret)
> > > > > -                       return ret;
> > > > > -
> > > > > -               xe_tlb_inval_fence_wait(&fence);
> > > > > -       } else if (xe_device_uc_enabled(xe) &&
> > > > > !xe_device_wedged(xe)) {
> > > > > -               struct xe_mmio *mmio = &gt->mmio;
> > > > > -
> > > > > -               if (IS_SRIOV_VF(xe))
> > > > > -                       return 0;
> > > > > -
> > > > > -               fw_ref = xe_force_wake_get(gt_to_fw(gt),
> > > > > XE_FW_GT);
> > > > > -               if (xe->info.platform == XE_PVC ||
> > > > > GRAPHICS_VER(xe) >= 20) {
> > > > > -                       xe_mmio_write32(mmio,
> > > > > PVC_GUC_TLB_INV_DESC1,
> > > > > -
> > > > >                                        PVC_GUC_TLB_INV_DESC1_
> > > > > INVAL
> > > > > IDATE);
> > > > > -                       xe_mmio_write32(mmio,
> > > > > PVC_GUC_TLB_INV_DESC0,
> > > > > -
> > > > >                                        PVC_GUC_TLB_INV_DESC0_
> > > > > VALID
> > > > > );
> > > > > -               } else {
> > > > > -                       xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> > > > > -
> > > > >                                        GUC_TLB_INV_CR_INVALID
> > > > > ATE);
> > > > > -               }
> > > > > -               xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > > > -       }
> > > > > +       struct xe_tlb_inval_fence fence, *fence_ptr = &fence;
> > > > > +       int ret;
> > > > >  
> > > > > -       return 0;
> > > > > +       xe_tlb_inval_fence_init(tlb_inval, fence_ptr, true);
> > > > > +       ret = xe_tlb_inval_issue(tlb_inval, fence_ptr,
> > > > > tlb_inval-
> > > > > > ops->ggtt);
> > > > > +       xe_tlb_inval_fence_wait(fence_ptr);
> > > > > +
> > > > > +       return ret;
> > > > >  }
> > > > >  
> > > > >  /**
> > > > > - * xe_tlb_inval_range - Issue a TLB invalidation on this GT
> > > > > for
> > > > > an address range
> > > > > + * xe_tlb_inval_range() - Issue a TLB invalidation for an
> > > > > address range
> > > > >   * @tlb_inval: TLB invalidation client
> > > > >   * @fence: invalidation fence which will be signal on TLB
> > > > > invalidation
> > > > >   * completion
> > > > > @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct
> > > > > xe_tlb_inval
> > > > > *tlb_inval,
> > > > >                        struct xe_tlb_inval_fence *fence, u64
> > > > > start, u64 end,
> > > > >                        u32 asid)
> > > > >  {
> > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > -       int  ret;
> > > > > -
> > > > > -       xe_gt_assert(gt, fence);
> > > > > -
> > > > > -       /* Execlists not supported */
> > > > > -       if (xe->info.force_execlist) {
> > > > > -               __inval_fence_signal(xe, fence);
> > > > > -               return 0;
> > > > > -       }
> > > > > -
> > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > -
> > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > -
> > > > > -       ret = send_tlb_inval_ppgtt(gt, start, end, asid,
> > > > > fence-
> > > > > > seqno);
> > > > > -       if (ret < 0)
> > > > > -               inval_fence_signal_unlocked(xe, fence);
> > > > > -
> > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > -
> > > > > -       return ret;
> > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > tlb_inval-
> > > > > > ops->ppgtt,
> > > > > +                                 start, end, asid);
> > > > >  }
> > > > >  
> > > > >  /**
> > > > > - * xe_tlb_inval_vm - Issue a TLB invalidation on this GT for
> > > > > a
> > > > > VM
> > > > > + * xe_tlb_inval_vm() - Issue a TLB invalidation for a VM
> > > > >   * @tlb_inval: TLB invalidation client
> > > > >   * @vm: VM to invalidate
> > > > >   *
> > > > > @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct
> > > > > xe_tlb_inval
> > > > > *tlb_inval, struct xe_vm *vm)
> > > > >  {
> > > > >         struct xe_tlb_inval_fence fence;
> > > > >         u64 range = 1ull << vm->xe->info.va_bits;
> > > > > -       int ret;
> > > > >  
> > > > >         xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > > > > -
> > > > > -       ret = xe_tlb_inval_range(tlb_inval, &fence, 0, range,
> > > > > vm-
> > > > > > usm.asid);
> > > > > -       if (ret < 0)
> > > > > -               return;
> > > > > -
> > > > > +       xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm-
> > > > > > usm.asid);
> > > > >         xe_tlb_inval_fence_wait(&fence);
> > > > >  }
> > > > >  
> > > > >  /**
> > > > > - * xe_tlb_inval_done_handler - TLB invalidation done handler
> > > > > - * @gt: gt
> > > > > + * xe_tlb_inval_done_handler() - TLB invalidation done
> > > > > handler
> > > > > + * @tlb_inval: TLB invalidation client
> > > > >   * @seqno: seqno of invalidation that is done
> > > > >   *
> > > > >   * Update recv seqno, signal any TLB invalidation fences,
> > > > > and
> > > > > restart TDR
> > > > 
> > > > > I'd mention that this function is safe to be called from any
> > > > > context (i.e., process, atomic, and hardirq contexts are
> > > > > allowed).
> > > > 
> > > > We might need to convert tlb_inval.pending_lock to a
> > > > raw_spinlock_t
> > > > for
> > > > PREEMPT_RT enablement. Same for the GuC fast_lock. AFAIK we
> > > > haven’t
> > > > had
> > > > any complaints, so maybe I’m just overthinking it, but also
> > > > perhaps
> > > > not.
> > > > 
> > > > >   */
> > > > > -static void xe_tlb_inval_done_handler(struct xe_gt *gt, int
> > > > > seqno)
> > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > *tlb_inval,
> > > > > int seqno)
> > > > >  {
> > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > +       struct xe_device *xe = tlb_inval->xe;
> > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > >         unsigned long flags;
> > > > >  
> > > > > @@ -535,77 +337,53 @@ static void
> > > > > xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> > > > >          * officially process the CT message like if racing
> > > > > against
> > > > >          * process_g2h_msg().
> > > > >          */
> > > > > -       spin_lock_irqsave(&gt->tlb_inval.pending_lock,
> > > > > flags);
> > > > > -       if (tlb_inval_seqno_past(gt, seqno)) {
> > > > > -               spin_unlock_irqrestore(&gt-
> > > > > > tlb_inval.pending_lock, flags);
> > > > > +       spin_lock_irqsave(&tlb_inval->pending_lock, flags);
> > > > > +       if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) {
> > > > > +               spin_unlock_irqrestore(&tlb_inval-
> > > > > >pending_lock,
> > > > > flags);
> > > > >                 return;
> > > > >         }
> > > > >  
> > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> > > > > +       WRITE_ONCE(tlb_inval->seqno_recv, seqno);
> > > > >  
> > > > >         list_for_each_entry_safe(fence, next,
> > > > > -                                &gt-
> > > > > >tlb_inval.pending_fences,
> > > > > link) {
> > > > > +                                &tlb_inval->pending_fences,
> > > > > link) {
> > > > >                 trace_xe_tlb_inval_fence_recv(xe, fence);
> > > > >  
> > > > > -               if (!tlb_inval_seqno_past(gt, fence->seqno))
> > > > > +               if (!xe_tlb_inval_seqno_past(tlb_inval,
> > > > > fence-
> > > > > > seqno))
> > > > >                         break;
> > > > >  
> > > > > -               inval_fence_signal(xe, fence);
> > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > >         }
> > > > >  
> > > > > -       if (!list_empty(&gt->tlb_inval.pending_fences))
> > > > > +       if (!list_empty(&tlb_inval->pending_fences))
> > > > >                 mod_delayed_work(system_wq,
> > > > > -                                &gt->tlb_inval.fence_tdr,
> > > > > -                                tlb_timeout_jiffies(gt));
> > > > > +                                &tlb_inval->fence_tdr,
> > > > > +                                tlb_inval->ops-
> > > > > > timeout_delay(tlb_inval));
> > > > >         else
> > > > > -               cancel_delayed_work(&gt-
> > > > > >tlb_inval.fence_tdr);
> > > > > +               cancel_delayed_work(&tlb_inval->fence_tdr);
> > > > >  
> > > > > -       spin_unlock_irqrestore(&gt->tlb_inval.pending_lock,
> > > > > flags);
> > > > > -}
> > > > > -
> > > > > -/**
> > > > > - * xe_guc_tlb_inval_done_handler - TLB invalidation done
> > > > > handler
> > > > > - * @guc: guc
> > > > > - * @msg: message indicating TLB invalidation done
> > > > > - * @len: length of message
> > > > > - *
> > > > > - * Parse seqno of TLB invalidation, wake any waiters for
> > > > > seqno,
> > > > > and signal any
> > > > > - * invalidation fences for seqno. Algorithm for this depends
> > > > > on
> > > > > seqno being
> > > > > - * received in-order and asserts this assumption.
> > > > > - *
> > > > > - * Return: 0 on success, -EPROTO for malformed messages.
> > > > > - */
> > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32
> > > > > *msg,
> > > > > u32 len)
> > > > > -{
> > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > -
> > > > > -       if (unlikely(len != 1))
> > > > > -               return -EPROTO;
> > > > > -
> > > > > -       xe_tlb_inval_done_handler(gt, msg[0]);
> > > > > -
> > > > > -       return 0;
> > > > > +       spin_unlock_irqrestore(&tlb_inval->pending_lock,
> > > > > flags);
> > > > >  }
> > > > >  
> > > > >  static const char *
> > > > > -inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > > > > +xe_inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > > > >  {
> > > > >         return "xe";
> > > > >  }
> > > > >  
> > > > >  static const char *
> > > > > -inval_fence_get_timeline_name(struct dma_fence *dma_fence)
> > > > > +xe_inval_fence_get_timeline_name(struct dma_fence
> > > > > *dma_fence)
> > > > >  {
> > > > > -       return "inval_fence";
> > > > > +       return "tlb_inval_fence";
> > > > >  }
> > > > >  
> > > > >  static const struct dma_fence_ops inval_fence_ops = {
> > > > > -       .get_driver_name = inval_fence_get_driver_name,
> > > > > -       .get_timeline_name = inval_fence_get_timeline_name,
> > > > > +       .get_driver_name = xe_inval_fence_get_driver_name,
> > > > > +       .get_timeline_name =
> > > > > xe_inval_fence_get_timeline_name,
> > > > >  };
> > > > >  
> > > > >  /**
> > > > > - * xe_tlb_inval_fence_init - Initialize TLB invalidation
> > > > > fence
> > > > > + * xe_tlb_inval_fence_init() - Initialize TLB invalidation
> > > > > fence
> > > > >   * @tlb_inval: TLB invalidation client
> > > > >   * @fence: TLB invalidation fence to initialize
> > > > >   * @stack: fence is stack variable
> > > > > @@ -618,15 +396,12 @@ void xe_tlb_inval_fence_init(struct
> > > > > xe_tlb_inval *tlb_inval,
> > > > >                              struct xe_tlb_inval_fence
> > > > > *fence,
> > > > >                              bool stack)
> > > > >  {
> > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > -
> > > > > -       xe_pm_runtime_get_noresume(gt_to_xe(gt));
> > > > > +       xe_pm_runtime_get_noresume(tlb_inval->xe);
> > > > >  
> > > > > -       spin_lock_irq(&gt->tlb_inval.lock);
> > > > > -       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > > -                      &gt->tlb_inval.lock,
> > > > > +       spin_lock_irq(&tlb_inval->lock);
> > > > > +       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > > &tlb_inval->lock,
> > > > >                        dma_fence_context_alloc(1), 1);
> > > > > -       spin_unlock_irq(&gt->tlb_inval.lock);
> > > > > +       spin_unlock_irq(&tlb_inval->lock);
> > > > 
> > > > While here, 'fence_lock' is probably a better name.
> > > > 
> > > > Matt
> > > > 
> > > > >         INIT_LIST_HEAD(&fence->link);
> > > > >         if (stack)
> > > > >                 set_bit(FENCE_STACK_BIT, &fence->base.flags);
> > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > index 7adee3f8c551..cdeafc8d4391 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > @@ -18,24 +18,30 @@ struct xe_vma;
> > > > >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
> > > > >  
> > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
> > > > > -int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct
> > > > > xe_vm *vm);
> > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > >                      struct xe_tlb_inval_fence *fence);
> > > > > +int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct
> > > > > xe_vm *vm);
> > > > >  int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
> > > > >                        struct xe_tlb_inval_fence *fence,
> > > > >                        u64 start, u64 end, u32 asid);
> > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32
> > > > > *msg,
> > > > > u32 len);
> > > > >  
> > > > >  void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
> > > > >                              struct xe_tlb_inval_fence
> > > > > *fence,
> > > > >                              bool stack);
> > > > > -void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence
> > > > > *fence);
> > > > >  
> > > > > +/**
> > > > > > + * xe_tlb_inval_fence_wait() - TLB invalidation fence wait
> > > > > > + * @fence: TLB invalidation fence to wait on
> > > > > > + *
> > > > > > + * Wait on a TLB invalidation fence until it signals, non
> > > > > > interruptible
> > > > > + */
> > > > >  static inline void
> > > > >  xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence *fence)
> > > > >  {
> > > > >         dma_fence_wait(&fence->base, false);
> > > > >  }
> > > > >  
> > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > *tlb_inval,
> > > > > int seqno);
> > > > > +
> > > > >  #endif /* _XE_TLB_INVAL_ */
> > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > index 05b6adc929bb..c1ad96d24fc8 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > @@ -9,10 +9,85 @@
> > > > >  #include <linux/workqueue.h>
> > > > >  #include <linux/dma-fence.h>
> > > > >  
> > > > > -/** struct xe_tlb_inval - TLB invalidation client */
> > > > > +struct xe_tlb_inval;
> > > > > +
> > > > > +/** struct xe_tlb_inval_ops - TLB invalidation ops (backend)
> > > > > */
> > > > > +struct xe_tlb_inval_ops {
> > > > > +       /**
> > > > > +        * @all: Invalidate all TLBs
> > > > > +        * @tlb_inval: TLB invalidation client
> > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > +        *
> > > > > +        * Return 0 on success, -ECANCELED if backend is mid-
> > > > > reset, error on
> > > > > +        * failure
> > > > > +        */
> > > > > +       int (*all)(struct xe_tlb_inval *tlb_inval, u32
> > > > > seqno);
> > > > > +
> > > > > +       /**
> > > > > +        * @ggtt: Invalidate global translation TLBs
> > > > > +        * @tlb_inval: TLB invalidation client
> > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > +        *
> > > > > +        * Return 0 on success, -ECANCELED if backend is mid-
> > > > > reset, error on
> > > > > +        * failure
> > > > > +        */
> > > > > +       int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32
> > > > > seqno);
> > > > > +
> > > > > +       /**
> > > > > > +        * @ppgtt: Invalidate per-process translation TLBs
> > > > > +        * @tlb_inval: TLB invalidation client
> > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > +        * @start: Start address
> > > > > +        * @end: End address
> > > > > +        * @asid: Address space ID
> > > > > +        *
> > > > > +        * Return 0 on success, -ECANCELED if backend is mid-
> > > > > reset, error on
> > > > > +        * failure
> > > > > +        */
> > > > > +       int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32
> > > > > seqno,
> > > > > u64 start,
> > > > > +                    u64 end, u32 asid);
> > > > > +
> > > > > +       /**
> > > > > +        * @initialized: Backend is initialized
> > > > > +        * @tlb_inval: TLB invalidation client
> > > > > +        *
> > > > > > +        * Return: True if backend is initialized, False
> > > > > > otherwise
> > > > > +        */
> > > > > +       bool (*initialized)(struct xe_tlb_inval *tlb_inval);
> > > > > +
> > > > > +       /**
> > > > > +        * @flush: Flush pending TLB invalidations
> > > > > +        * @tlb_inval: TLB invalidation client
> > > > > +        */
> > > > > +       void (*flush)(struct xe_tlb_inval *tlb_inval);
> > > > > +
> > > > > +       /**
> > > > > +        * @timeout_delay: Timeout delay for TLB invalidation
> > > > > +        * @tlb_inval: TLB invalidation client
> > > > > +        *
> > > > > +        * Return: Timeout delay for TLB invalidation in
> > > > > jiffies
> > > > > +        */
> > > > > +       long (*timeout_delay)(struct xe_tlb_inval
> > > > > *tlb_inval);
> > > > > +
> > > > > +       /**
> > > > > +        * @lock: Lock resources protecting the backend seqno
> > > > > management
> > > > > +        */
> > > > > +       void (*lock)(struct xe_tlb_inval *tlb_inval);
> > > > > +
> > > > > +       /**
> > > > > +        * @unlock: Lock resources protecting the backend
> > > > > seqno
> > > > > management
> > > > > +        */
> > > > > +       void (*unlock)(struct xe_tlb_inval *tlb_inval);
> > > > > +};
> > > > > +
> > > > > +/** struct xe_tlb_inval - TLB invalidation client (frontend)
> > > > > */
> > > > >  struct xe_tlb_inval {
> > > > >         /** @private: Backend private pointer */
> > > > >         void *private;
> > > > > +       /** @xe: Pointer to Xe device */
> > > > > +       struct xe_device *xe;
> > > > > +       /** @ops: TLB invalidation ops */
> > > > > +       const struct xe_tlb_inval_ops *ops;
> > > > >         /** @tlb_inval.seqno: TLB invalidation seqno,
> > > > > protected
> > > > > by CT lock */
> > > > >  #define TLB_INVALIDATION_SEQNO_MAX     0x100000
> > > > >         int seqno;
> > > > > -- 
> > > > > 2.34.1
> > > > > 
> > 



* Re: [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 20:55           ` Summers, Stuart
@ 2025-07-23 21:22             ` Matthew Brost
  2025-07-23 22:03               ` Summers, Stuart
  2025-07-23 23:19               ` Summers, Stuart
  0 siblings, 2 replies; 19+ messages in thread
From: Matthew Brost @ 2025-07-23 21:22 UTC (permalink / raw)
  To: Summers, Stuart
  Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
	Kassabri, Farah, Auld, Matthew

On Wed, Jul 23, 2025 at 02:55:24PM -0600, Summers, Stuart wrote:
> On Wed, 2025-07-23 at 13:47 -0700, Matthew Brost wrote:
> > 
> 
> <cut>
> (just to reduce the noise in the rest of the patch here for now...)
> 
> > > > > >  
> > > > > >  /**
> > > > > > - * xe_tlb_inval_reset - Initialize TLB invalidation reset
> > > > > > + * xe_tlb_inval_reset() - TLB invalidation reset
> > > > > >   * @tlb_inval: TLB invalidation client
> > > > > >   *
> > > > > >   * Signal any pending invalidation fences, should be called
> > > > > > during a GT reset
> > > > > >   */
> > > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
> > > > > >  {
> > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > > >         int pending_seqno;
> > > > > >  
> > > > > >         /*
> > > > > > -        * we can get here before the CTs are even
> > > > > > initialized if
> > > > > > we're wedging
> > > > > > -        * very early, in which case there are not going to
> > > > > > be
> > > > > > any pending
> > > > > > -        * fences so we can bail immediately.
> > > > > > +        * we can get here before the backends are even
> > > > > > initialized if we're
> > > > > > +        * wedging very early, in which case there are not
> > > > > > going
> > > > > > to be any
> > > > > > +        * pendind fences so we can bail immediately.
> > > > > >          */
> > > > > > -       if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> > > > > > +       if (!tlb_inval->ops->initialized(tlb_inval))
> > > > > >                 return;
> > > > > >  
> > > > > >         /*
> > > > > > -        * CT channel is already disabled at this point. No
> > > > > > new
> > > > > > TLB requests can
> > > > > > +        * Backend is already disabled at this point. No new
> > > > > > TLB
> > > > > > requests can
> > > > > >          * appear.
> > > > > >          */
> > > > > >  
> > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > -       spin_lock_irq(&gt->tlb_inval.pending_lock);
> > > > > > -       cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > > > > > +       tlb_inval->ops->lock(tlb_inval);
> > > > > 
> > > > > I think you want a dedicated lock embedded in struct
> > > > > xe_tlb_inval,
> > > > > rather than reaching into the backend to grab one.
> > > > > 
> > > > > This will deadlock as written: G2H TLB inval messages are
> > > > > sometimes
> > > > > processed while holding ct->lock (non-fast path, unlikely) and
> > > > > sometimes
> > > > > without it (fast path, likely).
> > > > 
> > > > Ugh, I'm off today. Ignore the deadlock part, I was confusing
> > > > myself...
> > > > I was thinking this was the function xe_tlb_inval_done_handler,
> > > > it is
> > > > > not. I still think xe_tlb_inval should have its own lock but this
> > > > > patch as written should work with
> > > > > s/xe_guc_ct_send/xe_guc_ct_send_locked.
> > > 
> > > So one reason I didn't go that way is we did just the reverse
> > > recently
> > > - moved from a TLB dedicated lock to the more specific CT lock
> > > since
> > > these are all going into the CT handler anyway when we use GuC
> > > submission. Then this embedded version allows us to lock at the
> > > bottom
> > > data layer rather than having a separate lock in the upper layer.
> > > Another thing is we might want to have different types of
> > > invalidation
> > > running in parallel without locking the data in the upper layer
> > > since
> > > the real contention would be in the lower level pipelining anyway.
> > > 
> > 
> > I can see the reasoning behind this approach, and maybe it’s fine.
> > 
> > But consider the case where the GuC backend has to look up a VM,
> > iterate
> > over a list of exec queues, and send multiple H2Gs to the hardware,
> > each
> > with a corresponding G2H (per-context invalidations). In the worst
> > case,
> > the CT code may have to wait for and process some G2Hs because our
> > G2H
> > credits are exhausted—all while holding the CT lock, which currently
> > blocks any hardware submissions (i.e., hardware submissions need the
> > CT
> > lock). Now imagine multiple sources issuing invalidations: they could
> > grab the CT lock before a submission waiting on it, further delaying
> > that
> > submission. 
> > 
> > > The longer a mutex is held, the more likely the CPU thread holding it
> > > could be switched out while holding it.
> > 
> > This doesn’t seem scalable compared to using a finer-grained CT lock
> > (e.g., only taking it in xe_guc_ct_send).
> > 
> > I’m not saying this won’t work as you have it—I think it will—but the
> > consequences of holding the CT lock for an extended period need to be
> > considered.
> 
> Couple more thoughts.. so in the case you mentioned, ideally I'd like
> to have just a single invalidation per request, rather than across a
> whole VM. That's the reason we have the range based invalidation to

Yes, this is range-based.

> begin with. If we get to the point where we want to make that even
> finer, that's great, but we should still just have a single
> invalidation per request (again, ideally).
> 

Maybe you have a different idea, but I was thinking of queue-based
invalidations: the frontend assigns a single seqno, the backend issues N
invalidations to the hardware—one per GCID mapped in the VM/GT tuple—and
then signals the frontend when all invalidations associated with the
seqno are complete. With the GuC, a GCID corresponds to each exec queue’s
gucid mapped in the VM/GT tuple. Different backends can handle this
differently.
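
A minimal userspace sketch of that fan-out, assuming a hypothetical
per-seqno tracker (struct inval_job and both helpers are made up for
illustration and are not part of the xe driver). It is single-threaded;
a real backend would need atomics or a lock around the counter:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker for one frontend seqno; not an xe structure. */
struct inval_job {
	uint32_t seqno;           /* single seqno assigned by the frontend */
	unsigned int outstanding; /* backend invalidations still in flight */
	bool signaled;            /* true once the frontend may be notified */
};

/* Frontend assigns one seqno; the backend fans out to n_gcids requests. */
static void inval_job_start(struct inval_job *job, uint32_t seqno,
			    unsigned int n_gcids)
{
	job->seqno = seqno;
	job->outstanding = n_gcids;
	/* Nothing mapped in the VM/GT tuple: complete immediately. */
	job->signaled = (n_gcids == 0);
}

/*
 * Called once per hardware completion. Returns true only when this
 * completion was the last one outstanding for the seqno.
 */
static bool inval_job_complete_one(struct inval_job *job)
{
	assert(job->outstanding > 0);
	if (--job->outstanding == 0)
		job->signaled = true;
	return job->signaled;
}
```

The frontend only ever sees one seqno and one completion; how many
requests sit behind it stays a backend detail.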

> Also, you already have some patches up on the list that do some
> coalescing of invalidations so we reduce the number of invalidations
> for multiple ranges. I didn't want to include those patches here
> because IMO they are really a separate feature here and it'd be nice to
> review that on its own.
>

I agree it is a separate thing that should help in some cases, and it
should be reviewed on its own.

That doesn't help in the case of multiple VMs issuing invalidations
though (think eviction is occurring or MMU notifiers are firing). The
lock contention is moved from a dedicated TLB invalidation lock to a
widely shared CT lock. If multiple TLB invalidations are contending, now
all other users of the CT lock contend at this higher level. In other
words, by only acquiring the CT lock in the last part of an invalidation,
other (non-invalidation) waiters get QoS.
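
To make the lock-scope argument concrete, here is a toy model in plain
C. It is not driver code: the toy_* helpers, the instrumented "locks",
and the tick costs (3 for fence prep, 1 for the actual send) are all
invented for illustration. It only counts how much work is charged to
the CT lock when it covers the whole invalidation versus only the send
at the end.

```c
#include <assert.h>

/* Instrumented stand-in for a lock: records work done while it is held. */
struct toy_lock {
	int held;
	unsigned long ticks_held;
};

static struct toy_lock seqno_lock;	/* dedicated invalidation lock */
static struct toy_lock ct_lock;		/* widely shared CT lock */

static void toy_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void toy_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Charge 'ticks' of work to every lock that is currently held. */
static void toy_work(unsigned long ticks)
{
	if (seqno_lock.held)
		seqno_lock.ticks_held += ticks;
	if (ct_lock.held)
		ct_lock.ticks_held += ticks;
}

#define TOY_PREP_TICKS 3	/* assumed cost of seqno/fence prep */
#define TOY_SEND_TICKS 1	/* assumed cost of the CT send */

/* Coarse scheme: the CT lock covers prep and send. */
static unsigned long toy_inval_coarse(void)
{
	unsigned long before = ct_lock.ticks_held;

	toy_acquire(&ct_lock);
	toy_work(TOY_PREP_TICKS);
	toy_work(TOY_SEND_TICKS);
	toy_release(&ct_lock);

	return ct_lock.ticks_held - before;
}

/* Fine scheme: seqno_lock orders prep + send, CT lock covers only the send. */
static unsigned long toy_inval_fine(void)
{
	unsigned long before = ct_lock.ticks_held;

	toy_acquire(&seqno_lock);
	toy_work(TOY_PREP_TICKS);
	toy_acquire(&ct_lock);
	toy_work(TOY_SEND_TICKS);
	toy_release(&ct_lock);
	toy_release(&seqno_lock);

	return ct_lock.ticks_held - before;
}
```

With these made-up costs, the coarse variant charges all 4 ticks to the
CT lock and the fine variant charges 1; invalidations then queue up on
seqno_lock while other CT lock users only wait for the send itself.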

Matt
 
> So basically, the per request lock here also pushes us to implement in
> a more efficient and precise way rather than just hammering as many
> invalidations over a given range as possible.
> 
> And of course there are going to need to be bigger hammer invalidations
> sometimes (like the full VF invalidation we're doing in the
> invalidate_all() routines), but those still fall into the same category
> of precision, just with a larger scope (rather than multiple smaller
> invalidations).
> 
> Thanks,
> Stuart
> 
> > 
> > Matt
> > 
> > > Thanks,
> > > Stuart
> > > 
> > > > 
> > > > Matt 
> > > > 
> > > > > 
> > > > > I’d call this lock seqno_lock, since it protects exactly
> > > > > that—the
> > > > > order
> > > > > in which a seqno is assigned by the frontend and handed to the
> > > > > backend.
> > > > > 
> > > > > Prime this lock for reclaim as well—do what primelockdep() does
> > > > > in
> > > > > xe_guc_ct.c—to make it clear that memory allocations are not
> > > > > allowed
> > > > > while the lock is held as TLB invalidations can be called from
> > > > > two
> > > > > reclaim paths:
> > > > > 
> > > > > - MMU notifier callbacks
> > > > > - The dma-fence signaling path of VM binds that require a TLB
> > > > >   invalidation
> > > > > 
> > > > > > +       spin_lock_irq(&tlb_inval->pending_lock);
> > > > > > +       cancel_delayed_work(&tlb_inval->fence_tdr);
> > > > > >         /*
> > > > > >          * We might have various kworkers waiting for TLB
> > > > > > flushes
> > > > > > to complete
> > > > > >          * which are not tracked with an explicit TLB fence,
> > > > > > however at this
> > > > > > -        * stage that will never happen since the CT is
> > > > > > already
> > > > > > disabled, so
> > > > > > -        * make sure we signal them here under the assumption
> > > > > > that we have
> > > > > > +        * stage that will never happen since the backend is
> > > > > > already disabled,
> > > > > > +        * so make sure we signal them here under the
> > > > > > assumption
> > > > > > that we have
> > > > > >          * completed a full GT reset.
> > > > > >          */
> > > > > > -       if (gt->tlb_inval.seqno == 1)
> > > > > > +       if (tlb_inval->seqno == 1)
> > > > > >                 pending_seqno = TLB_INVALIDATION_SEQNO_MAX -
> > > > > > 1;
> > > > > >         else
> > > > > > -               pending_seqno = gt->tlb_inval.seqno - 1;
> > > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv, pending_seqno);
> > > > > > +               pending_seqno = tlb_inval->seqno - 1;
> > > > > > +       WRITE_ONCE(tlb_inval->seqno_recv, pending_seqno);
> > > > > >  
> > > > > >         list_for_each_entry_safe(fence, next,
> > > > > > -                                &gt-
> > > > > > >tlb_inval.pending_fences,
> > > > > > link)
> > > > > > -               inval_fence_signal(gt_to_xe(gt), fence);
> > > > > > -       spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > +                                &tlb_inval->pending_fences,
> > > > > > link)
> > > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > > > +       spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > > +       tlb_inval->ops->unlock(tlb_inval);
> > > > > >  }
> > > > > >  
> > > > > > -static bool tlb_inval_seqno_past(struct xe_gt *gt, int
> > > > > > seqno)
> > > > > > +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval
> > > > > > *tlb_inval, int seqno)
> > > > > >  {
> > > > > > -       int seqno_recv = READ_ONCE(gt->tlb_inval.seqno_recv);
> > > > > > +       int seqno_recv = READ_ONCE(tlb_inval->seqno_recv);
> > > > > > +
> > > > > > +       lockdep_assert_held(&tlb_inval->pending_lock);
> > > > > >  
> > > > > >         if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX
> > > > > > /
> > > > > > 2))
> > > > > >                 return false;
> > > > > > @@ -201,44 +192,20 @@ static bool tlb_inval_seqno_past(struct
> > > > > > xe_gt *gt, int seqno)
> > > > > >         return seqno_recv >= seqno;
> > > > > >  }
> > > > > >  
> > > > > > -static int send_tlb_inval(struct xe_guc *guc, const u32
> > > > > > *action,
> > > > > > int len)
> > > > > > -{
> > > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > > -
> > > > > > -       xe_gt_assert(gt, action[1]);    /* Seqno */
> > > > > > -       lockdep_assert_held(&guc->ct.lock);
> > > > > > -
> > > > > > -       /*
> > > > > > -        * XXX: The seqno algorithm relies on TLB
> > > > > > invalidation
> > > > > > being processed
> > > > > > -        * in order which they currently are, if that changes
> > > > > > the
> > > > > > algorithm will
> > > > > > -        * need to be updated.
> > > > > > -        */
> > > > > > -
> > > > > > -       xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> > > > > > -
> > > > > > -       return xe_guc_ct_send(&guc->ct, action, len,
> > > > > > -                             G2H_LEN_DW_TLB_INVALIDATE, 1);
> > > > > > -}
> > > > > > -
> > > > > >  static void xe_tlb_inval_fence_prep(struct
> > > > > > xe_tlb_inval_fence
> > > > > > *fence)
> > > > > >  {
> > > > > >         struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > -
> > > > > > -       lockdep_assert_held(&gt->uc.guc.ct.lock);
> > > > > >  
> > > > > >         fence->seqno = tlb_inval->seqno;
> > > > > > -       trace_xe_tlb_inval_fence_send(xe, fence);
> > > > > > +       trace_xe_tlb_inval_fence_send(tlb_inval->xe, fence);
> > > > > >  
> > > > > >         spin_lock_irq(&tlb_inval->pending_lock);
> > > > > >         fence->inval_time = ktime_get();
> > > > > >         list_add_tail(&fence->link, &tlb_inval-
> > > > > > >pending_fences);
> > > > > >  
> > > > > >         if (list_is_singular(&tlb_inval->pending_fences))
> > > > > > -               queue_delayed_work(system_wq,
> > > > > > -                                  &tlb_inval->fence_tdr,
> > > > > > -                                  tlb_timeout_jiffies(gt));
> > > > > > +               queue_delayed_work(system_wq, &tlb_inval-
> > > > > > > fence_tdr,
> > > > > > +                                  tlb_inval->ops-
> > > > > > > timeout_delay(tlb_inval));
> > > > > >         spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > >  
> > > > > >         tlb_inval->seqno = (tlb_inval->seqno + 1) %
> > > > > > @@ -247,202 +214,63 @@ static void
> > > > > > xe_tlb_inval_fence_prep(struct
> > > > > > xe_tlb_inval_fence *fence)
> > > > > >                 tlb_inval->seqno = 1;
> > > > > >  }
> > > > > >  
> > > > > > -#define MAKE_INVAL_OP(type)    ((type <<
> > > > > > XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > > > > > -               XE_GUC_TLB_INVAL_MODE_HEAVY <<
> > > > > > XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > > > > > -               XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > > > > > -
> > > > > > -static int send_tlb_inval_ggtt(struct xe_gt *gt, int seqno)
> > > > > > -{
> > > > > > -       u32 action[] = {
> > > > > > -               XE_GUC_ACTION_TLB_INVALIDATION,
> > > > > > -               seqno,
> > > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > > > > > -       };
> > > > > > -
> > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > ARRAY_SIZE(action));
> > > > > > -}
> > > > > > -
> > > > > > -static int send_tlb_inval_all(struct xe_tlb_inval
> > > > > > *tlb_inval,
> > > > > > -                             struct xe_tlb_inval_fence
> > > > > > *fence)
> > > > > > -{
> > > > > > -       u32 action[] = {
> > > > > > -               XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > > > > > -               0,  /* seqno, replaced in send_tlb_inval */
> > > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > > > > -       };
> > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > -
> > > > > > -       xe_gt_assert(gt, fence);
> > > > > > -
> > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > ARRAY_SIZE(action));
> > > > > > -}
> > > > > > +#define xe_tlb_inval_issue(__tlb_inval, __fence, op,
> > > > > > args...)  \
> > > > > > +({                                                          
> > > > > >    \
> > > > > > +       int
> > > > > > __ret;                                              \
> > > > > > +                                                            
> > > > > >    \
> > > > > > +       xe_assert((__tlb_inval)->xe, (__tlb_inval)-
> > > > > > >ops);       \
> > > > > > +       xe_assert((__tlb_inval)->xe,
> > > > > > (__fence));                \
> > > > > > +                                                            
> > > > > >    \
> > > > > > +       (__tlb_inval)->ops-
> > > > > > >lock((__tlb_inval));                \
> > > > > > +       xe_tlb_inval_fence_prep((__fence));                  
> > > > > >    \
> > > > > > +       __ret = op((__tlb_inval), (__fence)->seqno,
> > > > > > ##args);    \
> > > > > > +       if (__ret <
> > > > > > 0)                                          \
> > > > > > +               xe_tlb_inval_fence_signal_unlocked((__fence))
> > > > > > ;  \
> > > > > > +       (__tlb_inval)->ops-
> > > > > > >unlock((__tlb_inval));              \
> > > > > > +                                                            
> > > > > >    \
> > > > > > +       __ret == -ECANCELED ? 0 :
> > > > > > __ret;                        \
> > > > > > +})
> > > > > >  
> > > > > >  /**
> > > > > > - * xe_gt_tlb_invalidation_all - Invalidate all TLBs across
> > > > > > PF
> > > > > > and all VFs.
> > > > > > - * @gt: the &xe_gt structure
> > > > > > - * @fence: the &xe_tlb_inval_fence to be signaled on
> > > > > > completion
> > > > > > + * xe_tlb_inval_all() - Issue a TLB invalidation for all
> > > > > > TLBs
> > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > + * @fence: invalidation fence which will be signal on TLB
> > > > > > invalidation
> > > > > > + * completion
> > > > > >   *
> > > > > > - * Send a request to invalidate all TLBs across PF and all
> > > > > > VFs.
> > > > > > + * Issue a TLB invalidation for all TLBs. Completion of TLB
> > > > > > is
> > > > > > asynchronous and
> > > > > > + * caller can use the invalidation fence to wait for
> > > > > > completion.
> > > > > >   *
> > > > > >   * Return: 0 on success, negative error code on error
> > > > > >   */
> > > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > >                      struct xe_tlb_inval_fence *fence)
> > > > > >  {
> > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > -       int err;
> > > > > > -
> > > > > > -       err = send_tlb_inval_all(tlb_inval, fence);
> > > > > > -       if (err)
> > > > > > -               xe_gt_err(gt, "TLB invalidation request
> > > > > > failed
> > > > > > (%pe)", ERR_PTR(err));
> > > > > > -
> > > > > > -       return err;
> > > > > > -}
> > > > > > -
> > > > > > -/*
> > > > > > - * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > > > > > - * Note that roundup_pow_of_two() operates on unsigned long,
> > > > > > - * not on u64.
> > > > > > - */
> > > > > > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH
> > > > > > (rounddown_pow_of_two(ULONG_MAX))
> > > > > > -
> > > > > > -static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64 start,
> > > > > > u64
> > > > > > end,
> > > > > > -                               u32 asid, int seqno)
> > > > > > -{
> > > > > > -#define MAX_TLB_INVALIDATION_LEN       7
> > > > > > -       u32 action[MAX_TLB_INVALIDATION_LEN];
> > > > > > -       u64 length = end - start;
> > > > > > -       int len = 0;
> > > > > > -
> > > > > > -       action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > > > > > -       action[len++] = seqno;
> > > > > > -       if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > > > > -           length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > > > > > -               action[len++] =
> > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > > > > > -       } else {
> > > > > > -               u64 orig_start = start;
> > > > > > -               u64 align;
> > > > > > -
> > > > > > -               if (length < SZ_4K)
> > > > > > -                       length = SZ_4K;
> > > > > > -
> > > > > > -               /*
> > > > > > -                * We need to invalidate a higher granularity
> > > > > > if
> > > > > > start address
> > > > > > -                * is not aligned to length. When start is
> > > > > > not
> > > > > > aligned with
> > > > > > -                * length we need to find the length large
> > > > > > enough
> > > > > > to create an
> > > > > > -                * address mask covering the required range.
> > > > > > -                */
> > > > > > -               align = roundup_pow_of_two(length);
> > > > > > -               start = ALIGN_DOWN(start, align);
> > > > > > -               end = ALIGN(end, align);
> > > > > > -               length = align;
> > > > > > -               while (start + length < end) {
> > > > > > -                       length <<= 1;
> > > > > > -                       start = ALIGN_DOWN(orig_start,
> > > > > > length);
> > > > > > -               }
> > > > > > -
> > > > > > -               /*
> > > > > > -                * Minimum invalidation size for a 2MB page
> > > > > > that
> > > > > > the hardware
> > > > > > -                * expects is 16MB
> > > > > > -                */
> > > > > > -               if (length >= SZ_2M) {
> > > > > > -                       length = max_t(u64, SZ_16M, length);
> > > > > > -                       start = ALIGN_DOWN(orig_start,
> > > > > > length);
> > > > > > -               }
> > > > > > -
> > > > > > -               xe_gt_assert(gt, length >= SZ_4K);
> > > > > > -               xe_gt_assert(gt, is_power_of_2(length));
> > > > > > -               xe_gt_assert(gt, !(length &
> > > > > > GENMASK(ilog2(SZ_16M)
> > > > > > - 1,
> > > > > > -                                                  
> > > > > > ilog2(SZ_2M)
> > > > > > + 1)));
> > > > > > -               xe_gt_assert(gt, IS_ALIGNED(start, length));
> > > > > > -
> > > > > > -               action[len++] =
> > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > > > > > -               action[len++] = asid;
> > > > > > -               action[len++] = lower_32_bits(start);
> > > > > > -               action[len++] = upper_32_bits(start);
> > > > > > -               action[len++] = ilog2(length) - ilog2(SZ_4K);
> > > > > > -       }
> > > > > > -
> > > > > > -       xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> > > > > > -
> > > > > > -       return send_tlb_inval(&gt->uc.guc, action, len);
> > > > > > -}
> > > > > > -
> > > > > > -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> > > > > > -                              struct xe_tlb_inval_fence
> > > > > > *fence)
> > > > > > -{
> > > > > > -       int ret;
> > > > > > -
> > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > -
> > > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > > -
> > > > > > -       ret = send_tlb_inval_ggtt(gt, fence->seqno);
> > > > > > -       if (ret < 0)
> > > > > > -               inval_fence_signal_unlocked(gt_to_xe(gt),
> > > > > > fence);
> > > > > > -
> > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > -
> > > > > > -       /*
> > > > > > -        * -ECANCELED indicates the CT is stopped for a GT
> > > > > > reset.
> > > > > > TLB caches
> > > > > > -        *  should be nuked on a GT reset so this error can
> > > > > > be
> > > > > > ignored.
> > > > > > -        */
> > > > > > -       if (ret == -ECANCELED)
> > > > > > -               return 0;
> > > > > > -
> > > > > > -       return ret;
> > > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > > tlb_inval-
> > > > > > > ops->all);
> > > > > >  }
> > > > > >  
> > > > > >  /**
> > > > > > - * xe_tlb_inval_ggtt - Issue a TLB invalidation on this GT
> > > > > > for
> > > > > > the GGTT
> > > > > > + * xe_tlb_inval_ggtt() - Issue a TLB invalidation for the
> > > > > > GGTT
> > > > > >   * @tlb_inval: TLB invalidation client
> > > > > >   *
> > > > > > - * Issue a TLB invalidation for the GGTT. Completion of TLB
> > > > > > invalidation is
> > > > > > - * synchronous.
> > > > > > + * Issue a TLB invalidation for the GGTT. Completion of TLB
> > > > > > is
> > > > > > asynchronous and
> > > > > > + * caller can use the invalidation fence to wait for
> > > > > > completion.
> > > > > >   *
> > > > > >   * Return: 0 on success, negative error code on error
> > > > > >   */
> > > > > >  int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
> > > > > >  {
> > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > -       unsigned int fw_ref;
> > > > > > -
> > > > > > -       if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> > > > > > -           gt->uc.guc.submission_state.enabled) {
> > > > > > -               struct xe_tlb_inval_fence fence;
> > > > > > -               int ret;
> > > > > > -
> > > > > > -               xe_tlb_inval_fence_init(tlb_inval, &fence,
> > > > > > true);
> > > > > > -               ret = __xe_tlb_inval_ggtt(gt, &fence);
> > > > > > -               if (ret)
> > > > > > -                       return ret;
> > > > > > -
> > > > > > -               xe_tlb_inval_fence_wait(&fence);
> > > > > > -       } else if (xe_device_uc_enabled(xe) &&
> > > > > > !xe_device_wedged(xe)) {
> > > > > > -               struct xe_mmio *mmio = &gt->mmio;
> > > > > > -
> > > > > > -               if (IS_SRIOV_VF(xe))
> > > > > > -                       return 0;
> > > > > > -
> > > > > > -               fw_ref = xe_force_wake_get(gt_to_fw(gt),
> > > > > > XE_FW_GT);
> > > > > > -               if (xe->info.platform == XE_PVC ||
> > > > > > GRAPHICS_VER(xe) >= 20) {
> > > > > > -                       xe_mmio_write32(mmio,
> > > > > > PVC_GUC_TLB_INV_DESC1,
> > > > > > -
> > > > > >                                        PVC_GUC_TLB_INV_DESC1_
> > > > > > INVAL
> > > > > > IDATE);
> > > > > > -                       xe_mmio_write32(mmio,
> > > > > > PVC_GUC_TLB_INV_DESC0,
> > > > > > -
> > > > > >                                        PVC_GUC_TLB_INV_DESC0_
> > > > > > VALID
> > > > > > );
> > > > > > -               } else {
> > > > > > -                       xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> > > > > > -
> > > > > >                                        GUC_TLB_INV_CR_INVALID
> > > > > > ATE);
> > > > > > -               }
> > > > > > -               xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > > > > -       }
> > > > > > +       struct xe_tlb_inval_fence fence, *fence_ptr = &fence;
> > > > > > +       int ret;
> > > > > >  
> > > > > > -       return 0;
> > > > > > +       xe_tlb_inval_fence_init(tlb_inval, fence_ptr, true);
> > > > > > +       ret = xe_tlb_inval_issue(tlb_inval, fence_ptr,
> > > > > > tlb_inval-
> > > > > > > ops->ggtt);
> > > > > > +       xe_tlb_inval_fence_wait(fence_ptr);
> > > > > > +
> > > > > > +       return ret;
> > > > > >  }
> > > > > >  
> > > > > >  /**
> > > > > > - * xe_tlb_inval_range - Issue a TLB invalidation on this GT
> > > > > > for
> > > > > > an address range
> > > > > > + * xe_tlb_inval_range() - Issue a TLB invalidation for an
> > > > > > address range
> > > > > >   * @tlb_inval: TLB invalidation client
> > > > > >   * @fence: invalidation fence which will be signal on TLB
> > > > > > invalidation
> > > > > >   * completion
> > > > > > @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct
> > > > > > xe_tlb_inval
> > > > > > *tlb_inval,
> > > > > >                        struct xe_tlb_inval_fence *fence, u64
> > > > > > start, u64 end,
> > > > > >                        u32 asid)
> > > > > >  {
> > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > -       int  ret;
> > > > > > -
> > > > > > -       xe_gt_assert(gt, fence);
> > > > > > -
> > > > > > -       /* Execlists not supported */
> > > > > > -       if (xe->info.force_execlist) {
> > > > > > -               __inval_fence_signal(xe, fence);
> > > > > > -               return 0;
> > > > > > -       }
> > > > > > -
> > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > -
> > > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > > -
> > > > > > -       ret = send_tlb_inval_ppgtt(gt, start, end, asid,
> > > > > > fence-
> > > > > > > seqno);
> > > > > > -       if (ret < 0)
> > > > > > -               inval_fence_signal_unlocked(xe, fence);
> > > > > > -
> > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > -
> > > > > > -       return ret;
> > > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > > tlb_inval-
> > > > > > > ops->ppgtt,
> > > > > > +                                 start, end, asid);
> > > > > >  }
> > > > > >  
> > > > > >  /**
> > > > > > - * xe_tlb_inval_vm - Issue a TLB invalidation on this GT for
> > > > > > a
> > > > > > VM
> > > > > > + * xe_tlb_inval_vm() - Issue a TLB invalidation for a VM
> > > > > >   * @tlb_inval: TLB invalidation client
> > > > > >   * @vm: VM to invalidate
> > > > > >   *
> > > > > > @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct
> > > > > > xe_tlb_inval
> > > > > > *tlb_inval, struct xe_vm *vm)
> > > > > >  {
> > > > > >         struct xe_tlb_inval_fence fence;
> > > > > >         u64 range = 1ull << vm->xe->info.va_bits;
> > > > > > -       int ret;
> > > > > >  
> > > > > >         xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > > > > > -
> > > > > > -       ret = xe_tlb_inval_range(tlb_inval, &fence, 0, range,
> > > > > > vm-
> > > > > > > usm.asid);
> > > > > > -       if (ret < 0)
> > > > > > -               return;
> > > > > > -
> > > > > > +       xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm-
> > > > > > > usm.asid);
> > > > > >         xe_tlb_inval_fence_wait(&fence);
> > > > > >  }
> > > > > >  
> > > > > >  /**
> > > > > > - * xe_tlb_inval_done_handler - TLB invalidation done handler
> > > > > > - * @gt: gt
> > > > > > + * xe_tlb_inval_done_handler() - TLB invalidation done
> > > > > > handler
> > > > > > + * @tlb_inval: TLB invalidation client
> > > > > >   * @seqno: seqno of invalidation that is done
> > > > > >   *
> > > > > >   * Update recv seqno, signal any TLB invalidation fences,
> > > > > > and
> > > > > > restart TDR
> > > > > 
> > > > > > I'd mention that this function is safe to be called from any
> > > > > > context (i.e., process, atomic, and hardirq contexts are
> > > > > > allowed).
> > > > > 
> > > > > We might need to convert tlb_inval.pending_lock to a
> > > > > raw_spinlock_t
> > > > > for
> > > > > PREEMPT_RT enablement. Same for the GuC fast_lock. AFAIK we
> > > > > haven’t
> > > > > had
> > > > > any complaints, so maybe I’m just overthinking it, but also
> > > > > perhaps
> > > > > not.
> > > > > 
> > > > > >   */
> > > > > > -static void xe_tlb_inval_done_handler(struct xe_gt *gt, int
> > > > > > seqno)
> > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > *tlb_inval,
> > > > > > int seqno)
> > > > > >  {
> > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > +       struct xe_device *xe = tlb_inval->xe;
> > > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > > >         unsigned long flags;
> > > > > >  
> > > > > > @@ -535,77 +337,53 @@ static void
> > > > > > xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> > > > > >          * officially process the CT message like if racing
> > > > > > against
> > > > > >          * process_g2h_msg().
> > > > > >          */
> > > > > > -       spin_lock_irqsave(&gt->tlb_inval.pending_lock,
> > > > > > flags);
> > > > > > -       if (tlb_inval_seqno_past(gt, seqno)) {
> > > > > > -               spin_unlock_irqrestore(&gt-
> > > > > > > tlb_inval.pending_lock, flags);
> > > > > > +       spin_lock_irqsave(&tlb_inval->pending_lock, flags);
> > > > > > +       if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) {
> > > > > > +               spin_unlock_irqrestore(&tlb_inval-
> > > > > > >pending_lock,
> > > > > > flags);
> > > > > >                 return;
> > > > > >         }
> > > > > >  
> > > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> > > > > > +       WRITE_ONCE(tlb_inval->seqno_recv, seqno);
> > > > > >  
> > > > > >         list_for_each_entry_safe(fence, next,
> > > > > > -                                &gt-
> > > > > > >tlb_inval.pending_fences,
> > > > > > link) {
> > > > > > +                                &tlb_inval->pending_fences,
> > > > > > link) {
> > > > > >                 trace_xe_tlb_inval_fence_recv(xe, fence);
> > > > > >  
> > > > > > -               if (!tlb_inval_seqno_past(gt, fence->seqno))
> > > > > > +               if (!xe_tlb_inval_seqno_past(tlb_inval,
> > > > > > fence-
> > > > > > > seqno))
> > > > > >                         break;
> > > > > >  
> > > > > > -               inval_fence_signal(xe, fence);
> > > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > > >         }
> > > > > >  
> > > > > > -       if (!list_empty(&gt->tlb_inval.pending_fences))
> > > > > > +       if (!list_empty(&tlb_inval->pending_fences))
> > > > > >                 mod_delayed_work(system_wq,
> > > > > > -                                &gt->tlb_inval.fence_tdr,
> > > > > > -                                tlb_timeout_jiffies(gt));
> > > > > > +                                &tlb_inval->fence_tdr,
> > > > > > +                                tlb_inval->ops-
> > > > > > > timeout_delay(tlb_inval));
> > > > > >         else
> > > > > > -               cancel_delayed_work(&gt-
> > > > > > >tlb_inval.fence_tdr);
> > > > > > +               cancel_delayed_work(&tlb_inval->fence_tdr);
> > > > > >  
> > > > > > -       spin_unlock_irqrestore(&gt->tlb_inval.pending_lock,
> > > > > > flags);
> > > > > > -}
> > > > > > -
> > > > > > -/**
> > > > > > - * xe_guc_tlb_inval_done_handler - TLB invalidation done
> > > > > > handler
> > > > > > - * @guc: guc
> > > > > > - * @msg: message indicating TLB invalidation done
> > > > > > - * @len: length of message
> > > > > > - *
> > > > > > - * Parse seqno of TLB invalidation, wake any waiters for
> > > > > > seqno,
> > > > > > and signal any
> > > > > > - * invalidation fences for seqno. Algorithm for this depends
> > > > > > on
> > > > > > seqno being
> > > > > > - * received in-order and asserts this assumption.
> > > > > > - *
> > > > > > - * Return: 0 on success, -EPROTO for malformed messages.
> > > > > > - */
> > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32
> > > > > > *msg,
> > > > > > u32 len)
> > > > > > -{
> > > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > > -
> > > > > > -       if (unlikely(len != 1))
> > > > > > -               return -EPROTO;
> > > > > > -
> > > > > > -       xe_tlb_inval_done_handler(gt, msg[0]);
> > > > > > -
> > > > > > -       return 0;
> > > > > > +       spin_unlock_irqrestore(&tlb_inval->pending_lock,
> > > > > > flags);
> > > > > >  }
> > > > > >  
> > > > > >  static const char *
> > > > > > -inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > > > > > +xe_inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > > > > >  {
> > > > > >         return "xe";
> > > > > >  }
> > > > > >  
> > > > > >  static const char *
> > > > > > -inval_fence_get_timeline_name(struct dma_fence *dma_fence)
> > > > > > +xe_inval_fence_get_timeline_name(struct dma_fence
> > > > > > *dma_fence)
> > > > > >  {
> > > > > > -       return "inval_fence";
> > > > > > +       return "tlb_inval_fence";
> > > > > >  }
> > > > > >  
> > > > > >  static const struct dma_fence_ops inval_fence_ops = {
> > > > > > -       .get_driver_name = inval_fence_get_driver_name,
> > > > > > -       .get_timeline_name = inval_fence_get_timeline_name,
> > > > > > +       .get_driver_name = xe_inval_fence_get_driver_name,
> > > > > > +       .get_timeline_name =
> > > > > > xe_inval_fence_get_timeline_name,
> > > > > >  };
> > > > > >  
> > > > > >  /**
> > > > > > - * xe_tlb_inval_fence_init - Initialize TLB invalidation
> > > > > > fence
> > > > > > + * xe_tlb_inval_fence_init() - Initialize TLB invalidation
> > > > > > fence
> > > > > >   * @tlb_inval: TLB invalidation client
> > > > > >   * @fence: TLB invalidation fence to initialize
> > > > > >   * @stack: fence is stack variable
> > > > > > @@ -618,15 +396,12 @@ void xe_tlb_inval_fence_init(struct
> > > > > > xe_tlb_inval *tlb_inval,
> > > > > >                              struct xe_tlb_inval_fence
> > > > > > *fence,
> > > > > >                              bool stack)
> > > > > >  {
> > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > -
> > > > > > -       xe_pm_runtime_get_noresume(gt_to_xe(gt));
> > > > > > +       xe_pm_runtime_get_noresume(tlb_inval->xe);
> > > > > >  
> > > > > > -       spin_lock_irq(&gt->tlb_inval.lock);
> > > > > > -       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > > > -                      &gt->tlb_inval.lock,
> > > > > > +       spin_lock_irq(&tlb_inval->lock);
> > > > > > +       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > > > &tlb_inval->lock,
> > > > > >                        dma_fence_context_alloc(1), 1);
> > > > > > -       spin_unlock_irq(&gt->tlb_inval.lock);
> > > > > > +       spin_unlock_irq(&tlb_inval->lock);
> > > > > 
> > > > > While here, 'fence_lock' is probably a better name.
> > > > > 
> > > > > Matt
> > > > > 
> > > > > >         INIT_LIST_HEAD(&fence->link);
> > > > > >         if (stack)
> > > > > >                 set_bit(FENCE_STACK_BIT, &fence->base.flags);
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > index 7adee3f8c551..cdeafc8d4391 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > @@ -18,24 +18,30 @@ struct xe_vma;
> > > > > >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
> > > > > >  
> > > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
> > > > > > -int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > > -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct
> > > > > > xe_vm *vm);
> > > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > >                      struct xe_tlb_inval_fence *fence);
> > > > > > +int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > > +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct
> > > > > > xe_vm *vm);
> > > > > >  int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
> > > > > >                        struct xe_tlb_inval_fence *fence,
> > > > > >                        u64 start, u64 end, u32 asid);
> > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32
> > > > > > *msg,
> > > > > > u32 len);
> > > > > >  
> > > > > >  void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
> > > > > >                              struct xe_tlb_inval_fence
> > > > > > *fence,
> > > > > >                              bool stack);
> > > > > > -void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence
> > > > > > *fence);
> > > > > >  
> > > > > > +/**
> > > > > > + * xe_tlb_inval_fence_wait() - TLB invalidiation fence wait
> > > > > > + * @fence: TLB invalidation fence to wait on
> > > > > > + *
> > > > > > + * Wait on a TLB invalidiation fence until it signals, non
> > > > > > interruptable
> > > > > > + */
> > > > > >  static inline void
> > > > > >  xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence *fence)
> > > > > >  {
> > > > > >         dma_fence_wait(&fence->base, false);
> > > > > >  }
> > > > > >  
> > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > *tlb_inval,
> > > > > > int seqno);
> > > > > > +
> > > > > >  #endif /* _XE_TLB_INVAL_ */
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > index 05b6adc929bb..c1ad96d24fc8 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > @@ -9,10 +9,85 @@
> > > > > >  #include <linux/workqueue.h>
> > > > > >  #include <linux/dma-fence.h>
> > > > > >  
> > > > > > -/** struct xe_tlb_inval - TLB invalidation client */
> > > > > > +struct xe_tlb_inval;
> > > > > > +
> > > > > > +/** struct xe_tlb_inval_ops - TLB invalidation ops (backend)
> > > > > > */
> > > > > > +struct xe_tlb_inval_ops {
> > > > > > +       /**
> > > > > > +        * @all: Invalidate all TLBs
> > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > +        *
> > > > > > +        * Return 0 on success, -ECANCELED if backend is mid-
> > > > > > reset, error on
> > > > > > +        * failure
> > > > > > +        */
> > > > > > +       int (*all)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > seqno);
> > > > > > +
> > > > > > +       /**
> > > > > > +        * @ggtt: Invalidate global translation TLBs
> > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > +        *
> > > > > > +        * Return 0 on success, -ECANCELED if backend is mid-
> > > > > > reset, error on
> > > > > > +        * failure
> > > > > > +        */
> > > > > > +       int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > seqno);
> > > > > > +
> > > > > > +       /**
> > > > > > > +        * @ppgtt: Invalidate per-process translation TLBs
> > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > +        * @start: Start address
> > > > > > +        * @end: End address
> > > > > > +        * @asid: Address space ID
> > > > > > +        *
> > > > > > +        * Return 0 on success, -ECANCELED if backend is mid-
> > > > > > reset, error on
> > > > > > +        * failure
> > > > > > +        */
> > > > > > +       int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > seqno,
> > > > > > u64 start,
> > > > > > +                    u64 end, u32 asid);
> > > > > > +
> > > > > > +       /**
> > > > > > +        * @initialized: Backend is initialized
> > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > +        *
> > > > > > > +        * Return: True if backend is initialized, False
> > > > > > > otherwise
> > > > > > +        */
> > > > > > +       bool (*initialized)(struct xe_tlb_inval *tlb_inval);
> > > > > > +
> > > > > > +       /**
> > > > > > +        * @flush: Flush pending TLB invalidations
> > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > +        */
> > > > > > +       void (*flush)(struct xe_tlb_inval *tlb_inval);
> > > > > > +
> > > > > > +       /**
> > > > > > +        * @timeout_delay: Timeout delay for TLB invalidation
> > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > +        *
> > > > > > +        * Return: Timeout delay for TLB invalidation in
> > > > > > jiffies
> > > > > > +        */
> > > > > > +       long (*timeout_delay)(struct xe_tlb_inval
> > > > > > *tlb_inval);
> > > > > > +
> > > > > > +       /**
> > > > > > +        * @lock: Lock resources protecting the backend seqno
> > > > > > management
> > > > > > +        */
> > > > > > +       void (*lock)(struct xe_tlb_inval *tlb_inval);
> > > > > > +
> > > > > > +       /**
> > > > > > > +        * @unlock: Unlock resources protecting the backend
> > > > > > seqno
> > > > > > management
> > > > > > +        */
> > > > > > +       void (*unlock)(struct xe_tlb_inval *tlb_inval);
> > > > > > +};
> > > > > > +
> > > > > > +/** struct xe_tlb_inval - TLB invalidation client (frontend)
> > > > > > */
> > > > > >  struct xe_tlb_inval {
> > > > > >         /** @private: Backend private pointer */
> > > > > >         void *private;
> > > > > > +       /** @xe: Pointer to Xe device */
> > > > > > +       struct xe_device *xe;
> > > > > > +       /** @ops: TLB invalidation ops */
> > > > > > +       const struct xe_tlb_inval_ops *ops;
> > > > > >         /** @tlb_inval.seqno: TLB invalidation seqno,
> > > > > > protected
> > > > > > by CT lock */
> > > > > >  #define TLB_INVALIDATION_SEQNO_MAX     0x100000
> > > > > >         int seqno;
> > > > > > -- 
> > > > > > 2.34.1
> > > > > > 
> > > 
> 


* Re: [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 21:22             ` Matthew Brost
@ 2025-07-23 22:03               ` Summers, Stuart
  2025-07-23 22:43                 ` Summers, Stuart
  2025-07-23 23:21                 ` Matthew Brost
  2025-07-23 23:19               ` Summers, Stuart
  1 sibling, 2 replies; 19+ messages in thread
From: Summers, Stuart @ 2025-07-23 22:03 UTC (permalink / raw)
  To: Brost, Matthew
  Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
	Kassabri, Farah, Auld, Matthew

On Wed, 2025-07-23 at 14:22 -0700, Matthew Brost wrote:
> On Wed, Jul 23, 2025 at 02:55:24PM -0600, Summers, Stuart wrote:
> > On Wed, 2025-07-23 at 13:47 -0700, Matthew Brost wrote:
> > > 
> > 
> > <cut>
> > (just to reduce the noise in the rest of the patch here for now...)
> > 
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_tlb_inval_reset - Initialize TLB invalidation
> > > > > > > reset
> > > > > > > + * xe_tlb_inval_reset() - TLB invalidation reset
> > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > >   *
> > > > > > >   * Signal any pending invalidation fences, should be
> > > > > > > called
> > > > > > > during a GT reset
> > > > > > >   */
> > > > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
> > > > > > >  {
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > > > >         int pending_seqno;
> > > > > > >  
> > > > > > >         /*
> > > > > > > -        * we can get here before the CTs are even
> > > > > > > initialized if
> > > > > > > we're wedging
> > > > > > > -        * very early, in which case there are not going
> > > > > > > to
> > > > > > > be
> > > > > > > any pending
> > > > > > > -        * fences so we can bail immediately.
> > > > > > > +        * we can get here before the backends are even
> > > > > > > initialized if we're
> > > > > > > +        * wedging very early, in which case there are
> > > > > > > not
> > > > > > > going
> > > > > > > to be any
> > > > > > > > +        * pending fences so we can bail immediately.
> > > > > > >          */
> > > > > > > -       if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> > > > > > > +       if (!tlb_inval->ops->initialized(tlb_inval))
> > > > > > >                 return;
> > > > > > >  
> > > > > > >         /*
> > > > > > > -        * CT channel is already disabled at this point.
> > > > > > > No
> > > > > > > new
> > > > > > > TLB requests can
> > > > > > > +        * Backend is already disabled at this point. No
> > > > > > > new
> > > > > > > TLB
> > > > > > > requests can
> > > > > > >          * appear.
> > > > > > >          */
> > > > > > >  
> > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > -       spin_lock_irq(&gt->tlb_inval.pending_lock);
> > > > > > > -       cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > > > > > > +       tlb_inval->ops->lock(tlb_inval);
> > > > > > 
> > > > > > I think you want a dedicated lock embedded in struct
> > > > > > xe_tlb_inval,
> > > > > > rather than reaching into the backend to grab one.
> > > > > > 
> > > > > > This will deadlock as written: G2H TLB inval messages are
> > > > > > sometimes
> > > > > > processed while holding ct->lock (non-fast path, unlikely)
> > > > > > and
> > > > > > sometimes
> > > > > > without it (fast path, likely).
> > > > > 
> > > > > Ugh, I'm off today. Ignore the deadlock part, I was confusing
> > > > > myself...
> > > > > I was thinking this was the function
> > > > > xe_tlb_inval_done_handler,
> > > > > it is
> > > > > > not. I still think xe_tlb_inval should have its own lock but
> > > > > > this patch as written should work with
> > > > > > s/xe_guc_ct_send/xe_guc_ct_send_locked.
> > > > 
> > > > So one reason I didn't go that way is we did just the reverse
> > > > recently
> > > > - moved from a TLB dedicated lock to the more specific CT lock
> > > > since
> > > > these are all going into the CT handler anyway when we use GuC
> > > > submission. Then this embedded version allows us to lock at the
> > > > bottom
> > > > data layer rather than having a separate lock in the upper
> > > > layer.
> > > > Another thing is we might want to have different types of
> > > > invalidation
> > > > running in parallel without locking the data in the upper layer
> > > > since
> > > > the real contention would be in the lower level pipelining
> > > > anyway.
> > > > 
> > > 
> > > I can see the reasoning behind this approach, and maybe it’s
> > > fine.
> > > 
> > > But consider the case where the GuC backend has to look up a VM,
> > > iterate
> > > over a list of exec queues, and send multiple H2Gs to the
> > > hardware,
> > > each
> > > with a corresponding G2H (per-context invalidations). In the
> > > worst
> > > case,
> > > the CT code may have to wait for and process some G2Hs because
> > > our
> > > G2H
> > > credits are exhausted—all while holding the CT lock, which
> > > currently
> > > blocks any hardware submissions (i.e., hardware submissions need
> > > the
> > > CT
> > > lock). Now imagine multiple sources issuing invalidations: they
> > > could
> > > grab the CT lock before a submission waiting on it, further
> > > delaying
> > > that
> > > submission. 
> > > 
> > > The longer a mutex is held, the more likely the CPU thread
> > > holding it
> > > > could be switched out while holding it.
> > > 
> > > This doesn’t seem scalable compared to using a finer-grained CT
> > > lock
> > > (e.g., only taking it in xe_guc_ct_send).
> > > 
> > > I’m not saying this won’t work as you have it—I think it will—but
> > > the
> > > consequences of holding the CT lock for an extended period need
> > > to be
> > > considered.
> > 
> > Couple more thoughts.. so in the case you mentioned, ideally I'd
> > like
> > to have just a single invalidation per request, rather than across
> > a
> > whole VM. That's the reason we have the range based invalidation to
> 
> Yes, this is ranged based.
> 
> > begin with. If we get to the point where we want to make that even
> > finer, that's great, but we should still just have a single
> > invalidation per request (again, ideally).
> > 
> 
> Maybe you have a different idea, but I was thinking of queue-based
> invalidations: the frontend assigns a single seqno, the backend
> issues N
> invalidations to the hardware—one per GCID mapped in the VM/GT
> tuple—and
> then signals the frontend when all invalidations associated with the
> seqno are complete. With the GuC, a GCID corresponds to each exec
> queue’s
> gucid mapped in the VM/GT tuple. Different backends can handle this
> differently.
> 
> > Also, you already have some patches up on the list that do some
> > coalescing of invalidations so we reduce the number of
> > invalidations
> > for multiple ranges. I didn't want to include those patches here
> > because IMO they are really a separate feature here and it'd be
> > nice to
> > review that on its own.
> > 
> 
> I agree it is a separate thing, that should help in some cases, and
> should be reviewed on its own.
> 
> That doesn't help in the case of multiple VMs issuing invalidations
> though (think eviction is occurring or MMU notifiers are firing). The
> lock contention is moved from a dedicated TLB invalidation lock to a
> widely shared CT lock. If multiple TLB invalidations are contending,
> now
> all other users of the CT lock contend at this higher level. i.e., by
> only acquiring CT lock at last part of an invalidation, other waiters
> (non-invalidation) get QoS.

I mean, this was the original reason I had understood for having the
separate lock in the first place. But it feels a little like we're
running in circles here moving between the two modes..

I do see what you're saying though, basically the problem is the CT
send routine right now is doing a busy wait for a reply from guc each
time it sends something, all within the lock.

                if (!wait_event_timeout(ct->wq, !ct->g2h_outstanding ||
                                        g2h_avail(ct), HZ))

So if we're going to stick with this, yeah I agree we really need some
kind of queuing if we're going to have a lot of these fine grained
invalidations all in a row or we'll start blocking things like page
fault replies.

I'm wondering if the better way to approach this though would be to
refactor on the GuC side rather than do something really complicated on
the TLB side. I.e. why can't we do the CT busy wait in a worker thread
and let the send thread keep going adding more and more? It would mean
we'd have to do a better job of tracking each unique request out to guc
rather than just relying on the current g2h_outstanding count, but it
would at least let us do some of this work in parallel.
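
To make the idea above a bit more concrete, here is a rough userspace
sketch in plain C. It is not xe or GuC CT code, and every name in it
(send_queue, queue_submit, queue_drain, queue_reply) is made up for
illustration. It only shows the shape: the sender records a request and
returns without waiting, while a separate drain step (what a worker
would do) sends only as many requests as there are reply credits.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH 8

/* Hypothetical send queue; not a real xe structure. */
struct send_queue {
	unsigned int ids[QUEUE_DEPTH]; /* one entry per tracked request */
	size_t head;                   /* next request to send */
	size_t tail;                   /* next free slot */
	unsigned int credits;          /* replies there is still room for */
	unsigned int last_sent;        /* id of the most recently sent request */
};

/* Sender side: track the request and return immediately, never waits. */
static int queue_submit(struct send_queue *q, unsigned int id)
{
	if (q->tail - q->head == QUEUE_DEPTH)
		return -1; /* queue full */

	q->ids[q->tail % QUEUE_DEPTH] = id;
	q->tail++;
	return 0;
}

/* Worker side: send queued requests while credits last, return the count. */
static unsigned int queue_drain(struct send_queue *q)
{
	unsigned int sent = 0;

	while (q->head != q->tail && q->credits > 0) {
		q->last_sent = q->ids[q->head % QUEUE_DEPTH];
		q->head++;
		q->credits--;
		sent++;
	}
	return sent;
}

/* Reply side: each reply hands one credit back to the queue. */
static void queue_reply(struct send_queue *q)
{
	assert(q->head > 0); /* a reply implies something was sent */
	q->credits++;
}
```

In a real implementation the drain step would sleep until credits show
up and the queue would need its own locking; both are left out here to
keep the sketch short.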

The queueing mechanism is still going to take work on top of what we
have in this series to build up these chains of h2g messages with the
CT lock held only for that last one. And IMO it still will be a little
messy calling into the lower layer (guc) and back out to the upper
layer (tlb) and back again to build these queues. And I'm not sure how
well that will work if we move to a different back end than guc - we
might not get any benefit there after all this work on the guc side.
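
For reference, the bookkeeping behind the queue-based scheme discussed
above (one frontend seqno, N backend invalidations, signal the frontend
when the last one completes) could look roughly like the userspace
sketch below. This is plain C under my own assumptions, not anything
from the series: fanout_tracker, fanout_begin and fanout_backend_done
are hypothetical names, and there is no locking or real fence
signaling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INFLIGHT 4

/* Hypothetical record for one frontend seqno; not a real xe structure. */
struct fanout_slot {
	bool in_use;
	unsigned int seqno;     /* seqno assigned by the frontend */
	unsigned int remaining; /* backend invalidations still outstanding */
};

struct fanout_tracker {
	struct fanout_slot slots[MAX_INFLIGHT];
	unsigned int completed; /* frontend seqnos fully completed so far */
};

/* Start tracking a seqno that the backend split into 'count' invalidations. */
static int fanout_begin(struct fanout_tracker *t, unsigned int seqno,
			unsigned int count)
{
	size_t i;

	if (count == 0)
		return -1;

	for (i = 0; i < MAX_INFLIGHT; i++) {
		if (!t->slots[i].in_use) {
			t->slots[i].in_use = true;
			t->slots[i].seqno = seqno;
			t->slots[i].remaining = count;
			return 0;
		}
	}
	return -1; /* no free slot */
}

/*
 * One backend invalidation for 'seqno' finished. Returns true only when
 * it was the last one, i.e. when the frontend should be signaled.
 */
static bool fanout_backend_done(struct fanout_tracker *t, unsigned int seqno)
{
	size_t i;

	for (i = 0; i < MAX_INFLIGHT; i++) {
		struct fanout_slot *s = &t->slots[i];

		if (!s->in_use || s->seqno != seqno)
			continue;

		if (--s->remaining > 0)
			return false;

		s->in_use = false;
		t->completed++;
		return true;
	}
	return false; /* unknown seqno */
}
```

The point of the sketch is only that the frontend sees a single
completion per seqno no matter how many messages the backend sent,
which is what keeps the frontend/backend split intact.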

Let me know what you think about a CT refactor like what I said.

And I still do think we can do a better job reducing the scope of some
of these invalidations, particularly in a case where we wanted to
associate something like the guc id with the VM to build a range rather
than just the addresses within the VM. At least in that case we can
look a little longer term at something like the CT refactor and still
keep the backend/frontend isolation intact.

Thanks,
Stuart

> 
> Matt
>  
> > So basically, the per request lock here also pushes us to implement
> > in
> > a more efficient and precise way rather than just hammering as many
> > invalidations over a given range as possible.
> > 
> > And of course there are going to need to be bigger hammer
> > invalidations
> > sometimes (like the full VF invalidation we're doing in the
> > invalidate_all() routines), but those still fall into the same
> > category
> > of precision, just with a larger scope (rather than multiple
> > smaller
> > invalidations).
> > 
> > Thanks,
> > Stuart
> > 
> > > 
> > > Matt
> > > 
> > > > Thanks,
> > > > Stuart
> > > > 
> > > > > 
> > > > > Matt 
> > > > > 
> > > > > > 
> > > > > > I’d call this lock seqno_lock, since it protects exactly
> > > > > > that—the
> > > > > > order
> > > > > > in which a seqno is assigned by the frontend and handed to
> > > > > > the
> > > > > > backend.
> > > > > > 
> > > > > > Prime this lock for reclaim as well—do what primelockdep()
> > > > > > does
> > > > > > in
> > > > > > xe_guc_ct.c—to make it clear that memory allocations are
> > > > > > not
> > > > > > allowed
> > > > > > while the lock is held as TLB invalidations can be called
> > > > > > from
> > > > > > two
> > > > > > reclaim paths:
> > > > > > 
> > > > > > - MMU notifier callbacks
> > > > > > - The dma-fence signaling path of VM binds that require a
> > > > > > TLB
> > > > > >   invalidation
> > > > > > 
> > > > > > > +       spin_lock_irq(&tlb_inval->pending_lock);
> > > > > > > +       cancel_delayed_work(&tlb_inval->fence_tdr);
> > > > > > >         /*
> > > > > > >          * We might have various kworkers waiting for TLB
> > > > > > > flushes
> > > > > > > to complete
> > > > > > >          * which are not tracked with an explicit TLB
> > > > > > > fence,
> > > > > > > however at this
> > > > > > > -        * stage that will never happen since the CT is
> > > > > > > already
> > > > > > > disabled, so
> > > > > > > -        * make sure we signal them here under the
> > > > > > > assumption
> > > > > > > that we have
> > > > > > > +        * stage that will never happen since the backend
> > > > > > > is
> > > > > > > already disabled,
> > > > > > > +        * so make sure we signal them here under the
> > > > > > > assumption
> > > > > > > that we have
> > > > > > >          * completed a full GT reset.
> > > > > > >          */
> > > > > > > -       if (gt->tlb_inval.seqno == 1)
> > > > > > > +       if (tlb_inval->seqno == 1)
> > > > > > >                 pending_seqno =
> > > > > > > TLB_INVALIDATION_SEQNO_MAX -
> > > > > > > 1;
> > > > > > >         else
> > > > > > > -               pending_seqno = gt->tlb_inval.seqno - 1;
> > > > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv,
> > > > > > > pending_seqno);
> > > > > > > +               pending_seqno = tlb_inval->seqno - 1;
> > > > > > > +       WRITE_ONCE(tlb_inval->seqno_recv, pending_seqno);
> > > > > > >  
> > > > > > >         list_for_each_entry_safe(fence, next,
> > > > > > > -                                &gt-
> > > > > > > > tlb_inval.pending_fences,
> > > > > > > link)
> > > > > > > -               inval_fence_signal(gt_to_xe(gt), fence);
> > > > > > > -       spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > +                                &tlb_inval-
> > > > > > > >pending_fences,
> > > > > > > link)
> > > > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > > > > +       spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > > > +       tlb_inval->ops->unlock(tlb_inval);
> > > > > > >  }
> > > > > > >  
> > > > > > > -static bool tlb_inval_seqno_past(struct xe_gt *gt, int
> > > > > > > seqno)
> > > > > > > +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval
> > > > > > > *tlb_inval, int seqno)
> > > > > > >  {
> > > > > > > -       int seqno_recv = READ_ONCE(gt-
> > > > > > > >tlb_inval.seqno_recv);
> > > > > > > +       int seqno_recv = READ_ONCE(tlb_inval-
> > > > > > > >seqno_recv);
> > > > > > > +
> > > > > > > +       lockdep_assert_held(&tlb_inval->pending_lock);
> > > > > > >  
> > > > > > >         if (seqno - seqno_recv < -
> > > > > > > (TLB_INVALIDATION_SEQNO_MAX
> > > > > > > /
> > > > > > > 2))
> > > > > > >                 return false;
> > > > > > > @@ -201,44 +192,20 @@ static bool
> > > > > > > tlb_inval_seqno_past(struct
> > > > > > > xe_gt *gt, int seqno)
> > > > > > >         return seqno_recv >= seqno;
> > > > > > >  }
> > > > > > >  
> > > > > > > -static int send_tlb_inval(struct xe_guc *guc, const u32
> > > > > > > *action,
> > > > > > > int len)
> > > > > > > -{
> > > > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > > > -
> > > > > > > -       xe_gt_assert(gt, action[1]);    /* Seqno */
> > > > > > > -       lockdep_assert_held(&guc->ct.lock);
> > > > > > > -
> > > > > > > -       /*
> > > > > > > -        * XXX: The seqno algorithm relies on TLB
> > > > > > > invalidation
> > > > > > > being processed
> > > > > > > -        * in order which they currently are, if that
> > > > > > > changes
> > > > > > > the
> > > > > > > algorithm will
> > > > > > > -        * need to be updated.
> > > > > > > -        */
> > > > > > > -
> > > > > > > -       xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL,
> > > > > > > 1);
> > > > > > > -
> > > > > > > -       return xe_guc_ct_send(&guc->ct, action, len,
> > > > > > > -                             G2H_LEN_DW_TLB_INVALIDATE,
> > > > > > > 1);
> > > > > > > -}
> > > > > > > -
> > > > > > >  static void xe_tlb_inval_fence_prep(struct
> > > > > > > xe_tlb_inval_fence
> > > > > > > *fence)
> > > > > > >  {
> > > > > > >         struct xe_tlb_inval *tlb_inval = fence-
> > > > > > > >tlb_inval;
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > -
> > > > > > > -       lockdep_assert_held(&gt->uc.guc.ct.lock);
> > > > > > >  
> > > > > > >         fence->seqno = tlb_inval->seqno;
> > > > > > > -       trace_xe_tlb_inval_fence_send(xe, fence);
> > > > > > > +       trace_xe_tlb_inval_fence_send(tlb_inval->xe,
> > > > > > > fence);
> > > > > > >  
> > > > > > >         spin_lock_irq(&tlb_inval->pending_lock);
> > > > > > >         fence->inval_time = ktime_get();
> > > > > > >         list_add_tail(&fence->link, &tlb_inval-
> > > > > > > > pending_fences);
> > > > > > >  
> > > > > > >         if (list_is_singular(&tlb_inval->pending_fences))
> > > > > > > -               queue_delayed_work(system_wq,
> > > > > > > -                                  &tlb_inval->fence_tdr,
> > > > > > > -                                 
> > > > > > > tlb_timeout_jiffies(gt));
> > > > > > > +               queue_delayed_work(system_wq, &tlb_inval-
> > > > > > > > fence_tdr,
> > > > > > > +                                  tlb_inval->ops-
> > > > > > > > timeout_delay(tlb_inval));
> > > > > > >         spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > > >  
> > > > > > >         tlb_inval->seqno = (tlb_inval->seqno + 1) %
> > > > > > > @@ -247,202 +214,63 @@ static void
> > > > > > > xe_tlb_inval_fence_prep(struct
> > > > > > > xe_tlb_inval_fence *fence)
> > > > > > >                 tlb_inval->seqno = 1;
> > > > > > >  }
> > > > > > >  
> > > > > > > -#define MAKE_INVAL_OP(type)    ((type <<
> > > > > > > XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > > > > > > -               XE_GUC_TLB_INVAL_MODE_HEAVY <<
> > > > > > > XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > > > > > > -               XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > > > > > > -
> > > > > > > -static int send_tlb_inval_ggtt(struct xe_gt *gt, int
> > > > > > > seqno)
> > > > > > > -{
> > > > > > > -       u32 action[] = {
> > > > > > > -               XE_GUC_ACTION_TLB_INVALIDATION,
> > > > > > > -               seqno,
> > > > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > > > > > > -       };
> > > > > > > -
> > > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > > ARRAY_SIZE(action));
> > > > > > > -}
> > > > > > > -
> > > > > > > -static int send_tlb_inval_all(struct xe_tlb_inval
> > > > > > > *tlb_inval,
> > > > > > > -                             struct xe_tlb_inval_fence
> > > > > > > *fence)
> > > > > > > -{
> > > > > > > -       u32 action[] = {
> > > > > > > -               XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > > > > > > -               0,  /* seqno, replaced in send_tlb_inval
> > > > > > > */
> > > > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > > > > > -       };
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > -
> > > > > > > -       xe_gt_assert(gt, fence);
> > > > > > > -
> > > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > > ARRAY_SIZE(action));
> > > > > > > -}
> > > > > > > +#define xe_tlb_inval_issue(__tlb_inval, __fence, op,
> > > > > > > args...)  \
> > > > > > > +({                                                      
> > > > > > >     
> > > > > > >    \
> > > > > > > +       int
> > > > > > > __ret;                                              \
> > > > > > > +                                                        
> > > > > > >     
> > > > > > >    \
> > > > > > > +       xe_assert((__tlb_inval)->xe, (__tlb_inval)-
> > > > > > > > ops);       \
> > > > > > > +       xe_assert((__tlb_inval)->xe,
> > > > > > > (__fence));                \
> > > > > > > +                                                        
> > > > > > >     
> > > > > > >    \
> > > > > > > +       (__tlb_inval)->ops-
> > > > > > > > lock((__tlb_inval));                \
> > > > > > > +       xe_tlb_inval_fence_prep((__fence));              
> > > > > > >     
> > > > > > >    \
> > > > > > > +       __ret = op((__tlb_inval), (__fence)->seqno,
> > > > > > > ##args);    \
> > > > > > > +       if (__ret <
> > > > > > > 0)                                          \
> > > > > > > +               xe_tlb_inval_fence_signal_unlocked((__fen
> > > > > > > ce))
> > > > > > > ;  \
> > > > > > > +       (__tlb_inval)->ops-
> > > > > > > > unlock((__tlb_inval));              \
> > > > > > > +                                                        
> > > > > > >     
> > > > > > >    \
> > > > > > > +       __ret == -ECANCELED ? 0 :
> > > > > > > __ret;                        \
> > > > > > > +})
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_gt_tlb_invalidation_all - Invalidate all TLBs
> > > > > > > across
> > > > > > > PF
> > > > > > > and all VFs.
> > > > > > > - * @gt: the &xe_gt structure
> > > > > > > - * @fence: the &xe_tlb_inval_fence to be signaled on
> > > > > > > completion
> > > > > > > + * xe_tlb_inval_all() - Issue a TLB invalidation for all
> > > > > > > TLBs
> > > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > > > + * @fence: invalidation fence which will be signaled on
> > > > > > > TLB
> > > > > > > invalidation
> > > > > > > + * completion
> > > > > > >   *
> > > > > > > - * Send a request to invalidate all TLBs across PF and
> > > > > > > all
> > > > > > > VFs.
> > > > > > > + * Issue a TLB invalidation for all TLBs. Completion of
> > > > > > > TLB
> > > > > > > is
> > > > > > > asynchronous and
> > > > > > > + * caller can use the invalidation fence to wait for
> > > > > > > completion.
> > > > > > >   *
> > > > > > >   * Return: 0 on success, negative error code on error
> > > > > > >   */
> > > > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > > >                      struct xe_tlb_inval_fence *fence)
> > > > > > >  {
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > -       int err;
> > > > > > > -
> > > > > > > -       err = send_tlb_inval_all(tlb_inval, fence);
> > > > > > > -       if (err)
> > > > > > > -               xe_gt_err(gt, "TLB invalidation request
> > > > > > > failed
> > > > > > > (%pe)", ERR_PTR(err));
> > > > > > > -
> > > > > > > -       return err;
> > > > > > > -}
> > > > > > > -
> > > > > > > -/*
> > > > > > > - * Ensure that roundup_pow_of_two(length) doesn't
> > > > > > > overflow.
> > > > > > > - * Note that roundup_pow_of_two() operates on unsigned
> > > > > > > long,
> > > > > > > - * not on u64.
> > > > > > > - */
> > > > > > > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH
> > > > > > > (rounddown_pow_of_two(ULONG_MAX))
> > > > > > > -
> > > > > > > -static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64
> > > > > > > start,
> > > > > > > u64
> > > > > > > end,
> > > > > > > -                               u32 asid, int seqno)
> > > > > > > -{
> > > > > > > -#define MAX_TLB_INVALIDATION_LEN       7
> > > > > > > -       u32 action[MAX_TLB_INVALIDATION_LEN];
> > > > > > > -       u64 length = end - start;
> > > > > > > -       int len = 0;
> > > > > > > -
> > > > > > > -       action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > > > > > > -       action[len++] = seqno;
> > > > > > > -       if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > > > > > -           length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > > > > > > -               action[len++] =
> > > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > > > > > > -       } else {
> > > > > > > -               u64 orig_start = start;
> > > > > > > -               u64 align;
> > > > > > > -
> > > > > > > -               if (length < SZ_4K)
> > > > > > > -                       length = SZ_4K;
> > > > > > > -
> > > > > > > -               /*
> > > > > > > -                * We need to invalidate a higher
> > > > > > > granularity
> > > > > > > if
> > > > > > > start address
> > > > > > > -                * is not aligned to length. When start
> > > > > > > is
> > > > > > > not
> > > > > > > aligned with
> > > > > > > -                * length we need to find the length
> > > > > > > large
> > > > > > > enough
> > > > > > > to create an
> > > > > > > -                * address mask covering the required
> > > > > > > range.
> > > > > > > -                */
> > > > > > > -               align = roundup_pow_of_two(length);
> > > > > > > -               start = ALIGN_DOWN(start, align);
> > > > > > > -               end = ALIGN(end, align);
> > > > > > > -               length = align;
> > > > > > > -               while (start + length < end) {
> > > > > > > -                       length <<= 1;
> > > > > > > -                       start = ALIGN_DOWN(orig_start,
> > > > > > > length);
> > > > > > > -               }
> > > > > > > -
> > > > > > > -               /*
> > > > > > > -                * Minimum invalidation size for a 2MB
> > > > > > > page
> > > > > > > that
> > > > > > > the hardware
> > > > > > > -                * expects is 16MB
> > > > > > > -                */
> > > > > > > -               if (length >= SZ_2M) {
> > > > > > > -                       length = max_t(u64, SZ_16M,
> > > > > > > length);
> > > > > > > -                       start = ALIGN_DOWN(orig_start,
> > > > > > > length);
> > > > > > > -               }
> > > > > > > -
> > > > > > > -               xe_gt_assert(gt, length >= SZ_4K);
> > > > > > > -               xe_gt_assert(gt, is_power_of_2(length));
> > > > > > > -               xe_gt_assert(gt, !(length &
> > > > > > > GENMASK(ilog2(SZ_16M)
> > > > > > > - 1,
> > > > > > > -                                                  
> > > > > > > ilog2(SZ_2M)
> > > > > > > + 1)));
> > > > > > > -               xe_gt_assert(gt, IS_ALIGNED(start,
> > > > > > > length));
> > > > > > > -
> > > > > > > -               action[len++] =
> > > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > > > > > > -               action[len++] = asid;
> > > > > > > -               action[len++] = lower_32_bits(start);
> > > > > > > -               action[len++] = upper_32_bits(start);
> > > > > > > -               action[len++] = ilog2(length) -
> > > > > > > ilog2(SZ_4K);
> > > > > > > -       }
> > > > > > > -
> > > > > > > -       xe_gt_assert(gt, len <=
> > > > > > > MAX_TLB_INVALIDATION_LEN);
> > > > > > > -
> > > > > > > -       return send_tlb_inval(&gt->uc.guc, action, len);
> > > > > > > -}
> > > > > > > -
> > > > > > > -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> > > > > > > -                              struct xe_tlb_inval_fence
> > > > > > > *fence)
> > > > > > > -{
> > > > > > > -       int ret;
> > > > > > > -
> > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > -
> > > > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > > > -
> > > > > > > -       ret = send_tlb_inval_ggtt(gt, fence->seqno);
> > > > > > > -       if (ret < 0)
> > > > > > > -               inval_fence_signal_unlocked(gt_to_xe(gt),
> > > > > > > fence);
> > > > > > > -
> > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > -
> > > > > > > -       /*
> > > > > > > -        * -ECANCELED indicates the CT is stopped for a
> > > > > > > GT
> > > > > > > reset.
> > > > > > > TLB caches
> > > > > > > -        *  should be nuked on a GT reset so this error
> > > > > > > can
> > > > > > > be
> > > > > > > ignored.
> > > > > > > -        */
> > > > > > > -       if (ret == -ECANCELED)
> > > > > > > -               return 0;
> > > > > > > -
> > > > > > > -       return ret;
> > > > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > > > tlb_inval-
> > > > > > > > ops->all);
> > > > > > >  }
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_tlb_inval_ggtt - Issue a TLB invalidation on this
> > > > > > > GT
> > > > > > > for
> > > > > > > the GGTT
> > > > > > > + * xe_tlb_inval_ggtt() - Issue a TLB invalidation for
> > > > > > > the
> > > > > > > GGTT
> > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > >   *
> > > > > > > - * Issue a TLB invalidation for the GGTT. Completion of
> > > > > > > TLB
> > > > > > > invalidation is
> > > > > > > - * synchronous.
> > > > > > > + * Issue a TLB invalidation for the GGTT. Completion of
> > > > > > > TLB
> > > > > > > is
> > > > > > > asynchronous and
> > > > > > > + * caller can use the invalidation fence to wait for
> > > > > > > completion.
> > > > > > >   *
> > > > > > >   * Return: 0 on success, negative error code on error
> > > > > > >   */
> > > > > > >  int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
> > > > > > >  {
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > -       unsigned int fw_ref;
> > > > > > > -
> > > > > > > -       if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> > > > > > > -           gt->uc.guc.submission_state.enabled) {
> > > > > > > -               struct xe_tlb_inval_fence fence;
> > > > > > > -               int ret;
> > > > > > > -
> > > > > > > -               xe_tlb_inval_fence_init(tlb_inval,
> > > > > > > &fence,
> > > > > > > true);
> > > > > > > -               ret = __xe_tlb_inval_ggtt(gt, &fence);
> > > > > > > -               if (ret)
> > > > > > > -                       return ret;
> > > > > > > -
> > > > > > > -               xe_tlb_inval_fence_wait(&fence);
> > > > > > > -       } else if (xe_device_uc_enabled(xe) &&
> > > > > > > !xe_device_wedged(xe)) {
> > > > > > > -               struct xe_mmio *mmio = &gt->mmio;
> > > > > > > -
> > > > > > > -               if (IS_SRIOV_VF(xe))
> > > > > > > -                       return 0;
> > > > > > > -
> > > > > > > -               fw_ref = xe_force_wake_get(gt_to_fw(gt),
> > > > > > > XE_FW_GT);
> > > > > > > -               if (xe->info.platform == XE_PVC ||
> > > > > > > GRAPHICS_VER(xe) >= 20) {
> > > > > > > -                       xe_mmio_write32(mmio,
> > > > > > > PVC_GUC_TLB_INV_DESC1,
> > > > > > > -
> > > > > > >                                        PVC_GUC_TLB_INV_DE
> > > > > > > SC1_
> > > > > > > INVAL
> > > > > > > IDATE);
> > > > > > > -                       xe_mmio_write32(mmio,
> > > > > > > PVC_GUC_TLB_INV_DESC0,
> > > > > > > -
> > > > > > >                                        PVC_GUC_TLB_INV_DE
> > > > > > > SC0_
> > > > > > > VALID
> > > > > > > );
> > > > > > > -               } else {
> > > > > > > -                       xe_mmio_write32(mmio,
> > > > > > > GUC_TLB_INV_CR,
> > > > > > > -
> > > > > > >                                        GUC_TLB_INV_CR_INV
> > > > > > > ALID
> > > > > > > ATE);
> > > > > > > -               }
> > > > > > > -               xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > > > > > -       }
> > > > > > > +       struct xe_tlb_inval_fence fence, *fence_ptr =
> > > > > > > &fence;
> > > > > > > +       int ret;
> > > > > > >  
> > > > > > > -       return 0;
> > > > > > > +       xe_tlb_inval_fence_init(tlb_inval, fence_ptr,
> > > > > > > true);
> > > > > > > +       ret = xe_tlb_inval_issue(tlb_inval, fence_ptr,
> > > > > > > tlb_inval-
> > > > > > > > ops->ggtt);
> > > > > > > +       xe_tlb_inval_fence_wait(fence_ptr);
> > > > > > > +
> > > > > > > +       return ret;
> > > > > > >  }
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_tlb_inval_range - Issue a TLB invalidation on this
> > > > > > > GT
> > > > > > > for
> > > > > > > an address range
> > > > > > > + * xe_tlb_inval_range() - Issue a TLB invalidation for
> > > > > > > an
> > > > > > > address range
> > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > >   * @fence: invalidation fence which will be signal on
> > > > > > > TLB
> > > > > > > invalidation
> > > > > > >   * completion
> > > > > > > @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct
> > > > > > > xe_tlb_inval
> > > > > > > *tlb_inval,
> > > > > > >                        struct xe_tlb_inval_fence *fence,
> > > > > > > u64
> > > > > > > start, u64 end,
> > > > > > >                        u32 asid)
> > > > > > >  {
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > -       int  ret;
> > > > > > > -
> > > > > > > -       xe_gt_assert(gt, fence);
> > > > > > > -
> > > > > > > -       /* Execlists not supported */
> > > > > > > -       if (xe->info.force_execlist) {
> > > > > > > -               __inval_fence_signal(xe, fence);
> > > > > > > -               return 0;
> > > > > > > -       }
> > > > > > > -
> > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > -
> > > > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > > > -
> > > > > > > -       ret = send_tlb_inval_ppgtt(gt, start, end, asid,
> > > > > > > fence-
> > > > > > > > seqno);
> > > > > > > -       if (ret < 0)
> > > > > > > -               inval_fence_signal_unlocked(xe, fence);
> > > > > > > -
> > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > -
> > > > > > > -       return ret;
> > > > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > > > tlb_inval-
> > > > > > > > ops->ppgtt,
> > > > > > > +                                 start, end, asid);
> > > > > > >  }
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_tlb_inval_vm - Issue a TLB invalidation on this GT
> > > > > > > for
> > > > > > > a
> > > > > > > VM
> > > > > > > + * xe_tlb_inval_vm() - Issue a TLB invalidation for a VM
> > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > >   * @vm: VM to invalidate
> > > > > > >   *
> > > > > > > @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct
> > > > > > > xe_tlb_inval
> > > > > > > *tlb_inval, struct xe_vm *vm)
> > > > > > >  {
> > > > > > >         struct xe_tlb_inval_fence fence;
> > > > > > >         u64 range = 1ull << vm->xe->info.va_bits;
> > > > > > > -       int ret;
> > > > > > >  
> > > > > > >         xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > > > > > > -
> > > > > > > -       ret = xe_tlb_inval_range(tlb_inval, &fence, 0,
> > > > > > > range,
> > > > > > > vm-
> > > > > > > > usm.asid);
> > > > > > > -       if (ret < 0)
> > > > > > > -               return;
> > > > > > > -
> > > > > > > +       xe_tlb_inval_range(tlb_inval, &fence, 0, range,
> > > > > > > vm-
> > > > > > > > usm.asid);
> > > > > > >         xe_tlb_inval_fence_wait(&fence);
> > > > > > >  }
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_tlb_inval_done_handler - TLB invalidation done
> > > > > > > handler
> > > > > > > - * @gt: gt
> > > > > > > + * xe_tlb_inval_done_handler() - TLB invalidation done
> > > > > > > handler
> > > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > >   * @seqno: seqno of invalidation that is done
> > > > > > >   *
> > > > > > >   * Update recv seqno, signal any TLB invalidation
> > > > > > > fences,
> > > > > > > and
> > > > > > > restart TDR
> > > > > > 
> > > > > > > I'd mention that this function is safe to call from any
> > > > > > > context (i.e., process, atomic, and hardirq contexts are
> > > > > > > allowed).
> > > > > > 
> > > > > > > We might need to convert tlb_inval.pending_lock to a
> > > > > > > raw_spinlock_t for PREEMPT_RT enablement. Same for the GuC
> > > > > > > fast_lock. AFAIK we haven’t had any complaints, so maybe I’m
> > > > > > > just overthinking it, but also perhaps not.
> > > > > > 
> > > > > > >   */
> > > > > > > -static void xe_tlb_inval_done_handler(struct xe_gt *gt,
> > > > > > > int
> > > > > > > seqno)
> > > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > > *tlb_inval,
> > > > > > > int seqno)
> > > > > > >  {
> > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > +       struct xe_device *xe = tlb_inval->xe;
> > > > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > > > >         unsigned long flags;
> > > > > > >  
> > > > > > > @@ -535,77 +337,53 @@ static void
> > > > > > > xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> > > > > > >          * officially process the CT message like if
> > > > > > > racing
> > > > > > > against
> > > > > > >          * process_g2h_msg().
> > > > > > >          */
> > > > > > > -       spin_lock_irqsave(&gt->tlb_inval.pending_lock,
> > > > > > > flags);
> > > > > > > -       if (tlb_inval_seqno_past(gt, seqno)) {
> > > > > > > -               spin_unlock_irqrestore(&gt-
> > > > > > > > tlb_inval.pending_lock, flags);
> > > > > > > +       spin_lock_irqsave(&tlb_inval->pending_lock,
> > > > > > > flags);
> > > > > > > +       if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) {
> > > > > > > +               spin_unlock_irqrestore(&tlb_inval-
> > > > > > > > pending_lock,
> > > > > > > flags);
> > > > > > >                 return;
> > > > > > >         }
> > > > > > >  
> > > > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> > > > > > > +       WRITE_ONCE(tlb_inval->seqno_recv, seqno);
> > > > > > >  
> > > > > > >         list_for_each_entry_safe(fence, next,
> > > > > > > -                                &gt-
> > > > > > > > tlb_inval.pending_fences,
> > > > > > > link) {
> > > > > > > +                                &tlb_inval-
> > > > > > > >pending_fences,
> > > > > > > link) {
> > > > > > >                 trace_xe_tlb_inval_fence_recv(xe, fence);
> > > > > > >  
> > > > > > > -               if (!tlb_inval_seqno_past(gt, fence-
> > > > > > > >seqno))
> > > > > > > +               if (!xe_tlb_inval_seqno_past(tlb_inval,
> > > > > > > fence-
> > > > > > > > seqno))
> > > > > > >                         break;
> > > > > > >  
> > > > > > > -               inval_fence_signal(xe, fence);
> > > > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > > > >         }
> > > > > > >  
> > > > > > > -       if (!list_empty(&gt->tlb_inval.pending_fences))
> > > > > > > +       if (!list_empty(&tlb_inval->pending_fences))
> > > > > > >                 mod_delayed_work(system_wq,
> > > > > > > -                                &gt-
> > > > > > > >tlb_inval.fence_tdr,
> > > > > > > -                               
> > > > > > > tlb_timeout_jiffies(gt));
> > > > > > > +                                &tlb_inval->fence_tdr,
> > > > > > > +                                tlb_inval->ops-
> > > > > > > > timeout_delay(tlb_inval));
> > > > > > >         else
> > > > > > > -               cancel_delayed_work(&gt-
> > > > > > > > tlb_inval.fence_tdr);
> > > > > > > +               cancel_delayed_work(&tlb_inval-
> > > > > > > >fence_tdr);
> > > > > > >  
> > > > > > > -       spin_unlock_irqrestore(&gt-
> > > > > > > >tlb_inval.pending_lock,
> > > > > > > flags);
> > > > > > > -}
> > > > > > > -
> > > > > > > -/**
> > > > > > > - * xe_guc_tlb_inval_done_handler - TLB invalidation done
> > > > > > > handler
> > > > > > > - * @guc: guc
> > > > > > > - * @msg: message indicating TLB invalidation done
> > > > > > > - * @len: length of message
> > > > > > > - *
> > > > > > > - * Parse seqno of TLB invalidation, wake any waiters for
> > > > > > > seqno,
> > > > > > > and signal any
> > > > > > > - * invalidation fences for seqno. Algorithm for this
> > > > > > > depends
> > > > > > > on
> > > > > > > seqno being
> > > > > > > - * received in-order and asserts this assumption.
> > > > > > > - *
> > > > > > > - * Return: 0 on success, -EPROTO for malformed messages.
> > > > > > > - */
> > > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc,
> > > > > > > u32
> > > > > > > *msg,
> > > > > > > u32 len)
> > > > > > > -{
> > > > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > > > -
> > > > > > > -       if (unlikely(len != 1))
> > > > > > > -               return -EPROTO;
> > > > > > > -
> > > > > > > -       xe_tlb_inval_done_handler(gt, msg[0]);
> > > > > > > -
> > > > > > > -       return 0;
> > > > > > > +       spin_unlock_irqrestore(&tlb_inval->pending_lock,
> > > > > > > flags);
> > > > > > >  }
> > > > > > >  
> > > > > > >  static const char *
> > > > > > > -inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > > > > > > +xe_inval_fence_get_driver_name(struct dma_fence
> > > > > > > *dma_fence)
> > > > > > >  {
> > > > > > >         return "xe";
> > > > > > >  }
> > > > > > >  
> > > > > > >  static const char *
> > > > > > > -inval_fence_get_timeline_name(struct dma_fence
> > > > > > > *dma_fence)
> > > > > > > +xe_inval_fence_get_timeline_name(struct dma_fence
> > > > > > > *dma_fence)
> > > > > > >  {
> > > > > > > -       return "inval_fence";
> > > > > > > +       return "tlb_inval_fence";
> > > > > > >  }
> > > > > > >  
> > > > > > >  static const struct dma_fence_ops inval_fence_ops = {
> > > > > > > -       .get_driver_name = inval_fence_get_driver_name,
> > > > > > > -       .get_timeline_name =
> > > > > > > inval_fence_get_timeline_name,
> > > > > > > +       .get_driver_name =
> > > > > > > xe_inval_fence_get_driver_name,
> > > > > > > +       .get_timeline_name =
> > > > > > > xe_inval_fence_get_timeline_name,
> > > > > > >  };
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_tlb_inval_fence_init - Initialize TLB invalidation
> > > > > > > fence
> > > > > > > + * xe_tlb_inval_fence_init() - Initialize TLB
> > > > > > > invalidation
> > > > > > > fence
> > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > >   * @fence: TLB invalidation fence to initialize
> > > > > > >   * @stack: fence is stack variable
> > > > > > > @@ -618,15 +396,12 @@ void xe_tlb_inval_fence_init(struct
> > > > > > > xe_tlb_inval *tlb_inval,
> > > > > > >                              struct xe_tlb_inval_fence
> > > > > > > *fence,
> > > > > > >                              bool stack)
> > > > > > >  {
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > -
> > > > > > > -       xe_pm_runtime_get_noresume(gt_to_xe(gt));
> > > > > > > +       xe_pm_runtime_get_noresume(tlb_inval->xe);
> > > > > > >  
> > > > > > > -       spin_lock_irq(&gt->tlb_inval.lock);
> > > > > > > -       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > > > > -                      &gt->tlb_inval.lock,
> > > > > > > +       spin_lock_irq(&tlb_inval->lock);
> > > > > > > +       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > > > > &tlb_inval->lock,
> > > > > > >                        dma_fence_context_alloc(1), 1);
> > > > > > > -       spin_unlock_irq(&gt->tlb_inval.lock);
> > > > > > > +       spin_unlock_irq(&tlb_inval->lock);
> > > > > > 
> > > > > > While here, 'fence_lock' is probably a better name.
> > > > > > 
> > > > > > Matt
> > > > > > 
> > > > > > >         INIT_LIST_HEAD(&fence->link);
> > > > > > >         if (stack)
> > > > > > >                 set_bit(FENCE_STACK_BIT, &fence-
> > > > > > > >base.flags);
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > index 7adee3f8c551..cdeafc8d4391 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > @@ -18,24 +18,30 @@ struct xe_vma;
> > > > > > >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
> > > > > > >  
> > > > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
> > > > > > > -int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > > > -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval,
> > > > > > > struct
> > > > > > > xe_vm *vm);
> > > > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > > >                      struct xe_tlb_inval_fence *fence);
> > > > > > > +int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > > > +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval,
> > > > > > > struct
> > > > > > > xe_vm *vm);
> > > > > > >  int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
> > > > > > >                        struct xe_tlb_inval_fence *fence,
> > > > > > >                        u64 start, u64 end, u32 asid);
> > > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc,
> > > > > > > u32
> > > > > > > *msg,
> > > > > > > u32 len);
> > > > > > >  
> > > > > > >  void xe_tlb_inval_fence_init(struct xe_tlb_inval
> > > > > > > *tlb_inval,
> > > > > > >                              struct xe_tlb_inval_fence
> > > > > > > *fence,
> > > > > > >                              bool stack);
> > > > > > > -void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence
> > > > > > > *fence);
> > > > > > >  
> > > > > > > +/**
> > > > > > > + * xe_tlb_inval_fence_wait() - TLB invalidiation fence
> > > > > > > wait
> > > > > > > + * @fence: TLB invalidation fence to wait on
> > > > > > > + *
> > > > > > > + * Wait on a TLB invalidiation fence until it signals,
> > > > > > > non
> > > > > > > interruptable
> > > > > > > + */
> > > > > > >  static inline void
> > > > > > >  xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence
> > > > > > > *fence)
> > > > > > >  {
> > > > > > >         dma_fence_wait(&fence->base, false);
> > > > > > >  }
> > > > > > >  
> > > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > > *tlb_inval,
> > > > > > > int seqno);
> > > > > > > +
> > > > > > >  #endif /* _XE_TLB_INVAL_ */
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > index 05b6adc929bb..c1ad96d24fc8 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > @@ -9,10 +9,85 @@
> > > > > > >  #include <linux/workqueue.h>
> > > > > > >  #include <linux/dma-fence.h>
> > > > > > >  
> > > > > > > -/** struct xe_tlb_inval - TLB invalidation client */
> > > > > > > +struct xe_tlb_inval;
> > > > > > > +
> > > > > > > +/** struct xe_tlb_inval_ops - TLB invalidation ops
> > > > > > > (backend)
> > > > > > > */
> > > > > > > +struct xe_tlb_inval_ops {
> > > > > > > +       /**
> > > > > > > +        * @all: Invalidate all TLBs
> > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > +        *
> > > > > > > +        * Return 0 on success, -ECANCELED if backend is
> > > > > > > mid-
> > > > > > > reset, error on
> > > > > > > +        * failure
> > > > > > > +        */
> > > > > > > +       int (*all)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > > seqno);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @ggtt: Invalidate global translation TLBs
> > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > +        *
> > > > > > > +        * Return 0 on success, -ECANCELED if backend is
> > > > > > > mid-
> > > > > > > reset, error on
> > > > > > > +        * failure
> > > > > > > +        */
> > > > > > > +       int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > > seqno);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @ppttt: Invalidate per-process translation
> > > > > > > TLBs
> > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > +        * @start: Start address
> > > > > > > +        * @end: End address
> > > > > > > +        * @asid: Address space ID
> > > > > > > +        *
> > > > > > > +        * Return 0 on success, -ECANCELED if backend is
> > > > > > > mid-
> > > > > > > reset, error on
> > > > > > > +        * failure
> > > > > > > +        */
> > > > > > > +       int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > > seqno,
> > > > > > > u64 start,
> > > > > > > +                    u64 end, u32 asid);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @initialized: Backend is initialized
> > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > +        *
> > > > > > > +        * Return: True if back is initialized, False
> > > > > > > otherwise
> > > > > > > +        */
> > > > > > > +       bool (*initialized)(struct xe_tlb_inval
> > > > > > > *tlb_inval);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @flush: Flush pending TLB invalidations
> > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > +        */
> > > > > > > +       void (*flush)(struct xe_tlb_inval *tlb_inval);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @timeout_delay: Timeout delay for TLB
> > > > > > > invalidation
> > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > +        *
> > > > > > > +        * Return: Timeout delay for TLB invalidation in
> > > > > > > jiffies
> > > > > > > +        */
> > > > > > > +       long (*timeout_delay)(struct xe_tlb_inval
> > > > > > > *tlb_inval);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @lock: Lock resources protecting the backend
> > > > > > > seqno
> > > > > > > management
> > > > > > > +        */
> > > > > > > +       void (*lock)(struct xe_tlb_inval *tlb_inval);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @unlock: Lock resources protecting the backend
> > > > > > > seqno
> > > > > > > management
> > > > > > > +        */
> > > > > > > +       void (*unlock)(struct xe_tlb_inval *tlb_inval);
> > > > > > > +};
> > > > > > > +
> > > > > > > +/** struct xe_tlb_inval - TLB invalidation client
> > > > > > > (frontend)
> > > > > > > */
> > > > > > >  struct xe_tlb_inval {
> > > > > > >         /** @private: Backend private pointer */
> > > > > > >         void *private;
> > > > > > > +       /** @xe: Pointer to Xe device */
> > > > > > > +       struct xe_device *xe;
> > > > > > > +       /** @ops: TLB invalidation ops */
> > > > > > > +       const struct xe_tlb_inval_ops *ops;
> > > > > > >         /** @tlb_inval.seqno: TLB invalidation seqno,
> > > > > > > protected
> > > > > > > by CT lock */
> > > > > > >  #define TLB_INVALIDATION_SEQNO_MAX     0x100000
> > > > > > >         int seqno;
> > > > > > > -- 
> > > > > > > 2.34.1
> > > > > > > 
> > > > 
> > 
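
For reference, the frontend/backend split in the diff above can be
illustrated with a small user-space sketch. It follows the flow of the
removed __xe_tlb_inval_ggtt(): take the backend lock, prep the fence
(assign a seqno), call the backend op, signal the fence on error, and
treat -ECANCELED as success. All names here (inval_client, inval_ops,
fence_prep, the fake_* backend) are hypothetical stand-ins, and the
seqno wrap rule is an assumption, not something taken from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQNO_MAX 0x100000	/* same value as TLB_INVALIDATION_SEQNO_MAX */

struct inval_client;

/* Backend hooks, loosely modelled on struct xe_tlb_inval_ops. */
struct inval_ops {
	int (*ggtt)(struct inval_client *c, uint32_t seqno);
	void (*lock)(struct inval_client *c);
	void (*unlock)(struct inval_client *c);
};

struct inval_fence {
	int seqno;
	bool signaled;
};

struct inval_client {
	const struct inval_ops *ops;
	int seqno;		/* next seqno to hand out */
	int lock_depth;		/* stand-in for a real mutex */
	int backend_ret;	/* what the fake backend returns */
};

/* Assign the next seqno. Assumption: wrap within [1, SEQNO_MAX). */
static void fence_prep(struct inval_client *c, struct inval_fence *f)
{
	f->seqno = c->seqno;
	f->signaled = false;
	c->seqno = (c->seqno + 1) % SEQNO_MAX;
	if (!c->seqno)
		c->seqno = 1;
}

/* Frontend entry point: seqno assignment and send happen under the lock. */
static int inval_issue_ggtt(struct inval_client *c, struct inval_fence *f)
{
	int ret;

	c->ops->lock(c);
	fence_prep(c, f);
	ret = c->ops->ggtt(c, f->seqno);
	if (ret < 0)
		f->signaled = true;	/* no completion will arrive later */
	c->ops->unlock(c);

	/* Backend is mid-reset: the reset flushes the TLBs, so not an error. */
	return ret == -ECANCELED ? 0 : ret;
}

/* Fake backend used only to exercise the frontend. */
static int fake_ggtt(struct inval_client *c, uint32_t seqno)
{
	(void)seqno;
	return c->backend_ret;
}

static void fake_lock(struct inval_client *c) { c->lock_depth++; }
static void fake_unlock(struct inval_client *c) { c->lock_depth--; }

static const struct inval_ops fake_ops = {
	.ggtt = fake_ggtt,
	.lock = fake_lock,
	.unlock = fake_unlock,
};
```

A caller would set up a struct inval_client with .ops = &fake_ops and
.seqno = 1, then call inval_issue_ggtt(). The frontend never touches
backend state directly, which is the point of the ops table.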


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 22:03               ` Summers, Stuart
@ 2025-07-23 22:43                 ` Summers, Stuart
  2025-07-23 23:21                 ` Matthew Brost
  1 sibling, 0 replies; 19+ messages in thread
From: Summers, Stuart @ 2025-07-23 22:43 UTC (permalink / raw)
  To: Brost, Matthew
  Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
	Kassabri, Farah, Auld, Matthew

On Wed, 2025-07-23 at 22:03 +0000, Summers, Stuart wrote:
> On Wed, 2025-07-23 at 14:22 -0700, Matthew Brost wrote:
> > On Wed, Jul 23, 2025 at 02:55:24PM -0600, Summers, Stuart wrote:
> > > On Wed, 2025-07-23 at 13:47 -0700, Matthew Brost wrote:
> > > > 
> > > 
> > > <cut>
> > > (just to reduce the noise in the rest of the patch here for
> > > now...)
> > > 
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_tlb_inval_reset - Initialize TLB invalidation
> > > > > > > > reset
> > > > > > > > + * xe_tlb_inval_reset() - TLB invalidation reset
> > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > >   *
> > > > > > > >   * Signal any pending invalidation fences, should be
> > > > > > > > called
> > > > > > > > during a GT reset
> > > > > > > >   */
> > > > > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval
> > > > > > > > *tlb_inval)
> > > > > > > >  {
> > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > > > > >         int pending_seqno;
> > > > > > > >  
> > > > > > > >         /*
> > > > > > > > -        * we can get here before the CTs are even
> > > > > > > > initialized if
> > > > > > > > we're wedging
> > > > > > > > -        * very early, in which case there are not
> > > > > > > > going
> > > > > > > > to
> > > > > > > > be
> > > > > > > > any pending
> > > > > > > > -        * fences so we can bail immediately.
> > > > > > > > +        * we can get here before the backends are even
> > > > > > > > initialized if we're
> > > > > > > > +        * wedging very early, in which case there are
> > > > > > > > not
> > > > > > > > going
> > > > > > > > to be any
> > > > > > > > +        * pendind fences so we can bail immediately.
> > > > > > > >          */
> > > > > > > > -       if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> > > > > > > > +       if (!tlb_inval->ops->initialized(tlb_inval))
> > > > > > > >                 return;
> > > > > > > >  
> > > > > > > >         /*
> > > > > > > > -        * CT channel is already disabled at this
> > > > > > > > point.
> > > > > > > > No
> > > > > > > > new
> > > > > > > > TLB requests can
> > > > > > > > +        * Backend is already disabled at this point.
> > > > > > > > No
> > > > > > > > new
> > > > > > > > TLB
> > > > > > > > requests can
> > > > > > > >          * appear.
> > > > > > > >          */
> > > > > > > >  
> > > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > > -       spin_lock_irq(&gt->tlb_inval.pending_lock);
> > > > > > > > -       cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > > > > > > > +       tlb_inval->ops->lock(tlb_inval);
> > > > > > > 
> > > > > > > I think you want a dedicated lock embedded in struct
> > > > > > > xe_tlb_inval,
> > > > > > > rather than reaching into the backend to grab one.
> > > > > > > 
> > > > > > > > This will deadlock as written: G2H TLB inval messages are
> > > > > > > > sometimes processed while holding ct->lock (non-fast path,
> > > > > > > > unlikely) and sometimes without it (fast path, likely).
> > > > > > 
> > > > > > > Ugh, I'm off today. Ignore the deadlock part, I was confusing
> > > > > > > myself... I was thinking this was the function
> > > > > > > xe_tlb_inval_done_handler, but it is not. I still think
> > > > > > > xe_tlb_inval should have its own lock, but this patch as
> > > > > > > written should work with
> > > > > > > s/xe_guc_ct_send/xe_guc_ct_send_locked.
> > > > > 
> > > > > > So one reason I didn't go that way is we did just the reverse
> > > > > > recently - moved from a TLB dedicated lock to the more specific
> > > > > > CT lock since these are all going into the CT handler anyway
> > > > > > when we use GuC submission. Then this embedded version allows
> > > > > > us to lock at the bottom data layer rather than having a
> > > > > > separate lock in the upper layer. Another thing is we might
> > > > > > want to have different types of invalidation running in
> > > > > > parallel without locking the data in the upper layer since the
> > > > > > real contention would be in the lower level pipelining anyway.
> > > > > 
> > > > 
> > > > I can see the reasoning behind this approach, and maybe it’s
> > > > fine.
> > > > 
> > > > > But consider the case where the GuC backend has to look up a VM,
> > > > > iterate over a list of exec queues, and send multiple H2Gs to the
> > > > > hardware, each with a corresponding G2H (per-context
> > > > > invalidations). In the worst case, the CT code may have to wait
> > > > > for and process some G2Hs because our G2H credits are exhausted,
> > > > > all while holding the CT lock, which currently blocks any
> > > > > hardware submissions (i.e., hardware submissions need the CT
> > > > > lock). Now imagine multiple sources issuing invalidations: they
> > > > > could grab the CT lock before a submission waiting on it, further
> > > > > delaying that submission.
> > > > 
> > > > > The longer a mutex is held, the more likely it is that the CPU
> > > > > thread holding it gets switched out while holding it.
> > > > 
> > > > This doesn’t seem scalable compared to using a finer-grained CT
> > > > lock
> > > > (e.g., only taking it in xe_guc_ct_send).
> > > > 
> > > > I’m not saying this won’t work as you have it—I think it
> > > > will—but
> > > > the
> > > > consequences of holding the CT lock for an extended period need
> > > > to be
> > > > considered.
> > > 
> > > Couple more thoughts.. so in the case you mentioned, ideally I'd
> > > like
> > > to have just a single invalidation per request, rather than
> > > across
> > > a
> > > whole VM. That's the reason we have the range based invalidation
> > > to
> > 
> > > Yes, this is range based.
> > 
> > > begin with. If we get to the point where we want to make that
> > > even
> > > finer, that's great, but we should still just have a single
> > > invalidation per request (again, ideally).
> > > 
> > 
> > > Maybe you have a different idea, but I was thinking of queue-based
> > > invalidations: the frontend assigns a single seqno, the backend
> > > issues N invalidations to the hardware (one per GCID mapped in the
> > > VM/GT tuple), and then signals the frontend when all invalidations
> > > associated with the seqno are complete. With the GuC, a GCID
> > > corresponds to each exec queue’s gucid mapped in the VM/GT tuple.
> > > Different backends can handle this differently.
> > 
> > > Also, you already have some patches up on the list that do some
> > > coalescing of invalidations so we reduce the number of
> > > invalidations
> > > for multiple ranges. I didn't want to include those patches here
> > > because IMO they are really a separate feature here and it'd be
> > > nice to
> > > review that on its own.
> > > 
> > 
> > > I agree it is a separate thing that should help in some cases and
> > > should be reviewed on its own.
> > 
> > > That doesn't help in the case of multiple VMs issuing invalidations
> > > though (think eviction is occurring or MMU notifiers are firing). The
> > > lock contention is moved from a dedicated TLB invalidation lock to a
> > > widely shared CT lock. If multiple TLB invalidations are contending,
> > > now all other users of the CT lock contend at this higher level,
> > > i.e., by only acquiring the CT lock at the last part of an
> > > invalidation, other waiters (non-invalidation) get QoS.
> 
> I mean, this was the original reason I had understood for having the
> separate lock in the first place. But it feels a little like we're
> running in circles here moving between the two modes..
> 
> I do see what you're saying though, basically the problem is the CT
> send routine right now is doing a busy wait for a reply from guc each
> time it sends something, all within the lock.
> 
>                 if (!wait_event_timeout(ct->wq, !ct->g2h_outstanding
> ||
>                                         g2h_avail(ct), HZ))

Ok maybe ignore what I said here for now. Let me dig a bit and get
back. The code I linked here is clearly just on the busy path so not a
block like that.

Thanks,
Stuart
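
For reference, the queue-based scheme discussed above (one frontend
seqno fanned out to N backend invalidations, with the frontend signaled
only once all of them complete) could be tracked roughly as follows.
This is only a user-space sketch under stated assumptions: the names
(fanout, tracker, fanout_start, fanout_ack) and the fixed-size slot
table are hypothetical and not from the series, and locking is left out
entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INFLIGHT 8	/* arbitrary table size for the sketch */

/* One frontend request fanned out to several backend invalidations. */
struct fanout {
	int seqno;		/* frontend seqno, 0 means the slot is free */
	int outstanding;	/* backend acks still expected */
};

struct tracker {
	struct fanout slots[MAX_INFLIGHT];
};

/* Record that @seqno was split into @n backend invalidations. */
static bool fanout_start(struct tracker *t, int seqno, int n)
{
	if (seqno <= 0 || n <= 0)
		return false;

	for (int i = 0; i < MAX_INFLIGHT; i++) {
		struct fanout *f = &t->slots[i];

		if (!f->seqno) {
			f->seqno = seqno;
			f->outstanding = n;
			return true;
		}
	}

	return false;	/* table full, caller would have to back off */
}

/*
 * Account one backend ack for @seqno. Returns true only for the last
 * ack, which is where a frontend done handler would be invoked.
 */
static bool fanout_ack(struct tracker *t, int seqno)
{
	for (int i = 0; i < MAX_INFLIGHT; i++) {
		struct fanout *f = &t->slots[i];

		if (f->seqno != seqno)
			continue;
		if (--f->outstanding)
			return false;
		f->seqno = 0;	/* free the slot */
		return true;
	}

	return false;	/* unknown seqno, e.g. already completed */
}
```

With something like this in the backend, the frontend still sees exactly
one completion per seqno, regardless of how many messages the backend
had to send.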

> 
> So if we're going to stick with this, yeah I agree we really need some
> kind of queuing if we're going to have a lot of these fine grained
> invalidations all in a row or we'll start blocking things like page
> fault replies.
> 
> I'm wondering if the better way to approach this though would be to
> refactor on the GuC side rather than do something really complicated on
> the TLB side. I.e. why can't we do the CT busy wait in a worker thread
> and let the send thread keep going adding more and more? It would mean
> we'd have to do a better job of tracking each unique request out to GuC
> rather than just relying on the current g2h_outstanding count, but it
> would at least let us do some of this work in parallel.
> 
> The queueing mechanism is still going to take work on top of what we
> have in this series to build up these chains of h2g messages with the
> CT lock held only for that last one. And IMO it still will be a little
> messy calling into the lower layer (guc) and back out to the upper
> layer (tlb) and back again to build these queues. And I'm not sure how
> well that will work if we move to a different back end than guc - we
> might not get any benefit there after all this work on the guc side.
> 
> Let me know what you think about a CT refactor like what I said.
> 
> And I still do think we can do a better job reducing the scope of some
> of these invalidations, particularly in a case where we wanted to
> associate something like the guc id with the VM to build a range rather
> than just the addresses within the VM. At least in that case we can
> look a little longer term at something like the CT refactor and still
> keep the backend/frontend isolation intact.
> 
> Thanks,
> Stuart
> 
> > 
> > Matt
> >  
> > > > So basically, the per request lock here also pushes us to
> > > > implement in a more efficient and precise way rather than just
> > > > hammering as many invalidations over a given range as possible.
> > > > 
> > > > And of course there are going to need to be bigger hammer
> > > > invalidations sometimes (like the full VF invalidation we're doing
> > > > in the invalidate_all() routines), but those still fall into the
> > > > same category of precision, just with a larger scope (rather than
> > > > multiple smaller invalidations).
> > > 
> > > Thanks,
> > > Stuart
> > > 
> > > > 
> > > > Matt
> > > > 
> > > > > Thanks,
> > > > > Stuart
> > > > > 
> > > > > > 
> > > > > > Matt 
> > > > > > 
> > > > > > > 
> > > > > > > > I’d call this lock seqno_lock, since it protects exactly that:
> > > > > > > > the order in which a seqno is assigned by the frontend and
> > > > > > > > handed to the backend.
> > > > > > > > 
> > > > > > > > Prime this lock for reclaim as well (do what primelockdep()
> > > > > > > > does in xe_guc_ct.c) to make it clear that memory allocations
> > > > > > > > are not allowed while the lock is held, as TLB invalidations
> > > > > > > > can be called from two reclaim paths:
> > > > > > > > 
> > > > > > > > - MMU notifier callbacks
> > > > > > > > - The dma-fence signaling path of VM binds that require a TLB
> > > > > > > >   invalidation
> > > > > > > 
> > > > > > > > +       spin_lock_irq(&tlb_inval->pending_lock);
> > > > > > > > +       cancel_delayed_work(&tlb_inval->fence_tdr);
> > > > > > > >         /*
> > > > > > > >          * We might have various kworkers waiting for
> > > > > > > > TLB
> > > > > > > > flushes
> > > > > > > > to complete
> > > > > > > >          * which are not tracked with an explicit TLB
> > > > > > > > fence,
> > > > > > > > however at this
> > > > > > > > -        * stage that will never happen since the CT is
> > > > > > > > already
> > > > > > > > disabled, so
> > > > > > > > -        * make sure we signal them here under the
> > > > > > > > assumption
> > > > > > > > that we have
> > > > > > > > +        * stage that will never happen since the
> > > > > > > > backend
> > > > > > > > is
> > > > > > > > already disabled,
> > > > > > > > +        * so make sure we signal them here under the
> > > > > > > > assumption
> > > > > > > > that we have
> > > > > > > >          * completed a full GT reset.
> > > > > > > >          */
> > > > > > > > -       if (gt->tlb_inval.seqno == 1)
> > > > > > > > +       if (tlb_inval->seqno == 1)
> > > > > > > >                 pending_seqno =
> > > > > > > > TLB_INVALIDATION_SEQNO_MAX -
> > > > > > > > 1;
> > > > > > > >         else
> > > > > > > > -               pending_seqno = gt->tlb_inval.seqno -
> > > > > > > > 1;
> > > > > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv,
> > > > > > > > pending_seqno);
> > > > > > > > +               pending_seqno = tlb_inval->seqno - 1;
> > > > > > > > +       WRITE_ONCE(tlb_inval->seqno_recv,
> > > > > > > > pending_seqno);
> > > > > > > >  
> > > > > > > >         list_for_each_entry_safe(fence, next,
> > > > > > > > -                                &gt-
> > > > > > > > > tlb_inval.pending_fences,
> > > > > > > > link)
> > > > > > > > -               inval_fence_signal(gt_to_xe(gt),
> > > > > > > > fence);
> > > > > > > > -       spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > > +                                &tlb_inval-
> > > > > > > > > pending_fences,
> > > > > > > > link)
> > > > > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > > > > > +       spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > > > > +       tlb_inval->ops->unlock(tlb_inval);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > > -static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
> > > > > > > > > +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval *tlb_inval, int seqno)
> > > > > > > > >  {
> > > > > > > > > -       int seqno_recv = READ_ONCE(gt->tlb_inval.seqno_recv);
> > > > > > > > > +       int seqno_recv = READ_ONCE(tlb_inval->seqno_recv);
> > > > > > > > > +
> > > > > > > > > +       lockdep_assert_held(&tlb_inval->pending_lock);
> > > > > > > > >  
> > > > > > > > >         if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX / 2))
> > > > > > > > >                 return false;
> > > > > > > > > @@ -201,44 +192,20 @@ static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
> > > > > > > > >         return seqno_recv >= seqno;
> > > > > > > > >  }
> > > > > > > >  
> > > > > > > > -static int send_tlb_inval(struct xe_guc *guc, const
> > > > > > > > u32
> > > > > > > > *action,
> > > > > > > > int len)
> > > > > > > > -{
> > > > > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > > > > -
> > > > > > > > -       xe_gt_assert(gt, action[1]);    /* Seqno */
> > > > > > > > -       lockdep_assert_held(&guc->ct.lock);
> > > > > > > > -
> > > > > > > > -       /*
> > > > > > > > -        * XXX: The seqno algorithm relies on TLB
> > > > > > > > invalidation
> > > > > > > > being processed
> > > > > > > > -        * in order which they currently are, if that
> > > > > > > > changes
> > > > > > > > the
> > > > > > > > algorithm will
> > > > > > > > -        * need to be updated.
> > > > > > > > -        */
> > > > > > > > -
> > > > > > > > -       xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL,
> > > > > > > > 1);
> > > > > > > > -
> > > > > > > > -       return xe_guc_ct_send(&guc->ct, action, len,
> > > > > > > > -                            
> > > > > > > > G2H_LEN_DW_TLB_INVALIDATE,
> > > > > > > > 1);
> > > > > > > > -}
> > > > > > > > -
> > > > > > > > >  static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
> > > > > > > > >  {
> > > > > > > > >         struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> > > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > > > -
> > > > > > > > > -       lockdep_assert_held(&gt->uc.guc.ct.lock);
> > > > > > > > >  
> > > > > > > > >         fence->seqno = tlb_inval->seqno;
> > > > > > > > > -       trace_xe_tlb_inval_fence_send(xe, fence);
> > > > > > > > > +       trace_xe_tlb_inval_fence_send(tlb_inval->xe, fence);
> > > > > > > > >  
> > > > > > > > >         spin_lock_irq(&tlb_inval->pending_lock);
> > > > > > > > >         fence->inval_time = ktime_get();
> > > > > > > > >         list_add_tail(&fence->link, &tlb_inval->pending_fences);
> > > > > > > > >  
> > > > > > > > >         if (list_is_singular(&tlb_inval->pending_fences))
> > > > > > > > > -               queue_delayed_work(system_wq,
> > > > > > > > > -                                  &tlb_inval->fence_tdr,
> > > > > > > > > -                                  tlb_timeout_jiffies(gt));
> > > > > > > > > +               queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
> > > > > > > > > +                                  tlb_inval->ops->timeout_delay(tlb_inval));
> > > > > > > > >         spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > > > > >  
> > > > > > > > >         tlb_inval->seqno = (tlb_inval->seqno + 1) %
> > > > > > > > > @@ -247,202 +214,63 @@ static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
> > > > > > > > >                 tlb_inval->seqno = 1;
> > > > > > > > >  }
> > > > > > > >  
> > > > > > > > -#define MAKE_INVAL_OP(type)    ((type <<
> > > > > > > > XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > > > > > > > -               XE_GUC_TLB_INVAL_MODE_HEAVY <<
> > > > > > > > XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > > > > > > > -               XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > > > > > > > -
> > > > > > > > -static int send_tlb_inval_ggtt(struct xe_gt *gt, int
> > > > > > > > seqno)
> > > > > > > > -{
> > > > > > > > -       u32 action[] = {
> > > > > > > > -               XE_GUC_ACTION_TLB_INVALIDATION,
> > > > > > > > -               seqno,
> > > > > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > > > > > > > -       };
> > > > > > > > -
> > > > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > > > ARRAY_SIZE(action));
> > > > > > > > -}
> > > > > > > > -
> > > > > > > > -static int send_tlb_inval_all(struct xe_tlb_inval
> > > > > > > > *tlb_inval,
> > > > > > > > -                             struct xe_tlb_inval_fence
> > > > > > > > *fence)
> > > > > > > > -{
> > > > > > > > -       u32 action[] = {
> > > > > > > > -               XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > > > > > > > -               0,  /* seqno, replaced in
> > > > > > > > send_tlb_inval
> > > > > > > > */
> > > > > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > > > > > > -       };
> > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > -
> > > > > > > > -       xe_gt_assert(gt, fence);
> > > > > > > > -
> > > > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > > > ARRAY_SIZE(action));
> > > > > > > > -}
> > > > > > > > > +#define xe_tlb_inval_issue(__tlb_inval, __fence, op, args...)  \
> > > > > > > > > +({                                                             \
> > > > > > > > > +       int __ret;                                              \
> > > > > > > > > +                                                               \
> > > > > > > > > +       xe_assert((__tlb_inval)->xe, (__tlb_inval)->ops);       \
> > > > > > > > > +       xe_assert((__tlb_inval)->xe, (__fence));                \
> > > > > > > > > +                                                               \
> > > > > > > > > +       (__tlb_inval)->ops->lock((__tlb_inval));                \
> > > > > > > > > +       xe_tlb_inval_fence_prep((__fence));                     \
> > > > > > > > > +       __ret = op((__tlb_inval), (__fence)->seqno, ##args);    \
> > > > > > > > > +       if (__ret < 0)                                          \
> > > > > > > > > +               xe_tlb_inval_fence_signal_unlocked((__fence));  \
> > > > > > > > > +       (__tlb_inval)->ops->unlock((__tlb_inval));              \
> > > > > > > > > +                                                               \
> > > > > > > > > +       __ret == -ECANCELED ? 0 : __ret;                        \
> > > > > > > > > +})
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_gt_tlb_invalidation_all - Invalidate all TLBs
> > > > > > > > across
> > > > > > > > PF
> > > > > > > > and all VFs.
> > > > > > > > - * @gt: the &xe_gt structure
> > > > > > > > - * @fence: the &xe_tlb_inval_fence to be signaled on
> > > > > > > > completion
> > > > > > > > + * xe_tlb_inval_all() - Issue a TLB invalidation for
> > > > > > > > all
> > > > > > > > TLBs
> > > > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > > > + * @fence: invalidation fence which will be signal on
> > > > > > > > TLB
> > > > > > > > invalidation
> > > > > > > > + * completion
> > > > > > > >   *
> > > > > > > > - * Send a request to invalidate all TLBs across PF and
> > > > > > > > all
> > > > > > > > VFs.
> > > > > > > > + * Issue a TLB invalidation for all TLBs. Completion
> > > > > > > > of
> > > > > > > > TLB
> > > > > > > > is
> > > > > > > > asynchronous and
> > > > > > > > + * caller can use the invalidation fence to wait for
> > > > > > > > completion.
> > > > > > > >   *
> > > > > > > >   * Return: 0 on success, negative error code on error
> > > > > > > >   */
> > > > > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > > > >                      struct xe_tlb_inval_fence *fence)
> > > > > > > >  {
> > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > -       int err;
> > > > > > > > -
> > > > > > > > -       err = send_tlb_inval_all(tlb_inval, fence);
> > > > > > > > -       if (err)
> > > > > > > > -               xe_gt_err(gt, "TLB invalidation request
> > > > > > > > failed
> > > > > > > > (%pe)", ERR_PTR(err));
> > > > > > > > -
> > > > > > > > -       return err;
> > > > > > > > -}
> > > > > > > > -
> > > > > > > > -/*
> > > > > > > > - * Ensure that roundup_pow_of_two(length) doesn't
> > > > > > > > overflow.
> > > > > > > > - * Note that roundup_pow_of_two() operates on unsigned
> > > > > > > > long,
> > > > > > > > - * not on u64.
> > > > > > > > - */
> > > > > > > > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH
> > > > > > > > (rounddown_pow_of_two(ULONG_MAX))
> > > > > > > > -
> > > > > > > > -static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64
> > > > > > > > start,
> > > > > > > > u64
> > > > > > > > end,
> > > > > > > > -                               u32 asid, int seqno)
> > > > > > > > -{
> > > > > > > > -#define MAX_TLB_INVALIDATION_LEN       7
> > > > > > > > -       u32 action[MAX_TLB_INVALIDATION_LEN];
> > > > > > > > -       u64 length = end - start;
> > > > > > > > -       int len = 0;
> > > > > > > > -
> > > > > > > > -       action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > > > > > > > -       action[len++] = seqno;
> > > > > > > > -       if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > > > > > > -           length > MAX_RANGE_TLB_INVALIDATION_LENGTH)
> > > > > > > > {
> > > > > > > > -               action[len++] =
> > > > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > > > > > > > -       } else {
> > > > > > > > -               u64 orig_start = start;
> > > > > > > > -               u64 align;
> > > > > > > > -
> > > > > > > > -               if (length < SZ_4K)
> > > > > > > > -                       length = SZ_4K;
> > > > > > > > -
> > > > > > > > -               /*
> > > > > > > > -                * We need to invalidate a higher
> > > > > > > > granularity
> > > > > > > > if
> > > > > > > > start address
> > > > > > > > -                * is not aligned to length. When start
> > > > > > > > is
> > > > > > > > not
> > > > > > > > aligned with
> > > > > > > > -                * length we need to find the length
> > > > > > > > large
> > > > > > > > enough
> > > > > > > > to create an
> > > > > > > > -                * address mask covering the required
> > > > > > > > range.
> > > > > > > > -                */
> > > > > > > > -               align = roundup_pow_of_two(length);
> > > > > > > > -               start = ALIGN_DOWN(start, align);
> > > > > > > > -               end = ALIGN(end, align);
> > > > > > > > -               length = align;
> > > > > > > > -               while (start + length < end) {
> > > > > > > > -                       length <<= 1;
> > > > > > > > -                       start = ALIGN_DOWN(orig_start,
> > > > > > > > length);
> > > > > > > > -               }
> > > > > > > > -
> > > > > > > > -               /*
> > > > > > > > -                * Minimum invalidation size for a 2MB
> > > > > > > > page
> > > > > > > > that
> > > > > > > > the hardware
> > > > > > > > -                * expects is 16MB
> > > > > > > > -                */
> > > > > > > > -               if (length >= SZ_2M) {
> > > > > > > > -                       length = max_t(u64, SZ_16M,
> > > > > > > > length);
> > > > > > > > -                       start = ALIGN_DOWN(orig_start,
> > > > > > > > length);
> > > > > > > > -               }
> > > > > > > > -
> > > > > > > > -               xe_gt_assert(gt, length >= SZ_4K);
> > > > > > > > -               xe_gt_assert(gt,
> > > > > > > > is_power_of_2(length));
> > > > > > > > -               xe_gt_assert(gt, !(length &
> > > > > > > > GENMASK(ilog2(SZ_16M)
> > > > > > > > - 1,
> > > > > > > > -                                                  
> > > > > > > > ilog2(SZ_2M)
> > > > > > > > + 1)));
> > > > > > > > -               xe_gt_assert(gt, IS_ALIGNED(start,
> > > > > > > > length));
> > > > > > > > -
> > > > > > > > -               action[len++] =
> > > > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > > > > > > > -               action[len++] = asid;
> > > > > > > > -               action[len++] = lower_32_bits(start);
> > > > > > > > -               action[len++] = upper_32_bits(start);
> > > > > > > > -               action[len++] = ilog2(length) -
> > > > > > > > ilog2(SZ_4K);
> > > > > > > > -       }
> > > > > > > > -
> > > > > > > > -       xe_gt_assert(gt, len <=
> > > > > > > > MAX_TLB_INVALIDATION_LEN);
> > > > > > > > -
> > > > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > > > len);
> > > > > > > > -}
> > > > > > > > -
> > > > > > > > -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> > > > > > > > -                              struct
> > > > > > > > xe_tlb_inval_fence
> > > > > > > > *fence)
> > > > > > > > -{
> > > > > > > > -       int ret;
> > > > > > > > -
> > > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > > -
> > > > > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > > > > -
> > > > > > > > -       ret = send_tlb_inval_ggtt(gt, fence->seqno);
> > > > > > > > -       if (ret < 0)
> > > > > > > > -
> > > > > > > >                inval_fence_signal_unlocked(gt_to_xe(gt),
> > > > > > > > fence);
> > > > > > > > -
> > > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > > -
> > > > > > > > -       /*
> > > > > > > > -        * -ECANCELED indicates the CT is stopped for a
> > > > > > > > GT
> > > > > > > > reset.
> > > > > > > > TLB caches
> > > > > > > > -        *  should be nuked on a GT reset so this error
> > > > > > > > can
> > > > > > > > be
> > > > > > > > ignored.
> > > > > > > > -        */
> > > > > > > > -       if (ret == -ECANCELED)
> > > > > > > > -               return 0;
> > > > > > > > -
> > > > > > > > -       return ret;
> > > > > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > > > > tlb_inval-
> > > > > > > > > ops->all);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_tlb_inval_ggtt - Issue a TLB invalidation on
> > > > > > > > this
> > > > > > > > GT
> > > > > > > > for
> > > > > > > > the GGTT
> > > > > > > > + * xe_tlb_inval_ggtt() - Issue a TLB invalidation for
> > > > > > > > the
> > > > > > > > GGTT
> > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > >   *
> > > > > > > > - * Issue a TLB invalidation for the GGTT. Completion
> > > > > > > > of
> > > > > > > > TLB
> > > > > > > > invalidation is
> > > > > > > > - * synchronous.
> > > > > > > > + * Issue a TLB invalidation for the GGTT. Completion
> > > > > > > > of
> > > > > > > > TLB
> > > > > > > > is
> > > > > > > > asynchronous and
> > > > > > > > + * caller can use the invalidation fence to wait for
> > > > > > > > completion.
> > > > > > > >   *
> > > > > > > >   * Return: 0 on success, negative error code on error
> > > > > > > >   */
> > > > > > > >  int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
> > > > > > > >  {
> > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > > -       unsigned int fw_ref;
> > > > > > > > -
> > > > > > > > -       if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> > > > > > > > -           gt->uc.guc.submission_state.enabled) {
> > > > > > > > -               struct xe_tlb_inval_fence fence;
> > > > > > > > -               int ret;
> > > > > > > > -
> > > > > > > > -               xe_tlb_inval_fence_init(tlb_inval,
> > > > > > > > &fence,
> > > > > > > > true);
> > > > > > > > -               ret = __xe_tlb_inval_ggtt(gt, &fence);
> > > > > > > > -               if (ret)
> > > > > > > > -                       return ret;
> > > > > > > > -
> > > > > > > > -               xe_tlb_inval_fence_wait(&fence);
> > > > > > > > -       } else if (xe_device_uc_enabled(xe) &&
> > > > > > > > !xe_device_wedged(xe)) {
> > > > > > > > -               struct xe_mmio *mmio = &gt->mmio;
> > > > > > > > -
> > > > > > > > -               if (IS_SRIOV_VF(xe))
> > > > > > > > -                       return 0;
> > > > > > > > -
> > > > > > > > -               fw_ref =
> > > > > > > > xe_force_wake_get(gt_to_fw(gt),
> > > > > > > > XE_FW_GT);
> > > > > > > > -               if (xe->info.platform == XE_PVC ||
> > > > > > > > GRAPHICS_VER(xe) >= 20) {
> > > > > > > > -                       xe_mmio_write32(mmio,
> > > > > > > > PVC_GUC_TLB_INV_DESC1,
> > > > > > > > -
> > > > > > > >                                        PVC_GUC_TLB_INV_
> > > > > > > > DE
> > > > > > > > SC1_
> > > > > > > > INVAL
> > > > > > > > IDATE);
> > > > > > > > -                       xe_mmio_write32(mmio,
> > > > > > > > PVC_GUC_TLB_INV_DESC0,
> > > > > > > > -
> > > > > > > >                                        PVC_GUC_TLB_INV_
> > > > > > > > DE
> > > > > > > > SC0_
> > > > > > > > VALID
> > > > > > > > );
> > > > > > > > -               } else {
> > > > > > > > -                       xe_mmio_write32(mmio,
> > > > > > > > GUC_TLB_INV_CR,
> > > > > > > > -
> > > > > > > >                                        GUC_TLB_INV_CR_I
> > > > > > > > NV
> > > > > > > > ALID
> > > > > > > > ATE);
> > > > > > > > -               }
> > > > > > > > -               xe_force_wake_put(gt_to_fw(gt),
> > > > > > > > fw_ref);
> > > > > > > > -       }
> > > > > > > > +       struct xe_tlb_inval_fence fence, *fence_ptr =
> > > > > > > > &fence;
> > > > > > > > +       int ret;
> > > > > > > >  
> > > > > > > > -       return 0;
> > > > > > > > +       xe_tlb_inval_fence_init(tlb_inval, fence_ptr,
> > > > > > > > true);
> > > > > > > > +       ret = xe_tlb_inval_issue(tlb_inval, fence_ptr,
> > > > > > > > tlb_inval-
> > > > > > > > > ops->ggtt);
> > > > > > > > +       xe_tlb_inval_fence_wait(fence_ptr);
> > > > > > > > +
> > > > > > > > +       return ret;
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_tlb_inval_range - Issue a TLB invalidation on
> > > > > > > > this
> > > > > > > > GT
> > > > > > > > for
> > > > > > > > an address range
> > > > > > > > + * xe_tlb_inval_range() - Issue a TLB invalidation for
> > > > > > > > an
> > > > > > > > address range
> > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > >   * @fence: invalidation fence which will be signal on
> > > > > > > > TLB
> > > > > > > > invalidation
> > > > > > > >   * completion
> > > > > > > > @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct
> > > > > > > > xe_tlb_inval
> > > > > > > > *tlb_inval,
> > > > > > > >                        struct xe_tlb_inval_fence
> > > > > > > > *fence,
> > > > > > > > u64
> > > > > > > > start, u64 end,
> > > > > > > >                        u32 asid)
> > > > > > > >  {
> > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > > -       int  ret;
> > > > > > > > -
> > > > > > > > -       xe_gt_assert(gt, fence);
> > > > > > > > -
> > > > > > > > -       /* Execlists not supported */
> > > > > > > > -       if (xe->info.force_execlist) {
> > > > > > > > -               __inval_fence_signal(xe, fence);
> > > > > > > > -               return 0;
> > > > > > > > -       }
> > > > > > > > -
> > > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > > -
> > > > > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > > > > -
> > > > > > > > -       ret = send_tlb_inval_ppgtt(gt, start, end,
> > > > > > > > asid,
> > > > > > > > fence-
> > > > > > > > > seqno);
> > > > > > > > -       if (ret < 0)
> > > > > > > > -               inval_fence_signal_unlocked(xe, fence);
> > > > > > > > -
> > > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > > -
> > > > > > > > -       return ret;
> > > > > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > > > > tlb_inval-
> > > > > > > > > ops->ppgtt,
> > > > > > > > +                                 start, end, asid);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_tlb_inval_vm - Issue a TLB invalidation on this
> > > > > > > > GT
> > > > > > > > for
> > > > > > > > a
> > > > > > > > VM
> > > > > > > > + * xe_tlb_inval_vm() - Issue a TLB invalidation for a
> > > > > > > > VM
> > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > >   * @vm: VM to invalidate
> > > > > > > >   *
> > > > > > > > @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct
> > > > > > > > xe_tlb_inval
> > > > > > > > *tlb_inval, struct xe_vm *vm)
> > > > > > > >  {
> > > > > > > >         struct xe_tlb_inval_fence fence;
> > > > > > > >         u64 range = 1ull << vm->xe->info.va_bits;
> > > > > > > > -       int ret;
> > > > > > > >  
> > > > > > > >         xe_tlb_inval_fence_init(tlb_inval, &fence,
> > > > > > > > true);
> > > > > > > > -
> > > > > > > > -       ret = xe_tlb_inval_range(tlb_inval, &fence, 0,
> > > > > > > > range,
> > > > > > > > vm-
> > > > > > > > > usm.asid);
> > > > > > > > -       if (ret < 0)
> > > > > > > > -               return;
> > > > > > > > -
> > > > > > > > +       xe_tlb_inval_range(tlb_inval, &fence, 0, range,
> > > > > > > > vm-
> > > > > > > > > usm.asid);
> > > > > > > >         xe_tlb_inval_fence_wait(&fence);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_tlb_inval_done_handler - TLB invalidation done
> > > > > > > > handler
> > > > > > > > - * @gt: gt
> > > > > > > > + * xe_tlb_inval_done_handler() - TLB invalidation done
> > > > > > > > handler
> > > > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > > >   * @seqno: seqno of invalidation that is done
> > > > > > > >   *
> > > > > > > >   * Update recv seqno, signal any TLB invalidation
> > > > > > > > fences,
> > > > > > > > and
> > > > > > > > restart TDR
> > > > > > > 
> > > > > > > > I'd mention that this function is safe to be called from any
> > > > > > > > context (i.e., process, atomic, and hardirq contexts are
> > > > > > > > allowed).
> > > > > > > > 
> > > > > > > > We might need to convert tlb_inval.pending_lock to a
> > > > > > > > raw_spinlock_t for PREEMPT_RT enablement. Same for the GuC
> > > > > > > > fast_lock. AFAIK we haven’t had any complaints, so maybe I’m
> > > > > > > > just overthinking it, but also perhaps not.
> > > > > > > 
> > > > > > > >   */
> > > > > > > > -static void xe_tlb_inval_done_handler(struct xe_gt
> > > > > > > > *gt,
> > > > > > > > int
> > > > > > > > seqno)
> > > > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > > > *tlb_inval,
> > > > > > > > int seqno)
> > > > > > > >  {
> > > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > > +       struct xe_device *xe = tlb_inval->xe;
> > > > > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > > > > >         unsigned long flags;
> > > > > > > >  
> > > > > > > > @@ -535,77 +337,53 @@ static void
> > > > > > > > xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> > > > > > > >          * officially process the CT message like if
> > > > > > > > racing
> > > > > > > > against
> > > > > > > >          * process_g2h_msg().
> > > > > > > >          */
> > > > > > > > -       spin_lock_irqsave(&gt->tlb_inval.pending_lock,
> > > > > > > > flags);
> > > > > > > > -       if (tlb_inval_seqno_past(gt, seqno)) {
> > > > > > > > -               spin_unlock_irqrestore(&gt-
> > > > > > > > > tlb_inval.pending_lock, flags);
> > > > > > > > +       spin_lock_irqsave(&tlb_inval->pending_lock,
> > > > > > > > flags);
> > > > > > > > +       if (xe_tlb_inval_seqno_past(tlb_inval, seqno))
> > > > > > > > {
> > > > > > > > +               spin_unlock_irqrestore(&tlb_inval-
> > > > > > > > > pending_lock,
> > > > > > > > flags);
> > > > > > > >                 return;
> > > > > > > >         }
> > > > > > > >  
> > > > > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> > > > > > > > +       WRITE_ONCE(tlb_inval->seqno_recv, seqno);
> > > > > > > >  
> > > > > > > >         list_for_each_entry_safe(fence, next,
> > > > > > > > -                                &gt-
> > > > > > > > > tlb_inval.pending_fences,
> > > > > > > > link) {
> > > > > > > > +                                &tlb_inval-
> > > > > > > > > pending_fences,
> > > > > > > > link) {
> > > > > > > >                 trace_xe_tlb_inval_fence_recv(xe,
> > > > > > > > fence);
> > > > > > > >  
> > > > > > > > -               if (!tlb_inval_seqno_past(gt, fence-
> > > > > > > > > seqno))
> > > > > > > > +               if (!xe_tlb_inval_seqno_past(tlb_inval,
> > > > > > > > fence-
> > > > > > > > > seqno))
> > > > > > > >                         break;
> > > > > > > >  
> > > > > > > > -               inval_fence_signal(xe, fence);
> > > > > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > > > > >         }
> > > > > > > >  
> > > > > > > > -       if (!list_empty(&gt->tlb_inval.pending_fences))
> > > > > > > > +       if (!list_empty(&tlb_inval->pending_fences))
> > > > > > > >                 mod_delayed_work(system_wq,
> > > > > > > > -                                &gt-
> > > > > > > > > tlb_inval.fence_tdr,
> > > > > > > > -                               
> > > > > > > > tlb_timeout_jiffies(gt));
> > > > > > > > +                                &tlb_inval->fence_tdr,
> > > > > > > > +                                tlb_inval->ops-
> > > > > > > > > timeout_delay(tlb_inval));
> > > > > > > >         else
> > > > > > > > -               cancel_delayed_work(&gt-
> > > > > > > > > tlb_inval.fence_tdr);
> > > > > > > > +               cancel_delayed_work(&tlb_inval-
> > > > > > > > > fence_tdr);
> > > > > > > >  
> > > > > > > > -       spin_unlock_irqrestore(&gt-
> > > > > > > > > tlb_inval.pending_lock,
> > > > > > > > flags);
> > > > > > > > -}
> > > > > > > > -
> > > > > > > > -/**
> > > > > > > > - * xe_guc_tlb_inval_done_handler - TLB invalidation
> > > > > > > > done
> > > > > > > > handler
> > > > > > > > - * @guc: guc
> > > > > > > > - * @msg: message indicating TLB invalidation done
> > > > > > > > - * @len: length of message
> > > > > > > > - *
> > > > > > > > - * Parse seqno of TLB invalidation, wake any waiters
> > > > > > > > for
> > > > > > > > seqno,
> > > > > > > > and signal any
> > > > > > > > - * invalidation fences for seqno. Algorithm for this
> > > > > > > > depends
> > > > > > > > on
> > > > > > > > seqno being
> > > > > > > > - * received in-order and asserts this assumption.
> > > > > > > > - *
> > > > > > > > - * Return: 0 on success, -EPROTO for malformed
> > > > > > > > messages.
> > > > > > > > - */
> > > > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc,
> > > > > > > > u32
> > > > > > > > *msg,
> > > > > > > > u32 len)
> > > > > > > > -{
> > > > > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > > > > -
> > > > > > > > -       if (unlikely(len != 1))
> > > > > > > > -               return -EPROTO;
> > > > > > > > -
> > > > > > > > -       xe_tlb_inval_done_handler(gt, msg[0]);
> > > > > > > > -
> > > > > > > > -       return 0;
> > > > > > > > +       spin_unlock_irqrestore(&tlb_inval-
> > > > > > > > >pending_lock,
> > > > > > > > flags);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  static const char *
> > > > > > > > -inval_fence_get_driver_name(struct dma_fence
> > > > > > > > *dma_fence)
> > > > > > > > +xe_inval_fence_get_driver_name(struct dma_fence
> > > > > > > > *dma_fence)
> > > > > > > >  {
> > > > > > > >         return "xe";
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  static const char *
> > > > > > > > -inval_fence_get_timeline_name(struct dma_fence
> > > > > > > > *dma_fence)
> > > > > > > > +xe_inval_fence_get_timeline_name(struct dma_fence
> > > > > > > > *dma_fence)
> > > > > > > >  {
> > > > > > > > -       return "inval_fence";
> > > > > > > > +       return "tlb_inval_fence";
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  static const struct dma_fence_ops inval_fence_ops = {
> > > > > > > > -       .get_driver_name = inval_fence_get_driver_name,
> > > > > > > > -       .get_timeline_name =
> > > > > > > > inval_fence_get_timeline_name,
> > > > > > > > +       .get_driver_name =
> > > > > > > > xe_inval_fence_get_driver_name,
> > > > > > > > +       .get_timeline_name =
> > > > > > > > xe_inval_fence_get_timeline_name,
> > > > > > > >  };
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_tlb_inval_fence_init - Initialize TLB
> > > > > > > > invalidation
> > > > > > > > fence
> > > > > > > > + * xe_tlb_inval_fence_init() - Initialize TLB
> > > > > > > > invalidation
> > > > > > > > fence
> > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > >   * @fence: TLB invalidation fence to initialize
> > > > > > > >   * @stack: fence is stack variable
> > > > > > > > @@ -618,15 +396,12 @@ void
> > > > > > > > xe_tlb_inval_fence_init(struct
> > > > > > > > xe_tlb_inval *tlb_inval,
> > > > > > > >                              struct xe_tlb_inval_fence
> > > > > > > > *fence,
> > > > > > > >                              bool stack)
> > > > > > > >  {
> > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > -
> > > > > > > > -       xe_pm_runtime_get_noresume(gt_to_xe(gt));
> > > > > > > > +       xe_pm_runtime_get_noresume(tlb_inval->xe);
> > > > > > > >  
> > > > > > > > -       spin_lock_irq(&gt->tlb_inval.lock);
> > > > > > > > -       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > > > > > -                      &gt->tlb_inval.lock,
> > > > > > > > +       spin_lock_irq(&tlb_inval->lock);
> > > > > > > > +       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > > > > > &tlb_inval->lock,
> > > > > > > >                        dma_fence_context_alloc(1), 1);
> > > > > > > > -       spin_unlock_irq(&gt->tlb_inval.lock);
> > > > > > > > +       spin_unlock_irq(&tlb_inval->lock);
> > > > > > > 
> > > > > > > While here, 'fence_lock' is probably a better name.
> > > > > > > 
> > > > > > > Matt
> > > > > > > 
> > > > > > > >         INIT_LIST_HEAD(&fence->link);
> > > > > > > >         if (stack)
> > > > > > > >                 set_bit(FENCE_STACK_BIT, &fence-
> > > > > > > > > base.flags);
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > > index 7adee3f8c551..cdeafc8d4391 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > > @@ -18,24 +18,30 @@ struct xe_vma;
> > > > > > > >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
> > > > > > > >  
> > > > > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval
> > > > > > > > *tlb_inval);
> > > > > > > > -int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > > > > -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval,
> > > > > > > > struct
> > > > > > > > xe_vm *vm);
> > > > > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > > > >                      struct xe_tlb_inval_fence *fence);
> > > > > > > > +int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > > > > +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval,
> > > > > > > > struct
> > > > > > > > xe_vm *vm);
> > > > > > > >  int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
> > > > > > > >                        struct xe_tlb_inval_fence
> > > > > > > > *fence,
> > > > > > > >                        u64 start, u64 end, u32 asid);
> > > > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc,
> > > > > > > > u32
> > > > > > > > *msg,
> > > > > > > > u32 len);
> > > > > > > >  
> > > > > > > >  void xe_tlb_inval_fence_init(struct xe_tlb_inval
> > > > > > > > *tlb_inval,
> > > > > > > >                              struct xe_tlb_inval_fence
> > > > > > > > *fence,
> > > > > > > >                              bool stack);
> > > > > > > > -void xe_tlb_inval_fence_signal(struct
> > > > > > > > xe_tlb_inval_fence
> > > > > > > > *fence);
> > > > > > > >  
> > > > > > > > +/**
> > > > > > > > > + * xe_tlb_inval_fence_wait() - TLB invalidation fence
> > > > > > > > wait
> > > > > > > > + * @fence: TLB invalidation fence to wait on
> > > > > > > > + *
> > > > > > > > > + * Wait on a TLB invalidation fence until it signals,
> > > > > > > > > non-interruptible
> > > > > > > > + */
> > > > > > > >  static inline void
> > > > > > > >  xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence
> > > > > > > > *fence)
> > > > > > > >  {
> > > > > > > >         dma_fence_wait(&fence->base, false);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > > > *tlb_inval,
> > > > > > > > int seqno);
> > > > > > > > +
> > > > > > > >  #endif /* _XE_TLB_INVAL_ */
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > > index 05b6adc929bb..c1ad96d24fc8 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > > @@ -9,10 +9,85 @@
> > > > > > > >  #include <linux/workqueue.h>
> > > > > > > >  #include <linux/dma-fence.h>
> > > > > > > >  
> > > > > > > > -/** struct xe_tlb_inval - TLB invalidation client */
> > > > > > > > +struct xe_tlb_inval;
> > > > > > > > +
> > > > > > > > +/** struct xe_tlb_inval_ops - TLB invalidation ops
> > > > > > > > (backend)
> > > > > > > > */
> > > > > > > > +struct xe_tlb_inval_ops {
> > > > > > > > +       /**
> > > > > > > > +        * @all: Invalidate all TLBs
> > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > > +        *
> > > > > > > > +        * Return 0 on success, -ECANCELED if backend
> > > > > > > > is
> > > > > > > > mid-
> > > > > > > > reset, error on
> > > > > > > > +        * failure
> > > > > > > > +        */
> > > > > > > > +       int (*all)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > > > seqno);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +        * @ggtt: Invalidate global translation TLBs
> > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > > +        *
> > > > > > > > +        * Return 0 on success, -ECANCELED if backend
> > > > > > > > is
> > > > > > > > mid-
> > > > > > > > reset, error on
> > > > > > > > +        * failure
> > > > > > > > +        */
> > > > > > > > +       int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > > > seqno);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > > +        * @ppgtt: Invalidate per-process translation
> > > > > > > > TLBs
> > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > > +        * @start: Start address
> > > > > > > > +        * @end: End address
> > > > > > > > +        * @asid: Address space ID
> > > > > > > > +        *
> > > > > > > > +        * Return 0 on success, -ECANCELED if backend
> > > > > > > > is
> > > > > > > > mid-
> > > > > > > > reset, error on
> > > > > > > > +        * failure
> > > > > > > > +        */
> > > > > > > > +       int (*ppgtt)(struct xe_tlb_inval *tlb_inval,
> > > > > > > > u32
> > > > > > > > seqno,
> > > > > > > > u64 start,
> > > > > > > > +                    u64 end, u32 asid);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +        * @initialized: Backend is initialized
> > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > +        *
> > > > > > > > > +        * Return: True if backend is initialized, False
> > > > > > > > otherwise
> > > > > > > > +        */
> > > > > > > > +       bool (*initialized)(struct xe_tlb_inval
> > > > > > > > *tlb_inval);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +        * @flush: Flush pending TLB invalidations
> > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > +        */
> > > > > > > > +       void (*flush)(struct xe_tlb_inval *tlb_inval);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +        * @timeout_delay: Timeout delay for TLB
> > > > > > > > invalidation
> > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > +        *
> > > > > > > > +        * Return: Timeout delay for TLB invalidation
> > > > > > > > in
> > > > > > > > jiffies
> > > > > > > > +        */
> > > > > > > > +       long (*timeout_delay)(struct xe_tlb_inval
> > > > > > > > *tlb_inval);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +        * @lock: Lock resources protecting the backend
> > > > > > > > seqno
> > > > > > > > management
> > > > > > > > +        */
> > > > > > > > +       void (*lock)(struct xe_tlb_inval *tlb_inval);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > > +        * @unlock: Unlock resources protecting the
> > > > > > > > backend
> > > > > > > > seqno
> > > > > > > > management
> > > > > > > > +        */
> > > > > > > > +       void (*unlock)(struct xe_tlb_inval *tlb_inval);
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > +/** struct xe_tlb_inval - TLB invalidation client
> > > > > > > > (frontend)
> > > > > > > > */
> > > > > > > >  struct xe_tlb_inval {
> > > > > > > >         /** @private: Backend private pointer */
> > > > > > > >         void *private;
> > > > > > > > +       /** @xe: Pointer to Xe device */
> > > > > > > > +       struct xe_device *xe;
> > > > > > > > +       /** @ops: TLB invalidation ops */
> > > > > > > > +       const struct xe_tlb_inval_ops *ops;
> > > > > > > >         /** @tlb_inval.seqno: TLB invalidation seqno,
> > > > > > > > protected
> > > > > > > > by CT lock */
> > > > > > > >  #define TLB_INVALIDATION_SEQNO_MAX     0x100000
> > > > > > > >         int seqno;
> > > > > > > > -- 
> > > > > > > > 2.34.1
> > > > > > > > 
> > > > > 
> > > 
> 



* Re: [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 21:22             ` Matthew Brost
  2025-07-23 22:03               ` Summers, Stuart
@ 2025-07-23 23:19               ` Summers, Stuart
  1 sibling, 0 replies; 19+ messages in thread
From: Summers, Stuart @ 2025-07-23 23:19 UTC (permalink / raw)
  To: Brost, Matthew
  Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
	Kassabri, Farah, Auld, Matthew

On Wed, 2025-07-23 at 14:22 -0700, Matthew Brost wrote:
> On Wed, Jul 23, 2025 at 02:55:24PM -0600, Summers, Stuart wrote:
> > On Wed, 2025-07-23 at 13:47 -0700, Matthew Brost wrote:
> > > 
> > 
> > <cut>
> > (just to reduce the noise in the rest of the patch here for now...)
> > 
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_tlb_inval_reset - Initialize TLB invalidation
> > > > > > > reset
> > > > > > > + * xe_tlb_inval_reset() - TLB invalidation reset
> > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > >   *
> > > > > > >   * Signal any pending invalidation fences, should be
> > > > > > > called
> > > > > > > during a GT reset
> > > > > > >   */
> > > > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
> > > > > > >  {
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > > > >         int pending_seqno;
> > > > > > >  
> > > > > > >         /*
> > > > > > > -        * we can get here before the CTs are even
> > > > > > > initialized if
> > > > > > > we're wedging
> > > > > > > -        * very early, in which case there are not going
> > > > > > > to
> > > > > > > be
> > > > > > > any pending
> > > > > > > -        * fences so we can bail immediately.
> > > > > > > +        * we can get here before the backends are even
> > > > > > > initialized if we're
> > > > > > > +        * wedging very early, in which case there are
> > > > > > > not
> > > > > > > going
> > > > > > > to be any
> > > > > > > > +        * pending fences so we can bail immediately.
> > > > > > >          */
> > > > > > > -       if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> > > > > > > +       if (!tlb_inval->ops->initialized(tlb_inval))
> > > > > > >                 return;
> > > > > > >  
> > > > > > >         /*
> > > > > > > -        * CT channel is already disabled at this point.
> > > > > > > No
> > > > > > > new
> > > > > > > TLB requests can
> > > > > > > +        * Backend is already disabled at this point. No
> > > > > > > new
> > > > > > > TLB
> > > > > > > requests can
> > > > > > >          * appear.
> > > > > > >          */
> > > > > > >  
> > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > -       spin_lock_irq(&gt->tlb_inval.pending_lock);
> > > > > > > -       cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > > > > > > +       tlb_inval->ops->lock(tlb_inval);
> > > > > > 
> > > > > > I think you want a dedicated lock embedded in struct
> > > > > > xe_tlb_inval,
> > > > > > rather than reaching into the backend to grab one.
> > > > > > 
> > > > > > This will deadlock as written: G2H TLB inval messages are
> > > > > > sometimes
> > > > > > processed while holding ct->lock (non-fast path, unlikely)
> > > > > > and
> > > > > > sometimes
> > > > > > without it (fast path, likely).
> > > > > 
> > > > > Ugh, I'm off today. Ignore the deadlock part, I was confusing
> > > > > myself...
> > > > > I was thinking this was the function
> > > > > xe_tlb_inval_done_handler,
> > > > > it is
> > > > > > not. I still think xe_tlb_inval should have its own lock but this
> > > > > > patch as written should work with
> > > > > > s/xe_guc_ct_send/xe_guc_ct_send_locked.
> > > > 
> > > > So one reason I didn't go that way is we did just the reverse
> > > > recently
> > > > - moved from a TLB dedicated lock to the more specific CT lock
> > > > since
> > > > these are all going into the CT handler anyway when we use GuC
> > > > submission. Then this embedded version allows us to lock at the
> > > > bottom
> > > > data layer rather than having a separate lock in the upper
> > > > layer.
> > > > Another thing is we might want to have different types of
> > > > invalidation
> > > > running in parallel without locking the data in the upper layer
> > > > since
> > > > the real contention would be in the lower level pipelining
> > > > anyway.
> > > > 
> > > 
> > > I can see the reasoning behind this approach, and maybe it’s
> > > fine.
> > > 
> > > But consider the case where the GuC backend has to look up a VM,
> > > iterate
> > > over a list of exec queues, and send multiple H2Gs to the
> > > hardware,
> > > each
> > > with a corresponding G2H (per-context invalidations). In the
> > > worst
> > > case,
> > > the CT code may have to wait for and process some G2Hs because
> > > our
> > > G2H
> > > credits are exhausted—all while holding the CT lock, which
> > > currently
> > > blocks any hardware submissions (i.e., hardware submissions need
> > > the
> > > CT
> > > lock). Now imagine multiple sources issuing invalidations: they
> > > could
> > > grab the CT lock before a submission waiting on it, further
> > > delaying
> > > that
> > > submission. 
> > > 
> > > > The longer a mutex is held, the more likely the CPU thread
> > > > holding it could be switched out while holding it.
> > > 
> > > This doesn’t seem scalable compared to using a finer-grained CT
> > > lock
> > > (e.g., only taking it in xe_guc_ct_send).
> > > 
> > > I’m not saying this won’t work as you have it—I think it will—but
> > > the
> > > consequences of holding the CT lock for an extended period need
> > > to be
> > > considered.
> > 
> > Couple more thoughts.. so in the case you mentioned, ideally I'd
> > like
> > to have just a single invalidation per request, rather than across
> > a
> > whole VM. That's the reason we have the range based invalidation to
> 
> Yes, this is range based.
> 
> > begin with. If we get to the point where we want to make that even
> > finer, that's great, but we should still just have a single
> > invalidation per request (again, ideally).
> > 
> 
> Maybe you have a different idea, but I was thinking of queue-based
> invalidations: the frontend assigns a single seqno, the backend
> issues N
> invalidations to the hardware—one per GCID mapped in the VM/GT
> tuple—and
> then signals the frontend when all invalidations associated with the
> seqno are complete. With the GuC, a GCID corresponds to each exec
> queue’s
> gucid mapped in the VM/GT tuple. Different backends can handle this
> differently.

Yeah I guess I'm thinking it would be best to address that separately.
Right now we are doing a single invalidation per range (just talking
ppgtt updates here). That invalidation takes the ct lock (via the
function pointer), then goes into the guc to do the send and comes back
out, then releases the lock. Another range done subsequently either in
the same VM or separately will do the same operation.
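
To make that flow concrete, here is a rough userspace sketch of one range
invalidation going through the ops table. Everything in it (struct
tlb_inval, fake_lock, fake_ppgtt, tlb_inval_range, SEQNO_MAX) is a made-up
stand-in for illustration and not the driver code; the bool flag only
models the CT mutex being held, and the seqno wrap just mirrors what the
fence prep in the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQNO_MAX 0x100000	/* mirrors TLB_INVALIDATION_SEQNO_MAX */

struct tlb_inval;

/* Backend hooks; a cut-down, hypothetical stand-in for the ops table. */
struct tlb_inval_ops {
	int (*ppgtt)(struct tlb_inval *ti, uint32_t seqno,
		     uint64_t start, uint64_t end, uint32_t asid);
	void (*lock)(struct tlb_inval *ti);
	void (*unlock)(struct tlb_inval *ti);
};

struct tlb_inval {
	const struct tlb_inval_ops *ops;
	int seqno;		/* next seqno to hand out, never 0 */
	bool backend_locked;	/* models the backend (CT) lock being held */
	int sends;		/* number of requests the backend sent */
};

static void fake_lock(struct tlb_inval *ti)
{
	assert(!ti->backend_locked);	/* a mutex is not recursive */
	ti->backend_locked = true;
}

static void fake_unlock(struct tlb_inval *ti)
{
	assert(ti->backend_locked);
	ti->backend_locked = false;
}

/* Models a "send locked" call: the caller must already hold the lock. */
static int fake_ppgtt(struct tlb_inval *ti, uint32_t seqno,
		      uint64_t start, uint64_t end, uint32_t asid)
{
	(void)seqno;
	(void)asid;
	assert(ti->backend_locked);
	if (end <= start)
		return -EINVAL;
	ti->sends++;
	return 0;
}

static const struct tlb_inval_ops fake_ops = {
	.ppgtt = fake_ppgtt,
	.lock = fake_lock,
	.unlock = fake_unlock,
};

/* One range invalidation: lock, take a seqno, send, unlock. */
static int tlb_inval_range(struct tlb_inval *ti, uint64_t start,
			   uint64_t end, uint32_t asid, int *seqno_out)
{
	int ret;

	ti->ops->lock(ti);
	*seqno_out = ti->seqno;
	ti->seqno = (ti->seqno + 1) % SEQNO_MAX;
	if (!ti->seqno)
		ti->seqno = 1;	/* 0 is skipped on wrap */
	ret = ti->ops->ppgtt(ti, (uint32_t)*seqno_out, start, end, asid);
	ti->ops->unlock(ti);

	/* A backend that is mid-reset is not an error for the caller. */
	return ret == -ECANCELED ? 0 : ret;
}
```

The assert in fake_ppgtt is the point of the sketch: since the frontend
already holds the backend lock around the op, the send inside the op has
to be a variant that expects the lock to be held rather than one that
takes it again.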

Like I had mentioned in that other response, if we do decide to do
something based on a context ID of some kind where that represents a
range within a VM, we can do that too, but it should be on a per
context ID basis rather than just looping through all contexts within
that VM and invalidating everything. If we were to just blindly loop
through, the context IDs might each overlap within that range and so
you'd be potentially invalidating the same range multiple times as part
of the range based invalidation sent down to GuC. The more targeted
approach only invalidates for a specific address range or a specific
context ID that applies to that range. It isn't quite that
simple since most of these will be coming out of PT updates which might
not have a direct context ID reference, and so you have to do some
amount of calculation to determine the appropriate context (or set of
contexts) that fits in that PT range, but as mentioned if you are going
to try avoiding the duplicate invalidations, you really will have to do
that anyway.
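
As a rough illustration of the "calculation" part, the sketch below picks
only the contexts whose mapping intersects the updated range instead of
looping over every context in the VM. This is purely hypothetical: struct
ctx_range, ranges_overlap() and pick_contexts() are invented names, and it
assumes each context can be described by a single half-open address range,
which the driver does not track today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a context that maps the half-open range [start, end). */
struct ctx_range {
	uint32_t ctx_id;
	uint64_t start;
	uint64_t end;
};

/* True if half-open ranges [a_start, a_end) and [b_start, b_end) meet. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_end,
			   uint64_t b_start, uint64_t b_end)
{
	return a_start < b_end && b_start < a_end;
}

/*
 * Collect the IDs of contexts whose mapping intersects the updated
 * range [start, end). Returns the number of IDs written to @out,
 * which is at most @max.
 */
static size_t pick_contexts(const struct ctx_range *ctxs, size_t n,
			    uint64_t start, uint64_t end,
			    uint32_t *out, size_t max)
{
	size_t i, count = 0;

	assert(start < end);

	for (i = 0; i < n && count < max; i++)
		if (ranges_overlap(ctxs[i].start, ctxs[i].end, start, end))
			out[count++] = ctxs[i].ctx_id;

	return count;
}
```

Each ID that comes back would then get exactly one invalidation for the
updated range, so a context that doesn't touch the range is never
invalidated at all.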

I guess for a queue-based approach like you mentioned, I'd like to
tackle that as the need arises rather than implementing something
preemptively.

And of course you're right about the ct_send_locked() change above
which I'll make.

I don't think we have any major performance gaps here as this stands
today, so doing the locking the way I have it (apart from the
ct_send_locked() change mentioned above) keeps things simple, doesn't
add an extra lock at the TLB layer, and lets us lock the resources in
the backend that is actually touching them rather than at a higher
layer, particularly when we don't really need to combine them, at least
today.

But let me know what you think of all that.

Thanks,
Stuart

> 
> > Also, you already have some patches up on the list that do some
> > coalescing of invalidations so we reduce the number of
> > invalidations
> > for multiple ranges. I didn't want to include those patches here
> > because IMO they are really a separate feature here and it'd be
> > nice to
> > review that on its own.
> > 
> 
> I agree it is a separate thing, that should help in some cases, and
> should be reviewed on its own.
> 
> That doesn't help in the case of multiple VMs issuing invalidations
> though (think eviction is occurring or MMU notifiers are firing). The
> lock contention is moved from a dedicated TLB invalidation lock to a
> widely shared CT lock. If multiple TLB invalidations are contending,
> now all other users of the CT lock contend at this higher level. I.e.,
> by only acquiring the CT lock at the last part of an invalidation,
> other waiters (non-invalidation) get QoS.
> 
> Matt
>  
> > So basically, the per request lock here also pushes us to implement
> > in
> > a more efficient and precise way rather than just hammering as many
> > invalidations over a given range as possible.
> > 
> > And of course there are going to need to be bigger hammer
> > invalidations
> > sometimes (like the full VF invalidation we're doing in the
> > invalidate_all() routines), but those still fall into the same
> > category
> > of precision, just with a larger scope (rather than multiple
> > smaller
> > invalidations).
> > 
> > Thanks,
> > Stuart
> > 
> > > 
> > > Matt
> > > 
> > > > Thanks,
> > > > Stuart
> > > > 
> > > > > 
> > > > > Matt 
> > > > > 
> > > > > > 
> > > > > > I’d call this lock seqno_lock, since it protects exactly
> > > > > > that—the
> > > > > > order
> > > > > > in which a seqno is assigned by the frontend and handed to
> > > > > > the
> > > > > > backend.
> > > > > > 
> > > > > > Prime this lock for reclaim as well—do what primelockdep()
> > > > > > does
> > > > > > in
> > > > > > xe_guc_ct.c—to make it clear that memory allocations are
> > > > > > not
> > > > > > allowed
> > > > > > while the lock is held as TLB invalidations can be called
> > > > > > from
> > > > > > two
> > > > > > reclaim paths:
> > > > > > 
> > > > > > - MMU notifier callbacks
> > > > > > - The dma-fence signaling path of VM binds that require a
> > > > > > TLB
> > > > > >   invalidation
> > > > > > 
> > > > > > > +       spin_lock_irq(&tlb_inval->pending_lock);
> > > > > > > +       cancel_delayed_work(&tlb_inval->fence_tdr);
> > > > > > >         /*
> > > > > > >          * We might have various kworkers waiting for TLB
> > > > > > > flushes
> > > > > > > to complete
> > > > > > >          * which are not tracked with an explicit TLB
> > > > > > > fence,
> > > > > > > however at this
> > > > > > > -        * stage that will never happen since the CT is
> > > > > > > already
> > > > > > > disabled, so
> > > > > > > -        * make sure we signal them here under the
> > > > > > > assumption
> > > > > > > that we have
> > > > > > > +        * stage that will never happen since the backend
> > > > > > > is
> > > > > > > already disabled,
> > > > > > > +        * so make sure we signal them here under the
> > > > > > > assumption
> > > > > > > that we have
> > > > > > >          * completed a full GT reset.
> > > > > > >          */
> > > > > > > -       if (gt->tlb_inval.seqno == 1)
> > > > > > > +       if (tlb_inval->seqno == 1)
> > > > > > >                 pending_seqno =
> > > > > > > TLB_INVALIDATION_SEQNO_MAX -
> > > > > > > 1;
> > > > > > >         else
> > > > > > > -               pending_seqno = gt->tlb_inval.seqno - 1;
> > > > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv,
> > > > > > > pending_seqno);
> > > > > > > +               pending_seqno = tlb_inval->seqno - 1;
> > > > > > > +       WRITE_ONCE(tlb_inval->seqno_recv, pending_seqno);
> > > > > > >  
> > > > > > >         list_for_each_entry_safe(fence, next,
> > > > > > > -                                &gt-
> > > > > > > > tlb_inval.pending_fences,
> > > > > > > link)
> > > > > > > -               inval_fence_signal(gt_to_xe(gt), fence);
> > > > > > > -       spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > +                                &tlb_inval-
> > > > > > > >pending_fences,
> > > > > > > link)
> > > > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > > > > +       spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > > > +       tlb_inval->ops->unlock(tlb_inval);
> > > > > > >  }
> > > > > > >  
> > > > > > > -static bool tlb_inval_seqno_past(struct xe_gt *gt, int
> > > > > > > seqno)
> > > > > > > +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval
> > > > > > > *tlb_inval, int seqno)
> > > > > > >  {
> > > > > > > -       int seqno_recv = READ_ONCE(gt-
> > > > > > > >tlb_inval.seqno_recv);
> > > > > > > +       int seqno_recv = READ_ONCE(tlb_inval-
> > > > > > > >seqno_recv);
> > > > > > > +
> > > > > > > +       lockdep_assert_held(&tlb_inval->pending_lock);
> > > > > > >  
> > > > > > >         if (seqno - seqno_recv < -
> > > > > > > (TLB_INVALIDATION_SEQNO_MAX
> > > > > > > /
> > > > > > > 2))
> > > > > > >                 return false;
> > > > > > > @@ -201,44 +192,20 @@ static bool
> > > > > > > tlb_inval_seqno_past(struct
> > > > > > > xe_gt *gt, int seqno)
> > > > > > >         return seqno_recv >= seqno;
> > > > > > >  }
> > > > > > >  
> > > > > > > -static int send_tlb_inval(struct xe_guc *guc, const u32
> > > > > > > *action,
> > > > > > > int len)
> > > > > > > -{
> > > > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > > > -
> > > > > > > -       xe_gt_assert(gt, action[1]);    /* Seqno */
> > > > > > > -       lockdep_assert_held(&guc->ct.lock);
> > > > > > > -
> > > > > > > -       /*
> > > > > > > -        * XXX: The seqno algorithm relies on TLB
> > > > > > > invalidation
> > > > > > > being processed
> > > > > > > -        * in order which they currently are, if that
> > > > > > > changes
> > > > > > > the
> > > > > > > algorithm will
> > > > > > > -        * need to be updated.
> > > > > > > -        */
> > > > > > > -
> > > > > > > -       xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL,
> > > > > > > 1);
> > > > > > > -
> > > > > > > -       return xe_guc_ct_send(&guc->ct, action, len,
> > > > > > > -                             G2H_LEN_DW_TLB_INVALIDATE,
> > > > > > > 1);
> > > > > > > -}
> > > > > > > -
> > > > > > >  static void xe_tlb_inval_fence_prep(struct
> > > > > > > xe_tlb_inval_fence
> > > > > > > *fence)
> > > > > > >  {
> > > > > > >         struct xe_tlb_inval *tlb_inval = fence-
> > > > > > > >tlb_inval;
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > -
> > > > > > > -       lockdep_assert_held(&gt->uc.guc.ct.lock);
> > > > > > >  
> > > > > > >         fence->seqno = tlb_inval->seqno;
> > > > > > > -       trace_xe_tlb_inval_fence_send(xe, fence);
> > > > > > > +       trace_xe_tlb_inval_fence_send(tlb_inval->xe,
> > > > > > > fence);
> > > > > > >  
> > > > > > >         spin_lock_irq(&tlb_inval->pending_lock);
> > > > > > >         fence->inval_time = ktime_get();
> > > > > > >         list_add_tail(&fence->link, &tlb_inval-
> > > > > > > > pending_fences);
> > > > > > >  
> > > > > > >         if (list_is_singular(&tlb_inval->pending_fences))
> > > > > > > -               queue_delayed_work(system_wq,
> > > > > > > -                                  &tlb_inval->fence_tdr,
> > > > > > > -                                 
> > > > > > > tlb_timeout_jiffies(gt));
> > > > > > > +               queue_delayed_work(system_wq, &tlb_inval-
> > > > > > > > fence_tdr,
> > > > > > > +                                  tlb_inval->ops-
> > > > > > > > timeout_delay(tlb_inval));
> > > > > > >         spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > > >  
> > > > > > >         tlb_inval->seqno = (tlb_inval->seqno + 1) %
> > > > > > > @@ -247,202 +214,63 @@ static void
> > > > > > > xe_tlb_inval_fence_prep(struct
> > > > > > > xe_tlb_inval_fence *fence)
> > > > > > >                 tlb_inval->seqno = 1;
> > > > > > >  }
> > > > > > >  
> > > > > > > -#define MAKE_INVAL_OP(type)    ((type <<
> > > > > > > XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > > > > > > -               XE_GUC_TLB_INVAL_MODE_HEAVY <<
> > > > > > > XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > > > > > > -               XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > > > > > > -
> > > > > > > -static int send_tlb_inval_ggtt(struct xe_gt *gt, int
> > > > > > > seqno)
> > > > > > > -{
> > > > > > > -       u32 action[] = {
> > > > > > > -               XE_GUC_ACTION_TLB_INVALIDATION,
> > > > > > > -               seqno,
> > > > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > > > > > > -       };
> > > > > > > -
> > > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > > ARRAY_SIZE(action));
> > > > > > > -}
> > > > > > > -
> > > > > > > -static int send_tlb_inval_all(struct xe_tlb_inval
> > > > > > > *tlb_inval,
> > > > > > > -                             struct xe_tlb_inval_fence
> > > > > > > *fence)
> > > > > > > -{
> > > > > > > -       u32 action[] = {
> > > > > > > -               XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > > > > > > -               0,  /* seqno, replaced in send_tlb_inval
> > > > > > > */
> > > > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > > > > > -       };
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > -
> > > > > > > -       xe_gt_assert(gt, fence);
> > > > > > > -
> > > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > > ARRAY_SIZE(action));
> > > > > > > -}
> > > > > > > > +#define xe_tlb_inval_issue(__tlb_inval, __fence, op, args...) \
> > > > > > > > +({ \
> > > > > > > > +       int __ret; \
> > > > > > > > + \
> > > > > > > > +       xe_assert((__tlb_inval)->xe, (__tlb_inval)->ops); \
> > > > > > > > +       xe_assert((__tlb_inval)->xe, (__fence)); \
> > > > > > > > + \
> > > > > > > > +       (__tlb_inval)->ops->lock((__tlb_inval)); \
> > > > > > > > +       xe_tlb_inval_fence_prep((__fence)); \
> > > > > > > > +       __ret = op((__tlb_inval), (__fence)->seqno, ##args); \
> > > > > > > > +       if (__ret < 0) \
> > > > > > > > +               xe_tlb_inval_fence_signal_unlocked((__fence)); \
> > > > > > > > +       (__tlb_inval)->ops->unlock((__tlb_inval)); \
> > > > > > > > + \
> > > > > > > > +       __ret == -ECANCELED ? 0 : __ret; \
> > > > > > > > +})
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_gt_tlb_invalidation_all - Invalidate all TLBs
> > > > > > > across
> > > > > > > PF
> > > > > > > and all VFs.
> > > > > > > - * @gt: the &xe_gt structure
> > > > > > > - * @fence: the &xe_tlb_inval_fence to be signaled on
> > > > > > > completion
> > > > > > > + * xe_tlb_inval_all() - Issue a TLB invalidation for all
> > > > > > > TLBs
> > > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > > > + * @fence: invalidation fence which will be signaled on
> > > > > > > TLB
> > > > > > > invalidation
> > > > > > > + * completion
> > > > > > >   *
> > > > > > > - * Send a request to invalidate all TLBs across PF and
> > > > > > > all
> > > > > > > VFs.
> > > > > > > + * Issue a TLB invalidation for all TLBs. Completion of
> > > > > > > TLB
> > > > > > > is
> > > > > > > asynchronous and
> > > > > > > + * caller can use the invalidation fence to wait for
> > > > > > > completion.
> > > > > > >   *
> > > > > > >   * Return: 0 on success, negative error code on error
> > > > > > >   */
> > > > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > > >                      struct xe_tlb_inval_fence *fence)
> > > > > > >  {
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > -       int err;
> > > > > > > -
> > > > > > > -       err = send_tlb_inval_all(tlb_inval, fence);
> > > > > > > -       if (err)
> > > > > > > -               xe_gt_err(gt, "TLB invalidation request
> > > > > > > failed
> > > > > > > (%pe)", ERR_PTR(err));
> > > > > > > -
> > > > > > > -       return err;
> > > > > > > -}
> > > > > > > -
> > > > > > > -/*
> > > > > > > - * Ensure that roundup_pow_of_two(length) doesn't
> > > > > > > overflow.
> > > > > > > - * Note that roundup_pow_of_two() operates on unsigned
> > > > > > > long,
> > > > > > > - * not on u64.
> > > > > > > - */
> > > > > > > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH
> > > > > > > (rounddown_pow_of_two(ULONG_MAX))
> > > > > > > -
> > > > > > > -static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64
> > > > > > > start,
> > > > > > > u64
> > > > > > > end,
> > > > > > > -                               u32 asid, int seqno)
> > > > > > > -{
> > > > > > > -#define MAX_TLB_INVALIDATION_LEN       7
> > > > > > > -       u32 action[MAX_TLB_INVALIDATION_LEN];
> > > > > > > -       u64 length = end - start;
> > > > > > > -       int len = 0;
> > > > > > > -
> > > > > > > -       action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > > > > > > -       action[len++] = seqno;
> > > > > > > -       if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > > > > > -           length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > > > > > > -               action[len++] =
> > > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > > > > > > -       } else {
> > > > > > > -               u64 orig_start = start;
> > > > > > > -               u64 align;
> > > > > > > -
> > > > > > > -               if (length < SZ_4K)
> > > > > > > -                       length = SZ_4K;
> > > > > > > -
> > > > > > > -               /*
> > > > > > > -                * We need to invalidate a higher
> > > > > > > granularity
> > > > > > > if
> > > > > > > start address
> > > > > > > -                * is not aligned to length. When start
> > > > > > > is
> > > > > > > not
> > > > > > > aligned with
> > > > > > > -                * length we need to find the length
> > > > > > > large
> > > > > > > enough
> > > > > > > to create an
> > > > > > > -                * address mask covering the required
> > > > > > > range.
> > > > > > > -                */
> > > > > > > -               align = roundup_pow_of_two(length);
> > > > > > > -               start = ALIGN_DOWN(start, align);
> > > > > > > -               end = ALIGN(end, align);
> > > > > > > -               length = align;
> > > > > > > -               while (start + length < end) {
> > > > > > > -                       length <<= 1;
> > > > > > > -                       start = ALIGN_DOWN(orig_start,
> > > > > > > length);
> > > > > > > -               }
> > > > > > > -
> > > > > > > -               /*
> > > > > > > -                * Minimum invalidation size for a 2MB
> > > > > > > page
> > > > > > > that
> > > > > > > the hardware
> > > > > > > -                * expects is 16MB
> > > > > > > -                */
> > > > > > > -               if (length >= SZ_2M) {
> > > > > > > -                       length = max_t(u64, SZ_16M,
> > > > > > > length);
> > > > > > > -                       start = ALIGN_DOWN(orig_start,
> > > > > > > length);
> > > > > > > -               }
> > > > > > > -
> > > > > > > -               xe_gt_assert(gt, length >= SZ_4K);
> > > > > > > -               xe_gt_assert(gt, is_power_of_2(length));
> > > > > > > -               xe_gt_assert(gt, !(length &
> > > > > > > GENMASK(ilog2(SZ_16M)
> > > > > > > - 1,
> > > > > > > -                                                  
> > > > > > > ilog2(SZ_2M)
> > > > > > > + 1)));
> > > > > > > -               xe_gt_assert(gt, IS_ALIGNED(start,
> > > > > > > length));
> > > > > > > -
> > > > > > > -               action[len++] =
> > > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > > > > > > -               action[len++] = asid;
> > > > > > > -               action[len++] = lower_32_bits(start);
> > > > > > > -               action[len++] = upper_32_bits(start);
> > > > > > > -               action[len++] = ilog2(length) -
> > > > > > > ilog2(SZ_4K);
> > > > > > > -       }
> > > > > > > -
> > > > > > > -       xe_gt_assert(gt, len <=
> > > > > > > MAX_TLB_INVALIDATION_LEN);
> > > > > > > -
> > > > > > > -       return send_tlb_inval(&gt->uc.guc, action, len);
> > > > > > > -}
> > > > > > > -
> > > > > > > -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> > > > > > > -                              struct xe_tlb_inval_fence
> > > > > > > *fence)
> > > > > > > -{
> > > > > > > -       int ret;
> > > > > > > -
> > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > -
> > > > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > > > -
> > > > > > > -       ret = send_tlb_inval_ggtt(gt, fence->seqno);
> > > > > > > -       if (ret < 0)
> > > > > > > -               inval_fence_signal_unlocked(gt_to_xe(gt),
> > > > > > > fence);
> > > > > > > -
> > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > -
> > > > > > > -       /*
> > > > > > > -        * -ECANCELED indicates the CT is stopped for a
> > > > > > > GT
> > > > > > > reset.
> > > > > > > TLB caches
> > > > > > > -        *  should be nuked on a GT reset so this error
> > > > > > > can
> > > > > > > be
> > > > > > > ignored.
> > > > > > > -        */
> > > > > > > -       if (ret == -ECANCELED)
> > > > > > > -               return 0;
> > > > > > > -
> > > > > > > -       return ret;
> > > > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > > > tlb_inval-
> > > > > > > > ops->all);
> > > > > > >  }
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_tlb_inval_ggtt - Issue a TLB invalidation on this
> > > > > > > GT
> > > > > > > for
> > > > > > > the GGTT
> > > > > > > + * xe_tlb_inval_ggtt() - Issue a TLB invalidation for
> > > > > > > the
> > > > > > > GGTT
> > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > >   *
> > > > > > > - * Issue a TLB invalidation for the GGTT. Completion of
> > > > > > > TLB
> > > > > > > invalidation is
> > > > > > > - * synchronous.
> > > > > > > + * Issue a TLB invalidation for the GGTT. Completion of
> > > > > > > TLB
> > > > > > > is
> > > > > > > asynchronous and
> > > > > > > + * caller can use the invalidation fence to wait for
> > > > > > > completion.
> > > > > > >   *
> > > > > > >   * Return: 0 on success, negative error code on error
> > > > > > >   */
> > > > > > >  int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
> > > > > > >  {
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > -       unsigned int fw_ref;
> > > > > > > -
> > > > > > > -       if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> > > > > > > -           gt->uc.guc.submission_state.enabled) {
> > > > > > > -               struct xe_tlb_inval_fence fence;
> > > > > > > -               int ret;
> > > > > > > -
> > > > > > > -               xe_tlb_inval_fence_init(tlb_inval,
> > > > > > > &fence,
> > > > > > > true);
> > > > > > > -               ret = __xe_tlb_inval_ggtt(gt, &fence);
> > > > > > > -               if (ret)
> > > > > > > -                       return ret;
> > > > > > > -
> > > > > > > -               xe_tlb_inval_fence_wait(&fence);
> > > > > > > -       } else if (xe_device_uc_enabled(xe) &&
> > > > > > > !xe_device_wedged(xe)) {
> > > > > > > -               struct xe_mmio *mmio = &gt->mmio;
> > > > > > > -
> > > > > > > -               if (IS_SRIOV_VF(xe))
> > > > > > > -                       return 0;
> > > > > > > -
> > > > > > > -               fw_ref = xe_force_wake_get(gt_to_fw(gt),
> > > > > > > XE_FW_GT);
> > > > > > > -               if (xe->info.platform == XE_PVC ||
> > > > > > > GRAPHICS_VER(xe) >= 20) {
> > > > > > > -                       xe_mmio_write32(mmio,
> > > > > > > PVC_GUC_TLB_INV_DESC1,
> > > > > > > -
> > > > > > >                                        PVC_GUC_TLB_INV_DE
> > > > > > > SC1_
> > > > > > > INVAL
> > > > > > > IDATE);
> > > > > > > -                       xe_mmio_write32(mmio,
> > > > > > > PVC_GUC_TLB_INV_DESC0,
> > > > > > > -
> > > > > > >                                        PVC_GUC_TLB_INV_DE
> > > > > > > SC0_
> > > > > > > VALID
> > > > > > > );
> > > > > > > -               } else {
> > > > > > > -                       xe_mmio_write32(mmio,
> > > > > > > GUC_TLB_INV_CR,
> > > > > > > -
> > > > > > >                                        GUC_TLB_INV_CR_INV
> > > > > > > ALID
> > > > > > > ATE);
> > > > > > > -               }
> > > > > > > -               xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > > > > > -       }
> > > > > > > +       struct xe_tlb_inval_fence fence, *fence_ptr =
> > > > > > > &fence;
> > > > > > > +       int ret;
> > > > > > >  
> > > > > > > -       return 0;
> > > > > > > +       xe_tlb_inval_fence_init(tlb_inval, fence_ptr,
> > > > > > > true);
> > > > > > > +       ret = xe_tlb_inval_issue(tlb_inval, fence_ptr,
> > > > > > > tlb_inval-
> > > > > > > > ops->ggtt);
> > > > > > > +       xe_tlb_inval_fence_wait(fence_ptr);
> > > > > > > +
> > > > > > > +       return ret;
> > > > > > >  }
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_tlb_inval_range - Issue a TLB invalidation on this
> > > > > > > GT
> > > > > > > for
> > > > > > > an address range
> > > > > > > + * xe_tlb_inval_range() - Issue a TLB invalidation for
> > > > > > > an
> > > > > > > address range
> > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > >   * @fence: invalidation fence which will be signal on
> > > > > > > TLB
> > > > > > > invalidation
> > > > > > >   * completion
> > > > > > > @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct
> > > > > > > xe_tlb_inval
> > > > > > > *tlb_inval,
> > > > > > >                        struct xe_tlb_inval_fence *fence,
> > > > > > > u64
> > > > > > > start, u64 end,
> > > > > > >                        u32 asid)
> > > > > > >  {
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > -       int  ret;
> > > > > > > -
> > > > > > > -       xe_gt_assert(gt, fence);
> > > > > > > -
> > > > > > > -       /* Execlists not supported */
> > > > > > > -       if (xe->info.force_execlist) {
> > > > > > > -               __inval_fence_signal(xe, fence);
> > > > > > > -               return 0;
> > > > > > > -       }
> > > > > > > -
> > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > -
> > > > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > > > -
> > > > > > > -       ret = send_tlb_inval_ppgtt(gt, start, end, asid,
> > > > > > > fence-
> > > > > > > > seqno);
> > > > > > > -       if (ret < 0)
> > > > > > > -               inval_fence_signal_unlocked(xe, fence);
> > > > > > > -
> > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > -
> > > > > > > -       return ret;
> > > > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > > > tlb_inval-
> > > > > > > > ops->ppgtt,
> > > > > > > +                                 start, end, asid);
> > > > > > >  }
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_tlb_inval_vm - Issue a TLB invalidation on this GT
> > > > > > > for
> > > > > > > a
> > > > > > > VM
> > > > > > > + * xe_tlb_inval_vm() - Issue a TLB invalidation for a VM
> > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > >   * @vm: VM to invalidate
> > > > > > >   *
> > > > > > > @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct
> > > > > > > xe_tlb_inval
> > > > > > > *tlb_inval, struct xe_vm *vm)
> > > > > > >  {
> > > > > > >         struct xe_tlb_inval_fence fence;
> > > > > > >         u64 range = 1ull << vm->xe->info.va_bits;
> > > > > > > -       int ret;
> > > > > > >  
> > > > > > >         xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > > > > > > -
> > > > > > > -       ret = xe_tlb_inval_range(tlb_inval, &fence, 0,
> > > > > > > range,
> > > > > > > vm-
> > > > > > > > usm.asid);
> > > > > > > -       if (ret < 0)
> > > > > > > -               return;
> > > > > > > -
> > > > > > > +       xe_tlb_inval_range(tlb_inval, &fence, 0, range,
> > > > > > > vm-
> > > > > > > > usm.asid);
> > > > > > >         xe_tlb_inval_fence_wait(&fence);
> > > > > > >  }
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_tlb_inval_done_handler - TLB invalidation done
> > > > > > > handler
> > > > > > > - * @gt: gt
> > > > > > > + * xe_tlb_inval_done_handler() - TLB invalidation done
> > > > > > > handler
> > > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > >   * @seqno: seqno of invalidation that is done
> > > > > > >   *
> > > > > > >   * Update recv seqno, signal any TLB invalidation
> > > > > > > fences,
> > > > > > > and
> > > > > > > restart TDR
> > > > > > 
> > > > > > > I'd mention that this function is safe to be called from any
> > > > > > > context (i.e., process, atomic, and hardirq contexts are
> > > > > > > allowed).
> > > > > > 
> > > > > > > We might need to convert tlb_inval.pending_lock to a
> > > > > > > raw_spinlock_t for PREEMPT_RT enablement. Same for the GuC
> > > > > > > fast_lock. AFAIK we haven’t had any complaints, so maybe I’m
> > > > > > > just overthinking it, but also perhaps not.
> > > > > > 
> > > > > > >   */
> > > > > > > -static void xe_tlb_inval_done_handler(struct xe_gt *gt,
> > > > > > > int
> > > > > > > seqno)
> > > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > > *tlb_inval,
> > > > > > > int seqno)
> > > > > > >  {
> > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > +       struct xe_device *xe = tlb_inval->xe;
> > > > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > > > >         unsigned long flags;
> > > > > > >  
> > > > > > > @@ -535,77 +337,53 @@ static void
> > > > > > > xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> > > > > > >          * officially process the CT message like if
> > > > > > > racing
> > > > > > > against
> > > > > > >          * process_g2h_msg().
> > > > > > >          */
> > > > > > > -       spin_lock_irqsave(&gt->tlb_inval.pending_lock,
> > > > > > > flags);
> > > > > > > -       if (tlb_inval_seqno_past(gt, seqno)) {
> > > > > > > -               spin_unlock_irqrestore(&gt-
> > > > > > > > tlb_inval.pending_lock, flags);
> > > > > > > +       spin_lock_irqsave(&tlb_inval->pending_lock,
> > > > > > > flags);
> > > > > > > +       if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) {
> > > > > > > +               spin_unlock_irqrestore(&tlb_inval-
> > > > > > > > pending_lock,
> > > > > > > flags);
> > > > > > >                 return;
> > > > > > >         }
> > > > > > >  
> > > > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> > > > > > > +       WRITE_ONCE(tlb_inval->seqno_recv, seqno);
> > > > > > >  
> > > > > > >         list_for_each_entry_safe(fence, next,
> > > > > > > -                                &gt-
> > > > > > > > tlb_inval.pending_fences,
> > > > > > > link) {
> > > > > > > +                                &tlb_inval-
> > > > > > > >pending_fences,
> > > > > > > link) {
> > > > > > >                 trace_xe_tlb_inval_fence_recv(xe, fence);
> > > > > > >  
> > > > > > > -               if (!tlb_inval_seqno_past(gt, fence-
> > > > > > > >seqno))
> > > > > > > +               if (!xe_tlb_inval_seqno_past(tlb_inval,
> > > > > > > fence-
> > > > > > > > seqno))
> > > > > > >                         break;
> > > > > > >  
> > > > > > > -               inval_fence_signal(xe, fence);
> > > > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > > > >         }
> > > > > > >  
> > > > > > > -       if (!list_empty(&gt->tlb_inval.pending_fences))
> > > > > > > +       if (!list_empty(&tlb_inval->pending_fences))
> > > > > > >                 mod_delayed_work(system_wq,
> > > > > > > -                                &gt-
> > > > > > > >tlb_inval.fence_tdr,
> > > > > > > -                               
> > > > > > > tlb_timeout_jiffies(gt));
> > > > > > > +                                &tlb_inval->fence_tdr,
> > > > > > > +                                tlb_inval->ops-
> > > > > > > > timeout_delay(tlb_inval));
> > > > > > >         else
> > > > > > > -               cancel_delayed_work(&gt-
> > > > > > > > tlb_inval.fence_tdr);
> > > > > > > +               cancel_delayed_work(&tlb_inval-
> > > > > > > >fence_tdr);
> > > > > > >  
> > > > > > > -       spin_unlock_irqrestore(&gt-
> > > > > > > >tlb_inval.pending_lock,
> > > > > > > flags);
> > > > > > > -}
> > > > > > > -
> > > > > > > -/**
> > > > > > > - * xe_guc_tlb_inval_done_handler - TLB invalidation done
> > > > > > > handler
> > > > > > > - * @guc: guc
> > > > > > > - * @msg: message indicating TLB invalidation done
> > > > > > > - * @len: length of message
> > > > > > > - *
> > > > > > > - * Parse seqno of TLB invalidation, wake any waiters for
> > > > > > > seqno,
> > > > > > > and signal any
> > > > > > > - * invalidation fences for seqno. Algorithm for this
> > > > > > > depends
> > > > > > > on
> > > > > > > seqno being
> > > > > > > - * received in-order and asserts this assumption.
> > > > > > > - *
> > > > > > > - * Return: 0 on success, -EPROTO for malformed messages.
> > > > > > > - */
> > > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc,
> > > > > > > u32
> > > > > > > *msg,
> > > > > > > u32 len)
> > > > > > > -{
> > > > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > > > -
> > > > > > > -       if (unlikely(len != 1))
> > > > > > > -               return -EPROTO;
> > > > > > > -
> > > > > > > -       xe_tlb_inval_done_handler(gt, msg[0]);
> > > > > > > -
> > > > > > > -       return 0;
> > > > > > > +       spin_unlock_irqrestore(&tlb_inval->pending_lock,
> > > > > > > flags);
> > > > > > >  }
> > > > > > >  
> > > > > > >  static const char *
> > > > > > > -inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > > > > > > +xe_inval_fence_get_driver_name(struct dma_fence
> > > > > > > *dma_fence)
> > > > > > >  {
> > > > > > >         return "xe";
> > > > > > >  }
> > > > > > >  
> > > > > > >  static const char *
> > > > > > > -inval_fence_get_timeline_name(struct dma_fence
> > > > > > > *dma_fence)
> > > > > > > +xe_inval_fence_get_timeline_name(struct dma_fence
> > > > > > > *dma_fence)
> > > > > > >  {
> > > > > > > -       return "inval_fence";
> > > > > > > +       return "tlb_inval_fence";
> > > > > > >  }
> > > > > > >  
> > > > > > >  static const struct dma_fence_ops inval_fence_ops = {
> > > > > > > -       .get_driver_name = inval_fence_get_driver_name,
> > > > > > > -       .get_timeline_name =
> > > > > > > inval_fence_get_timeline_name,
> > > > > > > +       .get_driver_name =
> > > > > > > xe_inval_fence_get_driver_name,
> > > > > > > +       .get_timeline_name =
> > > > > > > xe_inval_fence_get_timeline_name,
> > > > > > >  };
> > > > > > >  
> > > > > > >  /**
> > > > > > > - * xe_tlb_inval_fence_init - Initialize TLB invalidation
> > > > > > > fence
> > > > > > > + * xe_tlb_inval_fence_init() - Initialize TLB
> > > > > > > invalidation
> > > > > > > fence
> > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > >   * @fence: TLB invalidation fence to initialize
> > > > > > >   * @stack: fence is stack variable
> > > > > > > @@ -618,15 +396,12 @@ void xe_tlb_inval_fence_init(struct
> > > > > > > xe_tlb_inval *tlb_inval,
> > > > > > >                              struct xe_tlb_inval_fence
> > > > > > > *fence,
> > > > > > >                              bool stack)
> > > > > > >  {
> > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > -
> > > > > > > -       xe_pm_runtime_get_noresume(gt_to_xe(gt));
> > > > > > > +       xe_pm_runtime_get_noresume(tlb_inval->xe);
> > > > > > >  
> > > > > > > -       spin_lock_irq(&gt->tlb_inval.lock);
> > > > > > > -       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > > > > -                      &gt->tlb_inval.lock,
> > > > > > > +       spin_lock_irq(&tlb_inval->lock);
> > > > > > > +       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > > > > &tlb_inval->lock,
> > > > > > >                        dma_fence_context_alloc(1), 1);
> > > > > > > -       spin_unlock_irq(&gt->tlb_inval.lock);
> > > > > > > +       spin_unlock_irq(&tlb_inval->lock);
> > > > > > 
> > > > > > While here, 'fence_lock' is probably a better name.
> > > > > > 
> > > > > > Matt
> > > > > > 
> > > > > > >         INIT_LIST_HEAD(&fence->link);
> > > > > > >         if (stack)
> > > > > > >                 set_bit(FENCE_STACK_BIT, &fence-
> > > > > > > >base.flags);
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > index 7adee3f8c551..cdeafc8d4391 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > @@ -18,24 +18,30 @@ struct xe_vma;
> > > > > > >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
> > > > > > >  
> > > > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
> > > > > > > -int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > > > -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval,
> > > > > > > struct
> > > > > > > xe_vm *vm);
> > > > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > > >                      struct xe_tlb_inval_fence *fence);
> > > > > > > +int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > > > +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval,
> > > > > > > struct
> > > > > > > xe_vm *vm);
> > > > > > >  int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
> > > > > > >                        struct xe_tlb_inval_fence *fence,
> > > > > > >                        u64 start, u64 end, u32 asid);
> > > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc,
> > > > > > > u32
> > > > > > > *msg,
> > > > > > > u32 len);
> > > > > > >  
> > > > > > >  void xe_tlb_inval_fence_init(struct xe_tlb_inval
> > > > > > > *tlb_inval,
> > > > > > >                              struct xe_tlb_inval_fence
> > > > > > > *fence,
> > > > > > >                              bool stack);
> > > > > > > -void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence
> > > > > > > *fence);
> > > > > > >  
> > > > > > > +/**
> > > > > > > > + * xe_tlb_inval_fence_wait() - TLB invalidation fence wait
> > > > > > > + * @fence: TLB invalidation fence to wait on
> > > > > > > + *
> > > > > > > > + * Wait on a TLB invalidation fence until it signals, non-interruptible
> > > > > > > + */
> > > > > > >  static inline void
> > > > > > >  xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence
> > > > > > > *fence)
> > > > > > >  {
> > > > > > >         dma_fence_wait(&fence->base, false);
> > > > > > >  }
> > > > > > >  
> > > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > > *tlb_inval,
> > > > > > > int seqno);
> > > > > > > +
> > > > > > >  #endif /* _XE_TLB_INVAL_ */
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > index 05b6adc929bb..c1ad96d24fc8 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > @@ -9,10 +9,85 @@
> > > > > > >  #include <linux/workqueue.h>
> > > > > > >  #include <linux/dma-fence.h>
> > > > > > >  
> > > > > > > -/** struct xe_tlb_inval - TLB invalidation client */
> > > > > > > +struct xe_tlb_inval;
> > > > > > > +
> > > > > > > +/** struct xe_tlb_inval_ops - TLB invalidation ops
> > > > > > > (backend)
> > > > > > > */
> > > > > > > +struct xe_tlb_inval_ops {
> > > > > > > +       /**
> > > > > > > +        * @all: Invalidate all TLBs
> > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > +        *
> > > > > > > +        * Return 0 on success, -ECANCELED if backend is
> > > > > > > mid-
> > > > > > > reset, error on
> > > > > > > +        * failure
> > > > > > > +        */
> > > > > > > +       int (*all)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > > seqno);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @ggtt: Invalidate global translation TLBs
> > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > +        *
> > > > > > > +        * Return 0 on success, -ECANCELED if backend is
> > > > > > > mid-
> > > > > > > reset, error on
> > > > > > > +        * failure
> > > > > > > +        */
> > > > > > > +       int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > > seqno);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @ppttt: Invalidate per-process translation
> > > > > > > TLBs
> > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > +        * @start: Start address
> > > > > > > +        * @end: End address
> > > > > > > +        * @asid: Address space ID
> > > > > > > +        *
> > > > > > > +        * Return 0 on success, -ECANCELED if backend is
> > > > > > > mid-
> > > > > > > reset, error on
> > > > > > > +        * failure
> > > > > > > +        */
> > > > > > > +       int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > > seqno,
> > > > > > > u64 start,
> > > > > > > +                    u64 end, u32 asid);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @initialized: Backend is initialized
> > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > +        *
> > > > > > > +        * Return: True if back is initialized, False
> > > > > > > otherwise
> > > > > > > +        */
> > > > > > > +       bool (*initialized)(struct xe_tlb_inval
> > > > > > > *tlb_inval);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @flush: Flush pending TLB invalidations
> > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > +        */
> > > > > > > +       void (*flush)(struct xe_tlb_inval *tlb_inval);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @timeout_delay: Timeout delay for TLB
> > > > > > > invalidation
> > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > +        *
> > > > > > > +        * Return: Timeout delay for TLB invalidation in
> > > > > > > jiffies
> > > > > > > +        */
> > > > > > > +       long (*timeout_delay)(struct xe_tlb_inval
> > > > > > > *tlb_inval);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @lock: Lock resources protecting the backend
> > > > > > > seqno
> > > > > > > management
> > > > > > > +        */
> > > > > > > +       void (*lock)(struct xe_tlb_inval *tlb_inval);
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * @unlock: Lock resources protecting the backend
> > > > > > > seqno
> > > > > > > management
> > > > > > > +        */
> > > > > > > +       void (*unlock)(struct xe_tlb_inval *tlb_inval);
> > > > > > > +};
> > > > > > > +
> > > > > > > +/** struct xe_tlb_inval - TLB invalidation client
> > > > > > > (frontend)
> > > > > > > */
> > > > > > >  struct xe_tlb_inval {
> > > > > > >         /** @private: Backend private pointer */
> > > > > > >         void *private;
> > > > > > > +       /** @xe: Pointer to Xe device */
> > > > > > > +       struct xe_device *xe;
> > > > > > > +       /** @ops: TLB invalidation ops */
> > > > > > > +       const struct xe_tlb_inval_ops *ops;
> > > > > > >         /** @tlb_inval.seqno: TLB invalidation seqno,
> > > > > > > protected
> > > > > > > by CT lock */
> > > > > > >  #define TLB_INVALIDATION_SEQNO_MAX     0x100000
> > > > > > >         int seqno;
> > > > > > > -- 
> > > > > > > 2.34.1
> > > > > > > 
> > > > 
> > 
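
For anyone following the removed send_tlb_inval_ppgtt() hunk quoted above:
the range-selective path widens [start, end) to a power-of-two sized,
naturally aligned window before it is handed to the GuC. Below is a rough
userspace port of just that arithmetic, as a sketch only. The names
compute_inval_range(), struct inval_range and the small helpers are
hypothetical and introduced here for illustration; the kernel's ALIGN(),
ALIGN_DOWN() and roundup_pow_of_two() are replaced with plain C, and the
fallback to a full invalidation for very large ranges is left out.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K  (1ull << 12)
#define SZ_2M  (1ull << 21)
#define SZ_16M (1ull << 24)

/* Hypothetical result type: the window that would be sent to the backend. */
struct inval_range {
	uint64_t start;
	uint64_t length;
};

/* Plain C stand-ins for the kernel helpers (assumes v >= 1, a is 2^n). */
static uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

static uint64_t align_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Mirrors the arithmetic of the quoted hunk, minus the full-inval fallback. */
static struct inval_range compute_inval_range(uint64_t start, uint64_t end)
{
	uint64_t orig_start = start;
	uint64_t length = end - start;
	uint64_t align;

	if (length < SZ_4K)
		length = SZ_4K;

	/* Grow the window until one aligned power-of-two block covers it. */
	align = roundup_pow2(length);
	start = align_down(start, align);
	end = align_up(end, align);
	length = align;
	while (start + length < end) {
		length <<= 1;
		start = align_down(orig_start, length);
	}

	/* Per the quoted comment: 2MB-and-up requests use at least 16MB. */
	if (length >= SZ_2M) {
		if (length < SZ_16M)
			length = SZ_16M;
		start = align_down(orig_start, length);
	}

	return (struct inval_range){ start, length };
}
```

With these definitions, compute_inval_range(0x1000, 0x3000) widens the
request to start 0x0 with length 0x4000, and a 2M-aligned 2M request grows
to a 16M window, which is the minimum the quoted comment mentions.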


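The xe_tlb_inval_issue() pattern quoted above (take the backend lock, prep
the fence, call the backend op, signal the fence on error, drop the lock,
and treat -ECANCELED as success) can be tried out in userspace with a fake
backend. This is only a sketch under stated assumptions: every type and
function below (struct tlb_inval, struct fence, issue_ggtt(), the fake_*
callbacks) is a hypothetical stand-in, fence prep is reduced to assigning a
seqno, and the lock is modelled as a counter rather than a real mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct tlb_inval;

/* Hypothetical, trimmed-down backend ops table. */
struct tlb_inval_ops {
	int (*ggtt)(struct tlb_inval *ti, unsigned int seqno);
	void (*lock)(struct tlb_inval *ti);
	void (*unlock)(struct tlb_inval *ti);
};

/* Hypothetical frontend object; lock_depth models the backend lock. */
struct tlb_inval {
	const struct tlb_inval_ops *ops;
	unsigned int seqno;
	int lock_depth;
	int backend_ret;	/* value the fake backend returns */
};

struct fence {
	unsigned int seqno;
	bool signaled;
};

static void fake_lock(struct tlb_inval *ti)
{
	ti->lock_depth++;
}

static void fake_unlock(struct tlb_inval *ti)
{
	ti->lock_depth--;
}

static int fake_ggtt(struct tlb_inval *ti, unsigned int seqno)
{
	(void)seqno;
	/* The backend op must only run with the backend lock held. */
	assert(ti->lock_depth == 1);
	return ti->backend_ret;
}

static const struct tlb_inval_ops fake_ops = {
	.ggtt = fake_ggtt,
	.lock = fake_lock,
	.unlock = fake_unlock,
};

/* Frontend issue path, following the order used in the quoted macro. */
static int issue_ggtt(struct tlb_inval *ti, struct fence *f)
{
	int ret;

	ti->ops->lock(ti);
	f->seqno = ti->seqno++;		/* simplified fence prep */
	ret = ti->ops->ggtt(ti, f->seqno);
	if (ret < 0)
		f->signaled = true;	/* nothing will complete it later */
	ti->ops->unlock(ti);

	/* A backend that is mid-reset is not reported as a failure. */
	return ret == -ECANCELED ? 0 : ret;
}
```

The point of routing lock/unlock through the ops table is that the frontend
never names the CT lock directly, so a backend is free to use whatever lock
protects its own seqno management.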
^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 22:03               ` Summers, Stuart
  2025-07-23 22:43                 ` Summers, Stuart
@ 2025-07-23 23:21                 ` Matthew Brost
  2025-07-23 23:46                   ` Summers, Stuart
  1 sibling, 1 reply; 19+ messages in thread
From: Matthew Brost @ 2025-07-23 23:21 UTC (permalink / raw)
  To: Summers, Stuart
  Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
	Kassabri, Farah, Auld, Matthew

On Wed, Jul 23, 2025 at 04:03:12PM -0600, Summers, Stuart wrote:
> On Wed, 2025-07-23 at 14:22 -0700, Matthew Brost wrote:
> > On Wed, Jul 23, 2025 at 02:55:24PM -0600, Summers, Stuart wrote:
> > > On Wed, 2025-07-23 at 13:47 -0700, Matthew Brost wrote:
> > > > 
> > > 
> > > <cut>
> > > (just to reduce the noise in the rest of the patch here for now...)
> > > 
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_tlb_inval_reset - Initialize TLB invalidation
> > > > > > > > reset
> > > > > > > > + * xe_tlb_inval_reset() - TLB invalidation reset
> > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > >   *
> > > > > > > >   * Signal any pending invalidation fences, should be
> > > > > > > > called
> > > > > > > > during a GT reset
> > > > > > > >   */
> > > > > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
> > > > > > > >  {
> > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > > > > >         int pending_seqno;
> > > > > > > >  
> > > > > > > >         /*
> > > > > > > > -        * we can get here before the CTs are even
> > > > > > > > initialized if
> > > > > > > > we're wedging
> > > > > > > > -        * very early, in which case there are not going
> > > > > > > > to
> > > > > > > > be
> > > > > > > > any pending
> > > > > > > > -        * fences so we can bail immediately.
> > > > > > > > +        * we can get here before the backends are even
> > > > > > > > initialized if we're
> > > > > > > > +        * wedging very early, in which case there are
> > > > > > > > not
> > > > > > > > going
> > > > > > > > to be any
> > > > > > > > +        * pending fences so we can bail immediately.
> > > > > > > >          */
> > > > > > > > -       if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> > > > > > > > +       if (!tlb_inval->ops->initialized(tlb_inval))
> > > > > > > >                 return;
> > > > > > > >  
> > > > > > > >         /*
> > > > > > > > -        * CT channel is already disabled at this point.
> > > > > > > > No
> > > > > > > > new
> > > > > > > > TLB requests can
> > > > > > > > +        * Backend is already disabled at this point. No
> > > > > > > > new
> > > > > > > > TLB
> > > > > > > > requests can
> > > > > > > >          * appear.
> > > > > > > >          */
> > > > > > > >  
> > > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > > -       spin_lock_irq(&gt->tlb_inval.pending_lock);
> > > > > > > > -       cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > > > > > > > +       tlb_inval->ops->lock(tlb_inval);
> > > > > > > 
> > > > > > > I think you want a dedicated lock embedded in struct
> > > > > > > xe_tlb_inval, rather than reaching into the backend to grab
> > > > > > > one.
> > > > > > > 
> > > > > > > This will deadlock as written: G2H TLB inval messages are
> > > > > > > sometimes processed while holding ct->lock (non-fast path,
> > > > > > > unlikely) and sometimes without it (fast path, likely).
> > > > > > 
> > > > > > > Ugh, I'm off today. Ignore the deadlock part, I was confusing
> > > > > > > myself... I was thinking this was the function
> > > > > > > xe_tlb_inval_done_handler, it is not. I still think xe_tlb_inval
> > > > > > > should have its own lock, but this patch as written should work
> > > > > > > with s/xe_guc_ct_send/xe_guc_ct_send_locked.
> > > > > 
> > > > > So one reason I didn't go that way is we did just the reverse
> > > > > recently
> > > > > - moved from a TLB dedicated lock to the more specific CT lock
> > > > > since
> > > > > these are all going into the CT handler anyway when we use GuC
> > > > > submission. Then this embedded version allows us to lock at the
> > > > > bottom
> > > > > data layer rather than having a separate lock in the upper
> > > > > layer.
> > > > > Another thing is we might want to have different types of
> > > > > invalidation
> > > > > running in parallel without locking the data in the upper layer
> > > > > since
> > > > > the real contention would be in the lower level pipelining
> > > > > anyway.
> > > > > 
> > > > 
> > > > I can see the reasoning behind this approach, and maybe it’s
> > > > fine.
> > > > 
> > > > But consider the case where the GuC backend has to look up a VM,
> > > > iterate
> > > > over a list of exec queues, and send multiple H2Gs to the
> > > > hardware,
> > > > each
> > > > with a corresponding G2H (per-context invalidations). In the
> > > > worst
> > > > case,
> > > > the CT code may have to wait for and process some G2Hs because
> > > > our
> > > > G2H
> > > > credits are exhausted—all while holding the CT lock, which
> > > > currently
> > > > blocks any hardware submissions (i.e., hardware submissions need
> > > > the
> > > > CT
> > > > lock). Now imagine multiple sources issuing invalidations: they
> > > > could
> > > > grab the CT lock before a submission waiting on it, further
> > > > delaying
> > > > that
> > > > submission. 
> > > > 
> > > > The longer a mutex is held, the more likely the CPU thread
> > > > holding it could be switched out while holding it.
> > > > 
> > > > This doesn’t seem scalable compared to using a finer-grained CT
> > > > lock
> > > > (e.g., only taking it in xe_guc_ct_send).
> > > > 
> > > > I’m not saying this won’t work as you have it—I think it will—but
> > > > the
> > > > consequences of holding the CT lock for an extended period need
> > > > to be
> > > > considered.
> > > 
> > > Couple more thoughts.. so in the case you mentioned, ideally I'd
> > > like
> > > to have just a single invalidation per request, rather than across
> > > a
> > > whole VM. That's the reason we have the range based invalidation to
> > 
> > Yes, this is range based.
> > 
> > > begin with. If we get to the point where we want to make that even
> > > finer, that's great, but we should still just have a single
> > > invalidation per request (again, ideally).
> > > 
> > 
> > Maybe you have a different idea, but I was thinking of queue-based
> > invalidations: the frontend assigns a single seqno, the backend
> > issues N
> > invalidations to the hardware—one per GCID mapped in the VM/GT
> > tuple—and
> > then signals the frontend when all invalidations associated with the
> > seqno are complete. With the GuC, a GCID corresponds to each exec
> > queue’s
> > gucid mapped in the VM/GT tuple. Different backends can handle this
> > differently.
> > 
> > > Also, you already have some patches up on the list that do some
> > > coalescing of invalidations so we reduce the number of
> > > invalidations
> > > for multiple ranges. I didn't want to include those patches here
> > > because IMO they are really a separate feature here and it'd be
> > > nice to
> > > review that on its own.
> > > 
> > 
> > I agree it is a separate thing, that should help in some cases, and
> > should be reviewed on its own.
> > 
> > That doesn't help in the case of multiple VMs issuing invalidations
> > though (think eviction is occurring or MMU notifiers are firing). The
> > lock contention is moved from a dedicated TLB invalidation lock to a
> > widely shared CT lock. If multiple TLB invalidations are contending,
> > now all other users of the CT lock contend at this higher level. I.e.,
> > by only acquiring the CT lock at the last part of an invalidation, other
> > waiters (non-invalidation) get QoS.
> 
> I mean, this was the original reason I had understood for having the
> separate lock in the first place. But it feels a little like we're
> running in circles here moving between the two modes..
> 

We might be getting a little sidetracked, but let me give a quick
example of the contention with the CT lock vs. a dedicated lock.

- VM[0] has N queues attached to it
- VM[1] has M queues attached to it
- Q[0], mapped in a different VM than VM[0] and VM[1]

In a very short period of time, this occurs...

1. VM[0] issues an invalidation
2. VM[1] issues an invalidation
3. Q[0] does a submission

With a CT lock, this is going to be the order of the H2Gs:
VM[0] - Invalidation[0]
...
VM[0] - Invalidation[N-1]
VM[1] - Invalidation[0]
...
VM[1] - Invalidation[M-1]
Q[0] - Submit

With a dedicated lock:
VM[0] - Invalidation[0]
Q[0] - Submit (this could actually be first or a little later depending on exact timing)
...
VM[0] - Invalidation[N-1]
VM[1] - Invalidation[0]
...
VM[1] - Invalidation[M-1]

The more pathological case—many VMs doing things like freeing memory
(e.g., a user-space free with SVM triggers an invalidation)—could
severely hurt QoS for submissions. I'm pretty sure we could craft a test
case to demonstrate this. Is it likely to be common? No. But that
doesn’t mean, as we rewrite this code, we shouldn’t account for the
worst cases and design our locking accordingly.
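
To make the dedicated-lock ordering above concrete, here is a rough
userspace sketch, not driver code. It assumes a frontend seqno_lock held
across the whole invalidation and a CT lock taken only around each
individual send. struct ct_model, struct tlb_inval_model, ct_send() and
tlb_inval_issue() are made-up names for illustration, and pthread mutexes
stand in for the kernel locks.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace model only; these types are hypothetical, not the xe structs. */
struct ct_model {
	pthread_mutex_t lock;	/* stands in for the CT lock */
	int h2g_sent;		/* total H2G messages sent */
};

struct tlb_inval_model {
	pthread_mutex_t seqno_lock;	/* dedicated frontend lock */
	int seqno;			/* next seqno to assign */
	struct ct_model *ct;
};

/* The CT lock is held for exactly one H2G, so other senders can interleave. */
static void ct_send(struct ct_model *ct)
{
	pthread_mutex_lock(&ct->lock);
	ct->h2g_sent++;
	pthread_mutex_unlock(&ct->lock);
}

/*
 * Assign one seqno and issue n H2Gs for it. Only seqno_lock is held across
 * the whole sequence; the CT lock is dropped between individual sends.
 */
static int tlb_inval_issue(struct tlb_inval_model *ti, int n)
{
	int seqno, i;

	assert(n > 0);

	pthread_mutex_lock(&ti->seqno_lock);
	seqno = ti->seqno++;
	for (i = 0; i < n; i++)
		ct_send(ti->ct);
	pthread_mutex_unlock(&ti->seqno_lock);

	return seqno;
}
```

With this shape a submission waits for at most one H2G worth of CT lock
hold time, whereas taking the CT lock in tlb_inval_issue() itself would
make it wait for all n sends.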

Here’s a quick list of common places where the CT lock is required:

- User submissions
- Binds (although this is likely to use only the CPU at some point)
- Memory allocations (clear jobs)
- SVM copies (both GPU and CPU faults)
- BO eviction (copy jobs)
- Prefetches (i.e., KMD triggered migration)
- In place memory decompression
- GPU page fault service ack
- Exec queue destroy
- Preempt fences and resume

Again, if multiple TLB invalidations stack up, QoS for all of the above
operations could suffer.

> I do see what you're saying though, basically the problem is the CT
> send routine right now is doing a busy wait for a reply from guc each
> time it sends something, all within the lock.
> 
>                 if (!wait_event_timeout(ct->wq, !ct->g2h_outstanding ||
>                                         g2h_avail(ct), HZ))
> 
> So if we're going to stick with this, yeah I agree we really need some

Invalidations are a very hot path, so we need to make sure they’re
implemented as optimally as possible—the original implementation was,
well, horrible. That one’s on me.

We've actually already fixed a decent number of issues, but there
is more work to do. Good locking here will help too.

More ideas:

- Now that we have invalidation jobs, we can pipeline invalidations
  from BO moves into copy jobs

- Coalescing should help

- SVM garbage collector likely should batch together unbinds of ranges
  to avoid multiple TLB invalidations

- I think we issue too many GGTT invalidations (both on alloc and free),
  we should be able to get rid of one of those

- Suppress the G2H ack on TLB invalidations we don't care about (e.g., when
  issuing multiple queue based invalidations within a VM, we really only
  want an ack on the last one)

- If we are on native, maybe we don't even talk to the GuC and issue VM
  invalidations directly from the KMD (GGTT invalidations would always be
  an H2G)
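
As a rough illustration of the ack suppression idea in the list above,
something like the following could work. This is only a sketch: struct
queue_inval and build_queue_invals() are hypothetical, the layout is not
a real GuC ABI, and it assumes the GuC processes invalidations in order
so that an ack on the last one implies the earlier ones are done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical H2G descriptor; not a real GuC ABI layout. */
struct queue_inval {
	unsigned int gucid;	/* exec queue the invalidation targets */
	int seqno;		/* single seqno assigned by the frontend */
	bool want_ack;		/* request a G2H ack for this message */
};

/*
 * Fill out[] with one invalidation per gucid, all sharing the same seqno.
 * Only the final entry asks for an ack, which relies on in-order
 * processing (an assumption of this sketch).
 */
static void build_queue_invals(const unsigned int *gucids, size_t n,
			       int seqno, struct queue_inval *out)
{
	size_t i;

	assert(n > 0);

	for (i = 0; i < n; i++) {
		out[i].gucid = gucids[i];
		out[i].seqno = seqno;
		out[i].want_ack = (i == n - 1);
	}
}
```

The frontend would still see a single completion per seqno, and only one
G2H slot is consumed per request instead of n.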

> kind of queuing if we're going to have a lot of these fine grained
> invalidations all in a row or we'll start blocking things like page
> fault replies.
> 
> I'm wondering if the better way to approach this though would be to
> refactor on the GuC side rather than do something really complicated on

Here, I'm not suggesting anything complicated, just a dedicated lock.

Some of the suggestions above would get slightly more complicated but if
everything is layered right, it actually shouldn't be all that bad as
we'd just be modifying individual components in each case.

> the TLB side. I.e. why can't we do the CT busy wait in a worker thread
> and let the send thread keep going adding more and more? It would mean
> we'd have to do a better job of tracking each unique request out to guc
> rather than just relying on the current g2h_outstanding count, but it
> would at least let us do some of this work in parallel.
> 
> The queueing mechanism is still going to take work on top of what we
> have in this series to build up these chains of h2g messages with the
> CT lock held only for that last one. And IMO it still will be a little
> messy calling into the lower layer (guc) and back out to the upper
> layer (tlb) and back again to build these queues. And I'm not sure how
> great that will work if we move to a different back end than guc - we
> might not get any benefit there after all this work on the guc side.
> 
> Let me know what you think about a CT refactor like what I said.
> 

I'm not really following the above, but the TL;DR of how we wait for G2H
space under the CT lock is this: the CT is a closed loop—you can’t send
an H2G unless there’s space to land the G2H—so you hold the CT lock,
wait, and make space for yourself as you’re next in line for service.
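
A toy model of that closed loop, in case it helps. This is a simplified
userspace sketch and not the actual xe_guc_ct code: struct ct_credits,
h2g_try_send() and g2h_processed() are made-up names, and the real CT
tracks more state than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical credit model; field names are illustrative only. */
struct ct_credits {
	unsigned int g2h_space;		/* free space for G2H replies, in dwords */
	unsigned int g2h_outstanding;	/* replies still expected */
};

/* Reserve room for the reply before sending; refuse if it cannot land. */
static bool h2g_try_send(struct ct_credits *ct, unsigned int g2h_len)
{
	if (ct->g2h_space < g2h_len)
		return false;	/* caller must wait for a G2H to be processed */

	ct->g2h_space -= g2h_len;
	ct->g2h_outstanding++;
	return true;
}

/* Processing a G2H returns its space and lets a waiting sender proceed. */
static void g2h_processed(struct ct_credits *ct, unsigned int g2h_len)
{
	assert(ct->g2h_outstanding > 0);

	ct->g2h_space += g2h_len;
	ct->g2h_outstanding--;
}
```

A sender that gets false back is the one that ends up waiting under the
CT lock until g2h_processed() frees enough space for its own reply.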

> And I still do think we can do a better job reducing the scope of some
> of these invalidations, particularly in a case where we wanted to
> associate something like the guc id with the VM to build a range rather
> than just the addresses within the VM. At least in that case we can

I'm not really following this either—are you proposing an H2G where we
pass a list of gucids, or that we tell the GuC which gucids are
associated with a VM? I think either is worth exploring. For the latter,
I don’t see why we even need the whole queue-based invalidation when the
GuC could build a hash table keyed by the ASID and find everything it
needs to issue the invalidation(s). Maybe this doesn't scale with VFs?

Matt

> look a little longer term at something like the CT refactor and still
> keep the backend/frontend isolation intact.
> 
> Thanks,
> Stuart
> 
> > 
> > Matt
> >  
> > > So basically, the per request lock here also pushes us to implement
> > > in
> > > a more efficient and precise way rather than just hammering as many
> > > invalidations over a given range as possible.
> > > 
> > > And of course there are going to need to be bigger hammer
> > > invalidations
> > > sometimes (like the full VF invalidation we're doing in the
> > > invalidate_all() routines), but those still fall into the same
> > > category
> > > of precision, just with a larger scope (rather than multiple
> > > smaller
> > > invalidations).
> > > 
> > > Thanks,
> > > Stuart
> > > 
> > > > 
> > > > Matt
> > > > 
> > > > > Thanks,
> > > > > Stuart
> > > > > 
> > > > > > 
> > > > > > Matt 
> > > > > > 
> > > > > > > 
> > > > > > > I’d call this lock seqno_lock, since it protects exactly
> > > > > > > that—the
> > > > > > > order
> > > > > > > in which a seqno is assigned by the frontend and handed to
> > > > > > > the
> > > > > > > backend.
> > > > > > > 
> > > > > > > Prime this lock for reclaim as well—do what primelockdep()
> > > > > > > does
> > > > > > > in
> > > > > > > xe_guc_ct.c—to make it clear that memory allocations are
> > > > > > > not
> > > > > > > allowed
> > > > > > > while the lock is held as TLB invalidations can be called
> > > > > > > from
> > > > > > > two
> > > > > > > reclaim paths:
> > > > > > > 
> > > > > > > - MMU notifier callbacks
> > > > > > > - The dma-fence signaling path of VM binds that require a
> > > > > > > TLB
> > > > > > >   invalidation
> > > > > > > 
> > > > > > > > +       spin_lock_irq(&tlb_inval->pending_lock);
> > > > > > > > +       cancel_delayed_work(&tlb_inval->fence_tdr);
> > > > > > > >         /*
> > > > > > > >          * We might have various kworkers waiting for TLB
> > > > > > > > flushes
> > > > > > > > to complete
> > > > > > > >          * which are not tracked with an explicit TLB
> > > > > > > > fence,
> > > > > > > > however at this
> > > > > > > > -        * stage that will never happen since the CT is
> > > > > > > > already
> > > > > > > > disabled, so
> > > > > > > > -        * make sure we signal them here under the
> > > > > > > > assumption
> > > > > > > > that we have
> > > > > > > > +        * stage that will never happen since the backend
> > > > > > > > is
> > > > > > > > already disabled,
> > > > > > > > +        * so make sure we signal them here under the
> > > > > > > > assumption
> > > > > > > > that we have
> > > > > > > >          * completed a full GT reset.
> > > > > > > >          */
> > > > > > > > -       if (gt->tlb_inval.seqno == 1)
> > > > > > > > +       if (tlb_inval->seqno == 1)
> > > > > > > >                 pending_seqno =
> > > > > > > > TLB_INVALIDATION_SEQNO_MAX -
> > > > > > > > 1;
> > > > > > > >         else
> > > > > > > > -               pending_seqno = gt->tlb_inval.seqno - 1;
> > > > > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv,
> > > > > > > > pending_seqno);
> > > > > > > > +               pending_seqno = tlb_inval->seqno - 1;
> > > > > > > > +       WRITE_ONCE(tlb_inval->seqno_recv, pending_seqno);
> > > > > > > >  
> > > > > > > >         list_for_each_entry_safe(fence, next,
> > > > > > > > -                                &gt-
> > > > > > > > > tlb_inval.pending_fences,
> > > > > > > > link)
> > > > > > > > -               inval_fence_signal(gt_to_xe(gt), fence);
> > > > > > > > -       spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > > +                                &tlb_inval-
> > > > > > > > >pending_fences,
> > > > > > > > link)
> > > > > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > > > > > +       spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > > > > +       tlb_inval->ops->unlock(tlb_inval);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > -static bool tlb_inval_seqno_past(struct xe_gt *gt, int
> > > > > > > > seqno)
> > > > > > > > +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval
> > > > > > > > *tlb_inval, int seqno)
> > > > > > > >  {
> > > > > > > > -       int seqno_recv = READ_ONCE(gt-
> > > > > > > > >tlb_inval.seqno_recv);
> > > > > > > > +       int seqno_recv = READ_ONCE(tlb_inval-
> > > > > > > > >seqno_recv);
> > > > > > > > +
> > > > > > > > +       lockdep_assert_held(&tlb_inval->pending_lock);
> > > > > > > >  
> > > > > > > >         if (seqno - seqno_recv < -
> > > > > > > > (TLB_INVALIDATION_SEQNO_MAX
> > > > > > > > /
> > > > > > > > 2))
> > > > > > > >                 return false;
> > > > > > > > @@ -201,44 +192,20 @@ static bool
> > > > > > > > tlb_inval_seqno_past(struct
> > > > > > > > xe_gt *gt, int seqno)
> > > > > > > >         return seqno_recv >= seqno;
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > -static int send_tlb_inval(struct xe_guc *guc, const u32
> > > > > > > > *action,
> > > > > > > > int len)
> > > > > > > > -{
> > > > > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > > > > -
> > > > > > > > -       xe_gt_assert(gt, action[1]);    /* Seqno */
> > > > > > > > -       lockdep_assert_held(&guc->ct.lock);
> > > > > > > > -
> > > > > > > > -       /*
> > > > > > > > -        * XXX: The seqno algorithm relies on TLB
> > > > > > > > invalidation
> > > > > > > > being processed
> > > > > > > > -        * in order which they currently are, if that
> > > > > > > > changes
> > > > > > > > the
> > > > > > > > algorithm will
> > > > > > > > -        * need to be updated.
> > > > > > > > -        */
> > > > > > > > -
> > > > > > > > -       xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL,
> > > > > > > > 1);
> > > > > > > > -
> > > > > > > > -       return xe_guc_ct_send(&guc->ct, action, len,
> > > > > > > > -                             G2H_LEN_DW_TLB_INVALIDATE,
> > > > > > > > 1);
> > > > > > > > -}
> > > > > > > > -
> > > > > > > >  static void xe_tlb_inval_fence_prep(struct
> > > > > > > > xe_tlb_inval_fence
> > > > > > > > *fence)
> > > > > > > >  {
> > > > > > > >         struct xe_tlb_inval *tlb_inval = fence-
> > > > > > > > >tlb_inval;
> > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > > -
> > > > > > > > -       lockdep_assert_held(&gt->uc.guc.ct.lock);
> > > > > > > >  
> > > > > > > >         fence->seqno = tlb_inval->seqno;
> > > > > > > > -       trace_xe_tlb_inval_fence_send(xe, fence);
> > > > > > > > +       trace_xe_tlb_inval_fence_send(tlb_inval->xe,
> > > > > > > > fence);
> > > > > > > >  
> > > > > > > >         spin_lock_irq(&tlb_inval->pending_lock);
> > > > > > > >         fence->inval_time = ktime_get();
> > > > > > > >         list_add_tail(&fence->link, &tlb_inval-
> > > > > > > > > pending_fences);
> > > > > > > >  
> > > > > > > >         if (list_is_singular(&tlb_inval->pending_fences))
> > > > > > > > -               queue_delayed_work(system_wq,
> > > > > > > > -                                  &tlb_inval->fence_tdr,
> > > > > > > > -                                 
> > > > > > > > tlb_timeout_jiffies(gt));
> > > > > > > > +               queue_delayed_work(system_wq, &tlb_inval-
> > > > > > > > > fence_tdr,
> > > > > > > > +                                  tlb_inval->ops-
> > > > > > > > > timeout_delay(tlb_inval));
> > > > > > > >         spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > > > >  
> > > > > > > >         tlb_inval->seqno = (tlb_inval->seqno + 1) %
> > > > > > > > @@ -247,202 +214,63 @@ static void
> > > > > > > > xe_tlb_inval_fence_prep(struct
> > > > > > > > xe_tlb_inval_fence *fence)
> > > > > > > >                 tlb_inval->seqno = 1;
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > -#define MAKE_INVAL_OP(type)    ((type <<
> > > > > > > > XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > > > > > > > -               XE_GUC_TLB_INVAL_MODE_HEAVY <<
> > > > > > > > XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > > > > > > > -               XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > > > > > > > -
> > > > > > > > -static int send_tlb_inval_ggtt(struct xe_gt *gt, int
> > > > > > > > seqno)
> > > > > > > > -{
> > > > > > > > -       u32 action[] = {
> > > > > > > > -               XE_GUC_ACTION_TLB_INVALIDATION,
> > > > > > > > -               seqno,
> > > > > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > > > > > > > -       };
> > > > > > > > -
> > > > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > > > ARRAY_SIZE(action));
> > > > > > > > -}
> > > > > > > > -
> > > > > > > > -static int send_tlb_inval_all(struct xe_tlb_inval
> > > > > > > > *tlb_inval,
> > > > > > > > -                             struct xe_tlb_inval_fence
> > > > > > > > *fence)
> > > > > > > > -{
> > > > > > > > -       u32 action[] = {
> > > > > > > > -               XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > > > > > > > -               0,  /* seqno, replaced in send_tlb_inval
> > > > > > > > */
> > > > > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > > > > > > -       };
> > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > -
> > > > > > > > -       xe_gt_assert(gt, fence);
> > > > > > > > -
> > > > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > > > ARRAY_SIZE(action));
> > > > > > > > -}
> > > > > > > > +#define xe_tlb_inval_issue(__tlb_inval, __fence, op,
> > > > > > > > args...)  \
> > > > > > > > +({                                                      
> > > > > > > >     
> > > > > > > >    \
> > > > > > > > +       int
> > > > > > > > __ret;                                              \
> > > > > > > > +                                                        
> > > > > > > >     
> > > > > > > >    \
> > > > > > > > +       xe_assert((__tlb_inval)->xe, (__tlb_inval)-
> > > > > > > > > ops);       \
> > > > > > > > +       xe_assert((__tlb_inval)->xe,
> > > > > > > > (__fence));                \
> > > > > > > > +                                                        
> > > > > > > >     
> > > > > > > >    \
> > > > > > > > +       (__tlb_inval)->ops-
> > > > > > > > > lock((__tlb_inval));                \
> > > > > > > > +       xe_tlb_inval_fence_prep((__fence));              
> > > > > > > >     
> > > > > > > >    \
> > > > > > > > +       __ret = op((__tlb_inval), (__fence)->seqno,
> > > > > > > > ##args);    \
> > > > > > > > +       if (__ret <
> > > > > > > > 0)                                          \
> > > > > > > > +               xe_tlb_inval_fence_signal_unlocked((__fen
> > > > > > > > ce))
> > > > > > > > ;  \
> > > > > > > > +       (__tlb_inval)->ops-
> > > > > > > > > unlock((__tlb_inval));              \
> > > > > > > > +                                                        
> > > > > > > >     
> > > > > > > >    \
> > > > > > > > +       __ret == -ECANCELED ? 0 :
> > > > > > > > __ret;                        \
> > > > > > > > +})
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_gt_tlb_invalidation_all - Invalidate all TLBs
> > > > > > > > across
> > > > > > > > PF
> > > > > > > > and all VFs.
> > > > > > > > - * @gt: the &xe_gt structure
> > > > > > > > - * @fence: the &xe_tlb_inval_fence to be signaled on
> > > > > > > > completion
> > > > > > > > + * xe_tlb_inval_all() - Issue a TLB invalidation for all
> > > > > > > > TLBs
> > > > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > > > + * @fence: invalidation fence which will be signal on
> > > > > > > > TLB
> > > > > > > > invalidation
> > > > > > > > + * completion
> > > > > > > >   *
> > > > > > > > - * Send a request to invalidate all TLBs across PF and
> > > > > > > > all
> > > > > > > > VFs.
> > > > > > > > + * Issue a TLB invalidation for all TLBs. Completion of
> > > > > > > > TLB
> > > > > > > > is
> > > > > > > > asynchronous and
> > > > > > > > + * caller can use the invalidation fence to wait for
> > > > > > > > completion.
> > > > > > > >   *
> > > > > > > >   * Return: 0 on success, negative error code on error
> > > > > > > >   */
> > > > > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > > > >                      struct xe_tlb_inval_fence *fence)
> > > > > > > >  {
> > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > -       int err;
> > > > > > > > -
> > > > > > > > -       err = send_tlb_inval_all(tlb_inval, fence);
> > > > > > > > -       if (err)
> > > > > > > > -               xe_gt_err(gt, "TLB invalidation request
> > > > > > > > failed
> > > > > > > > (%pe)", ERR_PTR(err));
> > > > > > > > -
> > > > > > > > -       return err;
> > > > > > > > -}
> > > > > > > > -
> > > > > > > > -/*
> > > > > > > > - * Ensure that roundup_pow_of_two(length) doesn't
> > > > > > > > overflow.
> > > > > > > > - * Note that roundup_pow_of_two() operates on unsigned
> > > > > > > > long,
> > > > > > > > - * not on u64.
> > > > > > > > - */
> > > > > > > > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH
> > > > > > > > (rounddown_pow_of_two(ULONG_MAX))
> > > > > > > > -
> > > > > > > > -static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64
> > > > > > > > start,
> > > > > > > > u64
> > > > > > > > end,
> > > > > > > > -                               u32 asid, int seqno)
> > > > > > > > -{
> > > > > > > > -#define MAX_TLB_INVALIDATION_LEN       7
> > > > > > > > -       u32 action[MAX_TLB_INVALIDATION_LEN];
> > > > > > > > -       u64 length = end - start;
> > > > > > > > -       int len = 0;
> > > > > > > > -
> > > > > > > > -       action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > > > > > > > -       action[len++] = seqno;
> > > > > > > > -       if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > > > > > > -           length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > > > > > > > -               action[len++] =
> > > > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > > > > > > > -       } else {
> > > > > > > > -               u64 orig_start = start;
> > > > > > > > -               u64 align;
> > > > > > > > -
> > > > > > > > -               if (length < SZ_4K)
> > > > > > > > -                       length = SZ_4K;
> > > > > > > > -
> > > > > > > > -               /*
> > > > > > > > -                * We need to invalidate a higher
> > > > > > > > granularity
> > > > > > > > if
> > > > > > > > start address
> > > > > > > > -                * is not aligned to length. When start
> > > > > > > > is
> > > > > > > > not
> > > > > > > > aligned with
> > > > > > > > -                * length we need to find the length
> > > > > > > > large
> > > > > > > > enough
> > > > > > > > to create an
> > > > > > > > -                * address mask covering the required
> > > > > > > > range.
> > > > > > > > -                */
> > > > > > > > -               align = roundup_pow_of_two(length);
> > > > > > > > -               start = ALIGN_DOWN(start, align);
> > > > > > > > -               end = ALIGN(end, align);
> > > > > > > > -               length = align;
> > > > > > > > -               while (start + length < end) {
> > > > > > > > -                       length <<= 1;
> > > > > > > > -                       start = ALIGN_DOWN(orig_start,
> > > > > > > > length);
> > > > > > > > -               }
> > > > > > > > -
> > > > > > > > -               /*
> > > > > > > > -                * Minimum invalidation size for a 2MB
> > > > > > > > page
> > > > > > > > that
> > > > > > > > the hardware
> > > > > > > > -                * expects is 16MB
> > > > > > > > -                */
> > > > > > > > -               if (length >= SZ_2M) {
> > > > > > > > -                       length = max_t(u64, SZ_16M,
> > > > > > > > length);
> > > > > > > > -                       start = ALIGN_DOWN(orig_start,
> > > > > > > > length);
> > > > > > > > -               }
> > > > > > > > -
> > > > > > > > -               xe_gt_assert(gt, length >= SZ_4K);
> > > > > > > > -               xe_gt_assert(gt, is_power_of_2(length));
> > > > > > > > -               xe_gt_assert(gt, !(length &
> > > > > > > > GENMASK(ilog2(SZ_16M)
> > > > > > > > - 1,
> > > > > > > > -                                                  
> > > > > > > > ilog2(SZ_2M)
> > > > > > > > + 1)));
> > > > > > > > -               xe_gt_assert(gt, IS_ALIGNED(start,
> > > > > > > > length));
> > > > > > > > -
> > > > > > > > -               action[len++] =
> > > > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > > > > > > > -               action[len++] = asid;
> > > > > > > > -               action[len++] = lower_32_bits(start);
> > > > > > > > -               action[len++] = upper_32_bits(start);
> > > > > > > > -               action[len++] = ilog2(length) -
> > > > > > > > ilog2(SZ_4K);
> > > > > > > > -       }
> > > > > > > > -
> > > > > > > > -       xe_gt_assert(gt, len <=
> > > > > > > > MAX_TLB_INVALIDATION_LEN);
> > > > > > > > -
> > > > > > > > -       return send_tlb_inval(&gt->uc.guc, action, len);
> > > > > > > > -}
> > > > > > > > -
> > > > > > > > -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> > > > > > > > -                              struct xe_tlb_inval_fence
> > > > > > > > *fence)
> > > > > > > > -{
> > > > > > > > -       int ret;
> > > > > > > > -
> > > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > > -
> > > > > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > > > > -
> > > > > > > > -       ret = send_tlb_inval_ggtt(gt, fence->seqno);
> > > > > > > > -       if (ret < 0)
> > > > > > > > -               inval_fence_signal_unlocked(gt_to_xe(gt),
> > > > > > > > fence);
> > > > > > > > -
> > > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > > -
> > > > > > > > -       /*
> > > > > > > > -        * -ECANCELED indicates the CT is stopped for a
> > > > > > > > GT
> > > > > > > > reset.
> > > > > > > > TLB caches
> > > > > > > > -        *  should be nuked on a GT reset so this error
> > > > > > > > can
> > > > > > > > be
> > > > > > > > ignored.
> > > > > > > > -        */
> > > > > > > > -       if (ret == -ECANCELED)
> > > > > > > > -               return 0;
> > > > > > > > -
> > > > > > > > -       return ret;
> > > > > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > > > > tlb_inval-
> > > > > > > > > ops->all);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_tlb_inval_ggtt - Issue a TLB invalidation on this
> > > > > > > > GT
> > > > > > > > for
> > > > > > > > the GGTT
> > > > > > > > + * xe_tlb_inval_ggtt() - Issue a TLB invalidation for
> > > > > > > > the
> > > > > > > > GGTT
> > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > >   *
> > > > > > > > - * Issue a TLB invalidation for the GGTT. Completion of
> > > > > > > > TLB
> > > > > > > > invalidation is
> > > > > > > > - * synchronous.
> > > > > > > > + * Issue a TLB invalidation for the GGTT. Completion of
> > > > > > > > TLB
> > > > > > > > is
> > > > > > > > asynchronous and
> > > > > > > > + * caller can use the invalidation fence to wait for
> > > > > > > > completion.
> > > > > > > >   *
> > > > > > > >   * Return: 0 on success, negative error code on error
> > > > > > > >   */
> > > > > > > >  int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
> > > > > > > >  {
> > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > > -       unsigned int fw_ref;
> > > > > > > > -
> > > > > > > > -       if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> > > > > > > > -           gt->uc.guc.submission_state.enabled) {
> > > > > > > > -               struct xe_tlb_inval_fence fence;
> > > > > > > > -               int ret;
> > > > > > > > -
> > > > > > > > -               xe_tlb_inval_fence_init(tlb_inval,
> > > > > > > > &fence,
> > > > > > > > true);
> > > > > > > > -               ret = __xe_tlb_inval_ggtt(gt, &fence);
> > > > > > > > -               if (ret)
> > > > > > > > -                       return ret;
> > > > > > > > -
> > > > > > > > -               xe_tlb_inval_fence_wait(&fence);
> > > > > > > > -       } else if (xe_device_uc_enabled(xe) &&
> > > > > > > > !xe_device_wedged(xe)) {
> > > > > > > > -               struct xe_mmio *mmio = &gt->mmio;
> > > > > > > > -
> > > > > > > > -               if (IS_SRIOV_VF(xe))
> > > > > > > > -                       return 0;
> > > > > > > > -
> > > > > > > > -               fw_ref = xe_force_wake_get(gt_to_fw(gt),
> > > > > > > > XE_FW_GT);
> > > > > > > > -               if (xe->info.platform == XE_PVC ||
> > > > > > > > GRAPHICS_VER(xe) >= 20) {
> > > > > > > > -                       xe_mmio_write32(mmio,
> > > > > > > > PVC_GUC_TLB_INV_DESC1,
> > > > > > > > -
> > > > > > > >                                        PVC_GUC_TLB_INV_DE
> > > > > > > > SC1_
> > > > > > > > INVAL
> > > > > > > > IDATE);
> > > > > > > > -                       xe_mmio_write32(mmio,
> > > > > > > > PVC_GUC_TLB_INV_DESC0,
> > > > > > > > -
> > > > > > > >                                        PVC_GUC_TLB_INV_DE
> > > > > > > > SC0_
> > > > > > > > VALID
> > > > > > > > );
> > > > > > > > -               } else {
> > > > > > > > -                       xe_mmio_write32(mmio,
> > > > > > > > GUC_TLB_INV_CR,
> > > > > > > > -
> > > > > > > >                                        GUC_TLB_INV_CR_INV
> > > > > > > > ALID
> > > > > > > > ATE);
> > > > > > > > -               }
> > > > > > > > -               xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > > > > > > -       }
> > > > > > > > +       struct xe_tlb_inval_fence fence, *fence_ptr =
> > > > > > > > &fence;
> > > > > > > > +       int ret;
> > > > > > > >  
> > > > > > > > -       return 0;
> > > > > > > > +       xe_tlb_inval_fence_init(tlb_inval, fence_ptr,
> > > > > > > > true);
> > > > > > > > +       ret = xe_tlb_inval_issue(tlb_inval, fence_ptr,
> > > > > > > > tlb_inval-
> > > > > > > > > ops->ggtt);
> > > > > > > > +       xe_tlb_inval_fence_wait(fence_ptr);
> > > > > > > > +
> > > > > > > > +       return ret;
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_tlb_inval_range - Issue a TLB invalidation on this
> > > > > > > > GT
> > > > > > > > for
> > > > > > > > an address range
> > > > > > > > + * xe_tlb_inval_range() - Issue a TLB invalidation for
> > > > > > > > an
> > > > > > > > address range
> > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > >   * @fence: invalidation fence which will be signal on
> > > > > > > > TLB
> > > > > > > > invalidation
> > > > > > > >   * completion
> > > > > > > > @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct
> > > > > > > > xe_tlb_inval
> > > > > > > > *tlb_inval,
> > > > > > > >                        struct xe_tlb_inval_fence *fence,
> > > > > > > > u64
> > > > > > > > start, u64 end,
> > > > > > > >                        u32 asid)
> > > > > > > >  {
> > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > > -       int  ret;
> > > > > > > > -
> > > > > > > > -       xe_gt_assert(gt, fence);
> > > > > > > > -
> > > > > > > > -       /* Execlists not supported */
> > > > > > > > -       if (xe->info.force_execlist) {
> > > > > > > > -               __inval_fence_signal(xe, fence);
> > > > > > > > -               return 0;
> > > > > > > > -       }
> > > > > > > > -
> > > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > > -
> > > > > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > > > > -
> > > > > > > > -       ret = send_tlb_inval_ppgtt(gt, start, end, asid,
> > > > > > > > fence-
> > > > > > > > > seqno);
> > > > > > > > -       if (ret < 0)
> > > > > > > > -               inval_fence_signal_unlocked(xe, fence);
> > > > > > > > -
> > > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > > -
> > > > > > > > -       return ret;
> > > > > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > > > > tlb_inval-
> > > > > > > > > ops->ppgtt,
> > > > > > > > +                                 start, end, asid);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_tlb_inval_vm - Issue a TLB invalidation on this GT
> > > > > > > > for
> > > > > > > > a
> > > > > > > > VM
> > > > > > > > + * xe_tlb_inval_vm() - Issue a TLB invalidation for a VM
> > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > >   * @vm: VM to invalidate
> > > > > > > >   *
> > > > > > > > @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct
> > > > > > > > xe_tlb_inval
> > > > > > > > *tlb_inval, struct xe_vm *vm)
> > > > > > > >  {
> > > > > > > >         struct xe_tlb_inval_fence fence;
> > > > > > > >         u64 range = 1ull << vm->xe->info.va_bits;
> > > > > > > > -       int ret;
> > > > > > > >  
> > > > > > > >         xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > > > > > > > -
> > > > > > > > -       ret = xe_tlb_inval_range(tlb_inval, &fence, 0,
> > > > > > > > range,
> > > > > > > > vm-
> > > > > > > > > usm.asid);
> > > > > > > > -       if (ret < 0)
> > > > > > > > -               return;
> > > > > > > > -
> > > > > > > > +       xe_tlb_inval_range(tlb_inval, &fence, 0, range,
> > > > > > > > vm-
> > > > > > > > > usm.asid);
> > > > > > > >         xe_tlb_inval_fence_wait(&fence);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_tlb_inval_done_handler - TLB invalidation done
> > > > > > > > handler
> > > > > > > > - * @gt: gt
> > > > > > > > + * xe_tlb_inval_done_handler() - TLB invalidation done
> > > > > > > > handler
> > > > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > > >   * @seqno: seqno of invalidation that is done
> > > > > > > >   *
> > > > > > > >   * Update recv seqno, signal any TLB invalidation
> > > > > > > > fences,
> > > > > > > > and
> > > > > > > > restart TDR
> > > > > > > 
> > > > > > > I'd mention that this function is safe to be called from any
> > > > > > > context
> > > > > > > (i.e.,
> > > > > > > process, atomic, and hardirq contexts are allowed).
> > > > > > > 
> > > > > > > We might need to convert tlb_inval.pending_lock to a
> > > > > > > raw_spinlock_t
> > > > > > > for
> > > > > > > PREEMPT_RT enablement. Same for the GuC fast_lock. AFAIK we
> > > > > > > haven’t
> > > > > > > had
> > > > > > > any complaints, so maybe I’m just overthinking it, but also
> > > > > > > perhaps
> > > > > > > not.
> > > > > > > 
> > > > > > > >   */
> > > > > > > > -static void xe_tlb_inval_done_handler(struct xe_gt *gt,
> > > > > > > > int
> > > > > > > > seqno)
> > > > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > > > *tlb_inval,
> > > > > > > > int seqno)
> > > > > > > >  {
> > > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > > +       struct xe_device *xe = tlb_inval->xe;
> > > > > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > > > > >         unsigned long flags;
> > > > > > > >  
> > > > > > > > @@ -535,77 +337,53 @@ static void
> > > > > > > > xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> > > > > > > >          * officially process the CT message like if
> > > > > > > > racing
> > > > > > > > against
> > > > > > > >          * process_g2h_msg().
> > > > > > > >          */
> > > > > > > > -       spin_lock_irqsave(&gt->tlb_inval.pending_lock,
> > > > > > > > flags);
> > > > > > > > -       if (tlb_inval_seqno_past(gt, seqno)) {
> > > > > > > > -               spin_unlock_irqrestore(&gt-
> > > > > > > > > tlb_inval.pending_lock, flags);
> > > > > > > > +       spin_lock_irqsave(&tlb_inval->pending_lock,
> > > > > > > > flags);
> > > > > > > > +       if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) {
> > > > > > > > +               spin_unlock_irqrestore(&tlb_inval-
> > > > > > > > > pending_lock,
> > > > > > > > flags);
> > > > > > > >                 return;
> > > > > > > >         }
> > > > > > > >  
> > > > > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> > > > > > > > +       WRITE_ONCE(tlb_inval->seqno_recv, seqno);
> > > > > > > >  
> > > > > > > >         list_for_each_entry_safe(fence, next,
> > > > > > > > -                                &gt-
> > > > > > > > > tlb_inval.pending_fences,
> > > > > > > > link) {
> > > > > > > > +                                &tlb_inval-
> > > > > > > > >pending_fences,
> > > > > > > > link) {
> > > > > > > >                 trace_xe_tlb_inval_fence_recv(xe, fence);
> > > > > > > >  
> > > > > > > > -               if (!tlb_inval_seqno_past(gt, fence-
> > > > > > > > >seqno))
> > > > > > > > +               if (!xe_tlb_inval_seqno_past(tlb_inval,
> > > > > > > > fence-
> > > > > > > > > seqno))
> > > > > > > >                         break;
> > > > > > > >  
> > > > > > > > -               inval_fence_signal(xe, fence);
> > > > > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > > > > >         }
> > > > > > > >  
> > > > > > > > -       if (!list_empty(&gt->tlb_inval.pending_fences))
> > > > > > > > +       if (!list_empty(&tlb_inval->pending_fences))
> > > > > > > >                 mod_delayed_work(system_wq,
> > > > > > > > -                                &gt-
> > > > > > > > >tlb_inval.fence_tdr,
> > > > > > > > -                               
> > > > > > > > tlb_timeout_jiffies(gt));
> > > > > > > > +                                &tlb_inval->fence_tdr,
> > > > > > > > +                                tlb_inval->ops-
> > > > > > > > > timeout_delay(tlb_inval));
> > > > > > > >         else
> > > > > > > > -               cancel_delayed_work(&gt-
> > > > > > > > > tlb_inval.fence_tdr);
> > > > > > > > +               cancel_delayed_work(&tlb_inval-
> > > > > > > > >fence_tdr);
> > > > > > > >  
> > > > > > > > -       spin_unlock_irqrestore(&gt-
> > > > > > > > >tlb_inval.pending_lock,
> > > > > > > > flags);
> > > > > > > > -}
> > > > > > > > -
> > > > > > > > -/**
> > > > > > > > - * xe_guc_tlb_inval_done_handler - TLB invalidation done
> > > > > > > > handler
> > > > > > > > - * @guc: guc
> > > > > > > > - * @msg: message indicating TLB invalidation done
> > > > > > > > - * @len: length of message
> > > > > > > > - *
> > > > > > > > - * Parse seqno of TLB invalidation, wake any waiters for
> > > > > > > > seqno,
> > > > > > > > and signal any
> > > > > > > > - * invalidation fences for seqno. Algorithm for this
> > > > > > > > depends
> > > > > > > > on
> > > > > > > > seqno being
> > > > > > > > - * received in-order and asserts this assumption.
> > > > > > > > - *
> > > > > > > > - * Return: 0 on success, -EPROTO for malformed messages.
> > > > > > > > - */
> > > > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc,
> > > > > > > > u32
> > > > > > > > *msg,
> > > > > > > > u32 len)
> > > > > > > > -{
> > > > > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > > > > -
> > > > > > > > -       if (unlikely(len != 1))
> > > > > > > > -               return -EPROTO;
> > > > > > > > -
> > > > > > > > -       xe_tlb_inval_done_handler(gt, msg[0]);
> > > > > > > > -
> > > > > > > > -       return 0;
> > > > > > > > +       spin_unlock_irqrestore(&tlb_inval->pending_lock,
> > > > > > > > flags);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  static const char *
> > > > > > > > -inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > > > > > > > +xe_inval_fence_get_driver_name(struct dma_fence
> > > > > > > > *dma_fence)
> > > > > > > >  {
> > > > > > > >         return "xe";
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  static const char *
> > > > > > > > -inval_fence_get_timeline_name(struct dma_fence
> > > > > > > > *dma_fence)
> > > > > > > > +xe_inval_fence_get_timeline_name(struct dma_fence
> > > > > > > > *dma_fence)
> > > > > > > >  {
> > > > > > > > -       return "inval_fence";
> > > > > > > > +       return "tlb_inval_fence";
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  static const struct dma_fence_ops inval_fence_ops = {
> > > > > > > > -       .get_driver_name = inval_fence_get_driver_name,
> > > > > > > > -       .get_timeline_name =
> > > > > > > > inval_fence_get_timeline_name,
> > > > > > > > +       .get_driver_name =
> > > > > > > > xe_inval_fence_get_driver_name,
> > > > > > > > +       .get_timeline_name =
> > > > > > > > xe_inval_fence_get_timeline_name,
> > > > > > > >  };
> > > > > > > >  
> > > > > > > >  /**
> > > > > > > > - * xe_tlb_inval_fence_init - Initialize TLB invalidation
> > > > > > > > fence
> > > > > > > > + * xe_tlb_inval_fence_init() - Initialize TLB
> > > > > > > > invalidation
> > > > > > > > fence
> > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > >   * @fence: TLB invalidation fence to initialize
> > > > > > > >   * @stack: fence is stack variable
> > > > > > > > @@ -618,15 +396,12 @@ void xe_tlb_inval_fence_init(struct
> > > > > > > > xe_tlb_inval *tlb_inval,
> > > > > > > >                              struct xe_tlb_inval_fence
> > > > > > > > *fence,
> > > > > > > >                              bool stack)
> > > > > > > >  {
> > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > -
> > > > > > > > -       xe_pm_runtime_get_noresume(gt_to_xe(gt));
> > > > > > > > +       xe_pm_runtime_get_noresume(tlb_inval->xe);
> > > > > > > >  
> > > > > > > > -       spin_lock_irq(&gt->tlb_inval.lock);
> > > > > > > > -       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > > > > > -                      &gt->tlb_inval.lock,
> > > > > > > > +       spin_lock_irq(&tlb_inval->lock);
> > > > > > > > +       dma_fence_init(&fence->base, &inval_fence_ops,
> > > > > > > > &tlb_inval->lock,
> > > > > > > >                        dma_fence_context_alloc(1), 1);
> > > > > > > > -       spin_unlock_irq(&gt->tlb_inval.lock);
> > > > > > > > +       spin_unlock_irq(&tlb_inval->lock);
> > > > > > > 
> > > > > > > While here, 'fence_lock' is probably a better name.
> > > > > > > 
> > > > > > > Matt
> > > > > > > 
> > > > > > > >         INIT_LIST_HEAD(&fence->link);
> > > > > > > >         if (stack)
> > > > > > > >                 set_bit(FENCE_STACK_BIT, &fence-
> > > > > > > > >base.flags);
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > > index 7adee3f8c551..cdeafc8d4391 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > > @@ -18,24 +18,30 @@ struct xe_vma;
> > > > > > > >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
> > > > > > > >  
> > > > > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
> > > > > > > > -int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > > > > -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval,
> > > > > > > > struct
> > > > > > > > xe_vm *vm);
> > > > > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > > > >                      struct xe_tlb_inval_fence *fence);
> > > > > > > > +int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > > > > +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval,
> > > > > > > > struct
> > > > > > > > xe_vm *vm);
> > > > > > > >  int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
> > > > > > > >                        struct xe_tlb_inval_fence *fence,
> > > > > > > >                        u64 start, u64 end, u32 asid);
> > > > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc,
> > > > > > > > u32
> > > > > > > > *msg,
> > > > > > > > u32 len);
> > > > > > > >  
> > > > > > > >  void xe_tlb_inval_fence_init(struct xe_tlb_inval
> > > > > > > > *tlb_inval,
> > > > > > > >                              struct xe_tlb_inval_fence
> > > > > > > > *fence,
> > > > > > > >                              bool stack);
> > > > > > > > -void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence
> > > > > > > > *fence);
> > > > > > > >  
> > > > > > > > +/**
> > > > > > > > + * xe_tlb_inval_fence_wait() - TLB invalidiation fence
> > > > > > > > wait
> > > > > > > > + * @fence: TLB invalidation fence to wait on
> > > > > > > > + *
> > > > > > > > + * Wait on a TLB invalidiation fence until it signals,
> > > > > > > > non
> > > > > > > > interruptable
> > > > > > > > + */
> > > > > > > >  static inline void
> > > > > > > >  xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence
> > > > > > > > *fence)
> > > > > > > >  {
> > > > > > > >         dma_fence_wait(&fence->base, false);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > > > *tlb_inval,
> > > > > > > > int seqno);
> > > > > > > > +
> > > > > > > >  #endif /* _XE_TLB_INVAL_ */
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > > index 05b6adc929bb..c1ad96d24fc8 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > > @@ -9,10 +9,85 @@
> > > > > > > >  #include <linux/workqueue.h>
> > > > > > > >  #include <linux/dma-fence.h>
> > > > > > > >  
> > > > > > > > -/** struct xe_tlb_inval - TLB invalidation client */
> > > > > > > > +struct xe_tlb_inval;
> > > > > > > > +
> > > > > > > > +/** struct xe_tlb_inval_ops - TLB invalidation ops
> > > > > > > > (backend)
> > > > > > > > */
> > > > > > > > +struct xe_tlb_inval_ops {
> > > > > > > > +       /**
> > > > > > > > +        * @all: Invalidate all TLBs
> > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > > +        *
> > > > > > > > +        * Return 0 on success, -ECANCELED if backend is
> > > > > > > > mid-
> > > > > > > > reset, error on
> > > > > > > > +        * failure
> > > > > > > > +        */
> > > > > > > > +       int (*all)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > > > seqno);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +        * @ggtt: Invalidate global translation TLBs
> > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > > +        *
> > > > > > > > +        * Return 0 on success, -ECANCELED if backend is
> > > > > > > > mid-
> > > > > > > > reset, error on
> > > > > > > > +        * failure
> > > > > > > > +        */
> > > > > > > > +       int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > > > seqno);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +        * @ppttt: Invalidate per-process translation
> > > > > > > > TLBs
> > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > > +        * @start: Start address
> > > > > > > > +        * @end: End address
> > > > > > > > +        * @asid: Address space ID
> > > > > > > > +        *
> > > > > > > > +        * Return 0 on success, -ECANCELED if backend is
> > > > > > > > mid-
> > > > > > > > reset, error on
> > > > > > > > +        * failure
> > > > > > > > +        */
> > > > > > > > +       int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > > > seqno,
> > > > > > > > u64 start,
> > > > > > > > +                    u64 end, u32 asid);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +        * @initialized: Backend is initialized
> > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > +        *
> > > > > > > > +        * Return: True if back is initialized, False
> > > > > > > > otherwise
> > > > > > > > +        */
> > > > > > > > +       bool (*initialized)(struct xe_tlb_inval
> > > > > > > > *tlb_inval);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +        * @flush: Flush pending TLB invalidations
> > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > +        */
> > > > > > > > +       void (*flush)(struct xe_tlb_inval *tlb_inval);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +        * @timeout_delay: Timeout delay for TLB
> > > > > > > > invalidation
> > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > +        *
> > > > > > > > +        * Return: Timeout delay for TLB invalidation in
> > > > > > > > jiffies
> > > > > > > > +        */
> > > > > > > > +       long (*timeout_delay)(struct xe_tlb_inval
> > > > > > > > *tlb_inval);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +        * @lock: Lock resources protecting the backend
> > > > > > > > seqno
> > > > > > > > management
> > > > > > > > +        */
> > > > > > > > +       void (*lock)(struct xe_tlb_inval *tlb_inval);
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +        * @unlock: Lock resources protecting the backend
> > > > > > > > seqno
> > > > > > > > management
> > > > > > > > +        */
> > > > > > > > +       void (*unlock)(struct xe_tlb_inval *tlb_inval);
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > +/** struct xe_tlb_inval - TLB invalidation client
> > > > > > > > (frontend)
> > > > > > > > */
> > > > > > > >  struct xe_tlb_inval {
> > > > > > > >         /** @private: Backend private pointer */
> > > > > > > >         void *private;
> > > > > > > > +       /** @xe: Pointer to Xe device */
> > > > > > > > +       struct xe_device *xe;
> > > > > > > > +       /** @ops: TLB invalidation ops */
> > > > > > > > +       const struct xe_tlb_inval_ops *ops;
> > > > > > > >         /** @tlb_inval.seqno: TLB invalidation seqno,
> > > > > > > > protected
> > > > > > > > by CT lock */
> > > > > > > >  #define TLB_INVALIDATION_SEQNO_MAX     0x100000
> > > > > > > >         int seqno;
> > > > > > > > -- 
> > > > > > > > 2.34.1
> > > > > > > > 
> > > > > 
> > > 
> 

* Re: [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
  2025-07-23 23:21                 ` Matthew Brost
@ 2025-07-23 23:46                   ` Summers, Stuart
  0 siblings, 0 replies; 19+ messages in thread
From: Summers, Stuart @ 2025-07-23 23:46 UTC (permalink / raw)
  To: Brost, Matthew
  Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
	Kassabri, Farah, Auld, Matthew

On Wed, 2025-07-23 at 16:21 -0700, Matthew Brost wrote:
> On Wed, Jul 23, 2025 at 04:03:12PM -0600, Summers, Stuart wrote:
> > On Wed, 2025-07-23 at 14:22 -0700, Matthew Brost wrote:
> > > On Wed, Jul 23, 2025 at 02:55:24PM -0600, Summers, Stuart wrote:
> > > > On Wed, 2025-07-23 at 13:47 -0700, Matthew Brost wrote:
> > > > > 
> > > > 
> > > > <cut>
> > > > (just to reduce the noise in the rest of the patch here for
> > > > now...)
> > > > 
> > > > > > > > >  
> > > > > > > > >  /**
> > > > > > > > > - * xe_tlb_inval_reset - Initialize TLB invalidation
> > > > > > > > > reset
> > > > > > > > > + * xe_tlb_inval_reset() - TLB invalidation reset
> > > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > > >   *
> > > > > > > > >   * Signal any pending invalidation fences, should be
> > > > > > > > > called
> > > > > > > > > during a GT reset
> > > > > > > > >   */
> > > > > > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval
> > > > > > > > > *tlb_inval)
> > > > > > > > >  {
> > > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > > > > > >         int pending_seqno;
> > > > > > > > >  
> > > > > > > > >         /*
> > > > > > > > > -        * we can get here before the CTs are even
> > > > > > > > > initialized if
> > > > > > > > > we're wedging
> > > > > > > > > -        * very early, in which case there are not
> > > > > > > > > going
> > > > > > > > > to
> > > > > > > > > be
> > > > > > > > > any pending
> > > > > > > > > -        * fences so we can bail immediately.
> > > > > > > > > +        * we can get here before the backends are
> > > > > > > > > even
> > > > > > > > > initialized if we're
> > > > > > > > > +        * wedging very early, in which case there
> > > > > > > > > are
> > > > > > > > > not
> > > > > > > > > going
> > > > > > > > > to be any
> > > > > > > > > +        * pendind fences so we can bail immediately.
> > > > > > > > >          */
> > > > > > > > > -       if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> > > > > > > > > +       if (!tlb_inval->ops->initialized(tlb_inval))
> > > > > > > > >                 return;
> > > > > > > > >  
> > > > > > > > >         /*
> > > > > > > > > -        * CT channel is already disabled at this
> > > > > > > > > point.
> > > > > > > > > No
> > > > > > > > > new
> > > > > > > > > TLB requests can
> > > > > > > > > +        * Backend is already disabled at this point.
> > > > > > > > > No
> > > > > > > > > new
> > > > > > > > > TLB
> > > > > > > > > requests can
> > > > > > > > >          * appear.
> > > > > > > > >          */
> > > > > > > > >  
> > > > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > > > -       spin_lock_irq(&gt->tlb_inval.pending_lock);
> > > > > > > > > -       cancel_delayed_work(&gt-
> > > > > > > > > >tlb_inval.fence_tdr);
> > > > > > > > > +       tlb_inval->ops->lock(tlb_inval);
> > > > > > > > 
> > > > > > > > I think you want a dedicated lock embedded in struct
> > > > > > > > xe_tlb_inval,
> > > > > > > > rather than reaching into the backend to grab one.
> > > > > > > > 
> > > > > > > > This will deadlock as written: G2H TLB inval messages
> > > > > > > > are
> > > > > > > > sometimes
> > > > > > > > processed while holding ct->lock (non-fast path,
> > > > > > > > unlikely)
> > > > > > > > and
> > > > > > > > sometimes
> > > > > > > > without it (fast path, likely).
> > > > > > > 
> > > > > > > Ugh, I'm off today. Ignore the deadlock part, I was
> > > > > > > confusing
> > > > > > > myself...
> > > > > > > I was thinking this was the function
> > > > > > > xe_tlb_inval_done_handler,
> > > > > > > it is
> > > > > > > not. I still think xe_tlb_inval should have its own lock, but
> > > > > > > this
> > > > > > > patch
> > > > > > > as written should work with
> > > > > > > s/xe_guc_ct_send/xe_guc_ct_send_locked.
> > > > > > 
> > > > > > So one reason I didn't go that way is we did just the
> > > > > > reverse
> > > > > > recently
> > > > > > - moved from a TLB dedicated lock to the more specific CT
> > > > > > lock
> > > > > > since
> > > > > > these are all going into the CT handler anyway when we use
> > > > > > GuC
> > > > > > submission. Then this embedded version allows us to lock at
> > > > > > the
> > > > > > bottom
> > > > > > data layer rather than having a separate lock in the upper
> > > > > > layer.
> > > > > > Another thing is we might want to have different types of
> > > > > > invalidation
> > > > > > running in parallel without locking the data in the upper
> > > > > > layer
> > > > > > since
> > > > > > the real contention would be in the lower level pipelining
> > > > > > anyway.
> > > > > > 
> > > > > 
> > > > > I can see the reasoning behind this approach, and maybe it’s
> > > > > fine.
> > > > > 
> > > > > But consider the case where the GuC backend has to look up a
> > > > > VM,
> > > > > iterate
> > > > > over a list of exec queues, and send multiple H2Gs to the
> > > > > hardware,
> > > > > each
> > > > > with a corresponding G2H (per-context invalidations). In the
> > > > > worst
> > > > > case,
> > > > > the CT code may have to wait for and process some G2Hs
> > > > > because
> > > > > our
> > > > > G2H
> > > > > credits are exhausted—all while holding the CT lock, which
> > > > > currently
> > > > > blocks any hardware submissions (i.e., hardware submissions
> > > > > need
> > > > > the
> > > > > CT
> > > > > lock). Now imagine multiple sources issuing invalidations:
> > > > > they
> > > > > could
> > > > > grab the CT lock before a submission waiting on it, further
> > > > > delaying
> > > > > that
> > > > > submission. 
> > > > > 
> > > > > The longer a mutex is held, the more likely the CPU thread
> > > > > holding it
> > > > > could be switched out while holding it.
> > > > > 
> > > > > This doesn’t seem scalable compared to using a finer-grained
> > > > > CT
> > > > > lock
> > > > > (e.g., only taking it in xe_guc_ct_send).
> > > > > 
> > > > > I’m not saying this won’t work as you have it—I think it
> > > > > will—but
> > > > > the
> > > > > consequences of holding the CT lock for an extended period
> > > > > need
> > > > > to be
> > > > > considered.
> > > > 
> > > > Couple more thoughts.. so in the case you mentioned, ideally
> > > > I'd
> > > > like
> > > > to have just a single invalidation per request, rather than
> > > > across
> > > > a
> > > > whole VM. That's the reason we have the range based
> > > > invalidation to
> > > 
> > > Yes, this is range based.
> > > 
> > > > begin with. If we get to the point where we want to make that
> > > > even
> > > > finer, that's great, but we should still just have a single
> > > > invalidation per request (again, ideally).
> > > > 
> > > 
> > > Maybe you have a different idea, but I was thinking of queue-
> > > based
> > > invalidations: the frontend assigns a single seqno, the backend
> > > issues N
> > > invalidations to the hardware—one per GCID mapped in the VM/GT
> > > tuple—and
> > > then signals the frontend when all invalidations associated with
> > > the
> > > seqno are complete. With the GuC, a GCID corresponds to each exec
> > > queue’s
> > > gucid mapped in the VM/GT tuple. Different backends can handle
> > > this
> > > differently.
> > > 
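A minimal sketch of the queue-based idea described above: one frontend
seqno fans out into N backend invalidations, and the frontend is signalled
only when the last ack arrives. This is a userspace toy built on C11
atomics. struct inval_request and both helpers are hypothetical names used
for illustration, not Xe driver code, and a real backend would also have
to cope with resets and timeouts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical frontend-visible request; not a structure from the Xe driver. */
struct inval_request {
	int seqno;               /* single seqno assigned by the frontend */
	atomic_int acks_pending; /* backend invalidations still outstanding */
	atomic_bool done;        /* set once the frontend may signal its fence */
};

/* The backend fans one request out into nr_inval hardware invalidations. */
void inval_request_issue(struct inval_request *req, int seqno, int nr_inval)
{
	assert(nr_inval >= 0);
	req->seqno = seqno;
	atomic_init(&req->acks_pending, nr_inval);
	/* Nothing mapped in the VM/GT tuple: complete immediately. */
	atomic_init(&req->done, nr_inval == 0);
}

/*
 * Called once per hardware ack. Returns true only for the ack that
 * completes the request, which is the point where the frontend would be
 * signalled for this seqno.
 */
bool inval_request_ack(struct inval_request *req)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	if (atomic_fetch_sub(&req->acks_pending, 1) != 1)
		return false;
	atomic_store(&req->done, true);
	return true;
}
```

With three queues mapped, the first two acks return false and the third
returns true, so the frontend sees exactly one completion per seqno no
matter how many invalidations the backend issued.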
> > > > Also, you already have some patches up on the list that do some
> > > > coalescing of invalidations so we reduce the number of
> > > > invalidations
> > > > for multiple ranges. I didn't want to include those patches
> > > > here
> > > > because IMO they are really a separate feature here and it'd be
> > > > nice to
> > > > review that on its own.
> > > > 
> > > 
> > > I agree it is a separate thing, that should help in some cases,
> > > and
> > > should be reviewed on its own.
> > > 
> > > That doesn't help in the case of multiple VM's issuing
> > > invalidations
> > > though (think eviction is occurring or MMU notifiers are firing).
> > > The
> > > lock contention is moved from a dedicated TLB invalidation lock,
> > > to a
> > > widely shared CT lock. If multiple TLB invalidations are
> > > contending,
> > > now
> > > all other users of the CT lock contend at this higher level.
> > > i.e., by
> > > only acquiring the CT lock at the last part of an invalidation, other
> > > waiters
> > > (non-invalidation) get QoS.
> > 
> > I mean, this was the original reason I had understood for having
> > the
> > separate lock in the first place. But it feels a little like we're
> > running in circles here moving between the two modes..
> > 
> 
> We might be getting a little sidetracked but let me give a quick
> example of the contention with CT lock vs. a dedicated lock.
> 
> - VM[0] has N queues attached to it
> - VM[1] has M queues attached to it
> - Q[0], mapped in a different VM than VM[0] and VM[1]
> 
> In a very short period, this occurs...
> 
> 1. VM[0] issues an invalidation
> 2. VM[1] issues an invalidation
> 3. Q[0] does a submission
> 
> With a CT lock, this is going to be the order of the H2G:
> VM[0] - Invalidation[0]
> ...
> VM[0] - Invalidation[N-1]
> VM[1] - Invalidation[0]
> ...
> VM[1] - Invalidation[M-1]
> Q[0] - Submit
> 
> With a dedicated lock:
> VM[0] - Invalidation[0]
> Q[0] - Submit (this could actually be first or a little later depending
> on exact timing)
> ...
> VM[0] - Invalidation[N-1]
> VM[1] - Invalidation[0]
> ...
> VM[1] - Invalidation[M-1]
> 
> The more pathological case—many VMs doing things like freeing memory
> (e.g., a user-space free with SVM triggers an invalidation)—could
> severely hurt QoS for submissions. I'm pretty sure we could craft a
> test
> case to demonstrate this. Is it likely to be common? No. But that
> doesn’t mean, as we rewrite this code, we shouldn’t account for the
> worst cases and design our locking accordingly.
> 
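The two orderings listed above can be reproduced with a small toy model.
This is only a sketch under stated assumptions: it is plain userspace C,
every name in it (h2g_msg, h2g_order_ct_lock, h2g_order_dedicated_lock,
h2g_submit_pos) is hypothetical, and the dedicated-lock variant models the
best case in which the submission wins the CT lock right after the first
invalidation. As noted above, real timing could place it a little later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of H2G ordering; none of this is Xe driver code. */
enum h2g_kind { H2G_INVAL, H2G_SUBMIT };

struct h2g_msg {
	enum h2g_kind kind;
	int vm;  /* issuing VM, or -1 for the Q[0] submission */
	int idx; /* index of the invalidation within its VM's batch */
};

static struct h2g_msg make_msg(enum h2g_kind kind, int vm, int idx)
{
	struct h2g_msg m = { kind, vm, idx };
	return m;
}

/*
 * CT lock held across each VM's whole batch: the submission waits behind
 * every queued batch and is sent last. 'out' needs sum(batch) + 1 slots.
 */
size_t h2g_order_ct_lock(struct h2g_msg *out, const int *batch, size_t nr_vm)
{
	size_t n = 0;

	assert(out != NULL);
	for (size_t vm = 0; vm < nr_vm; vm++)
		for (int i = 0; i < batch[vm]; i++)
			out[n++] = make_msg(H2G_INVAL, (int)vm, i);
	out[n++] = make_msg(H2G_SUBMIT, -1, 0);
	return n;
}

/*
 * Dedicated TLB lock, CT lock taken per send: in the best case the
 * submission slips in right after the first invalidation.
 */
size_t h2g_order_dedicated_lock(struct h2g_msg *out, const int *batch,
				size_t nr_vm)
{
	size_t n = 0;
	int submitted = 0;

	assert(out != NULL);
	for (size_t vm = 0; vm < nr_vm; vm++) {
		for (int i = 0; i < batch[vm]; i++) {
			out[n++] = make_msg(H2G_INVAL, (int)vm, i);
			if (!submitted) {
				out[n++] = make_msg(H2G_SUBMIT, -1, 0);
				submitted = 1;
			}
		}
	}
	if (!submitted) /* no invalidations were queued at all */
		out[n++] = make_msg(H2G_SUBMIT, -1, 0);
	return n;
}

/* 0-based position of the submission in the H2G stream, or n if absent. */
size_t h2g_submit_pos(const struct h2g_msg *msgs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (msgs[i].kind == H2G_SUBMIT)
			return i;
	return n;
}
```

With N = 3 and M = 2, the submission lands at position 5 of 6 in the CT
lock case and at position 1 in the dedicated lock case, which is the QoS
gap the example is describing.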
> Here’s a quick list of common places where the CT lock is required:
> 
> - User submissions
> - Binds (although this is likely to use only the CPU at some point)
> - Memory allocations (clear jobs)
> - SVM copies (both GPU and CPU faults)
> - BO eviction (copy jobs)
> - Prefetches (i.e., KMD triggered migration)
> - In place memory decompression
> - GPU page fault service ack
> - Exec queue destroy
> - Preempt fences and resume
> 
> Again, if multiple TLB invalidations stack up, QoS for all of the
> above
> operations could be denied.
> 
> > I do see what you're saying though, basically the problem is the CT
> > send routine right now is doing a busy wait for a reply from guc
> > each
> > time it sends something, all within the lock.
> > 
> >                 if (!wait_event_timeout(ct->wq, !ct-
> > >g2h_outstanding ||
> >                                         g2h_avail(ct), HZ))
> > 
> > So if we're going to stick with this, yeah I agree we really need
> > some
> 
> Invalidations are a very hot path, so we need to make sure they’re
> implemented as optimally as possible—the original implementation was,
> well, horrible. That one’s on me.
> 
> We've actually fixed a decent number of issues already but
> there
> is more work to do. Good locking here will help too.
> 
> More ideas:
> 
> - Now that we have invalidation jobs, we can pipeline invalidations
>   from BO moves into copy jobs
> 
> - Coalescing should help
> 
> - SVM garbage collector likely should batch together unbinds of
> ranges
>   to avoid multiple TLB invalidations
> 
> - I think we issue too many GGTT invalidations (both on alloc and
> free),
>   we should be able to get rid of one of those
> 
> - Suppress G2H ack on TLB invalidations we don't care about (e.g.,
> when
>   issuing multiple queue based invalidations within a VM, we really
> only
>   want on ack on the last one)
> 
> - If we are on native, maybe we don't even talk to the GuC and issue
> VM
>   invalidations directly from the KMD (GGTT invalidations would
> always a
>   H2G)
> 
> > kind of queuing if we're going to have a lot of these fine grained
> > invalidations all in a row or we'll start blocking things like page
> > fault replies.
> > 
> > I'm wondering if the better way to approach this though would be to
> > refactor on the GuC side rather than do something really complicated on
> 
> Here, I'm not suggesting anything complicated, just a dedicated lock.
> 
> Some of the suggestions above would get slightly more complicated but if
> everything is layered right, it actually shouldn't be all that bad as
> we'd just be modifying individual components in each case.
> 
> > the TLB side. I.e. why can't we do the CT busy wait in a worker thread
> > and let the send thread keep going adding more and more? It would mean
> > we'd have to do a better job of tracking each unique request out to guc
> > rather than just relying on the current g2h_outstanding count, but it
> > would at least let us do some of this work in parallel.
> > 
> > The queueing mechanism is still going to take work on top of what we
> > have in this series to build up these chains of h2g messages with the
> > CT lock held only for that last one. And IMO it still will be a little
> > messy calling into the lower layer (guc) and back out to the upper
> > layer (tlb) and back again to build these queues. And I'm not sure how
> > great that will work if we move to a different back end than guc - we
> > might not get any benefit there after all this work on the guc side.
> > 
> > Let me know what you think about a CT refactor like what I said.
> > 
> 
> I'm not really following the above, but the TL;DR of how we wait for G2H
> space under the CT lock is this: the CT is a closed loop. You can’t send
> an H2G unless there’s space to land the G2H, so you hold the CT lock,
> wait, and make space for yourself as you’re next in line for service.
> 
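
To make the closed loop above concrete, here is a minimal userspace
sketch of the credit accounting as I understand it. Everything in it
(struct ct_model, ct_try_send(), ct_g2h_received()) is a hypothetical
name for illustration and is not the xe_guc_ct.c code; the real send
path waits under the CT lock where this model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the CT's G2H credit accounting, in dwords. */
struct ct_model {
	uint32_t g2h_space;		/* free dwords in the G2H buffer */
	uint32_t g2h_outstanding;	/* replies still expected from the GuC */
};

/*
 * Reserve room for the reply before sending. If the reply could not
 * land, the H2G must not go out; the real driver waits here while
 * holding the CT lock, this model just reports the failure.
 */
static bool ct_try_send(struct ct_model *ct, uint32_t g2h_len)
{
	if (g2h_len > ct->g2h_space)
		return false;
	ct->g2h_space -= g2h_len;
	ct->g2h_outstanding++;
	return true;
}

/* A G2H reply arrived: return its space so the next sender can proceed. */
static void ct_g2h_received(struct ct_model *ct, uint32_t g2h_len)
{
	assert(ct->g2h_outstanding > 0);
	ct->g2h_space += g2h_len;
	ct->g2h_outstanding--;
}
```

With 4 dwords of space and 3-dword replies, a second send is refused
until the first reply has been consumed, which is the stall that every
other CT user sees if TLB invalidations pile up under the same lock.
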
> > And I still do think we can do a better job reducing the scope of some
> > of these invalidations, particularly in a case where we wanted to
> > associate something like the guc id with the VM to build a range rather
> > than just the addresses within the VM. At least in that case we can
> 
> I'm not really following this either. Are you proposing an H2G where we
> pass a list of gucids, or that we tell the GuC which gucids are
> associated with a VM? I think either is worth exploring. For the latter,
> I don’t see why we even need the whole queue-based invalidation when the
> GuC could build a hash table keyed by the ASID and find everything it
> needs to issue the invalidation(s). Maybe this doesn't scale with VFs?

Hey thanks for the detailed reply up there and use case review. We also
had a quick chat offline and I think we're aligned here. I'll add back
in the TLB specific lock. We aren't doing anything explicit in terms of
queueing right now but I think we both agree on the need to add
something like that so we aren't trying to invalidate a lot more than
we actually need to like we are today (e.g. in the case where we have
multiple page table updates back to back that each touch the same
overall range within a VM). The lock layering lets us do that outside
of the GuC space (or any backend) to minimize the communication we're
doing either with the firmware or hardware for those invalidations.
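
Roughly, the layering could look like the userspace sketch below. This
is only an illustration under assumptions, not what the next revision
will contain: the struct and function names are hypothetical, SEQNO_MAX
is an assumed wrap point, the atomic_flag stands in for a dedicated
mutex, and the backend is a stub that counts requests.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQNO_MAX 0x100000	/* assumed wrap point for this sketch */

struct tlb_inval;

/* Hypothetical backend hook; the frontend knows nothing about GuC or CT. */
struct tlb_inval_ops {
	int (*ppgtt)(struct tlb_inval *ti, int seqno, uint64_t start,
		     uint64_t end, uint32_t asid);
};

struct tlb_inval {
	atomic_flag seqno_lock;	/* userspace stand-in for a dedicated mutex */
	int seqno;		/* next seqno to hand out, never 0 */
	int sent;		/* requests handed to the backend */
	const struct tlb_inval_ops *ops;
};

struct inval_range {
	uint64_t start, end;
	uint32_t asid;
};

static void tlb_inval_init(struct tlb_inval *ti, const struct tlb_inval_ops *ops)
{
	atomic_flag_clear(&ti->seqno_lock);
	ti->seqno = 1;
	ti->sent = 0;
	ti->ops = ops;
}

/* Fold b into a if both hit the same ASID and the ranges overlap or touch. */
static bool range_coalesce(struct inval_range *a, const struct inval_range *b)
{
	if (a->asid != b->asid || b->start > a->end || a->start > b->end)
		return false;
	if (b->start < a->start)
		a->start = b->start;
	if (b->end > a->end)
		a->end = b->end;
	return true;
}

/* Assign the seqno and call the backend while holding only the frontend lock. */
static int tlb_inval_issue(struct tlb_inval *ti, const struct inval_range *r,
			   int *seqno)
{
	int ret;

	while (atomic_flag_test_and_set(&ti->seqno_lock))
		;	/* spin; a real driver would sleep on a mutex */
	assert(ti->seqno > 0 && ti->seqno < SEQNO_MAX);
	*seqno = ti->seqno;
	ret = ti->ops->ppgtt(ti, *seqno, r->start, r->end, r->asid);
	ti->seqno = (ti->seqno + 1) % SEQNO_MAX;
	if (!ti->seqno)
		ti->seqno = 1;	/* 0 is never a valid seqno */
	atomic_flag_clear(&ti->seqno_lock);
	return ret;
}

/* Stub backend for illustration: only counts what it is asked to send. */
static int stub_ppgtt(struct tlb_inval *ti, int seqno, uint64_t start,
		      uint64_t end, uint32_t asid)
{
	(void)seqno; (void)start; (void)end; (void)asid;
	ti->sent++;
	return 0;
}

static const struct tlb_inval_ops stub_ops = { .ppgtt = stub_ppgtt };
```

The point of the sketch is that two back to back updates touching the
same ASID can be folded by range_coalesce() before tlb_inval_issue() is
called once, so the backend sees a single request and the ordering of
seqno assignment never depends on a backend-owned lock.
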

So let me take this back, add that layering and send an update here.

Thanks for the feedback!

-Stuart

> 
> Matt
> 
> > look a little longer term at something like the CT refactor and still
> > keep the backend/frontend isolation intact.
> > 
> > Thanks,
> > Stuart
> > 
> > > 
> > > Matt
> > >  
> > > > So basically, the per request lock here also pushes us to implement in
> > > > a more efficient and precise way rather than just hammering as many
> > > > invalidations over a given range as possible.
> > > > 
> > > > And of course there are going to need to be bigger hammer invalidations
> > > > sometimes (like the full VF invalidation we're doing in the
> > > > invalidate_all() routines), but those still fall into the same category
> > > > of precision, just with a larger scope (rather than multiple smaller
> > > > invalidations).
> > > > 
> > > > Thanks,
> > > > Stuart
> > > > 
> > > > > 
> > > > > Matt
> > > > > 
> > > > > > Thanks,
> > > > > > Stuart
> > > > > > 
> > > > > > > 
> > > > > > > Matt 
> > > > > > > 
> > > > > > > > 
> > > > > > > > > I’d call this lock seqno_lock, since it protects exactly that: the order
> > > > > > > > > in which a seqno is assigned by the frontend and handed to the backend.
> > > > > > > > > 
> > > > > > > > > Prime this lock for reclaim as well (do what primelockdep() does in
> > > > > > > > > xe_guc_ct.c) to make it clear that memory allocations are not allowed
> > > > > > > > > while the lock is held, as TLB invalidations can be called from two
> > > > > > > > > reclaim paths:
> > > > > > > > > 
> > > > > > > > > - MMU notifier callbacks
> > > > > > > > > - The dma-fence signaling path of VM binds that require a TLB
> > > > > > > > >   invalidation
> > > > > > > > 
> > > > > > > > > +       spin_lock_irq(&tlb_inval->pending_lock);
> > > > > > > > > +       cancel_delayed_work(&tlb_inval->fence_tdr);
> > > > > > > > >         /*
> > > > > > > > >          * We might have various kworkers waiting for
> > > > > > > > > TLB
> > > > > > > > > flushes
> > > > > > > > > to complete
> > > > > > > > >          * which are not tracked with an explicit TLB
> > > > > > > > > fence,
> > > > > > > > > however at this
> > > > > > > > > -        * stage that will never happen since the CT
> > > > > > > > > is
> > > > > > > > > already
> > > > > > > > > disabled, so
> > > > > > > > > -        * make sure we signal them here under the
> > > > > > > > > assumption
> > > > > > > > > that we have
> > > > > > > > > +        * stage that will never happen since the
> > > > > > > > > backend
> > > > > > > > > is
> > > > > > > > > already disabled,
> > > > > > > > > +        * so make sure we signal them here under the
> > > > > > > > > assumption
> > > > > > > > > that we have
> > > > > > > > >          * completed a full GT reset.
> > > > > > > > >          */
> > > > > > > > > -       if (gt->tlb_inval.seqno == 1)
> > > > > > > > > +       if (tlb_inval->seqno == 1)
> > > > > > > > >                 pending_seqno =
> > > > > > > > > TLB_INVALIDATION_SEQNO_MAX -
> > > > > > > > > 1;
> > > > > > > > >         else
> > > > > > > > > -               pending_seqno = gt->tlb_inval.seqno -
> > > > > > > > > 1;
> > > > > > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv,
> > > > > > > > > pending_seqno);
> > > > > > > > > +               pending_seqno = tlb_inval->seqno - 1;
> > > > > > > > > +       WRITE_ONCE(tlb_inval->seqno_recv,
> > > > > > > > > pending_seqno);
> > > > > > > > >  
> > > > > > > > >         list_for_each_entry_safe(fence, next,
> > > > > > > > > -                                &gt-
> > > > > > > > > > tlb_inval.pending_fences,
> > > > > > > > > link)
> > > > > > > > > -               inval_fence_signal(gt_to_xe(gt),
> > > > > > > > > fence);
> > > > > > > > > -       spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > > > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > > > +                                &tlb_inval-
> > > > > > > > > > pending_fences,
> > > > > > > > > link)
> > > > > > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > > > > > > +       spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > > > > > +       tlb_inval->ops->unlock(tlb_inval);
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > > > -static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
> > > > > > > > > > +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval *tlb_inval, int seqno)
> > > > > > > > > >  {
> > > > > > > > > > -       int seqno_recv = READ_ONCE(gt->tlb_inval.seqno_recv);
> > > > > > > > > > +       int seqno_recv = READ_ONCE(tlb_inval->seqno_recv);
> > > > > > > > > > +
> > > > > > > > > > +       lockdep_assert_held(&tlb_inval->pending_lock);
> > > > > > > > > >  
> > > > > > > > > >         if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX / 2))
> > > > > > > > > >                 return false;
> > > > > > > > > > @@ -201,44 +192,20 @@ static bool tlb_inval_seqno_past(struct xe_gt *gt, int seqno)
> > > > > > > > >         return seqno_recv >= seqno;
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > > -static int send_tlb_inval(struct xe_guc *guc, const
> > > > > > > > > u32
> > > > > > > > > *action,
> > > > > > > > > int len)
> > > > > > > > > -{
> > > > > > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > > > > > -
> > > > > > > > > -       xe_gt_assert(gt, action[1]);    /* Seqno */
> > > > > > > > > -       lockdep_assert_held(&guc->ct.lock);
> > > > > > > > > -
> > > > > > > > > -       /*
> > > > > > > > > -        * XXX: The seqno algorithm relies on TLB
> > > > > > > > > invalidation
> > > > > > > > > being processed
> > > > > > > > > -        * in order which they currently are, if that
> > > > > > > > > changes
> > > > > > > > > the
> > > > > > > > > algorithm will
> > > > > > > > > -        * need to be updated.
> > > > > > > > > -        */
> > > > > > > > > -
> > > > > > > > > -       xe_gt_stats_incr(gt,
> > > > > > > > > XE_GT_STATS_ID_TLB_INVAL,
> > > > > > > > > 1);
> > > > > > > > > -
> > > > > > > > > -       return xe_guc_ct_send(&guc->ct, action, len,
> > > > > > > > > -                            
> > > > > > > > > G2H_LEN_DW_TLB_INVALIDATE,
> > > > > > > > > 1);
> > > > > > > > > -}
> > > > > > > > > -
> > > > > > > > >  static void xe_tlb_inval_fence_prep(struct
> > > > > > > > > xe_tlb_inval_fence
> > > > > > > > > *fence)
> > > > > > > > >  {
> > > > > > > > >         struct xe_tlb_inval *tlb_inval = fence-
> > > > > > > > > > tlb_inval;
> > > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > > > -
> > > > > > > > > -       lockdep_assert_held(&gt->uc.guc.ct.lock);
> > > > > > > > >  
> > > > > > > > >         fence->seqno = tlb_inval->seqno;
> > > > > > > > > -       trace_xe_tlb_inval_fence_send(xe, fence);
> > > > > > > > > +       trace_xe_tlb_inval_fence_send(tlb_inval->xe,
> > > > > > > > > fence);
> > > > > > > > >  
> > > > > > > > >         spin_lock_irq(&tlb_inval->pending_lock);
> > > > > > > > >         fence->inval_time = ktime_get();
> > > > > > > > >         list_add_tail(&fence->link, &tlb_inval-
> > > > > > > > > > pending_fences);
> > > > > > > > >  
> > > > > > > > >         if (list_is_singular(&tlb_inval-
> > > > > > > > > >pending_fences))
> > > > > > > > > -               queue_delayed_work(system_wq,
> > > > > > > > > -                                  &tlb_inval-
> > > > > > > > > >fence_tdr,
> > > > > > > > > -                                 
> > > > > > > > > tlb_timeout_jiffies(gt));
> > > > > > > > > +               queue_delayed_work(system_wq,
> > > > > > > > > &tlb_inval-
> > > > > > > > > > fence_tdr,
> > > > > > > > > +                                  tlb_inval->ops-
> > > > > > > > > > timeout_delay(tlb_inval));
> > > > > > > > >         spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > > > > >  
> > > > > > > > >         tlb_inval->seqno = (tlb_inval->seqno + 1) %
> > > > > > > > > @@ -247,202 +214,63 @@ static void
> > > > > > > > > xe_tlb_inval_fence_prep(struct
> > > > > > > > > xe_tlb_inval_fence *fence)
> > > > > > > > >                 tlb_inval->seqno = 1;
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > > -#define MAKE_INVAL_OP(type)    ((type <<
> > > > > > > > > XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > > > > > > > > -               XE_GUC_TLB_INVAL_MODE_HEAVY <<
> > > > > > > > > XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > > > > > > > > -               XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > > > > > > > > -
> > > > > > > > > -static int send_tlb_inval_ggtt(struct xe_gt *gt, int
> > > > > > > > > seqno)
> > > > > > > > > -{
> > > > > > > > > -       u32 action[] = {
> > > > > > > > > -               XE_GUC_ACTION_TLB_INVALIDATION,
> > > > > > > > > -               seqno,
> > > > > > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > > > > > > > > -       };
> > > > > > > > > -
> > > > > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > > > > ARRAY_SIZE(action));
> > > > > > > > > -}
> > > > > > > > > -
> > > > > > > > > -static int send_tlb_inval_all(struct xe_tlb_inval
> > > > > > > > > *tlb_inval,
> > > > > > > > > -                             struct
> > > > > > > > > xe_tlb_inval_fence
> > > > > > > > > *fence)
> > > > > > > > > -{
> > > > > > > > > -       u32 action[] = {
> > > > > > > > > -               XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > > > > > > > > -               0,  /* seqno, replaced in
> > > > > > > > > send_tlb_inval
> > > > > > > > > */
> > > > > > > > > -               MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > > > > > > > -       };
> > > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > > -
> > > > > > > > > -       xe_gt_assert(gt, fence);
> > > > > > > > > -
> > > > > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > > > > ARRAY_SIZE(action));
> > > > > > > > > -}
> > > > > > > > > > +#define xe_tlb_inval_issue(__tlb_inval, __fence, op, args...)  \
> > > > > > > > > > +({                                                             \
> > > > > > > > > > +       int __ret;                                              \
> > > > > > > > > > +                                                               \
> > > > > > > > > > +       xe_assert((__tlb_inval)->xe, (__tlb_inval)->ops);       \
> > > > > > > > > > +       xe_assert((__tlb_inval)->xe, (__fence));                \
> > > > > > > > > > +                                                               \
> > > > > > > > > > +       (__tlb_inval)->ops->lock((__tlb_inval));                \
> > > > > > > > > > +       xe_tlb_inval_fence_prep((__fence));                     \
> > > > > > > > > > +       __ret = op((__tlb_inval), (__fence)->seqno, ##args);    \
> > > > > > > > > > +       if (__ret < 0)                                          \
> > > > > > > > > > +               xe_tlb_inval_fence_signal_unlocked((__fence));  \
> > > > > > > > > > +       (__tlb_inval)->ops->unlock((__tlb_inval));              \
> > > > > > > > > > +                                                               \
> > > > > > > > > > +       __ret == -ECANCELED ? 0 : __ret;                        \
> > > > > > > > > > +})
> > > > > > > > >  
> > > > > > > > >  /**
> > > > > > > > > - * xe_gt_tlb_invalidation_all - Invalidate all TLBs
> > > > > > > > > across
> > > > > > > > > PF
> > > > > > > > > and all VFs.
> > > > > > > > > - * @gt: the &xe_gt structure
> > > > > > > > > - * @fence: the &xe_tlb_inval_fence to be signaled on
> > > > > > > > > completion
> > > > > > > > > + * xe_tlb_inval_all() - Issue a TLB invalidation for
> > > > > > > > > all
> > > > > > > > > TLBs
> > > > > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > > > > + * @fence: invalidation fence which will be signal
> > > > > > > > > on
> > > > > > > > > TLB
> > > > > > > > > invalidation
> > > > > > > > > + * completion
> > > > > > > > >   *
> > > > > > > > > - * Send a request to invalidate all TLBs across PF
> > > > > > > > > and
> > > > > > > > > all
> > > > > > > > > VFs.
> > > > > > > > > + * Issue a TLB invalidation for all TLBs. Completion
> > > > > > > > > of
> > > > > > > > > TLB
> > > > > > > > > is
> > > > > > > > > asynchronous and
> > > > > > > > > + * caller can use the invalidation fence to wait for
> > > > > > > > > completion.
> > > > > > > > >   *
> > > > > > > > >   * Return: 0 on success, negative error code on
> > > > > > > > > error
> > > > > > > > >   */
> > > > > > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > > > > >                      struct xe_tlb_inval_fence
> > > > > > > > > *fence)
> > > > > > > > >  {
> > > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > > -       int err;
> > > > > > > > > -
> > > > > > > > > -       err = send_tlb_inval_all(tlb_inval, fence);
> > > > > > > > > -       if (err)
> > > > > > > > > -               xe_gt_err(gt, "TLB invalidation
> > > > > > > > > request
> > > > > > > > > failed
> > > > > > > > > (%pe)", ERR_PTR(err));
> > > > > > > > > -
> > > > > > > > > -       return err;
> > > > > > > > > -}
> > > > > > > > > -
> > > > > > > > > -/*
> > > > > > > > > - * Ensure that roundup_pow_of_two(length) doesn't
> > > > > > > > > overflow.
> > > > > > > > > - * Note that roundup_pow_of_two() operates on
> > > > > > > > > unsigned
> > > > > > > > > long,
> > > > > > > > > - * not on u64.
> > > > > > > > > - */
> > > > > > > > > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH
> > > > > > > > > (rounddown_pow_of_two(ULONG_MAX))
> > > > > > > > > -
> > > > > > > > > -static int send_tlb_inval_ppgtt(struct xe_gt *gt,
> > > > > > > > > u64
> > > > > > > > > start,
> > > > > > > > > u64
> > > > > > > > > end,
> > > > > > > > > -                               u32 asid, int seqno)
> > > > > > > > > -{
> > > > > > > > > -#define MAX_TLB_INVALIDATION_LEN       7
> > > > > > > > > -       u32 action[MAX_TLB_INVALIDATION_LEN];
> > > > > > > > > -       u64 length = end - start;
> > > > > > > > > -       int len = 0;
> > > > > > > > > -
> > > > > > > > > -       action[len++] =
> > > > > > > > > XE_GUC_ACTION_TLB_INVALIDATION;
> > > > > > > > > -       action[len++] = seqno;
> > > > > > > > > -       if (!gt_to_xe(gt)->info.has_range_tlb_inval
> > > > > > > > > ||
> > > > > > > > > -           length >
> > > > > > > > > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > > > > > > > > -               action[len++] =
> > > > > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > > > > > > > > -       } else {
> > > > > > > > > -               u64 orig_start = start;
> > > > > > > > > -               u64 align;
> > > > > > > > > -
> > > > > > > > > -               if (length < SZ_4K)
> > > > > > > > > -                       length = SZ_4K;
> > > > > > > > > -
> > > > > > > > > -               /*
> > > > > > > > > -                * We need to invalidate a higher
> > > > > > > > > granularity
> > > > > > > > > if
> > > > > > > > > start address
> > > > > > > > > -                * is not aligned to length. When
> > > > > > > > > start
> > > > > > > > > is
> > > > > > > > > not
> > > > > > > > > aligned with
> > > > > > > > > -                * length we need to find the length
> > > > > > > > > large
> > > > > > > > > enough
> > > > > > > > > to create an
> > > > > > > > > -                * address mask covering the required
> > > > > > > > > range.
> > > > > > > > > -                */
> > > > > > > > > -               align = roundup_pow_of_two(length);
> > > > > > > > > -               start = ALIGN_DOWN(start, align);
> > > > > > > > > -               end = ALIGN(end, align);
> > > > > > > > > -               length = align;
> > > > > > > > > -               while (start + length < end) {
> > > > > > > > > -                       length <<= 1;
> > > > > > > > > -                       start =
> > > > > > > > > ALIGN_DOWN(orig_start,
> > > > > > > > > length);
> > > > > > > > > -               }
> > > > > > > > > -
> > > > > > > > > -               /*
> > > > > > > > > -                * Minimum invalidation size for a
> > > > > > > > > 2MB
> > > > > > > > > page
> > > > > > > > > that
> > > > > > > > > the hardware
> > > > > > > > > -                * expects is 16MB
> > > > > > > > > -                */
> > > > > > > > > -               if (length >= SZ_2M) {
> > > > > > > > > -                       length = max_t(u64, SZ_16M,
> > > > > > > > > length);
> > > > > > > > > -                       start =
> > > > > > > > > ALIGN_DOWN(orig_start,
> > > > > > > > > length);
> > > > > > > > > -               }
> > > > > > > > > -
> > > > > > > > > -               xe_gt_assert(gt, length >= SZ_4K);
> > > > > > > > > -               xe_gt_assert(gt,
> > > > > > > > > is_power_of_2(length));
> > > > > > > > > -               xe_gt_assert(gt, !(length &
> > > > > > > > > GENMASK(ilog2(SZ_16M)
> > > > > > > > > - 1,
> > > > > > > > > -                                                  
> > > > > > > > > ilog2(SZ_2M)
> > > > > > > > > + 1)));
> > > > > > > > > -               xe_gt_assert(gt, IS_ALIGNED(start,
> > > > > > > > > length));
> > > > > > > > > -
> > > > > > > > > -               action[len++] =
> > > > > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > > > > > > > > -               action[len++] = asid;
> > > > > > > > > -               action[len++] = lower_32_bits(start);
> > > > > > > > > -               action[len++] = upper_32_bits(start);
> > > > > > > > > -               action[len++] = ilog2(length) -
> > > > > > > > > ilog2(SZ_4K);
> > > > > > > > > -       }
> > > > > > > > > -
> > > > > > > > > -       xe_gt_assert(gt, len <=
> > > > > > > > > MAX_TLB_INVALIDATION_LEN);
> > > > > > > > > -
> > > > > > > > > -       return send_tlb_inval(&gt->uc.guc, action,
> > > > > > > > > len);
> > > > > > > > > -}
> > > > > > > > > -
> > > > > > > > > -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> > > > > > > > > -                              struct
> > > > > > > > > xe_tlb_inval_fence
> > > > > > > > > *fence)
> > > > > > > > > -{
> > > > > > > > > -       int ret;
> > > > > > > > > -
> > > > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > > > -
> > > > > > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > > > > > -
> > > > > > > > > -       ret = send_tlb_inval_ggtt(gt, fence->seqno);
> > > > > > > > > -       if (ret < 0)
> > > > > > > > > -
> > > > > > > > >                inval_fence_signal_unlocked(gt_to_xe(gt
> > > > > > > > > ),
> > > > > > > > > fence);
> > > > > > > > > -
> > > > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > > > -
> > > > > > > > > -       /*
> > > > > > > > > -        * -ECANCELED indicates the CT is stopped for
> > > > > > > > > a
> > > > > > > > > GT
> > > > > > > > > reset.
> > > > > > > > > TLB caches
> > > > > > > > > -        *  should be nuked on a GT reset so this
> > > > > > > > > error
> > > > > > > > > can
> > > > > > > > > be
> > > > > > > > > ignored.
> > > > > > > > > -        */
> > > > > > > > > -       if (ret == -ECANCELED)
> > > > > > > > > -               return 0;
> > > > > > > > > -
> > > > > > > > > -       return ret;
> > > > > > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > > > > > tlb_inval-
> > > > > > > > > > ops->all);
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > >  /**
> > > > > > > > > - * xe_tlb_inval_ggtt - Issue a TLB invalidation on
> > > > > > > > > this
> > > > > > > > > GT
> > > > > > > > > for
> > > > > > > > > the GGTT
> > > > > > > > > + * xe_tlb_inval_ggtt() - Issue a TLB invalidation
> > > > > > > > > for
> > > > > > > > > the
> > > > > > > > > GGTT
> > > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > > >   *
> > > > > > > > > - * Issue a TLB invalidation for the GGTT. Completion
> > > > > > > > > of
> > > > > > > > > TLB
> > > > > > > > > invalidation is
> > > > > > > > > - * synchronous.
> > > > > > > > > + * Issue a TLB invalidation for the GGTT. Completion
> > > > > > > > > of
> > > > > > > > > TLB
> > > > > > > > > is
> > > > > > > > > asynchronous and
> > > > > > > > > + * caller can use the invalidation fence to wait for
> > > > > > > > > completion.
> > > > > > > > >   *
> > > > > > > > >   * Return: 0 on success, negative error code on
> > > > > > > > > error
> > > > > > > > >   */
> > > > > > > > >  int xe_tlb_inval_ggtt(struct xe_tlb_inval
> > > > > > > > > *tlb_inval)
> > > > > > > > >  {
> > > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > > > -       unsigned int fw_ref;
> > > > > > > > > -
> > > > > > > > > -       if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> > > > > > > > > -           gt->uc.guc.submission_state.enabled) {
> > > > > > > > > -               struct xe_tlb_inval_fence fence;
> > > > > > > > > -               int ret;
> > > > > > > > > -
> > > > > > > > > -               xe_tlb_inval_fence_init(tlb_inval,
> > > > > > > > > &fence,
> > > > > > > > > true);
> > > > > > > > > -               ret = __xe_tlb_inval_ggtt(gt,
> > > > > > > > > &fence);
> > > > > > > > > -               if (ret)
> > > > > > > > > -                       return ret;
> > > > > > > > > -
> > > > > > > > > -               xe_tlb_inval_fence_wait(&fence);
> > > > > > > > > > -       } else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
> > > > > > > > > > -               struct xe_mmio *mmio = &gt->mmio;
> > > > > > > > > > -
> > > > > > > > > > -               if (IS_SRIOV_VF(xe))
> > > > > > > > > > -                       return 0;
> > > > > > > > > > -
> > > > > > > > > > -               fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> > > > > > > > > > -               if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
> > > > > > > > > > -                       xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
> > > > > > > > > > -                                       PVC_GUC_TLB_INV_DESC1_INVALIDATE);
> > > > > > > > > > -                       xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
> > > > > > > > > > -                                       PVC_GUC_TLB_INV_DESC0_VALID);
> > > > > > > > > > -               } else {
> > > > > > > > > > -                       xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> > > > > > > > > > -                                       GUC_TLB_INV_CR_INVALIDATE);
> > > > > > > > > > -               }
> > > > > > > > > > -               xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > > > > > > > > -       }
> > > > > > > > > +       struct xe_tlb_inval_fence fence, *fence_ptr =
> > > > > > > > > &fence;
> > > > > > > > > +       int ret;
> > > > > > > > >  
> > > > > > > > > -       return 0;
> > > > > > > > > +       xe_tlb_inval_fence_init(tlb_inval, fence_ptr,
> > > > > > > > > true);
> > > > > > > > > +       ret = xe_tlb_inval_issue(tlb_inval,
> > > > > > > > > fence_ptr,
> > > > > > > > > tlb_inval-
> > > > > > > > > > ops->ggtt);
> > > > > > > > > +       xe_tlb_inval_fence_wait(fence_ptr);
> > > > > > > > > +
> > > > > > > > > +       return ret;
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > >  /**
> > > > > > > > > - * xe_tlb_inval_range - Issue a TLB invalidation on
> > > > > > > > > this
> > > > > > > > > GT
> > > > > > > > > for
> > > > > > > > > an address range
> > > > > > > > > + * xe_tlb_inval_range() - Issue a TLB invalidation
> > > > > > > > > for
> > > > > > > > > an
> > > > > > > > > address range
> > > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > > >   * @fence: invalidation fence which will be signal
> > > > > > > > > on
> > > > > > > > > TLB
> > > > > > > > > invalidation
> > > > > > > > >   * completion
> > > > > > > > > @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct
> > > > > > > > > xe_tlb_inval
> > > > > > > > > *tlb_inval,
> > > > > > > > >                        struct xe_tlb_inval_fence
> > > > > > > > > *fence,
> > > > > > > > > u64
> > > > > > > > > start, u64 end,
> > > > > > > > >                        u32 asid)
> > > > > > > > >  {
> > > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > > > -       int  ret;
> > > > > > > > > -
> > > > > > > > > -       xe_gt_assert(gt, fence);
> > > > > > > > > -
> > > > > > > > > -       /* Execlists not supported */
> > > > > > > > > -       if (xe->info.force_execlist) {
> > > > > > > > > -               __inval_fence_signal(xe, fence);
> > > > > > > > > -               return 0;
> > > > > > > > > -       }
> > > > > > > > > -
> > > > > > > > > -       mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > > > > -
> > > > > > > > > -       xe_tlb_inval_fence_prep(fence);
> > > > > > > > > -
> > > > > > > > > -       ret = send_tlb_inval_ppgtt(gt, start, end,
> > > > > > > > > asid,
> > > > > > > > > fence-
> > > > > > > > > > seqno);
> > > > > > > > > -       if (ret < 0)
> > > > > > > > > -               inval_fence_signal_unlocked(xe,
> > > > > > > > > fence);
> > > > > > > > > -
> > > > > > > > > -       mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > > > > -
> > > > > > > > > -       return ret;
> > > > > > > > > +       return xe_tlb_inval_issue(tlb_inval, fence,
> > > > > > > > > tlb_inval-
> > > > > > > > > > ops->ppgtt,
> > > > > > > > > +                                 start, end, asid);
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > >  /**
> > > > > > > > > - * xe_tlb_inval_vm - Issue a TLB invalidation on
> > > > > > > > > this GT
> > > > > > > > > for
> > > > > > > > > a
> > > > > > > > > VM
> > > > > > > > > + * xe_tlb_inval_vm() - Issue a TLB invalidation for
> > > > > > > > > a VM
> > > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > > >   * @vm: VM to invalidate
> > > > > > > > >   *
> > > > > > > > > @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct
> > > > > > > > > xe_tlb_inval
> > > > > > > > > *tlb_inval, struct xe_vm *vm)
> > > > > > > > >  {
> > > > > > > > >         struct xe_tlb_inval_fence fence;
> > > > > > > > >         u64 range = 1ull << vm->xe->info.va_bits;
> > > > > > > > > -       int ret;
> > > > > > > > >  
> > > > > > > > >         xe_tlb_inval_fence_init(tlb_inval, &fence,
> > > > > > > > > true);
> > > > > > > > > -
> > > > > > > > > -       ret = xe_tlb_inval_range(tlb_inval, &fence,
> > > > > > > > > 0,
> > > > > > > > > range,
> > > > > > > > > vm-
> > > > > > > > > > usm.asid);
> > > > > > > > > -       if (ret < 0)
> > > > > > > > > -               return;
> > > > > > > > > -
> > > > > > > > > +       xe_tlb_inval_range(tlb_inval, &fence, 0,
> > > > > > > > > range,
> > > > > > > > > vm-
> > > > > > > > > > usm.asid);
> > > > > > > > >         xe_tlb_inval_fence_wait(&fence);
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > >  /**
> > > > > > > > > - * xe_tlb_inval_done_handler - TLB invalidation done
> > > > > > > > > handler
> > > > > > > > > - * @gt: gt
> > > > > > > > > + * xe_tlb_inval_done_handler() - TLB invalidation
> > > > > > > > > done
> > > > > > > > > handler
> > > > > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > > > >   * @seqno: seqno of invalidation that is done
> > > > > > > > >   *
> > > > > > > > >   * Update recv seqno, signal any TLB invalidation
> > > > > > > > > fences,
> > > > > > > > > and
> > > > > > > > > restart TDR
> > > > > > > > 
> > > > > > > > > I'd mention that this function is safe to call from any
> > > > > > > > > context (i.e., process, atomic, and hardirq contexts are
> > > > > > > > > allowed).
> > > > > > > > 
> > > > > > > > > We might need to convert tlb_inval.pending_lock to a
> > > > > > > > > raw_spinlock_t for PREEMPT_RT enablement. Same for the GuC
> > > > > > > > > fast_lock. AFAIK we haven’t had any complaints, so maybe
> > > > > > > > > I’m just overthinking it, but also perhaps not.
> > > > > > > > 
> > > > > > > > >   */
> > > > > > > > > -static void xe_tlb_inval_done_handler(struct xe_gt
> > > > > > > > > *gt,
> > > > > > > > > int
> > > > > > > > > seqno)
> > > > > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > > > > *tlb_inval,
> > > > > > > > > int seqno)
> > > > > > > > >  {
> > > > > > > > > -       struct xe_device *xe = gt_to_xe(gt);
> > > > > > > > > +       struct xe_device *xe = tlb_inval->xe;
> > > > > > > > >         struct xe_tlb_inval_fence *fence, *next;
> > > > > > > > >         unsigned long flags;
> > > > > > > > >  
> > > > > > > > > @@ -535,77 +337,53 @@ static void
> > > > > > > > > xe_tlb_inval_done_handler(struct xe_gt *gt, int
> > > > > > > > > seqno)
> > > > > > > > >          * officially process the CT message like if
> > > > > > > > > racing
> > > > > > > > > against
> > > > > > > > >          * process_g2h_msg().
> > > > > > > > >          */
> > > > > > > > > -       spin_lock_irqsave(&gt-
> > > > > > > > > >tlb_inval.pending_lock,
> > > > > > > > > flags);
> > > > > > > > > -       if (tlb_inval_seqno_past(gt, seqno)) {
> > > > > > > > > -               spin_unlock_irqrestore(&gt-
> > > > > > > > > > tlb_inval.pending_lock, flags);
> > > > > > > > > +       spin_lock_irqsave(&tlb_inval->pending_lock,
> > > > > > > > > flags);
> > > > > > > > > +       if (xe_tlb_inval_seqno_past(tlb_inval,
> > > > > > > > > seqno)) {
> > > > > > > > > +               spin_unlock_irqrestore(&tlb_inval-
> > > > > > > > > > pending_lock,
> > > > > > > > > flags);
> > > > > > > > >                 return;
> > > > > > > > >         }
> > > > > > > > >  
> > > > > > > > > -       WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> > > > > > > > > +       WRITE_ONCE(tlb_inval->seqno_recv, seqno);
> > > > > > > > >  
> > > > > > > > >         list_for_each_entry_safe(fence, next,
> > > > > > > > > -                                &gt-
> > > > > > > > > > tlb_inval.pending_fences,
> > > > > > > > > link) {
> > > > > > > > > +                                &tlb_inval-
> > > > > > > > > > pending_fences,
> > > > > > > > > link) {
> > > > > > > > >                 trace_xe_tlb_inval_fence_recv(xe,
> > > > > > > > > fence);
> > > > > > > > >  
> > > > > > > > > -               if (!tlb_inval_seqno_past(gt, fence-
> > > > > > > > > > seqno))
> > > > > > > > > +               if
> > > > > > > > > (!xe_tlb_inval_seqno_past(tlb_inval,
> > > > > > > > > fence-
> > > > > > > > > > seqno))
> > > > > > > > >                         break;
> > > > > > > > >  
> > > > > > > > > -               inval_fence_signal(xe, fence);
> > > > > > > > > +               xe_tlb_inval_fence_signal(fence);
> > > > > > > > >         }
> > > > > > > > >  
> > > > > > > > > -       if (!list_empty(&gt-
> > > > > > > > > >tlb_inval.pending_fences))
> > > > > > > > > +       if (!list_empty(&tlb_inval->pending_fences))
> > > > > > > > >                 mod_delayed_work(system_wq,
> > > > > > > > > -                                &gt-
> > > > > > > > > > tlb_inval.fence_tdr,
> > > > > > > > > -                               
> > > > > > > > > tlb_timeout_jiffies(gt));
> > > > > > > > > +                                &tlb_inval-
> > > > > > > > > >fence_tdr,
> > > > > > > > > +                                tlb_inval->ops-
> > > > > > > > > > timeout_delay(tlb_inval));
> > > > > > > > >         else
> > > > > > > > > -               cancel_delayed_work(&gt-
> > > > > > > > > > tlb_inval.fence_tdr);
> > > > > > > > > +               cancel_delayed_work(&tlb_inval-
> > > > > > > > > > fence_tdr);
> > > > > > > > >  
> > > > > > > > > -       spin_unlock_irqrestore(&gt-
> > > > > > > > > > tlb_inval.pending_lock,
> > > > > > > > > flags);
> > > > > > > > > -}
> > > > > > > > > -
> > > > > > > > > -/**
> > > > > > > > > - * xe_guc_tlb_inval_done_handler - TLB invalidation
> > > > > > > > > done
> > > > > > > > > handler
> > > > > > > > > - * @guc: guc
> > > > > > > > > - * @msg: message indicating TLB invalidation done
> > > > > > > > > - * @len: length of message
> > > > > > > > > - *
> > > > > > > > > - * Parse seqno of TLB invalidation, wake any waiters
> > > > > > > > > for
> > > > > > > > > seqno,
> > > > > > > > > and signal any
> > > > > > > > > - * invalidation fences for seqno. Algorithm for this
> > > > > > > > > depends
> > > > > > > > > on
> > > > > > > > > seqno being
> > > > > > > > > - * received in-order and asserts this assumption.
> > > > > > > > > - *
> > > > > > > > > - * Return: 0 on success, -EPROTO for malformed
> > > > > > > > > messages.
> > > > > > > > > - */
> > > > > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc
> > > > > > > > > *guc,
> > > > > > > > > u32
> > > > > > > > > *msg,
> > > > > > > > > u32 len)
> > > > > > > > > -{
> > > > > > > > > -       struct xe_gt *gt = guc_to_gt(guc);
> > > > > > > > > -
> > > > > > > > > -       if (unlikely(len != 1))
> > > > > > > > > -               return -EPROTO;
> > > > > > > > > -
> > > > > > > > > -       xe_tlb_inval_done_handler(gt, msg[0]);
> > > > > > > > > -
> > > > > > > > > -       return 0;
> > > > > > > > > +       spin_unlock_irqrestore(&tlb_inval-
> > > > > > > > > >pending_lock,
> > > > > > > > > flags);
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > >  static const char *
> > > > > > > > > -inval_fence_get_driver_name(struct dma_fence
> > > > > > > > > *dma_fence)
> > > > > > > > > +xe_inval_fence_get_driver_name(struct dma_fence
> > > > > > > > > *dma_fence)
> > > > > > > > >  {
> > > > > > > > >         return "xe";
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > >  static const char *
> > > > > > > > > -inval_fence_get_timeline_name(struct dma_fence
> > > > > > > > > *dma_fence)
> > > > > > > > > +xe_inval_fence_get_timeline_name(struct dma_fence
> > > > > > > > > *dma_fence)
> > > > > > > > >  {
> > > > > > > > > -       return "inval_fence";
> > > > > > > > > +       return "tlb_inval_fence";
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > >  static const struct dma_fence_ops inval_fence_ops =
> > > > > > > > > {
> > > > > > > > > -       .get_driver_name =
> > > > > > > > > inval_fence_get_driver_name,
> > > > > > > > > -       .get_timeline_name =
> > > > > > > > > inval_fence_get_timeline_name,
> > > > > > > > > +       .get_driver_name =
> > > > > > > > > xe_inval_fence_get_driver_name,
> > > > > > > > > +       .get_timeline_name =
> > > > > > > > > xe_inval_fence_get_timeline_name,
> > > > > > > > >  };
> > > > > > > > >  
> > > > > > > > >  /**
> > > > > > > > > - * xe_tlb_inval_fence_init - Initialize TLB
> > > > > > > > > invalidation
> > > > > > > > > fence
> > > > > > > > > + * xe_tlb_inval_fence_init() - Initialize TLB
> > > > > > > > > invalidation
> > > > > > > > > fence
> > > > > > > > >   * @tlb_inval: TLB invalidation client
> > > > > > > > >   * @fence: TLB invalidation fence to initialize
> > > > > > > > >   * @stack: fence is stack variable
> > > > > > > > > @@ -618,15 +396,12 @@ void
> > > > > > > > > xe_tlb_inval_fence_init(struct
> > > > > > > > > xe_tlb_inval *tlb_inval,
> > > > > > > > >                              struct
> > > > > > > > > xe_tlb_inval_fence
> > > > > > > > > *fence,
> > > > > > > > >                              bool stack)
> > > > > > > > >  {
> > > > > > > > > -       struct xe_gt *gt = tlb_inval->private;
> > > > > > > > > -
> > > > > > > > > -       xe_pm_runtime_get_noresume(gt_to_xe(gt));
> > > > > > > > > +       xe_pm_runtime_get_noresume(tlb_inval->xe);
> > > > > > > > >  
> > > > > > > > > -       spin_lock_irq(&gt->tlb_inval.lock);
> > > > > > > > > -       dma_fence_init(&fence->base,
> > > > > > > > > &inval_fence_ops,
> > > > > > > > > -                      &gt->tlb_inval.lock,
> > > > > > > > > +       spin_lock_irq(&tlb_inval->lock);
> > > > > > > > > +       dma_fence_init(&fence->base,
> > > > > > > > > &inval_fence_ops,
> > > > > > > > > &tlb_inval->lock,
> > > > > > > > >                        dma_fence_context_alloc(1),
> > > > > > > > > 1);
> > > > > > > > > -       spin_unlock_irq(&gt->tlb_inval.lock);
> > > > > > > > > +       spin_unlock_irq(&tlb_inval->lock);
> > > > > > > > 
> > > > > > > > While here, 'fence_lock' is probably a better name.
> > > > > > > > 
> > > > > > > > Matt
> > > > > > > > 
> > > > > > > > >         INIT_LIST_HEAD(&fence->link);
> > > > > > > > >         if (stack)
> > > > > > > > >                 set_bit(FENCE_STACK_BIT, &fence-
> > > > > > > > > > base.flags);
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > > > index 7adee3f8c551..cdeafc8d4391 100644
> > > > > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > > > > @@ -18,24 +18,30 @@ struct xe_vma;
> > > > > > > > >  int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
> > > > > > > > >  
> > > > > > > > >  void xe_tlb_inval_reset(struct xe_tlb_inval
> > > > > > > > > *tlb_inval);
> > > > > > > > > -int xe_tlb_inval_ggtt(struct xe_tlb_inval
> > > > > > > > > *tlb_inval);
> > > > > > > > > -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval,
> > > > > > > > > struct
> > > > > > > > > xe_vm *vm);
> > > > > > > > >  int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > > > > >                      struct xe_tlb_inval_fence
> > > > > > > > > *fence);
> > > > > > > > > +int xe_tlb_inval_ggtt(struct xe_tlb_inval
> > > > > > > > > *tlb_inval);
> > > > > > > > > +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval,
> > > > > > > > > struct
> > > > > > > > > xe_vm *vm);
> > > > > > > > >  int xe_tlb_inval_range(struct xe_tlb_inval
> > > > > > > > > *tlb_inval,
> > > > > > > > >                        struct xe_tlb_inval_fence
> > > > > > > > > *fence,
> > > > > > > > >                        u64 start, u64 end, u32 asid);
> > > > > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc
> > > > > > > > > *guc,
> > > > > > > > > u32
> > > > > > > > > *msg,
> > > > > > > > > u32 len);
> > > > > > > > >  
> > > > > > > > >  void xe_tlb_inval_fence_init(struct xe_tlb_inval
> > > > > > > > > *tlb_inval,
> > > > > > > > >                              struct
> > > > > > > > > xe_tlb_inval_fence
> > > > > > > > > *fence,
> > > > > > > > >                              bool stack);
> > > > > > > > > -void xe_tlb_inval_fence_signal(struct
> > > > > > > > > xe_tlb_inval_fence
> > > > > > > > > *fence);
> > > > > > > > >  
> > > > > > > > > +/**
> > > > > > > > > > + * xe_tlb_inval_fence_wait() - TLB invalidation fence wait
> > > > > > > > > > + * @fence: TLB invalidation fence to wait on
> > > > > > > > > > + *
> > > > > > > > > > + * Wait on a TLB invalidation fence until it signals,
> > > > > > > > > > + * non-interruptible
> > > > > > > > > + */
> > > > > > > > >  static inline void
> > > > > > > > >  xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence
> > > > > > > > > *fence)
> > > > > > > > >  {
> > > > > > > > >         dma_fence_wait(&fence->base, false);
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > > > > *tlb_inval,
> > > > > > > > > int seqno);
> > > > > > > > > +
> > > > > > > > >  #endif /* _XE_TLB_INVAL_ */
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > > > index 05b6adc929bb..c1ad96d24fc8 100644
> > > > > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > > > > @@ -9,10 +9,85 @@
> > > > > > > > >  #include <linux/workqueue.h>
> > > > > > > > >  #include <linux/dma-fence.h>
> > > > > > > > >  
> > > > > > > > > -/** struct xe_tlb_inval - TLB invalidation client */
> > > > > > > > > +struct xe_tlb_inval;
> > > > > > > > > +
> > > > > > > > > +/** struct xe_tlb_inval_ops - TLB invalidation ops
> > > > > > > > > (backend)
> > > > > > > > > */
> > > > > > > > > +struct xe_tlb_inval_ops {
> > > > > > > > > +       /**
> > > > > > > > > +        * @all: Invalidate all TLBs
> > > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > > > +        *
> > > > > > > > > +        * Return 0 on success, -ECANCELED if backend
> > > > > > > > > is
> > > > > > > > > mid-
> > > > > > > > > reset, error on
> > > > > > > > > +        * failure
> > > > > > > > > +        */
> > > > > > > > > +       int (*all)(struct xe_tlb_inval *tlb_inval,
> > > > > > > > > u32
> > > > > > > > > seqno);
> > > > > > > > > +
> > > > > > > > > +       /**
> > > > > > > > > +        * @ggtt: Invalidate global translation TLBs
> > > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > > > +        *
> > > > > > > > > +        * Return 0 on success, -ECANCELED if backend
> > > > > > > > > is
> > > > > > > > > mid-
> > > > > > > > > reset, error on
> > > > > > > > > +        * failure
> > > > > > > > > +        */
> > > > > > > > > +       int (*ggtt)(struct xe_tlb_inval *tlb_inval,
> > > > > > > > > u32
> > > > > > > > > seqno);
> > > > > > > > > +
> > > > > > > > > +       /**
> > > > > > > > > > +        * @ppgtt: Invalidate per-process translation TLBs
> > > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > > +        * @seqno: Seqno of TLB invalidation
> > > > > > > > > +        * @start: Start address
> > > > > > > > > +        * @end: End address
> > > > > > > > > +        * @asid: Address space ID
> > > > > > > > > +        *
> > > > > > > > > +        * Return 0 on success, -ECANCELED if backend
> > > > > > > > > is
> > > > > > > > > mid-
> > > > > > > > > reset, error on
> > > > > > > > > +        * failure
> > > > > > > > > +        */
> > > > > > > > > +       int (*ppgtt)(struct xe_tlb_inval *tlb_inval,
> > > > > > > > > u32
> > > > > > > > > seqno,
> > > > > > > > > u64 start,
> > > > > > > > > +                    u64 end, u32 asid);
> > > > > > > > > +
> > > > > > > > > +       /**
> > > > > > > > > +        * @initialized: Backend is initialized
> > > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > > +        *
> > > > > > > > > > +        * Return: True if backend is initialized, False
> > > > > > > > > > otherwise
> > > > > > > > > +        */
> > > > > > > > > +       bool (*initialized)(struct xe_tlb_inval
> > > > > > > > > *tlb_inval);
> > > > > > > > > +
> > > > > > > > > +       /**
> > > > > > > > > +        * @flush: Flush pending TLB invalidations
> > > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > > +        */
> > > > > > > > > +       void (*flush)(struct xe_tlb_inval
> > > > > > > > > *tlb_inval);
> > > > > > > > > +
> > > > > > > > > +       /**
> > > > > > > > > +        * @timeout_delay: Timeout delay for TLB
> > > > > > > > > invalidation
> > > > > > > > > +        * @tlb_inval: TLB invalidation client
> > > > > > > > > +        *
> > > > > > > > > +        * Return: Timeout delay for TLB invalidation
> > > > > > > > > in
> > > > > > > > > jiffies
> > > > > > > > > +        */
> > > > > > > > > +       long (*timeout_delay)(struct xe_tlb_inval
> > > > > > > > > *tlb_inval);
> > > > > > > > > +
> > > > > > > > > +       /**
> > > > > > > > > +        * @lock: Lock resources protecting the
> > > > > > > > > backend
> > > > > > > > > seqno
> > > > > > > > > management
> > > > > > > > > +        */
> > > > > > > > > +       void (*lock)(struct xe_tlb_inval *tlb_inval);
> > > > > > > > > +
> > > > > > > > > +       /**
> > > > > > > > > > +        * @unlock: Unlock resources protecting the backend
> > > > > > > > > > +        * seqno management
> > > > > > > > > +        */
> > > > > > > > > +       void (*unlock)(struct xe_tlb_inval
> > > > > > > > > *tlb_inval);
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > > > > +/** struct xe_tlb_inval - TLB invalidation client
> > > > > > > > > (frontend)
> > > > > > > > > */
> > > > > > > > >  struct xe_tlb_inval {
> > > > > > > > >         /** @private: Backend private pointer */
> > > > > > > > >         void *private;
> > > > > > > > > +       /** @xe: Pointer to Xe device */
> > > > > > > > > +       struct xe_device *xe;
> > > > > > > > > +       /** @ops: TLB invalidation ops */
> > > > > > > > > +       const struct xe_tlb_inval_ops *ops;
> > > > > > > > >         /** @tlb_inval.seqno: TLB invalidation seqno,
> > > > > > > > > protected
> > > > > > > > > by CT lock */
> > > > > > > > >  #define TLB_INVALIDATION_SEQNO_MAX     0x100000
> > > > > > > > >         int seqno;
> > > > > > > > > -- 
> > > > > > > > > 2.34.1
> > > > > > > > > 
> > > > > > 
> > > > 
> > 
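
For anyone following the frontend/backend split in the quoted diff, here
is a minimal user-space sketch of how a frontend call might dispatch
through an ops table. The field names (private, ops, seqno) and the op
names (ggtt, ppgtt, initialized) mirror the patch; everything prefixed
with fake_, the reduced struct layouts, the -ENODEV return for an
uninitialized backend, and the "bump seqno only on success" policy are
assumptions made for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct tlb_inval;

/* Reduced stand-in for struct xe_tlb_inval_ops (backend side). */
struct tlb_inval_ops {
    int (*ggtt)(struct tlb_inval *ti, uint32_t seqno);
    int (*ppgtt)(struct tlb_inval *ti, uint32_t seqno,
                 uint64_t start, uint64_t end, uint32_t asid);
    bool (*initialized)(struct tlb_inval *ti);
};

/* Reduced stand-in for struct xe_tlb_inval (frontend side). */
struct tlb_inval {
    void *private;                   /* backend private pointer */
    const struct tlb_inval_ops *ops; /* backend ops table */
    uint32_t seqno;                  /* next seqno to hand out */
};

/* Hypothetical backend state, used only by this illustration. */
struct fake_backend {
    bool ready;
    bool in_reset;
    uint32_t last_seqno;
    uint64_t last_start, last_end;
};

static bool fake_initialized(struct tlb_inval *ti)
{
    struct fake_backend *b = ti->private;

    return b->ready;
}

static int fake_ggtt(struct tlb_inval *ti, uint32_t seqno)
{
    struct fake_backend *b = ti->private;

    if (b->in_reset)
        return -ECANCELED; /* backend is mid-reset */
    b->last_seqno = seqno;
    return 0;
}

static int fake_ppgtt(struct tlb_inval *ti, uint32_t seqno,
                      uint64_t start, uint64_t end, uint32_t asid)
{
    struct fake_backend *b = ti->private;

    (void)asid; /* a real backend would encode the ASID in its request */
    if (b->in_reset)
        return -ECANCELED; /* backend is mid-reset */
    b->last_seqno = seqno;
    b->last_start = start;
    b->last_end = end;
    return 0;
}

const struct tlb_inval_ops fake_ops = {
    .ggtt = fake_ggtt,
    .ppgtt = fake_ppgtt,
    .initialized = fake_initialized,
};

/* Frontend entry point: knows nothing about how the backend sends. */
int tlb_inval_range(struct tlb_inval *ti, uint64_t start, uint64_t end,
                    uint32_t asid)
{
    int ret;

    assert(ti && ti->ops);
    if (!ti->ops->initialized(ti))
        return -ENODEV; /* assumption: the error code is illustrative */

    ret = ti->ops->ppgtt(ti, ti->seqno, start, end, asid);
    if (!ret)
        ti->seqno++; /* assumption: only consume a seqno on success */
    return ret;
}
```

The point of the indirection is that tlb_inval_range() never touches
backend state directly, so a different backend could be swapped in by
pointing ops at another table.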

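The done handler in the quoted diff relies on a "seqno past" check
against a counter that wraps at TLB_INVALIDATION_SEQNO_MAX (0x100000).
As a rough sketch of how a wrap-aware comparison of that kind can work,
the standalone function below uses a half-window rule. The name
seqno_past(), its plain-int parameters, and the half-window choice are
assumptions for illustration; this is not copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Same wrap point as TLB_INVALIDATION_SEQNO_MAX in the quoted patch. */
#define SEQNO_MAX 0x100000

/*
 * Return true if 'seqno' has already been completed, given that the
 * most recently received seqno is 'seqno_recv'. Both values are
 * expected to lie in [0, SEQNO_MAX).
 */
bool seqno_past(int seqno_recv, int seqno)
{
    assert(seqno_recv >= 0 && seqno_recv < SEQNO_MAX);
    assert(seqno >= 0 && seqno < SEQNO_MAX);

    /* seqno is far below seqno_recv: it was issued after a wrap, so
     * it is still pending. */
    if (seqno - seqno_recv < -(SEQNO_MAX / 2))
        return false;

    /* seqno is far above seqno_recv: seqno_recv has wrapped since,
     * so seqno is already done. */
    if (seqno - seqno_recv > (SEQNO_MAX / 2))
        return true;

    /* No wrap between the two values: plain ordering applies. */
    return seqno_recv >= seqno;
}
```

With a rule like this, the handler can walk the pending list in issue
order and stop at the first fence whose seqno is not yet past, which is
the shape of the loop in the diff.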


end of thread, other threads:[~2025-07-23 23:47 UTC | newest]

Thread overview: 19+ messages
2025-07-23 18:22 [PATCH 0/5] Add TLB invalidation abstraction stuartsummers
2025-07-23 18:22 ` [PATCH 1/5] drm/xe: Add xe_gt_tlb_invalidation_done_handler stuartsummers
2025-07-23 18:22 ` [PATCH 2/5] drm/xe: Decouple TLB invalidations from GT stuartsummers
2025-07-23 18:22 ` [PATCH 3/5] drm/xe: Prep TLB invalidation fence before sending stuartsummers
2025-07-23 18:22 ` [PATCH 4/5] drm/xe: Add helpers to send TLB invalidations stuartsummers
2025-07-23 18:22 ` [PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend stuartsummers
2025-07-23 18:45   ` Matthew Brost
2025-07-23 18:51     ` Matthew Brost
2025-07-23 19:17   ` Matthew Brost
2025-07-23 20:18     ` Matthew Brost
2025-07-23 20:20       ` Summers, Stuart
2025-07-23 20:47         ` Matthew Brost
2025-07-23 20:55           ` Summers, Stuart
2025-07-23 21:22             ` Matthew Brost
2025-07-23 22:03               ` Summers, Stuart
2025-07-23 22:43                 ` Summers, Stuart
2025-07-23 23:21                 ` Matthew Brost
2025-07-23 23:46                   ` Summers, Stuart
2025-07-23 23:19               ` Summers, Stuart
