[PATCH v2 0/2] Move TLB invalidation code for its own file and document it

dri-devel.lists.freedesktop.org archive mirror

* [PATCH v2 0/2] Move TLB invalidation code for its own file and document it
@ 2022-07-29  7:03 Mauro Carvalho Chehab
  2022-07-29  7:03 ` [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file Mauro Carvalho Chehab
  2022-07-29  7:03 ` [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions Mauro Carvalho Chehab
  0 siblings, 2 replies; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-29  7:03 UTC
  Cc: Thomas Zimmermann, Jonathan Corbet, David Airlie, intel-gfx,
	linux-doc, linux-kernel, dri-devel, Mauro Carvalho Chehab

There are more things to be added to TLB invalidation. Before doing that,
move the code to its own file, and add the relevant documentation.

Patch 1 only moves the code and does some function renames. No functional
change.

Patch 2 adds documentation for the TLB invalidation algorithm and functions.

---

v2: only patch 2 (kernel-doc) was modified:

  - The kernel-doc markups for TLB were added to i915.rst;
  - Some minor fixes to the text;
  - Use a table instead of a literal block while explaining how the algorithm works.
    That should make the logic easier to understand, both in text form and after
    its conversion to HTML/PDF;
  - Remove the mention of GuC, as this depends on a series that will be sent later.

Chris Wilson (1):
  drm/i915/gt: Move TLB invalidation to its own file

Mauro Carvalho Chehab (1):
  drm/i915/gt: document TLB cache invalidation functions

 Documentation/gpu/i915.rst                |   7 +
 drivers/gpu/drm/i915/Makefile             |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_pages.c |   4 +-
 drivers/gpu/drm/i915/gt/intel_gt.c        | 168 +----------------
 drivers/gpu/drm/i915/gt/intel_gt.h        |  12 --
 drivers/gpu/drm/i915/gt/intel_tlb.c       | 208 ++++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_tlb.h       | 130 ++++++++++++++
 drivers/gpu/drm/i915/i915_vma.c           |   1 +
 8 files changed, 352 insertions(+), 179 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h

-- 
2.36.1




* [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file
  2022-07-29  7:03 [PATCH v2 0/2] Move TLB invalidation code for its own file and document it Mauro Carvalho Chehab
@ 2022-07-29  7:03 ` Mauro Carvalho Chehab
  2022-08-02 22:21   ` [Intel-gfx] " Niranjana Vishwanathapura
  2022-08-03  9:19   ` Andi Shyti
  2022-07-29  7:03 ` [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions Mauro Carvalho Chehab
  1 sibling, 2 replies; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-29  7:03 UTC
  Cc: Thomas Hellström, Andi Shyti, Tvrtko Ursulin, David Airlie,
	dri-devel, Casey Bowman, Lucas De Marchi, linux-kernel,
	Chris Wilson, Daniele Ceraolo Spurio, Rodrigo Vivi, Tomas Winkler,
	Mauro Carvalho Chehab, intel-gfx, Fei Yang, Matthew Auld

From: Chris Wilson <chris.p.wilson@intel.com>

Prepare for supporting more TLB invalidation scenarios by moving
the current MMIO invalidation to its own file.

Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only the mailing lists were Cc'ed on the cover letter.
See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1659077372.git.mchehab@kernel.org/
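
A minimal userspace sketch of the register/bit selection done by get_reg_and_bit(), which this patch moves unchanged, may help review. Everything named model_* is hypothetical. The placeholder offsets are also hypothetical and are not the real GEN8_*TCR or GEN12_*_TLB_INV_CR addresses. The sketch only illustrates the rule in the code. On GEN8 to GEN11, each video decode instance gets its own register, 4 bytes apart (hence the GEN8_M2TCR comment), and is written with bit 0. Every other case uses one register per engine class and BIT(instance).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the engine classes indexed by the tables. */
enum model_class { RENDER, VIDEO_DECODE, VIDEO_ENHANCE, COPY, NUM_CLASSES };

struct model_reg_and_bit {
	uint32_t offset;	/* MMIO offset; 0 means "no register" */
	uint32_t bit;		/* value written, then polled until it clears */
};

/* Hypothetical placeholder offsets, NOT the real GEN8_*TCR addresses. */
static const uint32_t model_gen8_regs[NUM_CLASSES] = {
	[RENDER]	= 0x1000,
	[VIDEO_DECODE]	= 0x2000,	/* "M1TCR"; "M2TCR" would be 0x2004 */
	[VIDEO_ENHANCE]	= 0x3000,
	[COPY]		= 0x4000,
};

static struct model_reg_and_bit
model_get_reg_and_bit(enum model_class class, unsigned int instance,
		      bool gen8, const uint32_t *regs, unsigned int num)
{
	struct model_reg_and_bit rb = { 0, 0 };
	unsigned int shift = 0;

	/* Unknown class or empty slot: the kernel WARNs once and bails out. */
	if ((unsigned int)class >= num || !regs[class])
		return rb;

	rb.offset = regs[class];
	if (gen8 && class == VIDEO_DECODE)
		rb.offset += 4 * instance;	/* one register per VCS instance */
	else
		shift = instance;		/* one bit per instance */

	assert(shift < 32);	/* 1u << 32 would be undefined behaviour in C */
	rb.bit = 1u << shift;
	return rb;
}
```

With this sketch, video decode instance 1 on the GEN8 rule resolves to the placeholder offset 0x2004 with bit value 1. With gen8 == false (the GEN12 rule), instance 2 of the same class keeps the class register and gets bit value 4.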

 drivers/gpu/drm/i915/Makefile             |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_pages.c |   4 +-
 drivers/gpu/drm/i915/gt/intel_gt.c        | 168 +-------------------
 drivers/gpu/drm/i915/gt/intel_gt.h        |  12 --
 drivers/gpu/drm/i915/gt/intel_tlb.c       | 183 ++++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_tlb.h       |  29 ++++
 drivers/gpu/drm/i915/i915_vma.c           |   1 +
 7 files changed, 219 insertions(+), 179 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 522ef9b4aff3..d3df9832d1f7 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -126,6 +126,7 @@ gt-y += \
 	gt/intel_sseu.o \
 	gt/intel_sseu_debugfs.o \
 	gt/intel_timeline.o \
+	gt/intel_tlb.o \
 	gt/intel_workarounds.o \
 	gt/shmem_utils.o \
 	gt/sysfs_engines.o
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 8357dbdcab5c..1cd76cc5d9f3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -7,7 +7,7 @@
 #include <drm/drm_cache.h>
 
 #include "gt/intel_gt.h"
-#include "gt/intel_gt_pm.h"
+#include "gt/intel_tlb.h"
 
 #include "i915_drv.h"
 #include "i915_gem_object.h"
@@ -199,7 +199,7 @@ static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
 	if (!obj->mm.tlb)
 		return;
 
-	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
+	intel_gt_invalidate_tlb_full(gt, obj->mm.tlb);
 	obj->mm.tlb = 0;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index f435e06125aa..18d82cd620bd 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -11,9 +11,7 @@
 #include "pxp/intel_pxp.h"
 
 #include "i915_drv.h"
-#include "i915_perf_oa_regs.h"
 #include "intel_context.h"
-#include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
@@ -31,6 +29,7 @@
 #include "intel_renderstate.h"
 #include "intel_rps.h"
 #include "intel_gt_sysfs.h"
+#include "intel_tlb.h"
 #include "intel_uncore.h"
 #include "shmem_utils.h"
 
@@ -48,8 +47,7 @@ static void __intel_gt_init_early(struct intel_gt *gt)
 	intel_gt_init_reset(gt);
 	intel_gt_init_requests(gt);
 	intel_gt_init_timelines(gt);
-	mutex_init(&gt->tlb.invalidate_lock);
-	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
+	intel_gt_init_tlb(gt);
 	intel_gt_pm_init_early(gt);
 
 	intel_uc_init_early(&gt->uc);
@@ -770,7 +768,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
 		intel_gt_fini_requests(gt);
 		intel_gt_fini_reset(gt);
 		intel_gt_fini_timelines(gt);
-		mutex_destroy(&gt->tlb.invalidate_lock);
+		intel_gt_fini_tlb(gt);
 		intel_engines_free(gt);
 	}
 }
@@ -881,163 +879,3 @@ void intel_gt_info_print(const struct intel_gt_info *info,
 
 	intel_sseu_dump(&info->sseu, p);
 }
-
-struct reg_and_bit {
-	i915_reg_t reg;
-	u32 bit;
-};
-
-static struct reg_and_bit
-get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
-		const i915_reg_t *regs, const unsigned int num)
-{
-	const unsigned int class = engine->class;
-	struct reg_and_bit rb = { };
-
-	if (drm_WARN_ON_ONCE(&engine->i915->drm,
-			     class >= num || !regs[class].reg))
-		return rb;
-
-	rb.reg = regs[class];
-	if (gen8 && class == VIDEO_DECODE_CLASS)
-		rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */
-	else
-		rb.bit = engine->instance;
-
-	rb.bit = BIT(rb.bit);
-
-	return rb;
-}
-
-static void mmio_invalidate_full(struct intel_gt *gt)
-{
-	static const i915_reg_t gen8_regs[] = {
-		[RENDER_CLASS]			= GEN8_RTCR,
-		[VIDEO_DECODE_CLASS]		= GEN8_M1TCR, /* , GEN8_M2TCR */
-		[VIDEO_ENHANCEMENT_CLASS]	= GEN8_VTCR,
-		[COPY_ENGINE_CLASS]		= GEN8_BTCR,
-	};
-	static const i915_reg_t gen12_regs[] = {
-		[RENDER_CLASS]			= GEN12_GFX_TLB_INV_CR,
-		[VIDEO_DECODE_CLASS]		= GEN12_VD_TLB_INV_CR,
-		[VIDEO_ENHANCEMENT_CLASS]	= GEN12_VE_TLB_INV_CR,
-		[COPY_ENGINE_CLASS]		= GEN12_BLT_TLB_INV_CR,
-		[COMPUTE_CLASS]			= GEN12_COMPCTX_TLB_INV_CR,
-	};
-	struct drm_i915_private *i915 = gt->i915;
-	struct intel_uncore *uncore = gt->uncore;
-	struct intel_engine_cs *engine;
-	intel_engine_mask_t awake, tmp;
-	enum intel_engine_id id;
-	const i915_reg_t *regs;
-	unsigned int num = 0;
-
-	if (GRAPHICS_VER(i915) == 12) {
-		regs = gen12_regs;
-		num = ARRAY_SIZE(gen12_regs);
-	} else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) {
-		regs = gen8_regs;
-		num = ARRAY_SIZE(gen8_regs);
-	} else if (GRAPHICS_VER(i915) < 8) {
-		return;
-	}
-
-	if (drm_WARN_ONCE(&i915->drm, !num,
-			  "Platform does not implement TLB invalidation!"))
-		return;
-
-	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
-
-	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
-
-	awake = 0;
-	for_each_engine(engine, gt, id) {
-		struct reg_and_bit rb;
-
-		if (!intel_engine_pm_is_awake(engine))
-			continue;
-
-		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
-		if (!i915_mmio_reg_offset(rb.reg))
-			continue;
-
-		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
-		awake |= engine->mask;
-	}
-
-	GT_TRACE(gt, "invalidated engines %08x\n", awake);
-
-	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
-	if (awake &&
-	    (IS_TIGERLAKE(i915) ||
-	     IS_DG1(i915) ||
-	     IS_ROCKETLAKE(i915) ||
-	     IS_ALDERLAKE_S(i915) ||
-	     IS_ALDERLAKE_P(i915)))
-		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
-
-	spin_unlock_irq(&uncore->lock);
-
-	for_each_engine_masked(engine, gt, awake, tmp) {
-		struct reg_and_bit rb;
-
-		/*
-		 * HW architecture suggest typical invalidation time at 40us,
-		 * with pessimistic cases up to 100us and a recommendation to
-		 * cap at 1ms. We go a bit higher just in case.
-		 */
-		const unsigned int timeout_us = 100;
-		const unsigned int timeout_ms = 4;
-
-		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
-		if (__intel_wait_for_register_fw(uncore,
-						 rb.reg, rb.bit, 0,
-						 timeout_us, timeout_ms,
-						 NULL))
-			drm_err_ratelimited(&gt->i915->drm,
-					    "%s TLB invalidation did not complete in %ums!\n",
-					    engine->name, timeout_ms);
-	}
-
-	/*
-	 * Use delayed put since a) we mostly expect a flurry of TLB
-	 * invalidations so it is good to avoid paying the forcewake cost and
-	 * b) it works around a bug in Icelake which cannot cope with too rapid
-	 * transitions.
-	 */
-	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
-}
-
-static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
-{
-	u32 cur = intel_gt_tlb_seqno(gt);
-
-	/* Only skip if a *full* TLB invalidate barrier has passed */
-	return (s32)(cur - ALIGN(seqno, 2)) > 0;
-}
-
-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
-{
-	intel_wakeref_t wakeref;
-
-	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
-		return;
-
-	if (intel_gt_is_wedged(gt))
-		return;
-
-	if (tlb_seqno_passed(gt, seqno))
-		return;
-
-	with_intel_gt_pm_if_awake(gt, wakeref) {
-		mutex_lock(&gt->tlb.invalidate_lock);
-		if (tlb_seqno_passed(gt, seqno))
-			goto unlock;
-
-		mmio_invalidate_full(gt);
-
-		write_seqcount_invalidate(&gt->tlb.seqno);
-unlock:
-		mutex_unlock(&gt->tlb.invalidate_lock);
-	}
-}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 40b06adf509a..b4bba16cdb53 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -101,16 +101,4 @@ void intel_gt_info_print(const struct intel_gt_info *info,
 
 void intel_gt_watchdog_work(struct work_struct *work);
 
-static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
-{
-	return seqprop_sequence(&gt->tlb.seqno);
-}
-
-static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
-{
-	return intel_gt_tlb_seqno(gt) | 1;
-}
-
-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
-
 #endif /* __INTEL_GT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
new file mode 100644
index 000000000000..af8cae979489
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
@@ -0,0 +1,183 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "i915_perf_oa_regs.h"
+#include "intel_engine_pm.h"
+#include "intel_gt.h"
+#include "intel_gt_pm.h"
+#include "intel_gt_regs.h"
+#include "intel_tlb.h"
+
+struct reg_and_bit {
+	i915_reg_t reg;
+	u32 bit;
+};
+
+static struct reg_and_bit
+get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
+		const i915_reg_t *regs, const unsigned int num)
+{
+	const unsigned int class = engine->class;
+	struct reg_and_bit rb = { };
+
+	if (drm_WARN_ON_ONCE(&engine->i915->drm,
+			     class >= num || !regs[class].reg))
+		return rb;
+
+	rb.reg = regs[class];
+	if (gen8 && class == VIDEO_DECODE_CLASS)
+		rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */
+	else
+		rb.bit = engine->instance;
+
+	rb.bit = BIT(rb.bit);
+
+	return rb;
+}
+
+static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
+{
+	u32 cur = intel_gt_tlb_seqno(gt);
+
+	/* Only skip if a *full* TLB invalidate barrier has passed */
+	return (s32)(cur - ALIGN(seqno, 2)) > 0;
+}
+
+static void mmio_invalidate_full(struct intel_gt *gt)
+{
+	static const i915_reg_t gen8_regs[] = {
+		[RENDER_CLASS]			= GEN8_RTCR,
+		[VIDEO_DECODE_CLASS]		= GEN8_M1TCR, /* , GEN8_M2TCR */
+		[VIDEO_ENHANCEMENT_CLASS]	= GEN8_VTCR,
+		[COPY_ENGINE_CLASS]		= GEN8_BTCR,
+	};
+	static const i915_reg_t gen12_regs[] = {
+		[RENDER_CLASS]			= GEN12_GFX_TLB_INV_CR,
+		[VIDEO_DECODE_CLASS]		= GEN12_VD_TLB_INV_CR,
+		[VIDEO_ENHANCEMENT_CLASS]	= GEN12_VE_TLB_INV_CR,
+		[COPY_ENGINE_CLASS]		= GEN12_BLT_TLB_INV_CR,
+		[COMPUTE_CLASS]			= GEN12_COMPCTX_TLB_INV_CR,
+	};
+	struct drm_i915_private *i915 = gt->i915;
+	struct intel_uncore *uncore = gt->uncore;
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t awake, tmp;
+	enum intel_engine_id id;
+	const i915_reg_t *regs;
+	unsigned int num = 0;
+
+	if (GRAPHICS_VER(i915) == 12) {
+		regs = gen12_regs;
+		num = ARRAY_SIZE(gen12_regs);
+	} else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) {
+		regs = gen8_regs;
+		num = ARRAY_SIZE(gen8_regs);
+	} else if (GRAPHICS_VER(i915) < 8) {
+		return;
+	}
+
+	if (drm_WARN_ONCE(&i915->drm, !num,
+			  "Platform does not implement TLB invalidation!"))
+		return;
+
+	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
+
+	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
+
+	awake = 0;
+	for_each_engine(engine, gt, id) {
+		struct reg_and_bit rb;
+
+		if (!intel_engine_pm_is_awake(engine))
+			continue;
+
+		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
+		if (!i915_mmio_reg_offset(rb.reg))
+			continue;
+
+		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+		awake |= engine->mask;
+	}
+
+	GT_TRACE(gt, "invalidated engines %08x\n", awake);
+
+	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
+	if (awake &&
+	    (IS_TIGERLAKE(i915) ||
+	     IS_DG1(i915) ||
+	     IS_ROCKETLAKE(i915) ||
+	     IS_ALDERLAKE_S(i915) ||
+	     IS_ALDERLAKE_P(i915)))
+		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
+
+	spin_unlock_irq(&uncore->lock);
+
+	for_each_engine_masked(engine, gt, awake, tmp) {
+		struct reg_and_bit rb;
+
+		/*
+		 * HW architecture suggest typical invalidation time at 40us,
+		 * with pessimistic cases up to 100us and a recommendation to
+		 * cap at 1ms. We go a bit higher just in case.
+		 */
+		const unsigned int timeout_us = 100;
+		const unsigned int timeout_ms = 4;
+
+		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
+		if (__intel_wait_for_register_fw(uncore,
+						 rb.reg, rb.bit, 0,
+						 timeout_us, timeout_ms,
+						 NULL))
+			drm_err_ratelimited(&gt->i915->drm,
+					    "%s TLB invalidation did not complete in %ums!\n",
+					    engine->name, timeout_ms);
+	}
+
+	/*
+	 * Use delayed put since a) we mostly expect a flurry of TLB
+	 * invalidations so it is good to avoid paying the forcewake cost and
+	 * b) it works around a bug in Icelake which cannot cope with too rapid
+	 * transitions.
+	 */
+	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
+}
+
+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
+{
+	intel_wakeref_t wakeref;
+
+	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
+		return;
+
+	if (intel_gt_is_wedged(gt))
+		return;
+
+	if (tlb_seqno_passed(gt, seqno))
+		return;
+
+	with_intel_gt_pm_if_awake(gt, wakeref) {
+		mutex_lock(&gt->tlb.invalidate_lock);
+		if (tlb_seqno_passed(gt, seqno))
+			goto unlock;
+
+		mmio_invalidate_full(gt);
+
+		write_seqcount_invalidate(&gt->tlb.seqno);
+unlock:
+		mutex_unlock(&gt->tlb.invalidate_lock);
+	}
+}
+
+void intel_gt_init_tlb(struct intel_gt *gt)
+{
+	mutex_init(&gt->tlb.invalidate_lock);
+	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
+}
+
+void intel_gt_fini_tlb(struct intel_gt *gt)
+{
+	mutex_destroy(&gt->tlb.invalidate_lock);
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
new file mode 100644
index 000000000000..46ce25bf5afe
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#ifndef INTEL_TLB_H
+#define INTEL_TLB_H
+
+#include <linux/seqlock.h>
+#include <linux/types.h>
+
+#include "intel_gt_types.h"
+
+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
+
+void intel_gt_init_tlb(struct intel_gt *gt);
+void intel_gt_fini_tlb(struct intel_gt *gt);
+
+static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
+{
+	return seqprop_sequence(&gt->tlb.seqno);
+}
+
+static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
+{
+	return intel_gt_tlb_seqno(gt) | 1;
+}
+
+#endif /* INTEL_TLB_H */
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 84a9ccbc5fc5..fe947d1456d5 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -33,6 +33,7 @@
 #include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_requests.h"
+#include "gt/intel_tlb.h"
 
 #include "i915_drv.h"
 #include "i915_gem_evict.h"
-- 
2.36.1



* [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions
  2022-07-29  7:03 [PATCH v2 0/2] Move TLB invalidation code for its own file and document it Mauro Carvalho Chehab
  2022-07-29  7:03 ` [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file Mauro Carvalho Chehab
@ 2022-07-29  7:03 ` Mauro Carvalho Chehab
  2022-08-02 22:30   ` [Intel-gfx] " Niranjana Vishwanathapura
  1 sibling, 1 reply; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-29  7:03 UTC
  Cc: Tvrtko Ursulin, Thomas Zimmermann, Jonathan Corbet, David Airlie,
	dri-devel, linux-kernel, linux-doc, Chris Wilson, Rodrigo Vivi,
	Mauro Carvalho Chehab, intel-gfx

Add a description for the TLB cache invalidation algorithm and for
the related kAPI functions.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only the mailing lists were Cc'ed on the cover letter.
See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1659077372.git.mchehab@kernel.org/
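
For reviewers, here is a minimal userspace model of the sequence-number check. It is a sketch under stated assumptions, not the driver code. The model_* names are hypothetical, and locking and MMIO are left out. It reproduces only what the diff shows:

- intel_gt_next_invalidate_tlb_full() returns seqno | 1.
- tlb_seqno_passed() compares against ALIGN(seqno, 2) using a signed difference.
- write_seqcount_invalidate() advances the counter by 2.

The replay stamps obj3 and obj1 at seqno 2, flushes for obj3, stamps obj2 at seqno 4, flushes for obj2, and only then releases obj1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of gt->tlb.seqno; locking and MMIO are left out. */
struct model_tlb {
	uint32_t seqno;		/* stays even; +2 per completed full flush */
	unsigned int flushes;	/* number of simulated full flushes */
};

/* Same result as the kernel's ALIGN(x, 2) for a u32. */
static uint32_t model_align2(uint32_t x)
{
	return (x + 1u) & ~1u;
}

/* Like intel_gt_next_invalidate_tlb_full(): the stamp taken at unbind. */
static uint32_t model_next_invalidate(const struct model_tlb *tlb)
{
	return tlb->seqno | 1;
}

/* Like tlb_seqno_passed(); the signed difference survives u32 wrap. */
static bool model_seqno_passed(const struct model_tlb *tlb, uint32_t stamp)
{
	return (int32_t)(tlb->seqno - model_align2(stamp)) > 0;
}

/* The skip-or-flush decision of intel_gt_invalidate_tlb_full(). */
static void model_invalidate_full(struct model_tlb *tlb, uint32_t stamp)
{
	if (model_seqno_passed(tlb, stamp))
		return;			/* a newer full flush covered it */
	tlb->flushes++;			/* stands in for mmio_invalidate_full() */
	tlb->seqno += 2;		/* what write_seqcount_invalidate() does */
}

/* obj3, obj1 stamped at seqno 2; obj2 at 4; obj1 released last. */
static unsigned int model_replay_example(void)
{
	struct model_tlb tlb = { .seqno = 2, .flushes = 0 };
	uint32_t obj3 = model_next_invalidate(&tlb);
	uint32_t obj1 = model_next_invalidate(&tlb);
	uint32_t obj2;

	assert(obj3 == 3 && obj1 == 3);
	model_invalidate_full(&tlb, obj3);	/* flush, seqno 2 -> 4 */
	obj2 = model_next_invalidate(&tlb);	/* stamp 5 */
	/* At seqno 4 obj1 would still be flushed: 4 - 4 is not > 0. */
	assert(!model_seqno_passed(&tlb, obj1));
	model_invalidate_full(&tlb, obj2);	/* flush, seqno 4 -> 6 */
	model_invalidate_full(&tlb, obj1);	/* skipped, 6 - 4 > 0 */
	return tlb.flushes;			/* 2 */
}
```

In this model, obj1 (stamp 3) would still be flushed if released at seqno 4, because the 2 -> 4 flush may have started before the stamp was taken. It is skipped only after the 4 -> 6 flush completes. The signed subtraction keeps the comparison valid when the 32-bit counter wraps.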

 Documentation/gpu/i915.rst          |   7 ++
 drivers/gpu/drm/i915/gt/intel_tlb.c |  25 +++++++
 drivers/gpu/drm/i915/gt/intel_tlb.h | 101 ++++++++++++++++++++++++++++
 3 files changed, 133 insertions(+)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 4e59db1cfb00..46911fdd79e8 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model)
 .. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c
    :internal:
 
+TLB cache invalidation
+----------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c
+
 Workarounds
 -----------
 
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
index af8cae979489..4873b7ecc015 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
@@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt)
 	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
 }
 
+/**
+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
+ * @gt: GT structure
+ * @seqno: sequence number from intel_gt_next_invalidate_tlb_full()
+ *
+ * Do a full TLB cache invalidation, unless the TLB sequence number is already
+ * past ALIGN(@seqno, 2), i.e. a newer full TLB cache invalidation completed.
+ *
+ * Note:
+ * The TLB cache invalidation logic depends on GEN-specific registers.
+ * It currently supports MMIO-based TLB flush for GEN8 to GEN12.
+ */
 void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
 {
 	intel_wakeref_t wakeref;
@@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
 	}
 }
 
+/**
+ * intel_gt_init_tlb - initialize TLB-specific vars
+ * @gt: GT structure
+ *
+ * The TLB cache invalidation logic internally uses some resources that require
+ * initialization. Must be called before doing any TLB cache invalidation.
+ */
 void intel_gt_init_tlb(struct intel_gt *gt)
 {
 	mutex_init(&gt->tlb.invalidate_lock);
 	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
 }
 
+/**
+ * intel_gt_fini_tlb - free TLB-specific vars
+ * @gt: GT structure
+ *
+ * Frees any resources used by the TLB cache invalidation logic.
+ */
 void intel_gt_fini_tlb(struct intel_gt *gt)
 {
 	mutex_destroy(&gt->tlb.invalidate_lock);
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
index 46ce25bf5afe..dca70c33bd61 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.h
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
@@ -11,16 +11,117 @@
 
 #include "intel_gt_types.h"
 
+/**
+ * DOC: TLB cache invalidation logic
+ *
+ * The current algorithm works as follows. Objects (struct drm_i915_gem_object)
+ * can be created in any order. At unbind/evict time, the object is guaranteed
+ * not to be used anymore, so a sequence number provided by
+ * intel_gt_next_invalidate_tlb_full() is stored in it. This happens either in
+ * __vma_put_pages(), for synchronous VMA unbind, or in ppgtt_unbind_vma(), for
+ * asynchronous VMA unbind.
+ *
+ * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called.
+ * It checks whether a full TLB flush has already covered the object's sequence
+ * number. If not, it flushes the TLB and increments the sequence number::
+ *
+ *   void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
+ *   {
+ *       ...
+ *       with_intel_gt_pm_if_awake(gt, wakeref) {
+ *           mutex_lock(&gt->tlb.invalidate_lock);
+ *           if (tlb_seqno_passed(gt, seqno))
+ *               goto unlock;
+ *
+ *           mmio_invalidate_full(gt); // the actual TLB flush
+ *           write_seqcount_invalidate(&gt->tlb.seqno); // seqno += 2
+ *   unlock:
+ *           mutex_unlock(&gt->tlb.invalidate_lock);
+ *       }
+ *   }
+ *
+ * For example, let's say the current seqno is 2 and three new objects were
+ * created, in this order::
+ *
+ * 	obj1
+ * 	obj2
+ * 	obj3
+ *
+ * They can be unbound/evicted in any order. At that point, mm.tlb is stamped
+ * with the current seqno plus 1. As a flush already running may predate it,
+ * the object only counts as flushed once seqno > ALIGN(mm.tlb, 2).
+ *
+ * Unbind/evict and unset pages may run on different threads. As the logic in
+ * intel_gt_invalidate_tlb_full() is protected by a mutex, for simplicity,
+ * let's consider just two threads:
+ *
+ * +-------------------+-------------------------+---------------------------------+
+ * | sequence number   | Thread 0                | Thread 1                        |
+ * +===================+=========================+=================================+
+ * | seqno=2           |                         |                                 |
+ * |                   +-------------------------+---------------------------------+
+ * |                   | unbind/evict obj3.      |                                 |
+ * |                   |                         |                                 |
+ * |                   | obj3.mm.tlb = seqno | 1 |                                 |
+ * |                   | // obj3.mm.tlb = 3      |                                 |
+ * |                   +-------------------------+---------------------------------+
+ * |                   | unbind/evict obj1.      |                                 |
+ * |                   |                         |                                 |
+ * |                   | obj1.mm.tlb = seqno | 1 |                                 |
+ * |                   | // obj1.mm.tlb = 3      |                                 |
+ * |                   +-------------------------+---------------------------------+
+ * |                   |                         | __i915_gem_object_unset_pages() |
+ * |                   |                         | called for obj3 => TLB flush    |
+ * |                   |                         | invalidating both obj1 and obj3.|
+ * |                   |                         |                                 |
+ * |                   |                         | seqno += 2                      |
+ * +-------------------+-------------------------+---------------------------------+
+ * | seqno=4           |                         |                                 |
+ * |                   +-------------------------+---------------------------------+
+ * |                   | unbind/evict obj2.      |                                 |
+ * |                   |                         |                                 |
+ * |                   | obj2.mm.tlb = seqno | 1 |                                 |
+ * |                   | // obj2.mm.tlb = 5      |                                 |
+ * |                   +-------------------------+---------------------------------+
+ * |                   |                         | __i915_gem_object_unset_pages() |
+ * |                   |                         | called for obj2 => TLB flush    |
+ * |                   |                         | invalidating obj2.              |
+ * |                   |                         |                                 |
+ * |                   |                         | seqno += 2                      |
+ * +-------------------+-------------------------+---------------------------------+
+ * | seqno=6           |                         |                                 |
+ * |                   +-------------------------+---------------------------------+
+ * |                   |                         | __i915_gem_object_unset_pages() |
+ * |                   |                         | called for obj1, don't flush    |
+ * |                   |                         | as flush 4->6 invalidated obj1. |
+ * +-------------------+-------------------------+---------------------------------+
+ */
+
 void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
 
 void intel_gt_init_tlb(struct intel_gt *gt);
 void intel_gt_fini_tlb(struct intel_gt *gt);
 
+/**
+ * intel_gt_tlb_seqno - Returns the current TLB invalidation sequence number
+ *
+ * @gt: GT structure
+ *
+ * There's no need to lock while calling it, as seqprop_sequence() is thread-safe.
+ */
 static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
 {
 	return seqprop_sequence(&gt->tlb.seqno);
 }
 
+/**
+ * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation
+ *	sequence number
+ *
+ * @gt: GT structure
+ *
+ * There's no need to lock while calling it, as seqprop_sequence() is thread-safe.
+ */
 static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
 {
 	return intel_gt_tlb_seqno(gt) | 1;
-- 
2.36.1



* Re: [Intel-gfx] [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file
  2022-07-29  7:03 ` [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file Mauro Carvalho Chehab
@ 2022-08-02 22:21   ` Niranjana Vishwanathapura
  2022-08-03  9:19   ` Andi Shyti
  1 sibling, 0 replies; 7+ messages in thread
From: Niranjana Vishwanathapura @ 2022-08-02 22:21 UTC
  To: Mauro Carvalho Chehab
  Cc: Thomas Hellström, David Airlie, intel-gfx, Lucas De Marchi,
	linux-kernel, Chris Wilson, dri-devel, Rodrigo Vivi,
	Tomas Winkler, Matthew Auld

On Fri, Jul 29, 2022 at 09:03:54AM +0200, Mauro Carvalho Chehab wrote:
>From: Chris Wilson <chris.p.wilson@intel.com>
>
>Prepare for supporting more TLB invalidation scenarios by moving
>the current MMIO invalidation to its own file.

It also looks like this patch does the following:
1. Rename intel_gt_invalidate_tlb() to intel_gt_invalidate_tlb_full()
2. Add intel_gt_init_tlb() and intel_gt_fini_tlb() abstractions.

Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

>
>Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
>Cc: Fei Yang <fei.yang@intel.com>
>Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>---
>
>To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
>See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1659077372.git.mchehab@kernel.org/
>
> drivers/gpu/drm/i915/Makefile             |   1 +
> drivers/gpu/drm/i915/gem/i915_gem_pages.c |   4 +-
> drivers/gpu/drm/i915/gt/intel_gt.c        | 168 +-------------------
> drivers/gpu/drm/i915/gt/intel_gt.h        |  12 --
> drivers/gpu/drm/i915/gt/intel_tlb.c       | 183 ++++++++++++++++++++++
> drivers/gpu/drm/i915/gt/intel_tlb.h       |  29 ++++
> drivers/gpu/drm/i915/i915_vma.c           |   1 +
> 7 files changed, 219 insertions(+), 179 deletions(-)
> create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
> create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h
>
>diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>index 522ef9b4aff3..d3df9832d1f7 100644
>--- a/drivers/gpu/drm/i915/Makefile
>+++ b/drivers/gpu/drm/i915/Makefile
>@@ -126,6 +126,7 @@ gt-y += \
> 	gt/intel_sseu.o \
> 	gt/intel_sseu_debugfs.o \
> 	gt/intel_timeline.o \
>+	gt/intel_tlb.o \
> 	gt/intel_workarounds.o \
> 	gt/shmem_utils.o \
> 	gt/sysfs_engines.o
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>index 8357dbdcab5c..1cd76cc5d9f3 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>@@ -7,7 +7,7 @@
> #include <drm/drm_cache.h>
>
> #include "gt/intel_gt.h"
>-#include "gt/intel_gt_pm.h"
>+#include "gt/intel_tlb.h"
>
> #include "i915_drv.h"
> #include "i915_gem_object.h"
>@@ -199,7 +199,7 @@ static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
> 	if (!obj->mm.tlb)
> 		return;
>
>-	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
>+	intel_gt_invalidate_tlb_full(gt, obj->mm.tlb);
> 	obj->mm.tlb = 0;
> }
>
>diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>index f435e06125aa..18d82cd620bd 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gt.c
>+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>@@ -11,9 +11,7 @@
> #include "pxp/intel_pxp.h"
>
> #include "i915_drv.h"
>-#include "i915_perf_oa_regs.h"
> #include "intel_context.h"
>-#include "intel_engine_pm.h"
> #include "intel_engine_regs.h"
> #include "intel_ggtt_gmch.h"
> #include "intel_gt.h"
>@@ -31,6 +29,7 @@
> #include "intel_renderstate.h"
> #include "intel_rps.h"
> #include "intel_gt_sysfs.h"
>+#include "intel_tlb.h"
> #include "intel_uncore.h"
> #include "shmem_utils.h"
>
>@@ -48,8 +47,7 @@ static void __intel_gt_init_early(struct intel_gt *gt)
> 	intel_gt_init_reset(gt);
> 	intel_gt_init_requests(gt);
> 	intel_gt_init_timelines(gt);
>-	mutex_init(&gt->tlb.invalidate_lock);
>-	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
>+	intel_gt_init_tlb(gt);
> 	intel_gt_pm_init_early(gt);
>
> 	intel_uc_init_early(&gt->uc);
>@@ -770,7 +768,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
> 		intel_gt_fini_requests(gt);
> 		intel_gt_fini_reset(gt);
> 		intel_gt_fini_timelines(gt);
>-		mutex_destroy(&gt->tlb.invalidate_lock);
>+		intel_gt_fini_tlb(gt);
> 		intel_engines_free(gt);
> 	}
> }
>@@ -881,163 +879,3 @@ void intel_gt_info_print(const struct intel_gt_info *info,
>
> 	intel_sseu_dump(&info->sseu, p);
> }
>-
>-struct reg_and_bit {
>-	i915_reg_t reg;
>-	u32 bit;
>-};
>-
>-static struct reg_and_bit
>-get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
>-		const i915_reg_t *regs, const unsigned int num)
>-{
>-	const unsigned int class = engine->class;
>-	struct reg_and_bit rb = { };
>-
>-	if (drm_WARN_ON_ONCE(&engine->i915->drm,
>-			     class >= num || !regs[class].reg))
>-		return rb;
>-
>-	rb.reg = regs[class];
>-	if (gen8 && class == VIDEO_DECODE_CLASS)
>-		rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */
>-	else
>-		rb.bit = engine->instance;
>-
>-	rb.bit = BIT(rb.bit);
>-
>-	return rb;
>-}
>-
>-static void mmio_invalidate_full(struct intel_gt *gt)
>-{
>-	static const i915_reg_t gen8_regs[] = {
>-		[RENDER_CLASS]			= GEN8_RTCR,
>-		[VIDEO_DECODE_CLASS]		= GEN8_M1TCR, /* , GEN8_M2TCR */
>-		[VIDEO_ENHANCEMENT_CLASS]	= GEN8_VTCR,
>-		[COPY_ENGINE_CLASS]		= GEN8_BTCR,
>-	};
>-	static const i915_reg_t gen12_regs[] = {
>-		[RENDER_CLASS]			= GEN12_GFX_TLB_INV_CR,
>-		[VIDEO_DECODE_CLASS]		= GEN12_VD_TLB_INV_CR,
>-		[VIDEO_ENHANCEMENT_CLASS]	= GEN12_VE_TLB_INV_CR,
>-		[COPY_ENGINE_CLASS]		= GEN12_BLT_TLB_INV_CR,
>-		[COMPUTE_CLASS]			= GEN12_COMPCTX_TLB_INV_CR,
>-	};
>-	struct drm_i915_private *i915 = gt->i915;
>-	struct intel_uncore *uncore = gt->uncore;
>-	struct intel_engine_cs *engine;
>-	intel_engine_mask_t awake, tmp;
>-	enum intel_engine_id id;
>-	const i915_reg_t *regs;
>-	unsigned int num = 0;
>-
>-	if (GRAPHICS_VER(i915) == 12) {
>-		regs = gen12_regs;
>-		num = ARRAY_SIZE(gen12_regs);
>-	} else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) {
>-		regs = gen8_regs;
>-		num = ARRAY_SIZE(gen8_regs);
>-	} else if (GRAPHICS_VER(i915) < 8) {
>-		return;
>-	}
>-
>-	if (drm_WARN_ONCE(&i915->drm, !num,
>-			  "Platform does not implement TLB invalidation!"))
>-		return;
>-
>-	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>-
>-	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
>-
>-	awake = 0;
>-	for_each_engine(engine, gt, id) {
>-		struct reg_and_bit rb;
>-
>-		if (!intel_engine_pm_is_awake(engine))
>-			continue;
>-
>-		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>-		if (!i915_mmio_reg_offset(rb.reg))
>-			continue;
>-
>-		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
>-		awake |= engine->mask;
>-	}
>-
>-	GT_TRACE(gt, "invalidated engines %08x\n", awake);
>-
>-	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
>-	if (awake &&
>-	    (IS_TIGERLAKE(i915) ||
>-	     IS_DG1(i915) ||
>-	     IS_ROCKETLAKE(i915) ||
>-	     IS_ALDERLAKE_S(i915) ||
>-	     IS_ALDERLAKE_P(i915)))
>-		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
>-
>-	spin_unlock_irq(&uncore->lock);
>-
>-	for_each_engine_masked(engine, gt, awake, tmp) {
>-		struct reg_and_bit rb;
>-
>-		/*
>-		 * HW architecture suggest typical invalidation time at 40us,
>-		 * with pessimistic cases up to 100us and a recommendation to
>-		 * cap at 1ms. We go a bit higher just in case.
>-		 */
>-		const unsigned int timeout_us = 100;
>-		const unsigned int timeout_ms = 4;
>-
>-		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>-		if (__intel_wait_for_register_fw(uncore,
>-						 rb.reg, rb.bit, 0,
>-						 timeout_us, timeout_ms,
>-						 NULL))
>-			drm_err_ratelimited(&gt->i915->drm,
>-					    "%s TLB invalidation did not complete in %ums!\n",
>-					    engine->name, timeout_ms);
>-	}
>-
>-	/*
>-	 * Use delayed put since a) we mostly expect a flurry of TLB
>-	 * invalidations so it is good to avoid paying the forcewake cost and
>-	 * b) it works around a bug in Icelake which cannot cope with too rapid
>-	 * transitions.
>-	 */
>-	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
>-}
>-
>-static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
>-{
>-	u32 cur = intel_gt_tlb_seqno(gt);
>-
>-	/* Only skip if a *full* TLB invalidate barrier has passed */
>-	return (s32)(cur - ALIGN(seqno, 2)) > 0;
>-}
>-
>-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
>-{
>-	intel_wakeref_t wakeref;
>-
>-	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
>-		return;
>-
>-	if (intel_gt_is_wedged(gt))
>-		return;
>-
>-	if (tlb_seqno_passed(gt, seqno))
>-		return;
>-
>-	with_intel_gt_pm_if_awake(gt, wakeref) {
>-		mutex_lock(&gt->tlb.invalidate_lock);
>-		if (tlb_seqno_passed(gt, seqno))
>-			goto unlock;
>-
>-		mmio_invalidate_full(gt);
>-
>-		write_seqcount_invalidate(&gt->tlb.seqno);
>-unlock:
>-		mutex_unlock(&gt->tlb.invalidate_lock);
>-	}
>-}
>diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
>index 40b06adf509a..b4bba16cdb53 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gt.h
>+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
>@@ -101,16 +101,4 @@ void intel_gt_info_print(const struct intel_gt_info *info,
>
> void intel_gt_watchdog_work(struct work_struct *work);
>
>-static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
>-{
>-	return seqprop_sequence(&gt->tlb.seqno);
>-}
>-
>-static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
>-{
>-	return intel_gt_tlb_seqno(gt) | 1;
>-}
>-
>-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
>-
> #endif /* __INTEL_GT_H__ */
>diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
>new file mode 100644
>index 000000000000..af8cae979489
>--- /dev/null
>+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
>@@ -0,0 +1,183 @@
>+// SPDX-License-Identifier: MIT
>+/*
>+ * Copyright © 2022 Intel Corporation
>+ */
>+
>+#include "i915_drv.h"
>+#include "i915_perf_oa_regs.h"
>+#include "intel_engine_pm.h"
>+#include "intel_gt.h"
>+#include "intel_gt_pm.h"
>+#include "intel_gt_regs.h"
>+#include "intel_tlb.h"
>+
>+struct reg_and_bit {
>+	i915_reg_t reg;
>+	u32 bit;
>+};
>+
>+static struct reg_and_bit
>+get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
>+		const i915_reg_t *regs, const unsigned int num)
>+{
>+	const unsigned int class = engine->class;
>+	struct reg_and_bit rb = { };
>+
>+	if (drm_WARN_ON_ONCE(&engine->i915->drm,
>+			     class >= num || !regs[class].reg))
>+		return rb;
>+
>+	rb.reg = regs[class];
>+	if (gen8 && class == VIDEO_DECODE_CLASS)
>+		rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */
>+	else
>+		rb.bit = engine->instance;
>+
>+	rb.bit = BIT(rb.bit);
>+
>+	return rb;
>+}
>+
>+static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
>+{
>+	u32 cur = intel_gt_tlb_seqno(gt);
>+
>+	/* Only skip if a *full* TLB invalidate barrier has passed */
>+	return (s32)(cur - ALIGN(seqno, 2)) > 0;
>+}
>+
>+static void mmio_invalidate_full(struct intel_gt *gt)
>+{
>+	static const i915_reg_t gen8_regs[] = {
>+		[RENDER_CLASS]			= GEN8_RTCR,
>+		[VIDEO_DECODE_CLASS]		= GEN8_M1TCR, /* , GEN8_M2TCR */
>+		[VIDEO_ENHANCEMENT_CLASS]	= GEN8_VTCR,
>+		[COPY_ENGINE_CLASS]		= GEN8_BTCR,
>+	};
>+	static const i915_reg_t gen12_regs[] = {
>+		[RENDER_CLASS]			= GEN12_GFX_TLB_INV_CR,
>+		[VIDEO_DECODE_CLASS]		= GEN12_VD_TLB_INV_CR,
>+		[VIDEO_ENHANCEMENT_CLASS]	= GEN12_VE_TLB_INV_CR,
>+		[COPY_ENGINE_CLASS]		= GEN12_BLT_TLB_INV_CR,
>+		[COMPUTE_CLASS]			= GEN12_COMPCTX_TLB_INV_CR,
>+	};
>+	struct drm_i915_private *i915 = gt->i915;
>+	struct intel_uncore *uncore = gt->uncore;
>+	struct intel_engine_cs *engine;
>+	intel_engine_mask_t awake, tmp;
>+	enum intel_engine_id id;
>+	const i915_reg_t *regs;
>+	unsigned int num = 0;
>+
>+	if (GRAPHICS_VER(i915) == 12) {
>+		regs = gen12_regs;
>+		num = ARRAY_SIZE(gen12_regs);
>+	} else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) {
>+		regs = gen8_regs;
>+		num = ARRAY_SIZE(gen8_regs);
>+	} else if (GRAPHICS_VER(i915) < 8) {
>+		return;
>+	}
>+
>+	if (drm_WARN_ONCE(&i915->drm, !num,
>+			  "Platform does not implement TLB invalidation!"))
>+		return;
>+
>+	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>+
>+	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
>+
>+	awake = 0;
>+	for_each_engine(engine, gt, id) {
>+		struct reg_and_bit rb;
>+
>+		if (!intel_engine_pm_is_awake(engine))
>+			continue;
>+
>+		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>+		if (!i915_mmio_reg_offset(rb.reg))
>+			continue;
>+
>+		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
>+		awake |= engine->mask;
>+	}
>+
>+	GT_TRACE(gt, "invalidated engines %08x\n", awake);
>+
>+	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
>+	if (awake &&
>+	    (IS_TIGERLAKE(i915) ||
>+	     IS_DG1(i915) ||
>+	     IS_ROCKETLAKE(i915) ||
>+	     IS_ALDERLAKE_S(i915) ||
>+	     IS_ALDERLAKE_P(i915)))
>+		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
>+
>+	spin_unlock_irq(&uncore->lock);
>+
>+	for_each_engine_masked(engine, gt, awake, tmp) {
>+		struct reg_and_bit rb;
>+
>+		/*
>+		 * HW architecture suggest typical invalidation time at 40us,
>+		 * with pessimistic cases up to 100us and a recommendation to
>+		 * cap at 1ms. We go a bit higher just in case.
>+		 */
>+		const unsigned int timeout_us = 100;
>+		const unsigned int timeout_ms = 4;
>+
>+		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>+		if (__intel_wait_for_register_fw(uncore,
>+						 rb.reg, rb.bit, 0,
>+						 timeout_us, timeout_ms,
>+						 NULL))
>+			drm_err_ratelimited(&gt->i915->drm,
>+					    "%s TLB invalidation did not complete in %ums!\n",
>+					    engine->name, timeout_ms);
>+	}
>+
>+	/*
>+	 * Use delayed put since a) we mostly expect a flurry of TLB
>+	 * invalidations so it is good to avoid paying the forcewake cost and
>+	 * b) it works around a bug in Icelake which cannot cope with too rapid
>+	 * transitions.
>+	 */
>+	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
>+}
>+
>+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
>+{
>+	intel_wakeref_t wakeref;
>+
>+	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
>+		return;
>+
>+	if (intel_gt_is_wedged(gt))
>+		return;
>+
>+	if (tlb_seqno_passed(gt, seqno))
>+		return;
>+
>+	with_intel_gt_pm_if_awake(gt, wakeref) {
>+		mutex_lock(&gt->tlb.invalidate_lock);
>+		if (tlb_seqno_passed(gt, seqno))
>+			goto unlock;
>+
>+		mmio_invalidate_full(gt);
>+
>+		write_seqcount_invalidate(&gt->tlb.seqno);
>+unlock:
>+		mutex_unlock(&gt->tlb.invalidate_lock);
>+	}
>+}
>+
>+void intel_gt_init_tlb(struct intel_gt *gt)
>+{
>+	mutex_init(&gt->tlb.invalidate_lock);
>+	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
>+}
>+
>+void intel_gt_fini_tlb(struct intel_gt *gt)
>+{
>+	mutex_destroy(&gt->tlb.invalidate_lock);
>+}
>diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
>new file mode 100644
>index 000000000000..46ce25bf5afe
>--- /dev/null
>+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
>@@ -0,0 +1,29 @@
>+/* SPDX-License-Identifier: MIT */
>+/*
>+ * Copyright © 2022 Intel Corporation
>+ */
>+
>+#ifndef INTEL_TLB_H
>+#define INTEL_TLB_H
>+
>+#include <linux/seqlock.h>
>+#include <linux/types.h>
>+
>+#include "intel_gt_types.h"
>+
>+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
>+
>+void intel_gt_init_tlb(struct intel_gt *gt);
>+void intel_gt_fini_tlb(struct intel_gt *gt);
>+
>+static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
>+{
>+	return seqprop_sequence(&gt->tlb.seqno);
>+}
>+
>+static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
>+{
>+	return intel_gt_tlb_seqno(gt) | 1;
>+}
>+
>+#endif /* INTEL_TLB_H */
>diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>index 84a9ccbc5fc5..fe947d1456d5 100644
>--- a/drivers/gpu/drm/i915/i915_vma.c
>+++ b/drivers/gpu/drm/i915/i915_vma.c
>@@ -33,6 +33,7 @@
> #include "gt/intel_engine_heartbeat.h"
> #include "gt/intel_gt.h"
> #include "gt/intel_gt_requests.h"
>+#include "gt/intel_tlb.h"
>
> #include "i915_drv.h"
> #include "i915_gem_evict.h"
>-- 
>2.36.1
>
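One detail in the code being moved that is easy to misread is get_reg_and_bit(). With the GEN8-11 table, each video-decode engine instance gets its own register (GEN8_M1TCR, then GEN8_M2TCR 4 bytes later) and always uses bit 0. Every other engine shares its class register and uses BIT(instance). Below is a minimal userspace model of that selection. It assumes placeholder offsets, not the real register addresses. model_get_reg_and_bit() and the local enum are hypothetical names, and the driver's drm_WARN_ON_ONCE() is reduced to a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local model of the engine classes used as table indices. */
enum model_class {
	RENDER_CLASS,
	VIDEO_DECODE_CLASS,
	VIDEO_ENHANCEMENT_CLASS,
	COPY_ENGINE_CLASS,
	NUM_MODEL_CLASSES
};

struct reg_and_bit {
	uint32_t reg; /* MMIO offset; 0 means "no register" */
	uint32_t bit; /* mask written to request, then polled until clear */
};

/* Placeholder offsets: only the relative +4 step for VCS matters here. */
static const uint32_t gen8_regs[NUM_MODEL_CLASSES] = {
	[RENDER_CLASS]            = 0x1000, /* stands in for GEN8_RTCR */
	[VIDEO_DECODE_CLASS]      = 0x1004, /* stands in for GEN8_M1TCR */
	[VIDEO_ENHANCEMENT_CLASS] = 0x1010, /* stands in for GEN8_VTCR */
	[COPY_ENGINE_CLASS]       = 0x100c, /* stands in for GEN8_BTCR */
};

static struct reg_and_bit
model_get_reg_and_bit(unsigned int class, unsigned int instance, bool gen8,
		      const uint32_t *regs, unsigned int num)
{
	struct reg_and_bit rb = { 0, 0 };

	if (class >= num || !regs[class])
		return rb; /* the driver WARNs once here */

	rb.reg = regs[class];
	if (gen8 && class == VIDEO_DECODE_CLASS)
		rb.reg += 4 * instance; /* VCS1 uses the next register (M2TCR) */
	else
		rb.bit = instance;      /* other engines share one register */

	rb.bit = 1u << rb.bit;          /* BIT(): gen8 VCS always ends up with bit 0 */
	return rb;
}
```

For example, VCS1 on the GEN8 table resolves to the register right after the M1TCR placeholder with mask 0x1, while BCS2 resolves to the shared copy-engine register with mask 0x4.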


* Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions
  2022-07-29  7:03 ` [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions Mauro Carvalho Chehab
@ 2022-08-02 22:30   ` Niranjana Vishwanathapura
  2022-08-04  7:24     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 7+ messages in thread
From: Niranjana Vishwanathapura @ 2022-08-02 22:30 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Jonathan Corbet, David Airlie, intel-gfx, linux-doc,
	linux-kernel, dri-devel, Thomas Zimmermann, Rodrigo Vivi

On Fri, Jul 29, 2022 at 09:03:55AM +0200, Mauro Carvalho Chehab wrote:
>Add a description for the TLB cache invalidation algorithm and for
>the related kAPI functions.
>
>Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>---
>
>To avoid mailbombing a large number of people, only the mailing lists were Cc'd on the cover letter.
>See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1659077372.git.mchehab@kernel.org/
>
> Documentation/gpu/i915.rst          |   7 ++
> drivers/gpu/drm/i915/gt/intel_tlb.c |  25 +++++++
> drivers/gpu/drm/i915/gt/intel_tlb.h | 101 ++++++++++++++++++++++++++++
> 3 files changed, 133 insertions(+)
>
>diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
>index 4e59db1cfb00..46911fdd79e8 100644
>--- a/Documentation/gpu/i915.rst
>+++ b/Documentation/gpu/i915.rst
>@@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model)
> .. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c
>    :internal:
>
>+TLB cache invalidation
>+----------------------
>+
>+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h
>+
>+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c
>+
> Workarounds
> -----------
>
>diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
>index af8cae979489..4873b7ecc015 100644
>--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
>+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
>@@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt)
> 	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
> }
>
>+/**
>+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
>+ * @gt: GT structure
>+ * @seqno: sequence number
>+ *
>+ * Do a full TLB cache invalidation if the @seqno is bigger than the last
>+ * full TLB cache invalidation.
>+ *
>+ * Note:
>+ * The TLB cache invalidation logic depends on GEN-specific registers.
>+ * It currently supports MMIO-based TLB flush for GEN8 to GEN12.
>+ */
> void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> {
> 	intel_wakeref_t wakeref;
>@@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> 	}
> }
>
>+/**
>+ * intel_gt_init_tlb - initialize TLB-specific vars
>+ * @gt: GT structure
>+ *
>+ * TLB cache invalidation logic internally uses some resources that require
>+ * initialization. Should be called before doing any TLB cache invalidation.
>+ */
> void intel_gt_init_tlb(struct intel_gt *gt)
> {
> 	mutex_init(&gt->tlb.invalidate_lock);
> 	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
> }
>
>+/**
>+ * intel_gt_fini_tlb - initialize TLB-specific vars

Free TLB-specific vars

>+ * @gt: GT structure
>+ *
>+ * Frees any resources needed by TLB cache invalidation logic.
>+ */
> void intel_gt_fini_tlb(struct intel_gt *gt)
> {
> 	mutex_destroy(&gt->tlb.invalidate_lock);
>diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
>index 46ce25bf5afe..dca70c33bd61 100644
>--- a/drivers/gpu/drm/i915/gt/intel_tlb.h
>+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
>@@ -11,16 +11,117 @@
>
> #include "intel_gt_types.h"
>
>+/**
>+ * DOC: TLB cache invalidation logic
>+ *
>+ * The current algorithm works as follows: struct drm_i915_gem_object
>+ * instances can be created in any order. At unbind/evict time, the object is
>+ * guaranteed not to be used anymore, so a sequence number provided by
>+ * intel_gt_next_invalidate_tlb_full() is stored in it. This can happen either
>+ * at __vma_put_pages(), for synchronous VMA unbind, or at ppgtt_unbind_vma(),
>+ * for asynchronous VMA unbind.
>+ *
>+ * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called,
>+ * where it checks if the sequence number of the object was already invalidated
>+ * or not. If not, it flushes the TLB and increments the sequence number::
>+ *
>+ *   void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
>+ *   {
>+ *   ...
>+ * 	with_intel_gt_pm_if_awake(gt, wakeref) {
>+ * 		mutex_lock(&gt->tlb.invalidate_lock);
>+ * 		if (tlb_seqno_passed(gt, seqno))
>+ * 			goto unlock;
>+ *
>+ * 		// Some code to do TLB invalidation
>+ *   ...
>+ * 		write_seqcount_invalidate(&gt->tlb.seqno); // seqno += 2
>+ *   unlock:
>+ * 		mutex_unlock(&gt->tlb.invalidate_lock);
>+ * 	}
>+ *
>+ * So, let's say the current seqno is 2 and three new objects were created,
>+ * in this order::
>+ *
>+ * 	obj1
>+ * 	obj2
>+ * 	obj3
>+ *
>+ * They can be unbound/evicted in a different order. At unbind/evict time,
>+ * mm.tlb is stamped with the sequence number from the last TLB flush,
>+ * plus 1 (seqno | 1, as seqno is always even).

I am trying to get my head around the below function.

void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
{
        WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
}

Though we pass obj->mm.tlb for 'tlb' while calling this function,
aren't we writing to local 'tlb' variable here instead of obj->mm.tlb?
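The concern is easy to reproduce outside the driver. A u32 parameter is a copy, so a store to it, WRITE_ONCE() or not, never reaches obj->mm.tlb. Below is a minimal userspace sketch. struct fake_obj, stamp_by_value() and stamp_by_pointer() are hypothetical stand-ins, and a volatile store approximates WRITE_ONCE():

```c
#include <stdint.h>

/* Hypothetical stand-in for a GEM object carrying mm.tlb. */
struct fake_obj {
	struct {
		uint32_t tlb;
	} mm;
};

/* Approximates the kernel's WRITE_ONCE() with a volatile store. */
#define WRITE_ONCE_U32(x, val) (*(volatile uint32_t *)&(x) = (val))

/* Shape quoted in the review: @tlb is a copy of the caller's value. */
static void stamp_by_value(uint32_t tlb, uint32_t next)
{
	WRITE_ONCE_U32(tlb, next); /* updates the local copy only */
}

/* Passing a pointer lets the store reach the object's field. */
static void stamp_by_pointer(uint32_t *tlb, uint32_t next)
{
	WRITE_ONCE_U32(*tlb, next);
}
```

Calling stamp_by_value(obj.mm.tlb, 3) leaves obj.mm.tlb at 0, while stamp_by_pointer(&obj.mm.tlb, 3) sets it to 3.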

>+ *
>+ * Different threads may be used for unbind/evict and for unset pages.
>+ * As the logic at void intel_gt_invalidate_tlb_full() is protected by a mutex,

Maybe we can drop 'void' and just keep the function name here.

>+ * for simplicity, let's consider just two threads:
>+ *
>+ * +-------------------+-------------------------+---------------------------------+
>+ * | sequence number   | Thread 0                | Thread 1                        |
>+ * +===================+=========================+=================================+
>+ * | seqno=2           |                         |                                 |
>+ * |                   +-------------------------+---------------------------------+
>+ * |                   | unbind/evict obj3.      |                                 |
>+ * |                   |                         |                                 |
>+ * |                   | obj3.mm.tlb = seqno | 1 |                                 |
>+ * |                   | // obj3.mm.tlb = 3      |                                 |
>+ * |                   +-------------------------+---------------------------------+
>+ * |                   | unbind/evict obj1.      |                                 |
>+ * |                   |                         |                                 |
>+ * |                   | obj1.mm.tlb = seqno | 1 |                                 |
>+ * |                   | // obj1.mm.tlb = 3      |                                 |
>+ * |                   +-------------------------+---------------------------------+
>+ * |                   |                         | __i915_gem_object_unset_pages() |
>+ * |                   |                         | called for obj3 => TLB flush    |
>+ * |                   |                         | invalidating both obj1 and obj3.|
>+ * |                   |                         |                                 |
>+ * |                   |                         | seqno += 2                      |
>+ * +-------------------+-------------------------+---------------------------------+
>+ * | seqno=4           |                         |                                 |
>+ * |                   +-------------------------+---------------------------------+
>+ * |                   | unbind/evict obj2.      |                                 |
>+ * |                   |                         |                                 |
>+ * |                   | obj2.mm.tlb = seqno | 1 |                                 |
>+ * |                   | // obj2.mm.tlb = 5      |                                 |
>+ * |                   +-------------------------+---------------------------------+
>+ * |                   |                         | __i915_gem_object_unset_pages() |
>+ * |                   |                         | called for obj1, don't flush    |
>+ * |                   |                         | as past flush invalidated obj1. |
>+ * |                   +-------------------------+---------------------------------+
>+ * |                   |                         | __i915_gem_object_unset_pages() |
>+ * |                   |                         | called for obj2 => TLB flush.   |
>+ * |                   |                         | invalidating obj2.              |
>+ * |                   |                         |                                 |
>+ * |                   |                         | seqno += 2                      |
>+ * +-------------------+-------------------------+---------------------------------+
>+ * | seqno=6           |                         |                                 |
>+ * +-------------------+-------------------------+---------------------------------+
>+ */
>+
> void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
>
> void intel_gt_init_tlb(struct intel_gt *gt);
> void intel_gt_fini_tlb(struct intel_gt *gt);
>
>+/**
>+ * intel_gt_tlb_seqno - Returns the current TLB invalidation sequence number
>+ *

Probably this empty comment line needs to be removed before the parameter
description below?

>+ * @gt: GT structure
>+ *
>+ * There's no need to lock while calling it, as seqprop_sequence() is thread-safe.
>+ */
> static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> {
> 	return seqprop_sequence(&gt->tlb.seqno);
> }
>
>+/**
>+ * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation
>+ *	sequence number
>+ *

Same here.

-Niranjana

>+ * @gt: GT structure
>+ *
>+ * There's no need to lock while calling it, as seqprop_sequence() is thread-safe.
>+ */
> static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
> {
> 	return intel_gt_tlb_seqno(gt) | 1;
>-- 
>2.36.1
>


* Re: [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file
  2022-07-29  7:03 ` [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file Mauro Carvalho Chehab
  2022-08-02 22:21   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-08-03  9:19   ` Andi Shyti
  1 sibling, 0 replies; 7+ messages in thread
From: Andi Shyti @ 2022-08-03  9:19 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Thomas Hellström, Andi Shyti, Tvrtko Ursulin, David Airlie,
	dri-devel, Casey Bowman, Lucas De Marchi, linux-kernel,
	Chris Wilson, Daniele Ceraolo Spurio, Rodrigo Vivi, Tomas Winkler,
	intel-gfx, Fei Yang, Matthew Auld

Hi Mauro,

On Fri, Jul 29, 2022 at 09:03:54AM +0200, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Prepare for supporting more TLB invalidation scenarios by moving
> the current MMIO invalidation to its own file.
> 
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

I already reviewed this patch... anyway I checked it again and
it's all correct.

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi


* Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions
  2022-08-02 22:30   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-08-04  7:24     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2022-08-04  7:24 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: linux-doc, David Airlie, intel-gfx, Jonathan Corbet, linux-kernel,
	Chris Wilson, dri-devel, Thomas Zimmermann, Rodrigo Vivi,
	Mauro Carvalho Chehab

On Tue, 2 Aug 2022 15:30:44 -0700
Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> wrote:

> On Fri, Jul 29, 2022 at 09:03:55AM +0200, Mauro Carvalho Chehab wrote:
> >Add a description for the TLB cache invalidation algorithm and for
> >the related kAPI functions.
> >
> >Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> >---
> >
> >To avoid mailbombing a large number of people, only the mailing lists were Cc'd on the cover letter.
> >See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1659077372.git.mchehab@kernel.org/
> >
> > Documentation/gpu/i915.rst          |   7 ++
> > drivers/gpu/drm/i915/gt/intel_tlb.c |  25 +++++++
> > drivers/gpu/drm/i915/gt/intel_tlb.h | 101 ++++++++++++++++++++++++++++
> > 3 files changed, 133 insertions(+)
> >
> >diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> >index 4e59db1cfb00..46911fdd79e8 100644
> >--- a/Documentation/gpu/i915.rst
> >+++ b/Documentation/gpu/i915.rst
> >@@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model)
> > .. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c
> >    :internal:
> >
> >+TLB cache invalidation
> >+----------------------
> >+
> >+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h
> >+
> >+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c
> >+
> > Workarounds
> > -----------
> >
> >diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
> >index af8cae979489..4873b7ecc015 100644
> >--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
> >+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
> >@@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt)
> > 	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
> > }
> >
> >+/**
> >+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
> >+ * @gt: GT structure
> >+ * @seqno: sequence number
> >+ *
> >+ * Do a full TLB cache invalidation if the @seqno is bigger than the last
> >+ * full TLB cache invalidation.
> >+ *
> >+ * Note:
> >+ * The TLB cache invalidation logic depends on GEN-specific registers.
> >+ * It currently supports MMIO-based TLB flush for GEN8 to GEN12.
> >+ */
> > void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> > {
> > 	intel_wakeref_t wakeref;
> >@@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> > 	}
> > }
> >
> >+/**
> >+ * intel_gt_init_tlb - initialize TLB-specific vars
> >+ * @gt: GT structure
> >+ *
> >+ * TLB cache invalidation logic internally uses some resources that require
> >+ * initialization. Should be called before doing any TLB cache invalidation.
> >+ */
> > void intel_gt_init_tlb(struct intel_gt *gt)
> > {
> > 	mutex_init(&gt->tlb.invalidate_lock);
> > 	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
> > }
> >
> >+/**
> >+ * intel_gt_fini_tlb - initialize TLB-specific vars  
> 
> Free TLB-specific vars

OK.

> 
> >+ * @gt: GT structure
> >+ *
> >+ * Frees any resources needed by TLB cache invalidation logic.
> >+ */
> > void intel_gt_fini_tlb(struct intel_gt *gt)
> > {
> > 	mutex_destroy(&gt->tlb.invalidate_lock);
> >diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
> >index 46ce25bf5afe..dca70c33bd61 100644
> >--- a/drivers/gpu/drm/i915/gt/intel_tlb.h
> >+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
> >@@ -11,16 +11,117 @@
> >
> > #include "intel_gt_types.h"
> >
> >+/**
> >+ * DOC: TLB cache invalidation logic
> >+ *
> >+ * The current algorithm works as follows: struct drm_i915_gem_object
> >+ * instances can be created in any order. At unbind/evict time, the object is
> >+ * guaranteed not to be used anymore, so a sequence number provided by
> >+ * intel_gt_next_invalidate_tlb_full() is stored in it. This can happen either
> >+ * at __vma_put_pages(), for synchronous VMA unbind, or at ppgtt_unbind_vma(),
> >+ * for asynchronous VMA unbind.
> >+ *
> >+ * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called,
> >+ * where it checks if the sequence number of the object was already invalidated
> >+ * or not. If not, it flushes the TLB and increments the sequence number::
> >+ *
> >+ *   void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> >+ *   {
> >+ *   ...
> >+ * 	with_intel_gt_pm_if_awake(gt, wakeref) {
> >+ * 		mutex_lock(&gt->tlb.invalidate_lock);
> >+ * 		if (tlb_seqno_passed(gt, seqno))
> >+ * 			goto unlock;
> >+ *
> >+ * 		// Some code to do TLB invalidation
> >+ *   ...
> >+ * 		write_seqcount_invalidate(&gt->tlb.seqno); // seqno += 2
> >+ *   unlock:
> >+ * 		mutex_unlock(&gt->tlb.invalidate_lock);
> >+ * 	}
> >+ *
> >+ * So, let's say the current seqno is 2 and three new objects were created,
> >+ * in this order::
> >+ *
> >+ * 	obj1
> >+ * 	obj2
> >+ * 	obj3
> >+ *
> >+ * They can be unbound/evicted in a different order. At unbind/evict time,
> >+ * mm.tlb is stamped with the sequence number from the last TLB flush,
> >+ * plus 1 (seqno | 1, as seqno is always even).
> 
> I am trying to get my head around the below function.
> 
> void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
> {
>         WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
> }
> 
> Though we pass obj->mm.tlb for 'tlb' while calling this function,
> aren't we writing to local 'tlb' variable here instead of obj->mm.tlb?

It should be passing a pointer. I wrote such a fix after a review,
but somehow it ended up getting lost. I'll send the fix in v3.
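For reference, here is a minimal single-threaded userspace model of the pointer-based stamping, replayed over the unbind/unset_pages sequence from the table in the DOC comment. All fake_* names are hypothetical. A plain counter replaces the seqcount_mutex_t, the mutex and its re-check are omitted, and a flush counter replaces the MMIO writes. The model reuses the comparison from tlb_seqno_passed() as quoted in the patch. That exposes one detail worth double-checking in the table. The check is (s32)(cur - ALIGN(seqno, 2)) > 0, so a stamp of 3 is not reported as passed at seqno=4 (the difference is 0). The unset_pages call for obj1 would therefore flush again rather than being skipped. This looks like deliberate conservatism, since a flush already running when the stamp was read would also end with seqno=4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models struct intel_gt: a plain counter replaces the seqcount + mutex. */
struct fake_gt {
	uint32_t tlb_seqno;   /* gt->tlb.seqno; stays even, +2 per flush */
	unsigned int flushes; /* number of modelled MMIO invalidations */
};

/* Models the mm.tlb field of a GEM object; 0 means nothing pending. */
struct fake_obj {
	uint32_t tlb;
};

/* Same rounding as the kernel's ALIGN(x, 2). */
static uint32_t align2(uint32_t x)
{
	return (x + 1u) & ~1u;
}

/* Same comparison as tlb_seqno_passed() in intel_tlb.c. */
static bool tlb_seqno_passed(const struct fake_gt *gt, uint32_t seqno)
{
	return (int32_t)(gt->tlb_seqno - align2(seqno)) > 0;
}

/* Pointer-based stamping, so the caller's field is actually updated. */
static void fake_vma_invalidate_tlb(const struct fake_gt *gt, uint32_t *tlb)
{
	*tlb = gt->tlb_seqno | 1u; /* intel_gt_next_invalidate_tlb_full() */
}

/* Models intel_gt_invalidate_tlb_full(); returns true if it flushed. */
static bool fake_invalidate_tlb_full(struct fake_gt *gt, uint32_t seqno)
{
	if (tlb_seqno_passed(gt, seqno))
		return false;
	gt->flushes++;      /* stands in for mmio_invalidate_full() */
	gt->tlb_seqno += 2; /* write_seqcount_invalidate() adds 2 */
	return true;
}

/* Models flush_tlb_invalidate() from i915_gem_pages.c. */
static bool fake_unset_pages(struct fake_gt *gt, struct fake_obj *obj)
{
	bool flushed = false;

	if (obj->tlb)
		flushed = fake_invalidate_tlb_full(gt, obj->tlb);
	obj->tlb = 0;
	return flushed;
}
```

The replay follows the table's order: obj3 and obj1 are stamped at seqno=2, obj3 is unset, obj2 is stamped at seqno=4, then obj1 and obj2 are unset. This gives three flushes and ends with seqno=8. A hypothetical fourth object stamped together with obj2 is the first whose unset_pages call is skipped, once seqno has reached 8.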

> >+ *
> >+ * Different threads may be used for unbind/evict and for unset pages.
> >+ * As the logic at void intel_gt_invalidate_tlb_full() is protected by a mutex,  
> 
> Maybe we can drop 'void' and just keep the function name here.

Sure.

> >+ * for simplicity, let's consider just two threads:
> >+ *
> >+ * +-------------------+-------------------------+---------------------------------+
> >+ * | sequence number   | Thread 0                | Thread 1                        |
> >+ * +===================+=========================+=================================+
> >+ * | seqno=2           |                         |                                 |
> >+ * |                   +-------------------------+---------------------------------+
> >+ * |                   | unbind/evict obj3.      |                                 |
> >+ * |                   |                         |                                 |
> >+ * |                   | obj3.mm.tlb = seqno | 1 |                                 |
> >+ * |                   | // obj3.mm.tlb = 3      |                                 |
> >+ * |                   +-------------------------+---------------------------------+
> >+ * |                   | unbind/evict obj1.      |                                 |
> >+ * |                   |                         |                                 |
> >+ * |                   | obj1.mm.tlb = seqno | 1 |                                 |
> >+ * |                   | // obj1.mm.tlb = 3      |                                 |
> >+ * |                   +-------------------------+---------------------------------+
> >+ * |                   |                         | __i915_gem_object_unset_pages() |
> >+ * |                   |                         | called for obj3 => TLB flush    |
> >+ * |                   |                         | invalidating both obj1 and obj3.|
> >+ * |                   |                         |                                 |
> >+ * |                   |                         | seqno += 2                      |
> >+ * +-------------------+-------------------------+---------------------------------+
> >+ * | seqno=4           |                         |                                 |
> >+ * |                   +-------------------------+---------------------------------+
> >+ * |                   | unbind/evict obj2.      |                                 |
> >+ * |                   |                         |                                 |
> >+ * |                   | obj2.mm.tlb = seqno | 1 |                                 |
> >+ * |                   | // obj2.mm.tlb = 5      |                                 |
> >+ * |                   +-------------------------+---------------------------------+
> >+ * |                   |                         | __i915_gem_object_unset_pages() |
> >+ * |                   |                         | called for obj1, don't flush    |
> >+ * |                   |                         | as past flush invalidated obj1. |
> >+ * |                   +-------------------------+---------------------------------+
> >+ * |                   |                         | __i915_gem_object_unset_pages() |
> >+ * |                   |                         | called for obj2 => TLB flush,   |
> >+ * |                   |                         | invalidating obj2.              |
> >+ * |                   |                         |                                 |
> >+ * |                   |                         | seqno += 2                      |
> >+ * +-------------------+-------------------------+---------------------------------+
> >+ * | seqno=6           |                         |                                 |
> >+ * +-------------------+-------------------------+---------------------------------+
> >+ */
> >+
> > void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
> >
> > void intel_gt_init_tlb(struct intel_gt *gt);
> > void intel_gt_fini_tlb(struct intel_gt *gt);
> >
> >+/**
> >+ * intel_gt_tlb_seqno - Returns the current TLB invalidation sequence number
> >+ *  
> 
> Probably this empty comment line needs to be removed before the parameter
> description below?

Kernel-doc actually accepts the markup both with and without a blank line.
My personal preference is to add a blank line, because sometimes the
function name plus its short description is longer than one line.
In that case, adding a blank line is usually clearer than doing
something like this (perfectly valid kernel-doc markup):

	/**
	 * long_function_name_foo - Lorem ipsum dolor sit amet, consectetur
	 * adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore.
	 * @bar: some parameter
	 *  ...

But yeah, the example in the kernel-doc documentation doesn't have a
blank line, so I'll drop it.
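With the blank line dropped, the two helper comments quoted below would follow the layout from the kernel-doc guide: function name and short description first (the description may wrap onto a continuation line), then the @parameter lines, then a blank comment line and the body text. The sketch below only illustrates that layout. It assumes hypothetical stand-ins for struct intel_gt and seqprop_sequence() (a plain counter, not the seqcount from <linux/seqlock.h>) so it compiles outside the kernel; the "thread-safe" wording describes the real seqcount-based helper, not the stub.

```c
#include <stdint.h>

typedef uint32_t u32;

/*
 * Hypothetical stand-ins so the snippet builds in user space. The real
 * types come from i915 and <linux/seqlock.h>; this stub is a plain read
 * and is NOT thread-safe.
 */
struct seqcount_stub {
	u32 sequence;
};

struct intel_gt {
	struct {
		struct seqcount_stub seqno;
	} tlb;
};

#define seqprop_sequence(s) ((s)->sequence)

/**
 * intel_gt_tlb_seqno - Returns the current TLB invalidation sequence number
 * @gt: GT structure
 *
 * There's no need to lock while calling it, as seqprop_sequence is thread-safe.
 */
static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
{
	return seqprop_sequence(&gt->tlb.seqno);
}

/**
 * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation
 *	sequence number
 * @gt: GT structure
 *
 * There's no need to lock while calling it, as seqprop_sequence is thread-safe.
 */
static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
{
	/* Same rule as the quoted patch: current seqno with bit 0 set. */
	return intel_gt_tlb_seqno(gt) | 1;
}
```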

> 
> >+ * @gt: GT structure
> >+ *
> >+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
> >+ */
> > static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> > {
> > 	return seqprop_sequence(&gt->tlb.seqno);
> > }
> >
> >+/**
> >+ * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation
> >+ *	sequence number
> >+ *  
> 
> Same here.
> 
> -Niranjana
> 
> >+ * @gt: GT structure
> >+ *
> >+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
> >+ */
> > static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
> > {
> > 	return intel_gt_tlb_seqno(gt) | 1;
> >-- 
> >2.36.1
> >  

Thanks!
Mauro
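
The bookkeeping in the table above can be replayed with a small user-space model. This is a sketch under stated assumptions: struct model_gt, struct model_obj and the model_* helpers are hypothetical names, and the skip rule used here (flush only while the current seqno is still below the object's recorded mm.tlb value) is inferred from the rows of the table, not copied from intel_tlb.c, whose actual check is not quoted in this thread and may be more conservative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; not the i915 structures. */
struct model_gt {
	uint32_t seqno;		/* stays even between flushes; += 2 per full flush */
	unsigned int flushes;	/* full TLB flushes actually issued */
};

struct model_obj {
	uint32_t tlb;		/* 0: nothing pending, else seqno | 1 at unbind */
};

/* Same rule as intel_gt_next_invalidate_tlb_full(): current seqno, bit 0 set. */
static uint32_t model_next_invalidate_tlb_full(const struct model_gt *gt)
{
	return gt->seqno | 1;
}

/* "unbind/evict objN" column: record which flush the object is waiting for. */
static void model_unbind(struct model_gt *gt, struct model_obj *obj)
{
	obj->tlb = model_next_invalidate_tlb_full(gt);
}

/*
 * "__i915_gem_object_unset_pages()" column, simplified: flush only if no
 * full flush has completed since the object was unbound, i.e. the current
 * seqno is still below obj->tlb (wrap-safe signed compare).
 * Returns true if a flush was issued.
 */
static bool model_unset_pages(struct model_gt *gt, struct model_obj *obj)
{
	bool flush = obj->tlb && (int32_t)(gt->seqno - obj->tlb) < 0;

	assert(gt->seqno % 2 == 0);	/* this model never has a flush in flight */
	if (flush) {
		gt->seqno += 2;
		gt->flushes++;
	}
	obj->tlb = 0;
	return flush;
}

/* Replays the table rows, starting from seqno=2; returns the flush count. */
static unsigned int model_table_walkthrough(void)
{
	struct model_gt gt = { .seqno = 2, .flushes = 0 };
	struct model_obj obj1 = { 0 }, obj2 = { 0 }, obj3 = { 0 };

	model_unbind(&gt, &obj3);	/* obj3.mm.tlb = 3 */
	model_unbind(&gt, &obj1);	/* obj1.mm.tlb = 3 */
	model_unset_pages(&gt, &obj3);	/* flush, seqno becomes 4 */
	model_unbind(&gt, &obj2);	/* obj2.mm.tlb = 5 */
	model_unset_pages(&gt, &obj1);	/* skipped: 4 is not below 3 */
	model_unset_pages(&gt, &obj2);	/* flush, seqno becomes 6 */
	assert(gt.seqno == 6);
	return gt.flushes;
}
```

In this model, model_table_walkthrough() issues two flushes for three evicted objects: the unset_pages() step for obj1 is skipped because the flush triggered for obj3 has already moved the seqno past obj1's recorded value of 3. Setting bit 0 when recording works because the seqno itself stays even between flushes, so "seqno | 1" always names the next flush.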


Thread overview: 7+ messages
2022-07-29  7:03 [PATCH v2 0/2] Move TLB invalidation code for its own file and document it Mauro Carvalho Chehab
2022-07-29  7:03 ` [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file Mauro Carvalho Chehab
2022-08-02 22:21   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-08-03  9:19   ` Andi Shyti
2022-07-29  7:03 ` [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions Mauro Carvalho Chehab
2022-08-02 22:30   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-08-04  7:24     ` Mauro Carvalho Chehab
