* [PATCH v2 0/2] Move TLB invalidation code for its own file and document it
@ 2022-07-29 7:03 Mauro Carvalho Chehab
2022-07-29 7:03 ` [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions Mauro Carvalho Chehab
0 siblings, 1 reply; 4+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-29 7:03 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Daniel Vetter, David Airlie,
Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, dri-devel, intel-gfx, linux-doc, linux-kernel
There are more things to be added to TLB invalidation. Before doing that,
move the code to its own file, and add the relevant documentation.
Patch 1 only moves the code and do some function renames. No functional
change.
Patch 2 adds documentation for the TLB invalidation algorithm and functions.
---
v2: only patch 2 (kernel-doc) was modified:
- The kernel-doc markups for TLB were added to i915.rst doc;
- Some minor fixes at the texts;
- Use a table instead of a literal block while explaining how the algorithm works.
That should make easier to understand the logic, both in text form and after
its conversion to HTML/PDF;
- Remove mention for GuC, as this depends on a series that will be sent later.
Chris Wilson (1):
drm/i915/gt: Move TLB invalidation to its own file
Mauro Carvalho Chehab (1):
drm/i915/gt: document TLB cache invalidation functions
Documentation/gpu/i915.rst | 7 +
drivers/gpu/drm/i915/Makefile | 1 +
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 +-
drivers/gpu/drm/i915/gt/intel_gt.c | 168 +----------------
drivers/gpu/drm/i915/gt/intel_gt.h | 12 --
drivers/gpu/drm/i915/gt/intel_tlb.c | 208 ++++++++++++++++++++++
drivers/gpu/drm/i915/gt/intel_tlb.h | 130 ++++++++++++++
drivers/gpu/drm/i915/i915_vma.c | 1 +
8 files changed, 352 insertions(+), 179 deletions(-)
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h
--
2.36.1
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions
2022-07-29 7:03 [PATCH v2 0/2] Move TLB invalidation code for its own file and document it Mauro Carvalho Chehab
@ 2022-07-29 7:03 ` Mauro Carvalho Chehab
2022-08-02 22:30 ` [Intel-gfx] " Niranjana Vishwanathapura
0 siblings, 1 reply; 4+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-29 7:03 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Chris Wilson, Daniel Vetter, David Airlie,
Jani Nikula, Jonathan Corbet, Joonas Lahtinen, Maarten Lankhorst,
Maxime Ripard, Rodrigo Vivi, Thomas Zimmermann, Tvrtko Ursulin,
dri-devel, intel-gfx, linux-doc, linux-kernel
Add a description for the TLB cache invalidation algorithm and for
the related kAPI functions.
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---
To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1659077372.git.mchehab@kernel.org/
Documentation/gpu/i915.rst | 7 ++
drivers/gpu/drm/i915/gt/intel_tlb.c | 25 +++++++
drivers/gpu/drm/i915/gt/intel_tlb.h | 101 ++++++++++++++++++++++++++++
3 files changed, 133 insertions(+)
diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 4e59db1cfb00..46911fdd79e8 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model)
.. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c
:internal:
+TLB cache invalidation
+----------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c
+
Workarounds
-----------
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
index af8cae979489..4873b7ecc015 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
@@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt)
intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
}
+/**
+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
+ * @gt: GT structure
+ * @seqno: sequence number
+ *
+ * Do a full TLB cache invalidation if the @seqno is bigger than the last
+ * full TLB cache invalidation.
+ *
+ * Note:
+ * The TLB cache invalidation logic depends on GEN-specific registers.
+ * It currently supports MMIO-based TLB flush for GEN8 to GEN12.
+ */
void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
{
intel_wakeref_t wakeref;
@@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
}
}
+/**
+ * intel_gt_init_tlb - initialize TLB-specific vars
+ * @gt: GT structure
+ *
+ * TLB cache invalidation logic internally uses some resources that require
+ * initialization. Should be called before doing any TLB cache invalidation.
+ */
void intel_gt_init_tlb(struct intel_gt *gt)
{
mutex_init(>->tlb.invalidate_lock);
seqcount_mutex_init(>->tlb.seqno, >->tlb.invalidate_lock);
}
+/**
+ * intel_gt_fini_tlb - initialize TLB-specific vars
+ * @gt: GT structure
+ *
+ * Frees any resources needed by TLB cache invalidation logic.
+ */
void intel_gt_fini_tlb(struct intel_gt *gt)
{
mutex_destroy(>->tlb.invalidate_lock);
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
index 46ce25bf5afe..dca70c33bd61 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.h
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
@@ -11,16 +11,117 @@
#include "intel_gt_types.h"
+/**
+ * DOC: TLB cache invalidation logic
+ *
+ * The way the current algorithm works is that a struct drm_i915_gem_object can
+ * be created on any order. At unbind/evict time, the object is warranted that
+ * it won't be used anymore. So, a sequence number provided by
+ * intel_gt_next_invalidate_tlb_full() is stored on it. This can happen either
+ * at __vma_put_pages() - for VMA sync unbind, or at ppgtt_unbind_vma() - for
+ * VMA async VMA bind.
+ *
+ * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called,
+ * where it checks if the sequence number of the object was already invalidated
+ * or not. If not, it flushes the TLB and increments the sequence number::
+ *
+ * void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
+ * {
+ * ...
+ * with_intel_gt_pm_if_awake(gt, wakeref) {
+ * mutex_lock(>->tlb.invalidate_lock);
+ * if (tlb_seqno_passed(gt, seqno))
+ * goto unlock;
+ *
+ * // Some code to do TLB invalidation
+ * ...
+ *
+ * write_seqcount_invalidate(>->tlb.seqno); // increment seqno
+ * mutex_lock(>->tlb.invalidate_lock);
+ * }
+ *
+ * So, let's say the current seqno is 2 and 3 new objects were created,
+ * on this order::
+ *
+ * obj1
+ * obj2
+ * obj3
+ *
+ * They can be unbind/evict on a different order. At unbind/evict time,
+ * the mm.tlb will be stamped with the sequence number, using the number
+ * from the last TLB flush, plus 1.
+ *
+ * Different threads may be used on unbind/evict and/or unset pages.
+ * As the logic at void intel_gt_invalidate_tlb_full() is protected by a mutex,
+ * for simplicity, let's consider just two threads:
+ *
+ * +-------------------+-------------------------+---------------------------------+
+ * | sequence number | Thread 0 | Thread 1 +
+ * +===================+=========================+=================================+
+ * | seqno=2 | | |
+ * | +-------------------------+---------------------------------+
+ * | | unbind/evict obj3. | |
+ * | | | |
+ * | | obj3.mm.tlb = seqno | 1 | |
+ * | | // obj3.mm.tlb = 3 | |
+ * | +-------------------------+---------------------------------+
+ * | | unbind/evict obj1. | |
+ * | | | |
+ * | | obj1.mm.tlb = seqno | 1 | |
+ * | | // obj1.mm.tlb = 3 | |
+ * | +-------------------------+---------------------------------+
+ * | | | __i915_gem_object_unset_pages() |
+ * | | | called for obj3 => TLB flush |
+ * | | | invalidating both obj1 and obj2.|
+ * | | | |
+ * | | | seqno += 2 |
+ * +-------------------+-------------------------+---------------------------------+
+ * | seqno=4 | | |
+ * | +-------------------------+---------------------------------+
+ * | | unbind/evict obj2. | |
+ * | | | |
+ * | | obj2.mm.tlb = seqno | 1 | |
+ * | | // obj2.mm.tlb = 5 | |
+ * | +-------------------------+---------------------------------+
+ * | | | __i915_gem_object_unset_pages() |
+ * | | | called for obj1, don't flush |
+ * | | | as past flush invalidated obj1. |
+ * | +-------------------------+---------------------------------+
+ * | | | __i915_gem_object_unset_pages() |
+ * | | | called for obj2 => TLB flush. |
+ * | | | invalidating obj2. |
+ * | | | |
+ * | | | seqno += 2 |
+ * +-------------------+-------------------------+---------------------------------+
+ * | seqno=6 | | |
+ * +-------------------+-------------------------+---------------------------------+
+ */
+
void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
void intel_gt_init_tlb(struct intel_gt *gt);
void intel_gt_fini_tlb(struct intel_gt *gt);
+/**
+ * intel_gt_tlb_seqno - Returns the current TLB invlidation sequence number
+ *
+ * @gt: GT structure
+ *
+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
+ */
static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
{
return seqprop_sequence(>->tlb.seqno);
}
+/**
+ * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation
+ * sequence number
+ *
+ * @gt: GT structure
+ *
+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
+ */
static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
{
return intel_gt_tlb_seqno(gt) | 1;
--
2.36.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions
2022-07-29 7:03 ` [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions Mauro Carvalho Chehab
@ 2022-08-02 22:30 ` Niranjana Vishwanathapura
2022-08-04 7:24 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 4+ messages in thread
From: Niranjana Vishwanathapura @ 2022-08-02 22:30 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, David Airlie,
dri-devel, linux-kernel, linux-doc, Chris Wilson, Rodrigo Vivi,
intel-gfx
On Fri, Jul 29, 2022 at 09:03:55AM +0200, Mauro Carvalho Chehab wrote:
>Add a description for the TLB cache invalidation algorithm and for
>the related kAPI functions.
>
>Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>---
>
>To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
>See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1659077372.git.mchehab@kernel.org/
>
> Documentation/gpu/i915.rst | 7 ++
> drivers/gpu/drm/i915/gt/intel_tlb.c | 25 +++++++
> drivers/gpu/drm/i915/gt/intel_tlb.h | 101 ++++++++++++++++++++++++++++
> 3 files changed, 133 insertions(+)
>
>diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
>index 4e59db1cfb00..46911fdd79e8 100644
>--- a/Documentation/gpu/i915.rst
>+++ b/Documentation/gpu/i915.rst
>@@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model)
> .. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c
> :internal:
>
>+TLB cache invalidation
>+----------------------
>+
>+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h
>+
>+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c
>+
> Workarounds
> -----------
>
>diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
>index af8cae979489..4873b7ecc015 100644
>--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
>+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
>@@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt)
> intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
> }
>
>+/**
>+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
>+ * @gt: GT structure
>+ * @seqno: sequence number
>+ *
>+ * Do a full TLB cache invalidation if the @seqno is bigger than the last
>+ * full TLB cache invalidation.
>+ *
>+ * Note:
>+ * The TLB cache invalidation logic depends on GEN-specific registers.
>+ * It currently supports MMIO-based TLB flush for GEN8 to GEN12.
>+ */
> void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> {
> intel_wakeref_t wakeref;
>@@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> }
> }
>
>+/**
>+ * intel_gt_init_tlb - initialize TLB-specific vars
>+ * @gt: GT structure
>+ *
>+ * TLB cache invalidation logic internally uses some resources that require
>+ * initialization. Should be called before doing any TLB cache invalidation.
>+ */
> void intel_gt_init_tlb(struct intel_gt *gt)
> {
> mutex_init(>->tlb.invalidate_lock);
> seqcount_mutex_init(>->tlb.seqno, >->tlb.invalidate_lock);
> }
>
>+/**
>+ * intel_gt_fini_tlb - initialize TLB-specific vars
Free TLB-specific vars
>+ * @gt: GT structure
>+ *
>+ * Frees any resources needed by TLB cache invalidation logic.
>+ */
> void intel_gt_fini_tlb(struct intel_gt *gt)
> {
> mutex_destroy(>->tlb.invalidate_lock);
>diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
>index 46ce25bf5afe..dca70c33bd61 100644
>--- a/drivers/gpu/drm/i915/gt/intel_tlb.h
>+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
>@@ -11,16 +11,117 @@
>
> #include "intel_gt_types.h"
>
>+/**
>+ * DOC: TLB cache invalidation logic
>+ *
>+ * The way the current algorithm works is that a struct drm_i915_gem_object can
>+ * be created on any order. At unbind/evict time, the object is warranted that
>+ * it won't be used anymore. So, a sequence number provided by
>+ * intel_gt_next_invalidate_tlb_full() is stored on it. This can happen either
>+ * at __vma_put_pages() - for VMA sync unbind, or at ppgtt_unbind_vma() - for
>+ * VMA async VMA bind.
>+ *
>+ * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called,
>+ * where it checks if the sequence number of the object was already invalidated
>+ * or not. If not, it flushes the TLB and increments the sequence number::
>+ *
>+ * void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
>+ * {
>+ * ...
>+ * with_intel_gt_pm_if_awake(gt, wakeref) {
>+ * mutex_lock(>->tlb.invalidate_lock);
>+ * if (tlb_seqno_passed(gt, seqno))
>+ * goto unlock;
>+ *
>+ * // Some code to do TLB invalidation
>+ * ...
>+ *
>+ * write_seqcount_invalidate(>->tlb.seqno); // increment seqno
>+ * mutex_lock(>->tlb.invalidate_lock);
>+ * }
>+ *
>+ * So, let's say the current seqno is 2 and 3 new objects were created,
>+ * on this order::
>+ *
>+ * obj1
>+ * obj2
>+ * obj3
>+ *
>+ * They can be unbind/evict on a different order. At unbind/evict time,
>+ * the mm.tlb will be stamped with the sequence number, using the number
>+ * from the last TLB flush, plus 1.
I am trying to get my head around the below function.
void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
{
WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
}
Though we pass obj->mm.tlb for 'tlb' while calling this function,
aren't we writing to local 'tlb' variable here instead of obj->mm.tlb?
>+ *
>+ * Different threads may be used on unbind/evict and/or unset pages.
>+ * As the logic at void intel_gt_invalidate_tlb_full() is protected by a mutex,
May be we can skip 'void' and just keep function name here.
>+ * for simplicity, let's consider just two threads:
>+ *
>+ * +-------------------+-------------------------+---------------------------------+
>+ * | sequence number | Thread 0 | Thread 1 +
>+ * +===================+=========================+=================================+
>+ * | seqno=2 | | |
>+ * | +-------------------------+---------------------------------+
>+ * | | unbind/evict obj3. | |
>+ * | | | |
>+ * | | obj3.mm.tlb = seqno | 1 | |
>+ * | | // obj3.mm.tlb = 3 | |
>+ * | +-------------------------+---------------------------------+
>+ * | | unbind/evict obj1. | |
>+ * | | | |
>+ * | | obj1.mm.tlb = seqno | 1 | |
>+ * | | // obj1.mm.tlb = 3 | |
>+ * | +-------------------------+---------------------------------+
>+ * | | | __i915_gem_object_unset_pages() |
>+ * | | | called for obj3 => TLB flush |
>+ * | | | invalidating both obj1 and obj2.|
>+ * | | | |
>+ * | | | seqno += 2 |
>+ * +-------------------+-------------------------+---------------------------------+
>+ * | seqno=4 | | |
>+ * | +-------------------------+---------------------------------+
>+ * | | unbind/evict obj2. | |
>+ * | | | |
>+ * | | obj2.mm.tlb = seqno | 1 | |
>+ * | | // obj2.mm.tlb = 5 | |
>+ * | +-------------------------+---------------------------------+
>+ * | | | __i915_gem_object_unset_pages() |
>+ * | | | called for obj1, don't flush |
>+ * | | | as past flush invalidated obj1. |
>+ * | +-------------------------+---------------------------------+
>+ * | | | __i915_gem_object_unset_pages() |
>+ * | | | called for obj2 => TLB flush. |
>+ * | | | invalidating obj2. |
>+ * | | | |
>+ * | | | seqno += 2 |
>+ * +-------------------+-------------------------+---------------------------------+
>+ * | seqno=6 | | |
>+ * +-------------------+-------------------------+---------------------------------+
>+ */
>+
> void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
>
> void intel_gt_init_tlb(struct intel_gt *gt);
> void intel_gt_fini_tlb(struct intel_gt *gt);
>
>+/**
>+ * intel_gt_tlb_seqno - Returns the current TLB invlidation sequence number
>+ *
Probably this empty comment line needs to be removed before the parameter
description below?
>+ * @gt: GT structure
>+ *
>+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
>+ */
> static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> {
> return seqprop_sequence(>->tlb.seqno);
> }
>
>+/**
>+ * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation
>+ * sequence number
>+ *
Same here.
-Niranjana
>+ * @gt: GT structure
>+ *
>+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
>+ */
> static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
> {
> return intel_gt_tlb_seqno(gt) | 1;
>--
>2.36.1
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions
2022-08-02 22:30 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-08-04 7:24 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 4+ messages in thread
From: Mauro Carvalho Chehab @ 2022-08-04 7:24 UTC (permalink / raw)
To: Niranjana Vishwanathapura
Cc: Mauro Carvalho Chehab, Chris Wilson, Jonathan Corbet,
David Airlie, intel-gfx, linux-doc, linux-kernel, dri-devel,
Maxime Ripard, Thomas Zimmermann, Rodrigo Vivi
On Tue, 2 Aug 2022 15:30:44 -0700
Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> wrote:
> On Fri, Jul 29, 2022 at 09:03:55AM +0200, Mauro Carvalho Chehab wrote:
> >Add a description for the TLB cache invalidation algorithm and for
> >the related kAPI functions.
> >
> >Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> >---
> >
> >To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> >See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1659077372.git.mchehab@kernel.org/
> >
> > Documentation/gpu/i915.rst | 7 ++
> > drivers/gpu/drm/i915/gt/intel_tlb.c | 25 +++++++
> > drivers/gpu/drm/i915/gt/intel_tlb.h | 101 ++++++++++++++++++++++++++++
> > 3 files changed, 133 insertions(+)
> >
> >diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> >index 4e59db1cfb00..46911fdd79e8 100644
> >--- a/Documentation/gpu/i915.rst
> >+++ b/Documentation/gpu/i915.rst
> >@@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model)
> > .. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c
> > :internal:
> >
> >+TLB cache invalidation
> >+----------------------
> >+
> >+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h
> >+
> >+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c
> >+
> > Workarounds
> > -----------
> >
> >diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
> >index af8cae979489..4873b7ecc015 100644
> >--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
> >+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
> >@@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt)
> > intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
> > }
> >
> >+/**
> >+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
> >+ * @gt: GT structure
> >+ * @seqno: sequence number
> >+ *
> >+ * Do a full TLB cache invalidation if the @seqno is bigger than the last
> >+ * full TLB cache invalidation.
> >+ *
> >+ * Note:
> >+ * The TLB cache invalidation logic depends on GEN-specific registers.
> >+ * It currently supports MMIO-based TLB flush for GEN8 to GEN12.
> >+ */
> > void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> > {
> > intel_wakeref_t wakeref;
> >@@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> > }
> > }
> >
> >+/**
> >+ * intel_gt_init_tlb - initialize TLB-specific vars
> >+ * @gt: GT structure
> >+ *
> >+ * TLB cache invalidation logic internally uses some resources that require
> >+ * initialization. Should be called before doing any TLB cache invalidation.
> >+ */
> > void intel_gt_init_tlb(struct intel_gt *gt)
> > {
> > mutex_init(>->tlb.invalidate_lock);
> > seqcount_mutex_init(>->tlb.seqno, >->tlb.invalidate_lock);
> > }
> >
> >+/**
> >+ * intel_gt_fini_tlb - initialize TLB-specific vars
>
> Free TLB-specific vars
OK.
>
> >+ * @gt: GT structure
> >+ *
> >+ * Frees any resources needed by TLB cache invalidation logic.
> >+ */
> > void intel_gt_fini_tlb(struct intel_gt *gt)
> > {
> > mutex_destroy(>->tlb.invalidate_lock);
> >diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
> >index 46ce25bf5afe..dca70c33bd61 100644
> >--- a/drivers/gpu/drm/i915/gt/intel_tlb.h
> >+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
> >@@ -11,16 +11,117 @@
> >
> > #include "intel_gt_types.h"
> >
> >+/**
> >+ * DOC: TLB cache invalidation logic
> >+ *
> >+ * The way the current algorithm works is that a struct drm_i915_gem_object can
> >+ * be created on any order. At unbind/evict time, the object is warranted that
> >+ * it won't be used anymore. So, a sequence number provided by
> >+ * intel_gt_next_invalidate_tlb_full() is stored on it. This can happen either
> >+ * at __vma_put_pages() - for VMA sync unbind, or at ppgtt_unbind_vma() - for
> >+ * VMA async VMA bind.
> >+ *
> >+ * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called,
> >+ * where it checks if the sequence number of the object was already invalidated
> >+ * or not. If not, it flushes the TLB and increments the sequence number::
> >+ *
> >+ * void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> >+ * {
> >+ * ...
> >+ * with_intel_gt_pm_if_awake(gt, wakeref) {
> >+ * mutex_lock(>->tlb.invalidate_lock);
> >+ * if (tlb_seqno_passed(gt, seqno))
> >+ * goto unlock;
> >+ *
> >+ * // Some code to do TLB invalidation
> >+ * ...
> >+ *
> >+ * write_seqcount_invalidate(>->tlb.seqno); // increment seqno
> >+ * mutex_lock(>->tlb.invalidate_lock);
> >+ * }
> >+ *
> >+ * So, let's say the current seqno is 2 and 3 new objects were created,
> >+ * on this order::
> >+ *
> >+ * obj1
> >+ * obj2
> >+ * obj3
> >+ *
> >+ * They can be unbind/evict on a different order. At unbind/evict time,
> >+ * the mm.tlb will be stamped with the sequence number, using the number
> >+ * from the last TLB flush, plus 1.
>
> I am trying to get my head around the below function.
>
> void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
> {
> WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
> }
>
> Though we pass obj->mm.tlb for 'tlb' while calling this function,
> aren't we writing to local 'tlb' variable here instead of obj->mm.tlb?
It should be passing a pointer. I wrote such fix after a review,
but somehow it ended getting lost. I'll send the fix at v3.
> >+ *
> >+ * Different threads may be used on unbind/evict and/or unset pages.
> >+ * As the logic at void intel_gt_invalidate_tlb_full() is protected by a mutex,
>
> May be we can skip 'void' and just keep function name here.
Sure.
> >+ * for simplicity, let's consider just two threads:
> >+ *
> >+ * +-------------------+-------------------------+---------------------------------+
> >+ * | sequence number | Thread 0 | Thread 1 +
> >+ * +===================+=========================+=================================+
> >+ * | seqno=2 | | |
> >+ * | +-------------------------+---------------------------------+
> >+ * | | unbind/evict obj3. | |
> >+ * | | | |
> >+ * | | obj3.mm.tlb = seqno | 1 | |
> >+ * | | // obj3.mm.tlb = 3 | |
> >+ * | +-------------------------+---------------------------------+
> >+ * | | unbind/evict obj1. | |
> >+ * | | | |
> >+ * | | obj1.mm.tlb = seqno | 1 | |
> >+ * | | // obj1.mm.tlb = 3 | |
> >+ * | +-------------------------+---------------------------------+
> >+ * | | | __i915_gem_object_unset_pages() |
> >+ * | | | called for obj3 => TLB flush |
> >+ * | | | invalidating both obj1 and obj2.|
> >+ * | | | |
> >+ * | | | seqno += 2 |
> >+ * +-------------------+-------------------------+---------------------------------+
> >+ * | seqno=4 | | |
> >+ * | +-------------------------+---------------------------------+
> >+ * | | unbind/evict obj2. | |
> >+ * | | | |
> >+ * | | obj2.mm.tlb = seqno | 1 | |
> >+ * | | // obj2.mm.tlb = 5 | |
> >+ * | +-------------------------+---------------------------------+
> >+ * | | | __i915_gem_object_unset_pages() |
> >+ * | | | called for obj1, don't flush |
> >+ * | | | as past flush invalidated obj1. |
> >+ * | +-------------------------+---------------------------------+
> >+ * | | | __i915_gem_object_unset_pages() |
> >+ * | | | called for obj2 => TLB flush. |
> >+ * | | | invalidating obj2. |
> >+ * | | | |
> >+ * | | | seqno += 2 |
> >+ * +-------------------+-------------------------+---------------------------------+
> >+ * | seqno=6 | | |
> >+ * +-------------------+-------------------------+---------------------------------+
> >+ */
> >+
> > void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
> >
> > void intel_gt_init_tlb(struct intel_gt *gt);
> > void intel_gt_fini_tlb(struct intel_gt *gt);
> >
> >+/**
> >+ * intel_gt_tlb_seqno - Returns the current TLB invlidation sequence number
> >+ *
>
> Probably this empty comment line needs to be removed before the parameter
> description below?
Kernel-doc actually accepts both with or without a blank line. My
personal preference is to place a blank line, because sometimes the
function description plus function name is bigger than one line.
So, it is usually clearer when adding a blank line than doing
something like this (perfectly valid kerneldoc markup):
/**
* long_function_name_foo - Lorem ipsum dolor sit amet, consectetur
* adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore.
* @bar: some parameter
* ...
But yeah, kernel-doc documentation example doesn't have a blank
line. So, I'll drop it.
>
> >+ * @gt: GT structure
> >+ *
> >+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
> >+ */
> > static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> > {
> > return seqprop_sequence(>->tlb.seqno);
> > }
> >
> >+/**
> >+ * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation
> >+ * sequence number
> >+ *
>
> Same here.
>
> -Niranjana
>
> >+ * @gt: GT structure
> >+ *
> >+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
> >+ */
> > static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
> > {
> > return intel_gt_tlb_seqno(gt) | 1;
> >--
> >2.36.1
> >
Thanks!
Mauro
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-08-04 7:24 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-07-29 7:03 [PATCH v2 0/2] Move TLB invalidation code for its own file and document it Mauro Carvalho Chehab
2022-07-29 7:03 ` [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions Mauro Carvalho Chehab
2022-08-02 22:30 ` [Intel-gfx] " Niranjana Vishwanathapura
2022-08-04 7:24 ` Mauro Carvalho Chehab
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).