* [PATCH v2 0/2] Move TLB invalidation code for its own file and document it
@ 2022-07-29  7:03 Mauro Carvalho Chehab
  2022-07-29  7:03 ` [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file Mauro Carvalho Chehab
  2022-07-29  7:03 ` [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions Mauro Carvalho Chehab
  0 siblings, 2 replies; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-29  7:03 UTC (permalink / raw)
  Cc: Thomas Zimmermann, Jonathan Corbet, David Airlie, intel-gfx,
      linux-doc, linux-kernel, dri-devel, Mauro Carvalho Chehab

There are more things to be added to TLB invalidation. Before doing that,
move the code to its own file, and add the relevant documentation.

Patch 1 only moves the code and does some function renaming. No functional
change.

Patch 2 adds documentation for the TLB invalidation algorithm and functions.

---

v2: only patch 2 (kernel-doc) was modified:

  - The kernel-doc markups for TLB were added to the i915.rst doc;
  - Some minor fixes to the text;
  - Use a table instead of a literal block when explaining how the
    algorithm works. That should make the logic easier to understand,
    both in text form and after its conversion to HTML/PDF;
  - Remove the mention of GuC, as this depends on a series that will be
    sent later.
Chris Wilson (1):
  drm/i915/gt: Move TLB invalidation to its own file

Mauro Carvalho Chehab (1):
  drm/i915/gt: document TLB cache invalidation functions

 Documentation/gpu/i915.rst                |   7 +
 drivers/gpu/drm/i915/Makefile             |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_pages.c |   4 +-
 drivers/gpu/drm/i915/gt/intel_gt.c        | 168 +----------------
 drivers/gpu/drm/i915/gt/intel_gt.h        |  12 --
 drivers/gpu/drm/i915/gt/intel_tlb.c       | 208 ++++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_tlb.h       | 130 ++++++++++++++
 drivers/gpu/drm/i915/i915_vma.c           |   1 +
 8 files changed, 352 insertions(+), 179 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h

--
2.36.1

^ permalink raw reply	[flat|nested] 7+ messages in thread
* [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file 2022-07-29 7:03 [PATCH v2 0/2] Move TLB invalidation code for its own file and document it Mauro Carvalho Chehab @ 2022-07-29 7:03 ` Mauro Carvalho Chehab 2022-08-02 22:21 ` [Intel-gfx] " Niranjana Vishwanathapura 2022-08-03 9:19 ` Andi Shyti 2022-07-29 7:03 ` [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions Mauro Carvalho Chehab 1 sibling, 2 replies; 7+ messages in thread From: Mauro Carvalho Chehab @ 2022-07-29 7:03 UTC (permalink / raw) Cc: Thomas Hellström, Andi Shyti, Tvrtko Ursulin, David Airlie, dri-devel, Casey Bowman, Lucas De Marchi, linux-kernel, Chris Wilson, Daniele Ceraolo Spurio, Rodrigo Vivi, Tomas Winkler, Mauro Carvalho Chehab, intel-gfx, Fei Yang, Matthew Auld From: Chris Wilson <chris.p.wilson@intel.com> Prepare for supporting more TLB invalidation scenarios by moving the current MMIO invalidation to its own file. Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Cc: Fei Yang <fei.yang@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> --- To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover. 
See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1659077372.git.mchehab@kernel.org/ drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 +- drivers/gpu/drm/i915/gt/intel_gt.c | 168 +------------------- drivers/gpu/drm/i915/gt/intel_gt.h | 12 -- drivers/gpu/drm/i915/gt/intel_tlb.c | 183 ++++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_tlb.h | 29 ++++ drivers/gpu/drm/i915/i915_vma.c | 1 + 7 files changed, 219 insertions(+), 179 deletions(-) create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 522ef9b4aff3..d3df9832d1f7 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -126,6 +126,7 @@ gt-y += \ gt/intel_sseu.o \ gt/intel_sseu_debugfs.o \ gt/intel_timeline.o \ + gt/intel_tlb.o \ gt/intel_workarounds.o \ gt/shmem_utils.o \ gt/sysfs_engines.o diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 8357dbdcab5c..1cd76cc5d9f3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -7,7 +7,7 @@ #include <drm/drm_cache.h> #include "gt/intel_gt.h" -#include "gt/intel_gt_pm.h" +#include "gt/intel_tlb.h" #include "i915_drv.h" #include "i915_gem_object.h" @@ -199,7 +199,7 @@ static void flush_tlb_invalidate(struct drm_i915_gem_object *obj) if (!obj->mm.tlb) return; - intel_gt_invalidate_tlb(gt, obj->mm.tlb); + intel_gt_invalidate_tlb_full(gt, obj->mm.tlb); obj->mm.tlb = 0; } diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index f435e06125aa..18d82cd620bd 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -11,9 +11,7 @@ #include "pxp/intel_pxp.h" #include "i915_drv.h" -#include "i915_perf_oa_regs.h" #include "intel_context.h" -#include "intel_engine_pm.h" #include "intel_engine_regs.h" #include 
"intel_ggtt_gmch.h" #include "intel_gt.h" @@ -31,6 +29,7 @@ #include "intel_renderstate.h" #include "intel_rps.h" #include "intel_gt_sysfs.h" +#include "intel_tlb.h" #include "intel_uncore.h" #include "shmem_utils.h" @@ -48,8 +47,7 @@ static void __intel_gt_init_early(struct intel_gt *gt) intel_gt_init_reset(gt); intel_gt_init_requests(gt); intel_gt_init_timelines(gt); - mutex_init(>->tlb.invalidate_lock); - seqcount_mutex_init(>->tlb.seqno, >->tlb.invalidate_lock); + intel_gt_init_tlb(gt); intel_gt_pm_init_early(gt); intel_uc_init_early(>->uc); @@ -770,7 +768,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915) intel_gt_fini_requests(gt); intel_gt_fini_reset(gt); intel_gt_fini_timelines(gt); - mutex_destroy(>->tlb.invalidate_lock); + intel_gt_fini_tlb(gt); intel_engines_free(gt); } } @@ -881,163 +879,3 @@ void intel_gt_info_print(const struct intel_gt_info *info, intel_sseu_dump(&info->sseu, p); } - -struct reg_and_bit { - i915_reg_t reg; - u32 bit; -}; - -static struct reg_and_bit -get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8, - const i915_reg_t *regs, const unsigned int num) -{ - const unsigned int class = engine->class; - struct reg_and_bit rb = { }; - - if (drm_WARN_ON_ONCE(&engine->i915->drm, - class >= num || !regs[class].reg)) - return rb; - - rb.reg = regs[class]; - if (gen8 && class == VIDEO_DECODE_CLASS) - rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */ - else - rb.bit = engine->instance; - - rb.bit = BIT(rb.bit); - - return rb; -} - -static void mmio_invalidate_full(struct intel_gt *gt) -{ - static const i915_reg_t gen8_regs[] = { - [RENDER_CLASS] = GEN8_RTCR, - [VIDEO_DECODE_CLASS] = GEN8_M1TCR, /* , GEN8_M2TCR */ - [VIDEO_ENHANCEMENT_CLASS] = GEN8_VTCR, - [COPY_ENGINE_CLASS] = GEN8_BTCR, - }; - static const i915_reg_t gen12_regs[] = { - [RENDER_CLASS] = GEN12_GFX_TLB_INV_CR, - [VIDEO_DECODE_CLASS] = GEN12_VD_TLB_INV_CR, - [VIDEO_ENHANCEMENT_CLASS] = GEN12_VE_TLB_INV_CR, - [COPY_ENGINE_CLASS] = 
GEN12_BLT_TLB_INV_CR, - [COMPUTE_CLASS] = GEN12_COMPCTX_TLB_INV_CR, - }; - struct drm_i915_private *i915 = gt->i915; - struct intel_uncore *uncore = gt->uncore; - struct intel_engine_cs *engine; - intel_engine_mask_t awake, tmp; - enum intel_engine_id id; - const i915_reg_t *regs; - unsigned int num = 0; - - if (GRAPHICS_VER(i915) == 12) { - regs = gen12_regs; - num = ARRAY_SIZE(gen12_regs); - } else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) { - regs = gen8_regs; - num = ARRAY_SIZE(gen8_regs); - } else if (GRAPHICS_VER(i915) < 8) { - return; - } - - if (drm_WARN_ONCE(&i915->drm, !num, - "Platform does not implement TLB invalidation!")) - return; - - intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL); - - spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */ - - awake = 0; - for_each_engine(engine, gt, id) { - struct reg_and_bit rb; - - if (!intel_engine_pm_is_awake(engine)) - continue; - - rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); - if (!i915_mmio_reg_offset(rb.reg)) - continue; - - intel_uncore_write_fw(uncore, rb.reg, rb.bit); - awake |= engine->mask; - } - - GT_TRACE(gt, "invalidated engines %08x\n", awake); - - /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */ - if (awake && - (IS_TIGERLAKE(i915) || - IS_DG1(i915) || - IS_ROCKETLAKE(i915) || - IS_ALDERLAKE_S(i915) || - IS_ALDERLAKE_P(i915))) - intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1); - - spin_unlock_irq(&uncore->lock); - - for_each_engine_masked(engine, gt, awake, tmp) { - struct reg_and_bit rb; - - /* - * HW architecture suggest typical invalidation time at 40us, - * with pessimistic cases up to 100us and a recommendation to - * cap at 1ms. We go a bit higher just in case. 
- */ - const unsigned int timeout_us = 100; - const unsigned int timeout_ms = 4; - - rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); - if (__intel_wait_for_register_fw(uncore, - rb.reg, rb.bit, 0, - timeout_us, timeout_ms, - NULL)) - drm_err_ratelimited(&gt->i915->drm, - "%s TLB invalidation did not complete in %ums!\n", - engine->name, timeout_ms); - } - - /* - * Use delayed put since a) we mostly expect a flurry of TLB - * invalidations so it is good to avoid paying the forcewake cost and - * b) it works around a bug in Icelake which cannot cope with too rapid - * transitions. - */ - intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL); -} - -static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno) -{ - u32 cur = intel_gt_tlb_seqno(gt); - - /* Only skip if a *full* TLB invalidate barrier has passed */ - return (s32)(cur - ALIGN(seqno, 2)) > 0; -} - -void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno) -{ - intel_wakeref_t wakeref; - - if (I915_SELFTEST_ONLY(gt->awake == -ENODEV)) - return; - - if (intel_gt_is_wedged(gt)) - return; - - if (tlb_seqno_passed(gt, seqno)) - return; - - with_intel_gt_pm_if_awake(gt, wakeref) { - mutex_lock(&gt->tlb.invalidate_lock); - if (tlb_seqno_passed(gt, seqno)) - goto unlock; - - mmio_invalidate_full(gt); - - write_seqcount_invalidate(&gt->tlb.seqno); -unlock: - mutex_unlock(&gt->tlb.invalidate_lock); - } -} diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index 40b06adf509a..b4bba16cdb53 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -101,16 +101,4 @@ void intel_gt_info_print(const struct intel_gt_info *info, void intel_gt_watchdog_work(struct work_struct *work); -static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) -{ - return seqprop_sequence(&gt->tlb.seqno); -} - -static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt) -{ - return intel_gt_tlb_seqno(gt) | 1; -} - -void
intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno); - #endif /* __INTEL_GT_H__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c new file mode 100644 index 000000000000..af8cae979489 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_tlb.c @@ -0,0 +1,183 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2022 Intel Corporation + */ + +#include "i915_drv.h" +#include "i915_perf_oa_regs.h" +#include "intel_engine_pm.h" +#include "intel_gt.h" +#include "intel_gt_pm.h" +#include "intel_gt_regs.h" +#include "intel_tlb.h" + +struct reg_and_bit { + i915_reg_t reg; + u32 bit; +}; + +static struct reg_and_bit +get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8, + const i915_reg_t *regs, const unsigned int num) +{ + const unsigned int class = engine->class; + struct reg_and_bit rb = { }; + + if (drm_WARN_ON_ONCE(&engine->i915->drm, + class >= num || !regs[class].reg)) + return rb; + + rb.reg = regs[class]; + if (gen8 && class == VIDEO_DECODE_CLASS) + rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */ + else + rb.bit = engine->instance; + + rb.bit = BIT(rb.bit); + + return rb; +} + +static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno) +{ + u32 cur = intel_gt_tlb_seqno(gt); + + /* Only skip if a *full* TLB invalidate barrier has passed */ + return (s32)(cur - ALIGN(seqno, 2)) > 0; +} + +static void mmio_invalidate_full(struct intel_gt *gt) +{ + static const i915_reg_t gen8_regs[] = { + [RENDER_CLASS] = GEN8_RTCR, + [VIDEO_DECODE_CLASS] = GEN8_M1TCR, /* , GEN8_M2TCR */ + [VIDEO_ENHANCEMENT_CLASS] = GEN8_VTCR, + [COPY_ENGINE_CLASS] = GEN8_BTCR, + }; + static const i915_reg_t gen12_regs[] = { + [RENDER_CLASS] = GEN12_GFX_TLB_INV_CR, + [VIDEO_DECODE_CLASS] = GEN12_VD_TLB_INV_CR, + [VIDEO_ENHANCEMENT_CLASS] = GEN12_VE_TLB_INV_CR, + [COPY_ENGINE_CLASS] = GEN12_BLT_TLB_INV_CR, + [COMPUTE_CLASS] = GEN12_COMPCTX_TLB_INV_CR, + }; + struct drm_i915_private *i915 = gt->i915; + struct intel_uncore *uncore = 
gt->uncore; + struct intel_engine_cs *engine; + intel_engine_mask_t awake, tmp; + enum intel_engine_id id; + const i915_reg_t *regs; + unsigned int num = 0; + + if (GRAPHICS_VER(i915) == 12) { + regs = gen12_regs; + num = ARRAY_SIZE(gen12_regs); + } else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) { + regs = gen8_regs; + num = ARRAY_SIZE(gen8_regs); + } else if (GRAPHICS_VER(i915) < 8) { + return; + } + + if (drm_WARN_ONCE(&i915->drm, !num, + "Platform does not implement TLB invalidation!")) + return; + + intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL); + + spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */ + + awake = 0; + for_each_engine(engine, gt, id) { + struct reg_and_bit rb; + + if (!intel_engine_pm_is_awake(engine)) + continue; + + rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); + if (!i915_mmio_reg_offset(rb.reg)) + continue; + + intel_uncore_write_fw(uncore, rb.reg, rb.bit); + awake |= engine->mask; + } + + GT_TRACE(gt, "invalidated engines %08x\n", awake); + + /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */ + if (awake && + (IS_TIGERLAKE(i915) || + IS_DG1(i915) || + IS_ROCKETLAKE(i915) || + IS_ALDERLAKE_S(i915) || + IS_ALDERLAKE_P(i915))) + intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1); + + spin_unlock_irq(&uncore->lock); + + for_each_engine_masked(engine, gt, awake, tmp) { + struct reg_and_bit rb; + + /* + * HW architecture suggest typical invalidation time at 40us, + * with pessimistic cases up to 100us and a recommendation to + * cap at 1ms. We go a bit higher just in case. 
+ */ + const unsigned int timeout_us = 100; + const unsigned int timeout_ms = 4; + + rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); + if (__intel_wait_for_register_fw(uncore, + rb.reg, rb.bit, 0, + timeout_us, timeout_ms, + NULL)) + drm_err_ratelimited(&gt->i915->drm, + "%s TLB invalidation did not complete in %ums!\n", + engine->name, timeout_ms); + } + + /* + * Use delayed put since a) we mostly expect a flurry of TLB + * invalidations so it is good to avoid paying the forcewake cost and + * b) it works around a bug in Icelake which cannot cope with too rapid + * transitions. + */ + intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL); +} + +void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) +{ + intel_wakeref_t wakeref; + + if (I915_SELFTEST_ONLY(gt->awake == -ENODEV)) + return; + + if (intel_gt_is_wedged(gt)) + return; + + if (tlb_seqno_passed(gt, seqno)) + return; + + with_intel_gt_pm_if_awake(gt, wakeref) { + mutex_lock(&gt->tlb.invalidate_lock); + if (tlb_seqno_passed(gt, seqno)) + goto unlock; + + mmio_invalidate_full(gt); + + write_seqcount_invalidate(&gt->tlb.seqno); +unlock: + mutex_unlock(&gt->tlb.invalidate_lock); + } +} + +void intel_gt_init_tlb(struct intel_gt *gt) +{ + mutex_init(&gt->tlb.invalidate_lock); + seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock); +} + +void intel_gt_fini_tlb(struct intel_gt *gt) +{ + mutex_destroy(&gt->tlb.invalidate_lock); +} diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h new file mode 100644 index 000000000000..46ce25bf5afe --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_tlb.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2022 Intel Corporation + */ + +#ifndef INTEL_TLB_H +#define INTEL_TLB_H + +#include <linux/seqlock.h> +#include <linux/types.h> + +#include "intel_gt_types.h" + +void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno); + +void intel_gt_init_tlb(struct intel_gt *gt); +void
intel_gt_fini_tlb(struct intel_gt *gt); + +static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) +{ + return seqprop_sequence(&gt->tlb.seqno); +} + +static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt) +{ + return intel_gt_tlb_seqno(gt) | 1; +} + +#endif /* INTEL_TLB_H */ diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 84a9ccbc5fc5..fe947d1456d5 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -33,6 +33,7 @@ #include "gt/intel_engine_heartbeat.h" #include "gt/intel_gt.h" #include "gt/intel_gt_requests.h" +#include "gt/intel_tlb.h" #include "i915_drv.h" #include "i915_gem_evict.h" -- 2.36.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Intel-gfx] [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file 2022-07-29 7:03 ` [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file Mauro Carvalho Chehab @ 2022-08-02 22:21 ` Niranjana Vishwanathapura 2022-08-03 9:19 ` Andi Shyti 1 sibling, 0 replies; 7+ messages in thread From: Niranjana Vishwanathapura @ 2022-08-02 22:21 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: Thomas Hellström, David Airlie, intel-gfx, Lucas De Marchi, linux-kernel, Chris Wilson, dri-devel, Rodrigo Vivi, Tomas Winkler, Matthew Auld On Fri, Jul 29, 2022 at 09:03:54AM +0200, Mauro Carvalho Chehab wrote: >From: Chris Wilson <chris.p.wilson@intel.com> > >Prepare for supporting more TLB invalidation scenarios by moving >the current MMIO invalidation to its own file. And looks like, 1. Rename intel_gt_invalidate_tlb() to intel_gt_invalidate_tlb_full() 2. Add intel_gt_init_tlb() and intel_gt_fini_tlb() abstracts. Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> > >Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> >Cc: Fei Yang <fei.yang@intel.com> >Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> >--- > >To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover. 
>See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1659077372.git.mchehab@kernel.org/ > > drivers/gpu/drm/i915/Makefile | 1 + > drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 +- > drivers/gpu/drm/i915/gt/intel_gt.c | 168 +------------------- > drivers/gpu/drm/i915/gt/intel_gt.h | 12 -- > drivers/gpu/drm/i915/gt/intel_tlb.c | 183 ++++++++++++++++++++++ > drivers/gpu/drm/i915/gt/intel_tlb.h | 29 ++++ > drivers/gpu/drm/i915/i915_vma.c | 1 + > 7 files changed, 219 insertions(+), 179 deletions(-) > create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c > create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h > >diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile >index 522ef9b4aff3..d3df9832d1f7 100644 >--- a/drivers/gpu/drm/i915/Makefile >+++ b/drivers/gpu/drm/i915/Makefile >@@ -126,6 +126,7 @@ gt-y += \ > gt/intel_sseu.o \ > gt/intel_sseu_debugfs.o \ > gt/intel_timeline.o \ >+ gt/intel_tlb.o \ > gt/intel_workarounds.o \ > gt/shmem_utils.o \ > gt/sysfs_engines.o >diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c >index 8357dbdcab5c..1cd76cc5d9f3 100644 >--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c >+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c >@@ -7,7 +7,7 @@ > #include <drm/drm_cache.h> > > #include "gt/intel_gt.h" >-#include "gt/intel_gt_pm.h" >+#include "gt/intel_tlb.h" > > #include "i915_drv.h" > #include "i915_gem_object.h" >@@ -199,7 +199,7 @@ static void flush_tlb_invalidate(struct drm_i915_gem_object *obj) > if (!obj->mm.tlb) > return; > >- intel_gt_invalidate_tlb(gt, obj->mm.tlb); >+ intel_gt_invalidate_tlb_full(gt, obj->mm.tlb); > obj->mm.tlb = 0; > } > >diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c >index f435e06125aa..18d82cd620bd 100644 >--- a/drivers/gpu/drm/i915/gt/intel_gt.c >+++ b/drivers/gpu/drm/i915/gt/intel_gt.c >@@ -11,9 +11,7 @@ > #include "pxp/intel_pxp.h" > > #include "i915_drv.h" >-#include "i915_perf_oa_regs.h" > #include 
"intel_context.h" >-#include "intel_engine_pm.h" > #include "intel_engine_regs.h" > #include "intel_ggtt_gmch.h" > #include "intel_gt.h" >@@ -31,6 +29,7 @@ > #include "intel_renderstate.h" > #include "intel_rps.h" > #include "intel_gt_sysfs.h" >+#include "intel_tlb.h" > #include "intel_uncore.h" > #include "shmem_utils.h" > >@@ -48,8 +47,7 @@ static void __intel_gt_init_early(struct intel_gt *gt) > intel_gt_init_reset(gt); > intel_gt_init_requests(gt); > intel_gt_init_timelines(gt); >- mutex_init(>->tlb.invalidate_lock); >- seqcount_mutex_init(>->tlb.seqno, >->tlb.invalidate_lock); >+ intel_gt_init_tlb(gt); > intel_gt_pm_init_early(gt); > > intel_uc_init_early(>->uc); >@@ -770,7 +768,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915) > intel_gt_fini_requests(gt); > intel_gt_fini_reset(gt); > intel_gt_fini_timelines(gt); >- mutex_destroy(>->tlb.invalidate_lock); >+ intel_gt_fini_tlb(gt); > intel_engines_free(gt); > } > } >@@ -881,163 +879,3 @@ void intel_gt_info_print(const struct intel_gt_info *info, > > intel_sseu_dump(&info->sseu, p); > } >- >-struct reg_and_bit { >- i915_reg_t reg; >- u32 bit; >-}; >- >-static struct reg_and_bit >-get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8, >- const i915_reg_t *regs, const unsigned int num) >-{ >- const unsigned int class = engine->class; >- struct reg_and_bit rb = { }; >- >- if (drm_WARN_ON_ONCE(&engine->i915->drm, >- class >= num || !regs[class].reg)) >- return rb; >- >- rb.reg = regs[class]; >- if (gen8 && class == VIDEO_DECODE_CLASS) >- rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */ >- else >- rb.bit = engine->instance; >- >- rb.bit = BIT(rb.bit); >- >- return rb; >-} >- >-static void mmio_invalidate_full(struct intel_gt *gt) >-{ >- static const i915_reg_t gen8_regs[] = { >- [RENDER_CLASS] = GEN8_RTCR, >- [VIDEO_DECODE_CLASS] = GEN8_M1TCR, /* , GEN8_M2TCR */ >- [VIDEO_ENHANCEMENT_CLASS] = GEN8_VTCR, >- [COPY_ENGINE_CLASS] = GEN8_BTCR, >- }; >- static const i915_reg_t 
gen12_regs[] = { >- [RENDER_CLASS] = GEN12_GFX_TLB_INV_CR, >- [VIDEO_DECODE_CLASS] = GEN12_VD_TLB_INV_CR, >- [VIDEO_ENHANCEMENT_CLASS] = GEN12_VE_TLB_INV_CR, >- [COPY_ENGINE_CLASS] = GEN12_BLT_TLB_INV_CR, >- [COMPUTE_CLASS] = GEN12_COMPCTX_TLB_INV_CR, >- }; >- struct drm_i915_private *i915 = gt->i915; >- struct intel_uncore *uncore = gt->uncore; >- struct intel_engine_cs *engine; >- intel_engine_mask_t awake, tmp; >- enum intel_engine_id id; >- const i915_reg_t *regs; >- unsigned int num = 0; >- >- if (GRAPHICS_VER(i915) == 12) { >- regs = gen12_regs; >- num = ARRAY_SIZE(gen12_regs); >- } else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) { >- regs = gen8_regs; >- num = ARRAY_SIZE(gen8_regs); >- } else if (GRAPHICS_VER(i915) < 8) { >- return; >- } >- >- if (drm_WARN_ONCE(&i915->drm, !num, >- "Platform does not implement TLB invalidation!")) >- return; >- >- intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL); >- >- spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */ >- >- awake = 0; >- for_each_engine(engine, gt, id) { >- struct reg_and_bit rb; >- >- if (!intel_engine_pm_is_awake(engine)) >- continue; >- >- rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); >- if (!i915_mmio_reg_offset(rb.reg)) >- continue; >- >- intel_uncore_write_fw(uncore, rb.reg, rb.bit); >- awake |= engine->mask; >- } >- >- GT_TRACE(gt, "invalidated engines %08x\n", awake); >- >- /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */ >- if (awake && >- (IS_TIGERLAKE(i915) || >- IS_DG1(i915) || >- IS_ROCKETLAKE(i915) || >- IS_ALDERLAKE_S(i915) || >- IS_ALDERLAKE_P(i915))) >- intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1); >- >- spin_unlock_irq(&uncore->lock); >- >- for_each_engine_masked(engine, gt, awake, tmp) { >- struct reg_and_bit rb; >- >- /* >- * HW architecture suggest typical invalidation time at 40us, >- * with pessimistic cases up to 100us and a recommendation to >- * cap at 1ms. We go a bit higher just in case. 
>- */ >- const unsigned int timeout_us = 100; >- const unsigned int timeout_ms = 4; >- >- rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); >- if (__intel_wait_for_register_fw(uncore, >- rb.reg, rb.bit, 0, >- timeout_us, timeout_ms, >- NULL)) >- drm_err_ratelimited(>->i915->drm, >- "%s TLB invalidation did not complete in %ums!\n", >- engine->name, timeout_ms); >- } >- >- /* >- * Use delayed put since a) we mostly expect a flurry of TLB >- * invalidations so it is good to avoid paying the forcewake cost and >- * b) it works around a bug in Icelake which cannot cope with too rapid >- * transitions. >- */ >- intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL); >-} >- >-static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno) >-{ >- u32 cur = intel_gt_tlb_seqno(gt); >- >- /* Only skip if a *full* TLB invalidate barrier has passed */ >- return (s32)(cur - ALIGN(seqno, 2)) > 0; >-} >- >-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno) >-{ >- intel_wakeref_t wakeref; >- >- if (I915_SELFTEST_ONLY(gt->awake == -ENODEV)) >- return; >- >- if (intel_gt_is_wedged(gt)) >- return; >- >- if (tlb_seqno_passed(gt, seqno)) >- return; >- >- with_intel_gt_pm_if_awake(gt, wakeref) { >- mutex_lock(>->tlb.invalidate_lock); >- if (tlb_seqno_passed(gt, seqno)) >- goto unlock; >- >- mmio_invalidate_full(gt); >- >- write_seqcount_invalidate(>->tlb.seqno); >-unlock: >- mutex_unlock(>->tlb.invalidate_lock); >- } >-} >diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h >index 40b06adf509a..b4bba16cdb53 100644 >--- a/drivers/gpu/drm/i915/gt/intel_gt.h >+++ b/drivers/gpu/drm/i915/gt/intel_gt.h >@@ -101,16 +101,4 @@ void intel_gt_info_print(const struct intel_gt_info *info, > > void intel_gt_watchdog_work(struct work_struct *work); > >-static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) >-{ >- return seqprop_sequence(>->tlb.seqno); >-} >- >-static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt 
*gt) >-{ >- return intel_gt_tlb_seqno(gt) | 1; >-} >- >-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno); >- > #endif /* __INTEL_GT_H__ */ >diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c >new file mode 100644 >index 000000000000..af8cae979489 >--- /dev/null >+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c >@@ -0,0 +1,183 @@ >+// SPDX-License-Identifier: MIT >+/* >+ * Copyright © 2022 Intel Corporation >+ */ >+ >+#include "i915_drv.h" >+#include "i915_perf_oa_regs.h" >+#include "intel_engine_pm.h" >+#include "intel_gt.h" >+#include "intel_gt_pm.h" >+#include "intel_gt_regs.h" >+#include "intel_tlb.h" >+ >+struct reg_and_bit { >+ i915_reg_t reg; >+ u32 bit; >+}; >+ >+static struct reg_and_bit >+get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8, >+ const i915_reg_t *regs, const unsigned int num) >+{ >+ const unsigned int class = engine->class; >+ struct reg_and_bit rb = { }; >+ >+ if (drm_WARN_ON_ONCE(&engine->i915->drm, >+ class >= num || !regs[class].reg)) >+ return rb; >+ >+ rb.reg = regs[class]; >+ if (gen8 && class == VIDEO_DECODE_CLASS) >+ rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */ >+ else >+ rb.bit = engine->instance; >+ >+ rb.bit = BIT(rb.bit); >+ >+ return rb; >+} >+ >+static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno) >+{ >+ u32 cur = intel_gt_tlb_seqno(gt); >+ >+ /* Only skip if a *full* TLB invalidate barrier has passed */ >+ return (s32)(cur - ALIGN(seqno, 2)) > 0; >+} >+ >+static void mmio_invalidate_full(struct intel_gt *gt) >+{ >+ static const i915_reg_t gen8_regs[] = { >+ [RENDER_CLASS] = GEN8_RTCR, >+ [VIDEO_DECODE_CLASS] = GEN8_M1TCR, /* , GEN8_M2TCR */ >+ [VIDEO_ENHANCEMENT_CLASS] = GEN8_VTCR, >+ [COPY_ENGINE_CLASS] = GEN8_BTCR, >+ }; >+ static const i915_reg_t gen12_regs[] = { >+ [RENDER_CLASS] = GEN12_GFX_TLB_INV_CR, >+ [VIDEO_DECODE_CLASS] = GEN12_VD_TLB_INV_CR, >+ [VIDEO_ENHANCEMENT_CLASS] = GEN12_VE_TLB_INV_CR, >+ [COPY_ENGINE_CLASS] = 
GEN12_BLT_TLB_INV_CR, >+ [COMPUTE_CLASS] = GEN12_COMPCTX_TLB_INV_CR, >+ }; >+ struct drm_i915_private *i915 = gt->i915; >+ struct intel_uncore *uncore = gt->uncore; >+ struct intel_engine_cs *engine; >+ intel_engine_mask_t awake, tmp; >+ enum intel_engine_id id; >+ const i915_reg_t *regs; >+ unsigned int num = 0; >+ >+ if (GRAPHICS_VER(i915) == 12) { >+ regs = gen12_regs; >+ num = ARRAY_SIZE(gen12_regs); >+ } else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) { >+ regs = gen8_regs; >+ num = ARRAY_SIZE(gen8_regs); >+ } else if (GRAPHICS_VER(i915) < 8) { >+ return; >+ } >+ >+ if (drm_WARN_ONCE(&i915->drm, !num, >+ "Platform does not implement TLB invalidation!")) >+ return; >+ >+ intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL); >+ >+ spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */ >+ >+ awake = 0; >+ for_each_engine(engine, gt, id) { >+ struct reg_and_bit rb; >+ >+ if (!intel_engine_pm_is_awake(engine)) >+ continue; >+ >+ rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); >+ if (!i915_mmio_reg_offset(rb.reg)) >+ continue; >+ >+ intel_uncore_write_fw(uncore, rb.reg, rb.bit); >+ awake |= engine->mask; >+ } >+ >+ GT_TRACE(gt, "invalidated engines %08x\n", awake); >+ >+ /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */ >+ if (awake && >+ (IS_TIGERLAKE(i915) || >+ IS_DG1(i915) || >+ IS_ROCKETLAKE(i915) || >+ IS_ALDERLAKE_S(i915) || >+ IS_ALDERLAKE_P(i915))) >+ intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1); >+ >+ spin_unlock_irq(&uncore->lock); >+ >+ for_each_engine_masked(engine, gt, awake, tmp) { >+ struct reg_and_bit rb; >+ >+ /* >+ * HW architecture suggest typical invalidation time at 40us, >+ * with pessimistic cases up to 100us and a recommendation to >+ * cap at 1ms. We go a bit higher just in case. 
>+ */ >+ const unsigned int timeout_us = 100; >+ const unsigned int timeout_ms = 4; >+ >+ rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); >+ if (__intel_wait_for_register_fw(uncore, >+ rb.reg, rb.bit, 0, >+ timeout_us, timeout_ms, >+ NULL)) >+ drm_err_ratelimited(>->i915->drm, >+ "%s TLB invalidation did not complete in %ums!\n", >+ engine->name, timeout_ms); >+ } >+ >+ /* >+ * Use delayed put since a) we mostly expect a flurry of TLB >+ * invalidations so it is good to avoid paying the forcewake cost and >+ * b) it works around a bug in Icelake which cannot cope with too rapid >+ * transitions. >+ */ >+ intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL); >+} >+ >+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) >+{ >+ intel_wakeref_t wakeref; >+ >+ if (I915_SELFTEST_ONLY(gt->awake == -ENODEV)) >+ return; >+ >+ if (intel_gt_is_wedged(gt)) >+ return; >+ >+ if (tlb_seqno_passed(gt, seqno)) >+ return; >+ >+ with_intel_gt_pm_if_awake(gt, wakeref) { >+ mutex_lock(>->tlb.invalidate_lock); >+ if (tlb_seqno_passed(gt, seqno)) >+ goto unlock; >+ >+ mmio_invalidate_full(gt); >+ >+ write_seqcount_invalidate(>->tlb.seqno); >+unlock: >+ mutex_unlock(>->tlb.invalidate_lock); >+ } >+} >+ >+void intel_gt_init_tlb(struct intel_gt *gt) >+{ >+ mutex_init(>->tlb.invalidate_lock); >+ seqcount_mutex_init(>->tlb.seqno, >->tlb.invalidate_lock); >+} >+ >+void intel_gt_fini_tlb(struct intel_gt *gt) >+{ >+ mutex_destroy(>->tlb.invalidate_lock); >+} >diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h >new file mode 100644 >index 000000000000..46ce25bf5afe >--- /dev/null >+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h >@@ -0,0 +1,29 @@ >+/* SPDX-License-Identifier: MIT */ >+/* >+ * Copyright © 2022 Intel Corporation >+ */ >+ >+#ifndef INTEL_TLB_H >+#define INTEL_TLB_H >+ >+#include <linux/seqlock.h> >+#include <linux/types.h> >+ >+#include "intel_gt_types.h" >+ >+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 
seqno); >+ >+void intel_gt_init_tlb(struct intel_gt *gt); >+void intel_gt_fini_tlb(struct intel_gt *gt); >+ >+static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) >+{ >+ return seqprop_sequence(&gt->tlb.seqno); >+} >+ >+static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt) >+{ >+ return intel_gt_tlb_seqno(gt) | 1; >+} >+ >+#endif /* INTEL_TLB_H */ >diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c >index 84a9ccbc5fc5..fe947d1456d5 100644 >--- a/drivers/gpu/drm/i915/i915_vma.c >+++ b/drivers/gpu/drm/i915/i915_vma.c >@@ -33,6 +33,7 @@ > #include "gt/intel_engine_heartbeat.h" > #include "gt/intel_gt.h" > #include "gt/intel_gt_requests.h" >+#include "gt/intel_tlb.h" > > #include "i915_drv.h" > #include "i915_gem_evict.h" >-- >2.36.1 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file 2022-07-29 7:03 ` [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file Mauro Carvalho Chehab 2022-08-02 22:21 ` [Intel-gfx] " Niranjana Vishwanathapura @ 2022-08-03 9:19 ` Andi Shyti 1 sibling, 0 replies; 7+ messages in thread From: Andi Shyti @ 2022-08-03 9:19 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: Thomas Hellström, Andi Shyti, Tvrtko Ursulin, David Airlie, dri-devel, Casey Bowman, Lucas De Marchi, linux-kernel, Chris Wilson, Daniele Ceraolo Spurio, Rodrigo Vivi, Tomas Winkler, intel-gfx, Fei Yang, Matthew Auld Hi Mauro, On Fri, Jul 29, 2022 at 09:03:54AM +0200, Mauro Carvalho Chehab wrote: > From: Chris Wilson <chris.p.wilson@intel.com> > > Prepare for supporting more TLB invalidation scenarios by moving > the current MMIO invalidation to its own file. > > Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> > Cc: Fei Yang <fei.yang@intel.com> > Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> I already reviewed this patch... anyway I checked it again and it's all correct. Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Thanks, Andi ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions 2022-07-29 7:03 [PATCH v2 0/2] Move TLB invalidation code for its own file and document it Mauro Carvalho Chehab 2022-07-29 7:03 ` [PATCH v2 1/2] drm/i915/gt: Move TLB invalidation to its own file Mauro Carvalho Chehab @ 2022-07-29 7:03 ` Mauro Carvalho Chehab 2022-08-02 22:30 ` [Intel-gfx] " Niranjana Vishwanathapura 1 sibling, 1 reply; 7+ messages in thread From: Mauro Carvalho Chehab @ 2022-07-29 7:03 UTC (permalink / raw) Cc: Tvrtko Ursulin, Thomas Zimmermann, Jonathan Corbet, David Airlie, dri-devel, linux-kernel, linux-doc, Chris Wilson, Rodrigo Vivi, Mauro Carvalho Chehab, intel-gfx Add a description for the TLB cache invalidation algorithm and for the related kAPI functions. Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> --- To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover. See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1659077372.git.mchehab@kernel.org/ Documentation/gpu/i915.rst | 7 ++ drivers/gpu/drm/i915/gt/intel_tlb.c | 25 +++++++ drivers/gpu/drm/i915/gt/intel_tlb.h | 101 ++++++++++++++++++++++++++++ 3 files changed, 133 insertions(+) diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst index 4e59db1cfb00..46911fdd79e8 100644 --- a/Documentation/gpu/i915.rst +++ b/Documentation/gpu/i915.rst @@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model) .. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c :internal: +TLB cache invalidation +---------------------- + +.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h + +.. 
kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c + Workarounds ----------- diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c index af8cae979489..4873b7ecc015 100644 --- a/drivers/gpu/drm/i915/gt/intel_tlb.c +++ b/drivers/gpu/drm/i915/gt/intel_tlb.c @@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt) intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL); } +/** + * intel_gt_invalidate_tlb_full - do full TLB cache invalidation + * @gt: GT structure + * @seqno: sequence number + * + * Do a full TLB cache invalidation if the @seqno is bigger than the last + * full TLB cache invalidation. + * + * Note: + * The TLB cache invalidation logic depends on GEN-specific registers. + * It currently supports MMIO-based TLB flush for GEN8 to GEN12. + */ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) { intel_wakeref_t wakeref; @@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) } } +/** + * intel_gt_init_tlb - initialize TLB-specific vars + * @gt: GT structure + * + * TLB cache invalidation logic internally uses some resources that require + * initialization. Should be called before doing any TLB cache invalidation. + */ void intel_gt_init_tlb(struct intel_gt *gt) { mutex_init(&gt->tlb.invalidate_lock); seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock); } +/** + * intel_gt_fini_tlb - initialize TLB-specific vars + * @gt: GT structure + * + * Frees any resources needed by TLB cache invalidation logic. 
+ */ void intel_gt_fini_tlb(struct intel_gt *gt) { mutex_destroy(&gt->tlb.invalidate_lock); } diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h index 46ce25bf5afe..dca70c33bd61 100644 --- a/drivers/gpu/drm/i915/gt/intel_tlb.h +++ b/drivers/gpu/drm/i915/gt/intel_tlb.h @@ -11,16 +11,117 @@ #include "intel_gt_types.h" +/** + * DOC: TLB cache invalidation logic + * + * The way the current algorithm works is that a struct drm_i915_gem_object can + * be created in any order. At unbind/evict time, it is guaranteed that the + * object won't be used anymore. So, a sequence number provided by + * intel_gt_next_invalidate_tlb_full() is stored on it. This can happen either + * at __vma_put_pages() - for VMA sync unbind, or at ppgtt_unbind_vma() - for + * VMA async unbind. + * + * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called, + * where it checks if the sequence number of the object was already invalidated + * or not. If not, it flushes the TLB and increments the sequence number:: + * + * void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) + * { + * ... + * with_intel_gt_pm_if_awake(gt, wakeref) { + * mutex_lock(&gt->tlb.invalidate_lock); + * if (tlb_seqno_passed(gt, seqno)) + * goto unlock; + * + * // Some code to do TLB invalidation + * ... + * + * write_seqcount_invalidate(&gt->tlb.seqno); // increment seqno + * mutex_unlock(&gt->tlb.invalidate_lock); + * } + * + * So, let's say the current seqno is 2 and 3 new objects were created, + * in this order:: + * + * obj1 + * obj2 + * obj3 + * + * They can be unbound/evicted in a different order. At unbind/evict time, + * the mm.tlb will be stamped with the sequence number, using the number + * from the last TLB flush, plus 1. + * + * Different threads may be used on unbind/evict and/or unset pages. 
+ * As the logic at void intel_gt_invalidate_tlb_full() is protected by a mutex, + * for simplicity, let's consider just two threads: + * + * +-------------------+-------------------------+---------------------------------+ + * | sequence number | Thread 0 | Thread 1 + + * +===================+=========================+=================================+ + * | seqno=2 | | | + * | +-------------------------+---------------------------------+ + * | | unbind/evict obj3. | | + * | | | | + * | | obj3.mm.tlb = seqno | 1 | | + * | | // obj3.mm.tlb = 3 | | + * | +-------------------------+---------------------------------+ + * | | unbind/evict obj1. | | + * | | | | + * | | obj1.mm.tlb = seqno | 1 | | + * | | // obj1.mm.tlb = 3 | | + * | +-------------------------+---------------------------------+ + * | | | __i915_gem_object_unset_pages() | + * | | | called for obj3 => TLB flush | + * | | | invalidating both obj1 and obj2.| + * | | | | + * | | | seqno += 2 | + * +-------------------+-------------------------+---------------------------------+ + * | seqno=4 | | | + * | +-------------------------+---------------------------------+ + * | | unbind/evict obj2. | | + * | | | | + * | | obj2.mm.tlb = seqno | 1 | | + * | | // obj2.mm.tlb = 5 | | + * | +-------------------------+---------------------------------+ + * | | | __i915_gem_object_unset_pages() | + * | | | called for obj1, don't flush | + * | | | as past flush invalidated obj1. | + * | +-------------------------+---------------------------------+ + * | | | __i915_gem_object_unset_pages() | + * | | | called for obj2 => TLB flush. | + * | | | invalidating obj2. 
| + * | | | | + * | | | seqno += 2 | + * +-------------------+-------------------------+---------------------------------+ + * | seqno=6 | | | + * +-------------------+-------------------------+---------------------------------+ + */ + void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno); void intel_gt_init_tlb(struct intel_gt *gt); void intel_gt_fini_tlb(struct intel_gt *gt); +/** + * intel_gt_tlb_seqno - Returns the current TLB invalidation sequence number + * + * @gt: GT structure + * + * There's no need to lock while calling it, as seqprop_sequence is thread-safe + */ static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) { return seqprop_sequence(&gt->tlb.seqno); } +/** + * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation + * sequence number + * + * @gt: GT structure + * + * There's no need to lock while calling it, as seqprop_sequence is thread-safe + */ static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt) { return intel_gt_tlb_seqno(gt) | 1; -- 2.36.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions 2022-07-29 7:03 ` [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions Mauro Carvalho Chehab @ 2022-08-02 22:30 ` Niranjana Vishwanathapura 2022-08-04 7:24 ` Mauro Carvalho Chehab 0 siblings, 1 reply; 7+ messages in thread From: Niranjana Vishwanathapura @ 2022-08-02 22:30 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: Chris Wilson, Jonathan Corbet, David Airlie, intel-gfx, linux-doc, linux-kernel, dri-devel, Thomas Zimmermann, Rodrigo Vivi On Fri, Jul 29, 2022 at 09:03:55AM +0200, Mauro Carvalho Chehab wrote: >Add a description for the TLB cache invalidation algorithm and for >the related kAPI functions. > >Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> >--- > >To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover. >See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1659077372.git.mchehab@kernel.org/ > > Documentation/gpu/i915.rst | 7 ++ > drivers/gpu/drm/i915/gt/intel_tlb.c | 25 +++++++ > drivers/gpu/drm/i915/gt/intel_tlb.h | 101 ++++++++++++++++++++++++++++ > 3 files changed, 133 insertions(+) > >diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst >index 4e59db1cfb00..46911fdd79e8 100644 >--- a/Documentation/gpu/i915.rst >+++ b/Documentation/gpu/i915.rst >@@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model) > .. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c > :internal: > >+TLB cache invalidation >+---------------------- >+ >+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h >+ >+.. 
kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c >+ > Workarounds > ----------- > >diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c >index af8cae979489..4873b7ecc015 100644 >--- a/drivers/gpu/drm/i915/gt/intel_tlb.c >+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c >@@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt) > intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL); > } > >+/** >+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation >+ * @gt: GT structure >+ * @seqno: sequence number >+ * >+ * Do a full TLB cache invalidation if the @seqno is bigger than the last >+ * full TLB cache invalidation. >+ * >+ * Note: >+ * The TLB cache invalidation logic depends on GEN-specific registers. >+ * It currently supports MMIO-based TLB flush for GEN8 to GEN12. >+ */ > void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) > { > intel_wakeref_t wakeref; >@@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) > } > } > >+/** >+ * intel_gt_init_tlb - initialize TLB-specific vars >+ * @gt: GT structure >+ * >+ * TLB cache invalidation logic internally uses some resources that require >+ * initialization. Should be called before doing any TLB cache invalidation. >+ */ > void intel_gt_init_tlb(struct intel_gt *gt) > { > mutex_init(&gt->tlb.invalidate_lock); > seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock); > } > >+/** >+ * intel_gt_fini_tlb - initialize TLB-specific vars Free TLB-specific vars >+ * @gt: GT structure >+ * >+ * Frees any resources needed by TLB cache invalidation logic. 
>+ */ > void intel_gt_fini_tlb(struct intel_gt *gt) > { > mutex_destroy(&gt->tlb.invalidate_lock); >diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h >index 46ce25bf5afe..dca70c33bd61 100644 >--- a/drivers/gpu/drm/i915/gt/intel_tlb.h >+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h >@@ -11,16 +11,117 @@ > > #include "intel_gt_types.h" > >+/** >+ * DOC: TLB cache invalidation logic >+ * >+ * The way the current algorithm works is that a struct drm_i915_gem_object can >+ * be created in any order. At unbind/evict time, it is guaranteed that the >+ * object won't be used anymore. So, a sequence number provided by >+ * intel_gt_next_invalidate_tlb_full() is stored on it. This can happen either >+ * at __vma_put_pages() - for VMA sync unbind, or at ppgtt_unbind_vma() - for >+ * VMA async unbind. >+ * >+ * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called, >+ * where it checks if the sequence number of the object was already invalidated >+ * or not. If not, it flushes the TLB and increments the sequence number:: >+ * >+ * void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) >+ * { >+ * ... >+ * with_intel_gt_pm_if_awake(gt, wakeref) { >+ * mutex_lock(&gt->tlb.invalidate_lock); >+ * if (tlb_seqno_passed(gt, seqno)) >+ * goto unlock; >+ * >+ * // Some code to do TLB invalidation >+ * ... >+ * >+ * write_seqcount_invalidate(&gt->tlb.seqno); // increment seqno >+ * mutex_unlock(&gt->tlb.invalidate_lock); >+ * } >+ * >+ * So, let's say the current seqno is 2 and 3 new objects were created, >+ * in this order:: >+ * >+ * obj1 >+ * obj2 >+ * obj3 >+ * >+ * They can be unbound/evicted in a different order. At unbind/evict time, >+ * the mm.tlb will be stamped with the sequence number, using the number >+ * from the last TLB flush, plus 1. I am trying to get my head around the below function. 
void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb) { WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt)); } Though we pass obj->mm.tlb for 'tlb' while calling this function, aren't we writing to local 'tlb' variable here instead of obj->mm.tlb? >+ * >+ * Different threads may be used on unbind/evict and/or unset pages. >+ * As the logic at void intel_gt_invalidate_tlb_full() is protected by a mutex, May be we can skip 'void' and just keep function name here. >+ * for simplicity, let's consider just two threads: >+ * >+ * +-------------------+-------------------------+---------------------------------+ >+ * | sequence number | Thread 0 | Thread 1 + >+ * +===================+=========================+=================================+ >+ * | seqno=2 | | | >+ * | +-------------------------+---------------------------------+ >+ * | | unbind/evict obj3. | | >+ * | | | | >+ * | | obj3.mm.tlb = seqno | 1 | | >+ * | | // obj3.mm.tlb = 3 | | >+ * | +-------------------------+---------------------------------+ >+ * | | unbind/evict obj1. | | >+ * | | | | >+ * | | obj1.mm.tlb = seqno | 1 | | >+ * | | // obj1.mm.tlb = 3 | | >+ * | +-------------------------+---------------------------------+ >+ * | | | __i915_gem_object_unset_pages() | >+ * | | | called for obj3 => TLB flush | >+ * | | | invalidating both obj1 and obj2.| >+ * | | | | >+ * | | | seqno += 2 | >+ * +-------------------+-------------------------+---------------------------------+ >+ * | seqno=4 | | | >+ * | +-------------------------+---------------------------------+ >+ * | | unbind/evict obj2. | | >+ * | | | | >+ * | | obj2.mm.tlb = seqno | 1 | | >+ * | | // obj2.mm.tlb = 5 | | >+ * | +-------------------------+---------------------------------+ >+ * | | | __i915_gem_object_unset_pages() | >+ * | | | called for obj1, don't flush | >+ * | | | as past flush invalidated obj1. 
| >+ * +-------------------------+---------------------------------+ >+ * | | | __i915_gem_object_unset_pages() | >+ * | | | called for obj2 => TLB flush. | >+ * | | | invalidating obj2. | >+ * | | | | >+ * | | | seqno += 2 | >+ * +-------------------+-------------------------+---------------------------------+ >+ * | seqno=6 | | | >+ * +-------------------+-------------------------+---------------------------------+ >+ */ >+ > void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno); > > void intel_gt_init_tlb(struct intel_gt *gt); > void intel_gt_fini_tlb(struct intel_gt *gt); > >+/** >+ * intel_gt_tlb_seqno - Returns the current TLB invalidation sequence number >+ * Probably this empty comment line needs to be removed before the parameter description below? >+ * @gt: GT structure >+ * >+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe >+ */ > static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) > { > return seqprop_sequence(&gt->tlb.seqno); > } > >+/** >+ * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation >+ * sequence number >+ * Same here. -Niranjana >+ * @gt: GT structure >+ * >+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe >+ */ > static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt) > { > return intel_gt_tlb_seqno(gt) | 1; >-- >2.36.1 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/gt: document TLB cache invalidation functions 2022-08-02 22:30 ` [Intel-gfx] " Niranjana Vishwanathapura @ 2022-08-04 7:24 ` Mauro Carvalho Chehab 0 siblings, 0 replies; 7+ messages in thread From: Mauro Carvalho Chehab @ 2022-08-04 7:24 UTC (permalink / raw) To: Niranjana Vishwanathapura Cc: linux-doc, David Airlie, intel-gfx, Jonathan Corbet, linux-kernel, Chris Wilson, dri-devel, Thomas Zimmermann, Rodrigo Vivi, Mauro Carvalho Chehab On Tue, 2 Aug 2022 15:30:44 -0700 Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> wrote: > On Fri, Jul 29, 2022 at 09:03:55AM +0200, Mauro Carvalho Chehab wrote: > >Add a description for the TLB cache invalidation algorithm and for > >the related kAPI functions. > > > >Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> > >--- > > > >To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover. > >See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1659077372.git.mchehab@kernel.org/ > > > > Documentation/gpu/i915.rst | 7 ++ > > drivers/gpu/drm/i915/gt/intel_tlb.c | 25 +++++++ > > drivers/gpu/drm/i915/gt/intel_tlb.h | 101 ++++++++++++++++++++++++++++ > > 3 files changed, 133 insertions(+) > > > >diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst > >index 4e59db1cfb00..46911fdd79e8 100644 > >--- a/Documentation/gpu/i915.rst > >+++ b/Documentation/gpu/i915.rst > >@@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model) > > .. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c > > :internal: > > > >+TLB cache invalidation > >+---------------------- > >+ > >+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h > >+ > >+.. 
kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c > >+ > > Workarounds > > ----------- > > > >diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c > >index af8cae979489..4873b7ecc015 100644 > >--- a/drivers/gpu/drm/i915/gt/intel_tlb.c > >+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c > >@@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt) > > intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL); > > } > > > >+/** > >+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation > >+ * @gt: GT structure > >+ * @seqno: sequence number > >+ * > >+ * Do a full TLB cache invalidation if the @seqno is bigger than the last > >+ * full TLB cache invalidation. > >+ * > >+ * Note: > >+ * The TLB cache invalidation logic depends on GEN-specific registers. > >+ * It currently supports MMIO-based TLB flush for GEN8 to GEN12. > >+ */ > > void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) > > { > > intel_wakeref_t wakeref; > >@@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) > > } > > } > > > >+/** > >+ * intel_gt_init_tlb - initialize TLB-specific vars > >+ * @gt: GT structure > >+ * > >+ * TLB cache invalidation logic internally uses some resources that require > >+ * initialization. Should be called before doing any TLB cache invalidation. > >+ */ > > void intel_gt_init_tlb(struct intel_gt *gt) > > { > > mutex_init(&gt->tlb.invalidate_lock); > > seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock); > > } > > > >+/** > >+ * intel_gt_fini_tlb - initialize TLB-specific vars > > Free TLB-specific vars OK. > > >+ * @gt: GT structure > >+ * > >+ * Frees any resources needed by TLB cache invalidation logic. 
> >+ */ > > void intel_gt_fini_tlb(struct intel_gt *gt) > > { > > mutex_destroy(&gt->tlb.invalidate_lock); > >diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h > >index 46ce25bf5afe..dca70c33bd61 100644 > >--- a/drivers/gpu/drm/i915/gt/intel_tlb.h > >+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h > >@@ -11,16 +11,117 @@ > > > > #include "intel_gt_types.h" > > > >+/** > >+ * DOC: TLB cache invalidation logic > >+ * > >+ * The way the current algorithm works is that a struct drm_i915_gem_object can > >+ * be created in any order. At unbind/evict time, it is guaranteed that the > >+ * object won't be used anymore. So, a sequence number provided by > >+ * intel_gt_next_invalidate_tlb_full() is stored on it. This can happen either > >+ * at __vma_put_pages() - for VMA sync unbind, or at ppgtt_unbind_vma() - for > >+ * VMA async unbind. > >+ * > >+ * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called, > >+ * where it checks if the sequence number of the object was already invalidated > >+ * or not. If not, it flushes the TLB and increments the sequence number:: > >+ * > >+ * void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) > >+ * { > >+ * ... > >+ * with_intel_gt_pm_if_awake(gt, wakeref) { > >+ * mutex_lock(&gt->tlb.invalidate_lock); > >+ * if (tlb_seqno_passed(gt, seqno)) > >+ * goto unlock; > >+ * > >+ * // Some code to do TLB invalidation > >+ * ... > >+ * > >+ * write_seqcount_invalidate(&gt->tlb.seqno); // increment seqno > >+ * mutex_unlock(&gt->tlb.invalidate_lock); > >+ * } > >+ * > >+ * So, let's say the current seqno is 2 and 3 new objects were created, > >+ * in this order:: > >+ * > >+ * obj1 > >+ * obj2 > >+ * obj3 > >+ * > >+ * They can be unbound/evicted in a different order. At unbind/evict time, > >+ * the mm.tlb will be stamped with the sequence number, using the number > >+ * from the last TLB flush, plus 1. > > I am trying to get my head around the below function. 
> > void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb) > { > WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt)); > } > > Though we pass obj->mm.tlb for 'tlb' while calling this function, > aren't we writing to local 'tlb' variable here instead of obj->mm.tlb? It should be passing a pointer. I wrote such fix after a review, but somehow it ended getting lost. I'll send the fix at v3. > >+ * > >+ * Different threads may be used on unbind/evict and/or unset pages. > >+ * As the logic at void intel_gt_invalidate_tlb_full() is protected by a mutex, > > May be we can skip 'void' and just keep function name here. Sure. > >+ * for simplicity, let's consider just two threads: > >+ * > >+ * +-------------------+-------------------------+---------------------------------+ > >+ * | sequence number | Thread 0 | Thread 1 + > >+ * +===================+=========================+=================================+ > >+ * | seqno=2 | | | > >+ * | +-------------------------+---------------------------------+ > >+ * | | unbind/evict obj3. | | > >+ * | | | | > >+ * | | obj3.mm.tlb = seqno | 1 | | > >+ * | | // obj3.mm.tlb = 3 | | > >+ * | +-------------------------+---------------------------------+ > >+ * | | unbind/evict obj1. | | > >+ * | | | | > >+ * | | obj1.mm.tlb = seqno | 1 | | > >+ * | | // obj1.mm.tlb = 3 | | > >+ * | +-------------------------+---------------------------------+ > >+ * | | | __i915_gem_object_unset_pages() | > >+ * | | | called for obj3 => TLB flush | > >+ * | | | invalidating both obj1 and obj2.| > >+ * | | | | > >+ * | | | seqno += 2 | > >+ * +-------------------+-------------------------+---------------------------------+ > >+ * | seqno=4 | | | > >+ * | +-------------------------+---------------------------------+ > >+ * | | unbind/evict obj2. 
| | > >+ * | | | | > >+ * | | obj2.mm.tlb = seqno | 1 | | > >+ * | | // obj2.mm.tlb = 5 | | > >+ * | +-------------------------+---------------------------------+ > >+ * | | | __i915_gem_object_unset_pages() | > >+ * | | | called for obj1, don't flush | > >+ * | | | as past flush invalidated obj1. | > >+ * | +-------------------------+---------------------------------+ > >+ * | | | __i915_gem_object_unset_pages() | > >+ * | | | called for obj2 => TLB flush. | > >+ * | | | invalidating obj2. | > >+ * | | | | > >+ * | | | seqno += 2 | > >+ * +-------------------+-------------------------+---------------------------------+ > >+ * | seqno=6 | | | > >+ * +-------------------+-------------------------+---------------------------------+ > >+ */ > >+ > > void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno); > > > > void intel_gt_init_tlb(struct intel_gt *gt); > > void intel_gt_fini_tlb(struct intel_gt *gt); > > > >+/** > >+ * intel_gt_tlb_seqno - Returns the current TLB invalidation sequence number > >+ * > > Probably this empty comment line needs to be removed before the parameter > description below? Kernel-doc actually accepts both with or without a blank line. My personal preference is to place a blank line, because sometimes the function description plus function name is bigger than one line. So, it is usually clearer when adding a blank line than doing something like this (perfectly valid kerneldoc markup): /** * long_function_name_foo - Lorem ipsum dolor sit amet, consectetur * adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore. * @bar: some parameter * ... But yeah, kernel-doc documentation example doesn't have a blank line. So, I'll drop it. 
> > >+ * @gt: GT structure > >+ * > >+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe > >+ */ > > static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) > > { > > return seqprop_sequence(&gt->tlb.seqno); > > } > > > >+/** > >+ * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation > >+ * sequence number > >+ * > > Same here. > > -Niranjana > > >+ * @gt: GT structure > >+ * > >+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe > >+ */ > > static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt) > > { > > return intel_gt_tlb_seqno(gt) | 1; > >-- > >2.36.1 > > Thanks! Mauro ^ permalink raw reply [flat|nested] 7+ messages in thread