From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: "Nerlige Ramappa, Umesh" <umesh.nerlige.ramappa@intel.com>,
intel-gfx@lists.freedesktop.org, John.C.Harrison@Intel.com
Subject: Re: [Intel-gfx] [PATCH] i915/pmu: Wire GuC backend to per-client busyness
Date: Tue, 14 Jun 2022 14:30:42 +0100 [thread overview]
Message-ID: <2d4f5fd5-a2d2-68ac-d7ae-bc81b14327bc@linux.intel.com> (raw)
In-Reply-To: <20220614004616.45561-1-umesh.nerlige.ramappa@intel.com>
On 14/06/2022 01:46, Nerlige Ramappa, Umesh wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> GuC provides engine_id and last_switch_in ticks for an active context in the
> pphwsp. The context image provides a 32 bit total ticks which is the accumulated
> by the context (a.k.a. context[CTX_TIMESTAMP]). This information is used to
> calculate the context busyness as follows:
>
> If the engine_id is valid, then busyness is the sum of accumulated total ticks
> and active ticks. Active ticks is calculated with current gt time as reference.
>
> If engine_id is invalid, busyness is equal to accumulated total ticks.
>
> Since KMD (CPU) retrieves busyness data from 2 sources - GPU and GuC, a
> potential race was highlighted in an earlier review that can lead to double
> accounting of busyness. While the solution to this is a wip, busyness is still
> usable for platforms running GuC submission.
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_context.c | 11 +++-
> drivers/gpu/drm/i915/gt/intel_context.h | 6 +-
> drivers/gpu/drm/i915/gt/intel_context_types.h | 3 +
> drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 5 ++
> .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 55 ++++++++++++++++++-
> drivers/gpu/drm/i915/i915_drm_client.c | 6 +-
> 6 files changed, 75 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 4070cb5711d8..a49f313db911 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -576,16 +576,23 @@ void intel_context_bind_parent_child(struct intel_context *parent,
> child->parallel.parent = parent;
> }
>
> -u64 intel_context_get_total_runtime_ns(const struct intel_context *ce)
> +u64 intel_context_get_total_runtime_ns(struct intel_context *ce)
> {
> u64 total, active;
>
> + if (ce->ops->update_stats)
> + ce->ops->update_stats(ce);
> +
> total = ce->stats.runtime.total;
> if (ce->ops->flags & COPS_RUNTIME_CYCLES)
> total *= ce->engine->gt->clock_period_ns;
>
> active = READ_ONCE(ce->stats.active);
> - if (active)
> + /*
> + * GuC backend returns the actual time the context was active, so skip
> + * the calculation here for GuC.
> + */
> + if (active && !intel_engine_uses_guc(ce->engine))
What is the point of looking at ce->stats.active in GuC mode? I see that
guc_context_update_stats/__guc_context_update_clks touches it, but I
can't spot that there is a purpose to it. This is the only conditional
reading it but it is short-circuited in GuC case.
Also, since a GuC only vfunc (update_stats) has been added, I wonder why
not just fork the whole runtime query (ce->get_total_runtime_ns). I
think that would end up cleaner.
> active = intel_context_clock() - active;
>
> return total + active;
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> index b7d3214d2cdd..5fc7c19ab29b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -56,7 +56,7 @@ static inline bool intel_context_is_parent(struct intel_context *ce)
> return !!ce->parallel.number_children;
> }
>
> -static inline bool intel_context_is_pinned(struct intel_context *ce);
> +static inline bool intel_context_is_pinned(const struct intel_context *ce);
>
> static inline struct intel_context *
> intel_context_to_parent(struct intel_context *ce)
> @@ -116,7 +116,7 @@ static inline int intel_context_lock_pinned(struct intel_context *ce)
> * Returns: true if the context is currently pinned for use by the GPU.
> */
> static inline bool
> -intel_context_is_pinned(struct intel_context *ce)
> +intel_context_is_pinned(const struct intel_context *ce)
> {
> return atomic_read(&ce->pin_count);
> }
> @@ -351,7 +351,7 @@ intel_context_clear_nopreempt(struct intel_context *ce)
> clear_bit(CONTEXT_NOPREEMPT, &ce->flags);
> }
>
> -u64 intel_context_get_total_runtime_ns(const struct intel_context *ce);
> +u64 intel_context_get_total_runtime_ns(struct intel_context *ce);
> u64 intel_context_get_avg_runtime_ns(struct intel_context *ce);
>
> static inline u64 intel_context_clock(void)
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 09f82545789f..0a3290c99a31 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -55,6 +55,8 @@ struct intel_context_ops {
>
> void (*sched_disable)(struct intel_context *ce);
>
> + void (*update_stats)(struct intel_context *ce);
> +
> void (*reset)(struct intel_context *ce);
> void (*destroy)(struct kref *kref);
>
> @@ -146,6 +148,7 @@ struct intel_context {
> struct ewma_runtime avg;
> u64 total;
> u32 last;
> + u64 start_gt_clk;
> I915_SELFTEST_DECLARE(u32 num_underflow);
> I915_SELFTEST_DECLARE(u32 max_underflow);
> } runtime;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> index b3c9a9327f76..6231ad03e4eb 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> @@ -196,6 +196,11 @@ static inline u8 guc_class_to_engine_class(u8 guc_class)
> return guc_class_engine_class_map[guc_class];
> }
>
> +/* Per context engine usage stats: */
> +#define PPHWSP_GUC_CONTEXT_USAGE_STAMP_LO (0x500 / sizeof(u32))
> +#define PPHWSP_GUC_CONTEXT_USAGE_STAMP_HI (PPHWSP_GUC_CONTEXT_USAGE_STAMP_LO + 1)
> +#define PPHWSP_GUC_CONTEXT_USAGE_ENGINE_ID (PPHWSP_GUC_CONTEXT_USAGE_STAMP_HI + 1)
> +
> /* Work item for submitting workloads into work queue of GuC. */
> struct guc_wq_item {
> u32 header;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 5a1dfacf24ea..b86401144417 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -378,7 +378,7 @@ static inline void set_context_guc_id_invalid(struct intel_context *ce)
> ce->guc_id.id = GUC_INVALID_CONTEXT_ID;
> }
>
> -static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> +static inline struct intel_guc *ce_to_guc(const struct intel_context *ce)
> {
> return &ce->engine->gt->uc.guc;
> }
> @@ -1323,13 +1323,16 @@ static void __update_guc_busyness_stats(struct intel_guc *guc)
> spin_unlock_irqrestore(&guc->timestamp.lock, flags);
> }
>
> +static void __guc_context_update_clks(struct intel_context *ce);
> static void guc_timestamp_ping(struct work_struct *wrk)
> {
> struct intel_guc *guc = container_of(wrk, typeof(*guc),
> timestamp.work.work);
> struct intel_uc *uc = container_of(guc, typeof(*uc), guc);
> struct intel_gt *gt = guc_to_gt(guc);
> + struct intel_context *ce;
> intel_wakeref_t wakeref;
> + unsigned long index;
> int srcu, ret;
>
> /*
> @@ -1343,6 +1346,10 @@ static void guc_timestamp_ping(struct work_struct *wrk)
> with_intel_runtime_pm(>->i915->runtime_pm, wakeref)
> __update_guc_busyness_stats(guc);
>
> + /* adjust context stats for overflow */
> + xa_for_each(&guc->context_lookup, index, ce)
> + __guc_context_update_clks(ce);
> +
> intel_gt_reset_unlock(gt, srcu);
>
> mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
> @@ -1405,6 +1412,48 @@ void intel_guc_busyness_unpark(struct intel_gt *gt)
> guc->timestamp.ping_delay);
> }
>
> +static void __guc_context_update_clks(struct intel_context *ce)
> +{
> + struct intel_guc *guc = ce_to_guc(ce);
> + struct intel_gt *gt = ce->engine->gt;
> + u32 *pphwsp, last_switch, engine_id;
> + u64 start_gt_clk = 0, active = 0;
No need to init these two.
> + unsigned long flags;
> + ktime_t unused;
> +
> + spin_lock_irqsave(&guc->timestamp.lock, flags);
> +
> + pphwsp = ((void *)ce->lrc_reg_state) - LRC_STATE_OFFSET;
> + last_switch = READ_ONCE(pphwsp[PPHWSP_GUC_CONTEXT_USAGE_STAMP_LO]);
> + engine_id = READ_ONCE(pphwsp[PPHWSP_GUC_CONTEXT_USAGE_ENGINE_ID]);
> +
> + guc_update_pm_timestamp(guc, &unused);
> +
> + if (engine_id != 0xffffffff && last_switch) {
> + start_gt_clk = READ_ONCE(ce->stats.runtime.start_gt_clk);
> + __extend_last_switch(guc, &start_gt_clk, last_switch);
> + active = intel_gt_clock_interval_to_ns(gt, guc->timestamp.gt_stamp - start_gt_clk);
> + WRITE_ONCE(ce->stats.runtime.start_gt_clk, start_gt_clk);
> + WRITE_ONCE(ce->stats.active, active);
> + } else {
> + lrc_update_runtime(ce);
Why is this called from here? Presumably it was called already from
guc_context_unpin if here code things context is not active. Or will be
called shortly, once context save is done.
Also, some comments here describing the double accounting race would be
good. Or if there are some already in the physical engine code just
reference that function.
Regards,
Tvrtko
> + }
> +
> + spin_unlock_irqrestore(&guc->timestamp.lock, flags);
> +}
> +
> +static void guc_context_update_stats(struct intel_context *ce)
> +{
> + if (!intel_context_pin_if_active(ce)) {
> + WRITE_ONCE(ce->stats.runtime.start_gt_clk, 0);
> + WRITE_ONCE(ce->stats.active, 0);
> + return;
> + }
> +
> + __guc_context_update_clks(ce);
> + intel_context_unpin(ce);
> +}
> +
> static inline bool
> submission_disabled(struct intel_guc *guc)
> {
> @@ -2585,6 +2634,7 @@ static void guc_context_unpin(struct intel_context *ce)
> {
> struct intel_guc *guc = ce_to_guc(ce);
>
> + lrc_update_runtime(ce);
> unpin_guc_id(guc, ce);
> lrc_unpin(ce);
>
> @@ -3183,6 +3233,7 @@ static void remove_from_context(struct i915_request *rq)
> }
>
> static const struct intel_context_ops guc_context_ops = {
> + .flags = COPS_RUNTIME_CYCLES,
> .alloc = guc_context_alloc,
>
> .pre_pin = guc_context_pre_pin,
> @@ -3199,6 +3250,8 @@ static const struct intel_context_ops guc_context_ops = {
>
> .sched_disable = guc_context_sched_disable,
>
> + .update_stats = guc_context_update_stats,
> +
> .reset = lrc_reset,
> .destroy = guc_context_destroy,
>
> diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
> index 18d38cb59923..118db6f03f15 100644
> --- a/drivers/gpu/drm/i915/i915_drm_client.c
> +++ b/drivers/gpu/drm/i915/i915_drm_client.c
> @@ -146,11 +146,7 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> seq_printf(m, "drm-client-id:\t%u\n", client->id);
>
> - /*
> - * Temporarily skip showing client engine information with GuC submission till
> - * fetching engine busyness is implemented in the GuC submission backend
> - */
> - if (GRAPHICS_VER(i915) < 8 || intel_uc_uses_guc_submission(&i915->gt0.uc))
> + if (GRAPHICS_VER(i915) < 8)
> return;
>
> for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
next prev parent reply other threads:[~2022-06-14 13:30 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-14 0:46 [Intel-gfx] [PATCH] i915/pmu: Wire GuC backend to per-client busyness Nerlige Ramappa, Umesh
2022-06-14 2:23 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2022-06-14 2:23 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-06-14 2:45 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-06-14 13:30 ` Tvrtko Ursulin [this message]
2022-06-14 16:32 ` [Intel-gfx] [PATCH] " Umesh Nerlige Ramappa
2022-06-15 7:08 ` Tvrtko Ursulin
2022-06-15 17:42 ` Umesh Nerlige Ramappa
2022-08-25 6:18 ` Dixit, Ashutosh
2022-06-15 0:29 ` [Intel-gfx] ✓ Fi.CI.IGT: success for " Patchwork
-- strict thread matches above, loose matches on Subject: below --
2022-06-16 19:08 [Intel-gfx] [PATCH] " Nerlige Ramappa, Umesh
2022-06-16 22:13 Nerlige Ramappa, Umesh
2022-06-17 8:00 ` Tvrtko Ursulin
2022-07-27 6:01 ` Umesh Nerlige Ramappa
2022-07-27 8:48 ` Tvrtko Ursulin
2022-08-01 19:02 ` Umesh Nerlige Ramappa
2022-08-02 8:41 ` Tvrtko Ursulin
2022-08-02 23:38 ` Umesh Nerlige Ramappa
2022-08-04 1:21 ` Umesh Nerlige Ramappa
2022-08-04 7:25 ` Tvrtko Ursulin
2022-08-04 23:21 Umesh Nerlige Ramappa
2022-08-05 9:45 ` Tvrtko Ursulin
2022-08-05 15:18 ` Umesh Nerlige Ramappa
[not found] ` <87fshl3yw0.wl-ashutosh.dixit@intel.com>
2022-08-26 15:44 ` Umesh Nerlige Ramappa
2022-08-25 5:03 ` Dixit, Ashutosh
2022-08-25 21:12 ` Dixit, Ashutosh
2022-08-26 1:44 ` Dixit, Ashutosh
2022-08-26 16:33 ` Umesh Nerlige Ramappa
2022-08-31 20:25 ` Dixit, Ashutosh
2022-08-31 22:57 ` Umesh Nerlige Ramappa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2d4f5fd5-a2d2-68ac-d7ae-bc81b14327bc@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=John.C.Harrison@Intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=umesh.nerlige.ramappa@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox