public inbox for intel-gfx@lists.freedesktop.org
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	john.c.harrison@intel.com, daniel.vetter@ffwll.ch,
	Matthew Brost <matthew.brost@intel.com>
Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu
Date: Tue, 19 Oct 2021 09:32:07 +0100	[thread overview]
Message-ID: <bbaa19bf-d25d-d9cd-8064-cec23ec58b3a@linux.intel.com> (raw)
In-Reply-To: <20211018183544.GA10100@unerlige-ril-10.165.21.208>


On 18/10/2021 19:35, Umesh Nerlige Ramappa wrote:
> On Mon, Oct 18, 2021 at 08:58:01AM +0100, Tvrtko Ursulin wrote:
>>
>>
>> On 16/10/2021 00:47, Umesh Nerlige Ramappa wrote:
>>> With GuC handling scheduling, i915 is not aware of the time that a
>>> context is scheduled in and out of the engine. Since i915 pmu relies on
>>> this info to provide engine busyness to the user, GuC shares this info
>>> with i915 for all engines using shared memory. For each engine, this
>>> info contains:
>>>
>>> - total busyness: total time that the context was running (total)
>>> - id: id of the running context (id)
>>> - start timestamp: timestamp when the context started running (start)
>>>
>>> At the time (now) of sampling the engine busyness, if the id is valid
>>> (!= ~0), and start is non-zero, then the context is considered to be
>>> active and the engine busyness is calculated using the below equation
>>>
>>>     engine busyness = total + (now - start)
>>>
>>> All times are obtained from the gt clock base. For inactive contexts,
>>> engine busyness is just equal to the total.
>>>
>>> The start and total values provided by GuC are 32 bits and wrap around
>>> in a few minutes. Since perf pmu provides busyness as 64 bit
>>> monotonically increasing values, there is a need for this implementation
>>> to account for overflows and extend the time to 64 bits before returning
>>> busyness to the user. In order to do that, a worker runs periodically at
>>> frequency = 1/8th the time it takes for the timestamp to wrap. As an
>>> example, that would be once in 27 seconds for a gt clock frequency of
>>> 19.2 MHz.
>>>
>>> Note:
>>> There might be over-accounting of busyness due to the fact that GuC
>>> may be updating the total and start values while the kmd is reading
>>> them (i.e. the kmd may read the updated total together with a stale
>>> start). In such a case, the user may see a higher busyness value
>>> followed by smaller ones, which would eventually catch up to the
>>> higher value.
>>>
>>> v2: (Tvrtko)
>>> - Include details in commit message
>>> - Move intel engine busyness function into execlist code
>>> - Use union inside engine->stats
>>> - Use natural type for ping delay jiffies
>>> - Drop active_work condition checks
>>> - Use for_each_engine if iterating all engines
>>> - Drop seq locking, use spinlock at guc level to update engine stats
>>> - Document worker specific details
>>>
>>> v3: (Tvrtko/Umesh)
>>> - Demarcate guc and execlist stat objects with comments
>>> - Document known over-accounting issue in commit
>>> - Provide a consistent view of guc state
>>> - Add hooks to gt park/unpark for guc busyness
>>> - Stop/start worker in gt park/unpark path
>>> - Drop inline
>>> - Move spinlock and worker inits to guc initialization
>>> - Drop helpers that are called only once
>>>
>>> v4: (Tvrtko/Matt/Umesh)
>>> - Drop addressed opens from commit message
>>> - Get runtime pm in ping, remove from the park path
>>> - Use cancel_delayed_work_sync in disable_submission path
>>> - Update stats during reset prepare
>>> - Skip ping if reset in progress
>>> - Explicitly name execlists and guc stats objects
>>> - Since disable_submission is called from many places, move resetting
>>>   stats to intel_guc_submission_reset_prepare
>>>
>>> v5: (Tvrtko)
>>> - Add a trylock helper that does not sleep and synchronize PMU event
>>>   callbacks and worker with gt reset
>>>
>>> v6: (CI BAT failures)
>>> - DUTs using execlist submission failed to boot since __gt_unpark is
>>>   called during i915 load. This ends up calling the guc busyness unpark
>>>   hook and results in kickstarting an uninitialized worker. Let
>>>   park/unpark hooks check if guc submission has been initialized.
>>> - Drop cant_sleep() from the trylock helper since rcu_read_lock takes
>>>   care of that.
>>>
>>> v7: (CI) Fix igt@i915_selftest@live@gt_engines
>>> - For guc mode of submission the engine busyness is derived from gt time
>>>   domain. Use gt time elapsed as reference in the selftest.
>>> - Increase busyness calculation to 10ms duration to ensure batch runs
>>>   longer and falls within the busyness tolerances in selftest.
>>
>> [snip]
>>
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>> index 75569666105d..24358bef6691 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>> @@ -234,6 +234,7 @@ static int live_engine_busy_stats(void *arg)
>>>          struct i915_request *rq;
>>>          ktime_t de, dt;
>>>          ktime_t t[2];
>>> +        u32 gt_stamp;
>>>          if (!intel_engine_supports_stats(engine))
>>>              continue;
>>> @@ -251,10 +252,16 @@ static int live_engine_busy_stats(void *arg)
>>>          ENGINE_TRACE(engine, "measuring idle time\n");
>>>          preempt_disable();
>>>          de = intel_engine_get_busy_time(engine, &t[0]);
>>> -        udelay(100);
>>> +        gt_stamp = intel_uncore_read(gt->uncore, GUCPMTIMESTAMP);
>>> +        udelay(10000);
>>>          de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
>>> +        gt_stamp = intel_uncore_read(gt->uncore, GUCPMTIMESTAMP) - gt_stamp;
>>>          preempt_enable();
>>> -        dt = ktime_sub(t[1], t[0]);
>>> +
>>> +        dt = intel_engine_uses_guc(engine) ?
>>> +             intel_gt_clock_interval_to_ns(engine->gt, gt_stamp) :
>>> +             ktime_sub(t[1], t[0]);
>>
>> But this then shows the thing might not work for external callers like 
>> PMU who have no idea about GUCPMTIMESTAMP and cannot obtain it anyway.
>>
>> What is the root cause of the failure here, 100us or clock source? Is 
>> the granularity of GUCPMTIMESTAMP perhaps simply too coarse for 100us 
>> test period? I forget what frequency it runs at.
> 
> guc timestamp is ticking at 19.2 MHz in adlp/rkl (where I ran this).

So ~52ns clock granularity, right?

In which case 100us with +/- 52ns error should be max 0.05% error - is 
this math correct?

> 
> 1)
> With 100us, often times I see that the batch has not yet started, so I 
> get busy time in the range 0 - 60 %. I increased the time such that the 
> batch runs long enough to make the scheduling time < 5%.

0-60% should not be possible since there is an igt_wait_for_spinner call 
before measuring starts, which ensures the spinner is executing on the GPU.

I think we first need to understand where this 0 - 60% result is coming 
from, because I don't think it is the batch not having started yet.

Regards,

Tvrtko

> 
> 2)
> I did 100 runs on rkl/adlp. No failures on rkl. On adlp, I saw one in 
> 25 runs show 93%/94% busyness for rcs0 and fail (expected is 95%). For 
> that I tried using the guc timestamp, thinking it would provide more 
> accuracy. It did in my testing, but CI still failed for rkl-guc (110% 
> busyness!!), so now I just think we need to tweak the expected busyness 
> for guc.
> 
> Is 1) acceptable?
> 
> For 2) I am thinking of just changing the expected busyness to 90% plus 
> for guc mode OR should we just let it fail occasionally? Thoughts?
> 
> Thanks,
> Umesh
> 
>>
>> Regards,
>>
>> Tvrtko

Thread overview: 25+ messages
2021-10-15 23:47 [Intel-gfx] [PATCH 1/2] drm/i915/pmu: Add a name to the execlists stats Umesh Nerlige Ramappa
2021-10-15 23:47 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu Umesh Nerlige Ramappa
2021-10-18  7:58   ` Tvrtko Ursulin
2021-10-18 18:35     ` Umesh Nerlige Ramappa
2021-10-18 20:35       ` Umesh Nerlige Ramappa
2021-10-19  8:32       ` Tvrtko Ursulin [this message]
2021-10-20  4:41         ` Umesh Nerlige Ramappa
2021-10-16  1:22 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/2] drm/i915/pmu: Add a name to the execlists stats Patchwork
2021-10-16  1:24 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-10-16  2:06 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2021-10-27  0:48 [Intel-gfx] [PATCH 1/2] " Umesh Nerlige Ramappa
2021-10-27  0:48 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu Umesh Nerlige Ramappa
2021-10-27 20:02   ` Matthew Brost
2021-10-30  0:40   ` Umesh Nerlige Ramappa
2022-10-21  8:42   ` Tvrtko Ursulin
2022-10-22  0:21     ` Umesh Nerlige Ramappa
2021-10-15  1:18 [Intel-gfx] [PATCH 1/2] drm/i915/pmu: Add a name to the execlists stats Umesh Nerlige Ramappa
2021-10-15  1:18 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu Umesh Nerlige Ramappa
2021-10-13  0:56 [Intel-gfx] [PATCH 1/2] drm/i915/pmu: Add a name to the execlists stats Umesh Nerlige Ramappa
2021-10-13  0:56 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu Umesh Nerlige Ramappa
2021-10-13 16:06   ` Tvrtko Ursulin
2021-10-13 16:27     ` Umesh Nerlige Ramappa
2021-10-14  8:21   ` Tvrtko Ursulin
2021-10-15  1:01     ` Umesh Nerlige Ramappa
2021-10-07 22:55 [Intel-gfx] [PATCH 1/2] drm/i915/pmu: Add a name to the execlists stats Umesh Nerlige Ramappa
2021-10-07 22:55 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu Umesh Nerlige Ramappa
2021-10-11 11:41   ` Tvrtko Ursulin
2021-10-11 20:08     ` Umesh Nerlige Ramappa
2021-10-12  8:26       ` Tvrtko Ursulin
