Re: [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	john.c.harrison@intel.com, daniel.vetter@ffwll.ch,
	Matthew Brost <matthew.brost@intel.com>
Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu
Date: Tue, 19 Oct 2021 09:32:07 +0100	[thread overview]
Message-ID: <bbaa19bf-d25d-d9cd-8064-cec23ec58b3a@linux.intel.com> (raw)
In-Reply-To: <20211018183544.GA10100@unerlige-ril-10.165.21.208>


On 18/10/2021 19:35, Umesh Nerlige Ramappa wrote:
> On Mon, Oct 18, 2021 at 08:58:01AM +0100, Tvrtko Ursulin wrote:
>>
>>
>> On 16/10/2021 00:47, Umesh Nerlige Ramappa wrote:
>>> With GuC handling scheduling, i915 is not aware of the time that a
>>> context is scheduled in and out of the engine. Since i915 pmu relies on
>>> this info to provide engine busyness to the user, GuC shares this info
>>> with i915 for all engines using shared memory. For each engine, this
>>> info contains:
>>>
>>> - total busyness: total time that the context was running (total)
>>> - id: id of the running context (id)
>>> - start timestamp: timestamp when the context started running (start)
>>>
>>> At the time (now) of sampling the engine busyness, if the id is valid
>>> (!= ~0), and start is non-zero, then the context is considered to be
>>> active and the engine busyness is calculated using the below equation
>>>
>>>     engine busyness = total + (now - start)
>>>
>>> All times are obtained from the gt clock base. For inactive contexts,
>>> engine busyness is just equal to the total.
>>>
>>> The start and total values provided by GuC are 32 bits and wrap around
>>> in a few minutes. Since perf pmu provides busyness as 64 bit
>>> monotonically increasing values, there is a need for this implementation
>>> to account for overflows and extend the time to 64 bits before returning
>>> busyness to the user. In order to do that, a worker runs periodically at
>>> frequency = 1/8th the time it takes for the timestamp to wrap. As an
>>> example, that would be once in 27 seconds for a gt clock frequency of
>>> 19.2 MHz.
>>>
>>> Note:
>>> There might be an overaccounting of busyness due to the fact that GuC
>>> may be updating the total and start values while kmd is reading them.
>>> (i.e kmd may read the updated total and the stale start). In such a
>>> case, user may see higher busyness value followed by smaller ones which
>>> would eventually catch up to the higher value.
>>>
>>> v2: (Tvrtko)
>>> - Include details in commit message
>>> - Move intel engine busyness function into execlist code
>>> - Use union inside engine->stats
>>> - Use natural type for ping delay jiffies
>>> - Drop active_work condition checks
>>> - Use for_each_engine if iterating all engines
>>> - Drop seq locking, use spinlock at guc level to update engine stats
>>> - Document worker specific details
>>>
>>> v3: (Tvrtko/Umesh)
>>> - Demarcate guc and execlist stat objects with comments
>>> - Document known over-accounting issue in commit
>>> - Provide a consistent view of guc state
>>> - Add hooks to gt park/unpark for guc busyness
>>> - Stop/start worker in gt park/unpark path
>>> - Drop inline
>>> - Move spinlock and worker inits to guc initialization
>>> - Drop helpers that are called only once
>>>
>>> v4: (Tvrtko/Matt/Umesh)
>>> - Drop addressed opens from commit message
>>> - Get runtime pm in ping, remove from the park path
>>> - Use cancel_delayed_work_sync in disable_submission path
>>> - Update stats during reset prepare
>>> - Skip ping if reset in progress
>>> - Explicitly name execlists and guc stats objects
>>> - Since disable_submission is called from many places, move resetting
>>>   stats to intel_guc_submission_reset_prepare
>>>
>>> v5: (Tvrtko)
>>> - Add a trylock helper that does not sleep and synchronize PMU event
>>>   callbacks and worker with gt reset
>>>
>>> v6: (CI BAT failures)
>>> - DUTs using execlist submission failed to boot since __gt_unpark is
>>>   called during i915 load. This ends up calling the guc busyness unpark
>>>   hook and results in kiskstarting an uninitialized worker. Let
>>>   park/unpark hooks check if guc submission has been initialized.
>>> - drop cant_sleep() from trylock hepler since rcu_read_lock takes care
>>>   of that.
>>>
>>> v7: (CI) Fix igt@i915_selftest@live@gt_engines
>>> - For guc mode of submission the engine busyness is derived from gt time
>>>   domain. Use gt time elapsed as reference in the selftest.
>>> - Increase busyness calculation to 10ms duration to ensure batch runs
>>>   longer and falls within the busyness tolerances in selftest.
>>
>> [snip]
>>
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c 
>>> b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>> index 75569666105d..24358bef6691 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>> @@ -234,6 +234,7 @@ static int live_engine_busy_stats(void *arg)
>>>          struct i915_request *rq;
>>>          ktime_t de, dt;
>>>          ktime_t t[2];
>>> +        u32 gt_stamp;
>>>          if (!intel_engine_supports_stats(engine))
>>>              continue;
>>> @@ -251,10 +252,16 @@ static int live_engine_busy_stats(void *arg)
>>>          ENGINE_TRACE(engine, "measuring idle time\n");
>>>          preempt_disable();
>>>          de = intel_engine_get_busy_time(engine, &t[0]);
>>> -        udelay(100);
>>> +        gt_stamp = intel_uncore_read(gt->uncore, GUCPMTIMESTAMP);
>>> +        udelay(10000);
>>>          de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
>>> +        gt_stamp = intel_uncore_read(gt->uncore, GUCPMTIMESTAMP) - 
>>> gt_stamp;
>>>          preempt_enable();
>>> -        dt = ktime_sub(t[1], t[0]);
>>> +
>>> +        dt = intel_engine_uses_guc(engine) ?
>>> +             intel_gt_clock_interval_to_ns(engine->gt, gt_stamp) :
>>> +             ktime_sub(t[1], t[0]);
>>
>> But this then shows the thing might not work for external callers like 
>> PMU who have no idea about GUCPMTIMESTAMP and cannot obtain it anyway.
>>
>> What is the root cause of the failure here, 100us or clock source? Is 
>> the granularity of GUCPMTIMESTAMP perhaps simply too coarse for 100us 
>> test period? I forget what frequency it runs at.
> 
> guc timestamp is ticking at 19.2 MHz in adlp/rkl (where I ran this).

So ~52ns clock granularity, right?

In which case 100us with +/- 52ns error should be max 0.05% error - is 
this math correct?

> 
> 1)
> With 100us, often times I see that the batch has not yet started, so I 
> get busy time in the range 0 - 60 %. I increased the time such that the 
> batch runs long enough to make the scheduling time < 5%.

0-60% should not be possible since there is a igt_wait_for_spinner call 
before measuring starts, which ensures spinner is executing on the GPU.

I think we first need to understand where is this 0 - 60% problem coming 
from because I don't think it is from batch not yet started.

Regards,

Tvrtko

> 
> 2)
> I did a 100 runs on rkl/adlp. No failures on rkl. On adlp, I saw one in 
> 25 runs show 93%/94% busyness for rcs0 and fail (expected is 95%). For 
> that I tried using the guc timestamp thinking it would provide more 
> accuracy. It did in my testing, but CI still failed for rkl-guc (110% 
> busyness!!), so now I just think we need to tweak the expected busyness 
> for guc.
> 
> Is 1) acceptable?
> 
> For 2) I am thinking of just changing the expected busyness to 90% plus 
> for guc mode OR should we just let it fail occassionally? Thoughts?
> 
> Thanks,
> Umesh
> 
>>
>> Regards,
>>
>> Tvrtko

WARNING: multiple messages have this Message-ID (diff)

From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	john.c.harrison@intel.com, daniel.vetter@ffwll.ch,
	Matthew Brost <matthew.brost@intel.com>
Subject: Re: [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu
Date: Tue, 19 Oct 2021 09:32:07 +0100	[thread overview]
Message-ID: <bbaa19bf-d25d-d9cd-8064-cec23ec58b3a@linux.intel.com> (raw)
In-Reply-To: <20211018183544.GA10100@unerlige-ril-10.165.21.208>


On 18/10/2021 19:35, Umesh Nerlige Ramappa wrote:
> On Mon, Oct 18, 2021 at 08:58:01AM +0100, Tvrtko Ursulin wrote:
>>
>>
>> On 16/10/2021 00:47, Umesh Nerlige Ramappa wrote:
>>> With GuC handling scheduling, i915 is not aware of the time that a
>>> context is scheduled in and out of the engine. Since i915 pmu relies on
>>> this info to provide engine busyness to the user, GuC shares this info
>>> with i915 for all engines using shared memory. For each engine, this
>>> info contains:
>>>
>>> - total busyness: total time that the context was running (total)
>>> - id: id of the running context (id)
>>> - start timestamp: timestamp when the context started running (start)
>>>
>>> At the time (now) of sampling the engine busyness, if the id is valid
>>> (!= ~0), and start is non-zero, then the context is considered to be
>>> active and the engine busyness is calculated using the below equation
>>>
>>>     engine busyness = total + (now - start)
>>>
>>> All times are obtained from the gt clock base. For inactive contexts,
>>> engine busyness is just equal to the total.
>>>
>>> The start and total values provided by GuC are 32 bits and wrap around
>>> in a few minutes. Since perf pmu provides busyness as 64 bit
>>> monotonically increasing values, there is a need for this implementation
>>> to account for overflows and extend the time to 64 bits before returning
>>> busyness to the user. In order to do that, a worker runs periodically at
>>> frequency = 1/8th the time it takes for the timestamp to wrap. As an
>>> example, that would be once in 27 seconds for a gt clock frequency of
>>> 19.2 MHz.
>>>
>>> Note:
>>> There might be an overaccounting of busyness due to the fact that GuC
>>> may be updating the total and start values while kmd is reading them.
>>> (i.e kmd may read the updated total and the stale start). In such a
>>> case, user may see higher busyness value followed by smaller ones which
>>> would eventually catch up to the higher value.
>>>
>>> v2: (Tvrtko)
>>> - Include details in commit message
>>> - Move intel engine busyness function into execlist code
>>> - Use union inside engine->stats
>>> - Use natural type for ping delay jiffies
>>> - Drop active_work condition checks
>>> - Use for_each_engine if iterating all engines
>>> - Drop seq locking, use spinlock at guc level to update engine stats
>>> - Document worker specific details
>>>
>>> v3: (Tvrtko/Umesh)
>>> - Demarcate guc and execlist stat objects with comments
>>> - Document known over-accounting issue in commit
>>> - Provide a consistent view of guc state
>>> - Add hooks to gt park/unpark for guc busyness
>>> - Stop/start worker in gt park/unpark path
>>> - Drop inline
>>> - Move spinlock and worker inits to guc initialization
>>> - Drop helpers that are called only once
>>>
>>> v4: (Tvrtko/Matt/Umesh)
>>> - Drop addressed opens from commit message
>>> - Get runtime pm in ping, remove from the park path
>>> - Use cancel_delayed_work_sync in disable_submission path
>>> - Update stats during reset prepare
>>> - Skip ping if reset in progress
>>> - Explicitly name execlists and guc stats objects
>>> - Since disable_submission is called from many places, move resetting
>>>   stats to intel_guc_submission_reset_prepare
>>>
>>> v5: (Tvrtko)
>>> - Add a trylock helper that does not sleep and synchronize PMU event
>>>   callbacks and worker with gt reset
>>>
>>> v6: (CI BAT failures)
>>> - DUTs using execlist submission failed to boot since __gt_unpark is
>>>   called during i915 load. This ends up calling the guc busyness unpark
>>>   hook and results in kiskstarting an uninitialized worker. Let
>>>   park/unpark hooks check if guc submission has been initialized.
>>> - drop cant_sleep() from trylock hepler since rcu_read_lock takes care
>>>   of that.
>>>
>>> v7: (CI) Fix igt@i915_selftest@live@gt_engines
>>> - For guc mode of submission the engine busyness is derived from gt time
>>>   domain. Use gt time elapsed as reference in the selftest.
>>> - Increase busyness calculation to 10ms duration to ensure batch runs
>>>   longer and falls within the busyness tolerances in selftest.
>>
>> [snip]
>>
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c 
>>> b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>> index 75569666105d..24358bef6691 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>> @@ -234,6 +234,7 @@ static int live_engine_busy_stats(void *arg)
>>>          struct i915_request *rq;
>>>          ktime_t de, dt;
>>>          ktime_t t[2];
>>> +        u32 gt_stamp;
>>>          if (!intel_engine_supports_stats(engine))
>>>              continue;
>>> @@ -251,10 +252,16 @@ static int live_engine_busy_stats(void *arg)
>>>          ENGINE_TRACE(engine, "measuring idle time\n");
>>>          preempt_disable();
>>>          de = intel_engine_get_busy_time(engine, &t[0]);
>>> -        udelay(100);
>>> +        gt_stamp = intel_uncore_read(gt->uncore, GUCPMTIMESTAMP);
>>> +        udelay(10000);
>>>          de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
>>> +        gt_stamp = intel_uncore_read(gt->uncore, GUCPMTIMESTAMP) - 
>>> gt_stamp;
>>>          preempt_enable();
>>> -        dt = ktime_sub(t[1], t[0]);
>>> +
>>> +        dt = intel_engine_uses_guc(engine) ?
>>> +             intel_gt_clock_interval_to_ns(engine->gt, gt_stamp) :
>>> +             ktime_sub(t[1], t[0]);
>>
>> But this then shows the thing might not work for external callers like 
>> PMU who have no idea about GUCPMTIMESTAMP and cannot obtain it anyway.
>>
>> What is the root cause of the failure here, 100us or clock source? Is 
>> the granularity of GUCPMTIMESTAMP perhaps simply too coarse for 100us 
>> test period? I forget what frequency it runs at.
> 
> guc timestamp is ticking at 19.2 MHz in adlp/rkl (where I ran this).

So ~52ns clock granularity, right?

In which case 100us with +/- 52ns error should be max 0.05% error - is 
this math correct?

> 
> 1)
> With 100us, often times I see that the batch has not yet started, so I 
> get busy time in the range 0 - 60 %. I increased the time such that the 
> batch runs long enough to make the scheduling time < 5%.

0-60% should not be possible since there is a igt_wait_for_spinner call 
before measuring starts, which ensures spinner is executing on the GPU.

I think we first need to understand where is this 0 - 60% problem coming 
from because I don't think it is from batch not yet started.

Regards,

Tvrtko

> 
> 2)
> I did a 100 runs on rkl/adlp. No failures on rkl. On adlp, I saw one in 
> 25 runs show 93%/94% busyness for rcs0 and fail (expected is 95%). For 
> that I tried using the guc timestamp thinking it would provide more 
> accuracy. It did in my testing, but CI still failed for rkl-guc (110% 
> busyness!!), so now I just think we need to tweak the expected busyness 
> for guc.
> 
> Is 1) acceptable?
> 
> For 2) I am thinking of just changing the expected busyness to 90% plus 
> for guc mode OR should we just let it fail occassionally? Thoughts?
> 
> Thanks,
> Umesh
> 
>>
>> Regards,
>>
>> Tvrtko

next prev parent reply	other threads:[~2021-10-19  8:33 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-15 23:47 [Intel-gfx] [PATCH 1/2] drm/i915/pmu: Add a name to the execlists stats Umesh Nerlige Ramappa
2021-10-15 23:47 ` Umesh Nerlige Ramappa
2021-10-15 23:47 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu Umesh Nerlige Ramappa
2021-10-15 23:47   ` Umesh Nerlige Ramappa
2021-10-18  7:58   ` [Intel-gfx] " Tvrtko Ursulin
2021-10-18  7:58     ` Tvrtko Ursulin
2021-10-18 18:35     ` [Intel-gfx] " Umesh Nerlige Ramappa
2021-10-18 18:35       ` Umesh Nerlige Ramappa
2021-10-18 20:35       ` [Intel-gfx] " Umesh Nerlige Ramappa
2021-10-18 20:35         ` Umesh Nerlige Ramappa
2021-10-19  8:32       ` Tvrtko Ursulin [this message]
2021-10-19  8:32         ` Tvrtko Ursulin
2021-10-20  4:41         ` [Intel-gfx] " Umesh Nerlige Ramappa
2021-10-20  4:41           ` Umesh Nerlige Ramappa
2021-10-16  1:22 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/2] drm/i915/pmu: Add a name to the execlists stats Patchwork
2021-10-16  1:24 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-10-16  2:06 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2021-10-27  0:48 [Intel-gfx] [PATCH 1/2] " Umesh Nerlige Ramappa
2021-10-27  0:48 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu Umesh Nerlige Ramappa
2021-10-27 20:02   ` Matthew Brost
2021-10-30  0:40   ` Umesh Nerlige Ramappa
2022-10-21  8:42   ` Tvrtko Ursulin
2022-10-22  0:21     ` Umesh Nerlige Ramappa
2021-10-15  1:18 [Intel-gfx] [PATCH 1/2] drm/i915/pmu: Add a name to the execlists stats Umesh Nerlige Ramappa
2021-10-15  1:18 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu Umesh Nerlige Ramappa
2021-10-13  0:56 [Intel-gfx] [PATCH 1/2] drm/i915/pmu: Add a name to the execlists stats Umesh Nerlige Ramappa
2021-10-13  0:56 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu Umesh Nerlige Ramappa
2021-10-13 16:06   ` Tvrtko Ursulin
2021-10-13 16:27     ` Umesh Nerlige Ramappa
2021-10-14  8:21   ` Tvrtko Ursulin
2021-10-15  1:01     ` Umesh Nerlige Ramappa
2021-10-07 22:55 [Intel-gfx] [PATCH 1/2] drm/i915/pmu: Add a name to the execlists stats Umesh Nerlige Ramappa
2021-10-07 22:55 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu Umesh Nerlige Ramappa
2021-10-11 11:41   ` Tvrtko Ursulin
2021-10-11 20:08     ` Umesh Nerlige Ramappa
2021-10-12  8:26       ` Tvrtko Ursulin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bbaa19bf-d25d-d9cd-8064-cec23ec58b3a@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=john.c.harrison@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=umesh.nerlige.ramappa@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.