From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Alan Previn <alan.previn.teres.alexis@intel.com>,
intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH v4 1/1] drm/i915/guc: Don't update engine busyness stats too frequently
Date: Thu, 23 Jun 2022 09:09:20 +0100 [thread overview]
Message-ID: <82877d1a-03ed-fb88-1d1d-3f3944223fbc@linux.intel.com> (raw)
In-Reply-To: <20220623023157.211650-2-alan.previn.teres.alexis@intel.com>
On 23/06/2022 03:31, Alan Previn wrote:
> Using two different types of workloads, it was observed that
> guc_update_engine_gt_clks was being called too frequently and/or
> causing a CPU-to-lmem bandwidth hit over PCIE. Details on
> the workloads and numbers are in the notes below.
>
> Background: At the moment, guc_update_engine_gt_clks can be invoked
> via one of 3 ways. #1 and #2 are infrequent under normal operating
> conditions:
> 1.When a predefined "ping_delay" timer expires so that GuC-
> busyness can sample the GTPM clock counter to ensure it
> doesn't miss a wrap-around of the 32-bits of the HW counter.
> (The ping_delay is calculated based on 1/8th the time taken
> for the counter to go from 0x0 to 0xffffffff at the
> GT frequency. This comes to about once every 28 seconds at a
> GT frequency of 19.2 MHz).
> 2.In preparation for a gt reset.
> 3.In response to __gt_park events (as the gt power management
> puts the gt into a lower power state when there is no work
> being done).
>
> Root-cause: For both the workloads described further below, it was
> observed that when user space makes IOCTL calls that unpark the
> gt momentarily and repeats such calls many times in quick succession,
> guc_update_engine_gt_clks gets called just as many times. However,
> the primary purpose of guc_update_engine_gt_clks is to ensure we don't
> miss the wraparound while the counter is ticking. Thus, the solution
> is to ensure we skip that check if gt_park is calling this function
> earlier than necessary.
>
> Solution: Snapshot jiffies when we do actually update the busyness
> stats. Then get the new jiffies every time intel_guc_busyness_park
> is called and bail if we are being called too soon. Use half of the
> ping_delay as a safe threshold.
>
> NOTE1: Workload1: IGT's gem_create was modified to create a file handle,
> allocate memory with sizes that range from a minimum of 4K to the max
> supported (in power-of-two step sizes). It maps, modifies and reads back
> the memory. Allocation and modification are repeated until the total
> memory allocated reaches the max. Then the file handle is closed. With this
> workload, guc_update_engine_gt_clks was called over 188 thousand times
> in the span of 15 seconds while this test ran three times. With this patch,
> the number of calls reduced to 14.
>
> NOTE2: Workload2: 30 transcode sessions are created in quick succession.
> While these sessions are created, the pcm-iio tool was used to measure I/O
> read operation bandwidth consumption, sampled at 100 millisecond intervals
> over the course of 20 seconds. The total bandwidth consumed over 20 seconds
> without this patch averaged 311 KBps per sample. With this
> patch, the number went down to about 175 KBps, which is about a 43% saving.
>
> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Super detailed and clear - thank you!
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
One question below:
> ---
> drivers/gpu/drm/i915/gt/uc/intel_guc.h | 8 ++++++++
> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 +++++++++++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 966e69a8b1c1..d0d99f178f2d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -230,6 +230,14 @@ struct intel_guc {
> * @shift: Right shift value for the gpm timestamp
> */
> u32 shift;
> +
> + /**
> + * @last_stat_jiffies: jiffies at last actual stats collection time
> + * We use this timestamp to ensure we don't oversample the
> + * stats because runtime power management events can trigger
> + * stats collection at much higher rates than required.
> + */
> + unsigned long last_stat_jiffies;
> } timestamp;
>
> #ifdef CONFIG_DRM_I915_SELFTEST
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index e62ea35513ea..c9f167b80910 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1314,6 +1314,8 @@ static void __update_guc_busyness_stats(struct intel_guc *guc)
> unsigned long flags;
> ktime_t unused;
>
> + guc->timestamp.last_stat_jiffies = jiffies;
> +
> spin_lock_irqsave(&guc->timestamp.lock, flags);
>
> guc_update_pm_timestamp(guc, &unused);
> @@ -1386,6 +1388,17 @@ void intel_guc_busyness_park(struct intel_gt *gt)
> return;
>
> cancel_delayed_work(&guc->timestamp.work);
> +
> + /*
> + * Before parking, we should sample engine busyness stats if we need to.
> + * We can skip it if we are less than half a ping from the last time we
> + * sampled the busyness stats.
> + */
> + if (guc->timestamp.last_stat_jiffies &&
Is there a case where we enter park with last_stat_jiffies being unset
and so skip updating, and if there is, is that a concern? If it is then
see if it is safe just to not have the zero check. Worst that could
happen is one extra sampling if either unset or overflows, right?
Regards,
Tvrtko
> + !time_after(jiffies, guc->timestamp.last_stat_jiffies +
> + (guc->timestamp.ping_delay / 2)))
> + return;
> +
> __update_guc_busyness_stats(guc);
> }
>