From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Alan Previn <alan.previn.teres.alexis@intel.com>,
intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH v4 1/1] drm/i915/guc: Don't update engine busyness stats too frequently
Date: Thu, 23 Jun 2022 09:09:20 +0100 [thread overview]
Message-ID: <82877d1a-03ed-fb88-1d1d-3f3944223fbc@linux.intel.com> (raw)
In-Reply-To: <20220623023157.211650-2-alan.previn.teres.alexis@intel.com>
On 23/06/2022 03:31, Alan Previn wrote:
> Using two different types of workloads, it was observed that
> guc_update_engine_gt_clks was being called too frequently and/or
> causing a CPU-to-lmem bandwidth hit over PCIE. Details on
> the workloads and numbers are in the notes below.
>
> Background: At the moment, guc_update_engine_gt_clks can be invoked
> via one of 3 ways. #1 and #2 are infrequent under normal operating
> conditions:
> 1.When a predefined "ping_delay" timer expires so that GuC-
> busyness can sample the GTPM clock counter to ensure it
> doesn't miss a wrap-around of the 32-bits of the HW counter.
> (The ping_delay is calculated based on 1/8th the time taken
> for the counter to go from 0x0 to 0xffffffff at the
> GT frequency. This comes to about once every 28 seconds at a
> GT frequency of 19.2 MHz).
> 2.In preparation for a gt reset.
> 3.In response to __gt_park events (as the gt power management
> puts the gt into a lower power state when there is no work
> being done).
>
> Root-cause: For both the workloads described further below, it was
> observed that when user space makes IOCTL calls that unpark the
> gt momentarily and repeats such calls many times in quick succession,
> guc_update_engine_gt_clks gets called just as many times. However,
> the primary purpose of guc_update_engine_gt_clks is to ensure we don't
> miss the wraparound while the counter is ticking. Thus, the solution
> is to ensure we skip that check if gt_park is calling this function
> earlier than necessary.
>
> Solution: Snapshot jiffies when we do actually update the busyness
> stats. Then get the new jiffies every time intel_guc_busyness_park
> is called and bail if we are being called too soon. Use half of the
> ping_delay as a safe threshold.
>
> NOTE1: Workload1: IGT's gem_create was modified to create a file handle,
> allocate memory with sizes that range from a minimum of 4K to the max
> supported (in power-of-two step sizes). It maps, modifies and reads back
> the memory. Allocation and modification are repeated until the total
> memory allocated reaches the max. Then the file handle is closed. With this
> workload, guc_update_engine_gt_clks was called over 188 thousand times
> in the span of 15 seconds while this test ran three times. With this patch,
> the number of calls reduced to 14.
>
> NOTE2: Workload2: 30 transcode sessions are created in quick succession.
> While these sessions are created, the pcm-iio tool was used to measure I/O
> read operation bandwidth consumption, sampled at 100 millisecond intervals
> over the course of 20 seconds. The total bandwidth consumed over 20 seconds
> without this patch averaged 311 KBps per sample. With this
> patch, the number went down to about 175 KBps, which is about a 43% saving.
>
> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Super detailed and clear - thank you!
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
One question below:
> ---
> drivers/gpu/drm/i915/gt/uc/intel_guc.h | 8 ++++++++
> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 +++++++++++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 966e69a8b1c1..d0d99f178f2d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -230,6 +230,14 @@ struct intel_guc {
> * @shift: Right shift value for the gpm timestamp
> */
> u32 shift;
> +
> + /**
> + * @last_stat_jiffies: jiffies at last actual stats collection time
> + * We use this timestamp to ensure we don't oversample the
> + * stats because runtime power management events can trigger
> + * stats collection at much higher rates than required.
> + */
> + unsigned long last_stat_jiffies;
> } timestamp;
>
> #ifdef CONFIG_DRM_I915_SELFTEST
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index e62ea35513ea..c9f167b80910 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1314,6 +1314,8 @@ static void __update_guc_busyness_stats(struct intel_guc *guc)
> unsigned long flags;
> ktime_t unused;
>
> + guc->timestamp.last_stat_jiffies = jiffies;
> +
> spin_lock_irqsave(&guc->timestamp.lock, flags);
>
> guc_update_pm_timestamp(guc, &unused);
> @@ -1386,6 +1388,17 @@ void intel_guc_busyness_park(struct intel_gt *gt)
> return;
>
> cancel_delayed_work(&guc->timestamp.work);
> +
> + /*
> + * Before parking, we should sample engine busyness stats if we need to.
> + * We can skip it if we are less than half a ping from the last time we
> + * sampled the busyness stats.
> + */
> + if (guc->timestamp.last_stat_jiffies &&
Is there a case where we enter park with last_stat_jiffies being unset
and so skip updating, and if there is, is that a concern? If it is then
see if it is safe just to not have the zero check. Worst that could
happen is one extra sampling if either unset or overflows, right?
Regards,
Tvrtko
> + !time_after(jiffies, guc->timestamp.last_stat_jiffies +
> + (guc->timestamp.ping_delay / 2)))
> + return;
> +
> __update_guc_busyness_stats(guc);
> }
>