All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [RESEND] drm/i915: Interactive RPS mode
Date: Fri, 20 Jul 2018 13:49:09 +0100	[thread overview]
Message-ID: <cd5a47fc-a889-0f59-763f-9be8bb10e842@linux.intel.com> (raw)
In-Reply-To: <20180712173838.23129-1-chris@chris-wilson.co.uk>


On 12/07/2018 18:38, Chris Wilson wrote:
> RPS provides a feedback loop where we use the load during the previous
> evaluation interval to decide whether to up or down clock the GPU
> frequency. Our responsiveness is split into 3 regimes, a high and low
> plateau with the intent to keep the gpu clocked high to cover occasional
> stalls under high load, and low despite occasional glitches under steady
> low load, and inbetween. However, we run into situations like kodi where
> we want to stay at low power (video decoding is done efficiently
> inside the fixed function HW and doesn't need high clocks even for high
> bitrate streams), but just occasionally the pipeline is more complex
> than a video decode and we need a smidgen of extra GPU power to present
> on time. In the high power regime, we sample at sub frame intervals with
> a bias to upclocking, and conversely at low power we sample over a few
> frames worth to provide what we consider to be the right levels of
> responsiveness respectively. At low power, we more or less expect to be
> kicked out to high power at the start of a busy sequence by waitboosting.
> 
> Prior to commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active
> request") whenever we missed the frame or stalled, we would immediate go
> full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we
> relaxed the waitboosting to only apply if the pipeline was deep to avoid
> over-committing resources for a near miss. Sadly though, a near miss is
> still a miss, and perceptible as jitter in the frame delivery.
> 
> To try and prevent the near miss before having to resort to boosting
> after the fact, we use the pageflip queue as an indication that we are
> in an "interactive" regime and so should sample the load more frequently
> to provide power before the frame misses it vblank. This will make us
> more favorable to providing a small power increase (one or two bins) as
> required rather than going all the way to maximum and then having to
> work back down again. (We still keep the waitboosting mechanism around
> just in case a dramatic change in system load requires urgent uplocking,
> faster than we can provide in a few evaluation intervals.)
> 
> v2: Reduce rps_set_interactive to a boolean parameter to avoid the
> confusion of what if they wanted a new power mode after pinning to a
> different mode (which to choose?)
> v3: Only reprogram RPS while the GT is awake, it will be set when we
> wake the GT, and while off warns about being used outside of rpm.
> v4: Fix deferred application of interactive mode
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107111
> Fixes: e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c  |  1 +
>   drivers/gpu/drm/i915/i915_drv.h      |  4 ++
>   drivers/gpu/drm/i915/intel_display.c | 20 ++++++
>   drivers/gpu/drm/i915/intel_drv.h     |  2 +
>   drivers/gpu/drm/i915/intel_pm.c      | 91 +++++++++++++++++++---------
>   5 files changed, 89 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 099f97ef2303..ac019bb927d0 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2218,6 +2218,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
>   	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
>   	seq_printf(m, "Boosts outstanding? %d\n",
>   		   atomic_read(&rps->num_waiters));
> +	seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->interactive));
>   	seq_printf(m, "Frequency requested %d\n",
>   		   intel_gpu_freq(dev_priv, rps->cur_freq));
>   	seq_printf(m, "  min hard:%d, soft:%d; max soft:%d, hard:%d\n",
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 01dd29837233..f02fbeee553f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -784,6 +784,8 @@ struct intel_rps {
>   
>   	int last_adj;
>   	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
> +	unsigned int interactive;
> +	struct mutex power_lock;
>   
>   	bool enabled;
>   	atomic_t num_waiters;
> @@ -3429,6 +3431,8 @@ extern void i915_redisable_vga_power_on(struct drm_i915_private *dev_priv);
>   extern bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val);
>   extern void intel_init_pch_refclk(struct drm_i915_private *dev_priv);
>   extern int intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
> +extern void intel_rps_set_interactive(struct drm_i915_private *dev_priv,
> +				      bool state);
>   extern bool intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
>   				  bool enable);
>   
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 7998e70a3174..5809366ff9f0 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -13104,6 +13104,19 @@ intel_prepare_plane_fb(struct drm_plane *plane,
>   		add_rps_boost_after_vblank(new_state->crtc, new_state->fence);
>   	}
>   
> +	/*
> +	 * We declare pageflips to be interactive and so merit a small bias
> +	 * towards upclocking to deliver the frame on time. By only changing
> +	 * the RPS thresholds to sample more regularly and aim for higher
> +	 * clocks we can hopefully deliver low power workloads (like kodi)
> +	 * that are not quite steady state without resorting to forcing
> +	 * maximum clocks following a vblank miss (see do_rps_boost()).
> +	 */
> +	if (!intel_state->rps_interactive) {
> +		intel_rps_set_interactive(dev_priv, true);
> +		intel_state->rps_interactive = true;
> +	}
> +
>   	return 0;
>   }
>   
> @@ -13120,8 +13133,15 @@ void
>   intel_cleanup_plane_fb(struct drm_plane *plane,
>   		       struct drm_plane_state *old_state)
>   {
> +	struct intel_atomic_state *intel_state =
> +		to_intel_atomic_state(old_state->state);
>   	struct drm_i915_private *dev_priv = to_i915(plane->dev);
>   
> +	if (intel_state->rps_interactive) {
> +		intel_rps_set_interactive(dev_priv, false);
> +		intel_state->rps_interactive = false;
> +	}

Why is the rps_intearctive flag needed in plane state? I wanted to 
assume prepare & cleanup hooks are fully symmetric, but this flags makes 
me unsure. A reviewer with more display knowledge will be needed here I 
think.

> +
>   	/* Should only be called after a successful intel_prepare_plane_fb()! */
>   	mutex_lock(&dev_priv->drm.struct_mutex);
>   	intel_plane_unpin_fb(to_intel_plane_state(old_state));
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 61e715ddd0d5..544812488821 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -482,6 +482,8 @@ struct intel_atomic_state {
>   	 */
>   	bool skip_intermediate_wm;
>   
> +	bool rps_interactive;

Should the name at this level be something more tied to the subsystem 
and not implying wider relationships? Like page flip pending? fb_prepared?

> +
>   	/* Gen9+ only */
>   	struct skl_ddb_values wm_results;
>   
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 53aaaa3e6886..01475bf3196a 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -6264,41 +6264,14 @@ static u32 intel_rps_limits(struct drm_i915_private *dev_priv, u8 val)
>   	return limits;
>   }
>   
> -static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
> +static void rps_set_power(struct drm_i915_private *dev_priv, int new_power)
>   {
>   	struct intel_rps *rps = &dev_priv->gt_pm.rps;
> -	int new_power;
>   	u32 threshold_up = 0, threshold_down = 0; /* in % */
>   	u32 ei_up = 0, ei_down = 0;
>   
> -	new_power = rps->power;
> -	switch (rps->power) {
> -	case LOW_POWER:
> -		if (val > rps->efficient_freq + 1 &&
> -		    val > rps->cur_freq)
> -			new_power = BETWEEN;
> -		break;
> +	lockdep_assert_held(&rps->power_lock);
>   
> -	case BETWEEN:
> -		if (val <= rps->efficient_freq &&
> -		    val < rps->cur_freq)
> -			new_power = LOW_POWER;
> -		else if (val >= rps->rp0_freq &&
> -			 val > rps->cur_freq)
> -			new_power = HIGH_POWER;
> -		break;
> -
> -	case HIGH_POWER:
> -		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
> -		    val < rps->cur_freq)
> -			new_power = BETWEEN;
> -		break;
> -	}
> -	/* Max/min bins are special */
> -	if (val <= rps->min_freq_softlimit)
> -		new_power = LOW_POWER;
> -	if (val >= rps->max_freq_softlimit)
> -		new_power = HIGH_POWER;
>   	if (new_power == rps->power)
>   		return;
>   
> @@ -6365,9 +6338,68 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
>   	rps->power = new_power;
>   	rps->up_threshold = threshold_up;
>   	rps->down_threshold = threshold_down;
> +}
> +
> +static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
> +{
> +	struct intel_rps *rps = &dev_priv->gt_pm.rps;
> +	int new_power;
> +
> +	new_power = rps->power;
> +	switch (rps->power) {
> +	case LOW_POWER:
> +		if (val > rps->efficient_freq + 1 &&
> +		    val > rps->cur_freq)
> +			new_power = BETWEEN;
> +		break;
> +
> +	case BETWEEN:
> +		if (val <= rps->efficient_freq &&
> +		    val < rps->cur_freq)
> +			new_power = LOW_POWER;
> +		else if (val >= rps->rp0_freq &&
> +			 val > rps->cur_freq)
> +			new_power = HIGH_POWER;
> +		break;
> +
> +	case HIGH_POWER:
> +		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
> +		    val < rps->cur_freq)
> +			new_power = BETWEEN;
> +		break;
> +	}
> +	/* Max/min bins are special */
> +	if (val <= rps->min_freq_softlimit)
> +		new_power = LOW_POWER;
> +	if (val >= rps->max_freq_softlimit)
> +		new_power = HIGH_POWER;
> +
> +	mutex_lock(&rps->power_lock);

Is it worth avoiding the atomic here when GEN < 6?

> +	if (rps->interactive)
> +		new_power = HIGH_POWER;
> +	rps_set_power(dev_priv, new_power);
> +	mutex_unlock(&rps->power_lock);
 >
 >   	rps->last_adj = 0;

This block should go up to the beginning of the function since when 
rps->interactive all preceding calculations are for nothing. And can 
immediately return then.

>   }
>   
> +void intel_rps_set_interactive(struct drm_i915_private *dev_priv, bool state)

s/state/interactive/ for more self-documenting function body?

And not s/dev_priv/i915/ ?!? :)

> +{
> +	struct intel_rps *rps = &dev_priv->gt_pm.rps;
> +
> +	if (INTEL_GEN(dev_priv) < 6)
> +		return;
> +
> +	mutex_lock(&rps->power_lock);
> +	if (state) {
> +		if (!rps->interactive++ && READ_ONCE(dev_priv->gt.awake))
> +			rps_set_power(dev_priv, HIGH_POWER);
> +	} else {
> +		GEM_BUG_ON(!rps->interactive);
> +		rps->interactive--;

Hm maybe protect this in production code so some missed code flows in 
the atomic paths do not cause underflow and so permanent interactive mode.

> +	}
> +	mutex_unlock(&rps->power_lock);
> +}
> +
>   static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val)
>   {
>   	struct intel_rps *rps = &dev_priv->gt_pm.rps;
> @@ -9604,6 +9636,7 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
>   void intel_pm_setup(struct drm_i915_private *dev_priv)
>   {
>   	mutex_init(&dev_priv->pcu_lock);
> +	mutex_init(&dev_priv->gt_pm.rps.power_lock);
>   
>   	atomic_set(&dev_priv->gt_pm.rps.num_waiters, 0);
>   
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2018-07-20 12:49 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-11 21:01 [PATCH] drm/i915: Interactive RPS mode Chris Wilson
2018-07-11 22:06 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2018-07-11 22:07 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-07-11 22:23 ` ✗ Fi.CI.BAT: failure " Patchwork
2018-07-12  7:50 ` [PATCH v3] " Chris Wilson
2018-07-12  8:02 ` [PATCH v4] " Chris Wilson
2018-07-12 17:38   ` [RESEND] " Chris Wilson
2018-07-20 12:49     ` Tvrtko Ursulin [this message]
2018-07-20 13:02       ` Chris Wilson
2018-07-20 13:07         ` Ville Syrjälä
2018-07-20 13:14           ` Chris Wilson
2018-07-20 13:32             ` Ville Syrjälä
2018-07-20 13:38               ` Chris Wilson
2018-07-20 13:50                 ` Ville Syrjälä
2018-07-20 14:03                   ` Chris Wilson
2018-07-20 14:19                     ` Ville Syrjälä
2018-07-21  8:44                       ` Chris Wilson
2018-07-20 13:29         ` Tvrtko Ursulin
2018-07-20 13:59           ` Chris Wilson
2018-07-20 14:58             ` Tvrtko Ursulin
2018-07-20 15:01               ` Chris Wilson
2018-07-12 17:58   ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Interactive RPS mode (rev4) Patchwork
2018-07-12 17:59   ` ✗ Fi.CI.SPARSE: " Patchwork
2018-07-12 18:15   ` ✓ Fi.CI.BAT: success " Patchwork
2018-07-23 10:01 ` [PATCH] drm/i915: Interactive RPS mode Chris Wilson
2018-07-31 13:01   ` Joonas Lahtinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cd5a47fc-a889-0f59-763f-9be8bb10e842@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.