Re: [Intel-gfx] [PATCH 1/3] drm/i915/gem: Look for waitboosting across the whole object prior to individual waits

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Karolina Drobnik <karolina.drobnik@intel.com>
Cc: Thomas Voegtle <tv@lio96.de>,
	intel-gfx@lists.freedesktop.org, chris.p.wilson@intel.com,
	Chris Wilson <chris@chris-wilson.co.uk>
Subject: Re: [Intel-gfx] [PATCH 1/3] drm/i915/gem: Look for waitboosting across the whole object prior to individual waits
Date: Thu, 7 Jul 2022 13:57:52 -0400	[thread overview]
Message-ID: <YsceoP5CYySLWjJk@intel.com> (raw)
In-Reply-To: <b0d575e51f795d0b19ca93fbf3e796a747c961ab.1656911806.git.karolina.drobnik@intel.com>

On Tue, Jul 05, 2022 at 12:57:17PM +0200, Karolina Drobnik wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We employ a "waitboost" heuristic to detect when userspace is stalled
> waiting for results from earlier execution. Under latency sensitive work
> mixed between the gpu/cpu, the GPU is typically under-utilised and so
> RPS sees that low utilisation as a reason to downclock the frequency,
> causing longer stalls and lower throughput. The user left waiting for
> the results is not impressed.
> 
> On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv
> workaround") it was observed that deinterlacing h264 on Haswell
> performance dropped by 2-5x. The reason being that the natural workload
> was not intense enough to trigger RPS (using HW evaluation intervals) to
> upclock, and so it was depending on waitboosting for the throughput.
> 
> Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
> changes the composition of dma-resv from keeping a single write fence +
> multiple read fences, to a single array of multiple write and read
> fences (a maximum of one pair of write/read fences per context). The
> iteration order was also changed implicitly from all-read fences then
> the single write fence, to a mix of write fences followed by read
> fences. It is that ordering change that belied the fragility of
> waitboosting.
> 
> Currently, a waitboost is inspected at the point of waiting on an
> outstanding fence. If the GPU is backlogged such that we haven't yet
> stated the request we need to wait on, we force the GPU to upclock until
> the completion of that request. By changing the order in which we waited
> upon requests, we ended up waiting on those requests in sequence and as
> such we saw that each request was already started and so not a suitable
> candidate for waitboosting.
> 
> Instead of

Okay, all the explanation makes sense. But this commit message and
the cover letter tells that we are doing X *Instead* *of* Y.
That would mean code for Y would be removed. But this patch just add X.

So it looks to me that we are adding extra boosts with the code below.

What am I missing?

asking whether to boost each fence in turn, we can look at
> whether boosting is required for the dma-resv ensemble prior to waiting
> on any fence, making the heuristic more robust to the order in which
> fences are stored in the dma-resv.
> 
> Reported-by: Thomas Voegtle <tv@lio96.de>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
> Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
> Tested-by: Thomas Voegtle <tv@lio96.de>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_wait.c | 35 ++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> index 319936f91ac5..3fbb464746e1 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> @@ -9,6 +9,7 @@
>  #include <linux/jiffies.h>
>  
>  #include "gt/intel_engine.h"
> +#include "gt/intel_rps.h"
>  
>  #include "i915_gem_ioctls.h"
>  #include "i915_gem_object.h"
> @@ -31,6 +32,38 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
>  				      timeout);
>  }
>  
> +static void
> +i915_gem_object_boost(struct dma_resv *resv, unsigned int flags)
> +{
> +	struct dma_resv_iter cursor;
> +	struct dma_fence *fence;
> +
> +	/*
> +	 * Prescan all fences for potential boosting before we begin waiting.
> +	 *
> +	 * When we wait, we wait on outstanding fences serially. If the
> +	 * dma-resv contains a sequence such as 1:1, 1:2 instead of a reduced
> +	 * form 1:2, then as we look at each wait in turn we see that each
> +	 * request is currently executing and not worthy of boosting. But if
> +	 * we only happen to look at the final fence in the sequence (because
> +	 * of request coalescing or splitting between read/write arrays by
> +	 * the iterator), then we would boost. As such our decision to boost
> +	 * or not is delicately balanced on the order we wait on fences.
> +	 *
> +	 * So instead of looking for boosts sequentially, look for all boosts
> +	 * upfront and then wait on the outstanding fences.
> +	 */
> +
> +	dma_resv_iter_begin(&cursor, resv,
> +			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
> +	dma_resv_for_each_fence_unlocked(&cursor, fence) {
> +		if (dma_fence_is_i915(fence) &&
> +		    !i915_request_started(to_request(fence)))
> +			intel_rps_boost(to_request(fence));
> +	}
> +	dma_resv_iter_end(&cursor);
> +}
> +
>  static long
>  i915_gem_object_wait_reservation(struct dma_resv *resv,
>  				 unsigned int flags,
> @@ -40,6 +73,8 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
>  	struct dma_fence *fence;
>  	long ret = timeout ?: 1;
>  
> +	i915_gem_object_boost(resv, flags);
> +
>  	dma_resv_iter_begin(&cursor, resv,
>  			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
>  	dma_resv_for_each_fence_unlocked(&cursor, fence) {
> -- 
> 2.25.1
>

next prev parent reply	other threads:[~2022-07-07 17:58 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-05 10:57 [Intel-gfx] [PATCH 0/3] drm/i915: Apply waitboosting before fence wait Karolina Drobnik
2022-07-05 10:57 ` [Intel-gfx] [PATCH 1/3] drm/i915/gem: Look for waitboosting across the whole object prior to individual waits Karolina Drobnik
2022-07-07 17:57   ` Rodrigo Vivi [this message]
2022-07-07 21:50     ` Andi Shyti
2022-07-08 10:15       ` Karolina Drobnik
2022-07-08 11:38         ` Andi Shyti
2022-07-08 14:14           ` Karolina Drobnik
2022-07-05 10:57 ` [Intel-gfx] [PATCH 2/3] drm/i915: Bump GT idling delay to 2 jiffies Karolina Drobnik
2022-07-07 18:09   ` Rodrigo Vivi
2022-07-07 21:52   ` Andi Shyti
2022-07-05 10:57 ` [Intel-gfx] [PATCH 3/3] drm/i915/gt: Only kick the signal worker if there's been an update Karolina Drobnik
2022-07-05 13:51 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Apply waitboosting before fence wait Patchwork
2022-07-05 14:10 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-07-05 16:52 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YsceoP5CYySLWjJk@intel.com \
    --to=rodrigo.vivi@intel.com \
    --cc=chris.p.wilson@intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=karolina.drobnik@intel.com \
    --cc=tv@lio96.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.