Re: [Intel-gfx] [PATCH 2/2] drm/i915/gem: Don't try to map and fence large scanout buffers (v6)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/gem: Don't try to map and fence large scanout buffers (v6)
Date: Mon, 7 Feb 2022 15:24:34 +0200	[thread overview]
Message-ID: <YgEdkkHHgtSfQrJU@intel.com> (raw)
In-Reply-To: <4475de33-22b3-edbb-2995-f72e9bcc4162@linux.intel.com>

On Mon, Feb 07, 2022 at 11:47:16AM +0000, Tvrtko Ursulin wrote:
> 
> On 07/02/2022 10:58, Ville Syrjälä wrote:
> > On Thu, Feb 03, 2022 at 05:22:10PM -0800, Vivek Kasireddy wrote:
> >> On platforms capable of allowing 8K (7680 x 4320) modes, pinning 2 or
> >> more framebuffers/scanout buffers results in only one that is mappable/
> >> fenceable. Therefore, pageflipping between these 2 FBs where only one
> >> is mappable/fenceable creates latencies large enough to miss alternate
> >> vblanks thereby producing less optimal framerate.
> >>
> >> This mainly happens because when i915_gem_object_pin_to_display_plane()
> >> is called to pin one of the FB objs, the associated vma is identified
> >> as misplaced and therefore i915_vma_unbind() is called which unbinds and
> >> evicts it. This misplaced vma gets subseqently pinned only when
> >> i915_gem_object_ggtt_pin_ww() is called without PIN_MAPPABLE. This
> >> results in a latency of ~10ms and happens every other vblank/repaint cycle.
> >> Therefore, to fix this issue, we try to see if there is space to map
> >> at-least two objects of a given size and return early if there isn't. This
> >> would ensure that we do not try with PIN_MAPPABLE for any objects that
> >> are too big to map thereby preventing unncessary unbind.
> >>
> >> Testcase:
> >> Running Weston and weston-simple-egl on an Alderlake_S (ADLS) platform
> >> with a 8K@60 mode results in only ~40 FPS. Since upstream Weston submits
> >> a frame ~7ms before the next vblank, the latencies seen between atomic
> >> commit and flip event are 7, 24 (7 + 16.66), 7, 24..... suggesting that
> >> it misses the vblank every other frame.
> >>
> >> Here is the ftrace snippet that shows the source of the ~10ms latency:
> >>                i915_gem_object_pin_to_display_plane() {
> >> 0.102 us   |    i915_gem_object_set_cache_level();
> >>                  i915_gem_object_ggtt_pin_ww() {
> >> 0.390 us   |      i915_vma_instance();
> >> 0.178 us   |      i915_vma_misplaced();
> >>                    i915_vma_unbind() {
> >>                    __i915_active_wait() {
> >> 0.082 us   |        i915_active_acquire_if_busy();
> >> 0.475 us   |      }
> >>                    intel_runtime_pm_get() {
> >> 0.087 us   |        intel_runtime_pm_acquire();
> >> 0.259 us   |      }
> >>                    __i915_active_wait() {
> >> 0.085 us   |        i915_active_acquire_if_busy();
> >> 0.240 us   |      }
> >>                    __i915_vma_evict() {
> >>                      ggtt_unbind_vma() {
> >>                        gen8_ggtt_clear_range() {
> >> 10507.255 us |        }
> >> 10507.689 us |      }
> >> 10508.516 us |   }
> >>
> >> v2: Instead of using bigjoiner checks, determine whether a scanout
> >>      buffer is too big by checking to see if it is possible to map
> >>      two of them into the ggtt.
> >>
> >> v3 (Ville):
> >> - Count how many fb objects can be fit into the available holes
> >>    instead of checking for a hole twice the object size.
> >> - Take alignment constraints into account.
> >> - Limit this large scanout buffer check to >= Gen 11 platforms.
> >>
> >> v4:
> >> - Remove existing heuristic that checks just for size. (Ville)
> >> - Return early if we find space to map at-least two objects. (Tvrtko)
> >> - Slightly update the commit message.
> >>
> >> v5: (Tvrtko)
> >> - Rename the function to indicate that the object may be too big to
> >>    map into the aperture.
> >> - Account for guard pages while calculating the total size required
> >>    for the object.
> >> - Do not subject all objects to the heuristic check and instead
> >>    consider objects only of a certain size.
> >> - Do the hole walk using the rbtree.
> >> - Preserve the existing PIN_NONBLOCK logic.
> >> - Drop the PIN_MAPPABLE check while pinning the VMA.
> >>
> >> v6: (Tvrtko)
> >> - Return 0 on success and the specific error code on failure to
> >>    preserve the existing behavior.
> >>
> >> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >> Cc: Manasi Navare <manasi.d.navare@intel.com>
> >> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_gem.c | 120 ++++++++++++++++++++++++--------
> >>   1 file changed, 90 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >> index e3a2c2a0e156..39f0d17550c3 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >> @@ -46,6 +46,7 @@
> >>   #include "gem/i915_gem_mman.h"
> >>   #include "gem/i915_gem_region.h"
> >>   #include "gem/i915_gem_userptr.h"
> >> +#include "gem/i915_gem_tiling.h"
> >>   #include "gt/intel_engine_user.h"
> >>   #include "gt/intel_gt.h"
> >>   #include "gt/intel_gt_pm.h"
> >> @@ -876,6 +877,92 @@ static void discard_ggtt_vma(struct i915_vma *vma)
> >>   	spin_unlock(&obj->vma.lock);
> >>   }
> >>   
> >> +static int
> >> +i915_gem_object_fits_in_aperture(struct drm_i915_gem_object *obj,
> >> +				 u64 alignment, u64 flags)
> >> +{
> >> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> >> +	struct i915_ggtt *ggtt = to_gt(i915)->ggtt;
> >> +	struct drm_mm_node *hole;
> >> +	u64 hole_start, hole_end, start, end;
> >> +	u64 fence_size, fence_alignment;
> >> +	unsigned int count = 0;
> >> +
> >> +	/*
> >> +	 * If the required space is larger than the available
> >> +	 * aperture, we will not able to find a slot for the
> >> +	 * object and unbinding the object now will be in
> >> +	 * vain. Worse, doing so may cause us to ping-pong
> >> +	 * the object in and out of the Global GTT and
> >> +	 * waste a lot of cycles under the mutex.
> >> +	 */
> >> +	if (obj->base.size > ggtt->mappable_end)
> >> +		return -E2BIG;
> >> +
> >> +	/*
> >> +	 * If NONBLOCK is set the caller is optimistically
> >> +	 * trying to cache the full object within the mappable
> >> +	 * aperture, and *must* have a fallback in place for
> >> +	 * situations where we cannot bind the object. We
> >> +	 * can be a little more lax here and use the fallback
> >> +	 * more often to avoid costly migrations of ourselves
> >> +	 * and other objects within the aperture.
> >> +	 */
> >> +	if (!(flags & PIN_NONBLOCK))
> >> +		return 0;
> >> +
> >> +	/*
> >> +	 * We only consider objects whose size is at-least a quarter of
> >> +	 * the aperture to be too big and subject them to the new
> >> +	 * heuristic below.
> >> +	 */
> >> +	if (obj->base.size < ggtt->mappable_end / 4)
> >> +		return 0;
> > 
> > That seems a fairly arbitrary thing to put here. Maybe something the
> > caller should check/specify?
> 
> I have no strong opinion on this one. In my mind I categorised it under 
> "is it a large framebuffer" heuristics. Previously it was less than one 
> half of aperture always okay, now one quarter, plus 2x hole check if 
> larger. Both are heuristics. I even mentioned earlier if 2x should be an 
> input parameter as well, but again, given it's not an exported function 
> couldn't really justify it.

Is there any point in even having this extra check? If we 
don't think checking this is worth the hassle then why call
the function at all?

> 
> > 
> >> +
> >> +	if (HAS_GMCH(i915) || DISPLAY_VER(i915) < 11 ||
> >> +	    !i915_gem_object_is_framebuffer(obj))
> >> +		return 0;
> > 
> > None of that seems appropriate for a generic gem function
> > with this name.
> 
> It's not exported though, maybe remove i915_gem prefix to avoid any 
> ideas of it being generic?

These checks don't even seem to doing anything useful. HAS_GMCH should
already be covered by always setting PIN_MAPPABLE and hence O_NONBLOCK
is never even tried, the pre-icl vs. icl+ check should not exist at all
IMO, and if this is only called for framebuffers then why does the code
pretend that is not the case?

So I would suggest just ditching all these checks, and then the function
even does what it says on the tin.

-- 
Ville Syrjälä
Intel

next prev parent reply	other threads:[~2022-02-07 13:24 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-02  1:11 [Intel-gfx] [PATCH 1/2] drm/mm: Add an iterator to optimally walk over holes for an allocation Vivek Kasireddy
2022-02-02  1:11 ` [Intel-gfx] [PATCH 2/2] drm/i915/gem: Don't try to map and fence large scanout buffers (v5) Vivek Kasireddy
2022-02-02 13:19   ` Tvrtko Ursulin
2022-02-04  1:22     ` [Intel-gfx] [PATCH 2/2] drm/i915/gem: Don't try to map and fence large scanout buffers (v6) Vivek Kasireddy
2022-02-07 10:49       ` Tvrtko Ursulin
2022-02-07 10:58       ` Ville Syrjälä
2022-02-07 11:47         ` Tvrtko Ursulin
2022-02-07 13:24           ` Ville Syrjälä [this message]
2022-02-07 13:36             ` Tvrtko Ursulin
2022-02-08  5:10               ` Kasireddy, Vivek
2022-02-08  5:43                 ` Ville Syrjälä
2022-02-09  1:47                   ` Kasireddy, Vivek
2022-02-09  7:42                     ` Ville Syrjälä
2022-02-10  1:41                       ` Kasireddy, Vivek
2022-02-02  1:43 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/2] drm/mm: Add an iterator to optimally walk over holes for an allocation Patchwork
2022-02-02  1:45 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-02-02  2:09 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-02-04  1:45 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/2] drm/mm: Add an iterator to optimally walk over holes for an allocation (rev2) Patchwork
2022-02-07 10:51   ` Tvrtko Ursulin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YgEdkkHHgtSfQrJU@intel.com \
    --to=ville.syrjala@linux.intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.