From mboxrd@z Thu Jan  1 00:00:00 1970
From: Chris Wilson <chris@chris-wilson.co.uk>
Subject: Re: [PATCH] [RFC] drm/i915: read-read semaphore
	optimization
Date: Tue, 13 Dec 2011 16:59:14 +0000
Message-ID: <e0d58a$2iookp@orsmga002.jf.intel.com>
References: <1323748328-10153-1-git-send-email-ben@bwidawsk.net>
	<aefc95$2hvr0i@orsmga001.jf.intel.com>
	<20111213160133.GA4125@phenom.ffwll.local>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org>
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
	by gabe.freedesktop.org (Postfix) with ESMTP id D6CB09E8FB
	for <intel-gfx@lists.freedesktop.org>;
	Tue, 13 Dec 2011 08:59:35 -0800 (PST)
In-Reply-To: <20111213160133.GA4125@phenom.ffwll.local>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, Ben Widawsky <ben@bwidawsk.net>, intel-gfx@lists.freedesktop.org
List-Id: intel-gfx@lists.freedesktop.org

On Tue, 13 Dec 2011 17:01:33 +0100, Daniel Vetter <daniel@ffwll.ch> wrote:
> Afaik the only use-case for parallel reads is video decode with
> post-processing on the render ring. The decode ring needs read-only access
> to reference frames to decode the next frame and the render ring read-only
> access to past frames for post-processing (e.g. deinterlacing). But given
> the general state of perf optimizations in libva I think we have lower
> hanging fruit to chase if we actually miss a performance target for this
> use-case.

Another use-case in the near future will be: render to backbuffer (RCS),
pageflip to scanout (BCS), read from front (RCS).

And in its current form UXA will do the back-to-front blit on the BCS.
But that is async and so not a large race window, whereas the pageflip
may take ~16ms to process. I don't think it is entirely unfeasible that
we see some form of this whilst running compositors or games. Or at
least would if we enabled semaphores for pageflips. Except in the
pageflip scenario we know we are protected by the fb ref, so consider
the hypothetical scenario where we have a working vsync'ed blit...

The real question, in any event, is whether we have enough instrumentation
to diagnose GPU stalls upon buffer migration. Then we can replace the read
optimisation with a tracepoint and wait for a test case.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre