From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Wilson
Subject: Re: [PATCH] [RFC] drm/i915: read-read semaphore optimization
Date: Tue, 13 Dec 2011 16:59:14 +0000
References: <1323748328-10153-1-git-send-email-ben@bwidawsk.net> <20111213160133.GA4125@phenom.ffwll.local>
In-Reply-To: <20111213160133.GA4125@phenom.ffwll.local>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Daniel Vetter
Cc: Daniel Vetter, Ben Widawsky, intel-gfx@lists.freedesktop.org
List-Id: intel-gfx@lists.freedesktop.org

On Tue, 13 Dec 2011 17:01:33 +0100, Daniel Vetter wrote:
> Afaik the only use-case for parallel reads is video decode with
> post-processing on the render ring. The decode ring needs read-only access
> to reference frames to decode the next frame and the render ring read-only
> access to past frames for post-processing (e.g. deinterlacing). But given
> the general state of perf optimizations in libva I think we have lower
> hanging fruit to chase if we actually miss a performance target for this
> use-case.

One in the near future will be: render to backbuffer (RCS), pageflip to
scanout (BCS), read from front (RCS). And in its current form UXA will do
the back-to-front blit on the BCS. But that is async and so not a large
race window, whereas the pageflip may take ~16ms to process. I don't
think it is entirely unfeasible that we see some form of this whilst
running compositors or games. Or at least we would if we enabled
semaphores for pageflips.
Except in the pageflip scenario we know we are protected by the fb ref, so
consider the hypothetical scenario where we have a working vsync'ed
blit...

The real question, in any event, is: do we have enough instrumentation to
diagnose GPU stalls upon buffer migration? If so, we can replace the read
optimisation with a tracepoint and wait for a test case.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre