From mboxrd@z Thu Jan 1 00:00:00 1970 From: Daniel Vetter Subject: Re: [PATCH] drm/i915: Write RING_TAIL once per-request Date: Tue, 10 Sep 2013 16:13:53 +0200 Message-ID: <20130910141353.GL27291@phenom.ffwll.local> References: <1376169392-6272-1-git-send-email-chris@chris-wilson.co.uk> <20130826204212.GB1454@bwidawsk.net> <20130910130120.GB5555@nuc-i3427.alporthouse.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail-bk0-f49.google.com (mail-bk0-f49.google.com [209.85.214.49]) by gabe.freedesktop.org (Postfix) with ESMTP id 27565E6233 for ; Tue, 10 Sep 2013 07:13:40 -0700 (PDT) Received: by mail-bk0-f49.google.com with SMTP id r7so2937053bkg.36 for ; Tue, 10 Sep 2013 07:13:40 -0700 (PDT) Content-Disposition: inline In-Reply-To: <20130910130120.GB5555@nuc-i3427.alporthouse.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org To: Chris Wilson , Ben Widawsky , intel-gfx@lists.freedesktop.org List-Id: intel-gfx@lists.freedesktop.org On Tue, Sep 10, 2013 at 02:01:20PM +0100, Chris Wilson wrote: > On Mon, Aug 26, 2013 at 01:42:12PM -0700, Ben Widawsky wrote: > > On Sat, Aug 10, 2013 at 10:16:32PM +0100, Chris Wilson wrote: > > > Ignoring the legacy DRI1 code, and a couple of special cases (to be > > > discussed later), all access to the ring is mediated through requests. > > > The first write to a ring will grab a seqno and mark the ring as having > > > an outstanding_lazy_request. Either through explicitly adding a request > > > after an execbuffer or through an implicit wait (either by the CPU or by > > > a semaphore), that sequence of writes will be terminated with a request. > > > So we can ellide all the intervening writes to the tail register and > > > send the entire command stream to the GPU at once. This will reduce the > > > number of *serialising* writes to the tail register by a factor or 3-5 > > > times (depending upon architecture and number of workarounds, context > > > switches, etc involved). This becomes even more noticeable when the > > > register write is overloaded with a number of debugging tools. The > > > astute reader will wonder if it is then possible to overflow the ring > > > with a single command. It is not. When we start a command sequence to > > > the ring, we check for available space and issue a wait in case we have > > > not. The ring wait will in this case be forced to flush the outstanding > > > register write and then poll the ACTHD for sufficient space to continue. > > > > > > The exception to the rule where everything is inside a request are a few > > > initialisation cases where we may want to write GPU commands via the CS > > > before userspace wakes up and page flips. > > > > > > > I'm not a huge fan of having the second intel_ring_advance() that does something > > other than it sounds like. I'd *much* prefer to not intel_ring_advance() > > if you are certain more emits will be coming like in the case you > > mention above. We can add a paranoia check whenever we're about to > > return to userspace that tail == RING_TAIL > > > > Also, without performance data, it's hard to say this indirection is > > worth it. > > Just a sample, UXA on i5-2520qm, aa10text: > 2580000.0/sec -> 2980000.0/sec Queued for -next, thanks for the patch. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch