From: Mario Kleiner
Subject: Re: [RFC] Async flips
Date: Mon, 12 Nov 2012 04:53:10 +0100
Message-ID: <50A072A6.6060000@tuebingen.mpg.de>
In-Reply-To: <20121102092938.GT3791@intel.com>
To: Ville Syrjälä
Cc: intel-gfx@lists.freedesktop.org

On 02.11.12 10:29, Ville Syrjälä wrote:
> On Fri, Nov 02, 2012 at 05:45:29AM +0100, Mario Kleiner wrote:
>>
>>
>> On 31.10.12 19:51, Ville Syrjälä wrote:
>>> On Wed, Oct 31, 2012 at 10:44:47AM -0700, Eric Anholt wrote:
>>>> Ville Syrjälä writes:
>>>>
>>>>> On Tue, Oct 30, 2012 at 01:33:47PM -0500, Jesse Barnes wrote:
>>>>>> The hw supports async flips through the render ring, so why not expose it?
>>>>>> It gives us one more "tear me harder" option we can use in the DDX and
>>>>>> for other cases where simply flipping to the latest buffer is more
>>>>>> important than visual quality.
>>>>>
>>>>> The only reason I can see why anyone would really want async flips is
>>>>> when you're restricted to double buffering. With triple buffering you
>>>>> should be able to override the previous flip w/o tearing.
>>>>>
>>>>> Well, actually if you use the ring based flips, then you can't do the
>>>>> override. My atomic page flip code can do it because it's using mmio
>>>>> flips. There were also other reasons favoring mmio over ring.
>>>>>
>>>>> Once the atomic code is deemed ready, I would suggest we just nuke the
>>>>> ring based flip code (pun intended).
>>>>
>>>> Can you outline what exactly your plan is for doing faster-than-vblank
>>>> page flipping without tearing, and how it gets synchronized with
>>>> rendering?
>>>
>>> The faster than vrefresh flipping simply involves overwriting the
>>> display plane registers before they've been latched by the hardware.
>>> This appears to work fine already.
>>>
>>> As far as the synchronization goes, I basically just want a callback
>>> from the GPU when it's done with the buffer. I'm expecting to find
>>> some kind of GPU progress interrupt that I can enable while I'm waiting
>>> for the GPU to catch up. So I also need a FIFO to store the flip
>>> requests in the meantime. Once the GPU tells me it's ready, I pull the
>>> flip request from the queue and proceed with the display plane
>>> programming.
>>>
>>> So the synchronization part is still quite handwavy, and I need
>>> to study the hardware/driver in more detail to figure out the
>>> specifics.
>>>
>>
>> That's cool. But please make sure that the behaviour will be somehow
>> controllable by OpenGL applications, via some OpenGL extension. I can
>> see use for different modes:
>>
>> a) Normal double-buffering: For deterministic, well controlled timing -
>> That's what my type of applications need. Maximum control over what to
>> show next, based on precise and reliable flip completion timestamps.
>>
>> b) Triple buffering with FIFO queueing of frames ahead, what the intel
>> ddx currently does, unfortunately for me with totally broken
>> timestamping, so all my users have to disable it in the xorg.conf -
>> quite a challenge for many Apple converts, who have trouble with the
>> concept of editing configuration files. It's useful if an app manages to
>> render at full refresh rate on average to smooth out occasional stalls,
>> because the gpu has one frame of completed rendering queued up in
>> advance. Maybe this also allows for some power saving if an app can
>> render and queue frames ahead of time as fast as possible (race to
>> completion) and then the cpu/gpu can go to some deeper sleep state earlier?
>>
>> c) Your LIFO triple-buffering, as far as I understand, with dropping
>> late frames, to reduce latency/lag for things like video games.
>>
>
> Right. I've been occasionally thinking about pushing the swap interval
> handling to the kernel.
>
> Currently user space needs to do the wait for vblank trick before
> scheduling the swap, and then hoping that the GPU will catch up fast
> enough so that the swap will happen on the next vblank. If the kernel
> handled it, it could actually guarantee the OML_sync_control remainder
> behaviour (well assuming kernel threads get scheduled in a timely
> fashion), whereas the user space solution can't give such guarantees.

Yes. You could even do much of it from the vblank irq for robustness of
timing. The downside would probably be the complexity of
error/special-case handling. E.g., if an app schedules a swap 10 seconds
into the future, but then the app dies/quits or a fullscreen window gets
switched back to windowed mode, so something that was meant to be
page-flipped suddenly can't be page-flipped anymore, or the window went
away during that 10 secs.

> But even w/o that extra kernel feature, my code should be no worse in
> that regard than the current code.
> You can still do the wait for vblank
> trick in user space to get similar swap interval behaviour, and you can
> still use as many buffers as you want. The only real difference to the
> current situation is that if you schedule the flip too soon, you won't
> get the EBUSY from the kernel, but instead you drop the previous flip.
> But assuming the user space code is well behaved it won't try to flip
> too soon, so essentially nothing will change.
>

Yes. My remark wrt application control was just because I assumed you
would be responsible for the whole stack when implementing this feature,
also how this gets exposed to apps via the ddx / glx / mesa etc.

>> d) Flipping without vsync = tearing. I think this is at least useful for
>> benchmarks, although not for anything else.
>
> This one I don't support currently. It would be possible to support it
> (assuming the HW allows it). The simplest way would be to just add a
> new flag to the ioctl to control this behaviour.
>

I think that's what Jesse's patches are supposed to add.

thanks,
-mario
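
For illustration, the OML_sync_control target/divisor/remainder rule
referred to in the swap interval discussion can be sketched roughly as
below. This is only a minimal sketch, not driver code: the function name
next_swap_msc and the earliest_msc parameter (the first vblank count at
which the flip could physically complete) are hypothetical names made up
for this example. Only the selection rule itself follows the
glXSwapBuffersMscOML description in the extension spec.

```python
def next_swap_msc(earliest_msc, target_msc, divisor, remainder):
    """Pick the vblank count (MSC) at which a swap should complete.

    earliest_msc is a hypothetical input: the first MSC at which the
    flip could complete given the current state of the GPU.
    """
    if divisor < 0 or (divisor > 0 and not 0 <= remainder < divisor):
        raise ValueError("need divisor >= 0 and 0 <= remainder < divisor")

    # Target still in the future: swap exactly at target_msc.
    if earliest_msc < target_msc:
        return target_msc

    # Target already reached and no divisor given: swap as soon as possible.
    if divisor == 0:
        return earliest_msc

    # Target already reached: wait for the next MSC that satisfies
    # msc % divisor == remainder (possibly earliest_msc itself).
    return earliest_msc + (remainder - earliest_msc) % divisor


if __name__ == "__main__":
    # Late swap with divisor 2, remainder 1: lands on the next odd vblank.
    print(next_swap_msc(100, 90, 2, 1))
```

The point of doing this selection in the kernel, as discussed above,
would be that earliest_msc is known at the time the GPU actually finishes
rendering, whereas user space has to guess it before scheduling the swap.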