From: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Subject: Re: [RFC] Async flips
Date: Mon, 12 Nov 2012 04:53:10 +0100
Message-ID: <50A072A6.6060000@tuebingen.mpg.de>
References: <1351622029-2276-1-git-send-email-jbarnes@virtuousgeek.org>
	<20121031125324.GD3791@intel.com>
	<87625qpmcw.fsf@eliezer.anholt.net>
	<20121031185118.GJ3791@intel.com>
	<50934FE9.2000408@tuebingen.mpg.de>
	<20121102092938.GT3791@intel.com>
In-Reply-To: <20121102092938.GT3791@intel.com>
To: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org

On 02.11.12 10:29, Ville Syrjälä wrote:
> On Fri, Nov 02, 2012 at 05:45:29AM +0100, Mario Kleiner wrote:
>>
>>
>> On 31.10.12 19:51, Ville Syrjälä wrote:
>>> On Wed, Oct 31, 2012 at 10:44:47AM -0700, Eric Anholt wrote:
>>>> Ville Syrjälä <ville.syrjala@linux.intel.com> writes:
>>>>
>>>>> On Tue, Oct 30, 2012 at 01:33:47PM -0500, Jesse Barnes wrote:
>>>>>> The hw supports async flips through the render ring, so why not expose it?
>>>>>> It gives us one more "tear me harder" option we can use in the DDX and
>>>>>> for other cases where simply flipping to the latest buffer is more
>>>>>> important than visual quality.
>>>>>
>>>>> The only reason I can see why anyone would really want async flips is
>>>>> when you're restricted to double buffering. With triple buffering you
>>>>> should be able to override the previous flip w/o tearing.
>>>>>
>>>>> Well, actually if you use the ring based flips, then you can't do the
>>>>> override. My atomic page flip code can do it because it's using mmio
>>>>> flips. There were also other reasons favoring mmio over ring.
>>>>>
>>>>> Once the atomic code is deemed ready, I would suggest we just nuke the
>>>>> ring based flip code (pun intended).
>>>>
>>>> Can you outline what exactly your plan is for doing faster-than-vblank
>>>> page flipping without tearing, and how it gets synchronized with
>>>> rendering?
>>>
>>> The faster than vrefresh flipping simply involves overwriting the
>>> display plane registers before they've been latched by the hardware.
>>> This appears to work fine already.
>>>
>>> As far as the synchronization goes, I basically just want a callback
>>> from the GPU when it's done with the buffer. I'm expecting to find
>>> some kind of GPU progress interrupt that I can enable while I'm waiting
>>> for the GPU to catch up. So I also need a FIFO to store the flip
>>> requests in the meantime. Once the GPU tells me it's ready, I pull the
>>> flip request from the queue and proceed with the display plane
>>> programming.
>>>
>>> So the synchronization part is still quite handwavy, and I need
>>> to study the hardware/driver in more detail to figure out the
>>> specifics.
>>>
>>
>> That's cool. But please make sure that the behaviour will be somehow
>> controllable by OpenGL applications, via some OpenGL extension. I can
>> see use for different modes:
>>
>> a) Normal double-buffering: For deterministic, well controlled timing -
>> That's what my type of applications need. Maximum control over what to
>> show next, based on precise and reliable flip completion timestamps.
>>
>> b) Triple buffering with FIFO queueing of frames ahead, which is what the
>> Intel DDX currently does, unfortunately for me with totally broken
>> timestamping, so all my users have to disable it in the xorg.conf -
>> quite a challenge for many Apple converts, who have trouble with the
>> concept of editing configuration files. It's useful if an app manages to
>> render at full refresh rate on average: occasional stalls get smoothed
>> out, because the GPU has one frame of completed rendering queued up in
>> advance. Maybe this also allows for some power saving if an app can
>> render and queue frames ahead of time as fast as possible (race to
>> completion), so that the CPU/GPU can go to a deeper sleep state earlier?
>>
>> c) Your LIFO triple buffering, as far as I understand it, which drops
>> late frames to reduce latency/lag for things like video games.
>>
>
> Right. I've been occasionally thinking about pushing the swap interval
> handling to the kernel.
>
> Currently user space needs to do the wait for vblank trick before
> scheduling the swap, and then hoping that the GPU will catch up fast
> enough so that the swap will happen on the next vblank. If the kernel
> handled it, it could actually guarantee the OML_sync_control remainder
> behaviour (well, assuming kernel threads get scheduled in a timely
> fashion), whereas the user space solution can't give such guarantees.

Yes. You could even do much of it from the vblank irq for robustness of
timing. The downside would probably be the complexity of
error/special-case handling. E.g., an app might schedule a swap 10
seconds into the future, and then the app dies/quits, or a fullscreen
window gets switched back to windowed mode, so something that was meant
to be page-flipped suddenly can't be page-flipped anymore, or the window
goes away entirely during those 10 seconds.
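A minimal sketch of what such kernel-side swap scheduling could look like, assuming a hypothetical per-CRTC array of pending swaps. The names pending_swap, oml_swap_msc, on_vblank and cancel_client are made up for illustration and are not i915 code. oml_swap_msc follows my reading of the glXSwapBuffersMscOML target_msc/divisor/remainder rules, on_vblank is what the vblank irq would run, and cancel_client covers the "app dies during those 10 seconds" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one scheduled swap on a single CRTC. */
struct pending_swap {
    uint64_t fire_msc; /* vblank count at which the new frame should become visible */
    int client_id;     /* owner, so the swap can be cancelled if the client goes away */
    bool active;       /* still waiting to be programmed */
};

/* Target vblank count per (my reading of) glXSwapBuffersMscOML:
 * before target_msc, swap at target_msc; otherwise at the next vblank
 * (divisor == 0) or at the next count with msc % divisor == remainder. */
static uint64_t oml_swap_msc(uint64_t cur_msc, uint64_t target_msc,
                             uint64_t divisor, uint64_t remainder)
{
    assert(divisor == 0 || remainder < divisor);
    if (cur_msc < target_msc)
        return target_msc;
    if (divisor == 0)
        return cur_msc + 1;
    uint64_t next = cur_msc + 1;
    return next + (remainder + divisor - next % divisor) % divisor;
}

/* Called from the (hypothetical) vblank irq with the count of the vblank
 * that just started. Every swap due at the following vblank (or already
 * late) is programmed now, so the hardware latches it there. Clearing
 * 'active' stands in for writing the display plane registers. */
static size_t on_vblank(struct pending_swap *q, size_t n, uint64_t msc)
{
    size_t programmed = 0;
    for (size_t i = 0; i < n; i++) {
        if (q[i].active && q[i].fire_msc <= msc + 1) {
            q[i].active = false;
            programmed++;
        }
    }
    return programmed;
}

/* Drop every pending swap owned by a client that died or closed its window. */
static size_t cancel_client(struct pending_swap *q, size_t n, int client_id)
{
    size_t cancelled = 0;
    for (size_t i = 0; i < n; i++) {
        if (q[i].active && q[i].client_id == client_id) {
            q[i].active = false;
            cancelled++;
        }
    }
    return cancelled;
}
```

The fullscreen-to-windowed case is not modeled here; it would need more than just dropping the entry, since the client still expects its frame to show up somehow.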

> But even w/o that extra kernel feature, my code should be no worse in
> that regard than the current code. You can still do the wait for vblank
> trick in user space to get similar swap interval behaviour, and you can
> still use as many buffers as you want. The only real difference to the
> current situation is that if you schedule the flip too soon, you won't
> get the EBUSY from the kernel, but instead you drop the previous flip.
> But assuming the user space code is well behaved it won't try to flip
> too soon, so essentially nothing will change.
>

Yes. My remark wrt application control was just because I assumed you
would be responsible for the whole stack when implementing this feature,
including how it gets exposed to apps via the DDX / GLX / Mesa etc.
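To make the difference concrete, here is a toy model (hypothetical names, not the i915 implementation) of what a driver could do when a new flip request arrives while the previous one hasn't been latched yet: reject it with -EBUSY as the current ring-based path does, queue it FIFO-style like mode b), or overwrite it as Ville's mmio override / mode c) would.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical policies for a flip that arrives while another is pending. */
enum flip_policy {
    FLIP_REJECT,  /* one outstanding flip; a second one gets -EBUSY */
    FLIP_FIFO,    /* queue frames ahead; each is latched in order, mode b) */
    FLIP_REPLACE, /* overwrite the not-yet-latched flip; newest wins, mode c) */
};

#define QUEUE_MAX 4

struct crtc_model {
    enum flip_policy policy;
    uint32_t scanout;          /* fb currently being displayed */
    uint32_t queue[QUEUE_MAX]; /* flips not yet latched, oldest first */
    int len;
    int dropped;               /* flips overwritten before reaching the screen */
};

/* Request a flip to framebuffer fb. Returns 0 or a negative errno. */
static int request_flip(struct crtc_model *c, uint32_t fb)
{
    if (c->len == 0) {
        c->queue[c->len++] = fb;
        return 0;
    }
    switch (c->policy) {
    case FLIP_REJECT:
        return -EBUSY;
    case FLIP_FIFO:
        if (c->len == QUEUE_MAX)
            return -EBUSY;
        c->queue[c->len++] = fb;
        return 0;
    case FLIP_REPLACE:
        c->queue[c->len - 1] = fb; /* len never exceeds 1 under this policy */
        c->dropped++;
        return 0;
    }
    return -EINVAL;
}

/* Vblank: the hardware latches the oldest pending flip, if any. */
static void vblank(struct crtc_model *c)
{
    if (c->len == 0)
        return;
    c->scanout = c->queue[0];
    for (int i = 1; i < c->len; i++)
        c->queue[i - 1] = c->queue[i];
    c->len--;
}
```

With FLIP_REPLACE, a well-behaved client that waits for its previous flip to complete before requesting the next one always finds the queue empty and never hits the replace path, which matches the point that nothing changes for such clients.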

>> d) Flipping without vsync = tearing. I think this is useful at least for
>> benchmarks, although not for anything else.
>
> This one I don't support currently. It would be possible to support it
> (assuming the HW allows it). The simplest way would be to just add a
> new flag to the ioctl to control this behaviour.
>

I think that's what Jesse's patches are supposed to add.
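A sketch of the "new flag to the ioctl" idea, using made-up flag names (SKETCH_FLIP_EVENT, SKETCH_FLIP_ASYNC) rather than any real DRM UAPI values. A normal flip only arms the registers and takes effect at the next vblank latch; an async flip changes what the scanout engine reads right away, so if scanout is mid-frame the tear shows up at that scanline. Unknown flag bits are rejected with -EINVAL so further flags can be added later.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits; NOT the actual i915/DRM UAPI values. */
#define SKETCH_FLIP_EVENT (1u << 0) /* request a completion event (not modeled) */
#define SKETCH_FLIP_ASYNC (1u << 1) /* take effect immediately, may tear */
#define SKETCH_FLIP_FLAGS (SKETCH_FLIP_EVENT | SKETCH_FLIP_ASYNC)

struct plane_model {
    uint32_t live_fb;  /* what the scanout engine reads right now */
    uint32_t armed_fb; /* written to registers, latched at next vblank (0 = none) */
    int tear_line;     /* scanline of a visible tear in this frame, -1 if none */
};

/* scanline: line currently being scanned out, or -1 during vertical blank. */
static int flip_ioctl(struct plane_model *p, uint32_t fb, uint32_t flags,
                      int scanline)
{
    if (flags & ~SKETCH_FLIP_FLAGS)
        return -EINVAL;
    if (flags & SKETCH_FLIP_ASYNC) {
        p->live_fb = fb;       /* rest of this frame comes from the new fb */
        p->armed_fb = 0;       /* an async flip supersedes any armed one */
        p->tear_line = scanline;
        return 0;
    }
    p->armed_fb = fb;          /* re-arming before the latch just replaces it */
    return 0;
}

/* Vertical blank: an armed flip becomes live; a new frame starts untorn. */
static void vblank_latch(struct plane_model *p)
{
    if (p->armed_fb) {
        p->live_fb = p->armed_fb;
        p->armed_fb = 0;
    }
    p->tear_line = -1;
}
```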

thanks,
-mario