From mboxrd@z Thu Jan  1 00:00:00 1970
From: John Harrison <John.C.Harrison@Intel.com>
Subject: Re: Question on UAPI for fences
Date: Fri, 12 Sep 2014 17:38:24 +0100
Message-ID: <54132180.5050905@Intel.com>
References: <5412F3CA.9060306@amd.com>
 <CAKMK7uHr0Eu0nWn1TzWSpi6fEudyGoTqWOEVR-Jo_f9eZXk9mA@mail.gmail.com>
 <CAKMK7uHgrT-j3qo9hxZcLMFo0Dzr-KCmT+G6VzO+OVrS8D8_SQ@mail.gmail.com>
 <20140912145048.GA4139@gmail.com>
 <CADnq5_N1xPr+zhTGKA4HpQcJffTAS-NcVidWPE31j45H_bVRKw@mail.gmail.com>
 <20140912153346.GB4139@gmail.com> <54131481.4040905@amd.com>
 <20140912154831.GC4139@gmail.com> <54131811.4050509@amd.com>
 <20140912160349.GD4139@gmail.com> <54131A77.3030003@amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"; Format="flowed"
Content-Transfer-Encoding: quoted-printable
Return-path: <dri-devel-bounces@lists.freedesktop.org>
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
 by gabe.freedesktop.org (Postfix) with ESMTP id C34FD6E682
 for <dri-devel@lists.freedesktop.org>; Fri, 12 Sep 2014 09:43:05 -0700 (PDT)
In-Reply-To: <54131A77.3030003@amd.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
To: =?windows-1252?Q?Christian_K=F6nig?= <christian.koenig@amd.com>, Jerome Glisse <j.glisse@gmail.com>
Cc: Maarten Lankhorst <m.b.lankhorst@gmail.com>, Zach Pfeffer <zpfeffer@audience.com>, "dri-devel@lists.freedesktop.org" <dri-devel@lists.freedesktop.org>, "linaro-mm-sig@lists.linaro.org" <linaro-mm-sig@lists.linaro.org>, gpudriverdevsupport@amd.com
List-Id: dri-devel@lists.freedesktop.org

On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian K=F6nig wrote:
 > pass in a list of fences to wait for before beginning a command =

submission.

The Android implementation has a mechanism for combining multiple sync =

points into a brand new single sync pt. Thus APIs only ever need to take =

in a single fd not a list of them. If the user wants an operation to =

wait for multiple events to occur then it is up to them to request the =

combined version first. They can then happily close the individual fds =

that have been combined and only keep the one big one around. Indeed, =

even that fd can be closed once it has been passed on to some other API.

Doing such combining and cleaning up fds as soon as they have been =

passed on should keep each application's fd usage fairly small.


On 12/09/2014 17:08, Christian K=F6nig wrote:
>> As Daniel said using fd is most likely the way we want to do it but this
>> remains vague.
> Separating the discussion if it should be an fd or not. Using an fd =

> sounds fine to me in general, but I have some concerns as well.
>
> For example what was the maximum number of opened FDs per process =

> again? Could that become a problem? etc...
>
> Please comment,
> Christian.
>
> Am 12.09.2014 um 18:03 schrieb Jerome Glisse:
>> On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian K=F6nig wrote:
>>> Am 12.09.2014 um 17:48 schrieb Jerome Glisse:
>>>> On Fri, Sep 12, 2014 at 05:42:57PM +0200, Christian K=F6nig wrote:
>>>>> Am 12.09.2014 um 17:33 schrieb Jerome Glisse:
>>>>>> On Fri, Sep 12, 2014 at 11:25:12AM -0400, Alex Deucher wrote:
>>>>>>> On Fri, Sep 12, 2014 at 10:50 AM, Jerome Glisse =

>>>>>>> <j.glisse@gmail.com> wrote:
>>>>>>>> On Fri, Sep 12, 2014 at 04:43:44PM +0200, Daniel Vetter wrote:
>>>>>>>>> On Fri, Sep 12, 2014 at 4:09 PM, Daniel Vetter =

>>>>>>>>> <daniel@ffwll.ch> wrote:
>>>>>>>>>> On Fri, Sep 12, 2014 at 03:23:22PM +0200, Christian K=F6nig wrot=
e:
>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>
>>>>>>>>>>> to allow concurrent buffer access by different engines =

>>>>>>>>>>> beyond the multiple
>>>>>>>>>>> readers/single writer model that we currently use in radeon =

>>>>>>>>>>> and other
>>>>>>>>>>> drivers we need some kind of synchonization object exposed =

>>>>>>>>>>> to userspace.
>>>>>>>>>>>
>>>>>>>>>>> My initial patch set for this used (or rather abused) zero =

>>>>>>>>>>> sized GEM buffers
>>>>>>>>>>> as fence handles. This is obviously isn't the best way of =

>>>>>>>>>>> doing this (to
>>>>>>>>>>> much overhead, rather ugly etc...), Jerome commented on this =

>>>>>>>>>>> accordingly.
>>>>>>>>>>>
>>>>>>>>>>> So what should a driver expose instead? Android sync points? =

>>>>>>>>>>> Something else?
>>>>>>>>>> I think actually exposing the struct fence objects as a fd, =

>>>>>>>>>> using android
>>>>>>>>>> syncpts (or at least something compatible to it) is the way =

>>>>>>>>>> to go. Problem
>>>>>>>>>> is that it's super-hard to get the android guys out of hiding =

>>>>>>>>>> for this :(
>>>>>>>>>>
>>>>>>>>>> Adding a bunch of people in the hopes that something sticks.
>>>>>>>>> More people.
>>>>>>>> Just to re-iterate, exposing such thing while still using =

>>>>>>>> command stream
>>>>>>>> ioctl that use implicit synchronization is a waste and you can =

>>>>>>>> only get
>>>>>>>> the lowest common denominator which is implicit =

>>>>>>>> synchronization. So i do
>>>>>>>> not see the point of such api if you are not also adding a new =

>>>>>>>> cs ioctl
>>>>>>>> with explicit contract that it does not do any kind of =

>>>>>>>> synchronization
>>>>>>>> (it could be almost the exact same code modulo the do not wait for
>>>>>>>> previous cmd to complete).
>>>>>>> Our thinking was to allow explicit sync from a single process, but
>>>>>>> implicitly sync between processes.
>>>>>> This is a BIG NAK if you are using the same ioctl as it would =

>>>>>> mean you are
>>>>>> changing userspace API, well at least userspace expectation. =

>>>>>> Adding a new
>>>>>> cs flag might do the trick but it should not be about =

>>>>>> inter-process, or any
>>>>>> thing special, it's just implicit sync or no synchronization. =

>>>>>> Converting
>>>>>> userspace is not that much of a big deal either, it can be broken =

>>>>>> into
>>>>>> several step. Like mesa use explicit synchronization all time but =

>>>>>> ddx use
>>>>>> implicit.
>>>>> The thinking here is that we need to be backward compatible for =

>>>>> DRI2/3 and
>>>>> support all kind of different use cases like old DDX and new Mesa, =

>>>>> or old
>>>>> Mesa and new DDX etc...
>>>>>
>>>>> So for my prototype if the kernel sees any access of a BO from two =

>>>>> different
>>>>> clients it falls back to the old behavior of implicit =

>>>>> synchronization of
>>>>> access to the same buffer object. That might not be the fastest =

>>>>> approach,
>>>>> but is as far as I can see conservative and so should work under all
>>>>> conditions.
>>>>>
>>>>> Apart from that the planning so far was that we just hide this =

>>>>> feature
>>>>> behind a couple of command submission flags and new chunks.
>>>> Just to reproduce IRC discussion, i think it's a lot simpler and =

>>>> not that
>>>> complex. For explicit cs ioctl you do not wait for any previous =

>>>> fence of
>>>> any of the buffer referenced in the cs ioctl, but you still =

>>>> associate a
>>>> new fence with all the buffer object referenced in the cs ioctl. So =

>>>> if the
>>>> next ioctl is an implicit sync ioctl it will wait properly and =

>>>> synchronize
>>>> properly with previous explicit cs ioctl. Hence you can easily have =

>>>> a mix
>>>> in userspace thing is you only get benefit once enough of your =

>>>> userspace
>>>> is using explicit.
>>> Yes, that's exactly what my patches currently implement.
>>>
>>> The only difference is that by current planning I implemented it as =

>>> a per BO
>>> flag for the command submission, but that was just for testing. =

>>> Having a
>>> single flag to switch between implicit and explicit synchronization for
>>> whole CS IOCTL would do equally well.
>> Doing it per BO sounds bogus to me. But otherwise yes we are in =

>> agreement.
>> As Daniel said using fd is most likely the way we want to do it but this
>> remains vague.
>>
>>>> Note that you still need a way to have explicit cs ioctl to wait on a
>>>> previos "explicit" fence so you need some api to expose fence per cs
>>>> submission.
>>> Exactly, that's what this mail thread is all about.
>>>
>>> As Daniel correctly noted you need something like a functionality to =

>>> get a
>>> fence as the result of a command submission as well as pass in a =

>>> list of
>>> fences to wait for before beginning a command submission.
>>>
>>> At least it looks like we are all on the same general line here, its =

>>> just
>>> nobody has a good idea how the details should look like.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> Cheers,
>>>> J=E9r=F4me
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Cheers,
>>>>>> J=E9r=F4me
>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>>> Also one thing that the Android sync point does not have, =

>>>>>>>> AFAICT, is a
>>>>>>>> way to schedule synchronization as part of a cs ioctl so cpu =

>>>>>>>> never have
>>>>>>>> to be involve for cmd stream that deal only one gpu (assuming =

>>>>>>>> the driver
>>>>>>>> and hw can do such trick).
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> J=E9r=F4me
>>>>>>>>
>>>>>>>>> -Daniel
>>>>>>>>> -- =

>>>>>>>>> Daniel Vetter
>>>>>>>>> Software Engineer, Intel Corporation
>>>>>>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>>>>>>>> _______________________________________________
>>>>>>>> dri-devel mailing list
>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>