From mboxrd@z Thu Jan  1 00:00:00 1970
From: =?windows-1252?Q?Christian_K=F6nig?= <christian.koenig@amd.com>
Subject: Re: Question on UAPI for fences
Date: Sat, 13 Sep 2014 14:25:28 +0200
Message-ID: <541437B8.4010607@amd.com>
References: <5412F3CA.9060306@amd.com>
 <CAKMK7uHr0Eu0nWn1TzWSpi6fEudyGoTqWOEVR-Jo_f9eZXk9mA@mail.gmail.com>
 <CAKMK7uHgrT-j3qo9hxZcLMFo0Dzr-KCmT+G6VzO+OVrS8D8_SQ@mail.gmail.com>
 <20140912145048.GA4139@gmail.com>
 <CADnq5_N1xPr+zhTGKA4HpQcJffTAS-NcVidWPE31j45H_bVRKw@mail.gmail.com>
 <20140912153346.GB4139@gmail.com> <54131481.4040905@amd.com>
 <20140912154831.GC4139@gmail.com> <54131811.4050509@amd.com>
 <20140912160349.GD4139@gmail.com> <54131A77.3030003@amd.com>
 <54132180.5050905@Intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"; Format="flowed"
Content-Transfer-Encoding: quoted-printable
Return-path: <dri-devel-bounces@lists.freedesktop.org>
Received: from na01-bl2-obe.outbound.protection.outlook.com
 (mail-bl2on0135.outbound.protection.outlook.com [65.55.169.135])
 by gabe.freedesktop.org (Postfix) with ESMTP id D36726E0C6
 for <dri-devel@lists.freedesktop.org>; Sat, 13 Sep 2014 05:25:42 -0700 (PDT)
In-Reply-To: <54132180.5050905@Intel.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
To: John Harrison <John.C.Harrison@Intel.com>, Jerome Glisse <j.glisse@gmail.com>
Cc: Maarten Lankhorst <m.b.lankhorst@gmail.com>, Zach Pfeffer <zpfeffer@audience.com>, "dri-devel@lists.freedesktop.org" <dri-devel@lists.freedesktop.org>, "linaro-mm-sig@lists.linaro.org" <linaro-mm-sig@lists.linaro.org>, gpudriverdevsupport@amd.com
List-Id: dri-devel@lists.freedesktop.org

> Doing such combining and cleaning up fds as soon as they have been
> passed on should keep each application's fd usage fairly small.
Yeah, but this is exactly what we wanted to avoid internally because of
the IOCTL overhead.

Thinking more about it: for our driver-internal use we will definitely
hit limits on the number of FDs in use, and on the overhead of creating
and closing them. The execution model we are targeting for the long term
will need something like 10k fences per second or more.
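
A rough back-of-the-envelope sketch of the FD concern, assuming the worst
case where every fence holds one fd open and nothing gets closed. The
helper names are made up for illustration; only getrlimit() and
RLIMIT_NOFILE are real POSIX interfaces.

```c
#include <assert.h>
#include <sys/resource.h>

/* Soft per-process limit on open fds, or 0 if it cannot be queried. */
unsigned long long fd_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 0;
    return (unsigned long long)rl.rlim_cur;
}

/*
 * Seconds until `limit` fds are used up when `fences_per_second` fences
 * each keep one fd open (worst case, no fd is ever closed).
 */
double seconds_until_exhausted(unsigned long long limit,
                               double fences_per_second)
{
    assert(fences_per_second > 0.0);
    return (double)limit / fences_per_second;
}
```

With a soft limit of 1024 (a common default, but check your system) and
10k fences per second, seconds_until_exhausted(1024, 10000.0) comes out
at roughly 0.1 seconds.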

How about this: we use a per-client identifier for the fence internally,
and when we need to expose it to somebody else we export it as a sync
point fd. This is very similar to how we currently use GEM handles
internally and export a DMA-buf fd when we need to expose them.
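
A minimal userspace sketch of that handle/export split, purely to
illustrate the idea. Nothing here is an existing kernel or driver
interface: struct client, fence_create(), fence_export_fd() and
fence_destroy() are hypothetical names, and the "fd" is only a simulated
number, not a real file descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FENCES 1024 /* hypothetical per-client table size */

/* Per-client state: fences are addressed by small integer handles. */
struct client {
    bool in_use[MAX_FENCES]; /* which handles are currently allocated */
    int next_fd;             /* simulated fd allocator, not a real fd table */
    unsigned fds_created;    /* how many exports actually happened */
};

void client_init(struct client *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < MAX_FENCES; i++)
        c->in_use[i] = false;
    c->next_fd = 3; /* 0..2 are conventionally stdin/stdout/stderr */
    c->fds_created = 0;
}

/* Handles are 1-based; 0 is reserved as "invalid". */
static bool fence_valid(const struct client *c, uint32_t handle)
{
    return handle != 0 && handle <= MAX_FENCES && c->in_use[handle - 1];
}

/* Cheap path: allocate a handle, no fd involved. Returns 0 if full. */
uint32_t fence_create(struct client *c)
{
    for (uint32_t i = 0; i < MAX_FENCES; i++) {
        if (!c->in_use[i]) {
            c->in_use[i] = true;
            return i + 1;
        }
    }
    return 0;
}

/* Rare path: hand the fence to somebody else as a (simulated) fd. */
int fence_export_fd(struct client *c, uint32_t handle)
{
    if (!fence_valid(c, handle))
        return -1;
    c->fds_created++;
    return c->next_fd++;
}

/* Release a handle; returns false if it was not valid. */
bool fence_destroy(struct client *c, uint32_t handle)
{
    if (!fence_valid(c, handle))
        return false;
    c->in_use[handle - 1] = false;
    return true;
}
```

The point of the split is that the common case (create, use, destroy
within one client) never touches the fd table, so fds_created only grows
for fences that really cross a client boundary.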

Regards,
Christian.

On 12.09.2014 at 18:38, John Harrison wrote:
> On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian König wrote:
> > pass in a list of fences to wait for before beginning a command
> > submission.
>
> The Android implementation has a mechanism for combining multiple sync
> points into a brand new single sync pt. Thus APIs only ever need to
> take in a single fd not a list of them. If the user wants an operation
> to wait for multiple events to occur then it is up to them to request
> the combined version first. They can then happily close the individual
> fds that have been combined and only keep the one big one around.
> Indeed, even that fd can be closed once it has been passed on to some
> other API.
>
> Doing such combining and cleaning up fds as soon as they have been
> passed on should keep each application's fd usage fairly small.
>
>
> On 12/09/2014 17:08, Christian König wrote:
>>> As Daniel said using fd is most likely the way we want to do it but
>>> this remains vague.
>> Separating the discussion if it should be an fd or not. Using an fd
>> sounds fine to me in general, but I have some concerns as well.
>>
>> For example what was the maximum number of opened FDs per process
>> again? Could that become a problem? etc...
>>
>> Please comment,
>> Christian.
>>
>> On 12.09.2014 at 18:03, Jerome Glisse wrote:
>>> On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian König wrote:
>>>> On 12.09.2014 at 17:48, Jerome Glisse wrote:
>>>>> On Fri, Sep 12, 2014 at 05:42:57PM +0200, Christian König wrote:
>>>>>> On 12.09.2014 at 17:33, Jerome Glisse wrote:
>>>>>>> On Fri, Sep 12, 2014 at 11:25:12AM -0400, Alex Deucher wrote:
>>>>>>>> On Fri, Sep 12, 2014 at 10:50 AM, Jerome Glisse
>>>>>>>> <j.glisse@gmail.com> wrote:
>>>>>>>>> On Fri, Sep 12, 2014 at 04:43:44PM +0200, Daniel Vetter wrote:
>>>>>>>>>> On Fri, Sep 12, 2014 at 4:09 PM, Daniel Vetter
>>>>>>>>>> <daniel@ffwll.ch> wrote:
>>>>>>>>>>> On Fri, Sep 12, 2014 at 03:23:22PM +0200, Christian König
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> to allow concurrent buffer access by different engines beyond
>>>>>>>>>>>> the multiple readers/single writer model that we currently use
>>>>>>>>>>>> in radeon and other drivers we need some kind of
>>>>>>>>>>>> synchronization object exposed to userspace.
>>>>>>>>>>>>
>>>>>>>>>>>> My initial patch set for this used (or rather abused) zero
>>>>>>>>>>>> sized GEM buffers as fence handles. This obviously isn't the
>>>>>>>>>>>> best way of doing this (too much overhead, rather ugly
>>>>>>>>>>>> etc...), Jerome commented on this accordingly.
>>>>>>>>>>>>
>>>>>>>>>>>> So what should a driver expose instead? Android sync points?
>>>>>>>>>>>> Something else?
>>>>>>>>>>> I think actually exposing the struct fence objects as a fd,
>>>>>>>>>>> using android syncpts (or at least something compatible to
>>>>>>>>>>> it) is the way to go. Problem is that it's super-hard to get
>>>>>>>>>>> the android guys out of hiding for this :(
>>>>>>>>>>>
>>>>>>>>>>> Adding a bunch of people in the hopes that something sticks.
>>>>>>>>>> More people.
>>>>>>>>> Just to re-iterate, exposing such thing while still using
>>>>>>>>> command stream ioctl that use implicit synchronization is a
>>>>>>>>> waste and you can only get the lowest common denominator which
>>>>>>>>> is implicit synchronization. So i do not see the point of such
>>>>>>>>> api if you are not also adding a new cs ioctl with explicit
>>>>>>>>> contract that it does not do any kind of synchronization (it
>>>>>>>>> could be almost the exact same code modulo the do not wait for
>>>>>>>>> previous cmd to complete).
>>>>>>>> Our thinking was to allow explicit sync from a single process, but
>>>>>>>> implicitly sync between processes.
>>>>>>> This is a BIG NAK if you are using the same ioctl as it would
>>>>>>> mean you are changing userspace API, well at least userspace
>>>>>>> expectation. Adding a new cs flag might do the trick but it
>>>>>>> should not be about inter-process, or any thing special, it's
>>>>>>> just implicit sync or no synchronization. Converting userspace
>>>>>>> is not that much of a big deal either, it can be broken into
>>>>>>> several step. Like mesa use explicit synchronization all time
>>>>>>> but ddx use implicit.
>>>>>> The thinking here is that we need to be backward compatible for
>>>>>> DRI2/3 and support all kind of different use cases like old DDX
>>>>>> and new Mesa, or old Mesa and new DDX etc...
>>>>>>
>>>>>> So for my prototype if the kernel sees any access of a BO from two
>>>>>> different clients it falls back to the old behavior of implicit
>>>>>> synchronization of access to the same buffer object. That might
>>>>>> not be the fastest approach, but is as far as I can see
>>>>>> conservative and so should work under all conditions.
>>>>>>
>>>>>> Apart from that the planning so far was that we just hide this
>>>>>> feature behind a couple of command submission flags and new
>>>>>> chunks.
>>>>> Just to reproduce IRC discussion, i think it's a lot simpler and
>>>>> not that complex. For explicit cs ioctl you do not wait for any
>>>>> previous fence of any of the buffer referenced in the cs ioctl, but
>>>>> you still associate a new fence with all the buffer object
>>>>> referenced in the cs ioctl. So if the next ioctl is an implicit
>>>>> sync ioctl it will wait properly and synchronize properly with
>>>>> previous explicit cs ioctl. Hence you can easily have a mix in
>>>>> userspace thing is you only get benefit once enough of your
>>>>> userspace is using explicit.
>>>> Yes, that's exactly what my patches currently implement.
>>>>
>>>> The only difference is that by current planning I implemented it as
>>>> a per BO flag for the command submission, but that was just for
>>>> testing. Having a single flag to switch between implicit and
>>>> explicit synchronization for whole CS IOCTL would do equally well.
>>> Doing it per BO sounds bogus to me. But otherwise yes we are in
>>> agreement.
>>> As Daniel said using fd is most likely the way we want to do it but
>>> this remains vague.
>>>
>>>>> Note that you still need a way to have explicit cs ioctl to wait on a
>>>>> previous "explicit" fence so you need some api to expose fence per cs
>>>>> submission.
>>>> Exactly, that's what this mail thread is all about.
>>>>
>>>> As Daniel correctly noted you need something like a functionality
>>>> to get a fence as the result of a command submission as well as pass
>>>> in a list of fences to wait for before beginning a command
>>>> submission.
>>>>
>>>> At least it looks like we are all on the same general line here,
>>>> it's just nobody has a good idea how the details should look like.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Cheers,
>>>>> Jérôme
>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>> Cheers,
>>>>>>> Jérôme
>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>>> Also one thing that the Android sync point does not have,
>>>>>>>>> AFAICT, is a way to schedule synchronization as part of a cs
>>>>>>>>> ioctl so cpu never have to be involve for cmd stream that deal
>>>>>>>>> only one gpu (assuming the driver and hw can do such trick).
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Jérôme
>>>>>>>>>
>>>>>>>>>> -Daniel
>>>>>>>>>> --
>>>>>>>>>> Daniel Vetter
>>>>>>>>>> Software Engineer, Intel Corporation
>>>>>>>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch