From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?windows-1252?Q?Christian_K=F6nig?= Subject: Re: Question on UAPI for fences Date: Sat, 13 Sep 2014 14:25:28 +0200 Message-ID: <541437B8.4010607@amd.com> References: <5412F3CA.9060306@amd.com> <20140912145048.GA4139@gmail.com> <20140912153346.GB4139@gmail.com> <54131481.4040905@amd.com> <20140912154831.GC4139@gmail.com> <54131811.4050509@amd.com> <20140912160349.GD4139@gmail.com> <54131A77.3030003@amd.com> <54132180.5050905@Intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252"; Format="flowed" Content-Transfer-Encoding: quoted-printable Return-path: Received: from na01-bl2-obe.outbound.protection.outlook.com (mail-bl2on0135.outbound.protection.outlook.com [65.55.169.135]) by gabe.freedesktop.org (Postfix) with ESMTP id D36726E0C6 for ; Sat, 13 Sep 2014 05:25:42 -0700 (PDT) In-Reply-To: <54132180.5050905@Intel.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: John Harrison , Jerome Glisse Cc: Maarten Lankhorst , Zach Pfeffer , "dri-devel@lists.freedesktop.org" , "linaro-mm-sig@lists.linaro.org" , gpudriverdevsupport@amd.com List-Id: dri-devel@lists.freedesktop.org > Doing such combining and cleaning up fds as soon as they have been = > passed on should keep each application's fd usage fairly small. Yeah, but this is exactly what we wanted to avoid internally because of = the IOCTL overhead. And thinking more about it for our driver internal use we will = definitely hit some limitations with the number of FDs in use and the = overhead for creating and closing them. With the execution model we = target for the long term we will need something like 10k fences per = second or more. How about this: We use an identifier per client for the fence internally = and when we need to somehow expose it to somebody else export it as sync = point fd. Very similar to how we currently have GEM handles internally = and when we need to expose them export a DMA_buf fd. Regards, Christian. Am 12.09.2014 um 18:38 schrieb John Harrison: > On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian K=F6nig wrote: > > pass in a list of fences to wait for before beginning a command = > submission. > > The Android implementation has a mechanism for combining multiple sync = > points into a brand new single sync pt. Thus APIs only ever need to = > take in a single fd not a list of them. If the user wants an operation = > to wait for multiple events to occur then it is up to them to request = > the combined version first. They can then happily close the individual = > fds that have been combined and only keep the one big one around. = > Indeed, even that fd can be closed once it has been passed on to some = > other API. > > Doing such combining and cleaning up fds as soon as they have been = > passed on should keep each application's fd usage fairly small. > > > On 12/09/2014 17:08, Christian K=F6nig wrote: >>> As Daniel said using fd is most likely the way we want to do it but = >>> this >>> remains vague. >> Separating the discussion if it should be an fd or not. Using an fd = >> sounds fine to me in general, but I have some concerns as well. >> >> For example what was the maximum number of opened FDs per process = >> again? Could that become a problem? etc... >> >> Please comment, >> Christian. >> >> Am 12.09.2014 um 18:03 schrieb Jerome Glisse: >>> On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian K=F6nig wrote: >>>> Am 12.09.2014 um 17:48 schrieb Jerome Glisse: >>>>> On Fri, Sep 12, 2014 at 05:42:57PM +0200, Christian K=F6nig wrote: >>>>>> Am 12.09.2014 um 17:33 schrieb Jerome Glisse: >>>>>>> On Fri, Sep 12, 2014 at 11:25:12AM -0400, Alex Deucher wrote: >>>>>>>> On Fri, Sep 12, 2014 at 10:50 AM, Jerome Glisse = >>>>>>>> wrote: >>>>>>>>> On Fri, Sep 12, 2014 at 04:43:44PM +0200, Daniel Vetter wrote: >>>>>>>>>> On Fri, Sep 12, 2014 at 4:09 PM, Daniel Vetter = >>>>>>>>>> wrote: >>>>>>>>>>> On Fri, Sep 12, 2014 at 03:23:22PM +0200, Christian K=F6nig = >>>>>>>>>>> wrote: >>>>>>>>>>>> Hello everyone, >>>>>>>>>>>> >>>>>>>>>>>> to allow concurrent buffer access by different engines = >>>>>>>>>>>> beyond the multiple >>>>>>>>>>>> readers/single writer model that we currently use in radeon = >>>>>>>>>>>> and other >>>>>>>>>>>> drivers we need some kind of synchonization object exposed = >>>>>>>>>>>> to userspace. >>>>>>>>>>>> >>>>>>>>>>>> My initial patch set for this used (or rather abused) zero = >>>>>>>>>>>> sized GEM buffers >>>>>>>>>>>> as fence handles. This is obviously isn't the best way of = >>>>>>>>>>>> doing this (to >>>>>>>>>>>> much overhead, rather ugly etc...), Jerome commented on = >>>>>>>>>>>> this accordingly. >>>>>>>>>>>> >>>>>>>>>>>> So what should a driver expose instead? Android sync = >>>>>>>>>>>> points? Something else? >>>>>>>>>>> I think actually exposing the struct fence objects as a fd, = >>>>>>>>>>> using android >>>>>>>>>>> syncpts (or at least something compatible to it) is the way = >>>>>>>>>>> to go. Problem >>>>>>>>>>> is that it's super-hard to get the android guys out of = >>>>>>>>>>> hiding for this :( >>>>>>>>>>> >>>>>>>>>>> Adding a bunch of people in the hopes that something sticks. >>>>>>>>>> More people. >>>>>>>>> Just to re-iterate, exposing such thing while still using = >>>>>>>>> command stream >>>>>>>>> ioctl that use implicit synchronization is a waste and you can = >>>>>>>>> only get >>>>>>>>> the lowest common denominator which is implicit = >>>>>>>>> synchronization. So i do >>>>>>>>> not see the point of such api if you are not also adding a new = >>>>>>>>> cs ioctl >>>>>>>>> with explicit contract that it does not do any kind of = >>>>>>>>> synchronization >>>>>>>>> (it could be almost the exact same code modulo the do not wait = >>>>>>>>> for >>>>>>>>> previous cmd to complete). >>>>>>>> Our thinking was to allow explicit sync from a single process, but >>>>>>>> implicitly sync between processes. >>>>>>> This is a BIG NAK if you are using the same ioctl as it would = >>>>>>> mean you are >>>>>>> changing userspace API, well at least userspace expectation. = >>>>>>> Adding a new >>>>>>> cs flag might do the trick but it should not be about = >>>>>>> inter-process, or any >>>>>>> thing special, it's just implicit sync or no synchronization. = >>>>>>> Converting >>>>>>> userspace is not that much of a big deal either, it can be = >>>>>>> broken into >>>>>>> several step. Like mesa use explicit synchronization all time = >>>>>>> but ddx use >>>>>>> implicit. >>>>>> The thinking here is that we need to be backward compatible for = >>>>>> DRI2/3 and >>>>>> support all kind of different use cases like old DDX and new = >>>>>> Mesa, or old >>>>>> Mesa and new DDX etc... >>>>>> >>>>>> So for my prototype if the kernel sees any access of a BO from = >>>>>> two different >>>>>> clients it falls back to the old behavior of implicit = >>>>>> synchronization of >>>>>> access to the same buffer object. That might not be the fastest = >>>>>> approach, >>>>>> but is as far as I can see conservative and so should work under all >>>>>> conditions. >>>>>> >>>>>> Apart from that the planning so far was that we just hide this = >>>>>> feature >>>>>> behind a couple of command submission flags and new chunks. >>>>> Just to reproduce IRC discussion, i think it's a lot simpler and = >>>>> not that >>>>> complex. For explicit cs ioctl you do not wait for any previous = >>>>> fence of >>>>> any of the buffer referenced in the cs ioctl, but you still = >>>>> associate a >>>>> new fence with all the buffer object referenced in the cs ioctl. = >>>>> So if the >>>>> next ioctl is an implicit sync ioctl it will wait properly and = >>>>> synchronize >>>>> properly with previous explicit cs ioctl. Hence you can easily = >>>>> have a mix >>>>> in userspace thing is you only get benefit once enough of your = >>>>> userspace >>>>> is using explicit. >>>> Yes, that's exactly what my patches currently implement. >>>> >>>> The only difference is that by current planning I implemented it as = >>>> a per BO >>>> flag for the command submission, but that was just for testing. = >>>> Having a >>>> single flag to switch between implicit and explicit synchronization = >>>> for >>>> whole CS IOCTL would do equally well. >>> Doing it per BO sounds bogus to me. But otherwise yes we are in = >>> agreement. >>> As Daniel said using fd is most likely the way we want to do it but = >>> this >>> remains vague. >>> >>>>> Note that you still need a way to have explicit cs ioctl to wait on a >>>>> previos "explicit" fence so you need some api to expose fence per cs >>>>> submission. >>>> Exactly, that's what this mail thread is all about. >>>> >>>> As Daniel correctly noted you need something like a functionality = >>>> to get a >>>> fence as the result of a command submission as well as pass in a = >>>> list of >>>> fences to wait for before beginning a command submission. >>>> >>>> At least it looks like we are all on the same general line here, = >>>> its just >>>> nobody has a good idea how the details should look like. >>>> >>>> Regards, >>>> Christian. >>>> >>>>> Cheers, >>>>> J=E9r=F4me >>>>> >>>>>> Regards, >>>>>> Christian. >>>>>> >>>>>>> Cheers, >>>>>>> J=E9r=F4me >>>>>>> >>>>>>>> Alex >>>>>>>> >>>>>>>>> Also one thing that the Android sync point does not have, = >>>>>>>>> AFAICT, is a >>>>>>>>> way to schedule synchronization as part of a cs ioctl so cpu = >>>>>>>>> never have >>>>>>>>> to be involve for cmd stream that deal only one gpu (assuming = >>>>>>>>> the driver >>>>>>>>> and hw can do such trick). >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> J=E9r=F4me >>>>>>>>> >>>>>>>>>> -Daniel >>>>>>>>>> -- = >>>>>>>>>> Daniel Vetter >>>>>>>>>> Software Engineer, Intel Corporation >>>>>>>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch >>>>>>>>> _______________________________________________ >>>>>>>>> dri-devel mailing list >>>>>>>>> dri-devel@lists.freedesktop.org >>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel >>