From mboxrd@z Thu Jan 1 00:00:00 1970 From: John Harrison Subject: Re: Question on UAPI for fences Date: Fri, 12 Sep 2014 17:38:24 +0100 Message-ID: <54132180.5050905@Intel.com> References: <5412F3CA.9060306@amd.com> <20140912145048.GA4139@gmail.com> <20140912153346.GB4139@gmail.com> <54131481.4040905@amd.com> <20140912154831.GC4139@gmail.com> <54131811.4050509@amd.com> <20140912160349.GD4139@gmail.com> <54131A77.3030003@amd.com> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252"; Format="flowed" Content-Transfer-Encoding: quoted-printable Return-path: Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by gabe.freedesktop.org (Postfix) with ESMTP id C34FD6E682 for ; Fri, 12 Sep 2014 09:43:05 -0700 (PDT) In-Reply-To: <54131A77.3030003@amd.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: =?windows-1252?Q?Christian_K=F6nig?= , Jerome Glisse Cc: Maarten Lankhorst , Zach Pfeffer , "dri-devel@lists.freedesktop.org" , "linaro-mm-sig@lists.linaro.org" , gpudriverdevsupport@amd.com List-Id: dri-devel@lists.freedesktop.org On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian K=F6nig wrote: > pass in a list of fences to wait for before beginning a command = submission. The Android implementation has a mechanism for combining multiple sync = points into a brand new single sync pt. Thus APIs only ever need to take = in a single fd not a list of them. If the user wants an operation to = wait for multiple events to occur then it is up to them to request the = combined version first. They can then happily close the individual fds = that have been combined and only keep the one big one around. Indeed, = even that fd can be closed once it has been passed on to some other API. Doing such combining and cleaning up fds as soon as they have been = passed on should keep each application's fd usage fairly small. On 12/09/2014 17:08, Christian K=F6nig wrote: >> As Daniel said using fd is most likely the way we want to do it but this >> remains vague. > Separating the discussion if it should be an fd or not. Using an fd = > sounds fine to me in general, but I have some concerns as well. > > For example what was the maximum number of opened FDs per process = > again? Could that become a problem? etc... > > Please comment, > Christian. > > Am 12.09.2014 um 18:03 schrieb Jerome Glisse: >> On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian K=F6nig wrote: >>> Am 12.09.2014 um 17:48 schrieb Jerome Glisse: >>>> On Fri, Sep 12, 2014 at 05:42:57PM +0200, Christian K=F6nig wrote: >>>>> Am 12.09.2014 um 17:33 schrieb Jerome Glisse: >>>>>> On Fri, Sep 12, 2014 at 11:25:12AM -0400, Alex Deucher wrote: >>>>>>> On Fri, Sep 12, 2014 at 10:50 AM, Jerome Glisse = >>>>>>> wrote: >>>>>>>> On Fri, Sep 12, 2014 at 04:43:44PM +0200, Daniel Vetter wrote: >>>>>>>>> On Fri, Sep 12, 2014 at 4:09 PM, Daniel Vetter = >>>>>>>>> wrote: >>>>>>>>>> On Fri, Sep 12, 2014 at 03:23:22PM +0200, Christian K=F6nig wrot= e: >>>>>>>>>>> Hello everyone, >>>>>>>>>>> >>>>>>>>>>> to allow concurrent buffer access by different engines = >>>>>>>>>>> beyond the multiple >>>>>>>>>>> readers/single writer model that we currently use in radeon = >>>>>>>>>>> and other >>>>>>>>>>> drivers we need some kind of synchonization object exposed = >>>>>>>>>>> to userspace. >>>>>>>>>>> >>>>>>>>>>> My initial patch set for this used (or rather abused) zero = >>>>>>>>>>> sized GEM buffers >>>>>>>>>>> as fence handles. This is obviously isn't the best way of = >>>>>>>>>>> doing this (to >>>>>>>>>>> much overhead, rather ugly etc...), Jerome commented on this = >>>>>>>>>>> accordingly. >>>>>>>>>>> >>>>>>>>>>> So what should a driver expose instead? Android sync points? = >>>>>>>>>>> Something else? >>>>>>>>>> I think actually exposing the struct fence objects as a fd, = >>>>>>>>>> using android >>>>>>>>>> syncpts (or at least something compatible to it) is the way = >>>>>>>>>> to go. Problem >>>>>>>>>> is that it's super-hard to get the android guys out of hiding = >>>>>>>>>> for this :( >>>>>>>>>> >>>>>>>>>> Adding a bunch of people in the hopes that something sticks. >>>>>>>>> More people. >>>>>>>> Just to re-iterate, exposing such thing while still using = >>>>>>>> command stream >>>>>>>> ioctl that use implicit synchronization is a waste and you can = >>>>>>>> only get >>>>>>>> the lowest common denominator which is implicit = >>>>>>>> synchronization. So i do >>>>>>>> not see the point of such api if you are not also adding a new = >>>>>>>> cs ioctl >>>>>>>> with explicit contract that it does not do any kind of = >>>>>>>> synchronization >>>>>>>> (it could be almost the exact same code modulo the do not wait for >>>>>>>> previous cmd to complete). >>>>>>> Our thinking was to allow explicit sync from a single process, but >>>>>>> implicitly sync between processes. >>>>>> This is a BIG NAK if you are using the same ioctl as it would = >>>>>> mean you are >>>>>> changing userspace API, well at least userspace expectation. = >>>>>> Adding a new >>>>>> cs flag might do the trick but it should not be about = >>>>>> inter-process, or any >>>>>> thing special, it's just implicit sync or no synchronization. = >>>>>> Converting >>>>>> userspace is not that much of a big deal either, it can be broken = >>>>>> into >>>>>> several step. Like mesa use explicit synchronization all time but = >>>>>> ddx use >>>>>> implicit. >>>>> The thinking here is that we need to be backward compatible for = >>>>> DRI2/3 and >>>>> support all kind of different use cases like old DDX and new Mesa, = >>>>> or old >>>>> Mesa and new DDX etc... >>>>> >>>>> So for my prototype if the kernel sees any access of a BO from two = >>>>> different >>>>> clients it falls back to the old behavior of implicit = >>>>> synchronization of >>>>> access to the same buffer object. That might not be the fastest = >>>>> approach, >>>>> but is as far as I can see conservative and so should work under all >>>>> conditions. >>>>> >>>>> Apart from that the planning so far was that we just hide this = >>>>> feature >>>>> behind a couple of command submission flags and new chunks. >>>> Just to reproduce IRC discussion, i think it's a lot simpler and = >>>> not that >>>> complex. For explicit cs ioctl you do not wait for any previous = >>>> fence of >>>> any of the buffer referenced in the cs ioctl, but you still = >>>> associate a >>>> new fence with all the buffer object referenced in the cs ioctl. So = >>>> if the >>>> next ioctl is an implicit sync ioctl it will wait properly and = >>>> synchronize >>>> properly with previous explicit cs ioctl. Hence you can easily have = >>>> a mix >>>> in userspace thing is you only get benefit once enough of your = >>>> userspace >>>> is using explicit. >>> Yes, that's exactly what my patches currently implement. >>> >>> The only difference is that by current planning I implemented it as = >>> a per BO >>> flag for the command submission, but that was just for testing. = >>> Having a >>> single flag to switch between implicit and explicit synchronization for >>> whole CS IOCTL would do equally well. >> Doing it per BO sounds bogus to me. But otherwise yes we are in = >> agreement. >> As Daniel said using fd is most likely the way we want to do it but this >> remains vague. >> >>>> Note that you still need a way to have explicit cs ioctl to wait on a >>>> previos "explicit" fence so you need some api to expose fence per cs >>>> submission. >>> Exactly, that's what this mail thread is all about. >>> >>> As Daniel correctly noted you need something like a functionality to = >>> get a >>> fence as the result of a command submission as well as pass in a = >>> list of >>> fences to wait for before beginning a command submission. >>> >>> At least it looks like we are all on the same general line here, its = >>> just >>> nobody has a good idea how the details should look like. >>> >>> Regards, >>> Christian. >>> >>>> Cheers, >>>> J=E9r=F4me >>>> >>>>> Regards, >>>>> Christian. >>>>> >>>>>> Cheers, >>>>>> J=E9r=F4me >>>>>> >>>>>>> Alex >>>>>>> >>>>>>>> Also one thing that the Android sync point does not have, = >>>>>>>> AFAICT, is a >>>>>>>> way to schedule synchronization as part of a cs ioctl so cpu = >>>>>>>> never have >>>>>>>> to be involve for cmd stream that deal only one gpu (assuming = >>>>>>>> the driver >>>>>>>> and hw can do such trick). >>>>>>>> >>>>>>>> Cheers, >>>>>>>> J=E9r=F4me >>>>>>>> >>>>>>>>> -Daniel >>>>>>>>> -- = >>>>>>>>> Daniel Vetter >>>>>>>>> Software Engineer, Intel Corporation >>>>>>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch >>>>>>>> _______________________________________________ >>>>>>>> dri-devel mailing list >>>>>>>> dri-devel@lists.freedesktop.org >>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel >