From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?windows-1252?Q?Christian_K=F6nig?= Subject: Re: Question on UAPI for fences Date: Fri, 12 Sep 2014 18:08:23 +0200 Message-ID: <54131A77.3030003@amd.com> References: <5412F3CA.9060306@amd.com> <20140912145048.GA4139@gmail.com> <20140912153346.GB4139@gmail.com> <54131481.4040905@amd.com> <20140912154831.GC4139@gmail.com> <54131811.4050509@amd.com> <20140912160349.GD4139@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252"; Format="flowed" Content-Transfer-Encoding: quoted-printable Return-path: Received: from na01-bl2-obe.outbound.protection.outlook.com (mail-bl2on0105.outbound.protection.outlook.com [65.55.169.105]) by gabe.freedesktop.org (Postfix) with ESMTP id 550EE6E1ED for ; Fri, 12 Sep 2014 09:08:33 -0700 (PDT) In-Reply-To: <20140912160349.GD4139@gmail.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: Jerome Glisse Cc: Maarten Lankhorst , Zach Pfeffer , "dri-devel@lists.freedesktop.org" , "linaro-mm-sig@lists.linaro.org" , gpudriverdevsupport@amd.com, John Harrison List-Id: dri-devel@lists.freedesktop.org > As Daniel said using fd is most likely the way we want to do it but this > remains vague. Separating the discussion if it should be an fd or not. Using an fd = sounds fine to me in general, but I have some concerns as well. For example what was the maximum number of opened FDs per process again? = Could that become a problem? etc... Please comment, Christian. Am 12.09.2014 um 18:03 schrieb Jerome Glisse: > On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian K=F6nig wrote: >> Am 12.09.2014 um 17:48 schrieb Jerome Glisse: >>> On Fri, Sep 12, 2014 at 05:42:57PM +0200, Christian K=F6nig wrote: >>>> Am 12.09.2014 um 17:33 schrieb Jerome Glisse: >>>>> On Fri, Sep 12, 2014 at 11:25:12AM -0400, Alex Deucher wrote: >>>>>> On Fri, Sep 12, 2014 at 10:50 AM, Jerome Glisse = wrote: >>>>>>> On Fri, Sep 12, 2014 at 04:43:44PM +0200, Daniel Vetter wrote: >>>>>>>> On Fri, Sep 12, 2014 at 4:09 PM, Daniel Vetter w= rote: >>>>>>>>> On Fri, Sep 12, 2014 at 03:23:22PM +0200, Christian K=F6nig wrote: >>>>>>>>>> Hello everyone, >>>>>>>>>> >>>>>>>>>> to allow concurrent buffer access by different engines beyond th= e multiple >>>>>>>>>> readers/single writer model that we currently use in radeon and = other >>>>>>>>>> drivers we need some kind of synchonization object exposed to us= erspace. >>>>>>>>>> >>>>>>>>>> My initial patch set for this used (or rather abused) zero sized= GEM buffers >>>>>>>>>> as fence handles. This is obviously isn't the best way of doing = this (to >>>>>>>>>> much overhead, rather ugly etc...), Jerome commented on this acc= ordingly. >>>>>>>>>> >>>>>>>>>> So what should a driver expose instead? Android sync points? Som= ething else? >>>>>>>>> I think actually exposing the struct fence objects as a fd, using= android >>>>>>>>> syncpts (or at least something compatible to it) is the way to go= . Problem >>>>>>>>> is that it's super-hard to get the android guys out of hiding for= this :( >>>>>>>>> >>>>>>>>> Adding a bunch of people in the hopes that something sticks. >>>>>>>> More people. >>>>>>> Just to re-iterate, exposing such thing while still using command s= tream >>>>>>> ioctl that use implicit synchronization is a waste and you can only= get >>>>>>> the lowest common denominator which is implicit synchronization. So= i do >>>>>>> not see the point of such api if you are not also adding a new cs i= octl >>>>>>> with explicit contract that it does not do any kind of synchronizat= ion >>>>>>> (it could be almost the exact same code modulo the do not wait for >>>>>>> previous cmd to complete). >>>>>> Our thinking was to allow explicit sync from a single process, but >>>>>> implicitly sync between processes. >>>>> This is a BIG NAK if you are using the same ioctl as it would mean yo= u are >>>>> changing userspace API, well at least userspace expectation. Adding a= new >>>>> cs flag might do the trick but it should not be about inter-process, = or any >>>>> thing special, it's just implicit sync or no synchronization. Convert= ing >>>>> userspace is not that much of a big deal either, it can be broken into >>>>> several step. Like mesa use explicit synchronization all time but ddx= use >>>>> implicit. >>>> The thinking here is that we need to be backward compatible for DRI2/3= and >>>> support all kind of different use cases like old DDX and new Mesa, or = old >>>> Mesa and new DDX etc... >>>> >>>> So for my prototype if the kernel sees any access of a BO from two dif= ferent >>>> clients it falls back to the old behavior of implicit synchronization = of >>>> access to the same buffer object. That might not be the fastest approa= ch, >>>> but is as far as I can see conservative and so should work under all >>>> conditions. >>>> >>>> Apart from that the planning so far was that we just hide this feature >>>> behind a couple of command submission flags and new chunks. >>> Just to reproduce IRC discussion, i think it's a lot simpler and not th= at >>> complex. For explicit cs ioctl you do not wait for any previous fence of >>> any of the buffer referenced in the cs ioctl, but you still associate a >>> new fence with all the buffer object referenced in the cs ioctl. So if = the >>> next ioctl is an implicit sync ioctl it will wait properly and synchron= ize >>> properly with previous explicit cs ioctl. Hence you can easily have a m= ix >>> in userspace thing is you only get benefit once enough of your userspace >>> is using explicit. >> Yes, that's exactly what my patches currently implement. >> >> The only difference is that by current planning I implemented it as a pe= r BO >> flag for the command submission, but that was just for testing. Having a >> single flag to switch between implicit and explicit synchronization for >> whole CS IOCTL would do equally well. > Doing it per BO sounds bogus to me. But otherwise yes we are in agreement. > As Daniel said using fd is most likely the way we want to do it but this > remains vague. > >>> Note that you still need a way to have explicit cs ioctl to wait on a >>> previos "explicit" fence so you need some api to expose fence per cs >>> submission. >> Exactly, that's what this mail thread is all about. >> >> As Daniel correctly noted you need something like a functionality to get= a >> fence as the result of a command submission as well as pass in a list of >> fences to wait for before beginning a command submission. >> >> At least it looks like we are all on the same general line here, its just >> nobody has a good idea how the details should look like. >> >> Regards, >> Christian. >> >>> Cheers, >>> J=E9r=F4me >>> >>>> Regards, >>>> Christian. >>>> >>>>> Cheers, >>>>> J=E9r=F4me >>>>> >>>>>> Alex >>>>>> >>>>>>> Also one thing that the Android sync point does not have, AFAICT, i= s a >>>>>>> way to schedule synchronization as part of a cs ioctl so cpu never = have >>>>>>> to be involve for cmd stream that deal only one gpu (assuming the d= river >>>>>>> and hw can do such trick). >>>>>>> >>>>>>> Cheers, >>>>>>> J=E9r=F4me >>>>>>> >>>>>>>> -Daniel >>>>>>>> -- >>>>>>>> Daniel Vetter >>>>>>>> Software Engineer, Intel Corporation >>>>>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch >>>>>>> _______________________________________________ >>>>>>> dri-devel mailing list >>>>>>> dri-devel@lists.freedesktop.org >>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel