From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932303AbcASQMe (ORCPT ); Tue, 19 Jan 2016 11:12:34 -0500
Received: from mga09.intel.com ([134.134.136.24]:39911 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932087AbcASQMY (ORCPT ); Tue, 19 Jan 2016 11:12:24 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.22,317,1449561600"; d="scan'208";a="884796263"
Message-ID: <569E6062.6030309@Intel.com>
Date: Tue, 19 Jan 2016 16:12:18 +0000
From: John Harrison
Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: Gustavo Padovan, Greg Kroah-Hartman, linux-kernel@vger.kernel.org, devel@driverdev.osuosl.org, dri-devel@lists.freedesktop.org, daniels@collabora.com, Arve Hjønnevåg, Riley Andrews, Rob Clark, Greg Hackmann, Maarten Lankhorst, Gustavo Padovan
Subject: Re: [RFC 00/29] De-stage android's sync framework
References: <1452869739-3304-1-git-send-email-gustavo@padovan.org> <20160119110017.GZ19130@phenom.ffwll.local> <20160119152309.GA8217@joana>
In-Reply-To: <20160119152309.GA8217@joana>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 19/01/2016 15:23, Gustavo Padovan wrote:
> Hi Daniel,
>
> 2016-01-19 Daniel Vetter:
>
>> On Fri, Jan 15, 2016 at 12:55:10PM -0200, Gustavo Padovan wrote:
>>> From: Gustavo Padovan
>>>
>>> This patch series de-stages the sync framework, and in order to accomplish that
>>> a bunch of cleanups/improvements on the sync and fence were made.
>>>
>>> The sync framework contained some abstractions around struct fence and those
>>> were removed in the de-staging process among other changes:
>>>
>>> Userspace visible changes
>>> -------------------------
>>>
>>> * The sw_sync file was moved from /dev/sw_sync to /sync/sw_sync. No
>>>   other change.
>>>
>>> Kernel API changes
>>> ------------------
>>>
>>> * struct sync_timeline is now struct fence_timeline
>>> * sync_timeline_ops is now fence_timeline_ops and they now carry struct
>>>   fence as parameter instead of struct sync_pt
>>> * a .cleanup() fence op was added to allow sync_fence to run a cleanup when
>>>   the fence_timeline is destroyed
>>> * added fence_add_used_data() to pass a private pointer to struct fence. This
>>>   pointer is sent back on the .cleanup op.
>>> * The sync timeline functions were moved to be fence_timeline functions:
>>>   - sync_timeline_create() -> fence_timeline_create()
>>>   - sync_timeline_get() -> fence_timeline_get()
>>>   - sync_timeline_put() -> fence_timeline_put()
>>>   - sync_timeline_destroy() -> fence_timeline_destroy()
>>>   - sync_timeline_signal() -> fence_timeline_signal()
>>>
>>> * sync_pt_create() was replaced by fence_create_on_timeline()
>>>
>>> Internal changes
>>> ----------------
>>>
>>> * fence_timeline_ops was removed in favor of direct use of fence_ops
>>> * fence default functions were created for fence_ops
>>> * removed structs sync_pt, sw_sync_timeline and sw_sync_pt
>> Bunch of fairly random comments all over:
>>
>> - include/uapi/linux/sw_sync.h imo should be dropped, it's just a private
>>   debugfs interface between fence fds and the testsuite. Since the plan is
>>   to have the testcases integrated into the kernel tree too we don't need
>>   a public header.
>>
>> - similar for include/linux/sw_sync.h Imo that should all be moved into
>>   sync_debug.c. Same for sw_sync.c, that should all land in sync_debug
>>   imo, and made optional with a Kconfig option. At least we should reuse
>>   CONFIG_DEBUGFS.
> These two items sound reasonable to me.

I have just posted our in-progress IGT for testing i915 syncs (with a CC
of Gustavo). It uses the sw_sync mechanisms. Can you take a quick look
and see if it is the kind of thing you would expect us to be doing? Or
is it using interfaces that you are planning to remove and/or make
kernel-only?

I'm not sure having a kernel-only test is the best way to go. Having
userland tests like IGT would be much more versatile.

>> - fence_context and fence_timeline are really the same. timeline has some
>>   super-basic support for doing sw-only fence timelines, but imo that's
>>   not really worth keeping (and if so better to keep separate in a
>>   sw-fence.c or similar, like seqno-fence.c). The other main thing
>>   timeline provides is support to clean up fences on a timeline. And imo
>>   that cleanup should be done by the core fence support, not by the add-on
>>   stuff.
> Yes, they are. But I currently don't know how to merge them best, so I
> decided to go for an RFC instead of trying some crazy solution touching
> all fence_context users.
>
>> Interlude about fence cleanup on driver unload:
>>
>> Working drivers imo should never call timeline_destroy when there's still
>> an unsignalled fence around for that timeline/context. That just means
>> they're broken and failed to clean up all the pending work. So the problem
>> really is only what to do with fences where the driver disappeared, and
>> for that we essentially need a fence_revoke() function (which could be
>> called internally from timeline_free). So here's what I think
>> timeline_free should do:
>>
>> for_each_fence_on_timel() {
>> 	WARN_ON(!fence_is_signalled());
>>
>> 	fence_revoke(fence);
>> }
>>
>> Implementing fence_revoke is a bit tricky since we need to make sure the
>> memory containing ->ops and similar stuff doesn't disappear.
>> Simplest option might be to grab a temporary reference (using
>> kref_get_unless_zero), and then exchange ->ops with one that has only a
>> release function. We don't need anything else as long as all fence_*
>> functions the kernel might call check for signalling correctly first
>> (fence_wait is broken at least).
>>
>> Or we just give up (for now) and declare module unload as slightly racy.
>> dma-buf is similar. An intermediate option might be to at least add a
>> THIS_MODULE reference to each fence (but that's a bit expensive ...).
> I'd say we just give up for now as we don't have any driver using
> timeline_destroy for now. So we could go for other improvements first.
>
>> - back to timeline vs. context: I have no idea how to best clean up this
>>   mess, but least painful option long-term is probably to switch over all
>>   current users of fence_context_alloc to timelines and remove the plain
>>   context interface.
> Agreed.
>
>> - Imo the interface in include/linux/sync.h is duplicating too much of
>>   fence.h. I think the only bits we need are the refcounting, creating,
>>   fd-install and that's it. Plus a macro to loop over all the fences in a
>>   sync_fence. With that drivers will only ever deal with a pile of
>>   struct fence, making implicit fencing (using the fence list in dma-buf)
>>   and explicit fencing (using the fence list in sync_fence) much more
>>   similar.
> Yes, most of the sync_fence waiting should not be exported. Drivers
> should only wait for fences imo, not sync_fences.
>
>> And we can easily do that since no internal users ;-)
>>
>> - get_timeline_name and get_driver_name are imo too much indirection, just
>>   add ->(drv_)name field to each of these.
>>
>> - struct sync_fence is a major confusion imo against struct fence. It
>>   made much more sense in the pure-android world where fence == sync_pt.
>>   Maybe we can rename sync_fence to sync_fence_fd (a bit long, and fd is a
>>   bit inaccurate), sync_file (like this best), fence_file (sounds silly
>>   imo), or something else?
> sync_file sounds good to me. fence_file feels like it is a file for a
> single fence but we may have many fences on one sync_file.
>
>> - I guess just not yet part of this rfc, but moving the testsuite and
>>   adding kerneldoc for this is planned I guess? If you feel like I think
>>   it'd be best. We pull the current dma-buf stuff into
>>   device-drivers.tmpl, but it's completely lacking overview docs and all
>>   that. And I'd like to duplicate at least the dma-buf/fence sections into
>>   the gpu.tmpl docbook.
> We have converted testsuite from android's libsync but we need to wait
> for Google to re-license it to send it upstream.
>
> kerneldoc is planned for sure, but I'd say it will be better to have
> some users first, DRM for example.
>
>> - If we make timelines first class objects I think we could move some of
>>   the fields from struct fence to struct fence_timeline. E.g. the ops
>>   struct. That also makes it clearer that some of the vfuncs really should
>>   be taking a struct fence_timeline *timeline instead of a struct fence
>>   *fence as their primary parameter.
> I'll keep that as a final goal and work on RFC v2 and see how far we can
> get.
>
> Gustavo
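
For reference, the fence_revoke() idea from the interlude above (take a
temporary reference, then swap ->ops for a table that only knows how to
release) could be sketched roughly as follows. This is a plain userspace
model, not kernel code: the struct layouts, the plain int refcount
standing in for struct kref, the counters, and the names revoked_ops,
driver_ops and fence_get_unless_zero() are all hypothetical. Marking a
revoked fence as signalled is also an assumption, made here so that
later callers checking the signalled state never reach the old ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct fence;

/* Hypothetical, cut-down ops table: only the release hook is modelled. */
struct fence_ops {
	const char *name;
	void (*release)(struct fence *f);
};

/* Hypothetical, cut-down fence; refcount stands in for struct kref. */
struct fence {
	int refcount;
	bool signalled;
	const struct fence_ops *ops;
};

static int driver_releases;   /* releases that ran driver-owned code */
static int revoked_releases;  /* releases that ran the core fallback */

static void driver_release(struct fence *f) { (void)f; driver_releases++; }
static void revoked_release(struct fence *f) { (void)f; revoked_releases++; }

static const struct fence_ops driver_ops = { "driver", driver_release };
/* Ops owned by the core, so they stay valid after the driver is gone. */
static const struct fence_ops revoked_ops = { "revoked", revoked_release };

/* Models kref_get_unless_zero(): refuse to resurrect a dying fence. */
static bool fence_get_unless_zero(struct fence *f)
{
	if (f->refcount == 0)
		return false;
	f->refcount++;
	return true;
}

static void fence_put(struct fence *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->ops->release(f);
}

/* Detach a fence from its driver; returns false if it was already dying. */
static bool fence_revoke(struct fence *f)
{
	if (!fence_get_unless_zero(f))
		return false;
	f->signalled = true;    /* assumption: revoked counts as signalled */
	f->ops = &revoked_ops;  /* only a release function remains */
	fence_put(f);           /* drop the temporary reference */
	return true;
}

/* Models timeline_free(); the return value stands in for WARN_ON hits. */
static int timeline_free(struct fence **fences, size_t n)
{
	int warnings = 0;

	for (size_t i = 0; i < n; i++) {
		if (!fences[i]->signalled) {
			fprintf(stderr, "WARN: unsignalled fence at teardown\n");
			warnings++;
		}
		fence_revoke(fences[i]);
	}
	return warnings;
}
```

A real implementation would need atomic refcounting and some ordering
guarantee around the ->ops exchange, which this model ignores entirely.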