From mboxrd@z Thu Jan 1 00:00:00 1970
Return-path:
Received: from mga09.intel.com ([134.134.136.24]:1288 "EHLO mga09.intel.com"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S1753567AbcC1TmU (ORCPT ); Mon, 28 Mar 2016 15:42:20 -0400
Subject: Re: [PATCH] dma-buf: Update docs for SYNC ioctl
To: Chris Wilson , David Herrmann , Daniel Vetter , Sumit Semwal ,
    Daniel Vetter , DRI Development , Stéphane Marchesin , Daniel Vetter ,
    linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
    Intel Graphics Development , devel@driverdev.osuosl.org, Hans Verkuil
References: <1458546705-3564-1-git-send-email-daniel.vetter@ffwll.ch>
    <20160321171405.GP28483@phenom.ffwll.local>
    <20160323115659.GF21717@nuc-i3427.alporthouse.com>
    <20160323154223.GJ21717@nuc-i3427.alporthouse.com>
From: Tiago Vignatti
Message-ID: <56F98915.1030200@intel.com>
Date: Mon, 28 Mar 2016 16:42:13 -0300
MIME-Version: 1.0
In-Reply-To: <20160323154223.GJ21717@nuc-i3427.alporthouse.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-media-owner@vger.kernel.org
List-ID:

On 03/23/2016 12:42 PM, Chris Wilson wrote:
> On Wed, Mar 23, 2016 at 04:32:59PM +0100, David Herrmann wrote:
>> Hi
>>
>> On Wed, Mar 23, 2016 at 12:56 PM, Chris Wilson wrote:
>>> On Wed, Mar 23, 2016 at 12:30:42PM +0100, David Herrmann wrote:
>>>> My question was rather about why we do this? Semantics for EINTR are
>>>> well defined, and with SA_RESTART (default on linux) user-space can
>>>> ignore it. However, looping on EAGAIN is very uncommon, and it is not
>>>> at all clear why it is needed?
>>>>
>>>> Returning an error to user-space makes sense if user-space has a
>>>> reason to react to it. I fail to see how EAGAIN on a cache-flush/sync
>>>> operation helps user-space at all? As someone without insight into the
>>>> driver implementation, it is hard to tell why.. Any hints?
>>>
>>> The reason we return EAGAIN is to workaround a deadlock we face when
>>> blocking on the GPU holding the struct_mutex (inside the client's
>>> process), but the GPU is dead. As our locking is very, very coarse we
>>> cannot restart the GPU without acquiring the struct_mutex being held by
>>> the client so we wake the client up and tell them the resource they are
>>> waiting on (the flush of the object from the GPU into the CPU domain) is
>>> temporarily unavailable. If they try to immediately wait upon the ioctl
>>> again, they are blocked waiting for the reset to occur before they may
>>> complete their flush. There are a few other possible deadlocks that are
>>> also avoided with EAGAIN (again, the issue is more or less the lack of
>>> fine grained locking).
>>
>> ...so you hijacked EAGAIN for all DRM ioctls just for a driver
>> workaround?
>
> No, we utilized the fact that EAGAIN was already enshrined by libdrm as
> the defacto mechanism for repeating the ioctl in order to repeat the
> ioctl for a driver workaround.

Do we have an agreement here after all? David? I need to know whether
this fixup is okay to go in, because I'll need to submit it to Chrome OS
afterwards.

Best Regards,

Tiago
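
The retry convention discussed above, where user-space reissues the ioctl while it fails with EINTR or EAGAIN, can be sketched roughly as below. This is only a minimal sketch modeled on the loop in libdrm's drmIoctl(); the name ioctl_retry is hypothetical and is not part of libdrm or the kernel.

```c
#include <errno.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper (not a libdrm or kernel API): reissue an ioctl
 * for as long as it fails with EINTR or EAGAIN, which is the same
 * retry convention libdrm's drmIoctl() follows.
 */
static int ioctl_retry(int fd, unsigned long request, void *arg)
{
	int ret;

	do {
		ret = ioctl(fd, request, arg);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	/* Any other failure is returned to the caller with errno set. */
	return ret;
}
```

For the dma-buf case, a caller would presumably pass DMA_BUF_IOCTL_SYNC and a pointer to a filled-in struct dma_buf_sync from <linux/dma-buf.h>; the file descriptor and the flags chosen are up to the caller and are not shown here.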