From mboxrd@z Thu Jan 1 00:00:00 1970 From: Thomas Hellstrom Subject: Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE Date: Thu, 12 Sep 2013 22:48:15 +0200 Message-ID: <5232288F.4070904@vmware.com> References: <20130912150645.GZ31370@twins.programming.kicks-ass.net> <20130912154329.GB31370@twins.programming.kicks-ass.net> <20130912162210.GE31370@twins.programming.kicks-ass.net> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org To: Thomas Gleixner Cc: Peter Zijlstra , Daniel Vetter , intel-gfx , Linux Kernel Mailing List , dri-devel , Dave Airlie , Maarten Lankhorst , Ingo Molnar List-Id: intel-gfx@lists.freedesktop.org On 09/12/2013 10:39 PM, Thomas Gleixner wrote: > On Thu, 12 Sep 2013, Daniel Vetter wrote: > >> On Thu, Sep 12, 2013 at 10:20 PM, Thomas Gleixner wrote: >>>> I think for ttm drivers it's just execbuf being exploitable. But on >>>> drm/i915 we've >>>> had the same issue with the pwrite/pread ioctls, so a simple >>>> glBufferData(glMap) kind of recursion from gl clients blew the kernel >>>> to pieces ... >>> And the only answer you folks came up with is set_need_resched() and >>> yield()? Oh well.... >> The yield was for a different lifelock, and that one is also fixed by >> now. The fault handler deadlock was fixed in the usual "drop locks and >> jump into slowpath" fasion, at least in drm/i915. > So we can remove that whole yield/set_need_resched() mess completely ? > > Thanks, > > tglx No. The while(trylock) is there to address a potential locking inversion deadlock. If the trylock fails, the code returns to user-space which retries the fault. This code needs to stay until we can come up with either a way to drop the mmap_sem and sleep before returning to user-space, or a bunch of code is fixed with a different locking order. The set_need_resched() can (and should according to Peter) go. /Thomas