From mboxrd@z Thu Jan 1 00:00:00 1970 From: Thomas Hellstrom Subject: Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE Date: Thu, 12 Sep 2013 18:33:33 +0200 Message-ID: <5231ECDD.2050108@vmware.com> References: <20130912150645.GZ31370@twins.programming.kicks-ass.net> <20130912154329.GB31370@twins.programming.kicks-ass.net> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org To: Daniel Vetter Cc: Peter Zijlstra , intel-gfx , Linux Kernel Mailing List , dri-devel , Thomas Gleixner , Ingo Molnar List-Id: dri-devel@lists.freedesktop.org On 09/12/2013 05:58 PM, Daniel Vetter wrote: > On Thu, Sep 12, 2013 at 5:43 PM, Peter Zijlstra wrote: >>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse >>> into it's own pagefault handler and then deadlock, the trylock just >>> keeps lockdep quiet. Could you describe how it could recurse into it's own pagefault handler? IIRC the VM flags of the TTM VMAs makes get_user_pages() refrain from touching these VMAs, hence I don't think this code can deadlock, but admittedly it's far from the optimal solution. Never mind, more on the set_need_resched() below. >>> We've had that bug arise in drm/i915 due to some >>> fun userspace did and now have testcases for them. The right solution >>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it >>> holds locks and have slowpaths which drops locks, copies stuff into a >>> temp allocation and then continues. At least that's how we've fixed >>> all those inversions in i915-gem. I'm not volunteering to fix this ;-) >> Yikes.. so how common is it? If I simply rip the set_need_resched() out >> it will 'spin' on the fault a little longer until a 'natural' preemption >> point -- if such a thing is every going to happen. A typical case is if a process is throwing out a buffer from the GPU or system memory while another process pagefaults while writing to it. It's not a common situation, and it's by no means a fastpath situation. For correctness purposes, I think set_need_resched() can be safely removed. > It's a case of "our userspace doesn't do this", so as long as you're > not evil and frob the drm device nodes of ttm drivers directly the > deadlock will never happen. No idea how much contention actually > happens on e.g. shared buffer objects - in i915 we have just one lock > and so suffer quite a bit more from contention. So no idea how much > removing the yield would hurt. > -Daniel /Thomas