From mboxrd@z Thu Jan  1 00:00:00 1970
From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Subject: Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
Date: Thu, 12 Sep 2013 17:45:17 +0200
Message-ID: <5231E18D.7070306@canonical.com>
References: <20130912150645.GZ31370@twins.programming.kicks-ass.net>
	<CAKMK7uFF2GkYapa2_DLv4WYAd0t+POaT+Vy9HG=HQhzMmtH9nA@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org>
In-Reply-To: <CAKMK7uFF2GkYapa2_DLv4WYAd0t+POaT+Vy9HG=HQhzMmtH9nA@mail.gmail.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
To: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Thomas Hellstrom <thellstrom@vmware.com>, Peter Zijlstra <peterz@infradead.org>, intel-gfx <intel-gfx@lists.freedesktop.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, dri-devel <dri-devel@lists.freedesktop.org>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@kernel.org>
List-Id: intel-gfx@lists.freedesktop.org

Op 12-09-13 17:36, Daniel Vetter schreef:
> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> So I'm poking around the preemption code and stumbled upon:
>>
>> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>>
>> All these sites basically do:
>>
>>   while (!trylock())
>>         yield();
>>
>> which is a horrible and broken locking pattern.
>>
>> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
>> task that preempted the lock holder at FIFOn.
>>
>> Secondly the implementation is worse than usual by abusing
>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>> doesn't retry, but you're using it as a get out of fault path. And
>> you're using set_need_resched() which is not something a driver should
>> _ever_ touch.
>>
>> Now I'm going to take away set_need_resched() -- and while you can
>> 'reimplement' it using set_thread_flag() you're not going to do that
>> because it will be broken due to changes to the preempt code.
>>
>> So please as to fix ASAP and don't allow anybody to trick you into
>> merging silly things like that again ;-)
> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
> removed. It was there to give the error handler a chance to sneak in
> and reset the hw/sw tracking when the gpu is dead. That hack goes back
> to the days when the locking around our error handler was somewhere
> between nonexistent and totally broken, nowadays we keep things from
> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
> whip up a patch to rip this out. I'll also check that our testsuite
> properly exercises this path (needs a bit of work on a quick look for
> better coverage).
>
> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
> into it's own pagefault handler and then deadlock, the trylock just
> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
> fun userspace did and now have testcases for them. The right solution
> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
> holds locks and have slowpaths which drops locks, copies stuff into a
> temp allocation and then continues. At least that's how we've fixed
> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
Ah the case where a mmap'd address is passed to the execbuf ioctl? :P

Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers..
> The one in udl just looks like copypasta from i915, without any
> justification (at least I don't see any) for why it's there. Probably
> can die, too, since there isn't any gpu to reset on usb display-link
> devices ...
>
> Cheers, Daniel