From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jani Nikula Subject: Re: [PATCH] drm/i915: Prevent recursive deadlock on releasing a busy userptr Date: Tue, 02 Sep 2014 15:48:42 +0300 Message-ID: <8738cai3x1.fsf@intel.com> References: <1407417640-8493-1-git-send-email-chris@chris-wilson.co.uk> <54047345.1020201@linux.intel.com> <20140902093500.GX15520@phenom.ffwll.local> <20140902093853.GB25238@nuc-i3427.alporthouse.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by gabe.freedesktop.org (Postfix) with ESMTP id 1D7C96E46B for ; Tue, 2 Sep 2014 05:48:45 -0700 (PDT) In-Reply-To: <20140902093853.GB25238@nuc-i3427.alporthouse.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: Chris Wilson , Daniel Vetter Cc: intel-gfx@lists.freedesktop.org List-Id: intel-gfx@lists.freedesktop.org On Tue, 02 Sep 2014, Chris Wilson wrote: > On Tue, Sep 02, 2014 at 11:35:00AM +0200, Daniel Vetter wrote: >> On Mon, Sep 01, 2014 at 02:23:17PM +0100, Tvrtko Ursulin wrote: >> > >> > On 08/07/2014 02:20 PM, Chris Wilson wrote: >> > >During release of the GEM object we hold the struct_mutex. As the >> > >object may be holding onto the last reference for the task->mm, >> > >calling mmput() may trigger exit_mmap() which close the vma >> > >which will call drm_gem_vm_close() and attempt to reacquire >> > >the struct_mutex. In order to avoid that recursion, we have >> > >to defer the mmput() until after we drop the struct_mutex, >> > >i.e. we need to schedule a worker to do the clean up. A further issue >> > >spotted by Tvrtko was caused when we took a GTT mmapping of a userptr >> > >buffer object. In that case, we would never call mmput as the object >> > >would be cyclically referenced by the GTT mmapping and not freed upon >> > >process exit - keeping the entire process mm alive after the process >> > >task was reaped. The fix employed is to replace the mm_users/mmput() >> > >reference handling to mm_count/mmdrop() for the shared i915_mm_struct. >> > > >> > > INFO: task test_surfaces:1632 blocked for more than 120 seconds. >> > > Tainted: GF O 3.14.5+ #1 >> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> > > test_surfaces D 0000000000000000 0 1632 1590 0x00000082 >> > > ffff88014914baa8 0000000000000046 0000000000000000 ffff88014914a010 >> > > 0000000000012c40 0000000000012c40 ffff8800a0058210 ffff88014784b010 >> > > ffff88014914a010 ffff880037b1c820 ffff8800a0058210 ffff880037b1c824 >> > > Call Trace: >> > > [] schedule+0x29/0x70 >> > > [] schedule_preempt_disabled+0xe/0x10 >> > > [] __mutex_lock_slowpath+0x183/0x220 >> > > [] mutex_lock+0x23/0x40 >> > > [] drm_gem_vm_close+0x33/0x70 [drm] >> > > [] remove_vma+0x33/0x70 >> > > [] exit_mmap+0x11c/0x170 >> > > [] mmput+0x6b/0x100 >> > > [] i915_gem_userptr_release+0x89/0xc0 [i915] >> > > [] i915_gem_free_object+0x126/0x250 [i915] >> > > [] drm_gem_object_free+0x2a/0x40 [drm] >> > > [] drm_gem_object_handle_unreference_unlocked+0xe2/0x120 [drm] >> > > [] drm_gem_object_release_handle+0x64/0x90 [drm] >> > > [] idr_for_each+0xab/0x100 >> > > [] ? drm_gem_object_handle_unreference_unlocked+0x120/0x120 [drm] >> > > [] ? mutex_lock+0x16/0x40 >> > > [] drm_gem_release+0x24/0x40 [drm] >> > > [] drm_release+0x3fb/0x480 [drm] >> > > [] __fput+0xb2/0x260 >> > > [] ____fput+0xe/0x10 >> > > [] task_work_run+0x8f/0xf0 >> > > [] do_exit+0x1a8/0x480 >> > > [] do_group_exit+0x51/0xc0 >> > > [] SyS_exit_group+0x17/0x20 >> > > [] system_call_fastpath+0x16/0x1b >> > > >> > >v2: Incorporate feedback from Tvrtko and remove the unnessary mm >> > >referencing when creating the i915_mm_struct and improve some of the >> > >function names and comments. >> > >> > [snip] >> > >> > I reviewed this some weeks back and did not spot any issues. >> > >> > Reviewed-by: Tvrtko Ursulin >> Queued for -next, thanks for the patch. Aside: Is there other userptr >> stuff outstanding? I've definitely lost track of them :( > > The one I posted with the extra r-b and t-b has a minor tweak to kill > the unused error code during init. but otherwise this is the last > outstanding patch. This was referenced from [1], do we need a stable backport? BR, Jani. [1] https://bugs.freedesktop.org/show_bug.cgi?id=80745 > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Jani Nikula, Intel Open Source Technology Center