From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tvrtko Ursulin Subject: Re: [PATCH] drm/i915: Prevent recursive deadlock on releasing a busy userptr Date: Mon, 01 Sep 2014 14:23:17 +0100 Message-ID: <54047345.1020201@linux.intel.com> References: <1407417640-8493-1-git-send-email-chris@chris-wilson.co.uk> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by gabe.freedesktop.org (Postfix) with ESMTP id 6B3976E2B3 for ; Mon, 1 Sep 2014 06:23:20 -0700 (PDT) In-Reply-To: <1407417640-8493-1-git-send-email-chris@chris-wilson.co.uk> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: Chris Wilson , intel-gfx@lists.freedesktop.org List-Id: intel-gfx@lists.freedesktop.org On 08/07/2014 02:20 PM, Chris Wilson wrote: > During release of the GEM object we hold the struct_mutex. As the > object may be holding onto the last reference for the task->mm, > calling mmput() may trigger exit_mmap() which close the vma > which will call drm_gem_vm_close() and attempt to reacquire > the struct_mutex. In order to avoid that recursion, we have > to defer the mmput() until after we drop the struct_mutex, > i.e. we need to schedule a worker to do the clean up. A further issue > spotted by Tvrtko was caused when we took a GTT mmapping of a userptr > buffer object. In that case, we would never call mmput as the object > would be cyclically referenced by the GTT mmapping and not freed upon > process exit - keeping the entire process mm alive after the process > task was reaped. The fix employed is to replace the mm_users/mmput() > reference handling to mm_count/mmdrop() for the shared i915_mm_struct. > > INFO: task test_surfaces:1632 blocked for more than 120 seconds. > Tainted: GF O 3.14.5+ #1 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > test_surfaces D 0000000000000000 0 1632 1590 0x00000082 > ffff88014914baa8 0000000000000046 0000000000000000 ffff88014914a010 > 0000000000012c40 0000000000012c40 ffff8800a0058210 ffff88014784b010 > ffff88014914a010 ffff880037b1c820 ffff8800a0058210 ffff880037b1c824 > Call Trace: > [] schedule+0x29/0x70 > [] schedule_preempt_disabled+0xe/0x10 > [] __mutex_lock_slowpath+0x183/0x220 > [] mutex_lock+0x23/0x40 > [] drm_gem_vm_close+0x33/0x70 [drm] > [] remove_vma+0x33/0x70 > [] exit_mmap+0x11c/0x170 > [] mmput+0x6b/0x100 > [] i915_gem_userptr_release+0x89/0xc0 [i915] > [] i915_gem_free_object+0x126/0x250 [i915] > [] drm_gem_object_free+0x2a/0x40 [drm] > [] drm_gem_object_handle_unreference_unlocked+0xe2/0x120 [drm] > [] drm_gem_object_release_handle+0x64/0x90 [drm] > [] idr_for_each+0xab/0x100 > [] ? drm_gem_object_handle_unreference_unlocked+0x120/0x120 [drm] > [] ? mutex_lock+0x16/0x40 > [] drm_gem_release+0x24/0x40 [drm] > [] drm_release+0x3fb/0x480 [drm] > [] __fput+0xb2/0x260 > [] ____fput+0xe/0x10 > [] task_work_run+0x8f/0xf0 > [] do_exit+0x1a8/0x480 > [] do_group_exit+0x51/0xc0 > [] SyS_exit_group+0x17/0x20 > [] system_call_fastpath+0x16/0x1b > > v2: Incorporate feedback from Tvrtko and remove the unnessary mm > referencing when creating the i915_mm_struct and improve some of the > function names and comments. [snip] I reviewed this some weeks back and did not spot any issues. Reviewed-by: Tvrtko Ursulin Thanks, Tvrtko