From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tvrtko Ursulin Subject: Re: [PATCH 2/2] drm/i915: Prevent recursive deadlock on releasing a busy userptr Date: Tue, 29 Jul 2014 10:39:55 +0100 Message-ID: <53D76BEB.2020001@linux.intel.com> References: <1405945284-24670-1-git-send-email-chris@chris-wilson.co.uk> <1405945284-24670-2-git-send-email-chris@chris-wilson.co.uk> <53CFE555.1060502@linux.intel.com> <20140723171540.GD29372@nuc-i3427.alporthouse.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: Received: from mga03.intel.com (mga03.intel.com [143.182.124.21]) by gabe.freedesktop.org (Postfix) with ESMTP id 693D16E122 for ; Tue, 29 Jul 2014 02:39:58 -0700 (PDT) In-Reply-To: <20140723171540.GD29372@nuc-i3427.alporthouse.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: Chris Wilson , intel-gfx@lists.freedesktop.org List-Id: intel-gfx@lists.freedesktop.org On 07/23/2014 06:15 PM, Chris Wilson wrote: > On Wed, Jul 23, 2014 at 05:39:49PM +0100, Tvrtko Ursulin wrote: >> On 07/21/2014 01:21 PM, Chris Wilson wrote: >>> + mn = i915_mmu_notifier_get(obj->userptr.mm); >>> + if (IS_ERR(mn)) >>> + return PTR_ERR(mn); >> >> Very minor, but I would perhaps consider renaming this to _find >> since _get in my mind strongly associates with reference counting >> and this does not do that. Especially if the reviewer looks at the >> bail out below and sees no matching put. But minor as I said, you >> can judge what you prefer. > > The same. It was _get because it did used to a reference counter, now > that counting has been removed from the i915_mmu_notifier. > >>> +static int >>> +i915_gem_userptr_init__mm_struct(struct drm_i915_gem_object *obj) >>> +{ >>> + struct drm_i915_private *dev_priv = to_i915(obj->base.dev); >>> + struct i915_mm_struct *mm; >>> + struct mm_struct *real; >>> + int ret = 0; >>> + >>> + real = get_task_mm(current); >>> + if (real == NULL) >>> + return -EINVAL; >> >> Do you think we need get_task_mm()/mmput() here, given it is all >> inside a single system call? > > No. I kept using get_task_mm() simply because it looked neater than > current->mm, but current->mm looks like it gives simpler code. > >>> + /* During release of the GEM object we hold the struct_mutex. As the >>> + * object may be holding onto the last reference for the task->mm, >>> + * calling mmput() may trigger exit_mmap() which close the vma >>> + * which will call drm_gem_vm_close() and attempt to reacquire >>> + * the struct_mutex. In order to avoid that recursion, we have >>> + * to defer the mmput() until after we drop the struct_mutex, >>> + * i.e. we need to schedule a worker to do the clean up. >>> + */ >> >> This comment reads like a strange mixture and past and present eg. >> what used to be the case and what is the fix. We don't hold a >> reference to the process mm as the address space (terminology OK?). >> We do hold a reference to the mm struct itself - which is enough to >> unregister the notifiers, correct? > > True. I was more or less trying to explain the bug and that comment > ended up being the changelog entry. It doesn't work well as a comment. > > + /* During release of the GEM object we hold the struct_mutex. This > + * precludes us from calling mmput() at that time as that may be > + * the last reference and so call exit_mmap(). exit_mmap() will > + * attempt to reap the vma, and if we were holding a GTT mmap > + * would then call drm_gem_vm_close() and attempt to reacquire > + * the struct mutex. So in order to avoid that recursion, we have > + * to defer releasing the mm reference until after we drop the > + * struct_mutex, i.e. we need to schedule a worker to do the clean > + * up. Sounds good, just saying really to remind you to post a respin. :) Regards, Tvrtko