From mboxrd@z Thu Jan 1 00:00:00 1970 From: Inki Dae Subject: RE: [RFC] deadlock in "drm/exynos: fix wrong pointer access at vm close" Date: Wed, 25 Sep 2013 13:34:30 +0900 Message-ID: <00f401ceb9a8$86dabed0$94903c70$%dae@samsung.com> References: <20130922212911.GU13318@ZenIV.linux.org.uk> <055301ceb831$6f8d42c0$4ea7c840$%dae@samsung.com> <20130923130727.GV13318@ZenIV.linux.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mailout2.samsung.com (mailout2.samsung.com [203.254.224.25]) by gabe.freedesktop.org (Postfix) with ESMTP id 1BF49E5D58 for ; Tue, 24 Sep 2013 21:34:32 -0700 (PDT) Received: from epcpsbgr1.samsung.com (u141.gpu120.samsung.co.kr [203.254.230.141]) by mailout2.samsung.com (Oracle Communications Messaging Server 7u4-24.01 (7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP id <0MTN006SEZDBHVC0@mailout2.samsung.com> for dri-devel@lists.freedesktop.org; Wed, 25 Sep 2013 13:34:31 +0900 (KST) In-reply-to: Content-language: ko List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org To: 'Inki Dae' , 'Al Viro' Cc: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org > -----Original Message----- > From: Inki Dae [mailto:inki.dae@samsung.com] > Sent: Tuesday, September 24, 2013 1:41 PM > To: 'Al Viro' > Cc: 'YoungJun Cho'; 'dri-devel@lists.freedesktop.org' > Subject: RE: [RFC] deadlock in "drm/exynos: fix wrong pointer access at vm > close" > > Thanks for your comment. > > > -----Original Message----- > > From: Al Viro [mailto:viro@ftp.linux.org.uk] On Behalf Of Al Viro > > Sent: Monday, September 23, 2013 10:07 PM > > To: Inki Dae > > Cc: 'YoungJun Cho'; dri-devel@lists.freedesktop.org > > Subject: Re: [RFC] deadlock in "drm/exynos: fix wrong pointer access at > vm > > close" > > > > On Mon, Sep 23, 2013 at 04:49:30PM +0900, Inki Dae wrote: > > > Hi, > > > > > > > -----Original Message----- > > > > From: Al Viro [mailto:viro@ftp.linux.org.uk] On Behalf Of Al Viro > > > > Sent: Monday, September 23, 2013 6:29 AM > > > > To: YoungJun Cho > > > > Cc: dri-devel@lists.freedesktop.org; Inki Dae > > > > Subject: [RFC] deadlock in "drm/exynos: fix wrong pointer access at > vm > > > > close" > > > > > > > > You have drm_dev->struct_mutex grabbed before ->mmap_sem in > > > > exynos_drm_gem_mmap_ioctl() and after - in exynos_drm_gem_fault() > > > > (since ->fault() is always called with ->mmap_sem held). Looks like > > > > a garden-variety AB-BA deadlock... > > > > > > > > > > Right, if mmap system call is requested by another process between - > > >f_op > > > pointer changing and restoring, the deadlock can be incurred. > > > > > > For this, I think we can resolve this issue like below, > > > > > > At exynos_drm_gem_mmap_ioctl() > > > down_write(&mm->mmap_sem); > > > mutex_lock(&dev->struct_mutex); > > > ... > > > > Umm... If you do it that way, why bother with changing ->private_data > > at all? You can as well stash obj in dev->dev_private->something after > > you've grabbed the mutex and have ->mmap() pick it there... > > I changed ->private_data to a gem object so that mmap_buffer function can > get the gem object. But yes, we can use dev->dev_private->something > instead, and that seems like better way. For this, I missed that we can > get drm_device object from file pointer. > > > > > Said that, I really don't like that approach - both playing with ->f_op > > and the games with private vmas; > > Agree. But is there other better way to support direct mapping; not > needing page fault handler to map, and indirect mapping interfaces; > needing the page fault handler? > > > exynos_gem_get_vma(), AFAICS, calls > > find_vma() (without bothering to hold ->mmap_sem, BTW - there's nothing > > to prevent the result of find_vma() being freed just as it's returned > > I can't see to hold ->mmap_sem when it calls find_vma() anywhere else. > > > to caller) and clones it manually, regardless of whether that vma allows > > to clone itself or not. Quite a few drivers rely on that not > happening... > > > > I think that has no any problem because exynos_gem_get_vma() takes a > reference to vma, and also v4l2 side is using same way. I and v4l2 guys > might be missing something what you are concerning. So Could you give us > more comments? > > And, I found a bug that we never call exynos_gem_put_vma() when userptr is > freed. That should be fixed. > > > IOW, you are already digging deep inside VM guts and this only makes > > it deeper ;-/ > > > > And no, the deadlock doesn't depend on race between ioctl() and mmap() > > from another process; all it takes is > > Yes, there was my wrong comments. That is a mmap > callback(exynos_drm_gem_mmap_buffer) we don't want can be called if we > called mmap system call when ->f_op points to exynos_drm_gem_fops. > > > 1) thread A does clone(), creating thread B that shares address space > with > > it. > > 2) thread A does that ioctl, creating a mapping > > 3) thread A does that ioctl again, creating another mapping, while > thread > > B dereferences an address in the first mapping and triggers a page fault. > > > > The only race is on step 3 in the above; the question about mmap() vs. > > ioctl() was about mmap(2) _during_ that ioctl() hitting > > exynos_drm_gem_mmap_buffer() instead of exynos_drm_gem_mmap() it > would've > > called before or after ioctl(). Here the interesting case is when > > callers of mmap() and ioctl() do *not* share the address space, since > > in that case mmap(2) won't notice ->mmap_sem held by you - it's on the > > different mm_struct, so mmap(2) will get to calling the ->f_op->mmap() > > without waiting for you to restore ->f_op... > > Yes, it's a problem. When two more threads are running, and when thread A > called mmap system call, exynos_drm_gem_mmap_buffer() will be called if > thread B called ioctl() first and f_op was not restored yet. I have no > idea to resolve this issue yet. So do you have any good idea? At any rate, > we should fix this issue. > > > > > For another bug in the same area, try building that driver modular and > > watch what happens to use count after a round of this switching ->f_op > and > > restoring it back to original; fops_get() in there is wrong and AFAICS > > pointless. > > Yes, it would also be another bug. Will check it out. It seems that it's > not good way to change and restore ->f_op, and this way makes mmap part to > be complicated. I will try to find a way to resolve this issue or to find > another better way to support direct and indirect mapping interfaces. > It seems that we can use a new anon file instead of using drm file to resolve the issue. Thanks, Inki Dae > I'd appreciate all the advices I can get. > > Thanks, > Inki Dae