linux-mm.kvack.org archive mirror
* [PATCH] hugetlbfs: lockdep annotate root inode properly
@ 2012-03-08  9:15 Aneesh Kumar K.V
  2012-03-08 21:02 ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-08  9:15 UTC (permalink / raw)
  To: linux-mm, akpm, davej, jboyer, tyhicks; +Cc: linux-kernel, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This fixes the lockdep warning below.

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.3.0-rc4+ #190 Not tainted
 -------------------------------------------------------
 shared/1568 is trying to acquire lock:
  (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff811efa0f>] hugetlbfs_file_mmap+0x7d/0x108

 but task is already holding lock:
  (&mm->mmap_sem){++++++}, at: [<ffffffff810f5589>] sys_mmap_pgoff+0xd4/0x12f

 which lock already depends on the new lock.


 the existing dependency chain (in reverse order) is:

 -> #1 (&mm->mmap_sem){++++++}:
        [<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
        [<ffffffff810ee439>] might_fault+0x6d/0x90
        [<ffffffff8111bc12>] filldir+0x6a/0xc2
        [<ffffffff81129942>] dcache_readdir+0x5c/0x222
        [<ffffffff8111be58>] vfs_readdir+0x76/0xac
        [<ffffffff8111bf6a>] sys_getdents+0x79/0xc9
        [<ffffffff816940a2>] system_call_fastpath+0x16/0x1b

 -> #0 (&sb->s_type->i_mutex_key#12){+.+.+.}:
        [<ffffffff8109f40a>] __lock_acquire+0xa6c/0xd60
        [<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
        [<ffffffff816916be>] __mutex_lock_common+0x48/0x350
        [<ffffffff81691a85>] mutex_lock_nested+0x2a/0x31
        [<ffffffff811efa0f>] hugetlbfs_file_mmap+0x7d/0x108
        [<ffffffff810f4fd0>] mmap_region+0x26f/0x466
        [<ffffffff810f545b>] do_mmap_pgoff+0x294/0x2ee
        [<ffffffff810f55a9>] sys_mmap_pgoff+0xf4/0x12f
        [<ffffffff8103d1f2>] sys_mmap+0x1d/0x1f
        [<ffffffff816940a2>] system_call_fastpath+0x16/0x1b

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&mm->mmap_sem);
                                lock(&sb->s_type->i_mutex_key#12);
                                lock(&mm->mmap_sem);
   lock(&sb->s_type->i_mutex_key#12);

  *** DEADLOCK ***

 1 lock held by shared/1568:
  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff810f5589>] sys_mmap_pgoff+0xd4/0x12f

 stack backtrace:
 Pid: 1568, comm: shared Not tainted 3.3.0-rc4+ #190
 Call Trace:
  [<ffffffff81688bf9>] print_circular_bug+0x1f8/0x209
  [<ffffffff8109f40a>] __lock_acquire+0xa6c/0xd60
  [<ffffffff8110e7b6>] ? files_lglock_local_lock_cpu+0x61/0x61
  [<ffffffff811efa0f>] ? hugetlbfs_file_mmap+0x7d/0x108
  [<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
  [<ffffffff811efa0f>] ? hugetlbfs_file_mmap+0x7d/0x108

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/hugetlbfs/inode.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

NOTE: This patch also requires
http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
to remove the lockdep warning.

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3645cd3..ca4fa70 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -459,6 +459,7 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
 		inode->i_fop = &simple_dir_operations;
 		/* directory inodes start off with i_nlink == 2 (for "." entry) */
 		inc_nlink(inode);
+		lockdep_annotate_inode_mutex_key(inode);
 	}
 	return inode;
 }
-- 
1.7.9
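Here is a simplified model of what the one-line change does, for readers of the archive. As I read the thread (and the description in the later repost of this patch), lockdep_annotate_inode_mutex_key() moves a directory inode's i_mutex into a separate lock class. The hugetlbfs root inode missed that step because it is allocated by a separate function, hugetlbfs_get_root(). The sketch below is userspace C only. `struct model_inode`, `model_annotate_i_mutex()` and the two class IDs are hypothetical stand-ins for lockdep's per-filesystem-type class keys, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock-class IDs standing in for lockdep's per-filesystem-type
 * class keys: one for regular-file i_mutex, one for directory i_mutex. */
enum lock_class { I_MUTEX_FILE_CLASS, I_MUTEX_DIR_CLASS };

/* Hypothetical, heavily reduced inode: only its type and i_mutex class. */
struct model_inode {
    bool is_dir;
    enum lock_class i_mutex_class;
};

/* Generic inode setup: every i_mutex starts out in the default class. */
static void model_inode_init(struct model_inode *inode, bool is_dir)
{
    inode->is_dir = is_dir;
    inode->i_mutex_class = I_MUTEX_FILE_CLASS;
}

/* Model of the annotation: directories are moved to their own class.
 * An inode that never reaches this call (the hugetlbfs root inode before
 * this patch) stays in the default, regular-file class. */
static void model_annotate_i_mutex(struct model_inode *inode)
{
    if (inode->is_dir)
        inode->i_mutex_class = I_MUTEX_DIR_CLASS;
}
```

As Al Viro points out later in the thread, a separate directory class only changes which dependency chains lockdep can match up. It does not by itself remove the read(2)/mmap(2) ordering problem on regular files.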

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
  2012-03-08  9:15 [PATCH] hugetlbfs: lockdep annotate root inode properly Aneesh Kumar K.V
@ 2012-03-08 21:02 ` Andrew Morton
  2012-03-08 21:10   ` Dave Jones
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Andrew Morton @ 2012-03-08 21:02 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, davej, jboyer, tyhicks, linux-kernel, Al Viro,
	Peter Zijlstra, Mimi Zohar

On Thu,  8 Mar 2012 14:45:16 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This fixes the lockdep warning below.

OK, what's going on here.

> [lockdep trace snipped]

Why have these lockdep warnings started coming out now - was the VFS
changed to newly take i_mutex somewhere in the directory handling?


Sigh.  Was lockdep_annotate_inode_mutex_key() sufficiently
self-explanatory to justify leaving it undocumented?

<goes off and reads e096d0c7e2e>

OK, the patch looks correct given the explanation in e096d0c7e2e, but
I'd like to understand why it becomes necessary only now.

> NOTE: This patch also requires
> http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> to remove the lockdep warning

And that patch has been basically ignored.

Sigh.  I guess I'll grab both patches, but I'm not confident in doing
so without an overall explanation of what is happening here.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
  2012-03-08 21:02 ` Andrew Morton
@ 2012-03-08 21:10   ` Dave Jones
  2012-03-08 21:19   ` Tyler Hicks
  2012-03-08 21:44   ` Al Viro
  2 siblings, 0 replies; 15+ messages in thread
From: Dave Jones @ 2012-03-08 21:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Aneesh Kumar K.V, linux-mm, jboyer, tyhicks, linux-kernel,
	Al Viro, Peter Zijlstra, Mimi Zohar

On Thu, Mar 08, 2012 at 01:02:56PM -0800, Andrew Morton wrote:

 > >  ======================================================
 > >  [ INFO: possible circular locking dependency detected ]
 > >  3.3.0-rc4+ #190 Not tainted
 > >  -------------------------------------------------------
 > >  shared/1568 is trying to acquire lock:
 > >   (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff811efa0f>] hugetlbfs_file_mmap+0x7d/0x108
 > > 
 > >  but task is already holding lock:
 > >   (&mm->mmap_sem){++++++}, at: [<ffffffff810f5589>] sys_mmap_pgoff+0xd4/0x12f
 > > 
 > >  which lock already depends on the new lock.
 > > 
  > 
 > Why have these lockdep warnings started coming out now - was the VFS
 > changed to newly take i_mutex somewhere in the directory handling?

This has been happening for almost a year!
https://lkml.org/lkml/2011/4/15/272
See also https://lkml.org/lkml/2012/2/16/498

	Dave

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
  2012-03-08 21:02 ` Andrew Morton
  2012-03-08 21:10   ` Dave Jones
@ 2012-03-08 21:19   ` Tyler Hicks
  2012-03-08 21:40     ` Andrew Morton
  2012-03-08 21:44   ` Al Viro
  2 siblings, 1 reply; 15+ messages in thread
From: Tyler Hicks @ 2012-03-08 21:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Aneesh Kumar K.V, linux-mm, davej, jboyer, linux-kernel, Al Viro,
	Peter Zijlstra, Mimi Zohar

[-- Attachment #1: Type: text/plain, Size: 4169 bytes --]

On 2012-03-08 13:02:56, Andrew Morton wrote:
> On Thu,  8 Mar 2012 14:45:16 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > This fixes the lockdep warning below.
> 
> OK, what's going on here.
> 
> > [lockdep trace snipped]
> 
> Why have these lockdep warnings started coming out now - was the VFS
> changed to newly take i_mutex somewhere in the directory handling?

I'm not sure that they're new warnings. My patch (linked to below) may
have just given folks false hope that their nagging lockdep problems
are over.

> 
> 
> Sigh.  Was lockdep_annotate_inode_mutex_key() sufficiently
> self-explanatory to justify leaving it undocumented?
> 
> <goes off and reads e096d0c7e2e>
> 
> OK, the patch looks correct given the explanation in e096d0c7e2e, but
> I'd like to understand why it becomes necessary only now.
> 
> > NOTE: This patch also requires
> > http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> > to remove the lockdep warning
> 
> And that patch has been basically ignored.

Al commented on it here:

https://lkml.org/lkml/2012/2/16/518

He said that while my patch is correct, taking i_mutex inside mmap_sem
is still wrong.

Tyler

> 
> Sigh.  I guess I'll grab both patches, but I'm not confident in doing
> so without an overall explanation of what is happening here.
> 
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
  2012-03-08 21:19   ` Tyler Hicks
@ 2012-03-08 21:40     ` Andrew Morton
  2012-03-08 21:49       ` Al Viro
  2012-03-09  5:03       ` Aneesh Kumar K.V
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2012-03-08 21:40 UTC (permalink / raw)
  To: Tyler Hicks
  Cc: Aneesh Kumar K.V, linux-mm, davej, jboyer, linux-kernel, Al Viro,
	Peter Zijlstra, Mimi Zohar, David Gibson

On Thu, 8 Mar 2012 15:19:27 -0600
Tyler Hicks <tyhicks@canonical.com> wrote:

> > 
> > 
> > Sigh.  Was lockdep_annotate_inode_mutex_key() sufficiently
> > self-explanatory to justify leaving it undocumented?
> > 
> > <goes off and reads e096d0c7e2e>
> > 
> > OK, the patch looks correct given the explanation in e096d0c7e2e, but
> > I'd like to understand why it becomes necessary only now.
> > 
> > > NOTE: This patch also requires
> > > http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> > > to remove the lockdep warning
> > 
> > And that patch has been basically ignored.
> 
> Al commented on it here:
> 
> https://lkml.org/lkml/2012/2/16/518
> 
> He said that while my patch is correct, taking i_mutex inside mmap_sem
> is still wrong.

OK, thanks, yup.  Taking i_mutex in file_operations.mmap() is wrong.

Is hugetlbfs actually deadlockable because of this, or is it the case
that the i_mutex->mmap_sem ordering happens to never happen for this
filesystem?  Although we shouldn't go and create incompatible lock
ranking rules for different filesystems!

So we need to pull the i_mutex out of hugetlbfs_file_mmap().  What's it
actually trying to do in there?  If we switch to
i_size_read()/i_size_write() then AFAICT the problem comes down to
hugetlb_reserve_pages().

hugetlb_reserve_pages() fiddles with i_mapping->private_list and the fs
owns private_list and is free to use a lock other than i_mutex to
protect it.  (In fact i_mapping->private_lock is the usual lock for
private_list).



So from a quick scan here I'm thinking that a decent fix is to remove
the i_mutex locking from hugetlbfs_file_mmap(), switch
hugetlbfs_file_mmap() to i_size_read/write then use a hugetlb-private
lock to protect i_mapping->private_list.  region_chg() will do
GFP_KERNEL allocations under that lock, so some care is needed.
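On the closing caveat: GFP_KERNEL allocations may sleep, so they cannot be made under a spinlock such as i_mapping->private_lock. One common way to take "some care" is to do the allocation before taking the lock and only link the result in while holding it. The userspace sketch below shows that pattern under stated assumptions: a pthread mutex stands in for a hugetlb-private lock, malloc() stands in for the allocation, and `struct region_list` / `region_add_prealloc()` are hypothetical and far simpler than the region code in mm/hugetlb.c (no merging, no reservation accounting).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-inode reservation map: a singly linked list of
 * [from, to) page ranges protected by a filesystem-private lock. */
struct region {
    long from, to;
    struct region *next;
};

struct region_list {
    pthread_mutex_t lock;   /* stands in for a hugetlb-private lock */
    struct region *head;
};

/* Add [from, to). The allocation happens BEFORE the lock is taken, so
 * nothing that may sleep or enter reclaim runs while the lock is held. */
static int region_add_prealloc(struct region_list *rl, long from, long to)
{
    struct region *nrg = malloc(sizeof(*nrg));
    if (!nrg)
        return -1;
    nrg->from = from;
    nrg->to = to;

    pthread_mutex_lock(&rl->lock);
    nrg->next = rl->head;   /* push front; no merging in this sketch */
    rl->head = nrg;
    pthread_mutex_unlock(&rl->lock);
    return 0;
}

/* Total number of pages covered, read under the same lock. */
static long region_count(struct region_list *rl)
{
    long total = 0;

    pthread_mutex_lock(&rl->lock);
    for (struct region *rg = rl->head; rg; rg = rg->next)
        total += rg->to - rg->from;
    pthread_mutex_unlock(&rl->lock);
    return total;
}
```

The alternative, keeping the allocation inside the critical section, only works if the private lock is a sleeping lock (a mutex rather than a spinlock).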

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
  2012-03-08 21:02 ` Andrew Morton
  2012-03-08 21:10   ` Dave Jones
  2012-03-08 21:19   ` Tyler Hicks
@ 2012-03-08 21:44   ` Al Viro
  2012-03-08 22:44     ` Peter Zijlstra
  2 siblings, 1 reply; 15+ messages in thread
From: Al Viro @ 2012-03-08 21:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Aneesh Kumar K.V, linux-mm, davej, jboyer, tyhicks, linux-kernel,
	Peter Zijlstra, Mimi Zohar

On Thu, Mar 08, 2012 at 01:02:56PM -0800, Andrew Morton wrote:
> > This fixes the lockdep warning below.
> 
> OK, what's going on here.

Deadlock in hugetlbfs mmap getting misreported.

One last time: ->mmap_sem nests inside ->i_mutex.  Both for regular
files and for directories.  Always had.

For directories there's copy_to_user() from ->readdir() done under ->i_mutex.
For regular files there's copy_from_user() from ->write(), usually done under
->i_mutex.  On hugetlbfs there's copy_to_user() from ->read() done under
->i_mutex.

It had not changed at all.  Lockdep sees both call chains; the only question
is which chain is seen first.  And usually reading a directory happens earlier
in the boot than writing into a file.  That's all there is to it.

Unfortunately, the fact that the call chain being reported is obviously about
directories leads to false hopes that deadlock doesn't exist - mmap()
obviously can't happen to a directory inode, so people hope that it's a
false positive.  It isn't.
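Here is a toy model of the reporting behaviour described above. Lockdep records an ordering the first time it sees it, and a later inversion is reported against that recorded chain. In the checker below, only the first call chain per ordered pair of lock classes is kept. As a result, the readdir chain seen during boot is the one cited, even though read(2) on a hugetlbfs file establishes the same i_mutex -> mmap_sem order. This is a userspace sketch with hypothetical names (`struct order_log`, `record_order()`). Real lockdep tracks a full dependency graph with stack traces, not just pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lock-class IDs. */
enum { I_MUTEX, MMAP_SEM, NCLASSES };

/* first_chain[a][b] != NULL means "b was taken while a was held" and
 * remembers the first call chain that showed it. */
struct order_log {
    const char *first_chain[NCLASSES][NCLASSES];
};

/* Record "inner taken while outer held", seen on call chain `chain`.
 * Returns NULL if consistent with earlier observations. On an inversion
 * it returns the remembered chain for the opposite order, which is what
 * a report would then cite. */
static const char *record_order(struct order_log *ol, int outer, int inner,
                                const char *chain)
{
    if (ol->first_chain[inner][outer] != NULL)
        return ol->first_chain[inner][outer];
    if (ol->first_chain[outer][inner] == NULL)
        ol->first_chain[outer][inner] = chain; /* later chains are not kept */
    return NULL;
}
```

The mmap() path then gets blamed on whichever i_mutex -> mmap_sem chain happened to be recorded first.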

Patch separating directory and non-directory ->i_mutex into different classes
went in at some point, precisely due to those hopes.  It had a braino that
made it useless.  Fix for that braino had been posted and sits in my queue; I'll
push it to Linus along with other pending fixes tonight.

It will *not* eliminate the (very real) deadlock.  It might make the warning
go away, but only if read() on hugetlbfs files doesn't happen during boot.

I suspect that the right thing would be to have a way to set explicit
nesting rules, not tied to a specific call trace.  I haven't looked into
lockdep's guts, so I have no idea how much that will hurt to implement.
Something like lockdep_lock_nests(class_outer, class_inner, message),
acting as if there had been a call chain where class_outer had been taken
before class_inner, with message going in place of the call trace for that
chain when we run into a conflict...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
  2012-03-08 21:40     ` Andrew Morton
@ 2012-03-08 21:49       ` Al Viro
  2012-03-08 22:19         ` Andrew Morton
  2012-03-09  5:03       ` Aneesh Kumar K.V
  1 sibling, 1 reply; 15+ messages in thread
From: Al Viro @ 2012-03-08 21:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tyler Hicks, Aneesh Kumar K.V, linux-mm, davej, jboyer,
	linux-kernel, Peter Zijlstra, Mimi Zohar, David Gibson

On Thu, Mar 08, 2012 at 01:40:50PM -0800, Andrew Morton wrote:

> OK, thanks, yup.  Taking i_mutex in file_operations.mmap() is wrong.

... or in .release() (munmap() does fput() under mmap_sem).

> Is hugetlbfs actually deadlockable because of this, or is it the case
> that the i_mutex->mmap_sem ordering happens to never happen for this
> filesystem?

Yes, it is.  Look at read(2) on hugetlbfs; it copies data to userland
while holding ->i_mutex.  So we have

read(2):
mutex_lock(&A)
down_read(&B)

mmap(2):
down_write(&B);
mutex_lock(&A);

which is an obvious deadlock.
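Al's two sequences can be checked mechanically: two code paths can deadlock against each other if they take some pair of locks in opposite orders. The minimal userspace sketch below does only that pairwise ABBA check, with A = i_mutex and B = mmap_sem as in the example. It ignores read/write semantics, which is reasonable here because down_read() and down_write() on the same rwsem still exclude each other. The third sequence in the usage is an assumption that models hugetlbfs read(2) once it no longer takes i_mutex, as proposed later in the thread. `abba_conflict()` and the other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Lock IDs from the example: A = i_mutex, B = mmap_sem. */
enum lock_id { LOCK_A, LOCK_B };

/* Position of `lock` in `seq`, or -1 if the sequence never takes it. */
static int position(const enum lock_id *seq, size_t n, enum lock_id lock)
{
    for (size_t i = 0; i < n; i++)
        if (seq[i] == lock)
            return (int)i;
    return -1;
}

/* True if some pair of locks is taken in opposite orders by s1 and s2.
 * Each lock is assumed to be held until the end of its sequence. */
static bool abba_conflict(const enum lock_id *s1, size_t n1,
                          const enum lock_id *s2, size_t n2)
{
    for (size_t i = 0; i < n1; i++) {
        for (size_t j = i + 1; j < n1; j++) {
            /* s1 takes s1[i] before s1[j]; does s2 do the reverse? */
            int pi = position(s2, n2, s1[i]);
            int pj = position(s2, n2, s1[j]);
            if (pi >= 0 && pj >= 0 && pj < pi)
                return true;
        }
    }
    return false;
}
```

With A, B for read(2) and B, A for mmap(2), the check reports a conflict. With read(2) reduced to B alone, it does not.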

> So we need to pull the i_mutex out of hugetlbfs_file_mmap().

IIRC, you have a patch in your tree doing just that...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
  2012-03-08 21:49       ` Al Viro
@ 2012-03-08 22:19         ` Andrew Morton
  2012-03-08 22:33           ` Dave Jones
  2012-03-09  5:00           ` Aneesh Kumar K.V
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2012-03-08 22:19 UTC (permalink / raw)
  To: Al Viro
  Cc: Tyler Hicks, Aneesh Kumar K.V, linux-mm, davej, jboyer,
	linux-kernel, Peter Zijlstra, Mimi Zohar, David Gibson

On Thu, 8 Mar 2012 21:49:52 +0000
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> > So we need to pull the i_mutex out of hugetlbfs_file_mmap().
> 
> IIRC, you have a patch in your tree doing just that...

Nope.

But it seems that you've recently seen such a patch - can you recall
where?  Or was it the ecryptfs thing?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
  2012-03-08 22:19         ` Andrew Morton
@ 2012-03-08 22:33           ` Dave Jones
  2012-03-08 22:45             ` Andrew Morton
  2012-03-09  5:00           ` Aneesh Kumar K.V
  1 sibling, 1 reply; 15+ messages in thread
From: Dave Jones @ 2012-03-08 22:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Tyler Hicks, Aneesh Kumar K.V, linux-mm, jboyer,
	linux-kernel, Peter Zijlstra, Mimi Zohar, David Gibson

On Thu, Mar 08, 2012 at 02:19:38PM -0800, Andrew Morton wrote:
 > On Thu, 8 Mar 2012 21:49:52 +0000
 > Al Viro <viro@ZenIV.linux.org.uk> wrote:
 > 
 > > > So we need to pull the i_mutex out of hugetlbfs_file_mmap().
 > > 
 > > IIRC, you have a patch in your tree doing just that...
 > 
 > Nope.
 > 
 > But it seems that you've recently seen such a patch - can you recall
 > where?

this ? https://lkml.org/lkml/2012/2/23/64

	Dave

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
  2012-03-08 21:44   ` Al Viro
@ 2012-03-08 22:44     ` Peter Zijlstra
  2012-03-08 22:46       ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2012-03-08 22:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Aneesh Kumar K.V, linux-mm, davej, jboyer, tyhicks,
	linux-kernel, Mimi Zohar

On Thu, 2012-03-08 at 21:44 +0000, Al Viro wrote:
> I suspect that the right thing would be to have a way to set explicit
> nesting rules, not tied to a specific call trace.

See might_lock() / might_lock_read(), these are used to implement
might_fault(), which is used to annotate paths that could -- but rarely
do -- fault.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
  2012-03-08 22:33           ` Dave Jones
@ 2012-03-08 22:45             ` Andrew Morton
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2012-03-08 22:45 UTC (permalink / raw)
  To: Dave Jones
  Cc: Al Viro, Tyler Hicks, Aneesh Kumar K.V, linux-mm, jboyer,
	linux-kernel, Peter Zijlstra, Mimi Zohar, David Gibson

On Thu, 8 Mar 2012 17:33:34 -0500
Dave Jones <davej@redhat.com> wrote:

> On Thu, Mar 08, 2012 at 02:19:38PM -0800, Andrew Morton wrote:
>  > On Thu, 8 Mar 2012 21:49:52 +0000
>  > Al Viro <viro@ZenIV.linux.org.uk> wrote:
>  > 
>  > > > So we need to pull the i_mutex out of hugetlbfs_file_mmap().
>  > > 
>  > > IIRC, you have a patch in your tree doing just that...
>  > 
>  > Nope.
>  > 
>  > But it seems that you've recently seen such a patch - can you recall
>  > where?
> 
> this ? https://lkml.org/lkml/2012/2/23/64
> 

Thanks, yes, probably that.  Needs the i_size_read()/write() changes.

I worry a bit about the region handling code in mm/hugetlb.c.  

 * The region data structures are protected by a combination of the mmap_sem
 * and the hugetlb_instantion_mutex.  To access or modify a region the caller
 * must either hold the mmap_sem for write, or the mmap_sem for read and
 * the hugetlb_instantiation mutex:

I hope that's true - it would be nice to have some debug assertions in
the various region_foo() functions to verify that the required locks are
held.
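The rule in that comment is a disjunction: either mmap_sem is held for write, or mmap_sem is held for read together with the instantiation mutex. A single "is lock X held" assertion cannot express that. A minimal userspace model of the check Andrew asks for is below. `struct lock_state`, `region_locks_held()` and `assert_region_locks()` are hypothetical. In the kernel, such a check would presumably be built from lockdep's held-lock assertions such as lockdep_assert_held(), which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of what the current task holds. */
enum sem_state { SEM_NOT_HELD, SEM_HELD_READ, SEM_HELD_WRITE };

struct lock_state {
    enum sem_state mmap_sem;
    bool instantiation_mutex;
};

/* The rule from the mm/hugetlb.c comment quoted above: mmap_sem held
 * for write, or mmap_sem held for read plus the instantiation mutex. */
static bool region_locks_held(const struct lock_state *ls)
{
    if (ls->mmap_sem == SEM_HELD_WRITE)
        return true;
    return ls->mmap_sem == SEM_HELD_READ && ls->instantiation_mutex;
}

/* What a debug check at the top of each region_*() helper might do.
 * Like any assert(), it compiles away when NDEBUG is defined. */
static void assert_region_locks(const struct lock_state *ls)
{
    assert(region_locks_held(ls));
}
```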

But if that code is all nice and tight, I guess that removing that
i_mutex acquisition will be pretty simple.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
  2012-03-08 22:44     ` Peter Zijlstra
@ 2012-03-08 22:46       ` Peter Zijlstra
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2012-03-08 22:46 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Aneesh Kumar K.V, linux-mm, davej, jboyer, tyhicks,
	linux-kernel, Mimi Zohar

On Thu, 2012-03-08 at 23:44 +0100, Peter Zijlstra wrote:
> On Thu, 2012-03-08 at 21:44 +0000, Al Viro wrote:
> > I suspect that the right thing would be to have a way to set explicit
> > nesting rules, not tied to a specific call trace.
> 
> See might_lock() / might_lock_read(), these are used to implement
> might_fault(), which is used to annotate paths that could -- but rarely
> do -- fault.

This will of course result in a specific trace, but if you do it early
enough the trace points to your setup function, which can contain a
comment explaining things.
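Peter's approach can be modelled in userspace by seeding the order record from a setup function before any real call chain runs. The first i_mutex -> mmap_sem entry then carries the setup function's explanation. A later mmap() that takes i_mutex under mmap_sem is reported against that explanation, rather than against whichever syscall happened to run first. This is only a model. `seed_order()`, `check_order()` and `annotate_vfs_lock_order()` are hypothetical names, and in the kernel the annotation itself would be the might_lock()/might_lock_read() calls Peter mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lock-class IDs. */
enum { CLS_I_MUTEX, CLS_MMAP_SEM, NCLS };

/* reason[a][b]: why "b may be taken while holding a" is known. */
static const char *reason[NCLS][NCLS];

/* Record an ordering; the first reason recorded for a pair wins. */
static void seed_order(int outer, int inner, const char *why)
{
    if (!reason[outer][inner])
        reason[outer][inner] = why;
}

/* Check an acquisition; on an inversion return the recorded reason,
 * otherwise record this chain (if it is the first) and return NULL. */
static const char *check_order(int outer, int inner, const char *where)
{
    if (reason[inner][outer])
        return reason[inner][outer];
    seed_order(outer, inner, where);
    return NULL;
}

/* Setup-time annotation, run once early. The comment here is what a
 * later report effectively points the reader to. */
static void annotate_vfs_lock_order(void)
{
    /* ->mmap_sem nests inside ->i_mutex: readdir/read/write may fault
     * on user memory while holding i_mutex. */
    seed_order(CLS_I_MUTEX, CLS_MMAP_SEM,
               "setup: mmap_sem nests inside i_mutex (user copies under i_mutex)");
}
```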

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
  2012-03-08 22:19         ` Andrew Morton
  2012-03-08 22:33           ` Dave Jones
@ 2012-03-09  5:00           ` Aneesh Kumar K.V
  1 sibling, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-09  5:00 UTC (permalink / raw)
  To: Andrew Morton, Al Viro
  Cc: Tyler Hicks, linux-mm, davej, jboyer, linux-kernel,
	Peter Zijlstra, Mimi Zohar, David Gibson

On Thu, 8 Mar 2012 14:19:38 -0800, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 8 Mar 2012 21:49:52 +0000
> Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
> > > So we need to pull the i_mutex out of hugetlbfs_file_mmap().
> > 
> > IIRC, you have a patch in your tree doing just that...
> 
> Nope.
> 
> But it seems that you've recently seen such a patch - can you recall
> where?  Or was it the ecryptfs thing?
> 

So what we ended up doing was

http://article.gmane.org/gmane.linux.kernel.mm/74732

The patch updates hugetlbfs_read() so that it no longer takes i_mutex.
That should make sure the deadlock can't happen.


-aneesh

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
  2012-03-08 21:40     ` Andrew Morton
  2012-03-08 21:49       ` Al Viro
@ 2012-03-09  5:03       ` Aneesh Kumar K.V
  1 sibling, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-09  5:03 UTC (permalink / raw)
  To: Andrew Morton, Tyler Hicks
  Cc: linux-mm, davej, jboyer, linux-kernel, Al Viro, Peter Zijlstra,
	Mimi Zohar, David Gibson

On Thu, 8 Mar 2012 13:40:50 -0800, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 8 Mar 2012 15:19:27 -0600
> Tyler Hicks <tyhicks@canonical.com> wrote:
> 
> [...]
> So we need to pull the i_mutex out of hugetlbfs_file_mmap().  What's it
> actually trying to do in there?  If we switch to
> i_size_read()/i_size_write() then AFAICT the problem comes down to
> hugetlb_reserve_pages().
> 
> hugetlb_reserve_pages() fiddles with i_mapping->private_list and the fs
> owns private_list and is free to use a lock other than i_mutex to
> protect it.  (In fact i_mapping->private_lock is the usual lock for
> private_list).
> 
> 
> 
> So from a quick scan here I'm thinking that a decent fix is to remove
> the i_mutex locking from hugetlbfs_file_mmap(), switch
> hugetlbfs_file_mmap() to i_size_read/write then use a hugetlb-private
> lock to protect i_mapping->private_list.  region_chg() will do
> GFP_KERNEL allocations under that lock, so some care is needed.
> 

But as per commit 7762f5a0b709b415fda132258ad37b9f2a1db994, i_size_write() should
always be called with i_mutex held.

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH] hugetlbfs: lockdep annotate root inode properly
@ 2012-04-16 20:28 Aneesh Kumar K.V
  0 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2012-04-16 20:28 UTC (permalink / raw)
  To: akpm, linux-mm, davej, linux-kernel, viro, jwboyer; +Cc: Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This fixes the false lockdep warning reported below. Commit
e096d0c7e2e4e5893792db865dd065ac73cf1f00 added the same annotation for every
other hugetlbfs inode but missed the root inode, because the root inode is
allocated by a separate function, hugetlbfs_get_root().

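hugetlbfs takes i_mutex in its mmap handler, i.e. with mmap_sem already held.
hugetlbfs does not support file writes, and commit
a05b0855fd15504972dba2358e5faa172a1e50ba changed its file read callback so that
it no longer takes i_mutex. Hence, for regular hugetlbfs files, we never take
mmap_sem while holding i_mutex; the i_mutex -> mmap_sem ordering in the report
below comes only from the directory (readdir) path.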

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.4.0-rc1+ #322 Not tainted
 -------------------------------------------------------
 bash/1572 is trying to acquire lock:
  (&mm->mmap_sem){++++++}, at: [<ffffffff810f1618>] might_fault+0x40/0x90

 but task is already holding lock:
  (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff81125f88>] vfs_readdir+0x56/0xa8

 which lock already depends on the new lock.


 the existing dependency chain (in reverse order) is:

 -> #1 (&sb->s_type->i_mutex_key#12){+.+.+.}:
        [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
        [<ffffffff816a2f5e>] __mutex_lock_common+0x48/0x350
        [<ffffffff816a3325>] mutex_lock_nested+0x2a/0x31
        [<ffffffff811fb8e1>] hugetlbfs_file_mmap+0x7d/0x104
        [<ffffffff810f859a>] mmap_region+0x272/0x47d
        [<ffffffff810f8a39>] do_mmap_pgoff+0x294/0x2ee
        [<ffffffff810f8b65>] sys_mmap_pgoff+0xd2/0x10e
        [<ffffffff8103d19e>] sys_mmap+0x1d/0x1f
        [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b

 -> #0 (&mm->mmap_sem){++++++}:
        [<ffffffff810a0256>] __lock_acquire+0xa81/0xd75
        [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
        [<ffffffff810f1645>] might_fault+0x6d/0x90
        [<ffffffff81125d62>] filldir+0x6a/0xc2
        [<ffffffff81133a83>] dcache_readdir+0x5c/0x222
        [<ffffffff81125fa8>] vfs_readdir+0x76/0xa8
        [<ffffffff811260b6>] sys_getdents+0x79/0xc9
        [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&sb->s_type->i_mutex_key#12);
                                lock(&mm->mmap_sem);
                                lock(&sb->s_type->i_mutex_key#12);
   lock(&mm->mmap_sem);

  *** DEADLOCK ***

 1 lock held by bash/1572:
  #0:  (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff81125f88>] vfs_readdir+0x56/0xa8

 stack backtrace:
 Pid: 1572, comm: bash Not tainted 3.4.0-rc1+ #322
 Call Trace:
  [<ffffffff81699a3c>] print_circular_bug+0x1f8/0x209
  [<ffffffff810a0256>] __lock_acquire+0xa81/0xd75
  [<ffffffff810f38aa>] ? handle_pte_fault+0x5ff/0x614
  [<ffffffff8109e622>] ? mark_lock+0x2d/0x258
  [<ffffffff810f1618>] ? might_fault+0x40/0x90
  [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
  [<ffffffff810f1618>] ? might_fault+0x40/0x90
  [<ffffffff816a3249>] ? __mutex_lock_common+0x333/0x350
  [<ffffffff810f1645>] might_fault+0x6d/0x90
  [<ffffffff810f1618>] ? might_fault+0x40/0x90
  [<ffffffff81125d62>] filldir+0x6a/0xc2
  [<ffffffff81133a83>] dcache_readdir+0x5c/0x222
  [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
  [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
  [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
  [<ffffffff81125fa8>] vfs_readdir+0x76/0xa8
  [<ffffffff811260b6>] sys_getdents+0x79/0xc9
  [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/hugetlbfs/inode.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 92f75aa..d8899e1 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -485,6 +485,7 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
 		inode->i_fop = &simple_dir_operations;
 		/* directory inodes start off with i_nlink == 2 (for "." entry) */
 		inc_nlink(inode);
+		lockdep_annotate_inode_mutex_key(inode);
 	}
 	return inode;
 }
-- 
1.7.10



end of thread

Thread overview: 15+ messages
2012-03-08  9:15 [PATCH] hugetlbfs: lockdep annotate root inode properly Aneesh Kumar K.V
2012-03-08 21:02 ` Andrew Morton
2012-03-08 21:10   ` Dave Jones
2012-03-08 21:19   ` Tyler Hicks
2012-03-08 21:40     ` Andrew Morton
2012-03-08 21:49       ` Al Viro
2012-03-08 22:19         ` Andrew Morton
2012-03-08 22:33           ` Dave Jones
2012-03-08 22:45             ` Andrew Morton
2012-03-09  5:00           ` Aneesh Kumar K.V
2012-03-09  5:03       ` Aneesh Kumar K.V
2012-03-08 21:44   ` Al Viro
2012-03-08 22:44     ` Peter Zijlstra
2012-03-08 22:46       ` Peter Zijlstra
  -- strict thread matches above, loose matches on Subject: below --
2012-04-16 20:28 Aneesh Kumar K.V
