hugetlbfs lockdep spew revisited.

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* hugetlbfs lockdep spew revisited.
@ 2012-02-17  0:08 Dave Jones
  2012-02-17  0:16 ` Josh Boyer
  2012-02-17  0:27 ` Al Viro
  0 siblings, 2 replies; 13+ messages in thread
From: Dave Jones @ 2012-02-17  0:08 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Linux Kernel

Remember this ? https://lkml.org/lkml/2011/4/15/272
Josh took a stab at fixing it in e096d0c7e2e4e5893792db865dd065ac73cf1f00,
but it seems to still be there.

	Dave


======================================================
[ INFO: possible circular locking dependency detected ]
3.3.0-rc3+ #2 Not tainted
-------------------------------------------------------
trinity/30663 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#18){+.+...}, at: [<ffffffff81298169>] hugetlbfs_file_mmap+0x89/0x140

but task is already holding lock:
 (&mm->mmap_sem){++++++}, at: [<ffffffff81182d97>] sys_mmap_pgoff+0x1d7/0x230

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&mm->mmap_sem){++++++}:
       [<ffffffff810d073d>] lock_acquire+0x9d/0x220
       [<ffffffff811789c0>] might_fault+0x80/0xb0
       [<ffffffff811d2997>] filldir+0x77/0xe0
       [<ffffffff811e61ae>] dcache_readdir+0x5e/0x220
       [<ffffffff811d2c68>] vfs_readdir+0xb8/0xf0
       [<ffffffff811d2d99>] sys_getdents+0x89/0x100
       [<ffffffff816a5b69>] system_call_fastpath+0x16/0x1b

-> #0 (&sb->s_type->i_mutex_key#18){+.+...}:
       [<ffffffff810d0008>] __lock_acquire+0x1bf8/0x1c20
       [<ffffffff810d073d>] lock_acquire+0x9d/0x220
       [<ffffffff8169a5b9>] __mutex_lock_common+0x59/0x500
       [<ffffffff8169ab94>] mutex_lock_nested+0x44/0x50
       [<ffffffff81298169>] hugetlbfs_file_mmap+0x89/0x140
       [<ffffffff811826a9>] mmap_region+0x369/0x4f0
       [<ffffffff81182b9f>] do_mmap_pgoff+0x36f/0x390
       [<ffffffff81182db7>] sys_mmap_pgoff+0x1f7/0x230
       [<ffffffff8101eda2>] sys_mmap+0x22/0x30
       [<ffffffff816a5b69>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(&sb->s_type->i_mutex_key#18);
                               lock(&mm->mmap_sem);
  lock(&sb->s_type->i_mutex_key#18);

 *** DEADLOCK ***

1 lock held by trinity/30663:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81182d97>] sys_mmap_pgoff+0x1d7/0x230

stack backtrace:
Pid: 30663, comm: trinity Not tainted 3.3.0-rc3+ #2
Call Trace:
 [<ffffffff816924d7>] print_circular_bug+0x1fb/0x20c
 [<ffffffff810d0008>] __lock_acquire+0x1bf8/0x1c20
 [<ffffffff816a1c2d>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffff811a40cc>] ? deactivate_slab+0x54c/0x5f0
 [<ffffffff810d073d>] lock_acquire+0x9d/0x220
 [<ffffffff81298169>] ? hugetlbfs_file_mmap+0x89/0x140
 [<ffffffff810d12fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff8169a5b9>] __mutex_lock_common+0x59/0x500
 [<ffffffff81298169>] ? hugetlbfs_file_mmap+0x89/0x140
 [<ffffffff811825e5>] ? mmap_region+0x2a5/0x4f0
 [<ffffffff81298169>] ? hugetlbfs_file_mmap+0x89/0x140
 [<ffffffff8169ab94>] mutex_lock_nested+0x44/0x50
 [<ffffffff81298169>] hugetlbfs_file_mmap+0x89/0x140
 [<ffffffff811826a9>] mmap_region+0x369/0x4f0
 [<ffffffff812c1e9a>] ? file_map_prot_check+0xaa/0xe0
 [<ffffffff81182b9f>] do_mmap_pgoff+0x36f/0x390
 [<ffffffff81182d97>] ? sys_mmap_pgoff+0x1d7/0x230
 [<ffffffff81182db7>] sys_mmap_pgoff+0x1f7/0x230
 [<ffffffff810d12fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff8101eda2>] sys_mmap+0x22/0x30
 [<ffffffff816a5b69>] system_call_fastpath+0x16/0x1b


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: hugetlbfs lockdep spew revisited.
  2012-02-17  0:08 hugetlbfs lockdep spew revisited Dave Jones
@ 2012-02-17  0:16 ` Josh Boyer
  2012-02-17  0:34   ` Al Viro
  2012-02-17  0:38   ` Tyler Hicks
  2012-02-17  0:27 ` Al Viro
  1 sibling, 2 replies; 13+ messages in thread
From: Josh Boyer @ 2012-02-17  0:16 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel; +Cc: tyhicks

On Thu, Feb 16, 2012 at 07:08:57PM -0500, Dave Jones wrote:
> Remember this ? https://lkml.org/lkml/2011/4/15/272
> Josh took a stab at fixing it in e096d0c7e2e4e5893792db865dd065ac73cf1f00,
> but it seems to still be there.

I think Tyler Hicks actually noticed this a while ago, but his patch has
been waiting on comment from Al and Christoph:

http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565

I've been hesitant to comment because I obviously screwed up once
already.  We could try this patch in Fedora for a while if Al and
company don't speak up soon.

josh

> 
> 	Dave
> 
> 
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.3.0-rc3+ #2 Not tainted
> -------------------------------------------------------
> trinity/30663 is trying to acquire lock:
>  (&sb->s_type->i_mutex_key#18){+.+...}, at: [<ffffffff81298169>] hugetlbfs_file_mmap+0x89/0x140
> 
> but task is already holding lock:
>  (&mm->mmap_sem){++++++}, at: [<ffffffff81182d97>] sys_mmap_pgoff+0x1d7/0x230
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&mm->mmap_sem){++++++}:
>        [<ffffffff810d073d>] lock_acquire+0x9d/0x220
>        [<ffffffff811789c0>] might_fault+0x80/0xb0
>        [<ffffffff811d2997>] filldir+0x77/0xe0
>        [<ffffffff811e61ae>] dcache_readdir+0x5e/0x220
>        [<ffffffff811d2c68>] vfs_readdir+0xb8/0xf0
>        [<ffffffff811d2d99>] sys_getdents+0x89/0x100
>        [<ffffffff816a5b69>] system_call_fastpath+0x16/0x1b
> 
> -> #0 (&sb->s_type->i_mutex_key#18){+.+...}:
>        [<ffffffff810d0008>] __lock_acquire+0x1bf8/0x1c20
>        [<ffffffff810d073d>] lock_acquire+0x9d/0x220
>        [<ffffffff8169a5b9>] __mutex_lock_common+0x59/0x500
>        [<ffffffff8169ab94>] mutex_lock_nested+0x44/0x50
>        [<ffffffff81298169>] hugetlbfs_file_mmap+0x89/0x140
>        [<ffffffff811826a9>] mmap_region+0x369/0x4f0
>        [<ffffffff81182b9f>] do_mmap_pgoff+0x36f/0x390
>        [<ffffffff81182db7>] sys_mmap_pgoff+0x1f7/0x230
>        [<ffffffff8101eda2>] sys_mmap+0x22/0x30
>        [<ffffffff816a5b69>] system_call_fastpath+0x16/0x1b
> 
> other info that might help us debug this:
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&mm->mmap_sem);
>                                lock(&sb->s_type->i_mutex_key#18);
>                                lock(&mm->mmap_sem);
>   lock(&sb->s_type->i_mutex_key#18);
> 
>  *** DEADLOCK ***
> 
> 1 lock held by trinity/30663:
>  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81182d97>] sys_mmap_pgoff+0x1d7/0x230
> 
> stack backtrace:
> Pid: 30663, comm: trinity Not tainted 3.3.0-rc3+ #2
> Call Trace:
>  [<ffffffff816924d7>] print_circular_bug+0x1fb/0x20c
>  [<ffffffff810d0008>] __lock_acquire+0x1bf8/0x1c20
>  [<ffffffff816a1c2d>] ? sub_preempt_count+0x9d/0xd0
>  [<ffffffff811a40cc>] ? deactivate_slab+0x54c/0x5f0
>  [<ffffffff810d073d>] lock_acquire+0x9d/0x220
>  [<ffffffff81298169>] ? hugetlbfs_file_mmap+0x89/0x140
>  [<ffffffff810d12fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
>  [<ffffffff8169a5b9>] __mutex_lock_common+0x59/0x500
>  [<ffffffff81298169>] ? hugetlbfs_file_mmap+0x89/0x140
>  [<ffffffff811825e5>] ? mmap_region+0x2a5/0x4f0
>  [<ffffffff81298169>] ? hugetlbfs_file_mmap+0x89/0x140
>  [<ffffffff8169ab94>] mutex_lock_nested+0x44/0x50
>  [<ffffffff81298169>] hugetlbfs_file_mmap+0x89/0x140
>  [<ffffffff811826a9>] mmap_region+0x369/0x4f0
>  [<ffffffff812c1e9a>] ? file_map_prot_check+0xaa/0xe0
>  [<ffffffff81182b9f>] do_mmap_pgoff+0x36f/0x390
>  [<ffffffff81182d97>] ? sys_mmap_pgoff+0x1d7/0x230
>  [<ffffffff81182db7>] sys_mmap_pgoff+0x1f7/0x230
>  [<ffffffff810d12fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
>  [<ffffffff8101eda2>] sys_mmap+0x22/0x30
>  [<ffffffff816a5b69>] system_call_fastpath+0x16/0x1b
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: hugetlbfs lockdep spew revisited.
  2012-02-17  0:16 ` Josh Boyer
@ 2012-02-17  0:34   ` Al Viro
  2012-02-17  0:38   ` Tyler Hicks
  1 sibling, 0 replies; 13+ messages in thread
From: Al Viro @ 2012-02-17  0:34 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Dave Jones, Linux Kernel, tyhicks

On Thu, Feb 16, 2012 at 07:16:34PM -0500, Josh Boyer wrote:
> On Thu, Feb 16, 2012 at 07:08:57PM -0500, Dave Jones wrote:
> > Remember this ? https://lkml.org/lkml/2011/4/15/272
> > Josh took a stab at fixing it in e096d0c7e2e4e5893792db865dd065ac73cf1f00,
> > but it seems to still be there.
> 
> I think Tyler Hicks actually noticed this a while ago, but his patch has
> been waiting on comment from Al and Christoph:
> 
> http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> 
> I've been hesitant to comment because I obviously screwed up once
> already.  We could try this patch in Fedora for a while if Al and
> company don't speak up soon.

That has nothing to do with the deadlock in question; it's *NOT* about
directories at all and no, it's not a false positive.

This is very simple: ->mmap() should never take ->i_mutex.  Directories
have nothing to do with that.  Simple grep for i_mutex in fs/hugetlbfs/*.c
will instantly show its use for non-directories, with pagefaults taken
while holding it.  Pagefault handlers take ->mmap_sem; so does ->mmap()
caller.  QED.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: hugetlbfs lockdep spew revisited.
  2012-02-17  0:16 ` Josh Boyer
  2012-02-17  0:34   ` Al Viro
@ 2012-02-17  0:38   ` Tyler Hicks
  2012-02-17  0:49     ` Al Viro
  1 sibling, 1 reply; 13+ messages in thread
From: Tyler Hicks @ 2012-02-17  0:38 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Dave Jones, Linux Kernel

[-- Attachment #1: Type: text/plain, Size: 4619 bytes --]

On 2012-02-16 19:16:34, Josh Boyer wrote:
> On Thu, Feb 16, 2012 at 07:08:57PM -0500, Dave Jones wrote:
> > Remember this ? https://lkml.org/lkml/2011/4/15/272
> > Josh took a stab at fixing it in e096d0c7e2e4e5893792db865dd065ac73cf1f00,
> > but it seems to still be there.
> 
> I think Tyler Hicks actually noticed this a while ago, but his patch has
> been waiting on comment from Al and Christoph:
> 
> http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> 
> I've been hesitant to comment because I obviously screwed up once
> already.  We could try this patch in Fedora for a while if Al and
> company don't speak up soon.

I'm pretty confident that my patch that Josh linked to would "fix" the
lockdep warning below. According to the backtrace, it is barking about a
directory inode and a regular inode having a circular locking
dependency, so deadlock is not possible in this case.

akpm picked up my patch in the mm tree, but it still hasn't made it into
mainline.

Tyler

> 
> josh
> 
> > 
> > 	Dave
> > 
> > 
> > ======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 3.3.0-rc3+ #2 Not tainted
> > -------------------------------------------------------
> > trinity/30663 is trying to acquire lock:
> >  (&sb->s_type->i_mutex_key#18){+.+...}, at: [<ffffffff81298169>] hugetlbfs_file_mmap+0x89/0x140
> > 
> > but task is already holding lock:
> >  (&mm->mmap_sem){++++++}, at: [<ffffffff81182d97>] sys_mmap_pgoff+0x1d7/0x230
> > 
> > which lock already depends on the new lock.
> > 
> > 
> > the existing dependency chain (in reverse order) is:
> > 
> > -> #1 (&mm->mmap_sem){++++++}:
> >        [<ffffffff810d073d>] lock_acquire+0x9d/0x220
> >        [<ffffffff811789c0>] might_fault+0x80/0xb0
> >        [<ffffffff811d2997>] filldir+0x77/0xe0
> >        [<ffffffff811e61ae>] dcache_readdir+0x5e/0x220
> >        [<ffffffff811d2c68>] vfs_readdir+0xb8/0xf0
> >        [<ffffffff811d2d99>] sys_getdents+0x89/0x100
> >        [<ffffffff816a5b69>] system_call_fastpath+0x16/0x1b
> > 
> > -> #0 (&sb->s_type->i_mutex_key#18){+.+...}:
> >        [<ffffffff810d0008>] __lock_acquire+0x1bf8/0x1c20
> >        [<ffffffff810d073d>] lock_acquire+0x9d/0x220
> >        [<ffffffff8169a5b9>] __mutex_lock_common+0x59/0x500
> >        [<ffffffff8169ab94>] mutex_lock_nested+0x44/0x50
> >        [<ffffffff81298169>] hugetlbfs_file_mmap+0x89/0x140
> >        [<ffffffff811826a9>] mmap_region+0x369/0x4f0
> >        [<ffffffff81182b9f>] do_mmap_pgoff+0x36f/0x390
> >        [<ffffffff81182db7>] sys_mmap_pgoff+0x1f7/0x230
> >        [<ffffffff8101eda2>] sys_mmap+0x22/0x30
> >        [<ffffffff816a5b69>] system_call_fastpath+0x16/0x1b
> > 
> > other info that might help us debug this:
> > 
> >  Possible unsafe locking scenario:
> > 
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&mm->mmap_sem);
> >                                lock(&sb->s_type->i_mutex_key#18);
> >                                lock(&mm->mmap_sem);
> >   lock(&sb->s_type->i_mutex_key#18);
> > 
> >  *** DEADLOCK ***
> > 
> > 1 lock held by trinity/30663:
> >  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81182d97>] sys_mmap_pgoff+0x1d7/0x230
> > 
> > stack backtrace:
> > Pid: 30663, comm: trinity Not tainted 3.3.0-rc3+ #2
> > Call Trace:
> >  [<ffffffff816924d7>] print_circular_bug+0x1fb/0x20c
> >  [<ffffffff810d0008>] __lock_acquire+0x1bf8/0x1c20
> >  [<ffffffff816a1c2d>] ? sub_preempt_count+0x9d/0xd0
> >  [<ffffffff811a40cc>] ? deactivate_slab+0x54c/0x5f0
> >  [<ffffffff810d073d>] lock_acquire+0x9d/0x220
> >  [<ffffffff81298169>] ? hugetlbfs_file_mmap+0x89/0x140
> >  [<ffffffff810d12fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
> >  [<ffffffff8169a5b9>] __mutex_lock_common+0x59/0x500
> >  [<ffffffff81298169>] ? hugetlbfs_file_mmap+0x89/0x140
> >  [<ffffffff811825e5>] ? mmap_region+0x2a5/0x4f0
> >  [<ffffffff81298169>] ? hugetlbfs_file_mmap+0x89/0x140
> >  [<ffffffff8169ab94>] mutex_lock_nested+0x44/0x50
> >  [<ffffffff81298169>] hugetlbfs_file_mmap+0x89/0x140
> >  [<ffffffff811826a9>] mmap_region+0x369/0x4f0
> >  [<ffffffff812c1e9a>] ? file_map_prot_check+0xaa/0xe0
> >  [<ffffffff81182b9f>] do_mmap_pgoff+0x36f/0x390
> >  [<ffffffff81182d97>] ? sys_mmap_pgoff+0x1d7/0x230
> >  [<ffffffff81182db7>] sys_mmap_pgoff+0x1f7/0x230
> >  [<ffffffff810d12fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
> >  [<ffffffff8101eda2>] sys_mmap+0x22/0x30
> >  [<ffffffff816a5b69>] system_call_fastpath+0x16/0x1b
> > 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: hugetlbfs lockdep spew revisited.
  2012-02-17  0:38   ` Tyler Hicks
@ 2012-02-17  0:49     ` Al Viro
  2012-02-17  3:42       ` Tyler Hicks
                         ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Al Viro @ 2012-02-17  0:49 UTC (permalink / raw)
  To: Tyler Hicks; +Cc: Josh Boyer, Dave Jones, Linux Kernel

On Thu, Feb 16, 2012 at 06:38:49PM -0600, Tyler Hicks wrote:
> On 2012-02-16 19:16:34, Josh Boyer wrote:
> > On Thu, Feb 16, 2012 at 07:08:57PM -0500, Dave Jones wrote:
> > > Remember this ? https://lkml.org/lkml/2011/4/15/272
> > > Josh took a stab at fixing it in e096d0c7e2e4e5893792db865dd065ac73cf1f00,
> > > but it seems to still be there.
> > 
> > I think Tyler Hicks actually noticed this a while ago, but his patch has
> > been waiting on comment from Al and Christoph:
> > 
> > http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> > 
> > I've been hesitant to comment because I obviously screwed up once
> > already.  We could try this patch in Fedora for a while if Al and
> > company don't speak up soon.
> 
> I'm pretty confident that my patch that Josh linked to would "fix" the
> lockdep warning below. According to the backtrace, it is barking about a
> directory inode and a regular inode having a circular locking
> dependency, so deadlock is not possible in this case.

Sigh...  That patch is correct, but it has nothing to do with the locking
order violation that really *is* there.  The only benefit would be to
get rid of the "deadlock is not possible" nonsense, since you would see
read/write vs. mmap instead of readdir vs. mmap in the traces.  Locking
order is the *same* for directories and nondirectories; both can have
pagefaults under ->i_mutex on their respective inodes.  And while mmap
cannot happen for directories, it certainly can happen for regular files,
so taking ->i_mutex in ->mmap() is a plain and simple bug.  Should never
be done; in particular, hugetlbfs has ->i_mutex held in read() around
pagefaults, which gives you an obvious deadlock with its ->mmap().

Folks, this is not a false positive and it has nothing to do with misannotation
for directories.  Deadlock is real; I have no idea WTF do we what ->i_mutex
held over that area in hugetlbfs ->mmap(), but doing that is really, really
wrong, whatever the reason.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: hugetlbfs lockdep spew revisited.
  2012-02-17  0:49     ` Al Viro
@ 2012-02-17  3:42       ` Tyler Hicks
  2012-02-21 18:21         ` Mimi Zohar
  2012-02-17  6:47       ` J. R. Okajima
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Tyler Hicks @ 2012-02-17  3:42 UTC (permalink / raw)
  To: Al Viro; +Cc: Josh Boyer, Dave Jones, Linux Kernel, Mimi Zohar

[-- Attachment #1: Type: text/plain, Size: 2288 bytes --]

On 2012-02-17 00:49:22, Al Viro wrote:
> On Thu, Feb 16, 2012 at 06:38:49PM -0600, Tyler Hicks wrote:
> > On 2012-02-16 19:16:34, Josh Boyer wrote:
> > > On Thu, Feb 16, 2012 at 07:08:57PM -0500, Dave Jones wrote:
> > > > Remember this ? https://lkml.org/lkml/2011/4/15/272
> > > > Josh took a stab at fixing it in e096d0c7e2e4e5893792db865dd065ac73cf1f00,
> > > > but it seems to still be there.
> > > 
> > > I think Tyler Hicks actually noticed this a while ago, but his patch has
> > > been waiting on comment from Al and Christoph:
> > > 
> > > http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> > > 
> > > I've been hesitant to comment because I obviously screwed up once
> > > already.  We could try this patch in Fedora for a while if Al and
> > > company don't speak up soon.
> > 
> > I'm pretty confident that my patch that Josh linked to would "fix" the
> > lockdep warning below. According to the backtrace, it is barking about a
> > directory inode and a regular inode having a circular locking
> > dependency, so deadlock is not possible in this case.
> 
> Sigh...  That patch is correct, but it has nothing to do with the locking
> order violation that really *is* there.  The only benefit would be to
> get rid of the "deadlock is not possible" nonsense, since you would see
> read/write vs. mmap instead of readdir vs. mmap in the traces.  Locking
> order is the *same* for directories and nondirectories; both can have
> pagefaults under ->i_mutex on their respective inodes.  And while mmap
> cannot happen for directories, it certainly can happen for regular files,
> so taking ->i_mutex in ->mmap() is a plain and simple bug.  Should never
> be done; in particular, hugetlbfs has ->i_mutex held in read() around
> pagefaults, which gives you an obvious deadlock with its ->mmap().
> 
> Folks, this is not a false positive and it has nothing to do with misannotation
> for directories.  Deadlock is real; I have no idea WTF do we what ->i_mutex
> held over that area in hugetlbfs ->mmap(), but doing that is really, really
> wrong, whatever the reason.

Thanks for clearing that up, Al. I only knew that the inodes were being
incorrectly annotated, but I wasn't sure about the correct locking order.

Tyler


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: hugetlbfs lockdep spew revisited.
  2012-02-17  3:42       ` Tyler Hicks
@ 2012-02-21 18:21         ` Mimi Zohar
  0 siblings, 0 replies; 13+ messages in thread
From: Mimi Zohar @ 2012-02-21 18:21 UTC (permalink / raw)
  To: Tyler Hicks; +Cc: Al Viro, Josh Boyer, Dave Jones, Linux Kernel

On Thu, 2012-02-16 at 21:42 -0600, Tyler Hicks wrote:
> On 2012-02-17 00:49:22, Al Viro wrote:
> > On Thu, Feb 16, 2012 at 06:38:49PM -0600, Tyler Hicks wrote:
> > > On 2012-02-16 19:16:34, Josh Boyer wrote:
> > > > On Thu, Feb 16, 2012 at 07:08:57PM -0500, Dave Jones wrote:
> > > > > Remember this ? https://lkml.org/lkml/2011/4/15/272
> > > > > Josh took a stab at fixing it in e096d0c7e2e4e5893792db865dd065ac73cf1f00,
> > > > > but it seems to still be there.
> > > > 
> > > > I think Tyler Hicks actually noticed this a while ago, but his patch has
> > > > been waiting on comment from Al and Christoph:
> > > > 
> > > > http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> > > > 
> > > > I've been hesitant to comment because I obviously screwed up once
> > > > already.  We could try this patch in Fedora for a while if Al and
> > > > company don't speak up soon.
> > > 
> > > I'm pretty confident that my patch that Josh linked to would "fix" the
> > > lockdep warning below. According to the backtrace, it is barking about a
> > > directory inode and a regular inode having a circular locking
> > > dependency, so deadlock is not possible in this case.
> > 
> > Sigh...  That patch is correct, but it has nothing to do with the locking
> > order violation that really *is* there.  The only benefit would be to
> > get rid of the "deadlock is not possible" nonsense, since you would see
> > read/write vs. mmap instead of readdir vs. mmap in the traces.  Locking
> > order is the *same* for directories and nondirectories; both can have
> > pagefaults under ->i_mutex on their respective inodes.  And while mmap
> > cannot happen for directories, it certainly can happen for regular files,
> > so taking ->i_mutex in ->mmap() is a plain and simple bug.  Should never
> > be done; in particular, hugetlbfs has ->i_mutex held in read() around
> > pagefaults, which gives you an obvious deadlock with its ->mmap().
> > 
> > Folks, this is not a false positive and it has nothing to do with misannotation
> > for directories.  Deadlock is real; I have no idea WTF do we what ->i_mutex
> > held over that area in hugetlbfs ->mmap(), but doing that is really, really
> > wrong, whatever the reason.
> 
> Thanks for clearing that up, Al. I only knew that the inodes were being
> incorrectly annotated, but I wasn't sure about the correct locking order.
> 
> Tyler

Al, thanks for the clarification.  An i_mutex/mmap_sem lockdep exists
for IMA as well.  https://lkml.org/lkml/2012/1/24/246 resolves the
lockdep by moving ima_file_mmap() before the mmap_sem is taken.  Do you
see any problems with this patch?

thanks,

Mimi


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: hugetlbfs lockdep spew revisited.
  2012-02-17  0:49     ` Al Viro
  2012-02-17  3:42       ` Tyler Hicks
@ 2012-02-17  6:47       ` J. R. Okajima
  2012-02-17 17:48       ` udf deadlock (was Re: hugetlbfs lockdep spew revisited.) Al Viro
  2012-02-18 10:55       ` hugetlbfs lockdep spew revisited Aneesh Kumar K.V
  3 siblings, 0 replies; 13+ messages in thread
From: J. R. Okajima @ 2012-02-17  6:47 UTC (permalink / raw)
  To: Al Viro; +Cc: Tyler Hicks, Josh Boyer, Dave Jones, Linux Kernel


Al Viro:
> Sigh...  That patch is correct, but it has nothing to do with the locking
> order violation that really *is* there.  The only benefit would be to
> get rid of the "deadlock is not possible" nonsense, since you would see
> read/write vs. mmap instead of readdir vs. mmap in the traces.  Locking
	:::

How do you think about this patch?

Re: [RFC 0/2] locking order of mm->mmap_sem and various FS
http://marc.info/?l=linux-kernel&m=132124846728745&w=2

Ah, I found mutex_destroy() call in hugetlbfs_destroy_inode() should be
removed.
If you think this approach is good, then I'd post a revised patch.


J. R. Okajima

^ permalink raw reply	[flat|nested] 13+ messages in thread

* udf deadlock (was Re: hugetlbfs lockdep spew revisited.)
  2012-02-17  0:49     ` Al Viro
  2012-02-17  3:42       ` Tyler Hicks
  2012-02-17  6:47       ` J. R. Okajima
@ 2012-02-17 17:48       ` Al Viro
  2012-02-20 16:01         ` Jan Kara
  2012-02-18 10:55       ` hugetlbfs lockdep spew revisited Aneesh Kumar K.V
  3 siblings, 1 reply; 13+ messages in thread
From: Al Viro @ 2012-02-17 17:48 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jan Kara, Josh Boyer, Dave Jones, Linux Kernel, Tyler Hicks

On Fri, Feb 17, 2012 at 12:49:22AM +0000, Al Viro wrote:
> Folks, this is not a false positive and it has nothing to do with misannotation
> for directories.  Deadlock is real; I have no idea WTF do we what ->i_mutex
> held over that area in hugetlbfs ->mmap(), but doing that is really, really
> wrong, whatever the reason.

Arrrrgh...  Some grepping around has uncovered another deadlock on
i_mutex/mmap_sem and this one is not hard to hit at all.

Thread A:
	opens file on UDF (O_RDWR open)
	does big, fat write() to it
Thread B:
	opens the same file (also O_RDWR)
	mmaps it
	closes
	does munmap()

and there we are - munmap() will end up closing the second struct file,
call udf_release_file() and we are hitting ->i_mutex while under
->mmap_sem.  Blocking on it, actually, since generic_file_aio_write()
in the first thread is holding ->i_mutex.  And as soon as thread A gets
around to faulting the next piece of data in, well...  To widen the
window a lot, mmap something large sitting on NFS and do write() from
that mmapped area.  Race window as wide as one could ask for...

What happens there is prealloc discard on close; do we even want ->i_mutex
there these days?  Note that there's also
	down_write(&UDF_I(inode)->i_data_sem);
in udf_release_file()...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: udf deadlock (was Re: hugetlbfs lockdep spew revisited.)
  2012-02-17 17:48       ` udf deadlock (was Re: hugetlbfs lockdep spew revisited.) Al Viro
@ 2012-02-20 16:01         ` Jan Kara
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2012-02-20 16:01 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Jan Kara, Josh Boyer, Dave Jones, Linux Kernel,
	Tyler Hicks

On Fri 17-02-12 17:48:18, Al Viro wrote:
> On Fri, Feb 17, 2012 at 12:49:22AM +0000, Al Viro wrote:
> > Folks, this is not a false positive and it has nothing to do with misannotation
> > for directories.  Deadlock is real; I have no idea WTF do we what ->i_mutex
> > held over that area in hugetlbfs ->mmap(), but doing that is really, really
> > wrong, whatever the reason.
> 
> Arrrrgh...  Some grepping around has uncovered another deadlock on
> i_mutex/mmap_sem and this one is not hard to hit at all.
> 
> Thread A:
> 	opens file on UDF (O_RDWR open)
> 	does big, fat write() to it
> Thread B:
> 	opens the same file (also O_RDWR)
> 	mmaps it
> 	closes
> 	does munmap()
> 
> and there we are - munmap() will end up closing the second struct file,
> call udf_release_file() and we are hitting ->i_mutex while under
> ->mmap_sem.  Blocking on it, actually, since generic_file_aio_write()
> in the first thread is holding ->i_mutex.  And as soon as thread A gets
> around to faulting the next piece of data in, well...  To widen the
> window a lot, mmap something large sitting on NFS and do write() from
> that mmapped area.  Race window as wide as one could ask for...
  Right, I didn't realize ->release() may be called with mmap_sem held.
Thanks for spotting this.  BTW: Documentation/filesystems/Locking might
need an update since it states:
locking rules:
        All may block except for ->setlease.
        No VFS locks held on entry except for ->setlease.

> What happens there is prealloc discard on close; do we even want ->i_mutex
> there these days?  Note that there's also
> 	down_write(&UDF_I(inode)->i_data_sem);
> in udf_release_file()...
  I've looked around and it seems we don't need i_mutex for anything.
i_data_sem should be enough. So I'll just remove i_mutex.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: hugetlbfs lockdep spew revisited.
  2012-02-17  0:49     ` Al Viro
                         ` (2 preceding siblings ...)
  2012-02-17 17:48       ` udf deadlock (was Re: hugetlbfs lockdep spew revisited.) Al Viro
@ 2012-02-18 10:55       ` Aneesh Kumar K.V
  3 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2012-02-18 10:55 UTC (permalink / raw)
  To: Al Viro, Tyler Hicks; +Cc: Josh Boyer, Dave Jones, Linux Kernel

On Fri, 17 Feb 2012 00:49:22 +0000, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> On Thu, Feb 16, 2012 at 06:38:49PM -0600, Tyler Hicks wrote:
> > On 2012-02-16 19:16:34, Josh Boyer wrote:
> > > On Thu, Feb 16, 2012 at 07:08:57PM -0500, Dave Jones wrote:
> > > > Remember this ? https://lkml.org/lkml/2011/4/15/272
> > > > Josh took a stab at fixing it in e096d0c7e2e4e5893792db865dd065ac73cf1f00,
> > > > but it seems to still be there.
> > > 
> > > I think Tyler Hicks actually noticed this a while ago, but his patch has
> > > been waiting on comment from Al and Christoph:
> > > 
> > > http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> > > 
> > > I've been hesitant to comment because I obviously screwed up once
> > > already.  We could try this patch in Fedora for a while if Al and
> > > company don't speak up soon.
> > 
> > I'm pretty confident that my patch that Josh linked to would "fix" the
> > lockdep warning below. According to the backtrace, it is barking about a
> > directory inode and a regular inode having a circular locking
> > dependency, so deadlock is not possible in this case.
> 
> Sigh...  That patch is correct, but it has nothing to do with the locking
> order violation that really *is* there.  The only benefit would be to
> get rid of the "deadlock is not possible" nonsense, since you would see
> read/write vs. mmap instead of readdir vs. mmap in the traces.  Locking
> order is the *same* for directories and nondirectories; both can have
> pagefaults under ->i_mutex on their respective inodes.  And while mmap
> cannot happen for directories, it certainly can happen for regular files,
> so taking ->i_mutex in ->mmap() is a plain and simple bug.  Should never
> be done; in particular, hugetlbfs has ->i_mutex held in read() around
> pagefaults, which gives you an obvious deadlock with its ->mmap().
> 
> Folks, this is not a false positive and it has nothing to do with misannotation
> for directories.  Deadlock is real; I have no idea WTF do we what ->i_mutex
> held over that area in hugetlbfs ->mmap(), but doing that is really, really
> wrong, whatever the reason.

I looked at hugetlbfs recently and noticed this. Another strange thing
with hugetlbfs is, it doesn't support write, instead allows to bump
the file size via mmap. I don't have a patch for inode->i_mutex issue
yet.

-aneesh


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: hugetlbfs lockdep spew revisited.
  2012-02-17  0:08 hugetlbfs lockdep spew revisited Dave Jones
  2012-02-17  0:16 ` Josh Boyer
@ 2012-02-17  0:27 ` Al Viro
  2012-02-23  9:27   ` Aneesh Kumar K.V
  1 sibling, 1 reply; 13+ messages in thread
From: Al Viro @ 2012-02-17  0:27 UTC (permalink / raw)
  To: Dave Jones, Josh Boyer, Linux Kernel

On Thu, Feb 16, 2012 at 07:08:57PM -0500, Dave Jones wrote:
> Remember this ? https://lkml.org/lkml/2011/4/15/272
> Josh took a stab at fixing it in e096d0c7e2e4e5893792db865dd065ac73cf1f00,
> but it seems to still be there.

> the existing dependency chain (in reverse order) is:

[snip]

... and as bloody usual, that mentioning of readdir in the output is a
red herring; the real problem (and yes, it *is* deadlock-prone) is not
with getdents(2) that cannot happen on anything that could be mmaped;
it's with hugetlbfs_read() (i.e. read(2)) that very definitely *can*.

This is *not* a misannotation and not a false positive; this is a real,
honest deadlock.
Thread A:
	read() on hugetlbfs
	hugetlbfs_read() called
	i_mutex grabbed
	hugetlbfs_read_actor() called
	__copy_to_user() called
	page fault is triggered
Thread B, sharing address space with A:
	mmap() the same file
	->mmap_sem is grabbed on task_B->mm->mmap_sem
	hugetlbfs_file_mmap() is called
	attempt to grab ->i_mutex and block waiting for A to give it up
Thread A:
	pagefault handled blocked on attempt to grab task_A->mm->mmap_sem,
which happens to be the same thing as task_B->mm->mmap_sem.  Block waiting
for B to give it up.

Deadlock.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: hugetlbfs lockdep spew revisited.
  2012-02-17  0:27 ` Al Viro
@ 2012-02-23  9:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2012-02-23  9:27 UTC (permalink / raw)
  To: Al Viro, Dave Jones, Josh Boyer, Linux Kernel, Andrew Morton

On Fri, 17 Feb 2012 00:27:26 +0000, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> On Thu, Feb 16, 2012 at 07:08:57PM -0500, Dave Jones wrote:
> > Remember this ? https://lkml.org/lkml/2011/4/15/272
> > Josh took a stab at fixing it in e096d0c7e2e4e5893792db865dd065ac73cf1f00,
> > but it seems to still be there.
> 
> > the existing dependency chain (in reverse order) is:
> 
> [snip]
> 
> ... and as bloody usual, that mentioning of readdir in the output is a
> red herring; the real problem (and yes, it *is* deadlock-prone) is not
> with getdents(2) that cannot happen on anything that could be mmaped;
> it's with hugetlbfs_read() (i.e. read(2)) that very definitely *can*.
> 
> This is *not* a misannotation and not a false positive; this is a real,
> honest deadlock.
> Thread A:
> 	read() on hugetlbfs
> 	hugetlbfs_read() called
> 	i_mutex grabbed
> 	hugetlbfs_read_actor() called
> 	__copy_to_user() called
> 	page fault is triggered
> Thread B, sharing address space with A:
> 	mmap() the same file
> 	->mmap_sem is grabbed on task_B->mm->mmap_sem
> 	hugetlbfs_file_mmap() is called
> 	attempt to grab ->i_mutex and block waiting for A to give it up
> Thread A:
> 	pagefault handled blocked on attempt to grab task_A->mm->mmap_sem,
> which happens to be the same thing as task_B->mm->mmap_sem.  Block waiting
> for B to give it up.
> 
> Deadlock.

How about the below patch ? If this is ok, I can send this as a separate
mail. I am still not sure about dropping inode->i_mutex in mmap as that
will mean we cannot update i_size in mmap and that will break userspace ?

commit 8fb2df40cabd99dc4f39f7fd26ba1d4db885cb5e
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Wed Feb 22 10:18:51 2012 +0530

    hugetlbfs: Add new rw_semaphore for truncate/read race
    
    Drop using inode->i_mutex from read, since that can result in deadlock with
    mmap. Ideally we can extend the patch to make sure we don't increase i_size
    in mmap. But that will break userspace, because application will have to now
    use truncate(2) to increase i_size in hugetlbfs.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 2680578..cd33685 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -238,8 +238,9 @@ static ssize_t hugetlbfs_read(struct file *filp, char __user *buf,
 	unsigned long end_index;
 	loff_t isize;
 	ssize_t retval = 0;
+	struct hugetlbfs_inode_info *hinfo = HUGETLBFS_I(inode);
 
-	mutex_lock(&inode->i_mutex);
+	down_read(&hinfo->truncate_sem);
 
 	/* validate length */
 	if (len == 0)
@@ -309,7 +310,7 @@ static ssize_t hugetlbfs_read(struct file *filp, char __user *buf,
 	}
 out:
 	*ppos = ((loff_t)index << huge_page_shift(h)) + offset;
-	mutex_unlock(&inode->i_mutex);
+	up_read(&hinfo->truncate_sem);
 	return retval;
 }
 
@@ -408,16 +409,19 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	pgoff_t pgoff;
 	struct address_space *mapping = inode->i_mapping;
 	struct hstate *h = hstate_inode(inode);
+	struct hugetlbfs_inode_info *hinfo = HUGETLBFS_I(inode);
 
 	BUG_ON(offset & ~huge_page_mask(h));
 	pgoff = offset >> PAGE_SHIFT;
 
+	down_write(&hinfo->truncate_sem);
 	i_size_write(inode, offset);
 	mutex_lock(&mapping->i_mmap_mutex);
 	if (!prio_tree_empty(&mapping->i_mmap))
 		hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
 	mutex_unlock(&mapping->i_mmap_mutex);
 	truncate_hugepages(inode, offset);
+	up_write(&hinfo->truncate_sem);
 	return 0;
 }
 
@@ -695,9 +699,10 @@ static const struct address_space_operations hugetlbfs_aops = {
 
 static void init_once(void *foo)
 {
-	struct hugetlbfs_inode_info *ei = (struct hugetlbfs_inode_info *)foo;
+	struct hugetlbfs_inode_info *hinfo = (struct hugetlbfs_inode_info *)foo;
 
-	inode_init_once(&ei->vfs_inode);
+	init_rwsem(&hinfo->truncate_sem);
+	inode_init_once(&hinfo->vfs_inode);
 }
 
 const struct file_operations hugetlbfs_file_operations = {
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 226f488..57fb788 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -155,6 +155,7 @@ struct hugetlbfs_sb_info {
 
 struct hugetlbfs_inode_info {
 	struct shared_policy policy;
+	struct rw_semaphore truncate_sem;
 	struct inode vfs_inode;
 };
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-02-23  9:28 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-02-17  0:08 hugetlbfs lockdep spew revisited Dave Jones
2012-02-17  0:16 ` Josh Boyer
2012-02-17  0:34   ` Al Viro
2012-02-17  0:38   ` Tyler Hicks
2012-02-17  0:49     ` Al Viro
2012-02-17  3:42       ` Tyler Hicks
2012-02-21 18:21         ` Mimi Zohar
2012-02-17  6:47       ` J. R. Okajima
2012-02-17 17:48       ` udf deadlock (was Re: hugetlbfs lockdep spew revisited.) Al Viro
2012-02-20 16:01         ` Jan Kara
2012-02-18 10:55       ` hugetlbfs lockdep spew revisited Aneesh Kumar K.V
2012-02-17  0:27 ` Al Viro
2012-02-23  9:27   ` Aneesh Kumar K.V

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).