* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:02 ` Andrew Morton
@ 2012-03-08 21:10 ` Dave Jones
2012-03-08 21:19 ` Tyler Hicks
2012-03-08 21:44 ` Al Viro
2 siblings, 0 replies; 14+ messages in thread
From: Dave Jones @ 2012-03-08 21:10 UTC
To: Andrew Morton
Cc: Aneesh Kumar K.V, linux-mm, jboyer, tyhicks, linux-kernel,
Al Viro, Peter Zijlstra, Mimi Zohar
On Thu, Mar 08, 2012 at 01:02:56PM -0800, Andrew Morton wrote:
> > ======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 3.3.0-rc4+ #190 Not tainted
> > -------------------------------------------------------
> > shared/1568 is trying to acquire lock:
> > (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff811efa0f>] hugetlbfs_file_mmap+0x7d/0x108
> >
> > but task is already holding lock:
> > (&mm->mmap_sem){++++++}, at: [<ffffffff810f5589>] sys_mmap_pgoff+0xd4/0x12f
> >
> > which lock already depends on the new lock.
> >
>
> Why have these lockdep warnings started coming out now - was the VFS
> changed to newly take i_mutex somewhere in the directory handling?
This has been happening for almost a year!
https://lkml.org/lkml/2011/4/15/272
See also https://lkml.org/lkml/2012/2/16/498
Dave
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:02 ` Andrew Morton
2012-03-08 21:10 ` Dave Jones
@ 2012-03-08 21:19 ` Tyler Hicks
2012-03-08 21:40 ` Andrew Morton
2012-03-08 21:44 ` Al Viro
2 siblings, 1 reply; 14+ messages in thread
From: Tyler Hicks @ 2012-03-08 21:19 UTC
To: Andrew Morton
Cc: Aneesh Kumar K.V, linux-mm, davej, jboyer, linux-kernel, Al Viro,
Peter Zijlstra, Mimi Zohar
On 2012-03-08 13:02:56, Andrew Morton wrote:
> On Thu, 8 Mar 2012 14:45:16 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >
> > This fixes the lockdep warning below
>
> OK, what's going on here.
>
> > ======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 3.3.0-rc4+ #190 Not tainted
> > -------------------------------------------------------
> > shared/1568 is trying to acquire lock:
> > (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff811efa0f>] hugetlbfs_file_mmap+0x7d/0x108
> >
> > but task is already holding lock:
> > (&mm->mmap_sem){++++++}, at: [<ffffffff810f5589>] sys_mmap_pgoff+0xd4/0x12f
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #1 (&mm->mmap_sem){++++++}:
> > [<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
> > [<ffffffff810ee439>] might_fault+0x6d/0x90
> > [<ffffffff8111bc12>] filldir+0x6a/0xc2
> > [<ffffffff81129942>] dcache_readdir+0x5c/0x222
> > [<ffffffff8111be58>] vfs_readdir+0x76/0xac
> > [<ffffffff8111bf6a>] sys_getdents+0x79/0xc9
> > [<ffffffff816940a2>] system_call_fastpath+0x16/0x1b
> >
> > -> #0 (&sb->s_type->i_mutex_key#12){+.+.+.}:
> > [<ffffffff8109f40a>] __lock_acquire+0xa6c/0xd60
> > [<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
> > [<ffffffff816916be>] __mutex_lock_common+0x48/0x350
> > [<ffffffff81691a85>] mutex_lock_nested+0x2a/0x31
> > [<ffffffff811efa0f>] hugetlbfs_file_mmap+0x7d/0x108
> > [<ffffffff810f4fd0>] mmap_region+0x26f/0x466
> > [<ffffffff810f545b>] do_mmap_pgoff+0x294/0x2ee
> > [<ffffffff810f55a9>] sys_mmap_pgoff+0xf4/0x12f
> > [<ffffffff8103d1f2>] sys_mmap+0x1d/0x1f
> > [<ffffffff816940a2>] system_call_fastpath+0x16/0x1b
> >
> > other info that might help us debug this:
> >
> > Possible unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(&mm->mmap_sem);
> > lock(&sb->s_type->i_mutex_key#12);
> > lock(&mm->mmap_sem);
> > lock(&sb->s_type->i_mutex_key#12);
> >
> > *** DEADLOCK ***
> >
> > 1 lock held by shared/1568:
> > #0: (&mm->mmap_sem){++++++}, at: [<ffffffff810f5589>] sys_mmap_pgoff+0xd4/0x12f
> >
> > stack backtrace:
> > Pid: 1568, comm: shared Not tainted 3.3.0-rc4+ #190
> > Call Trace:
> > [<ffffffff81688bf9>] print_circular_bug+0x1f8/0x209
> > [<ffffffff8109f40a>] __lock_acquire+0xa6c/0xd60
> > [<ffffffff8110e7b6>] ? files_lglock_local_lock_cpu+0x61/0x61
> > [<ffffffff811efa0f>] ? hugetlbfs_file_mmap+0x7d/0x108
> > [<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
> > [<ffffffff811efa0f>] ? hugetlbfs_file_mmap+0x7d/0x108
> >
>
> Why have these lockdep warnings started coming out now - was the VFS
> changed to newly take i_mutex somewhere in the directory handling?
I'm not sure that they're new warnings. My patch (linked to below) may
have just given folks false hope that their nagging lockdep problems
were over.
>
>
> Sigh. Was lockdep_annotate_inode_mutex_key() sufficiently
> self-explanatory to justify leaving it undocumented?
>
> <goes off and reads e096d0c7e2e>
>
> OK, the patch looks correct given the explanation in e096d0c7e2e, but
> I'd like to understand why it becomes necessary only now.
>
> > NOTE: This patch also requires
> > http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> > to remove the lockdep warning
>
> And that patch has been basically ignored.
Al commented on it here:
https://lkml.org/lkml/2012/2/16/518
He said that while my patch is correct, taking i_mutex inside mmap_sem
is still wrong.
Tyler
>
> Sigh. I guess I'll grab both patches, but I'm not confident in doing
> so without an overall explanation of what is happening here.
>
>
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:19 ` Tyler Hicks
@ 2012-03-08 21:40 ` Andrew Morton
2012-03-08 21:49 ` Al Viro
2012-03-09 5:03 ` Aneesh Kumar K.V
0 siblings, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2012-03-08 21:40 UTC
To: Tyler Hicks
Cc: Aneesh Kumar K.V, linux-mm, davej, jboyer, linux-kernel, Al Viro,
Peter Zijlstra, Mimi Zohar, David Gibson
On Thu, 8 Mar 2012 15:19:27 -0600
Tyler Hicks <tyhicks@canonical.com> wrote:
> >
> >
> > Sigh. Was lockdep_annotate_inode_mutex_key() sufficiently
> > self-explanatory to justify leaving it undocumented?
> >
> > <goes off and reads e096d0c7e2e>
> >
> > OK, the patch looks correct given the explanation in e096d0c7e2e, but
> > I'd like to understand why it becomes necessary only now.
> >
> > > NOTE: This patch also requires
> > > http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> > > to remove the lockdep warning
> >
> > And that patch has been basically ignored.
>
> Al commented on it here:
>
> https://lkml.org/lkml/2012/2/16/518
>
> He said that while my patch is correct, taking i_mutex inside mmap_sem
> is still wrong.
OK, thanks, yup. Taking i_mutex in file_operations.mmap() is wrong.
Is hugetlbfs actually deadlockable because of this, or is it the case
that the i_mutex->mmap_sem ordering happens to never happen for this
filesystem? Although we shouldn't go and create incompatible lock
ranking rules for different filesystems!
So we need to pull the i_mutex out of hugetlbfs_file_mmap(). What's it
actually trying to do in there? If we switch to
i_size_read()/i_size_write() then AFAICT the problem comes down to
hugetlb_reserve_pages().
hugetlb_reserve_pages() fiddles with i_mapping->private_list and the fs
owns private_list and is free to use a lock other than i_mutex to
protect it. (In fact i_mapping.private_lock is the usual lock for
private_list).
So from a quick scan here I'm thinking that a decent fix is to remove
the i_mutex locking from hugetlbfs_file_mmap(), switch
hugetlbfs_file_mmap() to i_size_read/write then use a hugetlb-private
lock to protect i_mapping->private_list. region_chg() will do
GFP_KERNEL allocations under that lock, so some care is needed.
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:40 ` Andrew Morton
@ 2012-03-08 21:49 ` Al Viro
2012-03-08 22:19 ` Andrew Morton
2012-03-09 5:03 ` Aneesh Kumar K.V
1 sibling, 1 reply; 14+ messages in thread
From: Al Viro @ 2012-03-08 21:49 UTC
To: Andrew Morton
Cc: Tyler Hicks, Aneesh Kumar K.V, linux-mm, davej, jboyer,
linux-kernel, Peter Zijlstra, Mimi Zohar, David Gibson
On Thu, Mar 08, 2012 at 01:40:50PM -0800, Andrew Morton wrote:
> OK, thanks, yup. Taking i_mutex in file_operations.mmap() is wrong.
... or in .release() (munmap() does fput() under mmap_sem).
> Is hugetlbfs actually deadlockable because of this, or is it the case
> that the i_mutex->mmap_sem ordering happens to never happen for this
> filesystem?
Yes, it is. Look at read(2) on hugetlbfs; it copies userland data
while holding ->i_mutex. So we have
read(2):
        mutex_lock(&A);         /* A = ->i_mutex */
        down_read(&B);          /* B = ->mmap_sem, taken if the copy to userland faults */
mmap(2):
        down_write(&B);
        mutex_lock(&A);
which is an obvious deadlock.
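
The ABBA pattern above can be modelled in user space. What follows is a minimal sketch, not kernel code and not how lockdep is implemented: it assumes a tiny fixed set of lock classes and records "inner taken while outer held" edges in a matrix. The names lock_graph, record_nesting() and the I_MUTEX/MMAP_SEM enumerators are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes standing in for ->i_mutex and ->mmap_sem. */
enum lock_class { I_MUTEX, MMAP_SEM };

enum { MAX_CLASSES = 8 };

/* order[a][b] is true once some path took class b while holding class a. */
struct lock_graph {
    bool order[MAX_CLASSES][MAX_CLASSES];
};

static void graph_init(struct lock_graph *g)
{
    memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++) {
        if (g->order[from][next] && !seen[next] &&
            reachable(g, next, to, seen))
            return true;
    }
    return false;
}

/*
 * Record "inner was taken while outer was held".
 * Returns false, without recording anything, if the graph already says
 * that inner is (directly or transitively) taken before outer, i.e. the
 * new edge would close a cycle.
 */
static bool record_nesting(struct lock_graph *g, int outer, int inner)
{
    bool seen[MAX_CLASSES] = { false };

    if (reachable(g, inner, outer, seen))
        return false;
    g->order[outer][inner] = true;
    return true;
}
```

In this model the read(2) path calls record_nesting(&g, I_MUTEX, MMAP_SEM) and is accepted; the mmap(2) path then calls record_nesting(&g, MMAP_SEM, I_MUTEX) and is rejected. Which of the two paths gets blamed depends only on which one was recorded first.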
> So we need to pull the i_mutex out of hugetlbfs_file_mmap().
IIRC, you have a patch in your tree doing just that...
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:49 ` Al Viro
@ 2012-03-08 22:19 ` Andrew Morton
2012-03-08 22:33 ` Dave Jones
2012-03-09 5:00 ` Aneesh Kumar K.V
0 siblings, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2012-03-08 22:19 UTC
To: Al Viro
Cc: Tyler Hicks, Aneesh Kumar K.V, linux-mm, davej, jboyer,
linux-kernel, Peter Zijlstra, Mimi Zohar, David Gibson
On Thu, 8 Mar 2012 21:49:52 +0000
Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > So we need to pull the i_mutex out of hugetlbfs_file_mmap().
>
> IIRC, you have a patch in your tree doing just that...
Nope.
But it seems that you've recently seen such a patch - can you recall
where? Or was it the ecryptfs thing?
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 22:19 ` Andrew Morton
@ 2012-03-08 22:33 ` Dave Jones
2012-03-08 22:45 ` Andrew Morton
2012-03-09 5:00 ` Aneesh Kumar K.V
1 sibling, 1 reply; 14+ messages in thread
From: Dave Jones @ 2012-03-08 22:33 UTC
To: Andrew Morton
Cc: Al Viro, Tyler Hicks, Aneesh Kumar K.V, linux-mm, jboyer,
linux-kernel, Peter Zijlstra, Mimi Zohar, David Gibson
On Thu, Mar 08, 2012 at 02:19:38PM -0800, Andrew Morton wrote:
> On Thu, 8 Mar 2012 21:49:52 +0000
> Al Viro <viro@ZenIV.linux.org.uk> wrote:
>
> > > So we need to pull the i_mutex out of hugetlbfs_file_mmap().
> >
> > IIRC, you have a patch in your tree doing just that...
>
> Nope.
>
> But it seems that you've recently seen such a patch - can you recall
> where?
this ? https://lkml.org/lkml/2012/2/23/64
Dave
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 22:33 ` Dave Jones
@ 2012-03-08 22:45 ` Andrew Morton
0 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2012-03-08 22:45 UTC
To: Dave Jones
Cc: Al Viro, Tyler Hicks, Aneesh Kumar K.V, linux-mm, jboyer,
linux-kernel, Peter Zijlstra, Mimi Zohar, David Gibson
On Thu, 8 Mar 2012 17:33:34 -0500
Dave Jones <davej@redhat.com> wrote:
> On Thu, Mar 08, 2012 at 02:19:38PM -0800, Andrew Morton wrote:
> > On Thu, 8 Mar 2012 21:49:52 +0000
> > Al Viro <viro@ZenIV.linux.org.uk> wrote:
> >
> > > > So we need to pull the i_mutex out of hugetlbfs_file_mmap().
> > >
> > > IIRC, you have a patch in your tree doing just that...
> >
> > Nope.
> >
> > But it seems that you've recently seen such a patch - can you recall
> > where?
>
> this ? https://lkml.org/lkml/2012/2/23/64
>
Thanks, yes, probably that. Needs the i_size_read()/write() changes.
I worry a bit about the region handling code in mm/hugetlb.c.
 * The region data structures are protected by a combination of the mmap_sem
 * and the hugetlb_instantiation_mutex. To access or modify a region the caller
 * must either hold the mmap_sem for write, or the mmap_sem for read and
 * the hugetlb_instantiation_mutex:
I hope that's true - it would be nice to have some debug assertions in
the various region_foo() functions to verify that the required locks are
held.
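
As a rough illustration of what such an assertion would check, here is a user-space sketch of the rule quoted from the comment. It is only a model under stated assumptions: struct held_locks, region_locks_ok() and region_touch() are hypothetical names, and the lock state is passed in explicitly rather than queried from real locks. In the kernel the natural building block would be something like lockdep_assert_held(), but how to express the "either/or" condition there is left open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of how the caller holds mmap_sem. */
enum mmap_sem_state {
    MMAP_SEM_NOT_HELD,
    MMAP_SEM_HELD_READ,
    MMAP_SEM_HELD_WRITE
};

/* Hypothetical description of the locks held by the current caller. */
struct held_locks {
    enum mmap_sem_state mmap_sem;
    bool instantiation_mutex;   /* models hugetlb_instantiation_mutex */
};

/*
 * The rule from the mm/hugetlb.c comment: either mmap_sem is held for
 * write, or mmap_sem is held for read together with the instantiation
 * mutex.
 */
static bool region_locks_ok(const struct held_locks *h)
{
    if (h->mmap_sem == MMAP_SEM_HELD_WRITE)
        return true;
    return h->mmap_sem == MMAP_SEM_HELD_READ && h->instantiation_mutex;
}

/* Stand-in for a region_foo() function guarded by the debug assertion. */
static void region_touch(const struct held_locks *h)
{
    assert(region_locks_ok(h));
    /* ... the region list would be accessed or modified here ... */
}
```

Holding mmap_sem for read alone fails the check, which is exactly the case the comment rules out.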
But if that code is all nice and tight, I guess that removing that
i_mutex acquisition will be pretty simple.
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 22:19 ` Andrew Morton
2012-03-08 22:33 ` Dave Jones
@ 2012-03-09 5:00 ` Aneesh Kumar K.V
1 sibling, 0 replies; 14+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-09 5:00 UTC
To: Andrew Morton, Al Viro
Cc: Tyler Hicks, linux-mm, davej, jboyer, linux-kernel,
Peter Zijlstra, Mimi Zohar, David Gibson
On Thu, 8 Mar 2012 14:19:38 -0800, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 8 Mar 2012 21:49:52 +0000
> Al Viro <viro@ZenIV.linux.org.uk> wrote:
>
> > > So we need to pull the i_mutex out of hugetlbfs_file_mmap().
> >
> > IIRC, you have a patch in your tree doing just that...
>
> Nope.
>
> But it seems that you've recently seen such a patch - can you recall
> where? Or was it the ecryptfs thing?
>
So what we ended up doing was
http://article.gmane.org/gmane.linux.kernel.mm/74732
The patch updates hugetlbfs_read() to not take i_mutex. That should
ensure the deadlock can't happen.
-aneesh
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:40 ` Andrew Morton
2012-03-08 21:49 ` Al Viro
@ 2012-03-09 5:03 ` Aneesh Kumar K.V
1 sibling, 0 replies; 14+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-09 5:03 UTC
To: Andrew Morton, Tyler Hicks
Cc: linux-mm, davej, jboyer, linux-kernel, Al Viro, Peter Zijlstra,
Mimi Zohar, David Gibson
On Thu, 8 Mar 2012 13:40:50 -0800, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 8 Mar 2012 15:19:27 -0600
> Tyler Hicks <tyhicks@canonical.com> wrote:
>
> > >
> > >
> > > Sigh. Was lockdep_annotate_inode_mutex_key() sufficiently
> > > self-explanatory to justify leaving it undocumented?
> > >
> > > <goes off and reads e096d0c7e2e>
> > >
> > > OK, the patch looks correct given the explanation in e096d0c7e2e, but
> > > I'd like to understand why it becomes necessary only now.
> > >
> > > > NOTE: This patch also requires
> > > > http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> > > > to remove the lockdep warning
> > >
> > > And that patch has been basically ignored.
> >
> > Al commented on it here:
> >
> > https://lkml.org/lkml/2012/2/16/518
> >
> > He said that while my patch is correct, taking i_mutex inside mmap_sem
> > is still wrong.
>
> OK, thanks, yup. Taking i_mutex in file_operations.mmap() is wrong.
>
> Is hugetlbfs actually deadlockable because of this, or is it the case
> that the i_mutex->mmap_sem ordering happens to never happen for this
> filesystem? Although we shouldn't go and create incompatible lock
> ranking rules for different filesystems!
>
> So we need to pull the i_mutex out of hugetlbfs_file_mmap(). What's it
> actually trying to do in there? If we switch to
> i_size_read()/i_size_write() then AFAICT the problem comes down to
> hugetlb_reserve_pages().
>
> hugetlb_reserve_pages() fiddles with i_mapping->private_list and the fs
> owns private_list and is free to use a lock other than i_mutex to
> protect it. (In fact i_mapping.private_lock is the usual lock for
> private_list).
>
>
>
> So from a quick scan here I'm thinking that a decent fix is to remove
> the i_mutex locking from hugetlbfs_file_mmap(), switch
> hugetlbfs_file_mmap() to i_size_read/write then use a hugetlb-private
> lock to protect i_mapping->private_list. region_chg() will do
> GFP_KERNEL allocations under that lock, so some care is needed.
>
But as per 7762f5a0b709b415fda132258ad37b9f2a1db994, i_size_write() should
always happen with i_mutex held.
-aneesh
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:02 ` Andrew Morton
2012-03-08 21:10 ` Dave Jones
2012-03-08 21:19 ` Tyler Hicks
@ 2012-03-08 21:44 ` Al Viro
2012-03-08 22:44 ` Peter Zijlstra
2 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2012-03-08 21:44 UTC
To: Andrew Morton
Cc: Aneesh Kumar K.V, linux-mm, davej, jboyer, tyhicks, linux-kernel,
Peter Zijlstra, Mimi Zohar
On Thu, Mar 08, 2012 at 01:02:56PM -0800, Andrew Morton wrote:
> > This fixes the lockdep warning below
>
> OK, what's going on here.
Deadlock in hugetlbfs mmap getting misreported.
One last time: ->mmap_sem nests inside ->i_mutex. Both for regular
files and for directories. It always has.
For directories there's copy_to_user() from ->readdir() done under ->i_mutex.
For regular files there's copy_from_user() from ->write(), usually done under
->i_mutex. On hugetlbfs there's copy_to_user() from ->read() done under
->i_mutex.
It has not changed at all. Lockdep sees both call chains; the only question
is which chain is seen first. And usually reading a directory happens earlier
in the boot than writing into a file. That's all there is to it.
Unfortunately, the fact that the call chain being reported is obviously about
directories leads to false hopes that the deadlock doesn't exist - mmap()
obviously can't happen to a directory inode, so people hope that it's a
false positive. It isn't.
A patch separating directory and non-directory ->i_mutex into different classes
went in at some point, precisely due to those hopes. It had a braino that
made it useless. A fix for that braino has been posted and sits in my queue;
I'll push it to Linus along with other pending fixes tonight.
It will *not* eliminate the (very real) deadlock. It might make the warning
go away, but only if read() on hugetlbfs files doesn't happen during boot.
I suspect that the right thing would be to have a way to set explicit
nesting rules, not tied to a specific call trace. I haven't looked into
lockdep guts, so no idea how much that would hurt to implement. As in
lockdep_lock_nests(class_outer, class_inner, message), acting as if
there had been a call chain where class_outer had been taken before
class_inner, with message going in place of call trace for that chain
when we run into a conflict...
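
To make the proposal concrete, here is a user-space sketch of what "explicit nesting rules with a message" could look like. It is an assumption-laden model, not a lockdep patch: lockdep_lock_nests() does not exist, and declare_nesting() and check_acquire() below are hypothetical stand-ins that only handle directly declared pairs (no transitive chains).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lock classes standing in for ->i_mutex and ->mmap_sem. */
enum lock_class { I_MUTEX, MMAP_SEM, NR_CLASSES };

/* why[a][b] is non-NULL once "a is taken before b" is a declared rule. */
static const char *why[NR_CLASSES][NR_CLASSES];

/*
 * Declare up front that 'outer' is always taken before 'inner'.
 * 'message' takes the place of the call trace that would otherwise
 * document where the ordering came from.
 */
static void declare_nesting(enum lock_class outer, enum lock_class inner,
                            const char *message)
{
    why[outer][inner] = message;
}

/*
 * Check an acquisition of 'wanted' while 'held' is already held.
 * Returns NULL if no declared rule is violated, otherwise the message
 * of the rule saying that 'wanted' must be taken before 'held'.
 */
static const char *check_acquire(enum lock_class held, enum lock_class wanted)
{
    return why[wanted][held];
}
```

With a rule declared as declare_nesting(I_MUTEX, MMAP_SEM, "..."), taking mmap_sem under i_mutex passes, while taking i_mutex under mmap_sem returns the declared message no matter which real call chain happened to run first during boot.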
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:44 ` Al Viro
@ 2012-03-08 22:44 ` Peter Zijlstra
2012-03-08 22:46 ` Peter Zijlstra
0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2012-03-08 22:44 UTC
To: Al Viro
Cc: Andrew Morton, Aneesh Kumar K.V, linux-mm, davej, jboyer, tyhicks,
linux-kernel, Mimi Zohar
On Thu, 2012-03-08 at 21:44 +0000, Al Viro wrote:
> I suspect that the right thing would be to have a way to set explicit
> nesting rules, not tied to a specific call trace.
See might_lock() / might_lock_read(), these are used to implement
might_fault(), which is used to annotate paths that could -- but rarely
do -- fault.
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 22:44 ` Peter Zijlstra
@ 2012-03-08 22:46 ` Peter Zijlstra
0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2012-03-08 22:46 UTC
To: Al Viro
Cc: Andrew Morton, Aneesh Kumar K.V, linux-mm, davej, jboyer, tyhicks,
linux-kernel, Mimi Zohar
On Thu, 2012-03-08 at 23:44 +0100, Peter Zijlstra wrote:
> On Thu, 2012-03-08 at 21:44 +0000, Al Viro wrote:
> > I suspect that the right thing would be to have a way to set explicit
> > nesting rules, not tied to a specific call trace.
>
> See might_lock() / might_lock_read(), these are used to implement
> might_fault(), which is used to annotate paths that could -- but rarely
> do -- fault.
This will of course result in a specific trace, but if you do it early
enough the trace points to your setup function, which can contain a
comment explaining things.