* [PATCH] hugetlbfs: lockdep annotate root inode properly
@ 2012-03-08 9:15 Aneesh Kumar K.V
2012-03-08 21:02 ` Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-08 9:15 UTC (permalink / raw)
To: linux-mm, akpm, davej, jboyer, tyhicks; +Cc: linux-kernel, Aneesh Kumar K.V
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
This fixes the lockdep warning below:
======================================================
[ INFO: possible circular locking dependency detected ]
3.3.0-rc4+ #190 Not tainted
-------------------------------------------------------
shared/1568 is trying to acquire lock:
(&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff811efa0f>] hugetlbfs_file_mmap+0x7d/0x108
but task is already holding lock:
(&mm->mmap_sem){++++++}, at: [<ffffffff810f5589>] sys_mmap_pgoff+0xd4/0x12f
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&mm->mmap_sem){++++++}:
[<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
[<ffffffff810ee439>] might_fault+0x6d/0x90
[<ffffffff8111bc12>] filldir+0x6a/0xc2
[<ffffffff81129942>] dcache_readdir+0x5c/0x222
[<ffffffff8111be58>] vfs_readdir+0x76/0xac
[<ffffffff8111bf6a>] sys_getdents+0x79/0xc9
[<ffffffff816940a2>] system_call_fastpath+0x16/0x1b
-> #0 (&sb->s_type->i_mutex_key#12){+.+.+.}:
[<ffffffff8109f40a>] __lock_acquire+0xa6c/0xd60
[<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
[<ffffffff816916be>] __mutex_lock_common+0x48/0x350
[<ffffffff81691a85>] mutex_lock_nested+0x2a/0x31
[<ffffffff811efa0f>] hugetlbfs_file_mmap+0x7d/0x108
[<ffffffff810f4fd0>] mmap_region+0x26f/0x466
[<ffffffff810f545b>] do_mmap_pgoff+0x294/0x2ee
[<ffffffff810f55a9>] sys_mmap_pgoff+0xf4/0x12f
[<ffffffff8103d1f2>] sys_mmap+0x1d/0x1f
[<ffffffff816940a2>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(&sb->s_type->i_mutex_key#12);
                               lock(&mm->mmap_sem);
  lock(&sb->s_type->i_mutex_key#12);
*** DEADLOCK ***
1 lock held by shared/1568:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff810f5589>] sys_mmap_pgoff+0xd4/0x12f
stack backtrace:
Pid: 1568, comm: shared Not tainted 3.3.0-rc4+ #190
Call Trace:
[<ffffffff81688bf9>] print_circular_bug+0x1f8/0x209
[<ffffffff8109f40a>] __lock_acquire+0xa6c/0xd60
[<ffffffff8110e7b6>] ? files_lglock_local_lock_cpu+0x61/0x61
[<ffffffff811efa0f>] ? hugetlbfs_file_mmap+0x7d/0x108
[<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
[<ffffffff811efa0f>] ? hugetlbfs_file_mmap+0x7d/0x108
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/hugetlbfs/inode.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
NOTE: This patch also requires
http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
to remove the lockdep warning.
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3645cd3..ca4fa70 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -459,6 +459,7 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
inode->i_fop = &simple_dir_operations;
/* directory inodes start off with i_nlink == 2 (for "." entry) */
inc_nlink(inode);
+ lockdep_annotate_inode_mutex_key(inode);
}
return inode;
}
--
1.7.9
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 9:15 [PATCH] hugetlbfs: lockdep annotate root inode properly Aneesh Kumar K.V
@ 2012-03-08 21:02 ` Andrew Morton
2012-03-08 21:10 ` Dave Jones
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Andrew Morton @ 2012-03-08 21:02 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, davej, jboyer, tyhicks, linux-kernel, Al Viro,
Peter Zijlstra, Mimi Zohar
On Thu, 8 Mar 2012 14:45:16 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> This fixes the lockdep warning below:
OK, what's going on here.
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.3.0-rc4+ #190 Not tainted
> -------------------------------------------------------
> shared/1568 is trying to acquire lock:
> (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff811efa0f>] hugetlbfs_file_mmap+0x7d/0x108
>
> but task is already holding lock:
> (&mm->mmap_sem){++++++}, at: [<ffffffff810f5589>] sys_mmap_pgoff+0xd4/0x12f
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&mm->mmap_sem){++++++}:
> [<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
> [<ffffffff810ee439>] might_fault+0x6d/0x90
> [<ffffffff8111bc12>] filldir+0x6a/0xc2
> [<ffffffff81129942>] dcache_readdir+0x5c/0x222
> [<ffffffff8111be58>] vfs_readdir+0x76/0xac
> [<ffffffff8111bf6a>] sys_getdents+0x79/0xc9
> [<ffffffff816940a2>] system_call_fastpath+0x16/0x1b
>
> -> #0 (&sb->s_type->i_mutex_key#12){+.+.+.}:
> [<ffffffff8109f40a>] __lock_acquire+0xa6c/0xd60
> [<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
> [<ffffffff816916be>] __mutex_lock_common+0x48/0x350
> [<ffffffff81691a85>] mutex_lock_nested+0x2a/0x31
> [<ffffffff811efa0f>] hugetlbfs_file_mmap+0x7d/0x108
> [<ffffffff810f4fd0>] mmap_region+0x26f/0x466
> [<ffffffff810f545b>] do_mmap_pgoff+0x294/0x2ee
> [<ffffffff810f55a9>] sys_mmap_pgoff+0xf4/0x12f
> [<ffffffff8103d1f2>] sys_mmap+0x1d/0x1f
> [<ffffffff816940a2>] system_call_fastpath+0x16/0x1b
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&mm->mmap_sem);
>                                lock(&sb->s_type->i_mutex_key#12);
>                                lock(&mm->mmap_sem);
>   lock(&sb->s_type->i_mutex_key#12);
>
> *** DEADLOCK ***
>
> 1 lock held by shared/1568:
> #0: (&mm->mmap_sem){++++++}, at: [<ffffffff810f5589>] sys_mmap_pgoff+0xd4/0x12f
>
> stack backtrace:
> Pid: 1568, comm: shared Not tainted 3.3.0-rc4+ #190
> Call Trace:
> [<ffffffff81688bf9>] print_circular_bug+0x1f8/0x209
> [<ffffffff8109f40a>] __lock_acquire+0xa6c/0xd60
> [<ffffffff8110e7b6>] ? files_lglock_local_lock_cpu+0x61/0x61
> [<ffffffff811efa0f>] ? hugetlbfs_file_mmap+0x7d/0x108
> [<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
> [<ffffffff811efa0f>] ? hugetlbfs_file_mmap+0x7d/0x108
>
Why have these lockdep warnings started coming out now - was the VFS
changed to newly take i_mutex somewhere in the directory handling?
Sigh. Was lockdep_annotate_inode_mutex_key() sufficiently
self-explanatory to justify leaving it undocumented?
<goes off and reads e096d0c7e2e>
OK, the patch looks correct given the explanation in e096d0c7e2e, but
I'd like to understand why it becomes necessary only now.
> NOTE: This patch also requires
> http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> to remove the lockdep warning
And that patch has been basically ignored.
Sigh. I guess I'll grab both patches, but I'm not confident in doing
so without an overall explanation of what is happening here.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:02 ` Andrew Morton
@ 2012-03-08 21:10 ` Dave Jones
2012-03-08 21:19 ` Tyler Hicks
2012-03-08 21:44 ` Al Viro
2 siblings, 0 replies; 15+ messages in thread
From: Dave Jones @ 2012-03-08 21:10 UTC (permalink / raw)
To: Andrew Morton
Cc: Aneesh Kumar K.V, linux-mm, jboyer, tyhicks, linux-kernel,
Al Viro, Peter Zijlstra, Mimi Zohar
On Thu, Mar 08, 2012 at 01:02:56PM -0800, Andrew Morton wrote:
> > ======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 3.3.0-rc4+ #190 Not tainted
> > -------------------------------------------------------
> > shared/1568 is trying to acquire lock:
> > (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff811efa0f>] hugetlbfs_file_mmap+0x7d/0x108
> >
> > but task is already holding lock:
> > (&mm->mmap_sem){++++++}, at: [<ffffffff810f5589>] sys_mmap_pgoff+0xd4/0x12f
> >
> > which lock already depends on the new lock.
> >
>
> Why have these lockdep warnings started coming out now - was the VFS
> changed to newly take i_mutex somewhere in the directory handling?
This has been happening for almost a year!
https://lkml.org/lkml/2011/4/15/272
See also https://lkml.org/lkml/2012/2/16/498
Dave
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:02 ` Andrew Morton
2012-03-08 21:10 ` Dave Jones
@ 2012-03-08 21:19 ` Tyler Hicks
2012-03-08 21:40 ` Andrew Morton
2012-03-08 21:44 ` Al Viro
2 siblings, 1 reply; 15+ messages in thread
From: Tyler Hicks @ 2012-03-08 21:19 UTC (permalink / raw)
To: Andrew Morton
Cc: Aneesh Kumar K.V, linux-mm, davej, jboyer, linux-kernel, Al Viro,
Peter Zijlstra, Mimi Zohar
On 2012-03-08 13:02:56, Andrew Morton wrote:
> On Thu, 8 Mar 2012 14:45:16 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >
> > This fixes the lockdep warning below:
>
> OK, what's going on here.
>
> > ======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 3.3.0-rc4+ #190 Not tainted
> > -------------------------------------------------------
> > shared/1568 is trying to acquire lock:
> > (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff811efa0f>] hugetlbfs_file_mmap+0x7d/0x108
> >
> > but task is already holding lock:
> > (&mm->mmap_sem){++++++}, at: [<ffffffff810f5589>] sys_mmap_pgoff+0xd4/0x12f
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #1 (&mm->mmap_sem){++++++}:
> > [<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
> > [<ffffffff810ee439>] might_fault+0x6d/0x90
> > [<ffffffff8111bc12>] filldir+0x6a/0xc2
> > [<ffffffff81129942>] dcache_readdir+0x5c/0x222
> > [<ffffffff8111be58>] vfs_readdir+0x76/0xac
> > [<ffffffff8111bf6a>] sys_getdents+0x79/0xc9
> > [<ffffffff816940a2>] system_call_fastpath+0x16/0x1b
> >
> > -> #0 (&sb->s_type->i_mutex_key#12){+.+.+.}:
> > [<ffffffff8109f40a>] __lock_acquire+0xa6c/0xd60
> > [<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
> > [<ffffffff816916be>] __mutex_lock_common+0x48/0x350
> > [<ffffffff81691a85>] mutex_lock_nested+0x2a/0x31
> > [<ffffffff811efa0f>] hugetlbfs_file_mmap+0x7d/0x108
> > [<ffffffff810f4fd0>] mmap_region+0x26f/0x466
> > [<ffffffff810f545b>] do_mmap_pgoff+0x294/0x2ee
> > [<ffffffff810f55a9>] sys_mmap_pgoff+0xf4/0x12f
> > [<ffffffff8103d1f2>] sys_mmap+0x1d/0x1f
> > [<ffffffff816940a2>] system_call_fastpath+0x16/0x1b
> >
> > other info that might help us debug this:
> >
> > Possible unsafe locking scenario:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&mm->mmap_sem);
> >                                lock(&sb->s_type->i_mutex_key#12);
> >                                lock(&mm->mmap_sem);
> >   lock(&sb->s_type->i_mutex_key#12);
> >
> > *** DEADLOCK ***
> >
> > 1 lock held by shared/1568:
> > #0: (&mm->mmap_sem){++++++}, at: [<ffffffff810f5589>] sys_mmap_pgoff+0xd4/0x12f
> >
> > stack backtrace:
> > Pid: 1568, comm: shared Not tainted 3.3.0-rc4+ #190
> > Call Trace:
> > [<ffffffff81688bf9>] print_circular_bug+0x1f8/0x209
> > [<ffffffff8109f40a>] __lock_acquire+0xa6c/0xd60
> > [<ffffffff8110e7b6>] ? files_lglock_local_lock_cpu+0x61/0x61
> > [<ffffffff811efa0f>] ? hugetlbfs_file_mmap+0x7d/0x108
> > [<ffffffff8109fb8f>] lock_acquire+0xd5/0xfa
> > [<ffffffff811efa0f>] ? hugetlbfs_file_mmap+0x7d/0x108
> >
>
> Why have these lockdep warnings started coming out now - was the VFS
> changed to newly take i_mutex somewhere in the directory handling?
I'm not sure that they're new warnings. My patch (linked to below) may
have just given folks false hope that their nagging lockdep problems
are over.
>
>
> Sigh. Was lockdep_annotate_inode_mutex_key() sufficiently
> self-explanatory to justify leaving it undocumented?
>
> <goes off and reads e096d0c7e2e>
>
> OK, the patch looks correct given the explanation in e096d0c7e2e, but
> I'd like to understand why it becomes necessary only now.
>
> > NOTE: This patch also requires
> > http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> > to remove the lockdep warning
>
> And that patch has been basically ignored.
Al commented on it here:
https://lkml.org/lkml/2012/2/16/518
He said that while my patch is correct, taking i_mutex inside mmap_sem
is still wrong.
Tyler
>
> Sigh. I guess I'll grab both patches, but I'm not confident in doing
> so without an overall explanation of what is happening here.
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:19 ` Tyler Hicks
@ 2012-03-08 21:40 ` Andrew Morton
2012-03-08 21:49 ` Al Viro
2012-03-09 5:03 ` Aneesh Kumar K.V
0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2012-03-08 21:40 UTC (permalink / raw)
To: Tyler Hicks
Cc: Aneesh Kumar K.V, linux-mm, davej, jboyer, linux-kernel, Al Viro,
Peter Zijlstra, Mimi Zohar, David Gibson
On Thu, 8 Mar 2012 15:19:27 -0600
Tyler Hicks <tyhicks@canonical.com> wrote:
> >
> >
> > Sigh. Was lockdep_annotate_inode_mutex_key() sufficiently
> > self-explanatory to justify leaving it undocumented?
> >
> > <goes off and reads e096d0c7e2e>
> >
> > OK, the patch looks correct given the explanation in e096d0c7e2e, but
> > I'd like to understand why it becomes necessary only now.
> >
> > > > NOTE: This patch also requires
> > > http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> > > to remove the lockdep warning
> >
> > And that patch has been basically ignored.
>
> Al commented on it here:
>
> https://lkml.org/lkml/2012/2/16/518
>
> He said that while my patch is correct, taking i_mutex inside mmap_sem
> is still wrong.
OK, thanks, yup. Taking i_mutex in file_operations.mmap() is wrong.
Is hugetlbfs actually deadlockable because of this, or is it the case
that the i_mutex->mmap_sem ordering happens to never happen for this
filesystem? Although we shouldn't go and create incompatible lock
ranking rules for different filesystems!
So we need to pull the i_mutex out of hugetlbfs_file_mmap(). What's it
actually trying to do in there? If we switch to
i_size_read()/i_size_write() then AFAICT the problem comes down to
hugetlb_reserve_pages().
hugetlb_reserve_pages() fiddles with i_mapping->private_list and the fs
owns private_list and is free to use a lock other than i_mutex to
protect it. (In fact i_mapping.private_lock is the usual lock for
private_list).
So from a quick scan here I'm thinking that a decent fix is to remove
the i_mutex locking from hugetlbfs_file_mmap(), switch
hugetlbfs_file_mmap() to i_size_read/write then use a hugetlb-private
lock to protect i_mapping->private_list. region_chg() will do
GFP_KERNEL allocations under that lock, so some care is needed.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:40 ` Andrew Morton
@ 2012-03-08 21:49 ` Al Viro
2012-03-08 22:19 ` Andrew Morton
2012-03-09 5:03 ` Aneesh Kumar K.V
1 sibling, 1 reply; 15+ messages in thread
From: Al Viro @ 2012-03-08 21:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Tyler Hicks, Aneesh Kumar K.V, linux-mm, davej, jboyer,
linux-kernel, Peter Zijlstra, Mimi Zohar, David Gibson
On Thu, Mar 08, 2012 at 01:40:50PM -0800, Andrew Morton wrote:
> OK, thanks, yup. Taking i_mutex in file_operations.mmap() is wrong.
... or in .release() (munmap() does fput() under mmap_sem).
> Is hugetlbfs actually deadlockable because of this, or is it the case
> that the i_mutex->mmap_sem ordering happens to never happen for this
> filesystem?
Yes, it is. Look at read(2) on hugetlbfs; it copies userland data
while holding ->i_mutex. So we have
read(2):
	mutex_lock(&A)
	down_read(&B)
mmap(2):
	down_write(&B);
	mutex_lock(&A);
which is an obvious deadlock.
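The two orderings above can be checked mechanically. What follows is a minimal userspace sketch, not kernel code and not how lockdep is implemented: it records only direct "inner taken while outer held" pairs for two lock classes and flags the second path that contradicts the first. All names (`record_nesting`, `model_read`, `model_mmap`, `reset_orders`) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative lock classes, named after the two locks in this thread. */
enum lock_class { I_MUTEX, MMAP_SEM, NR_CLASSES };

/* order_seen[a][b] is true once some path has taken b while holding a. */
static bool order_seen[NR_CLASSES][NR_CLASSES];

/* Forget every ordering recorded so far. */
static void reset_orders(void)
{
    memset(order_seen, 0, sizeof(order_seen));
}

/*
 * Record "inner taken while outer is held". Returns true if the opposite
 * ordering was recorded earlier, i.e. the two paths form an ABBA pair.
 * Only direct pairs are tracked; longer chains are out of scope here.
 */
static bool record_nesting(enum lock_class outer, enum lock_class inner)
{
    assert(outer != inner);
    bool inversion = order_seen[inner][outer];
    order_seen[outer][inner] = true;
    return inversion;
}

/* Model of read(2) on hugetlbfs: user copy (may fault) under ->i_mutex. */
static bool model_read(void)
{
    return record_nesting(I_MUTEX, MMAP_SEM);
}

/* Model of mmap(2) on hugetlbfs: ->mmap() takes ->i_mutex under mmap_sem. */
static bool model_mmap(void)
{
    return record_nesting(MMAP_SEM, I_MUTEX);
}
```

Calling `model_read()` and then `model_mmap()` flags the inversion on the second call; swapping the calls flags it on `model_read()` instead. Either way the same pair of paths is at fault, only the path that gets blamed changes.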
> So we need to pull the i_mutex out of hugetlbfs_file_mmap().
IIRC, you have a patch in your tree doing just that...
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:49 ` Al Viro
@ 2012-03-08 22:19 ` Andrew Morton
2012-03-08 22:33 ` Dave Jones
2012-03-09 5:00 ` Aneesh Kumar K.V
0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2012-03-08 22:19 UTC (permalink / raw)
To: Al Viro
Cc: Tyler Hicks, Aneesh Kumar K.V, linux-mm, davej, jboyer,
linux-kernel, Peter Zijlstra, Mimi Zohar, David Gibson
On Thu, 8 Mar 2012 21:49:52 +0000
Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > So we need to pull the i_mutex out of hugetlbfs_file_mmap().
>
> IIRC, you have a patch in your tree doing just that...
Nope.
But it seems that you've recently seen such a patch - can you recall
where? Or was it the ecryptfs thing?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 22:19 ` Andrew Morton
@ 2012-03-08 22:33 ` Dave Jones
2012-03-08 22:45 ` Andrew Morton
2012-03-09 5:00 ` Aneesh Kumar K.V
1 sibling, 1 reply; 15+ messages in thread
From: Dave Jones @ 2012-03-08 22:33 UTC (permalink / raw)
To: Andrew Morton
Cc: Al Viro, Tyler Hicks, Aneesh Kumar K.V, linux-mm, jboyer,
linux-kernel, Peter Zijlstra, Mimi Zohar, David Gibson
On Thu, Mar 08, 2012 at 02:19:38PM -0800, Andrew Morton wrote:
> On Thu, 8 Mar 2012 21:49:52 +0000
> Al Viro <viro@ZenIV.linux.org.uk> wrote:
>
> > > So we need to pull the i_mutex out of hugetlbfs_file_mmap().
> >
> > IIRC, you have a patch in your tree doing just that...
>
> Nope.
>
> But it seems that you've recently seen such a patch - can you recall
> where?
this ? https://lkml.org/lkml/2012/2/23/64
Dave
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 22:33 ` Dave Jones
@ 2012-03-08 22:45 ` Andrew Morton
0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2012-03-08 22:45 UTC (permalink / raw)
To: Dave Jones
Cc: Al Viro, Tyler Hicks, Aneesh Kumar K.V, linux-mm, jboyer,
linux-kernel, Peter Zijlstra, Mimi Zohar, David Gibson
On Thu, 8 Mar 2012 17:33:34 -0500
Dave Jones <davej@redhat.com> wrote:
> On Thu, Mar 08, 2012 at 02:19:38PM -0800, Andrew Morton wrote:
> > On Thu, 8 Mar 2012 21:49:52 +0000
> > Al Viro <viro@ZenIV.linux.org.uk> wrote:
> >
> > > > So we need to pull the i_mutex out of hugetlbfs_file_mmap().
> > >
> > > IIRC, you have a patch in your tree doing just that...
> >
> > Nope.
> >
> > But it seems that you've recently seen such a patch - can you recall
> > where?
>
> this ? https://lkml.org/lkml/2012/2/23/64
>
Thanks, yes, probably that. Needs the i_size_read()/write() changes.
I worry a bit about the region handling code in mm/hugetlb.c.
* The region data structures are protected by a combination of the mmap_sem
* and the hugetlb_instantiation_mutex. To access or modify a region the caller
* must either hold the mmap_sem for write, or the mmap_sem for read and
* the hugetlb_instantiation_mutex:
I hope that's true - it would be nice to have some debug assertions in
the various region_foo() functions to verify that the required locks are
held.
But if that code is all nice and tight, I guess that removing that
i_mutex acquisition will be pretty simple.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 22:19 ` Andrew Morton
2012-03-08 22:33 ` Dave Jones
@ 2012-03-09 5:00 ` Aneesh Kumar K.V
1 sibling, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-09 5:00 UTC (permalink / raw)
To: Andrew Morton, Al Viro
Cc: Tyler Hicks, linux-mm, davej, jboyer, linux-kernel,
Peter Zijlstra, Mimi Zohar, David Gibson
On Thu, 8 Mar 2012 14:19:38 -0800, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 8 Mar 2012 21:49:52 +0000
> Al Viro <viro@ZenIV.linux.org.uk> wrote:
>
> > > So we need to pull the i_mutex out of hugetlbfs_file_mmap().
> >
> > IIRC, you have a patch in your tree doing just that...
>
> Nope.
>
> But it seems that you've recently seen such a patch - can you recall
> where? Or was it the ecryptfs thing?
>
So what we ended up doing was
http://article.gmane.org/gmane.linux.kernel.mm/74732
The patch updates hugetlbfs_read to not take i_mutex. That should make
sure the deadlock won't happen.
-aneesh
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:40 ` Andrew Morton
2012-03-08 21:49 ` Al Viro
@ 2012-03-09 5:03 ` Aneesh Kumar K.V
1 sibling, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-09 5:03 UTC (permalink / raw)
To: Andrew Morton, Tyler Hicks
Cc: linux-mm, davej, jboyer, linux-kernel, Al Viro, Peter Zijlstra,
Mimi Zohar, David Gibson
On Thu, 8 Mar 2012 13:40:50 -0800, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 8 Mar 2012 15:19:27 -0600
> Tyler Hicks <tyhicks@canonical.com> wrote:
>
> > >
> > >
> > > Sigh. Was lockdep_annotate_inode_mutex_key() sufficiently
> > > self-explanatory to justify leaving it undocumented?
> > >
> > > <goes off and reads e096d0c7e2e>
> > >
> > > OK, the patch looks correct given the explanation in e096d0c7e2e, but
> > > I'd like to understand why it becomes necessary only now.
> > >
> > > > > NOTE: This patch also requires
> > > > http://thread.gmane.org/gmane.linux.file-systems/58795/focus=59565
> > > > to remove the lockdep warning
> > >
> > > And that patch has been basically ignored.
> >
> > Al commented on it here:
> >
> > https://lkml.org/lkml/2012/2/16/518
> >
> > He said that while my patch is correct, taking i_mutex inside mmap_sem
> > is still wrong.
>
> OK, thanks, yup. Taking i_mutex in file_operations.mmap() is wrong.
>
> Is hugetlbfs actually deadlockable because of this, or is it the case
> that the i_mutex->mmap_sem ordering happens to never happen for this
> filesystem? Although we shouldn't go and create incompatible lock
> ranking rules for different filesystems!
>
> So we need to pull the i_mutex out of hugetlbfs_file_mmap(). What's it
> actually trying to do in there? If we switch to
> i_size_read()/i_size_write() then AFAICT the problem comes down to
> hugetlb_reserve_pages().
>
> hugetlb_reserve_pages() fiddles with i_mapping->private_list and the fs
> owns private_list and is free to use a lock other than i_mutex to
> protect it. (In fact i_mapping.private_lock is the usual lock for
> private_list).
>
>
>
> So from a quick scan here I'm thinking that a decent fix is to remove
> the i_mutex locking from hugetlbfs_file_mmap(), switch
> hugetlbfs_file_mmap() to i_size_read/write then use a hugetlb-private
> lock to protect i_mapping->private_list. region_chg() will do
> GFP_KERNEL allocations under that lock, so some care is needed.
>
But as per 7762f5a0b709b415fda132258ad37b9f2a1db994, i_size_write should
always happen with i_mutex held.
-aneesh
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:02 ` Andrew Morton
2012-03-08 21:10 ` Dave Jones
2012-03-08 21:19 ` Tyler Hicks
@ 2012-03-08 21:44 ` Al Viro
2012-03-08 22:44 ` Peter Zijlstra
2 siblings, 1 reply; 15+ messages in thread
From: Al Viro @ 2012-03-08 21:44 UTC (permalink / raw)
To: Andrew Morton
Cc: Aneesh Kumar K.V, linux-mm, davej, jboyer, tyhicks, linux-kernel,
Peter Zijlstra, Mimi Zohar
On Thu, Mar 08, 2012 at 01:02:56PM -0800, Andrew Morton wrote:
> > This fixes the lockdep warning below:
>
> OK, what's going on here.
Deadlock in hugetlbfs mmap getting misreported.
One last time: ->mmap_sem nests inside ->i_mutex. Both for regular
files and for directories. Always has.
For directories there's copy_to_user() from ->readdir() done under ->i_mutex.
For regular files there's copy_from_user() from ->write(), usually done under
->i_mutex. On hugetlbfs there's copy_to_user() from ->read() done under
->i_mutex.
It has not changed at all. Lockdep sees both call chains; the only question
is which chain is seen first. And usually reading a directory happens earlier
in the boot than writing into a file. That's all there is to it.
Unfortunately, the fact that the call chain being reported is obviously about
directories leads to false hopes that the deadlock doesn't exist - mmap()
obviously can't happen to a directory inode, so people hope that it's a
false positive. It isn't.
A patch separating directory and non-directory ->i_mutex into different classes
went in at some point, precisely due to those hopes. It had a braino that
made it useless. A fix for that braino has been posted and sits in my queue;
I'll push it to Linus along with other pending fixes tonight.
It will *not* eliminate the (very real) deadlock. It might make the warning
go away, but only if read() on hugetlbfs files doesn't happen during boot.
I suspect that the right thing would be to have a way to set explicit
nesting rules, not tied to a specific call trace. I haven't looked into
lockdep guts, so no idea how much that will hurt to implement. As in
lockdep_lock_nests(class_outer, class_inner, message), acting as if
there had been a call chain where class_outer had been taken before
class_inner, with message going in place of the call trace for that chain
when we run into a conflict...
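A rough userspace model of that idea, purely as a sketch: lockdep_lock_nests() is only the name proposed above, and nothing here reflects lockdep internals. The hypothetical `declare_nesting()` seeds an ordering before any path runs, and the hypothetical `check_acquire()` hands back the stored message when a later acquisition contradicts it.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative lock classes, named after the two locks in this thread. */
enum lock_class { I_MUTEX, MMAP_SEM, NR_CLASSES };

/*
 * rule[a][b] is non-NULL once "b nests inside a" is known. The string is
 * the explanation reported on a conflict, standing in for a call trace.
 */
static const char *rule[NR_CLASSES][NR_CLASSES];

/* Declare up front that inner nests inside outer, with a reason. */
static void declare_nesting(enum lock_class outer, enum lock_class inner,
                            const char *message)
{
    assert(outer != inner);
    rule[outer][inner] = message;
}

/*
 * Model of taking "wanted" while "held" is held. Returns NULL if that is
 * consistent with what is known so far, otherwise the message attached to
 * the ordering it contradicts.
 */
static const char *check_acquire(enum lock_class held, enum lock_class wanted)
{
    assert(held != wanted);
    if (rule[wanted][held])
        return rule[wanted][held];      /* contradicts a known ordering */
    if (!rule[held][wanted])
        rule[held][wanted] = "ordering observed at run time";
    return NULL;
}
```

With a rule declared at setup time, the conflict is reported against the declared rule and its message no matter which code path happens to run first, so the report no longer depends on boot-time ordering.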
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 21:44 ` Al Viro
@ 2012-03-08 22:44 ` Peter Zijlstra
2012-03-08 22:46 ` Peter Zijlstra
0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2012-03-08 22:44 UTC (permalink / raw)
To: Al Viro
Cc: Andrew Morton, Aneesh Kumar K.V, linux-mm, davej, jboyer, tyhicks,
linux-kernel, Mimi Zohar
On Thu, 2012-03-08 at 21:44 +0000, Al Viro wrote:
> I suspect that the right thing would be to have a way to set explicit
> nesting rules, not tied to a specific call trace.
See might_lock() / might_lock_read(), these are used to implement
might_fault(), which is used to annotate paths that could -- but rarely
do -- fault.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
2012-03-08 22:44 ` Peter Zijlstra
@ 2012-03-08 22:46 ` Peter Zijlstra
0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2012-03-08 22:46 UTC (permalink / raw)
To: Al Viro
Cc: Andrew Morton, Aneesh Kumar K.V, linux-mm, davej, jboyer, tyhicks,
linux-kernel, Mimi Zohar
On Thu, 2012-03-08 at 23:44 +0100, Peter Zijlstra wrote:
> On Thu, 2012-03-08 at 21:44 +0000, Al Viro wrote:
> > I suspect that the right thing would be to have a way to set explicit
> > nesting rules, not tied to a specific call trace.
>
> See might_lock() / might_lock_read(), these are used to implement
> might_fault(), which is used to annotate paths that could -- but rarely
> do -- fault.
This will of course result in a specific trace, but if you do it early
enough the trace points to your setup function, which can contain a
comment explaining things.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] hugetlbfs: lockdep annotate root inode properly
@ 2012-04-16 20:28 Aneesh Kumar K.V
0 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2012-04-16 20:28 UTC (permalink / raw)
To: akpm, linux-mm, davej, linux-kernel, viro, jwboyer; +Cc: Aneesh Kumar K.V
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
This fixes the false lockdep warning reported below. Commit
e096d0c7e2e4e5893792db865dd065ac73cf1f00 added a similar annotation for every
other inode in hugetlbfs but missed the root inode because it is allocated by
a separate function.
For HugeTLB fs we allow taking i_mutex in mmap. HugeTLB fs doesn't support file
write, and its file read callback was modified in
a05b0855fd15504972dba2358e5faa172a1e50ba to not take i_mutex. Hence, for regular
files on HugeTLB fs, no path takes mmap_sem with i_mutex held, so taking
i_mutex under mmap_sem in mmap cannot deadlock against it.
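To illustrate why a missed annotation on the root inode matters, here is a small userspace model. It assumes, based on the description earlier in this thread, that the annotation moves directory inodes into a lock class separate from regular files. `struct inode_model`, `init_inode_model()`, `annotate_mutex_key_model()` and `same_lock_class()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Two lock classes: the per-filesystem default and a directory-only one. */
enum mutex_key { KEY_DEFAULT, KEY_DIR };

struct inode_model {
    bool is_dir;
    enum mutex_key key;     /* lock class of this inode's i_mutex */
};

/* Every new inode starts in the filesystem's default class. */
static void init_inode_model(struct inode_model *inode, bool is_dir)
{
    inode->is_dir = is_dir;
    inode->key = KEY_DEFAULT;
}

/*
 * Model of the annotation step: directories move to their own class,
 * regular files are left alone.
 */
static void annotate_mutex_key_model(struct inode_model *inode)
{
    if (inode->is_dir)
        inode->key = KEY_DIR;
}

/* A lock-order checker cannot tell apart inodes that share a class. */
static bool same_lock_class(const struct inode_model *a,
                            const struct inode_model *b)
{
    return a->key == b->key;
}
```

Until the root directory is annotated it shares a class with regular files, so a readdir on the root and an mmap of a file look like the same lock to the checker. After annotation the two are tracked separately.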
======================================================
[ INFO: possible circular locking dependency detected ]
3.4.0-rc1+ #322 Not tainted
-------------------------------------------------------
bash/1572 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: [<ffffffff810f1618>] might_fault+0x40/0x90
but task is already holding lock:
(&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff81125f88>] vfs_readdir+0x56/0xa8
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&sb->s_type->i_mutex_key#12){+.+.+.}:
[<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
[<ffffffff816a2f5e>] __mutex_lock_common+0x48/0x350
[<ffffffff816a3325>] mutex_lock_nested+0x2a/0x31
[<ffffffff811fb8e1>] hugetlbfs_file_mmap+0x7d/0x104
[<ffffffff810f859a>] mmap_region+0x272/0x47d
[<ffffffff810f8a39>] do_mmap_pgoff+0x294/0x2ee
[<ffffffff810f8b65>] sys_mmap_pgoff+0xd2/0x10e
[<ffffffff8103d19e>] sys_mmap+0x1d/0x1f
[<ffffffff816a5922>] system_call_fastpath+0x16/0x1b
-> #0 (&mm->mmap_sem){++++++}:
[<ffffffff810a0256>] __lock_acquire+0xa81/0xd75
[<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
[<ffffffff810f1645>] might_fault+0x6d/0x90
[<ffffffff81125d62>] filldir+0x6a/0xc2
[<ffffffff81133a83>] dcache_readdir+0x5c/0x222
[<ffffffff81125fa8>] vfs_readdir+0x76/0xa8
[<ffffffff811260b6>] sys_getdents+0x79/0xc9
[<ffffffff816a5922>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&sb->s_type->i_mutex_key#12);
                               lock(&mm->mmap_sem);
                               lock(&sb->s_type->i_mutex_key#12);
  lock(&mm->mmap_sem);
*** DEADLOCK ***
1 lock held by bash/1572:
#0: (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff81125f88>] vfs_readdir+0x56/0xa8
stack backtrace:
Pid: 1572, comm: bash Not tainted 3.4.0-rc1+ #322
Call Trace:
[<ffffffff81699a3c>] print_circular_bug+0x1f8/0x209
[<ffffffff810a0256>] __lock_acquire+0xa81/0xd75
[<ffffffff810f38aa>] ? handle_pte_fault+0x5ff/0x614
[<ffffffff8109e622>] ? mark_lock+0x2d/0x258
[<ffffffff810f1618>] ? might_fault+0x40/0x90
[<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
[<ffffffff810f1618>] ? might_fault+0x40/0x90
[<ffffffff816a3249>] ? __mutex_lock_common+0x333/0x350
[<ffffffff810f1645>] might_fault+0x6d/0x90
[<ffffffff810f1618>] ? might_fault+0x40/0x90
[<ffffffff81125d62>] filldir+0x6a/0xc2
[<ffffffff81133a83>] dcache_readdir+0x5c/0x222
[<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
[<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
[<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
[<ffffffff81125fa8>] vfs_readdir+0x76/0xa8
[<ffffffff811260b6>] sys_getdents+0x79/0xc9
[<ffffffff816a5922>] system_call_fastpath+0x16/0x1b
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/hugetlbfs/inode.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 92f75aa..d8899e1 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -485,6 +485,7 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
 		inode->i_fop = &simple_dir_operations;
 		/* directory inodes start off with i_nlink == 2 (for "." entry) */
 		inc_nlink(inode);
+		lockdep_annotate_inode_mutex_key(inode);
 	}
 	return inode;
 }
--
1.7.10
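
The effect of the one-line change above can be sketched with a small userspace model. This is a toy under stated assumptions, not lockdep itself: lock classes are plain integers, the dependency graph is a boolean matrix, and every name here (`lock_graph`, `lock_graph_add`, `dir_inode_class`) is hypothetical. It shows why a directory inode that shares the regular-file i_mutex class closes a cycle with mmap_sem, while one moved to a separate directory class does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes: regular-file and directory i_mutex differ. */
enum { CLASS_MMAP_SEM, CLASS_I_MUTEX, CLASS_I_MUTEX_DIR, NR_LOCK_CLASSES };

/* edge[a][b] means "class a was held while class b was acquired". */
struct lock_graph {
	bool edge[NR_LOCK_CLASSES][NR_LOCK_CLASSES];
};

/* Depth-first search: can `to` be reached from `from` along recorded edges? */
static bool path_exists(const struct lock_graph *g, int from, int to,
			bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCK_CLASSES; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    path_exists(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "held -> wanted". Returns false, and records nothing, when the new
 * edge would close a cycle (the "possible circular locking dependency" case).
 */
static bool lock_graph_add(struct lock_graph *g, int held, int wanted)
{
	bool seen[NR_LOCK_CLASSES] = { false };

	assert(held >= 0 && held < NR_LOCK_CLASSES);
	assert(wanted >= 0 && wanted < NR_LOCK_CLASSES);

	if (path_exists(g, wanted, held, seen))
		return false;
	g->edge[held][wanted] = true;
	return true;
}

/* Class of a directory inode's i_mutex, with and without the annotation. */
static int dir_inode_class(bool annotated)
{
	return annotated ? CLASS_I_MUTEX_DIR : CLASS_I_MUTEX;
}
```

In this model the mmap path records `CLASS_MMAP_SEM -> CLASS_I_MUTEX`. A readdir on an unannotated root then tries to add `CLASS_I_MUTEX -> CLASS_MMAP_SEM` and is rejected, whereas with `dir_inode_class(true)` the readdir edge starts from `CLASS_I_MUTEX_DIR` and no cycle forms.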
end of thread, newest message: ~2012-04-16 20:29 UTC
Thread overview: 15+ messages
2012-03-08 9:15 [PATCH] hugetlbfs: lockdep annotate root inode properly Aneesh Kumar K.V
2012-03-08 21:02 ` Andrew Morton
2012-03-08 21:10 ` Dave Jones
2012-03-08 21:19 ` Tyler Hicks
2012-03-08 21:40 ` Andrew Morton
2012-03-08 21:49 ` Al Viro
2012-03-08 22:19 ` Andrew Morton
2012-03-08 22:33 ` Dave Jones
2012-03-08 22:45 ` Andrew Morton
2012-03-09 5:00 ` Aneesh Kumar K.V
2012-03-09 5:03 ` Aneesh Kumar K.V
2012-03-08 21:44 ` Al Viro
2012-03-08 22:44 ` Peter Zijlstra
2012-03-08 22:46 ` Peter Zijlstra
-- strict thread matches above, loose matches on Subject: below --
2012-04-16 20:28 Aneesh Kumar K.V