* lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
@ 2010-01-15 12:02 Dave Chinner
2010-01-15 12:11 ` Peter Zijlstra
0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2010-01-15 12:02 UTC (permalink / raw)
To: linux-kernel; +Cc: peterz, mingo
Just got this on a 2.6.33-rc3 kernel during unmount:
[21819.329256] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
[21819.349943] kswapd0/407 [HC0[0]:SC0[0]:HE1:SE1] takes:
[21819.349943] (iprune_sem){+++++-}, at: [<ffffffff81132a92>] shrink_icache_memory+0x82/0x2b0
[21819.349943] {RECLAIM_FS-ON-W} state was registered at:
[21819.349943] [<ffffffff810824c3>] mark_held_locks+0x73/0x90
[21819.349943] [<ffffffff810825a5>] lockdep_trace_alloc+0xc5/0xd0
[21819.349943] [<ffffffff811145f1>] kmem_cache_alloc+0x41/0x150
[21819.349943] [<ffffffff813504f9>] kmem_zone_alloc+0x99/0xe0
[21819.349943] [<ffffffff8135055e>] kmem_zone_zalloc+0x1e/0x50
[21819.349943] [<ffffffff81346148>] _xfs_trans_alloc+0x38/0x80
[21819.349943] [<ffffffff8134634f>] xfs_trans_alloc+0x9f/0xb0
[21819.349943] [<ffffffff8134b3d0>] xfs_free_eofblocks+0x120/0x290
[21819.349943] [<ffffffff8134f353>] xfs_inactive+0x103/0x560
[21819.349943] [<ffffffff8135e6bf>] xfs_fs_clear_inode+0xdf/0x120
[21819.349943] [<ffffffff81132615>] clear_inode+0xb5/0x140
[21819.349943] [<ffffffff81132918>] dispose_list+0x38/0x130
[21819.349943] [<ffffffff81132de3>] invalidate_inodes+0x123/0x170
[21819.349943] [<ffffffff8111db4e>] generic_shutdown_super+0x4e/0x100
[21819.349943] [<ffffffff8111dc31>] kill_block_super+0x31/0x50
[21819.349943] [<ffffffff8111e455>] deactivate_super+0x85/0xa0
[21819.349943] [<ffffffff81136f8a>] mntput_no_expire+0xca/0x110
[21819.349943] [<ffffffff81137374>] sys_umount+0x64/0x370
[21819.349943] [<ffffffff81002fdb>] system_call_fastpath+0x16/0x1b
[21819.349943] irq event stamp: 4151539
[21819.349943] hardirqs last enabled at (4151539): [<ffffffff81706e04>] _raw_spin_unlock_irqrestore+0x44/0x70
[21819.349943] hardirqs last disabled at (4151538): [<ffffffff81706505>] _raw_spin_lock_irqsave+0x25/0x90
[21819.349943] softirqs last enabled at (4151312): [<ffffffff8105373b>] __do_softirq+0x18b/0x1e0
[21819.349943] softirqs last disabled at (4150645): [<ffffffff81003e8c>] call_softirq+0x1c/0x50
[21819.349943]
[21819.349943] other info that might help us debug this:
[21819.349943] 1 lock held by kswapd0/407:
[21819.349943] #0: (shrinker_rwsem){++++..}, at: [<ffffffff810e5c5d>] shrink_slab+0x3d/0x180
[21819.349943]
[21819.349943] stack backtrace:
[21819.349943] Pid: 407, comm: kswapd0 Not tainted 2.6.33-rc3-dgc #35
[21819.349943] Call Trace:
[21819.349943] [<ffffffff81081353>] print_usage_bug+0x183/0x190
[21819.349943] [<ffffffff81082372>] mark_lock+0x342/0x420
[21819.349943] [<ffffffff810814c0>] ? check_usage_forwards+0x0/0x100
[21819.349943] [<ffffffff81083531>] __lock_acquire+0x4d1/0x17a0
[21819.349943] [<ffffffff8108327f>] ? __lock_acquire+0x21f/0x17a0
[21819.349943] [<ffffffff810848ce>] lock_acquire+0xce/0x100
[21819.349943] [<ffffffff81132a92>] ? shrink_icache_memory+0x82/0x2b0
[21819.349943] [<ffffffff81705592>] down_read+0x52/0x90
[21819.349943] [<ffffffff81132a92>] ? shrink_icache_memory+0x82/0x2b0
[21819.349943] [<ffffffff81132a92>] shrink_icache_memory+0x82/0x2b0
[21819.349943] [<ffffffff810e5d4a>] shrink_slab+0x12a/0x180
[21819.349943] [<ffffffff810e66d6>] kswapd+0x586/0x990
[21819.349943] [<ffffffff810e35c0>] ? isolate_pages_global+0x0/0x240
[21819.349943] [<ffffffff8106c7c0>] ? autoremove_wake_function+0x0/0x40
[21819.349943] [<ffffffff810e6150>] ? kswapd+0x0/0x990
[21819.349943] [<ffffffff8106c276>] kthread+0x96/0xa0
[21819.349943] [<ffffffff81003d94>] kernel_thread_helper+0x4/0x10
[21819.349943] [<ffffffff8170717c>] ? restore_args+0x0/0x30
[21819.349943] [<ffffffff8106c1e0>] ? kthread+0x0/0xa0
[21819.349943] [<ffffffff81003d90>] ? kernel_thread_helper+0x0/0x10
I can't work out what the <mumble>RECLAIM_FS<mumble> notations are
supposed to mean from the code and they are not documented at
all, so I need someone to explain what this means before I can
determine if it is a valid warning or not....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
  2010-01-15 12:02 lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage Dave Chinner
@ 2010-01-15 12:11 ` Peter Zijlstra
  2010-01-15 12:44   ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2010-01-15 12:11 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-kernel, mingo

On Fri, 2010-01-15 at 23:02 +1100, Dave Chinner wrote:
> Just got this on a 2.6.33-rc3 kernel during unmount:
>
> [21819.329256] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
> [... lockdep trace quoted in full above, snipped ...]
>
> I can't work out what the <mumble>RECLAIM_FS<mumble> notations are
> supposed to mean from the code and they are not documented at
> all, so I need someone to explain what this means before I can
> determine if it is a valid warning or not....

The <mumble>RECLAIM_FS<mumble> bit means that lock (iprune_sem) was
taken from reclaim and is also taken over an allocation. It warns that
it might deadlock if that allocation ends up trying to reclaim memory.

^ permalink raw reply	[flat|nested] 7+ messages in thread
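Peter's one-line summary can be modelled in a few lines of C. This is a userspace toy, not kernel code, and the struct and function names are made up for illustration; it only shows the state-tracking idea behind the two lockdep usage bits named in the report.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace toy model of the two lockdep usage bits in this report
 * (not kernel code; names are illustrative):
 *   RECLAIM_FS-ON-W: the lock was held for write while a __GFP_FS
 *                    allocation was made, so that allocation could
 *                    enter fs reclaim under this lock;
 *   IN-RECLAIM_FS-R: the lock was taken for read from within fs
 *                    reclaim itself (e.g. a shrinker run by kswapd).
 * Both bits set on one lock class means reclaim can block on a holder
 * that may itself be waiting on reclaim -> possible deadlock. */

struct lock_class {
	bool reclaim_fs_on_w;	/* held over a __GFP_FS allocation */
	bool in_reclaim_fs_r;	/* read-acquired in reclaim context */
};

/* An allocation happened with the lock write-held. */
static void mark_held_over_fs_alloc(struct lock_class *c)
{
	c->reclaim_fs_on_w = true;
}

/* The lock was read-acquired from reclaim context. */
static void mark_taken_in_reclaim(struct lock_class *c)
{
	c->in_reclaim_fs_r = true;
}

/* Mirrors lockdep's "inconsistent usage" complaint. */
static bool usage_inconsistent(const struct lock_class *c)
{
	return c->reclaim_fs_on_w && c->in_reclaim_fs_r;
}
```

In this model the warning fires exactly when both events have been observed for the same class, which is what the two stack traces in the report establish for iprune_sem.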
* Re: lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
  2010-01-15 12:11 ` Peter Zijlstra
@ 2010-01-15 12:44   ` Dave Chinner
  2010-01-15 12:53     ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2010-01-15 12:44 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, mingo

On Fri, Jan 15, 2010 at 01:11:13PM +0100, Peter Zijlstra wrote:
> On Fri, 2010-01-15 at 23:02 +1100, Dave Chinner wrote:
> > Just got this on a 2.6.33-rc3 kernel during unmount:
> >
> > [21819.329256] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
> > [... lockdep trace snipped ...]
> >
> > I can't work out what the <mumble>RECLAIM_FS<mumble> notations are
> > supposed to mean from the code and they are not documented at
> > all, so I need someone to explain what this means before I can
> > determine if it is a valid warning or not....
>
> The <mumble>RECLAIM_FS<mumble> bit means that lock (iprune_sem) was
> taken from reclaim and is also taken over an allocation.

So there's an implicit, undocumented requirement that inode reclaim
during unmount requires a filesystem to do GFP_NOFS allocation?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
  2010-01-15 12:44 ` Dave Chinner
@ 2010-01-15 12:53   ` Peter Zijlstra
  2010-01-15 13:33     ` Dave Chinner
  2010-01-19 18:46     ` Christoph Hellwig
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Zijlstra @ 2010-01-15 12:53 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-kernel, mingo, Christoph Hellwig, Nick Piggin

On Fri, 2010-01-15 at 23:44 +1100, Dave Chinner wrote:
> > > I can't work out what the <mumble>RECLAIM_FS<mumble> notations are
> > > supposed to mean from the code and they are not documented at
> > > all, so I need someone to explain what this means before I can
> > > determine if it is a valid warning or not....
> >
> > The <mumble>RECLAIM_FS<mumble> bit means that lock (iprune_sem) was
> > taken from reclaim and is also taken over an allocation.
>
> So there's an implicit, undocumented requirement that inode reclaim
> during unmount requires a filesystem to do GFP_NOFS allocation?

Well, I don't know enough about xfs (or filesystems in general) to say
that with any certainty, but I can imagine inode writeback from the
sync that goes with umount causing issues.

If this inode reclaim is past all that and the filesystem is basically
RO, then I don't think so, and this could be considered a false
positive, in which case we need an annotation for this.

I added hch since he poked at similar reclaim recursions in XFS before,
and Nick since he thought up this annotation and knows more about
filesystems than I do.

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
  2010-01-15 12:53 ` Peter Zijlstra
@ 2010-01-15 13:33   ` Dave Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2010-01-15 13:33 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, mingo, Christoph Hellwig, Nick Piggin

On Fri, Jan 15, 2010 at 01:53:15PM +0100, Peter Zijlstra wrote:
> On Fri, 2010-01-15 at 23:44 +1100, Dave Chinner wrote:
> > So there's an implicit, undocumented requirement that inode reclaim
> > during unmount requires a filesystem to do GFP_NOFS allocation?
>
> Well, I don't know enough about xfs (or filesystems in general) to
> say that with any certainty, but I can imagine inode writeback from
> the sync that goes with umount causing issues.
>
> If this inode reclaim is past all that and the filesystem is basically
> RO, then I don't think so, and this could be considered a false
> positive, in which case we need an annotation for this.

The issue is that iprune_sem is held write-locked over dispose_list()
even though the inodes have been removed from the unused list. While
iprune_sem is held write-locked, we can't enter shrink_icache_memory()
because that takes iprune_sem in read mode. Hence any allocation in
the dispose_list() path has to be GFP_NOFS to avoid this.

XFS relies on the PF_MEMALLOC task flag to clear __GFP_FS from
allocations so that the same code paths work in both normal and
reclaim situations (like _xfs_trans_alloc), but the unmount path sets
no such flag. Setting this flag would avoid the problem, but is messy.

FWIW, I'm not sure why we need to hold iprune_sem after the inodes
have been detached from the unused list in the unmount path.
iprune_sem is there to protect against concurrent access by the
shrink_icache_memory() path, so once all the inodes are isolated it
seems iprune_sem is not needed anymore. Of course, this code is a maze
of twisty passages, so there's likely something I've missed that means
this is the only way it can work....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread
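The PF_MEMALLOC-based flag conversion Dave describes can be sketched in userspace C. The flag values and the function name here are made up for illustration; only the masking logic mirrors what such a wrapper does (the real kernel flags live in include/linux/gfp.h and include/linux/sched.h).

```c
#include <assert.h>

/* Userspace toy of the conversion Dave describes: XFS's allocation
 * wrappers check the task's PF_MEMALLOC flag (set while the task is
 * reclaiming memory) and drop __GFP_FS from the gfp mask, so one code
 * path is safe in both normal and reclaim contexts.  Flag values and
 * names here are made up for illustration, not the kernel's. */

#define MY_GFP_FS	0x1u	/* allocation may call into the fs */
#define MY_GFP_IO	0x2u	/* allocation may start I/O */
#define MY_PF_MEMALLOC	0x4u	/* task is performing memory reclaim */

static unsigned int flags_convert(unsigned int task_flags,
				  unsigned int gfp_mask)
{
	if (task_flags & MY_PF_MEMALLOC)
		gfp_mask &= ~MY_GFP_FS;	/* never re-enter the fs from reclaim */
	return gfp_mask;
}
```

Because the conversion keys off a per-task flag rather than the call site, the same allocation call is implicitly GFP_NOFS when run from reclaim, which is exactly why the unmount path, which sets no such flag, misses out.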
* Re: lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
  2010-01-15 12:53 ` Peter Zijlstra
@ 2010-01-19 18:46   ` Christoph Hellwig
  2010-01-20 10:57     ` Steven Whitehouse
  1 sibling, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2010-01-19 18:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Dave Chinner, linux-kernel, mingo, Christoph Hellwig, Nick Piggin,
	linux-fsdevel, viro, swhiteho

On Fri, Jan 15, 2010 at 01:53:15PM +0100, Peter Zijlstra wrote:
> Well, I don't know enough about xfs (or filesystems in general) to
> say that with any certainty, but I can imagine inode writeback from
> the sync that goes with umount causing issues.
>
> If this inode reclaim is past all that and the filesystem is basically
> RO, then I don't think so, and this could be considered a false
> positive, in which case we need an annotation for this.

The issue is a bit more complicated. In the unmount case
invalidate_inodes() is indeed called after the filesystem is
effectively read-only for user-originated operations. But there's a
myriad of other invalidate_inodes() calls:

 - fs/block_dev.c:__invalidate_device()

   This gets called from block device code for various kinds of
   invalidations. Doesn't make any sense at all to me, but hey..

 - fs/ext2/super.c:ext2_remount()

   Appears to be used to check for active inodes during remount.
   Very fishy usage, and could just be replaced with a list walk
   without any I/O.

 - fs/gfs2/glock.c:gfs2_gl_hash_clear()

   No idea.

 - fs/gfs2/ops_fstype.c:fill_super()

   Tries to kill all inodes in the fill_super error path, looks
   very fishy.

 - fs/ntfs/super.c:ntfs_fill_super()

   Failure case of fill_super again, does not look very useful.

 - fs/smbfs/inode.c:smb_invalidate_inodes()

   Used when a connection goes bad.

In short, we can't generally rely on this only happening on a dead fs.

But in the end we abuse iprune_sem to work around a refcounting
problem. As long as we keep a reference to the superblock for each
inode on the dispose list, the superblock can't go away and there's no
need for the lock at all.

^ permalink raw reply	[flat|nested] 7+ messages in thread
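Christoph's refcounting alternative can be sketched in a few lines of userspace C. All struct and function names here are invented for illustration; the sketch only shows the invariant he proposes: each inode on a private dispose list pins its superblock, so no global lock is needed to keep the sb alive.

```c
#include <assert.h>

/* Userspace sketch of the refcounting idea (all names made up):
 * rather than serializing dispose_list() against the shrinker with a
 * global iprune_sem, each inode moved onto a caller-private dispose
 * list takes a reference on its superblock, so the sb cannot be freed
 * while the list is walked, and the semaphore becomes unnecessary. */

struct toy_sb {
	int refcount;
};

struct toy_inode {
	struct toy_sb *sb;
};

static void sb_get(struct toy_sb *sb) { sb->refcount++; }
static void sb_put(struct toy_sb *sb) { sb->refcount--; }

/* Move an inode onto a caller-private dispose list: pin its sb. */
static void isolate_inode(struct toy_inode *inode)
{
	sb_get(inode->sb);
}

/* Evict the inode, then drop the pin. */
static void dispose_inode(struct toy_inode *inode)
{
	sb_put(inode->sb);
}
```

The superblock's refcount stays above its baseline for exactly as long as any isolated-but-undisposed inode references it, which is the lifetime guarantee iprune_sem currently provides by exclusion.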
* Re: lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
  2010-01-19 18:46 ` Christoph Hellwig
@ 2010-01-20 10:57   ` Steven Whitehouse
  0 siblings, 0 replies; 7+ messages in thread
From: Steven Whitehouse @ 2010-01-20 10:57 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Peter Zijlstra, Dave Chinner, linux-kernel, mingo, Nick Piggin,
	linux-fsdevel, viro

Hi,

On Tue, 2010-01-19 at 13:46 -0500, Christoph Hellwig wrote:
> - fs/gfs2/glock.c:gfs2_gl_hash_clear()
>
>   No idea.

It's rather complicated, and all down to using "special" inodes to
cache metadata, so that GFS2 has two VFS inodes per "real" inode: one
as per normal, and one just to cache metadata.

This causes a circular dependency between glocks and inodes, since we
have something like this (in my best ascii art):

  gfs2 inode -> iopen glock -> inode glock -> metadata inode

So at umount time, historically, we've had to invalidate inodes once,
then get rid of the inode glocks (which implies an iput() on the
metadata inode), and then invalidate inodes again to be rid of the
metadata inodes. This has been the source of many problems at umount
time.

In my -nmw git tree at the moment there are a couple of patches aimed
at fixing this issue. The solution is to embed a struct address_space
in each glock which caches metadata, rather than a complete inode.

> - fs/gfs2/ops_fstype.c:fill_super()
>
>   Tries to kill all inodes in the fill_super error path, looks
>   very fishy.

For the same reason as above. It should be possible to remove one or
even both of these calls now. The two patches in the -nmw tree do the
bare minimum really to make the change, but it should be possible to
do a bit more clean-up in that area now,

Steve.

^ permalink raw reply	[flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-01-20 10:54 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-01-15 12:02 lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage Dave Chinner
2010-01-15 12:11 ` Peter Zijlstra
2010-01-15 12:44 ` Dave Chinner
2010-01-15 12:53 ` Peter Zijlstra
2010-01-15 13:33 ` Dave Chinner
2010-01-19 18:46 ` Christoph Hellwig
2010-01-20 10:57 ` Steven Whitehouse
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox