* lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
@ 2010-01-15 12:02 Dave Chinner
2010-01-15 12:11 ` Peter Zijlstra
0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2010-01-15 12:02 UTC (permalink / raw)
To: linux-kernel; +Cc: peterz, mingo
Just got this on a 2.6.33-rc3 kernel during unmount:
[21819.329256] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
[21819.349943] kswapd0/407 [HC0[0]:SC0[0]:HE1:SE1] takes:
[21819.349943] (iprune_sem){+++++-}, at: [<ffffffff81132a92>] shrink_icache_memory+0x82/0x2b0
[21819.349943] {RECLAIM_FS-ON-W} state was registered at:
[21819.349943] [<ffffffff810824c3>] mark_held_locks+0x73/0x90
[21819.349943] [<ffffffff810825a5>] lockdep_trace_alloc+0xc5/0xd0
[21819.349943] [<ffffffff811145f1>] kmem_cache_alloc+0x41/0x150
[21819.349943] [<ffffffff813504f9>] kmem_zone_alloc+0x99/0xe0
[21819.349943] [<ffffffff8135055e>] kmem_zone_zalloc+0x1e/0x50
[21819.349943] [<ffffffff81346148>] _xfs_trans_alloc+0x38/0x80
[21819.349943] [<ffffffff8134634f>] xfs_trans_alloc+0x9f/0xb0
[21819.349943] [<ffffffff8134b3d0>] xfs_free_eofblocks+0x120/0x290
[21819.349943] [<ffffffff8134f353>] xfs_inactive+0x103/0x560
[21819.349943] [<ffffffff8135e6bf>] xfs_fs_clear_inode+0xdf/0x120
[21819.349943] [<ffffffff81132615>] clear_inode+0xb5/0x140
[21819.349943] [<ffffffff81132918>] dispose_list+0x38/0x130
[21819.349943] [<ffffffff81132de3>] invalidate_inodes+0x123/0x170
[21819.349943] [<ffffffff8111db4e>] generic_shutdown_super+0x4e/0x100
[21819.349943] [<ffffffff8111dc31>] kill_block_super+0x31/0x50
[21819.349943] [<ffffffff8111e455>] deactivate_super+0x85/0xa0
[21819.349943] [<ffffffff81136f8a>] mntput_no_expire+0xca/0x110
[21819.349943] [<ffffffff81137374>] sys_umount+0x64/0x370
[21819.349943] [<ffffffff81002fdb>] system_call_fastpath+0x16/0x1b
[21819.349943] irq event stamp: 4151539
[21819.349943] hardirqs last enabled at (4151539): [<ffffffff81706e04>] _raw_spin_unlock_irqrestore+0x44/0x70
[21819.349943] hardirqs last disabled at (4151538): [<ffffffff81706505>] _raw_spin_lock_irqsave+0x25/0x90
[21819.349943] softirqs last enabled at (4151312): [<ffffffff8105373b>] __do_softirq+0x18b/0x1e0
[21819.349943] softirqs last disabled at (4150645): [<ffffffff81003e8c>] call_softirq+0x1c/0x50
[21819.349943]
[21819.349943] other info that might help us debug this:
[21819.349943] 1 lock held by kswapd0/407:
[21819.349943] #0: (shrinker_rwsem){++++..}, at: [<ffffffff810e5c5d>] shrink_slab+0x3d/0x180
[21819.349943]
[21819.349943] stack backtrace:
[21819.349943] Pid: 407, comm: kswapd0 Not tainted 2.6.33-rc3-dgc #35
[21819.349943] Call Trace:
[21819.349943] [<ffffffff81081353>] print_usage_bug+0x183/0x190
[21819.349943] [<ffffffff81082372>] mark_lock+0x342/0x420
[21819.349943] [<ffffffff810814c0>] ? check_usage_forwards+0x0/0x100
[21819.349943] [<ffffffff81083531>] __lock_acquire+0x4d1/0x17a0
[21819.349943] [<ffffffff8108327f>] ? __lock_acquire+0x21f/0x17a0
[21819.349943] [<ffffffff810848ce>] lock_acquire+0xce/0x100
[21819.349943] [<ffffffff81132a92>] ? shrink_icache_memory+0x82/0x2b0
[21819.349943] [<ffffffff81705592>] down_read+0x52/0x90
[21819.349943] [<ffffffff81132a92>] ? shrink_icache_memory+0x82/0x2b0
[21819.349943] [<ffffffff81132a92>] shrink_icache_memory+0x82/0x2b0
[21819.349943] [<ffffffff810e5d4a>] shrink_slab+0x12a/0x180
[21819.349943] [<ffffffff810e66d6>] kswapd+0x586/0x990
[21819.349943] [<ffffffff810e35c0>] ? isolate_pages_global+0x0/0x240
[21819.349943] [<ffffffff8106c7c0>] ? autoremove_wake_function+0x0/0x40
[21819.349943] [<ffffffff810e6150>] ? kswapd+0x0/0x990
[21819.349943] [<ffffffff8106c276>] kthread+0x96/0xa0
[21819.349943] [<ffffffff81003d94>] kernel_thread_helper+0x4/0x10
[21819.349943] [<ffffffff8170717c>] ? restore_args+0x0/0x30
[21819.349943] [<ffffffff8106c1e0>] ? kthread+0x0/0xa0
[21819.349943] [<ffffffff81003d90>] ? kernel_thread_helper+0x0/0x10
I can't work out what the <mumble>RECLAIM_FS<mumble> notations are
supposed to mean from the code and they are not documented at
all, so I need someone to explain what this means before I can
determine if it is a valid warning or not....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
  2010-01-15 12:02 lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage Dave Chinner
@ 2010-01-15 12:11 ` Peter Zijlstra
  2010-01-15 12:44   ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2010-01-15 12:11 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-kernel, mingo

On Fri, 2010-01-15 at 23:02 +1100, Dave Chinner wrote:
> Just got this on a 2.6.33-rc3 kernel during unmount:
>
> [21819.329256] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
> [... lockdep trace quoted in full above, snipped ...]
>
> I can't work out what the <mumble>RECLAIM_FS<mumble> notations are
> supposed to mean from the code and they are not documented at
> all, so I need someone to explain what this means before I can
> determine if it is a valid warning or not....

The <mumble>RECLAIM_FS<mumble> bit means that lock (iprune_sem) was
taken from reclaim and is also taken over an allocation. It warns that
it might deadlock if that allocation ends up trying to reclaim memory.

^ permalink raw reply	[flat|nested] 7+ messages in thread
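Peter's one-line summary can be modelled in a few lines of C. This is a userspace toy, not kernel code, and the struct and function names are made up for illustration; it only shows the state-tracking idea behind the two lockdep usage bits named in the report.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace toy model of the two lockdep usage bits in this report
 * (not kernel code; names are illustrative):
 *   RECLAIM_FS-ON-W: the lock was held for write while a __GFP_FS
 *                    allocation was made, so that allocation could
 *                    enter fs reclaim under this lock;
 *   IN-RECLAIM_FS-R: the lock was taken for read from within fs
 *                    reclaim itself (e.g. a shrinker run by kswapd).
 * Both bits set on one lock class means reclaim can block on a holder
 * that may itself be waiting on reclaim -> possible deadlock. */

struct lock_class {
	bool reclaim_fs_on_w;	/* held over a __GFP_FS allocation */
	bool in_reclaim_fs_r;	/* read-acquired in reclaim context */
};

/* An allocation happened with the lock write-held. */
static void mark_held_over_fs_alloc(struct lock_class *c)
{
	c->reclaim_fs_on_w = true;
}

/* The lock was read-acquired from reclaim context. */
static void mark_taken_in_reclaim(struct lock_class *c)
{
	c->in_reclaim_fs_r = true;
}

/* Mirrors lockdep's "inconsistent usage" complaint. */
static bool usage_inconsistent(const struct lock_class *c)
{
	return c->reclaim_fs_on_w && c->in_reclaim_fs_r;
}
```

In this model the warning fires exactly when both events have been observed for the same class, which is what the two stack traces in the report establish for iprune_sem.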
* Re: lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
  2010-01-15 12:11 ` Peter Zijlstra
@ 2010-01-15 12:44   ` Dave Chinner
  2010-01-15 12:53     ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2010-01-15 12:44 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, mingo

On Fri, Jan 15, 2010 at 01:11:13PM +0100, Peter Zijlstra wrote:
> On Fri, 2010-01-15 at 23:02 +1100, Dave Chinner wrote:
> > Just got this on a 2.6.33-rc3 kernel during unmount:
> >
> > [21819.329256] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
> > [... lockdep trace snipped ...]
> >
> > I can't work out what the <mumble>RECLAIM_FS<mumble> notations are
> > supposed to mean from the code and they are not documented at
> > all, so I need someone to explain what this means before I can
> > determine if it is a valid warning or not....
>
> The <mumble>RECLAIM_FS<mumble> bit means that lock (iprune_sem) was
> taken from reclaim and is also taken over an allocation.

So there's an implicit, undocumented requirement that inode reclaim
during unmount requires a filesystem to do GFP_NOFS allocation?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
  2010-01-15 12:44 ` Dave Chinner
@ 2010-01-15 12:53   ` Peter Zijlstra
  2010-01-15 13:33     ` Dave Chinner
  2010-01-19 18:46     ` Christoph Hellwig
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Zijlstra @ 2010-01-15 12:53 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-kernel, mingo, Christoph Hellwig, Nick Piggin

On Fri, 2010-01-15 at 23:44 +1100, Dave Chinner wrote:
> > > I can't work out what the <mumble>RECLAIM_FS<mumble> notations are
> > > supposed to mean from the code and they are not documented at
> > > all, so I need someone to explain what this means before I can
> > > determine if it is a valid warning or not....
> >
> > The <mumble>RECLAIM_FS<mumble> bit means that lock (iprune_sem) was
> > taken from reclaim and is also taken over an allocation.
>
> So there's an implicit, undocumented requirement that inode reclaim
> during unmount requires a filesystem to do GFP_NOFS allocation?

Well, I don't know enough about xfs (or filesystems in general) to say
that with any certainty, but I can imagine inode writeback from the
sync that goes with umount causing issues.

If this inode reclaim is past all that and the filesystem is basically
RO, then I don't think so, and this could be considered a false
positive, in which case we need an annotation for this.

I added hch since he poked at similar reclaim recursions in XFS before,
and Nick since he thought up this annotation and knows more about
filesystems than I do.

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
  2010-01-15 12:53 ` Peter Zijlstra
@ 2010-01-15 13:33   ` Dave Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2010-01-15 13:33 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, mingo, Christoph Hellwig, Nick Piggin

On Fri, Jan 15, 2010 at 01:53:15PM +0100, Peter Zijlstra wrote:
> On Fri, 2010-01-15 at 23:44 +1100, Dave Chinner wrote:
> > So there's an implicit, undocumented requirement that inode reclaim
> > during unmount requires a filesystem to do GFP_NOFS allocation?
>
> Well, I don't know enough about xfs (or filesystems in general) to
> say that with any certainty, but I can imagine inode writeback from
> the sync that goes with umount causing issues.
>
> If this inode reclaim is past all that and the filesystem is basically
> RO, then I don't think so, and this could be considered a false
> positive, in which case we need an annotation for this.

The issue is that iprune_sem is held write-locked over dispose_list()
even though the inodes have been removed from the unused list. While
iprune_sem is held write-locked, we can't enter shrink_icache_memory()
because that takes iprune_sem in read mode. Hence any allocation in
the dispose_list() path has to be GFP_NOFS to avoid this.

XFS relies on the PF_MEMALLOC task flag to clear __GFP_FS from
allocations so that the same code paths work in both normal and
reclaim situations (like _xfs_trans_alloc), but the unmount path sets
no such flag. Setting this flag would avoid the problem, but is messy.

FWIW, I'm not sure why we need to hold iprune_sem after the inodes
have been detached from the unused list in the unmount path.
iprune_sem is there to protect against concurrent access by the
shrink_icache_memory() path, so once all the inodes are isolated it
seems iprune_sem is not needed anymore. Of course, this code is a maze
of twisty passages, so there's likely something I've missed that means
this is the only way it can work....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread
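The PF_MEMALLOC-based flag conversion Dave describes can be sketched in userspace C. The flag values and the function name here are made up for illustration; only the masking logic mirrors what such a wrapper does (the real kernel flags live in include/linux/gfp.h and include/linux/sched.h).

```c
#include <assert.h>

/* Userspace toy of the conversion Dave describes: XFS's allocation
 * wrappers check the task's PF_MEMALLOC flag (set while the task is
 * reclaiming memory) and drop __GFP_FS from the gfp mask, so one code
 * path is safe in both normal and reclaim contexts.  Flag values and
 * names here are made up for illustration, not the kernel's. */

#define MY_GFP_FS	0x1u	/* allocation may call into the fs */
#define MY_GFP_IO	0x2u	/* allocation may start I/O */
#define MY_PF_MEMALLOC	0x4u	/* task is performing memory reclaim */

static unsigned int flags_convert(unsigned int task_flags,
				  unsigned int gfp_mask)
{
	if (task_flags & MY_PF_MEMALLOC)
		gfp_mask &= ~MY_GFP_FS;	/* never re-enter the fs from reclaim */
	return gfp_mask;
}
```

Because the conversion keys off a per-task flag rather than the call site, the same allocation call is implicitly GFP_NOFS when run from reclaim, which is exactly why the unmount path, which sets no such flag, misses out.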
* Re: lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
  2010-01-15 12:53 ` Peter Zijlstra
@ 2010-01-19 18:46   ` Christoph Hellwig
  2010-01-20 10:57     ` Steven Whitehouse
  1 sibling, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2010-01-19 18:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Dave Chinner, linux-kernel, mingo, Christoph Hellwig, Nick Piggin,
	linux-fsdevel, viro, swhiteho

On Fri, Jan 15, 2010 at 01:53:15PM +0100, Peter Zijlstra wrote:
> Well, I don't know enough about xfs (or filesystems in general) to
> say that with any certainty, but I can imagine inode writeback from
> the sync that goes with umount causing issues.
>
> If this inode reclaim is past all that and the filesystem is basically
> RO, then I don't think so, and this could be considered a false
> positive, in which case we need an annotation for this.

The issue is a bit more complicated. In the unmount case
invalidate_inodes() is indeed called after the filesystem is
effectively read-only for user-originated operations. But there's a
myriad of other invalidate_inodes() calls:

 - fs/block_dev.c:__invalidate_device()

   This gets called from block device code for various kinds of
   invalidations. Doesn't make any sense at all to me, but hey..

 - fs/ext2/super.c:ext2_remount()

   Appears to be used to check for active inodes during remount.
   Very fishy usage, and could just be replaced with a list walk
   without any I/O.

 - fs/gfs2/glock.c:gfs2_gl_hash_clear()

   No idea.

 - fs/gfs2/ops_fstype.c:fill_super()

   Tries to kill all inodes in the fill_super error path, looks
   very fishy.

 - fs/ntfs/super.c:ntfs_fill_super()

   Failure case of fill_super again, does not look very useful.

 - fs/smbfs/inode.c:smb_invalidate_inodes()

   Used when a connection goes bad.

In short, we can't generally rely on this only happening on a dead fs.

But in the end we abuse iprune_sem to work around a refcounting
problem. As long as we keep a reference to the superblock for each
inode on the dispose list, the superblock can't go away and there's no
need for the lock at all.

^ permalink raw reply	[flat|nested] 7+ messages in thread
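Christoph's refcounting alternative can be sketched in a few lines of userspace C. All struct and function names here are invented for illustration; the sketch only shows the invariant he proposes: each inode on a private dispose list pins its superblock, so no global lock is needed to keep the sb alive.

```c
#include <assert.h>

/* Userspace sketch of the refcounting idea (all names made up):
 * rather than serializing dispose_list() against the shrinker with a
 * global iprune_sem, each inode moved onto a caller-private dispose
 * list takes a reference on its superblock, so the sb cannot be freed
 * while the list is walked, and the semaphore becomes unnecessary. */

struct toy_sb {
	int refcount;
};

struct toy_inode {
	struct toy_sb *sb;
};

static void sb_get(struct toy_sb *sb) { sb->refcount++; }
static void sb_put(struct toy_sb *sb) { sb->refcount--; }

/* Move an inode onto a caller-private dispose list: pin its sb. */
static void isolate_inode(struct toy_inode *inode)
{
	sb_get(inode->sb);
}

/* Evict the inode, then drop the pin. */
static void dispose_inode(struct toy_inode *inode)
{
	sb_put(inode->sb);
}
```

The superblock's refcount stays above its baseline for exactly as long as any isolated-but-undisposed inode references it, which is the lifetime guarantee iprune_sem currently provides by exclusion.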
* Re: lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
  2010-01-19 18:46 ` Christoph Hellwig
@ 2010-01-20 10:57   ` Steven Whitehouse
  0 siblings, 0 replies; 7+ messages in thread
From: Steven Whitehouse @ 2010-01-20 10:57 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Peter Zijlstra, Dave Chinner, linux-kernel, mingo, Nick Piggin,
	linux-fsdevel, viro

Hi,

On Tue, 2010-01-19 at 13:46 -0500, Christoph Hellwig wrote:
> - fs/gfs2/glock.c:gfs2_gl_hash_clear()
>
>   No idea.

It's rather complicated, and all down to using "special" inodes to
cache metadata, so that GFS2 has two VFS inodes per "real" inode: one
as per normal, and one just to cache metadata.

This causes a circular dependency between glocks and inodes, since we
have something like this (in my best ascii art):

  gfs2 inode -> iopen glock -> inode glock -> metadata inode

So at umount time, historically, we've had to invalidate inodes once,
then get rid of the inode glocks (which implies an iput() on the
metadata inode), and then invalidate inodes again to be rid of the
metadata inodes. This has been the source of many problems at umount
time.

In my -nmw git tree at the moment there are a couple of patches aimed
at fixing this issue. The solution is to embed a struct address_space
in each glock which caches metadata, rather than a complete inode.

> - fs/gfs2/ops_fstype.c:fill_super()
>
>   Tries to kill all inodes in the fill_super error path, looks
>   very fishy.

For the same reason as above. It should be possible to remove one or
even both of these calls now. The two patches in the -nmw tree do the
bare minimum really to make the change, but it should be possible to
do a bit more clean-up in that area now,

Steve.

^ permalink raw reply	[flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-01-20 10:54 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-01-15 12:02 lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage Dave Chinner
2010-01-15 12:11 ` Peter Zijlstra
2010-01-15 12:44 ` Dave Chinner
2010-01-15 12:53 ` Peter Zijlstra
2010-01-15 13:33 ` Dave Chinner
2010-01-19 18:46 ` Christoph Hellwig
2010-01-20 10:57 ` Steven Whitehouse
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox