public inbox for linux-xfs@vger.kernel.org
* XFS reclaim lock order bug
@ 2010-11-23 12:18 Nick Piggin
  2010-11-23 21:12 ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Piggin @ 2010-11-23 12:18 UTC (permalink / raw)
  To: xfs

Hi,

IIRC I've reported this before. Perhaps it is a false positive, but even
so it is still annoying that it triggers and turns off lockdep for
subsequent debugging.

Any chance it can get fixed or properly annotated?

Thanks,
Nick


[  286.895008] 
[  286.895010] =================================
[  286.895020] [ INFO: inconsistent lock state ]
[  286.895020] 2.6.37-rc3+ #28
[  286.895020] ---------------------------------
[  286.895020] inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
[  286.895020] rm/1844 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  286.895020]  (&(&ip->i_iolock)->mr_lock#2){++++-+}, at: [<ffffffffa0067e58>] xfs_ilock+0xe8/0x1e0 [xfs]
[  286.895020] {RECLAIM_FS-ON-R} state was registered at:
[  286.895020]   [<ffffffff8108380b>] mark_held_locks+0x6b/0xa0
[  286.895020]   [<ffffffff810838d1>] lockdep_trace_alloc+0x91/0xd0
[  286.895020]   [<ffffffff810d1851>] __alloc_pages_nodemask+0x91/0x780
[  286.895020]   [<ffffffff8110a043>] alloc_page_vma+0x93/0x150
[  286.895020]   [<ffffffff810ed909>] handle_mm_fault+0x719/0x9a0
[  286.895020]   [<ffffffff816068e3>] do_page_fault+0x133/0x4f0
[  286.895020]   [<ffffffff816039df>] page_fault+0x1f/0x30
[  286.895020]   [<ffffffff810cb52a>] generic_file_aio_read+0x2fa/0x730
[  286.895020]   [<ffffffffa009a29b>] xfs_file_aio_read+0x15b/0x390 [xfs]
[  286.895020]   [<ffffffff81117812>] do_sync_read+0xd2/0x110
[  286.895020]   [<ffffffff81117b55>] vfs_read+0xc5/0x190
[  286.895020]   [<ffffffff8111846c>] sys_read+0x4c/0x80
[  286.895020]   [<ffffffff8100312b>] system_call_fastpath+0x16/0x1b
[  286.895020] irq event stamp: 1095103
[  286.895020] hardirqs last  enabled at (1095103): [<ffffffff81603215>] _raw_spin_unlock_irqrestore+0x65/0x80
[  286.895020] hardirqs last disabled at (1095102): [<ffffffff81602be7>] _raw_spin_lock_irqsave+0x17/0x60
[  286.895020] softirqs last  enabled at (1093048): [<ffffffff81050c4e>] __do_softirq+0x16e/0x360
[  286.895020] softirqs last disabled at (1093009): [<ffffffff81003fcc>] call_softirq+0x1c/0x50
[  286.895020] 
[  286.895020] other info that might help us debug this:
[  286.895020] 3 locks held by rm/1844:
[  286.895020]  #0:  (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [<ffffffff81124cdc>] do_lookup+0xfc/0x170
[  286.895020]  #1:  (shrinker_rwsem){++++..}, at: [<ffffffff810d8f68>] shrink_slab+0x38/0x190
[  286.895020]  #2:  (&pag->pag_ici_reclaim_lock){+.+...}, at: [<ffffffffa00a2944>] xfs_reclaim_inodes_ag+0xa4/0x370 [xfs]
[  286.895020] 
[  286.895020] stack backtrace:
[  286.895020] Pid: 1844, comm: rm Not tainted 2.6.37-rc3+ #28
[  286.895020] Call Trace:
[  286.895020]  [<ffffffff810828f0>] print_usage_bug+0x170/0x180
[  286.895020]  [<ffffffff810835b1>] mark_lock+0x211/0x400
[  286.895020]  [<ffffffff810841ae>] __lock_acquire+0x40e/0x1490
[  286.895020]  [<ffffffff810852c5>] lock_acquire+0x95/0x1b0
[  286.895020]  [<ffffffffa0067e58>] ? xfs_ilock+0xe8/0x1e0 [xfs]
[  286.895020]  [<ffffffffa00a2774>] ? xfs_reclaim_inode+0x174/0x2a0 [xfs]
[  286.895020]  [<ffffffff81073c3a>] down_write_nested+0x4a/0x70
[  286.895020]  [<ffffffffa0067e58>] ? xfs_ilock+0xe8/0x1e0 [xfs]
[  286.895020]  [<ffffffffa0067e58>] xfs_ilock+0xe8/0x1e0 [xfs]
[  286.895020]  [<ffffffffa00a27c0>] xfs_reclaim_inode+0x1c0/0x2a0 [xfs]
[  286.895020]  [<ffffffffa00a2aaf>] xfs_reclaim_inodes_ag+0x20f/0x370 [xfs]
[  286.895020]  [<ffffffffa00a2c88>] xfs_reclaim_inode_shrink+0x78/0x80 [xfs]
[  286.895020]  [<ffffffff810d9057>] shrink_slab+0x127/0x190
[  286.895020]  [<ffffffff810dbf09>] zone_reclaim+0x349/0x420
[  286.895020]  [<ffffffff810cf815>] ? zone_watermark_ok+0x25/0xe0
[  286.895020]  [<ffffffff810d1603>] get_page_from_freelist+0x673/0x830
[  286.895020]  [<ffffffff8110c013>] ? init_object+0x43/0x80
[  286.895020]  [<ffffffffa00929ec>] ? kmem_zone_alloc+0x8c/0xd0 [xfs]
[  286.895020]  [<ffffffff8108380b>] ? mark_held_locks+0x6b/0xa0
[  286.895020]  [<ffffffff8108380b>] ? mark_held_locks+0x6b/0xa0
[  286.895020]  [<ffffffff810d18d0>] __alloc_pages_nodemask+0x110/0x780
[  286.895020]  [<ffffffff8110eb3a>] ? unfreeze_slab+0x11a/0x160
[  286.895020]  [<ffffffff811089d6>] alloc_pages_current+0x76/0xf0
[  286.895020]  [<ffffffff8110ce45>] new_slab+0x205/0x2b0
[  286.895020]  [<ffffffff8110ef7c>] __slab_alloc+0x30c/0x480
[  286.895020]  [<ffffffff8112f372>] ? d_alloc+0x22/0x200
[  286.895020]  [<ffffffff8112f372>] ? d_alloc+0x22/0x200
[  286.895020]  [<ffffffff8112f372>] ? d_alloc+0x22/0x200
[  286.895020]  [<ffffffff8110fe18>] kmem_cache_alloc+0xf8/0x1a0
[  286.895020]  [<ffffffff8112f1e0>] ? __d_lookup+0x1c0/0x1f0
[  286.895020]  [<ffffffff8112f020>] ? __d_lookup+0x0/0x1f0
[  286.895020]  [<ffffffff8112f372>] d_alloc+0x22/0x200
[  286.895020]  [<ffffffff811236fb>] d_alloc_and_lookup+0x2b/0x90
[  286.895020]  [<ffffffff8112f24c>] ? d_lookup+0x3c/0x60
[  286.895020]  [<ffffffff81124cfa>] do_lookup+0x11a/0x170
[  286.895020]  [<ffffffff81125a4a>] link_path_walk+0x31a/0xa50
[  286.895020]  [<ffffffff81126292>] path_walk+0x62/0xe0
[  286.895020]  [<ffffffff8112636b>] do_path_lookup+0x5b/0x60
[  286.895020]  [<ffffffff81126fe2>] user_path_at+0x52/0xa0
[  286.895020]  [<ffffffff8110e8d5>] ? kmem_cache_free+0xe5/0x190
[  286.895020]  [<ffffffff81083b4d>] ? trace_hardirqs_on+0xd/0x10
[  286.895020]  [<ffffffff81126760>] ? do_unlinkat+0x60/0x1d0
[  286.895020]  [<ffffffff8111d077>] vfs_fstatat+0x37/0x70
[  286.895020]  [<ffffffff8111d21f>] sys_newfstatat+0x1f/0x50
[  286.895020]  [<ffffffff81083afd>] ? trace_hardirqs_on_caller+0x13d/0x180
[  286.895020]  [<ffffffff816028c9>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  286.895020]  [<ffffffff8100312b>] system_call_fastpath+0x16/0x1b


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: XFS reclaim lock order bug
  2010-11-23 12:18 XFS reclaim lock order bug Nick Piggin
@ 2010-11-23 21:12 ` Dave Chinner
  2010-11-24  0:58   ` Nick Piggin
  2010-11-24 20:03   ` Christoph Hellwig
  0 siblings, 2 replies; 14+ messages in thread
From: Dave Chinner @ 2010-11-23 21:12 UTC (permalink / raw)
  To: Nick Piggin; +Cc: xfs

On Tue, Nov 23, 2010 at 11:18:02PM +1100, Nick Piggin wrote:
> Hi,
> 
> IIRC I've reported this before. Perhaps it is a false positive, but even
> so it is still annoying that it triggers and turns off lockdep for
> subsequent debugging.
> 
> Any chance it can get fixed or properly annotated?

It is supposed to be handled by the re-initialisation of the
ip->i_iolock in ->evict_inode (xfs_fs_evict_inode). An inode found
in the reclaim state must have passed through this reinitialisation,
so from a lockdep perspective the iolock in the vfs path is a
different context to the iolock in the reclaim path. That fixed all
the non-reclaim state related lockdep false positives, so perhaps
there is an issue with the lockdep reclaim state checking that does
not interact well with re-initialised lock contexts?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS reclaim lock order bug
  2010-11-23 21:12 ` Dave Chinner
@ 2010-11-24  0:58   ` Nick Piggin
  2010-11-24  2:26     ` Dave Chinner
  2010-11-24 20:03   ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Nick Piggin @ 2010-11-24  0:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Nick Piggin, xfs

On Wed, Nov 24, 2010 at 08:12:58AM +1100, Dave Chinner wrote:
> On Tue, Nov 23, 2010 at 11:18:02PM +1100, Nick Piggin wrote:
> > Hi,
> > 
> > IIRC I've reported this before. Perhaps it is a false positive, but even
> > so it is still annoying that it triggers and turns off lockdep for
> > subsequent debugging.
> > 
> > Any chance it can get fixed or properly annotated?
> 
> It is supposed to be handled by the re-initialisation of the
> ip->i_iolock in ->evict_inode (xfs_fs_evict_inode). An inode found
> in the reclaim state must have passed through this reinitialisation,
> so from a lockdep perspective the iolock in the vfs path is a
> different context to the iolock in the reclaim path. That fixed all
> the non-reclaim state related lockdep false positives, so perhaps
> there is an issue with the lockdep reclaim state checking that does
> not interact well with re-initialised lock contexts?

Hmm. I suppose that should work.

So xfs_reclaim_inode() can only call xfs_ilock() _after_ the Linux inode
has gone through its ->evict_inode call? If so, then let's ask the lockdep
people.

Thanks,
Nick


* Re: XFS reclaim lock order bug
  2010-11-24  0:58   ` Nick Piggin
@ 2010-11-24  2:26     ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2010-11-24  2:26 UTC (permalink / raw)
  To: Nick Piggin; +Cc: xfs

On Wed, Nov 24, 2010 at 11:58:11AM +1100, Nick Piggin wrote:
> On Wed, Nov 24, 2010 at 08:12:58AM +1100, Dave Chinner wrote:
> > On Tue, Nov 23, 2010 at 11:18:02PM +1100, Nick Piggin wrote:
> > > Hi,
> > > 
> > > IIRC I've reported this before. Perhaps it is a false positive, but even
> > > so it is still annoying that it triggers and turns off lockdep for
> > > subsequent debugging.
> > > 
> > > Any chance it can get fixed or properly annotated?
> > 
> > It is supposed to be handled by the re-initialisation of the
> > ip->i_iolock in ->evict_inode (xfs_fs_evict_inode). An inode found
> > in the reclaim state must have passed through this reinitialisation,
> > so from a lockdep perspective the iolock in the vfs path is a
> > different context to the iolock in the reclaim path. That fixed all
> > the non-reclaim state related lockdep false positives, so perhaps
> > there is an issue with the lockdep reclaim state checking that does
> > not interact well with re-initialised lock contexts?
> 
> Hmm. I suppose that should work.
> 
> So xfs_reclaim_inode() can only call xfs_ilock() _after_ the Linux inode
> has gone through its ->evict_inode call?

I think so - it's only found by xfs_reclaim_inode() if the radix
tree reclaim tag is set for the inode, which is only set in two
places: ->destroy_inode(), and in a failed lookup (which doesn't
touch the iolock). I can't see any other way we can get to an inode
in reclaim...

> If so, then let's ask the lockdep people.

Sure.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS reclaim lock order bug
  2010-11-23 21:12 ` Dave Chinner
  2010-11-24  0:58   ` Nick Piggin
@ 2010-11-24 20:03   ` Christoph Hellwig
  2010-11-25  3:48     ` Nick Piggin
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2010-11-24 20:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Nick Piggin, xfs

On Wed, Nov 24, 2010 at 08:12:58AM +1100, Dave Chinner wrote:
> It is supposed to be handled by the re-initialisation of the
> ip->i_iolock in ->evict_inode (xfs_fs_evict_inode). An inode found
> in the reclaim state must have passed through this reinitialisation,
> so from a lockdep perspective the iolock in the vfs path is a
> different context to the iolock in the reclaim path. That fixed all
> the non-reclaim state related lockdep false positives, so perhaps
> there is an issue with the lockdep reclaim state checking that does
> not interact well with re-initialised lock contexts?

I've been looking through this again, and I think it's indeed not
enough.  We don't just need to re-initialize it, but also to set a
different lock class for it.


* Re: XFS reclaim lock order bug
  2010-11-24 20:03   ` Christoph Hellwig
@ 2010-11-25  3:48     ` Nick Piggin
  2010-11-25  6:25       ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Piggin @ 2010-11-25  3:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Peter Zijlstra, Ingo Molnar, Nick Piggin, xfs

On Wed, Nov 24, 2010 at 03:03:41PM -0500, Christoph Hellwig wrote:
> On Wed, Nov 24, 2010 at 08:12:58AM +1100, Dave Chinner wrote:
> > It is supposed to be handled by the re-initialisation of the
> > ip->i_iolock in ->evict_inode (xfs_fs_evict_inode). An inode found
> > in the reclaim state must have passed through this reinitialisation,
> > so from a lockdep perspective the iolock in the vfs path is a
> > different context to the iolock in the reclaim path. That fixed all
> > the non-reclaim state related lockdep false positives, so perhaps
> > there is an issue with the lockdep reclaim state checking that does
> > not interact well with re-initialised lock contexts?
> 
> I've been looking through this again, and I think it's indeed not
> enough.  We don't just need to re-initialize it, but also set a
> different lockclass for it.

Doesn't init_rwsem give it a new class?

Guys, can you take a quick look at the code Dave is referring to
(xfs_fs_evict_inode), and check that it actually does what he
intends?

We're getting what seems to be false positives in reclaim inversion
of lockings. Backtraces here
http://oss.sgi.com/pipermail/xfs/2010-November/048092.html

Thanks,
Nick


* Re: XFS reclaim lock order bug
  2010-11-25  3:48     ` Nick Piggin
@ 2010-11-25  6:25       ` Peter Zijlstra
  2010-11-25  7:08         ` Nick Piggin
  2010-11-25 10:29         ` Dave Chinner
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2010-11-25  6:25 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Christoph Hellwig, Ingo Molnar, xfs

On Thu, 2010-11-25 at 14:48 +1100, Nick Piggin wrote:
> On Wed, Nov 24, 2010 at 03:03:41PM -0500, Christoph Hellwig wrote:
> > On Wed, Nov 24, 2010 at 08:12:58AM +1100, Dave Chinner wrote:
> > > It is supposed to be handled by the re-initialisation of the
> > > ip->i_iolock in ->evict_inode (xfs_fs_evict_inode). An inode found
> > > in the reclaim state must have passed through this reinitialisation,
> > > so from a lockdep perspective the iolock in the vfs path is a
> > > different context to the iolock in the reclaim path. That fixed all
> > > the non-reclaim state related lockdep false positives, so perhaps
> > > there is an issue with the lockdep reclaim state checking that does
> > > not interact well with re-initialised lock contexts?
> > 
> > I've been looking through this again, and I think it's indeed not
> > enough.  We don't just need to re-initialize it, but also set a
> > different lockclass for it.
> 
> Doesn't init_rwsem give it a new class?

Per call-site, yes it should.

> Guys, can you take a quick look at the code Dave is referring to
> (xfs_fs_evict_inode), and check that it actually does what he
> intends?

Right, so this is trying to set a different class from the regular init
site, which (/me applies grep) lives in xfs_inode_alloc(), right?

Ought to work.. assuming the inode will be fully destroyed and new
inodes are always obtained through xfs_inode_alloc() and not reused.

> We're getting what seems to be false positives in reclaim inversion
> of lockings. Backtraces here
> http://oss.sgi.com/pipermail/xfs/2010-November/048092.html

Right, so there it's holding the inode's iolock in the read path while
taking a page fault which does an allocation.

vs

acquiring the iolock in the xfs_reclaim_inode_shrink() path.


Presumably the whole xfs_fs_evict_inode() stuff will happen _after_ it's
possible to end up in that read path?


Something like the patch below would give the lock class an explicit name;
because both sites currently use the exact same init call, the classes are
named:

  (&(&ip->i_iolock)->mr_lock)
  (&(&ip->i_iolock)->mr_lock#2)

which is hard to tell apart, but I suspect #2 is the dead one, since
classes get numbered in order of appearance and it's hard to have a dead
inode before having a live one ;-)

In that case though, it would suggest the inode got re-used instead of
destroyed and re-created using xfs_inode_alloc(), is that at all
possible?

---
 fs/xfs/linux-2.6/xfs_super.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 064f964..721c1c5 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1091,6 +1091,8 @@ xfs_fs_write_inode(
 	return -error;
 }
 
+static struct lock_class_key xfs_dead_inode;
+
 STATIC void
 xfs_fs_evict_inode(
 	struct inode		*inode)
@@ -1118,6 +1120,8 @@ xfs_fs_evict_inode(
 	 */
 	ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
 	mrlock_init(&ip->i_iolock, MRLOCK_BARRIER, "xfsio", ip->i_ino);
+	lockdep_set_class_and_name(&ip->i_iolock.mr_lock, &xfs_dead_inode,
+			"xfs_dead_inode");
 
 	xfs_inactive(ip);
 }


* Re: XFS reclaim lock order bug
  2010-11-25  6:25       ` Peter Zijlstra
@ 2010-11-25  7:08         ` Nick Piggin
  2010-11-25  7:28           ` Peter Zijlstra
  2010-11-25 10:29         ` Dave Chinner
  1 sibling, 1 reply; 14+ messages in thread
From: Nick Piggin @ 2010-11-25  7:08 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Christoph Hellwig, Ingo Molnar, Nick Piggin, xfs

On Thu, Nov 25, 2010 at 07:25:25AM +0100, Peter Zijlstra wrote:
> On Thu, 2010-11-25 at 14:48 +1100, Nick Piggin wrote:
> > On Wed, Nov 24, 2010 at 03:03:41PM -0500, Christoph Hellwig wrote:
> > > On Wed, Nov 24, 2010 at 08:12:58AM +1100, Dave Chinner wrote:
> > > > It is supposed to be handled by the re-initialisation of the
> > > > ip->i_iolock in ->evict_inode (xfs_fs_evict_inode). An inode found
> > > > in the reclaim state must have passed through this reinitialisation,
> > > > so from a lockdep perspective the iolock in the vfs path is a
> > > > different context to the iolock in the reclaim path. That fixed all
> > > > the non-reclaim state related lockdep false positives, so perhaps
> > > > there is an issue with the lockdep reclaim state checking that does
> > > > not interact well with re-initialised lock contexts?
> > > 
> > > I've been looking through this again, and I think it's indeed not
> > > enough.  We don't just need to re-initialize it, but also set a
> > > different lockclass for it.
> > 
> > Doesn't init_rwsem give it a new class?
> 
> Per call-site, yes it should.
> 
> > Guys, can you take a quick look at the code Dave is referring to
> > (xfs_fs_evict_inode), and check that it actually does what he
> > intends?
> 
> Right, so this is trying to set a different class from the regular init
> site, which (/me applies grep) lives in xfs_inode_alloc(), right?
> 
> Ought to work.. assuming the inode will be fully destroyed and new
> inodes are always obtained through xfs_inode_alloc() and not reused.
> 
> > We're getting what seems to be false positives in reclaim inversion
> > of lockings. Backtraces here
> > http://oss.sgi.com/pipermail/xfs/2010-November/048092.html
> 
> Right, so there it's holding the inode's iolock in the read path while
> taking a page fault which does an allocation.
> 
> vs
> 
> acquiring the iolock in the xfs_reclaim_inode_shrink() path.
> 
> 
> Presumably the whole xfs_fs_evict_inode() stuff will happen _after_ it's
> possible to end up in that read path?

I think that's the idea.

 
> Something like the patch below would give the lock class an explicit name;
> because both sites currently use the exact same init call, the classes are
> named:
> 
>   (&(&ip->i_iolock)->mr_lock)
>   (&(&ip->i_iolock)->mr_lock#2)
> 
> which is hard to tell apart, but I suspect #2 is the dead one, since
> classes get numbered in order of appearance and it's hard to have a dead
> inode before having a live one ;-)
> 
> In that case though, it would suggest the inode got re-used instead of
> destroyed and re-created using xfs_inode_alloc(), is that at all
> possible?

Ah, I see what you mean. An inode that has been through the evict_inode
path is now found to be locked for read(2). The rwsem is in the
init_always path of the allocator, so it seems like it's getting reused
after evict_inode. Dave?

> 
> ---
>  fs/xfs/linux-2.6/xfs_super.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
> index 064f964..721c1c5 100644
> --- a/fs/xfs/linux-2.6/xfs_super.c
> +++ b/fs/xfs/linux-2.6/xfs_super.c
> @@ -1091,6 +1091,8 @@ xfs_fs_write_inode(
>  	return -error;
>  }
>  
> +static struct lock_class_key xfs_dead_inode;
> +
>  STATIC void
>  xfs_fs_evict_inode(
>  	struct inode		*inode)
> @@ -1118,6 +1120,8 @@ xfs_fs_evict_inode(
>  	 */
>  	ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
>  	mrlock_init(&ip->i_iolock, MRLOCK_BARRIER, "xfsio", ip->i_ino);
> +     lockdep_set_class_and_name(&ip->i_iolock.mr_lock, &xfs_dead_inode,
> +                     "xfs_dead_inode");
>  
>  	xfs_inactive(ip);
>  }

With this change, I assume the mrlock_init can go? (it would be nice
to have a wrapper to allocate the class by itself)



* Re: XFS reclaim lock order bug
  2010-11-25  7:08         ` Nick Piggin
@ 2010-11-25  7:28           ` Peter Zijlstra
  2010-11-25 10:32             ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2010-11-25  7:28 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Christoph Hellwig, Ingo Molnar, xfs

On Thu, 2010-11-25 at 18:08 +1100, Nick Piggin wrote:
> > +static struct lock_class_key xfs_dead_inode;
> > +
> >  STATIC void
> >  xfs_fs_evict_inode(
> >       struct inode            *inode)
> > @@ -1118,6 +1120,8 @@ xfs_fs_evict_inode(
> >        */
> >       ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
> >       mrlock_init(&ip->i_iolock, MRLOCK_BARRIER, "xfsio", ip->i_ino);
> > +     lockdep_set_class_and_name(&ip->i_iolock.mr_lock, &xfs_dead_inode,
> > +                     "xfs_dead_inode");
> >  
> >       xfs_inactive(ip);
> >  }
> 
> With this change, I assume the mrlock_init can go? (it would be nice
> to have a wrapper to allocate the class by itself)


mrlock_init() does allocate a class (well, init_rwsem() does, really), but
sets the name to a stringified version of the lock argument.

The lockdep_set_class*() interface is only guaranteed to work on a
freshly initialized lock structure -- which in this case is a bit of a
waste, but for debugging purposes would allow setting a clearer name.

Alternatively, you can write the code like:

  xfs_inode_t	*dead_ip = XFS_I(inode);

  mrlock_init(&dead_ip->i_iolock, ...);

In which case it's also obvious, as that would result in:

   (&(&dead_ip->i_iolock)->mr_lock)

as opposed to:

   (&(&ip->i_iolock)->mr_lock)



* Re: XFS reclaim lock order bug
  2010-11-25  6:25       ` Peter Zijlstra
  2010-11-25  7:08         ` Nick Piggin
@ 2010-11-25 10:29         ` Dave Chinner
  2010-11-25 10:36           ` Peter Zijlstra
  2010-11-25 11:25           ` Christoph Hellwig
  1 sibling, 2 replies; 14+ messages in thread
From: Dave Chinner @ 2010-11-25 10:29 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Christoph Hellwig, Ingo Molnar, Nick Piggin, xfs

On Thu, Nov 25, 2010 at 07:25:25AM +0100, Peter Zijlstra wrote:
> On Thu, 2010-11-25 at 14:48 +1100, Nick Piggin wrote:
> > On Wed, Nov 24, 2010 at 03:03:41PM -0500, Christoph Hellwig wrote:
> > > On Wed, Nov 24, 2010 at 08:12:58AM +1100, Dave Chinner wrote:
> > > > It is supposed to be handled by the re-initialisation of the
> > > > ip->i_iolock in ->evict_inode (xfs_fs_evict_inode). An inode found
> > > > in the reclaim state must have passed through this reinitialisation,
> > > > so from a lockdep perspective the iolock in the vfs path is a
> > > > different context to the iolock in the reclaim path. That fixed all
> > > > the non-reclaim state related lockdep false positives, so perhaps
> > > > there is an issue with the lockdep reclaim state checking that does
> > > > not interact well with re-initialised lock contexts?
> > > 
> > > I've been looking through this again, and I think it's indeed not
> > > enough.  We don't just need to re-initialize it, but also set a
> > > different lockclass for it.
> > 
> > Doesn't init_rwsem give it a new class?
> 
> Per call-site, yes it should.
> 
> > Guys, can you take a quick look at the code Dave is referring to
> > (xfs_fs_evict_inode), and check that it actually does what he
> > intends?
> 
> Right, so this is trying to set a different class from the regular init
> site, which (/me applies grep) lives in xfs_inode_alloc(), right?
> 
> Ought to work.. assuming the inode will be fully destroyed and new
> inodes are always obtained through xfs_inode_alloc() and not reused.
> 
> > We're getting what seems to be false positives in reclaim inversion
> > of lockings. Backtraces here
> > http://oss.sgi.com/pipermail/xfs/2010-November/048092.html
> 
> Right, so there it's holding the inode's iolock in the read path while
> taking a page fault which does an allocation.
> 
> vs
> 
> acquiring the iolock in the xfs_reclaim_inode_shrink() path.
> 
> 
> Presumably the whole xfs_fs_evict_inode() stuff will happen _after_ it's
> possible to end up in that read path?
> 
> 
> Something like the patch below would give the lock class an explicit name;
> because both sites currently use the exact same init call, the classes are
> named:
> 
>   (&(&ip->i_iolock)->mr_lock)
>   (&(&ip->i_iolock)->mr_lock#2)
> 
> which is hard to tell apart, but I suspect #2 is the dead one, since
> classes get numbered in order of appearance and it's hard to have a dead
> inode before having a live one ;-)
> 
> In that case though, it would suggest the inode got re-used instead of
> destroyed and re-created using xfs_inode_alloc(), is that at all
> possible?

Yes, actually it is - see the XFS_IRECLAIMABLE case in
xfs_iget_cache_hit(). I guess we haven't seen the original lock
inversion false positives that this was supposed to fix because the
reclaim warnings trip first...

I think that means we also need to reinitialise the lock when we recycle
the inode out of the XFS_IRECLAIMABLE state.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS reclaim lock order bug
  2010-11-25  7:28           ` Peter Zijlstra
@ 2010-11-25 10:32             ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2010-11-25 10:32 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Christoph Hellwig, Ingo Molnar, Nick Piggin, xfs

On Thu, Nov 25, 2010 at 08:28:17AM +0100, Peter Zijlstra wrote:
> On Thu, 2010-11-25 at 18:08 +1100, Nick Piggin wrote:
> > > +static struct lock_class_key xfs_dead_inode;
> > > +
> > >  STATIC void
> > >  xfs_fs_evict_inode(
> > >       struct inode            *inode)
> > > @@ -1118,6 +1120,8 @@ xfs_fs_evict_inode(
> > >        */
> > >       ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
> > >       mrlock_init(&ip->i_iolock, MRLOCK_BARRIER, "xfsio", ip->i_ino);
> > > +     lockdep_set_class_and_name(&ip->i_iolock.mr_lock, &xfs_dead_inode,
> > > +                     "xfs_dead_inode");
> > >  
> > >       xfs_inactive(ip);
> > >  }
> > 
> > With this change, I assume the mrlock_init can go? (it would be nice
> > to have a wrapper to allocate the class by itself)
> 
> 
> mrlock_init() does allocate a class (well rwsem_init, really), but sets
> the name to a stringified version of the lock argument.
> 
> The lockdep_set_class*() interface is only guaranteed to work on a
> freshly initialized lock structure -- which in this case is a bit of a
> waste, but for debugging purposes would allow setting a clearer name.
> 
> Alternatively, you can write the code like:
> 
>   xfs_inode_t	*dead_ip = XFS_I(inode);
> 
>   mrlock_init(&dead_ip->i_iolock, ...);
> 
> In which case it's also obvious, as that would result in:
> 
>    (&(&dead_ip->i_iolock)->mr_lock)
> 
> as opposed to:
> 
>    (&(&ip->i_iolock)->mr_lock)

Ok, that's a handy trick to know. I'll try and sort this out
tomorrow and make use of this trick to help identify the different
lock classes.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS reclaim lock order bug
  2010-11-25 10:29         ` Dave Chinner
@ 2010-11-25 10:36           ` Peter Zijlstra
  2010-11-25 11:25           ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2010-11-25 10:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, Ingo Molnar, Nick Piggin, xfs

On Thu, 2010-11-25 at 21:29 +1100, Dave Chinner wrote:

> > In that case though, it would suggest the inode got re-used instead of
> > destroyed and re-created using xfs_alloc_inode(), is that at all
> > possible?
> 
> Yes, actually it is - see the XFS_IRECLAIMABLE case in
> xfs_iget_cache_hit(). I guess we haven't seen the original lock
> inversion false positives that this was supposed to fix because the
> reclaim warnings trip first...
> 
> I think that means we also need to reinitialise the lock when we recycle
> the inode out of the XFS_IRECLAIMABLE state.

Right, in which case you probably want an explicit lock class and use
that one class in both xfs_inode_alloc() and the reclaim case.

See my earlier suggestion wrt lockdep_set_class*() on how to do that.


* Re: XFS reclaim lock order bug
  2010-11-25 10:29         ` Dave Chinner
  2010-11-25 10:36           ` Peter Zijlstra
@ 2010-11-25 11:25           ` Christoph Hellwig
  2010-11-25 11:37             ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2010-11-25 11:25 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Peter Zijlstra, Christoph Hellwig, Ingo Molnar, Nick Piggin, xfs

On Thu, Nov 25, 2010 at 09:29:40PM +1100, Dave Chinner wrote:
> Yes, actually it is - see the XFS_IRECLAIMABLE case in
> xfs_iget_cache_hit(). I guess we haven't seen the original lock
> inversion false positives that this was supposed to fix because the
> reclaim warnings trip first...
> 
> I think that means we also need to reinitialise the lock when we recycle
> the inode out of the XFS_IRECLAIMABLE state.

I came up with the patch below when we had a previous report of the
warning, but I couldn't convince myself that it really helps:

Index: linux-2.6/fs/xfs/xfs_iget.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_iget.c	2010-09-20 12:10:28.227444173 -0300
+++ linux-2.6/fs/xfs/xfs_iget.c	2010-09-20 12:11:25.631444190 -0300
@@ -207,6 +207,10 @@ xfs_iget_cache_hit(
 
 			ip->i_flags &= ~XFS_INEW;
 			ip->i_flags |= XFS_IRECLAIMABLE;
+
+			ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
+			mrlock_init(&ip->i_iolock, MRLOCK_BARRIER, "xfsio", ip->i_ino);
+
 			__xfs_inode_set_reclaim_tag(pag, ip);
 			trace_xfs_iget_reclaim_fail(ip);
 			goto out_error;


* Re: XFS reclaim lock order bug
  2010-11-25 11:25           ` Christoph Hellwig
@ 2010-11-25 11:37             ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2010-11-25 11:37 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Ingo Molnar, Nick Piggin, xfs

On Thu, 2010-11-25 at 06:25 -0500, Christoph Hellwig wrote:
> On Thu, Nov 25, 2010 at 09:29:40PM +1100, Dave Chinner wrote:
> > Yes, actually it is - see the XFS_IRECLAIMABLE case in
> > xfs_iget_cache_hit(). I guess we haven't seen the original lock
> > inversion false positives that this was supposed to fix because the
> > reclaim warnings trip first...
> > 
> > I think that means we also need to reinitialise the lock when we recycle
> > the inode out of the XFS_IRECLAIMABLE state.
> 
> I came up with the patch below when we had a previous report of the
> warning, but I couldn't convince myself that it really helps:
> 
> Index: linux-2.6/fs/xfs/xfs_iget.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_iget.c	2010-09-20 12:10:28.227444173 -0300
> +++ linux-2.6/fs/xfs/xfs_iget.c	2010-09-20 12:11:25.631444190 -0300
> @@ -207,6 +207,10 @@ xfs_iget_cache_hit(
>  
>  			ip->i_flags &= ~XFS_INEW;
>  			ip->i_flags |= XFS_IRECLAIMABLE;
> +
> +			ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
> +			mrlock_init(&ip->i_iolock, MRLOCK_BARRIER, "xfsio", ip->i_ino);
> +
>  			__xfs_inode_set_reclaim_tag(pag, ip);
>  			trace_xfs_iget_reclaim_fail(ip);
>  			goto out_error;


That adds a 3rd class, which should work, but it doesn't validate that the
first class -- the one from xfs_inode_alloc() -- and this one are in fact
similar.


