public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [patch] Fix xfs_iunpin() sets I_DIRTY_SYNC after clear_inode().
@ 2006-10-04  9:20 Takenori Nagano
  2006-10-06  3:26 ` David Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Takenori Nagano @ 2006-10-04  9:20 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: Type: text/plain, Size: 4558 bytes --]

Hi,

The patch attached to this mail is a fix for a race of xfs_iunpin()
and generic_delete_inode().

generic_delete_inode() checks inode->i_state using BUG_ON() after
clear_inode(). At this point inode->i_state value must be I_CLEAR after
clear_inode().

But we detected inode->i_state was not I_CLEAR after clear_inode().
Kernel panic occurred by BUG_ON(inode->i_state != I_CLEAR). We
analyzed the memory dump, then we found I_DIRTY_SYNC and I_CLEAR ware
set. The function to set I_DIRTY_SYNC is only __mark_inode_dirty(). We
took a backtrace when i_state is I_CLEAR in __mark_inode_dirty().

This is a backtrace when inode->i_state=I_CLEAR in __mark_inode_dirty().

> > Call Trace:
> >  [<a000000100019980>] show_stack+0x80/0xa0
> >                                 sp=e00000012c077970 bsp=e00000012c0713e8
> >  [<a00000010003d1e0>] die+0x1c0/0x2e0
> >                                 sp=e00000012c077b40 bsp=e00000012c0713b0
> >  [<a00000010003e810>] ia64_bad_break+0x2f0/0x400
> >                                 sp=e00000012c077b40 bsp=e00000012c071388
> >  [<a000000100010180>] ia64_leave_kernel+0x0/0x260
> >                                 sp=e00000012c077bd0 bsp=e00000012c071388
> >  [<a0000001001c6170>] __mark_inode_dirty+0x390/0x3a0
> >                                 sp=e00000012c077da0 bsp=e00000012c071330
> >  [<a00000021c88a070>] xfs_iunpin+0x110/0x120 [xfs]
> >                                 sp=e00000012c077da0 bsp=e00000012c071310
> >  [<a00000021c892550>] xfs_inode_item_unpin+0x30/0x60 [xfs]
> >                                 sp=e00000012c077da0 bsp=e00000012c0712f0
> >  [<a00000021c8b1d00>] xfs_trans_chunk_committed+0x280/0x380 [xfs]
> >                                 sp=e00000012c077da0 bsp=e00000012c071298
> >  [<a00000021c8b1e80>] xfs_trans_committed+0x80/0x320 [xfs]
> >                                 sp=e00000012c077da0 bsp=e00000012c071248
> >  [<a00000021c898280>] xlog_state_do_callback+0x4a0/0xa20 [xfs]
> >                                 sp=e00000012c077da0 bsp=e00000012c0711b0
> >  [<a00000021c898990>] xlog_iodone+0x190/0x300 [xfs]
> >                                 sp=e00000012c077da0 bsp=e00000012c071168
> >  [<a00000021c8e6280>] pagebuf_iodone_work+0xc0/0x120 [xfs]
> >                                 sp=e00000012c077da0 bsp=e00000012c071148
> >  [<a0000001000d2f50>] worker_thread+0x3f0/0x5c0
> >                                 sp=e00000012c077da0 bsp=e00000012c0710b0
> >  [<a0000001000dd3e0>] kthread+0x220/0x280
> >                                 sp=e00000012c077e10 bsp=e00000012c071068
> >  [<a0000001000181a0>] kernel_thread_helper+0xe0/0x100
> >                                 sp=e00000012c077e30 bsp=e00000012c071040

We found __mark_inode_dirty() was called by xfs_iunpin().
xfs_iunpin() sets I_DIRTY_SYNC on inode->i_state if i_pincount is 0.

If __mark_inode_dirty() is running simultaneously between
clear_inode() and BUG_ON() in generic_delete_inode(), BUG_ON() is
called. We think this is a cause of this bug.

All dirty buffers are invalidated by clear_inode(), but in-core log is
not deleted and the state will be inconsistent. The in-core log is
written by xfs_logd even if inode was already deleted. A cause of this
bug is xfs does not care in-core log after deleting the inode.

xfs_fs_clear_inode() calls xfs_reclaim(). We think the recent fixes to
xfs_iunpin() were not correct. With those patches, xfs_iunpin() now
can determine whether xfs_inode is recycled or not, but it is not
essential way to fix this bug. xfs_iunpin() must never touch xfs_inode
which is already freed. If try_to_free_page() collects some slabs
including pinned inode, it is possible to result in memory corruption.

We come up with three possible solutions:
1. xfs_fs_clear_inode() waits for i_pincount to become 0.
2. xfs_fs_clear_inode() syncs in-core log if i_pincount is not 0.
3. xfs_fs_clear_inode() invalidates in-core log that relates to the
deleted inode.

We chose 2, because the frequency of sync is almost same to as that of
BUG(), and it is the same way to sync in-core log in xfs_fsync() when
inode is still pinned. It has very very little effect for xfs performance.

This patch fixes to sync in-core log if i_pincount is not 0 in
xfs_fs_clear_inode(). We think this is essential.
We already tested this patch for more than 100 hours in kernel-2.6.18.
If we did not use this patch, BUG() was called within only 5 minutes
on 32way Itanium server.
We used a test program that repeats open(), write() and unlink() in
parallel.

Best Regards,
--
Takenori Nagano, NEC
t-nagano@ah.jp.nec.com

[-- Attachment #2: xfs-fix-log-race.patch --]
[-- Type: text/plain, Size: 879 bytes --]

diff -Naru linux-2.6.18.orig/fs/xfs/linux-2.6/xfs_super.c linux-2.6.18/fs/xfs/linux-2.6/xfs_super.c
--- linux-2.6.18.orig/fs/xfs/linux-2.6/xfs_super.c	2006-09-20 12:42:06.000000000 +0900
+++ linux-2.6.18/fs/xfs/linux-2.6/xfs_super.c	2006-09-28 18:16:02.280000000 +0900
@@ -433,6 +433,7 @@
 	struct inode		*inode)
 {
 	bhv_vnode_t		*vp = vn_from_inode(inode);
+	xfs_inode_t		*ip;
 
 	vn_trace_entry(vp, __FUNCTION__, (inst_t *)__return_address);
 
@@ -452,10 +453,14 @@
 	vp->v_flag &= ~VMODIFIED;
 	VN_UNLOCK(vp, 0);
 
-	if (VNHEAD(vp))
+	if (VNHEAD(vp)) {
+		ip = XFS_BHVTOI(VNHEAD(vp));
+		if (xfs_ipincount(ip)) 
+			xfs_log_force(ip->i_mount, (xfs_lsn_t)0,
+				      XFS_LOG_FORCE | XFS_LOG_SYNC);
 		if (bhv_vop_reclaim(vp))
 			panic("%s: cannot reclaim 0x%p\n", __FUNCTION__, vp);
-
+	}
 	ASSERT(VNHEAD(vp) == NULL);
 
 #ifdef XFS_VNODE_TRACE

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2006-10-23  6:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-04  9:20 [patch] Fix xfs_iunpin() sets I_DIRTY_SYNC after clear_inode() Takenori Nagano
2006-10-06  3:26 ` David Chinner
2006-10-11  6:43   ` David Chinner
2006-10-12 12:20     ` Takenori Nagano
2006-10-13  1:46       ` David Chinner
2006-10-13  8:06         ` Timothy Shimmin
2006-10-13 12:17         ` Takenori Nagano
2006-10-17  2:02           ` David Chinner
2006-10-18  2:33             ` David Chinner
2006-10-18  9:07               ` David Chinner
2006-10-19  2:23                 ` Takenori Nagano
2006-10-19  4:58                   ` David Chinner
2006-10-20  4:25                     ` Takenori Nagano
2006-10-23  6:53                       ` Takenori Nagano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox