* crash with latest code drop.
@ 2008-10-15 1:49 Peter Leckie
2008-10-15 1:18 ` Dave Chinner
0 siblings, 1 reply; 17+ messages in thread
From: Peter Leckie @ 2008-10-15 1:49 UTC (permalink / raw)
To: Dave Chinner, xfs
Hi Dave and list, I hit the following crash with the latest code drop
that was pushed in yesterday
while running test 177 in a loop, after 4-5 loops it crashed as follows:
"<0>general protection fault: 0000 [1] SMP"
[1]kdb> bt
Stack traceback for pid 5425
0xffff88007b9b38d0 5425 5242 1 1 R 0xffff88007b9b3c38 *sync
sp ip Function (args)
0xffff88006c45fde8 0xffffffff80320990 radix_tree_tagged+0xa
(0x6b6b6b6b6b6b6b73, 0x0)
0xffff88006c45fe10 0xffffffffa01f36c6 [xfs]xfs_sync_inodes_ag+0x197
(0xffff88007d4025c8, invalid, invalid)
0xffff88006c45fe80 0xffffffffa01f38d1 [xfs]xfs_sync_inodes+0x63
(0xffff88007d4025c8, invalid)
0xffff88006c45fec0 0xffffffffa01f3997 [xfs]xfs_quiesce_data+0x13
(0xffff88007d4025c8)
0xffff88006c45fee0 0xffffffffa01f1800 [xfs]xfs_fs_sync_super+0x2b
(0xffff88007d9f10b8)
0xffff88006c45ff40 0xffffffff80292fd2 sync_filesystems+0xae (invalid)
0xffff88006c45ff60 0xffffffff802af48b do_sync+0x2f (0x1)
0xffff88006c45ff70 0xffffffff802af4c4 sys_sync+0xe
bb_special_case: Invalid bb_reg_state.memory, missing trailing entries
bb_special_case: on transfer to int_with_check
Assuming system_call_fastpath is 'pass through' with 6 register parameters
kdb_bb: 0xffffffff8020be0b [kernel]system_call_fastpath failed at
0xffffffff8020be98
Using old style backtrace, unreliable with no arguments
sp ip Function (args)
0xffff88006c45fdb0 0xffffffffa01f369a [xfs]xfs_sync_inodes_ag+0x16b
0xffff88006c45fde8 0xffffffff80320990 radix_tree_tagged+0xa
0xffff88006c45fe10 0xffffffffa01f36c6 [xfs]xfs_sync_inodes_ag+0x197
0xffff88006c45fe80 0xffffffffa01f38d1 [xfs]xfs_sync_inodes+0x63
0xffff88006c45fec0 0xffffffffa01f3997 [xfs]xfs_quiesce_data+0x13
0xffff88006c45fec8 0xffffffff802452b9 autoremove_wake_function
0xffff88006c45fee0 0xffffffffa01f1800 [xfs]xfs_fs_sync_super+0x2b
0xffff88006c45ff00 0xffffffff8043b871 __down_read+0x12
0xffff88006c45ff10 0xffffffffa024d395 [ext3]ext3_sync_fs+0x46
0xffff88006c45ff40 0xffffffff80292fd2 sync_filesystems+0xae
0xffff88006c45ff60 0xffffffff802af48b do_sync+0x2f
0xffff88006c45ff70 0xffffffff802af4c4 sys_sync+0xe
[1]kdb> rd
r15 = 0x0000000000000002 r14 = 0x0000000000000000
r13 = 0x000000000000000a r12 = 0x0000000000000040
bp = 0xffff88003793a9d8 bx = 0xffff880055c10250
r11 = 0x0000000000000001 r10 = 0xffff880055d0ade8
r9 = 0x000000000002309f r8 = 0xffffffffa01f369a
ax = 0x0000000000200000 cx = 0x0000000000000015
dx = 0x0000000000000000 si = 0x0000000000000000
di = 0x6b6b6b6b6b6b6b73 orig_ax = 0xffffffffffffffff
ip = 0xffffffff80320990 cs = 0x0000000000000010
flags = 0x0000000000010206 sp = 0xffff88006c45fe00
ss = 0x0000000000000018 ®s = 0xffff88006c45fd68
[1]kdb> id %ip
0xffffffff80320990 radix_tree_tagged+0xa: and 0x4(%rdi),%eax
0xffffffff80320993 radix_tree_tagged+0xd: retq
0xffffffff80320994 radix_tree_callback: cmp $0x7,%rsi
0xffffffff80320998 radix_tree_callback+0x4: push %rbx
0xffffffff80320999 radix_tree_callback+0x5: je
0xffffffff803209a1 radix_tree_callback+0xd
0xffffffff8032099b radix_tree_callback+0x7: cmp $0x17,%rsi
0xffffffff8032099f radix_tree_callback+0xb: jne
0xffffffff803209e1 radix_tree_callback+0x4d
0xffffffff803209a1 radix_tree_callback+0xd: movslq %edx,%rax
0xffffffff803209a4 radix_tree_callback+0x10: mov
3961501(%rip),%rdx # 0xffffffff806e7c48 _cpu_pda
0xffffffff803209ab radix_tree_callback+0x17: mov
$0xffffffff80796480,%rbx
0xffffffff803209b2 radix_tree_callback+0x1e: mov (%rdx,%rax,8),%rax
0xffffffff803209b6 radix_tree_callback+0x22: add 0x8(%rax),%rbx
0xffffffff803209ba radix_tree_callback+0x26: jmp
0xffffffff803209db radix_tree_callback+0x47
0xffffffff803209bc radix_tree_callback+0x28: cltq
0xffffffff803209be radix_tree_callback+0x2a: mov
5545627(%rip),%rdi # 0xffffffff8086a860 __key.8127
0xffffffff803209c5 radix_tree_callback+0x31: mov (%rbx,%rax,8),%rsi
The back trace is a little busted xfs_sync_inodes_ag appears to be calling:
xfs_sync_inodes_ag->xfs_flush_pages->mapping_tagged->radix_tree_tagged()
The poison appears to be from the i_mapping pointer in the linux inode
and it crashes
in radix_tree_tagged() after following &mapping->page_tree.
Any insight into the possible causes for this crash would be much
appreciated as this crash may be related to
another issue I'm looking at where sync fails to write out a deleted
inode due to missing XFS_DINODE_MAGIC.
Thanks,
Pete
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: crash with latest code drop. 2008-10-15 1:49 crash with latest code drop Peter Leckie @ 2008-10-15 1:18 ` Dave Chinner 2008-10-15 2:29 ` Christoph Hellwig 2008-10-15 3:47 ` Peter Leckie 0 siblings, 2 replies; 17+ messages in thread From: Dave Chinner @ 2008-10-15 1:18 UTC (permalink / raw) To: Peter Leckie; +Cc: xfs On Wed, Oct 15, 2008 at 11:49:20AM +1000, Peter Leckie wrote: > Hi Dave and list, I hit the following crash with the latest code drop > that was pushed in yesterday > while running test 177 in a loop, after 4-5 loops it crashed as follows: > "<0>general protection fault: 0000 [1] SMP" .... > > > [1]kdb> bt > Stack traceback for pid 5425 > 0xffff88007b9b38d0 5425 5242 1 1 R 0xffff88007b9b3c38 *sync > sp ip Function (args) > 0xffff88006c45fde8 0xffffffff80320990 radix_tree_tagged+0xa > (0x6b6b6b6b6b6b6b73, 0x0) > 0xffff88006c45fe10 0xffffffffa01f36c6 [xfs]xfs_sync_inodes_ag+0x197 > (0xffff88007d4025c8, invalid, invalid) > 0xffff88006c45fe80 0xffffffffa01f38d1 [xfs]xfs_sync_inodes+0x63 > (0xffff88007d4025c8, invalid) > 0xffff88006c45fec0 0xffffffffa01f3997 [xfs]xfs_quiesce_data+0x13 > (0xffff88007d4025c8) > 0xffff88006c45fee0 0xffffffffa01f1800 [xfs]xfs_fs_sync_super+0x2b > (0xffff88007d9f10b8) > 0xffff88006c45ff40 0xffffffff80292fd2 sync_filesystems+0xae (invalid) > 0xffff88006c45ff60 0xffffffff802af48b do_sync+0x2f (0x1) > 0xffff88006c45ff70 0xffffffff802af4c4 sys_sync+0xe > bb_special_case: Invalid bb_reg_state.memory, missing trailing entries > bb_special_case: on transfer to int_with_check > Assuming system_call_fastpath is 'pass through' with 6 register parameters > kdb_bb: 0xffffffff8020be0b [kernel]system_call_fastpath failed at > 0xffffffff8020be98 > > Using old style backtrace, unreliable with no arguments > sp ip Function (args) > 0xffff88006c45fdb0 0xffffffffa01f369a [xfs]xfs_sync_inodes_ag+0x16b > 0xffff88006c45fde8 0xffffffff80320990 radix_tree_tagged+0xa > 0xffff88006c45fe10 0xffffffffa01f36c6 [xfs]xfs_sync_inodes_ag+0x197 > 0xffff88006c45fe80 0xffffffffa01f38d1 [xfs]xfs_sync_inodes+0x63 > 0xffff88006c45fec0 0xffffffffa01f3997 [xfs]xfs_quiesce_data+0x13 > 0xffff88006c45fec8 0xffffffff802452b9 autoremove_wake_function > 0xffff88006c45fee0 0xffffffffa01f1800 [xfs]xfs_fs_sync_super+0x2b > 0xffff88006c45ff00 0xffffffff8043b871 __down_read+0x12 > 0xffff88006c45ff10 0xffffffffa024d395 [ext3]ext3_sync_fs+0x46 > 0xffff88006c45ff40 0xffffffff80292fd2 sync_filesystems+0xae > 0xffff88006c45ff60 0xffffffff802af48b do_sync+0x2f > 0xffff88006c45ff70 0xffffffff802af4c4 sys_sync+0xe > > > [1]kdb> rd > r15 = 0x0000000000000002 r14 = 0x0000000000000000 > r13 = 0x000000000000000a r12 = 0x0000000000000040 > bp = 0xffff88003793a9d8 bx = 0xffff880055c10250 > r11 = 0x0000000000000001 r10 = 0xffff880055d0ade8 > r9 = 0x000000000002309f r8 = 0xffffffffa01f369a > ax = 0x0000000000200000 cx = 0x0000000000000015 > dx = 0x0000000000000000 si = 0x0000000000000000 > di = 0x6b6b6b6b6b6b6b73 orig_ax = 0xffffffffffffffff > ip = 0xffffffff80320990 cs = 0x0000000000000010 > flags = 0x0000000000010206 sp = 0xffff88006c45fe00 > ss = 0x0000000000000018 ®s = 0xffff88006c45fd68 > > > [1]kdb> id %ip > 0xffffffff80320990 radix_tree_tagged+0xa: and 0x4(%rdi),%eax > 0xffffffff80320993 radix_tree_tagged+0xd: retq > 0xffffffff80320994 radix_tree_callback: cmp $0x7,%rsi > 0xffffffff80320998 radix_tree_callback+0x4: push %rbx > 0xffffffff80320999 radix_tree_callback+0x5: je > 0xffffffff803209a1 radix_tree_callback+0xd > 0xffffffff8032099b radix_tree_callback+0x7: cmp $0x17,%rsi > 0xffffffff8032099f radix_tree_callback+0xb: jne > 0xffffffff803209e1 radix_tree_callback+0x4d > 0xffffffff803209a1 radix_tree_callback+0xd: movslq %edx,%rax > 0xffffffff803209a4 radix_tree_callback+0x10: mov > 3961501(%rip),%rdx # 0xffffffff806e7c48 _cpu_pda > 0xffffffff803209ab radix_tree_callback+0x17: mov > $0xffffffff80796480,%rbx > 0xffffffff803209b2 radix_tree_callback+0x1e: mov (%rdx,%rax,8),%rax > 0xffffffff803209b6 radix_tree_callback+0x22: add 0x8(%rax),%rbx > 0xffffffff803209ba radix_tree_callback+0x26: jmp > 0xffffffff803209db radix_tree_callback+0x47 > 0xffffffff803209bc radix_tree_callback+0x28: cltq > 0xffffffff803209be radix_tree_callback+0x2a: mov > 5545627(%rip),%rdi # 0xffffffff8086a860 __key.8127 > 0xffffffff803209c5 radix_tree_callback+0x31: mov (%rbx,%rax,8),%rsi > > > > The back trace is a little busted xfs_sync_inodes_ag appears to be calling: > xfs_sync_inodes_ag->xfs_flush_pages->mapping_tagged->radix_tree_tagged() I think you'll find it's VN_DIRTY() here: 158 /* 159 * If we have to flush data or wait for I/O completion 160 * we need to drop the ilock that we currently hold. 161 * If we need to drop the lock, insert a marker if we 162 * have not already done so. 163 */ 164 if ((flags & SYNC_DELWRI) && VN_DIRTY(inode)) { 165 xfs_iunlock(ip, XFS_ILOCK_SHARED); 166 error = xfs_flush_pages(ip, 0, -1, fflag, FI_NONE); 167 if (flags & SYNC_IOWAIT) 168 vn_iowait(ip); 169 xfs_ilock(ip, XFS_ILOCK_SHARED); 170 } As: #define VN_DIRTY(vp) mapping_tagged(vp->i_mapping, \ PAGECACHE_TAG_DIRTY) Seems like we have a possible race with reclaim - we've found the dirty inode in the radix tree, then checked the reclaim state, then locked the inode, then tried to access the linux inode which is now now longer present. Can you confirm that the xfs_inode has either the I_RECLAIM or I_RECLAIMABLE flag set on it when the panic occurred? If this is the case, then the patch below will probably fix it. Cheers, Dave. -- Dave Chinner david@fromorbit.com XFS: avoid all reclaimable inodes in xfs_sync_inodes_ag If we are syncing data in xfs_sync_inodes_ag(), the VFS inode must still be referencable as the dirty data state is carried on the VFS inode. hence if we can't get a reference via igrab(), the inode must be in reclaim which implies that it has no dirty data attached. Leave such inodes to the reclaim code to flush the dirty inode state to disk and so avoid attempting to access the VFS inode when it may not exist in xfs_sync_inodes_ag(). Signed-off-by: Dave Chinner <david@fromorbit.com> --- fs/xfs/linux-2.6/xfs_sync.c | 23 ++++------------------- 1 files changed, 4 insertions(+), 19 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c index 08b2acf..85495df 100644 --- a/fs/xfs/linux-2.6/xfs_sync.c +++ b/fs/xfs/linux-2.6/xfs_sync.c @@ -80,7 +80,6 @@ xfs_sync_inodes_ag( do { struct inode *inode; - boolean_t inode_refed; xfs_inode_t *ip = NULL; /* @@ -133,26 +132,15 @@ xfs_sync_inodes_ag( /* * If we can't get a reference on the VFS_I, the inode must be - * in reclaim. If we can get the inode lock without blocking, - * it is safe to flush the inode because we hold the tree lock - * and xfs_iextract will block right now. Hence if we lock the - * inode while holding the tree lock, xfs_ireclaim() is - * guaranteed to block on the inode lock we now hold and hence - * it is safe to reference the inode until we drop the inode - * locks completely. + * in reclaim. Leave it for the reclaim code to flush. */ - inode_refed = B_FALSE; if (igrab(inode)) { read_unlock(&pag->pag_ici_lock); xfs_ilock(ip, lock_flags); - inode_refed = B_TRUE; } else { - if (!xfs_ilock_nowait(ip, lock_flags)) { - /* leave it to reclaim */ - read_unlock(&pag->pag_ici_lock); - continue; - } + /* leave it to reclaim */ read_unlock(&pag->pag_ici_lock); + continue; } /* @@ -186,10 +174,7 @@ xfs_sync_inodes_ag( if (lock_flags) xfs_iunlock(ip, lock_flags); - - if (inode_refed) { - IRELE(ip); - } + IRELE(ip); if (error) last_error = error; ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-15 1:18 ` Dave Chinner @ 2008-10-15 2:29 ` Christoph Hellwig 2008-10-15 3:16 ` Dave Chinner 2008-10-15 3:47 ` Peter Leckie 1 sibling, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2008-10-15 2:29 UTC (permalink / raw) To: Peter Leckie, xfs > + * in reclaim. Leave it for the reclaim code to flush. > */ > if (igrab(inode)) { > read_unlock(&pag->pag_ici_lock); > xfs_ilock(ip, lock_flags); > } else { > + /* leave it to reclaim */ > read_unlock(&pag->pag_ici_lock); > + continue; > } Might be betters as if (!igrab(inode)) { /* leave it to reclaim */ read_unlock(&pag->pag_ici_lock); continue; } read_unlock(&pag->pag_ici_lock); xfs_ilock(ip, lock_flags); which then also shows that we could stop doing the ilock at all for the DELWRI case, thas is after fixing the last caller doing SYNC_ATTR|SYNC_DELWRI. Well yeah, lots of things still to sort out in this area. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-15 2:29 ` Christoph Hellwig @ 2008-10-15 3:16 ` Dave Chinner 2008-10-15 3:24 ` Christoph Hellwig 0 siblings, 1 reply; 17+ messages in thread From: Dave Chinner @ 2008-10-15 3:16 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Peter Leckie, xfs On Tue, Oct 14, 2008 at 10:29:49PM -0400, Christoph Hellwig wrote: > > + * in reclaim. Leave it for the reclaim code to flush. > > */ > > if (igrab(inode)) { > > read_unlock(&pag->pag_ici_lock); > > xfs_ilock(ip, lock_flags); > > } else { > > + /* leave it to reclaim */ > > read_unlock(&pag->pag_ici_lock); > > + continue; > > } > > Might be betters as > > if (!igrab(inode)) { > /* leave it to reclaim */ > read_unlock(&pag->pag_ici_lock); > continue; > } > > read_unlock(&pag->pag_ici_lock); > xfs_ilock(ip, lock_flags); Yes, saner and more consistent with other code to do it that way. > which then also shows that we could stop doing the ilock at all for > the DELWRI case, thas is after fixing the last caller doing > SYNC_ATTR|SYNC_DELWRI. Well yeah, lots of things still to sort out in > this area. Here's an updated patch that takes into account all this. it removes a fair chunk of code now.... Cheers, Dave. -- Dave Chinner david@fromorbit.com XFS: avoid all reclaimable inodes in xfs_sync_inodes_ag If we are syncing data in xfs_sync_inodes_ag(), the VFS inode must still be referencable as the dirty data state is carried on the VFS inode. hence if we can't get a reference via igrab(), the inode must be in reclaim which implies that it has no dirty data attached. Leave such inodes to the reclaim code to flush the dirty inode state to disk and so avoid attempting to access the VFS inode when it may not exist in xfs_sync_inodes_ag(). Version 2: o change igrab logic to be more linear o remove initial reclaimable inode check now that we are using igrab() failure to detect inodes in reclaim o assert that igrab failure occurs only on reclaimable inodes o clean up inode locking - only grab the iolock if we are doing a SYNC_DELWRI call and we have a dirty inode. Signed-off-by: Dave Chinner <david@fromorbit.com> --- fs/xfs/linux-2.6/xfs_sync.c | 63 +++++++++--------------------------------- 1 files changed, 14 insertions(+), 49 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c index 08b2acf..d371423 100644 --- a/fs/xfs/linux-2.6/xfs_sync.c +++ b/fs/xfs/linux-2.6/xfs_sync.c @@ -63,25 +63,16 @@ xfs_sync_inodes_ag( int error = 0; int last_error = 0; int fflag = XFS_B_ASYNC; - int lock_flags = XFS_ILOCK_SHARED; if (flags & SYNC_DELWRI) fflag = XFS_B_DELWRI; if (flags & SYNC_WAIT) fflag = 0; /* synchronous overrides all */ - if (flags & SYNC_DELWRI) { - /* - * We need the I/O lock if we're going to call any of - * the flush/inval routines. - */ - lock_flags |= XFS_IOLOCK_SHARED; - } - do { struct inode *inode; - boolean_t inode_refed; xfs_inode_t *ip = NULL; + int lock_flags = XFS_ILOCK_SHARED; /* * use a gang lookup to find the next inode in the tree @@ -109,15 +100,6 @@ xfs_sync_inodes_ag( break; } - /* - * skip inodes in reclaim. Let xfs_syncsub do that for - * us so we don't need to worry. - */ - if (xfs_iflags_test(ip, (XFS_IRECLAIM|XFS_IRECLAIMABLE))) { - read_unlock(&pag->pag_ici_lock); - continue; - } - /* bad inodes are dealt with elsewhere */ inode = VFS_I(ip); if (is_bad_inode(inode)) { @@ -132,42 +114,29 @@ xfs_sync_inodes_ag( } /* - * If we can't get a reference on the VFS_I, the inode must be - * in reclaim. If we can get the inode lock without blocking, - * it is safe to flush the inode because we hold the tree lock - * and xfs_iextract will block right now. Hence if we lock the - * inode while holding the tree lock, xfs_ireclaim() is - * guaranteed to block on the inode lock we now hold and hence - * it is safe to reference the inode until we drop the inode - * locks completely. + * If we can't get a reference on the inode, it must be + * in reclaim. Leave it for the reclaim code to flush. */ - inode_refed = B_FALSE; - if (igrab(inode)) { - read_unlock(&pag->pag_ici_lock); - xfs_ilock(ip, lock_flags); - inode_refed = B_TRUE; - } else { - if (!xfs_ilock_nowait(ip, lock_flags)) { - /* leave it to reclaim */ - read_unlock(&pag->pag_ici_lock); - continue; - } + if (!igrab(inode)) { + ASSERT(xfs_iflags_test(ip, + (XFS_IRECLAIM|XFS_IRECLAIMABLE))); read_unlock(&pag->pag_ici_lock); + continue; } + read_unlock(&pag->pag_ici_lock); /* * If we have to flush data or wait for I/O completion - * we need to drop the ilock that we currently hold. - * If we need to drop the lock, insert a marker if we - * have not already done so. + * we need to hold the iolock. */ if ((flags & SYNC_DELWRI) && VN_DIRTY(inode)) { - xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_ilock(ip, XFS_IOLOCK_SHARED); + lock_flags |= XFS_IOLOCK_SHARED; error = xfs_flush_pages(ip, 0, -1, fflag, FI_NONE); if (flags & SYNC_IOWAIT) vn_iowait(ip); - xfs_ilock(ip, XFS_ILOCK_SHARED); } + xfs_ilock(ip, XFS_ILOCK_SHARED); if ((flags & SYNC_ATTR) && !xfs_inode_clean(ip)) { if (flags & SYNC_WAIT) { @@ -184,12 +153,8 @@ xfs_sync_inodes_ag( } } - if (lock_flags) - xfs_iunlock(ip, lock_flags); - - if (inode_refed) { - IRELE(ip); - } + xfs_iunlock(ip, lock_flags); + IRELE(ip); if (error) last_error = error; ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-15 3:16 ` Dave Chinner @ 2008-10-15 3:24 ` Christoph Hellwig 2008-10-15 3:51 ` Dave Chinner 0 siblings, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2008-10-15 3:24 UTC (permalink / raw) To: Christoph Hellwig, Peter Leckie, xfs On Wed, Oct 15, 2008 at 02:16:46PM +1100, Dave Chinner wrote: > + xfs_iunlock(ip, lock_flags); > + IRELE(ip); Aka xfs_iput(ip, lock_flags); ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-15 3:24 ` Christoph Hellwig @ 2008-10-15 3:51 ` Dave Chinner 2008-10-15 5:50 ` Peter Leckie 0 siblings, 1 reply; 17+ messages in thread From: Dave Chinner @ 2008-10-15 3:51 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Peter Leckie, xfs On Tue, Oct 14, 2008 at 11:24:31PM -0400, Christoph Hellwig wrote: > On Wed, Oct 15, 2008 at 02:16:46PM +1100, Dave Chinner wrote: > > + xfs_iunlock(ip, lock_flags); > > + IRELE(ip); > > Aka xfs_iput(ip, lock_flags); Update below. Cheers, Dave. -- Dave Chinner david@fromorbit.com XFS: avoid all reclaimable inodes in xfs_sync_inodes_ag If we are syncing data in xfs_sync_inodes_ag(), the VFS inode must still be referencable as the dirty data state is carried on the VFS inode. hence if we can't get a reference via igrab(), the inode must be in reclaim which implies that it has no dirty data attached. Leave such inodes to the reclaim code to flush the dirty inode state to disk and so avoid attempting to access the VFS inode when it may not exist in xfs_sync_inodes_ag(). Version 3: o converted unlock/rele to an xfs_iput() call. Version 2: o change igrab logic to be more linear o remove initial reclaimable inode check now that we are using igrab() failure to find reclaimable inodes o assert that igrab failure occurs only on reclaimable inodes o clean up inode locking - only grab the iolock if we are doing a SYNC_DELWRI call and we have a dirty inode. --- fs/xfs/linux-2.6/xfs_sync.c | 63 +++++++++---------------------------------- 1 files changed, 13 insertions(+), 50 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c index 08b2acf..8a4bb3c 100644 --- a/fs/xfs/linux-2.6/xfs_sync.c +++ b/fs/xfs/linux-2.6/xfs_sync.c @@ -63,25 +63,16 @@ xfs_sync_inodes_ag( int error = 0; int last_error = 0; int fflag = XFS_B_ASYNC; - int lock_flags = XFS_ILOCK_SHARED; if (flags & SYNC_DELWRI) fflag = XFS_B_DELWRI; if (flags & SYNC_WAIT) fflag = 0; /* synchronous overrides all */ - if (flags & SYNC_DELWRI) { - /* - * We need the I/O lock if we're going to call any of - * the flush/inval routines. - */ - lock_flags |= XFS_IOLOCK_SHARED; - } - do { struct inode *inode; - boolean_t inode_refed; xfs_inode_t *ip = NULL; + int lock_flags = XFS_ILOCK_SHARED; /* * use a gang lookup to find the next inode in the tree @@ -109,15 +100,6 @@ xfs_sync_inodes_ag( break; } - /* - * skip inodes in reclaim. Let xfs_syncsub do that for - * us so we don't need to worry. - */ - if (xfs_iflags_test(ip, (XFS_IRECLAIM|XFS_IRECLAIMABLE))) { - read_unlock(&pag->pag_ici_lock); - continue; - } - /* bad inodes are dealt with elsewhere */ inode = VFS_I(ip); if (is_bad_inode(inode)) { @@ -132,42 +114,29 @@ xfs_sync_inodes_ag( } /* - * If we can't get a reference on the VFS_I, the inode must be - * in reclaim. If we can get the inode lock without blocking, - * it is safe to flush the inode because we hold the tree lock - * and xfs_iextract will block right now. Hence if we lock the - * inode while holding the tree lock, xfs_ireclaim() is - * guaranteed to block on the inode lock we now hold and hence - * it is safe to reference the inode until we drop the inode - * locks completely. + * If we can't get a reference on the inode, it must be + * in reclaim. Leave it for the reclaim code to flush. */ - inode_refed = B_FALSE; - if (igrab(inode)) { - read_unlock(&pag->pag_ici_lock); - xfs_ilock(ip, lock_flags); - inode_refed = B_TRUE; - } else { - if (!xfs_ilock_nowait(ip, lock_flags)) { - /* leave it to reclaim */ - read_unlock(&pag->pag_ici_lock); - continue; - } + if (!igrab(inode)) { + ASSERT(xfs_iflags_test(ip, + (XFS_IRECLAIM|XFS_IRECLAIMABLE))); read_unlock(&pag->pag_ici_lock); + continue; } + read_unlock(&pag->pag_ici_lock); /* * If we have to flush data or wait for I/O completion - * we need to drop the ilock that we currently hold. - * If we need to drop the lock, insert a marker if we - * have not already done so. + * we need to hold the iolock. */ if ((flags & SYNC_DELWRI) && VN_DIRTY(inode)) { - xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_ilock(ip, XFS_IOLOCK_SHARED); + lock_flags |= XFS_IOLOCK_SHARED; error = xfs_flush_pages(ip, 0, -1, fflag, FI_NONE); if (flags & SYNC_IOWAIT) vn_iowait(ip); - xfs_ilock(ip, XFS_ILOCK_SHARED); } + xfs_ilock(ip, XFS_ILOCK_SHARED); if ((flags & SYNC_ATTR) && !xfs_inode_clean(ip)) { if (flags & SYNC_WAIT) { @@ -183,13 +152,7 @@ xfs_sync_inodes_ag( xfs_ifunlock(ip); } } - - if (lock_flags) - xfs_iunlock(ip, lock_flags); - - if (inode_refed) { - IRELE(ip); - } + xfs_iput(ip, lock_flags); if (error) last_error = error; ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-15 3:51 ` Dave Chinner @ 2008-10-15 5:50 ` Peter Leckie 2008-10-15 6:19 ` Dave Chinner 0 siblings, 1 reply; 17+ messages in thread From: Peter Leckie @ 2008-10-15 5:50 UTC (permalink / raw) To: Dave Chinner, Peter Leckie, xfs Dave Chinner wrote: > Update below. > > Cheers, > > Dave. > The original patch appeared to fix the issue, however the latest one Oops as follows: 177 553s ...BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8 IP: [<ffffffff802a406e>] is_bad_inode+0x2/0x11 PGD 7b0fe067 PUD 4c414067 PMD 0 Oops: 0000 [1] SMP Entering kdb (current=0xffff88007f15e190, pid 6915) on processor 1 Oops: <NULL> due to oops @ 0xffffffff802a406e r15 = 0x0000000000000040 r14 = 0xffff88007e9ae608 r13 = 0x0000000000000000 r12 = 0x000000000000000a bp = 0xffff8800375a8d68 bx = 0x0000000000000000 r11 = 0x0000000000000001 r10 = 0x0000000000000085 r9 = 0xffff880057cd3dc0 r8 = 0xffff88007b0cddf0 ax = 0x0000000000000000 cx = 0x000000000000001b dx = 0x0000000000000084 si = 0xffff880057d0c248 di = 0x0000000000000000 orig_ax = 0xffffffffffffffff ip = 0xffffffff802a406e cs = 0x0000000000000010 flags = 0x0000000000010246 sp = 0xffff88007b0cde10 ss = 0x0000000000000018 ®s = 0xffff88007b0cdd78 [1]kdb> bt Stack traceback for pid 6915 0xffff88007f15e190 6915 6735 1 1 R 0xffff88007f15e4f8 *bulkstat_unlink sp ip Function (args) 0xffff88007b0cddf8 0xffffffff802a406e is_bad_inode+0x2 (0x0) 0xffff88007b0cde20 0xffffffffa01f3615 [xfs]xfs_sync_inodes_ag+0xe6 (0xffff88007e9ae608, invalid, invalid) 0xffff88007b0cde80 0xffffffffa01f3864 [xfs]xfs_sync_inodes+0x63 (0xffff88007e9ae608, invalid) 0xffff88007b0cdec0 0xffffffffa01f392a [xfs]xfs_quiesce_data+0x13 (0xffff88007e9ae608) 0xffff88007b0cdee0 0xffffffffa01f1800 [xfs]xfs_fs_sync_super+0x2b (0xffff88007b878730) 0xffff88007b0cdf40 0xffffffff80292fd2 sync_filesystems+0xae (invalid) 0xffff88007b0cdf60 0xffffffff802af48b do_sync+0x2f (0x1) 0xffff88007b0cdf70 0xffffffff802af4c4 sys_sync+0xe bb_special_case: Invalid bb_reg_state.memory, missing trailing entries bb_special_case: on transfer to int_with_check Assuming system_call_fastpath is 'pass through' with 6 register parameters kdb_bb: 0xffffffff8020be0b [kernel]system_call_fastpath failed at 0xffffffff8020be98 Using old style backtrace, unreliable with no arguments sp ip Function (args) 0xffff88007b0cddf8 0xffffffff802a406e is_bad_inode+0x2 0xffff88007b0cde20 0xffffffffa01f3615 [xfs]xfs_sync_inodes_ag+0xe6 0xffff88007b0cde80 0xffffffffa01f3864 [xfs]xfs_sync_inodes+0x63 0xffff88007b0cdec0 0xffffffffa01f392a [xfs]xfs_quiesce_data+0x13 0xffff88007b0cdec8 0xffffffff802452b9 autoremove_wake_function 0xffff88007b0cdee0 0xffffffffa01f1800 [xfs]xfs_fs_sync_super+0x2b 0xffff88007b0cdf00 0xffffffff8043b871 __down_read+0x12 0xffff88007b0cdf10 0xffffffffa024d395 [ext3]ext3_sync_fs+0x46 0xffff88007b0cdf40 0xffffffff80292fd2 sync_filesystems+0xae 0xffff88007b0cdf60 0xffffffff802af48b do_sync+0x2f 0xffff88007b0cdf70 0xffffffff802af4c4 sys_sync+0xe [1]kdb> id %ip 0xffffffff802a406e is_bad_inode+0x2: cmpq $0xffffffff8045ce20,0xe8(%rdi) 0xffffffff802a4079 is_bad_inode+0xd: sete %al 0xffffffff802a407c is_bad_inode+0x10: retq 0xffffffff802a407d make_bad_inode: push %rbx 0xffffffff802a407e make_bad_inode+0x1: mov %rdi,%rbx 0xffffffff802a4081 make_bad_inode+0x4: callq 0xffffffff802a25d5 remove_inode_hash 0xffffffff802a4086 make_bad_inode+0x9: mov 0xf8(%rbx),%rdi 0xffffffff802a408d make_bad_inode+0x10: movw $0x8000,0xb2(%rbx) 0xffffffff802a4096 make_bad_inode+0x19: callq 0xffffffff80237d6b current_fs_time 0xffffffff802a409b make_bad_inode+0x1e: movq $0xffffffff8045ce20,0xe8(%rbx) 0xffffffff802a40a6 make_bad_inode+0x29: mov %rax,0x90(%rbx) 0xffffffff802a40ad make_bad_inode+0x30: mov %rdx,0x98(%rbx) 0xffffffff802a40b4 make_bad_inode+0x37: mov %rax,0x80(%rbx) 0xffffffff802a40bb make_bad_inode+0x3e: mov %rdx,0x88(%rbx) 0xffffffff802a40c2 make_bad_inode+0x45: mov %rax,0x70(%rbx) 0xffffffff802a40c6 make_bad_inode+0x49: mov %rdx,0x78(%rbx) Thanks, Pete ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-15 5:50 ` Peter Leckie @ 2008-10-15 6:19 ` Dave Chinner 2008-10-15 7:51 ` Peter Leckie 2008-10-22 6:19 ` Dave Chinner 0 siblings, 2 replies; 17+ messages in thread From: Dave Chinner @ 2008-10-15 6:19 UTC (permalink / raw) To: Peter Leckie; +Cc: xfs On Wed, Oct 15, 2008 at 03:50:48PM +1000, Peter Leckie wrote: > Dave Chinner wrote: >> Update below. >> >> Cheers, >> >> Dave. >> > The original patch appeared to fix the issue, however the latest one > Oops as follows: Well, I think the problem dhould be obvious - it's the same as the first report - deferencing the linux inode without first having a refernce on it. FWIW, if you apply the "combine linux/xfs inode" patch series, this failure will go away. Updated patch below. Cheers, Dave. -- Dave Chinner david@fromorbit.com XFS: avoid all reclaimable inodes in xfs_sync_inodes_ag If we are syncing data in xfs_sync_inodes_ag(), the VFS inode must still be referencable as the dirty data state is carried on the VFS inode. hence if we can't get a reference via igrab(), the inode must be in reclaim which implies that it has no dirty data attached. Leave such inodes to the reclaim code to flush the dirty inode state to disk and so avoid attempting to access the VFS inode when it may not exist in xfs_sync_inodes_ag(). Version 4: o don't reference liinux inode untiil after igrab() succeeds Version 3: o converted unlock/rele to an xfs_iput() call. Version 2: o change igrab logic to be more linear o remove initial reclaimable inode check now that we are using igrab() failure to find reclaimable inodes o assert that igrab failure occurs only on reclaimable inodes o clean up inode locking - only grab the iolock if we are doing a SYNC_DELWRI call and we have a dirty inode. --- fs/xfs/linux-2.6/xfs_sync.c | 75 ++++++++++-------------------------------- 1 files changed, 18 insertions(+), 57 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c index 08b2acf..39ed7f3 100644 --- a/fs/xfs/linux-2.6/xfs_sync.c +++ b/fs/xfs/linux-2.6/xfs_sync.c @@ -63,25 +63,16 @@ xfs_sync_inodes_ag( int error = 0; int last_error = 0; int fflag = XFS_B_ASYNC; - int lock_flags = XFS_ILOCK_SHARED; if (flags & SYNC_DELWRI) fflag = XFS_B_DELWRI; if (flags & SYNC_WAIT) fflag = 0; /* synchronous overrides all */ - if (flags & SYNC_DELWRI) { - /* - * We need the I/O lock if we're going to call any of - * the flush/inval routines. - */ - lock_flags |= XFS_IOLOCK_SHARED; - } - do { struct inode *inode; - boolean_t inode_refed; xfs_inode_t *ip = NULL; + int lock_flags = XFS_ILOCK_SHARED; /* * use a gang lookup to find the next inode in the tree @@ -109,22 +100,6 @@ xfs_sync_inodes_ag( break; } - /* - * skip inodes in reclaim. Let xfs_syncsub do that for - * us so we don't need to worry. - */ - if (xfs_iflags_test(ip, (XFS_IRECLAIM|XFS_IRECLAIMABLE))) { - read_unlock(&pag->pag_ici_lock); - continue; - } - - /* bad inodes are dealt with elsewhere */ - inode = VFS_I(ip); - if (is_bad_inode(inode)) { - read_unlock(&pag->pag_ici_lock); - continue; - } - /* nothing to sync during shutdown */ if (XFS_FORCED_SHUTDOWN(mp)) { read_unlock(&pag->pag_ici_lock); @@ -132,42 +107,34 @@ xfs_sync_inodes_ag( } /* - * If we can't get a reference on the VFS_I, the inode must be - * in reclaim. If we can get the inode lock without blocking, - * it is safe to flush the inode because we hold the tree lock - * and xfs_iextract will block right now. Hence if we lock the - * inode while holding the tree lock, xfs_ireclaim() is - * guaranteed to block on the inode lock we now hold and hence - * it is safe to reference the inode until we drop the inode - * locks completely. + * If we can't get a reference on the inode, it must be + * in reclaim. Leave it for the reclaim code to flush. */ - inode_refed = B_FALSE; - if (igrab(inode)) { - read_unlock(&pag->pag_ici_lock); - xfs_ilock(ip, lock_flags); - inode_refed = B_TRUE; - } else { - if (!xfs_ilock_nowait(ip, lock_flags)) { - /* leave it to reclaim */ - read_unlock(&pag->pag_ici_lock); - continue; - } + inode = VFS_I(ip); + if (!igrab(inode)) { read_unlock(&pag->pag_ici_lock); + continue; + } + read_unlock(&pag->pag_ici_lock); + + /* bad inodes are dealt with elsewhere */ + if (is_bad_inode(inode)) { + IRELE(ip); + continue; } /* * If we have to flush data or wait for I/O completion - * we need to drop the ilock that we currently hold. - * If we need to drop the lock, insert a marker if we - * have not already done so. + * we need to hold the iolock. */ if ((flags & SYNC_DELWRI) && VN_DIRTY(inode)) { - xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_ilock(ip, XFS_IOLOCK_SHARED); + lock_flags |= XFS_IOLOCK_SHARED; error = xfs_flush_pages(ip, 0, -1, fflag, FI_NONE); if (flags & SYNC_IOWAIT) vn_iowait(ip); - xfs_ilock(ip, XFS_ILOCK_SHARED); } + xfs_ilock(ip, XFS_ILOCK_SHARED); if ((flags & SYNC_ATTR) && !xfs_inode_clean(ip)) { if (flags & SYNC_WAIT) { @@ -183,13 +150,7 @@ xfs_sync_inodes_ag( xfs_ifunlock(ip); } } - - if (lock_flags) - xfs_iunlock(ip, lock_flags); - - if (inode_refed) { - IRELE(ip); - } + xfs_iput(ip, lock_flags); if (error) last_error = error; ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-15 6:19 ` Dave Chinner @ 2008-10-15 7:51 ` Peter Leckie 2008-10-16 2:43 ` Peter Leckie 2008-10-22 6:19 ` Dave Chinner 1 sibling, 1 reply; 17+ messages in thread From: Peter Leckie @ 2008-10-15 7:51 UTC (permalink / raw) To: Dave Chinner, xfs Dave Chinner wrote: > On Wed, Oct 15, 2008 at 03:50:48PM +1000, Peter Leckie wrote: > >> Dave Chinner wrote: >> >>> Update below. >>> >>> Cheers, >>> >>> Dave. >>> >>> >> The original patch appeared to fix the issue, however the latest one >> Oops as follows: >> > > Well, I think the problem dhould be obvious Gee thanks Dave. > - it's the same as > the first report - deferencing the linux inode without first having > a refernce on it. > Yes it resolves the issue. Pete ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-15 7:51 ` Peter Leckie @ 2008-10-16 2:43 ` Peter Leckie 2008-10-16 5:55 ` Dave Chinner 2008-10-16 9:00 ` Christoph Hellwig 0 siblings, 2 replies; 17+ messages in thread From: Peter Leckie @ 2008-10-16 2:43 UTC (permalink / raw) To: Dave Chinner, xfs >> - it's the same as >> the first report - deferencing the linux inode without first having >> a refernce on it. >> > > Yes it resolves the issue. I spoke to soon, Ooops as follows: <1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000208 <1>IP: [<ffffffff802a37de>] igrab+0x10/0x30 <4>PGD 6ac33067 PUD 7dda2067 PMD 0 <0>Oops: 0000 [1] SMP [0]kdb> bt Stack traceback for pid 5195 0xffff8800378e4c50 5195 5094 1 0 R 0xffff8800378e4fb8 *bulkstat_unlink sp ip Function (args) 0xffff88007d5abde8 0xffffffff802a37de igrab+0x10 (0x0) 0xffff88007d5abe20 0xffffffffa01f3623 [xfs]xfs_sync_inodes_ag+0xf4 (0xffff88003756e288, invalid, invalid) 0xffff88007d5abe80 0xffffffffa01f3853 [xfs]xfs_sync_inodes+0x63 (0xffff88003756e288, invalid) 0xffff88007d5abec0 0xffffffffa01f3919 [xfs]xfs_quiesce_data+0x13 (0xffff88003756e288) 0xffff88007d5abee0 0xffffffffa01f1800 [xfs]xfs_fs_sync_super+0x2b (0xffff88007f1c09c8) 0xffff88007d5abf40 0xffffffff80292fd2 sync_filesystems+0xae (invalid) 0xffff88007d5abf60 0xffffffff802af48b do_sync+0x2f (0x1) 0xffff88007d5abf70 0xffffffff802af4c4 sys_sync+0xe bb_special_case: Invalid bb_reg_state.memory, missing trailing entries bb_special_case: on transfer to int_with_check Assuming system_call_fastpath is 'pass through' with 6 register parameters kdb_bb: 0xffffffff8020be0b [kernel]system_call_fastpath failed at 0xffffffff8020be98 Using old style backtrace, unreliable with no arguments sp ip Function (args) 0xffff88007d5abde8 0xffffffff802a37de igrab+0x10 0xffff88007d5abe10 0xffffffff802a37de igrab+0x10 0xffff88007d5abe20 0xffffffffa01f3623 [xfs]xfs_sync_inodes_ag+0xf4 0xffff88007d5abe80 0xffffffffa01f3853 [xfs]xfs_sync_inodes+0x63 0xffff88007d5abec0 0xffffffffa01f3919 [xfs]xfs_quiesce_data+0x13 0xffff88007d5abec8 0xffffffff802452b9 autoremove_wake_function 0xffff88007d5abee0 0xffffffffa01f1800 [xfs]xfs_fs_sync_super+0x2b 0xffff88007d5abf00 0xffffffff8043b871 __down_read+0x12 0xffff88007d5abf10 0xffffffffa024d395 [ext3]ext3_sync_fs+0x46 0xffff88007d5abf40 0xffffffff80292fd2 sync_filesystems+0xae 0xffff88007d5abf60 0xffffffff802af48b do_sync+0x2f 0xffff88007d5abf70 0xffffffff802af4c4 sys_sync+0xe Adding the following resolved the issue however you may wish to solve it in another manner. @@ -102,7 +102,7 @@ xfs_sync_inodes_ag( * in reclaim. Leave it for the reclaim code to flush. */ inode = VFS_I(ip); - if (!igrab(inode)) { + if (!inode || !igrab(inode)) { read_unlock(&pag->pag_ici_lock); continue; } Thanks, Pete ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-16 2:43 ` Peter Leckie @ 2008-10-16 5:55 ` Dave Chinner 2008-10-16 9:00 ` Christoph Hellwig 1 sibling, 0 replies; 17+ messages in thread From: Dave Chinner @ 2008-10-16 5:55 UTC (permalink / raw) To: Peter Leckie; +Cc: xfs On Thu, Oct 16, 2008 at 12:43:10PM +1000, Peter Leckie wrote: > >>> - it's the same as >>> the first report - deferencing the linux inode without first having >>> a refernce on it. >>> >> >> Yes it resolves the issue. > > I spoke to soon, Ooops as follows: .... > Adding the following resolved the issue however you may wish to solve it > in another manner. > > @@ -102,7 +102,7 @@ xfs_sync_inodes_ag( > * in reclaim. Leave it for the reclaim code to flush. > */ > inode = VFS_I(ip); > - if (!igrab(inode)) { > + if (!inode || !igrab(inode)) { > read_unlock(&pag->pag_ici_lock); > continue; > } Yes, or you could simply apply the "combine linux/XFS inode" patch series and then VFS_I(ip) will never, ever return NULL. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-16 2:43 ` Peter Leckie 2008-10-16 5:55 ` Dave Chinner @ 2008-10-16 9:00 ` Christoph Hellwig 2008-10-17 1:01 ` Lachlan McIlroy 1 sibling, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2008-10-16 9:00 UTC (permalink / raw) To: Peter Leckie; +Cc: Dave Chinner, xfs On Thu, Oct 16, 2008 at 12:43:10PM +1000, Peter Leckie wrote: > Adding the following resolved the issue however you may wish to solve it > in another manner. > > @@ -102,7 +102,7 @@ xfs_sync_inodes_ag( > * in reclaim. Leave it for the reclaim code to flush. > */ > inode = VFS_I(ip); > - if (!igrab(inode)) { > + if (!inode || !igrab(inode)) { > read_unlock(&pag->pag_ici_lock); > continue; > } > This would be the correct fix for the current code, but can you please put in the Inode/XFS Inode unification patches? At least I have only QAed the whole patchkit, and from the issues here it seems like Dave did the same. And with that it would also be very good if there was any chance Dave and me (and all others) could actually access the current tree. The ptools -> CVS export has been broken for mor than two days, and there is no uptodate git tree eiter, so us external developers are completely tapping in the dark vs the current tree. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-16 9:00 ` Christoph Hellwig @ 2008-10-17 1:01 ` Lachlan McIlroy 0 siblings, 0 replies; 17+ messages in thread From: Lachlan McIlroy @ 2008-10-17 1:01 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Peter Leckie, Dave Chinner, xfs Christoph Hellwig wrote: > On Thu, Oct 16, 2008 at 12:43:10PM +1000, Peter Leckie wrote: >> Adding the following resolved the issue however you may wish to solve it >> in another manner. >> >> @@ -102,7 +102,7 @@ xfs_sync_inodes_ag( >> * in reclaim. Leave it for the reclaim code to flush. >> */ >> inode = VFS_I(ip); >> - if (!igrab(inode)) { >> + if (!inode || !igrab(inode)) { >> read_unlock(&pag->pag_ici_lock); >> continue; >> } >> > > This would be the correct fix for the current code, but can you please > put in the Inode/XFS Inode unification patches? At least I have only > QAed the whole patchkit, and from the issues here it seems like Dave did > the same. That patchset is already in. > > And with that it would also be very good if there was any chance Dave > and me (and all others) could actually access the current tree. The > ptools -> CVS export has been broken for mor than two days, and there > is no uptodate git tree eiter, so us external developers are completely > tapping in the dark vs the current tree. We had no idea the CVS tree was not updating. I'll get the OSS trees updated today. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-15 6:19 ` Dave Chinner 2008-10-15 7:51 ` Peter Leckie @ 2008-10-22 6:19 ` Dave Chinner 2008-10-23 4:23 ` Peter Leckie 1 sibling, 1 reply; 17+ messages in thread From: Dave Chinner @ 2008-10-22 6:19 UTC (permalink / raw) To: Peter Leckie, xfs Ping? Can this patch be applied to the new sync code before it is pushed to Linus? Cheers, Dave. On Wed, Oct 15, 2008 at 05:19:17PM +1100, Dave Chinner wrote: > On Wed, Oct 15, 2008 at 03:50:48PM +1000, Peter Leckie wrote: > > Dave Chinner wrote: > >> Update below. > >> > >> Cheers, > >> > >> Dave. > >> > > The original patch appeared to fix the issue, however the latest one > > Oops as follows: > > Well, I think the problem dhould be obvious - it's the same as > the first report - deferencing the linux inode without first having > a refernce on it. > > FWIW, if you apply the "combine linux/xfs inode" patch series, > this failure will go away. > > Updated patch below. > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > > XFS: avoid all reclaimable inodes in xfs_sync_inodes_ag > > If we are syncing data in xfs_sync_inodes_ag(), the VFS > inode must still be referencable as the dirty data state > is carried on the VFS inode. hence if we can't get a > reference via igrab(), the inode must be in reclaim which > implies that it has no dirty data attached. > > Leave such inodes to the reclaim code to flush the dirty > inode state to disk and so avoid attempting to access the > VFS inode when it may not exist in xfs_sync_inodes_ag(). > > Version 4: > o don't reference liinux inode untiil after igrab() succeeds > > Version 3: > o converted unlock/rele to an xfs_iput() call. > > Version 2: > o change igrab logic to be more linear > o remove initial reclaimable inode check now that we are using > igrab() failure to find reclaimable inodes > o assert that igrab failure occurs only on reclaimable inodes > o clean up inode locking - only grab the iolock if we are doing > a SYNC_DELWRI call and we have a dirty inode. > --- > fs/xfs/linux-2.6/xfs_sync.c | 75 ++++++++++-------------------------------- > 1 files changed, 18 insertions(+), 57 deletions(-) > > diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c > index 08b2acf..39ed7f3 100644 > --- a/fs/xfs/linux-2.6/xfs_sync.c > +++ b/fs/xfs/linux-2.6/xfs_sync.c > @@ -63,25 +63,16 @@ xfs_sync_inodes_ag( > int error = 0; > int last_error = 0; > int fflag = XFS_B_ASYNC; > - int lock_flags = XFS_ILOCK_SHARED; > > if (flags & SYNC_DELWRI) > fflag = XFS_B_DELWRI; > if (flags & SYNC_WAIT) > fflag = 0; /* synchronous overrides all */ > > - if (flags & SYNC_DELWRI) { > - /* > - * We need the I/O lock if we're going to call any of > - * the flush/inval routines. > - */ > - lock_flags |= XFS_IOLOCK_SHARED; > - } > - > do { > struct inode *inode; > - boolean_t inode_refed; > xfs_inode_t *ip = NULL; > + int lock_flags = XFS_ILOCK_SHARED; > > /* > * use a gang lookup to find the next inode in the tree > @@ -109,22 +100,6 @@ xfs_sync_inodes_ag( > break; > } > > - /* > - * skip inodes in reclaim. Let xfs_syncsub do that for > - * us so we don't need to worry. > - */ > - if (xfs_iflags_test(ip, (XFS_IRECLAIM|XFS_IRECLAIMABLE))) { > - read_unlock(&pag->pag_ici_lock); > - continue; > - } > - > - /* bad inodes are dealt with elsewhere */ > - inode = VFS_I(ip); > - if (is_bad_inode(inode)) { > - read_unlock(&pag->pag_ici_lock); > - continue; > - } > - > /* nothing to sync during shutdown */ > if (XFS_FORCED_SHUTDOWN(mp)) { > read_unlock(&pag->pag_ici_lock); > @@ -132,42 +107,34 @@ xfs_sync_inodes_ag( > } > > /* > - * If we can't get a reference on the VFS_I, the inode must be > - * in reclaim. If we can get the inode lock without blocking, > - * it is safe to flush the inode because we hold the tree lock > - * and xfs_iextract will block right now. Hence if we lock the > - * inode while holding the tree lock, xfs_ireclaim() is > - * guaranteed to block on the inode lock we now hold and hence > - * it is safe to reference the inode until we drop the inode > - * locks completely. > + * If we can't get a reference on the inode, it must be > + * in reclaim. Leave it for the reclaim code to flush. > */ > - inode_refed = B_FALSE; > - if (igrab(inode)) { > - read_unlock(&pag->pag_ici_lock); > - xfs_ilock(ip, lock_flags); > - inode_refed = B_TRUE; > - } else { > - if (!xfs_ilock_nowait(ip, lock_flags)) { > - /* leave it to reclaim */ > - read_unlock(&pag->pag_ici_lock); > - continue; > - } > + inode = VFS_I(ip); > + if (!igrab(inode)) { > read_unlock(&pag->pag_ici_lock); > + continue; > + } > + read_unlock(&pag->pag_ici_lock); > + > + /* bad inodes are dealt with elsewhere */ > + if (is_bad_inode(inode)) { > + IRELE(ip); > + continue; > } > > /* > * If we have to flush data or wait for I/O completion > - * we need to drop the ilock that we currently hold. > - * If we need to drop the lock, insert a marker if we > - * have not already done so. > + * we need to hold the iolock. > */ > if ((flags & SYNC_DELWRI) && VN_DIRTY(inode)) { > - xfs_iunlock(ip, XFS_ILOCK_SHARED); > + xfs_ilock(ip, XFS_IOLOCK_SHARED); > + lock_flags |= XFS_IOLOCK_SHARED; > error = xfs_flush_pages(ip, 0, -1, fflag, FI_NONE); > if (flags & SYNC_IOWAIT) > vn_iowait(ip); > - xfs_ilock(ip, XFS_ILOCK_SHARED); > } > + xfs_ilock(ip, XFS_ILOCK_SHARED); > > if ((flags & SYNC_ATTR) && !xfs_inode_clean(ip)) { > if (flags & SYNC_WAIT) { > @@ -183,13 +150,7 @@ xfs_sync_inodes_ag( > xfs_ifunlock(ip); > } > } > - > - if (lock_flags) > - xfs_iunlock(ip, lock_flags); > - > - if (inode_refed) { > - IRELE(ip); > - } > + xfs_iput(ip, lock_flags); > > if (error) > last_error = error; > > > -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-22 6:19 ` Dave Chinner @ 2008-10-23 4:23 ` Peter Leckie 2008-10-23 5:39 ` Dave Chinner 0 siblings, 1 reply; 17+ messages in thread From: Peter Leckie @ 2008-10-23 4:23 UTC (permalink / raw) To: Dave Chinner, xfs Hey Dave I'll push it now, I assume it's ok to go in with out the check for a inode first as the combine linux/XFS inode patch is in. Thanks, Pete Dave Chinner wrote: > Ping? > > Can this patch be applied to the new sync code before it is pushed > to Linus? > > Cheers, > > Dave. > > On Wed, Oct 15, 2008 at 05:19:17PM +1100, Dave Chinner wrote: > >> On Wed, Oct 15, 2008 at 03:50:48PM +1000, Peter Leckie wrote: >> >>> Dave Chinner wrote: >>> >>>> Update below. >>>> >>>> Cheers, >>>> >>>> Dave. >>>> >>>> >>> The original patch appeared to fix the issue, however the latest one >>> Oops as follows: >>> >> Well, I think the problem dhould be obvious - it's the same as >> the first report - deferencing the linux inode without first having >> a refernce on it. >> >> FWIW, if you apply the "combine linux/xfs inode" patch series, >> this failure will go away. >> >> Updated patch below. >> >> Cheers, >> >> Dave. >> -- >> Dave Chinner >> david@fromorbit.com >> >> XFS: avoid all reclaimable inodes in xfs_sync_inodes_ag >> >> If we are syncing data in xfs_sync_inodes_ag(), the VFS >> inode must still be referencable as the dirty data state >> is carried on the VFS inode. hence if we can't get a >> reference via igrab(), the inode must be in reclaim which >> implies that it has no dirty data attached. >> >> Leave such inodes to the reclaim code to flush the dirty >> inode state to disk and so avoid attempting to access the >> VFS inode when it may not exist in xfs_sync_inodes_ag(). >> >> Version 4: >> o don't reference liinux inode untiil after igrab() succeeds >> >> Version 3: >> o converted unlock/rele to an xfs_iput() call. >> >> Version 2: >> o change igrab logic to be more linear >> o remove initial reclaimable inode check now that we are using >> igrab() failure to find reclaimable inodes >> o assert that igrab failure occurs only on reclaimable inodes >> o clean up inode locking - only grab the iolock if we are doing >> a SYNC_DELWRI call and we have a dirty inode. >> --- >> fs/xfs/linux-2.6/xfs_sync.c | 75 ++++++++++-------------------------------- >> 1 files changed, 18 insertions(+), 57 deletions(-) >> >> diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c >> index 08b2acf..39ed7f3 100644 >> --- a/fs/xfs/linux-2.6/xfs_sync.c >> +++ b/fs/xfs/linux-2.6/xfs_sync.c >> @@ -63,25 +63,16 @@ xfs_sync_inodes_ag( >> int error = 0; >> int last_error = 0; >> int fflag = XFS_B_ASYNC; >> - int lock_flags = XFS_ILOCK_SHARED; >> >> if (flags & SYNC_DELWRI) >> fflag = XFS_B_DELWRI; >> if (flags & SYNC_WAIT) >> fflag = 0; /* synchronous overrides all */ >> >> - if (flags & SYNC_DELWRI) { >> - /* >> - * We need the I/O lock if we're going to call any of >> - * the flush/inval routines. >> - */ >> - lock_flags |= XFS_IOLOCK_SHARED; >> - } >> - >> do { >> struct inode *inode; >> - boolean_t inode_refed; >> xfs_inode_t *ip = NULL; >> + int lock_flags = XFS_ILOCK_SHARED; >> >> /* >> * use a gang lookup to find the next inode in the tree >> @@ -109,22 +100,6 @@ xfs_sync_inodes_ag( >> break; >> } >> >> - /* >> - * skip inodes in reclaim. Let xfs_syncsub do that for >> - * us so we don't need to worry. >> - */ >> - if (xfs_iflags_test(ip, (XFS_IRECLAIM|XFS_IRECLAIMABLE))) { >> - read_unlock(&pag->pag_ici_lock); >> - continue; >> - } >> - >> - /* bad inodes are dealt with elsewhere */ >> - inode = VFS_I(ip); >> - if (is_bad_inode(inode)) { >> - read_unlock(&pag->pag_ici_lock); >> - continue; >> - } >> - >> /* nothing to sync during shutdown */ >> if (XFS_FORCED_SHUTDOWN(mp)) { >> read_unlock(&pag->pag_ici_lock); >> @@ -132,42 +107,34 @@ xfs_sync_inodes_ag( >> } >> >> /* >> - * If we can't get a reference on the VFS_I, the inode must be >> - * in reclaim. If we can get the inode lock without blocking, >> - * it is safe to flush the inode because we hold the tree lock >> - * and xfs_iextract will block right now. Hence if we lock the >> - * inode while holding the tree lock, xfs_ireclaim() is >> - * guaranteed to block on the inode lock we now hold and hence >> - * it is safe to reference the inode until we drop the inode >> - * locks completely. >> + * If we can't get a reference on the inode, it must be >> + * in reclaim. Leave it for the reclaim code to flush. >> */ >> - inode_refed = B_FALSE; >> - if (igrab(inode)) { >> - read_unlock(&pag->pag_ici_lock); >> - xfs_ilock(ip, lock_flags); >> - inode_refed = B_TRUE; >> - } else { >> - if (!xfs_ilock_nowait(ip, lock_flags)) { >> - /* leave it to reclaim */ >> - read_unlock(&pag->pag_ici_lock); >> - continue; >> - } >> + inode = VFS_I(ip); >> + if (!igrab(inode)) { >> read_unlock(&pag->pag_ici_lock); >> + continue; >> + } >> + read_unlock(&pag->pag_ici_lock); >> + >> + /* bad inodes are dealt with elsewhere */ >> + if (is_bad_inode(inode)) { >> + IRELE(ip); >> + continue; >> } >> >> /* >> * If we have to flush data or wait for I/O completion >> - * we need to drop the ilock that we currently hold. >> - * If we need to drop the lock, insert a marker if we >> - * have not already done so. >> + * we need to hold the iolock. >> */ >> if ((flags & SYNC_DELWRI) && VN_DIRTY(inode)) { >> - xfs_iunlock(ip, XFS_ILOCK_SHARED); >> + xfs_ilock(ip, XFS_IOLOCK_SHARED); >> + lock_flags |= XFS_IOLOCK_SHARED; >> error = xfs_flush_pages(ip, 0, -1, fflag, FI_NONE); >> if (flags & SYNC_IOWAIT) >> vn_iowait(ip); >> - xfs_ilock(ip, XFS_ILOCK_SHARED); >> } >> + xfs_ilock(ip, XFS_ILOCK_SHARED); >> >> if ((flags & SYNC_ATTR) && !xfs_inode_clean(ip)) { >> if (flags & SYNC_WAIT) { >> @@ -183,13 +150,7 @@ xfs_sync_inodes_ag( >> xfs_ifunlock(ip); >> } >> } >> - >> - if (lock_flags) >> - xfs_iunlock(ip, lock_flags); >> - >> - if (inode_refed) { >> - IRELE(ip); >> - } >> + xfs_iput(ip, lock_flags); >> >> if (error) >> last_error = error; >> >> >> >> > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-23 4:23 ` Peter Leckie @ 2008-10-23 5:39 ` Dave Chinner 0 siblings, 0 replies; 17+ messages in thread From: Dave Chinner @ 2008-10-23 5:39 UTC (permalink / raw) To: Peter Leckie; +Cc: xfs On Thu, Oct 23, 2008 at 02:23:59PM +1000, Peter Leckie wrote: > Hey Dave I'll push it now, I assume it's ok to go in with out the check > for a inode first > as the combine linux/XFS inode patch is in. Yes, it's safe to go in as it stands. Thanks. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: crash with latest code drop. 2008-10-15 1:18 ` Dave Chinner 2008-10-15 2:29 ` Christoph Hellwig @ 2008-10-15 3:47 ` Peter Leckie 1 sibling, 0 replies; 17+ messages in thread From: Peter Leckie @ 2008-10-15 3:47 UTC (permalink / raw) To: Peter Leckie, xfs Dave Chinner wrote: > Can you confirm that the xfs_inode has either the I_RECLAIM or > I_RECLAIMABLE flag set on it when the panic occurred? If this > is the case, then the patch below will probably fix it. > Yep XFS_IRECLAIMABLE is set, I'll give the patch a test and get back to you. Thanks, Pete ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2008-10-23 5:37 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-10-15 1:49 crash with latest code drop Peter Leckie 2008-10-15 1:18 ` Dave Chinner 2008-10-15 2:29 ` Christoph Hellwig 2008-10-15 3:16 ` Dave Chinner 2008-10-15 3:24 ` Christoph Hellwig 2008-10-15 3:51 ` Dave Chinner 2008-10-15 5:50 ` Peter Leckie 2008-10-15 6:19 ` Dave Chinner 2008-10-15 7:51 ` Peter Leckie 2008-10-16 2:43 ` Peter Leckie 2008-10-16 5:55 ` Dave Chinner 2008-10-16 9:00 ` Christoph Hellwig 2008-10-17 1:01 ` Lachlan McIlroy 2008-10-22 6:19 ` Dave Chinner 2008-10-23 4:23 ` Peter Leckie 2008-10-23 5:39 ` Dave Chinner 2008-10-15 3:47 ` Peter Leckie
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox