From: Shyam Kaushik <shyam@zadarastorage.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Alex Lyakas <alex@zadarastorage.com>, xfs@oss.sgi.com
Subject: RE: XFS hung task in xfs_ail_push_all_sync() when unmounting FS after disk failure/recovery
Date: Tue, 22 Mar 2016 18:31:48 +0530 [thread overview]
Message-ID: <6457b1d9de271ec6cca6bc2626aac161@mail.gmail.com> (raw)
In-Reply-To: <20160322121922.GA53693@bfoster.bfoster>
Hi Brian,
Thanks for your quick reply. I repeated the test & trace-pipe is
constantly filled with this:
xfsaild/dm-10-3335 [003] ...2 24890.546491: xfs_ilock_nowait: dev
253:10 ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 24890.546492: xfs_iunlock: dev 253:10
ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 24890.546493: xfs_ail_flushing: dev
253:10 lip 0xffff8800a9f437b8 lsn 1/38624 type XFS_LI_INODE flags IN_AIL
xfsaild/dm-10-3335 [003] ...2 24890.596491: xfs_ilock_nowait: dev
253:10 ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 24890.596492: xfs_iunlock: dev 253:10
ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 24890.596494: xfs_ail_flushing: dev
253:10 lip 0xffff8800a9f437b8 lsn 1/38624 type XFS_LI_INODE flags IN_AIL
xfsaild/dm-10-3335 [003] ...2 24890.646497: xfs_ilock_nowait: dev
253:10 ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 24890.646498: xfs_iunlock: dev 253:10
ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 24890.646500: xfs_ail_flushing: dev
253:10 lip 0xffff8800a9f437b8 lsn 1/38624 type XFS_LI_INODE flags IN_AIL
xfsaild/dm-10-3335 [003] ...2 24890.696467: xfs_ilock_nowait: dev
253:10 ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 24890.696468: xfs_iunlock: dev 253:10
ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 24890.696468: xfs_ail_flushing: dev
253:10 lip 0xffff8800a9f437b8 lsn 1/38624 type XFS_LI_INODE flags IN_AIL
xfsaild/dm-10-3335 [003] ...2 24890.746548: xfs_ilock_nowait: dev
253:10 ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 24890.746550: xfs_iunlock: dev 253:10
ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 24890.746550: xfs_ail_flushing: dev
253:10 lip 0xffff8800a9f437b8 lsn 1/38624 type XFS_LI_INODE flags IN_AIL
xfsaild/dm-10-3335 [003] ...2 24890.796479: xfs_ilock_nowait: dev
253:10 ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 24890.796480: xfs_iunlock: dev 253:10
ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 24890.796480: xfs_ail_flushing: dev
253:10 lip 0xffff8800a9f437b8 lsn 1/38624 type XFS_LI_INODE flags IN_AIL
xfsaild/dm-10-3335 [003] ...2 24890.846467: xfs_ilock_nowait: dev
253:10 ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 24890.846468: xfs_iunlock: dev 253:10
ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
while regular activity seems to happen on other inodes/kworker threads
kworker/u8:4-27691 [001] ...1 24895.811474: xfs_writepage: dev 253:10
ino 0x1801061 pgoff 0x29000 size 0x1aebbc offset 0 length 0 delalloc 1
unwritten 0
kworker/u8:4-27691 [001] ...1 24895.811477: xfs_invalidatepage: dev
253:10 ino 0x1801061 pgoff 0x29000 size 0x1aebbc offset 0 length 1000
delalloc 1 unwritten 0
kworker/u8:4-27691 [001] ...1 24895.811478: xfs_releasepage: dev
253:10 ino 0x1801061 pgoff 0x29000 size 0x1aebbc offset 0 length 0
delalloc 0 unwritten 0
kworker/u8:4-27691 [001] ...1 24895.811482: xfs_writepage: dev 253:10
ino 0x4017bdf pgoff 0x29000 size 0x1aebbc offset 0 length 0 delalloc 1
unwritten 0
kworker/u8:4-27691 [001] ...1 24895.811482: xfs_invalidatepage: dev
253:10 ino 0x4017bdf pgoff 0x29000 size 0x1aebbc offset 0 length 1000
delalloc 1 unwritten 0
kworker/u8:4-27691 [001] ...1 24895.811483: xfs_releasepage: dev
253:10 ino 0x4017bdf pgoff 0x29000 size 0x1aebbc offset 0 length 0
delalloc 0 unwritten 0
kworker/u8:4-27691 [001] ...1 24895.811485: xfs_writepage: dev 253:10
ino 0x68048c3 pgoff 0x29000 size 0x1aebbc offset 0 length 0 delalloc 1
unwritten 0
kworker/u8:4-27691 [001] ...1 24895.811486: xfs_invalidatepage: dev
253:10 ino 0x68048c3 pgoff 0x29000 size 0x1aebbc offset 0 length 1000
delalloc 1 unwritten 0
kworker/u8:4-27691 [001] ...1 24895.811486: xfs_releasepage: dev
253:10 ino 0x68048c3 pgoff 0x29000 size 0x1aebbc offset 0 length 0
delalloc 0 unwritten 0
kworker/u8:4-27691 [001] ...1 24895.812381: xfs_writepage: dev 253:10
ino 0x1805e37 pgoff 0x29000 size 0x68470 offset 0 length 0 delalloc 1
unwritten 0
kworker/u8:4-27691 [001] ...1 24895.812382: xfs_invalidatepage: dev
253:10 ino 0x1805e37 pgoff 0x29000 size 0x68470 offset 0 length 1000
delalloc 1 unwritten 0
kworker/u8:4-27691 [001] ...1 24895.812382: xfs_releasepage: dev
253:10 ino 0x1805e37 pgoff 0x29000 size 0x68470 offset 0 length 0 delalloc
0 unwritten 0
kworker/u8:4-27691 [001] ...1 24895.812385: xfs_writepage: dev 253:10
ino 0x4019c95 pgoff 0x29000 size 0x68470 offset 0 length 0 delalloc 1
unwritten 0
kworker/u8:4-27691 [001] ...1 24895.812385: xfs_invalidatepage: dev
253:10 ino 0x4019c95 pgoff 0x29000 size 0x68470 offset 0 length 1000
delalloc 1 unwritten 0
looks like xfsaild is not able to take lock until hung-task timeout kicks
in
xfsaild/dm-10-3335 [003] ...2 25247.649468: xfs_ilock_nowait: dev
253:10 ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 25247.649469: xfs_iunlock: dev 253:10
ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 25247.649469: xfs_ail_flushing: dev
253:10 lip 0xffff8800a9f437b8 lsn 1/38624 type XFS_LI_INODE flags IN_AIL
xfsaild/dm-10-3335 [003] ...2 25247.699478: xfs_ilock_nowait: dev
253:10 ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 25247.699516: xfs_iunlock: dev 253:10
ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 25247.699517: xfs_ail_flushing: dev
253:10 lip 0xffff8800a9f437b8 lsn 1/38624 type XFS_LI_INODE flags IN_AIL
xfsaild/dm-10-3335 [003] ...2 25247.749471: xfs_ilock_nowait: dev
253:10 ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 25247.749478: xfs_iunlock: dev 253:10
ino 0xc0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
xfsaild/dm-10-3335 [003] ...2 25247.749479: xfs_ail_flushing: dev
253:10 lip 0xffff8800a9f437b8 lsn 1/38624 type XFS_LI_INODE flags IN_AIL
Please let me know how to debug this further. Thanks.
--Shyam
-----Original Message-----
From: Brian Foster [mailto:bfoster@redhat.com]
Sent: 22 March 2016 17:49
To: Shyam Kaushik
Cc: xfs@oss.sgi.com; Alex Lyakas
Subject: Re: XFS hung task in xfs_ail_push_all_sync() when unmounting FS
after disk failure/recovery
On Tue, Mar 22, 2016 at 04:51:39PM +0530, Shyam Kaushik wrote:
> Hi XFS developers,
>
> We are seeing the following issue with XFS on kernel 3.18.19.
>
> We have XFS mounted over a raw disk. Disk was pulled out manually. There
> were async writes on files that were errored like this
>
...
>
> And XFS hit metadata & Log IO errors that it decides to shutdown:
>
> Mar 16 16:03:22 host0 kernel: [ 4637.351841] XFS (dm-29): metadata I/O
> error: block 0x3a27fbd0 ("xlog_iodone") error 5 numblks 64
> Mar 16 16:03:22 host0 kernel: [ 4637.352820] XFS(dm-29): SHUTDOWN!!!
> old_flags=0x0 new_flags=0x2
> Mar 16 16:03:22 host0 kernel: [ 4637.353187] XFS (dm-29): Log I/O Error
> Detected. Shutting down filesystem
...
> Later the drive was re-inserted back. After the drive was re-inserted,
XFS
> was attempted to be unmounted
>
> Mar 16 16:16:53 host0 controld: [2557] [ ] umount[202]
> : umount(/sdisk/vol5b0, xfs)
>
> But nothing happens except for the 30-secs xfs_log_force errors that
keeps
> repeating
>
...
>
> This problem doesn't happen consistently, but happens periodically with
a
> drive failure/recovery followed by XFS unmount. I couldn't find this
issue
> fixed in later kernels. Can you please suggest how I can debug this
issue
> further?
>
Similar problems have been reproduced due to racy/incorrect EFI/EFD
object tracking, which are internal data structures associated with
freeing extents.
What happens if you enable tracepoints while the fs is in this hung
unmount state?
# trace-cmd start -e "xfs:*"
# cat /sys/kernel/debug/tracing/trace_pipe
Brian
> Thanks!
>
> --Shyam
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2016-03-22 13:01 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-22 11:21 XFS hung task in xfs_ail_push_all_sync() when unmounting FS after disk failure/recovery Shyam Kaushik
2016-03-22 12:19 ` Brian Foster
2016-03-22 13:01 ` Shyam Kaushik [this message]
2016-03-22 14:03 ` Brian Foster
2016-03-22 15:38 ` Carlos Maiolino
2016-03-22 15:56 ` Carlos Maiolino
2016-03-23 9:43 ` Shyam Kaushik
2016-03-23 12:30 ` Brian Foster
2016-03-23 15:32 ` Carlos Maiolino
2016-03-23 22:37 ` Dave Chinner
2016-03-24 11:08 ` Carlos Maiolino
2016-03-24 16:52 ` Carlos Maiolino
2016-03-24 21:56 ` Dave Chinner
2016-04-01 12:31 ` Carlos Maiolino
2016-03-23 9:52 ` Shyam Kaushik
2016-03-24 13:38 ` Shyam Kaushik
2016-04-08 10:51 ` Shyam Kaushik
2016-04-08 13:16 ` Brian Foster
2016-04-08 13:35 ` Shyam Kaushik
2016-04-08 14:31 ` Carlos Maiolino
2016-04-08 17:48 ` Shyam Kaushik
2016-04-08 19:00 ` Brian Foster
2016-04-08 17:51 ` Shyam Kaushik
2016-04-08 22:46 ` Dave Chinner
2016-04-10 18:40 ` Alex Lyakas
2016-04-11 1:21 ` Dave Chinner
2016-04-11 14:52 ` Shyam Kaushik
2016-04-11 22:47 ` Dave Chinner
2016-04-12 5:20 ` Shyam Kaushik
2016-04-12 6:59 ` Shyam Kaushik
2016-04-12 8:19 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6457b1d9de271ec6cca6bc2626aac161@mail.gmail.com \
--to=shyam@zadarastorage.com \
--cc=alex@zadarastorage.com \
--cc=bfoster@redhat.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.