From: John Quigley <jquigley@jquigley.com>
To: XFS Development <xfs@oss.sgi.com>
Subject: File system corruption
Date: Thu, 16 Jul 2009 13:08:12 -0500
Message-ID: <4A5F6C8C.609@jquigley.com>
Hey Folks:
I'm periodically encountering an issue with XFS that you might be interested in. It shows up on a CentOS Linux machine (custom 2.6.28.7 kernel) that serves the XFS mount point in question with the standard Linux nfsd. The XFS file system lives on an LVM device in a striping configuration (2-wide stripe), with two iSCSI volumes acting as the constituent physical volumes. This configuration is somewhat baroque, I know.
I'm experiencing periodic file system corruption, which manifests as the XFS file system going offline and refusing subsequent mounts. The only way to recover has been to run xfs_repair -L, which has resulted in data loss on each occasion, as expected.
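To make the storage layout concrete, here is a minimal sketch of how a byte offset on a 2-wide striped logical volume maps onto its two physical volumes. The 64 KiB stripe size and the function name stripe_location are assumptions for illustration only; the report does not state the stripe size actually in use.

```python
def stripe_location(logical_offset, stripe_size=64 * 1024, stripes=2):
    """Map a byte offset on a striped volume to (device index, device offset).

    stripe_size and stripes are illustrative defaults, not values taken
    from the report (only the 2-wide stripe is stated there).
    """
    # Which stripe-sized chunk the offset falls in, and the position inside it.
    chunk, within = divmod(logical_offset, stripe_size)
    # Chunks are dealt round-robin across the devices, one row at a time.
    row, device = divmod(chunk, stripes)
    return device, row * stripe_size + within


if __name__ == "__main__":
    for offset in (0, 64 * 1024, 128 * 1024 + 10):
        print(offset, stripe_location(offset))
```

Under these assumptions, consecutive 64 KiB chunks alternate between the two iSCSI volumes, so any large metadata region is spread over both of them.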
Now, here's what I witness in the system logs:
<snip>
kernel: XFS: bad magic number
kernel: XFS: SB validate failed
kernel: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: Filesystem "dm-0": XFS internal error xfs_ialloc_read_agi at line 1408 of file fs/xfs/xfs_ialloc.c. Caller 0xffffffff8118711a
kernel: Pid: 3842, comm: nfsd Not tainted 2.6.28.7.cs.8 #3
kernel: Call Trace:
kernel: [<ffffffff8118711a>] xfs_ialloc_ag_select+0x22a/0x320
kernel: [<ffffffff81186481>] xfs_ialloc_read_agi+0xe1/0x140
kernel: [<ffffffff8118711a>] xfs_ialloc_ag_select+0x22a/0x320
kernel: [<ffffffff811f5bfd>] swiotlb_map_single_attrs+0x1d/0xf0
kernel: [<ffffffff8118711a>] xfs_ialloc_ag_select+0x22a/0x320
kernel: [<ffffffff81187bfc>] xfs_dialloc+0x31c/0xa90
kernel: [<ffffffff81076be5>] __alloc_pages_internal+0xf5/0x4f0
kernel: [<ffffffff8109ac46>] cache_alloc_refill+0x96/0x5a0
kernel: [<ffffffff8119012f>] xfs_ialloc+0x7f/0x6f0
kernel: [<ffffffff811ad0c6>] kmem_zone_alloc+0x86/0xc0
kernel: [<ffffffff811a66d8>] xfs_dir_ialloc+0xa8/0x360
kernel: [<ffffffff811a4008>] xfs_trans_reserve+0xa8/0x220
kernel: [<ffffffff813a29e7>] __down_write_nested+0x17/0xa0
kernel: [<ffffffff811a952f>] xfs_create+0x2ef/0x4e0
kernel: [<ffffffff811b523c>] xfs_vn_mknod+0x14c/0x1a0
kernel: [<ffffffff810a864c>] vfs_create+0xec/0x160
kernel: [<ffffffffa00c53c3>] nfsd_create_v3+0x3b3/0x500 [nfsd]
kernel: [<ffffffffa00cc178>] nfsd3_proc_create+0x118/0x1b0 [nfsd]
kernel: [<ffffffffa00be22a>] nfsd_dispatch+0xba/0x270 [nfsd]
kernel: [<ffffffffa0061fde>] svc_process+0x49e/0x800 [sunrpc]
kernel: [<ffffffff8102efc0>] default_wake_function+0x0/0x10
kernel: [<ffffffff813a2a97>] __down_read+0x17/0xa6
kernel: [<ffffffffa00be9a9>] nfsd+0x199/0x2c0 [nfsd]
kernel: [<ffffffffa00be810>] nfsd+0x0/0x2c0 [nfsd]
kernel: [<ffffffff8104a4b7>] kthread+0x47/0x90
kernel: [<ffffffff810322a7>] schedule_tail+0x27/0x70
kernel: [<ffffffff8100d0d9>] child_rip+0xa/0x11
kernel: [<ffffffff8104a470>] kthread+0x0/0x90
kernel: [<ffffffff8100d0cf>] child_rip+0x0/0x11
</snip>
The stack trace from "XFS internal error xfs_ialloc_read_agi" repeats numerous times, after which the following appears:
<snip>
kernel: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: Filesystem "dm-0": XFS internal error xfs_alloc_read_agf at line 2194 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff8115cf09
kernel: Pid: 3756, comm: nfsd Not tainted 2.6.28.7.cs.8 #3
kernel: Call Trace:
kernel: [<ffffffff8115cf09>] xfs_alloc_fix_freelist+0x3e9/0x480
kernel: [<ffffffff8115abe3>] xfs_alloc_read_agf+0xd3/0x1e0
kernel: [<ffffffff8115cf09>] xfs_alloc_fix_freelist+0x3e9/0x480
kernel: [<ffffffff8100d0cf>] child_rip+0x0/0x11
kernel: [<ffffffff8115cf09>] xfs_alloc_fix_freelist+0x3e9/0x480
kernel: [<ffffffff811e8033>] vsnprintf+0x743/0x890
kernel: [<ffffffff81268a8a>] wait_for_xmitr+0x5a/0xc0
kernel: [<ffffffff8100d0cf>] child_rip+0x0/0x11
kernel: [<ffffffff813a2a97>] __down_read+0x17/0xa6
kernel: [<ffffffff8115d215>] xfs_alloc_vextent+0x1b5/0x4e0
kernel: [<ffffffff8116c0e8>] xfs_bmap_btalloc+0x608/0xb00
kernel: [<ffffffff8116f60a>] xfs_bmapi+0xa4a/0x12a0
kernel: [<ffffffff8118e93c>] xfs_imap_to_bp+0xac/0x130
kernel: [<ffffffff8117a37a>] xfs_dir2_grow_inode+0x15a/0x410
kernel: [<ffffffff8117b26f>] xfs_dir2_sf_to_block+0x9f/0x5c0
kernel: [<ffffffff811ad0c6>] kmem_zone_alloc+0x86/0xc0
kernel: [<ffffffff811ad132>] kmem_zone_zalloc+0x32/0x50
kernel: [<ffffffff811918ce>] xfs_inode_item_init+0x1e/0x80
kernel: [<ffffffff81183880>] xfs_dir2_sf_addname+0x430/0x5d0
kernel: [<ffffffff811903c8>] xfs_ialloc+0x318/0x6f0
kernel: [<ffffffff8117b0a2>] xfs_dir_createname+0x182/0x1e0
kernel: [<ffffffff811a95df>] xfs_create+0x39f/0x4e0
kernel: [<ffffffff811b523c>] xfs_vn_mknod+0x14c/0x1a0
kernel: [<ffffffff810a864c>] vfs_create+0xec/0x160
kernel: [<ffffffffa00c53c3>] nfsd_create_v3+0x3b3/0x500 [nfsd]
kernel: [<ffffffffa00cc178>] nfsd3_proc_create+0x118/0x1b0 [nfsd]
kernel: [<ffffffffa00be22a>] nfsd_dispatch+0xba/0x270 [nfsd]
kernel: [<ffffffffa0061fde>] svc_process+0x49e/0x800 [sunrpc]
kernel: [<ffffffff813a2a97>] __down_read+0x17/0xa6
kernel: [<ffffffffa00be9a9>] nfsd+0x199/0x2c0 [nfsd]
kernel: [<ffffffffa00be810>] nfsd+0x0/0x2c0 [nfsd]
kernel: [<ffffffff8104a4b7>] kthread+0x47/0x90
kernel: [<ffffffff810322a7>] schedule_tail+0x27/0x70
kernel: [<ffffffff8100d0d9>] child_rip+0xa/0x11
kernel: [<ffffffff8104a470>] kthread+0x0/0x90
kernel: [<ffffffff8100d0cf>] child_rip+0x0/0x11
kernel: Filesystem "dm-0": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c. Caller 0xffffffff811a9411
kernel: Pid: 3756, comm: nfsd Not tainted 2.6.28.7.cs.8 #3
kernel: Call Trace:
kernel: [<ffffffff811a9411>] xfs_create+0x1d1/0x4e0
kernel: [<ffffffff811a3475>] xfs_trans_cancel+0xe5/0x110
kernel: [<ffffffff811a9411>] xfs_create+0x1d1/0x4e0
kernel: [<ffffffff811b523c>] xfs_vn_mknod+0x14c/0x1a0
kernel: [<ffffffff810a864c>] vfs_create+0xec/0x160
kernel: [<ffffffffa00c53c3>] nfsd_create_v3+0x3b3/0x500 [nfsd]
kernel: [<ffffffffa00cc178>] nfsd3_proc_create+0x118/0x1b0 [nfsd]
kernel: [<ffffffffa00be22a>] nfsd_dispatch+0xba/0x270 [nfsd]
kernel: [<ffffffffa0061fde>] svc_process+0x49e/0x800 [sunrpc]
kernel: [<ffffffff813a2a97>] __down_read+0x17/0xa6
kernel: [<ffffffffa00be9a9>] nfsd+0x199/0x2c0 [nfsd]
kernel: [<ffffffffa00be810>] nfsd+0x0/0x2c0 [nfsd]
kernel: [<ffffffff8104a4b7>] kthread+0x47/0x90
kernel: [<ffffffff810322a7>] schedule_tail+0x27/0x70
kernel: [<ffffffff8100d0d9>] child_rip+0xa/0x11
kernel: [<ffffffff8104a470>] kthread+0x0/0x90
kernel: [<ffffffff8100d0cf>] child_rip+0x0/0x11
kernel: xfs_force_shutdown(dm-0,0x8) called from line 1165 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff811a348e
kernel: Filesystem "dm-0": Corruption of in-memory data detected. Shutting down filesystem: dm-0
kernel: Please umount the filesystem, and rectify the problem(s)
kernel: nfsd: non-standard errno: -117
kernel: Filesystem "dm-0": xfs_log_force: error 5 returned.
</snip>
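Since the same report repeats many times in the log, a small script can tally which internal errors fired and from where. The following is a minimal sketch, assuming the log lines have the exact "XFS internal error ... at line ... of file ..." shape shown above; the function name summarize_xfs_errors and the regular expression are illustrative and not part of any XFS tooling.

```python
import re
from collections import Counter

# Matches lines of the form seen in the log excerpts above, e.g.
#   Filesystem "dm-0": XFS internal error xfs_ialloc_read_agi
#   at line 1408 of file fs/xfs/xfs_ialloc.c. Caller 0xffffffff8118711a
INTERNAL_ERROR = re.compile(
    r'Filesystem "(?P<dev>[^"]+)": XFS internal error (?P<func>\w+) '
    r'at line (?P<line>\d+) of file (?P<file>\S+?)\.\s+'
    r'Caller (?P<caller>0x[0-9a-fA-F]+)'
)


def summarize_xfs_errors(lines):
    """Count XFS internal-error reports by (device, function, file, line)."""
    counts = Counter()
    for text in lines:
        match = INTERNAL_ERROR.search(text)
        if match:
            key = (
                match.group("dev"),
                match.group("func"),
                match.group("file"),
                int(match.group("line")),
            )
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the excerpts in this message.
    sample = [
        'kernel: Filesystem "dm-0": XFS internal error xfs_ialloc_read_agi at line 1408 of file fs/xfs/xfs_ialloc.c. Caller 0xffffffff8118711a',
        'kernel: Filesystem "dm-0": XFS internal error xfs_alloc_read_agf at line 2194 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff8115cf09',
        'kernel: Filesystem "dm-0": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c. Caller 0xffffffff811a9411',
    ]
    for key, count in summarize_xfs_errors(sample).items():
        print(count, key)
```

Feeding it the full system log (for example via open() on the log file) would show how often each error was reported before the shutdown.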
I'm somewhat at a loss with this one. It occurred on a customer's installation, so I don't have ready access to the machine. All internal attempts to reproduce it with identical hardware/software configurations have been unfruitful. I'm concerned about the custom kernel, and may attempt to downgrade to the stock CentOS 5.3 kernel (2.6.18, if I remember correctly).
Any insight would be hugely appreciated, and of course tell me how I can help further. Thanks so much.
John Quigley
jquigley.com