* XFS corruption on 3ware RAID6-volume
@ 2011-02-23 13:27 Erik Gulliksson
2011-02-23 14:46 ` Emmanuel Florac
2011-02-23 16:56 ` Stan Hoeppner
0 siblings, 2 replies; 13+ messages in thread
From: Erik Gulliksson @ 2011-02-23 13:27 UTC (permalink / raw)
To: xfs
Dear XFS people,
I have bumped into a corruption problem with one of my XFS filesystems.
The filesystem lives on a RAID6 volume on a 3ware 9650SE-24M8 with
battery backup and write cache enabled. The RAID6 configuration is 11
2.0TB WD15EARS disks, and the volume is reported as OK by the RAID card.
I believe the corruption below happened when the RAID card reset itself,
due to disk timeouts on another RAID6 volume on the same controller
card (different story). I have tried to gather some relevant
information, in the hope that someone can point me in the right
direction for repairing this corruption.
Kernel: 2.6.26-2-amd64
OS: Debian Linux lenny 64-bit
xfsprogs: 2.9.8
Output from xfs_info:
meta-data=/dev/sda1              isize=256    agcount=13, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=3295874295, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
Mounting the filesystem gives the following in dmesg:
[858397.713452] Starting XFS recovery on filesystem: sda1 (logdev: internal)
[858403.841603] Filesystem "sda1": XFS internal error
xfs_btree_check_sblock at line 334 of file fs/xfs/xfs_btree.c. Caller
0xffffffffa0138321
[858403.841603] Pid: 31433, comm: mount Not tainted 2.6.26-2-amd64 #1
[858403.841603]
[858403.841603] Call Trace:
[858403.841603] [<ffffffffa0138321>] :xfs:xfs_alloc_lookup+0x133/0x34f
[858403.841603] [<ffffffffa014c7fb>] :xfs:xfs_btree_check_sblock+0xaf/0xbf
[858403.841603] [<ffffffffa0138321>] :xfs:xfs_alloc_lookup+0x133/0x34f
[858403.841603] [<ffffffffa014c322>] :xfs:xfs_btree_init_cursor+0x31/0x1ae
[858403.841603] [<ffffffffa0135d17>] :xfs:xfs_free_ag_extent+0x63/0x6b5
[858403.841603] [<ffffffff8042a354>] __down_read+0x12/0xa1
[858403.841603] [<ffffffffa01379dd>] :xfs:xfs_free_extent+0xa9/0xc9
[858403.841603] [<ffffffffa01694b3>] :xfs:xlog_recover_process_efi+0x10e/0x167
[858403.841603] [<ffffffffa016a6a4>] :xfs:xlog_recover_process_efis+0x4b/0x85
[858403.841603] [<ffffffffa016a6f3>] :xfs:xlog_recover_finish+0x15/0xb5
[858403.841603] [<ffffffffa016f2f7>] :xfs:xfs_mountfs+0x475/0x5ac
[858403.841603] [<ffffffffa017a311>] :xfs:kmem_alloc+0x60/0xc4
[858403.841603] [<ffffffffa0174eb4>] :xfs:xfs_mount+0x29b/0x347
[858403.841603] [<ffffffffa01833e6>] :xfs:xfs_fs_fill_super+0x0/0x1ee
[858403.841603] [<ffffffffa018349b>] :xfs:xfs_fs_fill_super+0xb5/0x1ee
[858403.841603] [<ffffffff8029d334>] get_sb_bdev+0xf8/0x145
[858403.841603] [<ffffffff8029cd58>] vfs_kern_mount+0x93/0x11b
[858403.841603] [<ffffffff8029ce33>] do_kern_mount+0x43/0xe3
[858403.841603] [<ffffffff802b18c9>] do_new_mount+0x5b/0x95
[858403.841603] [<ffffffff802b1ac0>] do_mount+0x1bd/0x1e7
[858403.841603] [<ffffffff802769a1>] __alloc_pages_internal+0xd6/0x3bf
[858403.841603] [<ffffffff802b1b74>] sys_mount+0x8a/0xce
[858403.841603] [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f
[858403.841603]
[858403.841603] Failed to recover EFIs on filesystem: sda1
[858403.841603] XFS: log mount finish failed
Output from xfs_check -v:
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_check. If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
The output from "xfs_repair -n" was a 2GB file, so this is a cleaned-up
summary:
- block (1,499522) already used, state 7 (3_636_499 of these)
- block (7,480993) multiply claimed by bno space tree, state -
(26_547_241 of these)
- bno freespace btree block claimed (state 1), agno 7, bno 65565,
suspect 0 (158 of these)
- bcnt freespace btree block claimed (state 1), agno 7, bno 567395,
suspect 0 (175 of these)
- data fork in ino 84753919 claims free block 291349280 (4_580_113 of these)
- would have junked entry "foo" in directory inode 136 (10_095 of these)
- would have corrected i8 count in directory 136 from 2 to 1 (9_016 of these)
- entry "foo" at block 0 offset 72 in directory inode 16955069
references non-existent inode 30065663864
would clear inode number in entry at offset 72... (43_379 of these)
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- scan filesystem freespace and inode maps...
bad magic # 0x26c4 in btcnt block 1/302903
expected level 0 got 514 in btcnt block 1/302903
bad magic # 0x26c4 in btbno block 7/604731
expected level 0 got 256 in btbno block 7/604731
bad magic # 0x26c4 in btbno block 9/8428277
expected level 0 got 59755 in btbno block 9/8428277
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
found inodes not in the inode allocation tree
found inodes not in the inode allocation tree
found inodes not in the inode allocation tree
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
bad directory block magic # 0x26c4 in block 0 for directory inode 6159645547
corrupt block 0 in directory inode 6159645547
would junk block
no . entry for directory 6159645547
no .. entry for directory 6159645547
problem with directory contents in inode 6159645547
would have cleared inode 6159645547
- agno = 2
- agno = 3
- agno = 4
bad directory block magic # 0x6173733d in block 0 for directory inode 19126674939
corrupt block 0 in directory inode 19126674939
would junk block
no . entry for directory 19126674939
no .. entry for directory 19126674939
problem with directory contents in inode 19126674939
would have cleared inode 19126674939
- agno = 5
- agno = 6
- agno = 7
42189950: Badness in key lookup (length)
bp=(bno 15170340024, len 16384 bytes) key=(bno 15170340024, len 8192 bytes)
- agno = 8
bad directory block magic # 0x45b419cb in block 0 for directory inode 35775783660
corrupt block 0 in directory inode 35775783660
would junk block
no . entry for directory 35775783660
no .. entry for directory 35775783660
problem with directory contents in inode 35775783660
would have cleared inode 35775783660
- agno = 9
- agno = 10
bad nblocks 20513 for inode 43585639210, would reset to 15192
bad nextents 37 for inode 43585639210, would reset to 32
- agno = 11
- agno = 12
bad directory block magic # 0x58443244 in block 0 for directory inode 51803060746
corrupt block 0 in directory inode 51803060746
would junk block
no . entry for directory 51803060746
no .. entry for directory 51803060746
problem with directory contents in inode 51803060746
would have cleared inode 51803060746
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
bad directory block magic # 0x26c4 in block 0 for directory inode 6159645547
corrupt block 0 in directory inode 6159645547
would junk block
no . entry for directory 6159645547
no .. entry for directory 6159645547
problem with directory contents in inode 6159645547
would have cleared inode 6159645547
- agno = 2
- agno = 3
- agno = 4
bad directory block magic # 0x6173733d in block 0 for directory inode 19126674939
corrupt block 0 in directory inode 19126674939
would junk block
no . entry for directory 19126674939
no .. entry for directory 19126674939
problem with directory contents in inode 19126674939
would have cleared inode 19126674939
- agno = 5
- agno = 6
- agno = 7
- agno = 8
bad directory block magic # 0x45b419cb in block 0 for directory inode 35775783660
corrupt block 0 in directory inode 35775783660
would junk block
no . entry for directory 35775783660
no .. entry for directory 35775783660
problem with directory contents in inode 35775783660
would have cleared inode 35775783660
- agno = 9
- agno = 10
bad nblocks 20513 for inode 43585639210, would reset to 15192
bad nextents 37 for inode 43585639210, would reset to 32
- agno = 11
- agno = 12
bad directory block magic # 0x58443244 in block 0 for directory inode 51803060746
corrupt block 0 in directory inode 51803060746
would junk block
no . entry for directory 51803060746
no .. entry for directory 51803060746
problem with directory contents in inode 51803060746
would have cleared inode 51803060746
No modify flag set, skipping phase 5
Inode allocation btrees are too corrupted, skipping phases 6 and 7
No modify flag set, skipping filesystem flush and exiting.
I did run "xfs_repair -L" on an image of the filesystem on another
server and I ended up with about 50000 entries in lost+found (~750000
entries recursively). Attaching output from "xfs_logprint -t" and a
xfs_metadump can be made available. Is there any way to diagnose and
salvage this? Any and all help is much appreciated.
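For reference, this kind of repair experiment can be repeated against a
metadata-only image instead of a full block copy; the sketch below uses
placeholder paths, and the use of xfs_metadump/xfs_mdrestore here is only
a suggestion, not necessarily how the earlier image was made:

  # capture the filesystem metadata while /dev/sda1 is unmounted (paths are placeholders)
  xfs_metadump /dev/sda1 /backup/sda1.metadump
  # restore the dump into an image file and dry-run the repair on the copy
  xfs_mdrestore /backup/sda1.metadump /backup/sda1.img
  xfs_repair -f -n /backup/sda1.img
  # only once that looks sane, try zeroing the log on the copy
  xfs_repair -f -L /backup/sda1.img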
Best regards
Erik Gulliksson
[-- Attachment #2: xfs_logprint-t.txt.gz --]
[-- Type: application/x-gzip, Size: 132474 bytes --]
* Re: XFS corruption on 3ware RAID6-volume
2011-02-23 13:27 XFS corruption on 3ware RAID6-volume Erik Gulliksson
@ 2011-02-23 14:46 ` Emmanuel Florac
2011-02-23 15:01 ` Erik Gulliksson
2011-02-23 16:56 ` Stan Hoeppner
1 sibling, 1 reply; 13+ messages in thread
From: Emmanuel Florac @ 2011-02-23 14:46 UTC (permalink / raw)
To: Erik Gulliksson; +Cc: xfs
On Wed, 23 Feb 2011 14:27:27 +0100
Erik Gulliksson <erik@gulliksson.org> wrote:
> I have bumped in to a corruption problem with one a XFS filesystems.
> The filesystem lives on a RAID6-volume on a 3ware 9650SE-24M8 with
> battery backup and writecache enabled.
What firmware version are you using?
( tw_cli /cX show firmware )
> RAID6-configuration is 11 2.0TB
> WD15EARS disks and the volume is reported as OK by the RAID-card.
Augh. That sounds pretty bad. What does " tw_cli /cX/uY show all" look
like?
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
* Re: XFS corruption on 3ware RAID6-volume
2011-02-23 14:46 ` Emmanuel Florac
@ 2011-02-23 15:01 ` Erik Gulliksson
2011-02-23 15:23 ` Emmanuel Florac
2011-02-23 15:29 ` Justin Piszcz
0 siblings, 2 replies; 13+ messages in thread
From: Erik Gulliksson @ 2011-02-23 15:01 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: xfs
Hi Emmanuel,
Thanks for your prompt reply.
On Wed, Feb 23, 2011 at 3:46 PM, Emmanuel Florac <eflorac@intellique.com> wrote:
>
> What firmware version are you using?
>
> ( tw_cli /cX show firmware )
# tw_cli /c0 show firmware
/c0 Firmware Version = FE9X 4.10.00.007
>
> Augh. That sounds pretty bad. What does " tw_cli /cX/uY show all" look
> like?
Yes, it is bad - a decision has been made to replace these disks with
"enterprise" versions (without the TLER/ERC problems, etc.). tw_cli
produces this output for the volume:
# tw_cli /c0/u0 show all
/c0/u0 status = OK
/c0/u0 is not rebuilding, its current state is OK
/c0/u0 is not verifying, its current state is OK
/c0/u0 is initialized.
/c0/u0 Write Cache = on
/c0/u0 Read Cache = Intelligent
/c0/u0 volume(s) = 1
/c0/u0 name = xxx
/c0/u0 serial number = yyy
/c0/u0 Ignore ECC policy = off
/c0/u0 Auto Verify Policy = off
/c0/u0 Storsave Policy = protection
/c0/u0 Command Queuing Policy = on
/c0/u0 Rapid RAID Recovery setting = all
/c0/u0 Parity Number = 2
Unit   UnitType  Status  %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0     RAID-6    OK      -       -       -     256K    12572.8
u0-0   DISK      OK      -       -       p12   -       1396.97
u0-1   DISK      OK      -       -       p21   -       1396.97
u0-2   DISK      OK      -       -       p14   -       1396.97
u0-3   DISK      OK      -       -       p15   -       1396.97
u0-4   DISK      OK      -       -       p16   -       1396.97
u0-5   DISK      OK      -       -       p17   -       1396.97
u0-6   DISK      OK      -       -       p18   -       1396.97
u0-7   DISK      OK      -       -       p19   -       1396.97
u0-8   DISK      OK      -       -       p20   -       1396.97
u0-9   DISK      OK      -       -       p0    -       1396.97
u0-10  DISK      OK      -       -       p22   -       1396.97
u0/v0  Volume    -       -       -       -     -       12572.8
Best regards
Erik Gulliksson
* Re: XFS corruption on 3ware RAID6-volume
2011-02-23 15:01 ` Erik Gulliksson
@ 2011-02-23 15:23 ` Emmanuel Florac
2011-02-24 10:20 ` Erik Gulliksson
2011-02-23 15:29 ` Justin Piszcz
1 sibling, 1 reply; 13+ messages in thread
From: Emmanuel Florac @ 2011-02-23 15:23 UTC (permalink / raw)
To: Erik Gulliksson; +Cc: xfs
On Wed, 23 Feb 2011 16:01:09 +0100
Erik Gulliksson <erik@gulliksson.org> wrote:
> Hi Emmanuel,
>
> Thanks for your prompt reply.
>
> On Wed, Feb 23, 2011 at 3:46 PM, Emmanuel Florac
> <eflorac@intellique.com> wrote:
> >
> > What firmware version are you using?
> >
> > ( tw_cli /cX show firmware )
>
> # tw_cli /c0 show firmware
> /c0 Firmware Version = FE9X 4.10.00.007
>
OK so this is the latest, or close.
>
> >
> > Augh. That sounds pretty bad. What does " tw_cli /cX/uY show all"
> > look like?
>
> Yes, it is bad - a decision has been made to replace these disks with
> "enterprise"-versions (without TLER/ERC problems etc).
A typical error, alas: saving a couple hundred euros on cheap drives to
store terabytes of data worth far more.
> Tw_cli produces
> this output for the volume:
>
> # tw_cli /c0/u0 show all
> /c0/u0 status = OK
> /c0/u0 is not rebuilding, its current state is OK
> /c0/u0 is not verifying, its current state is OK
> /c0/u0 is initialized.
> /c0/u0 Write Cache = on
> /c0/u0 Read Cache = Intelligent
> /c0/u0 volume(s) = 1
> /c0/u0 name = xxx
> /c0/u0 serial number = yyy
> /c0/u0 Ignore ECC policy = off
> /c0/u0 Auto Verify Policy = off
> /c0/u0 Storsave Policy = protection
> /c0/u0 Command Queuing Policy = on
> /c0/u0 Rapid RAID Recovery setting = all
> /c0/u0 Parity Number = 2
>
> Unit UnitType Status %RCmpl %V/I/M Port Stripe
> Size(GB)
> ------------------------------------------------------------------------
> u0 RAID-6 OK - - - 256K
> 12572.8
So the RAID array looks OK; the RAID controller doesn't report any
particular problem. You said it was reported as 0 K. Where did you see
0 K reported?
What does "dmesg | grep 3w-9xxx" give? And "tw_cli alarms"? Was the
filesystem under heavy write load when the problem occurred?
I'd start by launching a RAID verify, to detect and correct possible
on-disk coherency problems (it can't hurt anyway):
tw_cli /c0/u0 start verify
Then "tail -f /var/log/messages | grep 3w-9xxx" ...
I suspect no new problems will be discovered; most probably, IOs to the
array were lost because of the bus reset.
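Putting it together, an untested sketch (the 60-second interval is
arbitrary):

  # kick off a background verify of the unit
  tw_cli /c0/u0 start verify
  # the %V/I/M column in the unit listing shows verify progress
  watch -n 60 "tw_cli /c0/u0 show all"
  # watch the 3ware driver for ECC/medium errors while the verify runs
  tail -f /var/log/messages | grep 3w-9xxx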
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
* Re: XFS corruption on 3ware RAID6-volume
2011-02-23 15:01 ` Erik Gulliksson
2011-02-23 15:23 ` Emmanuel Florac
@ 2011-02-23 15:29 ` Justin Piszcz
2011-02-23 15:37 ` Emmanuel Florac
2011-02-24 10:35 ` Erik Gulliksson
1 sibling, 2 replies; 13+ messages in thread
From: Justin Piszcz @ 2011-02-23 15:29 UTC (permalink / raw)
To: Erik Gulliksson; +Cc: xfs
On Wed, 23 Feb 2011, Erik Gulliksson wrote:
> Hi Emmanuel,
>
> Thanks for your prompt reply.
>
> On Wed, Feb 23, 2011 at 3:46 PM, Emmanuel Florac <eflorac@intellique.com> wrote:
>>
>> What firmware version are you using?
>>
>> ( tw_cli /cX show firmware )
>
> # tw_cli /c0 show firmware
> /c0 Firmware Version = FE9X 4.10.00.007
The latest is:
9.5.1-9650-Upgrade.zip
9.5.2-9650-Upgrade.zip
9.5.3-9650-Upgrade.zip
9650SE_9690SA_firmware_beta_fw4.10.00.016.zip
9650SE_9690SA_firmware_beta_fw_4.10.00.019.zip <- latest
>
>
>>
>> Augh. That sounds pretty bad. What does " tw_cli /cX/uY show all" look
>> like?
>
> Yes, it is bad - a decision has been made to replace these disks with
> "enterprise"-versions (without TLER/ERC problems etc). Tw_cli produces
> this output for the volume:
This would seem to be the problem; you should go with Hitachi next time.
You can use regular non-enterprise drives (Hitachi) and they just work.
Seagate is a question mark.
Samsung is a question mark.
WD needs TLER.
>
> # tw_cli /c0/u0 show all
> /c0/u0 status = OK
> /c0/u0 is not rebuilding, its current state is OK
> /c0/u0 is not verifying, its current state is OK
> /c0/u0 is initialized.
> /c0/u0 Write Cache = on
> /c0/u0 Read Cache = Intelligent
> /c0/u0 volume(s) = 1
> /c0/u0 name = xxx
> /c0/u0 serial number = yyy
> /c0/u0 Ignore ECC policy = off
> /c0/u0 Auto Verify Policy = off
> /c0/u0 Storsave Policy = protection
> /c0/u0 Command Queuing Policy = on
> /c0/u0 Rapid RAID Recovery setting = all
> /c0/u0 Parity Number = 2
>
> Unit UnitType Status %RCmpl %V/I/M Port Stripe Size(GB)
> ------------------------------------------------------------------------
> u0 RAID-6 OK - - - 256K 12572.8
> u0-0 DISK OK - - p12 - 1396.97
> u0-1 DISK OK - - p21 - 1396.97
> u0-2 DISK OK - - p14 - 1396.97
> u0-3 DISK OK - - p15 - 1396.97
> u0-4 DISK OK - - p16 - 1396.97
> u0-5 DISK OK - - p17 - 1396.97
> u0-6 DISK OK - - p18 - 1396.97
> u0-7 DISK OK - - p19 - 1396.97
> u0-8 DISK OK - - p20 - 1396.97
> u0-9 DISK OK - - p0 - 1396.97
> u0-10 DISK OK - - p22 - 1396.97
> u0/v0 Volume - - - - - 12572.8
As for the problem at hand, I do not know of a good way to fix it unless
you had "ls -lRi /raid_array" output so you could map the inodes to their
original locations. Sorry, I don't have a better answer.
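For next time, a periodic snapshot along these lines is cheap insurance
(the output path is just a placeholder):

  # save an inode -> path map while the filesystem is still healthy
  ls -lRi /raid_array > /root/raid_array-inode-map.$(date +%Y%m%d)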
Justin.
* Re: XFS corruption on 3ware RAID6-volume
2011-02-23 15:29 ` Justin Piszcz
@ 2011-02-23 15:37 ` Emmanuel Florac
2011-02-23 15:42 ` Justin Piszcz
2011-02-24 10:35 ` Erik Gulliksson
1 sibling, 1 reply; 13+ messages in thread
From: Emmanuel Florac @ 2011-02-23 15:37 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Erik Gulliksson, xfs
On Wed, 23 Feb 2011 10:29:15 -0500 (EST)
Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> > /c0 Firmware Version = FE9X 4.10.00.007
> The latest is:
>
> 9.5.1-9650-Upgrade.zip
> 9.5.2-9650-Upgrade.zip
> 9.5.3-9650-Upgrade.zip
> 9650SE_9690SA_firmware_beta_fw4.10.00.016.zip
> 9650SE_9690SA_firmware_beta_fw_4.10.00.019.zip <- latest
Hmm, well, that one is a beta; I'd rather stick to released firmware :) His
firmware is actually the latest stable version for the 9650.
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
* Re: XFS corruption on 3ware RAID6-volume
2011-02-23 15:37 ` Emmanuel Florac
@ 2011-02-23 15:42 ` Justin Piszcz
2011-02-24 10:25 ` Erik Gulliksson
0 siblings, 1 reply; 13+ messages in thread
From: Justin Piszcz @ 2011-02-23 15:42 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: Erik Gulliksson, xfs
On Wed, 23 Feb 2011, Emmanuel Florac wrote:
> On Wed, 23 Feb 2011 10:29:15 -0500 (EST)
> Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>
>>> /c0 Firmware Version = FE9X 4.10.00.007
>> The latest is:
>>
>> 9.5.1-9650-Upgrade.zip
>> 9.5.2-9650-Upgrade.zip
>> 9.5.3-9650-Upgrade.zip
>> 9650SE_9690SA_firmware_beta_fw4.10.00.016.zip
>> 9650SE_9690SA_firmware_beta_fw_4.10.00.019.zip <- latest
>
> Hum well, this is beta, I'd rather stick to released firmware :) His
> firmware is actually the latest stable version for the 9650.
Yes, but it specifically addresses issues with resets:
019:
SCR FIRM03219 Pchip reset interrupt not handled
SCR FIRM03220 Put parity segment to free state completely when deallocate unused parity segment
016:
SCR 2196: Unexpected controller soft resets
Fixed an issue with regards to deferral of write and read commands to help eliminate unexpected soft resets.
SCR 2217: Incomplete unit after multiple power failures when a RAID 1 unit is initializing after a rebuild.
This issue is fixed in this firmware version.
SCR 819: Controller may assert when cache is disabled in a RAID5/RAID6 configuration under heavy I/O load.
This issue is fixed in this firmware version.
SCR 2214: Performance drops during SES polling.
Fixed the issue where performance drops when a Storage Enclosure Processor (SEP) takes up to 700-800 msec to respond to SES polling.
Justin.
* Re: XFS corruption on 3ware RAID6-volume
2011-02-23 13:27 XFS corruption on 3ware RAID6-volume Erik Gulliksson
2011-02-23 14:46 ` Emmanuel Florac
@ 2011-02-23 16:56 ` Stan Hoeppner
1 sibling, 0 replies; 13+ messages in thread
From: Stan Hoeppner @ 2011-02-23 16:56 UTC (permalink / raw)
To: Erik Gulliksson; +Cc: xfs
Erik Gulliksson put forth on 2/23/2011 7:27 AM:
> Dear XFS people,
>
> I have bumped in to a corruption problem with one a XFS filesystems.
> The filesystem lives on a RAID6-volume on a 3ware 9650SE-24M8 with
> battery backup and writecache enabled. RAID6-configuration is 11 2.0TB
> WD15EARS disks and the volume is reported as OK by the RAID-card.
There is a reason WD has an enterprise line of drives for use in RAID
applications. The EARS series, along with the entire WD desktop drive
series, is not suitable for use with hardware RAID controllers, mainly
because it doesn't support TLER. The 2TB low-RPM WD Green at Newegg is
$90. The 2TB 7.2k RPM enterprise RE4 is $270, exactly 3 times the
price. Is your data, and reliable fast access to it, worth the extra
$1,980 (11 drives x $180 each)?
I think too many folks are blinded by low acquisition cost and thus
can't see the overall life cycle cost, or TCO, of their storage
infrastructure.
Please Google the XFS list archives for the horror story at UC Santa
Cruz WRT hardware arrays using the WD 2TB Green drives. The sysop lost
12TB of a 60TB XFS filesystem, and damn near lost the entire 60TB, if
not for luck. The lost 12TB of data was Ph. D. students' research data.
That kind of data is definitely worth the extra $$ for the right type
of drives which would have avoided the problem.
--
Stan
* Re: XFS corruption on 3ware RAID6-volume
2011-02-23 15:23 ` Emmanuel Florac
@ 2011-02-24 10:20 ` Erik Gulliksson
0 siblings, 0 replies; 13+ messages in thread
From: Erik Gulliksson @ 2011-02-24 10:20 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: xfs
Thanks for your comments, Emmanuel.
> So the RAID array looks OK, the RAID controller doesn't report any
> particular problem. You said it was reported as 0 K. Where did you see
> 0 K reported?
No, I meant it is "OK" with an "O" :)
> What gives "dmesg | grep 3w-9xxx" ? and "tw_cli alarms" ? Was the
> filesystem under heavy write when the problem occured ?
The server has been restarted since the problems started, so there is
nothing notable in "tw_cli alarms" or dmesg. The controller was
performing a rebuild on the other unit when it happened; however, I
don't think this XFS filesystem was under any particular load.
>
> I'd start with launching a RAID verify, to detect and correct possible
> on-disk coherency problems (it can't hurt anyway):
>
> tw_cli /c0/u0 start verify
>
> Then "tail -f /var/log/messages | grep 3w-9xxx" ...
I will try this overnight and see if anything is reported.
> I suppose that there are no problems to be discovered. Most probably
> IOs to the array were lost because of the bus reset.
That's what I am afraid of too.
* Re: XFS corruption on 3ware RAID6-volume
2011-02-23 15:42 ` Justin Piszcz
@ 2011-02-24 10:25 ` Erik Gulliksson
0 siblings, 0 replies; 13+ messages in thread
From: Erik Gulliksson @ 2011-02-24 10:25 UTC (permalink / raw)
To: Justin Piszcz; +Cc: xfs
Hi Justin,
On Wed, Feb 23, 2011 at 4:42 PM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>
>
> On Wed, 23 Feb 2011, Emmanuel Florac wrote:
>
>> On Wed, 23 Feb 2011 10:29:15 -0500 (EST)
>> Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>>
>>>> /c0 Firmware Version = FE9X 4.10.00.007
>>>
>>> The latest is:
>>>
>>> 9.5.1-9650-Upgrade.zip
>>> 9.5.2-9650-Upgrade.zip
>>> 9.5.3-9650-Upgrade.zip
>>> 9650SE_9690SA_firmware_beta_fw4.10.00.016.zip
>>> 9650SE_9690SA_firmware_beta_fw_4.10.00.019.zip <- latest
>>
>> Hum well, this is beta, I'd rather stick to released firmware :) His
>> firmware is actually the latest stable version for the 9650.
>
> Yes, but it specifically addresses issue with resets:
I think I'll stick with replacing the disks with enterprise versions.
There might be other reasons why 3ware/LSI still has those firmware
fixes in beta.
Best regards
Erik Gulliksson
* Re: XFS corruption on 3ware RAID6-volume
2011-02-23 15:29 ` Justin Piszcz
2011-02-23 15:37 ` Emmanuel Florac
@ 2011-02-24 10:35 ` Erik Gulliksson
2011-02-24 11:01 ` Justin Piszcz
2011-02-24 12:14 ` Emmanuel Florac
1 sibling, 2 replies; 13+ messages in thread
From: Erik Gulliksson @ 2011-02-24 10:35 UTC (permalink / raw)
To: Justin Piszcz; +Cc: xfs
Hi again Justin,
> This would seem to be the problem; you should go with Hitachi next time.
> You can use regular non-enterprise drives (Hitachi) and they just work.
> Seagate is a question mark.
> Samsung is a question mark.
> WD needs TLER.
This is valuable information; we might consider Hitachi disks next
time if we want to go cheap again.
> As far as the problem at hand, I do not know of a good way to fix it unless
> you had ls -lRi /raid_array output so you could map the inodes to their
> original locations. Sorry don't have a better answer..
No, I don't have full filename-to-inode mappings from before the
corruption. I guess such a list from the filesystem mounted with -o
"ro,norecovery" won't help here?
If the journal is corrupt, is there any way to salvage only that part
so that the log replay will proceed further (i.e., something like an
"xfs_log_repair")? I'm not nearly expert enough to parse the stack
trace triggered by mounting the filesystem, but it seems some calls to
xlog_recover_* functions are involved, which makes me think that the
log is corrupt.
Best regards
Erik Gulliksson
* Re: XFS corruption on 3ware RAID6-volume
2011-02-24 10:35 ` Erik Gulliksson
@ 2011-02-24 11:01 ` Justin Piszcz
2011-02-24 12:14 ` Emmanuel Florac
1 sibling, 0 replies; 13+ messages in thread
From: Justin Piszcz @ 2011-02-24 11:01 UTC (permalink / raw)
To: Erik Gulliksson; +Cc: xfs
On Thu, 24 Feb 2011, Erik Gulliksson wrote:
> Hi again Justin,
>
>> This would seem to be the problem; you should go with Hitachi next time.
>> You can use regular non-enterprise drives (Hitachi) and they just work.
>> Seagate is a question mark.
>> Samsung is a question mark.
>> WD needs TLER.
>
> This is valuable information, we might consider Hitatchi-disks next
> time if we want to play cheap again.
>
>
>> As far as the problem at hand, I do not know of a good way to fix it unless
>> you had ls -lRi /raid_array output so you could map the inodes to their
>> original locations. Sorry don't have a better answer..
>
> No, I don't have full filename-inode mappings from before the
> corruption. I guess such a list from the filesystem mounted with -o
> "ro,norecovery" won't help here?
>
> If the journal is corrupt, is there anyway to salvage only that part
> so that the log replay will proceed further (ie "xfs_log_repair")? I'm
> not nearly expert enough to parse the stacktrace triggered by mounting
> the filesystem, but it seems some calls to xlog_recover_-functions are
> involved, which make me think that the log is corrupt.
This is probably best answered by an XFS expert.
Could you also show "tw_cli /c0 show diag"?
Justin.
* Re: XFS corruption on 3ware RAID6-volume
2011-02-24 10:35 ` Erik Gulliksson
2011-02-24 11:01 ` Justin Piszcz
@ 2011-02-24 12:14 ` Emmanuel Florac
1 sibling, 0 replies; 13+ messages in thread
From: Emmanuel Florac @ 2011-02-24 12:14 UTC (permalink / raw)
To: Erik Gulliksson; +Cc: xfs
On Thu, 24 Feb 2011 11:35:30 +0100
Erik Gulliksson <erik@gulliksson.org> wrote:
> If the journal is corrupt, is there anyway to salvage only that part
> so that the log replay will proceed further (ie "xfs_log_repair")?
Your best bet is to try booting a bleeding-edge kernel with the very
latest XFS version. Log recovery could possibly get somewhat further
(that is, if the log is not complete garbage).
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------