* XFS corruption on ubuntu 2.6.27-9-server
@ 2009-02-04 0:32 George Barnett
2009-02-04 1:28 ` Eric Sandeen
0 siblings, 1 reply; 6+ messages in thread
From: George Barnett @ 2009-02-04 0:32 UTC (permalink / raw)
To: xfs
Hi,
I'm seeing the following errors:
[822153.422851] Filesystem "md2": XFS internal error xfs_da_do_buf(2) at line 2107 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_da_btree.c. Caller 0xffffffffa03be8da
[822153.422903] Pid: 3273, comm: du Not tainted 2.6.27-9-server #1
[822153.422905]
[822153.422906] Call Trace:
[822153.422931] [<ffffffffa03cab23>] xfs_error_report+0x43/0x50 [xfs]
[822153.422956] [<ffffffffa03be8da>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
[822153.422976] [<ffffffffa03cab8d>] xfs_corruption_error+0x5d/0x80 [xfs]
[822153.422995] [<ffffffffa03be808>] xfs_da_do_buf+0x6a8/0x700 [xfs]
[822153.423014] [<ffffffffa03be8da>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
[822153.423019] [<ffffffff80305b06>] ? mntput_no_expire+0x36/0x160
[822153.423022] [<ffffffff803866e1>] ? aa_permission+0x21/0xd0
[822153.423041] [<ffffffffa03be8da>] xfs_da_read_buf+0x2a/0x30 [xfs]
[822153.423061] [<ffffffffa03c34fa>] ? xfs_dir2_block_getdents+0x9a/0x210 [xfs]
[822153.423080] [<ffffffffa03c34fa>] xfs_dir2_block_getdents+0x9a/0x210 [xfs]
[822153.423099] [<ffffffffa03adf7b>] ? xfs_bmap_last_offset+0x13b/0x150 [xfs]
[822153.423119] [<ffffffffa03f9970>] ? xfs_hack_filldir+0x0/0x60 [xfs]
[822153.423138] [<ffffffffa03f9970>] ? xfs_hack_filldir+0x0/0x60 [xfs]
[822153.423157] [<ffffffffa03c148b>] xfs_readdir+0x9b/0xf0 [xfs]
[822153.423176] [<ffffffffa03f98a6>] xfs_file_readdir+0xd6/0x1a0 [xfs]
[822153.423180] [<ffffffff802f8810>] ? filldir+0x0/0xe0
[822153.423183] [<ffffffff80386821>] ? aa_file_permission+0x21/0xf0
[822153.423185] [<ffffffff802f8810>] ? filldir+0x0/0xe0
[822153.423188] [<ffffffff802f8810>] ? filldir+0x0/0xe0
[822153.423191] [<ffffffff802f8a9b>] vfs_readdir+0xbb/0xe0
[822153.423194] [<ffffffff802f8c28>] sys_getdents+0x88/0xe0
[822153.423199] [<ffffffff8021285a>] system_call_fastpath+0x16/0x1b
This seems to happen reasonably regularly on my system and causes the
filesystem to be marked as dirty. xfs_repair runs fine, but I end up
with a bunch of files moved to lost+found. There are no device errors
logged when this happens.
Mount options:
/dev/md2 on /data type xfs (rw,noatime)
Kernel:
Linux slut 2.6.27-9-server #1 SMP Thu Nov 20 22:56:07 UTC 2008 x86_64 GNU/Linux
Device:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid10 sda2[0] sdd2[3] sdb2[1]
1947655680 blocks super 1.2 128K chunks 2 far-copies [4/3] [UUUU]
# xfs_info /dev/md2
meta-data=/dev/md2               isize=256    agcount=32, agsize=15216064 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=486913920, imaxpct=25
         =                       sunit=32     swidth=128 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=524288 blocks=0, rtextents=0
Any assistance would be greatly appreciated.
George
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
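As a side note on the geometry posted above, the md and xfs_info figures can be cross-checked with a little arithmetic. The following is only a sketch: the numbers are copied from the message, the function name `geometry_summary` is made up for illustration, and it assumes that the "blocks" figure in the md status line is in 1 KiB units and that `sunit`/`swidth` are in filesystem blocks (as the trailing "blks" suggests).

```python
# Values copied from the md status and xfs_info output quoted in the message.
MD_SIZE_KIB = 1947655680     # "1947655680 blocks" (assumed to be 1 KiB units)
MD_CHUNK_KIB = 128           # "128K chunks"
FS_BLOCK_SIZE = 4096         # data bsize
FS_BLOCKS = 486913920        # data blocks
AG_COUNT = 32                # agcount
AG_SIZE_BLOCKS = 15216064    # agsize
SUNIT_BLOCKS = 32            # sunit, in filesystem blocks
SWIDTH_BLOCKS = 128          # swidth, in filesystem blocks


def geometry_summary():
    """Derive a few consistency figures from the posted geometry."""
    fs_bytes = FS_BLOCKS * FS_BLOCK_SIZE
    md_bytes = MD_SIZE_KIB * 1024
    sunit_kib = SUNIT_BLOCKS * FS_BLOCK_SIZE // 1024
    return {
        # Does the filesystem span the whole md device?
        "fs_fills_device": fs_bytes == md_bytes,
        # Stripe unit in KiB, to compare with the md chunk size.
        "sunit_kib": sunit_kib,
        "sunit_matches_chunk": sunit_kib == MD_CHUNK_KIB,
        # How many stripe units make up one stripe width.
        "stripe_units_per_width": SWIDTH_BLOCKS // SUNIT_BLOCKS,
        # agcount * agsize minus the real block count, i.e. how many
        # blocks short of a full AG the last allocation group is.
        "last_ag_shortfall_blocks": AG_COUNT * AG_SIZE_BLOCKS - FS_BLOCKS,
    }


if __name__ == "__main__":
    for key, value in geometry_summary().items():
        print(f"{key}: {value}")
```

Under those assumptions the stripe unit works out to the same 128 KiB as the md chunk size, and the filesystem covers the whole device, so nothing in the posted geometry looks inconsistent by itself.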
* Re: XFS corruption on ubuntu 2.6.27-9-server
2009-02-04 0:32 XFS corruption on ubuntu 2.6.27-9-server George Barnett
@ 2009-02-04 1:28 ` Eric Sandeen
2009-02-04 1:34 ` George Barnett
0 siblings, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2009-02-04 1:28 UTC (permalink / raw)
To: George Barnett; +Cc: xfs

George Barnett wrote:
> Hi,
>
> I'm seeing the following errors:
>
> [822153.422851] Filesystem "md2": XFS internal error xfs_da_do_buf(2)
> at line 2107 of file /build/buildd/linux-2.6.27/fs/xfs/

we really should make that more informative.

What it means is that you read a piece of metadata that did not match
any of the metadata magic numbers.

hard to say whether it might be an xfs bug I think; this does come up
occasionally though and it'd at least be nice to print more details on
the error (what the magic *was*, what block, etc)

Do you happen to have the repair output?

Did your md raid lose power w/ write cache enabled?

-Eric

> xfs_da_btree.c. Caller 0xffffffffa03be8da
> [822153.422903] Pid: 3273, comm: du Not tainted 2.6.27-9-server #1
> [822153.422905]
> [822153.422906] Call Trace:
> [822153.422931] [<ffffffffa03cab23>] xfs_error_report+0x43/0x50 [xfs]
> [822153.422956] [<ffffffffa03be8da>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
> [822153.422976] [<ffffffffa03cab8d>] xfs_corruption_error+0x5d/0x80
> [xfs]
> [822153.422995] [<ffffffffa03be808>] xfs_da_do_buf+0x6a8/0x700 [xfs]
> [822153.423014] [<ffffffffa03be8da>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
> [822153.423019] [<ffffffff80305b06>] ? mntput_no_expire+0x36/0x160
> [822153.423022] [<ffffffff803866e1>] ? aa_permission+0x21/0xd0
> [822153.423041] [<ffffffffa03be8da>] xfs_da_read_buf+0x2a/0x30 [xfs]
> [822153.423061] [<ffffffffa03c34fa>] ? xfs_dir2_block_getdents+0x9a/
> 0x210 [xfs]
> [822153.423080] [<ffffffffa03c34fa>] xfs_dir2_block_getdents+0x9a/
> 0x210 [xfs]
> [822153.423099] [<ffffffffa03adf7b>] ? xfs_bmap_last_offset+0x13b/
> 0x150 [xfs]
> [822153.423119] [<ffffffffa03f9970>] ? xfs_hack_filldir+0x0/0x60 [xfs]
> [822153.423138] [<ffffffffa03f9970>] ? xfs_hack_filldir+0x0/0x60 [xfs]
> [822153.423157] [<ffffffffa03c148b>] xfs_readdir+0x9b/0xf0 [xfs]
> [822153.423176] [<ffffffffa03f98a6>] xfs_file_readdir+0xd6/0x1a0 [xfs]
> [822153.423180] [<ffffffff802f8810>] ? filldir+0x0/0xe0
> [822153.423183] [<ffffffff80386821>] ? aa_file_permission+0x21/0xf0
> [822153.423185] [<ffffffff802f8810>] ? filldir+0x0/0xe0
> [822153.423188] [<ffffffff802f8810>] ? filldir+0x0/0xe0
> [822153.423191] [<ffffffff802f8a9b>] vfs_readdir+0xbb/0xe0
> [822153.423194] [<ffffffff802f8c28>] sys_getdents+0x88/0xe0
> [822153.423199] [<ffffffff8021285a>] system_call_fastpath+0x16/0x1b
>
> This seems to happen reasonably regularly on my system and causes the
> filesystem to be marked as dirty. xfs_repair runs fine, but I end up
> with a bunch of files moved to lost+found. There are no device errors
> logged when this happens.
>
> Mount options:
>
> /dev/md2 on /data type xfs (rw,noatime)
>
> Kernel:
>
> Linux slut 2.6.27-9-server #1 SMP Thu Nov 20 22:56:07 UTC 2008 x86_64
> GNU/Linux
>
> Device:
>
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md2 : active raid10 sda2[0] sdd2[3] sdb2[1]
> 1947655680 blocks super 1.2 128K chunks 2 far-copies [4/3] [UUUU]
>
> # xfs_info /dev/md2
> meta-data=/dev/md2 isize=256 agcount=32,
> agsize=15216064 blks
> = sectsz=512 attr=0
> data = bsize=4096 blocks=486913920,
> imaxpct=25
> = sunit=32 swidth=128 blks
> naming =version 2 bsize=4096
> log =internal bsize=4096 blocks=32768, version=1
> = sectsz=512 sunit=0 blks, lazy-
> count=0
> realtime =none extsz=524288 blocks=0, rtextents=0
>
> Any assistance would be greatly appreciated.
>
> George
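To make the "did not match any of the metadata magic numbers" explanation concrete, here is a minimal sketch of that kind of check. It is not the kernel code: the function name `classify_block` is made up, only the three 32-bit directory magics are listed (the values are the ASCII strings "XD2B", "XD2D" and "XD2F" read as big-endian integers, quoted from memory of the XFS on-disk format headers and worth verifying), and the real check in xfs_da_do_buf also covers other block types with 16-bit magics at a different offset.

```python
import struct

# Illustrative subset of XFS directory block magics (an assumption, see above).
DIR2_MAGICS = {
    0x58443242: "XD2B (single-block directory)",
    0x58443244: "XD2D (directory data block)",
    0x58443246: "XD2F (directory free-index block)",
}


def classify_block(buf: bytes):
    """Return a label if the first 4 bytes are a known magic, else None."""
    if len(buf) < 4:
        return None
    # XFS metadata is stored big-endian on disk.
    (magic,) = struct.unpack_from(">I", buf, 0)
    return DIR2_MAGICS.get(magic)


if __name__ == "__main__":
    good = b"XD2B" + bytes(4092)   # block starting with a known magic
    zeroed = bytes(4096)           # block that reads back as all zeroes
    print(classify_block(good))
    print(classify_block(zeroed))
```

A block that reads back as all zeroes has a magic of 0, matches nothing in the table, and would be reported as corrupt by a check of this shape.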
* Re: XFS corruption on ubuntu 2.6.27-9-server
2009-02-04 1:28 ` Eric Sandeen
@ 2009-02-04 1:34 ` George Barnett
2009-02-04 1:46 ` Eric Sandeen
0 siblings, 1 reply; 6+ messages in thread
From: George Barnett @ 2009-02-04 1:34 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs

On 04/02/2009, at 12:28 PM, Eric Sandeen wrote:

> George Barnett wrote:
>> Hi,
>>
>> I'm seeing the following errors:
>>
>> [822153.422851] Filesystem "md2": XFS internal error xfs_da_do_buf(2)
>> at line 2107 of file /build/buildd/linux-2.6.27/fs/xfs/
>
> we really should make that more informative.
>
> What it means is that you read a piece of metadata that did not match
> any of the metadata magic numbers.
>
> hard to say whether it might be an xfs bug I think; this does come up
> occasionally though and it'd at least be nice to print more details on
> the error (what the magic *was*, what block, etc)
>
> Do you happen to have the repair output?
>
> Did your md raid lose power w/ write cache enabled?

Hi Eric,

Thanks for your response. The system did not lose power. This
failure just "happens". I have a cronjob which rsync's /data to a
spare drive that's not on raid. It seems that is enough to cause this
failure.

Fortunately, I still have the xfs_repair output in my term buffer:

root@slut:/# xfs_repair /dev/md2
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad magic number 0x0 on inode 18042
bad version number 0x0 on inode 18042
bad magic number 0x0 on inode 18043
bad version number 0x0 on inode 18043
bad magic number 0x0 on inode 18044
bad version number 0x0 on inode 18044
bad magic number 0x0 on inode 18045
bad version number 0x0 on inode 18045
bad magic number 0x0 on inode 18046
bad version number 0x0 on inode 18046
bad magic number 0x0 on inode 18047
bad version number 0x0 on inode 18047
bad directory block magic # 0 in block 0 for directory inode 18000
corrupt block 0 in directory inode 18000
        will junk block
no . entry for directory 18000
no .. entry for directory 18000
problem with directory contents in inode 18000
cleared inode 18000
bad directory block magic # 0 in block 0 for directory inode 18006
corrupt block 0 in directory inode 18006
        will junk block
no . entry for directory 18006
no .. entry for directory 18006
problem with directory contents in inode 18006
cleared inode 18006
bad magic number 0x0 on inode 18042, resetting magic number
bad version number 0x0 on inode 18042, resetting version number
imap claims a free inode 18042 is in use, correcting imap and clearing inode
cleared inode 18042
bad magic number 0x0 on inode 18043, resetting magic number
bad version number 0x0 on inode 18043, resetting version number
imap claims a free inode 18043 is in use, correcting imap and clearing inode
cleared inode 18043
bad magic number 0x0 on inode 18044, resetting magic number
bad version number 0x0 on inode 18044, resetting version number
imap claims a free inode 18044 is in use, correcting imap and clearing inode
cleared inode 18044
bad magic number 0x0 on inode 18045, resetting magic number
bad version number 0x0 on inode 18045, resetting version number
imap claims a free inode 18045 is in use, correcting imap and clearing inode
cleared inode 18045
bad magic number 0x0 on inode 18046, resetting magic number
bad version number 0x0 on inode 18046, resetting version number
imap claims a free inode 18046 is in use, correcting imap and clearing inode
cleared inode 18046
bad magic number 0x0 on inode 18047, resetting magic number
bad version number 0x0 on inode 18047, resetting version number
imap claims a free inode 18047 is in use, correcting imap and clearing inode
cleared inode 18047
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
entry "classes.nib" in shortform directory 18009 references free inode 18047
junking entry "classes.nib" in directory inode 18009
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
entry "Spanish.lproj" at block 0 offset 296 in directory inode 1610630695 references free inode 18000
        clearing inode number in entry at offset 296...
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
entry "Resources" in shortform directory 4049684106 references free inode 18006
junking entry "Resources" in directory inode 4049684106
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
bad hash table for directory inode 1610630695 (no data entry): rebuilding
rebuilding directory inode 1610630695
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 18007, moving to lost+found
disconnected inode 18024, moving to lost+found
disconnected inode 18025, moving to lost+found
disconnected inode 18026, moving to lost+found
disconnected inode 18027, moving to lost+found
disconnected inode 18028, moving to lost+found
disconnected inode 18029, moving to lost+found
disconnected inode 18030, moving to lost+found
disconnected inode 18031, moving to lost+found
disconnected inode 18037, moving to lost+found
disconnected inode 18038, moving to lost+found
disconnected inode 18039, moving to lost+found
disconnected inode 18040, moving to lost+found
disconnected inode 18041, moving to lost+found
disconnected dir inode 268452448, moving to lost+found
disconnected dir inode 268452449, moving to lost+found
disconnected dir inode 536889431, moving to lost+found
disconnected dir inode 536889432, moving to lost+found
disconnected dir inode 805323820, moving to lost+found
disconnected dir inode 805323821, moving to lost+found
disconnected dir inode 1073761501, moving to lost+found
disconnected dir inode 1073761502, moving to lost+found
disconnected dir inode 1342194790, moving to lost+found
disconnected dir inode 1342194791, moving to lost+found
disconnected dir inode 1610630702, moving to lost+found
disconnected dir inode 1879067163, moving to lost+found
disconnected dir inode 2147967564, moving to lost+found
disconnected dir inode 2436769389, moving to lost+found
disconnected dir inode 2703685645, moving to lost+found
disconnected dir inode 2703685648, moving to lost+found
disconnected dir inode 2970453601, moving to lost+found
disconnected dir inode 3240533616, moving to lost+found
disconnected dir inode 3508468806, moving to lost+found
disconnected dir inode 3777743419, moving to lost+found
disconnected dir inode 4049684107, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 5682 nlinks from 2 to 24
resetting inode 1610630695 nlinks from 21 to 20
resetting inode 4049684106 nlinks from 3 to 2
done

Regards,

George
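A repair log of this length is easier to compare between incidents once it is reduced to a few counts. The sketch below does that with regular expressions; the function name `summarize_repair` is hypothetical, and the message patterns are simply taken from the output pasted above rather than from any xfs_repair specification, so other versions may word things differently.

```python
import re
from collections import Counter


def summarize_repair(text: str) -> Counter:
    """Count a few kinds of events in saved xfs_repair output."""
    counts = Counter()
    # Distinct inodes that were cleared outright.
    counts["cleared_inodes"] = len(set(re.findall(r"cleared inode (\d+)", text)))
    # Files and directories reattached under lost+found.
    counts["moved_to_lost_found"] = len(
        re.findall(r"disconnected (?:dir )?inode \d+, moving to lost\+found", text)
    )
    # Directory entries dropped because they pointed at freed inodes.
    counts["junked_dir_entries"] = len(re.findall(r"junking entry", text))
    # Link counts that had to be corrected in the final phase.
    counts["nlink_resets"] = len(re.findall(r"resetting inode \d+ nlinks", text))
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_repair.py < saved_xfs_repair_output.txt
    for name, value in summarize_repair(sys.stdin.read()).items():
        print(f"{name}: {value}")
```

Because the patterns are matched against the whole text, the function works whether or not the log kept its original line breaks.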
* Re: XFS corruption on ubuntu 2.6.27-9-server
2009-02-04 1:34 ` George Barnett
@ 2009-02-04 1:46 ` Eric Sandeen
2009-02-04 1:53 ` George Barnett
0 siblings, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2009-02-04 1:46 UTC (permalink / raw)
To: George Barnett; +Cc: xfs

George Barnett wrote:
> On 04/02/2009, at 12:28 PM, Eric Sandeen wrote:
>
>> George Barnett wrote:
>>> Hi,
>>>
>>> I'm seeing the following errors:
>>>
>>> [822153.422851] Filesystem "md2": XFS internal error xfs_da_do_buf(2)
>>> at line 2107 of file /build/buildd/linux-2.6.27/fs/xfs/
>> we really should make that more informative.
>>
>> What it means is that you read a piece of metadata that did not match
>> any of the metadata magic numbers.
>>
>> hard to say whether it might be an xfs bug I think; this does come up
>> occasionally though and it'd at least be nice to print more details on
>> the error (what the magic *was*, what block, etc)
>>
>> Do you happen to have the repair output?
>>
>> Did your md raid lose power w/ write cache enabled?
>
> Hi Eric,
>
> Thanks for your response. The system did not lose power. This
> failure just "happens". I have a cronjob which rsync's /data to a
> spare drive that's not on raid. It seems that is enough to cause this
> failure.
>
> Fortunately, I still have the xfs_repair output in my term buffer:
>
> root@slut:/# xfs_repair /dev/md2
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> - found root inode chunk
> Phase 3 - for each AG...
> - scan and clear agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> bad magic number 0x0 on inode 18042
> bad version number 0x0 on inode 18042
> bad magic number 0x0 on inode 18043
> bad version number 0x0 on inode 18043
> bad magic number 0x0 on inode 18044
> bad version number 0x0 on inode 18044
> bad magic number 0x0 on inode 18045
> bad version number 0x0 on inode 18045
> bad magic number 0x0 on inode 18046
> bad version number 0x0 on inode 18046
> bad magic number 0x0 on inode 18047
> bad version number 0x0 on inode 18047
> bad directory block magic # 0 in block 0 for directory inode 18000

Interesting that all the bad magic numbers were 0... not sure what to
make of that, offhand, I'm afraid...

-Eric
* Re: XFS corruption on ubuntu 2.6.27-9-server
2009-02-04 1:46 ` Eric Sandeen
@ 2009-02-04 1:53 ` George Barnett
2009-02-04 2:05 ` Eric Sandeen
0 siblings, 1 reply; 6+ messages in thread
From: George Barnett @ 2009-02-04 1:53 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs

On 04/02/2009, at 12:46 PM, Eric Sandeen wrote:

>> bad version number 0x0 on inode 18046
>> bad magic number 0x0 on inode 18047
>> bad version number 0x0 on inode 18047
>> bad directory block magic # 0 in block 0 for directory inode 18000
>
> Interesting that all the bad magic numbers were 0... not sure what to
> make of that, offhand, I'm afraid...

Oh dear.

I'm going to try moving the filesystem to ext3 to see if this
continues. If it does, it would suggest a bug in the underlying
raid10 implementation or a problem with the disks, although they're
not reporting any errors [1].

Is there any further debugging I can do before I start fresh?

George

1. The hardware ecc recovered smartctl metric is /very/ high,
although I'm told this may be normal for Samsung drives. I can't think
of any way to confirm a disk problem without a CRC checking fs though.
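On the footnote about confirming a disk problem without a checksumming filesystem: one userspace approximation is to record a checksum for every file and compare later. The sketch below is only an illustration under stated assumptions. The names `file_digest`, `build_manifest` and `suspicious_changes` are made up, it covers file contents only (not filesystem metadata such as the directory blocks that failed here), and a re-read shortly after a write may be served from the page cache rather than from the disks.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root to (size, mtime_ns, sha256)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            info = os.stat(path)
            manifest[os.path.relpath(path, root)] = (
                info.st_size,
                info.st_mtime_ns,
                file_digest(path),
            )
    return manifest


def suspicious_changes(old, new):
    """Files whose contents changed while size and mtime did not.

    A legitimate edit normally updates the mtime, so a content change
    with an unchanged mtime is a hint of silent corruption.
    """
    return sorted(
        path
        for path, (size, mtime_ns, digest) in old.items()
        if path in new
        and new[path][:2] == (size, mtime_ns)
        and new[path][2] != digest
    )
```

A manifest built right after a known-good restore (for example with `build_manifest("/data")`) could be kept on the non-raid spare drive and compared against a fresh one after each incident.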
* Re: XFS corruption on ubuntu 2.6.27-9-server
2009-02-04 1:53 ` George Barnett
@ 2009-02-04 2:05 ` Eric Sandeen
0 siblings, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2009-02-04 2:05 UTC (permalink / raw)
To: George Barnett; +Cc: xfs

George Barnett wrote:
> On 04/02/2009, at 12:46 PM, Eric Sandeen wrote:
>
>>> bad version number 0x0 on inode 18046
>>> bad magic number 0x0 on inode 18047
>>> bad version number 0x0 on inode 18047
>>> bad directory block magic # 0 in block 0 for directory inode 18000
>> Interesting that all the bad magic numbers were 0... not sure what to
>> make of that, offhand, I'm afraid...
>
> Oh dear.
>
> I'm going to try moving the filesystem to ext3 to see if this
> continues. If it does, it would suggest a bug in the underlying
> raid10 implementation or a problem with the disks, although they're
> not reporting any errors [1].

one thing to note is that xfs is very good at detecting on-disk
corruption, not sure ext3 will be as good. So ext3 may seem to run
fine for longer, even if there is an underlying problem.

> Is there any further debugging I can do before I start fresh?

well, it'd be great to have an isolated testcase, if you can reproduce
it succinctly. Also I don't know what exact kernel ubuntu uses or what
patches are in it; you might try a stock upstream kernel w/ the same
config, 2.6.27.$LATEST, and see if you continue to have problems.

-Eric

> George
>
>
> 1. The hardware ecc recovered smartctl metric is /very/ high,
> although I'm told this may be normal for Samsung drives. I can't think
> of any way to confirm a disk problem without a CRC checking fs though.