* XFS corruption on ubuntu 2.6.27-9-server
@ 2009-02-04 0:32 George Barnett
2009-02-04 1:28 ` Eric Sandeen
0 siblings, 1 reply; 6+ messages in thread
From: George Barnett @ 2009-02-04 0:32 UTC (permalink / raw)
To: xfs
Hi,
I'm seeing the following errors:
[822153.422851] Filesystem "md2": XFS internal error xfs_da_do_buf(2) at line 2107 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_da_btree.c.  Caller 0xffffffffa03be8da
[822153.422903] Pid: 3273, comm: du Not tainted 2.6.27-9-server #1
[822153.422905]
[822153.422906] Call Trace:
[822153.422931] [<ffffffffa03cab23>] xfs_error_report+0x43/0x50 [xfs]
[822153.422956] [<ffffffffa03be8da>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
[822153.422976] [<ffffffffa03cab8d>] xfs_corruption_error+0x5d/0x80 [xfs]
[822153.422995] [<ffffffffa03be808>] xfs_da_do_buf+0x6a8/0x700 [xfs]
[822153.423014] [<ffffffffa03be8da>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
[822153.423019] [<ffffffff80305b06>] ? mntput_no_expire+0x36/0x160
[822153.423022] [<ffffffff803866e1>] ? aa_permission+0x21/0xd0
[822153.423041] [<ffffffffa03be8da>] xfs_da_read_buf+0x2a/0x30 [xfs]
[822153.423061] [<ffffffffa03c34fa>] ? xfs_dir2_block_getdents+0x9a/0x210 [xfs]
[822153.423080] [<ffffffffa03c34fa>] xfs_dir2_block_getdents+0x9a/0x210 [xfs]
[822153.423099] [<ffffffffa03adf7b>] ? xfs_bmap_last_offset+0x13b/0x150 [xfs]
[822153.423119] [<ffffffffa03f9970>] ? xfs_hack_filldir+0x0/0x60 [xfs]
[822153.423138] [<ffffffffa03f9970>] ? xfs_hack_filldir+0x0/0x60 [xfs]
[822153.423157] [<ffffffffa03c148b>] xfs_readdir+0x9b/0xf0 [xfs]
[822153.423176] [<ffffffffa03f98a6>] xfs_file_readdir+0xd6/0x1a0 [xfs]
[822153.423180] [<ffffffff802f8810>] ? filldir+0x0/0xe0
[822153.423183] [<ffffffff80386821>] ? aa_file_permission+0x21/0xf0
[822153.423185] [<ffffffff802f8810>] ? filldir+0x0/0xe0
[822153.423188] [<ffffffff802f8810>] ? filldir+0x0/0xe0
[822153.423191] [<ffffffff802f8a9b>] vfs_readdir+0xbb/0xe0
[822153.423194] [<ffffffff802f8c28>] sys_getdents+0x88/0xe0
[822153.423199] [<ffffffff8021285a>] system_call_fastpath+0x16/0x1b
This seems to happen reasonably regularly on my system and causes the
filesystem to be marked as dirty. xfs_repair runs fine, but I end up
with a bunch of files moved to lost+found. There are no device errors
logged when this happens.
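For tracking how often this recurs, the error line above can be scraped out of the kernel log. The helper below is purely hypothetical (not part of any XFS tooling); it just pulls the device, failing function, and source location out of an "XFS internal error" message:

```python
import re

# Hypothetical log-scraping helper: extract the device, failing
# function, and source location from an XFS internal-error line so
# recurrences can be tallied from syslog/dmesg.
ERR_RE = re.compile(
    r'Filesystem "(?P<dev>[^"]+)": XFS internal error (?P<func>\w+)\(\d+\)'
    r'.* at line (?P<line>\d+) of file (?P<file>\S+)'
)

def parse_xfs_error(msg):
    m = ERR_RE.search(msg)
    if not m:
        return None
    d = m.groupdict()
    d["file"] = d["file"].rstrip(".")   # the path ends the sentence
    return d

line = ('[822153.422851] Filesystem "md2": XFS internal error '
        'xfs_da_do_buf(2) at line 2107 of file '
        '/build/buildd/linux-2.6.27/fs/xfs/xfs_da_btree.c. '
        'Caller 0xffffffffa03be8da')
print(parse_xfs_error(line))
```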
Mount options:
/dev/md2 on /data type xfs (rw,noatime)
Kernel:
Linux slut 2.6.27-9-server #1 SMP Thu Nov 20 22:56:07 UTC 2008 x86_64 GNU/Linux
Device:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid10 sda2[0] sdd2[3] sdb2[1]
1947655680 blocks super 1.2 128K chunks 2 far-copies [4/3] [UUUU]
# xfs_info /dev/md2
meta-data=/dev/md2               isize=256    agcount=32, agsize=15216064 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=486913920, imaxpct=25
         =                       sunit=32     swidth=128 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=524288 blocks=0, rtextents=0
Any assistance would be greatly appreciated.
George
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: XFS corruption on ubuntu 2.6.27-9-server
2009-02-04 0:32 XFS corruption on ubuntu 2.6.27-9-server George Barnett
@ 2009-02-04 1:28 ` Eric Sandeen
2009-02-04 1:34 ` George Barnett
0 siblings, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2009-02-04 1:28 UTC (permalink / raw)
To: George Barnett; +Cc: xfs
George Barnett wrote:
> Hi,
>
> I'm seeing the following errors:
>
> [822153.422851] Filesystem "md2": XFS internal error xfs_da_do_buf(2) at line 2107 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_da_btree.c.  Caller 0xffffffffa03be8da
We really should make that more informative.
What it means is that you read a piece of metadata that did not match
any of the metadata magic numbers.
It's hard to say whether it might be an XFS bug, I think; this does come
up occasionally, though, and it'd at least be nice to print more details
on the error (what the magic *was*, what block, etc.).
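The check in question boils down to comparing the first bytes of the buffer against the known metadata magics. The sketch below is an illustration only, not something to run against a live device; the magic constants are from memory of the XFS v4 on-disk headers (xfs_fs.h, xfs_dir2_data.h, xfs_da_btree.h) and should be double-checked against kernel source:

```python
import struct

# Sketch of classifying an XFS metadata buffer by magic number.
# Assumption: "da" blocks (dir/attr btree) keep their 16-bit magic at
# offset 8, after the forw/back pointers; the others keep it at offset 0.
MAGIC32 = {
    0x58465342: "superblock (XFSB)",
    0x58443242: "dir2 block (XD2B)",
    0x58443244: "dir2 data (XD2D)",
    0x58443246: "dir2 free (XD2F)",
}
DA_MAGIC16 = {
    0xfebe: "da btree node",
    0xd2f1: "dir2 leaf1",
    0xd2ff: "dir2 leafn",
    0xfbee: "attr leaf",
}

def classify_block(buf):
    """Guess what kind of XFS metadata a buffer holds from its magic."""
    if not any(buf):
        # magic 0x0, as in the report above: the block reads back as
        # never written (a dropped write?) rather than scribbled on.
        return "all zeroes (unwritten or lost write?)"
    (m32,) = struct.unpack_from(">I", buf, 0)
    if m32 in MAGIC32:
        return MAGIC32[m32]
    (m16,) = struct.unpack_from(">H", buf, 0)
    if m16 == 0x494e:                      # "IN": on-disk inode core
        return "inode (IN)"
    (da,) = struct.unpack_from(">H", buf, 8)
    return DA_MAGIC16.get(da, "unknown magic 0x%x" % m32)

print(classify_block(b"\0" * 512))
```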
Do you happen to have the repair output?
Did your md raid lose power w/ write cache enabled?
-Eric
> [...]
* Re: XFS corruption on ubuntu 2.6.27-9-server
2009-02-04 1:28 ` Eric Sandeen
@ 2009-02-04 1:34 ` George Barnett
2009-02-04 1:46 ` Eric Sandeen
0 siblings, 1 reply; 6+ messages in thread
From: George Barnett @ 2009-02-04 1:34 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
On 04/02/2009, at 12:28 PM, Eric Sandeen wrote:
> George Barnett wrote:
>> Hi,
>>
>> I'm seeing the following errors:
>>
>> [822153.422851] Filesystem "md2": XFS internal error xfs_da_do_buf(2) at line 2107 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_da_btree.c.
>
> we really should make that more informative.
>
> What it means is that you read a piece of metadata that did not match
> any of the metadata magic numbers.
>
> hard to say whether it might be an xfs bug I think; this does come up
> occasionally though and it'd at least be nice to print more details on
> the error (what the magic *was*, what block, etc)
>
> Do you happen to have the repair output?
>
> Did your md raid lose power w/ write cache enabled?
Hi Eric,
Thanks for your response.  The system did not lose power; this failure
just "happens".  I have a cronjob which rsyncs /data to a spare drive
that's not on RAID, and it seems that is enough to trigger this failure.
Fortunately, I still have the xfs_repair output in my term buffer:
root@slut:/# xfs_repair /dev/md2
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
bad magic number 0x0 on inode 18042
bad version number 0x0 on inode 18042
bad magic number 0x0 on inode 18043
bad version number 0x0 on inode 18043
bad magic number 0x0 on inode 18044
bad version number 0x0 on inode 18044
bad magic number 0x0 on inode 18045
bad version number 0x0 on inode 18045
bad magic number 0x0 on inode 18046
bad version number 0x0 on inode 18046
bad magic number 0x0 on inode 18047
bad version number 0x0 on inode 18047
bad directory block magic # 0 in block 0 for directory inode 18000
corrupt block 0 in directory inode 18000
will junk block
no . entry for directory 18000
no .. entry for directory 18000
problem with directory contents in inode 18000
cleared inode 18000
bad directory block magic # 0 in block 0 for directory inode 18006
corrupt block 0 in directory inode 18006
will junk block
no . entry for directory 18006
no .. entry for directory 18006
problem with directory contents in inode 18006
cleared inode 18006
bad magic number 0x0 on inode 18042, resetting magic number
bad version number 0x0 on inode 18042, resetting version number
imap claims a free inode 18042 is in use, correcting imap and clearing inode
cleared inode 18042
bad magic number 0x0 on inode 18043, resetting magic number
bad version number 0x0 on inode 18043, resetting version number
imap claims a free inode 18043 is in use, correcting imap and clearing inode
cleared inode 18043
bad magic number 0x0 on inode 18044, resetting magic number
bad version number 0x0 on inode 18044, resetting version number
imap claims a free inode 18044 is in use, correcting imap and clearing inode
cleared inode 18044
bad magic number 0x0 on inode 18045, resetting magic number
bad version number 0x0 on inode 18045, resetting version number
imap claims a free inode 18045 is in use, correcting imap and clearing inode
cleared inode 18045
bad magic number 0x0 on inode 18046, resetting magic number
bad version number 0x0 on inode 18046, resetting version number
imap claims a free inode 18046 is in use, correcting imap and clearing inode
cleared inode 18046
bad magic number 0x0 on inode 18047, resetting magic number
bad version number 0x0 on inode 18047, resetting version number
imap claims a free inode 18047 is in use, correcting imap and clearing inode
cleared inode 18047
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
entry "classes.nib" in shortform directory 18009 references free inode 18047
junking entry "classes.nib" in directory inode 18009
- agno = 4
- agno = 5
- agno = 6
- agno = 7
entry "Spanish.lproj" at block 0 offset 296 in directory inode 1610630695 references free inode 18000
clearing inode number in entry at offset 296...
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
entry "Resources" in shortform directory 4049684106 references free inode 18006
junking entry "Resources" in directory inode 4049684106
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
bad hash table for directory inode 1610630695 (no data entry): rebuilding
rebuilding directory inode 1610630695
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected dir inode 18007, moving to lost+found
disconnected inode 18024, moving to lost+found
disconnected inode 18025, moving to lost+found
disconnected inode 18026, moving to lost+found
disconnected inode 18027, moving to lost+found
disconnected inode 18028, moving to lost+found
disconnected inode 18029, moving to lost+found
disconnected inode 18030, moving to lost+found
disconnected inode 18031, moving to lost+found
disconnected inode 18037, moving to lost+found
disconnected inode 18038, moving to lost+found
disconnected inode 18039, moving to lost+found
disconnected inode 18040, moving to lost+found
disconnected inode 18041, moving to lost+found
disconnected dir inode 268452448, moving to lost+found
disconnected dir inode 268452449, moving to lost+found
disconnected dir inode 536889431, moving to lost+found
disconnected dir inode 536889432, moving to lost+found
disconnected dir inode 805323820, moving to lost+found
disconnected dir inode 805323821, moving to lost+found
disconnected dir inode 1073761501, moving to lost+found
disconnected dir inode 1073761502, moving to lost+found
disconnected dir inode 1342194790, moving to lost+found
disconnected dir inode 1342194791, moving to lost+found
disconnected dir inode 1610630702, moving to lost+found
disconnected dir inode 1879067163, moving to lost+found
disconnected dir inode 2147967564, moving to lost+found
disconnected dir inode 2436769389, moving to lost+found
disconnected dir inode 2703685645, moving to lost+found
disconnected dir inode 2703685648, moving to lost+found
disconnected dir inode 2970453601, moving to lost+found
disconnected dir inode 3240533616, moving to lost+found
disconnected dir inode 3508468806, moving to lost+found
disconnected dir inode 3777743419, moving to lost+found
disconnected dir inode 4049684107, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 5682 nlinks from 2 to 24
resetting inode 1610630695 nlinks from 21 to 20
resetting inode 4049684106 nlinks from 3 to 2
done
Regards,
George
* Re: XFS corruption on ubuntu 2.6.27-9-server
2009-02-04 1:34 ` George Barnett
@ 2009-02-04 1:46 ` Eric Sandeen
2009-02-04 1:53 ` George Barnett
0 siblings, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2009-02-04 1:46 UTC (permalink / raw)
To: George Barnett; +Cc: xfs
George Barnett wrote:
> [...]
> bad magic number 0x0 on inode 18042
> bad version number 0x0 on inode 18042
> bad magic number 0x0 on inode 18043
> bad version number 0x0 on inode 18043
> bad magic number 0x0 on inode 18044
> bad version number 0x0 on inode 18044
> bad magic number 0x0 on inode 18045
> bad version number 0x0 on inode 18045
> bad magic number 0x0 on inode 18046
> bad version number 0x0 on inode 18046
> bad magic number 0x0 on inode 18047
> bad version number 0x0 on inode 18047
> bad directory block magic # 0 in block 0 for directory inode 18000
Interesting that all the bad magic numbers were 0... not sure what to
make of that, offhand, I'm afraid...
-Eric
* Re: XFS corruption on ubuntu 2.6.27-9-server
2009-02-04 1:46 ` Eric Sandeen
@ 2009-02-04 1:53 ` George Barnett
2009-02-04 2:05 ` Eric Sandeen
0 siblings, 1 reply; 6+ messages in thread
From: George Barnett @ 2009-02-04 1:53 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
On 04/02/2009, at 12:46 PM, Eric Sandeen wrote:
>> bad version number 0x0 on inode 18046
>> bad magic number 0x0 on inode 18047
>> bad version number 0x0 on inode 18047
>> bad directory block magic # 0 in block 0 for directory inode 18000
>
> Interesting that all the bad magic numbers were 0... not sure what to
> make of that, offhand, I'm afraid...
Oh dear.
I'm going to try moving the filesystem to ext3 to see if this
continues. If it does, it would suggest a bug in the underlying
raid10 implementation or a problem with the disks, although they're
not reporting any errors [1].
Is there any further debugging I can do before I start fresh?
George
1. The Hardware_ECC_Recovered smartctl metric is /very/ high, although
I'm told this may be normal for Samsung drives. I can't think of any
way to confirm a disk problem without a CRC-checking fs, though.
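Absent filesystem-level checksums, one userspace stand-in is a digest manifest: record a hash per file, re-verify on the next cron run, and treat any mismatch on an untouched file as evidence against the storage stack. A sketch (`build_manifest`/`verify` are hypothetical helpers, not existing tools, and assume files aren't legitimately modified between runs):

```python
import hashlib
import os
import tempfile

def sha1_file(path, bufsize=1 << 20):
    # Stream the file so large trees don't blow out memory.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    # Map every file under root to its current digest.
    return {
        os.path.join(dirpath, name): sha1_file(os.path.join(dirpath, name))
        for dirpath, _, names in os.walk(root)
        for name in names
    }

def verify(manifest):
    # Return the paths whose contents no longer match the manifest.
    return sorted(p for p, digest in manifest.items()
                  if not os.path.exists(p) or sha1_file(p) != digest)

# Tiny demo on a throwaway tree standing in for /data.
root = tempfile.mkdtemp()
with open(os.path.join(root, "a.txt"), "wb") as f:
    f.write(b"hello")
manifest = build_manifest(root)
clean = verify(manifest)                # nothing changed yet
with open(os.path.join(root, "a.txt"), "wb") as f:
    f.write(b"hellO")                   # simulate silent corruption
damaged = verify(manifest)
print(clean, damaged)
```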
* Re: XFS corruption on ubuntu 2.6.27-9-server
2009-02-04 1:53 ` George Barnett
@ 2009-02-04 2:05 ` Eric Sandeen
0 siblings, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2009-02-04 2:05 UTC (permalink / raw)
To: George Barnett; +Cc: xfs
George Barnett wrote:
> On 04/02/2009, at 12:46 PM, Eric Sandeen wrote:
>
>>> bad version number 0x0 on inode 18046
>>> bad magic number 0x0 on inode 18047
>>> bad version number 0x0 on inode 18047
>>> bad directory block magic # 0 in block 0 for directory inode 18000
>> Interesting that all the bad magic numbers were 0... not sure what to
>> make of that, offhand, I'm afraid...
>
> Oh dear.
>
> I'm going to try moving the filesystem to ext3 to see if this
> continues. If it does, it would suggest a bug in the underlying
> raid10 implementation or a problem with the disks, although they're
> not reporting any errors [1].
One thing to note is that XFS is very good at detecting on-disk
corruption; I'm not sure ext3 will be as good. So ext3 may seem to run
fine for longer, even if there is an underlying problem.
> Is there any further debugging I can do before I start fresh?
Well, it'd be great to have an isolated testcase, if you can reproduce
it succinctly.
Also, I don't know what exact kernel Ubuntu uses or what patches are in
it; you might try a stock upstream kernel w/ the same config,
2.6.27.$LATEST, and see if you continue to have problems.
-Eric
> George
>
>
> 1. The Hardware_ECC_Recovered smartctl metric is /very/ high, although
> I'm told this may be normal for Samsung drives. I can't think of any
> way to confirm a disk problem without a CRC-checking fs, though.
>