* XFS internal error xfs_btree_check_sblock
@ 2007-12-11 18:26 David Greaves
2007-12-11 22:25 ` David Chinner
0 siblings, 1 reply; 7+ messages in thread
From: David Greaves @ 2007-12-11 18:26 UTC (permalink / raw)
To: xfs
Hi
I've been having problems with this filesystem for a while now.
I upgraded to 2.6.23 to see if it's improved (no).
Once every 2 or 3 cold boots I get this in dmesg as the user logs in and
accesses the /scratch filesystem. If the error doesn't occur as the user logs in
then it won't happen at all.
Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file
fs/xfs/xfs_btree.c. Caller 0xc01b7bc1
[<c010511a>] show_trace_log_lvl+0x1a/0x30
[<c0105d72>] show_trace+0x12/0x20
[<c0105d95>] dump_stack+0x15/0x20
[<c01dd34f>] xfs_error_report+0x4f/0x60
[<c01cfcb6>] xfs_btree_check_sblock+0x56/0xd0
[<c01b7bc1>] xfs_alloc_lookup+0x181/0x390
[<c01b7e23>] xfs_alloc_lookup_eq+0x13/0x20
[<c01b5594>] xfs_free_ag_extent+0x2f4/0x690
[<c01b7164>] xfs_free_extent+0xb4/0xd0
[<c01c1979>] xfs_bmap_finish+0x119/0x170
[<c0209aa7>] xfs_remove+0x247/0x4f0
[<c0211cc2>] xfs_vn_unlink+0x22/0x50
[<c0172f28>] vfs_unlink+0x68/0xa0
[<c01751e9>] do_unlinkat+0xb9/0x140
[<c0175280>] sys_unlink+0x10/0x20
[<c010420a>] syscall_call+0x7/0xb
=======================
xfs_force_shutdown(dm-0,0x8) called from line 4274 of file fs/xfs/xfs_bmap.c.
Return address = 0xc0214dae
Filesystem "dm-0": Corruption of in-memory data detected. Shutting down
filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)
I ssh in as root, umount, mount, umount and run xfs_repair.
This is what I got this time:
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
ir_freecount/free mismatch, inode chunk 59/5027968, freecount 27 nfree 26
- found root inode chunk
All the rest was clean.
It is possible this fs suffered in the 2.6.17 timeframe
It is also possible something got broken whilst I was having lots of issues with
hibernate (which is still unreliable).
I wonder if the fs is borked and xfs_repair isn't fixing it?
David
PS Please cc on replies.
^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: XFS internal error xfs_btree_check_sblock
@ 2007-12-11 22:25 David Chinner
2007-12-11 23:40 ` David Greaves
0 siblings, 1 reply; 7+ messages in thread
From: David Chinner @ 2007-12-11 22:25 UTC (permalink / raw)
To: David Greaves; +Cc: xfs
On Tue, Dec 11, 2007 at 06:26:55PM +0000, David Greaves wrote:
> Hi
>
> I've been having problems with this filesystem for a while now.
>
> I upgraded to 2.6.23 to see if it's improved (no).
>
> Once every 2 or 3 cold boots I get this in dmesg as the user logs in and
> accesses the /scratch filesystem. If the error doesn't occur as the user logs in
> then it won't happen at all.
>
> Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file
> fs/xfs/xfs_btree.c. Caller 0xc01b7bc1
> [<c010511a>] show_trace_log_lvl+0x1a/0x30
> [<c0105d72>] show_trace+0x12/0x20
> [<c0105d95>] dump_stack+0x15/0x20
> [<c01dd34f>] xfs_error_report+0x4f/0x60
> [<c01cfcb6>] xfs_btree_check_sblock+0x56/0xd0
> [<c01b7bc1>] xfs_alloc_lookup+0x181/0x390
> [<c01b7e23>] xfs_alloc_lookup_eq+0x13/0x20
> [<c01b5594>] xfs_free_ag_extent+0x2f4/0x690
> [<c01b7164>] xfs_free_extent+0xb4/0xd0
> [<c01c1979>] xfs_bmap_finish+0x119/0x170
> [<c0209aa7>] xfs_remove+0x247/0x4f0
> [<c0211cc2>] xfs_vn_unlink+0x22/0x50
> [<c0172f28>] vfs_unlink+0x68/0xa0
> [<c01751e9>] do_unlinkat+0xb9/0x140
> [<c0175280>] sys_unlink+0x10/0x20
> [<c010420a>] syscall_call+0x7/0xb
> =======================
> xfs_force_shutdown(dm-0,0x8) called from line 4274 of file fs/xfs/xfs_bmap.c.
> Return address = 0xc0214dae
> Filesystem "dm-0": Corruption of in-memory data detected. Shutting down
> filesystem: dm-0
> Please umount the filesystem, and rectify the problem(s)

So there's a corrupted freespace btree block.

> I ssh in as root, umount, mount, umount and run xfs_repair.
>
> This is what I got this time:
>
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> ir_freecount/free mismatch, inode chunk 59/5027968, freecount 27 nfree 26
> - found root inode chunk
>
> All the rest was clean.

repair doesn't check the freespace btrees - it just rebuilds them from
scratch. use xfs_check to tell you what is wrong with the filesystem,
then use xfs_repair to fix it....

> It is possible this fs suffered in the 2.6.17 timeframe
> It is also possible something got broken whilst I was having lots of issues with
> hibernate (which is still unreliable).

Suspend does not quiesce filesystems safely, so you risk filesystem
corruption every time you suspend and resume no matter what filesystem
you use.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
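The check-before-repair sequence Dave recommends can be summarised as a short script. This is a hypothetical sketch, not a command sequence from the thread: the device path and mount point are the ones discussed here, and it is written in dry-run form (printing each step rather than executing it), since the real commands operate on an unmounted filesystem:

```shell
#!/bin/sh
# Sketch of the diagnose-then-repair workflow discussed above.
# DEV and MNT are assumptions taken from this thread; adjust for your system.
DEV=/dev/video_vg/video_lv
MNT=/scratch

run() { echo "+ $*"; }    # dry-run helper: print a command instead of running it

run umount "$MNT"         # offline tools need the filesystem unmounted
run mount "$MNT"          # a mount/umount cycle replays the journal first...
run umount "$MNT"         # ...so xfs_check does not complain about a dirty log
run xfs_check "$DEV"      # diagnose: read-only, reports what is inconsistent
run xfs_repair "$DEV"     # only then repair/rebuild the damaged structures
run mount "$MNT"
```

The point of the ordering is that xfs_check preserves the evidence of what went wrong, while xfs_repair destroys it by rebuilding the btrees from scratch.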
* Re: XFS internal error xfs_btree_check_sblock
@ 2007-12-11 23:40 David Greaves
2007-12-12 11:12 ` David Chinner
0 siblings, 1 reply; 7+ messages in thread
From: David Greaves @ 2007-12-11 23:40 UTC (permalink / raw)
To: David Chinner; +Cc: xfs
David Chinner wrote:
> On Tue, Dec 11, 2007 at 06:26:55PM +0000, David Greaves wrote:
>> Once every 2 or 3 cold boots I get this in dmesg as the user logs in and
> So there's a corrupted freespace btree block.

OK, ta

>> I ssh in as root, umount, mount, umount and run xfs_repair.
> repair doesn't check the freespace btrees - it just rebuilds them from
> scratch. use xfs_check to tell you what is wrong with the filesystem, then
> use xfs_repair to fix it....

OK, having repaired it:

haze:~# xfs_check /dev/video_vg/video_lv
haze:~#

So why do I have to do this on a regular basis (ie run xfs_repair)?
I am shutting the machine down cleanly (init 0)

>> It is possible this fs suffered in the 2.6.17 timeframe
>> It is also possible something got broken whilst I was having lots of issues with
>> hibernate (which is still unreliable).
>
> Suspend does not quiesce filesystems safely, so you risk filesystem
> corruption every time you suspend and resume no matter what filesystem
> you use.

Well, FWIW, I've not hibernated this machine for a *long* time.

Also my hibernate script used to run xfs_freeze before hibernating (to be on the
safe side). This would regularly hang with an xfs_io process (or some such IIRC)
in an unkillable state.

I was about to edit my init scripts to do a mount, umount, xfs_repair, mount
cycle. But then I thought "this is wrong - I'll report it".

So is there anything else I should do?

David
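The xfs_freeze-before-hibernate idea David mentions would look something like the following wrapper. This is a hypothetical sketch, not his actual script: the mount point and the hibernation mechanism are assumptions, and the commands are printed (dry-run) rather than executed:

```shell
#!/bin/sh
# Hypothetical freeze-around-hibernate wrapper of the kind described above.
# xfs_freeze -f flushes and blocks new writes; -u thaws the filesystem again.
MNT=/scratch

do_cmd() { echo "+ $*"; }    # dry-run helper: print instead of executing

do_cmd xfs_freeze -f "$MNT"                     # quiesce before suspending
do_cmd sh -c 'echo disk > /sys/power/state'     # hibernate (mechanism varies)
do_cmd xfs_freeze -u "$MNT"                     # thaw after resume
```

The hang David describes would leave the filesystem frozen with writers blocked, which is consistent with an unkillable process stuck in the freeze path.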
* Re: XFS internal error xfs_btree_check_sblock
@ 2007-12-12 11:12 David Chinner
2007-12-12 11:39 ` David Greaves
0 siblings, 1 reply; 7+ messages in thread
From: David Chinner @ 2007-12-12 11:12 UTC (permalink / raw)
To: David Greaves; +Cc: David Chinner, xfs
On Tue, Dec 11, 2007 at 11:40:56PM +0000, David Greaves wrote:
> David Chinner wrote:
> > On Tue, Dec 11, 2007 at 06:26:55PM +0000, David Greaves wrote:
> >> Once every 2 or 3 cold boots I get this in dmesg as the user logs in and
> > So there's a corrupted freespace btree block.
> OK, ta
>
> >> I ssh in as root, umount, mount, umount and run xfs_repair.
> > repair doesn't check the freespace btrees - it just rebuilds them from
> > scratch. use xfs_check to tell you what is wrong with the filesystem, then
> > use xfs_repair to fix it....
>
> OK, having repaired it:
> haze:~# xfs_check /dev/video_vg/video_lv
> haze:~#

Of course there are no errors - you just repaired them ;)
Run xfs_check before you run xfs_repair when a corruption occurs.

> So why do I have to do this on a regular basis (ie run xfs_repair)?

Don't know yet.

> I am shutting the machine down cleanly (init 0)

That doesn't mean everything shuts down cleanly....

> >> It is possible this fs suffered in the 2.6.17 timeframe
> >> It is also possible something got broken whilst I was having lots of issues with
> >> hibernate (which is still unreliable).
> >
> > Suspend does not quiesce filesystems safely, so you risk filesystem
> > corruption every time you suspend and resume no matter what filesystem
> > you use.
>
> Well, FWIW, I've not hibernated this machine for a *long* time.

Ok, so ignore that.

> Also my hibernate script used to run xfs_freeze before hibernating (to be on the
> safe side). This would regularly hang with an xfs_io process (or some such IIRC)
> in an unkillable state.

Well, 2.6.23 completely broke this, along with freezing XFS filesystems.

> I was about to edit my init scripts to do a mount, umount, xfs_repair, mount
> cycle. But then I thought "this is wrong - I'll report it".
> So is there anything else I should do?

Check the filesystem before repairing it.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: XFS internal error xfs_btree_check_sblock
@ 2007-12-12 11:39 David Greaves
2007-12-12 22:00 ` David Chinner
0 siblings, 1 reply; 7+ messages in thread
From: David Greaves @ 2007-12-12 11:39 UTC (permalink / raw)
To: David Chinner; +Cc: xfs
David Chinner wrote:
> On Tue, Dec 11, 2007 at 11:40:56PM +0000, David Greaves wrote:
>> So is there anything else I should do?
>
> Check the filesystem before repairing it.

yeah, OK :)

Well, it happened next boot. So:

haze:~# umount /scratch
haze:~# xfs_check /dev/video_vg/video_lv
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_check. If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
haze:~# mount /scratch
haze:~# umount /scratch
haze:~# xfs_check /dev/video_vg/video_lv
bad format 2 for inode 1435146910 type 0
ir_freecount/free mismatch, inode chunk 42/25860704, freecount 64 nfree 63
bad format 2 for inode 1435150526 type 0
ir_freecount/free mismatch, inode chunk 42/25864320, freecount 64 nfree 63
bad format 2 for inode 1435173822 type 0
ir_freecount/free mismatch, inode chunk 42/25887616, freecount 64 nfree 63
bad format 2 for inode 1984739518 type 0
ir_freecount/free mismatch, inode chunk 59/5027968, freecount 27 nfree 26
allocated inode 1435146910 has 0 link count
allocated inode 1435173822 has 0 link count
allocated inode 1435150526 has 0 link count
allocated inode 1984739518 has 0 link count
haze:~#

Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file
fs/xfs/xfs_btree.c. Caller 0xc01b7bc1
[<c010511a>] show_trace_log_lvl+0x1a/0x30
[<c0105d72>] show_trace+0x12/0x20
[<c0105d95>] dump_stack+0x15/0x20
[<c01dd34f>] xfs_error_report+0x4f/0x60
[<c01cfcb6>] xfs_btree_check_sblock+0x56/0xd0
[<c01b7bc1>] xfs_alloc_lookup+0x181/0x390
[<c01b7e06>] xfs_alloc_lookup_ge+0x16/0x20
[<c01b5e12>] xfs_alloc_ag_vextent_size+0x52/0x410
[<c01b6c57>] xfs_alloc_ag_vextent+0x107/0x110
[<c01b6e58>] xfs_alloc_fix_freelist+0x1f8/0x450
[<c01b713c>] xfs_free_extent+0x8c/0xd0
[<c01c1979>] xfs_bmap_finish+0x119/0x170
[<c01e6f5a>] xfs_itruncate_finish+0x23a/0x3a0
[<c020328d>] xfs_free_eofblocks+0x26d/0x2b0
[<c0207de1>] xfs_release+0x171/0x270
[<c020f216>] xfs_file_release+0x16/0x20
[<c016ba2b>] __fput+0x9b/0x190
[<c016bb88>] fput+0x18/0x20
[<c0168ec7>] filp_close+0x47/0x80
[<c016a3b7>] sys_close+0x87/0x110
[<c010420a>] syscall_call+0x7/0xb
=======================
xfs_force_shutdown(dm-0,0x8) called from line 4274 of file fs/xfs/xfs_bmap.c.
Return address = 0xc0214dae
Filesystem "dm-0": Corruption of in-memory data detected. Shutting down
filesystem: dm-0

I've not yet run a repair...

David
* Re: XFS internal error xfs_btree_check_sblock
@ 2007-12-12 22:00 David Chinner
2007-12-13 10:42 ` David Greaves
0 siblings, 1 reply; 7+ messages in thread
From: David Chinner @ 2007-12-12 22:00 UTC (permalink / raw)
To: David Greaves; +Cc: David Chinner, xfs
On Wed, Dec 12, 2007 at 11:39:36AM +0000, David Greaves wrote:
> David Chinner wrote:
> > On Tue, Dec 11, 2007 at 11:40:56PM +0000, David Greaves wrote:
> >> So is there anything else I should do?
> >
> > Check the filesystem before repairing it.
> yeah, OK :)
>
> Well, it happened next boot. So:
>
> haze:~# umount /scratch
> haze:~# xfs_check /dev/video_vg/video_lv
> ERROR: The filesystem has valuable metadata changes in a log which needs to
> be replayed. Mount the filesystem to replay the log, and unmount it before
> re-running xfs_check. If you are unable to mount the filesystem, then use
> the xfs_repair -L option to destroy the log and attempt a repair.
> Note that destroying the log may cause corruption -- please attempt a mount
> of the filesystem before doing this.
> haze:~# mount /scratch
> haze:~# umount /scratch
> haze:~# xfs_check /dev/video_vg/video_lv
> bad format 2 for inode 1435146910 type 0
> ir_freecount/free mismatch, inode chunk 42/25860704, freecount 64 nfree 63
> bad format 2 for inode 1435150526 type 0
> ir_freecount/free mismatch, inode chunk 42/25864320, freecount 64 nfree 63
> bad format 2 for inode 1435173822 type 0
> ir_freecount/free mismatch, inode chunk 42/25887616, freecount 64 nfree 63
> bad format 2 for inode 1984739518 type 0
> ir_freecount/free mismatch, inode chunk 59/5027968, freecount 27 nfree 26
> allocated inode 1435146910 has 0 link count
> allocated inode 1435173822 has 0 link count
> allocated inode 1435150526 has 0 link count
> allocated inode 1984739518 has 0 link count

This is after the shutdown, right? Hmmmm - that looks like inodes that
have not been unlinked correctly.

Also, "bad format 2" indicates that the di_mode field is invalid or the
data fork format of the inode is invalid. Can you print out these inodes
with:

# xfs_db -r -c "inode <ino #>" -c p /dev/video_vg/video_lv

And post that so we can see what state they are apparently in?

Also, no freespace btree corruption has been reported, so if a btree
block is being corrupted in memory as indicated by the shutdown there's
either a logic error in the btree code or something is trashing your
memory. Have you run memtest86 on this box to see if the memory is ok?

> I've not yet run a repair...

Can you hold off for a while longer?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
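Dave's xfs_db request can be applied to all four inodes that xfs_check flagged with one small loop. This is a sketch: the inode numbers and device path are taken from the output earlier in the thread, and the invocations are printed (dry-run) rather than run against a live device:

```shell
#!/bin/sh
# Sketch: dump each inode flagged by xfs_check, using xfs_db read-only (-r).
# Inode numbers and device path come from the xfs_check output above.
DEV=/dev/video_vg/video_lv

run() { echo "+ $*"; }    # dry-run helper: print each xfs_db invocation

for ino in 1435146910 1435150526 1435173822 1984739518; do
    run xfs_db -r -c "inode $ino" -c p "$DEV"
done
```

Running xfs_db with -r keeps it strictly read-only, so this inspection cannot disturb the evidence the way a repair would.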
* Re: XFS internal error xfs_btree_check_sblock
@ 2007-12-13 10:42 David Greaves
0 siblings, 0 replies; 7+ messages in thread
From: David Greaves @ 2007-12-13 10:42 UTC (permalink / raw)
To: David Chinner; +Cc: xfs
David Chinner wrote:
> Can you hold off for a while longer?

I'm afraid SWMBO came home and "did the usual fix" whilst I was making
coffee.. (she's almost as well trained as I am!)

I'll wait for a recurrence and do as you suggested...

David
end of thread, other threads:[~2007-12-13 10:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2007-12-11 18:26 XFS internal error xfs_btree_check_sblock David Greaves
2007-12-11 22:25 ` David Chinner
2007-12-11 23:40   ` David Greaves
2007-12-12 11:12     ` David Chinner
2007-12-12 11:39       ` David Greaves
2007-12-12 22:00         ` David Chinner
2007-12-13 10:42           ` David Greaves