public inbox for linux-xfs@vger.kernel.org
* XFS internal error xfs_btree_check_sblock
@ 2007-12-11 18:26 David Greaves
  2007-12-11 22:25 ` David Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: David Greaves @ 2007-12-11 18:26 UTC (permalink / raw)
  To: xfs

Hi

I've been having problems with this filesystem for a while now.

I upgraded to 2.6.23 to see if it's improved (no).

Once every 2 or 3 cold boots I get this in dmesg as the user logs in and
accesses the /scratch filesystem. If the error doesn't occur as the user logs in
then it won't happen at all.

Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file
fs/xfs/xfs_btree.c.  Caller 0xc01b7bc1
 [<c010511a>] show_trace_log_lvl+0x1a/0x30
 [<c0105d72>] show_trace+0x12/0x20
 [<c0105d95>] dump_stack+0x15/0x20
 [<c01dd34f>] xfs_error_report+0x4f/0x60
 [<c01cfcb6>] xfs_btree_check_sblock+0x56/0xd0
 [<c01b7bc1>] xfs_alloc_lookup+0x181/0x390
 [<c01b7e23>] xfs_alloc_lookup_eq+0x13/0x20
 [<c01b5594>] xfs_free_ag_extent+0x2f4/0x690
 [<c01b7164>] xfs_free_extent+0xb4/0xd0
 [<c01c1979>] xfs_bmap_finish+0x119/0x170
 [<c0209aa7>] xfs_remove+0x247/0x4f0
 [<c0211cc2>] xfs_vn_unlink+0x22/0x50
 [<c0172f28>] vfs_unlink+0x68/0xa0
 [<c01751e9>] do_unlinkat+0xb9/0x140
 [<c0175280>] sys_unlink+0x10/0x20
 [<c010420a>] syscall_call+0x7/0xb
 =======================
xfs_force_shutdown(dm-0,0x8) called from line 4274 of file fs/xfs/xfs_bmap.c.
Return address = 0xc0214dae
Filesystem "dm-0": Corruption of in-memory data detected.  Shutting down
filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)

I ssh in as root, umount, mount, umount and run xfs_repair.

This is what I got this time:

Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
ir_freecount/free mismatch, inode chunk 59/5027968, freecount 27 nfree 26
        - found root inode chunk

All the rest was clean.

It is possible this fs suffered in the 2.6.17 timeframe
It is also possible something got broken whilst I was having lots of issues with
 hibernate (which is still unreliable).

I wonder if the fs is borked and xfs_repair isn't fixing it?

David
PS Please cc on replies.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: XFS internal error xfs_btree_check_sblock
  2007-12-11 18:26 XFS internal error xfs_btree_check_sblock David Greaves
@ 2007-12-11 22:25 ` David Chinner
  2007-12-11 23:40   ` David Greaves
  0 siblings, 1 reply; 7+ messages in thread
From: David Chinner @ 2007-12-11 22:25 UTC (permalink / raw)
  To: David Greaves; +Cc: xfs

On Tue, Dec 11, 2007 at 06:26:55PM +0000, David Greaves wrote:
> Hi
> 
> I've been having problems with this filesystem for a while now.
> 
> I upgraded to 2.6.23 to see if it's improved (no).
> 
> Once every 2 or 3 cold boots I get this in dmesg as the user logs in and
> accesses the /scratch filesystem. If the error doesn't occur as the user logs in
> then it won't happen at all.
> 
> Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file
> fs/xfs/xfs_btree.c.  Caller 0xc01b7bc1
>  [<c010511a>] show_trace_log_lvl+0x1a/0x30
>  [<c0105d72>] show_trace+0x12/0x20
>  [<c0105d95>] dump_stack+0x15/0x20
>  [<c01dd34f>] xfs_error_report+0x4f/0x60
>  [<c01cfcb6>] xfs_btree_check_sblock+0x56/0xd0
>  [<c01b7bc1>] xfs_alloc_lookup+0x181/0x390
>  [<c01b7e23>] xfs_alloc_lookup_eq+0x13/0x20
>  [<c01b5594>] xfs_free_ag_extent+0x2f4/0x690
>  [<c01b7164>] xfs_free_extent+0xb4/0xd0
>  [<c01c1979>] xfs_bmap_finish+0x119/0x170
>  [<c0209aa7>] xfs_remove+0x247/0x4f0
>  [<c0211cc2>] xfs_vn_unlink+0x22/0x50
>  [<c0172f28>] vfs_unlink+0x68/0xa0
>  [<c01751e9>] do_unlinkat+0xb9/0x140
>  [<c0175280>] sys_unlink+0x10/0x20
>  [<c010420a>] syscall_call+0x7/0xb
>  =======================
> xfs_force_shutdown(dm-0,0x8) called from line 4274 of file fs/xfs/xfs_bmap.c.
> Return address = 0xc0214dae
> Filesystem "dm-0": Corruption of in-memory data detected.  Shutting down
> filesystem: dm-0
> Please umount the filesystem, and rectify the problem(s)

So there's a corrupted freespace btree block.

> I ssh in as root, umount, mount, umount and run xfs_repair.
> 
> This is what I got this time:
> 
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
> ir_freecount/free mismatch, inode chunk 59/5027968, freecount 27 nfree 26
>         - found root inode chunk
> 
> All the rest was clean.

repair doesn't check the freespace btrees - it just rebuilds them from
scratch. Use xfs_check to tell you what is wrong with the filesystem, then
use xfs_repair to fix it....
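The order of operations being suggested (diagnose first, then repair) looks roughly like this. This is a dry-run sketch that only prints the commands, since xfs_check and xfs_repair need a real unmounted XFS device; the device and mountpoint are the reporter's, taken from later in the thread:

```shell
# Dry-run sketch of the suggested sequence: diagnose with xfs_check
# first, then rebuild with xfs_repair. It only echoes the commands,
# because the real tools must run against an unmounted filesystem.
dev=/dev/video_vg/video_lv   # reporter's device, from later in the thread
mnt=/scratch

check_then_repair() {
    echo "umount $mnt"                    # tools want a quiescent fs
    echo "mount $mnt && umount $mnt"      # replay a dirty log first
    echo "xfs_check $dev"                 # record what is actually wrong
    echo "xfs_repair $dev"                # then rebuild damaged structures
    echo "mount $mnt"
}
check_then_repair
```

Capturing the check output before repairing is the whole point: once xfs_repair has rebuilt the btrees, the evidence is gone.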

> It is possible this fs suffered in the 2.6.17 timeframe
> It is also possible something got broken whilst I was having lots of issues with
>  hibernate (which is still unreliable).

Suspend does not quiesce filesystems safely, so you risk filesystem
corruption every time you suspend and resume no matter what filesystem
you use.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: XFS internal error xfs_btree_check_sblock
  2007-12-11 22:25 ` David Chinner
@ 2007-12-11 23:40   ` David Greaves
  2007-12-12 11:12     ` David Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: David Greaves @ 2007-12-11 23:40 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs

David Chinner wrote:
> On Tue, Dec 11, 2007 at 06:26:55PM +0000, David Greaves wrote:
>> Once every 2 or 3 cold boots I get this in dmesg as the user logs in and
> So there's a corrupted freespace btree block.
OK, ta

>> I ssh in as root, umount, mount, umount and run xfs_repair.
> repair doesn't check the freespace btrees - it just rebuilds them from
> scratch. use xfs_check to tell you what is wrong with the filesystem, then
> use xfs_repair to fix it....

OK, having repaired it:
haze:~# xfs_check /dev/video_vg/video_lv
haze:~#

So why do I have to do this on a regular basis (i.e. run xfs_repair)?
I am shutting the machine down cleanly (init 0).

>> It is possible this fs suffered in the 2.6.17 timeframe
>> It is also possible something got broken whilst I was having lots of issues with
>>  hibernate (which is still unreliable).
> 
> Suspend does not quiesce filesystems safely, so you risk filesystem
> corruption every time you suspend and resume no matter what filesystem
> you use.

Well, FWIW, I've not hibernated this machine for a *long* time.
Also my hibernate script used to run xfs_freeze before hibernating (to be on the
safe side). This would regularly hang with an xfs_io process (or some such IIRC)
in an unkillable state.
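A hibernate wrapper of the kind described would have been shaped something like the following. This is a hypothetical reconstruction, not the reporter's actual script, and it only prints the steps; really freezing requires root and a mounted XFS filesystem, and (as noted later in the thread) freeze was broken in 2.6.23:

```shell
# Hypothetical reconstruction of the hibernate wrapper described above.
# It only prints the steps rather than executing them.
hibernate_with_freeze() {
    echo "xfs_freeze -f /scratch"        # block new writes, flush the log
    echo "echo disk > /sys/power/state"  # enter hibernation
    echo "xfs_freeze -u /scratch"        # thaw after resume
}
hibernate_with_freeze
```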


I was about to edit my init scripts to do a mount, umount, xfs_repair, mount
cycle. But then I thought "this is wrong - I'll report it".
So is there anything else I should do?


David


* Re: XFS internal error xfs_btree_check_sblock
  2007-12-11 23:40   ` David Greaves
@ 2007-12-12 11:12     ` David Chinner
  2007-12-12 11:39       ` David Greaves
  0 siblings, 1 reply; 7+ messages in thread
From: David Chinner @ 2007-12-12 11:12 UTC (permalink / raw)
  To: David Greaves; +Cc: David Chinner, xfs

On Tue, Dec 11, 2007 at 11:40:56PM +0000, David Greaves wrote:
> David Chinner wrote:
> > On Tue, Dec 11, 2007 at 06:26:55PM +0000, David Greaves wrote:
> >> Once every 2 or 3 cold boots I get this in dmesg as the user logs in and
> > So there's a corrupted freespace btree block.
> OK, ta
> 
> >> I ssh in as root, umount, mount, umount and run xfs_repair.
> > repair doesn't check the freespace btrees - it just rebuilds them from
> > scratch. use xfs_check to tell you what is wrong with the filesystem, then
> > use xfs_repair to fix it....
> 
> OK, having repaired it:
> haze:~# xfs_check /dev/video_vg/video_lv
> haze:~#

Of course there are no errors - you just repaired them ;)

Run xfs_check before you run xfs_repair when a corruption occurs.

> So why do I have to do this on a regular basis (ie run xfs_repair)?

Don't know yet.

> I am shutting the machine down cleanly (init 0)

That doesn't mean everything shuts down cleanly....

> >> It is possible this fs suffered in the 2.6.17 timeframe
> >> It is also possible something got broken whilst I was having lots of issues with
> >>  hibernate (which is still unreliable).
> > 
> > Suspend does not quiesce filesystems safely, so you risk filesystem
> > corruption every time you suspend and resume no matter what filesystem
> > you use.
> 
> Well, FWIW, I've not hibernated this machine for a *long* time.

Ok, so ignore that.

> Also my hibernate script used to run xfs_freeze before hibernating (to be on the
> safe side). This would regularly hang with an xfs_io process (or some such IIRC)
> in an unkillable state.

Well, 2.6.23 completely broke this, along with freezing XFS filesystems.

> I was about to edit my init scripts to do a mount, umount, xfs_repair, mount
> cycle. But then I thought "this is wrong - I'll report it".
> So is there anything else I should do?

Check the filesystem before repairing it.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: XFS internal error xfs_btree_check_sblock
  2007-12-12 11:12     ` David Chinner
@ 2007-12-12 11:39       ` David Greaves
  2007-12-12 22:00         ` David Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: David Greaves @ 2007-12-12 11:39 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs

David Chinner wrote:
> On Tue, Dec 11, 2007 at 11:40:56PM +0000, David Greaves wrote:
>> So is there anything else I should do?
> 
> Check the filesystem before repairing it.
yeah, OK :)

Well, it happened next boot. So:

haze:~# umount /scratch
haze:~# xfs_check /dev/video_vg/video_lv
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_check.  If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
haze:~# mount /scratch
haze:~# umount /scratch
haze:~# xfs_check /dev/video_vg/video_lv
bad format 2 for inode 1435146910 type 0
ir_freecount/free mismatch, inode chunk 42/25860704, freecount 64 nfree 63
bad format 2 for inode 1435150526 type 0
ir_freecount/free mismatch, inode chunk 42/25864320, freecount 64 nfree 63
bad format 2 for inode 1435173822 type 0
ir_freecount/free mismatch, inode chunk 42/25887616, freecount 64 nfree 63
bad format 2 for inode 1984739518 type 0
ir_freecount/free mismatch, inode chunk 59/5027968, freecount 27 nfree 26
allocated inode 1435146910 has 0 link count
allocated inode 1435173822 has 0 link count
allocated inode 1435150526 has 0 link count
allocated inode 1984739518 has 0 link count
haze:~#

Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file
fs/xfs/xfs_btree.c.  Caller 0xc01b7bc1
 [<c010511a>] show_trace_log_lvl+0x1a/0x30
 [<c0105d72>] show_trace+0x12/0x20
 [<c0105d95>] dump_stack+0x15/0x20
 [<c01dd34f>] xfs_error_report+0x4f/0x60
 [<c01cfcb6>] xfs_btree_check_sblock+0x56/0xd0
 [<c01b7bc1>] xfs_alloc_lookup+0x181/0x390
 [<c01b7e06>] xfs_alloc_lookup_ge+0x16/0x20
 [<c01b5e12>] xfs_alloc_ag_vextent_size+0x52/0x410
 [<c01b6c57>] xfs_alloc_ag_vextent+0x107/0x110
 [<c01b6e58>] xfs_alloc_fix_freelist+0x1f8/0x450
 [<c01b713c>] xfs_free_extent+0x8c/0xd0
 [<c01c1979>] xfs_bmap_finish+0x119/0x170
 [<c01e6f5a>] xfs_itruncate_finish+0x23a/0x3a0
 [<c020328d>] xfs_free_eofblocks+0x26d/0x2b0
 [<c0207de1>] xfs_release+0x171/0x270
 [<c020f216>] xfs_file_release+0x16/0x20
 [<c016ba2b>] __fput+0x9b/0x190
 [<c016bb88>] fput+0x18/0x20
 [<c0168ec7>] filp_close+0x47/0x80
 [<c016a3b7>] sys_close+0x87/0x110
 [<c010420a>] syscall_call+0x7/0xb
 =======================
xfs_force_shutdown(dm-0,0x8) called from line 4274 of file fs/xfs/xfs_bmap.c.
Return address = 0xc0214dae
Filesystem "dm-0": Corruption of in-memory data detected.  Shutting down
filesystem: dm-0


I've not yet run a repair...

David


* Re: XFS internal error xfs_btree_check_sblock
  2007-12-12 11:39       ` David Greaves
@ 2007-12-12 22:00         ` David Chinner
  2007-12-13 10:42           ` David Greaves
  0 siblings, 1 reply; 7+ messages in thread
From: David Chinner @ 2007-12-12 22:00 UTC (permalink / raw)
  To: David Greaves; +Cc: David Chinner, xfs

On Wed, Dec 12, 2007 at 11:39:36AM +0000, David Greaves wrote:
> David Chinner wrote:
> > On Tue, Dec 11, 2007 at 11:40:56PM +0000, David Greaves wrote:
> >> So is there anything else I should do?
> > 
> > Check the filesystem before repairing it.
> yeah, OK :)
> 
> Well, it happened next boot. So:
> 
> haze:~# umount /scratch
> haze:~# xfs_check /dev/video_vg/video_lv
> ERROR: The filesystem has valuable metadata changes in a log which needs to
> be replayed.  Mount the filesystem to replay the log, and unmount it before
> re-running xfs_check.  If you are unable to mount the filesystem, then use
> the xfs_repair -L option to destroy the log and attempt a repair.
> Note that destroying the log may cause corruption -- please attempt a mount
> of the filesystem before doing this.
> haze:~# mount /scratch
> haze:~# umount /scratch
> haze:~# xfs_check /dev/video_vg/video_lv
> bad format 2 for inode 1435146910 type 0
> ir_freecount/free mismatch, inode chunk 42/25860704, freecount 64 nfree 63
> bad format 2 for inode 1435150526 type 0
> ir_freecount/free mismatch, inode chunk 42/25864320, freecount 64 nfree 63
> bad format 2 for inode 1435173822 type 0
> ir_freecount/free mismatch, inode chunk 42/25887616, freecount 64 nfree 63
> bad format 2 for inode 1984739518 type 0
> ir_freecount/free mismatch, inode chunk 59/5027968, freecount 27 nfree 26
> allocated inode 1435146910 has 0 link count
> allocated inode 1435173822 has 0 link count
> allocated inode 1435150526 has 0 link count
> allocated inode 1984739518 has 0 link count

This is after the shutdown, right?

Hmmmm - that looks like inodes that have not been unlinked correctly.
Also, "bad format 2" indicates that the di_mode field is invalid, or the
data fork format of the inode is invalid. Can you print out these inodes
with:

# xfs_db -r -c "inode <ino #>" -c p /dev/video_vg/video_lv

And post that so we can see what state they are apparently in?
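Applied to the four inodes that xfs_check flagged above, that request expands to something like the loop below. Again this just prints the commands (xfs_db needs the real device); the inode numbers come from the earlier xfs_check output, and the flags are those in the command given above: -r opens the device read-only, -c "inode N" selects the inode, and -c p prints it:

```shell
# Expand the xfs_db request over the four inodes flagged by xfs_check
# earlier in the thread. Printed rather than executed, since xfs_db
# needs access to the real device.
dev=/dev/video_vg/video_lv
for ino in 1435146910 1435150526 1435173822 1984739518; do
    echo "xfs_db -r -c \"inode $ino\" -c p $dev"
done
```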

Also, no freespace btree corruption has been reported, so if a btree block
is being corrupted in memory as indicated by the shutdown there's either
a logic error in the btree code or something is trashing your memory.

Have you run memtest86 on this box to see if the memory is ok?

> I've not yet run a repair...

Can you hold off for a while longer?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: XFS internal error xfs_btree_check_sblock
  2007-12-12 22:00         ` David Chinner
@ 2007-12-13 10:42           ` David Greaves
  0 siblings, 0 replies; 7+ messages in thread
From: David Greaves @ 2007-12-13 10:42 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs

David Chinner wrote:
> Can you hold off for a while longer?
I'm afraid SWMBO came home and "did the usual fix" whilst I was making coffee..
(she's almost as well trained as I am!)

I'll wait for a recurrence and do as you suggested...

David

