public inbox for linux-xfs@vger.kernel.org
* fs corruption
@ 2011-04-12  9:33 stress_buster
  2011-04-12  9:49 ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: stress_buster @ 2011-04-12  9:33 UTC (permalink / raw)
  To: xfs


My dmesg output shows the below trace. It repeats over and over again.

XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1545 of file
fs/xfs/xfs_alloc.c.  Caller 0xffffffff881a8961

Call Trace:
 [<ffffffff881a6e27>] :xfs:xfs_free_ag_extent+0x19e/0x67e
 [<ffffffff881a8961>] :xfs:xfs_free_extent+0xa9/0xc9
 [<ffffffff881d96cf>] :xfs:xlog_recover_process_efi+0x112/0x16c
 [<ffffffff881f31b4>] :xfs:xfs_fs_fill_super+0x0/0x3e4
 [<ffffffff881da8c2>] :xfs:xlog_recover_process_efis+0x4f/0x8d
 [<ffffffff881da914>] :xfs:xlog_recover_finish+0x14/0xad
 [<ffffffff881f31b4>] :xfs:xfs_fs_fill_super+0x0/0x3e4
 [<ffffffff881df420>] :xfs:xfs_mountfs+0x498/0x5e2
 [<ffffffff881dfb42>] :xfs:xfs_mru_cache_create+0x113/0x143
 [<ffffffff881f33b7>] :xfs:xfs_fs_fill_super+0x203/0x3e4
 [<ffffffff800e544f>] get_sb_bdev+0x10a/0x16c
 [<ffffffff800e4dec>] vfs_kern_mount+0x93/0x11a
 [<ffffffff800e4eb5>] do_kern_mount+0x36/0x4d
 [<ffffffff800ef2ed>] do_mount+0x6a9/0x719
 [<ffffffff80008d84>] __handle_mm_fault+0x5f2/0xfaa
 [<ffffffff80022127>] __up_read+0x19/0x7f
 [<ffffffff80067b88>] do_page_fault+0x4fe/0x874
 [<ffffffff8012c580>] inode_doinit_with_dentry+0x86/0x47c
 [<ffffffff800cd378>] zone_statistics+0x3e/0x6d
 [<ffffffff8000f2ff>] __alloc_pages+0x78/0x308
 [<ffffffff8004c9fd>] sys_mount+0x8a/0xcd
 [<ffffffff8005e116>] system_call+0x7e/0x83

Failed to recover EFIs on filesystem: cciss/c0d0
XFS: log mount finish failed

Can someone shed some light on what is happening here?

Also, what are the next steps I need to take to repair the fs? (assuming my xfs
fs is corrupted)
Will running xfs_repair be good enough in this case?

Thanks in advance
-- 
View this message in context: http://old.nabble.com/fs-corruption-tp31377534p31377534.html
Sent from the Xfs - General mailing list archive at Nabble.com.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: fs corruption
  2011-04-12  9:33 fs corruption stress_buster
@ 2011-04-12  9:49 ` Dave Chinner
  2011-04-12 10:51   ` Leo Davis
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2011-04-12  9:49 UTC (permalink / raw)
  To: stress_buster; +Cc: xfs

On Tue, Apr 12, 2011 at 02:33:07AM -0700, stress_buster wrote:
> 
> My dmesg output shows the below trace. It repeats over and over again.
> 
> XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1545 of file
> fs/xfs/xfs_alloc.c.  Caller 0xffffffff881a8961
> 
> Call Trace:
>  [<ffffffff881a6e27>] :xfs:xfs_free_ag_extent+0x19e/0x67e
>  [<ffffffff881a8961>] :xfs:xfs_free_extent+0xa9/0xc9
>  [<ffffffff881d96cf>] :xfs:xlog_recover_process_efi+0x112/0x16c
>  [<ffffffff881f31b4>] :xfs:xfs_fs_fill_super+0x0/0x3e4
>  [<ffffffff881da8c2>] :xfs:xlog_recover_process_efis+0x4f/0x8d
>  [<ffffffff881da914>] :xfs:xlog_recover_finish+0x14/0xad
>  [<ffffffff881f31b4>] :xfs:xfs_fs_fill_super+0x0/0x3e4
>  [<ffffffff881df420>] :xfs:xfs_mountfs+0x498/0x5e2
>  [<ffffffff881dfb42>] :xfs:xfs_mru_cache_create+0x113/0x143
>  [<ffffffff881f33b7>] :xfs:xfs_fs_fill_super+0x203/0x3e4
>  [<ffffffff800e544f>] get_sb_bdev+0x10a/0x16c
>  [<ffffffff800e4dec>] vfs_kern_mount+0x93/0x11a
>  [<ffffffff800e4eb5>] do_kern_mount+0x36/0x4d
>  [<ffffffff800ef2ed>] do_mount+0x6a9/0x719
>  [<ffffffff80008d84>] __handle_mm_fault+0x5f2/0xfaa
>  [<ffffffff80022127>] __up_read+0x19/0x7f
>  [<ffffffff80067b88>] do_page_fault+0x4fe/0x874
>  [<ffffffff8012c580>] inode_doinit_with_dentry+0x86/0x47c
>  [<ffffffff800cd378>] zone_statistics+0x3e/0x6d
>  [<ffffffff8000f2ff>] __alloc_pages+0x78/0x308
>  [<ffffffff8004c9fd>] sys_mount+0x8a/0xcd
>  [<ffffffff8005e116>] system_call+0x7e/0x83
> 
> Failed to recover EFIs on filesystem: cciss/c0d0
> XFS: log mount finish failed
> 
> Can someone shed some light on what is happening here?

You have a corrupted free space btree. How it occurred, I have no
idea.
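The diagnosis can be read off the trace. The "Caller" address in the error line falls inside xfs_free_extent, which log recovery (xlog_recover_process_efi) was calling to free the extents recorded in an EFI (extent free intent), and the failed check sits in xfs_free_ag_extent, the code that puts a freed extent back into the free space btrees. A minimal Python sketch of that lookup, assuming the usual `symbol+0xoffset/0xsize` frame format; the helper names parse_frames and resolve are hypothetical.

```python
import re

# One call-trace frame, e.g. " [<ffffffff881a8961>] :xfs:xfs_free_extent+0xa9/0xc9"
# groups: 1 = address, 2 = optional module, 3 = symbol, 4 = offset, 5 = size
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(?::(\w+):)?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(trace):
    """Return (address, symbol, start, end) for every frame in the trace."""
    frames = []
    for m in FRAME_RE.finditer(trace):
        addr = int(m.group(1), 16)
        offset = int(m.group(4), 16)
        size = int(m.group(5), 16)
        start = addr - offset  # address of the symbol's first byte
        frames.append((addr, m.group(3), start, start + size))
    return frames


def resolve(address, frames):
    """Return 'symbol+0xoff' for the frame whose [start, end) holds address."""
    for _, symbol, start, end in frames:
        if start <= address < end:
            return f"{symbol}+{address - start:#x}"
    return None


TRACE = """
 [<ffffffff881a6e27>] :xfs:xfs_free_ag_extent+0x19e/0x67e
 [<ffffffff881a8961>] :xfs:xfs_free_extent+0xa9/0xc9
 [<ffffffff881d96cf>] :xfs:xlog_recover_process_efi+0x112/0x16c
"""

# The "Caller" address printed with the XFS_WANT_CORRUPTED_GOTO error:
print(resolve(0xffffffff881a8961, parse_frames(TRACE)))  # xfs_free_extent+0xa9
```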

> Also, what are the next steps I need to take to repair the fs? (assuming my xfs
> fs is corrupted)
> Will running xfs_repair be good enough in this case?

That's all you can do. If it's really important, and you don't have
a backup, I'd suggest mounting with "-o ro,norecovery" and taking a
backup first....
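A minimal sketch of that order of operations (salvage first, then repair), written in Python so the steps can be reviewed before anything touches the disk. The device path comes from the error message; the mount point and backup directory are hypothetical placeholders. With norecovery the log is not replayed, so the most recent changes may be missing from the copy. Also, xfs_repair may refuse to run while the log is dirty; since mounting (and so log replay) fails here, its -L option, which zeroes the log and discards the unreplayed changes, may end up being needed, which is exactly why the backup comes first.

```python
import shlex
import subprocess

DEVICE = "/dev/cciss/c0d0"      # from "Failed to recover EFIs on filesystem: cciss/c0d0"
MOUNTPOINT = "/mnt/xfs-rescue"  # hypothetical; any empty directory
BACKUP_DIR = "/backup/c0d0"     # hypothetical; must live on a different disk


def recovery_plan(device, mountpoint, backup_dir):
    """Salvage-then-repair steps, in the order suggested above."""
    return [
        # Read-only, and skip log recovery, which is the step that fails.
        ["mount", "-o", "ro,norecovery", device, mountpoint],
        # Copy off whatever is readable before anything writes to the device.
        ["cp", "-a", mountpoint + "/.", backup_dir],
        ["umount", mountpoint],
        # No-modify mode: report what would be fixed without writing anything.
        ["xfs_repair", "-n", device],
        # The real repair. If it refuses because of the dirty log, -L zeroes
        # the log, discarding the changes that could not be replayed.
        ["xfs_repair", device],
    ]


def run(plan, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run(recovery_plan(DEVICE, MOUNTPOINT, BACKUP_DIR))  # prints only; dry_run=True
```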

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: fs corruption
  2011-04-12  9:49 ` Dave Chinner
@ 2011-04-12 10:51   ` Leo Davis
  2011-04-12 11:05     ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Leo Davis @ 2011-04-12 10:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs



> You have a corrupted free space btree.

Err... apologies for my ignorance, but what is a free space btree?

I had a serial trace from the RAID controller, which I just checked, and it logged some
'Loose cabling' messages, but that was months back...
Not sure whether that could be the cause of this; strange if it is,
since it's been a long time.






* Re: fs corruption
  2011-04-12 10:51   ` Leo Davis
@ 2011-04-12 11:05     ` Dave Chinner
  2011-04-12 11:37       ` Emmanuel Florac
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2011-04-12 11:05 UTC (permalink / raw)
  To: Leo Davis; +Cc: xfs

On Tue, Apr 12, 2011 at 03:51:20AM -0700, Leo Davis wrote:
> You have a corrupted free space btree.
> 
> Err... apologies for my ignorance, but what is a free space btree?

A tree that indexes the free space in the filesystem. Every time you
write or remove a file, you are allocating or freeing space,
and these trees keep track of that free space.
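To make this concrete: XFS keeps two free space btrees per allocation group, one indexed by starting block and one by extent length, so free space can be found either by location or by size. Below is a toy Python model of that pair of indexes using sorted lists instead of on-disk btrees; the class and method names are hypothetical, and the best-fit policy is a simplification. Its free() refuses to free a range that overlaps space already marked free, which is roughly the kind of consistency check that tripped in xfs_free_ag_extent earlier in this thread.

```python
import bisect


class FreeSpaceIndex:
    """Toy model of one allocation group's two free space indexes.

    by_block holds (start, length) sorted by start block;
    by_size holds (length, start) sorted by length.
    """

    def __init__(self, extents):
        self.by_block = sorted(extents)
        self.by_size = sorted((length, start) for start, length in extents)

    def _remove(self, start, length):
        self.by_block.remove((start, length))
        self.by_size.remove((length, start))

    def _insert(self, start, length):
        bisect.insort(self.by_block, (start, length))
        bisect.insort(self.by_size, (length, start))

    def allocate(self, length):
        """Best fit: take the smallest free extent that is long enough."""
        i = bisect.bisect_left(self.by_size, (length, -1))
        if i == len(self.by_size):
            raise RuntimeError("no free extent large enough")
        ext_len, start = self.by_size[i]
        self._remove(start, ext_len)
        if ext_len > length:  # give back the unused tail
            self._insert(start + length, ext_len - length)
        return start

    def free(self, start, length):
        """Return [start, start + length) to both indexes, merging neighbours."""
        i = bisect.bisect_left(self.by_block, (start, 0))
        left = self.by_block[i - 1] if i > 0 else None
        right = self.by_block[i] if i < len(self.by_block) else None
        # Freeing blocks that are already free means the index (or the
        # request) is corrupt, so stop instead of making things worse.
        if left and left[0] + left[1] > start:
            raise ValueError("corrupted: overlaps the free extent to the left")
        if right and start + length > right[0]:
            raise ValueError("corrupted: overlaps the free extent to the right")
        if left and left[0] + left[1] == start:
            self._remove(*left)
            start, length = left[0], left[1] + length
        if right and start + length == right[0]:
            self._remove(*right)
            length += right[1]
        self._insert(start, length)


idx = FreeSpaceIndex([(0, 10), (50, 100)])
block = idx.allocate(8)  # takes blocks 0-7, leaving (8, 2) free
idx.free(block, 8)       # merges back into (0, 10)
```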

If you want to know - at a high level - how XFS is structured (good
for understanding what a free space tree is), read this paper:

http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html

It's from 1996, but still correct on all the major structural
details.

> I had a serial trace from the RAID controller, which I just checked, and
> it logged some 'Loose cabling' messages, but that was months back...
> Not sure whether that could be the cause of this; strange if it is,
> since it's been a long time.

It's possible that it took a couple of months to trip over a random
metadata corruption. I've seen that before in directory trees and
inode clusters, where corruption is not detected until the next time
they are read from disk....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: fs corruption
  2011-04-12 11:05     ` Dave Chinner
@ 2011-04-12 11:37       ` Emmanuel Florac
  0 siblings, 0 replies; 6+ messages in thread
From: Emmanuel Florac @ 2011-04-12 11:37 UTC (permalink / raw)
  Cc: xfs, Leo Davis

On Tue, 12 Apr 2011 21:05:32 +1000
Dave Chinner <david@fromorbit.com> wrote:

> It's possible that it took a couple of months to trip over a random
> metadata corruption. I've seen that before in directory trees and
> inode clusters, where corruption is not detected until the next time
> they are read from disk....
> 

That's why background scrubbing of RAID arrays is generally a good
habit to get into :)
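Scrubbing means reading every stripe on a schedule and checking it against its redundancy, so a latent error is found (and can still be rebuilt) before the filesystem trips over it months later. A toy Python sketch of the parity check a RAID-5 style scrub performs; real arrays do this in controller firmware, or in Linux md through its "check" sync action, and the function names here are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-sized blocks (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """A stripe is its data blocks plus the parity computed when written."""
    return (list(data_blocks), xor_blocks(data_blocks))


def scrub(stripes):
    """Re-read every stripe; return the indexes whose parity no longer matches."""
    return [
        i for i, (data, parity) in enumerate(stripes)
        if xor_blocks(data) != parity
    ]


stripes = [make_stripe([bytes([s + d] * 4) for d in range(3)]) for s in range(4)]
print(scrub(stripes))           # []
stripes[2][0][1] = b"\xff" * 4  # silent corruption: nothing has read it yet
print(scrub(stripes))           # [2]
```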

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Technical Director
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: fs corruption
@ 2011-04-25  5:47 Leo Davis
  0 siblings, 0 replies; 6+ messages in thread
From: Leo Davis @ 2011-04-25  5:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs



Just to add, in case it helps: I found this logged by the Smart Array controller:
Corrected ECC Error, Status=0x00000001 Addr=0x060f4e00
 
 

________________________________
From: Leo Davis <leo1783@yahoo.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Sent: Mon, April 25, 2011 9:55:02 AM
Subject: Re: fs corruption


Thank you for that :).

However, I've run into another fs corruption issue on my other server. I
thought I would use the same thread rather than open a new one.

I was troubleshooting a weird Fibre Channel issue (logins to my storage going
missing) when I noticed these backtraces in dmesg.


Filesystem "cciss/c3d1p1": XFS internal error xfs_btree_check_lblock at line 186 
of file fs/xfs/xfs_btree.c. Caller 0xffffffff881b92d6
Call Trace:
[<ffffffff881bce83>] :xfs:xfs_btree_check_lblock+0xf4/0xfe
[<ffffffff881b92d6>] :xfs:xfs_bmbt_lookup+0x159/0x420
[<ffffffff881b41cc>] :xfs:xfs_bmap_add_extent_delay_real+0x62a/0x103a
[<ffffffff881a8cfa>] :xfs:xfs_alloc_vextent+0x379/0x3ff
[<ffffffff881b543a>] :xfs:xfs_bmap_add_extent+0x1fb/0x390
[<ffffffff881b7f34>] :xfs:xfs_bmapi+0x895/0xe79
[<ffffffff881d4082>] :xfs:xfs_iomap_write_allocate+0x201/0x328
[<ffffffff881d4b09>] :xfs:xfs_iomap+0x22a/0x2a5
[<ffffffff881e9ae3>] :xfs:xfs_map_blocks+0x2d/0x65
[<ffffffff881ea723>] :xfs:xfs_page_state_convert+0x2af/0x544
[<ffffffff881eab04>] :xfs:xfs_vm_writepage+0xa7/0xdf
[<ffffffff8001cef2>] mpage_writepages+0x1bf/0x37d
[<ffffffff881eaa5d>] :xfs:xfs_vm_writepage+0x0/0xdf
[<ffffffff8005b1ea>] do_writepages+0x20/0x2f
[<ffffffff8005000e>] __filemap_fdatawrite_range+0x50/0x5b
[<ffffffff80050717>] do_fsync+0x2f/0xa4
[<ffffffff800e1ce9>] __do_fsync+0x23/0x36
[<ffffffff8005e116>] system_call+0x7e/0x83
Filesystem "cciss/c3d1p1": XFS internal error xfs_trans_cancel at line 1164 of 
file fs/xfs/xfs_trans.c. Caller 0xffffffff881d4186
Call Trace:
[<ffffffff881e1b37>] :xfs:xfs_trans_cancel+0x55/0xfa
[<ffffffff881d4186>] :xfs:xfs_iomap_write_allocate+0x305/0x328
[<ffffffff881d4b09>] :xfs:xfs_iomap+0x22a/0x2a5
[<ffffffff881e9ae3>] :xfs:xfs_map_blocks+0x2d/0x65
[<ffffffff881ea723>] :xfs:xfs_page_state_convert+0x2af/0x544
[<ffffffff881eab04>] :xfs:xfs_vm_writepage+0xa7/0xdf
[<ffffffff8001cef2>] mpage_writepages+0x1bf/0x37d
[<ffffffff881eaa5d>] :xfs:xfs_vm_writepage+0x0/0xdf
[<ffffffff8005b1ea>] do_writepages+0x20/0x2f
[<ffffffff8005000e>] __filemap_fdatawrite_range+0x50/0x5b
[<ffffffff80050717>] do_fsync+0x2f/0xa4
[<ffffffff800e1ce9>] __do_fsync+0x23/0x36
[<ffffffff8005e116>] system_call+0x7e/0x83
xfs_force_shutdown(cciss/c3d1p1,0x8) called from line 1165 of file 
fs/xfs/xfs_trans.c. Return address = 0xffffffff881e1b50
Filesystem "cciss/c3d1p1": Corruption of in-memory data detected. Shutting down 
filesystem: cciss/c3d1p1
Please umount the filesystem, and rectify the problem(s)
Filesystem "cciss/c3d1p1": xfs_log_force: error 5 returned.
Filesystem "cciss/c3d1p1": xfs_log_force: error 5 returned.
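Two numbers in this output can be decoded. The 0x8 passed to xfs_force_shutdown is the shutdown reason flag, which matches the "Corruption of in-memory data detected" message, and "error 5" is errno 5, EIO, which is what XFS hands back for further I/O once the filesystem has been shut down. A small Python decoder; the flag values are my reading of the SHUTDOWN_* constants in fs/xfs/xfs_mount.h for kernels of this vintage, so check them against your own kernel source.

```python
import enum
import errno
import os


class Shutdown(enum.IntFlag):
    """Reason bits passed to xfs_force_shutdown (SHUTDOWN_* in xfs_mount.h)."""
    META_IO_ERROR = 0x01
    LOG_IO_ERROR = 0x02
    FORCE_UMOUNT = 0x04
    CORRUPT_INCORE = 0x08  # logged as "Corruption of in-memory data detected"
    REMOTE_REQ = 0x10
    DEVICE_REQ = 0x20


def decode_shutdown(flags):
    """List the names of the reason bits set in flags."""
    return [flag.name for flag in Shutdown if flag & flags]


def decode_errno(code):
    """Turn a kernel error number such as 'error 5' into its symbolic name."""
    return f"{errno.errorcode[code]}: {os.strerror(code)}"


print(decode_shutdown(0x8))  # ['CORRUPT_INCORE']
print(decode_errno(5))
```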
 

Any thoughts on what the root cause might be?
- I've checked the underlying drives, array controller, etc., and all look
healthy (does that mean it is definitely a fs issue?).
- I ran xfs_repair, which corrected the issue, but I'm worried about how the fs
ended up in this state, since this is a production box.

Thanks in advance.




