* fs corruption
@ 2011-04-12  9:33 stress_buster
  2011-04-12  9:49 ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: stress_buster @ 2011-04-12  9:33 UTC
  To: xfs


My dmesg output shows the below trace. It repeats over and over again.

XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1545 of file
fs/xfs/xfs_alloc.c.  Caller 0xffffffff881a8961

Call Trace:
 [<ffffffff881a6e27>] :xfs:xfs_free_ag_extent+0x19e/0x67e
 [<ffffffff881a8961>] :xfs:xfs_free_extent+0xa9/0xc9
 [<ffffffff881d96cf>] :xfs:xlog_recover_process_efi+0x112/0x16c
 [<ffffffff881f31b4>] :xfs:xfs_fs_fill_super+0x0/0x3e4
 [<ffffffff881da8c2>] :xfs:xlog_recover_process_efis+0x4f/0x8d
 [<ffffffff881da914>] :xfs:xlog_recover_finish+0x14/0xad
 [<ffffffff881f31b4>] :xfs:xfs_fs_fill_super+0x0/0x3e4
 [<ffffffff881df420>] :xfs:xfs_mountfs+0x498/0x5e2
 [<ffffffff881dfb42>] :xfs:xfs_mru_cache_create+0x113/0x143
 [<ffffffff881f33b7>] :xfs:xfs_fs_fill_super+0x203/0x3e4
 [<ffffffff800e544f>] get_sb_bdev+0x10a/0x16c
 [<ffffffff800e4dec>] vfs_kern_mount+0x93/0x11a
 [<ffffffff800e4eb5>] do_kern_mount+0x36/0x4d
 [<ffffffff800ef2ed>] do_mount+0x6a9/0x719
 [<ffffffff80008d84>] __handle_mm_fault+0x5f2/0xfaa
 [<ffffffff80022127>] __up_read+0x19/0x7f
 [<ffffffff80067b88>] do_page_fault+0x4fe/0x874
 [<ffffffff8012c580>] inode_doinit_with_dentry+0x86/0x47c
 [<ffffffff800cd378>] zone_statistics+0x3e/0x6d
 [<ffffffff8000f2ff>] __alloc_pages+0x78/0x308
 [<ffffffff8004c9fd>] sys_mount+0x8a/0xcd
 [<ffffffff8005e116>] system_call+0x7e/0x83

Failed to recover EFIs on filesystem: cciss/c0d0
XFS: log mount finish failed

Can someone shed some light on what is happening here?

Also, what are the next steps I need to take to repair the fs (assuming my xfs
fs is corrupted)?
Will running xfs_repair be good enough in this case?

Thanks in advance
-- 
View this message in context: http://old.nabble.com/fs-corruption-tp31377534p31377534.html
Sent from the Xfs - General mailing list archive at Nabble.com.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: fs corruption
@ 2011-04-25  5:47 Leo Davis
  0 siblings, 0 replies; 7+ messages in thread
From: Leo Davis @ 2011-04-25  5:47 UTC
  To: Dave Chinner; +Cc: xfs


Just to add, in case it helps: I found this logged by the Smart Array controller:
Corrected ECC Error, Status=0x00000001 Addr=0x060f4e00
 
 

________________________________
From: Leo Davis <leo1783@yahoo.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Sent: Mon, April 25, 2011 9:55:02 AM
Subject: Re: fs corruption


Thank you for that :).

However, I've run into another fs corruption issue on my other server. I thought
I would reuse this thread rather than open a new one.

I was troubleshooting a weird fiber channel issue (logins to my storage going
missing) when I noticed these backtraces in dmesg.


Filesystem "cciss/c3d1p1": XFS internal error xfs_btree_check_lblock at line 186 of file fs/xfs/xfs_btree.c. Caller 0xffffffff881b92d6
Call Trace:
[<ffffffff881bce83>] :xfs:xfs_btree_check_lblock+0xf4/0xfe
[<ffffffff881b92d6>] :xfs:xfs_bmbt_lookup+0x159/0x420
[<ffffffff881b41cc>] :xfs:xfs_bmap_add_extent_delay_real+0x62a/0x103a
[<ffffffff881a8cfa>] :xfs:xfs_alloc_vextent+0x379/0x3ff
[<ffffffff881b543a>] :xfs:xfs_bmap_add_extent+0x1fb/0x390
[<ffffffff881b7f34>] :xfs:xfs_bmapi+0x895/0xe79
[<ffffffff881d4082>] :xfs:xfs_iomap_write_allocate+0x201/0x328
[<ffffffff881d4b09>] :xfs:xfs_iomap+0x22a/0x2a5
[<ffffffff881e9ae3>] :xfs:xfs_map_blocks+0x2d/0x65
[<ffffffff881ea723>] :xfs:xfs_page_state_convert+0x2af/0x544
[<ffffffff881eab04>] :xfs:xfs_vm_writepage+0xa7/0xdf
[<ffffffff8001cef2>] mpage_writepages+0x1bf/0x37d
[<ffffffff881eaa5d>] :xfs:xfs_vm_writepage+0x0/0xdf
[<ffffffff8005b1ea>] do_writepages+0x20/0x2f
[<ffffffff8005000e>] __filemap_fdatawrite_range+0x50/0x5b
[<ffffffff80050717>] do_fsync+0x2f/0xa4
[<ffffffff800e1ce9>] __do_fsync+0x23/0x36
[<ffffffff8005e116>] system_call+0x7e/0x83
Filesystem "cciss/c3d1p1": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c. Caller 0xffffffff881d4186
Call Trace:
[<ffffffff881e1b37>] :xfs:xfs_trans_cancel+0x55/0xfa
[<ffffffff881d4186>] :xfs:xfs_iomap_write_allocate+0x305/0x328
[<ffffffff881d4b09>] :xfs:xfs_iomap+0x22a/0x2a5
[<ffffffff881e9ae3>] :xfs:xfs_map_blocks+0x2d/0x65
[<ffffffff881ea723>] :xfs:xfs_page_state_convert+0x2af/0x544
[<ffffffff881eab04>] :xfs:xfs_vm_writepage+0xa7/0xdf
[<ffffffff8001cef2>] mpage_writepages+0x1bf/0x37d
[<ffffffff881eaa5d>] :xfs:xfs_vm_writepage+0x0/0xdf
[<ffffffff8005b1ea>] do_writepages+0x20/0x2f
[<ffffffff8005000e>] __filemap_fdatawrite_range+0x50/0x5b
[<ffffffff80050717>] do_fsync+0x2f/0xa4
[<ffffffff800e1ce9>] __do_fsync+0x23/0x36
[<ffffffff8005e116>] system_call+0x7e/0x83
xfs_force_shutdown(cciss/c3d1p1,0x8) called from line 1165 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff881e1b50
Filesystem "cciss/c3d1p1": Corruption of in-memory data detected. Shutting down filesystem: cciss/c3d1p1
Please umount the filesystem, and rectify the problem(s)
Filesystem "cciss/c3d1p1": xfs_log_force: error 5 returned.
Filesystem "cciss/c3d1p1": xfs_log_force: error 5 returned.
 

Any thoughts on what the root cause might be?
- I've checked the underlying drives, array controller, etc. and all look
healthy (does that indicate it is an fs issue for sure?).
I ran xfs_repair, which corrected the issue, but I'm worried about how the fs
ended up in this state, this being a production box.

Thanks in advance.




________________________________
From: Dave Chinner <david@fromorbit.com>
To: Leo Davis <leo1783@yahoo.com>
Cc: xfs@oss.sgi.com
Sent: Tue, April 12, 2011 4:35:32 PM
Subject: Re: fs corruption

On Tue, Apr 12, 2011 at 03:51:20AM -0700, Leo Davis wrote:
> You have a corrupted free space btree.
> 
> Err... apologies for my ignorance, but what is a free space btree?

A tree that indexes the free space in the filesystem. Every time you
write a file or remove a file you are allocating or freeing space,
and these trees keep track of that free space.

If you want to know - at a high level - how XFS is structured (good
for understanding what a free space tree is), read this paper:

http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html

It's from 1996, but still correct on all the major structural
details.
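
To make the idea of a free space index concrete, here is a minimal Python
sketch. It is a toy model, not XFS code: the class name FreeSpaceIndex and its
methods are invented for illustration, and a sorted list stands in for the
on-disk btree. It shows allocation removing space from the index, freeing
returning it (merging with neighbouring free extents), and a consistency check
that rejects a free which overlaps space already recorded as free, which is
the general kind of inconsistency a filesystem would report as corruption.

```python
import bisect


class FreeSpaceIndex:
    """Toy free-space index (hypothetical, for illustration only).

    Holds sorted, non-overlapping (start_block, length) free extents.
    """

    def __init__(self, total_blocks):
        # Initially the whole device is one free extent.
        self.extents = [(0, total_blocks)]

    def allocate(self, length):
        """First-fit allocation: return the start block of the new extent."""
        for i, (start, size) in enumerate(self.extents):
            if size >= length:
                if size == length:
                    del self.extents[i]
                else:
                    # Shrink the free extent from the front.
                    self.extents[i] = (start + length, size - length)
                return start
        raise MemoryError("no free extent large enough")

    def free(self, start, length):
        """Return an extent to the index, merging with adjacent free space."""
        i = bisect.bisect_left(self.extents, (start, 0))

        # Consistency checks: the freed range must not overlap free space.
        if i > 0:
            lstart, llen = self.extents[i - 1]
            if lstart + llen > start:
                raise ValueError("freed extent overlaps left free neighbour")
        if i < len(self.extents):
            rstart, _ = self.extents[i]
            if start + length > rstart:
                raise ValueError("freed extent overlaps right free neighbour")

        # Merge with the left neighbour if it ends exactly where we start.
        if i > 0 and lstart + llen == start:
            start, length = lstart, llen + length
            i -= 1
            del self.extents[i]
        # Merge with the right neighbour if it starts exactly where we end.
        if i < len(self.extents) and self.extents[i][0] == start + length:
            length += self.extents[i][1]
            del self.extents[i]

        self.extents.insert(i, (start, length))
```

For example, after allocating two extents from a 100-block index and freeing
both, the index collapses back to a single free extent covering all 100
blocks; freeing the same range a second time raises ValueError because the
range is already recorded as free.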

> I had serial trace from raid controller which i just checked and
> it logged some 'Loose cabling', but this was months back.....  not
> sure whether that can be the cause of this.. strange if that is
> the case since it's been a long time

It's possible that it took a couple of months to trip over a random
metadata corruption. I've seen that before in directory trees and
inode clusters, where corruption is not detected until the next time
they are read from disk....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* FS corruption.
@ 2002-07-01 13:35 Joakim Tjernlund
  0 siblings, 0 replies; 7+ messages in thread
From: Joakim Tjernlund @ 2002-07-01 13:35 UTC
  To: MTD

Hi 

I just got an FS crash/corruption on one of our boards. I am running the stable jffs2 branch
from late April. Any ideas what caused this?

 Jocke 

> Setting Target eth0 IP address: a000108.
> NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
> LLC 2.0 by Procom, 1997, Arnaldo C. Melo, 2001
> NET4.0 IEEE 802.2 extended support
> NET4.0 IEEE 802.2 User Interface SAPs, Jay Schulist, 2001
> JFFS2: Total scan time: 5.97 sec
> Eep. Child "S40umountfs" (ino #245) of dir ino #13 doesn't exist!
> Eep. Child "telnet" (ino #568) of dir ino #65 doesn't exist!
> Eep. Child "ansi" (ino #765) of dir ino #78 doesn't exist!
> Eep. Child "ansis" (ino #804) of dir ino #78 doesn't exist!
> Eep. Child "ansi80x60-mono" (ino #778) of dir ino #78 doesn't exist!
> Eep. Child "ansi80x30" (ino #771) of dir ino #78 doesn't exist!
> Eep. Child "ansisysk" (ino #812) of dir ino #78 doesn't exist!
> Eep. Child "ACT" (ino #1206) of dir ino #91 doesn't exist!
> Eep. Child "Yancowinna" (ino #1215) of dir ino #91 doesn't exist!
> Eep. Child "etc" (ino #1594) of dir ino #1560 doesn't exist!
> Eep. Child "liblockfile.so.1.0" (ino #671) of dir ino #66 doesn't exist!
> Eep. Child "crt1.o" (ino #637) of dir ino #66 doesn't exist!
> Eep. Child "libe2p.so.2.3" (ino #622) of dir ino #66 doesn't exist!
> Eep. Child "libpopt.so.0.0.0" (ino #694) of dir ino #66 doesn't exist!
> Eep. Child "alib_swusd" (ino #1629) of dir ino #1561 doesn't exist!
> Eep. Child "date" (ino #175) of dir ino #2 doesn't exist!
> Eep. Child "domainname" (ino #163) of dir ino #2 doesn't exist!
> Eep. Child "vt52" (ino #881) of dir ino #80 doesn't exist!
> Eep. Child "vt125" (ino #840) of dir ino #80 doesn't exist!
> Eep. Child "vt200-old" (ino #847) of dir ino #80 doesn't exist!
> Eep. Child "vt100-vb" (ino #832) of dir ino #80 doesn't exist!
> Eep. Child "vt320-w-nam" (ino #867) of dir ino #80 doesn't exist!
> Eep. Child "Saskatchewan" (ino #1227) of dir ino #93 doesn't exist!
> Eep. Child "mtp_cm_bl_shell" (ino #490) of dir ino #55 doesn't exist!
> Eep. Child "fd" (ino #193) of dir ino #4 doesn't exist!
> Eep. Child "Greenwich" (ino #1270) of dir ino #95 doesn't exist!
> Eep. Child "installComplete" (ino #1759) of dir ino #1707 doesn't exist!
> Eep. Child "S10syslog" (ino #260) of dir ino #18 doesn't exist!
> Eep. Child "MT5634ZLX.dat" (ino #367) of dir ino #44 doesn't exist!
> Eep. Child "3CXEM556.dat" (ino #361) of dir ino #44 doesn't exist!
> Eep. Child "te_server" (ino #1736) of dir ino #1708 doesn't exist!
> Eep. Child "mtp_cm_bl" (ino #1715) of dir ino #1708 doesn't exist!
> Eep. Child "K30lumentis" (ino #264) of dir ino #19 doesn't exist!
> Eep. Child "snmpd.conf" (ino #384) of dir ino #45 doesn't exist!
> Eep. Child "Asuncion" (ino #977) of dir ino #84 doesn't exist!
> Eep. Child "Cayman" (ino #990) of dir ino #84 doesn't exist!
> Eep. Child "Inuvik" (ino #1015) of dir ino #84 doesn't exist!
> Eep. Child "motd" (ino #279) of dir ino #7 doesn't exist!
> Eep. Child "group" (ino #284) of dir ino #7 doesn't exist!
> Eep. Child "anacrontab" (ino #338) of dir ino #7 doesn't exist!
> Eep. Child "minicom.users" (ino #343) of dir ino #7 doesn't exist!
> Eep. Child "inetd.conf" (ino #348) of dir ino #7 doesn't exist!
> Eep. Child "nsswitch.conf" (ino #305) of dir ino #7 doesn't exist!
> Eep. Child "passwd" (ino #283) of dir ino #7 doesn't exist!
> Eep. Child "inetd.conf~" (ino #390) of dir ino #7 doesn't exist!
> Eep. Child "pam_env.conf" (ino #319) of dir ino #35 doesn't exist!
> Eep. Child "libctutils.so.0" (ino #403) of dir ino #48 doesn't exist!
> Eep. Child "halt" (ino #555) of dir ino #61 doesn't exist!
> Eep. Child "rarp" (ino #550) of dir ino #61 doesn't exist!
> Eep. Child "klogd" (ino #553) of dir ino #61 doesn't exist!
> Eep. Child "inetd" (ino #731) of dir ino #74 doesn't exist!
> Eep. Child "ifupdown" (ino #228) of dir ino #11 doesn't exist!
> Eep. Child "lumentis~" (ino #241) of dir ino #11 doesn't exist!
> Eep. Child "skeleton" (ino #222) of dir ino #11 doesn't exist!
> Eep. Child "checkfs.sh" (ino #209) of dir ino #11 doesn't exist!
> Eep. Child "devfsd" (ino #291) of dir ino #24 doesn't exist!
> Eep. Child "Dhaka" (ino #1183) of dir ino #89 doesn't exist!
> Eep. Child "Samarkand" (ino #1156) of dir ino #89 doesn't exist!
> Eep. Child "Novosibirsk" (ino #1148) of dir ino #89 doesn't exist!
> Eep. Child "linux-lat" (ino #762) of dir ino #77 doesn't exist!
> Eep. Child "Bermuda" (ino #1186) of dir ino #90 doesn't exist!
> Eep. Child "current" (ino #1548) of dir ino #1533 doesn't exist!
> Eep. Child "previous" (ino #1550) of dir ino #1533 doesn't exist!
> VFS: Mounted root (jffs2 filesystem).
> Freeing unused kernel memory: 52k init 4k openfirmware
> INIT: version 2.78 booting
> jffs2_read_inode() on nonexistent ino 305
> jffs2_read_inode() on nonexistent ino 283
> Fast boot, no file system check
> none on /dev/shm type shm (rw)
> ramfs on /tmp type ramfs (rw)
> ramfs on /var/run type ramfs (rw)
> ramfs on /tftpboot type ramfs (rw)
> jffs2_read_inode() on nonexistent ino 228
> Enabling packet forwarding: done.
> Configuring network interfaces: /sbin/ifup: interface lo already configured
> done.
> Cleaning: /tmp /var/lock /var/runjffs2_read_inode() on nonexistent ino 284
> grep: /etc/group: Input/output error
> .
> INIT: Entering runlevel: 2
> jffs2_read_inode() on nonexistent ino 553
> jffs2_read_inode() on nonexistent ino 731
> Starting Lumentis Main Script: /opt/appl/next is a link
> /opt/appl/tuappl01a-r1b-020627_1
> jffs2_read_inode() on nonexistent ino 1736
> Could not find path... Use /usr/local ....
> Starting lumentis UP daemons: te_server start-stop-daemon: stat /usr/local/bin/te_server: No such file or directory
> te_log start-stop-daemon: stat /usr/local/bin/te_log: No such file or directory
> te start-stop-daemon: stat /usr/local/bin/te: No such file or directory
> ntpd start-stop-daemon: stat /usr/local/bin/ntpd: No such file or directory
> alib_psupd start-stop-daemon: stat /usr/local/bin/alib_psupd: No such file or directory
> icn_server start-stop-daemon: stat /usr/local/bin/icn_server: No such file or directory
> upc_upc_bl start-stop-daemon: stat /usr/local/bin/upc_upc_bl: No such file or directory
> tosv_server start-stop-daemon: stat /usr/local/bin/tosv_server: No such file or directory
> tosv_supervisor start-stop-daemon: stat /usr/local/bin/tosv_supervisor: No such file or directory
> swusd start-stop-daemon: stat /usr/local/bin/alib_swusd: No such file or directory
> .
> Done.
> 
> up1_8 login:
> 
> 


end of thread, other threads:[~2011-04-25  5:43 UTC | newest]

Thread overview: 7+ messages
2011-04-12  9:33 fs corruption stress_buster
2011-04-12  9:49 ` Dave Chinner
2011-04-12 10:51   ` Leo Davis
2011-04-12 11:05     ` Dave Chinner
2011-04-12 11:37       ` Emmanuel Florac
  -- strict thread matches above, loose matches on Subject: below --
2011-04-25  5:47 Leo Davis
2002-07-01 13:35 FS corruption Joakim Tjernlund
