* Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO) @ 2008-02-25 22:20 slaton 2008-02-25 22:40 ` Eric Sandeen 0 siblings, 1 reply; 9+ messages in thread From: slaton @ 2008-02-25 22:20 UTC (permalink / raw) To: xfs-oss A RAID5 (3ware card w/ 8 drive cage) filesystem on our cluster login node shut down the other night with this error: kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1563 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff8812b3a3 kernel: Call Trace: [<ffffffff88129713>] [<ffffffff8812b3a3>] kernel: [<ffffffff88150df7>] [<ffffffff8816af8a>] [<ffffffff88137d6c>] kernel: [<ffffffff88157d25>] [<ffffffff8816ed1c>] [<ffffffff811051fa>] kernel: [<ffffffff8817a5b2>] [<ffffffff8102c988>] [<ffffffff882566ee>] kernel: [<ffffffff8825ba4d>] [<ffffffff8825170a>] [<ffffffff881a379e>] kernel: [<ffffffff882512da>] [<ffffffff882514a0>] [<ffffffff810604e2>] kernel: [<ffffffff882512da>] [<ffffffff882512da>] [<ffffffff810604da>] kernel: xfs_force_shutdown(sda1,0x8) called from line 4091 of file fs/xfs/xfs_bmap.c. Return address = 0xffffffff88137daf kernel: Filesystem "sda1": Corruption of in-memory data detected. Shutting down filesystem: sda1 kernel: Please umount the filesystem, and rectify the problem(s) kernel: nfsd: non-standard errno: -990 System hung upon attempting to umount the volume. Have not yet rebooted. Some additional info: - Server arch is x86_64 (smp). - Distro is caos2 linux, kernel 2.6.17 (smp). 2.6.23 pkg is also available. - Kernel not compiled with CONFIG_4KSTACKS=y. - xfsprogs package is xfsprogs-2.6.13 Memtest86 is running now - no errors yet reported. After doing some searches, once this occurs it appears to repeat with increasing frequency, and i did read of a number of folks losing all data. There also appear to be issues related to using some older kernels and xfsprogs. What kernel and xfsprogs version do you recommend i proceed with, before i attempt to remount or run xfs_repair? 
Any alternate suggestions for recovery, and how to prevent this from recurring? thanks for any help slaton
* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO) 2008-02-25 22:20 Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO) slaton @ 2008-02-25 22:40 ` Eric Sandeen 2008-02-26 7:54 ` slaton 0 siblings, 1 reply; 9+ messages in thread From: Eric Sandeen @ 2008-02-25 22:40 UTC (permalink / raw) To: slaton; +Cc: xfs-oss slaton wrote: > A RAID5 (3ware card w/ 8 drive cage) filesystem on our cluster login node > shut down the other night with this error: > > kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1563 of file > fs/xfs/xfs_alloc.c. Caller 0xffffffff8812b3a3 > kernel: Call Trace: [<ffffffff88129713>] [<ffffffff8812b3a3>] > kernel: [<ffffffff88150df7>] [<ffffffff8816af8a>] > [<ffffffff88137d6c>] > kernel: [<ffffffff88157d25>] [<ffffffff8816ed1c>] > [<ffffffff811051fa>] > kernel: [<ffffffff8817a5b2>] [<ffffffff8102c988>] > [<ffffffff882566ee>] > kernel: [<ffffffff8825ba4d>] [<ffffffff8825170a>] > [<ffffffff881a379e>] > kernel: [<ffffffff882512da>] [<ffffffff882514a0>] > [<ffffffff810604e2>] > kernel: [<ffffffff882512da>] [<ffffffff882512da>] > [<ffffffff810604da>] > kernel: xfs_force_shutdown(sda1,0x8) called from line 4091 of file > fs/xfs/xfs_bmap.c. Return address = 0xffffffff88137daf > kernel: Filesystem "sda1": Corruption of in-memory data detected. > Shutting down filesystem: sda1 > kernel: Please umount the filesystem, and rectify the problem(s) > kernel: nfsd: non-standard errno: -990 > > System hung upon attempting to umount the volume. Have not yet rebooted. > > Some additional info: > > - Server arch is x86_64 (smp). > > - Distro is caos2 linux, kernel 2.6.17 (smp). 2.6.23 pkg is also > available. ksymoops might be good so we can see what the actual backtrace was. Are you hitting http://oss.sgi.com/projects/xfs/faq.html#dir2 ? -Eric ^ permalink raw reply [flat|nested] 9+ messages in thread
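For context, the sort of ksymoops invocation being asked for here looks roughly like the sketch below. The kernel version, System.map path, and log file name are all assumptions; substitute whatever matches the kernel that actually produced the trace. The command is only previewed, not executed.

```shell
# Sketch only: decode the raw trace addresses with ksymoops, pointing it
# at the System.map of the kernel that oopsed. All paths are assumptions.
KVER=${KVER:-2.6.17.11-102.caos.smp}
MAP="/boot/System.map-$KVER"
LOG="/var/log/messages.1"

CMD="ksymoops -m $MAP $LOG"
# Preview the command rather than executing it here; run it by hand once
# the map file is confirmed to match the running kernel.
echo "$CMD"
```

If the map does not match the kernel, ksymoops will resolve symbols to the wrong functions, which is worse than no resolution at all.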
* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO) 2008-02-25 22:40 ` Eric Sandeen @ 2008-02-26 7:54 ` slaton 2008-02-27 22:44 ` slaton 0 siblings, 1 reply; 9+ messages in thread From: slaton @ 2008-02-26 7:54 UTC (permalink / raw) To: xfs-oss [-- Attachment #1: Type: TEXT/PLAIN, Size: 1049 bytes --] Thanks for the reply. > Are you hitting http://oss.sgi.com/projects/xfs/faq.html#dir2 ? Presumably not - i'm using 2.6.17.11, and that information indicates the bug was fixed in 2.6.17.7. I've attached the output from running ksymoops on messages.1. First crash/trace (Feb 21 19:xx) corresponds to the original XFS event; the second (Feb 22 15:xx) is the system going down when i tried to unmount the volume. Here are the additional syslog msgs corresponding to the Feb 22 15:xx crash. Feb 22 15:47:13 qln01 kernel: grsec: From 10.0.2.93: unmount of /dev/sda1 by /bin/umount[umount:18604] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:31972] uid/euid:0/0 gid/egid:0/0 Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from line 338 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff88173ce4 Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from line 338 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff88173ce4 Feb 22 15:47:28 qln01 kernel: BUG: soft lockup detected on CPU#0! thanks slaton [-- Attachment #2: Type: TEXT/plain, Size: 6373 bytes --] ksymoops 2.4.9 on x86_64 2.6.17.11-102.caos.smp. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.6.17.11-102.caos.smp/ (default) -m /usr/src/linux/System.map (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. 
ksymoops -h explains the options. Error (regular_file): read_ksyms stat /proc/ksyms failed No modules in ksyms, skipping objects No ksyms, skipping lsmod Error (regular_file): read_system_map stat /usr/src/linux/System.map failed Warning (merge_maps): no symbols in merged map Feb 19 11:44:27 qln01 kernel: Machine check events logged Feb 19 16:44:24 qln01 kernel: Machine check events logged Feb 19 18:39:23 qln01 kernel: Machine check events logged Feb 19 21:09:22 qln01 kernel: Machine check events logged Feb 19 23:49:20 qln01 kernel: Machine check events logged Feb 20 02:29:19 qln01 kernel: Machine check events logged Feb 20 14:24:12 qln01 kernel: Machine check events logged Feb 20 19:29:10 qln01 kernel: Machine check events logged Feb 21 19:00:58 qln01 kernel: Call Trace: [<ffffffff88129713>] [<ffffffff8812b3a3>] Feb 21 19:00:58 qln01 kernel: [<ffffffff88150df7>] [<ffffffff8816af8a>] [<ffffffff88137d6c>] Feb 21 19:00:58 qln01 kernel: [<ffffffff88157d25>] [<ffffffff8816ed1c>] [<ffffffff811051fa>] Feb 21 19:00:58 qln01 kernel: [<ffffffff8817a5b2>] [<ffffffff8102c988>] [<ffffffff882566ee>] Feb 21 19:00:58 qln01 kernel: [<ffffffff8825ba4d>] [<ffffffff8825170a>] [<ffffffff881a379e>] Feb 21 19:00:58 qln01 kernel: [<ffffffff882512da>] [<ffffffff882514a0>] [<ffffffff810604e2>] Feb 21 19:00:58 qln01 kernel: [<ffffffff882512da>] [<ffffffff882512da>] [<ffffffff810604da>] Warning (Oops_read): Code line not seen, dumping what data is available Trace; ffffffff88129713 No symbols available Trace; ffffffff8812b3a3 No symbols available Trace; ffffffff88150df7 No symbols available Trace; ffffffff8816af8a No symbols available Trace; ffffffff88137d6c No symbols available Trace; ffffffff88157d25 No symbols available Trace; ffffffff8816ed1c No symbols available Trace; ffffffff811051fa No symbols available Trace; ffffffff8817a5b2 No symbols available Trace; ffffffff8102c988 No symbols available Trace; ffffffff882566ee No symbols available Trace; ffffffff8825ba4d No symbols available 
Trace; ffffffff8825170a No symbols available Trace; ffffffff881a379e No symbols available Trace; ffffffff882512da No symbols available Trace; ffffffff882514a0 No symbols available Trace; ffffffff810604e2 No symbols available Trace; ffffffff882512da No symbols available Trace; ffffffff882512da No symbols available Trace; ffffffff810604da No symbols available Feb 22 15:08:10 qln01 kernel: e1000: eth2: e1000_watchdog_task: NIC Link is Down Feb 22 15:47:28 qln01 kernel: Call Trace: <IRQ> [<ffffffff810aef1a>] [<ffffffff88158f7a>] Feb 22 15:47:28 qln01 kernel: [<ffffffff8108c245>] [<ffffffff810737dd>] [<ffffffff81073842>] Feb 22 15:47:28 qln01 kernel: [<ffffffff8106018c>] <EOI> [<ffffffff88158f7a>] [<ffffffff88158f7a>] Feb 22 15:47:28 qln01 kernel: [<ffffffff81064ae6>] [<ffffffff81064ab7>] [<ffffffff8817277f>] Feb 22 15:47:28 qln01 kernel: [<ffffffff88158f7a>] [<ffffffff881677df>] [<ffffffff8816d77a>] Feb 22 15:47:28 qln01 kernel: [<ffffffff8817c656>] [<ffffffff810c3bdf>] [<ffffffff810c48de>] Feb 22 15:47:28 qln01 kernel: [<ffffffff810c3b2d>] [<ffffffff810cc0b7>] [<ffffffff8101124f>] Feb 22 15:47:28 qln01 kernel: [<ffffffff81060329>] [<ffffffff8105f452>] Warning (Oops_read): Code line not seen, dumping what data is available Trace; ffffffff8108c245 No symbols available Trace; ffffffff810737dd No symbols available Trace; ffffffff81073842 No symbols available Trace; ffffffff8106018c No symbols available Trace; ffffffff81064ae6 No symbols available Trace; ffffffff81064ab7 No symbols available Trace; ffffffff8817277f No symbols available Trace; ffffffff88158f7a No symbols available Trace; ffffffff881677df No symbols available Trace; ffffffff8816d77a No symbols available Trace; ffffffff8817c656 No symbols available Trace; ffffffff810c3bdf No symbols available Trace; ffffffff810c48de No symbols available Trace; ffffffff810c3b2d No symbols available Trace; ffffffff810cc0b7 No symbols available Trace; ffffffff8101124f No symbols available Trace; ffffffff81060329 No symbols 
available Trace; ffffffff8105f452 No symbols available Feb 25 21:47:52 qln01 kernel: CPU 0: aperture @ 770000000 size 32 MB Feb 25 21:47:52 qln01 kernel: CPU 1: Syncing TSC to CPU 0. Feb 25 21:47:52 qln01 kernel: CPU 1: synchronized TSC with CPU 0 (last diff 5 cycles, maxerr 1077 cycles) Feb 25 21:47:52 qln01 kernel: testing NMI watchdog ... OK. Feb 25 21:47:55 qln01 kernel: e1000: 0000:02:03.0: e1000_probe: (PCI:66MHz:32-bit) 00:d0:68:06:b0:5e Feb 25 21:47:55 qln01 kernel: e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection Feb 25 21:47:55 qln01 kernel: e1000: 0000:02:04.0: e1000_probe: (PCI:66MHz:32-bit) 00:d0:68:06:b0:5f Feb 25 21:47:55 qln01 kernel: e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection Feb 25 21:47:55 qln01 kernel: e1000: 0000:01:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:04:23:a8:ac:78 Feb 25 21:47:55 qln01 kernel: e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection Feb 25 21:47:55 qln01 kernel: e1000: 0000:01:01.1: e1000_probe: (PCI-X:133MHz:64-bit) 00:04:23:a8:ac:79 Feb 25 21:47:55 qln01 kernel: e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection Feb 25 21:47:55 qln01 kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex Feb 25 21:47:55 qln01 kernel: e1000: eth1: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex Feb 25 21:47:55 qln01 kernel: lo: Disabled Privacy Extensions Feb 25 21:48:27 qln01 kernel: e1000: eth2: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex 4 warnings and 2 errors issued. Results may not be reliable. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO) 2008-02-26 7:54 ` slaton @ 2008-02-27 22:44 ` slaton 2008-02-28 3:16 ` Barry Naujok 0 siblings, 1 reply; 9+ messages in thread From: slaton @ 2008-02-27 22:44 UTC (permalink / raw) To: xfs-oss Hi, I'm still hoping for some help with this. Is any more information needed in addition to the ksymoops output previously posted? In particular i'd like to know if just remounting the filesystem (to replay the journal), then unmounting and running xfs_repair is the best course of action. In addition, i'd like to know what recommended kernel/xfsprogs versions to use for best results. thanks slaton Slaton Lipscomb Nogales Lab, Howard Hughes Medical Institute http://cryoem.berkeley.edu On Mon, 25 Feb 2008, slaton wrote: > Thanks for the reply. > > > Are you hitting http://oss.sgi.com/projects/xfs/faq.html#dir2 ? > > Presumably not - i'm using 2.6.17.11, and that information indicates the > bug was fixed in 2.6.17.7. > > I've attached the output from running ksymoops on messages.1. First > crash/trace (Feb 21 19:xx) corresponds to the original XFS event; the > second (Feb 22 15:xx) is the system going down when i tried to unmount the > volume. > > Here are the additional syslog msgs corresponding to the Feb 22 15:xx > crash. > > Feb 22 15:47:13 qln01 kernel: grsec: From 10.0.2.93: unmount of /dev/sda1 > by /bin/umount[umount:18604] uid/euid:0/0 gid/egid:0/0, parent > /bin/bash[bash:31972] uid/euid:0/0 gid/egid:0/0 > Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from > line 338 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff88173ce4 > Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from > line 338 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff88173ce4 > Feb 22 15:47:28 qln01 kernel: BUG: soft lockup detected on CPU#0! > > thanks > slaton ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO) 2008-02-27 22:44 ` slaton @ 2008-02-28 3:16 ` Barry Naujok 2008-03-01 4:09 ` slaton 2008-03-04 1:29 ` slaton 0 siblings, 2 replies; 9+ messages in thread From: Barry Naujok @ 2008-02-28 3:16 UTC (permalink / raw) To: slaton, xfs-oss On Thu, 28 Feb 2008 09:44:04 +1100, slaton <slaton@berkeley.edu> wrote: > Hi, > > I'm still hoping for some help with this. Is any more information needed > in addition to the ksymoops output previously posted? > > In particular i'd like to know if just remounting the filesystem (to > replay the journal), then unmounting and running xfs_repair is the best > course of action. In addition, i'd like to know what recommended > kernel/xfsprogs versions to use for best results. I would get xfsprogs 2.9.4 (2.9.6 is not a good version with your kernel), ftp://oss.sgi.com/projects/xfs/previous/cmd_tars/xfsprogs_2.9.4-1.tar.gz To be on the safe side, either make an entire copy of your drive to another device, or run "xfs_metadump -o /dev/sda1" to capture a metadata image (no file data) of your filesystem. Then run xfs_repair (mount/unmount may be required if the log is dirty). If the filesystem is in a bad state after the repair (eg. everything in lost+found), email the xfs_repair log and request further advice. Regards, Barry. > thanks > slaton > > Slaton Lipscomb > Nogales Lab, Howard Hughes Medical Institute > http://cryoem.berkeley.edu > > On Mon, 25 Feb 2008, slaton wrote: > >> Thanks for the reply. >> >> > Are you hitting http://oss.sgi.com/projects/xfs/faq.html#dir2 ? >> >> Presumably not - i'm using 2.6.17.11, and that information indicates the >> bug was fixed in 2.6.17.7. >> >> I've attached the output from running ksymoops on messages.1. First >> crash/trace (Feb 21 19:xx) corresponds to the original XFS event; the >> second (Feb 22 15:xx) is the system going down when i tried to unmount >> the >> volume. 
>> >> Here are the additional syslog msgs corresponding to the Feb 22 15:xx >> crash. >> >> Feb 22 15:47:13 qln01 kernel: grsec: From 10.0.2.93: unmount of >> /dev/sda1 >> by /bin/umount[umount:18604] uid/euid:0/0 gid/egid:0/0, parent >> /bin/bash[bash:31972] uid/euid:0/0 gid/egid:0/0 >> Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from >> line 338 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff88173ce4 >> Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from >> line 338 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff88173ce4 >> Feb 22 15:47:28 qln01 kernel: BUG: soft lockup detected on CPU#0! >> >> thanks >> slaton > > ^ permalink raw reply [flat|nested] 9+ messages in thread
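Spelled out, the two backup options Barry describes might look like the sketch below. The spare device, dump path, and dd options are assumptions; the commands are only echoed here so nothing is executed against a real disk by accident.

```shell
# Sketch of the pre-repair backup step. /dev/sdb1, the dump path, and the
# dd block-size/error options are assumptions -- adjust before running.
DEV=/dev/sda1
SPARE=/dev/sdb1                      # assumed spare device, >= size of $DEV
DUMP=/root/sda1.metadump

# Option 1: full block-level copy of the unmounted filesystem.
COPY_CMD="dd if=$DEV of=$SPARE bs=4M conv=noerror,sync"
# Option 2: metadata-only image (small; contains no file data).
DUMP_CMD="xfs_metadump -g -o -w $DEV $DUMP"

echo "$COPY_CMD"                     # preview only; execute by hand
echo "$DUMP_CMD"
```

The dd copy is the safer of the two, since it preserves file data as well as metadata and can be restored wholesale if a repair goes badly.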
* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO) 2008-02-28 3:16 ` Barry Naujok @ 2008-03-01 4:09 ` slaton 2008-03-04 1:29 ` slaton 1 sibling, 0 replies; 9+ messages in thread From: slaton @ 2008-03-01 4:09 UTC (permalink / raw) To: Barry Naujok; +Cc: xfs-oss Thanks Barry. Couple of follow-up questions: For "making an entire copy of the device", i presume you mean using dd, since it's an unmounted filesystem? Also, I noted that my system's older xfsprogs 2.6.13-1 doesn't include xfs_metadump; is this a newer utility? Rather than updating this system, i'm thinking of performing the recovery from a linux LiveCD type setup. I was thinking of Knoppix 5.1.1, which includes linux 2.6.19 and xfsprogs 2.8.11-1 Any concerns with these? Or would you strongly recommend i roll my own xfsprogs 2.9.4 and use the system itself (choice of kernels 2.6.17.11 or 2.6.23.16)? thanks slaton Slaton Lipscomb Nogales Lab, Howard Hughes Medical Institute http://cryoem.berkeley.edu On Thu, 28 Feb 2008, Barry Naujok wrote: > On Thu, 28 Feb 2008 09:44:04 +1100, slaton <slaton@berkeley.edu> wrote: > > > Hi, > > > > I'm still hoping for some help with this. Is any more information > > needed in addition to the ksymoops output previously posted? > > > > In particular i'd like to know if just remounting the filesystem (to > > replay the journal), then unmounting and running xfs_repair is the > > best course of action. In addition, i'd like to know what recommended > > kernel/xfsprogs versions to use for best results. > > I would get xfsprogs 2.9.4 (2.9.6 is not a good version with your > kernel), > ftp://oss.sgi.com/projects/xfs/previous/cmd_tars/xfsprogs_2.9.4-1.tar.gz > > To be on the safe side, either make an entire copy of your drive to > another device, or run "xfs_metadump -o /dev/sda1" to capture a metadata > (no file data) of your filesystem. > > Then run xfs_repair (mount/unmount maybe required if the log is dirty). > > If the filesystem is in a bad state after the repair (eg. 
everything in > lost+found), email the xfs_repair log and request further advise. > > Regards, > Barry. > > > > thanks > > slaton > > > > Slaton Lipscomb > > Nogales Lab, Howard Hughes Medical Institute > > http://cryoem.berkeley.edu > > > > On Mon, 25 Feb 2008, slaton wrote: > > > > > Thanks for the reply. > > > > > > > Are you hitting http://oss.sgi.com/projects/xfs/faq.html#dir2 ? > > > > > > Presumably not - i'm using 2.6.17.11, and that information indicates the > > > bug was fixed in 2.6.17.7. > > > > > > I've attached the output from running ksymoops on messages.1. First > > > crash/trace (Feb 21 19:xx) corresponds to the original XFS event; the > > > second (Feb 22 15:xx) is the system going down when i tried to unmount the > > > volume. > > > > > > Here are the additional syslog msgs corresponding to the Feb 22 15:xx > > > crash. > > > > > > Feb 22 15:47:13 qln01 kernel: grsec: From 10.0.2.93: unmount of /dev/sda1 > > > by /bin/umount[umount:18604] uid/euid:0/0 gid/egid:0/0, parent > > > /bin/bash[bash:31972] uid/euid:0/0 gid/egid:0/0 > > > Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from > > > line 338 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff88173ce4 > > > Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from > > > line 338 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff88173ce4 > > > Feb 22 15:47:28 qln01 kernel: BUG: soft lockup detected on CPU#0! > > > > > > thanks > > > slaton > > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO) 2008-02-28 3:16 ` Barry Naujok 2008-03-01 4:09 ` slaton @ 2008-03-04 1:29 ` slaton 2008-03-04 1:36 ` Barry Naujok 1 sibling, 1 reply; 9+ messages in thread From: slaton @ 2008-03-04 1:29 UTC (permalink / raw) To: Barry Naujok; +Cc: xfs-oss Barry, I ran xfs_metadump (with -g -o -w options) on the partition and in addition to the file output this was written to stderr: xfs_metadump: suspicious count 22 in bmap extent 9 in dir2 ino 940064492 xfs_metadump: suspicious count 21 in bmap extent 8 in dir2 ino 1348807890 xfs_metadump: suspicious count 29 in bmap extent 9 in dir2 ino 2826081099 xfs_metadump: suspicious count 23 in bmap extent 54 in dir2 ino 3093231364 xfs_metadump: suspicious count 106 in bmap extent 4 in dir2 ino 3505884782 Should i go ahead and do a mount/umount (to replay log) and then xfs_repair, or would another course of action be recommended, given these potential problem inodes? thanks slaton Slaton Lipscomb Nogales Lab, Howard Hughes Medical Institute http://cryoem.berkeley.edu On Thu, 28 Feb 2008, Barry Naujok wrote: > On Thu, 28 Feb 2008 09:44:04 +1100, slaton <slaton@berkeley.edu> wrote: > > > Hi, > > > > I'm still hoping for some help with this. Is any more information needed > > in addition to the ksymoops output previously posted? > > > > In particular i'd like to know if just remounting the filesystem (to > > replay the journal), then unmounting and running xfs_repair is the best > > course of action. In addition, i'd like to know what recommended > > kernel/xfsprogs versions to use for best results. > > I would get xfsprogs 2.9.4 (2.9.6 is not a good version with your kernel), > ftp://oss.sgi.com/projects/xfs/previous/cmd_tars/xfsprogs_2.9.4-1.tar.gz > > To be on the safe side, either make an entire copy of your drive to > another device, or run "xfs_metadump -o /dev/sda1" to capture > a metadata (no file data) of your filesystem. 
> > Then run xfs_repair (mount/unmount maybe required if the log is dirty). > > If the filesystem is in a bad state after the repair (eg. everything in > lost+found), email the xfs_repair log and request further advise. > > Regards, > Barry. > > > > thanks > > slaton > > > > Slaton Lipscomb > > Nogales Lab, Howard Hughes Medical Institute > > http://cryoem.berkeley.edu > > > > On Mon, 25 Feb 2008, slaton wrote: > > > > > Thanks for the reply. > > > > > > > Are you hitting http://oss.sgi.com/projects/xfs/faq.html#dir2 ? > > > > > > Presumably not - i'm using 2.6.17.11, and that information indicates the > > > bug was fixed in 2.6.17.7. > > > > > > I've attached the output from running ksymoops on messages.1. First > > > crash/trace (Feb 21 19:xx) corresponds to the original XFS event; the > > > second (Feb 22 15:xx) is the system going down when i tried to unmount the > > > volume. > > > > > > Here are the additional syslog msgs corresponding to the Feb 22 15:xx > > > crash. > > > > > > Feb 22 15:47:13 qln01 kernel: grsec: From 10.0.2.93: unmount of /dev/sda1 > > > by /bin/umount[umount:18604] uid/euid:0/0 gid/egid:0/0, parent > > > /bin/bash[bash:31972] uid/euid:0/0 gid/egid:0/0 > > > Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from > > > line 338 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff88173ce4 > > > Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from > > > line 338 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff88173ce4 > > > Feb 22 15:47:28 qln01 kernel: BUG: soft lockup detected on CPU#0! > > > > > > thanks > > > slaton > > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO) 2008-03-04 1:29 ` slaton @ 2008-03-04 1:36 ` Barry Naujok 2008-03-04 1:43 ` slaton 0 siblings, 1 reply; 9+ messages in thread From: Barry Naujok @ 2008-03-04 1:36 UTC (permalink / raw) To: slaton; +Cc: xfs-oss On Tue, 04 Mar 2008 12:29:27 +1100, slaton <slaton@berkeley.edu> wrote: > Barry, > > I ran xfs_metadump (with -g -o -w options) on the partition and in > addition to the file output this was written to stderr: > > xfs_metadump: suspicious count 22 in bmap extent 9 in dir2 ino 940064492 > xfs_metadump: suspicious count 21 in bmap extent 8 in dir2 ino 1348807890 > xfs_metadump: suspicious count 29 in bmap extent 9 in dir2 ino 2826081099 > xfs_metadump: suspicious count 23 in bmap extent 54 in dir2 ino > 3093231364 > xfs_metadump: suspicious count 106 in bmap extent 4 in dir2 ino > 3505884782 > > Should i go ahead and do a mount/umount (to replay log) and then > xfs_repair, or would another course of action be recommended, given these > potential problem inodes? Depending on the size of the directories, these numbers are probably fine. I believe a mount/unmount/repair is the best course of action from here. To be extra safe, run another metadump after mount/unmount before running repair. Barry. > thanks > slaton > > Slaton Lipscomb > Nogales Lab, Howard Hughes Medical Institute > http://cryoem.berkeley.edu > > On Thu, 28 Feb 2008, Barry Naujok wrote: > >> On Thu, 28 Feb 2008 09:44:04 +1100, slaton <slaton@berkeley.edu> wrote: >> >> > Hi, >> > >> > I'm still hoping for some help with this. Is any more information >> needed >> > in addition to the ksymoops output previously posted? >> > >> > In particular i'd like to know if just remounting the filesystem (to >> > replay the journal), then unmounting and running xfs_repair is the >> best >> > course of action. In addition, i'd like to know what recommended >> > kernel/xfsprogs versions to use for best results. 
>> >> I would get xfsprogs 2.9.4 (2.9.6 is not a good version with your >> kernel), >> ftp://oss.sgi.com/projects/xfs/previous/cmd_tars/xfsprogs_2.9.4-1.tar.gz >> >> To be on the safe side, either make an entire copy of your drive to >> another device, or run "xfs_metadump -o /dev/sda1" to capture >> a metadata (no file data) of your filesystem. >> >> Then run xfs_repair (mount/unmount maybe required if the log is dirty). >> >> If the filesystem is in a bad state after the repair (eg. everything in >> lost+found), email the xfs_repair log and request further advise. >> >> Regards, >> Barry. >> >> >> > thanks >> > slaton >> > >> > Slaton Lipscomb >> > Nogales Lab, Howard Hughes Medical Institute >> > http://cryoem.berkeley.edu >> > >> > On Mon, 25 Feb 2008, slaton wrote: >> > >> > > Thanks for the reply. >> > > >> > > > Are you hitting http://oss.sgi.com/projects/xfs/faq.html#dir2 ? >> > > >> > > Presumably not - i'm using 2.6.17.11, and that information >> indicates the >> > > bug was fixed in 2.6.17.7. >> > > >> > > I've attached the output from running ksymoops on messages.1. First >> > > crash/trace (Feb 21 19:xx) corresponds to the original XFS event; >> the >> > > second (Feb 22 15:xx) is the system going down when i tried to >> unmount the >> > > volume. >> > > >> > > Here are the additional syslog msgs corresponding to the Feb 22 >> 15:xx >> > > crash. >> > > >> > > Feb 22 15:47:13 qln01 kernel: grsec: From 10.0.2.93: unmount of >> /dev/sda1 >> > > by /bin/umount[umount:18604] uid/euid:0/0 gid/egid:0/0, parent >> > > /bin/bash[bash:31972] uid/euid:0/0 gid/egid:0/0 >> > > Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called >> from >> > > line 338 of file fs/xfs/xfs_rw.c. Return address = >> 0xffffffff88173ce4 >> > > Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called >> from >> > > line 338 of file fs/xfs/xfs_rw.c. Return address = >> 0xffffffff88173ce4 >> > > Feb 22 15:47:28 qln01 kernel: BUG: soft lockup detected on CPU#0! 
>> > > >> > > thanks >> > > slaton >> > >> > >> ^ permalink raw reply [flat|nested] 9+ messages in thread
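Put together, the sequence recommended in this exchange (metadump, mount and unmount to replay the log, a second metadump, then repair) might be scripted along these lines. The mount point and dump paths are assumptions, and the steps are only printed rather than executed:

```shell
# Sketch of the recommended order of operations; paths are assumptions.
DEV=/dev/sda1
MNT=/mnt/recovery                    # assumed empty mount point

# Mounting replays the dirty journal; the second metadump captures the
# post-replay metadata state so the repair can be retried if needed.
STEPS="xfs_metadump -g -o -w $DEV /root/sda1.pre.metadump
mount $DEV $MNT
umount $MNT
xfs_metadump -g -o -w $DEV /root/sda1.post.metadump
xfs_repair $DEV"

echo "$STEPS"                        # preview only; run each step by hand
```

Running each step interactively, rather than from a script, makes it easier to stop if the mount triggers another shutdown.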
* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO) 2008-03-04 1:36 ` Barry Naujok @ 2008-03-04 1:43 ` slaton 0 siblings, 0 replies; 9+ messages in thread From: slaton @ 2008-03-04 1:43 UTC (permalink / raw) To: Barry Naujok; +Cc: xfs-oss Unfortunately, mounting triggered another XFS_WANT_CORRUPTED_GOTO error: XFS mounting filesystem sda1 Starting XFS recovery on filesystem: sda1 (logdev: internal) XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1546 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff882c3be6 Call Trace: [<ffffffff882c204b>] :xfs:xfs_free_ag_extent+0x18a/0x690 [<ffffffff882c3be6>] :xfs:xfs_free_extent+0xa9/0xc9 [<ffffffff882fabf5>] :xfs:xlog_recover_process_efi+0x117/0x149 [<ffffffff882fac6d>] :xfs:xlog_recover_process_efis+0x46/0x6f [<ffffffff882fbb7e>] :xfs:xlog_recover_finish+0x16/0x98 [<ffffffff882f4e68>] :xfs:xfs_log_mount_finish+0x19/0x1c [<ffffffff882fdb52>] :xfs:xfs_mountfs+0x892/0x99a [<ffffffff8830b663>] :xfs:kmem_alloc+0x67/0xcd [<ffffffff8830b6d2>] :xfs:kmem_zalloc+0x9/0x21 [<ffffffff882fe7a0>] :xfs:xfs_mru_cache_create+0x127/0x188 [<ffffffff8830376e>] :xfs:xfs_mount+0x333/0x3b4 [<ffffffff88314452>] :xfs:xfs_fs_fill_super+0x0/0x1ab [<ffffffff883144d0>] :xfs:xfs_fs_fill_super+0x7e/0x1ab [<ffffffff80449fe3>] __down_write_nested+0x12/0x9a [<ffffffff802a131e>] get_filesystem+0x12/0x35 [<ffffffff8028e8aa>] sget+0x379/0x38e [<ffffffff8028ef31>] set_bdev_super+0x0/0xf [<ffffffff8028f06a>] get_sb_bdev+0x11d/0x168 [<ffffffff8028f296>] vfs_kern_mount+0x94/0x124 [<ffffffff8028f363>] do_kern_mount+0x3d/0xee [<ffffffff802a35ff>] do_mount+0x6e5/0x738 [<ffffffff80275743>] handle_mm_fault+0x385/0x789 [<ffffffff8030dfe9>] __up_read+0x10/0x8a [<ffffffff8022341c>] do_page_fault+0x453/0x7a3 [<ffffffff802757bd>] handle_mm_fault+0x3ff/0x789 [<ffffffff80271188>] zone_statistics+0x41/0x63 [<ffffffff8026aa1b>] __alloc_pages+0x6a/0x2d4 [<ffffffff802a3903>] sys_mount+0x8b/0xce [<ffffffff8020bdde>] system_call+0x7e/0x83 Ending XFS recovery on filesystem: 
sda1 (logdev: internal) Haven't tried to unmount or anything else, yet. How to proceed? Just to reiterate, currently using kernel 2.6.23.16 and xfsprogs 2.9.4-1. thanks slaton Slaton Lipscomb Nogales Lab, Howard Hughes Medical Institute http://cryoem.berkeley.edu On Tue, 4 Mar 2008, Barry Naujok wrote: > On Tue, 04 Mar 2008 12:29:27 +1100, slaton <slaton@berkeley.edu> wrote: > > > Barry, > > > > I ran xfs_metadump (with -g -o -w options) on the partition and in > > addition to the file output this was written to stder: > > > > xfs_metadump: suspicious count 22 in bmap extent 9 in dir2 ino 940064492 > > xfs_metadump: suspicious count 21 in bmap extent 8 in dir2 ino 1348807890 > > xfs_metadump: suspicious count 29 in bmap extent 9 in dir2 ino 2826081099 > > xfs_metadump: suspicious count 23 in bmap extent 54 in dir2 ino 3093231364 > > xfs_metadump: suspicious count 106 in bmap extent 4 in dir2 ino 3505884782 > > > > Should i go ahead and do a mount/umount (to replay log) and then > > xfs_repair, or would another course of action be recommended, given these > > potential problem inodes? > > Depending on the size of the directories, these numbers are probably fine. > I believe a mount/unmount/repair is the best course of action from here. > > So be extra safe, run another metadump after mount/unmount before running > repair. > > Barry. > > > thanks > > slaton > > > > Slaton Lipscomb > > Nogales Lab, Howard Hughes Medical Institute > > http://cryoem.berkeley.edu > > > > On Thu, 28 Feb 2008, Barry Naujok wrote: > > > > > On Thu, 28 Feb 2008 09:44:04 +1100, slaton <slaton@berkeley.edu> wrote: > > > > > > > Hi, > > > > > > > > I'm still hoping for some help with this. Is any more information needed > > > > in addition to the ksymoops output previously posted? > > > > > > > > In particular i'd like to know if just remounting the filesystem (to > > > > replay the journal), then unmounting and running xfs_repair is the best > > > > course of action. 
In addition, i'd like to know what recommended > > > > kernel/xfsprogs versions to use for best results. > > > > > > I would get xfsprogs 2.9.4 (2.9.6 is not a good version with your kernel), > > > ftp://oss.sgi.com/projects/xfs/previous/cmd_tars/xfsprogs_2.9.4-1.tar.gz > > > > > > To be on the safe side, either make an entire copy of your drive to > > > another device, or run "xfs_metadump -o /dev/sda1" to capture > > > a metadata (no file data) of your filesystem. > > > > > > Then run xfs_repair (mount/unmount maybe required if the log is dirty). > > > > > > If the filesystem is in a bad state after the repair (eg. everything in > > > lost+found), email the xfs_repair log and request further advise. > > > > > > Regards, > > > Barry. > > > > > > > > > > thanks > > > > slaton > > > > > > > > Slaton Lipscomb > > > > Nogales Lab, Howard Hughes Medical Institute > > > > http://cryoem.berkeley.edu > > > > > > > > On Mon, 25 Feb 2008, slaton wrote: > > > > > > > > > Thanks for the reply. > > > > > > > > > > > Are you hitting http://oss.sgi.com/projects/xfs/faq.html#dir2 ? > > > > > > > > > > Presumably not - i'm using 2.6.17.11, and that information indicates > > > > the > > > > > bug was fixed in 2.6.17.7. > > > > > > > > > > I've attached the output from running ksymoops on messages.1. First > > > > > crash/trace (Feb 21 19:xx) corresponds to the original XFS event; the > > > > > second (Feb 22 15:xx) is the system going down when i tried to unmount > > > > the > > > > > volume. > > > > > > > > > > Here are the additional syslog msgs corresponding to the Feb 22 15:xx > > > > > crash. > > > > > > > > > > Feb 22 15:47:13 qln01 kernel: grsec: From 10.0.2.93: unmount of > > > > /dev/sda1 > > > > > by /bin/umount[umount:18604] uid/euid:0/0 gid/egid:0/0, parent > > > > > /bin/bash[bash:31972] uid/euid:0/0 gid/egid:0/0 > > > > > Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from > > > > > line 338 of file fs/xfs/xfs_rw.c. 
Return address = 0xffffffff88173ce4 > > > > > Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from > > > > > line 338 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff88173ce4 > > > > > Feb 22 15:47:28 qln01 kernel: BUG: soft lockup detected on CPU#0! > > > > > > > > > > thanks > > > > > slaton > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
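One cautious way to proceed from a state like this, assuming the xfsprogs 2.9.4 build includes xfs_mdrestore (the companion tool to xfs_metadump; this availability is an assumption), is to rehearse the repair on an image restored from the metadump before touching the real device. The scratch path is also an assumption, and the commands are only previewed:

```shell
# Sketch: restore the metadata dump to a scratch image and do a no-modify
# repair pass there first. Tool availability and paths are assumptions.
DUMP=/root/sda1.metadump
IMG=/scratch/sda1.img                # assumed scratch space

# xfs_repair -n reports what a repair would change without modifying
# anything; a real repair of the image can follow if the report looks sane.
TRIAL="xfs_mdrestore $DUMP $IMG
xfs_repair -n $IMG
xfs_repair $IMG"

echo "$TRIAL"                        # preview only; run by hand
```

Since the metadump contains no file data, this only shows how the metadata repair would go; it cannot predict which file contents survive.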