public inbox for linux-xfs@vger.kernel.org
* Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO)
@ 2008-02-25 22:20 slaton
  2008-02-25 22:40 ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: slaton @ 2008-02-25 22:20 UTC (permalink / raw)
  To: xfs-oss

A RAID5 (3ware card w/ 8 drive cage) filesystem on our cluster login node 
shut down the other night with this error:

kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1563 of file 
	fs/xfs/xfs_alloc.c.  Caller 0xffffffff8812b3a3
kernel: Call Trace: [<ffffffff88129713>] [<ffffffff8812b3a3>]
kernel:        [<ffffffff88150df7>] [<ffffffff8816af8a>] 
	[<ffffffff88137d6c>]
kernel:        [<ffffffff88157d25>] [<ffffffff8816ed1c>] 
	[<ffffffff811051fa>]
kernel:        [<ffffffff8817a5b2>] [<ffffffff8102c988>] 
	[<ffffffff882566ee>]
kernel:        [<ffffffff8825ba4d>] [<ffffffff8825170a>] 
	[<ffffffff881a379e>]
kernel:        [<ffffffff882512da>] [<ffffffff882514a0>] 
	[<ffffffff810604e2>]
kernel:        [<ffffffff882512da>] [<ffffffff882512da>] 
	[<ffffffff810604da>]
kernel: xfs_force_shutdown(sda1,0x8) called from line 4091 of file 
	fs/xfs/xfs_bmap.c.  Return address = 0xffffffff88137daf
kernel: Filesystem "sda1": Corruption of in-memory data detected.  
Shutting down filesystem: sda1 
kernel: Please umount the filesystem, and rectify the problem(s)
kernel: nfsd: non-standard errno: -990

System hung upon attempting to umount the volume. Have not yet rebooted.

Some additional info:

- Server arch is x86_64 (smp).

- Distro is caos2 linux, kernel 2.6.17 (smp). 2.6.23 pkg is also 
available.

- Kernel not compiled with CONFIG_4KSTACKS=y.

- xfsprogs package is xfsprogs-2.6.13

Memtest86 is running now - no errors yet reported.

From searching around, it appears that once this error occurs it tends to 
repeat with increasing frequency, and I read of a number of folks losing 
all their data. There also appear to be known problems with some older 
kernel and xfsprogs combinations.

What kernel and xfsprogs versions do you recommend I use before 
attempting to remount or run xfs_repair?

Any alternate suggestions for recovery, and how to prevent this from 
recurring?

thanks for any help

slaton

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO)
  2008-02-25 22:20 Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO) slaton
@ 2008-02-25 22:40 ` Eric Sandeen
  2008-02-26  7:54   ` slaton
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2008-02-25 22:40 UTC (permalink / raw)
  To: slaton; +Cc: xfs-oss

slaton wrote:
> A RAID5 (3ware card w/ 8 drive cage) filesystem on our cluster login node 
> shut down the other night with this error:
> 
> kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1563 of file 
> 	fs/xfs/xfs_alloc.c.  Caller 0xffffffff8812b3a3
> kernel: Call Trace: [<ffffffff88129713>] [<ffffffff8812b3a3>]
> kernel:        [<ffffffff88150df7>] [<ffffffff8816af8a>] 
> 	[<ffffffff88137d6c>]
> kernel:        [<ffffffff88157d25>] [<ffffffff8816ed1c>] 
> 	[<ffffffff811051fa>]
> kernel:        [<ffffffff8817a5b2>] [<ffffffff8102c988>] 
> 	[<ffffffff882566ee>]
> kernel:        [<ffffffff8825ba4d>] [<ffffffff8825170a>] 
> 	[<ffffffff881a379e>]
> kernel:        [<ffffffff882512da>] [<ffffffff882514a0>] 
> 	[<ffffffff810604e2>]
> kernel:        [<ffffffff882512da>] [<ffffffff882512da>] 
> 	[<ffffffff810604da>]
> kernel: xfs_force_shutdown(sda1,0x8) called from line 4091 of file 
> 	fs/xfs/xfs_bmap.c.  Return address = 0xffffffff88137daf
> kernel: Filesystem "sda1": Corruption of in-memory data detected.  
> Shutting down filesystem: sda1 
> kernel: Please umount the filesystem, and rectify the problem(s)
> kernel: nfsd: non-standard errno: -990
> 
> System hung upon attempting to umount the volume. Have not yet rebooted.
> 
> Some additional info:
> 
> - Server arch is x86_64 (smp).
> 
> - Distro is caos2 linux, kernel 2.6.17 (smp). 2.6.23 pkg is also 
> available.

Running the oops through ksymoops might be good, so we can see what the actual backtrace was.

Are you hitting http://oss.sgi.com/projects/xfs/faq.html#dir2 ?

-Eric


* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO)
  2008-02-25 22:40 ` Eric Sandeen
@ 2008-02-26  7:54   ` slaton
  2008-02-27 22:44     ` slaton
  0 siblings, 1 reply; 9+ messages in thread
From: slaton @ 2008-02-26  7:54 UTC (permalink / raw)
  To: xfs-oss

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1049 bytes --]

Thanks for the reply.

> Are you hitting http://oss.sgi.com/projects/xfs/faq.html#dir2 ?

Presumably not - I'm using 2.6.17.11, and that information indicates the 
bug was fixed in 2.6.17.7.

I've attached the output from running ksymoops on messages.1. The first 
crash/trace (Feb 21 19:xx) corresponds to the original XFS event; the 
second (Feb 22 15:xx) is the system going down when I tried to unmount the 
volume.

Here are the additional syslog msgs corresponding to the Feb 22 15:xx 
crash.

Feb 22 15:47:13 qln01 kernel: grsec: From 10.0.2.93: unmount of /dev/sda1 
by /bin/umount[umount:18604] uid/euid:0/0 gid/egid:0/0, parent 
/bin/bash[bash:31972] uid/euid:0/0 gid/egid:0/0
Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from 
line 338 of file fs/xfs/xfs_rw.c.  Return address = 0xffffffff88173ce4
Feb 22 15:47:14 qln01 kernel: xfs_force_shutdown(sda1,0x1) called from 
line 338 of file fs/xfs/xfs_rw.c.  Return address = 0xffffffff88173ce4
Feb 22 15:47:28 qln01 kernel: BUG: soft lockup detected on CPU#0!

thanks
slaton

[-- Attachment #2: Type: TEXT/plain, Size: 6373 bytes --]

ksymoops 2.4.9 on x86_64 2.6.17.11-102.caos.smp.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.6.17.11-102.caos.smp/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Error (regular_file): read_system_map stat /usr/src/linux/System.map failed
Warning (merge_maps): no symbols in merged map
Feb 19 11:44:27 qln01 kernel: Machine check events logged
Feb 19 16:44:24 qln01 kernel: Machine check events logged
Feb 19 18:39:23 qln01 kernel: Machine check events logged
Feb 19 21:09:22 qln01 kernel: Machine check events logged
Feb 19 23:49:20 qln01 kernel: Machine check events logged
Feb 20 02:29:19 qln01 kernel: Machine check events logged
Feb 20 14:24:12 qln01 kernel: Machine check events logged
Feb 20 19:29:10 qln01 kernel: Machine check events logged
Feb 21 19:00:58 qln01 kernel: Call Trace: [<ffffffff88129713>] [<ffffffff8812b3a3>]
Feb 21 19:00:58 qln01 kernel:        [<ffffffff88150df7>] [<ffffffff8816af8a>] [<ffffffff88137d6c>]
Feb 21 19:00:58 qln01 kernel:        [<ffffffff88157d25>] [<ffffffff8816ed1c>] [<ffffffff811051fa>]
Feb 21 19:00:58 qln01 kernel:        [<ffffffff8817a5b2>] [<ffffffff8102c988>] [<ffffffff882566ee>]
Feb 21 19:00:58 qln01 kernel:        [<ffffffff8825ba4d>] [<ffffffff8825170a>] [<ffffffff881a379e>]
Feb 21 19:00:58 qln01 kernel:        [<ffffffff882512da>] [<ffffffff882514a0>] [<ffffffff810604e2>]
Feb 21 19:00:58 qln01 kernel:        [<ffffffff882512da>] [<ffffffff882512da>] [<ffffffff810604da>]
Warning (Oops_read): Code line not seen, dumping what data is available


Trace; ffffffff88129713 No symbols available
Trace; ffffffff8812b3a3 No symbols available
Trace; ffffffff88150df7 No symbols available
Trace; ffffffff8816af8a No symbols available
Trace; ffffffff88137d6c No symbols available
Trace; ffffffff88157d25 No symbols available
Trace; ffffffff8816ed1c No symbols available
Trace; ffffffff811051fa No symbols available
Trace; ffffffff8817a5b2 No symbols available
Trace; ffffffff8102c988 No symbols available
Trace; ffffffff882566ee No symbols available
Trace; ffffffff8825ba4d No symbols available
Trace; ffffffff8825170a No symbols available
Trace; ffffffff881a379e No symbols available
Trace; ffffffff882512da No symbols available
Trace; ffffffff882514a0 No symbols available
Trace; ffffffff810604e2 No symbols available
Trace; ffffffff882512da No symbols available
Trace; ffffffff882512da No symbols available
Trace; ffffffff810604da No symbols available

Feb 22 15:08:10 qln01 kernel: e1000: eth2: e1000_watchdog_task: NIC Link is Down
Feb 22 15:47:28 qln01 kernel: Call Trace: <IRQ> [<ffffffff810aef1a>] [<ffffffff88158f7a>]
Feb 22 15:47:28 qln01 kernel:        [<ffffffff8108c245>] [<ffffffff810737dd>] [<ffffffff81073842>]
Feb 22 15:47:28 qln01 kernel:        [<ffffffff8106018c>] <EOI> [<ffffffff88158f7a>] [<ffffffff88158f7a>]
Feb 22 15:47:28 qln01 kernel:        [<ffffffff81064ae6>] [<ffffffff81064ab7>] [<ffffffff8817277f>]
Feb 22 15:47:28 qln01 kernel:        [<ffffffff88158f7a>] [<ffffffff881677df>] [<ffffffff8816d77a>]
Feb 22 15:47:28 qln01 kernel:        [<ffffffff8817c656>] [<ffffffff810c3bdf>] [<ffffffff810c48de>]
Feb 22 15:47:28 qln01 kernel:        [<ffffffff810c3b2d>] [<ffffffff810cc0b7>] [<ffffffff8101124f>]
Feb 22 15:47:28 qln01 kernel:        [<ffffffff81060329>] [<ffffffff8105f452>]
Warning (Oops_read): Code line not seen, dumping what data is available


Trace; ffffffff8108c245 No symbols available
Trace; ffffffff810737dd No symbols available
Trace; ffffffff81073842 No symbols available
Trace; ffffffff8106018c No symbols available
Trace; ffffffff81064ae6 No symbols available
Trace; ffffffff81064ab7 No symbols available
Trace; ffffffff8817277f No symbols available
Trace; ffffffff88158f7a No symbols available
Trace; ffffffff881677df No symbols available
Trace; ffffffff8816d77a No symbols available
Trace; ffffffff8817c656 No symbols available
Trace; ffffffff810c3bdf No symbols available
Trace; ffffffff810c48de No symbols available
Trace; ffffffff810c3b2d No symbols available
Trace; ffffffff810cc0b7 No symbols available
Trace; ffffffff8101124f No symbols available
Trace; ffffffff81060329 No symbols available
Trace; ffffffff8105f452 No symbols available

Feb 25 21:47:52 qln01 kernel: CPU 0: aperture @ 770000000 size 32 MB
Feb 25 21:47:52 qln01 kernel: CPU 1: Syncing TSC to CPU 0.
Feb 25 21:47:52 qln01 kernel: CPU 1: synchronized TSC with CPU 0 (last diff 5 cycles, maxerr 1077 cycles)
Feb 25 21:47:52 qln01 kernel: testing NMI watchdog ... OK.
Feb 25 21:47:55 qln01 kernel: e1000: 0000:02:03.0: e1000_probe: (PCI:66MHz:32-bit) 00:d0:68:06:b0:5e
Feb 25 21:47:55 qln01 kernel: e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
Feb 25 21:47:55 qln01 kernel: e1000: 0000:02:04.0: e1000_probe: (PCI:66MHz:32-bit) 00:d0:68:06:b0:5f
Feb 25 21:47:55 qln01 kernel: e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
Feb 25 21:47:55 qln01 kernel: e1000: 0000:01:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:04:23:a8:ac:78
Feb 25 21:47:55 qln01 kernel: e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection
Feb 25 21:47:55 qln01 kernel: e1000: 0000:01:01.1: e1000_probe: (PCI-X:133MHz:64-bit) 00:04:23:a8:ac:79
Feb 25 21:47:55 qln01 kernel: e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection
Feb 25 21:47:55 qln01 kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
Feb 25 21:47:55 qln01 kernel: e1000: eth1: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
Feb 25 21:47:55 qln01 kernel: lo: Disabled Privacy Extensions
Feb 25 21:48:27 qln01 kernel: e1000: eth2: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex

4 warnings and 2 errors issued.  Results may not be reliable.


* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO)
  2008-02-26  7:54   ` slaton
@ 2008-02-27 22:44     ` slaton
  2008-02-28  3:16       ` Barry Naujok
  0 siblings, 1 reply; 9+ messages in thread
From: slaton @ 2008-02-27 22:44 UTC (permalink / raw)
  To: xfs-oss

Hi,

I'm still hoping for some help with this. Is any more information needed 
in addition to the ksymoops output previously posted?

In particular I'd like to know whether just remounting the filesystem (to 
replay the journal), then unmounting and running xfs_repair, is the best 
course of action. I'd also like to know which kernel and xfsprogs 
versions are recommended for best results.

thanks
slaton

Slaton Lipscomb
Nogales Lab, Howard Hughes Medical Institute
http://cryoem.berkeley.edu



* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO)
  2008-02-27 22:44     ` slaton
@ 2008-02-28  3:16       ` Barry Naujok
  2008-03-01  4:09         ` slaton
  2008-03-04  1:29         ` slaton
  0 siblings, 2 replies; 9+ messages in thread
From: Barry Naujok @ 2008-02-28  3:16 UTC (permalink / raw)
  To: slaton, xfs-oss

On Thu, 28 Feb 2008 09:44:04 +1100, slaton <slaton@berkeley.edu> wrote:

> Hi,
>
> I'm still hoping for some help with this. Is any more information needed
> in addition to the ksymoops output previously posted?
>
> In particular i'd like to know if just remounting the filesystem (to
> replay the journal), then unmounting and running xfs_repair is the best
> course of action. In addition, i'd like to know what recommended
> kernel/xfsprogs versions to use for best results.

I would get xfsprogs 2.9.4 (2.9.6 is not a good version with your kernel),
ftp://oss.sgi.com/projects/xfs/previous/cmd_tars/xfsprogs_2.9.4-1.tar.gz

To be on the safe side, either make an entire copy of your drive to
another device, or run "xfs_metadump -o /dev/sda1" to capture
a metadata image (no file data) of your filesystem.

Then run xfs_repair (a mount/unmount may be required if the log is dirty).

If the filesystem is in a bad state after the repair (e.g. everything in
lost+found), email the xfs_repair log and ask for further advice.
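A rough dry-run sketch of that sequence (the device name /dev/sda1 comes from this thread; the backup and dump paths are made-up placeholders):

```shell
#!/bin/sh
# Dry-run sketch of the recovery sequence described above. It only prints
# the commands, so nothing touches a real device; swap 'echo' for real
# execution once a backup destination is confirmed. DEV is the affected
# filesystem from this thread; the output paths are placeholders.
DEV=/dev/sda1
run() { echo "would run: $*"; }

# 1. Preserve the filesystem first: a full block-level copy...
run dd if=$DEV of=/backup/sda1.img bs=1M
# ...or, if space is tight, a metadata-only dump (no file data).
run xfs_metadump -o $DEV /backup/sda1.metadump
# 2. Mount and unmount once so a dirty log gets replayed.
run mount $DEV /mnt
run umount /mnt
# 3. Repair, keeping the output in case the list needs to see it.
run xfs_repair $DEV
```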

Regards,
Barry.




* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO)
  2008-02-28  3:16       ` Barry Naujok
@ 2008-03-01  4:09         ` slaton
  2008-03-04  1:29         ` slaton
  1 sibling, 0 replies; 9+ messages in thread
From: slaton @ 2008-03-01  4:09 UTC (permalink / raw)
  To: Barry Naujok; +Cc: xfs-oss

Thanks Barry. A couple of follow-up questions:

For "making an entire copy of the drive", I presume you mean using dd, 
since it's an unmounted filesystem?

Also, I noted that my system's older xfsprogs 2.6.13-1 doesn't include 
xfs_metadump; is this a newer utility?

Rather than updating this system, I'm thinking of performing the recovery 
from a Linux LiveCD-type setup. I was thinking of Knoppix 5.1.1, which 
includes

 linux 2.6.19
 xfsprogs 2.8.11-1

Any concerns with these? Or would you strongly recommend I build my own 
xfsprogs 2.9.4 and use the system itself (choice of kernels 2.6.17.11 or 
2.6.23.16)?

thanks
slaton

Slaton Lipscomb
Nogales Lab, Howard Hughes Medical Institute
http://cryoem.berkeley.edu



* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO)
  2008-02-28  3:16       ` Barry Naujok
  2008-03-01  4:09         ` slaton
@ 2008-03-04  1:29         ` slaton
  2008-03-04  1:36           ` Barry Naujok
  1 sibling, 1 reply; 9+ messages in thread
From: slaton @ 2008-03-04  1:29 UTC (permalink / raw)
  To: Barry Naujok; +Cc: xfs-oss

Barry,

I ran xfs_metadump (with the -g -o -w options) on the partition, and in 
addition to the file output, this was written to stderr:
xfs_metadump: suspicious count 22 in bmap extent 9 in dir2 ino 940064492
xfs_metadump: suspicious count 21 in bmap extent 8 in dir2 ino 1348807890
xfs_metadump: suspicious count 29 in bmap extent 9 in dir2 ino 2826081099
xfs_metadump: suspicious count 23 in bmap extent 54 in dir2 ino 3093231364
xfs_metadump: suspicious count 106 in bmap extent 4 in dir2 ino 3505884782

Should I go ahead and do a mount/umount (to replay the log) and then 
xfs_repair, or would another course of action be recommended, given these 
potential problem inodes?

thanks
slaton

Slaton Lipscomb
Nogales Lab, Howard Hughes Medical Institute
http://cryoem.berkeley.edu



* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO)
  2008-03-04  1:29         ` slaton
@ 2008-03-04  1:36           ` Barry Naujok
  2008-03-04  1:43             ` slaton
  0 siblings, 1 reply; 9+ messages in thread
From: Barry Naujok @ 2008-03-04  1:36 UTC (permalink / raw)
  To: slaton; +Cc: xfs-oss

On Tue, 04 Mar 2008 12:29:27 +1100, slaton <slaton@berkeley.edu> wrote:

> Barry,
>
> I ran xfs_metadump (with -g -o -w options) on the partition and in
> addition to the file output this was written to stderr:
>
> xfs_metadump: suspicious count 22 in bmap extent 9 in dir2 ino 940064492
> xfs_metadump: suspicious count 21 in bmap extent 8 in dir2 ino 1348807890
> xfs_metadump: suspicious count 29 in bmap extent 9 in dir2 ino 2826081099
> xfs_metadump: suspicious count 23 in bmap extent 54 in dir2 ino 3093231364
> xfs_metadump: suspicious count 106 in bmap extent 4 in dir2 ino 3505884782
>
> Should i go ahead and do a mount/umount (to replay log) and then
> xfs_repair, or would another course of action be recommended, given these
> potential problem inodes?

Depending on the size of the directories, these numbers are probably fine.
I believe a mount/unmount/repair is the best course of action from here.

To be extra safe, run another metadump after the mount/unmount, before
running repair.
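As a dry-run sketch (the device name /dev/sda1 is taken from earlier in this thread; the dump paths are placeholders), that ordering would be:

```shell
#!/bin/sh
# Dry-run of the extra-safe ordering suggested above: metadump, replay the
# log via mount/unmount, metadump again, and only then repair. Commands are
# printed rather than executed; paths other than DEV are placeholders.
DEV=/dev/sda1
step() { echo "step: $*"; }

step xfs_metadump -o $DEV /root/pre-replay.metadump   # snapshot before replay
step mount $DEV /mnt                                  # replay the dirty log
step umount /mnt
step xfs_metadump -o $DEV /root/post-replay.metadump  # snapshot after replay
step xfs_repair $DEV                                  # repair comes last
```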

Barry.



* Re: Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO)
  2008-03-04  1:36           ` Barry Naujok
@ 2008-03-04  1:43             ` slaton
  0 siblings, 0 replies; 9+ messages in thread
From: slaton @ 2008-03-04  1:43 UTC (permalink / raw)
  To: Barry Naujok; +Cc: xfs-oss

Unfortunately, mounting triggered another XFS_WANT_CORRUPTED_GOTO error:

XFS mounting filesystem sda1
Starting XFS recovery on filesystem: sda1 (logdev: internal)
XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1546 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffff882c3be6
Call Trace:
 [<ffffffff882c204b>] :xfs:xfs_free_ag_extent+0x18a/0x690
 [<ffffffff882c3be6>] :xfs:xfs_free_extent+0xa9/0xc9
 [<ffffffff882fabf5>] :xfs:xlog_recover_process_efi+0x117/0x149
 [<ffffffff882fac6d>] :xfs:xlog_recover_process_efis+0x46/0x6f
 [<ffffffff882fbb7e>] :xfs:xlog_recover_finish+0x16/0x98
 [<ffffffff882f4e68>] :xfs:xfs_log_mount_finish+0x19/0x1c
 [<ffffffff882fdb52>] :xfs:xfs_mountfs+0x892/0x99a
 [<ffffffff8830b663>] :xfs:kmem_alloc+0x67/0xcd
 [<ffffffff8830b6d2>] :xfs:kmem_zalloc+0x9/0x21
 [<ffffffff882fe7a0>] :xfs:xfs_mru_cache_create+0x127/0x188
 [<ffffffff8830376e>] :xfs:xfs_mount+0x333/0x3b4
 [<ffffffff88314452>] :xfs:xfs_fs_fill_super+0x0/0x1ab
 [<ffffffff883144d0>] :xfs:xfs_fs_fill_super+0x7e/0x1ab
 [<ffffffff80449fe3>] __down_write_nested+0x12/0x9a
 [<ffffffff802a131e>] get_filesystem+0x12/0x35
 [<ffffffff8028e8aa>] sget+0x379/0x38e
 [<ffffffff8028ef31>] set_bdev_super+0x0/0xf
 [<ffffffff8028f06a>] get_sb_bdev+0x11d/0x168
 [<ffffffff8028f296>] vfs_kern_mount+0x94/0x124
 [<ffffffff8028f363>] do_kern_mount+0x3d/0xee
 [<ffffffff802a35ff>] do_mount+0x6e5/0x738
 [<ffffffff80275743>] handle_mm_fault+0x385/0x789
 [<ffffffff8030dfe9>] __up_read+0x10/0x8a
 [<ffffffff8022341c>] do_page_fault+0x453/0x7a3
 [<ffffffff802757bd>] handle_mm_fault+0x3ff/0x789
 [<ffffffff80271188>] zone_statistics+0x41/0x63
 [<ffffffff8026aa1b>] __alloc_pages+0x6a/0x2d4
 [<ffffffff802a3903>] sys_mount+0x8b/0xce
 [<ffffffff8020bdde>] system_call+0x7e/0x83
Ending XFS recovery on filesystem: sda1 (logdev: internal)

I haven't tried to unmount or do anything else yet. How should I proceed?

Just to reiterate, currently using kernel 2.6.23.16 and xfsprogs 2.9.4-1.

thanks
slaton

Slaton Lipscomb
Nogales Lab, Howard Hughes Medical Institute
http://cryoem.berkeley.edu



end of thread

Thread overview: 9+ messages
2008-02-25 22:20 Linux XFS filesystem corruption (XFS_WANT_CORRUPTED_GOTO) slaton
2008-02-25 22:40 ` Eric Sandeen
2008-02-26  7:54   ` slaton
2008-02-27 22:44     ` slaton
2008-02-28  3:16       ` Barry Naujok
2008-03-01  4:09         ` slaton
2008-03-04  1:29         ` slaton
2008-03-04  1:36           ` Barry Naujok
2008-03-04  1:43             ` slaton
