public inbox for linux-kernel@vger.kernel.org
* Repeated XFS corruption -Corruption of in-memory data detected
From: Ryan Bair @ 2007-07-30 16:10 UTC
  To: linux-kernel

Kernel: 2.6.18-4-amd64 (Debian 2.6.18.dfsg.1-12etch2) Debian Etch
System: Dell PowerEdge 1850
Processor: 3.2 GHz Intel Xeon w/ microcode v1.14a, Hyperthreading disabled.
RAM: 2x1GB ECC DDR-400
RAID Controller: Dell PERC5/E using megaraid driver

I got another unexpected error on my XFS partition today. I was able
to reboot the system normally and the journal recovered on the
following mount. Shortly thereafter, the error occurred again. After
this the filesystem was no longer able to be mounted as the error
would occur immediately.

The volume is on a 9.5TB LVM2 volume on a Dell MD1000 loaded with 15
750GB drives in a RAID5 set. Writeback is disabled. Memtest86+ was run
on this system for 48 hours without fault. The system is otherwise
stable.

XFS was able to repair the damage, but previously the drive returned
to its corrupted state within a few hours of heavy I/O.

Here is the message:
SGI XFS with ACLs, security attributes, realtime, large block/inode
numbers, no debug enabled
 SGI XFS Quota Management subsystem
 Filesystem "dm-3": Disabling barriers, not supported by the underlying device
 XFS mounting filesystem dm-3
 Starting XFS recovery on filesystem: dm-3 (logdev: internal)
 XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1561 of file
fs/xfs/xfs_alloc.c.  Caller 0xffffffff881fa6a7

 Call Trace:
  [<ffffffff881f8db0>] :xfs:xfs_free_ag_extent+0x19f/0x67f
  [<ffffffff881fa6a7>] :xfs:xfs_free_extent+0xa9/0xc9
  [<ffffffff88236200>] :xfs:xfs_trans_log_efd_extent+0x1c/0x4b
  [<ffffffff8822f355>] :xfs:xlog_recover_finish+0x157/0x241
  [<ffffffff88232dc4>] :xfs:xfs_mountfs+0xa29/0xc35
  [<ffffffff8020bce8>] _atomic_dec_and_lock+0x39/0x57
  [<ffffffff88238b68>] :xfs:xfs_mount+0x762/0x83b
  [<ffffffff8824816b>] :xfs:xfs_fs_fill_super+0x0/0x1e5
  [<ffffffff882481e9>] :xfs:xfs_fs_fill_super+0x7e/0x1e5
  [<ffffffff8025e244>] __down_write_nested+0x12/0x9a
  [<ffffffff802c3941>] get_filesystem+0x12/0x3b
  [<ffffffff802bc95d>] sget+0x383/0x395
  [<ffffffff802bc287>] set_bdev_super+0x0/0xf
  [<ffffffff802bc296>] test_bdev_super+0x0/0xd
  [<ffffffff802bd299>] get_sb_bdev+0xf8/0x152
  [<ffffffff802bcc48>] vfs_kern_mount+0x93/0x11a
  [<ffffffff802bcd11>] do_kern_mount+0x36/0x4d
  [<ffffffff802c52f7>] do_mount+0x68c/0x6ff
  [<ffffffff8022ae6d>] mntput_no_expire+0x19/0x8b
  [<ffffffff8020dd5f>] link_path_walk+0xd3/0xe5
  [<ffffffff8020c5d8>] bit_waitqueue+0x38/0x9b
  [<ffffffff88170033>] :ext3:ext3_delete_inode+0x0/0xd5
  [<ffffffff8023a1ca>] do_unlinkat+0xef/0x148
  [<ffffffff802aaccd>] zone_statistics+0x3e/0x6d
  [<ffffffff802265f0>] vfs_stat_fd+0x1b/0x4a
  [<ffffffff8020de4a>] __alloc_pages+0x5c/0x2a9
  [<ffffffff8023a1ca>] do_unlinkat+0xef/0x148
  [<ffffffff80248310>] sys_mount+0x8a/0xd7
  [<ffffffff802584d6>] system_call+0x7e/0x83

 XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1561 of file
fs/xfs/xfs_alloc.c.  Caller 0xffffffff881fa6a7

 Call Trace:
  [<ffffffff881f8db0>] :xfs:xfs_free_ag_extent+0x19f/0x67f
  [<ffffffff881fa6a7>] :xfs:xfs_free_extent+0xa9/0xc9
  [<ffffffff88207416>] :xfs:xfs_bmap_finish+0xf0/0x169
  [<ffffffff8822500a>] :xfs:xfs_itruncate_finish+0x172/0x2b3
  [<ffffffff8823e436>] :xfs:xfs_inactive+0x22e/0x823
  [<ffffffff88242563>] :xfs:xfs_buf_read_flags+0x12/0x7f
  [<ffffffff88235ea6>] :xfs:xfs_trans_read_buf+0x4c/0x2c7
  [<ffffffff88247d9c>] :xfs:xfs_fs_clear_inode+0xa5/0xec
  [<ffffffff80220e47>] clear_inode+0xc5/0xf6
  [<ffffffff8022d3ad>] generic_delete_inode+0xde/0x143
  [<ffffffff8822f010>] :xfs:xlog_recover_process_iunlinks+0x1de/0x3cc
  [<ffffffff8822f3d6>] :xfs:xlog_recover_finish+0x1d8/0x241
  [<ffffffff88232dc4>] :xfs:xfs_mountfs+0xa29/0xc35
  [<ffffffff8020bce8>] _atomic_dec_and_lock+0x39/0x57
  [<ffffffff88238b68>] :xfs:xfs_mount+0x762/0x83b
  [<ffffffff8824816b>] :xfs:xfs_fs_fill_super+0x0/0x1e5
  [<ffffffff882481e9>] :xfs:xfs_fs_fill_super+0x7e/0x1e5
  [<ffffffff8025e244>] __down_write_nested+0x12/0x9a
  [<ffffffff802c3941>] get_filesystem+0x12/0x3b
  [<ffffffff802bc95d>] sget+0x383/0x395
  [<ffffffff802bc287>] set_bdev_super+0x0/0xf
  [<ffffffff802bc296>] test_bdev_super+0x0/0xd
  [<ffffffff802bd299>] get_sb_bdev+0xf8/0x152
  [<ffffffff802bcc48>] vfs_kern_mount+0x93/0x11a
  [<ffffffff802bcd11>] do_kern_mount+0x36/0x4d
  [<ffffffff802c52f7>] do_mount+0x68c/0x6ff
  [<ffffffff8022ae6d>] mntput_no_expire+0x19/0x8b
  [<ffffffff8020dd5f>] link_path_walk+0xd3/0xe5
  [<ffffffff8020c5d8>] bit_waitqueue+0x38/0x9b
  [<ffffffff88170033>] :ext3:ext3_delete_inode+0x0/0xd5
  [<ffffffff8023a1ca>] do_unlinkat+0xef/0x148
  [<ffffffff802aaccd>] zone_statistics+0x3e/0x6d
  [<ffffffff802265f0>] vfs_stat_fd+0x1b/0x4a
  [<ffffffff8020de4a>] __alloc_pages+0x5c/0x2a9
  [<ffffffff8023a1ca>] do_unlinkat+0xef/0x148
  [<ffffffff80248310>] sys_mount+0x8a/0xd7
  [<ffffffff802584d6>] system_call+0x7e/0x83

 xfs_force_shutdown(dm-3,0x8) called from line 4267 of file
fs/xfs/xfs_bmap.c.  Return address = 0xffffffff88207453
 Filesystem "dm-3": Corruption of in-memory data detected.  Shutting
down filesystem: dm-3
 Please umount the filesystem, and rectify the problem(s)
 Ending XFS recovery on filesystem: dm-3 (logdev: internal)

Let me know if more information is required.
Please CC me as I am not subscribed to this list.

Thank you


* Re: Repeated XFS corruption -Corruption of in-memory data detected
From: David Chinner @ 2007-07-31  1:53 UTC
  To: Ryan Bair; +Cc: linux-kernel, xfs

[cc xfs@oss.sgi.com]

On Mon, Jul 30, 2007 at 12:10:52PM -0400, Ryan Bair wrote:
> Kernel: 2.6.18-4-amd64 (Debian 2.6.18.dfsg.1-12etch2) Debian Etch
> System: Dell PowerEdge 1850
> Processor: 3.2 GHz Intel Xeon w/ microcode v1.14a, Hyperthreading disabled.
> RAM: 2x1GB ECC DDR-400
> RAID Controller: Dell PERC5/E using megaraid driver
> 
> I got another unexpected error on my XFS partition today. I was able
> to reboot the system normally and the journal recovered on the
> following mount. Shortly thereafter, the error occurred again. After
> this the filesystem was no longer able to be mounted as the error
> would occur immediately.
> 
> The volume is on a 9.5TB LVM2 volume on a Dell MD1000 loaded with 15
> 750GB drives in a RAID5 set. Writeback is disabled. Memtest86+ was run
> on this system for 48 hours without fault. The system is otherwise
> stable.

<sigh>

You're the second person today to report a software RAID5+XFS corruption on
the 2.6.18-4 Debian kernel. Almost the same signature as well - that is a
corrupted free space btree.

> XFS was able to repair the damage, but previously the drive returned
> to its corrupted state within a few hours of heavy I/O.

The other report was a shutdown before corruption got to disk,
so maybe they are different problems.

Can you post the repair output so we can see what the damage was?
Also, can you post your md/dm config so I can see if I can recreate
a similar config?

Also, seeing as the previous report was caught before corruption
got to disk, I suspected memory corruption of some kind. Can
you enable slab, vm and filesystem debugging for your kernel and
run with that?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
