From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bas van Schaik
Subject: EXT3 filesystem corruptions on AoE, RAID and LVM?
Date: Fri, 15 Dec 2006 23:35:13 +0100
Message-ID: <45832321.4010305@tuxes.nl>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
Received: from a-eskwadraat.nl ([131.211.39.72]:38780 "EHLO a-eskwadraat.nl"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1030353AbWLOXAX (ORCPT ); Fri, 15 Dec 2006 18:00:23 -0500
Received: from niels.localdomain ([10.14.0.4] ident=sjeik)
	by a-eskwadraat.nl with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA:32)
	(Exim 4.50) id 1GvLe5-0003Qb-Us
	for linux-ext4@vger.kernel.org; Fri, 15 Dec 2006 23:35:14 +0100
To: Linux extfs development
Sender: linux-ext4-owner@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org

Hi all,

I'm maintaining two clusters, with machines running a mix of Debian Stable
with Etch kernels, to have AoE (ATA over Ethernet) support. Machines in
these clusters "export" their hard disks using AoE, and one machine in the
cluster imports them using the kernel "aoe" module. On top of those
imported devices, multiple RAID5 arrays are created; LVM runs on top of
the RAID, with ext3 on the LVM LV.

After a few days, I get EXT3 errors like this:

>> EXT3-fs: mounted filesystem with ordered data mode.
>> EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared for block 412186
>> Aborting journal on device loop0.
>> EXT3-fs error (device loop0) in ext3_free_blocks_sb: Journal has aborted
>> EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted
>> EXT3-fs error (device loop0) in ext3_truncate: Journal has aborted
>> EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted
>> EXT3-fs error (device loop0) in ext3_orphan_del: Journal has aborted
>> EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted
>> EXT3-fs error (device loop0) in ext3_delete_inode: Journal has aborted
>> __journal_remove_journal_head: freeing b_committed_data
>> __journal_remove_journal_head: freeing b_committed_data
(...)
>> __journal_remove_journal_head: freeing b_committed_data
>> ext3_abort called.
>> EXT3-fs error (device loop0): ext3_journal_start_sb: Detected aborted journal
>> Remounting filesystem read-only
>> __journal_remove_journal_head: freeing b_committed_data

Running fsck on the filesystem fixes those errors, but after a few days
(or weeks, depending on the fs load) the corruptions appear again. It
might be worth telling you that there are no other suspicious messages in
my logs.

I saw some other discussions on the mailing list, but I don't think
they're related to my problems. I don't know whether I need to file a bug
on this, nor do I know which details you need to help me solve this
problem. So for now I just want to hear your thoughts.

FYI:
Kernel information for cluster 1:
>> root@infinity:~# uname -a
>> Linux infinity 2.6.17-2-686 #1 SMP Wed Sep 13 16:34:10 UTC 2006 i686 GNU/Linux

And cluster 2:
>> dust:~# uname -a
>> Linux dust 2.6.18-3-686 #1 SMP Thu Nov 23 20:49:23 UTC 2006 i686 GNU/Linux

Note that these are not vanilla kernels, but Debian kernels. However,
AFAIK there are no Debian-specific patches to AoE, ext3, LVM or RAID.

Thanks for your replies!

Best regards,

-- 
Bas van Schaik
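
P.S. In case it helps with collecting details from logs like the ones
quoted above, here is a minimal Python sketch that pulls the device,
reporting function and message out of "EXT3-fs error" lines. The first
such line is usually the interesting one, since the later "Journal has
aborted" lines are consequences of it. The names parse_ext3_errors,
first_error and SAMPLE are made up for this illustration, and the regular
expression only assumes the two line shapes that appear in the log above;
other kernels may format these messages differently.

```python
import re

# Assumed line shapes, taken from the log excerpt in this mail:
#   EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared ...
#   EXT3-fs error (device loop0) in ext3_truncate: Journal has aborted
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<device>[^)]+)\)"
    r"(?::| in) (?P<function>\w+): (?P<message>.*)"
)


def parse_ext3_errors(log_text):
    """Return (device, function, message) tuples for each EXT3-fs error line."""
    errors = []
    for line in log_text.splitlines():
        match = EXT3_ERROR.search(line)
        if match:
            errors.append(
                (match["device"], match["function"], match["message"].strip())
            )
    return errors


def first_error(log_text):
    """Return the earliest EXT3-fs error in the text, or None if there is none."""
    errors = parse_ext3_errors(log_text)
    return errors[0] if errors else None


# Hypothetical sample input: a few lines copied from the log above.
SAMPLE = """\
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared for block 412186
Aborting journal on device loop0.
EXT3-fs error (device loop0) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device loop0) in ext3_truncate: Journal has aborted
"""

if __name__ == "__main__":
    for device, function, message in parse_ext3_errors(SAMPLE):
        print(f"{device}: {function}: {message}")
```

Feeding it the output of dmonly the relevant part of the kernel log (for
example a saved copy of dmesg output) should be enough; it does not read
any files or devices itself.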