Date: Tue, 2 Aug 2011 10:27:08 -0400
From: "Ted Ts'o"
To: Luke Kenneth Casson Leighton
Cc: linux-kernel@vger.kernel.org
Subject: Re: corrupted ext4 1000gb filesystem (2.6.32, debian stable)
Message-ID: <20110802142708.GA2967@thunk.org>

On Tue, Aug 02, 2011 at 02:14:19PM +0100, Luke Kenneth Casson Leighton wrote:
> On Tue, Aug 2, 2011 at 1:15 AM, Luke Kenneth Casson Leighton
> wrote:
>
> > ok, um... i have a bit more information about this situation, to
> > report.  two consecutive runs of fsck.ext4, and the filesystem still
> > reports errors after the first run "corrected" all errors. i'd say
> > that was a bit serious.
>
> rright - apologies but i've located the likely source of the problem
> - e2fsck. the issue is that the bitmaps for the 3-way RAID1 mirror
> were corrupted. thus, the filesystem would be fixed by e2fsck, only
> to be completely buggered up by picking wildly inappropriate sections
> of the drive... that presumably by either bad luck or by a powercut
> and writes occurring at the time happened to be on inode blocks.
E2fsck doesn't depend on the bitmaps; those are regenerated from the
information in the inode tables.  Assuming that the disks are stable
(that is, a read from a block always returns the same contents, and
writes are not lost, so that after a write, reads of that block
consistently return the written data), there should not be any
corruption found after the first run of e2fsck has fixed all errors.
That being said, there have been cases where that hasn't held, and I
consider each of those a bug in e2fsck.

*However*, if you have a RAID1 setup where the data on the disks is
inconsistent, this can be the cause of much mischief.  Depending on
which disk of the mirror you read from, you might get different
results.  Once that's the case, all bets with e2fsck are off.

I suggest you make sure that your RAID1 mirror is stable first of all;
in general, you *have* to fix problems with the storage stack from the
lowest level on up.  First make sure the hard drives are all sane; then
make sure the partition table and/or LVM setups are sane; then make
sure any RAID setups are sane; and only *then* run a filesystem-level
checker.  This is true regardless of what file system you use.

Finally, I strongly recommend that when you are doing this kind of
repair work, you save a copy of everything you do using a program like
"script".  A transcript of the e2fsck output can be critically useful.
Reviewing the transcript can also help you identify mistakes that you
might have made during the recovery process.

Regards,

                                        - Ted

P.S.  Note that if you are running e2fsck and haven't mounted the disk
yet, and you are still seeing failures after a second run of e2fsck,
then it obviously can't be caused by a bug in the ext4 kernel code.
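To make the RAID1 check suggested above concrete, here is a minimal Python sketch of what an "inconsistent mirror" looks like: it reads two copies block by block and reports the indexes of blocks whose contents differ.  It is an illustration, not a tool from this thread; the function name find_mismatched_blocks and the 4096-byte block size are assumptions.  It also assumes you feed it read-only images of the *data regions* of two members, since the md superblocks legitimately differ from member to member.

```python
import io
from typing import BinaryIO, Iterator

BLOCK_SIZE = 4096  # assumed block size; set it to match your setup


def find_mismatched_blocks(a: BinaryIO, b: BinaryIO,
                           block_size: int = BLOCK_SIZE) -> Iterator[int]:
    """Yield the index of every block whose contents differ between two
    mirror copies.  If one copy is longer, its extra blocks count as
    mismatches.  Expects buffered binary streams (e.g. open(path, "rb")),
    where read(n) returns n bytes unless end-of-file is reached."""
    index = 0
    while True:
        block_a = a.read(block_size)
        block_b = b.read(block_size)
        if not block_a and not block_b:
            return  # both copies exhausted
        if block_a != block_b:
            yield index
        index += 1


if __name__ == "__main__":
    # Two in-memory "mirror members" that disagree only in block 1.
    good = bytes(BLOCK_SIZE) * 3
    bad = bytearray(good)
    bad[BLOCK_SIZE + 10] = 0xFF
    print(list(find_mismatched_blocks(io.BytesIO(good), io.BytesIO(bytes(bad)))))
```

Any non-empty result means that which data e2fsck sees depends on which member services a read, which is exactly the situation in which its results can't be trusted.  On a live Linux md array, the md driver can perform a similar comparison itself: write "check" to /sys/block/mdN/md/sync_action and read mismatch_cnt afterwards (mdN is a placeholder for your array).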