From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lukáš Czerner
Subject: Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
Date: Tue, 11 Sep 2012 13:59:47 -0400 (EDT)
Message-ID:
References: <20120910024709.GA3439@thunk.org>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: "Theodore Ts'o" , linux-ext4@vger.kernel.org
To: Terry
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:47895 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752121Ab2IKR7w (ORCPT ); Tue, 11 Sep 2012 13:59:52 -0400
In-Reply-To:
Sender: linux-ext4-owner@vger.kernel.org
List-ID:

On Tue, 11 Sep 2012, Terry wrote:

> Date: Tue, 11 Sep 2012 11:22:27 -0500
> From: Terry
> To: Theodore Ts'o
> Cc: linux-ext4@vger.kernel.org
> Subject: Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
>
> On Mon, Sep 10, 2012 at 8:56 AM, Terry wrote:
> > On Mon, Sep 10, 2012 at 8:48 AM, Terry wrote:
> >> On Sun, Sep 9, 2012 at 10:18 PM, Terry wrote:
> >>> On Sun, Sep 9, 2012 at 9:53 PM, Terry wrote:
> >>>> On Sun, Sep 9, 2012 at 9:47 PM, Theodore Ts'o wrote:
> >>>>> On Sun, Sep 09, 2012 at 09:34:10PM -0500, Terry wrote:
> >>>>>>
> >>>>>> As the subject says, we have a 15 TB fsck drive that won't mount with
> >>>>>> these errors:
> >>>>>>
> >>>>>> Sep 9 20:02:20 narf kernel: EXT4-fs (dm-9): ext4_check_descriptors:
> >>>>>> Inode bitmap for group 3200 not in group (block 4161027887)!
> >>>>>> Sep 9 20:02:20 narf kernel: EXT4-fs (dm-9): group descriptors corrupted!
> >>>>>
> >>>>> These indicate a very basic file system corruption where the block
> >>>>> group descriptors are corrupted. E2fsck will complain immediately
> >>>>> upon seeing this sort of fs inconsistency, and the first thing it will
> >>>>> try to do is fix it.
> >>>>>
> >>>>>> We did a proactive fsck on Tuesday of last week because it was
> >>>>>> starting to give filesystem errors. It ran through and mounted fine.
> >>>>>>
> >>>>>> The filesystem lives on an equallogic SAN spread across 36 drives.
> >>>>>> Could this be something with the physical layer or is it not abnormal
> >>>>>> to have to run multiple rounds of fsck to fully fix an issue?
> >>>>>
> >>>>> This is most probably a hardware problem; normally e2fsck will fix
> >>>>> file system corruptions (and certainly problems such as corrupt block
> >>>>> group descriptors) in a single pass. If e2fsck finished and the file
> >>>>> system mounted fine last week, and now you're getting this kind of
> >>>>> error, it basically screams some kind of physical layer problem, or
> >>>>> perhaps a bad hard drive, or perhaps the SAN disk is getting
> >>>>> incorrectly written to by some other system, etc.
> >>>>>
> >>>>> - Ted
> >>>>
> >>>> Thanks for the reply. It is part of a RHEL cluster but we did not
> >>>> have any situations where multiple systems mounted the filesystem. It
> >>>> is an old SAN so perhaps we have a physical issue. We'll see what
> >>>> happens with this pass.
> >>>
> >>> While I am waiting for fsck to finish, another thought. This
> >>> filesystem contains a lot of small files. 35,867,642 files to be
> >>> exact. Anything else I should check or know to ensure a smooth
> >>> operation for these types of filesystems? I formatted them with
> >>> standard RHEL 6 options.
> >>
> >> FSCK completed fixing a lot of things. The file system then mounted
> >> without any errors. We are still getting these types of errors in
> >> /var/log/messages:
> >>
> >> Sep 10 08:40:49 narf kernel: EXT4-fs error (device dm-6):
> >> ext4_dx_find_entry: bad entry in directory #743966900: directory entry
> >> across blocks - block=2975876794offset=0(946176), inode=1414751737,
> >> rec_len=45724, name_len=206
> >>
> >> Thoughts?
> >
> > Hold that thought. This is another filesystem. Let me fix that one
> > then come back to this problem if it still exists.
>
> Ok, fixed the other filesystem (dm-6) yesterday.
> Today, getting these
> errors still on it:
> Sep 11 11:17:47 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 90851: 0 blocks in bitmap, 5048
> in gd
> Sep 11 11:18:17 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 90670: 0 blocks in bitmap, 6665
> in gd
> Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 37589: 420 blocks in bitmap,
> 8302 in gd
> Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 71777: 7071 blocks in bitmap,
> 23711 in gd
> Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 71778: 10664 blocks in bitmap,
> 26624 in gd
> Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 13499: 9884 blocks in bitmap,
> 1256 in gd
> Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 13498: 383 blocks in bitmap,
> 384 in gd
> Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 13496: 2356 blocks in bitmap,
> 10453 in gd
> Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 13497: 3593 blocks in bitmap,
> 5641 in gd
> Sep 11 11:19:50 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 49528: 25850 blocks in bitmap,
> 29946 in gd

Hi,

what RHEL version are you using, or even better, what kernel version are
you running? If you have a RHEL subscription, you should definitely
contact Red Hat about the issue.

Thanks!
-Lukas